3 Comments

I think the meta-learning approach of TabPFN (https://arxiv.org/abs/2207.01848) is a step in this direction. It's been trained on synthetic data, sampling over structural causal models and then sampling over data drawn from those models. Then it's trained to perform in-context learning to solve tabular classification problems. So you end up with a Transformer that's basically been trained to do science (make predictions by marginalizing over the space of causal models, weighted by how well they explain your in-context data). You can imagine training such a model, not merely to perform classification, but to tell you whether you should aggregate or segment data, or whether this would be invalid due to Simpson's Paradox or Berkson's Paradox. In other words, you can imagine training a model to make all these judgment calls that are a "necessary evil" in science, statistics, and data science, but in a fully-standardized fashion that avoids the human tendency (and social incentives) to exploit these judgment calls to put our fingers on the scales in favor of our desired conclusions.
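To make the prior-sampling idea concrete, here is a minimal, hypothetical sketch (NumPy only, not the actual TabPFN prior) of the kind of synthetic task generator such a meta-learner could be trained on: sample a random linear-Gaussian SCM, draw a small table from it ancestrally, and turn one variable into a label. The real prior is much richer (nonlinearities, varying noise, discretization schemes), so treat this purely as an illustration of "sample a causal model, then sample data from it."

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_scm(n_vars=5, edge_prob=0.4):
    """Sample a random linear-Gaussian SCM: pick a random causal ordering
    and allow edges only from earlier to later variables (guarantees a DAG)."""
    order = rng.permutation(n_vars)
    W = np.zeros((n_vars, n_vars))
    for i in range(n_vars):
        for j in range(i):
            if rng.random() < edge_prob:
                W[order[i], order[j]] = rng.normal()  # parent -> child weight
    return W, order

def sample_dataset(W, order, n_samples=128):
    """Ancestrally sample a small table from the SCM, then derive a binary
    label by thresholding the last variable in the causal order."""
    n_vars = W.shape[0]
    X = np.zeros((n_samples, n_vars))
    for i in order:                      # visit variables in causal order
        noise = rng.normal(size=n_samples)
        X[:, i] = X @ W[i] + noise       # parents are already filled in
    target = order[-1]
    y = (X[:, target] > np.median(X[:, target])).astype(int)
    features = np.delete(X, target, axis=1)
    return features, y

# One synthetic "task" of the sort the meta-learner sees millions of times;
# at inference, the trained Transformer does in-context classification on a
# real table in a single forward pass.
W, order = sample_scm()
X, y = sample_dataset(W, order)
print(X.shape, y.mean())
```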

And the above approach doesn't even begin to exploit knowledge about the meaning of each feature in a dataset. One can imagine a model that takes in the names of features, and uses an LLM to weight possible causal models by how much they accord with all of humanity's prior knowledge.


We're bootstrapped. They can click around. You'd best believe the labs aren't putting out their internal tooling to give these models the ability to experiment. But they can. It's bootstrapped.


On this, I've wondered before: when LLMs learn from text, how well are they able to contextualize it and link stuff together?

When I read a text, I find it via some process that plugs me into its context. I start off knowing "this text was written by A, who has B biases about event C, at time D, after A had learned E but before learning F", and this helps me comprehend it correctly.

Possibly they are able to guess this? Possibly they don't know yet but will in the future? Possibly it's an opportunity for enriching the training data?

My default expectation is that all the stuff LLMs have learned successfully was already written up, bundled into one piece of text.
