3 Comments

This type of thinking is a step in the right direction, but both the ML research and AI alignment communities already have quite detailed predictions about the actual mechanisms by which language modeling turns into intelligence:

https://www.alignmentforum.org/posts/vJFdjigzmcXMhNTsx/simulators

https://twitter.com/jacobandreas/status/1600118539263741952 (paper submitted in May 2022)

Essentially, in the language of that Twitter thread, the LM learns a Guy Typology in order to next-token-predict text on the internet: Guy Who Does Maths, Guy Who Knows Facts About The Roman Empire, and a zillion others. If we had an infinite amount of text produced by any single Guy, the low-loss limit of training would model what that Guy says perfectly.
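
To make the "low-loss limit" concrete, here is a minimal sketch of the next-token-prediction objective, assuming PyTorch; the logits and token ids are toy stand-ins, not any particular model.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 100, 16, 4
tokens = torch.randint(vocab_size, (batch, seq_len))  # tokenized internet text
logits = torch.randn(batch, seq_len, vocab_size)      # stand-in for LM outputs

# Predict token t+1 from the model's output at position t.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),  # predictions at positions 0..T-2
    tokens[:, 1:].reshape(-1),               # targets are the same tokens, shifted
)
# Driving this loss to its minimum means matching the true conditional
# distribution p(next token | context) exactly, whichever Guy the context implies.
```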

And each Guy is only a view on the world model; by learning views from many different viewpoints, the LM develops a world model consistent with all of those views. Sort of like the intuition behind neural radiance fields: any single 2D view underdetermines the 3D scene, but enough views together pin it down.
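
That intuition has a toy linear version, sketched below in NumPy under the (obviously false for real LMs) assumption that each view is a linear projection of a hidden state; it illustrates the geometry of "many views pin down one model", not anything about transformer internals.

```python
import numpy as np

rng = np.random.default_rng(0)
world = rng.normal(size=8)  # the underlying "world model"

# Each "Guy" exposes only a low-dimensional view of it.
views = [rng.normal(size=(3, 8)) for _ in range(5)]
observations = [V @ world for V in views]

# Any single 3-dim view underdetermines the 8-dim world,
# but stacking enough views makes the system overdetermined.
A = np.vstack(views)                  # shape (15, 8)
b = np.concatenate(observations)      # shape (15,)
recovered, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(recovered, world))  # True: the views jointly determine it
```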

There are of course architectural constraints: functionalities require more parameters as they get more complex, and a priori the optimization landscape might be difficult to navigate. But adaptive gradient methods on transformers seem to work well so far.
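
For concreteness, "adaptive gradient methods on transformers" in practice usually means something like AdamW; a purely illustrative setup with made-up toy sizes:

```python
import torch
import torch.nn as nn

model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True),
    num_layers=2,
)
# AdamW adapts per-parameter step sizes from gradient moment estimates,
# which is part of why these loss landscapes turn out navigable in practice.
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
```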

Thanks for sharing; I'll read that. Looks great.

Great article, thank you.
