
The whole design of an LLM is to consume and compress a huge space of human-generated content and use that to predict how a human would reply, one token at a time. That alone means the LLM isn't modelling anything beyond the human content it was trained on, and there is no reasoning, since every prediction is based only on probabilities plus sampling controls (temperature and the like) that keep the output from being entirely deterministic.
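Concretely, the "probabilities plus a randomization control" part at the end is just sampling from the model's next-token distribution with a temperature knob. A toy sketch (made-up numbers and names, not any real model's API):

  # Toy sketch of "probabilities plus a randomization control" at decode time.
  import numpy as np

  def sample_next_token(logits, temperature=0.8):
      # temperature < 1 sharpens the distribution, > 1 flattens it;
      # temperature -> 0 approaches greedy, fully deterministic decoding
      scaled = logits / max(temperature, 1e-8)
      probs = np.exp(scaled - scaled.max())     # numerically stable softmax
      probs /= probs.sum()
      return np.random.choice(len(probs), p=probs)

  vocab = ["the", "cat", "sat", "mat"]
  logits = np.array([2.0, 1.0, 0.5, 0.2])      # model's scores for the next token
  print(vocab[sample_next_token(logits)])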


That’s not an accurate description. Attention / multi-head attention mechanisms let the model capture relationships between words that are far apart in the text, along with the context they appear in.
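The core of that mechanism is small enough to write out; a bare-bones single-head sketch (no projections, masking, or multiple heads, so purely illustrative):

  # Bare-bones scaled dot-product attention: every position can weigh every
  # other position, however far apart they are in the sequence.
  import numpy as np

  def attention(Q, K, V):
      scores = Q @ K.T / np.sqrt(K.shape[-1])            # pairwise relevance
      weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
      weights /= weights.sum(axis=-1, keepdims=True)     # softmax over positions
      return weights @ V                                 # context-weighted mix

  seq_len, d = 5, 8
  X = np.random.randn(seq_len, d)    # toy token embeddings
  print(attention(X, X, X).shape)    # (5, 8): each token's context-aware vector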

They still lack, as far as we know, a world model, but the results are already eerily similar to how most humans seem to think - a lot of our own behaviour can be described as “predict how another human would reply”.


When trained on nothing but logs of Othello moves, a model learns an internal representation of the board and its pieces. It also models the strength of its opponent.

https://arxiv.org/abs/2210.13382

I'd be more surprised if LLMs trained on human conversations didn't create any world models. Having a world model simply allows the LLM to become better at sequence prediction. No magic needed.

There was another recent paper showing that a language model models things like the age, gender, etc. of its conversation partner without having been explicitly trained to do so.
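Both kinds of result are usually established the same way: train a small linear probe on a layer's activations and check whether the property (board square, age, gender) can be read off. Schematically, with random stand-in data just to show the recipe:

  # Schematic linear probe. Real experiments use actual layer activations and
  # labels; random data here only illustrates the procedure.
  import numpy as np
  from sklearn.linear_model import LogisticRegression

  n_examples, hidden_dim = 1000, 512
  activations = np.random.randn(n_examples, hidden_dim)   # one layer's activations
  labels = np.random.randint(0, 3, size=n_examples)       # e.g. empty/black/white for one square

  probe = LogisticRegression(max_iter=1000).fit(activations[:800], labels[:800])
  print("probe accuracy:", probe.score(activations[800:], labels[800:]))
  # High accuracy on real activations (vs. chance on shuffled labels) is the
  # evidence that the information is encoded internally.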


Do we know for a fact that the mechanisms are actually used that way inside the model?

My understanding was that they know how the model was designed to be able to work, but that there's been very little (no?) progress on the black box problem, so we really don't know much at all about what actually happens internally.

Without a better understanding of what actually happens when an LLM generates an answer, I stick with the most basic answer: it's simply predicting what a human would say. I could be wildly misinformed there; I don't work directly in the space, and it's been moving faster than I'm interested in keeping up with.


For a lot of the content they were trained on, it seems like the easiest way to predict the next token would be to model the world or work with axioms. So how do we know that an LLM isn't doing these things internally?


In fact, it looks like the model is doing those things internally.

  We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways by which different neural networks represent data are becoming more aligned. Next, we demonstrate convergence across data modalities: as vision models and language models get larger, they measure distance between datapoints in a more and more alike way. We hypothesize that this convergence is driving toward a shared statistical model of reality, akin to Plato’s concept of an ideal reality. We term such a representation the platonic representation and discuss several possible selective pressures toward it. Finally, we discuss the implications of these trends, their limitations, and counterexamples to our analysis.
https://arxiv.org/html/2405.07987v5
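One way to make "measure distance between datapoints in a more and more alike way" concrete (my own toy proxy; the paper itself uses nearest-neighbour-based alignment metrics) is to correlate two models' pairwise-distance structures over the same inputs:

  # Toy proxy for representational alignment between two models.
  import numpy as np
  from scipy.spatial.distance import pdist
  from scipy.stats import spearmanr

  n_items = 200
  emb_a = np.random.randn(n_items, 768)     # stand-in for model A's embeddings of the same items
  emb_b = np.random.randn(n_items, 1024)    # stand-in for model B's embeddings of the same items

  rho, _ = spearmanr(pdist(emb_a), pdist(emb_b))
  print("rank correlation of pairwise distances:", rho)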


Unless I misread this paper, their argument is entirely hypothetical. Meaning that the LLM is still a black box, and they can only hypothesise about what is going on internally by viewing the output(s) and guessing at what it would take to get there.

There's nothing wrong with a hypothesis or that process, but it means we still don't know whether models are doing this or not.


Maybe I mixed up that paper with another, but the one I meant to post shows that you can read something like a world model out of the activations of the layers.

There was a paper showing that a model trained on Othello moves creates a model of the board, models the skill level of its opponent, and more.


Well my understanding is that there's ultimately the black box problem. We keep building these models and the output seems to get better, but we can't actually inspect how they work internally.



