If you are interested in human-level AI, don’t work on LLMs.
—Yann LeCun
LeCun’s AMI has raised a $1 billion round – Europe’s largest ever seed round. The news matters for a reason: AMI is pursuing a very different technical approach to AI than the rest of the industry. That approach – Joint Embedding Predictive Architecture (JEPA) – is what LeCun believes is the true path to human-level AI, and eventually superintelligence.
First, some background
Words cannot be used by LLMs as is. They are first embedded – converted into a set of numbers. The numbers that represent a word are an ordered array with anywhere from 384 to 1536+ dimensions. The visualization is a projection of those dimensions into 3 dimensions.
Consider the pretty visualization above. Words similar in meaning form semantic clusters in the embedding space. This provides the basis for inference: if the model is trained on the sentence “the cat sat on the ___” and it learns that the word “mat” is close to the words “cat” and “sat”, it can infer that the missing word is likely to be “mat”. The key point is that mainstream LLMs are trained to predict the next word based on all the words that have appeared thus far.
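The idea can be sketched in a few lines. This is a toy illustration, not any real model’s code: the embeddings below are hand-made 3-dimensional vectors (real learned embeddings have 384 to 1536+ dimensions), and the “prediction” is just a cosine-similarity lookup against the averaged context.

```python
import math

def cosine(a, b):
    """Cosine similarity: how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical hand-made embeddings: "mat" is placed near "cat" and "sat",
# while "moon" sits in a different region of the space.
embeddings = {
    "cat":  [0.90, 0.10, 0.00],
    "sat":  [0.80, 0.20, 0.10],
    "mat":  [0.85, 0.15, 0.05],
    "moon": [0.00, 0.20, 0.95],
}

# Average the context vectors, then rank candidate next words by similarity.
context = [embeddings["cat"], embeddings["sat"]]
avg = [sum(dims) / len(context) for dims in zip(*context)]
best = max(["mat", "moon"], key=lambda w: cosine(avg, embeddings[w]))
print(best)  # "mat" — its vector sits near "cat" and "sat" in the space
```

Real LLMs do far more than average and compare, but the underlying principle – proximity in embedding space drives prediction – is the same.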
Words are a handle for concepts
Consider the word “cat”. The word itself is not the concept of a cat. The concept of a cat includes all the attributes and associations we have with cats: they are animals, they have fur, they meow, they can be pets, etc. When we move in the world, when we think, we deal with concepts, not words. I am using words to communicate concepts to you. We form complex hierarchical representations of concepts, we associate concepts with other concepts, we form analogies between concepts... in other words, we think.
Embedding space – the high dimensional space into which words are mapped – is also called latent space because the relationships between words are implicit. Concepts are not captured explicitly. The model predicts “mat” and not “moon” because it has learned that the embedding of “mat” is close to the embeddings of “cat” and “sat”, not because it understands the concepts.
This is where JEPA gets really interesting. Whereas existing LLMs predict the next word based on the previous words, JEPA learns to predict the embedding of the next word. The thinking behind this approach is based on analogy: embeddings are to tokens as concepts are to words.
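The contrast in training objectives can be sketched schematically. This is illustrative code, not AMI’s actual training loop: the numbers and function names are made up, and real systems use neural networks rather than these toy formulas. The point is where the loss lives: a standard LLM is graded on a probability distribution over a vocabulary, while a JEPA-style model is graded on distance in embedding space.

```python
import math

vocab = ["mat", "moon"]

def llm_loss(predicted_probs, target_word):
    """Next-token objective: cross-entropy over a vocabulary distribution."""
    return -math.log(predicted_probs[vocab.index(target_word)])

def jepa_style_loss(predicted_embedding, target_embedding):
    """Embedding-prediction objective: squared distance in latent space."""
    return sum((p - t) ** 2 for p, t in zip(predicted_embedding, target_embedding))

# The LLM is scored on picking the right token out of a vocabulary...
print(llm_loss([0.9, 0.1], "mat"))                 # ~0.105

# ...while the JEPA-style model is scored on landing near the right point
# in embedding space, with no vocabulary in sight.
print(jepa_style_loss([0.80, 0.20], [0.85, 0.15]))  # 0.005
```

The second loss never mentions tokens at all – the model’s target is a location in latent space, which is exactly what makes the embeddings-as-concepts analogy tempting.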
If the engineering makes the analogy hold, then a JEPA model understands concepts. A model that understands concepts would be a model that understands the world. It would be a model that can reason, plan, and predict the consequences of its actions. It would be a model that can learn from experience and adapt to new situations. It would be a model that can understand and generate language in a way that is grounded in the real world. It would be true human-level AI.
My excitement about AMI is not just about the technical approach, huge as that is, but, as I posted a few days ago, about the metaphysics. It appears that LeCun is taking on two of the most difficult problems in the philosophy of mind and artificial intelligence: the symbol grounding problem and the frame problem. I will be writing about these problems in future posts.
These are incredibly difficult problems that have stood unsolved for over 40 years. I believe that solving them would also help unlock the hard problem of consciousness. The hard problem of consciousness, as posed by David Chalmers, is the question of why and how physical brain processes give rise to subjective, first-person experience.
AMI’s $1B seed is a bet on LeCun being right (LeCun 2025). If he is right, it will be the most important bet not just in the history of AI, but in the history of humanity.