LeCun Against Language Models

LLMs are useful, but not enough

LeCun starts from a sharp, almost provocative line: language models are already useful products, but they are not the path to human or animal intelligence. He uses the argument to shift the conversation immediately from the industrial success of LLMs to the question that, for him, really matters: what is missing for a machine to understand the real world.

There is nothing wrong with LLMs in the sense that they are the basis of many very useful AI products that we all use, myself included. They are excellent at what they do, but they are not a path to human-level or human-like intelligence, or even animal-level intelligence.
2:45

The real world? The real world is much more complicated than language, because it is high-dimensional, continuous, noisy, messy. Training a system to understand the real world is much, much harder.
3:55

The distinction he draws is simple and also strategic. AMI, he says, stands for advanced machine intelligence and carries the motto “AI for the real world”: not text, not code, but systems capable of dealing with the physical, the continuous, the noisy. From there comes the shift from tokens to world models, systems that do not just generate answers but try to represent the consequences of actions.

AMI stands for advanced machine intelligence. The subtitle, the motto if you like, is AI for the real world.
3:22

Reality is much more complicated than language. It is high-dimensional, continuous, noisy, messy.
3:55

World model versus pixels

LeCun shifts the comparison from familiar ground, text, to a more uncomfortable one: the physical world. For him, the point is not that generative models are useless, but that the road to intelligence runs through systems capable of predicting the consequences of their actions, not reconstructing pixels or tokens. That is where he places JEPA and, more broadly, the non-generative architectures that shaped the work at Meta and then at AMI Labs.

Generative models that predict pixels, to me, have been a dead end. All the successful architectures for learning image and video representations are non-generative.
13:05

If you have the ability to predict the consequences of your actions, then you can plan a sequence of actions to reach a goal. You do not do it by predicting one action after another in an autoregressive way.
10:07
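
The planning idea in the quote above can be made concrete with a toy sketch. Everything here is an assumption made for illustration: the hand-written `world_model` stands in for a learned one, the state is a one-dimensional point mass, and the planner is plain random shooting, not a method attributed to LeCun. The point it shows is that whole action sequences are imagined and scored against a goal before any action is taken.

```python
import random

def world_model(state, action):
    """Hypothetical dynamics model: predicts the next state for an action.
    Here it is a hand-written 1-D point mass, state = (position, velocity)."""
    position, velocity = state
    velocity = velocity + 0.1 * action
    position = position + 0.1 * velocity
    return (position, velocity)

def cost(state, goal):
    """Distance between the position in `state` and the goal position."""
    return abs(state[0] - goal)

def rollout_cost(state, actions, goal):
    """Imagine the whole action sequence with the model, then score the end state."""
    for action in actions:
        state = world_model(state, action)
    return cost(state, goal)

def plan(state, goal, horizon=10, candidates=500, seed=0):
    """Random-shooting planner: sample full action sequences, keep the best one."""
    rng = random.Random(seed)
    best_actions, best_cost = None, float("inf")
    for _ in range(candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        c = rollout_cost(state, actions, goal)
        if c < best_cost:
            best_actions, best_cost = actions, c
    return best_actions, best_cost
```

Calling `plan((0.0, 0.0), goal=0.2)` returns a ten-step action sequence together with the predicted distance to the goal. A practical system would replace the random search with an optimizer and re-plan after each executed step, but the division of labor is the same: the model predicts, the planner searches.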

His argument is that the mind does not reason at the level of detail, but at the level of abstract representation. The water bottle example is the clearest: you can imagine it falling, slipping, or spilling, but you cannot precisely predict every pixel of the motion. From there LeCun derives a simple hierarchy that, for him, explains why successful vision models learned to compress and represent, not to copy the world.

LeCun says he arrived at this conclusion after years of failed attempts with classic generative methods, from autoencoders to masked autoencoders, and then the MAE project at FAIR. The pattern, he says, was always the same: something was achieved, but the result remained disappointing relative to expectations. The turning point, for him, came with joint-embedding methods, where the input is corrupted and the system is asked to predict the representation of the original, not its pixels.

The MAE project was very disappointing. There was a lot of competition and the result was not really satisfying.
13:41

Joint-embedding architecture techniques ended up working much better for representing images and videos.
14:28
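
The difference between the two objectives can be sketched in a few lines. This is a minimal illustration under stated assumptions: the encoders are toy linear maps, `mask`, `pixel_loss`, and `embedding_loss` are hypothetical names, and training details that real joint-embedding methods rely on (such as how the target encoder is updated) are left out.

```python
def encode(x, weights):
    """Toy linear encoder: maps a vector to a representation (one value per weight row)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def mask(x, hidden):
    """Corrupt the input by zeroing the positions listed in `hidden`."""
    return [0.0 if i in hidden else xi for i, xi in enumerate(x)]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

def pixel_loss(reconstruction, x):
    """Generative objective: the error is measured on every input value."""
    return mse(reconstruction, x)

def embedding_loss(x, hidden, context_w, target_w, predictor_w):
    """Joint-embedding objective: the error is measured in representation space."""
    target = encode(x, target_w)                   # representation of the original
    context = encode(mask(x, hidden), context_w)   # representation of the corrupted view
    predicted = encode(context, predictor_w)       # predictor maps context to target space
    return mse(predicted, target)
```

With `x = [1.0, 2.0, 3.0, 4.0]`, both encoders set to `[[1, 0, 1, 0], [0, 1, 0, 1]]`, an identity predictor, and position 2 hidden, the target representation is `[4.0, 6.0]` and the prediction is `[1.0, 6.0]`. The system is penalized only for what the representation keeps, not for every detail of the input.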

Robotics and scarce data

For Yann LeCun, the promise is not in robots that faithfully imitate a gesture they have already seen. It is in systems that can generalize from a few examples, and therefore move from one task to another without being retrained every time. That is where his critique of imitation learning also becomes an economic critique: if every new skill requires mountains of data, robotics remains an expensive, fragile, almost artisanal craft.

What is needed is generalization. The degree of generalization you would get with a system based on a world model is much, much broader than with a system trained through imitation learning and fine-tuning.
17:12

LeCun contrasts that model with the logic of a world model: a system that predicts the outcome of an action and then plans, instead of reacting by similarity. The distinction matters because it shifts the center of gravity from demonstrative data to inference, from collecting examples to building a representation of the world. In this view, the real barrier is not getting a robot to move once, but getting it to adapt to a new task without a new training cycle.

We are able to do it with a small amount of data or with no training data, and just a bit of fine-tuning in the style of reinforcement learning. That is the problem: data efficiency.
18:36

Here LeCun pushes the technical argument further and brings it into domestic robotics and industry. He says that today no company yet knows how to build truly useful robots, not even in factories, except on a narrow set of gestures learned by imitation. For him the point is not to multiply demonstrations or synthetic videos, but to reach machines that learn with the speed of a teenager behind the wheel, because only then does robotics stop depending on endless data collection.

Where is your home robot? Where is your level-five self-driving car? None of these companies really knows how to make them intelligent enough to be useful.
22:07

The move into industry broadens the field even more. LeCun lists jet engines, chemical plants, power stations, production lines, patients, and human cells as systems too complex to be described with a few equations; in this scenario, the neural model becomes a kind of phenomenological surrogate that learns dynamics from data and then enables control and prediction. It is a powerful vision, but also a very ambitious one: it moves artificial intelligence away from language and images toward maintaining the infrastructure that holds the material world together.

Sovereignty versus closed platforms

For LeCun, the real stakes are not only technical but also political: if AI becomes the main interface with the world, leaving it in the hands of a few companies means importing their languages, their values, and their limits. Tapestry, he explains, was created precisely as a response to that risk, with the idea of an open model that any country or community can adapt without handing its data to an outside actor. In this reading, sovereignty is not a slogan but the condition for not depending on one of the two great digital powers.

If you are someone outside the United States or China, and your AI assistant was built in California or Beijing, that is not good for you.
29:34

The solution is an open platform, a foundation model in the LLM style that anyone can refine to adapt it to a language, a culture, a system of values, political biases, anything.
29:59

LeCun links Tapestry to a shift already under way in how people use AI: fewer traditional search engines, more assistants that filter almost everything we read and ask. If that mediation becomes the norm, he says, the issue is no longer just model efficiency, but who decides what enters the information stream of a citizen in India, Morocco, or France. That is why he insists on an international platform that gathers knowledge and culture without forcing participants to surrender control of their data.

Contributors would contribute data and compute resources, but they would retain control over their data. They would not have to share that data with the other contributors.
32:27

You are a country that is neither the United States nor China and you want a certain level of sovereignty for AI, not just for your industry but also for your citizens.
31:03

The safety split

Here, the split is not between AI optimists and pessimists, but between two different ideas of risk. Yann LeCun argues that the real problem with LLMs is not how scary they are, but how fragile they are by design: they may seem useful as long as they stay within narrow tasks, but they do not provide, in his view, the kind of control needed for a system that acts in the world. That is the line that separates him from Geoffrey Hinton and from those who read GPT-4 as a near-anthropological threshold.

In 2023. I did not change my mind, they changed theirs.
40:51

LeCun reconstructs the split as a reaction to GPT-4. Hinton, he says, would have seen in those systems something close to human intelligence, even a possible subjective experience; LeCun instead reads that turn as a mirage, or as a frame shift that confuses impressive capabilities with understanding. The point is not only technical, it is epistemological: for him, the LLM remains a system that answers well to familiar prompts, not a mind that knows what it is doing.

LLMs are inherently unsafe. I do not think they can be made reliable and safe.
45:52

Here LeCun raises the stakes. He does not just say that LLMs make mistakes; he says they cannot be made reliable because they cannot predict the consequences of their actions, and therefore cannot guarantee coherent behavior when they become agents. His critique is harsher than the classic objection about hallucinations: for him the flaw is not a bug to be fixed, it is a structural limit of the paradigm.

Coding is something where you can verify that the generated code satisfies the specification. But not everything is coding.

Health, FAIR, and Meta

At Meta, LeCun says he saw two companies inside one company for years. On one side was FAIR, a lab that produced ideas, tools, and people; on the other was the industrial engine that, with the arrival of GenAI and then the LLM race, began to live under tighter timelines and more defensive goals. His account is not one of a sudden farewell, but of a slow separation between research and product, between scientific ambition and organizational pressure.

Our goal was always to build intelligent systems. I had put my research on hold while leading FAIR, then I thought it was important to design the architecture of human-level AI systems.
1:02:17

The move to health care lets him test his thesis: LLMs, he says, can be useful when the problem is summarizing knowledge or imitating the best of existing practice, but they stop when the case depends on a physical or biological dynamic that must really be understood. That is why he shifts the example from doctor to patient, and then to the cell, where the point is not to repeat what was read in a book, but to predict how to change a state of the world.

If I am seeing a patient, it can be a cell. How do you tell a stem cell to become a beta cell in the pancreas that produces insulin? You have a patient with type 1 diabetes and their immune system is basically eating their own beta cells. How do you keep producing them?
51:27

LeCun concedes that the early part of FAIR was a near-textbook success story: a lab capable of producing PyTorch, spreading methods and people, and holding together scientific curiosity with usefulness for the rest of the industry. But then he describes the point at which, in his view, the mechanism stalls: when a research organization stays too far from the product, ideas are not picked up; when it gets too close to the product, it hardens.

We lost that structure. FAIR became essentially isolated inside the company, with many ideas that nobody picked up.
56:40

The best people are those who can sniff out what is worth doing, you give them the means to do it, and then you get out of the way.

The next leap

For LeCun, the question is no longer whether self-supervised learning works. The question is how to avoid collapse once it moves beyond discrete symbols and tries to represent the real world. That is why, he says, the next breakthrough will not come from tokens, but from stable representations that can carry useful information without collapsing into a constant solution.

An LLM works because, when you have a sequence of discrete symbols, prediction is easy. In the real world, you cannot use a generative model, so you have to train a system that learns a representation and makes predictions in representation space.
7:29:58

The big issue in self-supervised learning for JEPA is how to prevent collapse. If you want to maximize the information content coming out of a neural network, you need to be able to measure it or at least have a lower bound, and we only have upper bounds.
7:35:31
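
Collapse is easy to reproduce in miniature. The sketch below is an illustration under stated assumptions, not a reconstruction of any JEPA training code: both encoders are hand-written toys, and batch variance is used as a crude stand-in for information content, which the quote notes cannot be measured directly.

```python
def constant_encoder(x):
    """A collapsed encoder: it ignores its input entirely."""
    return [0.0, 0.0]

def informative_encoder(x):
    """A toy encoder whose output depends on the input."""
    return [x[0] + x[1], x[0] - x[1]]

def prediction_loss(encoder, pairs):
    """Mean squared distance between the representations of two views of one item."""
    total = 0.0
    for view_a, view_b in pairs:
        za, zb = encoder(view_a), encoder(view_b)
        total += sum((a - b) ** 2 for a, b in zip(za, zb)) / len(za)
    return total / len(pairs)

def embedding_variance(encoder, inputs):
    """Average per-dimension variance of the representations over a batch."""
    zs = [encoder(x) for x in inputs]
    dims = len(zs[0])
    total = 0.0
    for d in range(dims):
        column = [z[d] for z in zs]
        mean = sum(column) / len(column)
        total += sum((v - mean) ** 2 for v in column) / len(column)
    return total / dims
```

On any set of view pairs, `constant_encoder` reaches a prediction loss of exactly zero, which is why the objective alone cannot be trusted: the perfect score comes from a representation with zero variance that carries no information. The informative encoder pays a small loss but keeps its variance, and the methods discussed next are different ways of rewarding that second behavior.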

At that point LeCun shifts the center of gravity to the methods that, in his view, are actually advancing the field. He rereads the history of self-supervised learning as a long search for ways to prevent collapse: first contrastive learning, then distillation methods, and today explicit regularizers. His judgment is cautious and selective at once: some paths work, but we do not always know why.

Contrastive learning works, but it does not scale with size. Then there are distillation methods, which prevent collapse, but we do not know why.
7:37:11

SIGReg is really promising, in my opinion. It forces the distribution of the variables coming out of the encoder to be approximately Gaussian, and it is a very different way of doing it from the work of Schmidt, Becker, or Hinton.
8:00:03
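
To give a feel for what pushing encoder outputs toward a Gaussian can mean in practice, here is a deliberately simplified sketch. It is not the regularizer named in the quote above: it is a hypothetical moment-matching penalty, assumed for illustration, that projects a batch of embeddings onto random directions and penalizes each projection for departing from zero mean and unit variance. Matching two moments is much weaker than a real test of Gaussianity.

```python
import math
import random

def random_direction(dim, rng):
    """Sample a random unit vector by normalizing Gaussian draws."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def gaussian_moment_penalty(embeddings, num_directions=16, seed=0):
    """Average over random 1-D projections of (mean^2 + (variance - 1)^2).
    Near zero when projections look standard-normal in their first two moments."""
    rng = random.Random(seed)
    dim = len(embeddings[0])
    n = len(embeddings)
    penalty = 0.0
    for _ in range(num_directions):
        u = random_direction(dim, rng)
        proj = [sum(ui * zi for ui, zi in zip(u, z)) for z in embeddings]
        mean = sum(proj) / n
        var = sum((p - mean) ** 2 for p in proj) / n
        penalty += mean ** 2 + (var - 1.0) ** 2
    return penalty / num_directions
```

A collapsed batch, where every embedding is the same zero vector, has zero variance along every direction and receives a penalty of 1.0, while a large batch of standard-normal vectors scores close to zero. Added to a prediction loss, a term of this kind makes the constant solution expensive, which is the role an explicit regularizer plays.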

FAQ

Why does LeCun criticize LLMs?

According to LeCun, LLMs are excellent for language and code, but they cannot predict the consequences of their actions. That is why he considers them unsuitable for general intelligence and reliable robotics.

What is AMI Labs?

AMI stands for advanced machine intelligence. LeCun describes it as a company focused on AI for the real world, based on world models and new architectures, not just text generation.

What is Tapestry for?

Tapestry is his proposal for an open, global, locally fine-tunable model. The idea is to give countries and communities control over the data and culture represented by an AI assistant.

Why did he leave Meta?

LeCun says Meta shifted its focus almost entirely to LLMs, making exploratory research, including the projects he was working on, less central. In his view, it was no longer the right place to push those efforts.

What does he think about LLM safety?

LeCun considers LLMs inherently unsafe because they can hallucinate and, if made agentic, can take actions without properly predicting their effects. He argues that systems need explicit goals and constraints.