
Can large language models figure out the real world? New metric measures AI's predictive power


In the 17th century, German astronomer Johannes Kepler figured out the laws of motion that made it possible to accurately predict where our solar system’s planets would appear in the sky as they orbit the sun. But it wasn’t until decades later, when Isaac Newton formulated the universal laws of gravitation, that the underlying principles were understood.
Although Newton's laws were inspired by Kepler's, they went much further, making it possible to apply the same formulas to everything from the trajectory of a cannonball to the way the moon's pull controls the tides on Earth, or how to launch a satellite from Earth to the surface of the moon or the planets.
Today’s sophisticated artificial intelligence systems have gotten very good at making the kind of specific predictions that resemble Kepler’s orbit predictions. But do they know why these predictions work, with the kind of deep understanding that comes from basic principles like Newton’s laws?
As the world grows ever more dependent on these kinds of AI systems, researchers are struggling to measure just how they do what they do, and how deep their understanding of the real world actually is.
Now, researchers in MIT’s Laboratory for Information and Decision Systems (LIDS) and at Harvard University have devised a new approach to assessing how deeply these predictive systems understand their subject matter, and whether they can apply knowledge from one domain to a slightly different one. And by and large, the answer at this point, in the examples they studied, is—not so much.
The findings were presented at the International Conference on Machine Learning (ICML 2025), in Vancouver, British Columbia, last month by Harvard postdoc Keyon Vafa, MIT graduate student in electrical engineering and computer science and LIDS affiliate Peter G. Chang, MIT assistant professor and LIDS principal investigator Ashesh Rambachan, and MIT professor, LIDS principal investigator, and senior author Sendhil Mullainathan.
"Humans all the time have been able to make this transition from good predictions to world models," says Vafa, the study's lead author. So the question their team was addressing was, "Have foundation models—has AI—been able to make that leap from predictions to world models? And we're not asking are they capable, or can they, or will they. It's just, have they done it so far?" he says.
"We know how to test whether an algorithm predicts well.
