5 Comments
Manufacturing Tech

Love it!

Feisal Nanji

Informative. Thank you. Great work. Learned a fair bit.

Hugo

Thanks for sharing this. Really enjoyed it.

I actually work in a field where everything described here shows up at its most extreme: robotics. Training robots in simulation is essentially the same problem as building surrogate models for engineering design, except that the physics being approximated includes contact dynamics, which is about as chaotic as it gets. A robot hand touching an object involves friction, compliance, and deformation, all with the kind of sensitivity to initial conditions that defines chaotic systems. Small errors don't average out. They cascade into completely different grasp outcomes.

The "continuous physics" framing at the end is also exactly what the robotics field calls a world model: physics as a prediction layer that runs continuously alongside decision-making, not as a validation step after the fact. The companies making real progress on sim-to-real transfer in robotics are doing something very similar to what Vinci is doing for thermal: start narrow, pick physics that is constrained enough to model reliably, and expand from there rather than trying to simulate everything at once.

The trust problem maps too. Just as aerospace engineers won't abandon FAA-vetted Fortran, robotics deployers won't trust sim-trained policies until they've been validated through thousands of hours of real-world operation. Verification is the bottleneck in both worlds.

T.D. Inoue

This is excellent. It brings an added dimension to work I'm doing on mapping AI capability space in a way that compares it with biological life forms (humans, crows, octopi, etc.). What you're doing is adding the "experience layer" missing from purely linguistic AI.

Interestingly, what we call "functional perceptual grounding" provides a remarkable amount of ability to predict macroscopic physical interactions. This recent post is an example:

https://tedsan.substack.com/p/what-happens-if-you-drop-a-glass?r=fvg04

This post goes into much more detail about functional perceptual grounding:

https://synthsentience.substack.com/p/llms-dont-just-process-language-they

But as you said, Hugo, your work provides something that verbal training never can. You can't teach a child to ride a bike or pick up an egg by describing the process; they have to experience it.

Hugo
Mar 11 (edited)

Thanks, T.D. The bike and egg examples are exactly on point. There's a ceiling to what you can learn about physics from description, no matter how sophisticated the language model. The interesting question is where that ceiling sits. Functional perceptual grounding clearly gets you further than pure token prediction, but contact-rich interaction, the kind where outcomes depend on millisecond-level force feedback, seems to sit firmly on the other side of that line. You would need the experience, not just the concept of the experience.

I'll check out those posts.