Exact equivariance, kept through training, buys zero-shot generalisation across the symmetry group
arXiv:2606.03003v2 Announce Type: replace-cross Abstract: A latent world model built from an equivariant encoder and predictor inherits a provable symmetry of its training loss: when the dynamics carries a group $G$ acting on latents by an orthogonal representation $\rho(g)$, the one-step prediction relMSE is exactly invariant across the whole group, so fitting a restricted slice of orientations mathematically determines it on the entire orbit. The symmetry survives a real Muon/AdamW+EMA+VICReg
Overview
arXiv:2606.03003v2 Announce Type: replace-cross Abstract: A latent world model built from an equivariant encoder and predictor inherits a provable symmetry of its training loss: when the dynamics carries a group $G$ acting on latents by an orthogonal representation $\rho(g)$, the one-step prediction relMSE is exactly invariant across the whole group, so fitting a restricted slice of orientations mathematically determines it on the entire orbit. The symmetry survives a real Muon/AdamW+EMA+VICReg run -- composed residual $\sim 10^{-6}$ after training, under any optimiser (intrinsic Vector-Neuron/e3nn parametrisation) -- and one-step error is flat across the group (5-seed medians: equivariant $\times 1.00$ vs a higher-capacity non-equivariant baseline $\times 12.7$ in 2D, $\times 17.2$ in 3D), while that baseline fits the slice but breaks out-of-distribution. The flatness is not a synthetic artefact: on real-robot DROID end-effector trajectories the equivariant model stays flat across the orbit ($\times 1.000$, rotation residual $1.5\times 10^{-16}$) while a $4.5\times$-larger baseline degrades $\times 11$. One caution is load-bearing: flatness is necessary, not sufficient -- the theorem transports the in-distribution error level unchanged but does not lower it (3D relMSE $\approx 0.43$): across-group error is constant, not low. The same isometry lifts to a closed-loop corollary: under a matching equivariant planner the control error is invariant across the group -- float-floor-exact in 2D/SO(2), statistically flat in 3D/SE(3). Stress-tested against Sutton's Bitter Lesson (augmentation, scale, soft-equivariance), each closes at most the across-group task metric, never the float-floor exactness. This is the generalisation-side foundation of a certified-world-models programme (arXiv:2606.13092, 2606.24945, 2606.24946): flatness transports competence, and the trust bounds built on it are downstream products.
Source
Originally published at arxiv.org.
Related Articles
Source: https://arxiv.org/abs/2606.03003