Introduction


We introduce MotionMillion, the largest human motion dataset to date, with over 2,000 hours of motion and 2 million high-quality motion sequences paired with text. Alongside it, we present MotionMillion-Eval, a comprehensive benchmark for evaluating zero-shot motion generation. Built on a scalable 7B-parameter architecture, our model demonstrates strong generalization to unseen and compositional motions, pushing the frontier of zero-shot text-to-motion generation.
Approach Image

Generation Gallery


An old man is slowly walking with a cane in the park.
A skilled surfer is navigating a massive swell.
A person walks like a cat.
A man executes a dynamic sword-fighting routine, lunging forward, pivoting, and slashing downward, then spinning and kicking with agility and strength.
A furious swordsman grips his blade tightly, stomps forward with an angry roar, then slashes diagonally at an invisible foe.
The emaciated woman sat on the floor, her hands wrapped around her knees as she trembled, her head buried in her arms, sobbing, her body curled up.
The person is performing a dynamic dance, shifting weight between feet with quick steps and arm movements.
A person performs an energetic dance, showcasing fluid movements, confident gestures, and agile steps.
With a katana in hand, the person freezes, then swiftly rotates their body while raising the sword before delivering a swift slash, their fluid motion and intense focus evident throughout the movement.
A woman practices yoga, gracefully transitioning from a downward dog position to a cobra pose.
A man walks in a dog pose.
A person stands still, then swings a bat with energy and focus, simulating a hit while shifting weight onto the back foot.
The man performs a dynamic sword-fighting routine involving a lunging step, pivot, and swift slash, followed by a spin, swing, and display of agility and strength, highlighting his focus and determination.
A zombie slowly drags its feet forward, arms outstretched, letting out a low groan.
The person is swinging a sword in a choreographed motion. He first lifts the sword behind his back, then steps forward while bringing the sword down in a wide arc, shifting his weight to the front foot and finishing in a low stance, showing focus and strong intention in his movements.
A dancer is performing a turn on the spot, first raising one arm with a graceful motion, then shifting weight to one side while the other arm moves out. She rotates smoothly, keeping her legs stable and extending her arms outward, expressing joy and light-heartedness in her movements.

YouTube Video


Miscellaneous



Bibtex


@misc{fan2025zerozeroshotmotiongeneration,
  title={Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data},
  author={Ke Fan and Shunlin Lu and Minyue Dai and Runyi Yu and Lixing Xiao and Zhiyang Dou and Junting Dong and Lizhuang Ma and Jingbo Wang},
  year={2025},
  eprint={2507.07095},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2507.07095},
}


Acknowledgements