"A person swings their fist towards others while the other use their both hands to defend. "
"A person receives a gesture and responds with equal intensity, but eventually backs away by one step and falls silent. "
"A person lowers their body into a squat position and holds out their camera towards others. "
"A person raises their leg and kicks towards another person."
"A person bows to others by bending their knees slightly."
"A person bends down to pick up an object."
"A person runs continuously with strides, and jumps up and downs excitedly, cheering and shouting encouragements."
"The first one crouches to capture an image of the second one. "
"Two humans are performing taekwondo, executing face-to-face punches. "
"The first person is sitting on a chair and examining a document, when the second approaches whilst holding a cellphone. "
"Both people rotate their bodies and move rhythmetically while spinning around. "
"The two persons circle around clockwise, raise both hands, stretch the right hand forward to cross each other. "
"Three people walk in a circle. "
"Three people prepare to make a strike. "
"Three engage in a fierce boxing match, with aggressively exchanging blows while also defending themselves. "
"Three people are playing a game of tossing and catching a toy duck."
"Four acknowledge each other's presence with a polite greeting."
We can fix the first motion as the condition and generate different reaction motions for different text descriptions.
"A person swings their fist towards others while the other one kicks back for the incoming punch. "
"A person swings their fist towards others while the other use their both hands to defend. "
"A person swings their fist towards others while the other one evades the incoming punch by stepping back. "
We can generate controllable multi-person motion conditioned on spatial signals.
"Two people run towards each other with joy and embrace each other."
"Three people are walking to get away from each other."
"Four people walks forward."
We can use the same text to generate diverse two-person motion results.
"The first person bows to apologize to the other person."
"The first person bows to apologize to the other person."
"The first person bows to apologize to the other person."
We compare against InterGen on two-person motion generation. The motions synthesized by our method are more consistent with the text description.
@article{fan2024freemotion,
title={FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis},
author={Ke Fan and Junshu Tang and Weijian Cao and Ran Yi and Moran Li and Jingyu Gong and Jiangning Zhang and Yabiao Wang and Chengjie Wang and Lizhuang Ma},
year={2024},
eprint={2405.15763},
archivePrefix={arXiv},
primaryClass={cs.CV}
}