AUTHOR=Fares Mireille, Pelachaud Catherine, Obin Nicolas TITLE=Zero-shot style transfer for gesture animation driven by text and speech using adversarial disentanglement of multimodal style encoding JOURNAL=Frontiers in Artificial Intelligence VOLUME=6 YEAR=2023 URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2023.1142997 DOI=10.3389/frai.2023.1142997 ISSN=2624-8212 ABSTRACT=
Modeling virtual agents with behavior style is one factor for personalizing human-agent interaction. We propose an efficient yet effective machine learning approach to synthesize gestures driven by prosodic features and text in the style of different speakers, including those unseen during training. Our model performs zero-shot multimodal style transfer driven by multimodal data from the PATS database containing videos of various speakers. We view style as being pervasive; while speaking, it colors the expressivity of communicative behaviors, while speech content is carried by multimodal signals and text. This disentanglement scheme of content and style allows us to directly infer the style embedding even of a speaker whose data are not part of the training phase, without requiring any further training or fine-tuning. The first goal of our model is to generate the gestures of a source speaker based on the