ORIGINAL RESEARCH article

Front. Signal Process.

Sec. Audio and Acoustic Signal Processing

Volume 5 - 2025 | doi: 10.3389/frsip.2025.1525198

This article is part of the Research Topic: Sound Synthesis through Physical Modeling

Voice synthesis using power-balanced simulation of a quasi-1D model of the vocal apparatus

Provisionally accepted
  • 1 Sorbonne Université (CNRS), Paris, France
  • 2 UMR9912 Sciences et Technologies de la Musique et du Son (STMS), Paris, Île-de-France, France
  • 3 Institut de Recherche et Coordination Acoustique Musique (IRCAM), Paris, France
  • 4 UPR7051 Laboratoire de mécanique et d'acoustique (LMA), Marseille, Provence-Alpes-Côte d'Azur, France

The final, formatted version of the article will be published soon.

The vocal apparatus is a biophysical dynamical system capable of self-oscillation, which involves fluid-structure interactions and human control. With the aim of synthesising voiced sounds, this paper presents a physical quasi-1D model of the vocal apparatus in the port-Hamiltonian framework and validates it through numerical experiments. The modelling ensures balanced power exchanges between fluid, tissues and human control. The fluid in the larynx and in the vocal tract is represented by a unified 1D PDE that handles transverse geometry variations. A regularisation procedure is introduced to mitigate the numerically stiff behaviour of the model observed at channel closure. The vocal folds and vocal tract walls are represented by lumped-element models, as is the radiation load at the lips, which consists of a first-order high-pass filter. The spatial discretisation of the fluid model and the temporal discretisation of the full system use structure-preserving methods to ensure energy consistency (passivity). The second part of the paper focuses on numerical experiments that progressively characterise the model and assess its validity. These experiments begin with a frequency response analysis of a static vocal tract under quasi-linear conditions, followed by simulations of vowel transitions (diphthongs) under forced excitation. Next, self-oscillation studies are conducted on an isolated larynx in which contact parameters are adjusted. Lastly, full simulations of the self-oscillating vocal apparatus with co-articulation, representing a voice synthesizer capable of articulating vowels, are presented. The dynamics are also analysed in terms of energy transfer and passivity. Finally, these results are discussed to establish a basis for future model refinements and to identify directions for enhancing the accuracy and realism of vocal synthesis.
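
As an illustration of what a power-balanced (passive) time discretisation means, the following minimal sketch applies a midpoint discrete-gradient step to a damped mass-spring oscillator written in port-Hamiltonian form, dx/dt = (J - R) ∇H(x) with H(x) = ½ xᵀQx. This toy system, the function name simulate and all parameter values are assumptions chosen for illustration only. They are not taken from the article and do not reproduce the authors' vocal apparatus model or numerical scheme. The sketch shows only that, for such a scheme, the decrease of the discrete energy over each step equals the energy dissipated in the damper.

```python
import numpy as np

def simulate(m=1.0, k=100.0, r=0.5, dt=1e-3, steps=1000):
    """Toy damped oscillator in port-Hamiltonian form (illustrative values).

    State x = [q, p] (elongation, momentum), H(x) = 0.5 * x^T Q x.
    """
    Q = np.diag([k, 1.0 / m])                 # Hessian of the quadratic energy
    J = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-symmetric interconnection
    R = np.diag([0.0, r])                     # positive semi-definite damping
    A = (J - R) @ Q
    I = np.eye(2)
    # Midpoint discrete gradient: (x1 - x0) / dt = A (x0 + x1) / 2,
    # i.e. (I - dt/2 A) x1 = (I + dt/2 A) x0.
    step = np.linalg.solve(I - 0.5 * dt * A, I + 0.5 * dt * A)

    x = np.array([0.01, 0.0])                 # initial elongation, at rest
    energy = [0.5 * x @ Q @ x]
    dissipated = 0.0
    for _ in range(steps):
        x_next = step @ x
        g = Q @ (0.5 * (x + x_next))          # discrete gradient of H
        dissipated += dt * (g @ R @ g)        # energy lost in the damper
        x = x_next
        energy.append(0.5 * x @ Q @ x)
    return np.array(energy), dissipated

if __name__ == "__main__":
    energy, dissipated = simulate()
    print("energy drop :", energy[0] - energy[-1])
    print("dissipated  :", dissipated)
```

For a quadratic energy, the midpoint rule coincides with the discrete gradient, so the two printed quantities agree up to round-off and the stored energy never increases, whatever the time step.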

Keywords: Voice, Fluid structural interaction, Physical modeling, Audio synthesis, Port-Hamiltonian systems

Received: 08 Nov 2024; Accepted: 24 Apr 2025.

Copyright: © 2025 Risse, Hélie and Silva. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Thomas Risse, Sorbonne Université (CNRS), Paris, France

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.