AUTHOR=Navarretta Costanza

TITLE=Speech Pauses and Pronominal Anaphors

JOURNAL=Frontiers in Computer Science

VOLUME=Volume 3 - 2021

YEAR=2021

URL=https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2021.659539

DOI=10.3389/fcomp.2021.659539

ISSN=2624-9898

ABSTRACT=This paper addresses the usefulness of speech pauses for determining whether third person singular neuter gender pronouns refer to individual or abstract entities in Danish spoken language. The annotations of map task dialogues are analysed and used in machine learning experiments to automatically identify the anaphoric functions of pronouns and the type of referent of abstract reference. The analysis of the data shows that abstract reference is more often performed by marked (stressed or demonstrative) pronouns than by unmarked personal pronouns in Danish speech, as in English, and therefore previous studies of abstract reference in Danish must be corrected. The data also show that silent and filled pauses precede third person singular neuter gender pronouns significantly more often when they refer to abstract entities than when they refer to individual entities. Since abstract entities are less salient and referring to them is cognitively more complex than referring to individual entities, pauses signal this more complex processing. This is in line with perception studies that have shown that subjects produced more pauses when they had to utter abstract or complex concepts. We also found that unmarked pronouns referring to an entity of a type that is usually referred to by a marked pronoun are preceded significantly more often by a speech pause than marked pronouns with the same referent type. This may indicate that speech pauses can also signal that the referent of a pronoun of a certain type is not the most expected one. Finally, the data with speech pause information are used to produce language models, which are used in machine learning experiments to predict the function of third person singular neuter gender pronouns as a first step towards the identification of the anaphoric antecedents. The language models are also used for training classifiers to determine the referent type (speech act, event, fact or proposition) of abstract anaphors. In both cases, the best results were obtained by a multilayer perceptron, with an F1-score of 0.67 for the three-class function prediction task and of 0.73 for the referent type prediction.