Here, you'll find short summaries of my published scientific work, loosely grouped by topic. Links are provided to the final paper if you would like to learn more.
MacIntyre, A. D., Cai, C. Q., & Scott, S. K. (2022). Pushing the envelope: Evaluating speech rhythm with different envelope extraction techniques. The Journal of the Acoustical Society of America, 151(3), 2002-2026.
Increasingly, neuroscientists are interested in the correspondence between the speech time series and the brain response of listeners. For instance, some hypothesise that the coordinated activity of groups of neurons may become entrained, or temporally coupled, with recurring elements in speech rhythm. The unit of this recurrence is speculated to be the syllable, a phonetically defined construct that is difficult to acoustically delineate in natural continuous speech. In other words, "pure" syllables are limited to written language and our imaginations. Some researchers, however, assume that characteristic peaks within the speech envelope provide a close approximation of syllables. The bounds of this similarity have not been empirically established, especially in the context of spontaneous natural speech.
To address this gap in the literature, we annotated an English and Mandarin corpus, consisting of read and spontaneous speech, with ~20,000 vowel onset tokens. We systematically compared several methods of speech envelope extraction to determine the error between the annotated and automatically generated time series. The speakers also participated in a synchronous dyadic reading task, which allowed us to ascertain whether some acoustic events are more closely coordinated than others, thereby providing an implicit behavioural measure of rhythmic salience in speech. We find that the choice of speech envelope algorithm has nontrivial consequences for its resemblance to the annotated vowel time series, and that algorithmic performance is strongly modulated by language and speaking style. Our results call into question the equivalence of the syllabic time series with one calculated from acoustic features, and underscore the limitations of applying linguistic-theoretical concepts in speech perception research.
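To give a flavour of what "envelope extraction" involves, here is a minimal generic sketch of two common approaches (Hilbert magnitude versus rectification, followed by low-pass smoothing). The cutoff and filter order here are illustrative assumptions, not the exact pipelines compared in the paper.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def amplitude_envelope(x, fs, cutoff_hz=10.0, method="hilbert"):
    """Amplitude envelope of x, smoothed by a low-pass filter.

    method="hilbert" takes the magnitude of the analytic signal;
    method="rectify" takes the absolute value of the waveform.
    Either way, the result is smoothed with a zero-phase 4th-order
    Butterworth low-pass filter (a ~10 Hz cutoff spans typical
    syllable rates).
    """
    env = np.abs(hilbert(x)) if method == "hilbert" else np.abs(x)
    sos = butter(4, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, env)

# Toy check: a 100 ms noise burst in 1 s of silence should produce
# one envelope peak inside the burst, whichever method is used
fs = 16000
x = np.zeros(fs)
x[4000:5600] = np.random.default_rng(0).standard_normal(1600)
env = amplitude_envelope(x, fs)
```

Even in this toy case, the two methods yield envelopes with different peak shapes and timing, which hints at why algorithm choice matters when peaks are treated as syllable proxies.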
MacIntyre, A. D. and Werner, R. (2023). An Automatic Method of Speech Breathing Annotation. Proceedings of the 34th Conference on Electronic Speech Signal Processing (ESSV), Munich, DE.
My thesis work included a rhythmic-prosodic analysis of speech breathing kinematic patterns, recorded from more than 40 Londoners using inductance plethysmography (a device that measures how your chest moves when you breathe). Some of the code I wrote to analyse the breathing patterns is available on my GitHub and described in more detail in the paper listed above.
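The naive starting point for annotating such data is simple peak-finding, which the paper argues is rarely sufficient on its own. A minimal sketch on simulated data (all parameter values here are illustrative, not those of the released code):

```python
import numpy as np
from scipy.signal import find_peaks

# Simulated breath-belt trace: ~15 breaths/min plus sensor noise
fs = 50  # Hz; plethysmography signals are slow
t = np.arange(0, 60, 1 / fs)
resp = np.sin(2 * np.pi * 0.25 * t)
resp += 0.1 * np.random.default_rng(1).standard_normal(t.size)

# Candidate inhalation maxima: a minimum spacing (distance) and a
# prominence floor stop small noise wiggles being counted as breaths
peaks, props = find_peaks(resp, distance=int(2 * fs), prominence=0.5)
```

On real recordings, baseline drift, breath holds, and speaker idiosyncrasy defeat fixed thresholds like these, which is the motivation for the more robust annotation method described in the paper.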
MacIntyre, A. D., Rizos, G., Batliner, A., Baird, A., Amiriparian, S., Hamilton, A., & Schuller, B. W. (2020). Deep attentive end-to-end continuous breath sensing from speech. Proceedings of INTERSPEECH 2020, Shanghai, China, 2082-2086.
Together with collaborators at Imperial College London, we used the same speech breathing data set in a deep learning application to predict the respiratory signal directly from acoustic speech recordings. You can watch the model's prediction from a speaker's voice in the video below 👇
Machine learning of respiratory activity in speech from our @interspeech20 paper. 🫁🗣 The dashed line is the breath signal predicted by end2end DL from audio, solid line is ground truth @georgios_rizos @Imperial_GLAM @antoniahamilton @UCL_ICN pic.twitter.com/QUu3tn1ZlG
Alexis Deighton MacIntyre (@alexisdeighton), June 17, 2021
Rhythm and Learning
MacIntyre, A. D., Lo, H. Y. J., Cross, I., & Scott, S. K. (2022). Task-irrelevant auditory metre shapes visuomotor sequential learning. Psychological Research, 1-22.
Metre refers to temporal structure, and we use it to time our actions or organise the way we perceive the world. This can be as simple as the tick-tock of a clock, whose metre consists of just two events (the tick and the tock), or as complex as a Bulgarian folk dance, where the steps form recurring groupings of nine or even eleven beats. Sometimes we apply a metre without thinking, like when a runner's quick gait aligns with their slower breathing cycle to form a 4:1 ratio. There is some evidence that we are more alert or more sensitive at particular moments within a metre than at others. One question is whether metric structure can help us to learn the identity of events that unfold in time. To test this idea, we paired a laboratory visuomotor learning task with auditory metres formed from basic drum loops, hypothesising that visual elements coinciding with auditory metric accents would be learned faster than unaccented elements. We found that some experimental participants indeed seemed to integrate the metre they heard with the visual pattern they were trying to learn, despite not being instructed to do so. When the metre they were hearing suddenly changed, their performance in the task suffered. But there was a lot of individual variability, so we need to run more studies like this to better understand how metre may affect the way we learn, and whether it works similarly across people or is prone to individual differences.
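As a toy illustration of what "coinciding with a metric accent" means (this is not the actual stimulus code), a simple 4-beat drum-loop metre places an accent on every fourth position of a repeating sequence:

```python
# Positions of an 8-element repeating visual sequence that coincide
# with the downbeat of a hypothetical 4-beat drum-loop metre
sequence_len = 8
beats_per_bar = 4
accented = [i % beats_per_bar == 0 for i in range(sequence_len)]
# Elements at positions 0 and 4 fall on the accent
```

The hypothesis, in these terms, was that the elements at accented positions would be learned faster than the others.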
Philosophy of Voice and Deaf Studies
MacIntyre, A. D. (2018). The signification of the signed voice. Journal of Interdisciplinary Voice Studies, 3(2), 167-183. Available here.
The interdisciplinary field of voice studies is concerned with the inherently material nature of voices and vocal actions, such as the bodies that do the voicing, or the environment in which voices are active. This is important, because much of the Western European intellectual tradition, as early as Ancient Greece, has systematically devalued these parts of our lived experience. In practical terms, anti-voice bias (and disembodiment, more generally) has meant that words, meaning, and the communication of abstract ideas often take precedence over nonverbal, affective, and sensorimotor aspects of vocalisation. In this essay, however, I argue that we have to be careful not to implicitly (and sometimes very explicitly) exclude Deaf voices or ways of voicing from such discussions. This can happen, for instance, when hearing people limit the embodied source of the voice to the mouth or larynx alone. By instead exploring Deaf ways of voicing, which are visual-motor and distributed across the body, attention is drawn to the ways in which vocal speech is itself also distributed across the body, intermingled with gesture, facial expression, and posture. Hence, rather than seeing Deaf voicing as a special case or exception, we can open up what having or raising one's voice can mean for all people, whether they use signed or spoken language to accomplish this.
Speech Breathing Perception
MacIntyre, A. D., & Scott, S. K. (2022). Listeners are sensitive to the speech breathing time series: Evidence from a gap detection task. Cognition, 225, 105171.
Audible inhalation appears to facilitate smooth turn-taking during conversation; however, it is unclear whether listeners form strong temporal expectations concerning the onset of speech that follows a breath sound. Across three experiments, we explored this idea using modified auditory gap detection tasks. In one version, participants reported whether or not they heard a silent gap that was imposed between a breath sound and speech. In the other, participants identified where, within an utterance containing two breath sounds, they thought the silent gap occurred. We found that, in general, listeners are sensitive to violations of the natural speech breathing time series at the level of a few hundred milliseconds. Additionally, nonverbal rhythm discrimination ability consistently predicts how well listeners can locate where a gap has occurred, but not whether a gap was present at all. This could mean that our sense of rhythm helps us to make relative, but not absolute, judgements about speech breathing timing. Importantly, gap detection accuracy and thresholds are superior for trials where the gap occurs after, rather than before, a breath. This suggests that breath sounds may help focus listeners' attention on a particular point in time, with potential ramifications for speech entrainment.
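Although the published stimuli were built from real recordings, the core manipulation, splicing a silent gap into an utterance at a chosen point, can be sketched like this (function name and parameter values are illustrative assumptions):

```python
import numpy as np

def insert_gap(signal, fs, gap_onset_s, gap_ms):
    """Splice gap_ms of silence into signal at gap_onset_s seconds."""
    i = int(round(gap_onset_s * fs))
    gap = np.zeros(int(round(gap_ms / 1000 * fs)), dtype=signal.dtype)
    return np.concatenate([signal[:i], gap, signal[i:]])

# 200 ms gap halfway through a 1 s stand-in "utterance"
fs = 44100
utterance = np.random.default_rng(2).standard_normal(fs)
stim = insert_gap(utterance, fs, gap_onset_s=0.5, gap_ms=200)
```

In the detection task, the gap duration would be varied across trials (and set to zero on catch trials) while the onset is placed either before or after a breath sound.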
My PhD Thesis
MacIntyre, A. D. (2022). The analysis of breathing and rhythm in speech (Doctoral dissertation, University of London, University College London).
Natural human speech unfolds in time, and emerges as a multimodal trace from motor action. Although this statement is not in itself controversial, the temporal properties of speech remain scientifically contested, and embodied aspects of speech production, such as respiratory effort, are not well described empirically. The nature and history of these problems diverge, but a major challenge faced by researchers of both speech rhythm and speech breathing is that of methodology. For the former, disciplinary differences have complicated efforts to apply insights from linguistics (for example, the syllable as a building block of language) to experimental data from speech perception studies, which often use engineered acoustic features like the speech amplitude envelope as a proxy for phonetic annotation. In the case of speech breathing, the noisy and idiosyncratic character of plethysmography or "breath belt" data means that simple automatic approaches (e.g., peak-finding algorithms) are rarely appropriate, forcing researchers to take on laborious and subjective manual annotation projects. Before we can test phonetic predictions in the context of neural speech perception, we should establish how closely theoretical concepts like the syllable approximate acoustically defined forms, such as peaks in the speech envelope. Similarly, before we can run reproducible, large-scale studies of speech breathing, we need a precise and reliable objective measure. My thesis addresses both of these problems in turn, and furthermore unites the topics of rhythm and speech breathing in a series of behavioural perceptual tasks that employ a novel gap detection paradigm.