AusTalk: An audio-visual corpus of Australian English

AusTalk is a large state-of-the-art database of modern spoken Australian English from all around the country. Recorded between June 2011 and June 2016, the final database contains full audio-visual data for 861 adult speakers (aged 18 to 83) from 15 different locations across all Australian states and territories, representing the regional and social diversity and linguistic variation of Australian English, including Australian Aboriginal English. Each speaker was recorded for one hour on three separate occasions to sample their voice in a range of scripted and spontaneous speech situations at various times (see About AusTalk for a more detailed description of the data collection protocol). Later, this database will be expanded to include more age groups (including children), more accents and more ways of speaking.

The Australian accent is distinctive and uniquely ours. The things we talk about and the ways we talk about them are intimately entwined with our sense of self. Our accent is a powerful and enduring symbol of national identity that we preserve despite the influx of electronic media and cultural icons from overseas. However, just as society changes, so too does language as it constantly evolves to meet the changing needs of its users. Australian English has seen much change throughout its history and it's in the national interest to carefully document our linguistic heritage as an important record of our collective identity within our changing culture. 

AusTalk provides a valuable and enduring digital repository of present-day speech, a snapshot of this moment in our linguistic history. As there is a close link between "national self-perception" and how we use language, AusTalk will be a profound cultural resource for all Australians.

For AusTalk Participants

If you were a participant in the data collection, you can preview your own recordings via our Participant Portal. We encourage you to do this, as we would like to invite you to review and agree to some additional terms for the distribution and use of your video recordings. To review your data, you need to know your AusTalk speaker Colour-Animal identifier. Here is the complete list of AusTalk identifiers, giving both numbers and Colour-Animal names.

For Researchers

To access the AusTalk corpus, please go to the Alveo Virtual Laboratory and register for an account. See these notes about how to access AusTalk data. A customised interface for AusTalk (AusTalk-Query) will allow you to browse, preview and download the audio recordings in the corpus.

Some of the AusTalk data has been transcribed and annotated, both manually and automatically. See this page for more information.

The AusTalk database is one step in a major research project spanning two decades. For the Australian community, AusTalk is a national treasure that will provide a permanent record of Australian English. It will also support Australian speech science research and development, and help develop Australian speech technology applications, from better telephone-based speech recognition systems (e.g., taxi bookings) and computer avatars to improved hearing aids and cochlear implants and computer aids for learning-impaired children. See Research projects using AusTalk.