Signed language corpora

by Jordan Fenlon and Julie Hochgesang

Fenlon and Hochgesang video 2022

This vlog introduces our new book, Signed Language Corpora, published by Gallaudet University Press. We are really excited to see this book out and we’re looking forward to finding out what other people think about it.

We created this book because, while there are many resources about creating corpora for spoken and written languages, the challenges for developing corpora involving signed languages are not the same.

These challenges might involve working with a large amount of video data, annotating corpora for a visual language in a way that maximises its searchability, and working with signed language communities.

Using corpora for signed language research is also relatively new and, as such, there is not yet a textbook devoted to its place within this field. We hope this book fills this gap in the literature and stimulates further discussion and ideas among those interested in the use of this methodology.

We had two main aims in mind for this book. The first aim was to provide an overview of what corpus building involves. Unlike linguists working on spoken languages, signed language linguists cannot always access publicly available corpora and must often start with the task of building one.

There has been much work published in the last twenty years regarding best practices in building a signed language corpus and this book attempts to bring together these views within a single volume. In this way, if someone is interested in building a signed language corpus, we hope that this book may serve as a guide in how to get started.

The second aim was to encourage more linguists to use signed language corpora for their research. As more signed language corpora are created and their annotations are made publicly available, research with corpora no longer has to begin with creating a corpus. It can begin with a research question about the way signed languages work.

To put the book together, we invited an international group of signed language corpus linguists to write chapters on different aspects of signed language corpora. The chapters in the book are also organised so that they reflect the different stages of building and working with signed language corpora.

The book begins with a foreword by Trevor Johnston, who worked on one of the earliest signed language corpus projects, before our introduction where we define what is meant by a corpus and discuss some of the advantages of corpus linguistics for signed language research.

Thomas Hanke and Jordan then discuss the type of video data that has been collected for signed language corpora to date. In the next chapter, Gabrielle Hodge and Onno Crasborn introduce the reader to the principles of annotating a signed language corpus, a key step towards creating a resource that can be quickly searched.

This leads us into the next chapter, where Carl Börstell provides an overview of how corpora have been used for research in signed language linguistics and gives us a step-by-step guide to using ELAN to search corpora.

With the next chapter, we start to look at the applied uses of corpora. Lorraine Leeson, Ronice Müller de Quadros and Marianne Stumpf describe how signed language corpora can be used for teaching signed languages and for the training of interpreters. They also describe other types of corpora such as learner corpora and first language acquisition corpora.

This is followed by Nick Palfreyman and Julie discussing the ethics of working with signed language corpora and signed language communities themselves.

In the final chapter, Adam Schembri and Kearsy Cormier look to the future of signed language corpora. They describe how signed language corpora might grow, how they might change in the types of texts they contain, and how they might benefit from advances in machine translation.

In short, there is much more to come from the use of this methodology within signed language research. We are looking forward to learning about future corpus projects and seeing how the signed language communities, especially those under-represented in our book, may take part in and guide such efforts. 


Jordan Fenlon is a signed language linguist who was involved in the creation of the British Sign Language (BSL) Corpus and has published research on sociolinguistic variation using the BSL corpus.

Julie Hochgesang is a linguistics professor at Gallaudet University with a background in corpus linguistics and language documentation. Julie has also worked on several projects such as the ASL Signbank and the Sign Language Acquisition, Annotation, Archiving, and Sharing Project (SLAAASh). She’s on Twitter as @jahochcam
