Welcome!

ANM

Annotation standard for Non-Manual elements in visual communication


What can you find on this website?

This website (currently under construction) hosts the ANM annotation guidelines and related resources. ANM stands for Annotation standard for Non-Manual elements in visual communication. Non-manual elements include facial expressions, head movements such as headshakes and head nods, and movements of the shoulders and torso. These elements play an important role in face-to-face communication, both in spoken languages and in sign languages.

A first, crucial step in the analysis of non-manual visual communicative cues is to annotate them in video recordings of face-to-face interactions. The ANM annotation guidelines provide instructions for creating such annotations in a standardized way. We also provide an annotation template, annotation examples for new users of the guidelines, and a framework & toolkit to assess the reliability of annotations.
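To illustrate what assessing the reliability of annotations involves, here is a minimal, self-contained sketch of one common agreement measure, Cohen's kappa, for two annotators labelling the same video frames. This is only a toy illustration of the general idea, not the STEADY toolkit itself (which handles timed-event sequential data); the frame labels and function name are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: expected overlap given each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n**2
    # Kappa corrects observed agreement for agreement expected by chance.
    return (observed - expected) / (1 - expected)

# Two annotators labelling 10 frames: brow raise (R) vs neutral (N).
ann1 = ["R", "R", "N", "N", "R", "N", "N", "R", "R", "N"]
ann2 = ["R", "R", "N", "R", "R", "N", "N", "R", "N", "N"]
print(round(cohens_kappa(ann1, ann2), 2))  # kappa = 0.6 for this toy data
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance; the actual framework generalizes this idea to annotations with start and end times.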

Why use the ANM guidelines and resources?

We have gone through several elaborate iterations of developing, evaluating, and improving the ANM annotation guidelines, and we continue to refine them. We share the guidelines and all related resources with the research community so that the wheel need not be reinvented in lab after lab.

Our hope is that researchers investigating visual communication will ultimately all use the same annotation standard, making it possible to meaningfully compare data across studies and to pool data for meta-analyses.

As the user community grows, we envision that the standard will further evolve and that the community will develop further resources such as training materials for new annotators, translations of the guidelines into multiple languages (both spoken and signed), and computational tools to support and partially automate the annotation process.

Why are human annotations needed?

  • Annotating non-manual elements in visual communication data is difficult and very time-consuming. One might wonder whether we should attempt to do it at all. Shouldn't we rather invest our energy and resources in developing better techniques to measure the movements of non-manual articulators such as the eyebrows, head, and torso? Wouldn't such techniques make human annotations redundant?
  • We believe that both measurement techniques and human annotations are important, and have complementary advantages and disadvantages.
  • Human annotations are inherently categorical (e.g., an eyebrow can be labeled as being raised or not, but not as being raised to degree 0.36) and subjective (in the sense that they involve a decision made by a particular person).
  • Direct measurements, on the other hand, are quantitative/continuous (e.g., the degree to which the eyebrows are raised may be represented by a number between 0 and 1) and objective (they do not involve any human decisions).
  • The technologies that can be used to obtain such measurements are not perfect yet. For instance, the measured values are still sensitive to the distance and the angle between the camera and the signer’s face. But they are advancing fast and will yield more and more reliable measurements.
  • Suppose, for the sake of argument, that in five years' time these technologies yield perfect measurements, no longer sensitive to the distance and angle between the camera and the signer's face or to other possible confounding factors. What, then, is the value of human annotations?
  • Our answer: Depending on the specific research question in a given project, researchers may not just be interested in continuous measurements, but also in human judgments on when, say, the eyebrows of the speaker or signer are raised or when their eyes are squinted. These judgments are relevant because, presumably, human participants in linguistic interactions also make such judgments and interpret what they see based on these judgments.
  • Of course, these judgments will not always be a black-and-white matter; in some cases they will be clearer than in others. The point, however, is that human annotations of visual communication data provide information about human categorical judgments (and, if confidence ratings are included, also about the relative clarity or fuzziness of these judgments), while continuous measurements do not provide such information at all.
  • For instance, if we measure that at some point in time the eyebrows of the speaker or signer are raised to degree 0.57, we have no information yet as to whether human conversational participants would categorize this as a brow raise or not.
  • Human annotation of non-manuals is highly time-consuming. Ideally, this process could be automated to some extent. This would be possible if machines could learn to predict human annotations based on continuous measurements. However, to enable this, we first need to collect a sufficient amount of data that is annotated by humans in a consistent way, all following the same guidelines, with a reasonable level of inter-annotator agreement.
  • Once we have such a dataset, we can train and test neural networks to automatically annotate data according to the guidelines, in a way that mimics human annotation.
  • It is our intention to achieve this in the future. But a first necessary step in this direction is to establish standardized guidelines and methods to analyze inter-rater agreement. A second necessary step will be to collect a sufficient amount of data annotated by humans. And only then can we apply machine learning to (partly) automate the process.
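The gap between continuous measurements and categorical judgments described above can be made concrete with a small sketch. Mapping a continuous brow-raise value to a categorical label requires choosing a threshold, and that threshold is not given by the measurement itself; it must ultimately be grounded in human judgments (or learned from human-annotated data). The measurement values, the function name, and both thresholds below are hypothetical.

```python
def categorize(raise_degree, threshold):
    """Map a continuous brow-raise measurement in [0, 1] to a categorical label."""
    return "brow-raise" if raise_degree >= threshold else "neutral"

# Hypothetical per-frame measurements of eyebrow raising.
measurements = [0.12, 0.57, 0.36, 0.91]

# With one hypothetical threshold, the value 0.57 counts as a brow raise...
print([categorize(m, threshold=0.5) for m in measurements])
# ...with another it does not. The measurement alone cannot decide;
# human annotations tell us where the categorical boundary actually lies.
print([categorize(m, threshold=0.6) for m in measurements])
```

This is why the measurement 0.57 from the example above is uninformative on its own: whether it counts as a brow raise depends on a categorization that only human judgments, collected as consistent annotations, can supply.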

Who wrote the guidelines?

The prototype version of the guidelines (Version 1.2024) was written by:

Marloes Oomen, Cindy van Boven, Lyke Esselink, Tobias de Ronde, and Floris Roelofsen

The first product version of the guidelines (Version 2.2025) was written by:

Florence Baills, Gemma Barberà, Anastasia Bauer, Cindy van Boven, Raquel Veiga Busto, Brendan Costello, Lyke Esselink, Johannes Heim, Annika Herrmann, Serpil Karabüklü, Vadim Kimmelman, Andrea Lackner, Clara Lombart, Cornelia Loos, Nina-Kristin Meister, Liona Paulus, Alexandra Navarrete-González, Marloes Oomen, Pilar Prieto, Sophie Repp, Floris Roelofsen, Patrick Louis Rohrer, Tobias de Ronde, Rosalee Wolfe, Rebecca Woods, Giorgia Zorzi.


How to cite?

When you use the ANM guidelines in your research, please include a link to the website (https://nonmanuals.net) and cite the following references:

Roelofsen, Floris, Florence Baills, Gemma Barberà, Anastasia Bauer, Cindy van Boven, Raquel Veiga Busto, Brendan Costello, Lyke Esselink, Johannes Heim, Annika Herrmann, Serpil Karabüklü, Vadim Kimmelman, Andrea Lackner, Clara Lombart, Cornelia Loos, Nina-Kristin Meister, Liona Paulus, Alexandra Navarrete-González, Pilar Prieto, Sophie Repp, Patrick Louis Rohrer, Tobias de Ronde, Rosalee Wolfe, Rebecca Woods, Giorgia Zorzi & Marloes Oomen. 2025. Developing an annotation standard for non-manual elements in visual communication: International and cross-disciplinary collaboration. Paper presented at International Society for Gesture Studies (ISGS) 10, Nijmegen.

Oomen, Marloes, Cindy van Boven, Lyke Esselink, Tobias de Ronde & Floris Roelofsen. 2025. ANM: An Annotation standard for Non-Manual elements in visual communication. Proof-of-concept and prototype development. Preprint available at https://doi.org/10.17605/OSF.IO/2KEFQ

Esselink, Lyke, Marloes Oomen & Floris Roelofsen. 2025. STEADY: A toolbox for analyzing the reliability of timed-event sequential data. Preprint available at https://doi.org/10.17605/OSF.IO/2KEFQ

Please also specify which version of the guidelines you used.


Example:

Non-manuals [optional: specify which] were annotated in accordance with the ANM guidelines Version [number.year] (https://nonmanuals.net; Roelofsen et al. 2025, Oomen et al. 2025, Esselink et al. 2025).