Reliability and performance of facial expression recognition

To properly use the best tools, we need to understand the context in which they operate best. This article explains that context, from accuracy to the best usage conditions.

You don't use a screwdriver to drive a nail. In the same way, a facial expression recognition tool gives its best results only in the right context. The question of reliability and performance is therefore highly relevant, and here is what you need to understand.

VideoFeel measures facial expressions to understand people’s reactions, using a technology that is real-time and private by design.

General understanding

VideoFeel analyzes facial expressions over time, using key points on the face as input data. The facial expressions are analyzed as temporal variations of that anonymized data.

It’s a statistical tool built on mathematical concepts (i.e. artificial intelligence), and it gives relevant results in contexts similar to the one it was built in. Concretely, to get relevant statistical predictions, we need a sample that is significantly representative of the population we want to observe.

It’s important to note that, with the technology used in VideoFeel, it is not appropriate to treat the facial expression data collected from one individual as a reliable reading of that person’s emotions. The data makes sense when observed from a statistical point of view. If you want to understand one person’s feelings individually, you might use other techniques that require a training dataset labeled by that person and a neural network trained for that particular task.

So, you may ask: if it's statistical data, why does VideoFeel produce a result for each individual viewer?

That’s because each experience produces an estimation, represented by numbers (i.e. the predictions). Those predictions exist for each viewer, but they become relevant only when they are aggregated across enough viewers to represent the population whose behavior we want insight into.
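To make the idea of aggregation concrete, here is a minimal sketch in Python. The data layout (one list of amusement values per viewer, sampled at the same time steps) and the viewer names are assumptions made for illustration; they are not VideoFeel's actual export format.

```python
from statistics import mean

# Hypothetical layout: one list of amusement values per viewer,
# all sampled at the same time steps. The real export may differ.
viewer_series = {
    "viewer_a": [0.0, 1.0, 0.5],
    "viewer_b": [0.0, 0.0, 1.0],
    "viewer_c": [0.0, 1.0, 0.5],
}

def aggregate_by_time(series_by_viewer):
    """Average the values of all viewers at each time step."""
    columns = zip(*series_by_viewer.values())
    return [mean(values) for values in columns]

# The population-level curve is what should be interpreted,
# not the series of any single viewer.
population_curve = aggregate_by_time(viewer_series)
print(population_curve)
```

A simple mean is only one possible aggregate. With real data you would likely use many more viewers and may prefer a median or a confidence-weighted average.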

Facial expression values

Facial expressions are detected from the movements of the anonymized data points across the face, together with the orientation and position of the head and eyes.

Attention and Engagement 


Attention determines whether the user is paying attention to what is happening on the screen. It takes into account the user's head position and rotation in relation to the screen, as well as their gaze. The attention value varies from 0.0 to 1.0.

Engagement is the process by which participants establish, maintain and terminate their connection with the experience. The notion of engagement is composed of four phases: “engaging”, “engaged”, “disengaging” and “idle”.



Surprise is a brief emotional state experienced as the result of an unexpected significant event.

It’s a transient state that follows a strong trigger. Surprise value is therefore 1.0 when the emotion is triggered, then it decreases according to a decaying curve.



Amusement is a state of experiencing humorous and entertaining events or situations.

It’s a transient state that follows a strong trigger. Amusement value is therefore 1.0 when the emotion is triggered, then it decreases according to a decaying curve.



Confusion is a brief state of being bewildered or unclear.

It’s a transient state that follows a strong trigger. Confusion value is therefore 1.0 when the emotion is triggered, then it decreases according to a decaying curve.

The values for each output are proxies for how people tend to label the underlying patterns of behavior. They should not be treated as direct inferences of emotional experience.
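The transient values described above (surprise, amusement, confusion) start at 1.0 on a trigger and then decrease. The exact shape of the curve is not specified here, so the sketch below assumes an exponential decay with a hypothetical half-life. It only illustrates the behavior and is not VideoFeel's actual formula.

```python
def decayed_value(seconds_since_trigger, half_life_seconds=1.0):
    """Value of a transient expression some time after its trigger.

    Assumption: exponential decay with a hypothetical half-life.
    The article only states that the value is 1.0 at the trigger
    and then follows a decaying curve.
    """
    if seconds_since_trigger < 0:
        return 0.0  # the trigger has not happened yet
    return 0.5 ** (seconds_since_trigger / half_life_seconds)

# Sample the curve at a few points after the trigger.
for t in (0.0, 1.0, 2.0):
    print(t, decayed_value(t))
```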

Usage conditions

Each value is computed on the fly from the anonymized data, depending on the hardware and software capabilities of the user’s local device. To assess the quality of the data, each prediction comes with an additional confidence score from 0.0 to 1.0. The confidence score can then be used to filter the data and keep only the highest quality.
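Filtering on the confidence score could look like the sketch below. The `Prediction` record and its field names are hypothetical, chosen only for illustration. The default threshold of 0.8 is the one recommended later in this article for ad testing.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    # Hypothetical record shape; the real output fields may differ.
    metric: str        # e.g. "amusement" or "attention"
    value: float       # prediction, from 0.0 to 1.0
    confidence: float  # confidence score, from 0.0 to 1.0

def keep_confident(predictions, min_confidence=0.8):
    """Keep only predictions whose confidence is above the threshold."""
    return [p for p in predictions if p.confidence > min_confidence]

predictions = [
    Prediction("amusement", 1.0, 0.93),
    Prediction("amusement", 0.4, 0.35),  # low confidence, dropped
    Prediction("attention", 0.8, 0.88),
]
reliable = keep_confident(predictions)
print([p.metric for p in reliable])
```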

VideoFeel Viewer works best in these situations:

  • Head orientation: within ±25° of facing the camera
  • Light facing the user

In the following situations, VideoFeel will be less stable and confidence will be low:

  • The person is wearing a mask
  • The light comes from behind the person (backlighting)

Assessment Methodology

Our algorithms are evaluated and quality checked. We apply rigorous blinded evaluations of the performance of each algorithm, using cross evaluation.

We run our models on videos of people showing the natural expressions that a particular model is meant to capture, and we overlay the output on the video. For example, when we create a model that detects joy, we take a video where someone is expressing joy, run our algorithm on the entire video and assess the model’s performance.

To assess the performance, we ask validators (real people) how precise the evaluation seems to them.


The accuracy of the models that we put in production is tested by internal and external testers, and we keep only the top-performing algorithms to deliver the best solutions.

Because VideoFeel is a SaaS solution, our models are easy to update, and our customers use the most up-to-date model every time they use our solution.

Preparation for the best performance and reliability

To get the most out of VideoFeel, we recommend following two simple principles:

  • run enough experiments to get representative data
  • clean out poor-quality data using the confidence score

Let’s take concrete examples to understand what VideoFeel can do and what it cannot do.

Can do:

  • statistical estimation of amusement expression
  • statistical estimation of confusion expression
  • statistical estimation of surprise expression
  • individual estimation of attention expression

Can’t do:

  • estimate accurate individual confusion state
  • estimate accurate individual amusement state
  • estimate accurate individual surprise state
  • guess internal emotional state of a person
  • interpret vocal reaction, like tone of voice or vocal burst

For example, to test an online video such as a 30-second ad, we recommend at least 30 high-quality experiments per video, i.e. experiments with a confidence score above 0.8. Because data quality depends on each respondent’s context while accessing VideoFeel Viewer, you might need additional respondents, and you can ask them to be in a quiet place where they are not disturbed and where the lighting is adequate.
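As a rough sketch of this recommendation, the snippet below counts how many experiments pass the quality bar and how many more are still needed. It assumes each experiment is summarized by a single confidence score, which is a simplification made for illustration. The function and constant names are hypothetical.

```python
MIN_EXPERIMENTS = 30   # recommended minimum per video
MIN_CONFIDENCE = 0.8   # recommended quality threshold

def missing_experiments(experiment_confidences):
    """Return how many more high-quality experiments are needed.

    Assumption: each experiment is summarized by one confidence score.
    """
    usable = sum(1 for c in experiment_confidences if c > MIN_CONFIDENCE)
    return max(0, MIN_EXPERIMENTS - usable)

# 25 good experiments and 10 poor ones: 5 more good ones are needed.
collected = [0.9] * 25 + [0.5] * 10
print(missing_experiments(collected))
```

In practice, low-confidence experiments are common when respondents are in poor lighting, so it can help to recruit more than 30 respondents from the start.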

If you have any questions on the topic, we're always happy to discuss how we built our technology and how to make it as useful as possible for our customers. Just contact us.

