Why mood shouldn't be reduced to a tag based on one person’s opinion


As consumers of and participants in the industry, few of us have missed the growing trend of using the mood of video content to enhance content discovery. Consuming any type of digital content is an emotional investment for the end user, and video even more so than audio, so understanding the emotional structure, or “mood”, of video content makes perfect sense. Hulu recently released an excellent report called “Generation Stream” that touches on the importance of mood in content discovery (I highly encourage anybody in streaming media to read it). So far, so good… but emotions and mood are tricky things: a psychological topic shaped by complex combinations of many factors. What makes a piece of content scary, inspiring, captivating, gripping or emotional? Is it the number of “jump scares” that makes a movie scary, the average amount of fear in the movie, or how it builds tension over time (our post on A Quiet Place vs. The Silence touches on this topic)? Moreover, just because Person A believes a movie is scary, can we say the same holds true for the general population of streamers? Can we consider Scream and Hagazussa equally scary, or are they actually at opposite ends of the same spectrum?



Mood has a scale, not a fixed keyword definition

Moods are not fixed, universal concepts where one captivating movie equals any other captivating movie; there are degrees to how inspiring a movie is. For example, “Remember the Titans” (2000) with Denzel Washington and “Race” (2016) are certainly at the top of that range: we follow the inspiring struggles of underdogs overcoming adversity, and their tearful, joyful endings make us as viewers feel that anything is possible. Movies such as “Machine Gun Preacher” (2011) with Gerard Butler, although also inspiring, are not in the same league as the titles above on this specific dimension.

Mood is not the opinion of a single individual

From a statistical standpoint, a single person tagging a movie as inspiring, uplifting or emotional is not a representative sample of the general population of streamers. There is a real risk that this person will get it wrong. As an example, we recently found the movie “Defiance” (sorry for the Swedish title below) tagged as “feel-good” by Netflix, which is clearly either a human error or the opinion of a person who doesn’t represent the general population.


So what is a representative sample of people from a statistical standpoint? Tens, hundreds, thousands? Is it really feasible to have thousands of people weigh in on each movie to agree on a mood? We Swedes pride ourselves on our culture of consensus and inclusion, but even for a Swede, having thousands of people weigh in on every movie is too much! But what if thousands of people from every corner of the world could tell an AI network what makes them scared, sad, happy, surprised, disgusted and so on? Would the consensus of those thousands of people be a representative sample of the general population? Well, it is at least a good start, and it gives us much higher data quality than a single individual who watches a movie trailer and then decides which mood the full movie has.
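To put rough numbers on the sample-size question, here is a minimal sketch that uses the Wilson score interval, a standard confidence interval for a proportion, to estimate how uncertain the “true” share of viewers who would agree with a tag is, given how many raters were asked. It illustrates the statistics only and is not a description of Vionlabs’ labeling pipeline; the function name and the example counts are hypothetical.

```python
import math


def wilson_interval(agree: int, raters: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the share of viewers who agree with a tag.

    agree:  number of raters who applied the tag (e.g. "feel-good")
    raters: total number of raters asked
    z:      normal quantile; 1.96 gives roughly 95% coverage
    """
    if raters <= 0:
        raise ValueError("need at least one rater")
    p = agree / raters                      # observed agreement rate
    z2 = z * z
    denom = 1 + z2 / raters
    center = (p + z2 / (2 * raters)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / raters + z2 / (4 * raters * raters))
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # One rater who says "feel-good", then 70% agreement with more and more raters.
    for agree, raters in [(1, 1), (7, 10), (70, 100), (700, 1000)]:
        low, high = wilson_interval(agree, raters)
        print(f"{agree}/{raters} raters agree -> true share between {low:.2f} and {high:.2f}")
```

With a single rater who tags a title “feel-good”, the interval runs from roughly 0.2 to 1.0, so the tag says very little about the wider audience; with a thousand raters agreeing at a 70% rate, it narrows to about three percentage points either side.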

Built on human emotional input – Scaled through AI

At Vionlabs we are leveraging established psychological models, combined with the input of thousands of people across the globe, to train an AI network to accurately predict the emotional impact of video content on a second-by-second basis. The labeling and data preparation process is as important as, if not more important than, the actual design of the network: the goal is a dataset that spans the full spectrum of human emotions and includes representative input from Asia, Europe, North America, Latin America, Africa and the Middle East. To achieve this, we have spent many years carefully refining a labeling and dataset generation process that yields a high-quality dataset on which we can then build and train neural networks.


(Labeling tool)

By training AI networks in this way, we are able to understand exactly what is going on emotionally at any given point in time, e.g. how stressful, engaging, scary, happy or controlling a scene is, second by second.

From Emotions to Mood

How do we then go from this rich emotional understanding of content to defining a mood? What makes a movie captivating and inspiring? It is actually a combination of multiple factors, including averages, peaks, ups and downs, and how quickly a movie goes from a high to a low, all of which are well defined in the world of content production. Our emotional data allows us to define each mood and automatically identify the content assets in a catalogue that fit it. This means that when a customer catalogue is onboarded or updated, content assets are automatically assigned to the correct mood:
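As a rough illustration of turning a second-by-second emotion curve into mood-level features, the sketch below computes the kinds of statistics mentioned above (average, peak, ups and downs, and how quickly intensity falls from a high to a low) for one hypothetical emotion dimension, then combines them into a continuous score rather than a yes/no tag. The feature definitions, the 30-second default window and the weights in `mood_score` are all invented for illustration; they are not Vionlabs’ actual mood definitions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CurveFeatures:
    average: float        # mean intensity over the whole title
    peak: float           # highest single-second intensity
    swings: int           # number of times the curve crosses its own average
    steepest_drop: float  # largest fall in intensity within `window` seconds


def curve_features(signal: list[float], window: int = 30) -> CurveFeatures:
    """Summarise a per-second emotion curve (values assumed to lie in 0..1)."""
    if len(signal) < 2:
        raise ValueError("need at least two seconds of signal")
    avg = mean(signal)
    # Ups and downs: count sign changes of (value - average),
    # skipping seconds that sit exactly on the average.
    above = [v > avg for v in signal if v != avg]
    swings = sum(1 for a, b in zip(above, above[1:]) if a != b)
    # High-to-low speed: largest drop from any second to a later second
    # at most `window` seconds after it.
    steepest = 0.0
    for i, start in enumerate(signal):
        for later in signal[i + 1 : i + 1 + window]:
            steepest = max(steepest, start - later)
    return CurveFeatures(avg, max(signal), swings, steepest)


def mood_score(f: CurveFeatures) -> float:
    """Hypothetical weighting into a 0..1 score, so titles can be ranked within a mood."""
    return (
        0.4 * f.average
        + 0.3 * f.peak
        + 0.2 * min(f.swings / 20, 1.0)
        + 0.1 * f.steepest_drop
    )


if __name__ == "__main__":
    # Toy 10-second "tension" curve that builds, peaks, releases and flares up again.
    tension = [0.1, 0.2, 0.4, 0.6, 0.9, 0.8, 0.3, 0.2, 0.5, 0.1]
    features = curve_features(tension, window=3)
    print(features, round(mood_score(features), 3))
```

Returning a score instead of a boolean is what allows two titles that both belong to a mood to still be ordered against each other, in line with the idea that mood has a scale.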


(Vionlabs mood lab - Captivating & Inspiring)

Each catalogue has its own set of relevant moods, and deciding which moods to use is a key part of the onboarding process. What we have found in user testing and research is that the perceived value of a mood decreases when it is too wide (it includes too many titles across too broad a spectrum) or when there is too much overlap between moods, so that the same title shows up in several of them.
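Once each mood is represented as a set of titles, both failure modes described above, moods that are too wide and moods that overlap too much, can be checked mechanically. The sketch below is a hypothetical onboarding check, not Vionlabs’ tooling: it flags moods covering more than a given share of the catalogue, and pairs of moods whose Jaccard similarity (shared titles divided by all titles in either mood) exceeds a threshold. Both thresholds, the function names and the title IDs are assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Titles two moods have in common, relative to all titles in either mood."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def review_moods(
    moods: dict[str, set[str]],
    catalogue_size: int,
    max_overlap: float = 0.3,  # hypothetical threshold
    max_share: float = 0.25,   # hypothetical threshold
) -> list[str]:
    """Flag moods that are too wide or that overlap too much with another mood."""
    if catalogue_size <= 0:
        raise ValueError("catalogue must contain at least one title")
    warnings = []
    # Too wide: the mood covers too large a share of the whole catalogue.
    for name, titles in moods.items():
        share = len(titles) / catalogue_size
        if share > max_share:
            warnings.append(f"{name} is too wide: {share:.0%} of the catalogue")
    # Too much overlap: the same titles keep showing up in two moods.
    for (a, titles_a), (b, titles_b) in combinations(moods.items(), 2):
        overlap = jaccard(titles_a, titles_b)
        if overlap > max_overlap:
            warnings.append(f"{a} and {b} overlap by {overlap:.0%}")
    return warnings


if __name__ == "__main__":
    moods = {
        "Captivating & Inspiring": {"t1", "t2", "t3"},
        "Feel-good": {"t2", "t3", "t4"},
        "Creepy": {"t7"},
    }
    for warning in review_moods(moods, catalogue_size=10):
        print(warning)
```

In this toy catalogue of ten titles, the first two moods are each flagged as too wide and as sharing half of their combined titles, while “Creepy” passes both checks.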



(Examples of available moods)

The "Streaming Generation" expects more

The streaming generation is picky and demanding. Showcasing the latest Avengers movie on your start page will not impress them, and recommending only the top 5 trending titles of the week will not keep them engaged over time. Only by delighting them with the best match at the right time will you gain their loyalty and trust. Understanding the mood profile of content is therefore one of the key components in understanding the behavior of your individual users. But to truly delight them, you need to understand content beyond a single tag: when a user is in the mood for a captivating underdog story, you need to bring them something like “Remember the Titans” and not “Machine Gun Preacher”, and when they are in the mood for something REALLY creepy and scary, you bring them Hagazussa and not Scream. Recommendations are a delicate matter that requires building trust over a long time, day after day, week after week. The foundation for building that trust is good data!
