Behind the Scenes of our Deep Learning Preview Engine

As streaming companies continue to grow, content discovery remains a crucial part of their business strategy. With an overwhelming amount of content available to stream, users need help finding relevant and exciting shows or movies to watch. That’s where AINAR comes in, offering AI Automated Thumbnails and AI Automated Shorts that help streaming companies improve their content discovery experience. At the foundation of AINAR’s preview engine are three AI-based components: character tracking, emotion and energy detection, and speech detection. In this article, we take a closer look at how AINAR works and how it produces preview clips and thumbnails.


Thumbnails play a vital role in catching our attention and enticing us to click on movies or TV shows. But how do we ensure that we select captivating and intriguing images rather than dull and uninteresting portraits? At Vionlabs, we’ve embarked on an exciting journey to train our AI to recognize the nuances of a great thumbnail.

Traditionally, thumbnail selection has been based primarily on character tracking. However, we believe in going beyond characters to find visually compelling images that pique curiosity. Here’s the challenge: many “good” thumbnails are themselves portrait images in various settings, so telling a captivating visual apart from a mundane portrait is a subtle and difficult task for AI. Factors such as facial expressions, busy backgrounds, and obscured faces add complexity to the process.

To overcome this challenge, we narrow the scope of the problem. We define a subset of backdrop images that serve as positive examples. We focus on non-busy images with clear salient regions, so that each thumbnail contains visually appealing “space” within the frame. With more precise examples and a clearer definition, our AI can learn to identify this specific type of captivating image.

Our ultimate goal is to refine our AI’s ability to distinguish between great and mediocre thumbnails, leading to more engaging and enticing content recommendations. We want every thumbnail to capture the essence and intrigue of the movie or TV show it represents.

Here are some examples: one captivating thumbnail that draws you in, and one weak thumbnail that fails to capture attention.

See the difference our AI can make in selecting the best visual representation for content.

AI-generated thumbnails


Character Tracking

The character tracking component identifies the characters throughout the video file and assigns each character a weight that indicates its importance. The weight is a combination of presence and prominence. This allows AINAR to produce thumbnails or previews that target the main characters, without anyone having to manually mark each character or action one by one.

With this data, AINAR determines which scenes or moments should be included in the preview, depending on how characters are featured throughout the video. Because the characters are identified and tracked with a high degree of accuracy, the resulting preview shows the main characters and their actions.

Main character tracking from A Star is Born:
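To illustrate how a weight could combine presence and prominence, here is a minimal Python sketch. It is not AINAR's actual formula: the `Detection` schema, the use of bounding-box area as a stand-in for prominence, and the 50/50 blend are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Detection:
    """One sighting of a character in one sampled frame (hypothetical schema)."""
    frame: int
    character_id: str
    box_area_fraction: float  # share of the frame covered by the character, 0..1


def character_weights(detections, total_frames, presence_share=0.5):
    """Blend presence and prominence into one weight per character.

    presence   = fraction of sampled frames in which the character appears
    prominence = mean share of the frame the character covers when visible
    presence_share is an illustrative blend factor, not a known AINAR setting.
    """
    frames_seen = defaultdict(set)
    area_sum = defaultdict(float)
    sightings = defaultdict(int)
    for d in detections:
        frames_seen[d.character_id].add(d.frame)
        area_sum[d.character_id] += d.box_area_fraction
        sightings[d.character_id] += 1

    weights = {}
    for cid in frames_seen:
        presence = len(frames_seen[cid]) / total_frames
        prominence = area_sum[cid] / sightings[cid]
        weights[cid] = presence_share * presence + (1 - presence_share) * prominence

    # Most important character first.
    return dict(sorted(weights.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    sample = [Detection(f, "lead", 0.3) for f in range(4)]
    sample.append(Detection(0, "extra", 0.1))
    for cid, weight in character_weights(sample, total_frames=4).items():
        print(f"{cid}: {weight:.3f}")
```

In this toy input, the character who appears in every frame and fills more of the picture ranks first, which is the behavior the weighting is meant to capture.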


Action Detection for Preview Clips

The action detection component uses a deep learning algorithm to recognize high-energy moments in the video, which tells the AI which scenes are important and should be included in the previews. For an action movie, this means dynamic actions such as running, jumping, or fighting; for a comedy, it could be a funny moment. AINAR finds the energetic sections of the video and creates previews that showcase its most exciting moments or key action sequences. Because character tracking is combined with action detection, users can also search through previews to find scenes based on characters.

AINAR’s clip-stretching process is designed to create high-quality preview clips for streaming content. The process runs through the video asset and looks for the targeted character. Once the character is detected, AINAR “presses record” and looks at the next 20 seconds of footage. If the character appears again within that 20-second window, AINAR continues recording. Once the character has been absent for 20 seconds, AINAR stops recording. The result is a stretched clip that showcases the targeted character. With this process, AINAR can create highly targeted previews that showcase the most relevant characters of a video and help streaming companies improve their content discovery experience.

Below you can see the preview clips for Daddy’s Home generated by AINAR. The AI tracks the main characters and also measures the energy level to give a good diversity of clips. The clip below tracks all the top characters and has high energy.


The clip below tracks primary and secondary characters from the same movie, with lower energy.
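As a rough illustration of the clip-stretching rule described earlier in this section, the following Python sketch groups the timestamps at which the targeted character is detected into clips. The function name `stretch_clips` and its input format are hypothetical. The sketch also assumes a clip ends at the character's last sighting; the description does not say whether the trailing 20 seconds are kept.

```python
def stretch_clips(sightings, max_gap=20.0):
    """Group character sightings (timestamps in seconds) into stretched clips.

    Recording starts at the first sighting and continues as long as the
    character reappears within max_gap seconds of the previous sighting.
    Each clip is returned as a (start, end) tuple.
    Assumption: the clip ends at the last sighting, not max_gap seconds later.
    """
    clips = []
    start = last = None
    for t in sorted(sightings):
        if start is None:
            start = last = t            # "press record"
        elif t - last <= max_gap:
            last = t                    # character still around: keep recording
        else:
            clips.append((start, last))  # absent for too long: stop recording
            start = last = t            # and start a new clip
    if start is not None:
        clips.append((start, last))
    return clips


if __name__ == "__main__":
    print(stretch_clips([3, 10, 25, 80, 95]))
```

With the sample input, the 55-second gap between the sightings at 25 and 80 seconds splits the footage into two clips.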


Speech Detection 

Finally, AINAR uses speech detection as an added layer of precision when cutting previews. This component checks the dialogue in the video to make sure that previews begin and end at natural breaks in the conversation. That way, viewers get a full sense of the scene without missing any important plot points. With all these features combined, AINAR can efficiently and accurately create previews that keep audiences engaged and maximize views for content creators.

Overall, AINAR’s preview engine is a powerful tool for streaming companies looking to improve their content discovery experience. By leveraging AI-based character tracking, action detection, speech detection, and scene detection, AINAR can produce highly targeted previews that showcase the most exciting moments of a video. This leads to more engaged users and higher watch time, making AINAR an essential tool for any streaming company looking to stay ahead of the competition.
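To make the speech-aware cutting described in this section more concrete, here is a small Python sketch that moves a clip's boundaries so they never fall in the middle of a spoken line. It assumes speech has already been detected and is available as non-overlapping (start, end) segments in seconds. The function name `snap_to_pauses` and the choice to widen the clip rather than trim it are assumptions made for the example, not a description of AINAR's implementation.

```python
def snap_to_pauses(clip, speech_segments):
    """Widen a clip so it neither starts nor ends mid-speech.

    clip            -- (start, end) in seconds
    speech_segments -- non-overlapping (start, end) spans where speech occurs
    If a boundary falls strictly inside a speech segment, it is moved outward
    to that segment's edge; boundaries already in a pause are left alone.
    """
    start, end = clip
    new_start, new_end = start, end
    for seg_start, seg_end in speech_segments:
        if seg_start < start < seg_end:
            new_start = seg_start   # back up to where the line begins
        if seg_start < end < seg_end:
            new_end = seg_end       # run on until the line finishes
    return (new_start, new_end)


if __name__ == "__main__":
    speech = [(10.0, 14.0), (20.0, 24.0), (28.5, 33.0)]
    print(snap_to_pauses((12.0, 30.0), speech))
```

Widening keeps every line of dialogue intact at the cost of a slightly longer clip; trimming inward to the nearest pause would be an equally reasonable design when clip length matters more.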

Try AINAR Shorts & Thumbnails On Your Own Content

Sign up here and you will be able to access the automated video content processing interface to try our products on your own content. Any questions?
