For nearly five years, Netflix has had simple thumbs-up and thumbs-down icons for rating its shows and giving its algorithms a better idea of what to recommend. In surveys, however, people often say that this binary type of voting does not really do their taste justice.
What if they really, really love a show?
The streaming service, tasked with coming up with a better way to express such adoration, recently explored the idea of adding a heart icon to the Netflix app. The heart seemed an obvious choice: it is a universal symbol of love and is widely used in apps such as Instagram and Twitter.
But Netflix would not be Netflix if it did not put such features through rigorous testing; in this case, it took almost a year. During that time, the company discovered that hearts were not actually the best-performing option, and it instead settled on a new two-thumbs-up option that becomes available to its subscribers around the world this week.
Here’s how that change came about.
Discovering the universal symbol of love
Netflix on Monday unveiled its new two-thumbs-up feature for its mobile and smart TV apps and its website. Subscribers are told that this type of feedback will directly affect future recommendations. A thumbs-down means that a title will not be recommended again; a thumbs-up means Netflix will suggest similar content. Two thumbs up means “we know you’re a real fan,” as the Netflix mobile app puts it.
The company started working on this feature about a year and a half ago, based on feedback from subscriber surveys and research interviews. “We heard from members that ‘like’ and ‘dislike’ were not enough,” said Christine Doig-Cardet, who leads the company’s customized UI product innovation team. There were some shows that members really enjoyed, and it was important to distinguish between what they like and what they love.
Once it had decided to solve this problem, Netflix began a series of design sprints to find a way to express this level of fandom. Some of the early ideas included a heart, an applause icon, shooting stars and more. The designers consulted with the company’s globalization team to find a truly universal icon. “The design team and the globalization team really [homed] in on symbols of love,” said Ratna Desai, product design director at Netflix. “We wanted it to be very precise and very concise, because we wanted it to be a very quick interaction.”
Netflix tested various reactions that could reflect audience interest in a show.
At the same time, Netflix kept asking its subscribers whether they had any other suggestions. “We did a lot of interviews and studies, [and] the heart was not really resonating,” said Doig-Cardet. “The idea came from members: Why don’t you try two thumbs?”
At that point, two front-runners emerged. The heart seemed an obvious choice, but two thumbs seemed to work well with Netflix’s existing iconography. Also, anyone who has read the late Roger Ebert’s reviews knows that two thumbs up has long been a vote of confidence for the very best entertainment.
Going with what its subscribers wanted seemed like a good idea, which lent credibility to two thumbs. But what if those subscribers were wrong?
“Some people can be very loud,” Doig-Cardet said. “But when you look at the whole picture, when you talk to different members, when you look at how they interact with different features, it really doesn’t always [match] those early loud voices.”
Proving that loud voices are wrong
Netflix has been trying for a long time to figure out how best to collect ratings from its members, and dealing with those loud voices can be challenging. In its early days, Netflix offered a five-star rating system, much like the one people use to rate their Uber drivers.
At the time, Netflix displayed an average of those ratings on its website to indicate how well liked a title was among subscribers. As a result, some titles showed 4.5 stars or other fractional values, which led people to wonder why they could not rate in half-star increments themselves.
Thousands of people told the company in surveys that they wanted this level of granularity, but Netflix employees were not sure whether those comments really reflected how people used the service. To make sure it does not give in to vocal minority opinions, Netflix has for years turned to something that has become an integral part of its product development tool chest: the A/B test.
In the case of the half-star test, the results were clear: the number of ratings dropped significantly when people were asked to provide feedback at that level of granularity. In other words, the A/B test proved the loud voices wrong.
Netflix repeated this kind of experiment in 2017, when it replaced five-star ratings altogether with thumbs. Prior to that change, ratings activity had increased by 200% in A/B tests with the thumbs-up and thumbs-down icons. Part of the reason is that these icons are simpler, but the data also revealed that they are more accurate: people would eagerly give five stars to titles they deemed worthy of that status, including award-winning documentaries, and then let them languish unwatched in their queue for months. At the same time, they would often binge on reality TV shows that they themselves rated three stars.
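To make the idea of comparing two test cells concrete, here is a minimal sketch of a two-proportion z-test, a standard way to check whether a difference in rating activity between a control cell and a treatment cell is larger than chance would explain. This is an illustration only, not Netflix's actual analysis: the function name and the member counts are hypothetical, chosen so that the relative lift works out to 200%.

```python
import math


def two_proportion_z_test(successes_a, total_a, successes_b, total_b):
    """Compare cell B against cell A.

    Returns (relative_lift, z_score, two_sided_p_value).
    """
    rate_a = successes_a / total_a
    rate_b = successes_b / total_b
    # Pooled rate under the null hypothesis that both cells behave the same.
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z_score = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z_score) / math.sqrt(2))
    relative_lift = (rate_b - rate_a) / rate_a
    return relative_lift, z_score, p_value


# Hypothetical counts (assumption): members who rated at least one title.
# Cell A shows five stars, cell B shows thumbs.
lift, z, p = two_proportion_z_test(500, 10_000, 1_500, 10_000)
print(f"relative lift: {lift:.0%}, z = {z:.1f}, p = {p:.3g}")
```

A large z-score with a tiny p-value suggests the difference is unlikely to be noise, but it says nothing about whether the lift will last.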
Moment of truth: hearts or thumbs?
Now, Netflix is ready to add a little more complexity to those ratings again. This is partly because media consumption habits and app interfaces have changed across the board. “People use Netflix in the context of their overall lives,” Desai said. “They interact with Instagram, various social networks, and ride-sharing apps.” Some interaction patterns from those apps and experiences do not translate easily to Netflix, which is primarily used on TVs and is focused more on lean-back entertainment than, for example, Instagram. “But our members are now asking for some levers that they have not asked for in the past,” Desai said.
Still, there were some unresolved questions: Which would work better, hearts or thumbs? And would the feature really have a lasting impact beyond just addressing those loud voices in surveys and other qualitative research?
“We are in a situation where we can hear very strong opinions in a qualitative setting versus what we find in the A/B test,” Desai said. “That’s when the fun begins.”
[Image: Netflix launched A/B testing of the new ratings feature last summer. Credit: Netflix]
Netflix launched A/B tests for the new ratings feature last summer, testing both the heart and the two-thumbs option. At the same time, the company kept surveying subscribers, including those enrolled in the tests, to see whether the new features really added value.
Testing of the feature was extended into the fall because the teams working on it wanted to make sure they got things right. “We don’t rush a test,” Doig-Cardet said. “Sometimes, the instinct is to launch early, to break things, to break everything. That’s not [our] attitude.” One reason for running A/B tests for weeks or even months is to give people time to get used to a feature, and to see whether engagement stays high or whether people are merely drawn in by the novelty and then get bored with it.
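The reasoning behind long-running tests can be sketched in a few lines: compute the lift week by week, then compare the early weeks with the later ones. The helper names, the two-week "early" window, the 50% retention threshold and the weekly rates below are all assumptions made for illustration; they are not figures from Netflix.

```python
def weekly_lifts(control_rates, treatment_rates):
    """Relative lift of treatment over control for each week of a test."""
    return [(t - c) / c for c, t in zip(control_rates, treatment_rates)]


def looks_like_novelty(lifts, early_weeks=2, retention_threshold=0.5):
    """Flag a test whose later lift falls below a fraction of its early lift.

    early_weeks and retention_threshold are illustrative assumptions.
    """
    if len(lifts) <= early_weeks:
        raise ValueError("need at least one week after the early window")
    early = sum(lifts[:early_weeks]) / early_weeks
    late_values = lifts[early_weeks:]
    late = sum(late_values) / len(late_values)
    return early > 0 and late < retention_threshold * early


# Hypothetical weekly engagement rates over a six-week test.
control = [0.10, 0.10, 0.10, 0.10, 0.10, 0.10]
fading = [0.16, 0.15, 0.12, 0.11, 0.105, 0.10]    # interest wears off
sustained = [0.15, 0.15, 0.14, 0.14, 0.14, 0.14]  # interest holds

print(looks_like_novelty(weekly_lifts(control, fading)))     # True
print(looks_like_novelty(weekly_lifts(control, sustained)))  # False
```

A test stopped after the first two weeks would have scored both variants as clear wins; only the longer window separates them.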
In the end, the numbers were clear: offering an additional way to give feedback worked. “We saw a huge improvement in engagement because there was a new way for people to talk to us,” Desai said. That lift was bigger with two thumbs than with the heart, which was a surprise because everyone at Netflix had expected the heart to win.
Those kinds of unexpected results are what make A/B tests so valuable, Doig-Cardet said. “If we weren’t surprised, we would be doing something wrong,” she said. “We check our own assumptions and let the numbers guide us to a better experience.”
Constant testing, even if it spoils the big reveal
Netflix’s extensive use of A/B testing has been well documented over the years, including by its own data science team. The company is constantly testing various features with subsets of its audience. Basically, if you are a Netflix subscriber, there is a good chance you are enrolled in a test right now.
Some of these tests are for obvious interface changes, and some are for under-the-hood codec or infrastructure changes. In fact, Netflix runs so many tests that members can be enrolled in more than one at a time, which is why the company has built an entire testing platform that helps its data science team avoid test conflicts and make sense of all the data collected. (Netflix offers members the opportunity to opt out of testing through their account settings.)
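How a platform might keep overlapping tests from colliding can be illustrated with a common industry technique: deterministic hashing into mutually exclusive layers. The sketch below is not a description of Netflix's platform, whose internals the article does not cover. The layer name, test names and bucket ranges are hypothetical. Tests in the same layer split one hash space, so a member lands in at most one of them, while a different layer uses a different salt and is assigned independently.

```python
import hashlib


def bucket(member_id, salt, buckets=100):
    """Map a member to a stable bucket in [0, buckets) for a given salt."""
    digest = hashlib.sha256(f"{salt}:{member_id}".encode()).hexdigest()
    return int(digest, 16) % buckets


def assign(member_id, layer_name, tests):
    """Return the one test in this layer the member belongs to, or None.

    tests is a list of (test_name, start_bucket, end_bucket) tuples whose
    half-open ranges must not overlap.
    """
    member_bucket = bucket(member_id, layer_name)
    for name, start, end in tests:
        if start <= member_bucket < end:
            return name
    return None


# Hypothetical layer (assumption): two rating-icon tests that must never
# be shown to the same member, each taking 10% of the audience.
RATINGS_LAYER = [("hearts", 0, 10), ("two_thumbs", 10, 20)]

for member in ("member-1", "member-2", "member-3"):
    print(member, assign(member, "ratings-ui", RATINGS_LAYER))
```

Because the assignment depends only on the member ID and the layer name, the same member sees the same variant on every device without any stored state.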
However, the development of the new two-thumbs-up feature shows that A/B testing alone is not enough. Without speaking directly to subscribers, the company would have prioritized the development of the heart icon and might never have given two thumbs the chance to prove itself in A/B tests. “We take this multidisciplinary approach of looking at different inputs,” Doig-Cardet said. “We get insights from our customer service, surveys, and the interviews we do, and use all of them to inform [what] we need to invest in and test.”
Both surveys and A/B testing come with the risk of exposing future features to public view. Subscribers often post about new things they have discovered in the app, and reporters tend to jump on those stories to shine a light on the company’s road map. For Netflix, this is simply the cost of doing business. “We’re comfortable with providing an advance look because we want to make sure it works for our members,” Doig-Cardet said.
“At places where I have worked in the past, there is a grand reveal of a feature, with a campaign and all,” Desai added. Netflix works a little more in the open, which includes testing new and unannounced features with tens of thousands of members.
“This is our bread and butter,” Desai said. “How we innovate is our secret sauce.”