
Monday, December 16, 2013

Pioneering application design on TVs & TV-connected devices

Netflix recently launched the latest evolution of our core app for an increasing number of TVs and TV-connected devices. The app represents a unique mixture of platform and user interface design innovations. One of these innovations is that this app leverages both web standard technologies we love (like JavaScript) and a lightweight native graphics framework.

Understanding the motivations for our most recent platform advancement requires some context. In the past we’ve explored many different approaches. In 2009 we implemented a Flash Lite based app. Soon after, in 2010, we shifted to a WebKit based app using predominantly the QtWebKit port. Each app eventually reached a critical point where our innovation goals required us to adapt our technology stack.

Evolution of an application platform


We’ve seen WebKit mature into a full-fledged platform for application development. Advances in HTML5 and CSS3 introduced much needed semantics and styling improvements. JavaScript can utilize WebGL backed canvases, drag and drop, geolocation, and more. Increasingly WebKit will have hooks into device services allowing integration with hardware and data outside the browser sandbox. Improvements in mobile device capabilities have made many of these and future advances desirable.

We released our first QtWebKit app in 2010. Over the next 3 years our engineers shared our innovations and approaches with the WebKit community. Our platform engineers contributed our accelerated compositing implementation. Meanwhile our user interface engineers shared best practices and identified rendering optimizations deep in WebCore internals. In addition, we continue to drive standardization efforts for HTML5 Premium Video Extensions and have adopted them for desktop.

TVs and TV-connected devices running our core app subject WebKit to unique use cases. We deliver a long-lived, single-page, image- and video-heavy user interface on hardware with a wide range of CPU speeds and addressable RAM, and varied rendering and network pipelines. The gamut of devices is considerable, with significant variations in functionality and performance.

Our technology stack innovation


We strive to provide customers with rich, innovative content discovery and playback experiences. All devices running the new Netflix TV app are now running our own custom JS+native graphics framework. This framework enables us to reach our customer experience goals on the broadest set of TVs and TV-connected devices. We own the feature roadmap and tooling and can innovate with minimal constraints.

Our framework is optimized for fast 2D rendering of images, text, and color fills. We render from a tree of graphics objects. This retained-mode approach is easier to work with than immediate-mode rendering contexts such as the HTML canvas. Display property changes in these objects are aggregated and then applied en masse after user interaction.
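To make the retained-mode model concrete, here is a minimal sketch of a graphics tree with batched property updates. Every name in it (GraphicsObject, Renderer, flush) is hypothetical and illustrates the technique only, not our actual framework API.

```javascript
// Hypothetical retained-mode scene graph with batched property updates.
class GraphicsObject {
  constructor() {
    this.children = [];
    this.props = { x: 0, y: 0, opacity: 1 };
    this.pending = {}; // property changes staged for the next flush
  }
  set(name, value) {
    this.pending[name] = value; // aggregate changes instead of drawing now
    Renderer.markDirty(this);
  }
  add(child) { this.children.push(child); }
}

const Renderer = {
  dirty: new Set(),
  markDirty(node) { this.dirty.add(node); },
  // Called once after user interaction is processed: apply all staged
  // changes en masse, then repaint the tree in a single pass.
  flush(root) {
    for (const node of this.dirty) {
      Object.assign(node.props, node.pending);
      node.pending = {};
    }
    this.dirty.clear();
    paint(root);
  }
};

function paint(node) {
  // Rasterize node.props into its surface here, then recurse.
  node.children.forEach(paint);
}

// Usage: property changes do not draw; one flush applies them together.
const root = new GraphicsObject();
const boxart = new GraphicsObject();
root.add(boxart);
boxart.set('x', 120); // staged, no draw yet
Renderer.flush(root); // one batched repaint
```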

A bespoke rendering pipeline enables granular control over surfaces, the bitmap data representation of one or more graphics objects. Our surfaces are similar to accelerated compositing surfaces used by modern browsers. Intelligent surface allocation reduces surface (re)creation costs and the resulting memory fragmentation over time. Additionally we have fine-grained control of image decode activity leading up to surface creation.
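A sketch of the kind of surface reuse this enables, assuming a hypothetical native allocator createBitmap; keeping same-sized surfaces in a pool avoids repeated (re)creation and the memory fragmentation it causes over time:

```javascript
// Stand-in for a native surface allocation; hypothetical.
function createBitmap(width, height) {
  return { width, height, pixels: new Uint8Array(width * height * 4) };
}

// Pool surfaces by size so a discarded surface can back a new graphics
// object of the same dimensions instead of triggering a fresh allocation.
class SurfacePool {
  constructor() { this.free = new Map(); } // "WxH" -> [surfaces]
  acquire(width, height) {
    const list = this.free.get(`${width}x${height}`);
    if (list && list.length) return list.pop(); // reuse, don't (re)create
    return createBitmap(width, height);
  }
  release(surface) {
    const key = `${surface.width}x${surface.height}`;
    if (!this.free.has(key)) this.free.set(key, []);
    this.free.get(key).push(surface);
  }
}
```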

As the platform matured it gained a pluggable cinematic effect pipeline with blur, desaturation, masking and tinting. These effects can be implemented very close to the metal, keeping them fast on more devices.
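As an illustration of what "pluggable" means here, effects can be modeled as interchangeable functions over a surface's pixel data, so a device port can swap in a native, close-to-the-metal implementation for any stage. The sketch below is hypothetical, not our production pipeline:

```javascript
// Each effect is a function over RGBA pixel data; registration order
// defines the pipeline, and any stage can be replaced per device.
const effects = [];
function registerEffect(name, apply) { effects.push({ name, apply }); }

registerEffect('desaturate', (pixels) => {
  for (let i = 0; i < pixels.length; i += 4) {
    const gray = 0.299 * pixels[i] + 0.587 * pixels[i + 1] + 0.114 * pixels[i + 2];
    pixels[i] = pixels[i + 1] = pixels[i + 2] = gray;
  }
});

function composite(surfacePixels) {
  for (const effect of effects) effect.apply(surfacePixels);
  return surfacePixels;
}

composite(new Uint8Array(1280 * 720 * 4)); // run the pipeline on one 720p surface
```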

While we’re not running full WebKit, we are heavily leveraging JavaScriptCore. We experimented with V8 and SpiderMonkey (with JIT), but both were impractical without stable ports for the various chipset architectures used by device manufacturers.

We also rely on WebKit’s Web Inspector for debugging. Our framework integrates directly with a standalone Node server (and ultimately the Web Inspector) using the public remote debugging protocol. The Elements tab displays a tree of graphics objects. The Sources, Network and Timeline tabs work mostly like you’d expect. Familiar tools help while we debug the app running on a reference framework implementation or development devices.
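For a rough idea of how such a bridge fits together, here is a hedged sketch of a standalone Node server speaking the remote debugging protocol's JSON messages over a WebSocket (using the ws package as an assumption); the helper functions are hypothetical stand-ins:

```javascript
const WebSocket = require('ws'); // assumption: the `ws` WebSocket package

const server = new WebSocket.Server({ port: 9222 });

server.on('connection', (inspector) => {
  inspector.on('message', (raw) => {
    const msg = JSON.parse(raw);
    // Answer Elements-tab queries by serializing the graphics object
    // tree in place of an HTML DOM; everything else is relayed onward.
    if (msg.method === 'DOM.getDocument') {
      inspector.send(JSON.stringify({
        id: msg.id,
        result: { root: serializeGraphicsTree() },
      }));
    } else {
      forwardToDevice(msg);
    }
  });
});

// Hypothetical: walk the app's graphics objects into DOM-like nodes.
function serializeGraphicsTree() {
  return { nodeId: 1, nodeName: 'ROOT', children: [] };
}

// Hypothetical: relay the message to the app's debug socket on the device.
function forwardToDevice(msg) { /* ... */ }
```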

In an A/B test, our app written in this new framework outperformed our existing app. Our future is ours to define and we’re not done having fun.

Join our team


We’re working on exciting new features, constantly improving our platform, and we’re looking for help. Our growing team is looking for experts to join us. If you’d like to apply, take a look here.


Monday, November 18, 2013

Building the New Netflix Experience for TV

by Joubert Nel

We just launched a new Netflix experience for TV and game consoles. The new design is based on our premise that each show or movie has a tone and a narrative that should be conveyed by the UI. To tell a richer story we provide relevant evidence and cinematic art that better explain why we think you should watch a show or movie.




The new user interface required us to question our paradigms about what can be delivered on TV – not only is this UI more demanding of game consoles than any of our previous UIs, but we also wanted budget devices to deliver a richer experience than what was previously possible.

For the first time we needed a single UI that could accept navigation using a TV remote or game controller, as well as voice commands and remotes that direct a mouse cursor on screen.

Before we get into how we developed for performance and built for different input methods, let’s take a look at our UI stack.


UI Stack

My team builds Netflix UIs for the devices in your living room: PlayStation 3, PlayStation 4, Xbox 360, Roku 3, and recent Smart TVs and Blu-ray players.

We deploy UI updates with new A/B tests, support for new locales like the Netherlands, and new features like Profiles. While remaining flexible, we also want to take advantage of as much of the underlying hardware as possible in a cross-platform way.

So, a few years ago we broke our device client code into two parts: an SDK that runs on the metal, and a UI written in JavaScript. The SDK provides a rendering engine, JavaScript runtime, networking, security, video playback, and other platform hooks. Depending on the device, SDK updates range from quarterly to annually to never. The UI, in contrast, can be updated at any time and is downloaded (or retrieved from disk cache) when the user fires up Netflix.
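A hedged sketch of that handoff, with every name in it (sdk, fetch, diskCache, runtime) purely illustrative: the rarely-updated native SDK boots, fetches the latest JavaScript UI bundle (falling back to the disk cache), and evaluates it with the platform hooks injected.

```javascript
// Illustrative boot sequence for the SDK/UI split described above.
async function bootNetflixUI(sdk) {
  let bundle;
  try {
    // Prefer the freshest UI so new A/B tests and features ship instantly.
    bundle = await sdk.network.fetch('https://example.com/ui/latest.js');
    await sdk.diskCache.put('ui-bundle', bundle);
  } catch (err) {
    // Offline or slow start: fall back to the cached copy.
    bundle = await sdk.diskCache.get('ui-bundle');
  }
  // The SDK's JavaScript runtime evaluates the UI with the platform
  // services it provides: rendering, networking, security, playback.
  sdk.runtime.evaluate(bundle, {
    renderer: sdk.renderer,
    network: sdk.network,
    player: sdk.videoPlayback,
  });
}
```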

Key, Voice, Pointer

The traditional way for users to control our UI on a game console or TV is via an LRUD input (left/right/up/down) such as a TV remote control or game controller. Additionally, Xbox 360 users can navigate with voice commands, and folks with an LG Smart TV can navigate by pointing their Magic Remote at elements on screen. Our new UI is our first to incorporate all three input methods in a single design.

We wanted to build our view components in such a way that their interaction behaviors are encapsulated. This proximity makes the code more maintainable and reusable, and the class hierarchy more robust. We also needed a consistent way to dispatch all three kinds of user input events to the view hierarchy.

We created a new JavaScript event dispatcher that routes key, pointer, and voice input in a uniform way to views. We needed an incremental solution that didn’t require refactoring the whole codebase, so we designed it to coexist with our legacy key handling and provide a migration path.
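A minimal sketch of the idea, with all names (View, dispatch, focusedView, hitTest) hypothetical rather than our actual codebase: raw key, pointer, and voice input is normalized into one event shape and bubbled up the view hierarchy the same way.

```javascript
const InputType = { KEY: 'key', POINTER: 'pointer', VOICE: 'voice' };

class View {
  constructor(parent = null) { this.parent = parent; }
  // Views encapsulate their own interaction behavior by overriding this;
  // returning false lets the event bubble to the parent view.
  handleInput(event) { return false; }
}

// One dispatch path, regardless of how the event started life.
function dispatch(event, target) {
  for (let view = target; view; view = view.parent) {
    if (view.handleInput(event)) return; // consumed
  }
}

// Normalizers turn raw platform input into the common event shape.
function onKeyDown(code)  { dispatch({ type: InputType.KEY, code }, focusedView()); }
function onVoice(command) { dispatch({ type: InputType.VOICE, command }, focusedView()); }
function onPointer(x, y)  { dispatch({ type: InputType.POINTER, x, y }, hitTest(x, y)); }

// Placeholder focus and hit-testing models for the sketch.
const root = new View();
function focusedView() { return root; }
function hitTest(x, y) { return root; }
```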

We must produce JavaScript builds that contain code only for the input methods supported by the target device, because smaller code parses faster, which in turn reduces startup time.

To produce lean builds, we use a text preprocessor to strip out input handling code that is irrelevant to a target platform. The advantage of using a text preprocessor instead of, for example, using mixins to layer in additional appearances and interactions, is that we get much higher levels of code proximity and simplicity.
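Continuing the hypothetical sketch above, directives like these (the syntax is illustrative, not our actual toolchain's) let the preprocessor drop pointer and voice handlers from a build for an LRUD-only device:

```javascript
// #if POINTER
function onMagicRemoteMove(x, y) {
  dispatch({ type: InputType.POINTER, x, y }, hitTest(x, y));
}
// #endif

// #if VOICE
function onVoiceCommand(command) {
  dispatch({ type: InputType.VOICE, command }, focusedView());
}
// #endif
```

An LRUD-only build strips both blocks before minification, so the shipped bundle is smaller, parses faster, and starts sooner.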

Performance

Devices in the living room use DirectFB, OpenGL, or something OpenGL-like for graphics, and can use hardware acceleration for animating elements of the UI. Leveraging the GPU is key to creating a smooth experience that is responsive to user input – we’ve done it on WebKit using accelerated compositing (see WebKit in Your Living Room and Building the Netflix UI for Wii U).

The typical implementation of hardware-accelerated animation of a rectangle requires width x height x bytes per pixel of memory. In our UI we animate entire scenes when transitioning between them; animating one scene at 1080p would require close to 8MB of memory (1920 x 1080 x 4), while at 720p it requires 3.5MB (1280 x 720 x 4). We see devices with as little as 20MB of memory allocated to a hardware-accelerated rendering cache. Moreover, other system resources such as main memory, disk cache, and CPU may also be severely constrained compared to a mobile phone, laptop, or game console.

How can we squeeze as much performance as possible out of budget devices and add more cinematic animations on game consoles?

We think JavaScript, HTML and CSS are great technologies to build compelling experiences with, such as our HTML5 player UI. But we wanted more fine-grained control of the graphics layer, and optimizations for apps that do not need reflowable content. Our first strategy, then, was the rendering engine itself: our SDK team built a new one with which we can deliver animations on very resource-constrained devices, making it possible to give those customers our best UI. We can also enrich the experience with cinematic animations and effects on game consoles.

The second strategy is to group devices into performance classes that give us entry points to turn different knobs, such as pool sizes, prefetch ranges, effects, animations, and caching, to take advantage of fewer or more resources while maintaining the integrity of the UI design and interaction.
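For example (all names and numbers below are hypothetical), a device's performance class might simply select a profile of knob settings that the single UI reads at runtime:

```javascript
const PERFORMANCE_CLASSES = {
  low: {
    imagePoolSize: 24,       // fewer cached bitmaps for tiny render caches
    prefetchRange: 2,        // rows of artwork fetched ahead of the cursor
    transitions: 'cut',      // skip full-scene animations
    effects: false,          // no cinematic effect passes
  },
  high: {
    imagePoolSize: 128,
    prefetchRange: 8,
    transitions: 'animated', // cinematic full-scene transitions
    effects: true,
  },
};

// Hypothetical: classify the device once at startup, then read the knobs.
function detectDeviceClass() { return 'low'; }
const profile = PERFORMANCE_CLASSES[detectDeviceClass()];
```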


Delivering great experiences

In the coming weeks we will be diving into more details of our JavaScript code base on this blog.

Building the new Netflix experience for TV was a lot of work, but it gave us a chance to be a PlayStation 4 launch partner, productize our biggest A/B test successes of 2013, and delight tens of millions of Netflix customers.

If this excites you and you want to help build the future UIs for discovering and watching shows and movies, join our team!

Monday, July 9, 2012

Embracing the Differences : Inside the Netflix API Redesign

As I discussed in my recent blog post on ProgrammableWeb.com, Netflix has found substantial limitations in the traditional one-size-fits-all (OSFA) REST API approach. As a result, we have moved to a new, fully customizable API. The basis for our decision is that Netflix's streaming service is available on more than 800 different device types, almost all of which receive their content from our private APIs. We have found that supporting these myriad device types with an OSFA API, while successful, is not optimal for the API team, the UI teams, or Netflix streaming customers. And given that the key audiences for the API are a small group of known developers to which the API team is very close (i.e., mostly internal Netflix UI development teams), we have evolved our API into a platform for API development. Supporting this platform are a few key philosophies, each of which is instrumental in the design of our new system. These philosophies are as follows:

  • Embrace the Differences of the Devices
  • Separate Content Gathering from Content Formatting/Delivery
  • Redefine the Border Between "Client" and "Server"
  • Distribute Innovation

I will go into more detail below about each of these, including our implementation and what the benefits (and potential detriments) are of this approach. However, each philosophy reflects our top-level goal: to provide whatever is best for the Netflix customer. If we can improve the interaction between the API and our UIs, we have a better chance of making more of our customers happier.

Now, the philosophies…

Embrace the Differences of the Devices

The key driver for this redesigned API is the fact that there are a range of differences across the 800+ device types that we support. Most APIs (including the REST API that Netflix has been using since 2008) treat these devices the same, in a generic way, to make the server-side implementations more efficient. And there is good reason for this approach. Providing an OSFA API allows the API team to maintain a solid contract with a wide range of API consumers because the API team is setting the rules for everyone to follow.

While effective, the problem with the OSFA approach is that it emphasizes convenience for the API provider, not the API consumer. Accordingly, OSFA ignores the differences between these devices, the very differences that would allow us to more optimally take advantage of the rich features offered on each. To give you an idea of these differences, devices may differ in:

  • Memory capacity or processing power, which affects how much content a device can manage at a given time
  • Requirements for distinct markup formats, which become more likely as devices proliferate
  • Document models: some devices perform better with flatter models, others with more hierarchical ones
  • Screen real estate, which may affect which content elements are needed
  • Document delivery: some devices perform better with bits streamed across HTTP rather than delivered as a complete document
  • User interactions, which can influence the metadata fields, delivery method, interaction model, etc.

Our new model is designed to cut against the OSFA paradigm and embrace the differences across devices while supporting those differences equally. To achieve this, our API development platform allows each UI team to create customized endpoints. So the request/response model can be optimized for each team’s UIs to account for unique or divergent device requirements. To support the variability in our request/response model, we need a different kind of architecture, which takes us to the next philosophy...

Separate Content Gathering from Content Formatting/Delivery

In many OSFA implementations, the API is the engine that retrieves the content from the source(s), prepares that payload, and then ultimately delivers it. Historically, this implementation is also how the Netflix REST API has operated, which is loosely represented by the following image.

Diagram showing Netflix UIs interacting with the Netflix REST API

The above diagram shows a rainbow of colors roughly representing some of the different requests needed for the PS3, as an example, to start the Netflix experience. Other UIs will have a similar set of interactions against the OSFA REST API given that they are all required by the API to adhere to roughly the same set of rules. Inside the REST API is the engine that performs the gathering, preparation and delivery of the content (indifferent to which UI made the request).

Our new API has departed from the OSFA API model towards one that enables fine-grained customizations without compromising overall system manageability. To achieve this model, our new architecture clearly separates the operations of content gathering from content formatting and delivery. The following diagram represents this modified architecture:

Diagram showing Netflix UIs interacting with the new optimized Netflix non-REST API

In this new model, the UIs make a single request to a custom endpoint that is designed to specifically handle that request. Behind the endpoint is a handler that parses the request and calls the Java API, which gathers the content by calling back to a range of dependent services. We will discuss in later posts how we do this, particularly in how we parse the requests, trigger calls to dependencies, handle concurrency, support fallbacks, as well as other techniques we use to ensure optimized and accurate gathering of the content. For now, though, I will just say that the content gathering from the Java API is generic and independent of destination, just like the OSFA approach.

After the content has been gathered, however, it is handed off to the formatting and delivery engines which sit on top of the Java API on the server. The diagram represents this layer by showing an array of different devices resting on top of the Java API, each of which corresponds to the custom endpoints for a given UI and/or set of devices. The custom endpoints, as mentioned earlier, support optimized request/response handling for that device, which takes us to the next philosophy...

Redefine the Border Between "Client" and "Server"

The traditional definition of "client code" is all code that lives on a given device or UI. "Server code" is typically defined as the code that resides on the server. The divide between the two is the network border. This is often the case for REST APIs and that border is where the contract between the API provider and API consumer is engaged, as was the case for Netflix’s REST API, as shown below:

Diagram showing the traditional border between client and server code in REST APIs

In our new approach, we are pushing this border back to the server, and with it goes a substantial portion of the UI-specific content processing. All of the code on the device is still considered client code, but some client code now resides on the server. In essence, the client code on the device makes a network call back to a dedicated client adapter that resides on the server behind the custom endpoint. Once back on the server, the adapter (currently written in Groovy) explodes that request out to a series of server-side calls that get the corresponding content (in some cases, roughly the same rainbow of requests that would be handled across HTTP in our old REST API). At that point, the Java APIs perform their content gathering functions and deliver the requested content back to the adapter.

Once the adapter has some or all of its content, the adapter processes it for delivery, which includes pruning out unwanted fields, error handling and retries, formatting the response, and delivering the document header and body. All of this processing is custom to the specific UI. This new definition of client/server is represented in the following diagram:

Diagram showing the modified border between client and server code in the optimized Netflix non-REST API

There are two major aspects to this change. First, it allows for more efficient interactions between the device and the server since most calls that otherwise would be going across the network can be handled on the server. Of course, network calls are the most expensive part of the transaction, so reducing the number of network requests improves performance, in some cases by several seconds. The second key component leads us to the final (and perhaps most important) philosophy to this approach, which is the distribution of the work for building out the optimized adapters.

Distribute Innovation

One expected critique of this approach is that as we add more devices and build more UIs for A/B and multivariate tests, there will undoubtedly be myriad adapters needed to support all of these distinct request profiles. How can we innovate rapidly and support such a diverse (and growing) set of interactions? It is critical for us to support the custom adapters, but it is equally important for us to maintain a high rate of innovation across these UIs and devices.

Example of how this new system works (a condensed code sketch follows the list):
  1. A device, such as the PS3, makes a single request across the network to load the home screen (This code is written and supported by the PS3 UI team)
  2. A Groovy adapter receives and parses the PS3 request (PS3 UI team)
  3. The adapter explodes that one request into many requests that call into the Java API (PS3 UI team)
  4. Each Java API calls back to a dependent service, concurrently when appropriate, to gather the content needed for that sub-request (API team)
  5. In the Java API, if a dependent service is unavailable or returns a 4xx or 5xx, the Java API returns a fallback and/or an error code to the adapter (API team)
  6. Successful Java API transactions then return the content back to the adapter when each thread has completed (API team)
  7. The adapter can handle the responses from each thread progressively or all together, depending on how the UI team wants to handle it (PS3 UI team)
  8. The adapter then manipulates the content, retrieving the wanted elements (and pruning out the unwanted), handling errors, etc. (PS3 UI team)
  9. The adapter formats the response in preparation for delivery back across the network to the PS3, which includes everything needed for the PS3 home screen in the single payload (PS3 UI team)
  10. The adapter finally handles the delivery of the payload across the network (PS3 UI team)
  11. The device will then parse this optimized response and populate the UI (PS3 UI team)
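
Here is that flow condensed into a sketch. It is written in JavaScript for readability even though the real adapters are Groovy and the gathering layer is the Java API; every name in it is illustrative, not production code.

```javascript
// Hypothetical PS3 home-screen adapter: one request in, one payload out.
const endpoints = {
  '/ps3/homeScreen': async function ps3HomeScreen(request) {
    // Explode the single device request into concurrent gathering calls.
    const results = await Promise.allSettled([
      javaApi.getUserProfile(request.userId),
      javaApi.getRecommendedRows(request.userId),
      javaApi.getContinueWatching(request.userId),
    ]);
    // Fallbacks: a failed dependency yields a default, not a failed page.
    const [profile, rows, resume] = results.map((r) =>
      r.status === 'fulfilled' ? r.value : { fallback: true }
    );
    // Prune and format exactly what the PS3 home screen needs.
    return {
      user: profile.fallback ? null : { name: profile.name },
      rows: (rows.items || []).map(({ id, title, boxart }) => ({ id, title, boxart })),
      resume: resume.items || [],
    };
  },
};

// Hypothetical stand-in for the generic, destination-independent Java API.
const javaApi = {
  getUserProfile:      (id) => fetchDependency(`/profile/${id}`),
  getRecommendedRows:  (id) => fetchDependency(`/rows/${id}`),
  getContinueWatching: (id) => fetchDependency(`/resume/${id}`),
};
async function fetchDependency(path) {
  // Call the dependent service; throw on 4xx/5xx so the adapter's
  // allSettled branch substitutes a fallback.
  return { items: [] };
}
```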

As described above, pushing some of the client code back to the servers and providing custom endpoints gives us the opportunity to distribute the API development to the UI teams. We are able to do this because the consumers of this private API are the Netflix UI and device teams. Given that the UI teams can create and modify their own adapter code (potentially without any intervention or involvement from the API team), they can be much more nimble in their development. In other words, as long as the content is available in the Java API, the UI teams can change the code that lives on the device to support the user experience and at the same time change the adapter code to deliver the payload needed for that experience. They are no longer bound by server teams dictating the rules and/or being a bottleneck for their development. API innovation is now in the hands of the UI teams! Moreover, because these adapters are isolated from each other, this approach also diminishes the risk of harming other device implementations with tactical changes in their device-specific APIs.

Of course, one drawback to this is that UI teams are often more skilled in technologies like HTML5, CSS3, JavaScript, etc. In this system, they now need to learn server-side technologies and techniques. So far, however, this has been a relatively small issue, especially since our engineering culture is to hire very strong, senior-level engineers who are adaptable, curious and passionate about learning and implementing these kinds of solutions. Another concern is that because the UI teams are implementing server-side adapters, they have the potential to bring down the servers through infinite loops or other processes that are resource intensive. To offset this, we are working on scrubbing engines that will hopefully minimize the likelihood of such mistakes. That said, in the OSFA world, code on the device can just as easily DDoS the server; it is just a potentially bigger problem when the offending code runs on the server.



We are still in the early stages of this new system. Some of our devices have fully migrated over to it, others are split between it and the REST API, and others are just getting their feet wet. In upcoming posts, we will share more about the deeper technical aspects of the system, including the way we handle concurrency, how we manage the adapters, the interaction between the adapters and the Java API, our Groovy implementation, error handling, etc. We will also continue to share the evolution of this system as we learn more about it.

In the meantime, if you are interested in building high-scale, cloud-based solutions such as this one, we are hiring!

Daniel Jacobson (@daniel_jacobson)
Director of Engineering – Netflix API