It's a fascinating look into how they produce multiple encoded formats from source video, but something stood out to me... One feature of their ingestion system is meant to reject source inputs that would lead to a poor viewer experience, yet it doesn't seem to treat interlaced NTSC video as a "poor experience". I'm willing to bet that the majority of the screens on which Star Trek: Deep Space Nine is played are progressive scan, yet that show is interlaced at NTSC resolution.
In cases like this, where the source material is analog and, I would have to assume, not available in progressive scan, is there a technical reason why Netflix doesn't de-interlace the source before encoding? DS9 seems big enough of a catalog to be worth encoding in a less jarring scan rate.
Not complaining, mind you. The DVDs are interlaced too. Seems that the original recordings were NTSC only. Honestly curious if anyone has any insight.
TNG was like this (sourced from a video version, with a very nasty interlaced look) on Netflix up until a couple of months ago. Now they stream the magnificent remastered Blu-ray HD version, which is sourced from the original film with fresh CGI, since the original CGI was tied to the video.
It's not clear whether DS9 will get the same treatment.
YES! I'm insanely sensitive to this too. I have to interrupt group TV watching to play around with the TV's menus and turn off the "automatically simulate 60fps" setting that so many Smart TVs have on by default these days.
My mother can't tell the difference at all. I'm pretty sure this is just something that only a subset of the population notices enough to care about.
Simulated 60 FPS (or higher) is significantly worse than true 60 FPS source material. TVs attempt to interpolate between the frames automatically and it just screws everything up.
Yes! I didn't mind the HFR Hobbit movies at all, but I can't stand the interpolated HFR that comes out of smart TVs. Once you start to notice the artifacts (often distortion or weird juddering in the background), it becomes much less enjoyable to watch than standard uninterpolated 24fps content. I look forward to a future of real HFR content, not fake interpolated crap.
And of course, smart TV HFR butchers hand-drawn animation.
Personally, I find the higher framerates valuable enough that I'm willing to put up with the artifacts. They're easy to see if you're looking for them, but I think most of them are pretty ignorable, although there are some pathological cases (any time the scene pans across regular vertical bars e.g. blinds).
I do production and post-production for a living. I didn't like HFR in The Hobbit. The lighting was off, and so was the motion blur, which cheapened the effect. Maybe with further tweaking it will eventually look 'filmic'. I do like it in 3D, though. Also, high frame rate is AWESOME for (almost) any live TV, and I think that's the real future of it. For example: https://www.youtube.com/watch?v=XVXQlkpaC5k
I'm very much inclined to think this is just what people get used to. You're used to watching lower framerate, so your brain is yelling "something is different" all the time you're watching.
I got used to higher framerates (mostly via interpolation), and now 24fps stuff feels really jerky to me, almost like stop motion.
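If you want to see what that interpolation does without fighting a TV menu, ffmpeg's minterpolate filter does a rough software version of the same thing; a minimal sketch, with made-up filenames (and it is very slow):

    import subprocess

    # Rough software approximation of TV-style motion interpolation:
    # motion-compensated interpolation of a 24 fps source up to 60 fps.
    # "movie_24p.mkv" and the output name are made up.
    subprocess.run([
        "ffmpeg", "-i", "movie_24p.mkv",
        "-vf", "minterpolate=fps=60:mi_mode=mci",  # mci = motion-compensated interpolation
        "-c:v", "libx264", "movie_fake60.mkv",
    ], check=True)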
I've wondered if this could be emulated fairly easily with local playback filters for viewers who prefer to experience material at lower fidelity (CD vs. vinyl comes to mind, where rejecting the hi-fi version is about ceremony and romanticism rather than technical merit 95%+ of the time, "remastering" insanity notwithstanding).
If you look at video game emulation, CRT shaders are in vogue for that very reason. Would be quite interesting if someone were to apply the same filters to video playback.
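You can sort of do it already: mpv will load GLSL shaders at playback time, and some CRT shaders have been ported to its hook format. A sketch, assuming mpv is installed and using a hypothetical shader path:

    import os
    import subprocess

    # Hypothetical path to a CRT shader already ported to mpv's GLSL hook format.
    shader = os.path.expanduser("~/shaders/crt-lottes.hook")

    # Play the video with the shader applied at render time.
    subprocess.run(["mpv", f"--glsl-shaders={shader}", "episode.mkv"], check=True)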
CD vs. vinyl is not really a fidelity issue; what people call "warm" is a kind of compression that is part of analog. A good analog recording has just as much fidelity as digital, or more. An analog recording can actually reproduce an accurate sound wave, while digital can technically never do that. (Former sound engineer)
However, similar to how we perceive ultra-low frequencies more as physical vibration than as sound, these harmonics above 20 kHz are probably merely annoying and only subtly felt, and they affect one major aspect of enjoyability: listening fatigue. So maybe we don't want those frequencies from a musical perspective anyway.
Regardless, there's hardly any equipment in use even by "audiophiles" on full-blown analog setups that can faithfully reproduce sounds beyond 30 kHz, because of limitations in the electronics themselves rather than any analog-vs-digital distinction in the end-user recording format. In fact, a lot of vinyl historically had to be mastered with a low-pass filter cutting off much of the high frequencies: with enough energy in the high frequencies, the needle becomes harder to control and can fly right off the track (one explanation I read from an audio engineer; I'm really not sure about that logic, but there's definitely low AND high-pass filtering on vinyl that makes it lower fidelity in many respects than the master).
It bugs the crap out of me to see people claim vinyl is superior on technical merits rather than aesthetic ones (sound preference and taste are real). You'd think they were climate change deniers with their insistence and rhetoric. But this is what I meant about purists and the "original": higher fidelity and clarity are often not what people desire.
You're simply incorrect. Within a defined boundary (e.g. the absolute limits of human hearing) digital audio can reproduce an audio signal perfectly. Most analog recording systems are incapable of this.
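That's just the sampling theorem: any signal with no content above a bandwidth B is exactly determined by samples taken at a rate above 2B, so 44.1 kHz sampling covers everything up to roughly 22 kHz. The standard reconstruction formula (Whittaker-Shannon):

    x(t) = \sum_{n=-\infty}^{\infty} x(nT)\,\operatorname{sinc}\!\left(\frac{t - nT}{T}\right),
    \qquad T = \frac{1}{f_s},\quad f_s > 2B

In practice the limit is quantization noise, and 16 bits already gives roughly 96 dB of dynamic range, which is more than any vinyl playback chain manages.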
That isn't Netflix's job. A post house delivers the raw .mov files to Netflix. The post house gets the files or tapes from the studio or whoever manages the library for them (i.e. another post house). The post house has to comply with a set of standards set forth by Netflix or the delivery is automatically rejected. Most of the time things like "no interlacing" are in those requirements, but there can also be exceptions if no other source material is available. iTunes works the same way. So if the post house that delivers to Netflix only gets digital raw .movs and not the actual tapes, they may not be allowed to re-encode those, or may not have access to the tapes to do the correct conversion.
If you deinterlace an interlaced source before encoding, you either discard information or store twice as much. And people watching on an interlaced screen potentially lose data, since not all deinterlace/interlace pairings round-trip cleanly. Better to store it in the original format; then it can go through one round of deinterlacing on playback for the screens that need it, and none for those that don't.
(Also, deinterlacing approaches get better over time: if a particular episode entered their catalogue 10 years ago and was deinterlaced using the state-of-the-art approach of the time, it would look much worse than a modern deinterlace.)
One of the overriding constraints on these sorts of video pipeline tasks is touching the input pixels as few times as possible. Every lossy transformation you do (cropping, scaling, color correction, transcoding) potentially introduces defects; and that's assuming that your lossless transformations (repackaging container formats, for instance) are bug-free.
I'd love to see some figures from Netflix's QC team; I bet at their scale they see all kinds of insane edge-case problems.
Uhm, no. Due to the way modern video formats work, you don't store twice as much data; on the contrary, H.264 and similar modern formats are significantly more efficient at storing progressive (including deinterlaced) video than an equivalent interlaced stream.
Nope. Modern (and even ancient) video codecs can store interlaced data just as efficiently as progressive data; how could it be otherwise? But when you deinterlace, say, a 30 frames per second interlaced source, either you store the result as 60 frames per second (twice as much data), or you lossily downsample to 30 frames per second.
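ffmpeg's yadif deinterlacer exposes exactly that choice, if you want to see it; a minimal sketch with made-up filenames:

    import subprocess

    SRC = "ds9_episode.mkv"  # hypothetical 30 fps (60-field) interlaced source

    # One output frame per input frame (~30 fps): half the temporal information
    # carried by the fields is thrown away.
    subprocess.run(["ffmpeg", "-i", SRC, "-vf", "yadif=mode=send_frame",
                    "-c:v", "libx264", "out_30p.mkv"], check=True)

    # One output frame per field (~60 fps): all the motion is kept, but there
    # are twice as many frames to encode and store.
    subprocess.run(["ffmpeg", "-i", SRC, "-vf", "yadif=mode=send_field",
                    "-c:v", "libx264", "out_60p.mkv"], check=True)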
If size of catalog trumps video quality, you roll with NTSC if NTSC is the best you can get. Deinterlacing is extremely tricky, and if you can deliver interlaced video to playback devices, you're often better off punting.
I'm envious of these capabilities. I wrote the backend for a streaming service that streams about forty channels to a quarter of a million mobile users in Africa. We get source video from our content providers that has given me grey hair. The problem is that I simply can't spin off the encoding and source quality checks to the cloud because of bandwidth costs here. So I do the quality checking and compression on local servers and then upload the compressed output to the servers.
I'd kill for a 100 megabit line at a decent price.
I've done a bunch of playing with ffmpeg at home, and I imagine the tech stack is probably similar at Netflix, at least for the Source-->Chunk, Chunk-->Assemble, and Assembled-->Encode steps.
The validation done during all of these steps is interesting. Netflix's early years were probably exactly like what you're doing: single file in, transcode, single file out and deploy.
Chunking the pieces up is clever. Getting it right must have been challenging. How do you write an oracle for something that complex?
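A toy version of the chunked path is easy to sketch with ffmpeg, and even a crude oracle (compare frame counts between the source and the reassembled output) catches a lot of chunking mistakes. A sketch, assuming ffmpeg/ffprobe are on PATH and all filenames are made up:

    import glob
    import subprocess
    from concurrent.futures import ProcessPoolExecutor

    SRC = "source_mezzanine.mov"  # made-up source name

    def frame_count(path):
        # Count decoded video frames with ffprobe (slow but exact).
        out = subprocess.run(
            ["ffprobe", "-v", "error", "-count_frames", "-select_streams", "v:0",
             "-show_entries", "stream=nb_read_frames", "-of", "csv=p=0", path],
            capture_output=True, text=True, check=True)
        return int(out.stdout.strip())

    def encode(chunk):
        out = chunk.replace("chunk_", "enc_").replace(".mov", ".mp4")
        subprocess.run(["ffmpeg", "-i", chunk, "-c:v", "libx264", "-crf", "20", out],
                       check=True)
        return out

    # 1. Split into ~90 s chunks without re-encoding (cuts land on keyframes).
    subprocess.run(["ffmpeg", "-i", SRC, "-c", "copy", "-f", "segment",
                    "-segment_time", "90", "chunk_%04d.mov"], check=True)

    # 2. Encode the chunks in parallel.
    with ProcessPoolExecutor() as pool:
        encoded = list(pool.map(encode, sorted(glob.glob("chunk_*.mov"))))

    # 3. Reassemble with the concat demuxer.
    with open("concat.txt", "w") as f:
        f.writelines(f"file '{name}'\n" for name in encoded)
    subprocess.run(["ffmpeg", "-f", "concat", "-safe", "0", "-i", "concat.txt",
                    "-c", "copy", "assembled.mp4"], check=True)

    # 4. Crude oracle: the assembled output should have the same frame count.
    assert frame_count(SRC) == frame_count("assembled.mp4")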
Fascinating article. Forgive me for being ignorant about this, but how often do they need to encode video?
I would think that it's only done when new titles are added to streaming, and once the video has been encoded into all the required formats they would be done with it. Sure, there is a lot of video content out there to be encoded but it isn't unlimited. Is serving the content and providing the recommendation engine to users at scale not a greater challenge than encoding the video?
There are lots of reasons a reencode might be needed. For example, a new compression algorithm is developed, a new device is supported with a new codec, a new way of giving users a faster startup is developed, etc.
Basically any change to the way video is delivered over the internet could trigger a full or partial reencode of the entire library.
It's another one of their challenges. Time is an important factor: encoding high-quality video is time consuming, and multiplied by the many bitrates and codecs it could mean that they need to delay their content availability by days.
When I built a similar system (for a large consumer electronics company), we built parallel paths for high-priority content: we'd intelligently split the incoming mezzanine and distribute the 90-120s chunks to a farm of systems, while also completing the multi-pass encodes. When the latter finished, the system would swap them. Because of the business model, we never ran this in full production mode, but it was built and ready to go.
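The swap itself can be boring and safe if the player-facing name never changes and you just repoint it atomically; a sketch of the idea, with made-up paths and names:

    import os
    import shutil

    LIVE = "title_manifest.json"  # the name players actually fetch (made up)

    def publish(rendition_manifest):
        # Write to a temp file, then rename over the live name. os.replace is
        # atomic, so readers see either the old manifest or the new one, never
        # a half-written file.
        tmp = LIVE + ".tmp"
        shutil.copyfile(rendition_manifest, tmp)
        os.replace(tmp, LIVE)

    publish("fast_chunked/manifest.json")   # quick chunked encode, live in minutes
    # ... later, when the multi-pass farm finishes ...
    publish("multipass_hq/manifest.json")   # quietly replaces the fast version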
Generally it takes 2-3x the original duration of the video to encode a source into 1080p, so I'm not sure why they take a full day, unless they do each bitrate serially, which I think is not as hard to parallelize as it is to parallelize a single bitrate by chunking.
Yes, I believe serving is a lot harder, but serving is almost a solved problem since people have been dealing with it for a long time.
>I'm not sure why they take a full day, unless they do each bitrate serially, which I think is not as hard to parallelize as it is to parallelize a single bitrate by chunking.
I think the talk I know this from is https://www.youtube.com/watch?v=tQrsz3BrfwU; they chunk not only for encoding but also for QC (and QC validation on the resulting transcoded asset).
If memory serves, the talk also discussed the long transcode time: their transcoder (EyeIO at the time, and I haven't heard differently since) is optimised for efficient packing over performance.
For x264 that is true; HEVC, which is also mentioned, is much slower. For a 4K source, transcoding can take more than a second per frame. For a normal movie this can quickly result in encoding times of more than a day.
Another problem is that you have to encode the movie for each codec profile times the number of different bitrates per profile. The article mentions four profiles (VC1, H.264/AVC Baseline, H.264/AVC Main and HEVC) and bitrates ranging from 100 kbps to 16 Mbps. Assuming 20 different bitrates per profile, you already get 4 × 20 = 80 encoded copies per source. But of course this can be addressed with parallelism.
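Back-of-the-envelope, using only the rough figures from this thread (the bitrate count and per-frame cost are assumptions):

    # Size of the encode matrix and a serial worst case, using the numbers above.
    profiles = ["VC1", "H.264 Baseline", "H.264 Main", "HEVC"]
    bitrates_per_profile = 20                    # assumption from the comment above
    print(len(profiles) * bitrates_per_profile)  # 80 renditions per source

    # A 2-hour movie at 24 fps, at ~1 second per frame for a single 4K HEVC rendition:
    frames = 2 * 60 * 60 * 24
    print(frames / 3600, "hours")                # ~48 hours if done serially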
Are there any codecs that can output multiple versions of an input at the same time? It seems like a lot of the encoding process (like motion estimation) is the same every time, so why redo it for every output instead of reusing it?
That would be interesting to know. A lot of transcoders can make multiple passes over the source, so being able to reuse the meta data generated for subsequent passes at different output qualities might help speed up the process. I dunno, not my forte, just thinking out loud.
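At least with ffmpeg + x264 you can do a version of this today: run the analysis pass once and, as far as I know, feed the same stats log to several second passes at different bitrates. A sketch with made-up filenames and bitrates:

    import subprocess

    SRC, LOG = "source.mov", "x264_stats"

    # Analysis pass: no real output, just the stats log on disk.
    subprocess.run(["ffmpeg", "-y", "-i", SRC, "-c:v", "libx264", "-b:v", "3000k",
                    "-pass", "1", "-passlogfile", LOG, "-an", "-f", "null", "-"],
                   check=True)

    # Several rate-control passes sharing that analysis. The further the target
    # bitrates diverge, the less optimal this presumably gets.
    for bitrate in ["1500k", "3000k", "6000k"]:
        subprocess.run(["ffmpeg", "-y", "-i", SRC, "-c:v", "libx264", "-b:v", bitrate,
                        "-pass", "2", "-passlogfile", LOG, f"out_{bitrate}.mp4"],
                       check=True)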
Well, don't they get re-encoded once in a while? I'm pretty sure the x264 encoder now is significantly better than the one from 3-4 years ago.
Same goes for HEVC.
I doubt that they do. If a stream is so old that significantly better encoders have come to market in the meantime, then probably very few people are watching it anyway.
I would be shocked if they didn't roll their catalog. Maybe not the whole thing every time, but pulling their sources and re-encoding the complete suite when a new bitrate/codec combination comes on line seems like a sensible use of resources.
I was wondering the same thing. I find it highly unlikely that they are doing this processing with entirely in-house tools, although their limited set of input and output codecs shrinks this down from "practically impossible" to merely "improbable".
It's possible their AVC encoder is not x264, and their VC1 encoder might be from Microsoft, in which case it would be a pretty heavily modified ffmpeg. And some of the input formats might not be using it either.
But you can be sure it's involved somewhere, since there's no other ProRes decoder that works on Linux!
I would be surprised if Netflix couldn't pay the patent fees. The ffmpeg page mentions that commercial ffmpeg users end up paying groups like MPEG LA. In general, I'm guessing patent owners don't like saying "no, we don't want to accept your money".