
YouTube Upload Quality Investigation: Does Source Codec Matter?

A viewer question recently sent me down the rabbit hole of trying to figure out which source codec (H264, HEVC, AV1, ProRes, etc.) produces the best image quality on YouTube. The results are underwhelming, but I thought it might be interesting to see the investigation to learn how YouTube’s systems work.

tl;dr

Upload in the best quality you can, and don’t worry about it. YouTube’s processing is constantly changing and evolving, and there’s no real way to min-max it other than upscaling your videos to 4K to get the higher-bitrate, higher-quality transcodes that 4K uploads receive.

This investigation answers 3 primary questions:

  1. Which codec is ACTUALLY best to upload to YouTube, both for 4K and 1080p videos?
  2. Does the file container – that being MP4, MOV, MKV, and so on – impact quality you get from YouTube’s processing?
  3. Does uploading in 4K improve or worsen the quality of the 1440p and 1080p versions of a video, compared to uploading natively at those resolutions? (I often recommend upscaling to 4K because 4K uploads get higher-bitrate transcodes.)

Methodology

To analyze which video codec is best to encode your videos with for YouTube, I needed to create lossless samples of a couple of different gameplay clips and a color-graded camera clip. For this, I experimented with a few different lossless formats and settled on the newly popular FFV1 in an MKV container. FFV1 is mathematically lossless, is now one of the Library of Congress’s top preferred formats for preservation and archival, and is well supported in recent versions of Resolve. The MKV container was necessary for analyzing quality with the tools available: its timescale and framerate data most closely match what YouTube uses in the WEBM container for its transcodes, and every other source container I tried produced abysmal, inaccurate quality scores because the frame timings didn’t line up perfectly.
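The master above was exported from Resolve. If you’d rather build a similar FFV1-in-MKV master from the command line, here is a minimal Python sketch that assembles (but does not run) an FFmpeg command. The file names are hypothetical, and the FFV1 options shown (`-level 3`, `-g 1`, `-slicecrc 1`) are common archival choices, not settings confirmed from this test.

```python
import shlex


def ffv1_master_cmd(src: str, dst: str) -> list[str]:
    """Return an FFmpeg command that writes a lossless FFV1 video stream into MKV."""
    if not dst.lower().endswith(".mkv"):
        # Keep the master in MKV, the container whose timing lined up with YouTube's.
        raise ValueError("FFV1 master should use an .mkv container")
    return [
        "ffmpeg", "-i", src,
        "-c:v", "ffv1",     # mathematically lossless video codec
        "-level", "3",      # FFV1 version 3 (supports slices and slice CRCs)
        "-g", "1",          # intra-only: every frame is a keyframe
        "-slicecrc", "1",   # per-slice CRCs so corruption can be detected later
        "-c:a", "copy",     # pass audio through untouched
        dst,
    ]


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it.
    print(shlex.join(ffv1_master_cmd("gameplay_capture.avi", "master.mkv")))
```

Building the argument list separately from running it makes it easy to log or review the exact command before spending hours on a lossless encode.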

I then encoded this video to a variety of different codecs one might use for uploading to YouTube: HEVC, H264, ProRes 422 HQ, DNxHR, and so on. I even threw in some curveballs like PhotoJPEG and MagicYUV. Most of these were encoded using the Auto-Best settings I recommended in my Resolve Export Settings video, since that’s what most people would be using for their exports. For a couple of tests, I also did some encodes using higher-quality settings via FFMPEG, but they weren’t the primary focus.
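For readers who want to reproduce a similar batch outside Resolve, here is a hedged sketch that builds one FFmpeg command per codec from the same lossless master. These are hypothetical FFmpeg stand-ins, not Resolve’s Auto-Best presets; the CRF and preset values are illustrative, while the ProRes (`prores_ks` profile 3 = HQ) and DNxHR (`dnxhd` with `dnxhr_hq`/`dnxhr_hqx` profiles) selections use FFmpeg’s real encoder options.

```python
# Hypothetical FFmpeg stand-ins for a few of the tested formats. These are
# NOT Resolve's Auto-Best presets; CRF/preset values are illustrative only.
CODEC_ARGS = {
    "h264": ["-c:v", "libx264", "-preset", "slow", "-crf", "16", "-pix_fmt", "yuv420p"],
    "hevc": ["-c:v", "libx265", "-preset", "slow", "-crf", "16", "-pix_fmt", "yuv420p"],
    "prores_422_hq": ["-c:v", "prores_ks", "-profile:v", "3", "-pix_fmt", "yuv422p10le"],
    "dnxhr_hq": ["-c:v", "dnxhd", "-profile:v", "dnxhr_hq", "-pix_fmt", "yuv422p"],
    "dnxhr_hqx": ["-c:v", "dnxhd", "-profile:v", "dnxhr_hqx", "-pix_fmt", "yuv422p10le"],
}

# Container per codec: delivery codecs in MP4, intermediates in MOV.
CONTAINER = {
    "h264": "mp4", "hevc": "mp4",
    "prores_422_hq": "mov", "dnxhr_hq": "mov", "dnxhr_hqx": "mov",
}


def encode_cmds(master: str, stem: str = "test") -> dict[str, list[str]]:
    """Build one FFmpeg command per codec, all reading the same lossless master."""
    cmds = {}
    for name, args in CODEC_ARGS.items():
        out = f"{stem}_{name}.{CONTAINER[name]}"
        # -an: video only, since only picture quality is being compared.
        cmds[name] = ["ffmpeg", "-i", master, *args, "-an", out]
    return cmds


for name, cmd in encode_cmds("master.mkv").items():
    print(name, " ".join(cmd))
```

Feeding every encode from the same lossless master keeps the comparison fair: any quality difference comes from the codec, not from a second generation of compression.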

Next, I uploaded the videos to YouTube, and gave them 24 hours to process. On my main EposVox channel, I get transcodes immediately most of the time, but I often notice multiple passes of transcoding happening on videos, and the second pass tends to look better than the first, so I wanted time for those to take place.

Lastly, I acquired those transcodes from YouTube (using yt-dlp) and ran them through VMAF – Netflix’s perceptual quality analysis algorithm that aims to quantify image quality based on realistic viewer perception.
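A hedged sketch of those last two steps, again only assembling commands. The URL and format ID are placeholders (list the real transcodes with `yt-dlp -F URL`). For the VMAF side, FFmpeg’s `libvmaf` filter treats its first input as the distorted clip and its second as the reference; the `setpts=PTS-STARTPTS` resets and bicubic scaling are my assumptions about a reasonable alignment setup, not the exact pipeline used here.

```python
def download_cmd(url: str, format_id: str, out: str) -> list[str]:
    """yt-dlp command for one specific transcode.

    List the available transcodes first with `yt-dlp -F URL`; the format ID
    passed here is whichever one you want to score.
    """
    return ["yt-dlp", "-f", format_id, "-o", out, url]


def vmaf_cmd(distorted: str, reference: str, width: int, height: int,
             log_path: str = "vmaf.json") -> list[str]:
    """FFmpeg + libvmaf command comparing a YouTube transcode to the lossless master.

    libvmaf treats the first input as the distorted clip and the second as the
    reference. Both streams get their timestamps reset so frames line up, and
    the transcode is scaled to the reference resolution before scoring.
    If the frame rates differ, frames will still be misaligned.
    """
    graph = (
        f"[0:v]setpts=PTS-STARTPTS,scale={width}:{height}:flags=bicubic[dist];"
        "[1:v]setpts=PTS-STARTPTS[ref];"
        f"[dist][ref]libvmaf=log_fmt=json:log_path={log_path}"
    )
    return ["ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", graph, "-f", "null", "-"]


# Placeholder URL and format ID; substitute your own.
print(" ".join(download_cmd("https://www.youtube.com/watch?v=VIDEO_ID", "FORMAT_ID", "yt_4k.webm")))
print(" ".join(vmaf_cmd("yt_4k.webm", "master.mkv", 3840, 2160)))
```

Writing the per-frame log as JSON keeps the pooled scores machine-readable, which makes it easier to drop results into a spreadsheet afterward.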

Aside

As I mentioned in the video, YouTube processing is not static; there are no hard, locked-in rules. As an example of the bizarre things that sometimes happen, take this video I caught while archiving the Rooster Teeth channels given the recent news:

The SAME video has BOTH a 2:1 24FPS transcode AND a 16:9 30FPS transcode, stored separately on YouTube’s servers as the “best” quality options, presumably for different means of distribution. This is a first for me, but I’ve certainly seen other wacky stuff like this before.

Anyway… pixels.

File Sizes

Here you can see a comparison of the final file sizes (in gigabytes) of the different codecs used for this experiment. This helps inform your choices, as bigger files will mean longer upload times to YouTube.
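To turn a file size from that chart into a rough upload time, a back-of-the-envelope conversion is enough. This is a minimal sketch assuming decimal gigabytes and a steady upload rate in megabits per second; the 4.5 GB / 10 Mbps example is hypothetical, not one of the test files.

```python
def upload_hours(size_gb: float, upload_mbps: float) -> float:
    """Best-case upload time in hours.

    Assumes decimal gigabytes (1 GB = 10**9 bytes) and a sustained upload
    rate in megabits per second; ignores protocol overhead and the extra
    time YouTube spends processing larger intermediate files.
    """
    bits = size_gb * 1e9 * 8
    seconds = bits / (upload_mbps * 1e6)
    return seconds / 3600


# Hypothetical example: a 4.5 GB file over a 10 Mbps uplink.
print(f"{upload_hours(4.5, 10):.2f} h")
```

The factor of 8 (bytes to bits) is the part people most often forget: internet speeds are quoted in megabits, file sizes in megabytes or gigabytes.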

Codec Analysis

The results are… quite surprising, but also not super consistent. Again, YouTube processing is a fickle, ever-changing beast, and you’re not going to get identical results twice in a row, most of the time.

Just looking at the numbers, there are some obvious conclusions. DNxHR HQ and HQX 10-bit, ProRes 422 HQ, FFV1, Cineform YUV 10-bit, and Grass Valley HQX all tend to provide the absolute best results. These are the least lossy and bulkiest codecs here, so from that angle it’s not surprising that they produce higher-quality transcodes on YouTube. But if you assume YouTube might transcode H264 at higher quality thanks to its hardware, or any number of other reasons, it’s surprising to see these win.

The first sample group is mixed gameplay footage from games such as God of War 2, Halo Infinite, Call of Duty: Black Ops Cold War, Diablo 4, and TOXIKK.

At the same time, other lossless formats with RGB color space and AVI containers score absolutely horribly. The previous set of bulky files already took far longer than traditional H264 and H265 to process on YouTube, but these (such as Cineform RGB 16-bit and MagicYUV) took even longer, often came out of YouTube with big gamma shifts, and thus scored worse.

The second sample source clip is straight Halo Infinite gameplay captured in lossless UTVideo. This chart shows the RGB export codecs (such as MagicYUV and Cineform RGB) versus the standard YUV “lossless” codecs.

But what about the compressed formats? Which should you really be using?

Well, with this first sample, H264 Native (that is, CPU encoded rather than GPU encoded) scored the best, but HEVC Native and H264 encoded with Nvidia NVENC were less than 1 point below it.

Apple’s HEVC hardware encoder is just about 1 point below all of these, with Nvidia’s HEVC surprisingly 2 or 3 points below H264, and Apple’s H264 down at the bottom.
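When the gaps are this small, it helps to rank codecs by mean VMAF and look at each one’s distance from the top rather than at raw scores. A minimal sketch, assuming libvmaf 2.x JSON logs (pooled scores under `pooled_metrics` → `vmaf`; older libvmaf versions used a different layout). Any codec names and scores you feed it are your own; none of this article’s numbers are reproduced here.

```python
import json


def load_log(path: str) -> dict:
    """Parse a libvmaf JSON log written with log_fmt=json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def pooled_mean(vmaf_log: dict) -> float:
    """Mean VMAF from a parsed libvmaf 2.x JSON log (assumed schema)."""
    return float(vmaf_log["pooled_metrics"]["vmaf"]["mean"])


def rank_with_gaps(scores: dict[str, float]) -> list[tuple[str, float, float]]:
    """Sort codecs by mean VMAF, highest first, with each one's gap to the top score."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best = ordered[0][1]
    return [(name, score, round(best - score, 2)) for name, score in ordered]
```

Usage: build `scores = {name: pooled_mean(load_log(path)) ...}` for each transcode, then print `rank_with_gaps(scores)`; a codec sitting a fraction of a point behind the leader is effectively tied.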

This was surprising to me. While I didn’t have much faith in the quality of Apple’s compressed codecs on my M2 Ultra Mac Studio, I expected NVENC HEVC to perform better here. It’s seen major quality improvements over the years, and in my overall testing it’s quite a powerful encoder. Something’s off.

With the second sample, HEVC and H264 Native are neck and neck, meaning there’s no real quality difference on YouTube’s end between the two codecs (HEVC will just give you a smaller file), and Nvidia NVENC H264 isn’t far enough behind to make a difference. This time, Apple H264 and HEVC are a few points below these codecs, but where’s Nvidia NVENC HEVC? Wayyyyy below, about 9 points lower than everyone else. That’s SIGNIFICANT.

My first angle to figure out what’s going on was to test the encoded samples BEFORE uploading to YouTube to compare – and sure enough, NVENC HEVC is still a couple points below Native H264 and HEVC, though not quite as drastic as the YouTube Transcoded copies present.

To test this further, I encoded my lossless sample to lossless HEVC and H264 using NVENC in FFMPEG, and these samples shot to the top of the charts. A fully lossless encode obviously has an unfair advantage over DaVinci Resolve’s Auto-Best settings, but it does show that HEVC, or NVENC HEVC specifically, isn’t being limited on YouTube’s end by some weird quirk in their black box of processing.
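A hedged sketch of what such a lossless NVENC encode can look like in FFmpeg; it only builds the command, and actually running it requires an Nvidia GPU. The `-tune lossless` option is accepted by recent FFmpeg NVENC builds, but older builds used `-preset lossless` instead, so check `ffmpeg -h encoder=hevc_nvenc` on your version. File names are hypothetical.

```python
NVENC_CODECS = ("h264_nvenc", "hevc_nvenc")


def nvenc_lossless_cmd(src: str, codec: str, dst: str) -> list[str]:
    """FFmpeg command for a lossless NVENC encode (needs an Nvidia GPU at run time).

    `-tune lossless` is accepted by recent FFmpeg NVENC builds; older builds
    used `-preset lossless` instead. Check `ffmpeg -h encoder=hevc_nvenc`.
    The result is only lossless relative to the pixel format the encoder
    receives; any RGB -> YUV 4:2:0 conversion before it is itself lossy.
    """
    if codec not in NVENC_CODECS:
        raise ValueError(f"codec must be one of {NVENC_CODECS}")
    return ["ffmpeg", "-i", src, "-c:v", codec, "-tune", "lossless", "-an", dst]


print(" ".join(nvenc_lossless_cmd("master.mkv", "hevc_nvenc", "nvenc_hevc_lossless.mkv")))
```

Scoring this lossless encode next to the Resolve NVENC export is what separates “the codec is penalized on YouTube” from “the export was starved of bitrate.”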

The problem seems to be a bug or misconfiguration in how DaVinci Resolve encodes files with NVENC HEVC. I was able to replicate it over and over: with the same samples, Resolve would produce a 4.43GB file using the Native HEVC encoder, but a mere 318MB file with NVENC HEVC. Native averaged around 478Mbps for this file, while NVENC HEVC averaged just 32Mbps. I’ve reported this to Nvidia and they’re investigating. I expect it will be fixed in a future Resolve update, as I’ve encountered this kind of weirdness before.
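Average bitrate is just file size in bits divided by duration, so for two encodes of the same clip the duration cancels out and the size ratio should roughly track the bitrate ratio. This sketch checks that using the figures above (assuming decimal GB/MB and bitrates in megabits per second); `probe_duration` is an optional helper that needs `ffprobe` installed.

```python
import subprocess


def avg_bitrate_mbps(size_bytes: float, duration_s: float) -> float:
    """Average bitrate in megabits per second (bytes * 8 bits, decimal mega)."""
    return size_bytes * 8 / duration_s / 1e6


def probe_duration(path: str) -> float:
    """Container duration in seconds via ffprobe (must be installed)."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(out.strip())


# For two encodes of the same clip, duration cancels out, so the size ratio
# should roughly match the bitrate ratio reported above.
size_ratio = 4.43e9 / 318e6     # Native HEVC file vs NVENC HEVC file
bitrate_ratio = 478 / 32        # reported average Mbps, Native vs NVENC
print(f"size ratio {size_ratio:.1f}x, bitrate ratio {bitrate_ratio:.1f}x")
```

Roughly 14x versus roughly 15x is consistent given that both bitrates are approximate averages, which confirms the NVENC export really was starved of bitrate rather than mislabeled.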

Regardless of the bug, we can plainly see that there’s no disadvantage to choosing HEVC, though you may want to choose Native HEVC over GPU-encoded HEVC for now. I didn’t get to test Intel Arc or AMD AMF encoders here, but I’d expect them to fall similarly in line.

Finally, I ran this comparison on a bit of camera footage shot in RAW and color graded with my full stack, film grain, and so on. The results were similar, though the NVENC H264 encoder came out on top of the common compressed codecs this time.

 

If we take a look visually (the numbers only get us so far, especially with 1-point differences), there’s… honestly not much to see here. Sometimes GPU-encoded HEVC gets a little smoother in areas where H264 would have more detail at the cost of more artifacting. But most surprisingly, the top-scoring samples from the lossless side of things, like DNxHR, don’t actually look any better in the final result than H264 or H265. Yes, you can over-analyze every frame and find DIFFERENCES, but you’d be hard-pressed to make a compelling argument that one was better, because typically one codec just has its artifacts in different places than the other.

If you want the best possible quality sent to YouTube – send DNxHR HQX or ProRes 422 HQ, or perhaps FFV1 so you can have a lossless master to keep. But given the massive file size differences involved there AND the longer processing times on YouTube – it’s probably not worth the trade-off for most people.

Does File Container Matter?

Next, I wanted to see if the file container mattered. File containers are the different ways video can be stored, usually reflected in the file extension: MP4, MOV, MKV, AVI, and so on. Each container can hold a variety of different codecs, with overlap. Thankfully, I found in all of my samples that MP4, MKV, and MOV made no difference for the same source content: the scores were either identical or insignificantly different.
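One way to isolate the container from the codec is to remux the same encoded streams into each wrapper without re-encoding, so the bitstream is byte-for-byte the same and only the container differs. A minimal sketch using FFmpeg’s stream copy (`-c copy`); the input name is hypothetical, and this isn’t necessarily how the files above were produced.

```python
from pathlib import Path


def remux_cmds(src: str, containers: tuple[str, ...] = ("mp4", "mov", "mkv")) -> list[list[str]]:
    """Copy the same encoded streams into several containers without re-encoding.

    `-c copy` leaves the bitstream untouched, so only the wrapper changes.
    Some stream types may not be allowed in every container, in which case
    FFmpeg will refuse that particular output.
    """
    stem = Path(src).stem
    return [["ffmpeg", "-i", src, "-c", "copy", f"{stem}_remux.{ext}"] for ext in containers]


for cmd in remux_cmds("test_hevc.mp4"):
    print(" ".join(cmd))
```

Because nothing is re-encoded, any score difference between the uploaded remuxes would have to come from how YouTube ingests the container, which is exactly the variable being tested.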

Does Bit Depth or Chroma Subsampling Matter?

Perhaps unsurprisingly, for the compressed codecs (which most of you would be uploading), using 10-bit or higher-resolution chroma (4:2:2 or 4:4:4 instead of 4:2:0) generally did not help scores; if anything, those versions scored worse, at least with the gameplay samples.

This surprised me at first, because those files should be encoded more efficiently, but YouTube only outputs 8-bit 4:2:0 right now, so its whole pipeline is presumably focused on that. With my camera footage, however (recorded as 16-bit 8K RAW video, scaled to 4K, and full of film grain and other little details), the 10-bit copies actually scored better in many cases than the 8-bit versions of the same codec, but it’s not a consistent enough trend to matter.
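The arithmetic below shows why 4:2:2 and 4:4:4 mean more chroma detail, not more subsampling, and how much of it a 4:2:0 delivery format throws away. A minimal sketch that counts luma plus chroma samples per frame, assuming even frame dimensions and ignoring bit depth.

```python
# Divisors applied to width and height for each of the two chroma planes.
SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def samples_per_frame(width: int, height: int, scheme: str) -> int:
    """Total luma + chroma samples in one frame for a given subsampling scheme."""
    dx, dy = SUBSAMPLING[scheme]
    luma = width * height
    chroma = 2 * (width // dx) * (height // dy)  # Cb and Cr planes
    return luma + chroma


for scheme in SUBSAMPLING:
    # 3840x2160: 12,441,600 / 16,588,800 / 24,883,200 samples
    print(scheme, f"{samples_per_frame(3840, 2160, scheme):,}")
```

At 4K, a 4:4:4 frame carries twice the samples of a 4:2:0 frame, but if YouTube delivers 4:2:0, that extra chroma is averaged away during transcoding, which fits the lack of a consistent benefit.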

Does Uploading in 4K Improve or Worsen 1440p/1080p Compared to Native?

One takeaway here is more tangible evidence of just how much better your video can look if you upscale to 1440p or 4K instead of uploading native 1080p. But I get asked a lot whether upscaling has any impact on the transcoded 1080p copy of your video. That is to say, does uploading in 4K net you better 1080p or 1440p quality than uploading directly at 1080p or 1440p?

Honestly? The answer’s inconclusive. Scores go back and forth where some samples are better in the 4K transcoded copy and some are better in the native resolution copy. Ultimately this means you aren’t LOSING any quality by upscaling to 4K or 1440p, so that’s a win.
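If you want to run this comparison on your own footage, the upscale itself is a single FFmpeg scale filter. This is a minimal sketch; Lanczos scaling, the x264 CRF value, and the AAC audio settings are illustrative choices, not settings taken from this test, and the file names are hypothetical.

```python
def upscale_cmd(src: str, dst: str, width: int = 3840, height: int = 2160) -> list[str]:
    """FFmpeg command that upscales a 1080p master to 4K before upload.

    Lanczos scaling and the x264 CRF value are illustrative choices, not
    settings taken from this test.
    """
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale={width}:{height}:flags=lanczos",
        "-c:v", "libx264", "-preset", "slow", "-crf", "16", "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", "320k",
        dst,
    ]


print(" ".join(upscale_cmd("master_1080p.mkv", "upscaled_4k.mp4")))
```

To compare fairly, upload both the native and the upscaled file, download the 1080p transcode of each, and score both with VMAF against the same native 1080p master.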

Conclusion

This was a fun investigation, but ultimately the conclusion is pretty simple: upload in the highest quality you can. If you have the internet speed and patience to upload a DNxHR, ProRes, or FFV1 master file, go for it; you’ll have a wonderful copy preserved for years to come AND YouTube will get the best quality it can. If you prefer the more expedient workflow of H264 and HEVC, go for it. I still see nothing wrong with recommending HEVC, and the quality difference between NVENC HEVC and Native HEVC or H264 (even with the reduced-bitrate bug) honestly isn’t noticeable at all.

You can find my VMAF data, sample links, and other data in this Google Sheet if you want to explore a bit more.

So I stand by my recommendations in my Resolve Export Settings video – but perhaps you’ve learned a thing or two about how YouTube processing works.
