Opus is weird in that it can use fewer bits to encode complex sounds and more bits on simple ones; MP3, Vorbis, and AAC all do the opposite. I’ve read this is because it trades relatively poor frequency precision for excellent time precision, so on purer tones it has to spend more bits to maintain accuracy. It’s counterintuitive enough that, combined with the fact that it can’t do a 44.1k sample rate, I stick with Vorbis for high-bitrate CD rips, just so I don’t have to wonder whether artifacts are hiding somewhere. By far most of the testing I’ve seen has been at the low bitrates where Opus really excels.
Edit to add: I had my whole collection encoded with Opus at one point, and could never discern it from the FLACs in a few ABX tests I ran. It’s only theoretical problems that were nagging at me that made me switch back, nothing I actually heard.
If you worry about artifacts introduced by sample rate conversion, you shouldn't use a lossy format in the first place. The sample rate converter used by Opus (i.e., the speex resampler used in the opus-tools library) is completely transparent and does not introduce any audible artifacts. As per [1], the distortion caused by any lossy codec even at the highest bitrates is larger than that caused by re-sampling.
As for playback, most likely your sound card is already running at 48kHz; 44.1kHz may actually not be supported properly by your DAC (I guess because it requires a higher quality anti-aliasing filter). As [1] continues to explain, Opus essentially shifts the burden of resampling to the encoding side rather than the decoding side.
That being said, Opus technically supports odd sample rates such as 44.1kHz, but this has to be signalled in a side channel; see [1] onwards.
Yep, I've read all that before. I didn't mean to focus the discussion on the resampling; what I was trying to get across is that this codec behaves differently from codecs that have been extensively tested and ABXed at high bitrates for years. I didn't even mention other factors, like how it deliberately injects noise into bands (where you can also find references claiming that's a benefit and not a downside, of course). It was about a year ago, but beyond my own ABX testing I looked around quite a bit and didn't see many high-bitrate tests out there. All the focus seemed to be on the 64kbit range.
This should not matter to me personally, as I have proven to myself that pretty low bitrates are transparent to me, regardless of the codec. But... I have the same psychosis that a lot of people have, where I think I can hear differences when I know which is which.
If space were an issue I'd use 90kbit/s opus (that was the threshold for me in my testing). It's actually pretty amazing, but since I have the storage space, I archive FLAC and carry around 256kbit/s vorbis, and don't even question the quality. It's easier to use more space than to fix my faulty perception!
I didn’t hear a difference, and some even claim it’s a benefit, since a lot of cheap hardware only speaks 48k: better to encode for it in advance than resample during playback. But what can I say? It nagged at me, in a totally unscientific way, that I can encode to Vorbis without a resample, but Opus makes me resample.
You're probably noticing that the two rates aren't integer multiples of one another, and since time samples are discrete, the only way to convert between 44.1kHz and 48kHz is by interpolating. Going from 96kHz to 48kHz is easier: you just drop every other sample.
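A quick sketch of why the ratio matters. The resampler below uses naive linear interpolation purely for illustration; a real converter (like the speex resampler mentioned elsewhere in this thread) uses proper filtering to avoid aliasing, so don't take this as how any actual library does it:

```python
from math import gcd

def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler (illustration only --
    real converters use windowed-sinc filters to avoid aliasing)."""
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate   # fractional source index
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(a + (b - a) * frac)  # interpolate between neighbours
    return out

# 44.1k <-> 48k reduces to the awkward ratio 147:160,
# so almost every output sample falls between two input samples.
g = gcd(44100, 48000)
print(44100 // g, 48000 // g)   # -> 147 160

# 96k -> 48k is an exact integer ratio, so (after low-pass
# filtering) plain decimation works.
print(gcd(96000, 48000) == 48000)  # -> True
```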
Obviously with low-pass filtering first, to avoid aliasing above the new Nyquist frequency. But you are correct that there won't be a one-to-one correspondence between the audio samples like you'd get if the rates divided evenly.
If you had 24kHz-encoded music, would it bother you if it got upsampled to 48kHz? What if it were 26kHz? I think the former would bother me less, because I know the samples are synchronous in the two time bases.
Yeah, the forced interpolation nags at me even though it shouldn't, but also: if I'm going to encode 128 kilobits per second, I can either use those bits to produce 44.1k samples or 48k samples. That's about 9% more samples per second that have to come out of the same number of compressed bits. I'm sure there are reasons (like the high correlation between adjacent samples) why that doesn't matter. But would you resize a 4400px image to 4800px before compressing it to a JPEG? No way, because at the same target file size you'd encode more bits per pixel from the 4400px original.
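The arithmetic behind that "9% more samples" intuition is easy to check (this is just the raw bit budget per output sample; it says nothing about whether the difference is audible):

```python
bitrate = 128_000  # target bits per second

# Same bit budget spread over more samples means fewer bits per sample.
for rate in (44_100, 48_000):
    print(f"{rate} Hz: {bitrate / rate:.2f} bits per output sample")

# How many more samples per second does 48k need?
print(f"{48_000 / 44_100 - 1:.1%}")  # -> 8.8%
```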
I haven't read the code, but it sounded like they encode in frequency space, so if they're already putting all the bits into encoding below 20kHz, it seems like it would not change the size (the extra band you gain going from 44.1kHz to 48kHz sits above 20kHz and gets no bits allocated anyway).
Since the MDCT is discrete, I assume it operates on power-of-two-sized batches of samples. So (like you, without looking at the code) I would have assumed that more samples per second means more transform blocks per second, which means fewer output bits per block to hit your target rate.
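Sketching that arithmetic under the commenter's assumption of a fixed power-of-two frame size (the actual transform sizes Opus uses may well differ, so treat the numbers as purely illustrative):

```python
bitrate = 128_000   # target bits per second
frame_len = 1024    # samples per transform frame (assumed, not Opus's real value)

for rate in (44_100, 48_000):
    frames_per_sec = rate / frame_len
    bits_per_frame = bitrate / frames_per_sec
    print(f"{rate} Hz: {frames_per_sec:.1f} frames/s, "
          f"{bits_per_frame:.0f} bits per frame")
```

Under this assumption the 48kHz stream produces more frames per second and therefore gets fewer bits per frame at the same bitrate, which is the trade-off being described.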
You are probably right. I forgot about the whole power-of-two thing for FFTs. That would definitely irritate the same part of my brain that's put off by interpolating discrete samples even when it's inaudible. Same vein as how 7 is more random than 6.
That actually does make some sense to me; the same thing is done for video encoding, sort of.
If there's a static scene it encodes it at super high quality (but makes up for it by saying "don't change" for a while).
But if there's a fast scene a lot of details can get smudged without anyone noticing. I think people only tend to notice block artifacts with steps in luminosity during dark scenes, but I think you have to have a really bad encoding for that to be an issue in 2020.