In the past decade, more and more music has become available in “high-definition” (HD) digital formats, such as 24-bit 192KHz and DSD. Now I hear talk about developing a new 32-bit 384KHz standard for HD music. Interestingly enough, not everyone agrees that greater bit depth and higher sampling rates are good things.
This blog attempts to explain the math and physics so you can decide for yourself if this is progress or simply marketing madness.
If you don’t want to wade through a lot of technical data, you may want to skip to the summary, where I hit all the major points. You also may want to refer to my other blog on “DSD vs. PCM: Myth vs. Truth.”
Sampling Rate and Bit Depth
The process of converting analog sound waves into digital numbers involves sampling the wave at regular intervals and “quantization,” rounding each sample to a discrete step. The result is often represented as points plotted on X and Y axes.
Sampling rate is the frequency at which the amplitude of the analog sound wave is sampled. The 44.1KHz rate specified for Red Book CDs samples the amplitude of the music 44,100 times each second. Sampling rate is represented by the X axis.
Bit depth translates to the number of steps the amplitude of the analog sound wave is divided into at each sampling. A 16-bit recording has 65,536 steps, a 20-bit recording has 1,048,576 steps, and a 24-bit recording has 16,777,216 steps. Bit depth is represented by the Y axis.
The more bits and/or the higher the sampling rate used in quantization, the higher the theoretical resolution. For example, a 20-bit 96KHz recording has roughly 35 times the resolution of a 16-bit 44.1KHz recording (16 times the amplitude steps multiplied by about 2.2 times the samples per second). No small difference. Later I’ll explain the difference between theoretical and actual resolution.
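If you want to check the step counts and the resolution comparison yourself, here is a minimal Python sketch. It assumes that "resolution" means the number of amplitude steps multiplied by the number of samples per second, which is my shorthand for this comparison and not a formal standard. The function names are made up for illustration.

```python
def amplitude_steps(bits: int) -> int:
    """Number of discrete amplitude levels available at a given bit depth."""
    return 2 ** bits


def relative_resolution(bits_a: int, rate_a: int, bits_b: int, rate_b: int) -> float:
    """Ratio of (steps x samples per second) between format A and format B.

    Assumption: "resolution" is the product of amplitude steps and sample rate.
    """
    return (amplitude_steps(bits_a) * rate_a) / (amplitude_steps(bits_b) * rate_b)


for bits in (16, 20, 24):
    print(f"{bits}-bit: {amplitude_steps(bits):,} steps")

ratio = relative_resolution(20, 96_000, 16, 44_100)
print(f"20-bit 96KHz vs. 16-bit 44.1KHz: about {ratio:.1f} times")
```

Under that assumption the ratio works out to about 34.8.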
Dynamic range is the difference in volume between the quietest and the loudest passage, commonly measured in decibels (db).
Here are some examples of sound levels and dynamic ranges that you can relate to:
- The sound of a mosquito flying 3 meters away is 0db.
- The hum of an incandescent bulb at 1 meter away is 10db.
- The background noise in a quiet recording studio is 20db.
- The background noise in a normal quiet room is about 30db.
- Early analog master tape had a dynamic range of only 60db.
- LP micro-groove records have a dynamic range of 65db.
- Dolby increased analog master tape dynamic range to 90db.
- The sound of a jackhammer at 1 meter away is 110db.
- The sound of a full orchestra at 1 meter away is 120db.
- Over 130db causes irreparable hearing loss.
- The sound of a jet aircraft at takeoff is 140db.
Dynamic Range and Bit Depth
- 16-bit Red Book CDs have a dynamic range of over 96db.
- 20-bit digital master tape has a dynamic range of over 120db.
- 24-bit modern formats have a dynamic range of over 144db.
That means you can’t even appreciate the difference between a 20-bit and a 24-bit recording unless it’s played at a volume that would cause permanent hearing loss.
But wait…isn’t the background noise in a quiet room 30db?
So you can’t actually hear the difference between the dynamic range of a 16-bit recording and a 20-bit recording unless you turn the volume up high enough above the background noise that it could cause permanent hearing loss.
Also note that in order to appreciate the dynamic range difference between 16 bits and 20 bits, you would need to be in an ultralow-noise environment, such as a recording studio, with treated walls, isolated AC power, and 100% balanced electronics. I’ll get into how noise floor relates to all of this a bit later.
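The dynamic range figures for 16, 20, and 24 bits listed in this section follow from a simple rule: each added bit doubles the number of amplitude steps, which adds about 6db. Here is a small Python sketch of that arithmetic. It assumes an ideal converter and ignores dither and noise shaping; the function name is just for illustration.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of an ideal N-bit format: 20 * log10(2**N)."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 20, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f}db")
```

This gives roughly 96.3db, 120.4db, and 144.5db, which is where "over 96db," "over 120db," and "over 144db" come from.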
Bits, Bytes, and Digital Words
So why did 24-bit become the new standard? When digital data is transferred and manipulated, it’s moved in bytes rather than as individual bits. There are 8 bits to a byte, and a digital word is made up of one or more whole bytes. This is why word sizes in the digital world are divisible by 8. So 16 bits = 2 bytes and 24 bits = 3 bytes, and both 16 bits and 24 bits became standard because each represented the next whole-byte word size.
Historical note: The 16-bit format existed long before 16-bit digital-to-analog converters (DAC) were commercially available. The same is true of the 24-bit format.
Theoretical vs. Actual Resolution
According to mathematical theory, sampling at more than twice the maximum audible frequency only plots more points along the same curves when the digital signal is converted back into an analog waveform. So in order to correctly sample a 20KHz note, the maximum frequency human ears can hear, you would need to sample at greater than 40KHz. The 44.1KHz sampling rate of a Red Book CD was engineered to allow a 20KHz sound to be recorded accurately.
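To make the "more than twice the frequency" rule concrete, here is a rough Python sketch. It assumes a pure tone and ideal sampling with no anti-aliasing filter in front of the converter. The function names are hypothetical. A tone below half the sampling rate comes back at its own frequency, while a tone above it folds back down to a different, lower frequency.

```python
def nyquist_frequency(sample_rate_hz: float) -> float:
    """Highest frequency that can be sampled without folding: half the sampling rate."""
    return sample_rate_hz / 2


def apparent_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a sampled pure tone appears, assuming no input filter."""
    folded = tone_hz % sample_rate_hz
    # Anything above half the sampling rate mirrors back below it.
    return min(folded, sample_rate_hz - folded)


print(nyquist_frequency(44_100))           # limit for a Red Book CD
print(apparent_frequency(20_000, 44_100))  # below the limit: unchanged
print(apparent_frequency(25_000, 44_100))  # above the limit: folds down
```

At 44.1KHz the limit is 22.05KHz, so a 20KHz tone is captured as 20KHz, while an unfiltered 25KHz tone would show up at 19.1KHz.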
Also, quantization steps that represent voltages lower than the noise floor of the power supply in the DAC are not discernible. According to the experts who manufacture the finest DAC chips, resistors, and power regulators in the world, there is theoretically no way to make electronics that are capable of discerning greater than 20-bit resolution.
Any company that claims greater than 20-bit resolution from their DAC is simply full of shit. Oh, they can decode 24 bits, because 24 bits do exist in software, but the output from their DAC has less than 20 bits of resolution and dynamic range.
Then why use these insanely high sampling rates and bit depths? The reason is that higher resolution digital formats minimize quantization errors and quantization noise when editing, mixing, and mastering the recording in a studio environment. These higher resolution digital formats truly only exist in software and are not capable of existing in actual sound reproduction.
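Here is a minimal Python sketch of why more bits mean smaller quantization errors in software. It assumes a simple uniform quantizer over a -1.0 to +1.0 range with no dither and no clipping, which is a simplification of what real converters and mastering tools do. The function names are made up for illustration.

```python
import math


def quantize(sample: float, bits: int) -> float:
    """Round a sample in [-1.0, 1.0] to the nearest step of a uniform grid."""
    step = 2.0 / (2 ** bits)  # full range divided into 2**bits steps
    return round(sample / step) * step


def max_quantization_error(bits: int, n_samples: int = 1000) -> float:
    """Largest rounding error over one cycle of a full-scale sine wave."""
    worst = 0.0
    for n in range(n_samples):
        x = math.sin(2 * math.pi * n / n_samples)
        worst = max(worst, abs(x - quantize(x, bits)))
    return worst


for bits in (8, 16, 24):
    print(f"{bits}-bit worst-case error: {max_quantization_error(bits):.2e}")
```

The worst-case error can never exceed half a step, so every extra bit cuts it roughly in half. That headroom matters when a signal is processed over and over during editing and mixing.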
Any type of digital recording produces quantization noise that requires a low-pass filter at the output of the converter so as not to overload amplification and speakers with ultrahigh-frequency noise. DSD has significant amounts of noise just above 25KHz. Higher sampling rates are used to put quantization noise up into higher ranges that make it easier to design better low-pass filters. Of course, some still argue that even with these high sampling rates and sophisticated filters, many digital artifacts, such as uncontrolled intermodulation distortion, still exist in the audible range.
As for bit depth, even though recordings are edited, mixed, and mastered in a 24-bit or higher format, when recording studios do the final mastering, they significantly reduce bit depth to a dynamic range that is actually discernible. For example, even though many HD recordings are released in a 24-bit format, the actual music on the recording was mastered at 20 bits or less (usually less).
Another consideration of higher sampling rates and greater bit depth is system resources. Both require more storage space, more RAM, and faster processors. Though the minimum sampling frequency and bit depth that are required to reproduce accurate music are a matter of heated debate, there is no doubt that excessive resolution simply takes up unnecessary space and unnecessarily increases the size and cost of components.
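To put rough numbers on the storage cost, here is a short Python sketch. It assumes uncompressed stereo PCM with no container overhead, so real files (FLAC, for example) will be smaller. The function names are hypothetical.

```python
def bytes_per_second(bits: int, sample_rate_hz: int, channels: int = 2) -> int:
    """Raw PCM data rate: samples per second x bytes per sample x channels."""
    return sample_rate_hz * bits * channels // 8


def megabytes_per_minute(bits: int, sample_rate_hz: int, channels: int = 2) -> float:
    """Raw PCM storage for one minute of audio, in millions of bytes."""
    return bytes_per_second(bits, sample_rate_hz, channels) * 60 / 1_000_000


for bits, rate in ((16, 44_100), (24, 192_000), (32, 384_000)):
    print(f"{bits}-bit {rate / 1000:g}KHz: {megabytes_per_minute(bits, rate):.1f} MB per minute")
```

Under these assumptions, one minute of stereo music takes about 10.6 MB at 16-bit 44.1KHz, about 69.1 MB at 24-bit 192KHz, and about 184.3 MB at 32-bit 384KHz.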
Dynamic range expresses the loudest possible sound, and noise floor expresses the quietest. If you want to hear the least significant bit (LSB) on a recording, the volume (or voltage) of that bit has to be above the noise floor of both the room and the equipment in your system. We already know that a quiet room has a background noise level of about 30db that we need to rise above. Even after the equipment is playing above the 30db room noise, the power supply of the electronics will mask the LSB if the peak-to-peak voltage of the noise in the power supply is not less than the voltage of the LSB.
Based on a 2.5V output on a DAC (higher than average), here are the voltages that power supply noise must stay below in order to hear the LSB:
- 16-bit LSB noise floor voltage = 76uV
- 18-bit LSB noise floor voltage = 19uV
- 20-bit LSB noise floor voltage = 4.75uV
- 24-bit LSB noise floor voltage = 0.3uV
For reference, a common LM317 regulator, the quality used in most commercial electronics, has about 150uV peak-to-peak noise, and the world’s lowest noise power supplies (we’re talking NASA, not audiophile) have about 5uV of peak-to-peak noise. That means even with the most sophisticated linear power supplies or batteries available today, 20-bit is theoretically the highest playback resolution and dynamic range possible.
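The LSB voltages listed above can be reproduced with a few lines of Python. One assumption to flag: the listed figures match 5V divided by the number of steps, which suggests the 2.5V output is treated as a peak value (a 5V peak-to-peak swing). That reading is mine, so treat the constant below as an assumption. The function names are made up for illustration.

```python
import math

# Assumption: 2.5V is the peak output, so the full swing is 5V peak-to-peak.
FULL_SCALE_VOLTS = 5.0


def lsb_volts(bits: int, full_scale_volts: float = FULL_SCALE_VOLTS) -> float:
    """Voltage represented by one least significant bit."""
    return full_scale_volts / 2 ** bits


def resolvable_bits(noise_volts_pp: float, full_scale_volts: float = FULL_SCALE_VOLTS) -> float:
    """Bit depth at which one LSB equals the given peak-to-peak noise."""
    return math.log2(full_scale_volts / noise_volts_pp)


for bits in (16, 18, 20, 24):
    print(f"{bits}-bit LSB: {lsb_volts(bits) * 1e6:.2f}uV")

for label, noise in (("LM317, 150uV", 150e-6), ("lowest noise, 5uV", 5e-6)):
    print(f"{label}: about {resolvable_bits(noise):.1f} bits")
```

With these assumptions, 150uV of noise masks everything past about 15 bits, and 5uV of noise lands just under 20 bits.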
Playback Equipment Requirements
There are very few systems, even among the best-of-the-best, that can accurately play back the full 120db dynamic range of a 20-bit recording. This is why few recordings are even released at the full capacity of 20 bits, let alone the 144db dynamic range of a 24-bit recording. Keep in mind that the maximum dynamic range of LP records is only about 60db. Even Dolby analog master tapes had a maximum of about 90db dynamic range.
So that 120db live music can be played on most high-end audiophile systems, recording studios need to limit the dynamic range using a process called “dynamic compression.” The process of dynamic compression makes the quieter passages relatively louder and the louder passages relatively quieter. This makes it easier to discern low-level details from the louder passages. Dynamic compression is part of what gives recorded music the illusion of having more detail and focus than live music.
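As a rough illustration of the idea, here is a Python sketch of a static compressor working on levels in db. It is a textbook-style simplification and not how any particular studio tool works: real compressors also have attack and release times. The threshold, ratio, and make-up gain values are hypothetical.

```python
def compress_level_db(level_db: float, threshold_db: float, ratio: float,
                      makeup_gain_db: float = 0.0) -> float:
    """Static compression: levels above the threshold are scaled down by the
    ratio, then make-up gain raises everything by a fixed amount."""
    if level_db > threshold_db:
        level_db = threshold_db + (level_db - threshold_db) / ratio
    return level_db + makeup_gain_db


# Hypothetical settings: 60db threshold, 2:1 ratio, 10db of make-up gain.
quiet = compress_level_db(40, threshold_db=60, ratio=2, makeup_gain_db=10)
loud = compress_level_db(120, threshold_db=60, ratio=2, makeup_gain_db=10)
print(f"quiet passage: 40db -> {quiet:g}db")
print(f"loud passage: 120db -> {loud:g}db")
print(f"range: 80db -> {loud - quiet:g}db")
```

With these example settings the quiet passage comes up from 40db to 50db, the loud passage comes down from 120db to 100db, and the 80db spread between them shrinks to 50db.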
There was wisdom to the LP record and the analog tape standards. Manufacturers knew that for every 3db consumers raised the volume, they would have to double the wattage of their amplifier and double the output of the speakers. So keeping the dynamic range of home audio under 60db is what allowed home entertainment equipment to be affordable, of modest size, and relatively high-fidelity.
A 60db dynamic range on top of a 30db background noise equals 90db. Do you want to listen to more than 90db in your home? More importantly, for every additional 3db you increase, you would need to double the wattage of your amplifier and the output of your speakers.
All things being equal, to go from 90db output up to 99db, you would need an amplifier with 8 times the wattage and speakers with 8 times the output. To accurately reproduce a recording at 120db, you would need an amplifier with 1,000 times the wattage and speakers with 1,000 times the output that it would take to reproduce the same recording at 90db. I don’t know about you, but a system like that will fit neither my room nor my budget.
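The wattage multipliers above come from the standard relationship between decibels and power. Here is a small Python sketch of that arithmetic. It assumes everything else in the system stays equal (same speaker sensitivity, same listening distance); the function name is just for illustration.

```python
def power_ratio(db_increase: float) -> float:
    """How many times more amplifier power a given db increase requires."""
    return 10 ** (db_increase / 10)


for step in (3, 9, 30):
    print(f"+{step}db needs about {power_ratio(step):,.1f} times the wattage")
```

A 3db increase works out to very nearly double, 9db to very nearly 8 times, and 30db to 1,000 times.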
Summary
Well, all that’s a real ear opener, isn’t it?
Bit depth translates to the number of steps in the amplitude of a digital recording. A 16-bit recording has 65,536 steps, a 20-bit recording has 1,048,576 steps, and a 24-bit recording has 16,777,216 steps.
A 20-bit 96KHz recording has roughly 35 times the resolution of a 16-bit 44.1KHz recording.
Using the lowest noise power supplies, the most sophisticated grounding, and the most sophisticated resonance control currently available in a digital-to-analog converter, you can’t resolve the least significant bit on a 20-bit recording.
In reality, there are no DACs in the world that are capable of discerning greater than 20-bit resolution. So any company that claims greater than 20-bit resolution from their DAC is simply full of shit. Oh, they can decode 24 bits, because 24 bits do exist in software, but the output from their DAC has less than 20 bits of resolution and dynamic range.
Of course that doesn’t even account for the significant amount of distortion added by signal cables, amplification, and speakers, all of which would not allow resolving even an 18-bit recording.
In order to reproduce anywhere near the dynamic range these high-res formats offer, you would need amplification with several times the wattage and a fraction of the noise floor of what is currently available to the high-end audiophile.
In order to appreciate the difference in resolution between a 16-bit and 20-bit recording, you would need to be in an ultralow-noise environment, such as a recording studio, with treated walls, isolated AC power, and 100% balanced electronics.
In order to hear the difference in dynamic range between a 16-bit and a 24-bit recording, you would have to play the music so loud it would cause permanent hearing loss.
When people claim to hear differences between 16-bit, 20-bit, and 24-bit recordings, it is not the difference between the bit depths that they are hearing, but rather the difference in the quality of the digital mastering. The fact is that even most so-called 24-bit recordings are mastered with less than 16-bit dynamic range (and wisely so).
Part of why HD recordings sound sterile has to do with lower dynamic compression that doesn’t allow the subtle low-level detail to rise above the noise floor. When music is sanely dynamically compressed, it allows you to listen at a reasonable volume and still hear all the subtle harmonic cues that reveal the tone, timbre, and room acoustics in the recording.
There is no doubt that excessively high-res formats take up unnecessary storage space and require unnecessary system resources in terms of faster processors and additional RAM.
Of course most recordings are engineered to sound best on a car stereo or portable device as opposed to on a high-end audiophile system. It’s a well-known fact that artists and producers will often listen to tracks on an MP3 player or car stereo before approving the final mix.
I believe that the quality of the recording plays a far more significant role than the format or resolution it is distributed in. Too bad most of the big recording houses don’t agree with me. To increase profits, recording studio executives insisted that errors be edited out in postproduction, significantly compromising the quality of the original master tapes. In my opinion, this was the end of the golden age of recording.
In contrast, some of my favorite digital recordings were digitally mastered from 1950s analog recordings made on tube-based reel-to-reels. When you hear the organic character and coherent in-the-room harmonics, it is clear why so many audiophiles prize these recordings.
I also believe that the simpler the signal path and the lower the power supply noise, the better the digital-to-analog conversion. Hence my decades of obsession with R-2R nonoversampling conversion and ultralow-noise power supplies, as are used in the Mystique DAC.
Hear It for Yourself
Are you curious about the potential of digital-to-analog conversion? Mojo Audio’s Mystique DAC has the purest digital conversion possible.
- A true nonoversampling R-2R multi-bit design
- No noise-shaping, upsampling, or oversampling algorithms
- MSB zero-crossing voltage adjustment circuitry to optimize linearity
- Perfectly bit-aligned L to R channel hardware-based demultiplexing
- Direct-coupled: no caps or transformers to distort phase and time
The Mystique is in a class by itself. Explosive micro-dynamics combined with harmonically coherent micro-details reveal the true time, tune, tone, and timbre of the original performance.
Of course with Mojo Audio’s 45-day no-risk audition, you can hear the Mystique DAC for yourself, in your own system. Experience all the purity and emotional content digital music is capable of delivering.
Owner, Mojo Audio
Note: many of the graphics used in this blog were adapted from graphics taken from these reference sources.