We now have 16-bit CDs and 24-bit high-resolution recordings available to us. What are the advantages of a 24-bit word length? Are 24-bit recordings better? How many bits do we really need?
Bit depth (also known as word length) indicates how many bits are used to represent each sample in a digital sampling system. Each sample is a snapshot of a signal or voltage at an instant in time. The CD uses 16 bits to represent the voltage of an audio waveform at each instant in time. Other digital audio systems use different bit depths ranging from 1 to 64 bits. It is important to understand the relationship between bit depth and audio quality. The bit depth sets the absolute maximum signal-to-noise ratio (SNR) that can be represented by a digital system, but there are more factors to consider. Let's look at how this works:
We live in a decimal world, but digital systems operate in a binary world. In our decimal world, we can count from 0 to 9 with just one digit. If we add a second digit, we can count from 0 to 99. Adding a third digit allows us to count from 0 to 999. In our decimal system, every added digit expands our ability to count by a factor of 10.
In a binary system, we use "bits" instead of "digits". A bit can have only 2 values (0 or 1). A single bit allows us to count from 0 to 1. If we add a second bit, we can represent 4 unique numbers (00, 01, 10, and 11). In a binary system, every additional bit expands our ability to count by a factor of 2.
An 8-bit number can have 256 unique values ranging from 00000000 to 11111111 (in unsigned binary), or 0 to 255 (in decimal numbers). A 16-bit number can represent 65,536 unique values ranging from 0000000000000000 to 1111111111111111 (in unsigned binary), or 0 to 65,535 (in decimal numbers). A 24-bit number can represent 16,777,216 unique values.
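The counts above follow directly from the doubling rule. A minimal Python sketch (the function name is ours, chosen for illustration) reproduces them:

```python
def unique_values(bits: int) -> int:
    """Number of distinct codes an unsigned word of `bits` bits can hold."""
    return 2 ** bits

# Print the range of each word length mentioned in the text.
for bits in (1, 2, 8, 16, 24):
    count = unique_values(bits)
    print(f"{bits:>2} bits: {count:>10,} values (0 to {count - 1:,})")
```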
If we represent an analog audio voltage accurately, we have a very small error. These errors are composed of noise and distortion. When we sample an analog voltage, we "quantize" it to a set of available digital values. If we use a high bit depth, we have many binary values available, and we can quantize accurately. Noise and distortion are produced when we round off to the nearest unique value in our digital system. Every added bit cuts the noise and distortion in half. A factor of two is a 6 dB improvement.
Every added bit improves the SNR by about 6 dB when measured over the entire bandwidth of the audio channel. The maximum SNR of a digital system is about 6 dB times the number of bits per sample. Therefore, a 16-bit system gives us a maximum SNR of 16 x 6 = 96 dB. Likewise, a 24-bit system gives us a maximum SNR of 24 x 6 = 144 dB.
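The "6 dB per bit" figure is a rounding of 20 x log10(2), the decibel value of a factor of two in voltage. The sketch below uses the unrounded value, so it prints results a fraction of a dB above the 96 dB and 144 dB quoted here. The names are illustrative only.

```python
import math

# A factor of two in voltage, expressed in decibels (about 6.02 dB).
DB_PER_BIT = 20 * math.log10(2)

def max_snr_db(bits: int) -> float:
    """Rule-of-thumb ceiling on SNR for a word length of `bits` bits."""
    return bits * DB_PER_BIT

for bits in (8, 16, 24):
    print(f"{bits}-bit: about {max_snr_db(bits):.1f} dB")
```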
Notice that I said "maximum" SNR. This is the theoretical limit of the digital channel. It does not imply that 24-bit recordings have a 144 dB SNR, nor does it imply that 16-bit recordings have a 96 dB SNR. But it does mean that a 16-bit recording can never have an SNR that is better than 96 dB, and a 24-bit recording can never have an SNR that is better than 144 dB.
Clearly, a 24-bit recording has the capability of being better than a 16-bit recording, but there is absolutely no guarantee that this is the case. Even if the 24-bit recording is better, there is no guarantee that this difference will be audible after it passes through the playback system. A noisy playback system will obscure all of the advantages promised by the 24-bit marketing people.
If 16 is better than 8, and 24 is better than 16, where do we stop? What about 32 bits or 64 bits? Where does this end?
The reality is that every additional bit comes with a cost. A 17-bit system needs twice the accuracy of a 16-bit system in order to make good use of the additional bit. Likewise an 18-bit system needs four times the accuracy of a 16-bit system in order to make good use of the two additional bits. A 24-bit system needs 256 times the accuracy of a 16-bit system in order to make full use of the additional 8 bits!
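The accuracy multipliers quoted above are powers of two. A short sketch, with a hypothetical helper name and 16 bits as the reference, makes the pattern explicit:

```python
def accuracy_factor(bits: int, reference_bits: int = 16) -> int:
    """How many times finer the smallest step is compared with the reference word length."""
    return 2 ** (bits - reference_bits)

for bits in (17, 18, 20, 24):
    print(f"{bits}-bit needs {accuracy_factor(bits)}x the accuracy of 16-bit")
```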
The additional bits must also be stored, processed, and transmitted. All of this has a cost, and at some point, we bite off more bits than we can chew.
At a bit depth of 16 to 22 bits, the accuracy of our digital notation begins to exceed the accuracy of the analog audio being captured by our digital system. At a bit depth of about 21 to 22 bits, we begin to exceed the capabilities of the very best audio A/D and D/A converters. And, at a bit depth of about 22 bits, we begin to exceed the capabilities of human hearing. The last limit, human hearing, is a hard and fast limit that will not change. The others will improve as technology progresses.
The quietest sound that a person with normal hearing can detect is 0 dB SPL (sound pressure level). The threshold of pain occurs at about 130 dB SPL. If we were to create an audio system that had inaudible noise and could produce sounds reaching the threshold of pain, we would need a 130 dB SNR. This is equivalent to the SNR of a 21 to 22-bit digital system.
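Running the 6 dB per bit rule in reverse shows where the "21 to 22-bit" figure comes from. This is a sketch under the same rule of thumb, not a precise converter specification:

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit

def bits_for_snr(snr_db: float) -> float:
    """Word length whose rule-of-thumb maximum SNR equals `snr_db`."""
    return snr_db / DB_PER_BIT

# 130 dB is the hearing range used in the text (0 dB SPL to the threshold of pain).
print(f"130 dB requires about {bits_for_snr(130):.1f} bits")
```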
Over time, the quality of analog audio equipment has improved, and the quality of converters has improved, but the capabilities of the human ear have not changed. Ideally, the limits of our audio systems should exceed the limits of our hearing. Such a system will not reach obsolescence when audio equipment continues to improve. A 24-bit digital transmission system is capable of exceeding the limits of the human ear by a reasonable margin. For this reason, it should not be necessary to extend the bit depth beyond 24 bits for the distribution of audio recordings.
A 24-bit finished product is more than sufficient to meet the needs of the human ear, but the production of these recordings will usually require the use of greater bit depths while mixing and editing. Digital filters and complex digital processing often require extended bit-depths to avoid the noise and distortion that can be created by cascaded roundoff errors. For this reason, internal 32-bit and 64-bit digital processing is now common in most professional digital audio workstations that support 24-bit inputs and outputs.
Likewise, the Benchmark DAC2 D/A accepts 24-bit inputs, but uses much greater bit depths internally in order to prevent the buildup of noise and distortion due to roundoff errors. The DAC2 has a 32-bit digital-to-analog converter at its core. These internal resources allow the DAC2 to make good use of the 24-bit digital input. It is important to understand that the 24-bit input is more than sufficient to receive everything that the human ear can detect or tolerate. The large internal bit depths allow useful functions such as filtering and volume control without a loss of resolution.
If the production goal is to create a high-quality 16-bit finished product, the mixing still needs to be done using bit depths of at least 24 bits. Audio quality degrades very quickly when processing is executed at a 16-bit word length. For this reason, it is now standard practice to preserve at least 24 bits until the very last stage in production where the recording is rendered into a 16-bit master.
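To see why cascaded roundoff matters, the sketch below pushes a test tone through a chain of gain changes and rounds every intermediate result to a fixed word length. Everything here is a simplifying assumption made for illustration: the 997 Hz tone, the 20 alternating 6 dB cuts and boosts, and plain rounding without dither. Real workstations perform far more complex processing.

```python
import math

def quantize(x: float, bits: int) -> float:
    """Round x (full scale = +/-1.0) to the nearest step of a signed `bits`-bit grid."""
    scale = 2 ** (bits - 1)
    return round(x * scale) / scale

def run_chain(signal, bits, stages=20):
    """Alternate a 6 dB cut and a 6 dB boost, storing every result at `bits` bits."""
    cut = 10 ** (-6 / 20)
    out = [quantize(x, bits) for x in signal]
    for stage in range(stages):
        gain = cut if stage % 2 == 0 else 1 / cut
        out = [quantize(x * gain, bits) for x in out]
    return out

def error_dbfs(processed, ideal):
    """RMS difference between processed and ideal signals, in dB relative to full scale."""
    mean_sq = sum((p - i) ** 2 for p, i in zip(processed, ideal)) / len(ideal)
    return 10 * math.log10(mean_sq)

# Hypothetical test signal: a 997 Hz sine at half of full scale, 48 kHz sampling.
ideal = [0.5 * math.sin(2 * math.pi * 997 * n / 48000) for n in range(4800)]

# The cuts and boosts cancel, so an ideal chain would return the input unchanged.
err16 = error_dbfs(run_chain(ideal, 16), ideal)
err24 = error_dbfs(run_chain(ideal, 24), ideal)
print(f"16-bit chain error: {err16:.1f} dBFS")
print(f"24-bit chain error: {err24:.1f} dBFS")
```

The accumulated error of the 24-bit chain should land roughly 48 dB (8 bits x 6 dB) below that of the 16-bit chain, which is the headroom that lets a 24-bit production chain deliver a clean 16-bit master.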
Quantization errors (roundoff errors) in a digital system can create noise or distortion or both. The sum of the noise and distortion will never be lower than the theoretical SNR of the digital channel. In most cases, distortion is the more objectionable defect. This is especially true when the distortion is not harmonically related to the music. Unfortunately, quantization errors naturally produce non-harmonic distortion products unless we use special processing techniques.
Fortunately there are techniques for manipulating the quantization errors so that they are less audible or less offensive. All of the quantization error energy can be manipulated so that it produces random noise but no distortion. This manipulation of the error energy is done using techniques known as "dither" and "noise shaping".
Dither is a random noise signal that is added to a signal before it is quantized into a limited set of digital numbers. Dither can randomize the quantization error energy so that it is evenly and randomly distributed over the entire bandwidth of the digital channel. When this is done properly, the quantization errors produce nothing other than white noise.
Dither is generally considered essential in a 16-bit system. It is less important, but still beneficial, in a 24-bit system.
Dither will usually increase the noise in the channel by about 3 dB, but it will eliminate all of the distortion that would have been created by quantization. A dithered 16-bit system can achieve a 93 dB SNR. This is a respectable level of performance, and it will push the limits of most playback systems. Nevertheless, additional improvements can be made using noise shaping.
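The effect of dither is easiest to see with a signal smaller than one quantization step. In the sketch below, a sine wave with a peak of 0.4 LSB is quantized with and without dither. Triangular (TPDF) dither spanning +/-1 LSB is used here as one common choice; that choice, the tone frequency, and the function names are our assumptions for illustration.

```python
import math
import random

def quantize_lsb(x: float) -> int:
    """Round a value expressed in LSBs to the nearest integer step."""
    return math.floor(x + 0.5)

def tpdf_dither(rng: random.Random) -> float:
    """Triangular dither spanning +/-1 LSB: the sum of two uniform values."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def recovered_amplitude(samples, reference) -> float:
    """Project the output onto the reference sine to estimate how much of it survived."""
    return 2.0 / len(samples) * sum(s * r for s, r in zip(samples, reference))

rng = random.Random(1)   # fixed seed so the run is repeatable
N = 48000                # one second at 48 kHz, exactly 1000 cycles of a 1 kHz tone
AMPLITUDE_LSB = 0.4      # peak level, smaller than half a quantization step

reference = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(N)]
signal = [AMPLITUDE_LSB * r for r in reference]

undithered = [quantize_lsb(x) for x in signal]
dithered = [quantize_lsb(x + tpdf_dither(rng)) for x in signal]

print("undithered amplitude:", recovered_amplitude(undithered, reference))
print("dithered amplitude:  ", round(recovered_amplitude(dithered, reference), 3))
```

Without dither, every sample rounds to zero and the tone disappears entirely. With dither, the output is noisy, but the tone is still present at close to its original 0.4 LSB level. The quantization error has been converted into noise instead of destroying or distorting the signal.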
Noise shaping is a technique for moving quantization noise to a desired portion of the channel bandwidth. In audio systems, noise is often moved to ultrasonic or near-ultrasonic frequencies where it is far more difficult to detect.
Dither is usually applied to randomize the quantization noise and eliminate distortion before noise shaping is applied. The resulting noise is then shaped or moved into the desired band using noise-shaping filters. Noise shaping is commonly used in 16-bit recordings, but is never used in 24-bit recordings.
Noise shaping is often used to improve the perceived SNR of a 16-bit delivery system. This technique can extend the quality of 16-bit recordings so that they rival the performance of a 20-bit system. Noise shaping can give the impression that the noise is as much as 118 dB below the peak level of the music. Noise shaping actually increases the overall noise energy in a channel, but it hides the noise at frequencies that are harder to hear.
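The trade described here, more total noise but less noise where it matters, can be shown with the simplest possible noise shaper: a first-order error-feedback loop that subtracts the previous sample's error from the next input. This is only a sketch. Mastering tools use higher-order filters matched to the sensitivity of the ear, and the block-averaging measurement below is a crude stand-in for a proper spectrum analysis.

```python
import math
import random

def tpdf(rng: random.Random) -> float:
    """Triangular dither spanning +/-1 LSB."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def requantize(signal, rng, shape: bool):
    """Quantize to integer LSBs with dither; if `shape`, feed back the previous error."""
    out, prev_error = [], 0.0
    for x in signal:
        target = x - prev_error if shape else x
        y = math.floor(target + tpdf(rng) + 0.5)
        prev_error = y - target   # total error (dither + rounding) of this sample
        out.append(y)
    return out

def power(values) -> float:
    return sum(v * v for v in values) / len(values)

def low_band_power(error, block: int = 16) -> float:
    """Crude low-pass measurement: power of the error after averaging over `block` samples."""
    means = [sum(error[i:i + block]) / block
             for i in range(0, len(error) - block + 1, block)]
    return power(means)

rng = random.Random(2)
# Hypothetical input: a 1 kHz tone with a 1000 LSB peak, one second at 48 kHz.
signal = [1000.0 * math.sin(2 * math.pi * 1000 * n / 48000) for n in range(48000)]

results = {}
for name, shape in (("dither only", False), ("dither + shaping", True)):
    out = requantize(signal, rng, shape)
    error = [y - x for y, x in zip(out, signal)]
    results[name] = (power(error), low_band_power(error))
    print(f"{name:>17}: total {results[name][0]:.3f} LSB^2, "
          f"low band {results[name][1]:.4f} LSB^2")
```

The shaped version shows higher total error power but lower error power in the low band, because the feedback loop pushes the error toward high frequencies.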
Noise shaping must be applied when a recording is mastered and it is only fully effective when the input signal has a higher bit-depth and a very low noise level.
Noise shaping increases the ultrasonic noise in an audio system and therefore it must be used sparingly. Normally this means that noise shaping should only be used once in the production process. It also must be the very last process that is applied when rendering the final 16-bit master.
24-bit releases are often remastered from the original and may take advantage of newer production equipment and techniques. These differences will vastly overshadow any differences that are purely due to changes in bit depth. It is not uncommon to hear real differences in the sound of 24-bit re-releases. But these differences are almost certainly not related to the extended bit depth.
If we ignore any differences in mastering and if we ignore the possible benefits of the higher sample rate that is usually used with a 24-bit release, does a 24-bit release still offer advantages?
The answer is that a 24-bit release can be slightly quieter than a 16-bit release if the recording itself has a sufficient SNR. When a recording is sufficiently quiet, the advantages of a 24-bit delivery format will only be detectable if the playback system has an adequate SNR. The difference will also require sufficient playback volume to raise the noise above the threshold of hearing. Let's look at the numbers:
Earlier, we established that our ears have a 130 dB dynamic range. Without noise shaping, a dithered 16-bit recording can only achieve an SNR of 93 dB. This falls well short of the 130 dB dynamic range of our auditory system. With noise shaping, a 16-bit recording can achieve a perceived SNR of about 118 dB. This is much closer to the 130 dB limit of our ears, but still somewhat short of an ideal system. In contrast, a dithered 24-bit delivery format is capable of transmitting an incredible 141 dB SNR (the 144 dB theoretical maximum less about 3 dB for dither). The 24-bit format exceeds the capabilities of our ears and our recording equipment. It also exceeds the capabilities of all playback systems, so how much improvement can we really achieve with a 24-bit delivery format?
If our playback system were entirely noise-free, and we cranked the system up to a level that produced ear-splitting 130 dB SPL peaks, the 24-bit system could be 12 dB quieter than the noise-shaped 16-bit system. If we turned the system down by 12 dB, there would be no perceptible difference in the noise, and no perceptible improvement provided by the 24-bit word length. Remember we assumed an entirely noise-free playback system, and still we need to hit some really high playback levels before the 24-bit system offers any advantages!
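The arithmetic behind this comparison can be written out as a small model. It assumes, as the text does, a noise-free playback system and a 0 dB SPL threshold of hearing, and it treats any noise below that threshold as inaudible. The SNR figures are the ones quoted above; the function name is ours.

```python
HEARING_THRESHOLD_SPL = 0  # quietest audible sound, in dB SPL

# SNR figures quoted in the text, in dB.
FORMATS = {
    "16-bit, dithered": 93,
    "16-bit, noise-shaped (perceived)": 118,
    "24-bit, dithered": 141,
}

def audible_noise_db(peak_spl: float, snr_db: float) -> float:
    """How far the format's noise floor rises above the threshold of hearing."""
    noise_floor_spl = peak_spl - snr_db
    return max(0, noise_floor_spl - HEARING_THRESHOLD_SPL)

for peak_spl in (130, 118):
    print(f"Peaks at {peak_spl} dB SPL:")
    for name, snr_db in FORMATS.items():
        print(f"  {name}: noise {audible_noise_db(peak_spl, snr_db)} dB above threshold")
```

With 130 dB SPL peaks, the noise-shaped 16-bit noise floor sits 12 dB above the threshold of hearing while the 24-bit noise floor is below it. With the volume turned down by 12 dB, both are inaudible.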
The biggest bottleneck in most playback systems is the SNR of the playback equipment, not the lack of 24-bit source material. In most cases, the playback equipment will not even be capable of rendering the 118 dB perceived SNR that can be transmitted on a noise-shaped 16-bit channel.
Benchmark has addressed this limitation with the DAC2 D/A converter and the AHB2 power amplifier. The DAC2 can deliver a 126 dB A-weighted SNR and this can be delivered to the speakers using the 132 dB A-weighted SNR of the AHB2 power amplifier. Together these Benchmark products should be capable of rendering the noise difference between a 24-bit channel and a noise-shaped 16-bit channel if the playback level is high enough. Together these components achieve true high-resolution performance, and they may justify the move to 24-bit delivery formats.
But what is far more important is the fact that the Benchmark system is fully capable of rendering the 118 dB perceived SNR that can be delivered on a 16-bit format. To date, the Benchmark system may be the only playback system that can make this claim.
The bottom line is that the playback hardware is the bottleneck. Properly mastered noise-shaped 16-bit recordings will not limit the SNR of the playback experience.
It is very important to understand that improved noise performance is the only advantage offered by the extended bit depth of a 24-bit system. But, in most cases, this noise advantage is totally obscured by the noise limitations of the playback system.
The real advantage provided by 24-bit systems is the ability to record and produce releases that can fully utilize the SNR available in a 16-bit system. High-quality 16-bit recordings cannot be produced using a 16-bit processing chain. But a 24-bit processing chain can create 16-bit recordings that push the limits of most playback systems. There is certainly no harm in delivering a 24-bit product to the end user, but it may offer nothing more than what could have been delivered on a 16-bit format.
This discussion ignores the possible advantages provided by the higher sample rates that usually accompany 24-bit formats. The possible advantages of sample rate are an entirely separate issue.