Word-Length Reduction of Digital Audio

by John Siau November 19, 1999


Benchmark NN™ and NS™ Word Length Reduction Systems

An overview of the word length reduction systems incorporated in the AD2404-96 family of converters.

The AD2404-96 and the SONIC AD2K+ are equipped with two state-of-the-art word length reduction systems: the Benchmark NN™ (Near Nyquist) system and the Benchmark NS™ (Noise Shaped) system. Unlike most competitive systems, the Benchmark NS™ system is based upon the most current psycho-acoustic models. Furthermore, both Benchmark systems are unique in that they were optimized while factoring in the noise contribution of the recording environment.

 

 

The Benchmark NN™ and NS™ word length reduction systems are the result of a cooperative effort between Benchmark Media Systems, Inc., and the Audio Lab at the University of Waterloo in Ontario, Canada. We would especially like to thank Stanley P. Lipshitz, Ph.D., John Vanderkooy, Ph.D., and Robert A. Wannamaker, Ph.D., for their pioneering research and for their significant contributions to the early stages of this project. Special thanks are also due to Robert Wannamaker for creating and modifying the mathematical algorithms and filter coefficients that are at the core of the Benchmark NN™ and NS™ systems.

We have taken a new approach by optimizing word length reduction for three different levels of ambient noise. The ambient noise in a live recording situation is very different from the ambient noise in a studio, and neither can be considered insignificant when reducing a 24-bit recording to 16 bits. All prior word length reduction systems ignored the effects of this ambient noise and were optimized for noise-free input signals. The Benchmark NN™ and NS™ systems were mathematically optimized while accounting for the effects of ambient noise. Three levels of input noise were used, and three different curves were produced. NN3™ and NS3™ represent optimal solutions where system noise is limited only by the 24-bit A/D conversion process. NN2™ and NS2™ represent optimal solutions where the ambient noise is 6 dB higher than the converter noise floor. NN1™ and NS1™ represent optimal solutions where the ambient noise is 12 dB higher than the converter noise floor. When properly used, this optimization can improve the dynamic range of a finished 16-bit recording by several decibels.
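The three curves are defined by ambient noise levels relative to the converter noise floor. As a rough illustration only, and assuming the ambient noise and the converter noise are uncorrelated (the text does not say how the noise was modeled in the optimization), their powers add, so the combined input noise for each curve can be estimated as below. `db_power_sum` is a hypothetical helper name.

```python
import math


def db_power_sum(*levels_db: float) -> float:
    """Combine uncorrelated noise sources given in dB relative to a common reference.

    Uncorrelated noise powers add linearly, so each level is converted to a power
    ratio, the ratios are summed, and the sum is converted back to dB.
    """
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


# Noise levels in dB relative to the converter noise floor (0 dB), per the text.
scenarios = {
    "NN3/NS3 (converter noise only)": [0.0],
    "NN2/NS2 (ambient +6 dB)": [0.0, 6.0],
    "NN1/NS1 (ambient +12 dB)": [0.0, 12.0],
}

for name, levels in scenarios.items():
    print(f"{name}: combined noise {db_power_sum(*levels):+.2f} dB re converter floor")
```

Under that assumption, the NN2™/NS2™ case corresponds to a combined floor about 7 dB above the converter alone, and the NN1™/NS1™ case to about 12.3 dB above it.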

What is the Appropriate Setting to Use?

In general, NN3™ or NS3™ should be used for extremely low-noise studio recording environments, NN2™ or NS2™ should be used for live recording, and NN1™ or NS1™ should be reserved for noisy recording environments. The greatest possible dynamic range will be achieved when the proper function is selected.

The choice of NN™ versus NS™ is mostly a matter of preference, while the choice of curve 1, 2, or 3 depends largely upon the dynamic range of the source material. Here are a few general guidelines that should be followed:

1. If very high playback levels are anticipated (i.e., playback gain will be high enough for the noise floor to be heard), use the NN™ settings, as these produce natural-sounding noise floors.
2. If the source material has been subjected to a prior 16-bit word length reduction process, select NN3™ for subsequent processing.
3. If low to moderately high playback levels are anticipated (i.e., playback gain will be low enough that the noise floor is inaudible), use the NS™ settings, as these yield the greatest dynamic range.
4. The NS™ functions can achieve lower psycho-acoustic noise levels than the corresponding NN™ functions.
5. When in doubt concerning ambient noise levels, use a higher-numbered function.
6. When in doubt concerning anticipated playback levels, use an NN™ function.
7. When totally in doubt, use NN3™, and then try other settings as you gain familiarity with the system.
8. The mathematically inclined can use the charts in Appendix 1 to calculate dynamic range and the audibility of the various NN™ and NS™ functions.
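The guidelines above can be restated as a small decision helper. This is a hypothetical sketch: `suggest_setting`, its argument names, and the environment labels are illustrative, not a Benchmark interface, and it ignores the personal preference that the text says also plays a role in choosing NN™ versus NS™.

```python
from typing import Iterable, Optional

# Curve number for each recording environment, per the general guidance above.
CURVE_FOR_ENVIRONMENT = {"studio": 3, "live": 2, "noisy": 1}


def suggest_setting(
    possible_environments: Iterable[str] = (),
    playback_level: Optional[str] = None,  # "high", "moderate", "low", or None if unknown
    previously_reduced_to_16_bits: bool = False,
) -> str:
    """Hypothetical helper encoding guidelines 1-3 and 5-7 (not a Benchmark API)."""
    # Guideline 2: material that already went through a 16-bit reduction gets NN3.
    if previously_reduced_to_16_bits:
        return "NN3"

    # Guidelines 1, 3 and 6: NS for low to moderately high playback, NN otherwise
    # (including when the playback level is unknown).
    family = "NS" if playback_level in ("low", "moderate") else "NN"

    # Guideline 5: when unsure which environment applies, take the higher-numbered
    # curve among the candidates. With no information, use 3 (guideline 7).
    curves = [CURVE_FOR_ENVIRONMENT[env] for env in possible_environments]
    curve = max(curves, default=3)
    return f"{family}{curve}"


print(suggest_setting(["live"], "moderate"))
print(suggest_setting(["live", "noisy"], "high"))
print(suggest_setting())
```

For example, a live recording intended for moderate playback levels maps to NS2™, and a call with no information at all falls back to NN3™, matching guideline 7. An unrecognized environment label raises `KeyError`, which is deliberate in this sketch.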

While each of the Benchmark NN™ and NS™ processes has been optimized for a certain level of ambient noise contribution, it is important to point out that the processes do not rely on this noise for dithering. TPDF dither is always applied to the 24-bit signal prior to word length reduction. The NN™ and NS™ processes are always fully dithered to ensure full randomization of the quantization noise and to ensure that the quantization noise is de-correlated from the audio signal. Any of the NN™ and NS™ processes can be used on any source without the risk of distortion that can result from an inadequate dither process. Furthermore, the NN™ and NS™ processes can be used in cascade without any ill effects other than a slight increase in the noise floor.
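The TPDF dithering step described above can be sketched as follows. This shows only plain TPDF dither followed by rounding to the nearest 16-bit code; it is not the Benchmark NN™ or NS™ algorithm, whose noise-shaping filters are not described here. The function name and the use of Python's `random` module as the dither source are assumptions for illustration.

```python
import random
from typing import Optional


def tpdf_requantize(sample: int, in_bits: int = 24, out_bits: int = 16,
                    rng: Optional[random.Random] = None) -> int:
    """Reduce one signed integer sample from in_bits to out_bits using TPDF dither.

    The dither is the difference of two independent uniform variables, giving a
    triangular PDF that spans +/-1 LSB of the output word length.
    """
    rng = rng or random.Random()
    step = 1 << (in_bits - out_bits)            # one output LSB, in input units
    dither = (rng.random() - rng.random()) * step
    code = round((sample + dither) / step)      # nearest output code
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    return max(lo, min(hi, code))               # clip to the output range


# Usage: dither the same 24-bit sample down to 16 bits several times.
rng = random.Random(0)
x24 = 0x123456                                  # 4660.34 output LSBs
print([tpdf_requantize(x24, rng=rng) for _ in range(8)])
```

Each call draws fresh dither, so repeated reductions of the same sample land on neighboring 16-bit codes whose average equals the exact 24-bit value. That averaging property is what keeps the quantization error from being a fixed function of the audio.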

What Happens when NN™ or NS™ Processes are used in Cascade?

Every time the number of generations is doubled, the noise floor will increase by 3 dB. For example, two passes through an NN™ or NS™ process will reduce the dynamic range by 3 dB (as compared to the results obtained after only one pass). After 4 passes through an NN™ or NS™ process, the dynamic range will have decreased by an additional 3 dB, for a total of 6 dB. After the 8th pass, the dynamic range will have decreased by a total of 9 dB. The charts below show the results of cascaded processes. Please note that the noise of a first-generation 16-bit process is at or slightly above the threshold of audibility. Multiple passes through 16-bit word length reduction processes will raise the noise floor above the threshold of audibility and should therefore be avoided when possible. If there is no alternative and 16-bit word length reduction must be cascaded, NN3™ or NS3™ should be used for the first process, while NN3™ should be used for all subsequent processing steps.
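The 3 dB-per-doubling rule follows if each pass adds uncorrelated noise of the same power, so that total noise power grows in proportion to the number of passes. A minimal sketch under that assumption (`cascade_penalty_db` is an illustrative name):

```python
import math


def cascade_penalty_db(passes: int) -> float:
    """Rise in the noise floor after `passes` equal, uncorrelated dither stages.

    Total noise power is proportional to the number of passes, so the rise
    relative to a single pass is 10*log10(passes).
    """
    if passes < 1:
        raise ValueError("passes must be >= 1")
    return 10.0 * math.log10(passes)


for n in (1, 2, 4, 8):
    print(f"{n} pass(es): dynamic range reduced by {cascade_penalty_db(n):.1f} dB")
```

This prints 3.0, 6.0 and 9.0 dB for 2, 4 and 8 passes, matching the figures above.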

Why is NN3™ Recommended for Cascaded 16-bit Processes?

The NN™ processes produce a noise floor that sounds very much like white noise. The NS™ processes, on the other hand, produce a colored noise floor and are best suited for applications where this noise floor is below the threshold of hearing. The advantage of using an NS™ process is that the dither noise will remain inaudible at higher playback levels than if a corresponding NN™ process were used. If the NS™ dither noise exceeds the threshold of audibility (due to cascaded processing or very high playback levels), however, the NN™ process will yield better results. The reason for this is that the natural sound of white noise is less distracting than colored noise, even when the colored noise is at a slightly lower level.

Dither

Reducing 20-bit or 24-bit audio to 16 bits always requires the addition of dither noise. Failure to add a source of dither prior to each truncation process will create distortion in the output. Remember that dither noise is of a very low level and remains inaudible or nearly inaudible until the gain of a playback system is made extremely high.
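The distortion caused by requantizing without dither is easy to see with a very quiet test tone. The sketch below rounds a 0.7 LSB sine wave to whole LSBs with and without TPDF dither and measures individual DFT bins. Rounding to the nearest code stands in here for truncation (both leave an error that is a fixed function of the signal), and all names and parameter values are illustrative assumptions.

```python
import cmath
import math
import random
from typing import Optional, Sequence

N = 4096          # samples; an integer number of cycles avoids spectral leakage
CYCLES = 8        # the fundamental falls exactly in DFT bin 8
AMPLITUDE = 0.7   # sine amplitude in output LSBs (a very quiet passage)


def requantize(x: float, rng: Optional[random.Random]) -> int:
    """Round to the nearest output LSB, adding TPDF dither first if rng is given."""
    if rng is not None:
        x += rng.random() - rng.random()   # TPDF dither spanning +/-1 LSB
    return round(x)


def bin_amplitude(y: Sequence[float], k: int) -> float:
    """Peak amplitude of the sinusoid in DFT bin k (single-bin DFT)."""
    n_total = len(y)
    acc = sum(v * cmath.exp(-2j * math.pi * k * n / n_total) for n, v in enumerate(y))
    return 2.0 * abs(acc) / n_total


signal = [AMPLITUDE * math.sin(2 * math.pi * CYCLES * n / N) for n in range(N)]
undithered = [requantize(s, None) for s in signal]
rng = random.Random(42)
dithered = [requantize(s, rng) for s in signal]

for label, y in (("no dither", undithered), ("TPDF dither", dithered)):
    print(f"{label:>11}: fundamental {bin_amplitude(y, CYCLES):.3f} LSB, "
          f"3rd harmonic {bin_amplitude(y, 3 * CYCLES):.3f} LSB")
```

Without dither, the output collapses to a three-level pulse wave: a strong third harmonic appears in bin 24, and the fundamental no longer matches the 0.7 LSB input. With TPDF dither, the fundamental stays close to 0.7 LSB, and bin 24 contains only low-level random noise.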

A good word length reduction system will remain inaudible during quiet portions of a recording, even when the playback system is adjusted to achieve high peak sound pressure levels. Dither that is audible will tend to mask musical details. This masking effect of dither increases as the audibility of the dither increases.

While it is desirable to keep the dither inaudible, it is also necessary to apply enough dither to fully randomize the noise added by word length reduction. The Benchmark NN™ and NS™ systems provide full randomization and are carefully designed for minimum audibility. Benchmark word length reduction systems never add distortion; they only add very low-level random noise. Unlike the noise added by other word length reduction systems, this low-level noise is not affected by the musical signal.
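Whether the added noise depends on the signal can be checked numerically. The sketch below holds the input at several constant levels between two codes and measures the mean and variance of the total error (output minus input) with no dither, with rectangular (RPDF) dither, and with TPDF dither. It assumes a simple round-to-nearest quantizer and is an illustration of the general dither theory, not the Benchmark algorithm; the helper names are hypothetical.

```python
import random


def requantize_error(x: float, dither: str, rng: random.Random) -> float:
    """Total error, in LSBs, of rounding x to the nearest integer after dithering."""
    if dither == "tpdf":
        d = rng.random() - rng.random()   # triangular PDF spanning +/-1 LSB
    elif dither == "rpdf":
        d = rng.random() - 0.5            # rectangular PDF spanning +/-0.5 LSB
    else:
        d = 0.0                           # no dither at all
    return round(x + d) - x


def error_stats(x: float, dither: str, trials: int = 50_000, seed: int = 1):
    """Mean and variance of the total error for a constant input level x."""
    rng = random.Random(seed)
    errs = [requantize_error(x, dither, rng) for _ in range(trials)]
    mean = sum(errs) / trials
    var = sum((e - mean) ** 2 for e in errs) / trials
    return mean, var


for dither in ("none", "rpdf", "tpdf"):
    for x in (0.0, 0.25, 0.5, 0.75):
        mean, var = error_stats(x, dither)
        print(f"{dither:>4}  input {x:.2f} LSB: error mean {mean:+.3f}, variance {var:.3f}")
```

Without dither, the error is a constant that depends on the input level (distortion). RPDF removes the bias, but the noise power still rises and falls with the input (noise modulation). With TPDF dither, every input level gives an error mean near zero and a variance near 0.25 LSB², so the noise does not track the music.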

TPDF dither (white noise dither) will always be more audible than the dither noise produced by a well-designed word length reduction system. The Benchmark 16-bit, 44.1-kHz NS3™ system yields a whopping 14 dB improvement over 16-bit TPDF. At 16 bits and 44.1 kHz, NS3™ will remain inaudible unless playback levels are adjusted such that a 0 dBFS signal exceeds 107 dB SPL. In contrast, 16-bit TPDF dither will become audible when playback levels are adjusted such that a 0 dBFS signal exceeds 93 dB SPL. Please note that, even at these gain settings, the dither will only begin to be audible when there is a point of full silence in the recording, and then only when the room itself is also sufficiently quiet. Dither noise is never audible in the presence of a 0 dBFS signal.


16-bit NS™ Word Length Reduction Curves for 44.1 kHz

The 16-bit 44.1-kHz NS3™ noise shaping curve (shown above) provides a 14-dB improvement over 16-bit TPDF, in terms of noise audibility.


16-bit NS™ Word Length Reduction Curves for 48 kHz

The 16-bit, 48-kHz NS3™ curve (shown above) has a 17-dB advantage over 16-bit TPDF.


16-bit NS™ Word Length Reduction Curves for 96 kHz

The 16-bit, 96-kHz NS3™ curve (shown above) provides a 28-dB improvement over 16-bit TPDF.

Will Dither Noise Damage My Speakers?

Please note that nothing bad happens when the gain of a playback system is increased enough to hear dither noise. Dither will not blow out your speakers, unless possibly someone inadvertently turns on an audio source while the amplifier is in this high gain state! Remember that dither is an extremely low-level signal (much like tape hiss, only of a much lower level).

How is the Performance of a Word Length Reduction System Measured?

The audibility of dither can be expressed in terms of "F-Weighted Noise Power". The F-weighting function is derived from measurements of the ear's sensitivity to very low-level signals. At 16 bits and 44.1 kHz, the F-weighted noise power of TPDF dither is -93.3 dBFS, and the F-weighted noise power of NS3™ is -107.5 dBFS. In other words, 16-bit TPDF dither provides a 93.3 dB noise-free dynamic range, while NS3™ provides a much greater 107.5 dB noise-free dynamic range.
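The F-weighted figures can be turned into dynamic range, improvement, and playback-level numbers with simple subtraction. The sketch assumes that F-weighted noise becomes audible roughly when it reaches 0 dB SPL, an assumption consistent with the 93 dB SPL and 107 dB SPL figures quoted earlier; the helper names are illustrative.

```python
# F-weighted noise power at 16 bits, 44.1 kHz, as quoted in the text (dBFS).
F_WEIGHTED_NOISE_DBFS = {"TPDF": -93.3, "NS3": -107.5}


def noise_free_dynamic_range_db(noise_dbfs: float) -> float:
    """Distance from full scale (0 dBFS) down to the F-weighted noise floor."""
    return 0.0 - noise_dbfs


def audible_above_spl(noise_dbfs: float, hearing_threshold_spl: float = 0.0) -> float:
    """Playback level of a 0 dBFS signal (dB SPL) at which the F-weighted noise
    reaches the assumed hearing threshold."""
    return hearing_threshold_spl + noise_free_dynamic_range_db(noise_dbfs)


for name, level in F_WEIGHTED_NOISE_DBFS.items():
    print(f"{name}: {noise_free_dynamic_range_db(level):.1f} dB dynamic range, "
          f"audible when 0 dBFS exceeds about {audible_above_spl(level):.1f} dB SPL")

improvement = F_WEIGHTED_NOISE_DBFS["TPDF"] - F_WEIGHTED_NOISE_DBFS["NS3"]
print(f"NS3 improvement over TPDF: {improvement:.1f} dB")
```

This gives 93.3 dB and 107.5 dB of noise-free dynamic range and a 14.2 dB difference, which the text rounds to 14 dB.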

All word length reduction systems add noise to the audio; this is a law of mathematics. However, the noise can be placed anywhere within the bandwidth of the digital system. If the noise is evenly spread out over the entire bandwidth (as it is with TPDF dither), the system will yield the lowest possible unweighted noise when measured on an audio analyzer. But a uniform noise distribution is not the best solution from an audibility standpoint. Our ears are not equally sensitive to all frequencies within the 0 to 22.05 kHz bandwidth of a 44.1 kHz digital system. The audibility of the added noise is greatly reduced when it is concentrated at frequencies where our ears are least sensitive. Near-Nyquist systems reduce noise audibility by concentrating most of the noise energy between 18 kHz and the Nyquist frequency (1/2 of the sample rate) while maintaining a relatively flat and natural-sounding noise floor below 16 kHz. Noise-shaped systems attempt to achieve the greatest possible noise improvement by distributing the noise according to a function that is the inverse of the ear’s sensitivity. TPDF will read the lowest on the meters, but will always sound louder than a good word length reduction system. Don’t let the meters fool you! Remember, unlike your ears, most meters “hear” equally well at all frequencies.
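The meter paradox described above can be demonstrated with the simplest possible noise shaper, a first-order error-feedback loop that moves dither noise away from low frequencies and toward Nyquist. This is a generic textbook technique, not the Benchmark NS™ filter, whose higher-order psycho-acoustic coefficients are not given here. Block averaging serves only as a crude stand-in for a low-frequency weighting filter, and all names and values are illustrative.

```python
import math
import random


def plain_tpdf(x, rng):
    """Round each sample (in output LSBs) to an integer after adding TPDF dither."""
    return [round(v + rng.random() - rng.random()) for v in x]


def first_order_shaped(x, rng):
    """TPDF-dithered rounding with first-order error feedback (NTF = 1 - z^-1)."""
    out, prev_err = [], 0.0
    for v in x:
        target = v - prev_err                      # feed back the previous error
        q = round(target + rng.random() - rng.random())
        prev_err = q - target                      # error made at this step
        out.append(q)
    return out


def block_mean_variance(err, block=64):
    """Variance of block averages: a crude measure of low-frequency noise."""
    means = [sum(err[i:i + block]) / block for i in range(0, len(err) - block + 1, block)]
    mu = sum(means) / len(means)
    return sum((m - mu) ** 2 for m in means) / len(means)


def mean_square(err):
    """Unweighted noise power, as a flat-response meter would report it."""
    return sum(e * e for e in err) / len(err)


N = 1 << 16
x = [20.25 * math.sin(2 * math.pi * 100 * n / N) for n in range(N)]  # quiet test tone
rng = random.Random(7)
results = {}
for name, fn in (("plain TPDF", plain_tpdf), ("first-order shaped", first_order_shaped)):
    err = [y - v for y, v in zip(fn(x, rng), x)]
    results[name] = (mean_square(err), block_mean_variance(err))
    print(f"{name:>18}: total power {results[name][0]:.3f} LSB^2, "
          f"low-frequency power {results[name][1]:.5f} LSB^2")
```

The shaped version reports roughly twice the total (unweighted) noise power of plain TPDF, yet far less noise in the low-frequency measure. A flat meter therefore reads worse even though the noise has been moved out of the band being examined.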

Avoid Truncation without Dither

There are numerous potential sources of noise within an A/D converter. These may include thermal noise, noise from a delta-sigma modulation process, crosstalk, clock feed-through, etc. None of these sources of noise are added intentionally, but in many cases they may be of a high enough level to allow truncation without the addition of a dither signal. For example, many 20-bit A/D converters have enough self-noise to allow their outputs to be truncated to 16 bits without ill effects. Similarly, many 24-bit converters have enough self-noise to allow truncation to 20 bits. Many recording engineers have discovered that they can truncate the outputs of their A/D converters without causing distortion. Do not try this with the AD2404-96 or the AD2K+! These are very quiet 24-bit converters, and they do not have enough self-noise to provide adequate dither for truncation to 20 bits (or 16 bits). For this reason, it is imperative to use one of the 20-bit output settings when feeding 20-bit devices, and one of the 16-bit settings when feeding 16-bit devices. Remember that truncation without adequate dither will cause distortion.

One additional caution concerning truncation: each word length reduction process requires a new source of dither noise. For example, consider a signal that starts out at 24 bits and is dithered down to 16 bits. Let's suppose we take this 16-bit signal, feed it into a 24-bit digital audio workstation, and apply a minor gain change or a touch of EQ. This 16-bit signal has now become a 24-bit signal inside the workstation. If the final product is going to be 16 bits, a second 24-to-16-bit word length reduction process must be applied. Some have assumed that it is not necessary to add dither in the second process because it was already added in the first process. However, truncation to 16 bits at the output of the workstation will add distortion to the audio. Instead, use the digital-to-digital word length reduction feature in the AD2404-96 and the AD2K+ to reduce the word length back to 16 bits. Remember: every word length reduction process requires a new source of dither noise.

Digital Word Length Increases when a Signal is Processed

Every time a digital signal is processed, its word length increases. "Processing" includes even simple operations such as level changes and the mixing of two signals. Word lengths expand dramatically when more complex operations such as equalization, sample rate conversion, and effects processing are applied. Long word lengths created by digital signal processing must be shortened before they can be re-recorded or sent to a DAC for monitoring.
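The word length growth from even simple processing can be checked with integer arithmetic. The sketch below assumes the level change is performed by multiplying by a 16-bit fixed-point gain coefficient, an illustrative assumption rather than a description of any particular workstation; `signed_bits_needed` is a hypothetical helper.

```python
def signed_bits_needed(lo: int, hi: int) -> int:
    """Smallest two's-complement word length that can hold every integer in [lo, hi]."""
    bits = 1
    while lo < -(1 << (bits - 1)) or hi > (1 << (bits - 1)) - 1:
        bits += 1
    return bits


S16_MIN, S16_MAX = -(1 << 15), (1 << 15) - 1

# Mixing: the sum of two full-scale 16-bit samples.
mix_lo, mix_hi = 2 * S16_MIN, 2 * S16_MAX

# Level change: a 16-bit sample times a 16-bit fixed-point gain coefficient.
# The product is bilinear, so its extremes occur at the corners of the ranges.
products = [a * b for a in (S16_MIN, S16_MAX) for b in (S16_MIN, S16_MAX)]
prod_lo, prod_hi = min(products), max(products)

print("16-bit sample range:", signed_bits_needed(S16_MIN, S16_MAX), "bits")
print("mix of two 16-bit signals:", signed_bits_needed(mix_lo, mix_hi), "bits")
print("16-bit sample x 16-bit gain:", signed_bits_needed(prod_lo, prod_hi), "bits")
```

Mixing two signals needs 17 bits, and a single multiplication needs 32 bits, before any equalization or sample rate conversion is applied. All of those extra bits must be removed, with fresh dither, before the result can return to a 16-bit medium.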

Every time a digital word length is shortened, a new source of noise must be added to the signal prior to truncation. Dither noise that was applied for one truncation operation is not useful for dithering a subsequent truncation operation. Failure to add a new source of noise prior to each truncation process will create distortion. Do not rely on the self-noise of an A/D converter, the noise of the mic-pre, or ambient noise to take the place of this new source of dither. Again, every word length reduction process requires a new source of dither noise.

Additional Information

Dither by Bob Katz

Dither - Wikipedia

