Listening vs. Measuring

At Benchmark, listening is the final exam that determines whether a design passes from engineering to production. When all of the measurements show that a product is working flawlessly, we spend time listening for issues that may not have shown up on the test station. If we hear something, we go back and figure out how to measure what we heard, and then we add this test to our arsenal of measurements.

Benchmark's Listening Room

Benchmark's listening room is equipped with a variety of signal sources, amplifiers and loudspeakers, including the selection of nearfield monitors shown below. It is also equipped with switch boxes that can be used to switch sources while the music is playing.

Nearfield Monitor Test in Benchmark's Listening Room

Level Matching is Critical

One challenge with listening tests is that a slight difference in level will skew the results. Small level differences can be noticeable in their own right, and they can mask or mimic the differences we are trying to detect, especially when those differences are small. For this reason, we match levels to a precision of ±0.1 dB when conducting listening tests.
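The arithmetic behind level matching is simple, and a minimal sketch makes the ±0.1 dB window concrete. It assumes the two sources are compared as RMS output voltages measured with the same test tone; the helper names `level_difference_db`, `within_tolerance`, and `trim_gain` are hypothetical.

```python
import math


def level_difference_db(v_rms_a: float, v_rms_b: float) -> float:
    """Level of source B relative to source A, in dB (voltage ratio)."""
    return 20.0 * math.log10(v_rms_b / v_rms_a)


def within_tolerance(v_rms_a: float, v_rms_b: float, tol_db: float = 0.1) -> bool:
    """True if the two levels match to within +/- tol_db."""
    return abs(level_difference_db(v_rms_a, v_rms_b)) <= tol_db


def trim_gain(v_rms_a: float, v_rms_b: float) -> float:
    """Linear gain to apply to source B so that it matches source A."""
    return v_rms_a / v_rms_b


# Example: 2.000 V RMS from one DAC versus 2.030 V RMS from another.
diff = level_difference_db(2.000, 2.030)
print(f"{diff:+.3f} dB, matched: {within_tolerance(2.000, 2.030)}")
```

The example prints `+0.129 dB, matched: False`. For reference, a 1% voltage mismatch is about 0.086 dB, so a ±0.1 dB window corresponds to roughly ±1.15% in voltage.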

ABX Listening Tests - Eliminating Listener Bias

Another challenge of listening tests is that it can be hard for listeners to remain objective if they know the identity of the sources. If we think we hear a difference between two sources, we often turn to an ABX test to verify that the difference is audible. We have a relay-controlled ABX test box that can be used to compare two DACs, two preamplifiers, or two power amplifiers. We also have software-based ABX players that can be used to compare two digital recordings. In both cases, the ABX test equipment allows the listener to compare an unknown source "X" with source "A" and source "B". In each of a series of trials, "X" is a random selection of either "A" or "B". In each trial, the listener can switch between A, B, and X in any order, at any speed, and any number of times before identifying X as either A or B. A high score of correct identifications over a sufficient number of trials is proof that the listener heard an audible difference. The photo below shows the remote control for our ABX test set.

ABX Test Set - Remote Control
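How high does an ABX score need to be? The text does not give a threshold, so here is a minimal sketch assuming the usual statistical treatment: if the listener hears no difference, each trial is a coin flip, and the chance of scoring at least k out of n by guessing follows the binomial distribution. The function names and the 5% significance level are assumptions for illustration.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` right out of `trials`
    by pure guessing (probability 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def min_correct_for_significance(trials: int, alpha: float = 0.05) -> int:
    """Smallest score whose probability under guessing is at or below alpha.
    Returns trials + 1 if no score is good enough (too few trials)."""
    for correct in range(trials + 1):
        if abx_p_value(correct, trials) <= alpha:
            return correct
    return trials + 1
```

For a 16-trial session, `min_correct_for_significance(16)` returns 12, and a score of 12/16 has about a 3.8% chance of arising from guessing. With only 4 trials, even a perfect 4/4 (a 6.25% chance) does not reach the 5% level, which is one reason ABX sessions use many trials.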


Like all listening tests, an ABX test has limitations. An ABX test can prove that an audible difference exists, but it does not provide any indication as to which source sounds better. However, if the input and output of a device or DSP process are compared, an ABX test can confirm that the device or process creates an audible difference. If the goal is transparency, this audible difference would prove that a defect exists. Some critics of ABX tests suggest that long-term effects, such as distortion-induced listener fatigue, cannot be detected in an ABX test. The important takeaway is that every listening test has limitations, and it is important to understand those limitations before jumping to conclusions.

Benchmark's Test Lab

Our engineering lab is equipped with an Audio Precision APx555b test station similar to the one shown below. This is absolutely the finest audio test station available. We also have a selection of older test stations including an AP2722 and an AP2522. These tools are indispensable when it comes to detecting defects, including many that are inaudible.


APx555 Test Set

Measurement Techniques Must be Driven by Listening Tests

Listening tests are never perfect, and for this reason it is essential that we develop measurements for each artifact that we identify in a listening test. An APx555 test set has far more resolution than human hearing, but it has no intelligence. We have to tell it exactly what to measure and how to measure it. When we hear something that we cannot measure, we are just not doing the right measurements.

Benchmark Signal Chain


Listening Tests Reveal Problems but not the Root Cause

Any design process that relies solely on listening tests is doomed to fail. If we just listen, redesign, and then repeat, we fail to identify the root cause of the defect and we never approach perfection. We may arrive at a solution that just masks the artifact with another, less objectionable artifact. On the other hand, if we focus on eliminating every artifact that we can measure, we can quickly converge on a solution that approaches sonic transparency. At Benchmark, if we can measure an artifact, we don't try to determine if it is low enough to be inaudible; we simply try to eliminate it. This process eliminates all but the most elusive artifacts.

Audible Artifacts that Elude Traditional Measurements

To date, one of the most elusive artifacts that we have encountered is the intersample over. Intersample overs are peaks between samples that exceed 0 dBFS even though the sample values themselves never exceed 0 dBFS. These peaks can reach +3 dBFS and can overload the fixed-point DSP in PCM sigma-delta converters and sample rate converters. It is important to note that these overloads are caused by the finite numeric limits of fixed-point math, not by some inherent defect in PCM or in the upsampling process.

The figure below shows a +3 dBFS A/D conversion. The analog audio (blue line) is correctly represented by the samples (shown in red). Notice that the samples are just reaching the maximum digital codes (represented as 1 and -1 in the figure below). In a 16-bit PCM system, "1" would correspond to +32,767 and "-1" would correspond to -32,768. To simplify this discussion, we will use 1 and -1 to represent the limits of the digital codes. The peaks of the sinusoidal waveform reach 1.414 and -1.414 (√2, or about +3.01 dB relative to full scale), which is well beyond the maximum numeric values that fixed-point PCM digital systems are designed to handle.

Intersample Overs graph
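The numbers in this example can be reproduced in a few lines. This sketch assumes the same normalization as the figure (full scale = ±1) and the 11.025 kHz tone at 44.1 kHz described later in this note, with a 45° phase offset (an assumption chosen so that every sample lands exactly on ±1). It evaluates the underlying continuous waveform on a much finer grid to find the peak between the samples.

```python
import math

FS = 44_100.0             # sample rate, Hz
F0 = FS / 4               # 11.025 kHz test tone
AMPLITUDE = math.sqrt(2)  # 1.414: +3.01 dB above full scale (full scale = 1)
PHASE = math.pi / 4       # assumed 45-degree phase, so every sample lands on +/-1


def waveform(t: float) -> float:
    """The band-limited 'analog' sine wave at time t, in seconds."""
    return AMPLITUDE * math.sin(2 * math.pi * F0 * t + PHASE)


# The PCM samples (four cycles' worth): none of them exceeds full scale.
samples = [waveform(n / FS) for n in range(16)]
sample_peak = max(abs(s) for s in samples)

# The same waveform evaluated on a grid 64 times finer than the sample period.
fine_grid = [waveform(m / (64 * FS)) for m in range(16 * 64)]
true_peak = max(abs(v) for v in fine_grid)

print([round(s, 6) for s in samples[:4]])  # [1.0, 1.0, -1.0, -1.0]
print(f"true peak: {true_peak:.4f} = {20 * math.log10(true_peak):+.2f} dBFS")
```

The samples repeat the pattern 1, 1, -1, -1, while the waveform between them peaks at 1.4142, or +3.01 dBFS.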

The original analog waveform (blue line) can be reconstructed with exact precision by placing an analog lowpass filter at the output of a non-oversampled D/A converter. In contrast, most oversampled D/A converters will clip this signal. The next figure shows a fixed-point 4X interpolator attempting to insert interpolated samples prior to reconstruction. There are 3 clipped samples on each peak of the sine wave. The output of this traditional sigma-delta DAC will resemble a square wave when attempting to reconstruct this high-amplitude sine wave.

Intersample with Clipping graph
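The clipping described above can be reproduced with a sketch of a 4X interpolator fed the sample pattern 1, 1, -1, -1. Two simplifying assumptions: it uses ideal band-limited (DFT-based) interpolation of a periodic block rather than the FIR filter a real converter would use, and it models the fixed-point limit as a hard clamp at ±1. The function names are hypothetical.

```python
import cmath
import math


def interpolate_periodic(x: list[float], factor: int) -> list[float]:
    """Ideal band-limited interpolation of one period of a periodic signal,
    computed from its DFT. Returns len(x) * factor output samples."""
    n = len(x)
    spectrum = [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
                for k in range(n)]
    out = []
    for m in range(n * factor):
        t = m / factor  # position in units of input samples
        acc = 0j
        for k, c in enumerate(spectrum):
            freq = k if k <= n // 2 else k - n  # map DFT bins to +/- frequencies
            acc += c * cmath.exp(2j * math.pi * freq * t / n)
        out.append((acc / n).real)
    return out


def fixed_point_clamp(values: list[float], limit: float = 1.0) -> list[float]:
    """Saturate values at +/- limit, as fixed-point arithmetic would."""
    return [max(-limit, min(limit, v)) for v in values]


# 16 samples (4 cycles) of the +3.01 dBFS fs/4 tone.
samples = [1.0, 1.0, -1.0, -1.0] * 4

ideal = interpolate_periodic(samples, 4)      # what the math calls for
clipped = fixed_point_clamp(ideal)            # what a fixed-point path delivers
overs = [i for i, v in enumerate(ideal) if abs(v) > 1.0 + 1e-9]

print(f"ideal peak: {max(map(abs, ideal)):.4f}, clamped peak: {max(map(abs, clipped)):.4f}")
print(f"clipped samples in the first positive peak: {[i for i in overs if i < 8]}")
```

The interpolated values at output positions 1, 2, and 3 exceed full scale, so three samples are clipped on each peak, matching the figure. The small tolerance keeps samples that land exactly on ±1 (apart from rounding error) from being counted as overs.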

The result of this clipping is best described as an artificial snare or hi-hat sound that is added to the music. This clipping also tends to add an artificial brightness to the apparent frequency response. In our listening tests, sample rate converters with THD+N better than -135 dB (0.000018%) had an audible impact on the sound, and we could not explain this with any of the conventional audio measurements. We can't hear distortion at -135 dB, but we were hearing something! Eventually we discovered that intersample peaks could overload fixed-point DSP processing when interpolating in a reconstruction filter, and the resulting THD was very high (several percent). Once we identified the root cause, the test was easy to perform. We now have an 11.025 kHz test signal, at a sample rate of 44.1 kHz, that contains a clean tone at +3.01 dBFS. Our new DAC2 and DAC3 converters interpolate without clipping and pass this tone without distortion. The figure below shows a high-headroom 4X interpolator that increases the sample rate by a factor of 4. The interpolator in the DAC2 and DAC3 converters runs at a much higher ratio, but this figure illustrates the process.

High-Headroom DAC graph
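The two forms of the THD+N figure quoted above (-135 dB and 0.000018%) are the same ratio expressed two ways. A minimal conversion sketch, assuming THD+N is expressed as an amplitude (voltage) ratio, which is the usual convention; the 3% value below is only an illustrative stand-in for "several percent":

```python
import math


def thd_db_to_percent(db: float) -> float:
    """Convert an amplitude ratio in dB to a percentage."""
    return 100.0 * 10 ** (db / 20.0)


def thd_percent_to_db(percent: float) -> float:
    """Convert a percentage to an amplitude ratio in dB."""
    return 20.0 * math.log10(percent / 100.0)


print(f"{thd_db_to_percent(-135):.7f} %")  # prints 0.0000178 %
print(f"{thd_percent_to_db(3.0):.1f} dB")  # prints -30.5 dB
```

Seen this way, a clipping interpolator at a few percent THD is more than 100 dB worse than the -135 dB converters that passed the conventional tests.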

Virtually all other oversampling D/A converters will distort when playing this test signal. This artifact is probably the primary audible difference between PCM and DSD, and it is probably one of the most significant differences between oversampled and non-oversampled "ladder DAC" converters.

Non-oversampled converters are immune to intersample over problems, but they have a different set of problems that often cause audible artifacts. These defects tend to elude detection in basic audio measurements. These "ladder DACs" have linearity problems that are caused by mismatched binary-weighted elements. It is physically impossible to match weighted elements to anything much better than 16-bit accuracy, and this matching changes with temperature. Consequently, all non-oversampled ladder DACs have significant linearity problems. The worst linearity error would normally occur at the zero crossing, where every bit changes state, but chip manufacturers typically add a DC offset to move this error away from the zero crossing. This greatly improves the way the DAC measures in traditional THD+N tests, and it may provide a slight improvement in the sound. Nevertheless, this trick probably does more for the spec sheet than it does for the listener. If we run an IMD test or a linearity test, these cleverly hidden defects are exposed. If we fail to run the right tests, we could fail to detect the defects that are inherent in all ladder DACs.
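A toy model makes the mechanism concrete. This sketch assumes an idealized 16-bit binary-weighted DAC using offset-binary codes (code 32768 represents zero) in which only the MSB element is mismatched, by an assumed 50 ppm. Real ladder DACs have errors in every element; the single-error model simply isolates the effect described above. The 500 LSB DC offset in the usage example is likewise hypothetical.

```python
BITS = 16
MSB_ERROR = 50e-6  # assumed relative mismatch of the MSB element (50 ppm)

# Element weights in LSBs: perfect powers of two, except the MSB.
weights = [2.0 ** k for k in range(BITS)]
weights[-1] *= 1.0 + MSB_ERROR

MIDSCALE = 1 << (BITS - 1)  # 32768: the zero crossing in offset binary


def dac_output(code: int) -> float:
    """Output (in LSBs) of the binary-weighted DAC for an offset-binary code."""
    return sum(w for k, w in enumerate(weights) if (code >> k) & 1)


def step_error(code: int) -> float:
    """Differential nonlinearity: how far the step from code to code + 1
    deviates from the ideal 1 LSB."""
    return dac_output(code + 1) - dac_output(code) - 1.0


def crosses_midscale(center: int, amplitude: int) -> bool:
    """Does a signal swinging over center +/- amplitude cross the midscale step?"""
    return center - amplitude < MIDSCALE <= center + amplitude


print(f"step error at zero crossing: {step_error(MIDSCALE - 1):+.4f} LSB")  # +1.6384
print(f"step error elsewhere:        {step_error(1000):+.4f} LSB")          # +0.0000
# A -60 dBFS signal (about 33 LSB peak) centered on zero crosses the bad step...
print(crosses_midscale(MIDSCALE, 33))        # True
# ...but with a DC offset of a few hundred LSB it never reaches it.
print(crosses_midscale(MIDSCALE + 500, 33))  # False
```

A 50 ppm MSB error becomes a 1.6 LSB step error at the zero crossing, exactly where low-level music spends most of its time. Offsetting the signal avoids that step, but the step is still there for any signal large enough to reach it.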

Cheating on Tests

I guess you could say that ladder DACs cheat on tests. They do this by clever design. They only get caught when we administer a tougher test. In school, a student may be able to cheat on a multiple-choice test, but they will have a hard time cheating on a written test. Tests should be designed to catch cheaters.

One such test is the Dynamic Range test. In a sense, it was created to prevent cheating on SNR (signal-to-noise ratio) tests. Some early compact disc players had auto-mute circuits that shut off the D/A converter when no data was present. One beneficial result was a significant reduction of output noise when a track was paused or stopped. A second benefit was an outstanding SNR on the spec sheet.

The Dynamic Range test adds a -60 dB tone when measuring the output noise of an audio device. This low-level tone is intended to defeat auto-mute circuits and its presence in the output verifies the validity of the noise portion of the measurement sequence.
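The difference between the two measurements can be sketched with a toy device model, loosely in the spirit of the AES17 dynamic range measurement. The `MutingDevice` class is hypothetical: it adds a fixed amount of white noise to whatever it plays, but outputs pure silence when its input is all zeros, as the auto-mute circuits described above did. The tone is removed by least-squares fitting a sine at the known test frequency, a simple stand-in for the notch filter a real analyzer uses; no bandwidth limiting or weighting is applied.

```python
import math
import random

FS = 48_000                        # sample rate, Hz
N = FS                             # one second of audio
F_TEST = 1_000                     # test frequency: exactly 1000 cycles in the window
FULL_SCALE_RMS = 1 / math.sqrt(2)  # RMS of a 0 dBFS sine (peak = 1)


class MutingDevice:
    """Hypothetical DAC model: adds white noise, but mutes on digital silence."""

    def __init__(self, noise_rms: float, seed: int = 1):
        self.noise_rms = noise_rms
        self.rng = random.Random(seed)

    def play(self, x: list[float]) -> list[float]:
        if all(v == 0.0 for v in x):
            return [0.0] * len(x)  # auto-mute: no output noise at all
        return [v + self.rng.gauss(0.0, self.noise_rms) for v in x]


def rms(x: list[float]) -> float:
    return math.sqrt(sum(v * v for v in x) / len(x))


def remove_tone(y: list[float], freq: float) -> list[float]:
    """Subtract the least-squares sine/cosine fit at `freq`.
    Assumes the window holds a whole number of cycles."""
    s = [math.sin(2 * math.pi * freq * n / FS) for n in range(len(y))]
    c = [math.cos(2 * math.pi * freq * n / FS) for n in range(len(y))]
    a = 2 / len(y) * sum(yi * si for yi, si in zip(y, s))
    b = 2 / len(y) * sum(yi * ci for yi, ci in zip(y, c))
    return [yi - a * si - b * ci for yi, si, ci in zip(y, s, c)]


def ratio_to_full_scale_db(residual: list[float]) -> float:
    """Full-scale sine RMS relative to the residual RMS, in dB."""
    r = rms(residual)
    return math.inf if r == 0.0 else 20 * math.log10(FULL_SCALE_RMS / r)


dut = MutingDevice(noise_rms=1e-5)

# Naive SNR: play digital silence and measure whatever comes out.
naive_snr = ratio_to_full_scale_db(dut.play([0.0] * N))

# Dynamic range: play a -60 dBFS tone, remove the tone, measure the residual.
tone = [10 ** (-60 / 20) * math.sin(2 * math.pi * F_TEST * n / FS) for n in range(N)]
dynamic_range = ratio_to_full_scale_db(remove_tone(dut.play(tone), F_TEST))

print(f"naive SNR: {naive_snr} dB")              # inf: the mute hides the noise
print(f"dynamic range: {dynamic_range:.1f} dB")  # about 97 dB
```

The naive measurement reports an infinite SNR because the device never plays its noise, while the -60 dBFS tone keeps the device awake and exposes a noise floor about 97 dB below full scale.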

Please note that the DC offset and auto-mute circuits described above were not necessarily created to enhance product spec sheets. Both circuits delivered some benefit to the listener. The unfortunate side effect was an overly generous "grade report" on the spec sheet.

Curiously, some circuits have a knack for cheating without human intervention. One-bit sigma-delta converters produce low-level idle tones when they are not playing audio. These low-level tones reduce the SNR of the converter, but they elude detection in the Dynamic Range test. The -60 dB tone, used for the noise measurement, breaks up the idle tones, producing an overly optimistic Dynamic Range measurement. I guess we could say that 1-bit converters cheat on the test that was intended to catch cheaters! What is more impressive is that nobody told them how to cheat.
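Why do idle tones appear at all? A minimal sketch of a first-order, 1-bit sigma-delta modulator (far simpler than the higher-order modulators in real converters, so treat it as illustration only) shows that a constant input produces a repeating bit pattern, and a repeating pattern is a set of discrete tones. The function names are hypothetical.

```python
from typing import List, Optional


def sigma_delta_1bit(x: List[float]) -> List[int]:
    """First-order 1-bit sigma-delta modulator. The integrator accumulates the
    difference between each input sample and the previous 1-bit output."""
    integrator = 0.0
    previous = 0
    out = []
    for v in x:
        integrator += v - previous
        previous = 1 if integrator >= 0 else -1
        out.append(previous)
    return out


def period_of(bits: List[int], max_period: int = 64) -> Optional[int]:
    """Smallest p for which the sequence repeats every p samples, if any."""
    for p in range(1, max_period + 1):
        if all(bits[i] == bits[i + p] for i in range(len(bits) - p)):
            return p
    return None


idle = sigma_delta_1bit([0.0] * 256)         # digital silence
small_dc = sigma_delta_1bit([0.25] * 256)    # a constant (DC) input
print(period_of(idle), period_of(small_dc))  # 2 8
print(sum(small_dc) / len(small_dc))         # 0.25: the average tracks the input
```

In this model, silence produces a pattern at fs/2, far above the audio band, while a constant input of 0.25 repeats every 8 samples, producing a tone at fs/8 and its harmonics. Smaller constant inputs give longer periods and lower-frequency patterns, which is how idle tones can land in the audio band.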

The lesson from these examples is that we need to do a comprehensive set of tests on audio circuits before concluding that they are defect free. The circuits will cheat if they can!

If we were to run a very comprehensive set of standard audio tests on D/A converters, the best multi-bit sigma-delta DACs would measure much better than the best ladder DACs or 1-bit DSD DACs. Given a choice between DSD (1-bit sigma-delta DAC), non-oversampled PCM (ladder DAC), and oversampled PCM (multi-bit sigma-delta DAC), our measurements would clearly show that the third choice should be the most transparent. Nevertheless, an early prototype of the DAC2 failed the final exam: it failed the listening test. Our listening tests revealed the intersample over problem described earlier. Once the root cause was identified with lab measurements, we were able to fix the prototype and add a new test to our arsenal. If we run this new +3 dBFS intersample over test, traditional multi-bit sigma-delta DACs fail miserably, but our DAC2 and DAC3 converters pass with honors.

The audible defect was not caused by oversampling or by some inherent defect in PCM; it was caused by a lack of headroom in the fixed-point DSP processing. We solved the intersample over problem by reducing the amplitude of the incoming audio signal by 3 dB before oversampling. We also added enough analog headroom to render +3 dBFS peaks without clipping. The results are better than those obtained with ladder DACs or with 1-bit DSD DACs.
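The fix can be sketched with the same kind of idealized 4X interpolator that exposes the problem: ideal DFT-based interpolation and a hard fixed-point clamp at ±1, with hypothetical helper names. One arithmetic detail: a +3.01 dBFS peak needs at least 20·log10(√2) ≈ 3.01 dB of attenuation to fit under full scale, so this sketch assumes a 3.5 dB digital pad to leave a small margin. Dividing the pad back out afterward stands in for the extra analog headroom, which can swing past digital full scale.

```python
import cmath
import math


def interpolate_periodic(x: list[float], factor: int) -> list[float]:
    """Ideal band-limited interpolation of one period of a periodic signal."""
    n = len(x)
    spectrum = [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
                for k in range(n)]
    out = []
    for m in range(n * factor):
        t = m / factor
        acc = sum(c * cmath.exp(2j * math.pi * (k if k <= n // 2 else k - n) * t / n)
                  for k, c in enumerate(spectrum))
        out.append((acc / n).real)
    return out


def fixed_point_clamp(values: list[float], limit: float = 1.0) -> list[float]:
    """Saturate values at +/- limit, as fixed-point arithmetic would."""
    return [max(-limit, min(limit, v)) for v in values]


ATTENUATION_DB = 3.5  # assumed digital pad; must exceed 3.01 dB for this tone
pad = 10 ** (-ATTENUATION_DB / 20)

samples = [1.0, 1.0, -1.0, -1.0] * 4  # the +3.01 dBFS fs/4 test tone

# Conventional path: interpolate at full level, then saturate.
conventional = fixed_point_clamp(interpolate_periodic(samples, 4))

# High-headroom path: pad first, interpolate, saturate (nothing clips now),
# then restore the level in the analog stage.
padded = fixed_point_clamp(interpolate_periodic([s * pad for s in samples], 4))
high_headroom = [v / pad for v in padded]

print(f"conventional peak:  {max(map(abs, conventional)):.4f}")   # 1.0000 (clipped)
print(f"high-headroom peak: {max(map(abs, high_headroom)):.4f}")  # 1.4142
```

With the pad in place, the interpolated peaks stay near 0.945 of full scale inside the DSP, so the clamp never engages and the full +3.01 dBFS waveform is delivered at the output.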

The DAC2 and DAC3 passed the final exam, and as far as we can tell, they haven't cheated on the test. Nevertheless, we keep a close eye on them and on each of our products.

Benchmark Listening Room A side

Benchmark Listening Room A front
