At Benchmark, listening is the final exam that determines if a design passes from engineering to production. When all of the measurements show that a product is working flawlessly, we spend time listening for issues that may not have shown up on the test station. If we hear something, we go back and figure out how to measure what we heard. We then add this test to our arsenal of measurements.
Benchmark's listening room is equipped with a variety of signal sources, amplifiers and loudspeakers, including the selection of nearfield monitors shown below. It is also equipped with switch boxes that can be used to switch sources while the music is playing.
Nearfield Monitor Test in Benchmark's Listening Room
One challenge with listening tests is that even a slight difference in level can be noticeable and will skew the results, especially when we are listening for small differences. For this reason, we match levels to within +/- 0.1 dB when conducting listening tests.
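The +/- 0.1 dB matching criterion can be checked from two measured RMS voltages. The sketch below is only an illustration: the function names and example voltages are hypothetical, and it assumes both sources are measured with the same test tone at the same point in the signal chain.

```python
import math

def level_difference_db(v_a: float, v_b: float) -> float:
    """Level of source A relative to source B in dB, from RMS voltages."""
    return 20.0 * math.log10(v_a / v_b)

def levels_matched(v_a: float, v_b: float, tolerance_db: float = 0.1) -> bool:
    """True when the two levels agree to within +/- tolerance_db."""
    return abs(level_difference_db(v_a, v_b)) <= tolerance_db

# Voltage ratio that corresponds to a 0.1 dB difference (a little over 1%).
ratio_for_tolerance = 10 ** (0.1 / 20)

# Hypothetical readings: 2.000 V vs 2.020 V is inside the window,
# 2.000 V vs 2.050 V is not.
print(levels_matched(2.000, 2.020), levels_matched(2.000, 2.050))
print(round(ratio_for_tolerance, 4))
```

A 0.1 dB window is tight: the two voltages must agree to within roughly 1.2%.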
Another challenge of listening tests is that it can be hard for the listener to remain objective when the listener knows the identity of the sources. If we think we hear a difference between two sources, we often turn to an ABX test to verify that the difference is audible. We have a relay-controlled ABX test box that can be used to compare two DACs, two preamplifiers, or two power amplifiers. We also have software-based ABX players that can be used to compare two digital recordings. In both cases, the ABX test equipment allows the listener to compare an unknown source "X" with source "A" and source "B". In each of a series of trials, "X" is a random selection of either "A" or "B". In each trial, the listener can switch between A, B, and X in any order, at any speed, and any number of times before identifying X as either A or B. A score well above what guessing would produce, over a sufficient number of trials, is strong evidence that the listener heard an audible difference. The photo below shows the remote control for our ABX test set.
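How high a score needs to be can be estimated with a one-sided binomial test: if the listener were only guessing, each trial would be correct with probability 0.5. The sketch below is a generic statistical illustration, not a description of Benchmark's scoring procedure. The function name and the 16-trial examples are assumptions.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` right out of `trials`
    by pure guessing (each trial is a fair coin flip)."""
    favorable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favorable / 2 ** trials

# Hypothetical sessions of 16 trials each.
for score in (8, 12, 14):
    print(score, round(abx_p_value(score, 16), 4))
```

With 16 trials, 14 correct answers would occur by chance only about 0.2% of the time, while 8 correct is entirely consistent with guessing.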
Like all listening tests, an ABX test has limitations. An ABX test can prove that an audible difference exists, but it does not provide any indication as to which source sounds better. However, if the input and output of a device or DSP process are compared, an ABX test can confirm that the device or process creates an audible difference. If the goal is transparency, this audible difference would prove that a defect exists. Some critics of ABX tests suggest that long-term effects, such as distortion-induced listener fatigue, cannot be detected in an ABX test. The important takeaway is that every listening test has limitations, and it is important to understand those limitations before jumping to conclusions.
Our engineering lab is equipped with an Audio Precision APx555b test station similar to the one shown below. This is absolutely the finest audio test station available. We also have a selection of older test stations including an AP2722 and an AP2522. These tools are indispensable when it comes to detecting defects, including many that are inaudible.
APx555 Test Set
Listening tests are never perfect and for this reason it is essential that we develop measurements for each artifact that we identify in a listening test. An APx555 test set has far more resolution than human hearing, but it has no intelligence. We have to tell it exactly what to measure and how to measure it. When we hear something that we cannot measure, we are just not doing the right measurements.
Any design process that relies solely on listening tests is doomed to fail. If we just listen, redesign, and then repeat, we fail to identify the root cause of the defect and we never approach perfection. We may arrive at a solution that just masks the artifact with another less-objectionable artifact. On the other hand, if we focus on eliminating every artifact that we can measure, we can quickly converge on a solution that approaches sonic transparency. At Benchmark, if we can measure an artifact, we don't try to determine if it is low enough to be inaudible; we simply try to eliminate it. This process eliminates all but the most elusive artifacts.
To date, one of the most elusive artifacts that we have encountered is the issue of intersample overs. These are intersample peaks that exceed 0 dBFS while the sample values themselves never exceed 0 dBFS. These peaks can reach +3 dBFS and can cause DSP overloads in fixed-point PCM sigma-delta converters and sample rate converters. It is important to note that the DSP overloads are caused by the finite boundaries of the fixed-point math and not by some inherent defect in PCM or in the upsampling process.
The figure below shows a +3 dBFS A/D conversion. The analog audio (blue line) is correctly represented by the samples (shown in red). Notice that the samples are just reaching the maximum digital codes (represented as 1 and -1 in the figure below). In a 16-bit PCM system, "1" would correspond to +32,767 and "-1" would correspond to -32,768. To simplify this discussion we will use 1 and -1 to represent the limits of the digital codes. The peak of the sinusoidal waveform is reaching 1.414 and -1.414 which is well beyond the maximum numeric values that fixed-point PCM digital systems are designed to handle.
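The numbers in this description can be reproduced with a few lines of arithmetic. The sketch below assumes a sine wave at one quarter of the sample rate with a 45-degree phase offset, which is the condition under which every sample lands at exactly +1 or -1 while the waveform peaks at 1.414. The constant and function names are illustrative.

```python
import math

FS = 44100.0          # sample rate in Hz
F0 = FS / 4           # 11.025 kHz tone, one quarter of the sample rate
PEAK = math.sqrt(2)   # waveform peak relative to digital full scale (1.414)
PHASE = math.pi / 4   # 45 degrees: samples fall on either side of each peak

def sample(n: int) -> float:
    """Value of the sine wave at sample index n."""
    return PEAK * math.sin(2 * math.pi * F0 * n / FS + PHASE)

samples = [sample(n) for n in range(8)]
max_sample = max(abs(s) for s in samples)   # largest stored sample value
peak_dbfs = 20 * math.log10(PEAK)           # true peak of the waveform

print([round(s, 6) for s in samples])
print(round(max_sample, 6), round(peak_dbfs, 2))
```

Every stored sample sits at full scale, yet the waveform between the samples peaks about 3.01 dB higher.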
The original analog waveform (blue line) can be reconstructed with exact precision by placing an analog lowpass filter at the output of a non-oversampled D/A converter. In contrast, most oversampled D/A converters will clip this signal. The next figure shows a fixed-point 4X interpolator attempting to insert interpolated samples prior to reconstruction. There are 3 clipped samples on each peak of the sine wave. The output of this traditional sigma-delta DAC will resemble a square wave when attempting to reconstruct this high-amplitude sine wave.
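The count of three clipped samples per peak follows directly from the geometry of this test tone. The sketch below is a simplified model, not the filter used in any particular converter: it evaluates the values an ideal 4X interpolator would produce for the sine wave described earlier (a tone at one quarter of the original sample rate, 45-degree phase, peak of 1.414) and then applies a hard limit at +/-1 to stand in for fixed-point saturation. All names are illustrative.

```python
import math

PEAK = math.sqrt(2)   # waveform peak relative to full scale
PHASE = math.pi / 4   # 45-degree phase offset of the test tone
UPSAMPLE = 4          # 4X interpolation
EPS = 1e-9            # tolerance: samples sitting exactly at full scale are not "overs"

def ideal_interpolated(m: int) -> float:
    """Value an ideal interpolator produces at oversampled index m.
    One cycle spans 4 original samples, so 4 * UPSAMPLE oversampled samples."""
    return PEAK * math.sin(2 * math.pi * m / (4 * UPSAMPLE) + PHASE)

def fixed_point_clip(x: float) -> float:
    """Saturate to the +/-1 range used in this discussion."""
    return max(-1.0, min(1.0, x))

one_cycle = [ideal_interpolated(m) for m in range(4 * UPSAMPLE)]
clipped = [fixed_point_clip(x) for x in one_cycle]

overs_positive = sum(1 for x in one_cycle if x > 1.0 + EPS)
overs_negative = sum(1 for x in one_cycle if x < -1.0 - EPS)
print(overs_positive, overs_negative)
```

The three new samples inserted between each pair of full-scale samples around a peak all exceed full scale, so all three are clipped.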
The result of this clipping is best described as an artificial snare or hi-hat sound that is added to the music. This clipping also tends to add an artificial brightness to the apparent frequency response. In our listening tests, sample rate converters with THD+N better than -135 dB (0.000018%) had an audible impact on the sound, and we could not explain this with any of the conventional audio measurements. We can't hear distortion at -135 dB, but we were hearing something! Eventually we discovered that intersample peaks could overload fixed-point DSP processing when interpolating in a reconstruction filter, and the resulting THD was very high (several percent). Once we identified the root cause, the test was easy to perform. We now have an 11.025 kHz test signal, at a sample rate of 44.1 kHz, that contains a clean tone at +3.01 dBFS. Our new DAC2 and DAC3 converters will interpolate without clipping and pass this tone without distortion. The figure below shows a high-headroom 4X interpolator that increases the sample rate by a factor of 4. The interpolator in the DAC2 and DAC3 converters runs at a much higher ratio, but this figure illustrates the process.
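The THD+N figures quoted above can be converted between decibels and percent with the usual amplitude-ratio formula. The helper names below are illustrative, and the 3% value is only a stand-in for "several percent".

```python
import math

def db_to_percent(db: float) -> float:
    """Convert an amplitude ratio in dB to percent."""
    return 100.0 * 10 ** (db / 20.0)

def percent_to_db(percent: float) -> float:
    """Convert an amplitude ratio in percent to dB."""
    return 20.0 * math.log10(percent / 100.0)

print(f"{db_to_percent(-135):.6f} %")   # the -135 dB converter spec
print(f"{percent_to_db(3.0):.1f} dB")   # hypothetical 3% THD during overload
```

On this scale, a converter that measures -135 dB in a conventional test but produces a few percent of distortion on intersample overs is more than 100 dB worse under the overload condition.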
Virtually all other D/A converters will distort. This artifact is probably the primary audible difference between PCM and DSD and it is probably one of the most significant differences between oversampled and non-oversampled "ladder DAC" converters.
Non-oversampled converters are immune to intersample over problems, but they have a different set of problems that often cause audible artifacts. These defects tend to elude detection in basic audio measurements. These "ladder DACs" have linearity problems that are caused by mismatched binary-weighted elements. It is physically impossible to match weighted elements to anything much better than 16-bit accuracy, and this matching changes with temperature. Consequently, all non-oversampled ladder DACs have significant linearity problems. The worst linearity error would normally occur at the zero crossing, but chip manufacturers typically add a DC offset to move this error away from the zero crossing. This greatly improves the way the DAC measures in traditional THD+N tests, and it may provide a slight improvement in the sound. Nevertheless, this trick probably does more for the spec sheet than it does for the listener. If we run an IMD test or a linearity test, these cleverly-hidden defects are exposed. If we fail to run the right tests, we could fail to detect the defects that are inherent in all ladder DACs.
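A toy model shows why mismatched binary-weighted elements produce their largest error at mid-scale. The sketch below assumes a 16-bit offset-binary ladder, so the mid-scale code corresponds to the zero crossing, and a hypothetical mismatch of 50 parts per million on the most significant element only. Real devices have errors on every element, and the figure is not taken from any datasheet.

```python
BITS = 16

def ladder_output(code, weights):
    """Binary-weighted DAC model: sum the weights of the bits set in `code`."""
    return sum(w for bit, w in enumerate(weights) if (code >> bit) & 1)

# Ideal weights in LSB units: 1, 2, 4, ... 32768.
ideal = [float(2 ** bit) for bit in range(BITS)]

# Hypothetical mismatch: the MSB element is 50 ppm too large.
mismatched = list(ideal)
mismatched[BITS - 1] *= 1 + 50e-6

# Major carry at mid-scale: 0111...1 -> 1000...0 swaps every element at once.
mid = 1 << (BITS - 1)
step = ladder_output(mid, mismatched) - ladder_output(mid - 1, mismatched)
dnl_lsb = step - 1.0   # an ideal step is exactly 1 LSB

print(round(step, 4), round(dnl_lsb, 4))
```

In this model a 50 ppm error on a single element turns a 1 LSB step into a step of about 2.6 LSB. Because the error sits at one code transition, every low-level signal that crosses that transition is affected, which is why moving the transition away from the zero crossing with a DC offset changes how the converter measures with small signals.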
I guess you could say that ladder DACs cheat on tests. They do this by clever design. They only get caught when we administer a tougher test. In school, a student may be able to cheat on a multiple-choice test, but they will have a hard time cheating on a written test. Tests should be designed to catch cheaters.
One such test is the Dynamic Range test. In a sense, it was created to prevent cheating on SNR (signal-to-noise ratio) tests. Some early compact disc players had auto-mute circuits that shut off the D/A converter when no data was present. One beneficial result was a significant reduction of output noise when a track was paused or stopped. A second benefit was an outstanding SNR on the spec sheet.
The Dynamic Range test adds a -60 dB tone when measuring the output noise of an audio device. This low-level tone is intended to defeat auto-mute circuits and its presence in the output verifies the validity of the noise portion of the measurement sequence.
Please note that the DC offset and auto-mute circuits described above were not necessarily created to enhance product spec sheets. Both circuits delivered some benefit to the listener. The unfortunate side effect was an overly generous "grade report" on the spec sheet.
Curiously, some circuits have a knack for cheating without human intervention. One-bit sigma-delta converters produce low-level idle tones when they are not playing audio. These low-level tones reduce the SNR of the converter, but they elude detection in the Dynamic Range test. The -60 dB tone, used for the noise measurement, breaks up the idle tones, producing an overly optimistic Dynamic Range measurement. I guess we could say that 1-bit converters cheat on the test that was intended to catch cheaters! What is more impressive is that nobody told them how to cheat.
The lesson from these examples is that we need to do a comprehensive set of tests on audio circuits before concluding that they are defect free. The circuits will cheat if they can!
If we were to run a very comprehensive set of standard audio tests on D/A converters, the best multi-bit sigma-delta DACs would measure much better than the best ladder DACs or 1-bit DSD DACs. Given a choice between DSD (1-bit sigma-delta DAC), non-oversampled PCM (ladder DAC), and oversampled PCM (multi-bit sigma-delta DAC), our measurements would clearly show that the third choice should be the most transparent. Nevertheless, an early prototype of the DAC2 failed the final exam: it failed the listening test. Our listening tests revealed the intersample over problem described earlier. Once the root cause was identified with lab measurements, we were able to fix the prototype and add a new test to our arsenal. If we run this new +3 dBFS intersample over test, traditional multi-bit sigma-delta DACs fail miserably, but our DAC2 and DAC3 converters pass with honors.
The audible defect was not caused by oversampling or by some inherent defect in PCM; it was caused by a lack of headroom in the fixed-point DSP processing. We solved the intersample over problem by reducing the amplitude of the incoming audio signal by 3 dB before oversampling. We also added enough analog headroom to render +3 dBFS peaks without clipping. The results are better than those obtained with ladder DACs or with 1-bit DSD DACs.
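The headroom needed for a given intersample peak follows from the same decibel arithmetic. The sketch below computes what the +3.01 dBFS test tone requires. It does not describe the exact gain structure of the DAC2 or DAC3, which the text gives only as a nominal 3 dB, and the function names are illustrative.

```python
import math

def required_headroom_db(intersample_peak: float) -> float:
    """Attenuation in dB that brings a peak (relative to full scale) down to 1.0."""
    return 20.0 * math.log10(intersample_peak)

def attenuate(x: float, db: float) -> float:
    """Scale a value down by `db` decibels."""
    return x * 10 ** (-db / 20.0)

peak = math.sqrt(2)                    # peak of the 11.025 kHz test tone
needed = required_headroom_db(peak)    # about 3.01 dB
scaled_peak = attenuate(peak, needed)  # peak after attenuation

print(round(needed, 2), round(scaled_peak, 6))
```

Once the signal has been scaled down by this amount before interpolation, the interpolated samples of the test tone no longer exceed the fixed-point limits, so the interpolator has nothing to clip.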
The DAC2 and DAC3 passed the final exam, and as far as we can tell, they haven't cheated on the test. Nevertheless, we keep a close eye on them and on each of our products.
Secrets contributor Sumit Chawla recently caught up with Benchmark’s VP and Chief Designer, John Siau, to get a little more in-depth on several subjects.
Q: "Benchmark is one of the few companies that publishes an extensive set of measurements, but you also balance that with subjective testing. Can you talk about the equipment, the listening room, and the process for subjective testing?"
Q: "Was there ever a time where you learned something from a subjective test that was not captured by measurements?"
Q: "You conducted some listening tests to determine whether distortion in the “First Watt” was audible. What test material did you use for this, and what did you find?"
Q: "The AHB2 amplifier incorporates THX Audio Achromatic Amplifier technology. When and how did the partnership with THX come about?"
Q: "Linear power supplies have been and remain quite popular in high-end devices. You favor switch-mode power supplies. When and why did you make this switch?"
... and more!
Paul Seydor of The Absolute Sound interviews John Siau, VP and chief designer at Benchmark Media Systems. The interview accompanies Paul's review of the LA4 in the December, 2020 issue of TAS.
"At Benchmark, listening is the final exam that determines if a design passes from engineering to production. But since listening tests are never perfect, it’s essential we develop measurements for each artifact we identify in a listening test. An APx555 test set has far more resolution than human hearing, but it has no intelligence. We have to tell it exactly what to measure and how to measure it. When we hear something we cannot measure, we are not doing the right measurements. If we just listen, redesign, then repeat, we may arrive at a solution that just masks the artifact with another less-objectionable artifact. But if we focus on eliminating every artifact that we can measure, we can quickly converge on a solution that approaches sonic transparency. If we can measure an artifact, we don't try to determine if it’s low enough to be inaudible, we simply try to eliminate it."
- John Siau