The sound a product makes influences customers’ perception of the product. Ensuring that a product will convey just the right brand values is not only an engineering challenge, but involves human psychology as well.
People evaluate the sound of a product subjectively. For example, two coworkers on a noise engineering team could listen to their vacuum product and disagree on whether it sounds pleasant or not. This disagreement would make it difficult to proceed in engineering a better product sound.
An objective and empirical approach to solving this issue is needed. For example, a noise engineering team could use several different objective sound quality metrics: loudness, prominence ratio, tone-to-noise ratio, roughness, and fluctuation strength. Each metric is designed to evaluate a particular aspect of sound (whether it sounds sharp or dull, tonal or broadband, etc.). Usually, a single metric alone cannot predict end user satisfaction with the sound of the product. Rather, a combination of metrics is needed to fully describe the sound.
A jury test, in which a group of people rate sounds, is used to ascertain the exact combination of metrics needed to fully understand the perception of a product’s sound quality. Given the sound preferences from the jurors, a mathematical analysis is performed to determine the combination of objective sound metrics that would predict the jury ratings.
One possible result of a jury test is a “golden equation” (Figure 2): the golden equation is a weighted combination of sound metrics that predicts how the jury will respond to a sound.
With the golden equation in hand, changes can be made to the product’s sound, and the jury results can be predicted, without having to assemble a large pool of people. If executed properly, the golden equation can be thought of as capturing the ‘DNA’ of the desired product sound.
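As a rough illustration, a weighted combination of this kind could be written as a simple linear model. The symbols here are assumptions made for illustration and are not taken from Figure 2: \hat{P} is the predicted jury rating, m_i is the value of the i-th sound metric, w_i is its weight, and w_0 is a constant offset.

```latex
\hat{P} = w_0 + \sum_{i=1}^{N} w_i \, m_i
```

The jury test supplies the data needed to estimate the weights; the actual form of the equation depends on the product and the metrics that turn out to matter.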
How is a jury test executed? What are the main steps?
Jury testing consists of the following key steps (see Figure 3):
1. Measure Product Sound
When measuring the product sound in preparation for a jury test, the recordings should capture the sounds of interest as authentically as possible. Considerations include the type of recording, the recording environment, and product conditions.
The sounds that the jury will judge will either be recordings that the test engineer measures or modified versions of recordings.
To measure sounds for a jury test, ideally two types of measurement equipment are used (Figure 4):
The two measurement devices should be used simultaneously during recording:
The recording environment should also be carefully considered.
The recording environment should be completely silent other than the product of interest. The environment should also accurately reflect typical placement conditions for the product.
For example, if recording sounds from a coffee brewer, it would be wise to put the brewer in an anechoic chamber to ensure the recording does not have any other noise contamination. Coffee brewers are often placed on a hard reflective surface (like a granite counter) and also backed against another hard reflective surface (like a tile wall). Therefore, it may be wise to introduce these reflective surfaces in the anechoic chamber during recording to more accurately replicate operating conditions (Figure 5).
The recording device(s) should be put where the listener would usually be. So, in the brewer example, the recording device(s) should be about head level and a typical distance from the brewer.
Recording Conditions and Benchmarking:
The same conditions should be used for all tests. For example, if recording coffee brewers, the same beans should be used, the same initial water temperature, the same cup, etc.
2. Jury Selection, Attribute Rating, and Training
Selecting the correct personnel for the jury, and ensuring they are properly prepared, is important for the successful execution of a jury test.
Gathering an appropriate jury is just as important as recording the appropriate sounds. Different demographics of people may have varied subjective opinions of a sound sample. For example, in Figure 7, there are two different listeners of the same sound.
These two listeners have different reactions to the same sound.
When selecting a jury, it is important to keep the end user in mind. If you were on a motorcycle exhaust engineering team, you would probably want to gather jurors who own motorcycles or who are interested in purchasing a motorcycle. You would not want to select a juror who is woken up every morning by his neighbor’s revving motorcycle engine.
The way in which the sound is rated is also critical. Continuing the motorcycle example, if the jurors are asked to “Rate this motorcycle for sportiness” versus “Rate this motorcycle for luxuriousness”, different results will be obtained. It is important to define the adjective for the jurors so they know exactly what is meant by “sporty”. Does it mean that the engine has a lot of horsepower? Does it mean that it can accelerate quickly?
Once the jury is gathered, they will need to be trained. They should be familiar with the software as well as the types of sounds they will listen to. It is a good idea to have them listen to a few sound samples before having them take the official test. That way they know how long the samples will be, what type of samples they will listen to, and so on.
The jury should also be comfortable with how the software works so they are not tripped up by the buttons in the official test.
In LMS Test.Lab Jury Testing, it is possible to have jurors take a practice test before taking the actual test. It is also possible to select specific recordings to include in the training session vs in the main test (Figure 8).
Check out the video below for an example of a training session. The training session is to familiarize the jurors with the software, recordings, and test format.
The composition of the jury should be noted. Any relevant information about the jurors, such as experience with the type of product, age, income, or gender, should be recorded.
In LMS Test.Lab Jury Testing, it is possible to gather this demographic data and link it to the jurors.
An example distribution for product experience is shown in Figure 9.
Knowing some background information on the jurors allows for a more complete understanding of their responses. For example, if two sounds are compared, perhaps all jurors younger than 35 years will prefer the first sound and all jurors over 35 years will prefer the second sound. Essentially, by collecting demographic information from the jury, it is possible to determine a link between preference for a particular sound and a demographic.
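A minimal sketch of such a demographic breakdown is shown here. The responses, the group labels, and the function name are made up for illustration, and a juror aged exactly 35 is arbitrarily placed in the older group.

```python
from collections import defaultdict

# Hypothetical paired-comparison responses: (juror age, preferred sound).
responses = [
    (24, "A"), (31, "A"), (29, "A"), (33, "B"),
    (41, "B"), (52, "B"), (47, "A"), (60, "B"),
]


def preference_share_by_age(responses, age_threshold=35):
    """Return, per age group, the fraction of jurors who preferred sound A."""
    counts = defaultdict(lambda: [0, 0])  # group -> [votes for A, total votes]
    for age, choice in responses:
        # Assumption: jurors aged exactly `age_threshold` count as "older".
        group = "younger" if age < age_threshold else "older"
        counts[group][1] += 1
        if choice == "A":
            counts[group][0] += 1
    return {group: votes_a / total for group, (votes_a, total) in counts.items()}


shares = preference_share_by_age(responses)
print(shares)  # {'younger': 0.75, 'older': 0.25}
```

A large gap between the groups, as in this made-up data, would suggest that preference is linked to age and that the groups may need to be analyzed separately.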
3. Play Sound Samples to the Jury and Get Subjective Ratings
The jury test should be well planned, from the selection of the sounds to be played to how the sounds are presented to the jury.
Sound Sample Variation
If certain metrics are thought to be important, the selected sounds should have a wide range of values for that particular metric. If all the values for the metric are close together, it will be impossible to determine if that metric drives jury perception of the sound.
In the top graph of Figure 10 (graph “a”, below), the metric values are too similar to determine if there is a correlation between the metric value and the jury preference. In the bottom two graphs, the metric values are more spaced out. This makes it possible to determine whether there is a correlation (graph “b”, bottom left) or no correlation (graph “c”, bottom right).
If there is no correlation between the metric and the jury result, that means that the metric likely does not drive the jury’s perception of the sound and does not need to be included in the golden equation.
Sound File Preparation
The sounds selected for the jury test can be actual recordings or artificially manipulated sounds. Either or both types of sounds may be included, depending on the objective of the jury test.
Recorded Sound Examples:
Manipulated Sound Examples:
To keep the jurors engaged, the test should not be so long as to fatigue the listeners. Some guidelines for the test:
Long recordings with varied sound content can confuse listeners, as their auditory memory may not be able to retain/comprehend the entire recording. For example, if evaluating the brewing of coffee, instead of recording the entire brew time (several minutes), individual events like filling and discharge (several seconds) can be broken apart and compared.
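As a sketch of how a long recording could be broken into short events, the snippet below slices a sample array by start and end time. The sample rate, the event boundaries, and the function name are hypothetical, and the recording is a placeholder list rather than real audio.

```python
def extract_event(samples, sample_rate_hz, start_s, end_s):
    """Return the samples between start_s and end_s (both in seconds)."""
    start = int(round(start_s * sample_rate_hz))
    end = int(round(end_s * sample_rate_hz))
    return samples[start:end]


# Placeholder recording: 10 seconds of dummy samples at 1 kHz.
sample_rate_hz = 1000
recording = list(range(10 * sample_rate_hz))

# Hypothetical event boundaries, in seconds from the start of the recording.
events = {"filling": (1.0, 4.0), "discharge": (6.5, 9.0)}

clips = {
    name: extract_event(recording, sample_rate_hz, start_s, end_s)
    for name, (start_s, end_s) in events.items()
}

for name, clip in clips.items():
    print(name, len(clip) / sample_rate_hz, "seconds")
```

Each clip can then be presented to the jury as its own sound sample.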
In addition, the following should be considered:
Once the jury test sounds are prepared, a rating scheme needs to be selected.
Jury Test Format
The ratings of the sounds by the jury can be performed in many different ways. The three most popular are:
Paired comparison is perhaps the simplest test type for a novice juror. In a paired comparison test, jurors are presented with two sounds. The juror listens to both sounds and indicates which sound he prefers. Alternatively, a specific question can be posed, for example “Which sound is more powerful?” The juror then listens to the two sounds and selects the more powerful sounding one.
The disadvantage of a paired comparison test is the execution time. Because every sound must be compared with every other sound, the number of pairs, and therefore the execution time, grows roughly with the square of the number of sounds being evaluated. Each question requires two sounds to be played. It is also recommended to do a consistency check: the same sound pair is presented more than once, and a consistent juror should always pick the same sound as the preferred sound of the pair.
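To get a feel for how quickly a paired comparison test grows, the sketch below counts the unique pairs for a given number of sounds. It assumes every sound is compared once with every other sound and that the order within a pair is ignored, which gives n(n-1)/2 pairs for n sounds. The function names and sound labels are hypothetical.

```python
from itertools import combinations


def build_pairs(sounds):
    """List every unordered pair of sounds that a juror would compare."""
    return list(combinations(sounds, 2))


def pair_count(n_sounds):
    """Number of unique pairs for n sounds: n * (n - 1) / 2."""
    return n_sounds * (n_sounds - 1) // 2


pairs = build_pairs(["A", "B", "C", "D"])
print(pairs)

for n in (5, 10, 20):
    print(n, "sounds ->", pair_count(n), "pairs")
```

Doubling the number of sounds roughly quadruples the number of questions, and any pairs repeated for the consistency check add to that total.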
For the category judgment test, each sound is played once. The juror then rates the sound on a sliding scale for particular attributes. For example, after listening to the sound a juror may rate how “powerful” it is on a scale of 1-10.
Naïve listeners may struggle to rate sounds. For example, if a juror is listening to engine noise, he may rate the very first sound as a 10 for “powerful”. However, if the next sound is even more powerful, then he has already maxed out the scale and is unable to accurately rate the remaining sounds. Therefore, category judgment requires trained jurors with strong product knowledge.
The semantic differential test is similar to the category judgment test. However, instead of rating the sound using one adjective, a bipolar pair is presented, for example “weak vs powerful”.
This pair of opposing attributes can help a naïve jury. The test duration is similar to the category judgment test.
4. Objective Analysis
After the jury test has been performed, and all the votes are in, it is time to correlate the subjective results to the objective sound metrics.
To ensure high quality correlation, jury test results should be double-checked first, before attempting correlation! Two checks are done: consistency and concordance.
Consistency and concordance range in value from 0 to 1. The closer to 1, the more consistent or concordant the juror. After running a jury test, the concordance and consistency of each juror can be plotted on a graph like in Figure 16.
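The sketch below shows one plausible way to compute both scores for a paired comparison test. The definitions are assumptions and may differ from what a particular jury testing tool calculates: consistency is taken as the fraction of repeated pairs for which a juror made the same choice both times, and concordance as the fraction of pairs for which a juror agrees with the majority of the other jurors. All data and names are hypothetical.

```python
from collections import Counter


def consistency(first_pass, repeat_pass):
    """Fraction of repeated pairs answered the same way both times."""
    repeated = [pair for pair in repeat_pass if pair in first_pass]
    if not repeated:
        raise ValueError("no repeated pairs to compare")
    same = sum(first_pass[pair] == repeat_pass[pair] for pair in repeated)
    return same / len(repeated)


def concordance(juror, all_choices):
    """Fraction of pairs where `juror` agrees with the majority of the others."""
    agree = 0
    scored = 0
    for pair, choice in all_choices[juror].items():
        others = Counter(
            answers[pair]
            for name, answers in all_choices.items()
            if name != juror and pair in answers
        )
        ranked = others.most_common(2)
        # Skip pairs with no other votes or with a tie among the other jurors.
        if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
            continue
        scored += 1
        agree += choice == ranked[0][0]
    if scored == 0:
        raise ValueError("no pairs with a clear majority")
    return agree / scored


# Hypothetical answers: pair of sounds -> preferred sound.
choices = {
    "juror_1": {("A", "B"): "A", ("A", "C"): "A", ("B", "C"): "B"},
    "juror_2": {("A", "B"): "A", ("A", "C"): "A", ("B", "C"): "C"},
    "juror_3": {("A", "B"): "A", ("A", "C"): "C", ("B", "C"): "B"},
    "juror_4": {("A", "B"): "B", ("A", "C"): "A", ("B", "C"): "B"},
}
# Hypothetical repeat presentation of two pairs to juror_1.
juror_1_repeat = {("A", "B"): "A", ("B", "C"): "C"}

print(consistency(choices["juror_1"], juror_1_repeat))
print(concordance("juror_1", choices))
print(concordance("juror_4", choices))
```

Both functions return values between 0 and 1, so each juror can be placed as a point on a consistency versus concordance plot.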
Explore the different regions of the graph:
After performing the jury test, the sound preferences are known subjectively. In the next step, objective sound metric values will be calculated and correlated to the subjective results.
For each sample, the subjective results of the jury test are tabulated as well as the objective metric values.
The subjective jury preference can be plotted against the objective metrics to see if a strong correlation exists as shown in Figure 18.
Metrics that are correlated with preference (like loudness and fluctuation strength) can be included in the golden equation. Metrics that are not correlated with preference (like sharpness and tonality) should not be included in the golden equation. Use the R^2 value to determine if there is a relationship between a metric and the jury preference.
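As a sketch of this screening step, the snippet below computes R^2 for a straight-line fit of jury preference against a single metric. The metric values and preference scores are invented for illustration, and the function name is hypothetical.

```python
def r_squared(metric_values, preferences):
    """R^2 of a straight-line fit of preference against one metric."""
    n = len(metric_values)
    mean_x = sum(metric_values) / n
    mean_y = sum(preferences) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(metric_values, preferences))
    sxx = sum((x - mean_x) ** 2 for x in metric_values)
    syy = sum((y - mean_y) ** 2 for y in preferences)
    if sxx == 0 or syy == 0:
        # All metric values (or all ratings) are identical: nothing to correlate.
        raise ValueError("values must vary to compute a correlation")
    return sxy ** 2 / (sxx * syy)


# Hypothetical data for five sound samples.
preference = [8.1, 7.0, 5.9, 5.2, 3.8]   # mean jury rating per sample
loudness = [10, 14, 18, 22, 26]          # assumed loudness values (sone)
sharpness = [1.2, 1.8, 1.1, 1.9, 1.5]    # assumed sharpness values (acum)

r2_loudness = r_squared(loudness, preference)
r2_sharpness = r_squared(sharpness, preference)
print(f"loudness  R^2 = {r2_loudness:.2f}")
print(f"sharpness R^2 = {r2_sharpness:.2f}")
```

In this made-up data, loudness gives an R^2 close to 1 and would be kept, while sharpness gives an R^2 close to 0 and would be dropped. The threshold for "strong enough" is a judgment call that depends on the test.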
Using a regression analysis, it is possible to determine the relationship between all of the metrics and the jury preference.
This is the golden equation.
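A minimal sketch of such a regression is shown below, assuming a linear model with a constant term and two metrics. It solves the least-squares normal equations in plain Python so that it runs without extra packages; in practice a numerical library routine would normally be used instead. The data is invented and follows an exact linear relationship so the fitted weights are easy to check, whereas real jury data would include scatter. All names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("metrics are linearly dependent; cannot fit weights")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_golden_equation(metric_rows, preferences):
    """Least-squares weights [w0, w1, ...] for preference = w0 + w1*m1 + ..."""
    design = [[1.0] + list(row) for row in metric_rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * p for r, p in zip(design, preferences)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, metrics):
    """Evaluate the fitted equation for one sound's metric values."""
    return weights[0] + sum(w * m for w, m in zip(weights[1:], metrics))


# Hypothetical samples: (loudness, fluctuation strength) and mean jury rating.
metric_rows = [(10, 0.5), (14, 1.0), (18, 0.4), (22, 1.2), (26, 0.8), (16, 1.5)]
preferences = [8.5, 6.5, 6.7, 4.1, 3.9, 5.0]

weights = fit_golden_equation(metric_rows, preferences)
print("weights:", [round(w, 3) for w in weights])

# Predicted rating for a new, modified sound with assumed metric values.
new_rating = predict(weights, (20, 0.6))
print("predicted rating:", round(new_rating, 2))
```

The fitted weights play the role of the coefficients in the golden equation, and `predict` stands in for evaluating it on a new recording.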
The golden equation uses sound metrics to determine how a jury will react to a sound: will the jury like the sound or not? An engineer could then make slight modifications to a product’s sound, record that sound, calculate sound metrics, feed the values into the golden equation, and determine how a listener would react to the new sound (would the listener like the sound more or less?).
Future test iterations will not require assembling a jury to predict results.