Say What?: How the Auditory System Works
Musical sounds contain a complex combination of individual sinusoidal components, each with a particular amplitude, frequency, and phase relationship to the others. Combining these components creates the characteristic sounds of different instruments and voices. These combinations are known as timbre, the quality that distinguishes the sound of different instruments even when they play the same note at the same loudness. This partly explains why a horn sounds different from a stringed instrument: each produces a specific combination of mathematically related frequencies, known as harmonic overtones, that results in its characteristic sound. (Some sounds, such as bells, contain non–harmonically related overtones.) The timbre of a sound must be preserved in the recording process for the sound to be perceived as natural. Timbre alone is not sufficient to fully discriminate between instrument sounds, however, as the time course of the onset and decay of notes is also characteristic of an instrument. Instruments with similar timbres will still sound different from each other if their attack, sustain, and release characteristics differ.
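As a small worked illustration (not from the text itself), the harmonic overtones of a pitched note fall at integer multiples of its fundamental frequency. The Python sketch below simply lists those multiples; the choice of A3 (220 Hz) as the example note is arbitrary:

```python
# Harmonic overtones of a pitched note fall at integer multiples of the
# fundamental; the relative strengths of these partials shape timbre.

def harmonic_series(fundamental_hz, n_partials):
    """Return the first n_partials harmonic frequencies (including the fundamental)."""
    return [fundamental_hz * k for k in range(1, n_partials + 1)]

# A3 (220 Hz): the fundamental plus the first four overtones
print(harmonic_series(220.0, 5))  # [220.0, 440.0, 660.0, 880.0, 1100.0]
```

A bell-like sound would instead contain partials that are not integer multiples of a single fundamental, which is why bells sound inharmonic.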
We often assume that what we perceive as pitch is exactly equivalent to the vibratory frequency of the sound wave and that what we perceive as loudness is directly proportional to the amplitude of the sound wave's pressure variations. In fact, the operation of our auditory system deviates somewhat from these ideals, and we must factor these deviations into our understanding of the process of hearing. The first stage in the process of hearing, handled by the outer ear (pinna) and ear canal (Figure 4-1), intentionally distorts the incoming pressure wave. The ridges of the ear reflect particular frequencies from certain directions, creating an interference pattern that can be used to extract information about the elevation from which a sound originates. Sounds originating from above are reflected with increased high-frequency content relative to the same sound originating at ear level. Front-to-rear discrimination also depends in part on the shadowing of rear-originating sounds by the pinna. The external auditory meatus, or auditory canal – the guide conducting the sound wave to the eardrum – is a resonant tube that further alters the frequency balance of the sound wave. The resonant frequency falls in the same range as the peak in our sensitivity, around 3 kHz, and creates a maximum boost of about 10 dB. This frequency range conveys much of the information contained in speech; in fact, the wired telephone system transmits frequencies from only 300 to 3400 Hz. The vibrations that finally excite the eardrum thus differ from the original sound pressure variations: they contain an altered balance of frequency components, affecting the timbre of the sound.
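A rough physical model behind the roughly 3 kHz resonance: a tube closed at one end (as the canal is, at the eardrum) resonates at a quarter wavelength, f = c / (4L). The sketch below assumes typical textbook values of about 343 m/s for the speed of sound and about 2.5 cm for canal length; neither figure appears in the excerpt:

```python
# Quarter-wavelength resonance of a tube closed at one end, a rough model
# of the ear canal (closed at the eardrum).
# Assumed values: speed of sound ~343 m/s, canal length ~2.5 cm
# (typical textbook figures, not taken from the excerpt).

def quarter_wave_resonance(speed_of_sound_m_s, tube_length_m):
    """Fundamental resonant frequency of a closed-end tube: f = c / (4 L)."""
    return speed_of_sound_m_s / (4.0 * tube_length_m)

f = quarter_wave_resonance(343.0, 0.025)
print(round(f))  # 3430 (Hz), in the region of our peak sensitivity
```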
The tympanic membrane, or eardrum, is a flattened conical membrane stretched across the inner end of the ear canal. It is exposed to the auditory canal on the outside and in contact with a set of three tiny middle ear bones, the ossicles, on the inside. The pressure on the outside of the tympanic membrane is determined by the sound wave and the static atmospheric pressure. On the inside of the tympanic membrane, the static air pressure in the middle ear is equilibrated through the Eustachian tube to the throat, while the sound vibrations are conducted through the bones to the cochlea. Equilibrating middle ear pressure with outer ear pressure prevents the damping of the tympanic membrane that would result from unequal static forces on opposite sides of the membrane. The mechanical characteristics of the middle ear allow for active control of the transmission efficiency between the eardrum and the cochlea. The tiny muscles and ligaments that connect and suspend the bones can contract, stiffening the connection and drawing the ossicles away from their attachments to the tympanic membrane and the cochlear oval window. This contraction adjusts the sensitivity of the hearing process through the acoustic reflex, which may be activated by loud sounds (protecting the inner ear from possible damage) as well as by the intention to begin vocalizing. The acoustic reflex mechanically reduces the dynamic range of the input to the cochlea, much like a compressor or limiter. The time course of the reflex's activation and release can guide the choice of attack and release settings for electronic compression so that it sounds natural.
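The compressor analogy can be made concrete with a minimal digital compressor sketch. This assumes a simple peak-envelope design with separate attack and release time constants, loosely analogous to the reflex's activation and release times; all parameter values are illustrative and not taken from the text:

```python
import math

# Minimal peak-style compressor sketch: gain reduction above a threshold,
# smoothed by separate attack and release time constants (analogous to the
# acoustic reflex's activation and release). Illustrative only.

def compress(samples, sample_rate, threshold, ratio, attack_s, release_s):
    attack_coef = math.exp(-1.0 / (sample_rate * attack_s))
    release_coef = math.exp(-1.0 / (sample_rate * release_s))
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # Envelope follower: fast rise (attack), slow fall (release)
        coef = attack_coef if level > env else release_coef
        env = coef * env + (1.0 - coef) * level
        # Above threshold, the excess is reduced by the compression ratio
        if env > threshold:
            gain = (threshold + (env - threshold) / ratio) / env
        else:
            gain = 1.0
        out.append(x * gain)
    return out
```

With a slow attack, the start of a loud passage passes through before the gain settles, much as a sudden loud sound reaches the cochlea before the reflex engages.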
The primary function of the middle ear bones is to mechanically amplify the airborne vibrations in preparation for their transfer to a liquid medium. Because liquids are denser than gases and less dense than solids like bone, we encounter a potential problem when converting the energy in one medium to energy in another: the systems require different amounts of force to drive them. The bones focus the vibrations of the relatively large eardrum, deliver them efficiently to the small oval window of the cochlea, and protect the cochlea from excessive input. They act as an impedance converter, efficiently coupling the low-impedance air pressure variations to the higher-impedance liquid pressure variations inside the cochlea.
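For a sense of scale, a common textbook estimate (the specific figures are not given in the excerpt) combines the eardrum-to-oval-window area ratio with the lever action of the ossicles to approximate the middle ear's pressure gain:

```python
import math

# Rough estimate of the middle ear's pressure gain from two mechanisms,
# using typical textbook values (assumed, not from the excerpt):
#   1. Area ratio: the eardrum (~55 mm^2) concentrates force onto the
#      much smaller oval window (~3.2 mm^2).
#   2. Lever action of the ossicular chain (~1.3 : 1).

eardrum_area_mm2 = 55.0
oval_window_area_mm2 = 3.2
lever_ratio = 1.3

pressure_gain = (eardrum_area_mm2 / oval_window_area_mm2) * lever_ratio
gain_db = 20.0 * math.log10(pressure_gain)

print(f"pressure gain ~{pressure_gain:.1f}x ({gain_db:.0f} dB)")
```

This gain of a few tens of decibels is roughly what is needed to offset the impedance mismatch between air and the cochlear fluid.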
Because the cochlea is stimulated by mechanical vibrations, it may be activated by vibrations of the surrounding temporal bone that do not come through the ear canal, an effect known as bone conduction. Although the strength of bone conduction is well below that of sounds conducted through the middle ear bones, it is still audible. This partly explains why our recorded voices sound different from what we hear while vocalizing: recordings do not contain the internally conducted sound we hear through bone conduction.
The cochlea (Figure 4-2) is a dual-purpose structure: it converts mechanical vibrations into neuronal electrical signals and separates the frequency content of the incoming sound into discrete frequency bands. It functions like a spectrum analyzer, a device that breaks sounds or other signals into their discrete frequency components. It is, however, an imperfect analyzer, and there is some overlap between the bands whereby strong signals in one band slightly stimulate adjacent ones, creating harmonic distortion. It is up to the brain to sort out the raw data from the cochlea and produce the sensation we call hearing.
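To illustrate what a spectrum analyzer does, here is a toy discrete Fourier transform in Python that separates a two-component test signal into frequency bins. This is an analogy for the idea of frequency separation only, not a model of the cochlea, and a real analyzer would use the FFT rather than this naive sum:

```python
import cmath
import math

# Toy spectrum analyzer: a naive discrete Fourier transform that splits a
# sampled signal into frequency components. Illustrative only; real
# analyzers use the FFT for efficiency.

def dft_magnitudes(samples):
    """Return normalized magnitudes for bins 0 .. n/2 - 1."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) / n
            for k in range(n // 2)]

# 64 samples of a test signal with components at bins 5 and 12
n = 64
signal = [math.sin(2 * math.pi * 5 * t / n)
          + 0.5 * math.sin(2 * math.pi * 12 * t / n)
          for t in range(n)]
mags = dft_magnitudes(signal)
# The two strongest bins (5 and 12) correspond to the two sinusoids
```

The overlap between adjacent cochlear bands mentioned above has no counterpart in this idealized transform, which separates integer-bin components perfectly.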
Excerpt from The Science of Sound Recording by Jay Kadis. © 2012 Taylor and Francis Group. All Rights Reserved.
About the Author
Jay Kadis is a Lecturer and Audio Engineer at the Center for Computer Research in Music and Acoustics, Stanford University. He has written and performed with several bands, including Urban Renewal and Offbeats, built home studios, recorded and produced dozens of albums, and designed electronic devices for neurological research and sound recording.
About the Book
The Science of Sound Recording helps you build a basic foundation of scientific principles, explaining how recording really works. Packed with valuable must-know information, illustrations, and worked-through example equations, this book introduces the theory behind sound recording practices in a logical and practical way, with an emphasis on the concepts of measurement as they relate to sound recording, the physical principles of mechanics and acoustics, the biophysics of hearing, an introduction to electronics, analog and digital recording theory, and how science informs mixing techniques.