From MIX Magazine
To the uninitiated, recording an announcer or "voice-over" artist would seem to be relatively simple compared to other things audio. But for those who have done it, it's a creative/technical task not to be taken lightly. Speech sounds are harmonically and dynamically complicated because of the way vocals are produced: by the chest; lungs; diaphragm; larynx; the oral cavity, including the tongue, hard and soft palates, teeth and lips; the nasal cavities; and by the dynamic interaction of all those elements through time. Explosions of air bursting from the mouth can pop the mic, the lips and tongue can sound wet, and "sss" sounds can overmodulate a track.
Voice-over artists understand these factors, and the best know how to use them to produce their own voice character. As professionals, they can be counted on to back off for louder passages, to suppress hard plosives like P and T, and to stay a consistent distance and angle from the business end of the mic. Nonetheless, the engineer on the other side of the glass has to have a keen ear, a good technical understanding of how to capture the voice cleanly and a well-developed sense of how to interact with both the VO artist and the clients.
Mix interviewed five people (see "The VO Panel" sidebar) who record voice-overs and who edit and mix radio, TV commercials and long-format programs, such as documentaries for The History Channel.
What are the defining characteristics of a good voice-over recording?
Michael Mason: VOs have to be very present, which starts with the acoustics of the room. You want an absolutely dead room, and you want it large enough so that any reflected sound has had a chance to travel out a way, and then return. You want it dead because, while it's always possible to make a sound more "live," as yet there's no "dereverb" box.
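Mason's point about room size can be put in rough numbers. The sketch below is only an illustration under stated assumptions: a speed of sound of about 343 m/s, a single reflection off one wall, and inverse-square spreading from a point source. The function names and the example distances are hypothetical, not figures from the panel.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees C
FEET_TO_M = 0.3048

def reflection_delay_ms(direct_ft, reflected_ft):
    """Extra arrival time of a reflection relative to the direct sound, in ms."""
    extra_m = (reflected_ft - direct_ft) * FEET_TO_M
    return 1000.0 * extra_m / SPEED_OF_SOUND_M_S

def reflection_level_db(direct_ft, reflected_ft):
    """Level of the reflection relative to the direct sound, from distance alone.

    Assumes inverse-square spreading and a perfectly reflective wall, so an
    absorptive (dead) wall would push this figure lower still.
    """
    return 20.0 * math.log10(direct_ft / reflected_ft)

# Hypothetical example: mic 0.5 ft from the mouth, wall 5 ft behind the mic,
# so the reflected path is roughly 5.5 ft out plus 5 ft back = 10.5 ft.
delay = reflection_delay_ms(0.5, 10.5)
level = reflection_level_db(0.5, 10.5)
print(f"delay {delay:.1f} ms, level {level:.1f} dB relative to direct sound")
```

The farther the wall, the later and quieter the reflection arrives at the mic, which is the effect a larger room buys even before any absorption is counted.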
James von Buelow: We're looking for people who are good storytellers and who have a voice that's not too sibilant or too dull. Mouth noises and the like can be taken care of during editing, even little clicks in the middle of words. But you've got to start with a VO artist who has a voice that possesses clarity and a pleasant quality.
Joe Casalino: It's clean and quiet, with suppressed mouth noises, not overmodulated and not overly compressed.
Wouter van Herwerden: You want as little external interference as possible: just a clean feed from the mic, which lets you treat it however you want to in post. You don't want the talent to get too close and crowd the mic, for then you risk certain distortions. These include popping, mouth noise and too much modulation of the diaphragm of the mic.
Could you describe your announce studio?
Von Buelow: It has a floating floor and walls, so it's very quiet for a New York booth. It's prefabricated, about 7 by 10 feet with an 8-foot ceiling, so it's fairly small. The air-conditioning piped in there can't be heard when it kicks in. Acoustically, the booth's pretty dead, with fabric-covered walls. It's so dead, in fact, that I'm often surprised when I go in there just how softly the talent is speaking, even when they appear to be loud when heard through the monitors.
Casalino: It's 8 by 12 feet with a 9-foot ceiling. It was custom-built, with 6-inch multilayered walls that float, and with a floating floor and acoustically sealed door. The window facing the control room has triple half-inch panes, so I can work at reasonably high levels. There's also a window in the studio looking uptown toward the Empire State Building.
Van Herwerden: The dimensions are about 10 feet by 25 feet with acoustic panels on the walls and ceilings, as well as carpet on the floor. There are also fixed diffusors to scatter sound and suppress standing waves.
What microphone and preamp combination do you use?
Butler: On the East Coast, a typical mic configuration is a [Neumann] U87 with a hunk of foam over it. I've found this to muffle the sound, so I tend to use as little pop filtering as possible. A nylon screen is about as far as I'll go. I like U87s on females and thin-voiced guys. But I prefer a Sennheiser 416, which has a lot more punch to it. It's my primary VO mic. I hate console mic pre's. I like Focusrite preamps, the Gold Channel from TC Electronic and the Millennia.
Mason: I use, on average, three different mics—a U87, a Sennheiser 416 and a Neumann U89. Basically you want a very quiet mic that allows you to get a lot of noise-free gain out of the mic preamp, which means a gain setting of no more than 45 dB while recording conversational-level speech. With an 87, I don't use the highpass filter, leaving that to the controls on the console. I don't use outboard preamps because the mic preamps in the Euphonix console are awesome. All the EQ and dynamics are just outstanding.
Von Buelow: We do about 95% of our work with a Neumann 89, leaving it flat. I use a Millennia Media Model HV3, with the gain at about 12 o'clock and no filtering.
Casalino: It's a Neumann U87 in cardioid, to a Focusrite Green preamp and compressor, patched into the 02R console, where it's bused directly into the AudioFile.
Van Herwerden: The mic we use most of the time is the Sennheiser 416. We also have a Neumann 87 here. It's very sensitive and has a wide pickup in the cardioid field, so there's more likelihood of it picking up the room acoustic than if you use a short rifle, like the 416. It's not that sonically I prefer the 416 to the Neumann, but it'll give me a cleaner voice sound. We're trying to get a specific vocal sound that will cut through whatever else is going on in the track without having to do a lot of extra processing. We're using a TC Electronic Gold Channel [for the mic pre's]. The Euphonix CS2000 consoles we employ also have preamps, but we prefer the TC Golds. They're pleasing sonically, they have more headroom, and they have some built-in features, which are nice.
How do you position the mic relative to the talent?
Butler: The mic's capsule is right on line with the talent's mouth and parallel to his or her face. I'd say from their lips to the actual capsule is about 4 inches. The pop screen is about 1.5 inches or so from the capsule, and then it's about 1.5 inches from the pop screen to their mouth. If "P pops" are a problem, which they can be with a U87, I'll put it into figure-eight or omni. The broader the pattern, the less popping. Sometimes I'll do that if it's relatively tight-miked, quiet VO where I won't get too much bounce around the room. If the 87 is inverted and comes in from above on a boom, you'll get a slightly brighter pickup than if the mic is used upright.
Mason: On both a U87 and 416, I'll mount the mic on a boom coming in from above. They're generally about 6 inches away. I prefer to have the mic capsule's lower edge on line with their mouth but just above their upper lip. Using the 416, you've got to back up a little more, because it's a shotgun. I use a nylon pop filter to avoid pops. If that doesn't work, I'll angle the mic off to the side to suppress popping, though you have to be careful off-axis, because that does change the sound.
Von Buelow: In long-form work, because they're sitting down, it's maybe 6 to 8 inches from their mouth, and for commercial work, it varies. It will go from very close, 3 to 4 inches, to maybe a foot for a very loud speaker. For the latter, I would tend to use the mic's pad to protect the front end of the mic.
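To get a feel for how much those distances matter, the change in direct-sound level can be estimated with the inverse-square law. This is a rough sketch only: it assumes a point source in a free field, which a human mouth and a directional mic only approximate at close range, and the function name is hypothetical.

```python
import math

def level_change_db(old_distance_in, new_distance_in):
    """Change in direct-sound level when the mic distance changes.

    Assumes inverse-square spreading from a point source; proximity effect
    and the mic's polar pattern are ignored.
    """
    return 20.0 * math.log10(old_distance_in / new_distance_in)

# Distances quoted above: 3 to 4 inches for close work, about a foot
# for a very loud speaker.
for near in (3.0, 4.0):
    print(f"{near:.0f} in -> 12 in: {level_change_db(near, 12.0):.1f} dB")
```

Under these assumptions, backing a loud talent off from 3 or 4 inches to a foot takes roughly 10 to 12 dB off the level reaching the capsule.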
Casalino: It's to the talent's side, turned toward the talent, at about a 40-degree angle from straight on to the mouth. They're not talking directly into it, which can help with "pops." Generally, it's about 6 inches away. I often use a nylon pop filter they work right up against.
Van Herwerden: You try to come in from the side and get it reasonably close without intruding too much on their space. I angle it about 20 degrees from their mouth axis and place it 2 to 4 inches away. I get good presence that way, without excessive danger of pops. Since the acceptance angle of the 416 is probably about 20 to 30 degrees, the artist has to stay "on mic" to maintain consistent results, but that's not a problem with pros.
How do you set up the gain structure of your system when recording a VO track and when doing the mix?
Butler: Because I want to preserve headroom, I record significantly lower, probably 10 dB lower, when recording a VO than I do on the final mix. Voice actors tend to become popular because of a unique harmonic structure in their voice, and part of that package seems to be a fair amount of transient information. That stuff can get clipped off or distorted rather easily. So I tend to record relatively low, with peaks 15 dB below 0 VU, which I could never have done with tape because of noise.
Van Herwerden: During the first rehearsals, you'll get an idea of what kind of signal you're dealing with and adjust the headroom accordingly. The dynamic range isn't all that great: You're working with a 2, 4 or 5 dB range. We operate here, like anywhere else, at a +8 dB peak. We don't record VOs anywhere near that level, because we don't need to with digital systems. As long as we get it down cleanly into the system, we can deal with the odd peak or shout as long as it doesn't exceed that +8 level.
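The decibel figures in these answers are easier to weigh once converted to linear terms. The sketch below is illustrative arithmetic only, using the standard 20·log10 amplitude relationship and the usual figure of about 6 dB per bit for linear PCM; the function names are hypothetical, and nothing here reflects how any panelist's system is actually aligned.

```python
import math

def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a linear amplitude (voltage) ratio."""
    return 10.0 ** (db / 20.0)

def db_to_bits(db):
    """Approximate number of linear-PCM bits spanned by a level difference in dB."""
    return db / (20.0 * math.log10(2.0))  # about 6.02 dB per bit

# Butler's figures: recording about 10 dB lower, with peaks 15 dB below 0 VU.
for drop_db in (10.0, 15.0):
    ratio = db_to_amplitude_ratio(-drop_db)
    print(f"{drop_db:.0f} dB lower: amplitude x{ratio:.3f}, "
          f"about {db_to_bits(drop_db):.1f} bits of range left unused")
```

Recording 15 dB down leaves the waveform at less than a fifth of the amplitude and gives up roughly two and a half bits of range, a trade that a quiet digital recorder tolerates far better than tape did.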
What kind of signal processing do you use during initial recording of a voice-over?
Butler: I tend to always record flat; if I record 60 people a month, I'd bet that 59 would be flat. In the mix, I may add just a sprinkling of EQ, but if you've got your mic placement and choice right, you shouldn't need much. Once in a while, I'll go through a highpass if there's some kind of problem. And if the talent is sibilant, you've got to change the mic. I don't EQ that out, and I almost never use a de-esser. But sometimes the talent has such a wicked "s" that I'll apply some in post, though I never use it while recording. [For dynamics,] I might sometimes run just a stitch of limiting, just a smidgen. Down 2 or 3 dB, max. And I set attack and release times by ear, using the console compressors.
Mason: Overall, an 87 is a little too dull, and it needs some brightness generally in the 5k range, with some cut at 300 Hz or so. Often, I'll use a highpass filter to get rid of some of that subharmonic stuff beginning at 80 Hz, because it just eats up headroom without being heard. Concerning compression, I find it's better to compress a VO at 2-to-1, both when recording and when mixing, than to compress it once at 4-to-1. I prefer a fairly fast attack and a slower release, because I think a fast release tends to be heard. Regarding de-essing, during the mix I'll use the algorithms set up in my Euphonix console dynamics. Or if it's really nasty, I'll throw it into the Pro Tools and use some of the plug-ins to deal with it.
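The arithmetic behind two passes at 2-to-1 versus one at 4-to-1 can be sketched with a static compressor curve. This is a simplified model, not Mason's setup: the threshold value and function name are hypothetical, both passes are assumed to share the same threshold, and attack and release are ignored entirely.

```python
def compress_db(level_db, threshold_db, ratio):
    """Static compressor curve: above the threshold, output rises 1/ratio as fast."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

THRESHOLD_DB = -20.0  # hypothetical threshold, not a setting from the interview

for peak_db in (-20.0, -12.0, -4.0):
    once_at_4 = compress_db(peak_db, THRESHOLD_DB, 4.0)
    first_pass = compress_db(peak_db, THRESHOLD_DB, 2.0)   # e.g. while recording
    second_pass = compress_db(first_pass, THRESHOLD_DB, 2.0)  # e.g. while mixing
    print(f"in {peak_db:6.1f} dB: first 2:1 pass {first_pass:6.1f}, "
          f"second 2:1 pass {second_pass:6.1f}, single 4:1 pass {once_at_4:6.1f}")
```

With identical thresholds, the two 2-to-1 passes land on the same static curve as a single 4-to-1 pass. Each pass does only part of the gain reduction, though, and the model leaves out the time constants, which is presumably where the audible difference Mason describes comes from.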
Von Buelow: I plug it into a channel on an 02R, which I use to boost 3 kHz and 10 kHz about +2 dB to brighten it.
Casalino: On the 02R console, I'll dial in a very steep highpass filter at 94 or 105 Hz and below, so it just goes away. I don't use a whole lot of compression, 2 or 3 dB at the most. But I don't change it a lot and try to concentrate on consistency of microphone position and sound in the booth. I only EQ at the mix stage.
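What "steep" means for a highpass filter can be illustrated with the textbook Butterworth magnitude response. This is a sketch under assumptions: the actual filter shapes in the consoles mentioned are not stated in the interview, the Butterworth form is simply a common reference, and the function name is hypothetical.

```python
import math

def highpass_gain_db(freq_hz, cutoff_hz, order=1):
    """Magnitude response of an nth-order Butterworth highpass, in dB."""
    r = (freq_hz / cutoff_hz) ** order
    return 20.0 * math.log10(r / math.sqrt(1.0 + r * r))

CUTOFF_HZ = 80.0  # the corner Mason mentions; the filter shape is an assumption

for order in (1, 2, 4):
    row = ", ".join(
        f"{f:.0f} Hz {highpass_gain_db(f, CUTOFF_HZ, order):6.1f} dB"
        for f in (40.0, 80.0, 160.0)
    )
    print(f"order {order}: {row}")
```

Every order is about 3 dB down at the corner, but an octave below it a first-order filter removes only about 7 dB while a fourth-order one removes about 24 dB, which is closer to low-end content that "just goes away."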
Van Herwerden: In the normal day-to-day recording sessions, we don't apply any processing at all, for the simple reason that if we have to continue on another day, in another room or with someone else, the voice will sound the same from session to session. Later, during the mix we'll do processing and EQ, but during initial recording absolutely nothing gets added, other than maybe a little compression or limiting.
During the finished mix, how do you handle EQ and compression/limiting of the VO track?
Butler: One of the ways that commercial clients judge the mix is how loud their spot is perceived to be compared to others on the air. I have several stages of compression to achieve this. There will be a small amount of console limiting on the VO input of the mix. Then I might have an 1176 compressor/limiter on an insert, as well. I'll also have just a little bit of bus compression. And last, I'll patch in a TC Electronic Finalizer Plus, which is a 3-band compressor. With that I can give 4 or 5 dB more level to the DAT. I'm trying to maintain levels that don't exceed -7 or -8 on the meter of a Sony 7030 DAT, while achieving an average mix level of +1 on a VU meter. If you can achieve both of those goals, you've got a pretty hot mix.
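Butler's two targets, a capped peak reading and a high average reading, amount to squeezing the gap between peak and average level. That gap is often expressed as a crest factor, sketched below on a test tone. This is a loose illustration: RMS is used as a stand-in for an average reading (a real VU meter responds differently), no alignment between VU and the DAT meter is assumed, and the function names are hypothetical.

```python
import math

def peak_dbfs(samples):
    """Peak level in dB relative to full scale (1.0); assumes a non-silent buffer."""
    return 20.0 * math.log10(max(abs(s) for s in samples))

def rms_dbfs(samples):
    """RMS level in dB relative to full scale; assumes a non-silent buffer."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)

def crest_factor_db(samples):
    """Gap between peak and RMS level; smaller means a denser, 'hotter' signal."""
    return peak_dbfs(samples) - rms_dbfs(samples)

# Test signal: one full cycle of a sine wave at half of full scale.
N = 1000
sine = [0.5 * math.sin(2.0 * math.pi * n / N) for n in range(N)]
print(f"peak {peak_dbfs(sine):.2f} dBFS, rms {rms_dbfs(sine):.2f} dBFS, "
      f"crest factor {crest_factor_db(sine):.2f} dB")
```

Speech typically has a much larger crest factor than a sine wave, and each stage of limiting and compression in a chain like Butler's trims it, letting the average rise while the peaks stay under the ceiling.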
Von Buelow: You tend to end up with 3 to 5 dB of boost at 3.5 kHz, or that area, and then a little boost at 8 to 10 kHz. That midrange and high end really seems to do the trick on television.
Casalino: I only EQ at the mix stage. I'll start at 60 Hz and pull that back to avoid "tubbiness." I'll start lifting the top at 5 kHz maybe, on up to 8k. But I'll avoid 2 to 3 kHz; that can be a little nasty.
Van Herwerden: The talent have a particular kind of vocal quality you're trying to maintain in the mix. With Pro Tools, you can save EQ setups, so I can recall them for the artists they were created for. Specifically, I'd be using a TDM plug-in within the virtual mixing page input channels. The same thing goes for compression and de-essing. My processing is "virtual," not hardware, and the nice thing about that is if the mix has to go to another room, as long as it has the plug-ins, too, they can just load the entire session from our backup CD-R and re-create everything I've done in the original session.
CAN WE ALL AGREE?
Common Technique in Voice-Over Recording
Each panel member has his own distinct approach to VO recording, but there are a few fundamentals that all can agree upon.
The main difference between short-form (commercials) and long-form (documentaries, audio books, etc.) is the total amount of compression used. There’s much more in the case of commercials, so as to make them “loudness competitive” with adjacent spots. Long-form readings are usually done with the talent seated, but there are exceptions because of personal taste (and endurance). Commercial spot VO artists usually stand; the body English makes for a better performance. Generally, standing while reading results in better vocal control because the diaphragm is free to move. Headphones were used by everyone, although Tim Butler felt they contributed to the talent being too concerned with the sound of their own voices.
Scripts are almost always placed on a music stand that’s padded and angled to avoid reflections back into the microphone. Usually the goal is to place the active part of the script high enough to avoid the talent looking down at it and getting off mic.
There were several other areas of complete agreement between everyone interviewed.
• All record to the hard drive of a digital workstation with a DAT backup.
• All have a console with digitally stored control settings to enable recall of session parameters.
• All edit most of their own material, and all perform final mixing on projects.
• All are of the opinion that women’s voices are more likely than men’s to offer sibilance problems.
All the participants had several monitoring options, using Tannoys or Genelecs for large monitors (especially useful for revealing low-end thumps, pops, etc.), NS-10s and Auratones as small speaker references, and some kind of 2- or 3-inch television speaker as the final test of what works on the air. Wouter van Herwerden had some illuminating comments about this last piece of equipment, which he calls “Mr. Crappy.”
“My driving force is narration, so I’ll use him to help establish an EQ for the VO that gives me the sound that I want out of a 2-inch speaker,” he says. “Once I’ve set that, I’ll go to the NS-10s or bigger speakers and start doing my first pass on the mix, referencing everything to my narration track. While doing this, I’ll keep referring back to Mr. Crappy because he’s the final arbitrator of all the stages of our work here, that is until 5.1 takes hold much more widely. At the end of the day, a 2-inch speaker is what it all comes down to.”