How to Clean Up Your Audio for Podcasting or Videos

Ryuzaki

Presentation is everything. In the past, when podcasting first hit the scene and YouTube was still growing, you could get away with amateur production quality. But the competition has changed tremendously, and now you need to compete with full television crews and recording studios.

You guys may not know this, but signal processing is one of my main skills, perhaps equal to SEO and Web Development. I've been doing it longer than I've done SEO. I've spent decades in engineering labs, on the job, and in recording studios doing this specific act of cleaning up audio and other electrical signals.

So I'm here to help you enhance your game with the new Studio section of BuSo.

The Problem
You've likely noticed that your audio quality simply isn't up to par with other podcasters and YouTubers. People think the answer is to buy better gear. That can be an issue if you're using complete trash, but the reality is that you lack something they have, and that is Post-Processing.

Post-Processing refers to what happens after the audio or video is recorded. While you edit the video or audio together and then press upload, others are engaging in Signal Processing. And THAT is the core difference.

People spend their whole lives mastering this, and entire books have been written on the topic. We don't have time for that. What you need is the 20% of the work that gives you 80% of the benefits and boosts your audio quality high enough that nobody will think twice about whether you're a professional.

The Solution
If people struggle to hear what you say due to volume fluctuations or the quality of your recording, you're losing money. Let's solve that.

Let me explain the signal processing options to you here:
  • Do it live as you record with software
  • Do it live as you record with hardware (my preferred method)
  • Do it afterwards with software
  • Do it afterwards with hardware
Software and hardware are equally good, so don't fall into the trap of thinking you have to buy a bunch of gear. But some of us are hands-on and can think about this more clearly that way. In fact, I have an entire rack of signal processing gear right next to me. That's how I roll. That's how you see guys like Joe Rogan roll when you get a wide enough view of their table.

Your video recording software and cruddy free audio recording software likely aren't going to cut it. You want professional software like Final Cut Pro, Logic Pro, Pro Tools, etc. You need access to either hardware or software plugins, and the free crap like iMovie doesn't have them. GarageBand does.

However you plan to do it, it's all the same in the end. And the methods are the same. Those methods are, in this order:

Equalization
Equalization is the act of taking your raw audio signal and boosting or cutting the volume in specific frequency ranges.

Some common problems you encounter are:
  • Booms and static in the lowest frequencies
  • Hiss in the highest frequencies
  • Muddy sounds in the low-mids due to the acoustics of your room
  • Bad definition in your vocals due to the acoustics of your room, microphone, preamplifier, etc.
I'll talk about microphones and preamplifiers in a future post. I'll talk about acoustic treatment too. But for now, let's focus on what you have and fixing that first.

What you need is either a hardware or software plugin version of a Parametric Equalizer. This gives you several "bands" (options to target specific portions of the frequency spectrum) that you can control individually to boost or cut the volumes in those areas smoothly.

Let me whip open Logic Pro X and show you:

[Image: INyoFOq.png, screenshot of a parametric equalizer in Logic Pro X]

This is a parametric equalizer. The different color numbers on the bottom are the bands. This is an 8-band EQ with a low-shelf and high-shelf at the extreme ends. I set it up for what I would suspect a podcaster or video creator in a crappy room would need.

The ideal settings will change for every recording you do, though they'll stay roughly the same once you figure out what you need. If you set it up to cover most recordings and save a template with it, you can use the same settings every time and save yourself tons of time and effort in boosting your quality.

So what's going on here?
  • The low roll-off at around 50 Hz drops off by 48 dB per octave in the sub-bass region. This blocks out tons of noise from your air conditioner, mic boom, your footsteps, noises from your desk, cars outside, plosives from when you blow bursts of air at the mic while saying "p" and "b" syllables, etc. This in itself makes a huge difference.
  • Around 250 Hz I have a 5 dB cut because this is where bass waves bounce around a room and add more and more bass to your recordings. Without this, you'll sound like you're recording in a wooden box (because you are, that's what a room is).
  • Around 2 kHz I boosted about 2 dB. You can try more or less. The fundamental frequencies of the voice sit much lower (roughly 85 to 255 Hz for adult speech), but the harmonics and consonants that make speech intelligible reside around here. For female voices the sweet spot is often a bit higher. Boosting this can add clarity.
  • Around 5 kHz is where "presence" resides in the human voice. It adds a sense of hardness and firmness to recordings that helps the ear latch onto what's being said clearly.
  • Around 17 kHz I added a high roll-off. Piercing hiss and sibilance from when you blow bursts of air at the mic while saying "t" and "s" syllables all reside up here for the most part. Adults don't often hear frequencies this high, but young people do, and it can absolutely ruin a recording.
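
If you want to see what a couple of those bands do numerically, here's a rough Python sketch. It is not what Logic Pro does under the hood. It uses the widely published "Audio EQ Cookbook" biquad formulas, and the 48 kHz sample rate, the Q values, and the function names (`peaking_band`, `high_pass`, `response_db`) are all my own assumptions for illustration. Note that a single second-order high-pass like this one only rolls off at about 12 dB per octave, so it's a gentler stand-in for the 48 dB per octave slope described above.

```python
import cmath
import math

SAMPLE_RATE = 48_000  # assumed sample rate in Hz


def peaking_band(freq_hz, gain_db, q=1.0, fs=SAMPLE_RATE):
    """Biquad coefficients (b, a) for one bell-shaped boost or cut."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * freq_hz / fs
    alpha = math.sin(w0) / (2 * q)
    b = (1 + alpha * amp, -2 * math.cos(w0), 1 - alpha * amp)
    a = (1 + alpha / amp, -2 * math.cos(w0), 1 - alpha / amp)
    return b, a


def high_pass(freq_hz, q=1 / math.sqrt(2), fs=SAMPLE_RATE):
    """Biquad coefficients (b, a) for a second-order high-pass filter."""
    w0 = 2 * math.pi * freq_hz / fs
    alpha = math.sin(w0) / (2 * q)
    c = math.cos(w0)
    b = ((1 + c) / 2, -(1 + c), (1 + c) / 2)
    a = (1 + alpha, -2 * c, 1 - alpha)
    return b, a


def response_db(coeffs, freq_hz, fs=SAMPLE_RATE):
    """Gain of a biquad, in dB, at a single frequency."""
    b, a = coeffs
    z1 = cmath.exp(-1j * 2 * math.pi * freq_hz / fs)  # this is z^-1
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20 * math.log10(abs(num / den))


# Three of the moves from the list above (Q values are guesses).
bands = {
    "rumble filter (50 Hz high-pass)": (high_pass(50), 50),
    "mud cut (250 Hz, -5 dB)": (peaking_band(250, -5.0), 250),
    "clarity boost (2 kHz, +2 dB)": (peaking_band(2000, 2.0), 2000),
}

for name, (coeffs, freq) in bands.items():
    print(f"{name}: {response_db(coeffs, freq):+.2f} dB at {freq} Hz")
```

Each band only reports its gain at its own center or corner frequency here. In a real EQ the bands are chained, so their curves add together in dB.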

If you don't do these things and have 2 or 3 layers of audio going at once (like mics for each speaker) all of the bad noises accumulate into a muddy mess. The difference this makes alone is monumental.

In the next post I'll be talking about compression, what it does, and why you need it. It's the other 50% of your 80% quality boost.
 
Compression
Another thing you've probably noticed is that your volume wavers far more than professional recordings do. Sometimes it's too quiet, sometimes it's too loud. Your listeners hate reaching for the volume control over and over again.

You can solve this with a type of signal processing called compression. Setting it up is kind of complicated but once you understand it, it's a game changer.

The basic idea is that you set what's called a threshold, and any parts of your recording that get louder than that get turned down automatically by a certain amount. That means the peaks aren't as loud, so once you bring the overall level back up, the quiet parts are less quiet too. The range between the loudest and quietest parts shrinks until your listeners never have to touch the volume knob. It's just perfect.

You really only need to worry about four settings:
  • Threshold - The volume point where anything above that gets turned down in amplitude
  • Ratio - How much the volume gets turned down when it passes the threshold
  • Attack - How fast the volume starts turning down once it passes the threshold
  • Release - How fast the volume stops turning down once it goes below the threshold
That's it. And you really only need to worry about two of these, because you're going to set the fastest attack you can and the fastest release you can for recordings where you're just speaking normally.

All that's left is the Threshold and Ratio. But where should we set those? Set the Ratio around 5:1 (five-to-one). That means for every 5 dB that goes over the threshold, only 1 dB pops out. It gets turned down by 4 dB. For every 10 dB that goes over, only 2 dB pops out.
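
Here's that ratio arithmetic as a tiny Python sketch, in case seeing it as a formula helps. The function names are made up for this example, and the -30 dB threshold is just a placeholder value. It only covers the static threshold and ratio math, not attack or release.

```python
def compressed_level_db(input_db, threshold_db, ratio):
    """Output level for a given input level, ignoring attack and release."""
    if input_db <= threshold_db:
        return input_db  # below the threshold nothing is touched
    # Only 1/ratio of the overshoot is allowed through.
    return threshold_db + (input_db - threshold_db) / ratio


def gain_reduction_db(input_db, threshold_db, ratio):
    """How many dB the compressor turns the signal down."""
    return input_db - compressed_level_db(input_db, threshold_db, ratio)


THRESHOLD_DB = -30.0  # placeholder threshold
RATIO = 5.0           # the 5:1 starting point

for overshoot in (5.0, 10.0):
    level_in = THRESHOLD_DB + overshoot
    level_out = compressed_level_db(level_in, THRESHOLD_DB, RATIO)
    reduction = gain_reduction_db(level_in, THRESHOLD_DB, RATIO)
    print(f"{overshoot:.0f} dB over -> {level_out - THRESHOLD_DB:.0f} dB over, "
          f"turned down by {reduction:.0f} dB")
```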

You can try more or less, but 5:1 is a great starting point and probably perfect to be honest. That makes setting the threshold much easier.

Choosing the threshold requires you to listen closely. If you put it too low, you'll squish the audio and it will sound unnatural. If you set it too high, you won't squish it enough and still have too much volume variance.

The other problem with this is that you can't always "set and forget" it like you might with an EQ template, because your recording will come into the computer at different volumes.

One trick I use is to run a compressor "live as I'm recording" at low levels and then compress harder again in post-processing. This helps me predict the incoming volume levels (along with setting the preamplifier up right) so that I can get closer to a "set and forget" situation.

Once you get used to using a compressor, the real "metric" you want to start looking at is the amount of Gain Reduction you're applying to the dynamic range.

What does this look like? Back to Logic Pro X:

[Image: Ggq3XH8.png, screenshot of a compressor in Logic Pro X]

In this image I set a 5:1 ratio, 1 millisecond attack, 1 ms release, and a threshold of -30 dB. That got me around -6.5 dB of gain reduction. So there's 6.5 dB less dynamic range, meaning the volume varies that much less. This is just an example. You may need a lot more in your case.
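
To make the four settings concrete, here's a bare-bones Python sketch of a compressor using the same numbers (5:1, 1 ms attack, 1 ms release, -30 dB threshold). This is a simplified textbook-style design, not how the Logic Pro compressor works internally. The `compress` function, the 48 kHz sample rate, and the test tone are assumptions for illustration, so don't expect it to reproduce the 6.5 dB reading from the screenshot.

```python
import math


def compress(samples, threshold_db=-30.0, ratio=5.0,
             attack_ms=1.0, release_ms=1.0, sample_rate=48_000):
    """Return (compressed samples, gain reduction in dB per sample)."""
    # One-pole smoothing coefficients derived from the attack/release times.
    attack = math.exp(-1.0 / (attack_ms / 1000 * sample_rate))
    release = math.exp(-1.0 / (release_ms / 1000 * sample_rate))
    reduction_db = 0.0
    out, meter = [], []
    for x in samples:
        level_db = 20 * math.log10(max(abs(x), 1e-9))  # avoid log of zero
        over_db = max(level_db - threshold_db, 0.0)
        target_db = over_db * (1 - 1 / ratio)  # how much we want to turn down
        # Attack applies while the reduction is growing, release while it shrinks.
        coeff = attack if target_db > reduction_db else release
        reduction_db = coeff * reduction_db + (1 - coeff) * target_db
        out.append(x * 10 ** (-reduction_db / 20))
        meter.append(reduction_db)
    return out, meter


# A made-up one second test tone: 220 Hz sine peaking around -14 dB.
tone = [0.2 * math.sin(2 * math.pi * 220 * n / 48_000) for n in range(48_000)]
compressed, meter = compress(tone)
print(f"peak gain reduction: {max(meter):.1f} dB")
```

The `meter` list is the Gain Reduction figure mentioned earlier, tracked sample by sample. With attack and release this fast, the reduction follows the waveform itself very closely, which is fine for a sketch but is one reason real compressors usually measure a smoothed level instead.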

Notice I didn't use the Make Up Gain, but since I'm removing 6.5 dB of volume, I might add it back with that knob, or use it to turn the audio up so that the highest peak reaches just below zero decibels. That way my volume level matches what most people expect when they press play on YouTube or their iPhone.
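
If you'd rather calculate the make-up gain than eyeball it, here's a small Python sketch of the "highest peak just below zero" approach. The -1 dB target and the function names are my own assumptions, and the sample values are invented. It assumes samples are floats where 1.0 is full scale (0 dB).

```python
import math


def makeup_gain_db(samples, target_peak_db=-1.0):
    """Gain in dB that puts the loudest sample at the target peak level."""
    peak = max(abs(x) for x in samples)
    if peak == 0:
        return 0.0  # pure silence, nothing to raise
    return target_peak_db - 20 * math.log10(peak)


def apply_gain(samples, gain_db):
    """Scale every sample by the same gain."""
    factor = 10 ** (gain_db / 20)
    return [x * factor for x in samples]


# Invented post-compression samples; the loudest one is at about -12 dB.
compressed = [0.10, -0.25, 0.05, 0.20]
gain = makeup_gain_db(compressed)
louder = apply_gain(compressed, gain)
print(f"make-up gain: {gain:+.1f} dB")
print(f"new peak: {20 * math.log10(max(abs(x) for x in louder)):+.1f} dB")
```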

----

Anyways, this is just a start of the discussion. I'm happy to answer any questions in any depth about it. I can talk for days about microphones, preamplifiers, any effects, how to route all these signals around, what equipment is required at minimum, and so forth. I made a lot of money doing this professionally before I got into SEO, and I'm happy to share the knowledge with the builders. Just ask!
 
Thanks for the new forum and for the audio info, @Ryuzaki. I'm currently getting into video as a hobby and, hopefully, soon to be using it to add value to projects.

Going on your 80/20 rule, if you are using decent standard non-linear video editing software (in my case, Vegas Pro, which I believe was developed from an audio background) do you think there is a tremendous amount to be gained from adding portable hardware such as a Tascam DR-05X? Obviously better equipment tends to give better results but would there be a major difference if you can already do the compression in post?
(This would be for lightweight outdoor videography. For potential indoor stuff such as voiceover, I have a Yeti Nano mike.)
 
@ToffeeLa, before buying new and "better" gear, I'd first tell you to make sure you're maximizing the performance of your existing gear. Proper mixing (mainly EQ and compression) can make a $15 USB mic outperform a $500 professional mic and preamp combo that has no signal processing applied and is using poor analog-to-digital conversion, etc.

The mileage you can get out of EQ and compression is astounding. I'm not sure what kind of recording equipment you're currently using, but I'm also not sure it matters, because you can't get a proper comparison between it and the Tascam until you've applied the post-processing techniques mentioned above.

If you do that and you're not satisfied, you may consider an upgrade. But I'd push what you have to the limit first. You'll be surprised, I'm sure of it.
 
This is some really good information and will get you a long way to making your audio sound great.

As I create so much content, I rely on VST plugins to automate a lot of this process. I use iZotope's range of plugins to clean up, EQ and master every video I output. I hate the AI bandwagon that everyone jumps on, but their stuff really does do some magic, and in all honesty it's better than I could produce spending 10 times as long trying to do it all manually.

-----

I just got an email saying that iZotope are doing a sweet deal on their RX7 Elements plugin - only $29. That's $100 off. This has some great plugins like Voice Denoise, De-Clip and De-Hum.

https://www.izotope.com/en/shop/rx-elements.html
 