MP3 Is Not A Four Letter Word

Well, it’s not. It’s actually two letters and a number, technically speaking.

However, most audiophiles, or at least those who purport that their never-ending quest for fidelity is a holy one, curse the MP3 and blame it for single-handedly destroying modern music. Why? Well, the usual line of thinking goes as follows: since MP3s are lossy, they can never sound as good as the original; yet because the format and its ilk are so ubiquitous, fans have gotten used to their substandard fidelity and thus don’t understand the plight of the budding audiophile. As a result, audiophiles around the world have taken up arms (Say hello to my little friend… -Dave) to raise the bar on fidelity and publicly denounce the evil that is the MP3.

There is certainly some truth to the above, but the whole story is a lot more complicated, and certainly a lot less black and white, than audiophiles would like you to believe. As I stated above, MP3 mainly gets a bad rap because it is a form of lossy compression, but what exactly does that mean? Well, first off, it is certainly not the same thing as dynamic range compression, which audiophiles seem to constantly confuse with lossy compression. So in order to truly understand how MP3s work, we need to first tackle the “compression” part, which is really short for data compression, or the process of conveying information using fewer bits, and then work our way back to the “lossy” part.

Data compression has its roots in information theory, a discipline that was practically created overnight when Claude Shannon published his seminal 1948 paper, “A Mathematical Theory of Communication.” The paper formalized the concept of information entropy, a measure of how much information a message actually carries (and, by extension, how much of the message is redundant). It also introduced what is now known as the noisy-channel coding theorem, which states that for any given level of noise in a communication channel, it is still possible to deliver nearly error-free information through it up to a maximum rate. If Shannon’s name sounds vaguely familiar, that’s because it is also attached to the Nyquist-Shannon sampling theorem that underpins digital audio.
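
To make information entropy a little less abstract, here is a minimal Python sketch that measures it for a string of bytes. The function name and the two sample messages are my own, picked purely for illustration. The less information each byte carries on average, the more redundancy there is for a compressor to squeeze out.

```python
import math
from collections import Counter

def entropy_bits_per_byte(message: bytes) -> float:
    """Shannon entropy H = sum(p * log2(1/p)) over the byte values in message."""
    if not message:
        return 0.0
    total = len(message)
    counts = Counter(message)
    # p = n / total, so log2(1 / p) = log2(total / n)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

repetitive = b"aaaaaaaa"   # one symbol repeated: nothing new in each byte
varied = b"abcdabcd"       # four equally likely symbols

print(entropy_bits_per_byte(repetitive))  # 0.0 bits per byte
print(entropy_bits_per_byte(varied))      # 2.0 bits per byte
```

Each byte is stored in 8 bits, so even the “varied” message above is carrying far less information than the space it takes up.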

Data compression can be further classified as either lossy or lossless. Lossless encoding schemes preserve every single bit of information that was conveyed in the original message. They do this by exploiting a message’s intrinsic redundancy, re-encoding the redundant bits more efficiently. When the message is decoded, however, all the bits, including the redundant ones, are restored intact. A great example is when you buy music off of Bandcamp, which is delivered to you as a downloadable zip archive. Since zip archives compress the files contained within them losslessly, you can rest assured that when you uncompress those files every single brutal bit will be accurately accounted for.
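
If you want to see the lossless round trip for yourself, here is a small Python sketch using the standard library’s zlib module, which implements DEFLATE, the same algorithm zip archives typically use. The payload is made up and deliberately repetitive so that there is plenty of redundancy to exploit.

```python
import zlib

def round_trip(data: bytes):
    """Compress data with DEFLATE, then decompress it again."""
    compressed = zlib.compress(data, 9)   # 9 = highest compression level
    restored = zlib.decompress(compressed)
    return compressed, restored

original = b"BRUTAL " * 1000              # made-up, highly redundant payload
compressed, restored = round_trip(original)

print(len(original), "bytes before,", len(compressed), "bytes after")
print("bit-perfect after decompression:", restored == original)
```

The compressed copy is much smaller, yet the decompressed copy is identical to the original byte for byte. That is the whole promise of lossless compression.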

But MP3 is lossy, which means it analyzes the music and throws out information that it deems “unnecessary.” But wait a minute, aren’t all samples created equal? How can it possibly know which bits of brutality are necessary and which ones are unnecessary? And I’m not talking about Sunn O))) here either. Well, my friends, that leads us to the subject of psychoacoustics. Psycho-who-stics? Stay with me.

Psychoacoustics is the study of how we perceive and respond to sound. If you really think about it, the sensation of hearing death metal is not just waves hitting our ears, but actually our ears collecting those waves and creating electrochemical signals for our brains to interpret. And guess what? Our hearing is not always perfect. In fact, it’s actually far from it.

Depending on the frequency, intensity (loudness), and location (phase) of the sound in question, our ears may or may not perceive all the spectral content present. In fact, our ability to distinguish two different frequencies when played simultaneously is defined as our ear’s frequency resolution, and in humans, it’s about 2 Hz give or take. Psychoacousticians classify this inability to perceive frequencies as a form of auditory masking, and it’s not the only one. Not only does our hearing have several types of auditory masking in the frequency domain, but we also have what is called temporal masking, which is our inability to hear distinct sounds that occur too close together in time. For example, a really loud sound can mask a softer one not only when both are played simultaneously, but also when the softer one arrives within a few milliseconds of it. Put a long enough gap between them, though, and you’ll be able to hear both.

So with the above in mind, you now have a feel for how the MP3 encoder may deem some samples “unnecessary.” I’ve oversimplified a lot for brevity’s sake, but in essence, when you feed a file to the MP3 encoder it will analyze the metal in question, determine how we will perceive the music using its built-in psychoacoustic model, and then start removing information that the model claims we can’t possibly hear anyway. Believe it or not, the MP3 encoder also employs a form of lossless data compression (Huffman coding) on top of its initial lossy pass in order to reduce the file’s size even further. Amazing.
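
For the curious, here is a toy Python sketch of that two-stage idea: a lossy pass that discards what a “model” says we won’t miss, followed by a lossless pass. To be clear, this is not how a real MP3 encoder works. The naive Fourier transform, the crude loudness threshold standing in for the psychoacoustic model, the function names, and the test signal are all assumptions of mine, chosen to keep the example short.

```python
import cmath
import math
import struct
import zlib

def dft(samples):
    """Naive discrete Fourier transform: time-domain samples -> frequency bins."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def inverse_dft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]

def toy_lossy_pass(samples, keep_ratio=0.05):
    """Lossy step: zero every bin quieter than keep_ratio times the loudest bin.

    The threshold is a made-up stand-in for a real psychoacoustic model.
    """
    spectrum = dft(samples)
    threshold = keep_ratio * max(abs(c) for c in spectrum)
    return [c if abs(c) >= threshold else 0j for c in spectrum]

def lossless_pass(spectrum):
    """Lossless step: serialize the bins, then DEFLATE them with zlib."""
    raw = b"".join(struct.pack("<dd", c.real, c.imag) for c in spectrum)
    return raw, zlib.compress(raw, 9)

N = 256
# A loud tone (8 cycles) plus a much quieter one (9 cycles) right next to it.
signal = [
    math.sin(2 * math.pi * 8 * t / N) + 0.01 * math.sin(2 * math.pi * 9 * t / N)
    for t in range(N)
]

kept_spectrum = toy_lossy_pass(signal)
kept_bins = sum(1 for c in kept_spectrum if c != 0)
raw, packed = lossless_pass(kept_spectrum)
decoded = inverse_dft(kept_spectrum)
max_error = max(abs(a - b) for a, b in zip(signal, decoded))

print(f"bins kept: {kept_bins} of {N}")
print(f"bytes before/after lossless pass: {len(raw)} / {len(packed)}")
print(f"largest sample error after decoding: {max_error:.4f}")
```

The quiet tone sitting next to the loud one gets thrown away, the decoded signal differs from the original only by that discarded tone, and the mostly empty spectrum that remains compresses very well. Once the lossy pass has run, nothing downstream can bring the quiet tone back.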

Yet if the MP3 is such a marvelous engineering accomplishment, why, oh why, are audiophiles always berating it? Simple. It takes all the ego out of critical listening. The MP3 encoder doesn’t care about the listener’s pedigree or how expensive the playback gear is. It has a model that has been scientifically vetted, and it ruthlessly employs that model as it sees fit. And that model knows that audiophile or not, our hearing is inherently flawed, and it takes great advantage of that simple fact. In fact, the MP3’s psychoacoustic model is so good that listening tests have shown that we can’t really hear the difference between high bit-rate MP3s and CDs.

Now don’t get me wrong: even though MP3’s psychoacoustic model is not configurable, the number of bits you allow for encoding is. And if you don’t give the encoder enough bits to store information, then depending on the spectral content of the metal at hand, it will cause audible artifacts that were not part of the original recording. Case in point, a lot of the MP3-encoded promos I receive sound terrible because there are no industry-wide standards and labels purposely sacrifice fidelity as a poor man’s way to combat piracy. But the fact is high bit-rate MP3s sound glorious, and I would much rather have a dynamic recording in MP3 than a smashed version of its lossless counterpart.
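
To put some rough numbers on “the number of bits you allow,” here is a back-of-the-envelope Python sketch. The four-minute track length and the function names are my own assumptions for illustration; 320 and 128 kbps are simply two common MP3 bit rates.

```python
def pcm_bitrate_kbps(sample_rate_hz=44_100, bits_per_sample=16, channels=2):
    """Bit rate of uncompressed PCM audio; the defaults describe CD audio."""
    return sample_rate_hz * bits_per_sample * channels / 1000

def size_megabytes(bitrate_kbps, seconds):
    """Audio payload size in megabytes (10**6 bytes), ignoring tags and headers."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000

SONG_SECONDS = 4 * 60  # hypothetical four-minute track

cd_kbps = pcm_bitrate_kbps()
for label, kbps in [("CD", cd_kbps), ("MP3 320", 320), ("MP3 128", 128)]:
    print(f"{label:8s} {kbps:7.1f} kbps  {size_megabytes(kbps, SONG_SECONDS):6.2f} MB")
```

CD audio works out to 1,411.2 kbps, so a 320 kbps MP3 has a bit less than a quarter of the bits to work with, and a 128 kbps MP3 less than a tenth. The lower you go, the harder the encoder has to lean on its model, and the more likely you are to hear it.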

So should you rip everything to MP3 as your primary archival format? Absolutely not. The MP3 was never designed for that purpose. If you are selecting the MP3 option every time you buy music off of Bandcamp, then you are making a grave mistake. For one thing, perceptual codecs such as MP3 and its better sounding successor, AAC, were not designed to be transcoded. So even though converting, say, a FLAC file to an MP3 can yield an equivalent headbanging experience, the reverse does not hold true: converting that MP3 back to FLAC will not restore what the encoder threw away. That’s why it is absolutely imperative that you always have a bit-perfect copy of the original source material in case you need to convert it. Finally, given that storage and network bandwidth are orders of magnitude more abundant than when the MP3 was first invented, it seems superfluous now to try to compress megabytes into smaller megabytes when affordable storage is measured in terabytes and high-speed Internet is fairly ubiquitous.

Here is something to think about before I leave you: the MP3 has helped fans discover more new music than any single piece of high-end audiophile gear ever invented. And last I checked, it is the music, not the gear, that we should be most passionate about. If audiophiles really want to improve the way modern music sounds, they should start with how it is recorded and processed before waging war on the format in which it is distributed. Put simply, petition artists, labels, and fans to help stop the Loudness War. Then, and only then, can we have a serious discussion about fidelity.