Discrete Fourier Transforms
This post assumes a basic familiarity with complex numbers, specifically Euler's formula, $e^{i\theta} = \cos\theta + i\sin\theta$.
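What do this image:
[Image: spectrogram of the end of Aphex Twin’s ‘Windowlicker’]
and this video:
[Video: a piano playing back a recitation of the Proclamation of the European Environmental Criminal Court]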
have in common? They both involve the discrete Fourier transform (DFT): taking an input signal (in these cases, music) and determining the frequencies of the sine and cosine waves that make it up, then doing something with those frequencies.
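So the interesting question is: how do you do this? At its core, the discrete Fourier transform is this equation, which transforms the sequence $x_0, x_1, \ldots, x_{N-1}$ into $X_0, X_1, \ldots, X_{N-1}$:

$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N}$$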
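As a rough illustration, the transform can be written as a direct double loop. This is a minimal sketch that assumes the common convention $X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i k n / N}$; the function name `dft` is just illustrative, and real code would normally use a fast Fourier transform library instead of this $O(N^2)$ sum.

```python
import cmath

def dft(x):
    """Return the DFT of the sequence x using the direct O(N^2) sum."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

# A constant signal contains only the zero-frequency component,
# so all of its "weight" should land in the first output.
spectrum = dft([1, 1, 1, 1])
print([round(abs(X), 6) for X in spectrum])  # [4.0, 0.0, 0.0, 0.0]
```

Each output is one complex number per frequency: its magnitude says how much of that frequency is present, and its angle says how the wave is shifted.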
The obvious question is: if this is only used for signal processing, why are there complex numbers? And the answer lies in the equation $e^{i\theta} = \cos\theta + i\sin\theta$. Even though the imaginary part isn’t physically meaningful, it turns out that adding it makes the mathematics a lot simpler to manipulate, because exponentials are easy to multiply, whereas individual cosines and sines are not.
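To make "easy to multiply" concrete, here is a small worked comparison (the angles $a$ and $b$ are illustrative symbols, not something from the discussion above). Multiplying two complex exponentials only adds their exponents, whereas multiplying two cosines needs a product-to-sum identity and leaves you with two terms:

```latex
\begin{align*}
e^{ia} \cdot e^{ib} &= e^{i(a+b)} \\
\cos a \cdot \cos b &= \tfrac{1}{2}\bigl[\cos(a-b) + \cos(a+b)\bigr]
\end{align*}
```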
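How does this help you decompose a signal into sines and cosines? Well, consider the sequence $x_n = e^{2\pi i j n / N}$. This is the sequence that goes through $j$ complete cosine waves as $n$ runs from 0 to $N$, with a matching imaginary (sine) component. What’s $X_k$ for this sequence? It’s just

$$X_k = \sum_{n=0}^{N-1} e^{2\pi i j n / N} \, e^{-2\pi i k n / N} = \sum_{n=0}^{N-1} e^{2\pi i (j - k) n / N}$$

Obviously, if $j = k$, then every term is 1 and this is just $N$. Less obvious, though, is the fact that if $j - k$ is a nonzero integer (and not a multiple of $N$), then $X_k = 0$: the sum is a geometric series whose terms are evenly spaced around the unit circle, so they cancel. So essentially by transforming the sequence in this fashion, you can ‘decompose’ it into cosine waves. (Sine waves will manifest themselves as imaginary components.) So if you have a very large $N$, you can pick out a large number of different frequencies, ranging from 1 cycle all the way up to $N - 1$ cycles per $N$ samples (how many hertz that corresponds to depends on the sampling rate). There are various intricacies, such as the fact that signals are not, in general, complex numbers, but the core explanation here is enough.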
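The cancellation is easy to check numerically. The sketch below assumes the usual convention $X_k = \sum_n x_n e^{-2\pi i k n / N}$ and picks $N = 16$ and $j = 3$ arbitrarily for illustration. It transforms a complex exponential with $j$ cycles, and then a plain real cosine of the same frequency for comparison.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

N, j = 16, 3  # illustrative values

# Complex exponential making j full turns across the N samples.
spiral = [cmath.exp(2j * cmath.pi * j * n / N) for n in range(N)]
spiral_mags = [round(abs(X), 6) for X in dft(spiral)]

# Real cosine with the same number of cycles.
cosine = [math.cos(2 * math.pi * j * n / N) for n in range(N)]
cosine_mags = [round(abs(X), 6) for X in dft(cosine)]

print(spiral_mags)  # N at index j, zero everywhere else
print(cosine_mags)  # N/2 at indices j and N - j, zero everywhere else
```

The second result hints at one of the intricacies with real signals: a real cosine is the average of two complex exponentials turning in opposite directions, so its energy is split between index $j$ and its mirror image $N - j$.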
Back to the original image and video. The image is a spectrogram of the end of Aphex Twin’s ‘Windowlicker’. A spectrogram is a graphic visualization of the discrete Fourier transform taken over successive short slices of the signal, with time on one axis and frequency on the other, so the curves in the spiral correspond to notes that move around in frequency, merge, and split. If you listen, you can make out the beginning and end with ease; the middle is harder, due to the sheer multiplicity of tones.
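A toy version of a spectrogram can be sketched by chopping a signal into frames and transforming each one. Everything here is an illustrative assumption: the frame size, the non-overlapping frames, and the synthetic test signal that jumps from one pitch to another. Real spectrogram tools typically use overlapping, windowed frames and a fast Fourier transform.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the DFT of one frame (bins 0..N/2, enough for real input)."""
    N = len(frame)
    return [
        abs(sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)))
        for k in range(N // 2 + 1)
    ]

def spectrogram(signal, frame_size):
    """Split the signal into non-overlapping frames and transform each one."""
    return [
        dft_magnitudes(signal[start:start + frame_size])
        for start in range(0, len(signal) - frame_size + 1, frame_size)
    ]

frame_size = 32
# Toy signal: 4 cycles per frame for the first half, then 9 cycles per frame.
signal = [
    math.cos(2 * math.pi * (4 if n < 128 else 9) * n / frame_size)
    for n in range(256)
]

rows = spectrogram(signal, frame_size)
# Strongest frequency bin in each frame, in time order.
peaks = [max(range(len(row)), key=row.__getitem__) for row in rows]
print(peaks)  # [4, 4, 4, 4, 9, 9, 9, 9]
```

Plotting `rows` as an image, one row per time step and one column per frequency, gives the kind of picture in which a moving note shows up as a curve.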
Click here to listen to the ‘spiral’ at the end of ‘Windowlicker’.
As for the video, the analysis of which piano keys to press and for how long was likely done by analyzing the spectrum of each key on the piano, comparing it to the spectrum of a recitation of the Proclamation of the European Environmental Criminal Court, and then working out which combination of key presses best adds up to the spectrum of the reading. Of course, it’s more complicated than that, but it illustrates how powerful this sort of technique is.
As a final note, both the MP3 and JPEG formats are based on close relatives of the DFT (the modified discrete cosine transform and the discrete cosine transform, respectively); however, I don’t understand enough about either of them to be able to explain further. But this does partially explain why JPEGs are so bad at representing sharp edges: due to the so-called ‘Gibbs phenomenon’, a sum of finitely many sinusoids cannot represent a sharp, discontinuous edge well and instead rings around it. This is why screenshots of web pages tend to suck as JPEGs; text is all about sharp boundaries, so you’ll always get JPEG artifacts. This is extra-noticeable since JPEGs also throw away much of the high-frequency data: the height of the Gibbs overshoot stays roughly constant, but the width of the ringing is inversely proportional to the highest frequency that gets stored, so a low-frequency-only compression method will smear its artifacts over a much wider area.
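The ringing near a sharp edge can be seen in a one-dimensional toy case. The sketch below is not JPEG; it assumes a square wave that jumps from -1 to +1 and approximates it with the first few odd sine harmonics (the standard Fourier series of a square wave). It then evaluates the sum at the first peak after the jump, which for this series sits at $t = \pi / (2 \cdot \text{terms})$.

```python
import math

def square_partial_sum(t, terms):
    """Sum of the first `terms` odd harmonics of a square wave of height 1."""
    return (4 / math.pi) * sum(
        math.sin((2 * m + 1) * t) / (2 * m + 1) for m in range(terms)
    )

results = []
for terms in (10, 100, 1000):
    t_peak = math.pi / (2 * terms)  # first maximum after the jump at t = 0
    overshoot = square_partial_sum(t_peak, terms) - 1
    results.append((terms, t_peak, overshoot))
    print(terms, round(t_peak, 5), round(overshoot, 3))
```

In this sketch the peak moves closer to the edge as more harmonics are kept, while the overshoot stays at about 0.18, roughly 9% of the jump of 2. Keeping fewer frequencies therefore leaves the ringing just as tall but spread over a wider region.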

