Beat Count Estimator

A beat, or Taala, is a rhythmic pattern that determines the rhythmic structure of a composition. Each composition is set to a Taala, and as the main artist(s) render the composition, the percussion artist(s) play the pattern repeatedly, marking time and enhancing the appeal of the performance. Feeling the beat of a song comes naturally to humans and animals. It is the feeling you get when listening to a melody that makes you dance in rhythm or tap a table with your hands on the beats.

The Beat Count Estimator attempts to calculate the BPM (beats per minute), or tempo, of a given song. A song is a mix of several instruments, background music, human voice and so on, and the background music often runs at a different tempo from the voice. We therefore extract the frequency band containing only the human voice and find its BPM.

Steps involved are:

1. The audio file (WAV format) is read into a vector.
2. A sample of approximately 5 seconds is extracted from the middle of the song.
3. The FFT of this sample is taken to convert the time-domain signal to the frequency domain.
4. The resulting vector is split into several frequency sub-bands, because tempo analysis of the full signal can give wrong results due to the beats of the various instruments and background music.
5. The inverse FFT of each sub-band is taken to bring it back to the time domain.
6. Full-wave rectification is performed on the signals so that only positive values remain for further operations.
7. Each signal is converted back to the frequency domain with an FFT and multiplied by the FFT of the right half of a Hanning window of length 0.4 seconds, which is equivalent to convolving the two in the time domain. This gives the envelope of each sub-band, since we are concerned with changes in sound amplitude. This process is called smoothing.
8. Each envelope is then differentiated in time to accentuate the points where the sound amplitude changes, and half-wave rectified so that only increases in energy remain. The largest changes should correspond to beats, since a beat is just a periodic emphasis of sound.
9. Next we create a series of impulse trains, or comb filters, with impulse spacings corresponding to tempos from 60 BPM to 240 BPM.
10. Each comb filter is convolved with the signals. Convolving with a comb filter produces an output made up of echoed copies of the original signal. This output has higher energy when the tempo of the signal and the comb filter match, because the echoes overlap to give higher peaks, which give a higher energy when squared.
11. Plotted against tempo, the energy shows a peak at the fundamental tempo of the song, followed by smaller, wider peaks at multiples of this tempo. We take the tempo with the maximum energy as the fundamental tempo of the piece.
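
Steps 2 to 11 can be sketched in plain Python as follows. This is only an illustrative sketch under several assumptions, not the estimator's actual code: the function names, the sub-band edges (0, 200, 400, 800, 1600, 3200 Hz) and the use of three impulses per comb filter are all hypothetical choices, since none of them is specified above. Reading the WAV file (step 1) is left out, and the small recursive FFT is included only to keep the example self-contained.

```python
import cmath
import math

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft_real(spectrum):
    """Inverse FFT keeping the real part (spectra here are conjugate-symmetric)."""
    n = len(spectrum)
    return [v.real / n for v in fft(spectrum, inverse=True)]

def middle_excerpt(samples, fs, seconds=5.0):
    """Step 2: up to `seconds` from the middle, trimmed to a power-of-two length."""
    n = 1 << (min(len(samples), int(seconds * fs)).bit_length() - 1)
    start = (len(samples) - n) // 2
    return samples[start:start + n]

def onset_signals(x, fs, edges):
    """Steps 3-8: one differentiated, half-wave rectified envelope per sub-band."""
    n = len(x)
    spectrum = fft(x)                                        # step 3
    # Right half of a Hann window, 0.4 s long, zero-padded to n samples.
    length = int(0.4 * fs)
    window = [math.cos(math.pi * i / (2 * length)) ** 2 for i in range(length)]
    window_fft = fft(window + [0.0] * (n - length))
    signals = []
    for lo, hi in zip(edges, edges[1:] + (float("inf"),)):
        if lo > fs / 2:
            continue                                         # band above Nyquist
        # Step 4: keep bins in [lo, hi); min(k, n - k) covers negative frequencies.
        masked = [v if lo <= min(k, n - k) * fs / n < hi else 0j
                  for k, v in enumerate(spectrum)]
        band = ifft_real(masked)                             # step 5
        rectified = [abs(v) for v in band]                   # step 6
        # Step 7: multiplying spectra is a (circular) convolution in time.
        smoothed = ifft_real([a * b for a, b in zip(fft(rectified), window_fft)])
        # Step 8: first difference, keeping only increases.
        diff = [0.0] + [max(0.0, b - a) for a, b in zip(smoothed, smoothed[1:])]
        signals.append(diff)
    return signals

def comb_energy(d, period, teeth=3):
    """Step 10: convolve d with `teeth` impulses spaced `period` samples apart
    (output truncated to len(d)) and return the energy of the result."""
    y = list(d)
    for k in range(1, teeth):
        shift = k * period
        for i in range(shift, len(d)):
            y[i] += d[i - shift]
    return sum(v * v for v in y)

def estimate_bpm(samples, fs, bpms=range(60, 241),
                 edges=(0, 200, 400, 800, 1600, 3200)):
    """Steps 2-11: return the candidate tempo with the highest comb-filter energy."""
    x = middle_excerpt(samples, fs)
    signals = onset_signals(x, fs, tuple(edges))

    def total_energy(bpm):
        period = round(60 * fs / bpm)                        # samples per beat
        return sum(comb_energy(d, period) for d in signals)

    return max(bpms, key=total_energy)                       # step 11
```

Calling `estimate_bpm(samples, fs)` with a list of float samples and the sampling rate in Hz returns the best-matching tempo among the candidates. The pure-Python loops are slow at typical audio sampling rates, so for real recordings an array library's FFT would be a more practical choice.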


