
3. Feature Extraction


General

There are two stages in the audio feature extraction methodology:

  • Short-term feature extraction: this is implemented in function stFeatureExtraction() of the audioFeatureExtraction.py file. It splits the input signal into short-term windows (frames) and computes a number of features for each frame. This process leads to a sequence of short-term feature vectors for the whole signal.
  • Mid-term feature extraction: in many cases the signal is represented by statistics computed over the extracted short-term feature sequences described above. To this end, function mtFeatureExtraction() from the audioFeatureExtraction.py file extracts a number of statistics (e.g. mean and standard deviation) over each short-term feature sequence.
Feature ID   Feature Name         Description
1            Zero Crossing Rate   The rate of sign-changes of the signal during the duration of a particular frame.
2            Energy               The sum of squares of the signal values, normalized by the respective frame length.
3            Entropy of Energy    The entropy of sub-frames' normalized energies. It can be interpreted as a measure of abrupt changes.
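
To make the first three definitions above concrete, here is a minimal NumPy sketch that computes them for a single frame. It only illustrates the definitions and is not the library's implementation; the number of sub-frames used for the energy entropy is an arbitrary choice.

import numpy as np

def frame_features(frame, n_sub_frames=10, eps=1e-10):
    # 1. Zero Crossing Rate: rate of sign changes within the frame
    zcr = np.sum(np.abs(np.diff(np.sign(frame)))) / (2.0 * len(frame))
    # 2. Energy: sum of squares, normalized by the frame length
    energy = np.sum(frame ** 2) / len(frame)
    # 3. Entropy of Energy: entropy of the normalized sub-frame energies
    sub_len = len(frame) // n_sub_frames
    sub_frames = frame[:sub_len * n_sub_frames].reshape(n_sub_frames, sub_len)
    p = np.sum(sub_frames ** 2, axis=1)
    p = p / (np.sum(p) + eps)
    entropy = -np.sum(p * np.log2(p + eps))
    return zcr, energy, entropy

# Example: a 50 msec frame of a 440 Hz tone sampled at 16 kHz
frame = np.sin(2 * np.pi * 440 * np.arange(800) / 16000.0)
print(frame_features(frame))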

As an example, the following snippet extracts short-term and mid-term features from a WAV file directly from Python. It is a minimal sketch based on the function names above; the exact return values may differ between library versions.
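
from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import audioFeatureExtraction

# Read the WAV file: readAudioFile() returns the sampling rate and the signal
[Fs, x] = audioBasicIO.readAudioFile("data/speech_music_sample.wav")

# Short-term feature extraction: 50 msec window, 25 msec step (in samples)
stFeatures = audioFeatureExtraction.stFeatureExtraction(x, Fs, 0.050 * Fs, 0.025 * Fs)

# Mid-term feature extraction: 1 sec mid-term window/step over 50 msec short-term frames
mtFeatures, stFeatures = audioFeatureExtraction.mtFeatureExtraction(x, Fs, 1.0 * Fs, 1.0 * Fs, 0.050 * Fs, 0.050 * Fs)

print(stFeatures.shape)
print(mtFeatures.shape)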

Single-file feature extraction - storing to file

Short-term and mid-term feature extraction for a single file, combined with storing the results, is handled by mtFeatureExtractionToFile() from the audioFeatureExtraction.py file. This wrapper stores the short-term and mid-term feature matrices both as CSV files and as NumPy files. The command-line way to call this functionality is presented in the following example:

python audioAnalysis.py -featureExtractionFile data/speech_music_sample.wav 1.0 1.0 0.050 0.050

The result of this procedure is two comma-separated files: speech_music_sample.wav.csv for the mid-term features and speech_music_sample.wav_st.csv for the short-term features. In each case, each feature sequence is stored in a separate column; in other words, columns correspond to features and rows to time windows (short-term or mid-term). Note that for the mid-term feature matrix the number of features (columns) is twice that of the short-term analysis: each mid-term feature consists of two statistics of a short-term feature, namely the average value and the standard deviation. Within each mid-term time window, the first half of the values corresponds to the averages and the second half to the standard deviations of the respective short-term features. The same two feature matrices are also stored in two NumPy files (in this case: speech_music_sample.wav.npy and speech_music_sample.wav_st.npy). In total, four files are therefore created during this process: two for the mid-term features and two for the short-term features.
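
The stored matrices can be loaded back for inspection with NumPy; the file names below simply follow the naming pattern described above.

import numpy as np

# Load the stored mid-term and short-term feature matrices
# (depending on the library version, features may be stored in rows or columns)
mt_features = np.load("data/speech_music_sample.wav.npy")
st_features = np.load("data/speech_music_sample.wav_st.npy")

print(mt_features.shape)
print(st_features.shape)

# The mid-term representation holds two statistics per short-term feature:
# the first half of the values in each mid-term window are averages, the
# second half are standard deviations of the respective short-term features.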

Feature extraction - storing to file for a sequence of WAV files stored in a given path

This functionality is the same as the one described above, but it works in batch mode, i.e. it extracts four feature files for each WAV file stored in the given folder. Command-line example:

python audioAnalysis.py -featureExtractionDir data/ 1.0 1.0 0.050 0.050

The result of the above command is a set of feature files (2 CSV and 2 NumPy files, as described above) for each WAV file in the data folder.

Note: the feature extraction process described in the last two paragraphs does not perform long-term averaging on the feature sequences; therefore, a feature matrix is computed for each file (not a single feature vector). See functions dirWavFeatureExtraction() and dirsWavFeatureExtraction() for long-term averaging after the feature extraction process.
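
For reference, long-term averaging over a folder can be called from Python roughly as sketched below. The argument order and return values are assumptions based on the command-line parameters used above (mid-term window/step and short-term window/step, in seconds); consult audioFeatureExtraction.py for the exact signature.

from pyAudioAnalysis import audioFeatureExtraction

# One long-term averaged feature vector per WAV file in the folder
# (assumed signature and return values; see the source for details)
features, file_names = audioFeatureExtraction.dirWavFeatureExtraction("data/", 1.0, 1.0, 0.050, 0.050)

print(features.shape)   # one row per WAV file
print(file_names[:3])   # the corresponding WAV file paths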

Spectrogram and Chromagram visualization

Functions stSpectogram() and stChromagram() from the audioFeatureExtraction.py file can be used to generate the spectrogram and chromagram of an audio signal, respectively.

Command-line examples:

python audioAnalysis.py -fileSpectrogram data/doremi.wav
python audioAnalysis.py -fileChromagram data/doremi.wav
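
The same functions can also be called directly from Python. The sketch below assumes that stSpectogram() and stChromagram() accept the signal, the sampling rate, the window size and step in samples, plus a plot flag; see audioFeatureExtraction.py for the exact signatures and return values.

from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import audioFeatureExtraction

[Fs, x] = audioBasicIO.readAudioFile("data/doremi.wav")

# 40 msec windows with no overlap; the last argument enables plotting
specgram = audioFeatureExtraction.stSpectogram(x, Fs, 0.040 * Fs, 0.040 * Fs, True)
chromagram = audioFeatureExtraction.stChromagram(x, Fs, 0.040 * Fs, 0.040 * Fs, True)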

Beat extraction

Tempo induction is a rather important task in music information retrieval. This library provides a baseline method for estimating the beats per minute (BPM) of a music signal. Beat rate estimation is implemented in function beatExtraction() of the audioFeatureExtraction.py file. It accepts two arguments: (a) the short-term feature matrix and (b) the window step (in seconds). Therefore, stFeatureExtraction() from the audioFeatureExtraction.py file must be called first, in order to extract the sequence of short-term feature vectors on which the beat is estimated.

Command-line example:

python audioAnalysis.py  -beatExtraction data/beat/small.wav 1

The last argument should be 1 to visualize the intermediate algorithmic stages (e.g. feature-specific local maxima detection) and 0 otherwise (visualization can be very time-consuming for signals longer than about 1 minute).
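
From Python, the same pipeline looks roughly as follows. This is a sketch: it assumes beatExtraction() returns the estimated BPM together with a confidence value, as suggested by the note below.

from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import audioFeatureExtraction

[Fs, x] = audioBasicIO.readAudioFile("data/beat/small.wav")

# Short-term features are required before beat estimation
st_win, st_step = 0.050, 0.050
st_features = audioFeatureExtraction.stFeatureExtraction(x, Fs, st_win * Fs, st_step * Fs)

# Estimate the tempo from the short-term feature matrix and the window step (in seconds)
bpm, confidence = audioFeatureExtraction.beatExtraction(st_features, st_step)
print(bpm, confidence)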

Note that the BPM feature is only applicable in the long-term analysis approach. Therefore, functions that perform long-term averaging on the mid-term statistics (e.g. dirWavFeatureExtraction()) also have the option to compute the BPM (and its confidence value) as features of the long-term feature representation.