> Neural nets are the obvious approach.
Hmm... Fourier descriptors of the waveforms, THEN neural nets, could work.
I'd throw some fuzzy logic stuff on it, just to deal with cases where the beats/tempo/volume/etc are not clearly definable. Just for kicks, try to use Kohonen's SOM to identify clusters of music styles (e.g. Abba being somehow close to Barry Manilow ;-).
Try those mentioned. Get a good book on neural networks (Kohonen's Self Organizing Maps would be a good start); the algorithms are clearly explained there and wouldn't be that hard to code. A quick Google search should turn up several pages on Fast Fourier Transform algorithms.
You will need to decode the audio stream; I have no idea how to do that.
Please note that there aren't ready-to-use classes or methods in Java, so some study and implementation work will be needed.
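On decoding: for uncompressed .wav files, the standard javax.sound.sampled package may already be enough. Below is a minimal sketch, assuming 16-bit signed PCM and keeping only the first channel. The class name WavSamples and its methods are made up for illustration, and compressed formats such as MP3 would need a separate decoder.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayOutputStream;
import java.io.File;

final class WavSamples {
    /** Converts 16-bit signed PCM bytes to doubles in [-1, 1), keeping the first channel only. */
    static double[] toSamples(byte[] pcm, int channels, boolean bigEndian) {
        int frameSize = 2 * channels; // two bytes per sample, one sample per channel
        double[] out = new double[pcm.length / frameSize];
        for (int i = 0; i < out.length; i++) {
            int b0 = pcm[i * frameSize];
            int b1 = pcm[i * frameSize + 1];
            int hi = bigEndian ? b0 : b1;
            int lo = bigEndian ? b1 : b0;
            out[i] = (short) ((hi << 8) | (lo & 0xFF)) / 32768.0;
        }
        return out;
    }

    /** Reads a whole .wav file into memory (assumption: 16-bit signed PCM). */
    static double[] read(File wav) throws Exception {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            AudioFormat f = in.getFormat();
            if (!AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                    || f.getSampleSizeInBits() != 16) {
                throw new IllegalArgumentException("sketch handles 16-bit signed PCM only: " + f);
            }
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) > 0) {
                buf.write(chunk, 0, n);
            }
            return toSamples(buf.toByteArray(), f.getChannels(), f.isBigEndian());
        }
    }
}
```

The sample rate needed later to turn FFT bins into Hz is available from the same AudioFormat via getSampleRate().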
Thank you for your help.
I am using a fast Fourier transform to plot the graph, but the problem I am facing is that the frequency values I get for different .wav files are the same, and the graph I get is always a straight line, which is wrong.
If I got a graph with the actual values, I could take the peak values from it and my problem would be solved, but I never get the actual values.
I would be much obliged if you could help me with this.
You said you are using FFT for plotting the graph but the values are the same and the graph is a straight line - clearly there is a problem with either the FFT routine or the graph plotting routine.
Suggestion: test the FFT routine by dumping the transformed values to a file. See what they look like. If the values are all zero then the problem is in the FFT routine.
Hmm, this shouldn't really happen - even if the low frequencies of the two samples are very similar, some differences should appear in the FFT results.
I'd suggest:
1- Review the whole process to be absolutely sure that the samples are different, and that they are processed and dumped separately.
2- Find two completely different sound samples and check the values.
Are those values at least significant, i.e., different from NaN or zero or a constant ?
Hi,
I am sending you the results I get for two different wav files.
For chimes.wav:
magnitude=104737.5
magnitude=35238.39838351689
magnitude=17838.82472596759
magnitude=12142.324571026254
magnitude=9378.425162081243
magnitude=7795.8522625817395
magnitude=6813.82472596757
magnitude=6186.823384959381
magnitude=5796.185511113449
magnitude=5581.214005906351
magnitude=5512.5
magnitude=5581.214005906351
magnitude=5796.185511113449
magnitude=6186.823384959381
magnitude=6813.82472596757
magnitude=7795.8522625817395
magnitude=9378.425162081243
magnitude=12142.324571026254
magnitude=17838.82472596759
magnitude=35238.39838351689
Frequeincs551.25
Frequeincs1102.5
Frequeincs1653.75
Frequeincs2205.0
Frequeincs2756.25
Frequeincs3307.5
Frequeincs3858.75
Frequeincs4410.0
Frequeincs4961.25
Frequeincs5512.5
Frequeincs6063.75
Frequeincs6615.0
Frequeincs7166.25
Frequeincs7717.5
Frequeincs8268.75
Frequeincs8820.0
Frequeincs9371.25
Frequeincs9922.5
Frequeincs10473.75
And for chord.wav:
magnitude=35238.39838351689
magnitude=17838.82472596759
magnitude=12142.324571026254
magnitude=9378.425162081243
magnitude=7795.8522625817395
magnitude=6813.82472596757
magnitude=6186.823384959381
magnitude=5796.185511113449
magnitude=5581.214005906351
magnitude=5512.5
magnitude=5581.214005906351
magnitude=5796.185511113449
magnitude=6186.823384959381
magnitude=6813.82472596757
magnitude=7795.8522625817395
magnitude=9378.425162081243
magnitude=12142.324571026254
magnitude=17838.82472596759
magnitude=35238.39838351689
Frequeincs551.25
Frequeincs1102.5
Frequeincs1653.75
Frequeincs2205.0
Frequeincs2756.25
Frequeincs3307.5
Frequeincs3858.75
Frequeincs4410.0
Frequeincs4961.25
Frequeincs5512.5
Frequeincs6063.75
Frequeincs6615.0
Frequeincs7166.25
Frequeincs7717.5
Frequeincs8268.75
Frequeincs8820.0
Frequeincs9371.25
Frequeincs9922.5
Frequeincs10473.75
The values are the same. Do I need to check these values when the files are running?
Note that the Frequeincs values look like steps or subsamples: multiples of 551.25 (44100/80). I bet your samples are at 44100 Hz.
The magnitudes are weirdly similar, but note that they are "mirrored": the first value is equal to the last, the second equal to the second-to-last and so on (except for "chimes.wav", which has a first value different from the others; maybe a cut-and-paste error?). Some mirroring is expected, since the FFT of real-valued input is symmetric about its middle bin, but identical values for two different files are not. Notice that the middle point of the magnitudes is 5512.5 (44100/8).
My diagnosis: there is something wrong with the FFT step. Do you have access to a different package which can be used to check the first N values of the FFT of your sample data? Please also check http://forum.java.sun.com/thread.jsp?forum=426&thread=479715&tstart=0&trange=100
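One thing that might be worth ruling out (a guess from the numbers alone, not something visible in the posted code): the magnitudes above are what a transform produces when it is fed the ramp 0, 551.25, 1102.5, ..., 10473.75, i.e. the frequency axis itself, instead of the audio samples. The sketch below uses a slow, naive DFT as a reference; the class and method names are made up for illustration. The same routine can also serve as the "different package" to check a real FFT against on small inputs.

```java
final class RampCheck {
    /** Naive O(N^2) DFT magnitude; slow, but simple enough to trust as a reference. */
    static double[] dftMagnitude(double[] x) {
        int n = x.length;
        double[] mag = new double[n];
        for (int k = 0; k < n; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double a = -2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(a);
                im += x[t] * Math.sin(a);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    public static void main(String[] args) {
        // Assumption: 20 points spaced 551.25 apart, as in the posted Frequeincs list plus a leading 0.
        double[] ramp = new double[20];
        for (int i = 0; i < ramp.length; i++) {
            ramp[i] = i * 551.25;
        }
        for (double m : dftMagnitude(ramp)) {
            System.out.println("magnitude=" + m);
        }
    }
}
```

If the printed values match the list posted for chimes.wav (104737.5 first, 5512.5 in the middle), then the array handed to the FFT is probably the frequency array rather than the decoded samples, which would explain why every file gives the same result.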
Rafael
First you will need to figure out how you are going to take the FFT of your sound sample.
One quick and dirty method is to just run the FFT on all the data. However this only provides you with
the average frequency response across the whole piece of music.
Another, perhaps better, way is to break your audio sample into small blocks (the size depends on the
frequencies you are interested in). Then run the FFT on each of these. The upside of this is that it provides
you with more information about the track but the downside is you now have more information to process.
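A rough sketch of the block-wise approach, assuming the samples are already in a double[] and using a naive DFT in place of a real FFT to keep the example short (the class name BlockSpectrum is hypothetical, and no windowing or overlap is applied):

```java
final class BlockSpectrum {
    /** Magnitudes of bins 0..n/2 of a naive DFT over x[start .. start+n). */
    static double[] magnitudes(double[] x, int start, int n) {
        double[] mag = new double[n / 2 + 1];
        for (int k = 0; k < mag.length; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double a = -2 * Math.PI * k * t / n;
                re += x[start + t] * Math.cos(a);
                im += x[start + t] * Math.sin(a);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    /** One spectrum per consecutive, non-overlapping block; a trailing partial block is dropped. */
    static double[][] spectrogram(double[] samples, int blockSize) {
        int blocks = samples.length / blockSize;
        double[][] out = new double[blocks][];
        for (int b = 0; b < blocks; b++) {
            out[b] = magnitudes(samples, b * blockSize, blockSize);
        }
        return out;
    }
}
```

Bin k of each block corresponds to the frequency k * sampleRate / blockSize, so a larger block gives finer frequency resolution at the cost of coarser time resolution.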
One technique of post processing the data would be to run the FFT on the frequency data obtained by
running the FFT on subsections of the track. This should result in one frequency spectrum that contains
the periodicity at which certain frequencies appear in the audio track. This can be useful for picking
up information such as beats per minute and the strength of those beats.
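As a sketch of that second-stage transform: take the energy of one frequency band in each block, treat that series as a signal in its own right, and look for its strongest periodic component. The class name, the method name and the idea of converting the winning bin to beats per minute are assumptions for illustration; a naive DFT stands in for the FFT.

```java
final class BeatRate {
    /**
     * Estimates the dominant repetition rate, in events per minute, of a per-block
     * energy series. blocksPerSecond is how many blocks were analysed per second of audio.
     */
    static double beatsPerMinute(double[] energy, double blocksPerSecond) {
        int n = energy.length;
        double mean = 0;
        for (double e : energy) {
            mean += e / n;
        }
        int bestBin = 1;
        double bestMag = -1;
        // Skip bin 0 (the average level) and search the remaining bins up to n/2.
        for (int k = 1; k <= n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double a = -2 * Math.PI * k * t / n;
                re += (energy[t] - mean) * Math.cos(a);
                im += (energy[t] - mean) * Math.sin(a);
            }
            double mag = Math.hypot(re, im);
            if (mag > bestMag) {
                bestMag = mag;
                bestBin = k;
            }
        }
        // Bin k repeats k times over n blocks, i.e. k * blocksPerSecond / n times per second.
        return 60.0 * bestBin * blocksPerSecond / n;
    }
}
```

The magnitude of the winning bin relative to the others gives a crude measure of how strong the beat is. On real music, harmonics of the beat may win instead of the beat itself, so the result should be treated with some suspicion.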
Machine learning algorithms are useful (I would avoid Neural Networks for a while though as they can be
a right pain in the **** to get to learn); however, to obtain the best results you tend to need to reduce your
data to as small and descriptive a set of features as possible. Try naive Bayesian or K-nearest neighbour
learners for starters. If these can learn the relationships you want them to then C4.5 or ANNs may produce
an improved performance.
matfud
Now here comes the fun and hard part.
Before we get into that, I'll repeat tjacobs01's question:
You mean you want to train a machine to differentiate different types of music based on its wave-form? Is this an afternoon project or a Ph.D thesis? If it's the former, you probably don't stand a chance. If it's the latter, why not ask your mentor / advisor?
Hmm, I could say something about "running out of time" - the idea of a genre-identifying algorithm can be either a simple test that may work for some samples and fail miserably for others, or a PhD work (with better chances of success, but probably never 100%).
Anyway, the problem now is a classification problem. You can either:
- Compare the FFT measures of a song whose genre is not yet known (let's call it X) with the FFT measures of several songs whose genres are known (let's call them an array K[]). The element of K which is closest to X, by some metric, will probably have the same genre or a similar beat, tempo, pitch, whatever. This is known as a supervised classification technique, and you need some labeled samples (i.e. with known genre) to compare the unknown samples against. Some algorithms that may be of help are K-Nearest Neighbors, Minimum Distance, the (neural network) Backpropagation algorithm or even Parallelepiped classifiers. Google them.
- Alternatively, you can use an unsupervised classification algorithm (e.g. K-Means clustering, Fuzzy C-Means clustering, Kohonen self-organizing maps; I guess even a hierarchical clustering algorithm may be interesting). These will attempt to group together music whose FFT measures are somehow similar, but cannot identify the genres. Such an algorithm could say "if you like this song, our algorithm suggests this other one" or "Songs A and B seem to be of very different genres" without even knowing the genres' names.
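To give a feel for the unsupervised option, here is a bare-bones K-Means sketch. It assumes each song has already been reduced to a feature vector of equal length. The class name is hypothetical, and seeding the centroids with the first k points is a simplification (real implementations usually pick random or smarter starting points).

```java
final class KMeans {
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the cluster index of each point. Assumes k <= points.length. */
    static int[] cluster(double[][] points, int k, int iterations) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone(); // simplistic seeding
        }
        int[] label = new int[points.length];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: attach every point to its nearest centroid.
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[p], centroids[c])
                            < squaredDistance(points[p], centroids[best])) {
                        best = c;
                    }
                }
                label[p] = best;
            }
            // Update step: move every centroid to the mean of its members.
            double[][] sum = new double[k][dim];
            int[] count = new int[k];
            for (int p = 0; p < points.length; p++) {
                count[label[p]]++;
                for (int d = 0; d < dim; d++) {
                    sum[label[p]][d] += points[p][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (count[c] == 0) continue; // leave an empty cluster where it is
                for (int d = 0; d < dim; d++) {
                    centroids[c][d] = sum[c][d] / count[c];
                }
            }
        }
        return label;
    }
}
```

The returned labels are just group numbers; which genre, if any, a group corresponds to is still for a human to decide.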
A word of caution: before trying to implement those algorithms and checking the results (which may be very unexpected!), consider these questions:
- How does a human decide to which genre a particular song belongs? Hint: some can be clear-cut decisions, like abominable rap (some slow beats endlessly repeated), intolerable heavy metal (some fast beats and shrieks), painful sappy songs (slow endless droning) but I guess most of them may fall into more than one possible classification.
- Do all (or even two or three) human listeners agree on the classification of a song, even if it is not subjective? In other words, will two listeners ever agree on the genre of Grace Jones' "Slave to the Rhythm"?
- What are those genres we've been talking about? What defines them (objectively)? What are the genres of, say, Depeche Mode's "My Secret Garden", Matt Bianco's "More Than I Can Bear" or Phil Collins' "In the Air Tonight"? How can one be sure? :-)
Consider the questions, and please note that genre classification is definitely not as simple as it seems, EVEN with powerful feature extraction algorithms and classification techniques. That said, I'd give a Kohonen SOM or a hierarchical clustering algorithm a try just to see what happens. I wouldn't even try a supervised algorithm.
Rafael, giving some new strange search keywords for the forum and hoping I am not giving any hints to RIAA ;-)
Hi,
The NN approach will surely give results. But since he appears to be "running out of time", I would consider using a simpler approach.
From my point of view the main difference between music categories is "the beat". You all know what rap or techno sounds like. So I would try a brute-force approach: FFT the whole sound file in steps of 1/10 of a second. Split the frequency space into about 10 sub-bands and find the average amplitude in each of them over time. Determine the "beat frequency" by finding the band which has the highest peak-to-peak amplitude difference over time (or something like that). Create the "beat signature" for the music you are processing using the moments in time when the "beat amplitude" reaches its peaks, and the relative amplitudes of those peaks. The ratio of the "beat" amplitude to the average amplitude of the other sounds may be useful too.
Then you can group your music based on how closely their beat signatures match. Match them using some non-exact technique (least squares, maximum likelihood or similar). Don't forget to shift them in time a little while finding the best match. You can use the NN method here: since it will be processing prepared data, it should be easier than using an NN from the very start.
The advantage of this method is that it is relatively simple, fully deterministic and does not need "learning" like NN. The disadvantage is, that it may not work...
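A sketch of the matching step only, under the assumption that a "beat signature" is stored as an array with one beat-band amplitude per 1/10 second step. The class and method names are made up, and mean squared difference is just one possible least-squares style measure.

```java
final class SignatureMatch {
    /**
     * Smallest mean squared difference between signatures a and b,
     * trying every shift of b from -maxShift to +maxShift steps.
     */
    static double bestMismatch(double[] a, double[] b, int maxShift) {
        double best = Double.POSITIVE_INFINITY;
        for (int shift = -maxShift; shift <= maxShift; shift++) {
            double sum = 0;
            int count = 0;
            for (int i = 0; i < a.length; i++) {
                int j = i + shift;
                if (j < 0 || j >= b.length) continue; // no overlap at this position
                double d = a[i] - b[j];
                sum += d * d;
                count++;
            }
            if (count > 0) {
                best = Math.min(best, sum / count);
            }
        }
        return best;
    }
}
```

A small result means the two signatures line up well at some shift; songs can then be grouped by thresholding or ranking this value. Normalising the amplitudes first (for example dividing by the average) would probably be needed so that loud and quiet recordings compare fairly.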
Hi,
I just went through some thesis papers on genre classification, where I learned that I have to calculate values like spectral rolloff, centroid, flux etc. on the FFT values to get an absolute value which I can compare with the existing values in confusion matrices.
Is there any other way to find out the genre? Will mean, standard deviation or variance help me get good results?
Regards
> I just went through some thesis papers on genre
> classification, where I learned that I have to
> calculate values like spectral rolloff, centroid, flux
> etc. on the FFT values
So those papers (or their references) should show how to calculate those values. Do they ?
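In case they don't, the sketch below follows commonly used definitions of the three features, computed on the magnitude spectrum of one block. The exact details (normalisation, the rolloff threshold, whether bins are counted from 0 or 1) vary between papers, so treat the class and its method signatures as assumptions rather than as what any particular thesis does.

```java
final class SpectralFeatures {
    /** Magnitude-weighted mean bin index, i.e. the "centre of mass" of the spectrum. */
    static double centroid(double[] mag) {
        double weighted = 0, total = 0;
        for (int k = 0; k < mag.length; k++) {
            weighted += k * mag[k];
            total += mag[k];
        }
        return total == 0 ? 0 : weighted / total;
    }

    /** Lowest bin at or below which the given fraction (e.g. 0.85) of the total magnitude lies. */
    static int rolloff(double[] mag, double fraction) {
        double total = 0;
        for (double m : mag) {
            total += m;
        }
        double running = 0;
        for (int k = 0; k < mag.length; k++) {
            running += mag[k];
            if (running >= fraction * total) {
                return k;
            }
        }
        return mag.length - 1;
    }

    /** Sum of squared differences between the spectra of two successive blocks. */
    static double flux(double[] previous, double[] current) {
        double sum = 0;
        for (int k = 0; k < current.length; k++) {
            double d = current[k] - previous[k];
            sum += d * d;
        }
        return sum;
    }
}
```

Centroid and rolloff come out as bin indices; multiplying by sampleRate / blockSize converts them to Hz.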
> Is there any other way to find out the genre? Will
> mean, standard deviation or variance help me get
> good results?
Probably not.
I am puzzled. You have some papers which presumably say what results they got - if those papers can't show what their advantages are (e.g. % of correct results) over competing approaches, then they are useless.
It is way easier to implement the techniques in those papers than just to ask questions like the above. Did you at least try the suggestions in the other posts on this thread? Did you try clustering/SOMs?
Hi,
I have calculated all the values, viz. spectral centroid and flux, and now I am waiting for charts so that I can compare those values and show some results.
I haven't tried using classifiers because I don't have much time to do that.
I would be pleased if you could guide me on how to use classifiers on these values.
Regards
The task you want to do (i.e. identify a song's style) will be done by comparison: either by "supervised" comparison, where you have a prototype of a style and want to compare a song of unknown style with it, or by "unsupervised" comparison, where you have several data items which you want to group so that songs of similar style end up in the same group.
"Comparison" is best done with "classification". Here is a hint for supervised classification: suppose you have some data for the songs A ("Suspicious Minds" by Fine Young Cannibals), B ("Genius of Love" by Tom Tom Club) and C ("Horrible Unbearable Screeching Noise" by Metallica). Using FFT, you have extracted some attributes of those songs which are represented by, say, a data vector of features:
A = [190;89;76;22;18;43]
B = [94;32;65;77;12;9]
C = [208;15;43;12;9;8]
Now you have two songs you want to classify, i.e. compare with those above and see whether they are similar. Those two songs have the following data vectors:
X = [72;48;61;70;9;7]
Y = [190;22;48;13;8;10]
Now, is X similar to A, B or C ? We can calculate the Euclidean distance between the vectors:
D(A,X) = sqrt((A[0]-X[0])^2 + (A[1]-X[1])^2 + (A[2]-X[2])^2 + (A[3]-X[3])^2 + (A[4]-X[4])^2 + (A[5]-X[5])^2)
(i.e. get the squared difference between two elements of the vectors, sum them and get the square root. Of course D(A,A) would be zero).
Using this, I could get
D(A,X) = 139.68
D(B,X) = 28.6
D(C,X) = 152.56
D(A,Y) = 80.89
D(B,Y) = 117.12
D(C,Y) = 20.1
From this I can conclude that X is much more similar to B than to A or C, so the style must be similar (probably "Wordy Rappinghoog"), and Y is much more similar to C than to A or B (probably being another hit by Metallica or the sound of a train derailment).
Note that this is a simplification; nonetheless, it may lead to results.
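The same calculation as a small Java sketch, using the made-up vectors A, B, C, X and Y from above. The class name and the nearest() helper are only for illustration; this is the simplest possible minimum-distance classifier with one prototype per style.

```java
final class NearestPrototype {
    /** Euclidean distance between two feature vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Index of the prototype closest to x. */
    static int nearest(double[][] prototypes, double[] x) {
        int best = 0;
        for (int i = 1; i < prototypes.length; i++) {
            if (distance(prototypes[i], x) < distance(prototypes[best], x)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] known = {
            {190, 89, 76, 22, 18, 43}, // A
            {94, 32, 65, 77, 12, 9},   // B
            {208, 15, 43, 12, 9, 8}    // C
        };
        String[] names = {"A", "B", "C"};
        double[] x = {72, 48, 61, 70, 9, 7};
        double[] y = {190, 22, 48, 13, 8, 10};
        System.out.println("X is closest to " + names[nearest(known, x)]); // prints B
        System.out.println("Y is closest to " + names[nearest(known, y)]); // prints C
    }
}
```

With real features the individual attributes usually have very different ranges, so scaling each one (e.g. to zero mean and unit variance) before computing distances is probably worthwhile; otherwise the largest-valued attribute dominates.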