FFT to get DTMF on a Wav file
Hi, Im working on a program to extract dialed numbers from a wav file. The wav file has a recording of a phone dialing, therefore using DTMF. I have to do a FFT on the data to get the frequencies.
So far I've been able to get pass from the wav file to a byte array, and then to a double array using:
int numBytes2 = audioBytes.length;//audioBytes has the data read from the wav
double doubleArray1[] =newdouble[ numBytes2 / 2 ] ;
for(int i = 0 ; i < numBytes2 / 2 ; i++ ){
byte a = audioBytes[ ( i * 2 ) ] ;
byte b = audioBytes[ ( i * 2 ) + 1 ] ;
boolean negative = a < 0 || b < 0 ;
int byteA = (int)( a & 0xFF );
int byteB = (int)( b & 0xFF ) ;
if( audioInputStream.getFormat().isBigEndian() )
doubleArray1[ i ] = ( byteA ) | ( byteB << 8 ) ;// big endian
else
doubleArray1[ i ] = ( byteB ) | ( byteA << 8 ) ;// little endian
if( negative )
doubleArray1[ i ] *= -1 ;
}
//taken from another theread in the forum.
After getting the doubles, I need to make the fft. Im kind of lost here, because the array lenght is not a power of two, and according to this implementation
http://www.nauticom.net/www/jdtaft/JavaFFT.htm
it has to be.
Anybody has any ideas on how to address this problem? Or maybe a better implementation to solve this problem.
Help will be appreciated,
Thanks,
Dan
[2175 byte] By [
bmobananas] at [2007-9-30 22:52:18]

just pad the array with 0's to the nearest power two length (trust me it works).
However you probably want to do something slightly different. Taking the FFT of the entrie wav sample
will tell you all of the component frequencies in the entire file. This will not tell you in which order the
numbers were dialed and will only give you a vauge idea of how often each number was dialed.
What you probably want to do is break up your sample into small blocks and perform the fft on each
of those. The size of the block depends on the minimum time a tone can be active and still be classed
as a dialled number.
matfud
This is an interesting problem.
As a starting point I would use matfud's approach but I would overlap the segments by about 50% and I would probably use a Hamming 'window' in order to to reduce side lobe interference. You could try one of the more sophisticated windows but my experience is that it makes little difference.
Detection of the 'tones' within the spectrum presents another problem. I assume that you have knowledge of the tones frequencies so you could apply some simple threshold logic to decide which is the dominant tone within a segment.
Even after this you will have to decide how to interpret the tone sequence. If you get two 9s together is that because a single 9 has a component in two segments or because there realy are two 9s?
I suspect that in the real world the detection would be done using a phase lock loop.
So, you are tellling me to FFT small arrays of parts of the big one. How would I go about calculating the size of each small array, for example, In the test files I have,
they all have fram rate of 8000, frame size of 2, buffer size 1600, sample rate 8000 and the number of bytes in the sample are bettween 11000 and 18000. They last betwenn 1 and 3 seconds.
I get the last two values like this:
int bufSize = frameRate * frameSize / 10;
System.out.println( "Buffer Size: " + bufSize );
System.out.println("Number of bytes: " + ais.getFrameLength()*ais.getFormat().getFrameSize());
Any ideas on the size of the blocks?
One more question has anybody used the implementaion of the FFT on the page I mentioned before, does it work? or anybody has a better one?
Thanks for all your help,
Dan
Since you are trying to resolve tones, the segment length should be shorter than the length of a tone. Intuition says about half the length of a tone but I suspect that some experimentation will be required.
Once you have defined a segment period then the number of points you are going to use in your transform is (obviously) the segment period times the sample rate.
Out of interest, what is the typical length of a tone?
Im not sure the typical lenght of a tone, from the test files, that dial a 7 digit number, they are between 1 and 3 segunds each file. Im still not sure about the correct length..Dan
7 Digits in 1 second says 1/7 seconds per digit. So as a starting point you need a segment size of no more than 1/7 seconds. As a guess I would try about 1/20 second. Given 8,000 samples per second, this will give you 40 point per segment.
I have just Googled for the touch tone frequencies. The closest you need to resolve is between 697 and 770 Hz ie 73 Hz. Your sample period must therefore be significantly greater than 1/73 seconds or you won't be able to resolve the tone difference.
So, as deduced in my previous post, 1/20 of a second looks to be a reasonable starting point!
Your post looks intresting, Ill try to divide the whole vector and chunks that are 1/20 of a second long, and after exapnding them to the nearest power of two, apply the FFT.
Just for the info, here is the tables for the DTMF freq:
The DTMF tones are sums of two sine wave tones at following frequencies:
1209 Hz 1336 Hz 1477 Hz 1633 Hz
ABCDEF
697 Hz 123 A
GHIJKLMNO
770 Hz 456 B
PRSTUVWXY
852 Hz 789 C
oper
941 Hz *0#D
Dan
Sorry, the last post wasnt properly formatted, a better look is in this page: http://www.hut.fi/~then/mytexts/dtmf_generation.htmlDan
Dan,Do you know where I could download some test case audio files.
Some other points.
Try to avoid padding the chunks if you can as although it works fine it does
change the locations at which frequencies appear. Therefore you would need to
perform additional calculations, based on the amount of padding you used, to
determine where to look for a specific frequency.
Ideally each tone will resolve to four impulses in the fourier domain. These four
impulses are the positive and negative frequencies of the two tones. As there
are positive and negative impulses in the ourput you only need to search the
first half of the fourier result for your two frequencies.
However this is not an ideal world so instead of two impulses in the first half of
the fourier transform you will likely end up with two slightly smoothed peaks.
Noise in the WAV will also appear as peaks/wobbles in the fourier response.
Unless the tones are seperated by silence then you have an additional problem
that occurs when one tone pair transitions to a different pair in the middle of your
sample. In this case you will end up with four peaks in the first half of your FFT (8
in total) and no way of determining between them.
In essence I am saying that after you get the FFT's of your samples you will need
to perform some form of processing to try and determine the most powerful
peaks in that FFT. This data will need to be stored and then post processed to
determine the likely keypad value at each point ( perhaps by merging adjacent
key values that are the same and discarding undertermined values).
matfud
Some other points.
Try to avoid padding the chunks if you can as although it works fine it does
change the locations at which frequencies appear. Therefore you would need to
perform additional calculations, based on the amount of padding you used, to
determine where to look for a specific frequency.
Ideally each tone will resolve to four impulses in the fourier domain. These four
impulses are the positive and negative frequencies of the two tones. As there
are positive and negative impulses in the ourput you only need to search the
first half of the fourier result for your two frequencies.
However this is not an ideal world so instead of two impulses in the first half of
the fourier transform you will likely end up with two slightly smoothed peaks.
Noise in the WAV will also appear as peaks/wobbles in the fourier response.
Unless the tones are seperated by silence then you have an additional problem
that occurs when one tone pair transitions to a different pair in the middle of your
sample. In this case you will end up with four peaks in the first half of your FFT (8
in total) and no way of determining between them.
In essence I am saying that after you get the FFT's of your samples you will need
to perform some form of processing to try and determine the most powerful
peaks in that FFT. This data will need to be stored and then post processed to
determine the likely keypad value at each point ( perhaps by merging adjacent
key values that are the same and discarding undertermined values).
matfud
Oh the joys of the broken submit button that throws an exception but still submits the post.
I have some test files, give me your email or email me and Ill send them to you.my mail is bmobananas@yahoo.comDan
Dan,
Having once been the subject of some rather childish hate emails I don't publish my email address on any of these forums (or any other).
I found some tone 'wav' file to play with. These are fairly noise free and each represents a single tone but I should be able to play with them for the moment. If I don't get bored and I decide to try my stuff on real data I may take you up on your offer of some test files.
Please excuse me working on this. I'm not trying to 'steal' your project but it does interest me.
Sabre
Not to disuade you from the joys of programming an FFT and having fun and all that, but did you bother to look at Goertzel's algorithm for detecting touch tones?
I don't known anything about it personally, I just found it incredibly hard to believe that you are the first person that has every tried to write code to detect touch tones in an audio stream. I would think that maybe the phone company has looked into this particular problem previously.
Hi, thanks for the tip. I've never heard of Goertzel's algorithm. I will have to look more into it to find out how it works. Does anybody has any info on this algorithm, or some source code, hopefully Java, to study it?Thanks,Dan
one goes to Google and types in "Goertzel's algorithm"
the top few links give you a nice mathematical description of the algorithm. About halfway down the page you get to a link like this:
http://www.embedded.com/showArticle.jhtml?articleID=17301593
which looks interesting. You browse through it an find near the top a paragraph like this:
"For instance, to detect specific frequencies when you're looking for tones from telephones or to detect 60Hz noise on power lines, the Goertzel algorithm ("The Goertzel Algorithm," by Kevin Banks, Embedded Systems Programming, Sept 2002, p. 34) finds specific frequencies faster than the FFT."
When you click on the link to "the Goertzel Algorithm" you get to a page full of how to choose bin and sample size, and all the grubby details. Here's the link:
http://www.embedded.com/story/OEG20020819S0057
I shudder to think what you might find if you actually typed "Java version of Goertzel's algorithm" into google.
As I said before, I know nothing about this algorithm and not much about signal processing. I found the name Goertzel by first googling something like "touch tone detection algorithms"
Google is your library, and unlike the forums you don't need to wait for someone to read it and to reply.
Go forth and search to your heart's content. A wealth of information awaits you.
PS - echoing something I read somewhere else, once you find what you are looking for, it would be polite to come back here and reply to your own question with any useful links that you find, so for example if you find that someone has already written the code you want, you come back and put the link here so that folks that trip over this thread can share in the results of your search.
Enjoy!
