I have a big database of audio files, each about 1-2 s long. Different people say different words, but during processing I created some duplicates, which I have to find and remove. (The database has about 100000 sounds, so it is difficult to check by listening; some files have noise, or sound similar to a human listener but are actually different.) Some duplicate audio files cant be shifted by half a second or less during cutting process.

Please help. How can I find the real duplicates in my audio database?

Are you asking how to tell whether two "similar" audio files are actually the same word being pronounced? Or are you saying you have "bit by bit" duplicates within your database? If you have the latter problem, then dsp.stackexchange is not the place to ask, since your question is more about algorithms and fast, practical database processing. But if it is the former, could you clarify your question a bit? "Some duplicate audio files cant be shifted by half a second or less during cutting process", did you mean "... audio files can be..."? – bone 1 hour ago
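
To make the two cases in the comment above concrete, here is a minimal Python sketch, not a definitive solution. It assumes each clip is already loaded as a 1-D NumPy array of samples at a common sample rate. The function names `exact_digest` and `peak_normalized_xcorr` are hypothetical, and any decision threshold is an assumption that would need tuning on real data. The first function covers "bit by bit" duplicates. The second scores how well one clip matches a time-shifted copy of the other, using the peak of the normalized cross-correlation.

```python
import hashlib

import numpy as np


def exact_digest(samples: np.ndarray) -> str:
    """SHA-256 of the raw sample bytes; equal digests indicate bit-identical audio."""
    return hashlib.sha256(np.ascontiguousarray(samples).tobytes()).hexdigest()


def peak_normalized_xcorr(a: np.ndarray, b: np.ndarray) -> float:
    """Peak of the cross-correlation of a and b over all lags, divided by
    sqrt(energy(a) * energy(b)). Near 1.0 for identical clips; for a shifted
    cut of the same recording it is roughly the overlapping energy fraction."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    if denom == 0.0:
        return 0.0  # at least one clip is constant (e.g. silence)
    # Zero-pad so the circular FFT correlation equals the linear one.
    n = len(a) + len(b) - 1
    nfft = 1 << (n - 1).bit_length()
    spec = np.fft.rfft(a, nfft) * np.conj(np.fft.rfft(b, nfft))
    xcorr = np.fft.irfft(spec, nfft)
    return float(np.max(xcorr) / denom)


# Synthetic demonstration: two overlapping cuts of one "recording"
# versus an unrelated clip (random noise stands in for real audio).
rng = np.random.default_rng(0)
recording = rng.standard_normal(3000)
cut_a = recording[0:2000]
cut_b = recording[500:2500]          # same material, shifted by 500 samples
unrelated = rng.standard_normal(2000)

score_shifted = peak_normalized_xcorr(cut_a, cut_b)
score_unrelated = peak_normalized_xcorr(cut_a, unrelated)
```

Hashing scales easily to 100000 files, since digests can be grouped in a dictionary. The cross-correlation score is pairwise, so on a database of that size it would likely only be practical on candidate pairs pre-selected by some cheaper feature; that pre-selection step is not shown here.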
