Proper way of representing Static, delta, delta delta in a plot


I am currently trying to reproduce the results of this paper. The paper applies a CNN to speech recognition, where the CNN is used for feature extraction. This requires representing the input features in a way that lets the CNN detect them easily.

On page 3, Section III, Subsection A: Organization of the Input Data to the CNN

This section describes how the paper represents the input data. The authors use so-called MFSC features, which are MFCC features without the DCT applied. The next page then states that they use the static, delta and delta-delta features and place them next to each other for each frame, as [static delta delta-delta], and create a spectrogram from this.
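To make the [static delta delta-delta] layout concrete, here is a minimal NumPy sketch of how I understand it. It assumes the common regression formula for deltas with a window of N = 2 frames and edge padding; the paper may compute its deltas differently, and the names `delta` and `stack_features` are my own.

```python
import numpy as np

def delta(feat, N=2):
    """Regression-based delta of feat with shape (num_frames, num_bands).

    Assumption: d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2),
    with the first and last frames repeated at the edges.
    """
    num_frames = feat.shape[0]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    out = np.zeros_like(feat, dtype=float)
    for n in range(1, N + 1):
        ahead = padded[N + n : N + n + num_frames]   # c_{t+n}
        behind = padded[N - n : N - n + num_frames]  # c_{t-n}
        out += n * (ahead - behind)
    return out / denom

def stack_features(static):
    """Place static, delta and delta-delta side by side for each frame."""
    d1 = delta(static)
    d2 = delta(d1)
    # Result shape: (num_frames, 3 * num_bands)
    return np.hstack([static, d1, d2])

# Stand-in data: 100 frames of 40 log mel filter bank energies (MFSC).
mfsc = np.random.randn(100, 40)
stacked = stack_features(mfsc)
print(stacked.shape)
```

Note that the three blocks usually have quite different value ranges, so a single color scale over the stacked array can make the delta parts look very unlike the static part.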

There exist several different alternatives to organizing these MFSC features into maps for the CNN. First, as shown in Fig. 1(b), they can be arranged as three 2-D feature maps, each of which represents MFSC features (static, delta and delta-delta) distributed along both frequency (using the frequency band index) and time (using the frame number within each context window). In this case, a two-dimensional convolution is performed (explained below) to normalize both frequency and temporal variations simultaneously. Alternatively, we may only consider normalizing frequency variations. In this case, the same MFSC features are organized as a number of one-dimensional (1-D) feature maps (along the frequency band index), as shown in Fig. 1(c). For example, if the context window contains 15 frames and 40 filter banks are used for each frame, we will construct 45 (i.e., 15 times 3) 1-D feature maps, with each map having 40 dimensions, as shown in Fig. 1(c). As a result, a one-dimensional convolution will be applied along the frequency axis. In this paper, we will only focus on this latter arrangement found in Fig. 1(c), a one-dimensional convolution along frequency
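The two arrangements in the quoted passage can be sketched as array shapes. This is only an illustration under my own assumptions: the ordering of the 45 maps (all static frames first, then delta, then delta-delta) is not specified in the quote, the function names are hypothetical, and random data stands in for real MFSC features.

```python
import numpy as np

def maps_1d(static, delta, delta2, context=15):
    """Fig. 1(c)-style layout: per window, 3 * context maps of num_bands values.

    Returns an array of shape (num_windows, 3 * context, num_bands).
    Assumed ordering: static frames, then delta frames, then delta-delta frames.
    """
    streams = np.stack([static, delta, delta2])      # (3, num_frames, num_bands)
    num_frames = static.shape[0]
    windows = []
    for start in range(num_frames - context + 1):
        block = streams[:, start:start + context]    # (3, context, num_bands)
        windows.append(block.reshape(3 * context, -1))
    return np.array(windows)

def maps_2d(static, delta, delta2, context=15):
    """Fig. 1(b)-style layout: per window, three 2-D maps of (num_bands, context)."""
    streams = np.stack([static, delta, delta2])
    num_frames = static.shape[0]
    windows = []
    for start in range(num_frames - context + 1):
        block = streams[:, start:start + context]    # (3, context, num_bands)
        windows.append(block.transpose(0, 2, 1))     # (3, num_bands, context)
    return np.array(windows)

# Stand-in data: 20 frames, 40 filter banks per stream.
rng = np.random.default_rng(0)
static, d1, d2 = (rng.standard_normal((20, 40)) for _ in range(3))

print(maps_1d(static, d1, d2).shape)  # 45 maps of 40 values per window
print(maps_2d(static, d1, d2).shape)  # 3 maps of 40 x 15 per window
```

With 15 frames of context and 40 filter banks this gives the 45 maps of 40 dimensions each that the paper mentions, which is a different layout from one wide image with the three streams side by side.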

I tried doing this, but I get a pretty weird spectrogram.

Here is the static part only:

And here is [static delta delta-delta]:

I would expect the two to differ somewhat, but not this much. It looks as if the data is placed incorrectly. Or is it supposed to look like this?

asked 2 hours ago

Bob Burt

