Comparative Study of Pitch Detection Algorithms

Pitch is defined as the fundamental frequency of a harmonic signal. There are three broad approaches to pitch detection, based on the time domain, the frequency domain, or a combination of both. So far, the autocorrelation method and the cepstral method of finding the pitch have been studied. The study also covers the practical problems of pitch detection in speech signals, as well as various techniques for improving pitch detection, such as inverse filtering. Pitch detection is very important in voice recognition systems as well as in the design of vocoders.

1. Introduction

The word pitch (in the context of speech processing) is subjectively defined as the frequency of a pure tone that a listener matches to a more complex (usually periodic) signal. In engineering terms, a pitch detector is a device that measures the fundamental frequency of an incoming signal. Pitch perception is subjective, while pitch detection is objective. Pitch detection is of particular importance in speech or music, where the object of study is a single quasi-periodic sound source. Pitch detection algorithms can be divided into methods that operate in the time domain, the frequency domain, or both. A pitch detector is not only an essential component of speech processing systems; it also gives insight into the nature of the excitation source for speech production and helps in recognizing speakers through the study of pitch contours.

Some pitch detection methods rely on detecting and timing a particular time-domain feature, while other time-domain approaches use autocorrelation functions or difference methods to measure the similarity between the waveform and a time-lagged version of itself. Another group of approaches operates in the frequency domain, locating sinusoidal peaks in the frequency transform of the input signal. Other pitch detection techniques combine time-based and frequency-based approaches.

Time-domain analysis treats mathematical functions or physical signals with respect to time: the value of the function or signal is known for all real numbers in the continuous-time case, or at separate instants in the discrete-time case. In frequency-domain methods the signal is frequency transformed, and the frequency-domain representation is then inspected for the first harmonic, the greatest common factor of all harmonics, or other such indications of the period. The signal is windowed to avoid spectral smearing. To locate the harmonic peaks accurately, a minimum number of periods of the signal must be analyzed, depending on the type of window. Various linear preprocessing steps can make frequency-domain features easier to locate, such as performing linear prediction on the signal and using the residual signal for pitch detection. Non-linear operations such as peak limiting also make the harmonics easier to locate.

Pitch estimation, or fundamental frequency estimation, of a speech waveform is an important factor in evaluating the performance of speech-based systems. The prosodic information of speech is conveyed by its pitch. Pitch is also related to the quasi-periodic excitation of the vocal tract that results from the vibration of the vocal cords as air flows from the lungs during voiced speech.

Based on the various time-domain and frequency-domain approaches, pitch detection can be done using the following techniques:

Autocorrelation method.

Cepstral method.

A particular flattening Linear Predictive Coding method.

Average Magnitude Difference Function.

Harmonic Product Spectrum.

Maximum Likelihood method.

Pitch detection poses various problems, such as measuring the pitch period (PP) accurately and reliably, and determining the period of a speech waveform that varies both in period and in the detailed structure of the waveform within a period.

Various signal processing techniques are used to improve pitch detection, such as low-pass filtering, inverse filtering, comb filtering, and spectral flattening and correlation. Measuring the spectrum with high resolution highlights the positions of the harmonics, which can lead to powerful pitch detection algorithms such as pattern-recognition methods.

Pitch estimation is very important in vocoders and is a very useful parameter in speech recognition. Pitch detectors are evaluated on factors such as accuracy in estimating the pitch period, accuracy in making voiced-unvoiced decisions, speed of operation, and complexity of the algorithm.

2. SOME DIFFICULTIES IN PITCH DETECTION

Accurate and reliable measurement of the pitch period of a speech signal from the acoustic pressure waveform is very difficult because the glottal excitation waveform is not perfectly periodic. Moreover, unlike a perfectly periodic waveform, a speech waveform varies both in period and in the detailed structure within a period. The true pitch period becomes difficult to detect in some instances when the formants of the vocal tract alter the structure of the glottal waveform. A major difficulty in measuring pitch lies in defining the pitch period window for voiced speech segments. The only requirement for such measurements is consistency from period to period; a lack of consistency may lead to spurious pitch period estimates. Fig. 1 shows two different possibilities for defining pitch markers based on waveform measurements. Note that peak measurements are sensitive to the formant structure during the pitch period, while zero crossings are sensitive to the formants, noise, and any DC level in the waveform. Distinguishing between unvoiced and low-level voiced speech segments is very difficult because the transitions between the two are very subtle. Pitch extraction from speech transmitted over a telephone line is yet another cause for concern. The effects of the telephone system on speech include linear filtering, non-linear processing, and the addition of noise to the speech signal, all of which make the periodicity harder to estimate. The non-linear contributions of the telephone system include phase distortion, attenuation of the speech signal, crosstalk between two or more messages, and clipping of extremely high-level sounds.

Figure 1: two waveform measurements that can be used to define pitch markers.

Fig. 2 illustrates some of the problems encountered in estimating the pitch. Fig. 2(a) shows two speech waveforms in which the bottom signal has a period about one fourth that of the top signal, illustrating the large dynamic range of the voiced fundamental frequency. It is important to note that the pitch of some male voices can be as low as 60 Hz, while that of children may be as high as 800 Hz. Fig. 2(b) shows a drastic and sudden variation of the period: the leftmost period is quite short, but the next five periods are more than twice as long before the signal jumps back to shorter periods. This irregular behaviour makes pitch tracking more difficult. Fig. 2(c) illustrates a sudden change in the spectrum caused by a rapid closure, as in a vowel-to-nasal transition, which makes waveform-based pitch detection difficult. Fig. 2(d) shows a transition region from aperiodic (hiss) excitation to quasi-periodic (buzz) excitation. Fig. 2(e) and Fig. 2(f) show the effect of speech degradation due to telephone transmission and added noise, which makes pitch detection harder still.

Figure 2: six examples of difficulties in pitch detection.

3. SIGNAL PROCESSING TO IMPROVE PITCH DETECTION

The following methods of conditioning the speech signal have proved useful for improving pitch detection:

Low-pass filtering: Human pitch perception pays more attention to lower frequencies. Estimating the pitch period by eye is typically easier with a low-pass filtered waveform like that in Fig. 3.2 than with a full-band waveform like that in Fig. 3.1. A pitch detector would likewise find the period of the waveform in Fig. 3.2 more easily than that of Fig. 3.1.

Spectral flattening and correlation: This idea was proposed by Sondhi. It is based on the observation that a Fourier series of harmonics with equal amplitude and zero phase yields a signal that closely resembles a pulse train. In this proposal the original signal is spectrally flattened. An approximation to this operation is shown in Fig. 3.3, where the outputs of a bank of band-pass filters (BPFs) are each divided by their own energy and then summed. The sum is then passed through an autocorrelator, which produces a zero-phase time function, thereby approximating the equal-amplitude, zero-phase criterion. Fig. 3.4 shows the result of the autocorrelation.

Inverse filtering: This rests on the hypothesis that the speech signal is the convolution of an excitation with a vocal tract filter. If the time-varying vocal tract filter could be specified at all times, the speech signal could be passed through a filter whose spectrum is the inverse of that of the vocal tract filter; the output would be the glottal waveform, which simplifies pitch detection. Markel implemented such filtering as part of his SIFT (simplified inverse filter tracking) algorithm for pitch estimation.

Comb filtering: The speech signal is passed through a set of delays corresponding to all discrete periods of the input. The system is depicted in Fig. 3.5. For 10-kHz sampling and a 50-500 Hz fundamental frequency range, the possible periods range from 20 to 200 samples, so the filter must contain at least 181 taps. At each tap the delayed version is subtracted from the original signal. If the signal is periodic, one of the tapped outputs should be zero.

Cepstral pitch detection: A deconvolution of the source and the filter is performed. The high-time parts of the cepstrum give a very clear indication of the fundamental frequency. Fig. 4 shows the resulting cepstra for a male speaker (two left columns) and a female speaker (two right columns).

Measuring the spectrum with high resolution highlights the positions of the harmonics.
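
The comb-filtering item above can be checked numerically. The following is a minimal sketch, not the implementation shown in Fig. 3.5: the function names, the 100 Hz test tone, and the 1e-6 tolerance are assumptions made for illustration. It counts the taps needed for the stated range and picks the shortest delay whose tap output is (numerically) zero.

```python
import math

def candidate_delays(fs_hz, f_min_hz, f_max_hz):
    """Every integer delay (in samples) covering the given F0 range."""
    shortest = round(fs_hz / f_max_hz)   # highest pitch -> shortest period
    longest = round(fs_hz / f_min_hz)    # lowest pitch -> longest period
    return list(range(shortest, longest + 1))

def comb_outputs(x, delays):
    """Mean absolute output of each tap: original minus delayed copy."""
    out = {}
    for d in delays:
        diffs = [abs(x[n] - x[n - d]) for n in range(d, len(x))]
        out[d] = sum(diffs) / len(diffs)
    return out

delays = candidate_delays(10_000, 50, 500)

# Hypothetical test input: a 100 Hz sine at 10 kHz (period = 100 samples).
x = [math.sin(2 * math.pi * 100 * n / 10_000) for n in range(600)]
outputs = comb_outputs(x, delays)

# Take the shortest delay whose output is zero up to rounding error;
# multiples of the period (here 200) also cancel, so "shortest" matters.
best = next(d for d in delays if outputs[d] < 1e-6)
print(len(delays), best)
```

Real speech is only quasi-periodic, so no tap output will be exactly zero; a practical detector would look for the smallest output rather than a fixed tolerance.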

Figure 3.1: full-band speech signal.

Figure 3.2: low-pass filtered speech signal.

Figure 3.3: spectral flattening.

Figure 3.4: autocorrelation function of spectrally flattened speech.

Figure 3.5: comb filtering of the speech wave.

Figure 4: cepstral analysis.

4. TYPES OF PITCH DETECTORS

Despite the various difficulties in measuring pitch, a number of pitch detection techniques have been proposed. A pitch detector is a device that makes a voiced-unvoiced decision and, for voiced speech, measures the pitch period. Some pitch detection algorithms only determine the pitch during voiced segments of speech and depend on other approaches for the voiced-unvoiced decision. Pitch detection algorithms fall into three broad categories:

A group using the time-domain properties of speech signals.

A group using the frequency-domain properties of speech signals.

A group using both the time-domain and frequency-domain properties of speech signals.

Time-domain detectors operate directly on the speech waveform to estimate the pitch period. The measurements most often used are peak and valley measurements, zero-crossing measurements, and autocorrelation measurements. Such measurements provide good estimates of the period provided the quasi-periodic signal has been suitably processed to minimize the effects of the formant structure.

Frequency-domain detectors use the property that if the signal is periodic in the time domain, its frequency spectrum consists of a series of impulses at the fundamental frequency and its harmonics. Simple measurements on the frequency spectrum can then be used to estimate the period of the signal.
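
As an illustration of such a "simple measurement", the sketch below takes the lowest spectral peak that reaches a fraction of the strongest one as the fundamental. This is one possible rule, not a method named in the text. The function names, the 0.5 relative threshold, and the synthetic three-harmonic signal are all assumptions, and the frame is chosen to hold a whole number of periods so that no window is needed.

```python
import cmath
import math

def magnitude_spectrum(x):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), fine for short frames)."""
    size = len(x)
    mags = []
    for k in range(size // 2 + 1):
        acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / size)
                  for n in range(size))
        mags.append(abs(acc))
    return mags

def lowest_strong_peak_hz(x, fs_hz, rel_threshold=0.5):
    """Frequency of the lowest local peak reaching rel_threshold * max."""
    mags = magnitude_spectrum(x)
    limit = rel_threshold * max(mags)
    for k in range(1, len(mags) - 1):
        is_peak = mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
        if is_peak and mags[k] >= limit:
            return k * fs_hz / len(x)
    return None

fs_hz = 8000  # assumed sampling rate
# Hypothetical voiced frame: harmonics at 200, 400 and 600 Hz, with the
# second harmonic stronger than the fundamental.
x = [0.6 * math.sin(2 * math.pi * 200 * n / fs_hz)
     + 1.0 * math.sin(2 * math.pi * 400 * n / fs_hz)
     + 0.8 * math.sin(2 * math.pi * 600 * n / fs_hz)
     for n in range(400)]
f0 = lowest_strong_peak_hz(x, fs_hz)
print(f0)
```

Picking the largest peak instead would report the second harmonic here, which is why the sketch searches upward from low frequencies.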

A hybrid pitch detector makes use of both the time-domain and frequency-domain properties of speech signals. For example, it may use a frequency-domain approach to obtain a spectrally flattened waveform and then use the autocorrelation method to measure the pitch period.

5. AUTOCORRELATION TECHNIQUE

Fundamentally, this algorithm exploits the fact that a periodic signal, even if it is not a pure sine wave, will be similar from one period to the next. This is true even if the amplitude of the signal varies over time, provided those changes do not occur too quickly. To detect the pitch, we take a window of the signal at least twice as long as the longest period we might detect.

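Using this segment of the signal, we generate the autocorrelation function R(s), defined here as the sum of the pointwise absolute difference between the signal and a copy of itself shifted by s samples, taken over some interval, perhaps 600 points. Strictly speaking, this is a difference function of the kind used by the Average Magnitude Difference Function rather than the classical product-based autocorrelation, which is why the period shows up as a minimum rather than a peak.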

Graphically, this corresponds to the following:

Figure 5.1: Here, the blue signal is the original and the green signal is a copy of the original, shifted left by an amount approaching the fundamental period. Notice how the signals begin to align with each other as the shift amount nears the fundamental period.

As the shift value s approaches the fundamental period T of the signal, the difference between the shifted signal and the original signal begins to decrease. This can be seen in the plot below, in which the autocorrelation function rapidly approaches zero at the fundamental period.

Figure 5.2: the fundamental period is identified as the first minimum of the autocorrelation function. Notice that the function is periodic: since R(s) measures the total difference between the signal and its shifted copy, the signals align again and the difference approaches zero whenever the shift approaches k*T.

We can detect this value by differentiating the autocorrelation function and looking for a change of sign, which yields the critical points. We then look at the direction of the sign change across points (negative difference to positive) to keep only the minima. Finally, we search for the first minimum below some threshold, i.e. the minimum corresponding to the smallest s. The location of this minimum gives the fundamental period of the windowed portion of the signal, from which the frequency follows directly as the sampling rate divided by the period in samples.
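
The procedure described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the original authors' code: the function names, the 8 kHz sampling rate, the synthetic two-harmonic test signal, and the threshold of 10% of the maximum of R(s) are all hypothetical choices.

```python
import math

def difference_function(x, max_shift, span):
    """R(s): sum of |x[n] - x[n+s]| over `span` points, for s = 0..max_shift-1."""
    return [sum(abs(x[n] - x[n + s]) for n in range(span))
            for s in range(max_shift)]

def first_minimum_below(r, threshold):
    """Smallest shift s > 0 where R has a local minimum under `threshold`."""
    for s in range(1, len(r) - 1):
        falling = r[s] - r[s - 1] < 0      # slope before s is negative
        rising = r[s + 1] - r[s] >= 0      # slope after s is non-negative
        if falling and rising and r[s] < threshold:
            return s
    return None

fs_hz = 8000  # assumed sampling rate
# Synthetic "voiced" test signal: 100 Hz fundamental plus a second harmonic,
# five periods long (the window must span at least two periods).
x = [math.sin(2 * math.pi * 100 * n / fs_hz)
     + 0.5 * math.sin(2 * math.pi * 200 * n / fs_hz) for n in range(400)]

r = difference_function(x, max_shift=200, span=200)
period = first_minimum_below(r, threshold=0.1 * max(r))
f0_hz = fs_hz / period
print(period, f0_hz)
```

The threshold is what keeps the search from stopping at the shallow dip near half the period, where only the second harmonic lines up with itself.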

5.1 FAST-Autocorrelation

Clearly, this algorithm requires a great deal of computation. First, we produce the autocorrelation function R(s) for some positive range of s. For each value of s, we need to compute the total difference between the shifted signals. Next, we need to differentiate this function and look for the minima, finally determining the correct minimum for each window.

In generating the R(s) function, we define a domain for s of 0 to 599. This allows for fundamental frequencies between about 50 and 22000 Hz, which works well for the human voice. However, it does require calculating R(s) 600 times for each window.

To improve the efficiency of this algorithm, an alternative called FAST Autocorrelation was created, which can yield speed improvements of as much as 70%.

We exploit the nature of the signal, specifically the fact that if the signal was generated with a high sampling rate and the windows are narrow enough, we can assume that the pitch will not vary significantly from window to window. We can therefore begin calculating the R(s) function using values of s near the previous minimum. For example, if the previous window had a fundamental period of 156 samples, we start calculating R(s) at s = 136. If we fail to find the minimum in this region, we calculate R(s) progressively farther from the previous s until we find a minimum.

We also note that the first minimum (valued below the threshold) always corresponds to the fundamental frequency. We can therefore calculate the difference equation dR(s)/ds as we generate R(s). Then, when we find the first minimum below the threshold, we can stop calculating altogether and move on to the next window.

If we use only the second improvement, we typically cut the range of s from 600 points to around 200. If we then combine it with the first improvement, we end up calculating R(s) for only about 20 values of s, a saving of (580) * (1200) = 696000, or roughly 700000, calculations per window. When the signal consists of hundreds of windows, this improvement is substantial.
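
The two improvements can be combined in a short sketch. It is a simplified reading of the description above, with assumptions labeled: the function names, the back-off of 20 samples (taken from the 156 to 136 example), and the absolute threshold are hypothetical, and if no minimum is found near the previous period the search here simply continues upward rather than widening in both directions.

```python
import math

def diff_at(x, s, span):
    """R(s) for a single shift s: sum of |x[n] - x[n+s]| over `span` points."""
    return sum(abs(x[n] - x[n + s]) for n in range(span))

def fast_period(x, prev_period, span, threshold, back_off=20, max_shift=600):
    """Search upward from prev_period - back_off and stop at the first
    local minimum below `threshold`. Returns (period, evaluations)."""
    start = max(1, prev_period - back_off)
    evaluations = 0
    history = []                       # R values for start, start+1, ...
    for s in range(start, max_shift):
        if s + span > len(x):
            break                      # not enough samples for this shift
        history.append(diff_at(x, s, span))
        evaluations += 1
        if len(history) >= 3:
            left, mid, right = history[-3:]
            # Sign change of the running difference: falling, then rising.
            if mid < left and right >= mid and mid < threshold:
                return s - 1, evaluations
    return None, evaluations

# Hypothetical window: a sine with a period of exactly 156 samples, and a
# previous-window estimate that was also 156 samples.
x = [math.sin(2 * math.pi * n / 156) for n in range(1000)]
period, evaluations = fast_period(x, prev_period=156, span=312, threshold=1.0)
print(period, evaluations)
```

On this input the search evaluates R(s) only for s = 136 through 157, which is in line with the "about 20 values" quoted above.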

Figure 5.3: waveform and the autocorrelation function. The estimated pitch is 156.863 Hz.

5.2 Limitations of Autocorrelation

The following are the limitations of the autocorrelation function:

Although it is robust to noise, it is sensitive to the sampling rate.

The fundamental frequency is calculated directly from a shift in samples, which results in poor pitch resolution at low sampling rates.

It is computationally very expensive because of the large number of calculations.

It is not very effective for high fundamental frequencies.
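
The resolution limit can be made concrete with a little arithmetic. Because the period is an integer number of samples, the only frequencies the method can report are the sampling rate divided by an integer lag. The sketch below (function names and example rates are illustrative assumptions) computes the gap between the two neighbouring reportable frequencies around a target pitch.

```python
def lag_to_hz(fs_hz, lag):
    """Frequency reported for an integer period of `lag` samples."""
    return fs_hz / lag

def resolution_hz(fs_hz, f0_hz):
    """Gap between the frequencies of the nearest lag and the next lag."""
    lag = round(fs_hz / f0_hz)
    return lag_to_hz(fs_hz, lag) - lag_to_hz(fs_hz, lag + 1)

# Same 400 Hz pitch at two sampling rates, then a lower pitch at 8 kHz.
coarse = resolution_hz(8000, 400)     # lags 20 and 21
fine = resolution_hz(44100, 400)      # lags 110 and 111
low_pitch = resolution_hz(8000, 100)  # lags 80 and 81
print(round(coarse, 2), round(fine, 2), round(low_pitch, 2))
```

The step between neighbouring lags shrinks as the sampling rate rises and grows as the pitch rises, which is consistent with the second and fourth limitations listed above.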

6. Cepstrum Pitch Determination (CPD)

Cepstral analysis also provides a way to estimate the pitch. The cepstrum of a voiced speech interval has a strong peak corresponding to the pitch period. The cepstrum pitch determination technique has some advantages over autocorrelation-based pitch detection algorithms (PDAs). It is assumed that a sequence of voiced speech s(n) can be represented as

$$ s(n) = e(n) * h(n) \tag{1} $$

where e(n) is the source excitation sequence, h(n) is the discrete impulse response of the vocal tract, and * denotes convolution. In the autocorrelation function the effects of the vocal source and the vocal tract are convolved with each other, which results in broad peaks and, in some cases, multiple peaks. In the frequency domain the convolution relationship between the vocal source and vocal tract effects becomes a multiplicative relationship

$$ S(\omega) = E(\omega) H(\omega) \tag{2} $$

where S(ω) = F{s(n)}, E(ω) = F{e(n)} and H(ω) = F{h(n)}, and the symbol F stands for the Discrete Fourier Transform (DFT). Taking the logarithm of the magnitude of Equation (2) gives (3),

$$ \log |S(\omega)| = \log |E(\omega)| + \log |H(\omega)| \tag{3} $$

In the cepstrum, the multiplicative relationship between source and tract effects is transformed into an additive one. The effects of the vocal source and the vocal tract become nearly independent, or at least easily identifiable and separable. It is possible to separate the part of the cepstrum that represents the source signal and find the true pitch period. That is why, in general, cepstrum pitch determination is more accurate than autocorrelation PDAs. For pitch determination, the real part of the cepstrum is sufficient. The real cepstrum of the discrete signal s(n) is defined as

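$$ c(n) = \frac{1}{N} \sum_{k=0}^{N-1} S(k) \, e^{j 2 \pi k n / N} \tag{4} $$

where N is the number of samples in the frame and S(k) is the logarithmic magnitude spectrum of s(n),

$$ S(k) = \log \left| \sum_{n=0}^{N-1} s(n) \, e^{-j 2 \pi k n / N} \right| \tag{5} $$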

The cepstrum consists of a peak at a high quefrency equal to the pitch period in seconds, and low-quefrency information corresponding to the formant structure in the log spectrum. To obtain an estimate of the fundamental frequency from the cepstrum, we look for a peak in the quefrency region corresponding to typical speech fundamental frequencies. Figure 6.1 presents an example of fundamental frequency estimation from the cepstrum of a voiced frame.

Figure 6.1: Waveform (a) and real cepstrum (b) of a voiced speech segment.

The sequence of processing operations for the cepstrum-based pitch detector is similar to that of the PDAs described above. Note, however, that the cepstral pitch detector uses the full-band speech signal for processing. Each block of 240 samples is weighted by a 240-point Hamming window, and the cepstrum of that block is computed. The peak cepstral value and its location are determined. If the value of this peak exceeds a fixed threshold, the section is classified as voiced and the pitch period is the location of the peak. If the peak does not exceed the threshold, a zero-crossing count is made on the block. If the zero-crossing count exceeds a given threshold, the block is marked as unvoiced. Otherwise, it is classified as voiced and the period is the location of the maximum value of the cepstrum.
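
The block-level decision logic just described can be sketched as below. The text does not give the sampling rate, the quefrency search range, or the two thresholds, so the values used here (8 kHz, quefrencies 20 to 120 samples, 0.3 and 60) are assumptions, as are the function names and the synthetic decaying-pulse frame. The DFT is a naive O(N^2) version so that the example needs only the standard library.

```python
import cmath
import math

def hamming(n_points):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def log_magnitude_spectrum(x):
    """S(k) = log|DFT{x}(k)|; the 1e-12 term guards against log(0)."""
    size = len(x)
    spectrum = []
    for k in range(size):
        acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / size)
                  for n in range(size))
        spectrum.append(math.log(abs(acc) + 1e-12))
    return spectrum

def real_cepstrum_at(log_mag, q):
    """Real cepstrum at quefrency q (in samples): inverse DFT of S(k)."""
    size = len(log_mag)
    return sum(log_mag[k] * math.cos(2 * math.pi * k * q / size)
               for k in range(size)) / size

def zero_crossings(x):
    return sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))

def cepstral_pitch_period(frame, min_q, max_q, peak_threshold, zc_threshold):
    """Return the pitch period in samples, or None if judged unvoiced."""
    windowed = [w * v for w, v in zip(hamming(len(frame)), frame)]
    log_mag = log_magnitude_spectrum(windowed)
    ceps = {q: real_cepstrum_at(log_mag, q) for q in range(min_q, max_q + 1)}
    peak_q = max(ceps, key=ceps.get)
    if ceps[peak_q] > peak_threshold:
        return peak_q                  # strong peak: voiced
    if zero_crossings(frame) > zc_threshold:
        return None                    # weak peak, many crossings: unvoiced
    return peak_q                      # weak peak, few crossings: voiced

fs_hz = 8000                           # assumed sampling rate
# Hypothetical voiced block: a decaying pulse repeating every 40 samples.
frame = [0.9 ** (n % 40) for n in range(240)]
period = cepstral_pitch_period(frame, min_q=20, max_q=120,
                               peak_threshold=0.3, zc_threshold=60)
print(period, fs_hz / period)
```

The search range stops at 120 samples because the real cepstrum of a 240-point block is symmetric about its midpoint, so higher quefrencies carry no new information.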

The cepstrum is a method of speech analysis based on a spectral representation of the signal. There are two ways to explain the general idea of the cepstrum method as used for spectral envelope estimation. First, one can simply think of obtaining the spectral envelope from a Fourier magnitude spectrum by smoothing its curve to get rid of the rapid fluctuations. This amounts to applying a low-pass filter to the spectrum, interpreted as a signal, which lets only the slow fluctuations (low-frequency oscillations of the curve) pass, hence the smoothing.

6.1 Disadvantages of the Cepstrum Method

There are two main disadvantages of the cepstrum technique for spectral envelope estimation:

1. Because the cepstrum is essentially a low-pass filtering of the curve of the spectrum interpreted as a signal, it averages out the fluctuations of that curve. This effect can be seen in Figure 6.2. It is undesirable, because the resulting curve no longer has the enveloping property of linking the peaks of the spectrum.

Figure 6.2: disadvantages of the cepstrum.

2. As with LPC, when analyzing harmonic sounds (with a pronounced partial structure) the cepstrum follows the curve of the spectrum down to the residual noise level in the gap between two partials, especially when the partials are spaced far apart, as for high-pitched sounds. See Figure 6.2 for an example of this behaviour.
