Source code for phonlab.acoustic.gci

import numpy as np
from scipy.signal import windows, filtfilt, buttord, butter, sosfiltfilt, argrelmax, argrelmin, find_peaks
from pandas import DataFrame
from ..utils.prep_audio_ import prep_audio
from ..acoustic.lpc_residual import lpcresidual


def get_mbs(x,fs, f0median, width = 1.4):
    ''' filter with a blackman window, to smooth glottal pulses to something more sinusoidal.

    width is the size of the window in number of glottal pulses.  Drugman and Dutoit used 1.75,
    1.4 seems to be better for higher pitched voices without harming performance with lower 
    voices.

    TO DO: What about using F0 trajectory rather than a single median?  This would mean 
    having a time series of windows w with length determined frame by frame.   
    '''
    
    # calculate the MeanBased Signal - smooth with a window that is 1.4 glottal pulses wide
    t0mean = round(fs/f0median)  # expected number of samples per glottal cycle
    halfL = round((0.5*width)*t0mean)     # using a factor of 1.4 (0.7*2) instead of 1.75
    w = windows.blackman(2*halfL +1)  # the length of the smoothing filter is critical
    mbs = filtfilt(w,1,x)  # smooth audio with blackman window, for mean-based signal

    # remove lowfrequency oscillation in the mbs
    n,Wp = buttord(50,30,3,60,fs=fs)
    sos = butter(n, Wp, 'hp', output='sos',fs=fs)
    mbs = sosfiltfilt(sos, mbs)  # highpass filter the mean-based signal
    mbs = mbs/np.max(np.abs(mbs))

    return mbs

# GCI function

[docs] def gci_sedreams(x,fs,f0median=170, target_fs = 16000, order=None, cthresh=0.0): '''Identify Glottal Closure instances (GCI) using Drugman & Dutoit's (2009) `sedreams` algorithm which was published as a Matlab function in the Covarep repository. The function also returns f0 and vocal jitter based on the derived GCI estimates. Two parameters are given here which were not a part of the original implementation. Based on a comment in D&D(2009), the function includes a process to choose different LPC orders for different f0medians. .. code-block:: Python low_order = int(fs/1000) + 2 # at 16kHz sampling this is 18 if order==None: if f0median<190: order = low_order elif f0median<250: order = low_order-2 elif f0median<300: order = low_order-4 else: order = low_order-6 If you don't explicitly specify and LPC order one will be chosen for you. This differs from D&D in that they just used order=18 for everything. This doesn't seem to change much in the operation of this function. The other additional parameter is a threshold for peaks in the residual function. Changing this parameter changes the output of the function quite a lot. The algorithm looks for a peak in a temporal span of the residual in each presumed glottal cycle and reports the location of that peak as the location of the glottal closing instant, and height of the peak as the SOE (strength of excitation). With the **cthresh** parameter the user is given the option of disregarding intervals that have weak evidence of a glottal closure. D&D used cthresh=0, resulting in a lot of apparent jitter in regions where the actual f0 is much lower than f0median. If you add a threshold, the algorithm rejects candidate GCIs for which there is weak evidence of glottal closure. This may be a mistake for breathy voice. Further testing is needed. Parameters ========== x : ndarray A one-dimensional array of audio samples fs : int sampling rate of **x** f0median : float, default = 200 The median of the fundamental frequency of voicing in **x**. This value can be estimated by the user, but it is best to measure the median of values given by a pitch tracker. target_fs : int, default=16000 the sampling rate of the returned residual and mbs waveforms. order : int, default = None By default the order used in LPC analysis (to calculate the LPC residual signal) is guessed based on the value of F0median. Drugmand & Dutoit (2009) used order=18. cthresh: float, default = 0.1 A peak in the LPC residual must be greater than this thresholf value in in order to be considered a glottal closure instant. The residual is amplitude normalized to the range [0,1], so 0.1 is 10% of the amplitude range. Returns ======= df : pandas DataFrame See the note below mbs : ndarray A waveform the same length as the input **x** containing the "Mean Based Signal" resid : ndarray A waveforem the same length as the input **x** containing the LPC residual signal. Note ==== The columns in the returned DataFrame are: * sec - the time points (in seconds) of each glottal closure instant. These are the GCI times. * f0 - pitch of voicing, which is `1/t`, where `t` is the glottal period - the duration between adjacent GCI times. * jitter - the relative discrepancy between adjacent glottal period durations (call them t1 and t2). The value is calculated as `j = 2 * |0.5 - t1/(t1+t2)|`. This results in values on a scale from 0 (adjacent periods (t) are equal in duration), to 1 (the theoretical limit of jitter). A value of 0.5 means that t1 is twice (or 1/2) as long as t2. This corresponds to 1/2 of Praat's "local jitter" measurement (which is scaled from 0 to 2). * soe - strength of excitation (SOE)is the height of the LPC residual peak for each GCI * shimmer - relative discrepancy between adjacent glottal SOE, measured in dB with the formula: shimmer = 20*log(soe[i+1]/soe[i]) References ========== T. Drugman, T. Dutoit (2009) Glottal Closure and Opening Instant Detection from Speech Signals, Interspeech09, Brighton, U.K. T. Drugman, M. Thomas, J. Gudnason, P. Naylor, T. Dutoit (2012) Detection of Glottal Closure Instants from Speech Signals: a Quantitative Review, IEEE Transactions on Audio, Speech and Language Processing, vol. 20, Issue 3, pp. 994-1006. Example ======= The example here shows glottal closure instances for a small 40 millisecond window in one of the exaample audio files. .. code-block:: Python example_file = importlib.resources.files('phonlab') / 'data/example_audio/im_twelve.wav' x,fs = phon.loadsig(example_file, fs=16000,chansel=[0]) f0df = phon.get_f0_B93(x,fs) f0med = np.nanmedian(f0df.f0) gci_df, mbs,resid = gci_sedreams(x,fs,f0med, cthresh = 0.01) times = np.array(range(len(x)))/fs # construct a time axis for plotting start = 1.8 # choose a chunk end = start+0.04 s = int(start*fs) e = int(end*fs) plt.plot(times[s:e], x[s:e]+0.75) # plot the sound wave (centered on y = 0.75) plt.plot(times[s:e], mbs[s:e]) # the meanbased signal plt.plot(times[s:e], resid[s:e],'c-') # the lpc residual plt.vlines(np.ma.masked_outside(gci_df.sec, start,end),0.4,1.1,color='red') plt.axhline(0.1) plt.axhline(-0.1) The figure here shows the derived waves used in finding GCIs. In the top trace, the sound wave is shown in blue and the glottal closure instants (GCIs) are shown with red vertical lines. In the bottom trace, the mean-based signal is shown in orange and the LPC residual is in cyan. Horizontal lines show the threshold (cthresh=0.1) for considering a peak in the residual to be a possible GCI. .. figure:: images/gci.png :scale: 50 % :alt: Results of the above code showing the derived waves used in finding GCIs. :align: center ''' y,fs = prep_audio(x,fs,target_fs,pre=0,quiet=False) # ======================== # 1. get the mean based signal and find peaks and valleys in it mbs = get_mbs(y,fs,f0median) # get the locations of the peaks and valleys in the mbs imax = argrelmax(mbs)[0] # find peaks in the mean-based signal imin = argrelmin(mbs)[0] # and valleys while imax[0]<imin[0]: # insure that sequences of max and min are as expected. imax = np.delete(imax,0) while imin[-1]>imax[-1]: imin = np.delete(imin,-1) # ======================= # 2. calculate the lpc residual signal low_order = int(fs/1000) + 2 if order==None: if f0median<190: order = low_order elif f0median<250: order = low_order-2 elif f0median<300: order = low_order-4 else: order = low_order-6 resid,fs = lpcresidual(y,fs,target_fs, order = order) # ======================== # 3. set an expectation for the median relative postion of the GCI in the MBS # get some big residual peaks -- to find an estimate for where to find the GCI in the MBS rp,_ = find_peaks(resid,height=0.3) relpos = np.empty(0) for k in range(len(rp)): q = np.argmin(np.fabs(imin-rp[k])) if mbs[imax[q]]>0.3: rel = (rp[k] - imin[q])/(imax[q] - imin[q]) if rel>0 and rel < 1: relpos = np.append(relpos,rel) if len(relpos)==0: ratioGCI = 0.5 else: ratioGCI = np.median(relpos) # a reasonable expectation of where the GCI will be # ========================= # 4. Find the glottal closure instants (GCI) - peaks in the residual gci = np.full(len(imin),np.nan) # initialize with nan values soe = np.full(len(imin),np.nan) # initialize with nan values idx = 0 for k in range(len(imin)): t = imax[k] - imin[k] start = imin[k] + round((ratioGCI - 0.3)*t) # back a bit if start < 1: start = 1 if start > len(resid): break stop = imin[k] + round((ratioGCI + 0.3)*t) # forward a bit if stop > len(resid): stop = len(resid) peak_resid = np.max(resid[start:stop]) if peak_resid > cthresh: # threshold to posit glottal closure i = np.argmax(resid[start:stop]) soe[k] = peak_resid # strength of excitation = height of residual peak gci[k] = start + i -1 soe = soe[~np.isnan(soe)] gci = gci[~np.isnan(gci)] # remove rows that didn't get filled gci = (gci)/fs # ========================== # 5. Measure F0 from the list of GCI t1 = gci[1:]-gci[:-1] # duration between adjacent GCI t1 = np.pad(t1,(0,1),mode='edge') # repeat the last one f0 = 1/t1 # should filter this to f0 values in a range # ========================== # 6. Measure jitter of adjacent periods t2 = gci[2:]-gci[:-2] # duration between intervals of 2 GCI t2 = np.pad(t2,(0,2),mode='edge') # repeat the last two jitter = 2 * np.fabs(0.5 - (t1/t2)) # 0 = identical adjacent periods, 1 = one is twice as long as the other # ========================== # 7. Shimmer shimmer = 20*np.log10(soe[1:]/soe[:-1]) # amplitude difference between adjacent pulses. shimmer = np.pad(shimmer,(1,0),mode='edge') # repeat the first one df = DataFrame({'sec': gci, 'f0':f0, 'jitter':jitter, 'soe':soe, 'shimmer':shimmer}) return df,mbs,resid