Source code for phonlab.auditory.sinewave_synth

import numpy as np

def sine_synth(formant_data, fs=16000):
    """Produce 'sinewave speech': an audio signal made up of time-varying
    sinusoidal waves at the frequencies of the vowel formants.

    Note
    ====
    The input dataframe is usually produced by :func:`phonlab.track_formants`.
    In any case, it must have these columns:

    * sec : time (seconds) of the frames.
    * rms : the RMS amplitude of each frame.
    * F1, F2, F3, F4 : frequencies (Hz) of the lowest four vowel formants in the frames.

    The overall amplitude contour of the sinewave speech waveform follows the
    ``rms`` contour in the input dataframe, min-max normalized to the range 0 to 1.
    The frequencies of the four sine wave components are given by the formant
    estimates in the input dataframe. The output waveform is peak-normalized to a
    maximum absolute value of 1.0, so it uses the full amplitude range when the
    floating-point samples are written as 16-bit integer samples. Short (20 ms)
    linear on-ramp and off-ramp amplitude contours are applied to the beginning
    and end of the audio.

    This is a Python translation of code that Keith Johnson got from Howard
    Nusbaum, via Alexander Francis, in 1998.

    Parameters
    ==========
    formant_data : dataframe
        A pandas dataframe with speech analysis data, as produced by
        :func:`phonlab.track_formants`. Frames must be evenly spaced in time
        because the step size is read from the first two rows. Missing (NaN)
        values are filled by linear interpolation on a copy, so the caller's
        dataframe is not modified.
    fs : number, default=16000
        The sampling frequency of the resulting sound wave.

    Returns
    =======
    wav : ndarray
        A one-dimensional numpy array of floating-point audio samples.
    fs : number
        The sampling frequency of the audio samples in ``wav``.

    References
    ==========
    R. E. Remez, P. E. Rubin, D. B. Pisoni & T. D. Carrell (1981) Speech
    perception without traditional speech cues. `Science` **212** (4497),
    947–950. doi:10.1126/science.7233191

    Example
    =======
    .. code-block:: Python

        import phonlab as phon
        import soundfile as sf

        x, fs = phon.loadsig("sf3_cln.wav", chansel=[0])
        fmtsdf = phon.track_formants(x, fs)      # track the formants
        x, fs = phon.sine_synth(fmtsdf, fs=fs)   # produce the sinewave synthesis
        sf.write('sf3_cln_sinewave.wav', x, fs)  # save wav file
    """
    pifac = np.pi * 2 / fs  # converts Hz to radians per sample

    # Fill NaNs by linear interpolation on a copy (no in-place change to the input)
    formant_data = formant_data.interpolate(limit_direction='both')

    sec = formant_data['sec'].to_numpy(dtype=float)
    step = sec[1] - sec[0]              # read step size from input
    nframes = len(formant_data)         # read number of frames from input
    npoints = int(np.round(step * fs))  # number of samples per frame
    wav = np.zeros(npoints * nframes)   # allocate a waveform buffer

    # Min-max normalize the RMS contour to 0..1 (a flat contour gives constant amplitude)
    rms = formant_data['rms'].to_numpy(dtype=float)
    rms_range = np.max(rms) - np.min(rms)
    rms = (rms - np.min(rms)) / rms_range if rms_range > 0 else np.ones_like(rms)

    # Formant tracks as an array of shape (4, nframes)
    formants = formant_data[['F1', 'F2', 'F3', 'F4']].to_numpy(dtype=float).T

    # TODO: check that formant frequencies do not exceed fs/2;
    # if they do, consider increasing fs
    for f in range(formants.shape[0]):   # synthesize each formant
        rfreq = 0.0                      # running phase (radians)
        iwv = 0                          # write position in wav
        for y in range(1, nframes):      # synthesize frame by frame
            amp = rms[y-1]
            freq = formants[f, y-1] * pifac  # phase increment per sample
            ainc = (rms[y] - rms[y-1]) / npoints
            finc = ((formants[f, y] - formants[f, y-1]) * pifac) / npoints
            for i in range(npoints):     # synthesize each point in the frame
                rfreq += freq
                # Wrapping rfreq into [0, 2*pi) is not required: float64 keeps
                # ample precision for utterance-length signals.
                wav[iwv] += np.sin(rfreq) * amp
                amp += ainc              # linear amplitude change across the frame
                freq += finc             # linear frequency change across the frame
                iwv += 1

    # Add rise time and decay time
    ns = int(0.02 * fs)    # number of samples in 20 ms
    fac = 0.0
    facinc = 1.0 / ns      # ramp from 0 toward 1
    for x in range(ns):
        wav[x] *= fac          # apply a short (20 ms) rise time
        wav[-(x + 1)] *= fac   # and a short (20 ms) decay time (wav[-0] would be wav[0])
        fac += facinc

    # Peak-normalize to a maximum absolute value of 1.0
    wav /= np.max(np.abs(wav))
    return wav, fs
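The frame-by-frame loop in `sine_synth` builds each sinusoid the same way. It interpolates amplitude and frequency linearly within every frame, then accumulates the phase one sample at a time.

The same computation can be written with whole-array NumPy operations. `np.interp` gives the per-sample amplitude and frequency, and `np.cumsum` integrates the per-sample phase increments. Below is a minimal sketch under these assumptions:

* The function name `sine_synth_vectorized` and its plain-array arguments are hypothetical. It is not part of phonlab and takes NumPy arrays rather than a pandas dataframe.
* Frames are evenly spaced and contain no NaNs.
* It returns only the `(nframes - 1) * npoints` synthesized samples. It omits the silent final frame that a loop-based buffer of `npoints * nframes` samples leaves at its end.

```python
import numpy as np


def sine_synth_vectorized(sec, rms, formants, fs=16000, ramp_s=0.02):
    """Hypothetical vectorized sinewave-speech sketch (not part of phonlab).

    sec      : (nframes,) evenly spaced frame times in seconds
    rms      : (nframes,) RMS amplitude of each frame
    formants : (nformants, nframes) formant frequencies in Hz, no NaNs
    """
    sec = np.asarray(sec, dtype=float)
    rms = np.asarray(rms, dtype=float)
    formants = np.atleast_2d(np.asarray(formants, dtype=float))

    nframes = len(sec)
    npoints = int(np.round((sec[1] - sec[0]) * fs))  # samples per frame
    nsamp = (nframes - 1) * npoints                   # one segment per pair of frames
    # Fractional frame index of every output sample: sample i of the segment
    # starting at frame k sits at k + i / npoints (linear ramps within a frame).
    pos = np.arange(nsamp) / npoints
    frames = np.arange(nframes)

    # Min-max normalize the RMS contour to 0..1 (flat contour -> constant 1).
    span = rms.max() - rms.min()
    env = (rms - rms.min()) / span if span > 0 else np.ones_like(rms)
    amp = np.interp(pos, frames, env)                 # per-sample amplitude

    pifac = 2 * np.pi / fs                            # Hz -> radians per sample
    wav = np.zeros(nsamp)
    for track in formants:
        freq = np.interp(pos, frames, track)          # per-sample frequency (Hz)
        phase = np.cumsum(freq * pifac)               # running phase, like rfreq += freq
        wav += np.sin(phase) * amp

    # Linear on/off ramps; the first and last samples are scaled by 0.
    ns = min(int(ramp_s * fs), nsamp // 2)
    if ns > 0:
        ramp = np.arange(ns) / ns
        wav[:ns] *= ramp
        wav[nsamp - ns:] *= ramp[::-1]

    peak = np.max(np.abs(wav)) if nsamp else 0.0
    if peak > 0:
        wav /= peak                                   # peak-normalize to 1.0
    return wav, fs
```

With frames 10 ms apart at `fs=16000`, `npoints` is 160, so 50 frames yield 49 * 160 = 7840 samples. Vectorizing moves the per-sample work out of the Python interpreter. The trade-off is that several full-length temporary arrays are held in memory at once.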