Prepare Audio

phonlab.loadsig(path, chansel=[], offset=0.0, duration=None, fs=None, dtype=<class 'numpy.float32'>)

Load signal(s) from an audio file.

By default, audio samples are returned at the same sample rate as the input file, and channels are returned along the first dimension of the output array y.

Parameters:

path (string, int, pathlib.Path, soundfile.SoundFile, or file-like object) – The input audio file.
chansel (int, list of int (default [])) – Selection of channels to be returned from the input audio file, starting with 0 for the first channel. For empty list [], return all channels in order as they appear in the input audio file. This parameter can be used to select channels out of order, drop channels, and repeat channels.
offset (float (default 0.0)) – start reading after this time (in seconds)
duration (float) – only load up to this much audio (in seconds)
fs (number > 0 [scalar]) – target sampling rate. ‘None’ returns y at the file’s native sampling rate.
dtype (numeric type (default float32)) – data type of y. No scaling is performed when the requested dtype differs from the native dtype of the file. Float types are usually in the range [-1.0, 1.0), and integer types usually make use of the full range of integers available to their size, e.g. int16 may be in the range [-32768, 32767].
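
Because the dtype note above says no scaling is applied, a caller who ends up with int16-range samples but wants the float convention has to rescale them. The following is a minimal sketch using NumPy only, on a made-up array rather than a real file. The helper name int16_to_float is hypothetical and is not part of phonlab.

```python
import numpy as np

def int16_to_float(samples):
    """Rescale int16-range samples to float32 in [-1.0, 1.0)."""
    # Dividing by 32768 maps -32768 to -1.0 and 32767 to just under 1.0.
    return np.asarray(samples, dtype=np.float32) / 32768.0

# Made-up sample values spanning the int16 range (not read from a file).
raw = np.array([-32768, -16384, 0, 16384, 32767], dtype=np.int16)
scaled = int16_to_float(raw)
print(scaled.min(), scaled.max())
```

Dividing by 32768 rather than 32767 keeps the result inside the half-open range [-1.0, 1.0) described above.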

Returns:

ys (list of 1d signal arrays y (plus fs)) – Each channel is returned as a separate 1d array in the output list. By default, the number of arrays equals the number of channels in the input file. If chansel is specified, the number of 1d arrays equals the length of chansel. Technically, the last value of the list is fs (see below).
fs (number > 0 [scalar]) – sampling rate of the y arrays
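
The return layout described above (channel arrays followed by fs as the last item) can be unpacked in a few ways. This sketch uses a hypothetical stand-in function, fake_loadsig, which only mimics that layout with dummy data; it does not read audio and is not part of phonlab.

```python
def fake_loadsig(n_channels, fs=44100):
    """Hypothetical stand-in that mimics the documented layout: [chan0, ..., fs]."""
    channels = [[0.0] * 4 for _ in range(n_channels)]  # dummy 4-sample channels
    return channels + [fs]

# Star-unpacking works for any number of channels.
*chans, fs = fake_loadsig(3)
print(len(chans), fs)  # prints: 3 44100

# Fixed-name unpacking fails when the channel count does not match.
try:
    left, right, rate = fake_loadsig(1)
except ValueError as err:
    mismatch = str(err)
    print("unpacking failed:", mismatch)
```

Star-unpacking is the safer choice when the number of channels is not known in advance.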

Example

Load a stereo audio file, report the sampling rate of the file, and plot the left channel. left and right are one-dimensional arrays of audio samples. Note that this will produce an error with a one-channel file.

from phonlab import loadsig
import matplotlib.pyplot as plt

left, right, fs = loadsig('stereo.wav', chansel=[0, 1])
print(fs)
plt.plot(left)

To load a one-channel (mono) file, you can do this:

x, fs = loadsig('mono.wav', chansel=[0])
print(fs, len(x))
plt.plot(x)

In this example we load channels from a wav file that has an unknown number of channels, downsampling to 12 kHz sampling rate. Use len(chans) to determine how many channels there are in the file, and plot the last channel. chans is a list of 1d audio signal arrays. You can pop the sample rate parameter off the list.

chans = loadsig('threechan.wav', fs=12000)
fs = chans.pop()       # remove the sample rate from the end of the list of channels
print(len(chans))      # the number of channels
plt.plot(chans[-1])    # plot the last of the channels

phonlab.prep_audio(x, fs, target_fs=32000, pre=0, scale=True, add_tiny_noise=True, outtype='float', pad_to=0.0, quiet=False)

Prepare an array of audio waveform samples for acoustic analysis.

Parameters:

x (array) – a one-dimensional numpy array with audio samples in it.
fs (int) – The sampling rate of the sound in x.
target_fs (int, default=32000) – The desired sampling rate of the audio samples that will be returned by the function. Set target_fs = None if you don’t want to change the sampling rate.
pre (float, default = 0) – how much high frequency preemphasis to apply (between 0 and 1).
scale (boolean, default = True) – scale the samples to use the full range for audio samples (based on the peak amplitude in the signal)
add_tiny_noise (boolean, default = True) – add a tiny bit of noise to the audio to avoid problematic waveforms with many samples at zero amplitude.
pad_to (float, default = 0.0) – add samples so that the duration is a multiple of pad_to. For example, if the duration is 1.99 seconds and pad_to is 0.1, then the signal will be padded to 2.0 seconds.
outtype (string {"float", "int"}, default = "float") – The “int” waveform is 16-bit integers in the range [-32768, 32767]. The “float” waveform is 32-bit floating point numbers in the range [-1, 1].
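
The pad_to arithmetic can be illustrated with a short calculation. This is only a sketch of the rounding described above, using a hypothetical helper named padded_length. How prep_audio itself rounds, and what values it pads with, are not stated here, so treat those details as assumptions.

```python
def padded_length(n_samples, fs, pad_to):
    """Smallest sample count >= n_samples whose duration is a multiple of pad_to seconds."""
    block = round(pad_to * fs)           # samples per pad_to interval
    if block < 1:                        # pad_to of 0.0 (or too small) means no padding
        return n_samples
    n_blocks = -(-n_samples // block)    # integer ceiling division
    return n_blocks * block

fs = 32000
n = 63680                                # 1.99 seconds at 32 kHz
target = padded_length(n, fs, 0.1)
print(target, target / fs)               # prints: 64000 2.0
```

Working in whole samples rather than seconds avoids floating point surprises when checking whether a duration is already an exact multiple.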

Returns:

y (ndarray) – a 1D numpy array with audio samples
fs (int) – the sampling rate of the audio in y.

Note

By default, this function will return audio with a sampling rate of 32 kHz, scaled to be in the range [-1, 1].
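
Scaling based on the peak amplitude can be pictured as follows. This is a minimal NumPy sketch of peak normalization under the assumption that the largest absolute sample is mapped to 1.0. The helper peak_normalize is hypothetical, and the exact scaling that prep_audio applies may differ.

```python
import numpy as np

def peak_normalize(x):
    """Scale x so that its largest absolute sample is 1.0 (silence is returned unchanged)."""
    x = np.asarray(x, dtype=np.float32)
    peak = np.max(np.abs(x))
    return x if peak == 0 else x / peak

# A made-up low-amplitude signal.
soft = np.array([0.0, 0.1, -0.25, 0.2], dtype=np.float32)
loud = peak_normalize(soft)
print(loud)
```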

Example

Open a sound file and prepare it for acoustic analysis. By default, prep_audio() will resample the audio to a sampling rate of 32000 Hz and scale the waveform to use the full range. In this example, we have also asked the function to apply a preemphasis factor of 1 (about 6 dB/octave).

import phonlab as phon

y, fs = phon.loadsig("sound.wav", chansel=[0])
x, fs = phon.prep_audio(y, fs, pre=1)
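
To give a feel for what a preemphasis factor does, here is a sketch of a common first-order pre-emphasis filter, y[n] = x[n] - pre * x[n-1]. It is an assumption that prep_audio uses this form, and the helper preemphasize is hypothetical. With pre=1 the filter is a first difference, which boosts high frequencies by roughly 6 dB per octave.

```python
import numpy as np

def preemphasize(x, pre):
    """First-order pre-emphasis: y[n] = x[n] - pre * x[n-1], with y[0] = x[0]."""
    x = np.asarray(x, dtype=np.float64)
    y = x.copy()
    y[1:] = x[1:] - pre * x[:-1]
    return y

ramp = np.array([1.0, 2.0, 3.0, 4.0])
print(preemphasize(ramp, 1.0))   # prints: [1. 1. 1. 1.]
print(preemphasize(ramp, 0.0))   # prints: [1. 2. 3. 4.]
```

A slowly changing signal such as the ramp is flattened when pre=1, while pre=0 leaves the input unchanged.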

Take the right channel (the second channel of a file with at least two channels) and resample to 16,000 Hz:

*chans, fs = phon.loadsig("sound.wav")
print(f'the old sampling rate is: {fs}')
y, fs = phon.prep_audio(chans[1], fs, target_fs=16000)
print(f'the new sampling rate is: {fs}')