AudioClip - for working with audio data

Overview

AudioClip(samples[, sampleRateHz, userData])

Class for storing audio clip data.

Details

class psychopy.sound.AudioClip(samples, sampleRateHz=48000, userData=None)[source]

Class for storing audio clip data.

This class is used to store and handle raw audio data, such as samples obtained from microphone recordings or loaded from files. PsychoPy stores audio samples in contiguous arrays of 32-bit floating-point values ranging between -1 and 1.

The AudioClip class provides basic audio editing capabilities too. You can use operators on AudioClip instances to combine audio clips together. For instance, the + operator will return a new AudioClip instance whose samples are the concatenation of the two operands:

sndCombined = sndClip1 + sndClip2

Note that audio clips must have the same sample rate in order to be joined using the addition operator. For online compatibility, use the append() method instead.

There are also numerous static methods available to generate various tones (e.g., sine-, saw-, and square-waves). Audio samples can also be loaded from, and saved to, files in various formats (e.g., WAV, FLAC, OGG, etc.).

You can play an AudioClip by passing instances of this class directly to the Sound class:

import psychopy.core as core
import psychopy.sound as sound

myTone = sound.AudioClip.sine(duration=5.0)  # generate a tone

mySound = sound.Sound(myTone)
mySound.play()
core.wait(5.0)  # wait for sound to finish playing
core.quit()

Parameters:
  • samples (ArrayLike) – Nx1 or Nx2 array of audio samples for mono and stereo, respectively. Values in the array representing the amplitude of the sound waveform should vary between -1 and 1. If not, they will be clipped.

  • sampleRateHz (int) – Sampling rate used to obtain samples in Hertz (Hz). The sample rate or frequency is related to the quality of the audio, where higher sample rates usually result in better sounding audio (albeit with a larger memory footprint and file size). The value specified should match the frequency the clip was recorded at. If not, the audio may sound distorted when played back. Usually, a sample rate of 48kHz is acceptable for most applications (DVD audio quality). For convenience, module-level constants of the form SAMPLE_RATE_* are provided to specify many common sample rates.

  • userData (dict or None) – Optional user data to associate with the audio clip.

static _checkCodecSupported(codec, raiseError=False)[source]

Check if the audio format string corresponds to a supported codec. Used internally to check if the user specified a valid codec identifier.

Parameters:
  • codec (str) – Codec identifier (e.g., ‘wav’, ‘mp3’, etc.)

  • raiseError (bool) – Raise an error instead of returning a value. Default is False.

Returns:

True if the format is supported.

Return type:

bool

append(clip)[source]

Append samples from another sound clip to the end of this one.

The AudioClip object must have the same sample rate and number of channels as this object.

Parameters:

clip (AudioClip) – Audio clip to append.

Returns:

This object with samples from clip appended.

Return type:

AudioClip

Examples

Join two sound clips together:

snd1.append(snd2)

asMono(copy=True)[source]

Convert the audio clip to mono (single channel audio).

Parameters:

copy (bool) – If True, an AudioClip containing a copy of the samples will be returned. If False, channels will be mixed inplace, resulting in the same object being returned. User data is not copied.

Returns:

Mono version of this object.

Return type:

AudioClip
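
Examples

Mix a stereo clip down to a single channel (a minimal sketch; snd is assumed to be an existing stereo AudioClip):

sndMono = snd.asMono()  # new AudioClip with the channels mixed together
snd.asMono(copy=False)  # mix inplace instead, returning this same object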

asStereo(copy=True)[source]

Convert the audio clip to stereo (two channel audio).

Parameters:

copy (bool) – If True, an AudioClip containing a copy of the samples will be returned. If False, channels will be mixed inplace, resulting in the same object being returned. User data is not copied.

Returns:

Stereo version of this object.

Return type:

AudioClip

property channels

Number of audio channels in the clip (int).

If channels > 1, the audio clip is in stereo.

convertToWAV()[source]

Get a copy of stored audio samples in WAV PCM format.

Returns:

Array with the same shape as .samples but in 16-bit WAV PCM format.

Return type:

ndarray
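
Examples

Get the samples as 16-bit integer PCM data, e.g., for passing to a library that expects that format (a minimal sketch; myClip is assumed to be an existing AudioClip):

pcmData = myClip.convertToWAV()  # int16 ndarray, same shape as myClip.samples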

copy()[source]

Create an independent copy of this AudioClip.

Return type:

AudioClip

property duration

The duration of the audio in seconds (float).

This value is computed using the specified sampling frequency and number of samples.
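
Example

The duration is simply the number of samples divided by the sample rate (a sketch; myClip is assumed to be an existing AudioClip):

durationSecs = myClip.samples.shape[0] / myClip.sampleRateHz  # same as myClip.duration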

gain(factor, channel=None)[source]

Apply gain to the audio samples.

This will modify the internal store of samples inplace. Clipping is automatically applied to samples after applying gain.

Parameters:
  • factor (float or int) – Gain factor to multiply audio samples by.

  • channel (int or None) – Channel to apply gain to. If None, gain will be applied to all channels.
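
Examples

Halve the volume of a clip, or boost a single channel (a minimal sketch; snd is assumed to be an existing AudioClip):

snd.gain(0.5)  # halve the amplitude of all channels, inplace
snd.gain(1.5, channel=0)  # boost only the first channel; samples are clipped to [-1, 1]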

property isMono

True if there is only one channel of audio data.

property isStereo

True if there are two channels of audio samples.

There is usually one channel for each ear. The first channel is usually the left ear, and the second the right.
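
Example

Guard mono-only processing by mixing down when needed (a sketch; myClip is assumed to be an existing AudioClip):

if myClip.isStereo:
    myClip.asMono(copy=False)  # mix the channels down inplace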

static load(filename, codec=None)[source]

Load audio samples from a file. Note that this is a static method!

Parameters:
  • filename (str) – File name to load.

  • codec (str or None) – Codec to use. If None, the format will be implied from the file name.

Returns:

Audio clip containing samples loaded from the file.

Return type:

AudioClip
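
Examples

Load a clip from a WAV file (a minimal sketch; the path is a placeholder):

import psychopy.sound as sound
myClip = sound.AudioClip.load('/path/to/audio.wav')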

resample(targetSampleRateHz, resampleType='default', equalEnergy=False, copy=False)[source]

Resample audio to another sample rate.

This method will resample the audio clip to a new sample rate. The method used for resampling can be specified using the resampleType parameter.

Parameters:
  • targetSampleRateHz (int) – New sample rate.

  • resampleType (str) – Filter (or method) to use for resampling. The methods available depend on the packages installed. The ‘default’ method uses scipy.signal.resample to resample the audio. Other methods require the user to install librosa or resampy. Default is ‘default’.

  • equalEnergy (bool) – Make the output have similar energy to the input. Option not available for the ‘default’ method. Default is False.

  • copy (bool) – Return a copy of the resampled audio clip at the new sample rate. If False, the audio clip will be resampled inplace. Default is False.

Returns:

Resampled audio clip.

Return type:

AudioClip

Notes

  • Resampling an audio clip may result in distortion, which is exacerbated by successive resampling.

  • When using librosa for resampling, the fix parameter is set to False.

  • The resampling types ‘linear’, ‘zero_order_hold’, ‘sinc_best’, ‘sinc_medium’ and ‘sinc_fastest’ require the samplerate package to be installed in addition to librosa.

  • Specifying either the ‘fft’ or ‘scipy’ method will use the same resampling method as ‘default’; however, it allows the equalEnergy option to be used.

Examples

Resample an audio clip to 44.1kHz:

snd.resample(44100)

Use the ‘soxr_vhq’ method for resampling:

snd.resample(44100, resampleType='soxr_vhq')

Create a copy of the audio clip resampled to 44.1kHz:

sndResampled = snd.resample(44100, copy=True)

Resample the audio clip to be playable on a certain device:

import psychopy.sound as sound

audioClip = sound.AudioClip.load('/path/to/audio.wav')

deviceSampleRateHz = sound.Sound().sampleRate
audioClip.resample(deviceSampleRateHz)

rms(channel=None)[source]

Compute the root mean square (RMS) of the samples to determine the average signal level.

Parameters:

channel (int or None) – Channel to compute RMS (zero-indexed). If None, the RMS of all channels will be computed.

Returns:

An array of RMS values for each channel if channel=None (an array is returned even if there is only one channel). If channel was specified, a float will be returned indicating the RMS of that single channel.

Return type:

ndarray or float
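
Examples

Check the signal level of a recording (a minimal sketch; myRecording is assumed to be an existing AudioClip):

rmsAllChannels = myRecording.rms()  # ndarray, one value per channel
rmsLeft = myRecording.rms(channel=0)  # float, first channel only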

property sampleRateHz

Sample rate of the audio clip in Hz (int). Should be the same value as the rate the samples were captured at.

property samples

Nx1 or Nx2 array of audio samples (numpy.ndarray).

Values must range from -1 to 1. Values outside that range will be clipped, possibly resulting in distortion.
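
Example

You can build a clip directly from a NumPy array (a minimal sketch; the 440 Hz waveform here is arbitrary):

import numpy as np
from psychopy.sound import AudioClip

t = np.linspace(0., 1., 48000, endpoint=False)  # 1 second of time points
waveform = 0.8 * np.sin(2. * np.pi * 440. * t)  # values stay within [-1, 1]
myClip = AudioClip(waveform.reshape(-1, 1), sampleRateHz=48000)  # Nx1 = mono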

save(filename, codec=None)[source]

Save an audio clip to file.

Parameters:
  • filename (str) – File name to write audio clip to.

  • codec (str or None) – Format to save audio clip data as. If None, the format will be implied from the extension at the end of filename.
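
Examples

Save a clip as a WAV file (a minimal sketch; the path is a placeholder):

myClip.save('/path/to/output.wav')  # format implied by the .wav extension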

static sawtooth(duration=1.0, freqHz=440, peak=1.0, gain=0.8, sampleRateHz=48000, channels=2)[source]

Generate audio samples for a tone with a sawtooth waveform.

Parameters:
  • duration (float or int) – Length of the sound in seconds.

  • freqHz (float or int) – Frequency of the tone in Hertz (Hz). Note that this differs from the sampleRateHz.

  • peak (float) – Location of the peak between 0.0 and 1.0. If the peak is at 0.5, the resulting wave will be triangular. A value of 1.0 will cause the peak to be located at the very end of a cycle.

  • gain (float) – Gain factor ranging between 0.0 and 1.0. Default is 0.8.

  • sampleRateHz (int) – Sample rate of the audio for playback.

  • channels (int) – Number of channels for the output.

Return type:

AudioClip
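
Examples

Generate a 2 second sawtooth tone, and a triangle wave by placing the peak mid-cycle (a minimal sketch):

import psychopy.sound as sound

sawTone = sound.AudioClip.sawtooth(2.0, freqHz=440)  # standard sawtooth
triTone = sound.AudioClip.sawtooth(2.0, freqHz=440, peak=0.5)  # triangular wave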

static silence(duration=1.0, sampleRateHz=48000, channels=2)[source]

Generate audio samples for a silent period.

This is used to create silent periods of a very specific duration between other audio clips.

Parameters:
  • duration (float or int) – Length of the sound in seconds.

  • sampleRateHz (int) – Sample rate of the audio for playback.

  • channels (int) – Number of channels for the output.

Return type:

AudioClip

Examples

Generate 10 seconds of silence to enjoy:

import psychopy.sound as sound
silence = sound.AudioClip.silence(10.)

Use the silence as a break between two audio clips when concatenating them:

fullClip = clip1 + sound.AudioClip.silence(10.) + clip2

static sine(duration=1.0, freqHz=440, gain=0.8, sampleRateHz=48000, channels=2)[source]

Generate audio samples for a tone with a sine waveform.

Parameters:
  • duration (float or int) – Length of the sound in seconds.

  • freqHz (float or int) – Frequency of the tone in Hertz (Hz). Note that this differs from the sampleRateHz.

  • gain (float) – Gain factor ranging between 0.0 and 1.0. Default is 0.8.

  • sampleRateHz (int) – Sample rate of the audio for playback.

  • channels (int) – Number of channels for the output.

Return type:

AudioClip

Examples

Generate an audio clip of a tone 10 seconds long with a frequency of 400Hz:

import psychopy.sound as sound
tone400Hz = sound.AudioClip.sine(10., 400.)

Create a marker/cue tone and append it to pre-recorded instructions:

import psychopy.sound as sound
voiceInstr = sound.AudioClip.load('/path/to/instructions.wav')
markerTone = sound.AudioClip.sine(
    1.0, 440.,  # duration and freq
    sampleRateHz=voiceInstr.sampleRateHz)  # must be the same!

fullInstr = voiceInstr + markerTone  # create instructions with cue
fullInstr.save('/path/to/instructions_with_tone.wav')  # save it

static square(duration=1.0, freqHz=440, dutyCycle=0.5, gain=0.8, sampleRateHz=48000, channels=2)[source]

Generate audio samples for a tone with a square waveform.

Parameters:
  • duration (float or int) – Length of the sound in seconds.

  • freqHz (float or int) – Frequency of the tone in Hertz (Hz). Note that this differs from the sampleRateHz.

  • dutyCycle (float) – Duty cycle between 0.0 and 1.0.

  • gain (float) – Gain factor ranging between 0.0 and 1.0. Default is 0.8.

  • sampleRateHz (int) – Sample rate of the audio for playback.

  • channels (int) – Number of channels for the output.

Return type:

AudioClip
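
Examples

Generate a square wave tone with a 25% duty cycle (a minimal sketch):

import psychopy.sound as sound

pulseTone = sound.AudioClip.square(1.0, freqHz=220, dutyCycle=0.25)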

static synthesizeSpeech(text, engine='gtts', synthConfig=None, outFile=None)[source]

Synthesize speech from text using a text-to-speech (TTS) engine.

This method is used to generate audio samples from text using a text-to-speech (TTS) engine. The synthesized speech can be used for various purposes, such as generating audio cues for experiments or creating audio instructions for participants.

This method returns an AudioClip object containing the synthesized speech. The quality and format of the returned audio may vary depending on the TTS engine used.

Please note that online TTS engines may require an active internet connection to work. They may also send the text to a remote server for processing, so be mindful of privacy concerns.

Parameters:
  • text (str) – Text to synthesize into speech.

  • engine (str) – TTS engine to use for speech synthesis. Default is ‘gtts’.

  • synthConfig (dict or None) – Additional configuration options for the specified engine. These are specified using a dictionary (e.g., synthConfig={‘slow’: False}). These parameters vary depending on the engine in use. Default is None, which uses the default configuration for the engine.

  • outFile (str or None) – File name to save the synthesized speech to. This can be used to save the audio to a file for later use. If None, the audio clip will be returned in memory. If you plan on using the same audio clip multiple times, it is recommended to save it to a file and load it later.

Returns:

Audio clip containing the synthesized speech.

Return type:

AudioClip

Examples

Synthesize speech using the default gTTS engine:

import psychopy.sound as sound
voiceClip = sound.AudioClip.synthesizeSpeech(
    'How are you doing today?')

Save the synthesized speech to a file for later use:

voiceClip = sound.AudioClip.synthesizeSpeech(
    'How are you doing today?', outFile='/path/to/speech.mp3')

Synthesize speech using the gTTS engine with a specific language, timeout, and top-level domain:

voiceClip = sound.AudioClip.synthesizeSpeech(
    'How are you doing today?',
    engine='gtts',
    synthConfig={'lang': 'en', 'timeout': 10, 'tld': 'us'})

transcribe(engine='whisper', language='en-US', expectedWords=None, config=None)[source]

Convert speech in audio to text.

This function accepts an audio clip and returns a transcription of the speech in the clip. The efficacy of the transcription depends on the engine selected, audio quality, and language support.

Speech-to-text conversion blocks the main application thread when used in Python. Don’t transcribe audio during time-sensitive parts of your experiment! Instead, initialize the transcriber before the experiment begins by calling this function with audioClip=None.

Parameters:
  • engine (str) – Speech-to-text engine to use.

  • language (str) – BCP-47 language code (e.g., ‘en-US’). Note that supported languages vary between transcription engines.

  • expectedWords (list or tuple) – List of strings representing expected words or phrases. This constrains the possible output words to those specified, which improves the model’s accuracy. Note that not all engines support this feature (only Sphinx and Google Cloud do at this time); a warning will be logged if the selected engine does not support it. CMU PocketSphinx has an additional feature where a sensitivity can be specified for each expected word. You can indicate the sensitivity level to use by putting a : after each word in the list (see the Example below). Sensitivity levels range between 0 and 100, where a higher number results in the engine being more conservative, with a higher likelihood of false rejections. The default sensitivity is 80% for words/phrases without one specified.

  • config (dict or None) – Additional configuration options for the specified engine. These are specified using a dictionary (e.g., config={‘pfilter’: 1} will enable the profanity filter when using the ‘google’ engine).

Returns:

Transcription result.

Return type:

TranscriptionResult

Notes

  • The recommended transcriber is OpenAI Whisper, which can be used locally without an internet connection once a model has been downloaded to the cache. It can be selected by passing engine=’whisper’ to this function.

  • Online transcription services (e.g., Google) provide robust and accurate speech recognition capabilities with broader language support than offline solutions. However, these services may require a paid subscription, a reliable broadband internet connection, and may not respect the privacy of your participants, as their responses are sent to a third party. Also consider that the audio data sent over the network can be large, so users on metered connections may incur additional costs to run your experiment. Offline transcription services (e.g., CMU PocketSphinx and OpenAI Whisper) do not require an internet connection after the model has been downloaded and installed.

  • If the audio clip has multiple channels, they will be combined prior to being passed to the transcription service if needed.
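
Examples

Transcribe a recorded clip locally using Whisper (a minimal sketch; myRecording is assumed to be an existing AudioClip, and reading the recognized words from the result’s words attribute is an assumption):

result = myRecording.transcribe(engine='whisper', language='en-US')
print(result.words)  # recognized words, assuming this attribute is available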

property userData

User data associated with this clip (dict). Can be used for storing additional data related to the clip. Note that userData is not saved with audio files!

Example

Adding fields to userData. For instance, suppose we want to associate the start time the clip was recorded at with it:

myClip.userData['date_recorded'] = t_start

We can access that field later by:

thisRecordingStartTime = myClip.userData['date_recorded']

static whiteNoise(duration=1.0, sampleRateHz=48000, channels=2)[source]

Generate Gaussian white noise.

New feature, use with caution.

Parameters:
  • duration (float or int) – Length of the sound in seconds.

  • sampleRateHz (int) – Sample rate of the audio for playback.

  • channels (int) – Number of channels for the output.

Return type:

AudioClip
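
Examples

Generate 5 seconds of white noise, e.g., for use as a masking stimulus (a minimal sketch):

import psychopy.sound as sound

maskingNoise = sound.AudioClip.whiteNoise(5.0)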

