First, install the WaveFile gem (for example, by running gem install wavefile).
Next, if you’re new to audio programming, you might want to read up on some of the basics of digital audio first. Check out this blog post for an introduction.
The WaveFile gem lets you both read and write wave files. Reading is done using the Reader class, and writing is done using the Writer class.
The Buffer class represents a collection of samples in a given sample format (e.g. stereo 16-bit PCM samples at a 44,100Hz sample rate). When samples are read using Reader, they are returned in a Buffer instance. Samples to be written are given to Writer wrapped in a Buffer instance as well.
A Buffer consists of two parts: an array of samples, and a Format instance that describes the sample format (which can’t always be determined just by looking at the raw samples). For example, the sample array in a Buffer read out of a mono 8-bit PCM file (in which each sample is an integer between 0 and 255) might look like this:
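The values below are invented for illustration rather than read from a real file. Unsigned 8-bit PCM uses 128 as its silent midpoint, so sample values swing above and below it.

```ruby
# A hypothetical slice of mono 8-bit PCM samples: one Integer per sample frame,
# each between 0 and 255 (128 is the silent midpoint for unsigned 8-bit PCM).
mono_pcm8_samples = [128, 160, 191, 217, 236, 247, 250, 245, 231, 210]

# Every sample is a plain Integer in the 8-bit range.
in_range = mono_pcm8_samples.all? { |s| s.is_a?(Integer) && s.between?(0, 255) }
puts "All samples in 0..255: #{in_range}"
```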
When there is more than one channel, each sample frame is represented by a sub-array. For example, a set of stereo floating point samples (in which each sample is between -1.0 and 1.0) might look like this:
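Again, these values are made up purely to show the shape of the data: each sample frame is a two-element array holding the left and right channel values.

```ruby
# A hypothetical slice of stereo floating point samples. Each sample frame is a
# two-element sub-array: [left channel, right channel], each between -1.0 and 1.0.
stereo_float_samples = [
  [0.0, 0.0],
  [0.25, -0.1],
  [0.5, -0.2],
  [0.75, -0.3],
  [1.0, -0.4],
]

# Each frame has one value per channel.
channels = stereo_float_samples.first.length
puts "Channels per frame: #{channels}"   # prints "Channels per frame: 2"

# Pulling out a single channel means taking the same index from every frame.
left_channel = stereo_float_samples.map(&:first)
puts "Left channel: #{left_channel.inspect}"
```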
To write a program that creates sound, first generate an array like this containing sample data, then wrap the array in a Buffer, and finally use Writer to write the samples contained in the Buffer to disk.
A Buffer can convert its samples to any other format this gem supports. This means you can read samples from a file in whatever format you like, regardless of the actual sample format in the file (e.g. read a file with 8-bit samples and get 16-bit samples back). You can do the same with Writer. For example, rather than remembering the sample range of 16-bit integer PCM format (is the maximum sample value 32,767? or 32,768?), you can generate floating point samples between -1.0 and 1.0 and transparently write them out as 16-bit integer PCM samples.
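To make the 16-bit question concrete: 16-bit PCM samples are signed integers from -32,768 to 32,767, so the range is not symmetric. The sketch below shows one common way to scale between the two representations. It assumes floats are multiplied by 32,767 on the way to integers and divided by 32,768 on the way back; the helper names are hypothetical, and this is an illustration of the idea rather than the gem’s internal code. In practice, Buffer and Writer do this for you.

```ruby
# Hypothetical helpers illustrating float <-> 16-bit PCM scaling.
# Assumption: floats are scaled by 32,767 (the largest positive 16-bit value)
# when converting to integers, and divided by 32,768 when converting back.
PCM16_MAX = 32_767

def float_to_pcm16(sample)
  # Clamp to the legal float range, scale, then round to the nearest integer.
  (sample.clamp(-1.0, 1.0) * PCM16_MAX).round
end

def pcm16_to_float(sample)
  sample / 32_768.0
end

puts float_to_pcm16(0.25)    # prints 8192
puts pcm16_to_float(8192)    # prints 0.25
puts float_to_pcm16(1.0)     # prints 32767
puts float_to_pcm16(-1.0)    # prints -32767 (this scheme never produces -32,768)
```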
Let’s write a simple tone to a wave file. A square wave is one of the simplest ways to create sound, so let’s do that. Our square wave will consist of a sample repeated a certain number of times, followed by the same number of repeated samples at the opposite amplitude. For example:
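Here is a tiny version of that pattern. The amplitude (0.5) and cycle length (3 samples up, 3 down) are arbitrary choices to keep the output short.

```ruby
# A tiny square wave: 3 samples at +0.5, then 3 at -0.5, forming one cycle.
cycle = ([0.5] * 3) + ([-0.5] * 3)
square_wave = cycle * 2   # repeat the cycle twice

p square_wave
# prints [0.5, 0.5, 0.5, -0.5, -0.5, -0.5, 0.5, 0.5, 0.5, -0.5, -0.5, -0.5]
```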
We’ll write some code to generate these samples, wrap them in a Buffer, and then write this Buffer to a file using Writer.
The samples we’ll generate will be between -1.0 and 1.0. The larger the magnitude of each sample, the higher the amplitude of the square wave (i.e. how loud it is). The faster we alternate between the positive and negative samples, the higher the frequency (i.e. pitch). For example, [0.2, 0.2, -0.2, -0.2] will have a higher pitch than [0.2, 0.2, 0.2, -0.2, -0.2, -0.2].
When the sample rate is 44,100Hz, 50 positive samples followed by 50 negative samples form one 100-sample cycle, which produces a frequency of 44,100 / 100 = 441Hz, very close to the A above middle C on a piano (440Hz). Let’s use that and generate our array of samples:
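A minimal sketch of that array. The amplitude of 0.25 matches the sample value that shows up later in this tutorial when the file is read back; the variable name cycle is our own choice.

```ruby
# One cycle of a 441Hz square wave at a 44,100Hz sample rate:
# 50 samples at +0.25 followed by 50 samples at -0.25.
cycle = ([0.25] * 50) + ([-0.25] * 50)

puts cycle.length   # prints 100
puts cycle.first    # prints 0.25
puts cycle.last     # prints -0.25
```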
Next, let’s wrap the samples in a Buffer:
Notice that we used the Format class to describe our samples. The Format constructor takes 3 arguments: the number of channels, the format of each sample, and the sample rate. Since our samples are floating point values between -1.0 and 1.0, we use :float for the sample format. The gem doesn’t check that the Format instance matches the actual samples, so if it doesn’t match, you might get unexpected results.
Now let’s write the buffer to a file called "square.wav" in the current working directory:
Notice that we gave the Writer a Format as well. This determines the format used when writing the samples to the file. The :pcm_16 sample format is different from the :float sample format in the Buffer we created; the gem handles the necessary translation behind the scenes.
All of the samples are written inside a block. When the block exits, the file is automatically closed. (For more manual control over when the file is closed, don’t pass a block; instead call write() as needed, and call close() when done.)
Here’s the full program so far:
When you run this program, it should create a file called "square.wav" in the current working directory. If you play this file (for example, on a Mac, by running afplay square.wav from the command line), it should sound like…
…which… doesn’t sound like anything! The reason is that we didn’t generate enough samples. At the sample rate we’re using, 44,100Hz, we need 44,100 samples for 1 second of sound. We only generated 100 samples, or 1/441 of a second. No problem, we can fix this by repeating our cycle more times:
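One way to do it, assuming we want exactly 1 second of sound (the repeat count of 441 is our choice; use more repetitions for a longer tone):

```ruby
# Repeat the 100-sample cycle 441 times: 441 * 100 = 44,100 samples,
# which is 1 second of audio at 44,100Hz.
cycle = ([0.25] * 50) + ([-0.25] * 50)
samples = cycle * 441

puts samples.length              # prints 44100
puts samples.length / 44_100.0   # prints 1.0 (seconds)
```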
Now when you re-run the program and play "square.wav", you should hear a steady tone.
You are well on your way to writing an epic NES soundtrack!
Here’s the full program:
Let’s now read the file we just wrote. We can use the Reader class for that.
After constructing the Reader, we call the each_buffer method. This method is useful when you want to read an entire file: it reads successive buffers of a given size and passes each one to the given block. When all buffers have been read, the file is automatically closed. (You can also manually control what to read and when to close the file; see the examples page for more info.)
When you run the program, it should print out repeated output similar to the following:
Buffer number of channels: 1
Buffer bits per sample: 16
Number of samples in buffer: 4096
First 10 samples in buffer: [8192, 8192, 8192, 8192, 8192, 8192, 8192, 8192, 8192, 8192]
Notice how the samples are 8192, rather than the 0.25 value that we used when generating the file. This is because when we saved the file, we indicated (via the Format instance we gave to Writer) that the samples should be written as 16-bit PCM samples instead of floating point. 8192 is a quarter of 32,768, so it is the 16-bit equivalent of 0.25.
Also notice that we only read 4,096 samples at a time, instead of trying to read the whole file at once. It’s generally a good idea to read a larger number of smaller buffers rather than one giant buffer. For this file it probably doesn’t matter, but longer files can have millions of samples, and holding them all in a single Ruby array can use a lot of memory.
OK, that’s cool, but let’s say we want to read this file so we can transform it, and it will be easier to work with if the samples are in floating point format and stereo (since we want to combine it with some other files that are stereo). No problem: when constructing the Reader, we can pass a Format instance that describes our desired sample format.
Now when you run this program, the output should look like below. Notice that we got our original 0.25
sample value back, and that each sample frame now consists of 2 channels:
Buffer number of channels: 2
Buffer bits per sample: 32
Number of samples in buffer: 4096
First 10 samples in buffer: [[0.25, 0.25],
[0.25, 0.25],
[0.25, 0.25],
[0.25, 0.25],
[0.25, 0.25],
[0.25, 0.25],
[0.25, 0.25],
[0.25, 0.25],
[0.25, 0.25],
[0.25, 0.25]]
Head over to the examples page to see more code examples, read full API documentation, and if you’re interested, learn how the Wave file format works behind the scenes.
Copyright © Joel Strait 2009-23