First, install the WaveFile gem.
Next, if you’re new to audio programming, you might want to read up on some of the basics of digital audio first. Check out this blog post for an introduction.
The WaveFile gem lets you both read and write wave files. Reading is done using the Reader class, and writing is done using the Writer class. The Buffer class represents a collection of samples in a given sample format (e.g. stereo 16-bit PCM samples at a 44,100Hz sample rate). When samples are read using Reader, they are returned in Buffer instances. Samples to be written are given to Writer wrapped in Buffer instances as well.
A Buffer consists of two parts: an array of samples, and a Format instance that describes the sample format (since it might not be possible to determine the format just by looking at the raw samples). For example, the sample array in a Buffer read out of a mono 8-bit PCM file (in which each sample is an integer between 0 and 255) might look like this:
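As an illustrative sketch (the values and the constant name are made up for this example), a mono 8-bit sample array is just a flat Ruby array of integers:

```ruby
# Hypothetical mono 8-bit PCM samples: one integer per sample frame.
# Values range from 0 to 255, with 128 as the midpoint (silence).
MONO_8_BIT_SAMPLES = [128, 160, 192, 224, 255, 224, 192, 160, 128, 96, 64, 32, 0]

puts MONO_8_BIT_SAMPLES.inspect
```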
When there is more than one channel, each sample frame will be represented by a sub-array with one value per channel. For example, a set of stereo floating point samples (in which each sample is between -1.0 and 1.0) might look like this:
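Again as a sketch with made-up values (the constant name is hypothetical), stereo samples nest one level deeper, with each inner array holding the left and right value for one sample frame:

```ruby
# Hypothetical stereo floating point samples: each sample frame is a
# two-element sub-array of [left, right] values between -1.0 and 1.0.
STEREO_FLOAT_SAMPLES = [[0.2, 0.3], [0.1, 0.6], [-0.1, 0.4], [-0.3, 0.0], [-0.5, -0.2]]

# Pulling one channel out is a matter of picking an index from each frame.
left_channel  = STEREO_FLOAT_SAMPLES.map { |frame| frame[0] }
right_channel = STEREO_FLOAT_SAMPLES.map { |frame| frame[1] }

puts left_channel.inspect
puts right_channel.inspect
```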
When writing a program that creates sound, you would generate an array like this with the sample data, wrap it in a Buffer, and then use Writer to write the samples in the Buffer to disk.
A Buffer can convert its samples to any other format this gem supports. This means you can read samples from a file in whatever format you like, regardless of the actual sample format in the file (e.g. read a file with 8-bit samples and get 16-bit samples back). You can do the same with Writer. For example, rather than remembering the sample range of 16-bit integer PCM format (was it 32,767? or 32,768?), you can generate floating point samples between -1.0 and 1.0 and transparently write them out as 16-bit integer PCM samples.
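To give a feel for what such a conversion involves, here is a plain-Ruby sketch of one common way to map floating point samples onto 16-bit integers. The helper name and the scaling rule (multiply by 32,767 and round) are assumptions made for illustration; the gem performs its own conversion internally and its exact rule may differ.

```ruby
# Assumed convention: scale by the largest positive 16-bit value and round.
PCM_16_MAX = 32_767

# Hypothetical helper, not part of the WaveFile gem.
def float_to_pcm_16(sample)
  clamped = [[sample, -1.0].max, 1.0].min   # keep out-of-range input in bounds
  (clamped * PCM_16_MAX).round
end

puts [1.0, 0.25, 0.0, -0.25, -1.0].map { |s| float_to_pcm_16(s) }.inspect
# prints [32767, 8192, 0, -8192, -32767]
```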
Let’s write a simple tone to a wave file. A square wave is one of the simplest ways to create sound, so let’s do that. Our square wave will consist of a sample repeated a certain number of times, followed by the same number of repeated samples at the opposite amplitude. For example:
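A small made-up illustration of that pattern (the constant name and values are hypothetical): three samples at one amplitude, three at the opposite amplitude, repeated.

```ruby
# Hypothetical square wave: 3 samples "up", then 3 samples "down", twice over.
SQUARE_WAVE_EXAMPLE = [0.3, 0.3, 0.3, -0.3, -0.3, -0.3,
                       0.3, 0.3, 0.3, -0.3, -0.3, -0.3]

# Each group of three holds a single repeated value.
SQUARE_WAVE_EXAMPLE.each_slice(3) { |half_cycle| puts half_cycle.inspect }
```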
We’ll write some code to generate these samples, wrap them in a Buffer, and then write this Buffer to a file using Writer.
The samples we’ll generate will be in :float format, which means they should be between -1.0 and 1.0. The larger each sample value, the higher the amplitude of the square wave (i.e. how loud it is). The faster we alternate between the positive and negative samples, the higher the frequency (i.e. pitch). For example, a repeated [0.2, 0.2, -0.2, -0.2] will have a higher pitch than a repeated [0.2, 0.2, 0.2, -0.2, -0.2, -0.2].
When the sample rate is 44,100Hz, 50 positive samples followed by 50 negative samples will produce a frequency of 441Hz (44,100 samples per second divided by 100 samples per cycle), very close to the A above middle C on a piano (440Hz). Let’s use that and generate our array of samples:
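One way to generate that array in plain Ruby is sketched here. The helper names are hypothetical, and the 0.25 amplitude is the value the text refers to later when reading the file back.

```ruby
SAMPLE_RATE = 44_100

# Builds one cycle: `half_length` samples at +amplitude followed by the
# same number at -amplitude. (Hypothetical helper.)
def square_cycle(amplitude, half_length)
  ([amplitude] * half_length) + ([-amplitude] * half_length)
end

# Frequency of a tone whose cycle spans `samples_per_cycle` samples.
def cycle_frequency(sample_rate, samples_per_cycle)
  sample_rate / samples_per_cycle.to_f
end

samples = square_cycle(0.25, 50)
puts samples.length                               # prints 100
puts cycle_frequency(SAMPLE_RATE, samples.length) # prints 441.0
```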
Next, let’s wrap the samples in a Buffer.
Notice that we used the Format class to identify the sample format. The Format constructor takes 3 arguments: the number of channels, the format of each sample, and the sample rate. You’re on the honor system to use the correct format here; weird stuff could happen if you use a format that doesn’t match what’s in the samples array.
Now let’s write the buffer to a file called "square.wav" in the current working directory:
Notice that we gave the Writer a Format as well. This determines the format in which the samples will be written. Notice that the sample format (:pcm_16) is different from that of the Buffer we created; the gem will handle the necessary translation behind the scenes.
All of the code to write the samples is done inside a block. When the block exits, the file will automatically be closed. (If you want more manual control over when the file is closed, you can do that as well by not passing a block and manually calling close on the Writer.)
Here’s the full program so far:
When you run this program it should create a file called "square.wav" in the current working directory. If you play this file (for example on a Mac using afplay square.wav from the command line) it should sound like this:
…which… doesn’t sound like anything! The reason is that we didn’t generate enough samples. At the sample rate we’re using, 44,100Hz, you’ll need 44,100 samples for 1 second of sound. We only generated 100 samples, or 1/441 of a second. No problem, we can easily fix this by repeating our cycle more times:
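The arithmetic behind "more times" can be sketched in plain Ruby. The helper name is hypothetical; it simply works out how many 100-sample cycles fill a given duration.

```ruby
# How many whole cycles are needed to cover `seconds` of audio?
# (Hypothetical helper; rounds up so the duration is never cut short.)
def cycles_needed(seconds, sample_rate, samples_per_cycle)
  ((seconds * sample_rate) / samples_per_cycle.to_f).ceil
end

puts cycles_needed(1, 44_100, 100)   # prints 441
puts cycles_needed(2, 44_100, 100)   # prints 882
```

With the gem, that amounts to writing the same Buffer that many times inside the Writer block.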
Now when you re-run the program and play "square.wav" it should sound like this.
You are well on your way to writing an epic NES soundtrack!
Here’s the full program:
Let’s now read the file we just wrote. We can use the Reader class for that.
After constructing the Reader we call the each_buffer method. This method is useful when you want to read an entire file. It reads successive buffers of a given size, and passes each to the given block. When all buffers have been read, the file is automatically closed. (You can also manually control what to read and when to close the file; see the examples page for more info.)
When you run the program, it should print out repeated output similar to the following:
Buffer number of channels: 1
Buffer bits per sample: 16
Number of samples in buffer: 4096
First 10 samples in buffer: [8192, 8192, 8192, 8192, 8192, 8192, 8192, 8192, 8192, 8192]
Notice how the first samples are 8192, rather than the 0.25 value that we used when generating the file. This is because when we saved that file, we indicated (via the Format instance we gave) that the samples should be written as 16-bit PCM samples instead of floating point.
Also notice that we only read 4,096 samples at a time, instead of trying to read the whole file at once. It’s generally a good idea to read a larger number of smaller buffers, rather than one giant buffer. For this file it probably doesn’t matter, but longer files can have millions of samples, and Ruby can have trouble with arrays this large.
OK, that’s cool, but let’s say we want to read this file so we can do some transformation on it, and it will be easier to work with if the samples are in floating point format and are stereo (since we want to combine it with some other files that are stereo). No problem: when constructing the Reader we can pass a Format instance that describes our desired sample format.
Now when you run this program, the output should look like below. Notice that we got our original 0.25 sample value back, and that each sample frame now consists of 2 channels:
Buffer number of channels: 2
Buffer bits per sample: 32
Number of samples in buffer: 4096
First 10 samples in buffer: [[0.25, 0.25], [0.25, 0.25], [0.25, 0.25], [0.25, 0.25], [0.25, 0.25], [0.25, 0.25], [0.25, 0.25], [0.25, 0.25], [0.25, 0.25], [0.25, 0.25]]