WaveFile Gem - Tutorial

First Steps

If you’re new to audio programming, you might want to read up on some of the basics of digital audio first. Check out this blog post for an introduction.

Basic Concepts

The WaveFile gem lets you both read and write wave files. Reading is done using the Reader class, and writing is done using the Writer class.

The Buffer class represents a collection of samples in a given sample format (e.g. stereo 16-bit PCM samples at a 44,100Hz sample rate). When samples are read using Reader they are returned in a Buffer instance. Samples to be written are given to Writer wrapped in a Buffer instance as well.

A Buffer consists of two parts: an array of samples, and a Format instance that describes the sample format (since the format can’t always be determined just by looking at the raw samples). For example, the sample array in a Buffer read out of a mono 8-bit PCM file (in which each sample is an integer between 0 and 255) might look like this:

[45, 192, 13, 231, 201, 101, 15, ...etc...]

When there is more than one channel, each sample frame is represented by a sub-array. For example, a set of stereo floating point samples (in which each sample is between -1.0 and 1.0) might look like this:

[[-0.2, 0.4], [0.3, 0.9], [-0.4, -0.8], [0.9, -0.2], [-0.3, 0.4], ...etc...]

To write a program that creates sound, first generate an array like this containing sample data, then wrap the array in a Buffer, and finally use Writer to write the samples contained in the Buffer to disk.
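In plain Ruby (no gem calls), arrays in this layout are just ordinary nested arrays. Here is a sketch that combines two mono channels into stereo sample frames and then splits them apart again. The channel names `left` and `right` are hypothetical:

```ruby
# Two hypothetical mono channels of floating point samples.
left  = [-0.2, 0.3, -0.4]
right = [0.4, 0.9, -0.8]

# zip pairs up the channels: each element is one stereo sample frame.
stereo = left.zip(right)
puts stereo.inspect   # => [[-0.2, 0.4], [0.3, 0.9], [-0.4, -0.8]]

# transpose reverses the process, giving back one array per channel.
left_again, right_again = stereo.transpose
```

Each element of `stereo` is one sample frame, which matches the stereo layout shown above.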

A Buffer has the ability to convert its samples to any other format this gem supports. This means you can read samples from a file in whatever format you like, regardless of the actual sample format in the file (e.g. read a file with 8-bit samples and get 16-bit samples back). You can also do the same with Writer. For example, rather than remembering the sample range of 16-bit integer PCM format (is the maximum sample value 32,767? or 32,768?), you can instead generate floating point samples between -1.0 and 1.0, and transparently write them out as 16-bit integer PCM samples.
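To make the conversion idea concrete, here is a minimal sketch of mapping floating point samples to 16-bit integer PCM and back. The helper names `float_to_pcm16` and `pcm16_to_float` are hypothetical. Scaling by 32,768 with clamping is one common convention. It is not necessarily the exact formula the gem uses internally:

```ruby
# 16-bit signed PCM samples range from -32,768 to 32,767.
PCM16_MIN = -32_768
PCM16_MAX = 32_767

# Hypothetical helper: scale a float in -1.0..1.0 to the 16-bit range.
# Clamping keeps 1.0 (which would scale to 32,768) inside the valid range.
def float_to_pcm16(sample)
  (sample * 32_768).round.clamp(PCM16_MIN, PCM16_MAX)
end

# Hypothetical helper: scale a 16-bit integer sample back to a float.
def pcm16_to_float(sample)
  sample / 32_768.0
end

float_samples = [0.25, -0.25, 1.0, -1.0]
pcm_samples = float_samples.map { |s| float_to_pcm16(s) }
puts pcm_samples.inspect   # => [8192, -8192, 32767, -32768]
```

The asymmetric range is the reason for the clamp: +1.0 can't be represented exactly, but -1.0 can.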

Writing a New Wave File

Let’s write a simple tone to a wave file. A square wave is one of the simplest ways to create sound, so let’s do that. Our square wave will consist of a sample repeated a certain number of times, followed by the same number of repeated samples at the opposite amplitude. For example:

[0.3, 0.3, 0.3, 0.3, -0.3, -0.3, -0.3, -0.3,  ...and repeated...]

We’ll write some code to generate these samples, wrap them in a Buffer, and then write this Buffer to a file using Writer.

The samples we’ll generate will be between -1.0 and 1.0. The larger the magnitude of each sample value, the higher the amplitude of the square wave (i.e. how loud it is). The faster we alternate between the positive and negative samples, the higher the frequency (i.e. pitch). For example, [0.2, 0.2, -0.2, -0.2] will have a higher pitch than [0.2, 0.2, 0.2, -0.2, -0.2, -0.2].
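One full cycle of the square wave lasts twice the half-cycle length, measured in samples. The frequency is therefore the sample rate divided by that cycle length. Here is a small sketch of that arithmetic applied to the two examples above, assuming a 44,100Hz sample rate. The method name is hypothetical:

```ruby
# Hypothetical helper: frequency in Hz of a square wave that holds each
# polarity (positive, then negative) for half_cycle_length samples.
def square_wave_frequency(sample_rate, half_cycle_length)
  sample_rate / (2.0 * half_cycle_length)
end

puts square_wave_frequency(44_100, 2)   # [0.2, 0.2, -0.2, -0.2] => 11025.0
puts square_wave_frequency(44_100, 3)   # [0.2, 0.2, 0.2, -0.2, -0.2, -0.2] => 7350.0
```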

When the sample rate is 44,100Hz, 50 positive samples followed by 50 negative samples will produce a frequency of 441Hz, very close to the A above middle C on a piano (440Hz). Let’s use that and generate our array of samples:

AMPLITUDE = 0.25
one_square_cycle = ([AMPLITUDE] * 50) + ([-AMPLITUDE] * 50)

Next, let’s wrap the samples in a Buffer:

buffer = Buffer.new(one_square_cycle, Format.new(:mono, :float, 44100))

Notice that we used the Format class to describe our samples. The Format constructor takes 3 arguments: the number of channels, the format of each sample, and the sample rate. In our case, since our samples are floating point values between -1.0 and 1.0, we use :float for the sample format. The gem doesn’t validate that the Format instance matches the actual samples, so if they don’t match you might get unexpected results.
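Because nothing checks that a Format matches its samples, you may want a quick sanity check of your own before creating a Buffer. The sketch below is a hypothetical helper, not part of the gem, for floating point samples. It verifies that each sample frame has the expected shape for the channel count, and that every value lies between -1.0 and 1.0:

```ruby
# Hypothetical helper (not part of the WaveFile gem): returns true if
# `samples` looks like floating point data with the given channel count.
def valid_float_samples?(samples, channels)
  samples.all? do |frame|
    # Mono samples are plain numbers; multi-channel frames are sub-arrays.
    values = channels == 1 ? [frame] : frame
    next false unless values.is_a?(Array) && values.length == channels

    # Every value must be a number between -1.0 and 1.0 inclusive.
    values.all? { |v| v.is_a?(Numeric) && v.between?(-1.0, 1.0) }
  end
end

one_square_cycle = ([0.25] * 50) + ([-0.25] * 50)
puts valid_float_samples?(one_square_cycle, 1)  # => true
puts valid_float_samples?([8192, -8192], 1)     # => false (PCM-range values)
puts valid_float_samples?(one_square_cycle, 2)  # => false (not stereo frames)
```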

Now let’s write the buffer to a file called "square.wav" in the current working directory:

Writer.new("square.wav", Format.new(:mono, :pcm_16, 44100)) do |writer|
  writer.write(buffer)
end

Notice that we gave the Writer a Format as well. This determines the format that will be used when writing the samples to the file. The :pcm_16 sample format here is different from the :float sample format in the Buffer we created; the gem handles the necessary translation behind the scenes.

All of the samples are written inside a block. When the block exits, the file is automatically closed. (You can control more precisely when the file is closed by not passing a block, calling write() as needed, and then calling close() manually when done.)

Here’s the full program so far:

require "wavefile"
include WaveFile   # So we don't have to prefix all classes with 'WaveFile::'

AMPLITUDE = 0.25
one_square_cycle = ([AMPLITUDE] * 50) + ([-AMPLITUDE] * 50)

buffer = Buffer.new(one_square_cycle, Format.new(:mono, :float, 44100))

Writer.new("square.wav", Format.new(:mono, :pcm_16, 44100)) do |writer|
  writer.write(buffer)
end

When you run this program it should create a file called "square.wav" in the current working directory. If you play this file (for example on a Mac using afplay square.wav from the command line), you’ll find that it doesn’t sound like anything!

The reason is that we didn’t generate enough samples. At the sample rate we’re using, 44,100Hz, we need 44,100 samples for 1 second of sound. We only generated 100 samples, or 1/441 of a second. No problem, we can fix this by repeating our cycle more times:

CYCLE_COUNT = 441  # 441 x 100 samples == 44,100 samples, or 1 second of sound

Writer.new("square.wav", Format.new(:mono, :pcm_16, 44100)) do |writer|
  CYCLE_COUNT.times { writer.write(buffer) }
end

Now when you re-run the program and play "square.wav", you should hear one second of a 441Hz square wave tone.

You are well on your way to writing an epic NES soundtrack!

Here’s the full program:

require "wavefile"
include WaveFile   # So we don't have to prefix all classes with 'WaveFile::'

AMPLITUDE = 0.25
CYCLE_COUNT = 441  # 441 x 100 samples == 44,100 samples, or 1 second of sound
one_square_cycle = ([AMPLITUDE] * 50) + ([-AMPLITUDE] * 50)

buffer = Buffer.new(one_square_cycle, Format.new(:mono, :float, 44100))

Writer.new("square.wav", Format.new(:mono, :pcm_16, 44100)) do |writer|
  CYCLE_COUNT.times { writer.write(buffer) }
end
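The cycle count generalizes to other durations. Multiply the number of seconds by the sample rate to get the total number of samples, then divide by the 100 samples in one cycle. Instead of calling write repeatedly, you could also build the full sample array up front with Ruby’s `Array#*` and wrap it in a single Buffer. Here is a plain Ruby sketch of that arithmetic; the 3-second duration and variable names are illustrative:

```ruby
sample_rate = 44_100
amplitude = 0.25
one_square_cycle = ([amplitude] * 50) + ([-amplitude] * 50)

# Number of whole cycles needed for a given duration in seconds.
# Integer division drops any partial cycle at the end.
seconds = 3
cycle_count = (seconds * sample_rate) / one_square_cycle.length
puts cycle_count          # => 1323

# Alternative: repeat the cycle into one array of samples up front.
all_samples = one_square_cycle * cycle_count
puts all_samples.length   # => 132300
```

Building one big array is convenient for short sounds. For long ones, writing a small buffer many times keeps memory use down.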

Reading a Wave File

Let’s now read the file we just wrote. We can use the Reader class for that.

require "wavefile"
include WaveFile   # So we don't have to prefix all classes with 'WaveFile::'

Reader.new("square.wav").each_buffer do |buffer|
  puts "Buffer number of channels:   #{buffer.channels}"
  puts "Buffer bits per sample:      #{buffer.bits_per_sample}"
  puts "Number of samples in buffer: #{buffer.samples.length}"
  puts "First 10 samples in buffer:  #{buffer.samples[0...10].inspect}"
  puts "--------------------------------------------------------------"
end

After constructing the Reader we call the each_buffer method. This method is useful when you want to read an entire file. It reads successive buffers of a given size and passes each one to the given block. When all buffers have been read, the file is automatically closed. (You can also manually control what to read and when to close the file; see the examples page for more info.)

When you run the program, it should print one block of output per buffer. The first block should look similar to this:

Buffer number of channels:   1
Buffer bits per sample:      16
Number of samples in buffer: 4096
First 10 samples in buffer:  [8192, 8192, 8192, 8192, 8192, 8192, 8192, 8192, 8192, 8192]

Notice that the first samples in the first buffer are 8192, rather than the 0.25 value we used when generating the file. This is because when we saved the file, we indicated (via the Format instance we gave to Writer) that the samples should be written as 16-bit PCM samples instead of floating point. On the 16-bit scale, which runs from -32,768 to 32,767, 8,192 is one quarter of 32,768, corresponding to 0.25.

Also notice that we only read 4,096 samples at a time, instead of trying to read the whole file at once. It’s generally a good idea to read many smaller buffers rather than one giant buffer. For this file it probably doesn’t matter, but longer files can have millions of samples, and holding them all in a single Ruby array can use a lot of memory and be slow to process.
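Reading in chunks works like slicing an array into fixed-size pieces. The following is a plain Ruby analogy, not the gem’s implementation. It shows how the 44,100 samples in "square.wav" divide into buffers of 4,096 samples, as in the output above. The last buffer is shorter. Also, because 4,096 is not a multiple of the 100-sample cycle, later buffers start at different points in the square wave:

```ruby
one_square_cycle = ([0.25] * 50) + ([-0.25] * 50)
all_samples = one_square_cycle * 441   # 1 second at 44,100Hz

buffer_sizes = []
first_samples = []
# each_slice yields consecutive chunks of up to 4,096 samples.
all_samples.each_slice(4096) do |chunk|
  buffer_sizes << chunk.length
  first_samples << chunk.first
end

puts buffer_sizes.length             # => 11
puts buffer_sizes.last               # => 3140
puts first_samples.first(3).inspect  # => [0.25, -0.25, -0.25]
```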

That’s useful, but suppose we want to read this file in order to transform it. The transformation will be easier if the samples are in floating point format, and stereo (say, because we want to combine it with some other stereo files). No problem: when constructing the Reader, we can pass a Format instance that describes our desired sample format.

require "wavefile"
include WaveFile   # So we don't have to prefix all classes with 'WaveFile::'

# Read the file's samples as floating point in 2 channels, regardless
# of how the samples are actually stored in the file.
reader = Reader.new("square.wav", Format.new(:stereo, :float, 44100))

reader.each_buffer do |buffer|
  puts "Buffer number of channels:   #{buffer.channels}"
  puts "Buffer bits per sample:      #{buffer.bits_per_sample}"
  puts "Number of samples in buffer: #{buffer.samples.length}"
  puts "First 10 samples in buffer:  #{buffer.samples[0...10].inspect}"
  puts "--------------------------------------------------------------"
end

Now when you run this program, the output should look like the following. Notice that we got our original 0.25 sample value back, and that each sample frame now consists of 2 channels:

Buffer number of channels:   2
Buffer bits per sample:      32
Number of samples in buffer: 4096
First 10 samples in buffer:  [[0.25, 0.25],
                              [0.25, 0.25],
                              [0.25, 0.25],
                              [0.25, 0.25],
                              [0.25, 0.25],
                              [0.25, 0.25],
                              [0.25, 0.25],
                              [0.25, 0.25],
                              [0.25, 0.25],
                              [0.25, 0.25]]
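Channel conversion is also easy to picture in plain Ruby. The sketch below is one plausible way to think about it, not necessarily what the gem does internally. Going from mono to stereo copies each sample into both channels, which is consistent with the [0.25, 0.25] frames above. Going back to mono averages the channels. The helper names are hypothetical:

```ruby
# Hypothetical helper: duplicate each mono sample into every channel.
def mono_to_multichannel(samples, channels)
  samples.map { |sample| [sample] * channels }
end

# Hypothetical helper: average each sample frame down to one value.
# Assumes float samples; integer samples would be truncated here.
def multichannel_to_mono(frames)
  frames.map { |frame| frame.sum / frame.length }
end

stereo = mono_to_multichannel([0.25, -0.25], 2)
puts stereo.inspect                        # => [[0.25, 0.25], [-0.25, -0.25]]
puts multichannel_to_mono(stereo).inspect  # => [0.25, -0.25]
```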

Next Steps

Head over to the examples page to see more code examples, read full API documentation, and if you’re interested, learn how the Wave file format works behind the scenes.