WaveFile Gem

Wave File Format

The Wave file format has changed over time and is defined in several documents.

These documents aren’t solely focused on Wave files, and also describe other file formats. This article attempts to consolidate the relevant information into a single document.

Getting Started

If you’re new to audio programming, you might want to read up on some of the basics of digital audio first. Check out this blog post for an introduction.

Wave Files Store Audio Data

Wave files are a container format that allows storing audio data in many formats. The most common sample format is integer PCM. This is raw, uncompressed sample data where each sample is an integer. (PCM stands for pulse code modulation). Similarly, PCM data can be defined using a floating point value for each sample, although this is technically considered a different audio format.

There are many other audio formats officially defined. Many are seemingly obsolete and unlikely to be encountered in the wild.

Currently, the WaveFile gem supports these sample formats:

Wave Files are RIFF Files

Back in the mid-80s Electronic Arts came up with a general container file format that could be used to store different types of data – audio, graphics, etc. It was called IFF, for Interchange File Format. Microsoft then took this format, switched the byte order to little-endian to match Intel processors, and dubbed it RIFF (Resource Interchange File Format). Many of the venerable Microsoft multimedia file formats are stored as RIFF files, including *.ani (animated cursors), *.avi (a basically obsolete movie format), and of course, *.wav.

As mentioned above, all multi-byte numbers in a RIFF file are stored as little-endian. (Some non-numeric data is stored as a sequence of bytes, in which endianness isn’t relevant per se).

RIFF Files Contain “Chunks”

An IFF file, and therefore a RIFF file, is broken up into several “chunks” of data. Each chunk has an 8-byte header containing a 4-byte identifier code, and a 4-byte size field.

The identifier code, called a FourCC, is a sequence of 4 bytes. When each byte is interpreted as an ASCII character, they typically form a human readable string. For example, 0x52 0x49 0x46 0x46 (i.e. "RIFF"), or 0x64 0x61 0x74 0x61 (i.e. "data"). Since this is a raw sequence of bytes, the characters are case-sensitive.

The size field indicates the size of the chunk’s body in bytes. The size does not include the 8-byte header. I.e., if a chunk consists of the 8-byte header plus 1,000 bytes of data, the size field will indicate 1,000, not 1,008. Chunks can internally contain nested sub-chunks, if the spec for that chunk allows it.

Important! If a chunk body is an odd number of bytes, it must be followed by a padding byte with value 0. In other words, a chunk must always occupy an even number of bytes in the file. The padding byte is not counted in the chunk size field. For example, if a chunk body is 17 bytes in size, the size field will be set to 17, but the actual chunk body will occupy 18 bytes (17 bytes of data followed by the padding byte).
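
The header layout and padding rule can be sketched in a few lines of Python. This is only an illustration, not code from any particular library: the function names read_chunk_header and chunk_span, and the "demo" chunk ID, are made up for this example.

```python
import struct

def read_chunk_header(data, offset=0):
    """Return (fourcc, body_size) for the chunk header starting at offset."""
    # "<" = little-endian, "4s" = 4 raw bytes, "I" = 32-bit unsigned integer
    fourcc, body_size = struct.unpack_from("<4sI", data, offset)
    return fourcc, body_size

def chunk_span(body_size):
    """Total bytes a chunk occupies: header + body + optional padding byte."""
    padding = body_size % 2  # 1 if the body is an odd number of bytes
    return 8 + body_size + padding

# A hypothetical chunk with a 17-byte body. The size field says 17,
# but the body is followed by one zero padding byte.
example = b"demo" + struct.pack("<I", 17) + b"x" * 17 + b"\x00"
fourcc, size = read_chunk_header(example)
```

Here the size field reads 17, while chunk_span(17) gives 26: the 8-byte header, 17 bytes of data, and 1 padding byte.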

High Level Wave File Structure

At the top level, a Wave file consists of a single "RIFF" chunk, which contains all of the data for the wave file. The RIFF chunk body starts with a format code "WAVE" which indicates that the child chunks are for a Wave file. (As opposed to an AVI movie, animated cursor, etc). This is followed by the child chunks. A Wave file is required to contain at minimum a format chunk and a data chunk (described below), and the format chunk must come before the data chunk. If the format code in the format chunk is not 1 (see below), then it must also contain a "fact" chunk. It can also contain other optional chunks.

For example a typical file might look like this:

RIFF Chunk ID ("RIFF") RIFF Chunk Body Size Format Code: "WAVE"
Format Chunk ID ("fmt ") Format Chunk Body Size Chunk Body

Data Chunk ID ("data") Data Chunk Body Size Chunk Body

Important! Other than the format chunk coming before the data chunk, there isn’t any requirement that the chunks come in any particular order. You shouldn’t assume that the data chunk is the last chunk. (Although in practice, it often is).
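
One way to avoid depending on chunk order is to walk every child chunk and look them up by ID afterwards. The following is a minimal sketch under that assumption. The helper names iter_chunks and make_chunk, and the "xtra" chunk ID, are invented for the demo.

```python
import struct

def iter_chunks(riff_body):
    """Yield (fourcc, body) for each child chunk.

    riff_body is everything after the 4-byte "WAVE" format code.
    """
    offset = 0
    while offset + 8 <= len(riff_body):
        fourcc, size = struct.unpack_from("<4sI", riff_body, offset)
        yield fourcc, riff_body[offset + 8 : offset + 8 + size]
        offset += 8 + size + (size % 2)  # also skip the padding byte, if any

def make_chunk(fourcc, body):
    """Build a chunk, adding a padding byte for odd-sized bodies (demo helper)."""
    pad = b"\x00" if len(body) % 2 else b""
    return fourcc + struct.pack("<I", len(body)) + body + pad

# Demo: a made-up "xtra" chunk placed after the data chunk.
children = (
    make_chunk(b"fmt ", b"\x00" * 16)
    + make_chunk(b"data", b"\x01\x02\x03")  # odd size, so it gets padded
    + make_chunk(b"xtra", b"more")
)
chunks = dict(iter_chunks(children))
```

Because the loop relies only on each chunk's size field, it steps over chunks it does not recognize and still finds a data chunk that is not last.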

The RIFF Chunk

Like all chunks, the RIFF chunk starts with a FourCC ID code. In this case, it is "RIFF". Next is the size field, which is the size of the entire Wave file except for the 8-byte RIFF chunk header.

The first 4 bytes following the header will identify the type of RIFF chunk. In the case of a Wave file, it will be "WAVE". Immediately following that will be the inner Wave file chunks.

Field Bytes Description
Chunk ID 4 0x52 0x49 0x46 0x46 (i.e. "RIFF")
Chunk Body Size 4 32-bit unsigned integer
RIFF Format Code 4 0x57 0x41 0x56 0x45 (i.e. "WAVE")
Sub Chunks Variable Variable
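
As a rough sketch, the first 12 bytes of a file could be checked like this in Python. The function name parse_riff_header is hypothetical, and the body size of 36 is just an example value.

```python
import struct

def parse_riff_header(data):
    """Parse the first 12 bytes of a file and return the RIFF chunk body size."""
    chunk_id, body_size, format_code = struct.unpack_from("<4sI4s", data, 0)
    if chunk_id != b"RIFF":
        raise ValueError("not a RIFF file")
    if format_code != b"WAVE":
        raise ValueError("RIFF file, but not a Wave file")
    return body_size

# The body size covers everything after the 8-byte header,
# including the 4-byte "WAVE" format code itself.
header = b"RIFF" + struct.pack("<I", 36) + b"WAVE"
body_size = parse_riff_header(header)
```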

The Format Chunk

The format chunk describes the format that the samples in the data chunk are encoded in. The exact structure of the format chunk depends on the value of the format code field. If the format code is 1 (integer PCM), then the format chunk will only contain the fields up to and including bits per sample in the table below. If it’s not 1, the chunk will also contain the extension fields listed after them.

Field Bytes Description
Chunk ID 4 0x66 0x6d 0x74 0x20 (i.e. "fmt ")
Chunk Body Size 4 32-bit unsigned integer
Format Code 2 16-bit unsigned integer
Number of Channels 2 16-bit unsigned integer
Samples per second 4 32-bit unsigned integer
Bytes per Second (a.k.a. byte rate) 4 32-bit unsigned integer
Bytes per Sample Frame (a.k.a. block align) 2 16-bit unsigned integer
Bits per sample 2 16-bit unsigned integer
These fields are only present if format code is not 1:
Extension Size 2 16-bit unsigned integer
Extra fields Variable It depends on the format code

The reason for the different types of extension is that the Wave format is a container for many different kinds of sample formats, and because the Wave format has evolved over time to support new formats. Extra fields that are needed for one sample format might not be needed for another sample format. This also allows new fields to be added without having to change pre-existing Wave files.

While some of these fields have a large range of possible values, in practice there are only a few that will actually be used. For some background on what some of this terminology means, check out this blog post.

Format Code – Indicates how the sample data for the wave file is stored. The most common format is PCM, which has a code of 1. Other formats include IEEE floating point (3), ADPCM (2), μ-law (7), and WaveFormatExtensible (65534).

Number of channels – Typically a file will have 1 channel (mono) or 2 channels (stereo). A 5.1 surround sound file will have 6 channels.

Sample rate – The number of sample frames that occur each second. A typical value would be 44,100, which is the same as an audio CD.

Bytes per second (a.k.a. byte rate) – The spec calls this byte rate, which means the number of bytes required for one second of audio data. This is equal to the bytes per sample frame times the sample rate. So with a bytes per sample frame of 32, and a sample rate of 44,100, this should equal 1,411,200.

Bytes per sample frame (a.k.a. block align) – Called block align by the spec, this is the number of bytes required to store a single sample frame, i.e. a single sample for each channel. (Sometimes a sample frame is also referred to as a block). It should be equal to the number of channels times the bytes per sample, where the bytes per sample is the bits per sample rounded up to a multiple of 8 and then divided by 8. For example:

Channels Bits Per Sample Bytes per sample frame
1 8 1
2 8 2
1 16 2
2 16 4
6 32 24

This field can be used to calculate the bytes per second field. Another possible use is for seeking around in a file. For example, if the bytes per sample frame is 32, then to seek forward 10 sample frames you need to seek forward 320 bytes.

For PCM data, this field is essentially redundant since it can be calculated from the other fields. However, be sure to note the point about rounding bits per sample values up to the next multiple of 8.
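
The arithmetic behind these two fields, and the seeking use case, can be written out as a short Python sketch. The function names are made up for illustration, and the sketch assumes integer PCM data.

```python
def bytes_per_sample_frame(channels, bits_per_sample):
    """Block align: one sample per channel, each rounded up to whole bytes."""
    bytes_per_sample = (bits_per_sample + 7) // 8  # e.g. 12 bits -> 2 bytes
    return channels * bytes_per_sample

def bytes_per_second(channels, bits_per_sample, sample_rate):
    """Byte rate: bytes per sample frame times sample frames per second."""
    return bytes_per_sample_frame(channels, bits_per_sample) * sample_rate

def seek_distance(sample_frames, channels, bits_per_sample):
    """Number of bytes to skip in order to move forward by sample_frames."""
    return sample_frames * bytes_per_sample_frame(channels, bits_per_sample)

stereo_16_bit = bytes_per_sample_frame(2, 16)
cd_byte_rate = bytes_per_second(2, 16, 44100)
```

For a stereo 16-bit file at 44,100 samples per second, this gives 4 bytes per sample frame and 176,400 bytes per second.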

Bits per sample – For PCM data, typical values will be 8, 16, or 32. If the sample format doesn’t require this field, it should be set to 0.

Extension Size – This field should only be present if the format code is not 1. This indicates the size of the extra fields in bytes. It does not include the bytes in this field itself. If the given sample format has no extra fields, then this field should be set to 0.

Extra Fields – It depends on the format code! The next sections describe the extra fields for a few audio formats.
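
Putting the fields together, a format chunk body might be parsed along these lines. This is a sketch rather than a reference implementation: the function name parse_format_chunk and the dictionary keys are assumptions, and it does no validation of the values it reads.

```python
import struct

def parse_format_chunk(body):
    """Parse a format chunk body (the bytes after the 8-byte chunk header)."""
    (format_code, channels, sample_rate,
     byte_rate, block_align, bits_per_sample) = struct.unpack_from("<HHIIHH", body, 0)
    fields = {
        "format_code": format_code,
        "channels": channels,
        "sample_rate": sample_rate,
        "byte_rate": byte_rate,
        "block_align": block_align,
        "bits_per_sample": bits_per_sample,
    }
    # The extension is only expected when the format code is not 1.
    if format_code != 1 and len(body) >= 18:
        (extension_size,) = struct.unpack_from("<H", body, 16)
        fields["extension_size"] = extension_size
        fields["extra_fields"] = body[18 : 18 + extension_size]
    return fields

# Integer PCM: 16 bytes, no extension.
pcm = parse_format_chunk(struct.pack("<HHIIHH", 1, 2, 44100, 176400, 4, 16))
# Floating point: extension size present, but set to 0.
flt = parse_format_chunk(struct.pack("<HHIIHHH", 3, 1, 48000, 192000, 4, 32, 0))
```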

Extra Format Fields for Floating Point

If the format code is 3, then the sample data is stored as PCM using floating point numbers. There are no extra fields for this format, so the extension size field should be set to 0.

Field Bytes Description

Other fields in format chunk
Extension Size 2 16-bit unsigned integer (value 0)

Extra Format Fields for EXTENSIBLE format

If the format code is 65534, then the format is called “WAVE_FORMAT_EXTENSIBLE”. This comes from the name of a data structure given to this format in the Windows API. The extensible format is a container format (within *.wav, which is itself a container format). It exists to work around some ambiguities in the original Wave file format without having to break compatibility with pre-existing files.

When the format is WAVE_FORMAT_EXTENSIBLE, the extension size in the format chunk should be 22, and the following three fields should be included:

Field Bytes Description

Other fields in format chunk
Extension Size 2 16-bit unsigned integer (value 22)
Valid Bits Per Sample 2 16-bit unsigned integer
Channel Mask 4 32-bit unsigned integer
Sub Format 16 16-byte GUID
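
As an illustration, the 22 bytes of extra fields could be unpacked as follows. The function name is hypothetical. The example values are assumptions that do not come from this article: the GUID shown is the commonly documented sub format GUID for integer PCM, and a channel mask of 0x3 is the usual front-left plus front-right combination.

```python
import struct
import uuid

# Assumption: the well-known sub format GUID for integer PCM.
PCM_SUBFORMAT = uuid.UUID("00000001-0000-0010-8000-00aa00389b71")

def parse_extensible_fields(extra):
    """Parse the 22 bytes of extra fields used with format code 65534."""
    if len(extra) != 22:
        raise ValueError("expected 22 bytes of extra fields")
    valid_bits, channel_mask = struct.unpack_from("<HI", extra, 0)
    # A GUID's first three groups are stored little-endian,
    # which is the layout that bytes_le expects.
    sub_format = uuid.UUID(bytes_le=bytes(extra[6:22]))
    return valid_bits, channel_mask, sub_format

# Demo: 12 valid bits, front left + front right speakers, PCM sub format.
extra = struct.pack("<HI", 12, 0x3) + PCM_SUBFORMAT.bytes_le
valid_bits, channel_mask, sub_format = parse_extensible_fields(extra)
```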

Valid Bits Per Sample – Allows storing samples with bit-depths that are not a byte multiple in size. For example, to store 12-bit samples, this value can be set to 12, and the bits-per-sample field in the format chunk set to 16. Each sample will still take up 16 bits (2 bytes) on disk, but the reader can be informed that only 12 of those bits carry sample data.

Channel Mask – Indicates which audio channels map to which speakers.

Sub Format – Identifies the format of the sample data in the data chunk. This is a replacement for the original format code field, since it will have a format code of 65534. Some GUIDs include:

The Fact Chunk

Field Bytes Description
Chunk ID 4 0x66 0x61 0x63 0x74 (i.e. "fact")
Chunk Body Size 4 32-bit unsigned integer
Number of sample frames 4 32-bit unsigned integer

The fact chunk indicates how many sample frames are in the file. It’s required if the format code in the format chunk is not 1. If the format code is 1, it’s optional.

The reason for this chunk is that with some sample formats the number of sample frames can’t be determined by obvious means (e.g. because they store data in a compressed format). This gives a way of determining e.g. the playing time for the file, without having to decode the entire data chunk.

The number of sample frames is per channel. For example, if a stereo file contains 1,000 samples for each channel, the value of this field should be 1,000, not 2,000.

It’s not needed for integer PCM data (format code 1), because the total number of sample frames can be derived by dividing the total bytes in the data chunk body by the bytes-per-sample-frame field from the format chunk. For example, if the data chunk body is 176,400 bytes, and bytes-per-sample-frame is 4 (e.g. two 16-bit channels), then the total number of sample frames is 176,400 ÷ 4 = 44,100.
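
For integer PCM, that derivation (and the resulting playing time) amounts to a couple of divisions. A minimal sketch, with made-up function names:

```python
def sample_frame_count(data_chunk_size, bytes_per_sample_frame):
    """Number of sample frames in an integer PCM data chunk."""
    return data_chunk_size // bytes_per_sample_frame

def duration_seconds(data_chunk_size, bytes_per_sample_frame, sample_rate):
    """Playing time, derived from the frame count and the sample rate."""
    return sample_frame_count(data_chunk_size, bytes_per_sample_frame) / sample_rate

# 176,400 bytes of data at 4 bytes per sample frame, 44,100 frames per second
frames = sample_frame_count(176400, 4)
seconds = duration_seconds(176400, 4, 44100)
```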

The Data Chunk

Field Bytes Description
Chunk ID 4 0x64 0x61 0x74 0x61 (i.e. "data")
Chunk Body Size 4 32-bit unsigned integer
Sample Data Various It depends on the format code

The layout for the data chunk is simpler than the format chunk: the normal 8-byte chunk header, followed by nothing but raw, unfiltered sample data. The sample data can be stored in a number of formats, which will be indicated by the format chunk.

The next several sections describe various formats that data in the data chunk can be stored as.

PCM Data Chunk

The simplest, and most common, is to store PCM samples (format code 1). This is just raw sample data stored as integers. The bits per sample field will indicate the range of the sample data:

Bits per sample Minimum Sample Maximum Sample
8 0 255
16 -32,768 32,767
24 -8,388,608 8,388,607
32 -2,147,483,648 2,147,483,647

Important! Notice that 8-bit samples are unsigned, while larger bit depths are signed.
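
The signed/unsigned distinction is easy to get wrong when decoding, so here is a small Python sketch of one way to handle it. The function name decode_pcm_sample is made up, and the sketch assumes each sample occupies a whole number of bytes.

```python
def decode_pcm_sample(raw):
    """Decode one little-endian integer PCM sample from its raw bytes."""
    # 8-bit (1-byte) samples are unsigned; wider samples are signed.
    return int.from_bytes(raw, "little", signed=len(raw) > 1)

loudest_8_bit = decode_pcm_sample(b"\xff")
lowest_16_bit = decode_pcm_sample(b"\x00\x80")
highest_24_bit = decode_pcm_sample(b"\xff\xff\x7f")
```

Using int.from_bytes rather than a fixed-width unpacking format means 24-bit (3-byte) samples need no special handling.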

Samples in a multi-channel PCM wave file are interleaved. That is, in a stereo file, one sample for the left channel will be followed by one sample for the right channel, followed by another sample for the left channel, then right channel, and so forth.

The samples for all channels at a moment in time are called a sample frame (also called a block). That is, a sample frame will contain one sample for each channel. In a monophonic file, a sample frame will consist of 1 sample. In a stereo file, a sample frame has 2 samples (one for the left channel, one for the right channel). In a 5-channel file, a sample frame has 5 samples. The block align field in the format chunk gives the size in bytes of each sample frame. This can be useful when seeking to a particular sample frame in the file.

For example, for a 2 channel file with 16-bit PCM samples, the sample data would look like this:

Sample Frame 1: Left Channel (Byte 1, Byte 2), Right Channel (Byte 1, Byte 2)
Sample Frame 2: Left Channel (Byte 1, Byte 2), Right Channel (Byte 1, Byte 2)
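
To make the interleaving concrete, here is a sketch that splits 16-bit PCM data back into one list of samples per channel. The function name split_channels is hypothetical, and the sketch only handles 16-bit samples.

```python
import struct

def split_channels(data, channels):
    """Split interleaved 16-bit PCM data into one list of samples per channel."""
    count = len(data) // 2  # number of 16-bit samples
    samples = struct.unpack(f"<{count}h", data[: count * 2])
    # Every channels-th sample, starting at index ch, belongs to channel ch.
    return [list(samples[ch::channels]) for ch in range(channels)]

# Two sample frames: (left=100, right=-100), then (left=200, right=-200)
frames = struct.pack("<4h", 100, -100, 200, -200)
left, right = split_channels(frames, 2)
```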

Floating Point Data Chunk

Another basic format is to store samples as floating point values (format code 3). This is essentially the same as PCM format, except that samples are in the range -1.0 to 1.0. The bits per sample field for floating point files should be set to 32 or 64.
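
Decoding floating point samples looks much the same as decoding integer PCM, just with a different unpacking format. A minimal sketch, with a made-up function name:

```python
import struct

def decode_float_samples(data, bits_per_sample):
    """Decode little-endian floating point samples (32 or 64 bits each)."""
    code = {32: "f", 64: "d"}[bits_per_sample]  # struct codes for float/double
    size = bits_per_sample // 8
    count = len(data) // size
    return list(struct.unpack(f"<{count}{code}", data[: count * size]))

samples_32 = decode_float_samples(struct.pack("<3f", -1.0, 0.0, 0.5), 32)
samples_64 = decode_float_samples(struct.pack("<2d", 0.25, 1.0), 64)
```

The result is still a flat, interleaved sequence, so multi-channel data has to be split into channels in the same way as integer PCM.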


Since WAVE_FORMAT_EXTENSIBLE is a container format, the format code of 65534 doesn’t imply any particular sample format. The sample format is instead determined by the sub format GUID in the format chunk. For example, if the sub format GUID is the GUID for PCM, then the samples are in the same format as if the format code was 1.

