Wave File Format

The Wave file format has changed over time and is defined in several documents:

These documents aren’t solely focused on Wave files, and also describe other file formats. This article attempts to consolidate the relevant information into a single document.

Getting Started

If you’re just getting started with audio programming, you might want to read up on some of the basics of digital audio first. Check out this blog post for an introduction.

Wave Files Store Audio Data

Wave files are a container format that can store audio data in many formats. The most common sample format is PCM (pulse code modulation): raw, uncompressed sample data where each sample is an integer. PCM data can also be stored using a floating point value for each sample, although this is technically considered a different audio format.

There are many other officially defined audio formats. Many are seemingly obsolete and unlikely to be encountered in the wild.

Currently, the WaveFile gem supports these sample formats:

Wave Files are RIFF Files

Back in the mid-1980s, Electronic Arts came up with a general container file format that could be used to store different types of data: audio, graphics, etc. It was called IFF, for Interchange File Format. Microsoft then took this format, switched the byte order to little-endian to match Intel processors, and dubbed it RIFF (Resource Interchange File Format). Several of the venerable Microsoft multimedia file formats are stored as RIFF files, including *.ani (animated cursors), *.avi (a basically obsolete movie format), and of course, *.wav.

As mentioned above, all data in a RIFF file is stored as little-endian, owing to its Wintel heritage.

RIFF Files Contain “Chunks”

An IFF file, and therefore a RIFF file, is broken up into several “chunks” of data. Each chunk has an 8-byte header containing a 4-byte identifier code, and a 4-byte size field.

The identifier code, called a FourCC, is typically a more-or-less human-readable ASCII string. For example, "WAVE", "fmt ", or "data". This identifier is case-sensitive.

The size field indicates the size of the chunk body in bytes. The size does not include the 8 bytes in the header. For example, if a chunk consists of the header plus 1,000 bytes of data, the size field will indicate 1,000, not 1,008. Chunks can internally contain nested sub-chunks, if the spec for that chunk allows it.

Important! If a chunk body is an odd number of bytes, it must be followed by a padding byte with value 0. In other words, a chunk must always occupy an even number of bytes in the file. The padding byte is not counted in the chunk size field. For example, if a chunk body is 17 bytes in size, the size field will be set to 17, but the chunk body will actually occupy 18 bytes in the file (17 bytes of data followed by the padding byte).
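As a rough illustration of the header and padding rules above, here is a minimal Python sketch that reads one chunk at a time from a byte stream. The function name read_chunk and the sample chunk IDs are made up for this example; it is not how any particular library does it.

```python
import io
import struct

def read_chunk(stream):
    """Read one chunk; return (fourcc, body), or None at end of stream."""
    header = stream.read(8)
    if len(header) < 8:
        return None
    # 4-byte FourCC followed by a little-endian unsigned 32-bit size
    fourcc, size = struct.unpack("<4sI", header)
    body = stream.read(size)
    if size % 2 == 1:
        stream.read(1)  # skip the padding byte, which the size field doesn't count
    return fourcc, body

# Hypothetical data: a 17-byte chunk (plus padding byte), then a 2-byte chunk
raw = (b"abcd" + struct.pack("<I", 17) + b"x" * 17 + b"\x00"
       + b"efgh" + struct.pack("<I", 2) + b"yz")
stream = io.BytesIO(raw)
first = read_chunk(stream)
second = read_chunk(stream)
```

If the padding byte were not skipped, the second header would be read starting one byte too early and every chunk after it would be misparsed.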

High Level Wave File Structure

At the top level, a Wave file consists of a single "RIFF" chunk, which contains all of the data for the Wave file. The RIFF chunk body starts with a format code "WAVE" which indicates that the sub-chunks are for a Wave file (as opposed to another kind of RIFF file, such as an AVI movie). This is followed by the sub-chunks. A Wave file is required to contain at minimum a format chunk and a data chunk (described below), and the format chunk must come before the data chunk. If the format code in the format chunk is not 1 (see below), then it must also contain a "fact" chunk. It can also contain other optional chunks.

Visually this is what it looks like:

RIFF Chunk
    Format: "WAVE"
    Format Chunk ("fmt ")
    other optional chunk
    other optional chunk
    Data Chunk ("data")

Important! Other than the format chunk coming before the data chunk, there isn’t any requirement that the chunks come in any particular order. You shouldn’t assume that the data chunk is the last chunk (although in practice, it usually is).

The RIFF Chunk

Like all chunks, the RIFF chunk starts with an ID code, in this case the ASCII string "RIFF". Next is the size field, which is the size of the entire Wave file except for the 8-byte RIFF header.

The first 4 bytes following the header will identify the type of RIFF chunk. In the case of Wave files, it will be "WAVE". Immediately following that will be the inner Wave file chunks.

Field             Size in Bytes  Description
Chunk ID          4              ASCII string "RIFF"
Chunk Size        4              32-bit unsigned integer
RIFF Format Code  4              ASCII string "WAVE"
Sub Chunks        Variable       Variable
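To make the layout concrete, the following Python sketch checks the RIFF header and lists the sub-chunks it finds, without assuming any particular chunk order. The function name list_wave_chunks and the tiny in-memory file are assumptions made for illustration only.

```python
import struct

def list_wave_chunks(data):
    """Return a list of (chunk_id, size) for each sub-chunk in a Wave file."""
    chunk_id, riff_size, form_type = struct.unpack_from("<4sI4s", data, 0)
    if chunk_id != b"RIFF" or form_type != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    pos = 12               # first sub-chunk starts after "RIFF", size, "WAVE"
    end = 8 + riff_size    # the RIFF size excludes the 8-byte RIFF header
    while pos + 8 <= end:
        sub_id, sub_size = struct.unpack_from("<4sI", data, pos)
        chunks.append((sub_id, sub_size))
        pos += 8 + sub_size + (sub_size % 2)  # skip body plus any padding byte
    return chunks

# Build a minimal in-memory file: 16-byte PCM format chunk, 4-byte data chunk
fmt_body = struct.pack("<HHIIHH", 1, 2, 44100, 176400, 4, 16)
data_body = b"\x00\x00\x00\x00"
riff_body = (b"WAVE"
             + b"fmt " + struct.pack("<I", len(fmt_body)) + fmt_body
             + b"data" + struct.pack("<I", len(data_body)) + data_body)
wave_bytes = b"RIFF" + struct.pack("<I", len(riff_body)) + riff_body

chunks = list_wave_chunks(wave_bytes)
```

A real reader would also want to guard against size fields that point past the end of the file; that is left out here to keep the sketch short.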

The Format Chunk

The format chunk describes the format that the samples in the data chunk are encoded in. The exact structure of the format chunk depends on the value of the format code field. If the format code is 1 (PCM), then the format chunk will only contain the fields above the dashed line in the diagram below. If it's not 1, the chunk will also contain the fields after the dashed line.

The reason for these differences is that the Wave format is a container for many different kinds of sample formats, and because the Wave format has evolved over time to support new formats. Extra fields that are needed for one sample format might not be needed for another sample format. This also allows new fields to be added without having to change pre-existing Wave files.

Field                                        Size in Bytes  Description
Chunk ID                                     4              ASCII string "fmt " (note the space after ‘t’)
Chunk Size                                   4              32-bit unsigned integer
Format Code                                  2              16-bit unsigned integer
Number of Channels                           2              16-bit unsigned integer
Samples per Second                           4              32-bit unsigned integer
Bytes per Second                             4              32-bit unsigned integer
Bytes per Sample Frame (a.k.a. block align)  2              16-bit unsigned integer
Bits per Sample                              2              16-bit unsigned integer
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
These fields are only present if format code is not 1:
Extension Size                               2              16-bit unsigned integer
Extra Fields                                 Variable       It depends on the format code
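The fixed part of the format chunk body maps directly onto a struct format string. Below is a Python sketch, assuming the chunk body (everything after the 8-byte header) has already been read. The names FormatChunk and parse_format_chunk are invented for this example, and deciding whether the extension is present based on body length is a simplification.

```python
import struct
from collections import namedtuple

FormatChunk = namedtuple(
    "FormatChunk",
    "format_code channels sample_rate byte_rate block_align bits_per_sample extension",
)

def parse_format_chunk(body):
    """Parse a format chunk body (the bytes after the 8-byte chunk header)."""
    # Six fixed fields, all little-endian: 2 + 2 + 4 + 4 + 2 + 2 = 16 bytes
    fields = struct.unpack_from("<HHIIHH", body, 0)
    extension = b""
    if len(body) >= 18:
        # Extension size counts only the extra fields, not the size field itself
        (extension_size,) = struct.unpack_from("<H", body, 16)
        extension = body[18:18 + extension_size]
    return FormatChunk(*fields, extension)

pcm = parse_format_chunk(struct.pack("<HHIIHH", 1, 2, 44100, 176400, 4, 16))
floating = parse_format_chunk(struct.pack("<HHIIHHH", 3, 1, 44100, 176400, 4, 32, 0))
```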

While some of these fields have a large range of possible values, in practice there are only a few that will actually be used. For some background on what some of this terminology means, check out this blog post.

Format Code – Indicates how the sample data for the wave file is stored. The most common format is PCM, which has a code of 1. Other formats include IEEE floating point (3), ADPCM (2), μ-law (7), and WaveFormatExtensible (65534).

Number of channels – Typically a file will have 1 channel (mono) or 2 channels (stereo). A 5.1 surround sound file will have 6 channels. Although this field technically allows you to have up to 65,535 channels, for audio data that would be flat out ridiculous. You would only hear all of the channels if you had 65,535 different speakers, and since a chunk can only hold 4GB of data (due to the 32-bit size field), you would only be able to store about a second and a half of 8-bit PCM data at a sample rate of 44,100.

Sample rate – The number of sample frames that occur each second. A typical value would be 44,100, which is the same as an audio CD.

Bytes per second (byte rate) – Called byte rate by the spec, this is the number of bytes required for one second of audio data. It is equal to the bytes per sample frame times the sample rate. So with 4 bytes per sample frame (e.g. two 16-bit channels) and a sample rate of 44,100, this should equal 176,400.

Bytes per sample frame – Called block align by the spec, this is the number of bytes required to store a single sample frame, i.e. a single sample for each channel. (Sometimes a sample frame is also referred to as a block). It should be equal to the number of channels times the bytes per sample, where the bytes per sample is the bits per sample rounded up to a multiple of 8, then divided by 8. For example:

Channels  Bits per Sample  Bytes per Sample Frame
1         8                1
2         8                2
1         16               2
2         16               4
6         32               24

This field can be used to calculate the bytes per second field. Another possible use is for seeking around in a file. For example, if the bytes per sample frame is 4, then to seek forward 10 sample frames you need to seek forward 40 bytes.

For PCM data, this field is essentially redundant since it can be calculated from the other fields. However, be sure to note that bits per sample values are rounded up to the next multiple of 8, not to the nearest one.
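The relationship between these fields can be sketched in a few lines of Python. The function names are made up for this example, and the calculation assumes PCM-style data where every sample occupies a whole number of bytes.

```python
def bytes_per_sample_frame(channels, bits_per_sample):
    """Block align: one sample per channel, each rounded up to whole bytes."""
    bytes_per_sample = (bits_per_sample + 7) // 8  # round up to a multiple of 8 bits
    return channels * bytes_per_sample

def bytes_per_second(channels, bits_per_sample, sample_rate):
    """Byte rate: bytes per sample frame times sample frames per second."""
    return bytes_per_sample_frame(channels, bits_per_sample) * sample_rate

def seek_offset(sample_frames, channels, bits_per_sample):
    """Number of bytes to skip to move forward by the given number of sample frames."""
    return sample_frames * bytes_per_sample_frame(channels, bits_per_sample)

stereo_16 = bytes_per_sample_frame(2, 16)
mono_12 = bytes_per_sample_frame(1, 12)       # 12-bit samples still take 2 bytes each
cd_byte_rate = bytes_per_second(2, 16, 44100)
ten_frames = seek_offset(10, 2, 16)
```

For two 16-bit channels this gives 4 bytes per sample frame, a byte rate of 176,400 at a sample rate of 44,100, and a 40-byte seek to skip 10 sample frames.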

Bits per sample – For PCM data, typical values will be 8, 16, 24, or 32. If the sample format doesn't require this field, it should be set to 0.

Extension Size – This field should only be present if the format code is not 1. This indicates the size of the extra fields in bytes. It does not include the bytes in this field itself. If the given sample format has no extra fields, then this field should be set to 0.

Extra Fields – It depends on the format code! The next sections describe the extra fields for a few audio formats.

Extra Format Fields for Floating Point

If the format code is 3, then the sample data is stored as PCM using floating point numbers. There are no extra fields for this format, so the extension size field should be set to 0.

Field           Size in Bytes  Description
(Other fields in format chunk)
Extension Size  2              16-bit unsigned integer (value 0)

Extra Format Fields for EXTENSIBLE format

If the format code is 65534, then the format is called “WAVE_FORMAT_EXTENSIBLE”. This comes from the name of a data structure given to this format in the Windows API. The extensible format is a container format (within *.wav, which is itself a container format). It exists to work around some ambiguities in the original Wave file format without having to break compatibility with pre-existing files.

When the format is WAVE_FORMAT_EXTENSIBLE, the extension size in the format chunk should be 22, and the following three fields should be included:

Field                  Size in Bytes  Description
(Other fields in format chunk)
Extension Size         2              16-bit unsigned integer (value 22)
Valid Bits Per Sample  2              16-bit unsigned integer
Channel Mask           4              32-bit unsigned integer
Sub Format             16             16-byte GUID

Valid Bits Per Sample – Allows storing samples with bit-depths that are not a byte multiple in size. For example, to store 12-bit samples, this value can be set to 12, and the bits-per-sample field in the format chunk set to 16. Each sample will still take up 16 bits (2 bytes) on disk, but the reader can be informed that only 12 of those bits are significant. The valid bits are the most significant bits of the container.

Channel Mask – Indicates which audio channels map to which speakers.

Sub Format – Identifies the format of the sample data in the data chunk. This is a replacement for the original format code field, which can no longer identify the sample format because it is always set to 65534.
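The three extension fields can be unpacked as shown in this Python sketch. The function name parse_extensible_fields is invented for the example. The GUID constant is, to the best of my knowledge, the sub format value Windows uses for integer PCM, and the channel mask of 3 is assumed to mean front left plus front right; treat both as assumptions to verify against Microsoft's documentation.

```python
import struct
import uuid

# Assumption: the sub format GUID for integer PCM
PCM_SUB_FORMAT = uuid.UUID("00000001-0000-0010-8000-00aa00389b71")

def parse_extensible_fields(extension):
    """Parse the 22 bytes of extra fields that follow the extension size field."""
    if len(extension) < 22:
        raise ValueError("extensible format needs 22 bytes of extra fields")
    # 2-byte valid bits per sample, then 4-byte channel mask, both little-endian
    valid_bits, channel_mask = struct.unpack_from("<HI", extension, 0)
    # The GUID is stored in Windows byte order, which matches UUID's bytes_le
    sub_format = uuid.UUID(bytes_le=bytes(extension[6:22]))
    return valid_bits, channel_mask, sub_format

# Hypothetical extension: 12 valid bits, channel mask 3, PCM sub format
example = struct.pack("<HI", 12, 0x3) + PCM_SUB_FORMAT.bytes_le
valid_bits, channel_mask, sub_format = parse_extensible_fields(example)
```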

The Fact Chunk

Field                    Size in Bytes  Description
Chunk ID                 4              ASCII string "fact"
Chunk Size               4              32-bit unsigned integer
Number of sample frames  4              32-bit unsigned integer

The fact chunk indicates how many sample frames are in the file. It’s required if the format code in the format chunk is not 1. If the format code is 1, it’s optional.

The reason for this chunk is that with some sample formats the number of sample frames can’t be determined by obvious means (e.g. because they store data in a compressed format). This gives a way of determining e.g. the playing time for the file, without having to decode the entire data chunk.

The number of sample frames is per channel. For example, if a stereo file contains 1,000 samples for each channel, the value of this field should be 1,000, not 2,000.

It’s not needed for integer PCM data (format code 1), because the total number of sample frames can be derived by dividing the total bytes in the data chunk body by the bytes-per-sample-frame field from the format chunk. For example, if the data chunk body is 176,400 bytes, and bytes-per-sample-frame is 4 (e.g. two 16-bit channels), then the total number of sample frames is 176,400 ÷ 4 = 44,100.
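For integer PCM the calculation is a single division, sketched here in Python with made-up function names. The example values assume two 16-bit channels (4 bytes per sample frame) at a sample rate of 44,100.

```python
def sample_frame_count(data_chunk_size, bytes_per_sample_frame):
    """Total sample frames: data chunk body size divided by block align."""
    return data_chunk_size // bytes_per_sample_frame

def duration_in_seconds(data_chunk_size, bytes_per_sample_frame, sample_rate):
    """Playing time: sample frames divided by sample frames per second."""
    return sample_frame_count(data_chunk_size, bytes_per_sample_frame) / sample_rate

frames = sample_frame_count(176_400, 4)
seconds = duration_in_seconds(176_400, 4, 44_100)
```

For compressed formats this shortcut does not apply, which is why the fact chunk exists.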

The Data Chunk

Field        Size in Bytes  Description
Chunk ID     4              ASCII string "data"
Chunk Size   4              32-bit unsigned integer
Sample Data  Various        It depends on the format code

The layout for the data chunk is simpler than the format chunk: the normal 8-byte chunk header, followed by nothing but sweet, raw, unfiltered sample data. The sample data can be stored in a number of formats, which will be indicated by the format chunk.

The next several sections describe various formats that data in the data chunk can be stored as.

PCM Data Chunk

The simplest, and most common, is to store PCM samples (format code 1). This is just raw sample data stored as integers. The bits per sample field will indicate the range of the sample data:

Bits per Sample  Minimum Sample   Maximum Sample
8                0                255
16               -32,768          32,767
24               -8,388,608       8,388,607
32               -2,147,483,648   2,147,483,647

Important! Notice that 8-bit samples are unsigned, while larger bit depths are signed.
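The signed/unsigned difference is easy to get wrong when decoding. Here is a small Python sketch that decodes a single sample from its raw little-endian bytes; the function name decode_pcm_sample is hypothetical, and the sample width is inferred from the number of bytes passed in.

```python
def decode_pcm_sample(raw):
    """Decode one integer PCM sample from its raw little-endian bytes."""
    if len(raw) == 1:
        # 8-bit samples are unsigned: 0 to 255
        return int.from_bytes(raw, "little", signed=False)
    # 16-, 24-, and 32-bit samples are signed two's complement
    return int.from_bytes(raw, "little", signed=True)

eight_bit_max = decode_pcm_sample(b"\xff")
sixteen_bit_minus_one = decode_pcm_sample(b"\xff\xff")
twenty_four_bit_min = decode_pcm_sample(b"\x00\x00\x80")
thirty_two_bit_max = decode_pcm_sample(b"\xff\xff\xff\x7f")
```

Using int.from_bytes rather than the struct module sidesteps the fact that struct has no 3-byte integer format for 24-bit samples.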

Samples in a multi-channel PCM wave file are interleaved. That is, in a stereo file, one sample for the left channel will be followed by one sample for the right channel, followed by another sample for the left channel, then right channel, and so forth.

The samples for all channels at a moment in time are called a sample frame (also called a block). That is, a sample frame will contain one sample for each channel. In a monophonic file, a sample frame will consist of 1 sample. In a stereo file, a sample frame has 2 samples (one for the left channel, one for the right channel). In a 5-channel file, a sample frame has 5 samples. The block align field in the format chunk gives the size in bytes of each sample frame. This can be useful when seeking to a particular sample frame in the file.
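Interleaving can be undone by taking every Nth sample, where N is the number of channels. This Python sketch handles only 16-bit PCM, and the function name split_channels_16bit is made up for illustration.

```python
import struct

def split_channels_16bit(data, channels):
    """Split interleaved 16-bit PCM bytes into one list of samples per channel."""
    sample_count = len(data) // 2
    # "h" is a signed 16-bit integer; "<" selects little-endian byte order
    samples = struct.unpack("<%dh" % sample_count, data[:sample_count * 2])
    # Channel c owns samples c, c + channels, c + 2 * channels, ...
    return [list(samples[c::channels]) for c in range(channels)]

# Two stereo sample frames: (left=1, right=-1), then (left=2, right=-2)
interleaved = struct.pack("<4h", 1, -1, 2, -2)
left, right = split_channels_16bit(interleaved, 2)
```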

Floating Point Data Chunk

Another basic format is to store samples as floating point values (format code 3). This is essentially the same as PCM format, except that samples are in the range -1.0 to 1.0. The bits per sample field for floating point files should be set to 32 or 64.
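Decoding floating point samples looks much like decoding integer PCM, just with a different struct type code. A Python sketch, with a made-up function name, assuming the bits per sample field is either 32 or 64:

```python
import struct

def decode_float_samples(data, bits_per_sample):
    """Decode little-endian floating point samples (32 or 64 bits each)."""
    if bits_per_sample == 32:
        type_code, width = "f", 4
    elif bits_per_sample == 64:
        type_code, width = "d", 8
    else:
        raise ValueError("floating point samples should be 32 or 64 bits")
    count = len(data) // width
    return list(struct.unpack("<%d%s" % (count, type_code), data[:count * width]))

samples_32 = decode_float_samples(struct.pack("<2f", 0.5, -1.0), 32)
samples_64 = decode_float_samples(struct.pack("<2d", 0.25, 1.0), 64)
```

Multi-channel floating point data would be de-interleaved after this step in the same way as integer PCM.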


WAVE_FORMAT_EXTENSIBLE Data Chunk

Since WAVE_FORMAT_EXTENSIBLE is a container format, the format code of 65534 doesn't imply any particular sample format. The sample format is instead determined by the sub format GUID in the format chunk. For example, if the sub format GUID is the GUID for PCM, then the samples are in the same format as if the format code was 1.


Documents from Microsoft defining the initial file format, and changes over time:

Other links: