Welcome to “Handmade Hero Notes”, the book where we follow the footsteps of Handmade Hero in making the complete game from scratch, with no external libraries. If you'd like to follow along, preorder the game on handmadehero.org, and you will receive access to the GitHub repository, containing complete source code (tagged day-by-day) as well as a variety of other useful resources.
We are currently in the middle of setting up our sound support. Last time, we pretty much finished setting up DirectSound and the buffers. Today, we're going to focus on outputting the sound. For this purpose, we're also going to write a simple square wave sound to make sure the system works properly.
1 DirectSound Init Review
2 Plan the Sound Output
2.1 Determine the Sound to Output
2.2 Visualize Circular Buffer
3 Fill the Buffer
3.1 Fill the buffer regions
3.2 Implement the Square Wave
3.3 Get Cursor Position and Region Size
3.4 Simplify Square Wave
3.5 Lock and Start Playing
4 Recap
5 Exercises
5.1 Practice RAII on Sound Buffer
5.2 Extract Your Sound Functions
6 Programming Notions
6.1 Intro to Digital Sound Theory
6.2 Compression-Oriented Programming
7 Navigation
Our Win32InitDSound function currently does the following things:
dsound.dll
library.
DirectSoundCreate
.
dsound.dll
.
DirectSound
object.
DSBUFFERDESC
structure.
WAVEFORMATEX
structure we defined in step 4.
BufferDescription
.
And... that's it! The secondary buffer is the only thing that we'll be using from here on. We can't write to the primary buffer anymore, as you could in the early days of Windows. Nowadays Windows doesn't give programs exclusive privileges to write to the sound card: each program has to write to its secondary buffer, and the kernel takes all the programs writing to their secondary buffers, mixes them together and outputs the result to the sound card.
Since we're going to use this secondary buffer, we will elevate it to our globals. Let's cut it out from Win32InitDSound and paste it as a global_variable next to its peers. We will also add a Global prefix to its name, in line with the other globals.
global_variable bool GlobalRunning;
global_variable win32_offscreen_buffer GlobalBackbuffer;
global_variable IDirectSoundBuffer *GlobalSecondaryBuffer;

// Inside Win32InitDSound (the local SecondaryBuffer declaration is removed):
if(SUCCEEDED(DirectSound->CreateSoundBuffer(&BufferDescription, &GlobalSecondaryBuffer, 0)))
{
    OutputDebugStringA("Secondary buffer created successfully.\n");
}
Again, we could eventually remove it, as it's not a good idea to have too many globals. In a platform layer it's usually not that big of a deal because the code is kind of isolated from the rest of your program. At the same time, you want to be aware of it, as globals can be modified by anyone, and sometimes this creates a bit of stringiness in your code (or spaghettiness, if you prefer). Be aware of what you make global, and make sure you understand a) why it's global, b) whether it should stay global, and c) whether there should be only one of it around. If you can't answer all three with confidence, don't do it.
You might have noticed by now that we always try to lay out what we're about to do. If you so prefer, you can skim through this part, do the implementation, and return later to fully understand why we took the decisions we took.
That said, you should try to avoid thinking too far ahead. During the implementation, new things that you might not have thought through will appear, and if your plan was too tight, it will fall apart or become very ugly very quickly.
Once we get our buffer, we will be able to output sound from it. In order to do that, we only need to ask where the write cursor is and fill in some memory with the sound.
Now, which sound do we want to reproduce? We can't load audio .wav files just yet, so we're going to generate a sound on the fly. It's going to be a really annoying sound, called a square wave. It is, however, the simplest sound we can generate, so it will do. Its name comes from a very characteristic waveform in which the high, positive signal is immediately followed by a low, negative signal, with nothing in between. It looks something like this:
We can roughly translate how we encode sound to how the speaker membrane behaves in real life. When the membrane is pushed out, you have a positive number; when it's sucked back, it's a negative number.
Conversely, to turn a stream of samples into real audio, a digital-to-analog converter (DAC) is first used to generate a time-varying analog voltage corresponding to the samples. This analog voltage is applied to the input of a power amplifier that drives the speaker.
Let's review what our buffer actually represents. We said last time that the sound buffer we initialized is a circular buffer. If you remember our rendering backbuffer, it was a simple chunk of memory that we agreed was a 2D representation of a bitmap. We would fill the entire buffer, and then pass it to be rendered on screen, again in its entirety. Rinse and repeat.
We decided not to do it with the sound buffer. Instead, we agreed that our sound buffer would be a 2-second stream of data that we will be constantly updating to make sure the most up-to-date sounds are played. And, instead of filling it from the beginning to the end, we will write only from a specific point for a while using a “cursor” which will constantly run from the beginning to the end of the buffer.
Behind our writing cursor, another one will be running. Like a vacuum cleaner picking up the dust we drop in front of it, it will be reading our bytes and passing them to the sound card for playing.
Play and Write cursor, in a constant chase of cat and mouse around the buffer.
You could imagine this buffer as if you were infinitely adding a new buffer to the end of the existing one. Thing is, you can't really write ahead infinitely. Once you reach the Play cursor's current position, you should stop, otherwise the newly written sounds will overwrite whatever the Play cursor still has to read on the current iteration.
For instance, if we write our square wave to the buffer, this is how it would look in an “unwrapped” state:
In summary, once the Play cursor passes an area, it becomes available for writing new sounds. On the other hand, the Write cursor marks the minimum safe area. Writing anything before it risks producing distorted sound.
Now, before writing to the buffer, we will need to lock an area that we will specify. We will get in return either one or two regions. Looking at Figure 3 again, it's easy to see why: you either get one region if all the desired size fits before the end of the buffer, or two regions if it doesn't.
This is the gist of it. Now, let's think about a few other details before we're ready to make it happen:
GlobalSecondaryBuffer so we will use this object to call its methods.
Locking and Unlocking the sound buffer.
Now, luckily DirectSound knows that its buffers are often used in a circular manner, so it accounts for the possibility that we receive up to two regions in return. The internal logic of the API in that case is the following:
Of course, this doesn't always happen. The whole region we've requested might fit well within the borders of the buffer.
We should be prepared to handle both of these cases, whether we get the requested memory in one block, or two. It's easy to do, but it's important to understand the idea before implementing it.
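To make the one-region and two-region cases concrete, here is a minimal sketch of how a circular buffer could split a lock request. It does not call DirectSound at all; lock_regions and SplitCircularLock are hypothetical names invented for this illustration, and we only assume the behavior described above (anything that doesn't fit before the end of the buffer wraps to its beginning).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: where the (up to) two regions start and how big they are.
struct lock_regions
{
    uint32_t Region1Offset;
    uint32_t Region1Size;
    uint32_t Region2Offset; // region 2, if any, always starts at the beginning of the buffer
    uint32_t Region2Size;   // stays 0 when the request did not wrap
};

// Splits a request of BytesToWrite bytes, starting at ByteToLock,
// inside a circular buffer of BufferSize bytes.
inline lock_regions SplitCircularLock(uint32_t BufferSize,
                                      uint32_t ByteToLock,
                                      uint32_t BytesToWrite)
{
    assert(ByteToLock < BufferSize);
    assert(BytesToWrite <= BufferSize);

    lock_regions Result = {};
    uint32_t BytesUntilEnd = BufferSize - ByteToLock;

    Result.Region1Offset = ByteToLock;
    if(BytesToWrite <= BytesUntilEnd)
    {
        // Everything fits before the end of the buffer: one region only.
        Result.Region1Size = BytesToWrite;
    }
    else
    {
        // The request runs off the end: the remainder wraps to the start.
        Result.Region1Size = BytesUntilEnd;
        Result.Region2Size = BytesToWrite - BytesUntilEnd;
    }
    return Result;
}
```

With a 100-byte buffer, a 50-byte request at offset 10 comes back as a single region, while the same request at offset 80 comes back as 20 bytes at the end plus 30 bytes at the beginning.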
Let's get to coding this down. Inside our WinMain, we'll clear some space and get ready to output our buffer, say, right after we render our gradient:
// ...
RenderWeirdGradient(&GlobalBackbuffer, XOffset, YOffset);
// NOTE(casey): DirectSound output test
win32_window_dimension Dimension = Win32GetWindowDimension(Window);
Win32DisplayBufferInWindow(&GlobalBackbuffer, DeviceContext, Dimension.Width, Dimension.Height);
// ...
Locking is achieved by calling the Lock method of our GlobalSecondaryBuffer object (if you recall, we call methods any functions which are retrieved from objects, as opposed to straight from the source file). This is what Lock's signature looks like:
HRESULT Lock(
    DWORD   dwWriteCursor,    // Input
    DWORD   dwWriteBytes,     // Input
    LPVOID *lplpvAudioPtr1,   // Output
    LPDWORD lpdwAudioBytes1,  // Output
    LPVOID *lplpvAudioPtr2,   // Output
    LPDWORD lpdwAudioBytes2,  // Output
    DWORD   dwFlags           // Input
);
There are a few interesting things to note. First, the method returns an HRESULT which, as we remember from last time, can (and really, should) be tested with the SUCCEEDED macro. As for its parameters, we have a large number of output values which roughly correspond to what we said.
dwWriteCursor: The offset, in bytes, from the start of the buffer to the point where the lock begins. Let's say we will have a certain ByteToLock which we will lock. We'll return to it a bit later.
dwWriteBytes: How many bytes we intend to write to, i.e. our “desired size”. Let's say we're going to write some BytesToWrite value that we will come up with later.
lplpvAudioPtr1, lpdwAudioBytes1: Address of a pointer and a size variable for the first region you will receive back.
lplpvAudioPtr2, lpdwAudioBytes2: Address of a pointer and a size variable for the potential second region, if any.
dwFlags: There are a couple of flags that we could pass here. We don't really need them, so just pass 0.

// NOTE(casey): DirectSound output test
DWORD ByteToLock = ; // TODO!
DWORD BytesToWrite = ; // TODO!
VOID *Region1;
DWORD Region1Size;
VOID *Region2;
DWORD Region2Size;
if (SUCCEEDED(GlobalSecondaryBuffer->Lock(ByteToLock, BytesToWrite,
                                          &Region1, &Region1Size,
                                          &Region2, &Region2Size,
                                          0)))
{
    // All good, we can write to the buffer
}
// ...
Region1, Region1Size, Region2 and Region2Size will be filled out by the Lock call, so we can leave them uninitialized. As for ByteToLock and BytesToWrite, these will need to come from somewhere. We'll return to them in a minute; for now let's move on.
Now that we have our buffer ready and willing to accept our samples, let's think about how exactly we will write to it. Our graphics backbuffer was accepting 32-bit Pixels in a specific format. Similarly, we'll be writing sound in Samples, each 16-bit sample alternating between two channels.
You can quickly realize that it will be simpler for us to think about samples in terms of a single Left-Right unit, instead of Left or Right as separate things. There's no sense in writing to the Left channel if we aren't writing to the Right channel, and vice versa. Therefore, we could abstract this even further and think of a Left-Right pair as a single, 32-bit wide Sample. The name might be confused with the individual Left or Right samples, but we'll be calling those Channels from here on.
We can also say with certainty that our cursor and BytesToWrite will always be a whole multiple of the Sample size (which is, under this logic, 4 bytes). We can't really output less than that because it wouldn't make sense for our stereo sound.
So when Region1Size or Region2Size come back, they'd better be whole multiples of the Sample size as well. If that's not the case, something weird has happened.
We will skip the talk about assertions for now, because it's a pretty important topic and must be handled separately. For now we can leave a TODO for future us:
if (SUCCEEDED(GlobalSecondaryBuffer->Lock(...)))
{
    // All good, we can write to the buffer
    // TODO(casey): assert that Region1Size/Region2Size are valid
}
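Until assertions get their own proper treatment, the check behind that TODO could be sketched with the standard assert macro. This is only a placeholder under our own assumptions: IsWholeNumberOfSamples and CheckLockedRegions are hypothetical names, and the 4-byte Sample size follows from the two 16-bit channels described above.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: one Sample is a Left-Right pair of 16-bit channels.
static const uint32_t BytesPerSample = sizeof(int16_t) * 2;

// True when a region holds a whole number of Left-Right samples.
inline bool IsWholeNumberOfSamples(uint32_t RegionSize)
{
    return (RegionSize % BytesPerSample) == 0;
}

// Hypothetical check to run right after a successful lock.
// A Region2Size of 0 (no second region) passes trivially.
inline void CheckLockedRegions(uint32_t Region1Size, uint32_t Region2Size)
{
    assert(IsWholeNumberOfSamples(Region1Size));
    assert(IsWholeNumberOfSamples(Region2Size));
}
```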
So, now that we know how to think about this next part, let's implement it. There are many ways to approach it; let's do the most explicit one.
As with our rendering, we'll throw in a couple of for loops. First, we will iterate over Region1 and then, if Region2Size is anything greater than 0, over Region2. Since we get 0 in Region2Size if Region2 isn't necessary, we don't need to put in any extra if statements: the second loop will simply not run.
We will be iterating over a value called SampleCount for regions 1 and 2. This will be our actual region size divided by the BytesPerSample value we calculated last time.
GlobalSecondaryBuffer->Lock(...);
// TODO(casey): assert that Region1Size/Region2Size are valid
DWORD Region1SampleCount = Region1Size / BytesPerSample;
for (DWORD SampleIndex = 0;
     SampleIndex < Region1SampleCount;
     ++SampleIndex)
{
}
DWORD Region2SampleCount = Region2Size / BytesPerSample;
for (DWORD SampleIndex = 0;
     SampleIndex < Region2SampleCount;
     ++SampleIndex)
{
}
Please note that contrary to the “2D” render buffer loop we only need to iterate once inside each “1D” sound buffer loop. The two different loops we typed in are consecutive for two different regions that we might get.
As with Pixel, we'll have our SampleOut. We'll make it a pointer to the sound of a single channel (16 bit) and use it to iterate over our regions. This means that we initialize it pointing at the beginning of the region, and slowly advance it as we write down our samples.
if (SUCCEEDED(GlobalSecondaryBuffer->Lock(...)))
{
    // TODO(casey): assert that Region1Size/Region2Size are valid
    s16 *SampleOut = (s16 *)Region1;
    DWORD Region1SampleCount = Region1Size / BytesPerSample;
    for (DWORD SampleIndex = 0;
         SampleIndex < Region1SampleCount;
         ++SampleIndex)
    {
        *SampleOut++ = LEFT;
        *SampleOut++ = RIGHT;
    }
    SampleOut = (s16 *)Region2;
    DWORD Region2SampleCount = Region2Size / BytesPerSample;
    for (DWORD SampleIndex = 0;
         SampleIndex < Region2SampleCount;
         ++SampleIndex)
    {
        *SampleOut++ = LEFT;
        *SampleOut++ = RIGHT;
    }
}
We assume that you have some basic knowledge of how digital sound sampling works. If not, check out subsection 6.1.
Now, a square wave is super simple (that's why we're implementing it). Its samples are either at their maximum or their minimum, with nothing in between. So in practice we only need to know when to flip the sign, and what the “maximum” and “minimum” values are. It's calculated in the following way:
A few things can be precalculated before we enter our main loop:
SquareWavePeriod
.
SamplesPerSecond
and divide it by desired ToneHz
.
SquareWavePeriod
.
SquareWavePeriod
.
The actual wave sampling will of course happen inside the main loop:
SampleValue. If our SquareWaveCounter is greater than our HalfSquareWavePeriod, it will be ToneVolume. If not, -ToneVolume.
SampleValue to the left and right channels.
Try to implement it all yourself! You will find our implementation below:
// NOTE(casey): Sound test constants
int SamplesPerSecond = 48000;
int BytesPerSample = sizeof(s16) * 2;
int SecondaryBufferSize = 2 * SamplesPerSecond * BytesPerSample;
int ToneHz = 256;
int SquareWavePeriod = SamplesPerSecond / ToneHz;
int HalfSquareWavePeriod = SquareWavePeriod / 2;
int ToneVolume = 3000;
int SquareWaveCounter = 0;

GlobalRunning = true;
while (GlobalRunning)
{
    // Main game loop
    // ...
    // NOTE(casey): DirectSound output test
    // ...
    s16 *SampleOut = (s16 *)Region1;
    DWORD Region1SampleCount = Region1Size / BytesPerSample;
    for (DWORD SampleIndex = 0;
         SampleIndex < Region1SampleCount;
         ++SampleIndex)
    {
        if(!SquareWaveCounter)
        {
            SquareWaveCounter = SquareWavePeriod;
        }
        s16 SampleValue = (SquareWaveCounter > HalfSquareWavePeriod) ? ToneVolume : -ToneVolume;
        *SampleOut++ = SampleValue;
        *SampleOut++ = SampleValue;
        --SquareWaveCounter;
    }
    SampleOut = (s16 *)Region2;
    DWORD Region2SampleCount = Region2Size / BytesPerSample;
    for (DWORD SampleIndex = 0;
         SampleIndex < Region2SampleCount;
         ++SampleIndex)
    {
        if(!SquareWaveCounter)
        {
            SquareWaveCounter = SquareWavePeriod;
        }
        s16 SampleValue = (SquareWaveCounter > HalfSquareWavePeriod) ? ToneVolume : -ToneVolume;
        *SampleOut++ = SampleValue;
        *SampleOut++ = SampleValue;
        --SquareWaveCounter;
    }
    // ...
}
You will notice the following line in our implementation:
s16 SampleValue = (SquareWaveCounter > HalfSquareWavePeriod) ? ToneVolume : -ToneVolume;
Simply put, it's an assignment based on a test. It's a shorthand for the following:
s16 SampleValue;
if (SquareWaveCounter > HalfSquareWavePeriod)
{
SampleValue = ToneVolume;
}
else
{
SampleValue = -ToneVolume;
}
The syntax of the ternary operator is variable = condition ? value_if_true : value_if_false;.
If you try to build now, you should have only two errors remaining. (Plus remember, we'll also need to unlock the buffer once we are done with it, and start playing!)
If you recall, we left ByteToLock and BytesToWrite as stubs; now let's think about where we can get them from. ByteToLock will tell DirectSound where to start writing from, while BytesToWrite will specify the size of our desired region.
As we said at the beginning, in order to calculate ByteToLock we need to “unwrap” our buffer. We will do it by introducing an unsigned integer which will keep track of the Samples we write. We will then calculate ByteToLock by multiplying the running sample index by BytesPerSample and taking the remainder of division by the secondary buffer size. The latter can be easily produced by using the modulo operator (%).
Why an unsigned integer? It has to do with number overflow.
A 32-bit signed integer goes from \(-2,147,483,648 (-2^{31})\) through \(2,147,483,647 (2^{31} - 1)\), while an unsigned 32-bit integer goes from \(0\) through \(4,294,967,295 (2^{32} - 1)\). That's simply how much 32 binary digits can store.
If you go one past the maximum value, the number overflows, i.e. restarts from the minimum number. However, while for unsigned integers this means restarting from \(0\), for signed integers it typically means restarting from the lowest negative number (\(-2,147,483,648\) for 32-bit integers), and in C and C++ signed overflow is formally undefined behavior. We'd really rather not have it here.
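As a small sketch of both ideas, the helpers below compute the lock offset from a running sample index and show the unsigned wrap-around. ComputeByteToLock and IncrementWithWrap are hypothetical names used only for this illustration; the arithmetic is the multiply-then-modulo described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: byte offset inside the circular buffer
// that corresponds to a running (ever-growing) sample index.
inline uint32_t ComputeByteToLock(uint32_t RunningSampleIndex,
                                  uint32_t BytesPerSample,
                                  uint32_t SecondaryBufferSize)
{
    assert(SecondaryBufferSize > 0);
    return (RunningSampleIndex * BytesPerSample) % SecondaryBufferSize;
}

// Unsigned arithmetic is defined to wrap around:
// one past the maximum value is simply 0 again.
inline uint32_t IncrementWithWrap(uint32_t Value)
{
    return Value + 1u;
}
```

With 4 bytes per sample and a 384000-byte buffer, sample 10 lands on byte 40, and sample 96000 (exactly two seconds in) lands back on byte 0.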
As for BytesToWrite, we don't want to write past the Play cursor, so we need to know whether it's before or after the byte we're locking.
To do that, we should find the position of the Play cursor, and we can do it by calling another buffer method, GetCurrentPosition. GetCurrentPosition returns an HRESULT and takes two pointers. The method fills them in with the offsets, in bytes, of the PlayCursor and WriteCursor from the beginning of the buffer. Again, we need to test whether or not this method SUCCEEDED. If it didn't, something bad happened, and we should not try to output sounds to the buffer. Let's wrap the whole sound output code block that we've written, and only execute it if we got the current position correctly.
int SamplesPerSecond = 48000;
int BytesPerSample = sizeof(s16) * 2;
int SecondaryBufferSize = 2 * SamplesPerSecond * BytesPerSample;
u32 RunningSampleIndex = 0;
int ToneHz = 256;
int SquareWavePeriod = SamplesPerSecond / ToneHz;
int HalfSquareWavePeriod = SquareWavePeriod / 2;
int ToneVolume = 3000;
int SquareWaveCounter = 0;
// ...
GlobalRunning = true;
while (GlobalRunning)
{
    // Main game loop
    // ...
    // NOTE(casey): DirectSound output test
    DWORD PlayCursor;
    DWORD WriteCursor;
    if(SUCCEEDED(GlobalSecondaryBuffer->GetCurrentPosition(&PlayCursor, &WriteCursor)))
    {
        DWORD ByteToLock = RunningSampleIndex * BytesPerSample % SecondaryBufferSize;
        DWORD BytesToWrite = ; // TODO!
        VOID *Region1;
        DWORD Region1Size;
        VOID *Region2;
        DWORD Region2Size;
        if(SUCCEEDED(GlobalSecondaryBuffer->Lock(...)))
        {
            // ...
        }
    }
    // ...
}
If you recall Figure 4, we need to account for two scenarios: if the PlayCursor is after ByteToLock (the requested size will fit in the buffer), and if it's before (we'll need to add a second region at the beginning). What we don't know is our “desired size”, and that's what BytesToWrite will represent.
Since both the PlayCursor and ByteToLock are expressed in bytes, calculation of the bytes to write will be pretty straightforward.
DWORD ByteToLock = RunningSampleIndex * BytesPerSample % SecondaryBufferSize;
DWORD BytesToWrite;
if(ByteToLock > PlayCursor)
{
    // Play cursor is behind
    BytesToWrite = SecondaryBufferSize - ByteToLock; // region 1
    BytesToWrite += PlayCursor;                      // region 2
}
else
{
    // Play cursor is in front
    BytesToWrite = PlayCursor - ByteToLock;          // region 1
}
VOID *Region1;
DWORD Region1Size;
VOID *Region2;
DWORD Region2Size;
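The same calculation can be pulled into a small standalone function, which makes it easy to try out with a few numbers. This is a sketch rather than the code we use in WinMain; ComputeBytesToWrite is a hypothetical name, and it mirrors the two branches above exactly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: how many bytes we may write, starting at ByteToLock,
// before we would catch up with the Play cursor.
inline uint32_t ComputeBytesToWrite(uint32_t BufferSize,
                                    uint32_t ByteToLock,
                                    uint32_t PlayCursor)
{
    assert(ByteToLock < BufferSize);
    assert(PlayCursor < BufferSize);

    uint32_t BytesToWrite;
    if(ByteToLock > PlayCursor)
    {
        // Play cursor is behind: write to the end of the buffer (region 1),
        // then from the start of the buffer up to the Play cursor (region 2).
        BytesToWrite = (BufferSize - ByteToLock) + PlayCursor;
    }
    else
    {
        // Play cursor is in front: a single region up to the Play cursor.
        BytesToWrite = PlayCursor - ByteToLock;
    }
    return BytesToWrite;
}
```

In a 100-byte buffer, locking at byte 80 with the Play cursor at byte 30 gives 20 + 30 = 50 bytes, while locking at byte 10 with the Play cursor at byte 60 gives a single 50-byte region. Note that when the two positions are equal, this logic yields 0 bytes, so nothing is written on that pass.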
Now that we have our running sample index, we can simplify our square wave significantly. We no longer need the SquareWaveCounter, as we can derive the position of the wave from the RunningSampleIndex.
To do that, instead of comparing SquareWaveCounter with HalfSquareWavePeriod, we will divide our RunningSampleIndex by HalfSquareWavePeriod, and get the remainder of division by 2 (“modulo 2”, or % 2). This will give us 0 or 1, which we can use to determine whether we're on the positive half of the wave or the negative one.
In other words, SquareWaveCounter > HalfSquareWavePeriod becomes (RunningSampleIndex / HalfSquareWavePeriod) % 2. We also want to advance our RunningSampleIndex (so, you know, it keeps running), so we can increment it in the same line as well.
// int SquareWaveCounter = 0; is no longer needed and can be deleted
// ...
s16 *SampleOut = (s16 *)Region1;
DWORD Region1SampleCount = Region1Size / BytesPerSample;
for (DWORD SampleIndex = 0;
     SampleIndex < Region1SampleCount;
     ++SampleIndex)
{
    // The counter reset and decrement are gone; the running index does the work
    s16 SampleValue = ((RunningSampleIndex++ / HalfSquareWavePeriod) % 2) ? ToneVolume : -ToneVolume;
    *SampleOut++ = SampleValue;
    *SampleOut++ = SampleValue;
}
SampleOut = (s16 *)Region2;
DWORD Region2SampleCount = Region2Size / BytesPerSample;
for (DWORD SampleIndex = 0;
     SampleIndex < Region2SampleCount;
     ++SampleIndex)
{
    s16 SampleValue = ((RunningSampleIndex++ / HalfSquareWavePeriod) % 2) ? ToneVolume : -ToneVolume;
    *SampleOut++ = SampleValue;
    *SampleOut++ = SampleValue;
}
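To see what the divide-then-modulo expression produces, here is the same test isolated in a tiny function. SquareWaveSample is a hypothetical name for this illustration only, and the half period of 93 samples used in the note below is simply 48000 / 256 / 2 with integer division.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: value of the square wave at a given running sample index.
// Even-numbered half periods come out negative, odd-numbered ones positive.
inline int16_t SquareWaveSample(uint32_t RunningSampleIndex,
                                uint32_t HalfSquareWavePeriod,
                                int16_t ToneVolume)
{
    assert(HalfSquareWavePeriod > 0);
    bool IsHighHalf = ((RunningSampleIndex / HalfSquareWavePeriod) % 2) != 0;
    return (int16_t)(IsHighHalf ? ToneVolume : -ToneVolume);
}
```

With a half period of 93 and a volume of 3000, samples 0 through 92 are -3000, samples 93 through 185 are 3000, and sample 186 flips back to -3000.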
We need to clean up a couple of things before we're ready to reproduce our beautiful square wave. First, we need to unlock the buffer so that Windows can read from it again. The method we're after is simply Unlock, and we pass to it the same Regions and RegionSizes that we received from Lock:
if(SUCCEEDED(GlobalSecondaryBuffer->Lock(ByteToLock, BytesToWrite,
                                         &Region1, &Region1Size,
                                         &Region2, &Region2Size,
                                         0)))
{
    // ...
    // Write all our samples
    // ...
    GlobalSecondaryBuffer->Unlock(Region1, Region1Size, Region2, Region2Size);
}
We also need to start playing. You do it by calling the Play method of the buffer. Usually you want to start playing only after initially filling out the buffer; for the sake of our test, for now we'll do it immediately after we initialize the buffer.
As parameters for Play, we really don't have many options. As you can see from the documentation, the first parameter is reserved and must be 0, and the second (the priority) must also be 0 for a buffer like ours, which wasn't created with the DSBCAPS_LOCDEFER flag. The last one, dwFlags, allows us to set up looping of the buffer. That's what we're interested in, so we'll pass the DSBPLAY_LOOPING flag along:
Win32InitDSound(Window, SamplesPerSecond, SecondaryBufferSize);
GlobalSecondaryBuffer->Play(0, 0, DSBPLAY_LOOPING);
Compile, set your speakers to a low volume and listen to the beauty of your work! If you've done everything right, you should hear a continuous, uninterrupted sound, without any noticeable “clicking”.
Today, we've written most of the DirectSound-related code. It will largely remain the same, ready to output whatever samples we pass to it.
The code we've written today will definitely contain some bugs. Compressing a “flat” linear buffer into a circular one is always somewhat complex. Next time, we are going to challenge it more, by implementing some more advanced wave types. We will also look at the buffer and verify that it looks like it should.
We hope that you enjoyed following along in the beautiful world of circular buffer coding. If you struggle with some parts, take regular breaks and return to the code when you are ready to rumble!
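If you remember, “Resource Acquisition Is Initialization” (RAII) is the practice of tying a resource's lifetime to an object's lifetime: the resource is acquired when the object is constructed and released automatically when the object is destroyed. C++ lets you do this via constructors and destructors. We discussed RAII on day 4.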
Practice your RAII by dynamically locking and unlocking the sound buffer!
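If you'd like a starting point, here is one possible shape for such a guard. To keep the sketch runnable anywhere, it locks a made-up fake_sound_buffer with a single region instead of the real DirectSound buffer; both fake_sound_buffer and scoped_buffer_lock are hypothetical names, and adapting the idea to GlobalSecondaryBuffer and its two regions is the actual exercise.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the sound buffer (NOT the DirectSound API).
struct fake_sound_buffer
{
    uint8_t Memory[64];
    int LockDepth = 0; // how many locks are currently held

    bool Lock(uint32_t Offset, uint32_t Bytes, void **Region, uint32_t *RegionSize)
    {
        if(Offset + Bytes > sizeof(Memory))
        {
            return false; // request does not fit, nothing is locked
        }
        *Region = Memory + Offset;
        *RegionSize = Bytes;
        ++LockDepth;
        return true;
    }

    void Unlock() { --LockDepth; }
};

// RAII guard: locks in the constructor, unlocks in the destructor.
class scoped_buffer_lock
{
public:
    scoped_buffer_lock(fake_sound_buffer &BufferToLock, uint32_t Offset, uint32_t Bytes)
        : Owner(BufferToLock)
    {
        Locked = Owner.Lock(Offset, Bytes, &Region, &RegionSize);
    }

    ~scoped_buffer_lock()
    {
        // Only release what we actually acquired.
        if(Locked)
        {
            Owner.Unlock();
        }
    }

    // Copying a guard would unlock twice, so we forbid it.
    scoped_buffer_lock(const scoped_buffer_lock &) = delete;
    scoped_buffer_lock &operator=(const scoped_buffer_lock &) = delete;

    bool IsLocked() const { return Locked; }

    void *Region = nullptr;
    uint32_t RegionSize = 0;

private:
    fake_sound_buffer &Owner;
    bool Locked = false;
};
```

Declaring a scoped_buffer_lock inside a block locks the region for the duration of that block; when the block ends, the destructor unlocks it even if you return early.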
You might have noticed that we're writing our sound code directly in WinMain. Try extracting this code to a separate function, say Win32UpdateDSound, and passing the constants we defined above as its parameters.
Let's quickly go over what we need for our sound to be played. We perceive sound when we receive the vibrations of an “acoustic wave”. In our case, such a wave is produced by speakers or headphones. As a wave, it has a specific frequency (i.e. how many times it repeats in a second), which determines the sound's pitch, and an amplitude, which determines its intensity and therefore loudness (volume).
The frequency is measured in Hertz (Hz), i.e. cycles per second. A cycle is how long it takes for a wave to go from a position (e.g. a peak) to the next iteration of the same position (the next peak).
In other words, a wave oscillating between maximum and minimum values at a certain amplitude and frequency produces sound. The frequency should be within a specific range (~20Hz to ~20kHz) and propagate through suitable media (e.g. air) to be perceived by the human ear.
We write our sound by “sampling” it many, many times (48000 times per second as of right now), and telling DirectSound what the value of our wave at each sample point is. These samples are then used to reconstruct the actual wave, as the sound output device emits vibrations corresponding to the value of each sample.
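The numbers used in this chapter follow directly from the sample rate. As a worked example (just the arithmetic, spelled out with compile-time checks), here is how the buffer size and the length of one cycle in samples come out of the constants we picked:

```cpp
#include <cstdint>

constexpr int SamplesPerSecond = 48000;             // samples per second
constexpr int ToneHz = 256;                         // cycles per second
constexpr int BytesPerSample = sizeof(int16_t) * 2; // left + right channel

// Two seconds of stereo 16-bit audio.
constexpr int SecondaryBufferSize = 2 * SamplesPerSecond * BytesPerSample;

// Samples in one full cycle: 48000 / 256 = 187.5, truncated by integer division.
constexpr int SquareWavePeriod = SamplesPerSecond / ToneHz;
constexpr int HalfSquareWavePeriod = SquareWavePeriod / 2;

static_assert(BytesPerSample == 4, "two 16-bit channels per sample");
static_assert(SecondaryBufferSize == 384000, "2 s * 48000 samples/s * 4 bytes");
static_assert(SquareWavePeriod == 187, "187.5 truncated to 187");
static_assert(HalfSquareWavePeriod == 93, "187 / 2 truncated to 93");
```

Because of the integer truncation, the tone we actually hear is very slightly off from an exact 256 Hz, which is perfectly fine for a test sound.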
You will notice that today we wrote something that we thought was good, then went back in and it turned out we could write it better. This is what we call “compression-oriented programming”. When you write down some code which is somewhat complicated and finicky, you first keep writing whatever the simplest thing is. You then start pulling out the things that are common.
Eventually, a pattern emerges, and you start seeing where this code should go! For instance, our two region loops are the same, so that can probably be boiled down.
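As one possible way of boiling the two loops down, the sketch below moves the loop body into a single function that is called once per region. It is not the version we settle on in the series; square_wave_state and FillRegion are hypothetical names, and plain fixed-width types stand in for DWORD and s16 so that the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bundle of everything the square wave needs between calls.
struct square_wave_state
{
    uint32_t RunningSampleIndex;
    uint32_t HalfSquareWavePeriod;
    int16_t ToneVolume;
};

// Fills one locked region with square wave samples.
// A RegionSize of 0 (no second region) simply writes nothing.
inline void FillRegion(void *Region, uint32_t RegionSize, square_wave_state *State)
{
    const uint32_t BytesPerSample = sizeof(int16_t) * 2;
    assert((RegionSize % BytesPerSample) == 0);
    assert(State->HalfSquareWavePeriod > 0);

    int16_t *SampleOut = (int16_t *)Region;
    uint32_t SampleCount = RegionSize / BytesPerSample;
    for(uint32_t SampleIndex = 0; SampleIndex < SampleCount; ++SampleIndex)
    {
        bool IsHighHalf =
            ((State->RunningSampleIndex++ / State->HalfSquareWavePeriod) % 2) != 0;
        int16_t SampleValue =
            (int16_t)(IsHighHalf ? State->ToneVolume : -State->ToneVolume);
        *SampleOut++ = SampleValue; // left channel
        *SampleOut++ = SampleValue; // right channel
    }
}
```

After a successful lock, the body would then shrink to two calls: one FillRegion for Region1 and one for Region2, both sharing the same state so the wave continues seamlessly across the wrap.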
This is the best way to end up with nice working code that does exactly what you want it to do.
Previous: Day 7. Initializing DirectSound
Up Next: Day 9. Variable-Pitch Sine Wave Output