

Tech pages

This is a bunch of unsorted notes. Eventually, when the leaves are brown and the sky is gray, I might organize them a bit.

Contents:

· Basic concepts · Filters · Decibel notes. 32-bit sound. ·
· Denormals · Fourier Notes · C++ peculiarities · Build notes · Credits to other people ·

Basic concepts

In order to characterize sound, one usually talks about the sound spectrum. Every sound can be decomposed into a set of sine waves (mathematically, this is done by applying a Fourier transform to the sound signal). For a pitched sound, these components consist of a fundamental frequency (the lowest frequency in the spectrum) followed by a number of harmonics - that is, sine waves at frequencies that are multiples of the fundamental frequency. Not all sounds are this simple: percussion and drums will typically have a noise spectrum, where all frequencies are present.

Since software synthesis works on discrete samples, there is a ceiling on the highest frequency one can play back. This is the Nyquist frequency, and it is half the sample rate: i.e. for standard 44.1 kHz CD audio, the highest possible frequency is 22.05 kHz.

Now, in order to create sounds, one possible way would be to just add a number of sine functions. This is additive synthesis. For a concert A at 440 Hz, there would be a maximum of 50 harmonics below the Nyquist frequency (that is, 22.05 kHz / 440 Hz ≈ 50). Although it is possible to create sounds this way, it is computationally expensive.
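The arithmetic above can be sketched in a few lines of C++ (my own illustrative code, not Syntopia's; the 1/n harmonic weighting is just an example spectrum):

```cpp
#include <cmath>
#include <vector>

// Additive synthesis: sum every harmonic of the fundamental that fits
// below the Nyquist frequency (e.g. 22050 / 440 = 50 harmonics for A=440 Hz).
std::vector<float> additiveTone(double fundamental, double sampleRate, int numSamples) {
    const double pi = 3.14159265358979323846;
    const int harmonics = static_cast<int>((sampleRate / 2.0) / fundamental);
    std::vector<float> out(numSamples, 0.0f);
    for (int t = 0; t < numSamples; ++t) {
        double sum = 0.0;
        for (int n = 1; n <= harmonics; ++n)
            sum += std::sin(2.0 * pi * n * fundamental * t / sampleRate) / n; // 1/n weighting
        out[t] = static_cast<float>(sum);
    }
    return out;
}
```

The nested loop is exactly why this is expensive: fifty sine evaluations per voice, per sample.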

A nicer (or perhaps just more common) approach is to use subtractive synthesis. The main idea is to start out with a simple wave form, but one with a lot of harmonics. Traditionally, on analogue synthesizers, the basic wave forms were sawtooth, square and triangle. One then proceeded by filtering these wave forms, typically through a fourth-order low-pass filter.

Physical modelling

Alias.

Optimization.

What is implemented?

Build notes

In order to build the project you'll need to download Steinberg's VST SDK 2.0. Copy the following files from the SDK (they are located in vstsdk2\source\common) into the /Syntopia/source/SynthCore folder:
AEffect.h
AEffectx.h
AudioEffectX.h
AudioEffect.hpp
AudioEffectX.cpp
AudioEffect.cpp

I admit that this is an annoying solution, but it is not legal to redistribute the source code for the VST SDK. It is however legal to distribute compiled binaries.

The license agreement also requires me to state that: "VST is a trademark of Steinberg Soft- und Hardware GmbH"

Credits to other people

Not all of the code/algorithms are my own. The following things have been adapted from other sources:

The formulas for determining bi-quadratic filter coefficients were taken from:
'Cookbook formulae for audio EQ biquad filter coefficients' by Robert Bristow-Johnson (robert@wavemechanics.com)

The FFT routine was taken from:
Jörg Arndt's FXT library, released under the GPL.

I use an assembler-inlined float-to-integer conversion (way faster than casting), which I originally found on the music-dsp mailing list in a post from "Angus Hewlett (angus from afhewlett f9 co uk)". The same source also provided the Hermite interpolation routine I currently use.
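For reference, the common music-dsp formulation of 4-point, third-order Hermite interpolation looks like this (a sketch of the well-known formula; not necessarily the exact routine used here):

```cpp
// Interpolate between x1 and x2, with x0 and x3 as outer support points
// and frac in [0,1). At frac=0 it returns x1; at frac=1 it returns x2.
float hermite(float frac, float x0, float x1, float x2, float x3) {
    float c0 = x1;
    float c1 = 0.5f * (x2 - x0);
    float c2 = x0 - 2.5f * x1 + 2.0f * x2 - 0.5f * x3;
    float c3 = 0.5f * (x3 - x0) + 1.5f * (x1 - x2);
    return ((c3 * frac + c2) * frac + c1) * frac + c0; // Horner evaluation
}
```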

VST is a trademark of Steinberg Soft- und Hardware GmbH, and their VST SDK 2.0 is necessary to build the Syntopia application.

Links

A good place to find information is to search the music-dsp archives or search comp.dsp (try Google groups).

Julius O. Smith's home page contains a number of well-written academic papers. My favorite source of information.

Music and Computers. Nice introduction to digital synthesis. Lots of sound examples and applets.

Programming techniques for modular synthesizers

A very nice FAQ on IIR and FIR filters.

Filters

Consider an input sound signal: in[t]. A filter module creates an output - out[t] - by multiplying the input samples and the earlier output samples by a number of filter coefficients. Thus a filter could be:

out[t] = 0.5*in[t] + 0.5*in[t-2] + 2.0*out[t-1]

The order of a filter is the number of samples you have to buffer in order to calculate the output. Thus the above is a second order filter (because of the in[t-2] term).

Filters can be divided into two main categories: IIR (infinite impulse response) and FIR (finite impulse response) filters. FIR filters are not functions of earlier output samples. This means they are always stable: if the input dies out, so will the output (after a delay determined by the order of the filter). Also, a symmetric FIR filter has linear phase: it delays the signal without distorting the phase relationships between frequencies.

An IIR filter, on the other hand, can be unstable: because of the feedback, the output signal will either grow or die out exponentially. Notice that the exponentially dying solution will lead to denormal numbers.
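The cookbook coefficients credited under 'Credits to other people' plug straight into a direct-form biquad; a minimal sketch (variable names are mine):

```cpp
// Direct Form I biquad:
//   out[t] = b0*in[t] + b1*in[t-1] + b2*in[t-2] - a1*out[t-1] - a2*out[t-2]
// The defaults give the identity filter; real coefficients come from e.g.
// the Bristow-Johnson cookbook (normalized by a0).
struct Biquad {
    double b0 = 1.0, b1 = 0.0, b2 = 0.0, a1 = 0.0, a2 = 0.0;
    double x1 = 0.0, x2 = 0.0, y1 = 0.0, y2 = 0.0; // delayed in/out samples
    double process(double x) {
        double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = x;   // shift the input history
        y2 = y1; y1 = y;   // shift the output history
        return y;
    }
};
```

This is a second-order filter in the sense defined above: it buffers two input and two output samples.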

Interpolation and Decimation

Sometimes it is desirable to work at a higher sample rate. E.g.: if you multiply ("ring modulate") two signals, the product will contain frequencies as high as the sum of the two original signals' highest frequencies. This will alias if the original signals utilize the full bandwidth.

One solution is to double the bandwidths (sampling rates) of the original signals, perform the multiplication and go back to the original sampling rate.

The process of increasing the sampling rate is called interpolation. It's done by upsampling the signal and then applying an interpolating (low-pass) filter.

Upsampling consists of padding the existing samples with zero-valued samples.
The filtering is usually done with a FIR filter. Since much of the upsampled signal consists of zeroes, a trick can be used to save computation:

Consider 2x oversampling: after having upsampled the signal, every second sample is a zero. It follows that you can split the FIR into two FIRs of half the length and apply one to the even samples and the other to the odd ones.

Decibel notes. 32-bit sound.

The decibel scale is defined relative to a reference point. If P1 and P2 denote the powers of two signals, then the difference in decibels is:

diff. = 10 log (P2/P1) dB (for power)

The sound power is proportional to the square of the pressure, hence:

diff. = 20 log (p2/p1) dB (for pressure/voltage)

This we adopt as a starting point for a digital decibel scale. If we at the same time set the reference point to the largest possible digital value, we have dBFS (deciBels relative to Full Scale). For VST instruments the sound samples are 32-bit floating point numbers in the range [-1;1], hence:

dBFS level = 20 log ( L ) (for a VST signal of magnitude L)

Notice that 20 log ( 0.5 ) ≈ -6.0. It follows that every bit of sample resolution adds 6 dB of dynamic range.

16-bit CD audio corresponds to a 96 dB dynamic range (roughly the same as the difference between whispering and being in the front row of a rock concert).

24-bit audio corresponds to a 144 dB range (roughly the difference from the threshold of hearing to the level at which the bones in the ear may break).

What about 32-bit sound? Well, this is actually a floating point number with a 24-bit significand (23 stored bits plus an implicit leading bit), an 8-bit exponent, and a sign bit. The VST format only uses the range [-1;1], so we are not getting a full 192 dB range. Floating point numbers can be implemented in different ways, but typically the machine resolution is around 3*10^-8 (according to Numerical Recipes).

This corresponds to a maximum dynamic range of roughly 150 dB. Thus the advantage of floating point numbers is not a higher dynamic range, but the simplicity of using them. [TODO: dither... ]
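The dBFS relations above, as a pair of helper functions (a sketch):

```cpp
#include <cmath>

// Convert a linear sample magnitude in (0,1] to dBFS and back.
// linearToDbfs(1.0) = 0 dBFS (full scale); linearToDbfs(0.5) is about -6 dB.
double linearToDbfs(double level) { return 20.0 * std::log10(level); }
double dbfsToLinear(double db)    { return std::pow(10.0, db / 20.0); }
```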

Denormals

Small numbers (in the range 1e-38 to 1e-45) can slow a DSP application down a lot! This shows up in IIR filters, as they often lead to an exponential decay (and hence arbitrarily small signal values). A very nice explanation is available in the music-dsp FAQ. Syntopia avoids the problem by setting small numbers to zero.
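The fix can be as simple as a threshold test in the feedback path (a sketch of the idea; the actual threshold Syntopia uses may differ):

```cpp
#include <cmath>

// Flush tiny values to zero so IIR feedback never decays into the
// denormal range (roughly below 1e-38 for 32-bit floats). The threshold
// 1e-20 is far below audibility but comfortably above the denormal range.
inline float undenormalize(float x) {
    return (std::fabs(x) < 1e-20f) ? 0.0f : x;
}
```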

Fourier Notes

The definition of a DFT (Discrete Fourier Transform) is:

H[n]=SUM(k=0..N-1){ h[k]*exp(2*PI*i*k*n/N) }

where h[k] is the (possibly complex) signal to be transformed, with k running from 0 to N-1. H[n] is the Fourier transform running from 0 to N-1. It will usually be a complex function.

The inverse DFT is defined as:

h[k]=(1/N) SUM(n=0..N-1){ H[n]*exp(-2*PI*i*k*n/N) }

which is the same as above except for a normalization factor and a sign change.
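A naive O(N^2) transcription of the definition (useful for checking an FFT against; this is not the FXT routine):

```cpp
#include <complex>
#include <vector>

// Direct DFT with the same sign convention as the definition above:
// H[n] = SUM(k=0..N-1){ h[k] * exp(2*pi*i*k*n/N) }
std::vector<std::complex<double>> dft(const std::vector<std::complex<double>>& h) {
    const size_t N = h.size();
    const double pi = 3.14159265358979323846;
    std::vector<std::complex<double>> H(N);
    for (size_t n = 0; n < N; ++n)
        for (size_t k = 0; k < N; ++k)
            H[n] += h[k] * std::exp(std::complex<double>(0.0, 2.0 * pi * k * n / N));
    return H;
}
```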

H[n] contains positive frequencies for n in the interval [0, N/2]
H[n] contains negative frequencies for n in the interval [N/2+1, N-1]

Let T denote the duration of one sample (that means 1/T is the sampling frequency).
Then H[n] will contain frequencies in steps of 1/(N*T), i.e.:

H[0] is the constant (DC) component - an overall constant added to the function.
H[1] corresponds to a frequency of 1/(N*T).
H[2] corresponds to a frequency of 2/(N*T).
.....
H[N/2] corresponds to a frequency of 1/(2*T) - which is the Nyquist frequency: the highest possible frequency in a sampled signal. After this the negative frequencies start.

H[N/2+1] corresponds to a frequency of (-N/2+1)/(N*T).
H[N/2+2] corresponds to a frequency of (-N/2+2)/(N*T).
....
H[N-2] corresponds to a frequency of -2/(N*T).
H[N-1] corresponds to a frequency of -1/(N*T).

There are some important symmetries to consider:

If h[k] is real (as it often is in sound synthesis) then H[n]=H[N-n]* (where * denotes complex conjugation)
If h[k] is even (meaning h[k] = h[N-k]) then H[n] is even.
If h[k] is odd (meaning h[k] = -h[N-k]) then H[n] is odd.
The triangle wave form is even and real.
According to the symmetries above, its Fourier transform must be REAL and EVEN.

The sawtooth, square and sine wave forms are odd and real.
According to the symmetries above, their Fourier transforms must be IMAGINARY and ODD.
Let A[n] denote the amplitude of the n'th partial, where A[1] is the fundamental tone.

Sine wave form:
A[1]=1

Sawtooth wave form:
A[n]=1/n ; Phase = PI;

Square wave form:
A[n]=1/n ; for n odd. Phase = PI;
A[n]=0 ; for n even
(There is a lot of overshoot near the transitions - the Gibbs phenomenon. It can be reduced by tapering the harmonic amplitudes, but I haven't listened to the result yet.)

Triangle wave form:
A[n]=1/(n*n) ; for n odd. Phase = 0;
A[n]=0 ; for n even
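These amplitude lists translate directly into band-limited wavetables; a sketch for the square wave (odd harmonics at 1/n, unnormalized as in the text):

```cpp
#include <cmath>
#include <vector>

// Fill one cycle of a band-limited square wave: A[n] = 1/n for odd n,
// 0 for even n. 'maxHarmonic' caps the series below the Nyquist frequency.
std::vector<double> squareTable(int tableSize, int maxHarmonic) {
    const double pi = 3.14159265358979323846;
    std::vector<double> table(tableSize, 0.0);
    for (int i = 0; i < tableSize; ++i)
        for (int n = 1; n <= maxHarmonic; n += 2) // odd harmonics only
            table[i] += std::sin(2.0 * pi * n * i / tableSize) / n;
    return table;
}
```

Using A[n]=1/(n*n) for the odd harmonics instead gives the triangle wave form's amplitudes.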

Originally I implemented the Fourier transform from the book 'Numerical Recipes in C' (routine 'four1', chapter 12.2). This has since changed, as their code cannot be used in a GPL project. However, if you ever try their code, notice that there is an error in the array access: the function 'four1' reads data[1] to data[nn*2], but it should be data[0] to data[nn*2-1].


C++ peculiarities

  • Long STL symbol names
If you see this:

warning C4786: ... : identifier was truncated to '255' characters in the debug information

when compiling a program using STL in debug mode, insert the following before any includes in the .cpp file:

#pragma warning(disable: 4786)

If you are using pre-compiled headers place the pragma after the "stdafx.h" include.

  • Scope of for-loops
The following:

for (int t=1; t<10; t++) {;}
for (int t=1; t<10; t++) {;}

doesn't compile under all versions of Visual C++ because of a scope error (int t is defined twice in the same scope). The loops can be wrapped in extra braces {} to avoid the problem. This hopefully explains some of the ugliness of my code. As far as I know, this is corrected in later versions of Visual C++.
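A sketch of the workaround, with each loop wrapped in its own braces so 't' gets a fresh scope:

```cpp
// Both the old Visual C++ scope rules and the conforming ones accept this,
// because the extra braces confine each 't' to its own block.
int countBoth() {
    int count = 0;
    { for (int t = 1; t < 10; t++) { count++; } }
    { for (int t = 1; t < 10; t++) { count++; } }
    return count; // 9 + 9 iterations
}
```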
