Tech pages

This is a bunch of unsorted notes. Eventually, when the leaves are brown and the sky is gray, I might organize them a bit.
Contents: · Denormals · Fourier Notes · C++ peculiarities · Build notes · Credits to other people ·
Basic concepts

In order to characterize sound, one usually talks about the sound spectrum. Every sound can be decomposed into a set of sine waves (mathematically, this is done by applying a Fourier transform to the signal). For a nice sound, these components will consist of a fundamental frequency (the lowest frequency in the spectrum) followed by a number of harmonics - that is, sine waves at frequencies that are multiples of the fundamental frequency. Not all sounds are this simple: percussion and drums will typically have a noise spectrum, where all frequencies are present.

Since software synthesis works on discrete samples, there is a ceiling on the highest frequency one can play back. This is the Nyquist frequency. It is half the sample rate: i.e. for standard 44.1 kHz CD audio, the highest possible frequency is 22.05 kHz.

Now, in order to create sounds, one possible way would be to simply add a number of sine functions. This is additive synthesis. For a concert A at 440 Hz, there would be a maximum of 50 harmonics (that is, 22.05 kHz / 440 Hz). Although it would be possible to create sounds this way, it is computationally expensive. A nicer (or perhaps just more common) approach is subtractive synthesis. The main idea is to start out with a simple wave form, but one with a lot of harmonics. Traditionally, on analogue synthesizers, the basic wave forms were sawtooth, square and triangle. One then proceeds by filtering these wave forms, typically through a fourth-order low-pass filter.

Physical modelling

Alias

Optimization

What is implemented?

Build notes

In order to build the project you'll need to download Steinberg's VST SDK 2.0. Place the following files from the SDK (they are located in vstsdk2\source\common): AEffect.h, AEffectx.h, AudioEffectX.h, AudioEffect.hpp, AudioEffectX.cpp, AudioEffect.cpp in the /Syntopia/source/SynthCore folder. I admit that this is an annoying solution, but it is not legal to redistribute the source code for the VST SDK.
It is, however, legal to distribute compiled binaries. The license agreement also requires me to state that: "VST is a trademark of Steinberg Soft- und Hardware GmbH"

Credits to other people

Not all of the code/algorithms are my own. The following things have been adopted from other sources:
The formulas for determining bi-quadratic filter coefficients were taken from:
The FFT routine was taken from:

I use an assembler-inlined float-to-integer conversion (way faster than casting), which I originally found on the music-dsp mailing list in a post from "Angus Hewlett (angus from afhewlett f9 co uk)", where I also found the Hermite interpolation routine I currently use.

VST is a trademark of Steinberg Soft- und Hardware GmbH, and their VST SDK 2.0 is necessary to build the SYNTOPIA application.
Links

A good place to find information is to search the music-dsp archives or comp.dsp (try Google Groups). Julius O. Smith's home page contains a number of well-written academic papers - my favorite source of information. Music and Computers: a nice introduction to digital synthesis, with lots of sound examples and applets. Programming techniques for modular synthesizers. A very nice FAQ on IIR and FIR filters.

Filters

Consider an input sound signal: in[t]. A filter module creates an output - out[t] - by multiplying the input samples and the earlier output samples by a number of filter coefficients. Thus a filter could be:
out[t] = 0.5*in[t] + 0.5*in[t-2] + 2.0*out[t-1]

The order of a filter is the number of samples you have to buffer in order to calculate the output. Thus the above is a second-order filter (because of the in[t-2] term). Filters can be divided into two main categories: IIR (infinite impulse response) and FIR (finite impulse response) filters. FIR filters are not functions of earlier output samples. This means they are stable: if the input dies out, so will the output (after a delay determined by the order of the filter). Also, the phase of the input signal is not modified. An IIR filter, on the other hand, can be unstable: it is easy to see that the output signal will either grow or die out exponentially. Notice that the exponentially dying solution will lead to denormal numbers.

Interpolation and Decimation

Sometimes it is desirable to work at a higher sample rate. E.g. if you multiply ("ring modulate") two signals, the product will contain frequencies as high as the sum of the two original signals' frequencies. This will alias if the original signals utilize the full bandwidth. One solution is to double the bandwidths (sampling rates) of the original signals, perform the multiplication, and go back to the original sampling rate. The process of increasing the bandwidth is called interpolation. It's done by upsampling the signal and then applying an interpolating filter.
Upsampling consists of padding the existing samples with zero-valued samples. Consider 2x oversampling: after having upsampled the signal, every second sample is a zero. It follows that you can split the interpolating FIR into two FIRs of half the length and apply one to the even samples and the other to the odd ones.
Decibel notes. 32-bit sound.

The decibel scale is defined relative to a reference point. If P1 and P2 denote the powers of two signals, then the difference in decibels is:

diff. = 10 log (P2/P1) dB (for power)

The sound power is proportional to the square of the pressure, hence:

diff. = 20 log (p2/p1) dB (for pressure/voltage)

This we adopt as a starting point for a digital decibel scale. If we at the same time set the reference point to the largest possible digital value, we have dBFS (deciBels relative to Full Scale). For VST instruments the sound samples are 32-bit floating point numbers in the range [-1;1], hence:

dBFS level = 20 log ( L ) (for VST signals)

Notice that 20 log ( 0.5 ) = -6.0. It follows that for every bit of sample resolution, we increase the dynamic range by 6 dB. 16-bit CD audio corresponds to a 96 dB dynamic range (roughly the same as the difference between whispering and being in the front row of a rock concert). 24-bit audio corresponds to a 144 dB range (roughly the difference from the threshold of hearing to the level at which the bones in the ear may break). What about 32-bit sound? Well, this is actually a floating point number having a 24-bit mantissa (including the sign bit) and an 8-bit exponent. The VST format only uses the range [-1;1], so we are not getting a full 192 dB range. FP numbers can be implemented in different ways, but typically the machine resolution is around 3*10^-8 (according to Numerical Recipes). This corresponds to a maximum dynamic range of about 150 dB. Thus the advantage of floating point numbers is not the higher dynamic range, but the simplicity of using them. [TODO: dither... ]

Denormals

Small numbers (in the range 1e-38 to 1e-45) can slow a DSP application a lot! This shows up in IIR filters, as they often lead to an exponential decay (and hence arbitrarily low signal values). A very nice explanation is available in the Music-dsp FAQ.
Syntopia avoids this by setting small numbers to zero.

Fourier Notes

The definition of the DFT (Discrete Fourier Transform) is:

H[n] = SUM(k=0..N-1){ h[k]*exp(2*PI*i*k*n/N) }

where h[k] is the (possibly complex) signal to be transformed, with k running from 0 to N-1. H[n] is the Fourier transform, also running from 0 to N-1. It will usually be a complex function. The inverse DFT is defined as:

h[k] = (1/N) SUM(n=0..N-1){ H[n]*exp(-2*PI*i*k*n/N) }

which is the same as above except for a normalization factor and a sign change.
Let T denote the sampling interval, i.e. the length of one sample (so 1/T is the sampling frequency). Then:

H[n] contains the positive frequencies for n in the interval [0, N/2]; index n corresponds to the frequency n/(N*T).

H[0] is the constant (DC) component - an overall constant added to the function.

For n in [N/2+1, N-1], H[n] contains the negative frequencies; e.g. H[N/2+1] corresponds to a frequency of (-N/2+1)/(N*T). There are some important symmetries to consider:
If h[k] is real (as it often is in sound synthesis) then H[n]=H[N-n]* (where * denotes complex conjugation)
The sawtooth, square and sine wave forms are odd and real.
Sine wave form:

Square wave form:

Triangle wave form:

Originally I implemented the Fourier transform from the book 'Numerical Recipes in C' (routine 'four1', chapter 12.2). This has changed, since you cannot use their code in a GPL project. However, if you ever try their code, notice that there is an error in the array access: the function 'four1' reads data[1] to data[nn*2]; it should be data[0] to data[nn*2-1].
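For reference (not from the original notes), the standard Fourier series of the wave forms listed above, written in the same SUM notation, with f the fundamental frequency:

sine: x(t) = sin(2*PI*f*t)

square: x(t) = (4/PI) * SUM(k=1,3,5,...){ sin(2*PI*k*f*t)/k }

triangle: x(t) = (8/PI^2) * SUM(k=1,3,5,...){ (-1)^((k-1)/2) * sin(2*PI*k*f*t)/k^2 }

sawtooth: x(t) = (2/PI) * SUM(k=1,2,3,...){ (-1)^(k+1) * sin(2*PI*k*f*t)/k }

All of these are sums of sines only, consistent with the wave forms being odd and real; square and triangle contain only odd harmonics.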
C++ peculiarities
warning C4786: ... : identifier was truncated to '255' characters in the debug information

When compiling a program that uses the STL in debug mode, insert the following before any includes in the .cpp file: #pragma warning(disable: 4786). If you are using pre-compiled headers, place the pragma after the "stdafx.h" include.
for (int t=1; t<10; t++) {;}

This doesn't compile under all versions of Visual C++ when a second loop reuses the same variable, because of a scoping error (int t is considered defined twice in the same scope). Each loop can be wrapped in braces {,} in order to avoid this problem. This hopefully explains some of the ugliness of my code. As far as I know, this was corrected in later versions of Visual C++.