-------------------------------------------------------------------------
A E S  /  E B U     a n d    S / P D I F         Digital Audio Interfaces
-------------------------------------------------------------------------

Many posts in the Usenet newsgroup rec.audio.pro ask about the two most 
common digital audio interfaces: AES/EBU and S/PDIF. Almost all users have 
heard that these formats are related to each other and are therefore 
similar. 

This electronic booklet tries to explain the technical side of the 
interfaces. It is written from a somewhat theoretical point of view, 
because I had to deal with this subject in my diploma thesis (autumn 1995) 
without having much practical experience with these interfaces. My 
explanations are based on the EBU document Tech 3250, which describes the 
AES/EBU interface, and on the book "Harddisk Recording" by Horst Zander. 

AES/EBU and S/PDIF are not the only digital audio interface formats. For 
multitrack use there are MADI (not to be confused with MIDI), a format by 
Mitsubishi (MELCO) and two formats by Yamaha (Y1 and Y2).

Acronyms:
AES/EBU: Audio Engineering Society / European Broadcasting Union
S/PDIF:  Sony / Philips Digital Interface Format

These two interface formats belong to the group of self-clocking data 
interfaces, which means the sync signal is derived from the data signal 
itself. (Non-self-clocking interfaces have separate leads for the clock 
and/or sync signals.)

End of introduction. In medias res, as the Romans say.

The data structure of the interface (valid for both AES/EBU and S/PDIF)
-----------------------------------------------------------------------
The digital signal is divided into blocks of 192 frames each. Each frame 
consists of two subframes and each subframe carries 32 small time slots, 
one slot for each bit. 

(Fig. stream.gif)
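
As a quick check of these numbers, the following Python sketch works out 
how many time slots per second the interface has to carry and how long one 
block lasts. The 48 kHz sampling frequency is only an example value, and 
the sketch assumes that exactly one frame is sent per sampling period.

```python
# Frame structure constants taken from the text above.
SLOTS_PER_SUBFRAME = 32
SUBFRAMES_PER_FRAME = 2
FRAMES_PER_BLOCK = 192

def interface_rates(sample_rate_hz):
    """Return (time slots per second, block duration in seconds).

    Assumes one frame is sent per sampling period.
    """
    slots_per_frame = SLOTS_PER_SUBFRAME * SUBFRAMES_PER_FRAME  # 64
    slot_rate = slots_per_frame * sample_rate_hz
    block_duration = FRAMES_PER_BLOCK / sample_rate_hz
    return slot_rate, block_duration

slot_rate, block_duration = interface_rates(48000)
print(slot_rate)       # 3072000 time slots per second
print(block_duration)  # 0.004 seconds per block
```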

The digital signal inside the time slots is transported using biphase 
modulation. At the border of each time slot there is a compulsory Low/High 
or High/Low jump. In addition, a "1" bit causes one more jump in the middle 
of its time slot; a "0" bit does not. The "High" level is a positive 
voltage against ground, +U. The "Low" level is a negative voltage against 
ground, -U. (This means the magnitude of both voltages is the same.) 
Because of the continuous Hi/Lo - Lo/Hi jumping, the digital signal has no 
DC offset, so you can also put it through a transformer.

(Fig. biphase.gif)
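
For those who prefer code to pictures, here is a minimal Python sketch of 
such an encoder. It models each time slot as two half-slots and uses the 
usual biphase-mark convention (a "1" gets an extra jump in the middle of 
its slot). The function names and the 0/1 representation of Low/High are 
my own choices for illustration, not part of any standard.

```python
def biphase_mark_encode(bits, level=0):
    """Encode bits into half-slot levels (two per time slot).

    Every time slot starts with a level change; a "1" adds a second
    change in the middle of the slot. `level` is the line state just
    before the first slot (0 = Low, 1 = High).
    """
    half_slots = []
    for bit in bits:
        level ^= 1              # compulsory jump at the slot border
        half_slots.append(level)
        if bit:
            level ^= 1          # extra mid-slot jump marks a "1"
        half_slots.append(level)
    return half_slots

def biphase_mark_decode(half_slots):
    """Recover the bits: a slot whose two halves differ is a "1"."""
    return [int(a != b) for a, b in zip(half_slots[0::2], half_slots[1::2])]

encoded = biphase_mark_encode([1, 0, 0, 1])
print(encoded)                       # [1, 0, 1, 1, 0, 0, 1, 0]
print(biphase_mark_decode(encoded))  # [1, 0, 0, 1]
```

Note that the decoder only compares the two halves of a slot, so it does 
not care about the absolute polarity of the signal.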

In this way the digital signal is transmitted. The sync signals are the 
so-called PREAMBLES which fill the first 4 time slots in each subframe. 
There are three different preambles:
X: Beginning of a subframe "A" inside a block
Y: Beginning of a subframe "B"
Z: Beginning of a new block. This subframe is always a subframe "A".

The decoder recognizes a preamble by its violation of the Biphase 
Modulation rules. There are points where a Hi/Lo or a Lo/Hi jump is 
missing. Look at the next picture; it explains it better than I can in 
words.

(Fig. preamble.gif)
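
A rough Python sketch of how a decoder might tell the preambles apart is 
given below. The three bit patterns are the ones I know from the published 
standard (8 half-slots each, written for a line that was Low just before 
the preamble); please check them against the figure or the standard before 
relying on them. The three equal half-slots at the start are the rule 
violation, because ordinary biphase data never stays at one level for more 
than two half-slots.

```python
# Preamble shapes as 8 half-slot levels, for a line that was Low (0)
# just before the preamble. Assumed from the published standard; the
# text above only shows them in the figure.
PREAMBLES = {
    "X": (1, 1, 1, 0, 0, 0, 1, 0),
    "Y": (1, 1, 1, 0, 0, 1, 0, 0),
    "Z": (1, 1, 1, 0, 1, 0, 0, 0),
}

def identify_preamble(half_slots):
    """Return "X", "Y", "Z" or None for a window of 8 half-slot levels."""
    window = tuple(half_slots)
    if len(window) != 8:
        return None
    # If the line was High before the preamble, the shape is inverted.
    if window[0] == 0:
        window = tuple(1 - level for level in window)
    for name, shape in PREAMBLES.items():
        if window == shape:
            return name
    return None

print(identify_preamble([1, 1, 1, 0, 1, 0, 0, 0]))  # Z
print(identify_preamble([0, 0, 0, 1, 1, 1, 0, 1]))  # X (inverted polarity)
print(identify_preamble([1, 0, 1, 1, 0, 0, 1, 0]))  # None (ordinary data)
```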

The next 4 time slots in each subframe carry the auxiliary audio data. 
The interface is capable of carrying an audio signal with a resolution of 
up to 24 bits. In this case the audio LSB is put in the first aux-audio 
time slot. If the audio signal has a resolution of 20 bits or less, these 
4 bits are free and may be used to carry an additional monitor-quality 
audio signal. This monitor signal is quantized linearly with 12 bits, so 
one of its samples occupies the aux-audio slots of three subframes. This 
means the sampling frequency of the monitor signal is just 1/3 of the 
main-audio sampling frequency. Across the whole interface, two such 
aux-audio signals may be transmitted (one in the subframes "A" and one in 
the subframes "B"). The LSB of the aux-audio signal is aligned to the 
first frame in a block (introduced by preamble Z).

Next there are 20 time slots for the main audio signal. The MSB is the last 
bit in that area. If the audio word is shorter than 20 bits, the unused 
slots below its LSB are set to zero.

The last 4 time slots contain 4 control bits in the following order:
V: Validity. V=0 means the value of that sample is correct (correctly 
   ADC'ed or correctly read from the storage).
U: User Bits. Here the designer of the device may transmit system-exclusive 
   data. Meanwhile this has become the place for START or SKIP IDs. If 
   not in use, these bits have to be set to zero.
S: Channel Status Bits. Because there are 192 frames, the S-Bits form a 
   24-byte (192/8=24) info "label" for the digital signal. Here the signal 
   parameters like sampling frequency, emphasis, copy-protection are 
   defined. Because this is a very important topic, I've dedicated the 
   whole next chapter to this information.
P: Parity. The interface works with Even Parity Check. This bit is set in
   a way that the total number of "1" bits in that subframe is an even 
   number. 
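
To tie the subframe layout together, here is a small Python sketch that 
fills time slots 4 to 31 of one subframe. It is only an illustration: the 
function name is made up, the preamble is left out because it is not made 
of ordinary data bits, and I assume the parity bit covers exactly slots 
4 to 31. A 20-bit sample is shifted up by 4 bits so that the aux-audio 
slots stay zero.

```python
def build_subframe_payload(sample, validity=0, user=0, status=0):
    """Return time slots 4..31 of one subframe as a list of 28 bits.

    `sample` is an unsigned 24-bit audio word (0 .. 2**24 - 1); how a
    signed sample is mapped onto it is left out of this sketch.
    Slots 4..27 carry the audio word LSB first, so the MSB lands in
    slot 27; slots 28..31 carry V, U, S (channel status) and P.
    """
    if not 0 <= sample < 1 << 24:
        raise ValueError("sample must fit in 24 bits")
    bits = [(sample >> i) & 1 for i in range(24)]  # LSB first
    bits += [validity & 1, user & 1, status & 1]
    parity = sum(bits) & 1   # makes the total number of ones even
    bits.append(parity)
    return bits

# A 20-bit word with only its MSB set, aux-audio slots left empty.
payload = build_subframe_payload(0x80000 << 4, status=1)
print(len(payload))                          # 28
print(payload[23], payload[26], payload[27]) # 1 1 0
print(sum(payload) % 2)                      # 0
```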

Hey, this is the right time to take a break, because the next chapter is 
a "hammer". And then: with new resolve and fresh energy - to avoid 
all the labour ;-)

Channel Status Data
-------------------
Welcome to the "mystic ingredients" of the interface! Unfortunately 
the S/PDIF coding differs from the AES/EBU coding here. And to make the 
whole thing more confusing, decoders have to be built so that they can 
understand both codings. To achieve this, the very first bit in the 
channel status data block is a switch:
0: The block uses "consumer"-coding (which means S/PDIF)
1: The block uses "professional"-coding (which means AES/EBU)

The EBU document Tech 3250 says: "The significance of Byte 0, Bit 0 is such 
that a transmission from an interface conforming to IEC 958 'consumer use'
[S/PDIF] can be identified, and receiver conforming only to IEC 958 will 
correctly identify a transmission from a 'professional use' interface as 
defined in this standard. [...]"
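
In code, collecting the S-bits of one block and reading this switch could 
look like the following Python sketch. The function names are invented for 
this example, and storing the first received bit as the least significant 
bit of each byte is my own convention, chosen so that "bit 0" in the tables 
of this chapter corresponds to bit 0 of the byte.

```python
def channel_status_bytes(status_bits):
    """Pack the 192 S-bits of one block into 24 bytes.

    `status_bits` holds the S-bit of each frame of one channel, starting
    with the frame marked by preamble Z. Bit 0 of each byte is the one
    received first (a convention assumed for this sketch).
    """
    if len(status_bits) != 192:
        raise ValueError("a block carries exactly 192 status bits")
    block = bytearray(24)
    for index, bit in enumerate(status_bits):
        if bit:
            block[index // 8] |= 1 << (index % 8)
    return bytes(block)

def coding_of(block):
    """Read the switch in byte 0, bit 0."""
    return "professional" if block[0] & 1 else "consumer"

bits = [1] + [0] * 191          # only the very first bit is set
block = channel_status_bytes(bits)
print(len(block))        # 24
print(coding_of(block))  # professional
```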

"X" means: value at choice, don't bother whether it is "1" or "0".

The AES/EBU coding:
Byte  Bits     Values    Meaning
  0     0          0  :  consumer coding
  0     0          1  :  professional coding

  0     1          0  :  normal audio mode
  0     1          1  :  non-audio mode (master-sync or CD data mode)

  0   2,3,4     0,0,0 :  emphasis not indicated, default: off
  0   2,3,4     1,0,0 :  emphasis switched off
  0   2,3,4     1,1,0 :  emphasis switched on: 50 us / 15 us
  0   2,3,4     1,1,1 :  emphasis switched on: ITU-T J-17 type
  0   2,3,4     else  :  reserved

  0     5          0  :  sync'ed to source sampling frequency (default)
  0     5          1  :  not sync'ed to source sampling frequency

  0    6,7        0,0 :  sampling frequency not indicated. default 48 kHz
  0    6,7        0,1 :  sampling frequency = 48 kHz
  0    6,7        1,0 :  sampling frequency = 44.1 kHz
  0    6,7        1,1 :  sampling frequency = 32 kHz

  1  0,1,2,3  0,0,0,0 :  channel mode not indicated. default: 2-channel
  1  0,1,2,3  0,0,0,1 :  channel mode: 2-channel
  1  0,1,2,3  0,0,1,0 :  channel mode: 1-channel (mono)
  1  0,1,2,3  0,0,1,1 :  channel mode: primary/secondary, subframe A primary
  1  0,1,2,3  0,1,0,0 :  channel mode: stereo mode. subframe A=left channel
  1  0,1,2,3  1,1,1,1 :  pointer to Byte 3.
  1  0,1,2,3  else    :  reserved

  1  4,5,6,7  0,0,0,0 :  no user-bit info indicated (default)
  1  4,5,6,7  0,0,0,1 :  user-bits arranged as 192 bit blocks. starts with
                         preamble "Z"
  1  4,5,6,7  0,0,1,0 :  packet system user bits, based on the HDLC 
                         protocol
  1  4,5,6,7  0,0,1,1 :  manufacturer-defined user-bit arrangement
  1  4,5,6,7  else    :  reserved

  2   0,1,2     0,0,0 :  max. main-audio resolution: 20 bits (default). Use 
                         of aux-audio not indicated
  2   0,1,2     0,0,1 :  max. main audio resolution: 24 bits
  2   0,1,2     0,1,0 :  max. main audio resolution: 20 bits. Aux-audio 
                         area used for auxiliary audio signal
  2   0,1,2     else  :  reserved

  2   3,4,5     0,0,0 :  main audio resolution not indicated
        table header ---> max resolution  24 bit       max. res. 20 bit
  2   3,4,5     0,0,1 :   real resolution 23 bit                 19 bit
  2   3,4,5     0,1,0 :   real resolution 22 bit                 18 bit
  2   3,4,5     0,1,1 :   real resolution 21 bit                 17 bit
  2   3,4,5     1,0,0 :   real resolution 20 bit                 16 bit
  2   3,4,5     1,0,1 :   real resolution 24 bit                 20 bit
  2   3,4,5     else  :  reserved

  2     6,7       X,X :  reserved

  3   0 ... 7  .......:  reserved (target of the vector from byte 1)

  4     0,1       0,0 :  no digital audio reference signal (default)
  4     0,1       0,1 :  reference signal degree 1
  4     0,1       1,0 :  reference signal degree 2
  4     0,1       1,1 :  reserved

  4   2 ... 7  .......:  reserved. shall be set zero.

  5   0 ... 7  .......:  reserved

6..9  0 ... 7 of each byte: alphanumeric channel source data. first 
                            character is byte 6, ASCII coded without parity. 
                            LSB is transmitted first, bit 7 is set zero. 
                            non-printing control characters prohibited.
                            default value is 00 hex.

10..13 0 ... 7 of each byte: alphanumeric channel target data. first
                            character is byte 10. same rules as byte 6...9.

14..17 0 ... 7 of each byte: local sample address as 32 bit binary code.
                            The value is the first sample of the current 
                            block, LSB is transmitted first. The default 
                            value shall be zero. 
18..21 0 ... 7 of each byte: time sample address. Here you see the time 
                            when the source was coded. Later edits shall 
                            not change this info. A chain of "0" indicates 
                            "exact midnight", 00:00:00, 0th frame
 
 22 ..................:  indication whether the channel status data are 
                         reliable. default: zero. If unreliable, the 
                         matching bits are set 1.
 22   0,1,2,3  .......:  reserved
 
 22   4 ..............:  indicator for byte 0 ... 5
 22   5 ..............:  indicator for byte 6 ... 13
 22   6 ..............:  indicator for byte 14 ... 17
 22   7 ..............:  indicator for byte 18 ... 21  not 22 ;-)

 23  0 ... 7: Cyclic Redundancy Check Character of the channel status data.
              The generating polynomial is: 
              G(x)=x^8 + x^4 + x^3 + x^2 + 1
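
For the curious, a bit-by-bit CRC with this generating polynomial can be 
sketched in Python as shown below. Take it as an illustration only: the 
preset of the register (all ones) and the order in which the bits are fed 
(LSB of each byte first, as transmitted) are assumptions of mine that you 
should check against the standard, and the sketch says nothing about the 
bit order in which the result is written into byte 23.

```python
POLY = 0x1D  # x^8 + x^4 + x^3 + x^2 + 1 with the x^8 term left implicit

def crcc(data, preset=0xFF):
    """Bitwise CRC over `data` using G(x) above.

    Assumptions of this sketch: the register is preset to all ones and
    each byte is fed LSB first, matching the order of transmission.
    """
    register = preset
    for byte in data:
        for position in range(8):
            incoming = (byte >> position) & 1
            feedback = ((register >> 7) & 1) ^ incoming
            register = (register << 1) & 0xFF
            if feedback:
                register ^= POLY
    return register

status = bytearray(23)      # bytes 0..22, all zero here
status[0] = 0b00000001      # byte 0, bit 0 = 1: professional coding
check = crcc(status)
status[1] ^= 0b00010000     # simulate a single corrupted bit
print(check != crcc(status))  # True
```

The last line shows the point of the exercise: a single flipped bit in 
bytes 0 ... 22 always changes the check character.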


Eh, you are still reading! Wow!
The next table is the coding of the channel status data block for S/PDIF.
Don't be afraid, it's shorter.


Byte  Bits     Values    Meaning
  0     0          0  :  Consumer use of Channel Status Block 
  0     0          1  :  Professional use

  0     1          0  :  Normal Audio mode
  0     1          1  :  Non-Audio mode (such as data transmission or sync)

  0     2          0  :  Copying may be prohibited, depends on Byte 1, Bit 7
  0     2          1  :  Copying always allowed. (COPY-BIT!)

  0    3,4       0,0  :  Preemphasis off
  0    3,4       1,0  :  Preemphasis 50 us, 15us
  0    3,4      else  :  reserved

  0     5          0  :  digital data, if Bit 1="1"
  0     5          1  :  reserved, if Bit 1="1", 4-ch audio, if Bit 1="0"

  0    6,7       0,0  :  Bytes 1 ... 3 are defined as channel status data.
  0    6,7      else  :  reserved

  1   0,1,2     ......:  Category Code, this means, the meaning of the Bits
                         3 ... 6 depends on the status of Bits 0 ... 2

  1   3...6     ......:  Cat. 0,0,1 - Digital Broadcast
              0,0,0,0 :  Japan
              0,0,1,1 :  USA
              1,0,0,0 :  Europe
              0,0,0,1 :  "Electronic software delivery"
              else    :  reserved

  1   3...6     ......:  Cat. 0,1,0 - Signal processing
              0,0,0,0 :  PCM encoder/decoder
              0,0,1,0 :  Digital sound sampler
              0,1,0,0 :  Digital signal mixer
              1,1,0,0 :  Sample rate converter
              else    :  reserved

  1   3...6     ......:  Cat. 0,1,1
              0,0,X,X :  ADC w/o copyright
              0,1,X,X :  ADC w/ copyright (use of copy- and L-Bits)
              1,X,X,X :  Digital broadcast reception
              else    :  reserved

  1   3...6     ......:  Cat. 1,0,0 - Laser-optical storage
              0,0,0,0 :  CD compatible with IEC-908 (CD-DA)
              1,0,0,0 :  CD incompatible with IEC-908
              else    :  reserved

  1   3...6     ......:  Cat. 1,0,1 - Musical instruments
              0,0,0,0 :  Synthesizer
              1,0,0,0 :  Microphone(s)
              else    :  reserved

  1   3...6     ......:  Cat. 1,1,0 - Magnetic tape or disk
              0,0,0,0 :  DAT
              1,0,0,0 :  Digital audio VCR (video cassette recorder)
              else    :  reserved

  1   3...6     ......:  reserved. All values reserved

  1     7       ......:  L-Bit: Copy generation indicator
  1     7           0 :  in Cat 0,0,1; 0,1,1; 1,0,0: Original generation
                         in the other Cat's: 1st or higher copy
  1     7           1 :  in Cat 0,0,1; 0,1,1; 1,0,0: 1st or higher copy
                         in the other Cat's: Original generation

  2   0...3     ......:  Source number
              0,0,0,0 :  not indicated
              1,0,0,0 :  No. 1
              0,1,0,0 :  No. 2
              1,1,0,0 :  No. 3
              0,0,1,0 :  No. 4
              .......    and so on, like the binary code
              1,1,1,1 :  No. 15

  2   4...7     ......:  Channel number
              0,0,0,0 :  not indicated
              1,0,0,0 :  No. 1
              0,1,0,0 :  No. 2
              1,1,0,0 :  No. 3
              0,0,1,0 :  No. 4
              .......    and so on, like the binary code
              1,1,1,1 :  No. 15

  3   0...3   0,0,0,0 :  Sampling frequency = 44.1 kHz
  3   0...3   0,1,0,0 :  Sampling frequency = 48 kHz
  3   0...3   1,1,0,0 :  Sampling frequency = 32 kHz
  3   0...3   else    :  reserved

  3    4,5    ........:  Clock accuracy
  3    4,5        0,0 :  Level II, +/- 1000 ppm (default)
  3    4,5        0,1 :  Level III, variable
  3    4,5        1,0 :  Level I, +/- 50 ppm, High accuracy
  3    4,5        1,1 :  reserved

  3    6,7        X,X :  reserved

4...23 ...............:  reserved
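
As an illustration of how a receiver might read this table, here is a 
small Python sketch that picks a few fields out of a consumer channel 
status block. The helper names are made up, only some of the fields are 
decoded, and I assume the block is stored as 24 bytes with "bit 0" of the 
table in the least significant bit of each byte. Bit groups are compared 
as tuples written in the same order as in the table.

```python
def bit(block, byte, n):
    """Bit n of the given byte (bit 0 = least significant, assumed)."""
    return (block[byte] >> n) & 1

def field(block, byte, first, last):
    """Bits first..last of a byte as a tuple, in the table's order."""
    return tuple(bit(block, byte, n) for n in range(first, last + 1))

SAMPLE_RATES_HZ = {
    (0, 0, 0, 0): 44100,
    (0, 1, 0, 0): 48000,
    (1, 1, 0, 0): 32000,
}

def decode_consumer_status(block):
    """Decode a few fields of the S/PDIF table above."""
    if bit(block, 0, 0):
        raise ValueError("block uses professional coding")
    return {
        "audio": bit(block, 0, 1) == 0,
        "copy_always_allowed": bit(block, 0, 2) == 1,
        "preemphasis": field(block, 0, 3, 4) == (1, 0),
        "category": field(block, 1, 0, 2),
        "l_bit": bit(block, 1, 7),
        # None means a reserved sampling frequency code.
        "sample_rate_hz": SAMPLE_RATES_HZ.get(field(block, 3, 0, 3)),
    }

block = bytearray(24)
block[0] = 0b00000100   # bit 2 set: copying always allowed
block[1] = 0b00000001   # category 1,0,0: laser-optical storage
block[3] = 0b00000010   # bits 0..3 = 0,1,0,0: 48 kHz
info = decode_consumer_status(block)
print(info["sample_rate_hz"])       # 48000
print(info["category"])             # (1, 0, 0)
print(info["copy_always_allowed"])  # True
```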


This is the information that is transmitted in the Channel Status
block. If you want a break, take it now. Finally, we will deal
with the hardware of these interfaces.

The hardware table:

Feature                   AES/EBU                         S/PDIF
------------------------------------------------------------------------
Plugs                     XLR 3-pin                       RCA/Cinch
                          1=GND, 2 vs. 3: signal          Mini-Jack 3.5mm
                          absolute polarity irrelevant    tip=output
                                                          ring=input

Voltage                   2 ... 7 V  (usually 5 V)        0.5 V

Impedance (to be matched) 110 Ohms                        75 Ohms

Signal form               balanced                        unbalanced

For short distances       microphone/studio cables        line cables


Because there is no DC offset in the digital signal, you may use 
transformers for balancing/unbalancing. Be careful to match the voltage 
range. 
The EBU document Tech 3250 says: "Connection of a 'professional use' 
[AES/EBU] transmitter with a 'consumer use' [S/PDIF] receiver or vice versa 
might result in unpredictable operation."
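
To get a feeling for what "match the voltage range" means in numbers, here 
is a back-of-the-envelope Python sketch. It assumes an ideal, lossless 
transformer (impedance ratio = turns ratio squared, voltage ratio = turns 
ratio) and takes the impedances and the typical 5 V and 0.5 V levels from 
the hardware table. A real converter will behave differently in detail, so 
treat the result as an order of magnitude only.

```python
import math

def matching_figures(z_source=110.0, z_load=75.0,
                     v_source=5.0, v_target=0.5):
    """Return (turns ratio, secondary voltage, extra attenuation in dB).

    Ideal-transformer assumption: the turns ratio that matches the two
    impedances is sqrt(z_source / z_load), and the voltage is divided
    by that same ratio.
    """
    turns_ratio = math.sqrt(z_source / z_load)
    v_secondary = v_source / turns_ratio
    extra_attenuation_db = 20 * math.log10(v_secondary / v_target)
    return turns_ratio, v_secondary, extra_attenuation_db

turns, volts, atten = matching_figures()
print(round(turns, 2))  # about 1.21
print(round(volts, 2))  # about 4.13 (volts)
print(round(atten, 1))  # about 18.3 (dB)
```

In other words: under these assumptions, the transformer that matches 
110 Ohms to 75 Ohms does not by itself bring 5 V down to 0.5 V, so some 
additional attenuation is needed on the way from AES/EBU to S/PDIF.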


