Object-Based Audio with VISR

Overview

Although the VISR framework is deliberately application-agnostic, it is well-suited for working with spatial and object-based audio. This is due to a number of reasons:

  • The focus on multichannel audio makes it suitable for object-based audio, which often features complex sound scenes consisting of many signals, as well as multichannel reproduction systems.
  • The ability to handle complex and structured parameter data allows for the generation and transmission of object metadata within the system.
  • The modular, reusable component structure fosters the creation of complex signal flows that are often used in object-based audio.
  • Last but not least, the VISR framework was conceived in the S3A project, a research project on spatial and object-based audio.

Consequently, the VISR framework contains different types of functionality to support the processing of object-based audio. These are typically implemented as libraries, for example component libraries.

The VISR object model

[Figure: Object types and hierarchy.]

The VISR object model supports a hierarchical and extensible set of object types. These types and their relations are shown above.

JSON representation

For transmission, object vectors are encoded as JSON messages.

A scene vector (or a part thereof) has the format

"objects": [ {<object 0>}, {<object 1>}, ... , {<object n>}]

where <object k> stands for the encoding of a single object. The objects can be arranged in arbitrary order as long as the object ids are unique. Moreover, the object vector can be split into arbitrary subsets that are transmitted as individual "objects" messages.
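
The following sketch illustrates how a scene could be checked for unique ids and split into several "objects" messages. It assumes that each message is wrapped in a top-level JSON object; the function name make_objects_messages and the chunk size are hypothetical and not part of VISR.

```python
import json

def make_objects_messages(objects, max_per_message):
    """Split a scene into JSON "objects" messages (illustrative helper, not a VISR API)."""
    ids = [obj["id"] for obj in objects]
    if len(ids) != len(set(ids)):
        raise ValueError("object ids must be unique within the object vector")
    # Cut the object vector into consecutive subsets of limited size.
    chunks = [objects[i:i + max_per_message]
              for i in range(0, len(objects), max_per_message)]
    return [json.dumps({"objects": chunk}) for chunk in chunks]

scene = [
    {"id": 0, "channels": "0", "type": "point", "group": 0, "priority": 0,
     "level": 1.0, "position": {"x": 1.0, "y": 0.0, "z": 0.0}},
    {"id": 1, "channels": "1", "type": "diffuse", "group": 0, "priority": 0,
     "level": 0.5},
    {"id": 2, "channels": "2", "type": "plane", "group": 0, "priority": 0,
     "level": 1.0, "direction": {"az": 30.0, "el": 0.0, "refdist": 1.0}},
]

messages = make_objects_messages(scene, max_per_message=2)
```

Each element of messages is a string that can be sent separately; the receiver only relies on the ids being unique across the whole scene.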

Encoding of the individual object types

Coordinate system

Depending on the object type, either Cartesian or spherical coordinates are used. The coordinate axes follow the conventions of, e.g., ITU-R BS.2051.

For Cartesian coordinates, this means:

  • The x axis points to the front.
  • The y axis points to the left.
  • The z axis points up.

Coordinates are measured in meters.

Likewise, for spherical coordinates:

  • The azimuth angle is measured counterclockwise from the x axis (front).
  • The elevation angle is measured up (positive values) or down (negative values) from the horizontal plane.

Angles are represented in the JSON format in degrees (this does not necessarily hold for the internal representation in the renderer).
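
As a minimal sketch of these conventions (x to the front, y to the left, z up, angles in degrees), the two representations can be converted as follows. The function names are illustrative and not part of the VISR API.

```python
import math

def spherical_to_cartesian(az_deg, el_deg, radius=1.0):
    """Convert azimuth/elevation in degrees to (x, y, z) in meters."""
    az = math.radians(az_deg)
    el = math.radians(el_deg)
    x = radius * math.cos(el) * math.cos(az)  # front
    y = radius * math.cos(el) * math.sin(az)  # left (azimuth counts counterclockwise)
    z = radius * math.sin(el)                 # up
    return x, y, z

def cartesian_to_spherical(x, y, z):
    """Convert (x, y, z) to (azimuth, elevation, radius), angles in degrees."""
    radius = math.sqrt(x * x + y * y + z * z)
    az = math.degrees(math.atan2(y, x))
    el = math.degrees(math.asin(z / radius)) if radius > 0.0 else 0.0
    return az, el, radius
```

For instance, an azimuth of 90 degrees at zero elevation corresponds to a position directly to the left of the listener, on the positive y axis.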

Object

Object is the base type of all objects. Therefore, the attributes are common to all objects. The following attributes are supported:

"id"
The object id, a nonnegative integer that must be unique within the object vector (mandatory attribute).
"group"
The group id, a nonnegative integer (mandatory attribute). Not used in the core renderer, but potentially in the metadata adaptation process.
"channels"
A list of audio channel indices referencing the audio signals associated with this object. The list is formed as a string consisting of comma-separated unsigned integers enclosed in quotation marks, e.g., “0,3, 5, 7 ” with arbitrary amounts of whitespace in between. The format also allows Matlab-style ranges for any part of the list. For instance, “0, 2 : 2 : 8, 10” is equivalent to “0,2,4,6,8,10”. This is a mandatory attribute. The required number of channels is typically determined by the object type and its parameters. For instance, point source objects are invariably single-channel, while the number of required channels of a HOA object depends on the Ambisonics order specified by the “order” attribute of this object.
"level"
The level of the audio object in linear scale as a floating-point number. Note that this value does not necessarily denote the loudness of the reproduced object, since the latter also depends on the level of the audio signal(s). (Mandatory argument).
"priority"
The priority of the object given as an unsigned integer (mandatory argument). Lower numbers represent higher priority, with “0” being the highest priority. Not currently used in the core renderer, but potentially (and more appropriately) in the metadata processing.
"eq"
An array of parametric EQ parameters to be applied to all audio signals of this object. This is an optional attribute; if not present, a ‘flat’, i.e., unity-gain, equalisation curve is applied. The attribute has the format
"eq": [{<eq 0>}, {<eq 1>}, ..., {<eq n-1>}]

The number of admissible EQ sections is renderer-dependent. Providing more EQ parameters for a single object than supported by the renderer might result in an error message and termination of the renderer. If fewer EQ parameters are sent than supported by the renderer, the remaining EQ sections are padded with ‘flat’ characteristics. The individual EQ sections have the form

{ "type": "<type>", "f": (center/cutoff frequency), "q": (quality) [, "gain": (dB)] }

with the following attributes:

"type"
A type string chosen from the following values: "lowpass", "highpass", "bandpass", "bandstop", "peak", "lowshelf", "highshelf", "allpass".
"f"
Centre/cutoff frequency in Hz (depending on the filter type).
"q"
Dimensionless Q (quality) parameter.
"gain"
Optional gain parameter (in dB). If not provided, the default value of 0 dB is used. Only used by the filter types "peak", "lowshelf", and "highshelf". The filter characteristics follow the Audio EQ Cookbook formulas.
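
The channel list syntax of the "channels" attribute can be illustrated with a small parser. This is a sketch under the assumption that ranges are inclusive at both ends, as in Matlab and as suggested by the equivalence of “0, 2 : 2 : 8, 10” and “0,2,4,6,8,10”; the function parse_index_list is hypothetical and does not reproduce the parser used inside VISR.

```python
def parse_index_list(text):
    """Expand a channel list such as "0, 2 : 2 : 8, 10" into a list of integers."""
    indices = []
    for part in text.split(","):
        # int() tolerates surrounding whitespace and rejects empty or negative-looking junk.
        fields = [int(field) for field in part.split(":")]
        if len(fields) == 1:                      # single index
            indices.append(fields[0])
        elif len(fields) == 2:                    # start:stop
            start, stop = fields
            indices.extend(range(start, stop + 1))
        elif len(fields) == 3:                    # start:step:stop
            start, step, stop = fields
            if step <= 0:
                raise ValueError("this sketch only handles positive steps")
            indices.extend(range(start, stop + 1, step))
        else:
            raise ValueError("malformed range: %r" % part)
    return indices
```

With this reading, a channel list "0:8" expands to nine channel indices.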

PointSource

Point sources are invariably single-channel objects, that is, the "channels" attribute of the base type Object must contain a single channel index. The type string is "point".

The point source coordinates are specified in the "position" attribute, which is an object holding either the Cartesian coordinates "x", "y", and "z" or the spherical coordinates "az", "el", and "radius".

Example
{ "id": "5", "channels": "2", "type": "point", "group": "2", "priority": "0", "level": "0.350",
  "position": {"x": "3.0", "y": "-0.5", "z": "0.25" } }

or, using spherical coordinates,

{ "id": "5", "channels": "2", "type": "point", "group": "2", "priority": "0", "level": "0.350",
  "position": {"az": "30", "el": "15.0", "radius": "1.25" } }

PlaneWave

Plane waves differ from point sources in that they do not exhibit distance-dependent attenuation and do not provide parallax effects for moving listener positions. Because the main reproduction method in the VISR renderer is currently VBAP, plane waves are handled identically to point sources. This might change for alternative reproduction methods, including listener-position-adaptive VBAP.

Plane waves use the type "plane" and are single-channel objects.

The plane wave representation uses an object "direction" containing the attributes "az" and "el", which describe the azimuth and elevation of the direction from which the wave impinges. A third attribute, "refdist" (reference distance), encodes the relative timing of the object’s audio signal: a value of 0 means that a sound event at signal time 0 is perceived at the central listener position at time 0.

Example
{"id": 5, "channels": 5, "type": "plane", "group": 0, "priority": 0, "level": 1.000000, "direction": {"az": 30.0, "el": 45.0, "refdist": 12.00 } }

PointSourceDiffuse

Point sources with diffuseness are derived from PointSource and therefore support all attributes of the latter. In addition, they define the attribute "diffuseness", a floating-point value in the range between 0.0 and 1.0 that describes the amount of diffuse energy relative to the point source radiation.

They are single-channel and use the type string "pointdiffuse".

Example
{"id": "5", "channels": "5", "type": "pointdiffuse", "group": "0", "priority": "0",
 "level": "1.0", "diffuseness": "0.35", "position": {"x": "3.0", "y": "-0.5", "z": "0.25" } }

DiffuseSource

This source type describes a surrounding object that reproduces decorrelated signals obtained from the single audio signal of the object.

This object does not introduce any other attributes apart from those inherited from the base class Object. The type string is "diffuse".

Example
{"id": 3, "channels": 3, "type": "diffuse", "group": 0, "priority": 0, "level": 1.000000}

HoaSource

This source type represents an Ambisonics sound field of arbitrary order. It is a multichannel object where the number of channels depends on the Ambisonics order \(N\): \(ch=(N+1)^{2}\). The audio signals (as indexed by the "channels" attribute) are expected to be in ACN channel order, see http://ambisonics.ch/standards/channels/.

The type string is "hoa".

Example
{"type": "hoa", "channels": "0:8", "group": 0, "id": 0, "level": 1, "order": 2, "priority": 0}
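
The relation between Ambisonics order and channel count, and the position of a spherical harmonic component within the ACN ordering, can be written down in a few lines. The helper names below are illustrative only.

```python
def hoa_channel_count(order):
    """Number of channels of a full-sphere Ambisonics signal of the given order N."""
    return (order + 1) ** 2

def acn_index(degree, index):
    """ACN channel number for degree n and index m with -n <= m <= n: n^2 + n + m."""
    if abs(index) > degree:
        raise ValueError("index m must satisfy -n <= m <= n")
    return degree * degree + degree + index

# The example object above has order 2 and the channel range 0:8 (inclusive).
example_channels = list(range(0, 8 + 1))
```

An order-2 object therefore needs nine channels, which matches the nine entries of the range "0:8" when both end points are included.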

ChannelObject

Channel objects are audio signals that are routed directly to a loudspeaker (or group of loudspeakers) specified by an id.

This type is derived from Object and adds the "outputChannels" attribute. This attribute is a string that contains a list of loudspeaker ids (i.e., labels). Channel objects can contain an arbitrary number of channels. The "outputChannels" attribute must contain an entry for each channel. Each entry can be either a single label or a list of labels enclosed in square brackets. In the latter case, the respective channel is routed to all loudspeakers in the list.

A "diffuseness" attribute controls the level of decorrelation applied, from 0.0 (no decorrelation) to 1.0 (signal fully routed to the decorrelation filters). This attribute is optional; the default is 0.0.

If a channel is routed to more than one loudspeaker, the levels of these loudspeakers are normalised using the same norm as the respective panner (VBAP, VBIP in case of separate high-frequency panning, or diffuse panning).
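
To make the structure of the "outputChannels" string concrete, the following sketch splits it into one list of loudspeaker labels per object channel. It is only an illustration of the syntax described above; the function parse_output_channels is hypothetical and the regular expression is an assumption about how such a string could be tokenised, not the parser used by VISR.

```python
import re

def parse_output_channels(text):
    """Return one list of loudspeaker labels per object channel."""
    routing = []
    # An entry is either a bracketed group "[a, b]" or a bare label
    # that contains neither commas nor brackets.
    pattern = r"\[([^\[\]]*)\]|([^,\[\]\s][^,\[\]]*)"
    for match in re.finditer(pattern, text):
        group, single = match.group(1), match.group(2)
        if group is not None:
            labels = [label.strip() for label in group.split(",") if label.strip()]
        else:
            labels = [single.strip()]
        routing.append(labels)
    return routing
```

A string such as "M+000, [M+030], [M-030, U+030], U+110" then yields four entries, the third of which routes its channel to two loudspeakers.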

Example

Single-channel object routed to a single loudspeaker:

{"id": 2, "channels": "3", "type": "channel", "group": 0, "priority": 0, "level": 0.50000, "diffuseness": 0.5,
  "outputChannels": "M+030"}

Alternative syntax for a single-channel object:

{"id": 2, "channels": "3", "type": "channel", "group": 0, "priority": 0, "level": 0.50000, "diffuseness": 0.5,
 "outputChannels": "[M+030]"}

Single channel routed to multiple loudspeakers:

{"id": 2, "channels": "3", "type": "channel", "group": 0, "priority": 0, "level": 0.50000, "diffuseness": 0.5,
 "outputChannels": "[M+030, M-030]"}

Multiple channels routed to single or multiple loudspeakers:

{ "id": 2, "channels": "4:7", "type": "channel", "group": 0, "priority": 0,
  "level": 0.350000, "diffuseness": 0.25,
  "outputChannels": "M+000, [M+030], [M-030, U+030], U+110"}

PointSourceWithReverb

PointSourceWithReverb is a single-channel object that adds reverb to a PointSource. It uses the type string "pointreverb". In addition to the Object and PointSource properties, it defines an attribute "room" containing the objects "ereflect" (early reflections) and "lreverb" (late reverberation). "ereflect" is an array of early reflection objects, each consisting of IIR coefficients ("biquadsos"), a point source position "position" using the same format as in PointSource, and additional level and delay information.

The maximum number of discrete reflections per reverb object is a configuration parameter of the renderer.

The "lreverb" object contains parameter data in fixed frequency bands that are used to synthesize reverb tails.

Example
{ "type": "pointreverb", "channels": "4", "group": 0, "id": 1,"level": 1,
  "position": {"x": 1.5, "y": 0.0, "z": 0.0}, "priority": 0,
  "room": {
  "ereflect": [{"biquadsos": [{"a0": "1.00000e+00", "a1": "-1.05734e+00", "a2": "5.69314e-01",
                               "b0": "3.87648e-01", "b1": "0.00000e+00", "b2": "0.00000e+00"},
                              ( more biquad coefficients)
                              {"a0": "1.00000e+00", "a1": "-7.20132e-02", "a2": "6.48827e-01",
                               "b0": "1.00000e+00", "b1": "0.00000e+00", "b2": "0.00000e+00"}],
                "delay": "0.00931", "level": "0.0603584806", "position": {"az": 337.0, "el": "-1.00000", "refdist": "1.00000"} },
              ( more early reflections )
            ],
"lreverb": {"attacktime": "0.01321, 0.01321, 0.01321, 0.01321, 0.01321, 0.01321, 0.01321, 0.01321, 0.01321",
            "decayconst": "-4.50698, -5.02028, -5.75817, -5.36509, -5.42654, -5.62316, -5.75298, -6.41075, -11.13465",
            "level": "0.02522, 0.01052, 0.01657, 0.02744, 0.02058, 0.01679, 0.01698, 0.01433, 0.00041", "delay": "0.00931" } } }

Predefined object-based rendering primitives and renderers

The default component library contains numerous atomic components for object-based audio as well as ready-made rendering signal flows.

These include:

Standalone renderers

The loudspeaker renderers are described in Section VISR object-based loudspeaker renderer.

Object-Based Reverberation

Note

This section will describe the support for object-based reverberation in the VISR renderers. This is based on the reverb object [CFJ+17] using the object representation described in Section PointSourceWithReverb. The functionality is contained in the library reverbobject and the corresponding Python module reverbobject.

The loudspeaker configuration format

Loudspeaker configurations are used to tell a renderer about the loudspeaker positions and other properties of the setup. They are used primarily in the loudspeaker renderers, including binaural renderers that use a virtual loudspeaker setup internally.

This section describes the format of a loudspeaker configuration and explains the helper functions provided with a VISR installation to create configuration files.

Configuration file example

A loudspeaker configuration has to be specified in an XML file.

An example is given below.

<panningConfiguration>
  <loudspeaker id="M+000" channel="1" eq="highpass">
    <cart x="1.0" y="0.0" z="0"/>
  </loudspeaker>
  <loudspeaker id="M-030" channel="2" eq="highpass">
    <polar az="-30.0" el="0.0" r="1.0"/>
  </loudspeaker>
  <loudspeaker id="M+030" channel="3" eq="highpass">
    <polar az="30.0" el="0.0" r="1.0"/>
  </loudspeaker>
  <loudspeaker id="M-110" channel="4" eq="highpass">
    <polar az="-110.0" el="0.0" r="1.0"/>
  </loudspeaker>
  <loudspeaker id="M+110" channel="5" eq="highpass">
    <polar az="110.0" el="0.0" r="1.0"/>
  </loudspeaker>
  <loudspeaker id="U-030" channel="6" eq="highpass">
    <polar az="-30.0" el="30.0" r="1.0"/>
  </loudspeaker>
  <loudspeaker id="U+030" channel="7" eq="highpass">
    <polar az="30.0" el="30.0" r="1.0"/>
  </loudspeaker>
  <loudspeaker id="U-110" channel="8" eq="highpass">
    <polar az="-110.0" el="30.0" r="1.0"/>
  </loudspeaker>
  <loudspeaker id="U+110" channel="9" eq="highpass">
    <polar az="110.0" el="30.0" r="1.0"/>
  </loudspeaker>
  <virtualspeaker id="VoS">
    <polar az="0.0" el="-90.0" r="1.0"/>
    <route lspId="M+000" gainDB="-13.9794"/>
    <route lspId="M+030" gainDB="-13.9794"/>
    <route lspId="M-030" gainDB="-13.9794"/>
    <route lspId="M+110" gainDB="-13.9794"/>
    <route lspId="M-110" gainDB="-13.9794"/>
  </virtualspeaker>
  <triplet l1="VoS" l2="M+110" l3="M-110"/>
  <triplet l1="M-030" l2="VoS" l3="M-110"/>
  <triplet l1="M-030" l2="VoS" l3="M+000"/>
  <triplet l1="M-030" l2="U-030" l3="M+000"/>
  <triplet l1="M+030" l2="VoS" l3="M+000"/>
  <triplet l1="M+030" l2="VoS" l3="M+110"/>
  <triplet l1="U+030" l2="U-030" l3="M+000"/>
  <triplet l1="U+030" l2="M+030" l3="M+000"/>
  <triplet l1="U-110" l2="M-030" l3="U-030"/>
  <triplet l1="U-110" l2="M-030" l3="M-110"/>
  <triplet l1="U+110" l2="U-110" l3="M-110"/>
  <triplet l1="U+110" l2="M+110" l3="M-110"/>
  <triplet l1="U+030" l2="U-110" l3="U-030"/>
  <triplet l1="U+030" l2="U+110" l3="U-110"/>
  <triplet l1="U+030" l2="U+110" l3="M+110"/>
  <triplet l1="U+030" l2="M+030" l3="M+110"/>
  <subwoofer assignedLoudspeakers="M+000, M-030, M+030, M-110, M+110, U-030, U+030, U-110, U+110"
          channel="10" delay="0" eq="lowpass" gainDB="0"
          weights="1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0"
          />
  <outputEqConfiguration numberOfBiquads="1" type="iir">
      <filterSpec name="lowpass">
         <biquad a1="-1.9688283" a2="0.96907117" b0="6.0729856e-05" b1="0.00012145971" b2="6.0729856e-05"/>
      </filterSpec>
      <filterSpec name="highpass">
         <biquad a1="-1.9688283" a2="0.96907117" b0="-0.98447486" b1="1.9689497" b2="-0.98447486"/>
      </filterSpec>
  </outputEqConfiguration>
</panningConfiguration>
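
Because the configuration is plain XML, it can be inspected with standard tools. The sketch below uses Python's xml.etree.ElementTree on a shortened copy of the example to list the real loudspeakers with their output channels and positions. The function read_loudspeakers is a hypothetical helper for illustration and is not how the VISR renderer itself loads the file.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the example configuration above.
CONFIG_XML = """
<panningConfiguration>
  <loudspeaker id="M+000" channel="1" eq="highpass">
    <cart x="1.0" y="0.0" z="0"/>
  </loudspeaker>
  <loudspeaker id="M-030" channel="2" eq="highpass">
    <polar az="-30.0" el="0.0" r="1.0"/>
  </loudspeaker>
  <virtualspeaker id="VoS">
    <polar az="0.0" el="-90.0" r="1.0"/>
    <route lspId="M+000" gainDB="-13.9794"/>
  </virtualspeaker>
</panningConfiguration>
"""

def read_loudspeakers(xml_text):
    """Map loudspeaker id to output channel and position (real loudspeakers only)."""
    root = ET.fromstring(xml_text.strip())
    speakers = {}
    for node in root.findall("loudspeaker"):
        cart, polar = node.find("cart"), node.find("polar")
        if cart is not None:
            position = ("cart", tuple(float(cart.get(k)) for k in ("x", "y", "z")))
        elif polar is not None:
            position = ("polar", tuple(float(polar.get(k)) for k in ("az", "el", "r")))
        else:
            raise ValueError("loudspeaker without position: %s" % node.get("id"))
        speakers[node.get("id")] = {"channel": int(node.get("channel")),
                                    "position": position}
    return speakers

speakers = read_loudspeakers(CONFIG_XML)
```

Virtual loudspeakers are skipped here because they have no output channel of their own.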

Predefined configuration files

The VISR package comes with a number of preconfigured loudspeaker configurations. They are contained in the directory $VISR_ROOT/config/.

The subdirectory config/generic contains standard configurations, mainly from the ITU-R BS.2051 standard.

The supported configurations are:

Name      File name       Number of loudspeakers (upper, horizontal, lower)  Dimension  Virtual loudspeakers (azimuth, elevation)  Comment
System A  bs2051-0+2+0    0, 2, 0                                            2D         (180,0)                                    Stereo
System B  bs2051-0+5+0    0, 5, 0                                            2D                                                    5.1
System C  bs2051-2+5+0    2, 5, 0                                            3D         (0,90), (0,-90)
System D  bs2051-3+7+0    3, 7, 0                                            3D         (0,90), (0,-90)
System E  bs2051-4+5+0    4, 5, 0                                            3D         (0,90), (0,-90)
System F  bs2051-4+5+1    4, 5, 1                                            3D         (0,90), (0,-90)
System G  bs2051-4+9+0    4, 9, 0                                            3D         (0,90), (0,-90)
System H  bs2051-9+10+3   9, 10, 3                                           3D         (0,-90)                                    22.2 (NHK)

For all configurations, versions with and without subwoofer channels are provided. The version with subwoofer has no suffix; the version without has -no-subwoofer appended to the file name. In general, the version with subwoofer should be preferred. The generated loudspeaker outputs and the output channel mapping are identical in both cases. In some cases the subwoofer channels are embedded into the block of loudspeaker output channels. In the corresponding configurations without subwoofer, these channels are not used.

For the stereo configuration, an additional configuration bs2051-0+2+0-rear-fading.xml is provided in which sound sources are faded out as they approach \(180^{\circ}\). In all other cases, the energy from virtual loudspeakers (as denoted in the table above) is distributed to neighboring real loudspeakers.

The subdirectories config/isvr, config/surrey, and config/bbc contain examples of actual listening rooms. The functions to generate these configurations are contained in the subdirectories scripts/ within these folders.

Generation functions

To ease the creation of configuration files, the VISR framework provides several Python functions that create the XML configuration files from a set of loudspeaker coordinates and additional optional parameters. These functions are contained in the Python module loudspeakerconfig. If the VISR framework was installed through a binary installer and Python was configured as described in Configuration, then the module can be imported directly, e.g.,

import loudspeakerconfig
loudspeakerconfig.createArrayConfigFile( ... )

or

from loudspeakerconfig import createArrayConfigFile
createArrayConfigFile( ... )

The main function in this module is createArrayConfigFile(). It takes a set of loudspeaker coordinates, an output file name, and a large set of additional options:

loudspeakerconfig.createArrayConfigFile(outputFileName, lspPositions, twoDconfig=False, sphericalPositions=False, channelIndices=None, loudspeakerLabels=None, triplets=None, distanceDelay=False, distanceAttenuation=False, lspDelays=None, lspGainDB=None, eqConfiguration=None, virtualLoudspeakers=[], subwooferConfig=[], comment=None, speedOfSound=340.0)

Generate a loudspeaker configuration XML file.

Parameters:
  • outputFileName (string) – The file name of the XML file to be written. This can be a file name or path. The file extension (typically .xml) must be provided by the user.
  • lspPositions (array-like, 3xL or 2xL, where L is the number of loudspeakers) – Provide the loudspeaker positions in Cartesian coordinates, relative to the centre of the array.
  • twoDconfig (bool, optional) – Whether the loudspeaker configuration is 2D or 3D (default). In the former case, the lspPositions parameter does not need to have a third row, and it is ignored if present. If twoDconfig is True, then the loudspeaker coordinates in the output file do not have a “z” or “el” coordinate. Likewise, the triangulation “triplets” consist only of two loudspeakers.
  • sphericalPositions (bool, optional) – Specify whether the loudspeaker and virtual loudspeaker positions are written in spherical (True) or Cartesian coordinates (False). Default is Cartesian.
  • channelIndices (array-like, optional) – A list of integer output channel indices, one for each real loudspeaker. Optional argument; if not provided, consecutive indices starting from 1 are assigned. If provided, the length of the array must match the number of real loudspeakers, and the indices must be unique.
  • loudspeakerLabels (array-like, optional) – A list of strings containing alphanumerical labels for the real loudspeakers, one for each real loudspeaker. Labels must be unique and consist of the characters ‘a-zA-Z0-9&()+:_-’. The labels are used to reference loudspeakers in triplets, virtual loudspeaker routings, and subwoofer configs. Optional parameter. If not provided, labels of the form ‘lsp_i’ with i=1,2,… are generated.
  • triplets (array-like, optional.) – A loudspeaker triangulation. To be provided as a list of arrays consisting of three (or two in case of a 2D configuration) loudspeaker labels. Labels must match existing values of the loudspeakerLabels parameter. Optional parameter, to be provided only in special cases. By default, the triangulation is computed internally.
  • distanceDelay (bool, optional) – Whether the loudspeaker signals are delayed such that they arrive simultaneously in the array centre. This can be used if the loudspeaker distances to the centre are not equal. In this case, the farthest loudspeaker gets a delay of 0 s, and closer loudspeakers a positive delay. The distance compensation delay is added to the lspDelays parameter (if present). Optional attribute. The default (False) means that no distance delay compensation is applied.
  • distanceAttenuation (bool, optional) – Whether the loudspeaker gains shall be scaled if the loudspeaker distances are not 1.0. In this case, a 1/r distance law is applied such that the farthest loudspeaker gets a scaling factor of 0 dB, and lower factors are assigned to loudspeakers closer to the centre. The gain factors are applied on top of the optional parameter lspGainDB, if present. Optional attribute. Default is False (no distance attenuation applied).
  • lspDelays (array-like, optional) – An array of delay values to be applied to the loudspeakers. Values are to be provided in seconds. If not provided, no delays are applied. If specified, the length of the array must match the number of real loudspeakers.
  • lspGainDB (array-like, optional) – An array of gain values (in dB) to adjust the output gains of the real loudspeakers. If provided, the length must match the number of real loudspeakers. By default, no additional gains are applied.
  • virtualLoudspeakers (array of dicts, optional) –

    Provide a set of additional virtual/phantom/dead/imaginary loudspeaker nodes to adjust the triangulation of the array. Each entry is a dict consisting of the following key-value pairs.

    • ”id”: An alphanumeric id, following the same rules as the loudspeaker labels. Must be unique across all real and imaginary loudspeakers.
    • ”pos”: A 3-element vector containing the position in Cartesian coordinates. 2 elements are allowed for 2D setups.
    • ”routing”: Specification how the panning gains calculated for this loudspeaker are distributed to neighbouring real loudspeakers. Provided as a list of tuples (label, gain), where label is the id of a real loudspeaker and gain is a linear gain value. Optional element, if not given, the energy of the virtual loudspeaker is discarded.

    Optional argument. No virtual loudspeakers are created if not specified.

  • eqConfiguration (array of structures (dicts), optional) –

    Define a set of EQ filters to be applied to loudspeaker and subwoofer output channels. Each entry of the list is a dict containing the following key-value pairs.

    • ”name”: A unique, nonempty id that is referenced in loudspeaker and subwoofer specifications.
    • ”filter”: A list of biquad definitions, where each element is a dictionary containing the keys ‘b’ and ‘a’ that represent the numerator and denominator of the transfer function. ‘b’ must be a 3-element numeric vector, and ‘a’ a three- or two-element numeric vector. In the latter case, the leading coefficient is assumed to be 1, i.e., a normalised transfer function.
    • ”loudspeakers”: A list of loudspeaker labels (real loudspeakers) to which the EQ is applied.
  • subwooferConfig (array of dicts, optional) –

    A list of subwoofer specifications, where each entry is a dictionary with the following key-value pairs:

    • ”name”: A string to name the subwoofer. If not provided, a default name will be generated.
    • ”channel”: An output channel number for the subwoofer signal. Must be unique across all loudspeakers and subwoofers.
    • ”assignedSpeakers”: A list of ids of (real) loudspeakers. The signals of these loudspeakers are used in the computation of the subwoofer signal.
    • ”weights”: An optional weighting applied to the loudspeaker signals of the assigned loudspeakers. If provided, it must be an array-like sequence with the same length as assignedSpeakers. If not given, all assigned speakers are weighted equally with factor “1.0”.
  • comment (string, optional) – Optional string to be written as an XML comment at the head of the file.
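
The effect of the distanceDelay and distanceAttenuation options can be illustrated with a short calculation. This is a sketch of the behaviour as described in the parameter list (farthest loudspeaker at 0 s and 0 dB, 1/r law), not the implementation inside loudspeakerconfig; the function name distance_compensation is hypothetical, and the default of 340 m/s mirrors the speedOfSound default in the signature above.

```python
import math

def distance_compensation(distances, speed_of_sound=340.0):
    """Return per-loudspeaker delays (s) and gains (dB) for given distances (m)."""
    r_max = max(distances)
    # Closer loudspeakers are delayed so that all signals arrive at the same time.
    delays = [(r_max - r) / speed_of_sound for r in distances]
    # 1/r law: closer loudspeakers are attenuated relative to the farthest one.
    gains_db = [20.0 * math.log10(r / r_max) for r in distances]
    return delays, gains_db

delays, gains_db = distance_compensation([2.0, 1.0, 1.5])
```

In this example the loudspeaker at 1.0 m is delayed by roughly 2.9 ms and attenuated by about 6 dB relative to the one at 2.0 m.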

Examples

A minimal example of a 3D configuration:

# lspPos: 3x9 array of Cartesian loudspeaker positions, defined beforehand.
createArrayConfigFile( 'bs2051-4+5+0.xml',
                       lspPositions = lspPos,
                       twoDconfig = False,
                       sphericalPositions=True,
                       channelIndices = [1, 2, 3, 5, 6, 7, 8, 9, 10],
                       loudspeakerLabels =  ["M+030", "M-030", "M+000", "M+110", "M-110",
                          "U+030", "U-030", "U+110", "U-110"  ],
                       virtualLoudspeakers = [ { "id": "VotD", "pos": [0.0, 0.0,-1.0],
                                            "routing": [ ("M+030", 0.2), ("M-030", 0.2),
                                             ("M+000", 0.2), ("M+110", 0.2), ("M-110", 0.2) ] }] )
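
The dict-based arguments described in the parameter list can be assembled as ordinary Python data structures. The following sketch builds values for virtualLoudspeakers, eqConfiguration, and subwooferConfig following the key names given above; the concrete labels, the subwoofer name and channel, and the consistency check are illustrative assumptions. The biquad coefficients are taken from the configuration file example.

```python
# Hypothetical subset of loudspeaker labels, used only for this sketch.
labels = ["M+030", "M-030", "M+000", "M+110", "M-110"]

virtual_loudspeakers = [
    {"id": "VotD", "pos": [0.0, 0.0, -1.0],
     # Linear gains: the energy of the virtual loudspeaker is spread over five loudspeakers.
     "routing": [(label, 0.2) for label in labels]},
]

eq_configuration = [
    {"name": "highpass",
     "filter": [{"b": [-0.98447486, 1.9689497, -0.98447486],
                 "a": [-1.9688283, 0.96907117]}],  # two-element 'a': leading 1 is implied
     "loudspeakers": labels},
]

subwoofer_config = [
    {"name": "sub1", "channel": 6,
     "assignedSpeakers": labels,
     "weights": [1.0] * len(labels)},
]

def check_subwoofer_config(config):
    """Check the length rule for 'weights' stated in the parameter description."""
    for sub in config:
        weights = sub.get("weights")
        if weights is not None and len(weights) != len(sub["assignedSpeakers"]):
            raise ValueError("weights must match assignedSpeakers in length")
    return True

check_subwoofer_config(subwoofer_config)
```

Such structures would then be passed through the keyword arguments virtualLoudspeakers, eqConfiguration, and subwooferConfig of createArrayConfigFile().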

The function createArrayConfigFromSofa() can be used to create configuration files from a SOFA file, for use in, for example, a virtual loudspeaker renderer (visr_bst.VirtualLoudspeakerRenderer):

loudspeakerconfig.createArrayConfigFromSofa(sofaFile, xmlFile=None, lspLabels=None, twoDSetup=False, virtualLoudspeakers=[])

Create a loudspeaker configuration file from a SOFA file containing a number of emitters representing loudspeakers.

Parameters:
  • sofaFile (string) – A file path to a SOFA file.
  • xmlFile (string, optional) – Path of the XML output file to be written. Optional argument, if not provided, the SOFA file path is used with the extension replaced by “.xml”
  • lspLabels (list of strings, optional) – List of loudspeaker labels, must match the number of emitters in the SOFA files. If not provided, numbered labels are automatically generated.
  • twoDSetup (bool, optional) – Flag specifying whether the array is to be considered planar (True) or 3D (False). Optional value, default is False (3D).
  • virtualLoudspeakers (list, optional) – A list of virtual loudspeakers to be added to the setup. Each entry must be a Python dict as described in the function loudspeakerconfig.createArrayConfigFile().

Format description

The root node of the XML file is <panningConfiguration>. This root element supports the following optional attributes:

isInfinite
Whether the loudspeakers are regarded as point sources located on the unit sphere (false) or as plane waves, corresponding to an infinite distance (true). The default value is false.
dimension
Whether the setup is considered as a two-dimensional configuration (value 2) or as three-dimensional (3, the default). In the 2D case, the array is considered to lie in the x-y plane, and the z or el attributes of the loudspeaker positions are not evaluated. In this case, the triplet specifications consist of two indices only (technically they are pairs, not triplets).

Within the <panningConfiguration> root element, the following elements are supported:

<loudspeaker>

Represents a reproduction loudspeaker. The position is encoded either in a <cart> node representing the Cartesian coordinates in the x, y and z attributes (floating-point values in meters), or in a <polar> node with the attributes az and el (azimuth and elevation, both in degrees) and r (radius, in meters).

The <loudspeaker> node supports a number of attributes:

  • id A mandatory, non-empty string identification for the loudspeaker, which must be unique across all <loudspeaker> and <virtualspeaker> (see below) elements. Permitted are alphanumeric characters and the characters “@&()+/:_-”. ID strings are case-sensitive.
  • channel The output channel number (sound card channel) for this loudspeaker. Logical channel indices start from 1. Each channel must be assigned at most once over the set of all loudspeakers and subwoofers of the setup.
  • gainDB or gain Additional gain adjustment for this loudspeaker, either in dB (gainDB) or in linear scale (gain), given as a floating-point value. The default value is 0 dB or 1.0, respectively. gainDB and gain are mutually exclusive.
  • delay Delay adjustment to be applied to this loudspeaker as a floating-point value in seconds. The default value is 0.0.
  • eq An optional output equalisation filter to be applied to this loudspeaker. Specified as a non-empty string that needs to match a filterSpec element in the outputEqConfiguration element (see below). If not given, no EQ is applied to this loudspeaker.

<virtualspeaker>

An additional vertex added to the triangulation that does not correspond to a physical loudspeaker. It consists of an id attribute and a position specified either as a <cart> or a <polar> node (see the <loudspeaker> specification).

The <virtualspeaker> node provides the following configuration options:

  • A mandatory, nonempty and unique attribute id that follows the same rules as for the <loudspeaker> elements.

  • A number of route sub-elements that specify how the energy from this virtual loudspeaker is routed to real loudspeakers. The route element has the following attributes:

    • lspId: The ID of an existing real loudspeaker.
    • gainDB: A scaling factor with which the gain of the virtual loudspeaker is distributed to the real loudspeaker.

    In the above example, the routing specification is given by

     <virtualspeaker id="VoS">
      <polar az="0.0" el="-90.0" r="1.0"/>
      <route lspId="M+000" gainDB="-13.9794"/>
      <route lspId="M+030" gainDB="-13.9794"/>
      <route lspId="M-030" gainDB="-13.9794"/>
      <route lspId="M+110" gainDB="-13.9794"/>
      <route lspId="M-110" gainDB="-13.9794"/>
    </virtualspeaker>
    

    That means that the energy of the virtual speaker "VoS" is routed to five surrounding loudspeakers, with a gain of -13.9794 dB each.
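
The relation between the gainDB values in the XML file and linear routing gains can be checked with a short calculation. The sketch assumes the usual amplitude convention of 20·log10 for gains in dB, which is consistent with the linear routing gain of 0.2 used in the Python example for createArrayConfigFile(); the helper names are illustrative.

```python
import math

def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)

def linear_to_db(gain):
    """Convert a linear amplitude factor to dB."""
    return 20.0 * math.log10(gain)

route_gain = db_to_linear(-13.9794)  # close to 0.2
```

Under this assumption, each of the five routes passes on one fifth of the panning gain computed for the virtual loudspeaker.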

<subwoofer>

Specifies a subwoofer channel. In the current implementation, the loudspeaker signals are weighted and mixed into an arbitrary number of subwoofer channels. The attributes are:

  • assignedLoudspeakers The loudspeaker signals (given as a sequence of logical loudspeaker IDs) that contribute to the subwoofer signals. Given as a comma-separated list of loudspeaker indices or loudspeaker ranges. Index sequences are similar to Matlab array definitions, except that the commas separating the parts of the sequence are compulsory.

    Complex example:

    assignedLoudspeakers = "1, 3,4,5:7, 2, 8:-3:1"
    
  • weights Optional weights (linear scale) that scale the contributions of the assigned speakers to the subwoofer signal. Given as a sequence of comma-separated linear-scale gain values; Matlab ranges are also allowed. The number of elements must match the assignedLoudspeakers index list. Optional value; the default assigns 1.0 to all assigned loudspeakers. Example: “0:0.2:1.0, 1, 1, 1:-0.2:0”.

  • gainDB or gain Additional gain adjustment for this subwoofer, either in dB (gainDB) or in linear scale (gain), given as a floating-point value (default 0 dB or 1.0, respectively). Applied to the summed subwoofer signal on top of the weights attribute. See the <loudspeaker> specification.

  • delay Delay adjustment for this subwoofer (floating-point value in seconds, default 0.0). See the <loudspeaker> specification.
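
The range syntax used by assignedLoudspeakers and weights, including negative steps and fractional values, can be sketched as follows. The function expand_sequence is a hypothetical illustration that assumes Matlab semantics (inclusive end points, values that overshoot the end are dropped); it is not the parser used by VISR.

```python
import math

def expand_sequence(text, convert=float):
    """Expand "a, b:c, d:step:e" into a flat list, with inclusive end points."""
    values = []
    for part in text.split(","):
        fields = [float(field) for field in part.split(":")]
        if len(fields) == 1:
            values.append(convert(fields[0]))
            continue
        if len(fields) == 2:
            start, step, stop = fields[0], 1.0, fields[1]
        elif len(fields) == 3:
            start, step, stop = fields
        else:
            raise ValueError("malformed range: %r" % part)
        if step == 0.0:
            raise ValueError("step must be nonzero")
        # Number of values that fit; the small offset absorbs rounding errors.
        count = int(math.floor((stop - start) / step + 1e-9)) + 1
        values.extend(convert(start + k * step) for k in range(max(count, 0)))
    return values

assigned = expand_sequence("1, 3,4,5:7, 2, 8:-3:1", convert=int)
weights = expand_sequence("0:0.2:1.0, 1, 1, 1:-0.2:0")
```

For the two examples above this yields ten loudspeaker indices and fourteen weights, which shows that the two attributes must be written to expand to the same number of elements when used together.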

<triplet>

Loudspeaker triplet specified by the attributes l1, l2, and l3. The values of l1, l2, and l3 must correspond to IDs of existing real or virtual loudspeakers. In case of a 2D setup, only l1 and l2 are evaluated.

Note

At the time being, triplet specifications must be generated externally and placed in the configuration file. This is typically done by creating a Delaunay triangulation on the sphere, which can be done in Matlab or Python.

Future versions of the loudspeaker renderer might perform the triangulation internally, or might not require a conventional triangulation at all. In these cases, it is possible that the renderer ignores or internally adapts the specified triplets.

<outputEqConfiguration>

This optional element must occur at most once. It provides a global specification for equalisation filters for loudspeakers and subwoofers.

<outputEqConfiguration  type="iir" numberOfBiquads="1">
  <filterSpec name="lowpass">
    <biquad a1="-1.9688283" a2="0.96907117" b0="6.0729856e-05" b1="0.00012145971" b2="6.0729856e-05"/>
  </filterSpec>
  <filterSpec name="highpass">
    <biquad a1="-1.9688283" a2="0.96907117" b0="-0.98447486" b1="1.9689497" b2="-0.98447486"/>
  </filterSpec>
</outputEqConfiguration>

The attributes are:

  • type: The type of the output filters. At the moment, only IIR filters provided as second-order sections (biquads) are supported. Thus, the value "iir" must be set.
  • numberOfBiquads: The maximum number of biquad sections per filter. This value is specific to the "iir" filter type.

The filters are described in filterSpec elements. These are identified by a name attribute, which must be a non-empty string unique across all filterSpec elements. For the type iir, a filterSpec element consists of at most numberOfBiquads nodes of type biquad, each of which represents the coefficients of one second-order IIR (biquad) section. This is done through the attributes a1, a2, b0, b1, and b2, which represent the coefficients of the normalised transfer function

\[H(z) = \frac{ b_0 + b_1 z^{-1} + b_{2}z^{-2} }{1 + a_1 z^{-1} + a_{2}z^{-2}}\]
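
As a plausibility check, the transfer function can be evaluated on the unit circle for the two filters of the example configuration. The sketch below is an illustration only; the function name biquad_response is hypothetical and the sampling frequency of 48 kHz is an assumption that the configuration file does not state.

```python
import cmath
import math

def biquad_response(b0, b1, b2, a1, a2, frequency, sampling_frequency):
    """Evaluate H(z) of a normalised biquad at z = exp(j*2*pi*f/fs)."""
    z_inv = cmath.exp(-2j * math.pi * frequency / sampling_frequency)
    numerator = b0 + b1 * z_inv + b2 * z_inv ** 2
    denominator = 1.0 + a1 * z_inv + a2 * z_inv ** 2
    return numerator / denominator

# Coefficients copied from the filterSpec elements above.
lowpass = dict(b0=6.0729856e-05, b1=0.00012145971, b2=6.0729856e-05,
               a1=-1.9688283, a2=0.96907117)
highpass = dict(b0=-0.98447486, b1=1.9689497, b2=-0.98447486,
                a1=-1.9688283, a2=0.96907117)

fs = 48000.0  # assumed sampling frequency
lowpass_dc = abs(biquad_response(**lowpass, frequency=0.0, sampling_frequency=fs))
highpass_dc = abs(biquad_response(**highpass, frequency=0.0, sampling_frequency=fs))
highpass_nyquist = abs(biquad_response(**highpass, frequency=fs / 2, sampling_frequency=fs))
```

The lowpass has a magnitude close to one at DC, while the highpass blocks DC and passes the Nyquist frequency with a magnitude close to one, as expected for a subwoofer crossover pair.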