SpeechWaveHeader

GoogleApi.ContentWarehouse.V1.Model.SpeechWaveHeader


Description

A general-purpose buffer to contain sequences of samples. When representing a waveform, the samples are the scalar values of an acoustic signal. When representing a sequence of feature frames, the samples are vector-valued frames.
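To make the waveform versus feature-frame distinction concrete, here is a minimal sketch using the `rank`, `dimension`, `atomicSize` and `elementsPerSample` attributes documented below. The module `SampleShapeSketch` and its functions are hypothetical helpers, not part of the GoogleApi library, and the byte-size relation is an assumption drawn from the attribute descriptions rather than a documented rule.

```elixir
defmodule SampleShapeSketch do
  # Hypothetical helper for illustration; not part of the GoogleApi library.

  # Number of atomic elements per sample: the product of all entries
  # in the dimension array (as the elementsPerSample attribute describes).
  def elements_per_sample(dimension) when is_list(dimension) do
    Enum.reduce(dimension, 1, fn d, acc -> d * acc end)
  end

  # Assumption: for "generic" samples, one sample occupies
  # atomicSize * elementsPerSample bytes.
  def sample_size(atomic_size, dimension) do
    atomic_size * elements_per_sample(dimension)
  end
end

# Waveforms are sequences of scalars: rank 0.
mono = %{rank: 0, dimension: [1]}
stereo = %{rank: 0, dimension: [2]}

# ASR feature frames are vectors: rank 1, 39 values per frame.
features = %{rank: 1, dimension: [39]}

IO.inspect(SampleShapeSketch.elements_per_sample(mono.dimension))     # 1
IO.inspect(SampleShapeSketch.elements_per_sample(stereo.dimension))   # 2
IO.inspect(SampleShapeSketch.elements_per_sample(features.dimension)) # 39

# Interleaved stereo int16 audio under the assumption above: 2 bytes * 2 elements.
IO.inspect(SampleShapeSketch.sample_size(2, stereo.dimension))        # 4
```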

Attributes List

This module has the following attributes (case-insensitive ascending order):

Attributes

  1. atomicSize (type: integer(), default: nil)
    - Size of atomic type, in bytes.
  2. atomicType (type: String.t, default: nil)
    - Numeric type of data elements (if generic)
  3. bitRate (type: number(), default: nil)
    - For compressed signals with fixed bitrate, this is the number of bits per second.
  4. byteOrder (type: String.t, default: nil)
    - Byte order of the atomic_type. When atomic_type == "char", byte_order should always be "1". When atomic_type == "int16", byte_order can be either "01" (Intel) or "10" (Motorola). When in doubt, byte order should default to Intel.
  5. details (type: String.t, default: nil)
    - Typically contains the parameter settings of the program that created the file.
  6. dimension (type: list(integer()), default: nil)
    - Array dimensions for a single sample. For audio samples: mono has rank==0, dimension==[1]; stereo has rank==0, dimension==[2] (samples are interleaved). For typical ASR features representing energy, 12 MFCC coefficients, and first and second derivatives: rank==1 and dimension==[39].
  7. elementsPerSample (type: integer(), default: nil)
    - The number of atomic elements stored per sample. This is the product of all the entries in the dimension array. Written "out of order" in this file to be close to the dimension field, from which it can always be computed.
  8. rank (type: integer(), default: nil)
    - The rank of each sample. For a waveform (signals that are sequences of scalar values), this is 0. For vector-valued signals (used as signals containing sequences of features, for example), this is 1. scalar=0, vector=1, matrix=2, ...
  9. sampleCoding (type: String.t, default: nil)
    - Sample encoding. Can be "ulaw".
  10. sampleRate (type: number(), default: nil)
    - For periodic signals, the number of samples per second; otherwise 0.0.
  11. sampleSize (type: integer(), default: nil)
    - Size of a single sample, in bytes.
  12. sampleType (type: String.t, default: nil)
    - Structure of each sample. "generic" means that the samples are multi-dimensional arrays of atomic_type with the specified rank.
  13. startTime (type: number(), default: nil)
    - Time origin for the signal, in seconds. Warning: using a float can result in rounding errors. Between 1024 and 2048 (roughly 17 to 34 minutes), the smallest distance between two representable float values (1 ULP; see https://en.wikipedia.org/wiki/Unit_in_the_last_place) is 0.0001220703125, which is approximately double the spacing needed to represent one sample of 16 kHz audio. The error is twice as large in the 2048 s to 4096 s range, four times as large in the 4096 s to 8192 s range, and so on. Higher sample rates encounter rounding errors earlier: at 96 kHz, they start at about 2 minutes (128 s).
  14. totalSamples (type: String.t, default: nil)
    - The number of samples in the file. For generic samples, this can be inferred from the file size.
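The rounding warning on startTime can be checked numerically. The sketch below assumes the value is stored as a 32-bit (single-precision) float, which is what the quoted ULP of 0.0001220703125 corresponds to. `Float32Sketch` is a hypothetical helper written for this illustration, not part of the GoogleApi library.

```elixir
defmodule Float32Sketch do
  # Hypothetical helper for illustration; not part of the GoogleApi library.

  # Gap between `x` (rounded to single precision) and the next larger
  # single-precision value, i.e. 1 ULP at `x`.
  # Intended for positive finite values well below the float32 maximum.
  def ulp(x) when is_number(x) and x > 0 do
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    <<bits::unsigned-integer-32>> = <<(x * 1.0)::float-32>>
    <<cur::float-32>> = <<bits::unsigned-integer-32>>
    # Adding 1 to the bit pattern yields the next representable float32.
    <<next::float-32>> = <<(bits + 1)::unsigned-integer-32>>
    next - cur
  end

  # How many sample periods fit into one ULP at time `t` (in seconds).
  # A result above 1.0 means adjacent samples can no longer be told apart.
  def samples_per_ulp(t, sample_rate) do
    ulp(t) * sample_rate
  end
end

# 1 ULP at 1024 s is 2^-13 = 0.0001220703125 s.
IO.inspect(Float32Sketch.ulp(1024.0))

# At 16 kHz this is about two sample periods (1.953125).
IO.inspect(Float32Sketch.samples_per_ulp(1024.0, 16_000))

# At 96 kHz the threshold is crossed at 128 s: below 1.0 at 64 s, above at 128 s.
IO.inspect(Float32Sketch.samples_per_ulp(64.0, 96_000))
IO.inspect(Float32Sketch.samples_per_ulp(128.0, 96_000))
```

If sample-accurate offsets are needed for long recordings, one option is to keep the origin as an integer sample index and convert to seconds only for display; that is a design suggestion, not something this model prescribes.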

Type

@type t() :: %GoogleApi.ContentWarehouse.V1.Model.SpeechWaveHeader{
  atomicSize: integer() | nil,
  atomicType: String.t() | nil,
  bitRate: number() | nil,
  byteOrder: String.t() | nil,
  details: String.t() | nil,
  dimension: [integer()] | nil,
  elementsPerSample: integer() | nil,
  rank: integer() | nil,
  sampleCoding: String.t() | nil,
  sampleRate: number() | nil,
  sampleSize: integer() | nil,
  sampleType: String.t() | nil,
  startTime: number() | nil,
  totalSamples: String.t() | nil
}
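Since byteOrder is typed as a plain string, a consumer has to interpret its values itself. The following is a minimal sketch of how the documented values "01" (Intel, little-endian) and "10" (Motorola, big-endian) could drive decoding of int16 samples. `ByteOrderSketch` is a hypothetical helper, not part of the GoogleApi library, and treating a nil byteOrder as Intel is an assumption based on the attribute's "default to Intel" note.

```elixir
defmodule ByteOrderSketch do
  # Hypothetical helper for illustration; not part of the GoogleApi library.

  # "01" (Intel): least significant byte first.
  def decode_int16(data, "01") when is_binary(data) do
    for <<s::little-signed-integer-16 <- data>>, do: s
  end

  # "10" (Motorola): most significant byte first.
  def decode_int16(data, "10") when is_binary(data) do
    for <<s::big-signed-integer-16 <- data>>, do: s
  end

  # Assumption: an unset byteOrder falls back to Intel.
  def decode_int16(data, nil) when is_binary(data) do
    decode_int16(data, "01")
  end
end

raw = <<0x01, 0x00, 0xFF, 0xFF>>

IO.inspect(ByteOrderSketch.decode_int16(raw, "01")) # [1, -1]
IO.inspect(ByteOrderSketch.decode_int16(raw, "10")) # [256, -1]
```

A trailing odd byte is silently ignored by the comprehension; a stricter decoder might check that the payload length is a multiple of the sample size first.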

Function

@spec decode(struct(), keyword()) :: struct()

Data sourced from HexDocs : GoogleApi.ContentWarehouse.V1.Model.SpeechWaveHeader