Commit 28ed7f5

Add signal operators (onnx#3741)
Signed-off-by: Sheil Kumar <[email protected]>
1 parent 566499f commit 28ed7f5

File tree

59 files changed

+2347
-3
lines changed

.gitattributes

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
1-
*.pb binary linguist-detectable=false
1+
*.pb binary linguist-detectable=false

docs/Changelog.md

Lines changed: 263 additions & 0 deletions
@@ -20817,6 +20817,170 @@ This version of the operator has been available since version 16 of the default
2081720817
</dl>
2081820818

2081920819
## Version 17 of the default ONNX operator set
20820+
### <a name="BlackmanWindow-17"></a>**BlackmanWindow-17**</a>
20821+
20822+
Generates a Blackman window as described in the paper https://ieeexplore.ieee.org/document/1455106.
20823+
20824+
#### Version
20825+
20826+
This version of the operator has been available since version 17 of the default ONNX operator set.
20827+
20828+
#### Attributes
20829+
20830+
<dl>
20831+
<dt><tt>output_datatype</tt> : int (default is 1)</dt>
20832+
<dd>The data type of the output tensor. Strictly must be one of the values from DataType enum in TensorProto whose values correspond to T2. The default value is 1 = FLOAT. </dd>
20833+
<dt><tt>periodic</tt> : int (default is 1)</dt>
20834+
<dd>If 1, returns a window to be used as a periodic function. If 0, returns a symmetric window. When 'periodic' is specified, the operator computes a window of length size + 1 and returns the first size points. The default value is 1. </dd>
20835+
</dl>
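
The `periodic` switch described above can be sketched with NumPy. This is only an illustration, not the ONNX reference implementation: it assumes NumPy's symmetric `np.blackman` as the underlying window (whose coefficients may differ from those used by a given ONNX backend), and the helper name `blackman_window` is hypothetical.

```python
import numpy as np


def blackman_window(size: int, periodic: bool = True) -> np.ndarray:
    """Hypothetical helper: periodic or symmetric Blackman window of length `size`."""
    if periodic:
        # Build a symmetric window of length size + 1 and keep the first `size` points.
        return np.blackman(size + 1)[:size]
    # Symmetric window: first and last samples mirror each other.
    return np.blackman(size)


periodic = blackman_window(8, periodic=True)
symmetric = blackman_window(8, periodic=False)
```

Both results have shape `[size]`. The symmetric window mirrors around its centre, while the periodic one is the form usually preferred for spectral analysis with a DFT.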
20836+
20837+
#### Inputs
20838+
20839+
<dl>
20840+
<dt><tt>size</tt> (non-differentiable) : T1</dt>
20841+
<dd>A scalar value indicating the length of the window.</dd>
20842+
</dl>
20843+
20844+
#### Outputs
20845+
20846+
<dl>
20847+
<dt><tt>output</tt> (non-differentiable) : T2</dt>
20848+
<dd>A Blackman window with length: size. The output has the shape: [size].</dd>
20849+
</dl>
20850+
20851+
#### Type Constraints
20852+
20853+
<dl>
20854+
<dt><tt>T1</tt> : tensor(int32), tensor(int64)</dt>
20855+
<dd>Constrain the input size to integer tensors (int32 or int64).</dd>
20856+
<dt><tt>T2</tt> : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(bfloat16)</dt>
20857+
<dd>Constrain output types to numeric tensors.</dd>
20858+
</dl>
20859+
20860+
### <a name="DFT-17"></a>**DFT-17**</a>
20861+
20862+
Computes the discrete Fourier transform of input.
20863+
20864+
#### Version
20865+
20866+
This version of the operator has been available since version 17 of the default ONNX operator set.
20867+
20868+
#### Attributes
20869+
20870+
<dl>
20871+
<dt><tt>axis</tt> : int (default is 1)</dt>
20872+
<dd>The axis on which to perform the DFT. By default this value is set to 1, which corresponds to the first dimension after the batch index.</dd>
20873+
<dt><tt>inverse</tt> : int (default is 0)</dt>
20874+
<dd>Whether to perform the inverse discrete Fourier transform. By default this value is set to 0, which corresponds to false.</dd>
20875+
<dt><tt>onesided</tt> : int (default is 0)</dt>
20876+
<dd>If onesided is 1, only values for w in [0, 1, 2, ..., floor(n_fft/2)] are returned (floor(n_fft/2) + 1 values in total), because the Fourier transform of a real signal satisfies conjugate symmetry, i.e., X[m, w] = X[m, n_fft-w]*. Note that if the input tensor is complex, onesided output is not possible. Enabling onesided with real inputs performs a real-valued fast Fourier transform (RFFT). The default value is 0 for both real and complex input. Values can be 0 or 1.</dd>
20877+
</dl>
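
The conjugate symmetry behind `onesided` can be checked numerically. The snippet below is a small sketch using NumPy's `np.fft.fft` and `np.fft.rfft` as stand-ins for the operator; the variable names and the signal length of 8 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_fft = 8
x = rng.standard_normal(n_fft)  # a real-valued signal

full = np.fft.fft(x)   # all n_fft complex bins
half = np.fft.rfft(x)  # only the floor(n_fft/2) + 1 non-redundant bins

# Conjugate symmetry for real input: X[w] == conj(X[n_fft - w]), w = 1 .. n_fft - 1
w = np.arange(1, n_fft)
symmetric_ok = np.allclose(full[w], np.conj(full[n_fft - w]))
```

Because the upper half of `full` is determined by the lower half, a onesided result keeps only the bins that `half` contains.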
20878+
20879+
#### Inputs (1 - 2)
20880+
20881+
<dl>
20882+
<dt><tt>input</tt> (non-differentiable) : T1</dt>
20883+
<dd>For real input, the following shape is expected: [batch_idx][signal_dim1][signal_dim2]...[signal_dimN][1]. For complex input, the following shape is expected: [batch_idx][signal_dim1][signal_dim2]...[signal_dimN][2]. The first dimension is the batch dimension. The following N dimensions correspond to the signal's dimensions. The final dimension represents the real and imaginary parts of the value in that order.</dd>
20884+
<dt><tt>dft_length</tt> (optional, non-differentiable) : T2</dt>
20885+
<dd>The length of the signal. If greater than the axis dimension, the signal will be zero-padded up to dft_length. If less than the axis dimension, only the first dft_length values will be used as the signal. It's an optional value. </dd>
20886+
</dl>
20887+
20888+
#### Outputs
20889+
20890+
<dl>
20891+
<dt><tt>output</tt> : T1</dt>
20892+
<dd>The Fourier transform of the input. If onesided is 0, the following shape is expected: [batch_idx][signal_dim1][signal_dim2]...[signal_dimN][2]. If axis=1 and onesided is 1, the following shape is expected: [batch_idx][floor(signal_dim1/2)+1][signal_dim2]...[signal_dimN][2]. If axis=2 and onesided is 1, the following shape is expected: [batch_idx][signal_dim1][floor(signal_dim2/2)+1]...[signal_dimN][2]. If axis=N and onesided is 1, the following shape is expected: [batch_idx][signal_dim1][signal_dim2]...[floor(signal_dimN/2)+1][2]. The signal_dim at the specified axis is equal to the dft_length.</dd>
20893+
</dl>
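
The input and output layouts can be made concrete with a NumPy sketch. It is not the ONNX implementation: the function name `dft_like` is hypothetical, `axis` is assumed to be non-negative and to index a signal dimension, and `np.fft.fft` supplies the zero-padding and truncation that `dft_length` describes.

```python
from typing import Optional

import numpy as np


def dft_like(x: np.ndarray, dft_length: Optional[int] = None,
             axis: int = 1, onesided: bool = False) -> np.ndarray:
    """Hypothetical sketch. x has shape [batch, ..., 1] (real) or [batch, ..., 2] (real, imag)."""
    is_complex = x.shape[-1] == 2
    if is_complex and onesided:
        raise ValueError("onesided output is only possible for real input")
    # Fold the trailing real/imag dimension into a complex array.
    z = x[..., 0] + 1j * x[..., 1] if is_complex else x[..., 0].astype(np.complex128)
    # With n given, np.fft.fft zero-pads or truncates along `axis`.
    spectrum = np.fft.fft(z, n=dft_length, axis=axis)
    if onesided:
        n = spectrum.shape[axis]
        spectrum = np.take(spectrum, np.arange(n // 2 + 1), axis=axis)
    # Unfold back into a trailing [real, imag] dimension.
    return np.stack([spectrum.real, spectrum.imag], axis=-1)


x = np.ones((2, 6, 1))                                # [batch, signal_dim1, 1]
padded = dft_like(x, dft_length=8)                    # zero-padded to 8 samples
truncated = dft_like(x, dft_length=4)                 # only the first 4 samples used
one_sided = dft_like(x, dft_length=8, onesided=True)  # floor(8/2) + 1 = 5 bins
```

The resulting shapes are `(2, 8, 2)`, `(2, 4, 2)` and `(2, 5, 2)`: the dimension at `axis` follows `dft_length`, and the trailing dimension always holds the real and imaginary parts.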
20894+
20895+
#### Type Constraints
20896+
20897+
<dl>
20898+
<dt><tt>T1</tt> : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)</dt>
20899+
<dd>Constrain input and output types to float tensors.</dd>
20900+
<dt><tt>T2</tt> : tensor(int32), tensor(int64)</dt>
20901+
<dd>Constrain scalar length types to integer tensors (int32 or int64).</dd>
20902+
</dl>
20903+
20904+
### <a name="HammingWindow-17"></a>**HammingWindow-17**</a>
20905+
20906+
Generates a Hamming window as described in the paper https://ieeexplore.ieee.org/document/1455106.
20907+
20908+
#### Version
20909+
20910+
This version of the operator has been available since version 17 of the default ONNX operator set.
20911+
20912+
#### Attributes
20913+
20914+
<dl>
20915+
<dt><tt>output_datatype</tt> : int (default is 1)</dt>
20916+
<dd>The data type of the output tensor. Strictly must be one of the values from DataType enum in TensorProto whose values correspond to T2. The default value is 1 = FLOAT. </dd>
20917+
<dt><tt>periodic</tt> : int (default is 1)</dt>
20918+
<dd>If 1, returns a window to be used as a periodic function. If 0, returns a symmetric window. When 'periodic' is specified, the operator computes a window of length size + 1 and returns the first size points. The default value is 1. </dd>
20919+
</dl>
20920+
20921+
#### Inputs
20922+
20923+
<dl>
20924+
<dt><tt>size</tt> (non-differentiable) : T1</dt>
20925+
<dd>A scalar value indicating the length of the window.</dd>
20926+
</dl>
20927+
20928+
#### Outputs
20929+
20930+
<dl>
20931+
<dt><tt>output</tt> (non-differentiable) : T2</dt>
20932+
<dd>A Hamming window with length: size. The output has the shape: [size].</dd>
20933+
</dl>
20934+
20935+
#### Type Constraints
20936+
20937+
<dl>
20938+
<dt><tt>T1</tt> : tensor(int32), tensor(int64)</dt>
20939+
<dd>Constrain the input size to integer tensors (int32 or int64).</dd>
20940+
<dt><tt>T2</tt> : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(bfloat16)</dt>
20941+
<dd>Constrain output types to numeric tensors.</dd>
20942+
</dl>
20943+
20944+
### <a name="HannWindow-17"></a>**HannWindow-17**</a>
20945+
20946+
Generates a Hann window as described in the paper https://ieeexplore.ieee.org/document/1455106.
20947+
20948+
#### Version
20949+
20950+
This version of the operator has been available since version 17 of the default ONNX operator set.
20951+
20952+
#### Attributes
20953+
20954+
<dl>
20955+
<dt><tt>output_datatype</tt> : int (default is 1)</dt>
20956+
<dd>The data type of the output tensor. Strictly must be one of the values from DataType enum in TensorProto whose values correspond to T2. The default value is 1 = FLOAT. </dd>
20957+
<dt><tt>periodic</tt> : int (default is 1)</dt>
20958+
<dd>If 1, returns a window to be used as a periodic function. If 0, returns a symmetric window. When 'periodic' is specified, the operator computes a window of length size + 1 and returns the first size points. The default value is 1. </dd>
20959+
</dl>
20960+
20961+
#### Inputs
20962+
20963+
<dl>
20964+
<dt><tt>size</tt> (non-differentiable) : T1</dt>
20965+
<dd>A scalar value indicating the length of the window.</dd>
20966+
</dl>
20967+
20968+
#### Outputs
20969+
20970+
<dl>
20971+
<dt><tt>output</tt> (non-differentiable) : T2</dt>
20972+
<dd>A Hann window with length: size. The output has the shape: [size].</dd>
20973+
</dl>
20974+
20975+
#### Type Constraints
20976+
20977+
<dl>
20978+
<dt><tt>T1</tt> : tensor(int32), tensor(int64)</dt>
20979+
<dd>Constrain the input size to integer tensors (int32 or int64).</dd>
20980+
<dt><tt>T2</tt> : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(bfloat16)</dt>
20981+
<dd>Constrain output types to numeric tensors.</dd>
20982+
</dl>
20983+
2082020984
### <a name="LayerNormalization-17"></a>**LayerNormalization-17**</a>
2082120985

2082220986
This is layer normalization defined in ONNX as function.
@@ -20905,6 +21069,105 @@ This version of the operator has been available since version 17 of the default
2090521069
<dd>Type of Mean and InvStdDev tensors.</dd>
2090621070
</dl>
2090721071

21072+
### <a name="MelWeightMatrix-17"></a>**MelWeightMatrix-17**</a>
21073+
21074+
Generates a MelWeightMatrix that can be used to re-weight a tensor containing linearly sampled frequency spectra (from DFT or STFT) into num_mel_bins bands of frequency information, based on the [lower_edge_hertz, upper_edge_hertz] range on the mel scale.
21075+
This function defines the mel scale in terms of a frequency in hertz according to the following formula:
21076+
21077+
mel(f) = 2595 * log10(1 + f/700)
21078+
21079+
In the returned matrix, all the triangles (filterbanks) have a peak value of 1.0.
21080+
21081+
The returned MelWeightMatrix can be used to right-multiply a spectrogram S of shape [frames, num_spectrogram_bins] of linear scale spectrum values (e.g. STFT magnitudes) to generate a "mel spectrogram" M of shape [frames, num_mel_bins].
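
The mel formula and the shape bookkeeping above can be sketched as follows. The helper names `hz_to_mel` and `mel_to_hz` are hypothetical, the inverse mapping is derived here from the stated formula rather than taken from the operator documentation, and the matrix `W` is a placeholder of ones standing in for the real operator output.

```python
import numpy as np


def hz_to_mel(f):
    """mel(f) = 2595 * log10(1 + f / 700), as stated above."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz(m):
    """Algebraic inverse of hz_to_mel (derived here for illustration)."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


# Shape bookkeeping with placeholder values.
frames, dft_length, num_mel_bins = 10, 16, 4
num_spectrogram_bins = dft_length // 2 + 1          # onesided DFT size
S = np.ones((frames, num_spectrogram_bins))         # e.g. STFT magnitudes
W = np.ones((num_spectrogram_bins, num_mel_bins))   # stand-in for the weight matrix
M = S @ W                                           # "mel spectrogram"
```

Right-multiplying `S` by `W` collapses the `num_spectrogram_bins` axis, so `M` has shape `[frames, num_mel_bins]`.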
21082+
21083+
#### Version
21084+
21085+
This version of the operator has been available since version 17 of the default ONNX operator set.
21086+
21087+
#### Attributes
21088+
21089+
<dl>
21090+
<dt><tt>output_datatype</tt> : int (default is 1)</dt>
21091+
<dd>The data type of the output tensor. Strictly must be one of the values from DataType enum in TensorProto whose values correspond to T3. The default value is 1 = FLOAT. </dd>
21092+
</dl>
21093+
21094+
#### Inputs
21095+
21096+
<dl>
21097+
<dt><tt>num_mel_bins</tt> (non-differentiable) : T1</dt>
21098+
<dd>The number of bands in the mel spectrum.</dd>
21099+
<dt><tt>dft_length</tt> (non-differentiable) : T1</dt>
21100+
<dd>The size of the original DFT. The size of the original DFT is used to infer the size of the onesided DFT, which is understood to be floor(dft_length/2) + 1, i.e. the spectrogram only contains the nonredundant DFT bins.</dd>
21101+
<dt><tt>sample_rate</tt> (non-differentiable) : T1</dt>
21102+
<dd>Samples per second of the input signal used to create the spectrogram. Used to figure out the frequencies corresponding to each spectrogram bin, which dictates how they are mapped into the mel scale.</dd>
21103+
<dt><tt>lower_edge_hertz</tt> (non-differentiable) : T2</dt>
21104+
<dd>Lower bound on the frequencies to be included in the mel spectrum. This corresponds to the lower edge of the lowest triangular band.</dd>
21105+
<dt><tt>upper_edge_hertz</tt> (non-differentiable) : T2</dt>
21106+
<dd>The desired top edge of the highest frequency band.</dd>
21107+
</dl>
21108+
21109+
#### Outputs
21110+
21111+
<dl>
21112+
<dt><tt>output</tt> (non-differentiable) : T3</dt>
21113+
<dd>The Mel Weight Matrix. The output has the shape: [floor(dft_length/2) + 1][num_mel_bins].</dd>
21114+
</dl>
21115+
21116+
#### Type Constraints
21117+
21118+
<dl>
21119+
<dt><tt>T1</tt> : tensor(int32), tensor(int64)</dt>
21120+
<dd>Constrain to integer tensors.</dd>
21121+
<dt><tt>T2</tt> : tensor(float), tensor(float16), tensor(double), tensor(bfloat16)</dt>
21122+
<dd>Constrain to float tensors.</dd>
21123+
<dt><tt>T3</tt> : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(bfloat16)</dt>
21124+
<dd>Constrain to any numerical types.</dd>
21125+
</dl>
21126+
21127+
### <a name="STFT-17"></a>**STFT-17**</a>
21128+
21129+
Computes the Short-time Fourier Transform of the signal.
21130+
21131+
#### Version
21132+
21133+
This version of the operator has been available since version 17 of the default ONNX operator set.
21134+
21135+
#### Attributes
21136+
21137+
<dl>
21138+
<dt><tt>onesided</tt> : int (default is 1)</dt>
21139+
<dd>If onesided is 1, only values for w in [0, 1, 2, ..., floor(n_fft/2)] are returned (floor(n_fft/2) + 1 values in total), because the Fourier transform of a real signal satisfies conjugate symmetry, i.e., X[m, w] = X[m, n_fft-w]*. Note that if the input or window tensors are complex, onesided output is not possible. Enabling onesided with real inputs performs a real-valued fast Fourier transform (RFFT). The default value is 1 for both real and complex input. Values can be 0 or 1.</dd>
21140+
</dl>
21141+
21142+
#### Inputs (2 - 4)
21143+
21144+
<dl>
21145+
<dt><tt>signal</tt> (non-differentiable) : T1</dt>
21146+
<dd>Input tensor representing a real or complex valued signal. For real input, the following shape is expected: [batch_size][signal_length][1]. For complex input, the following shape is expected: [batch_size][signal_length][2], where [batch_size][signal_length][0] represents the real component and [batch_size][signal_length][1] represents the imaginary component of the signal.</dd>
21147+
<dt><tt>frame_step</tt> (non-differentiable) : T2</dt>
21148+
<dd>The number of samples to step between successive DFTs.</dd>
21149+
<dt><tt>window</tt> (optional, non-differentiable) : T1</dt>
21150+
<dd>A tensor representing the window that will be slid over the signal. The window must have rank 1 with shape: [window_shape]. It's an optional value. </dd>
21151+
<dt><tt>frame_length</tt> (optional, non-differentiable) : T2</dt>
21152+
<dd>A scalar representing the size of the DFT. It's an optional value.</dd>
21153+
</dl>
21154+
21155+
#### Outputs
21156+
21157+
<dl>
21158+
<dt><tt>output</tt> (non-differentiable) : T1</dt>
21159+
<dd>The Short-time Fourier Transform of the signals. If onesided is 1, the output has the shape: [batch_size][frames][dft_unique_bins][2], where dft_unique_bins is frame_length // 2 + 1 (the unique components of the DFT). If onesided is 0, the output has the shape: [batch_size][frames][frame_length][2], where frame_length is the length of the DFT.</dd>
21160+
</dl>
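
The output shapes can be illustrated with a minimal NumPy sketch for real input. Several points are assumptions rather than statements from the specification: the function name `stft_like` is hypothetical, the signal is not padded, the frame count is taken as `1 + (signal_length - frame_length) // frame_step`, a missing window is treated as rectangular, and `signal_length >= frame_length` is assumed.

```python
import numpy as np


def stft_like(signal, frame_step, window=None, frame_length=None, onesided=True):
    """Hypothetical sketch. signal: real array of shape [batch, signal_length, 1]."""
    x = signal[..., 0]                                   # [batch, signal_length]
    if frame_length is None:
        if window is None:
            raise ValueError("provide window or frame_length")
        frame_length = len(window)
    if window is None:
        window = np.ones(frame_length)                   # assumed rectangular window
    # Assumed framing rule: no padding, only complete frames are kept.
    n_frames = 1 + (x.shape[1] - frame_length) // frame_step
    starts = np.arange(n_frames) * frame_step
    idx = starts[:, None] + np.arange(frame_length)[None, :]
    frames = x[:, idx] * window                          # [batch, frames, frame_length]
    fft = np.fft.rfft if onesided else np.fft.fft
    spec = fft(frames, axis=-1)
    return np.stack([spec.real, spec.imag], axis=-1)     # trailing [real, imag]


signal = np.ones((1, 16, 1))
one_sided = stft_like(signal, frame_step=4, frame_length=8)
two_sided = stft_like(signal, frame_step=4, frame_length=8, onesided=False)
```

With 16 samples, a frame length of 8 and a step of 4, this framing rule gives 3 frames, so the shapes are `(1, 3, 5, 2)` for the onesided case and `(1, 3, 8, 2)` otherwise.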
21161+
21162+
#### Type Constraints
21163+
21164+
<dl>
21165+
<dt><tt>T1</tt> : tensor(float), tensor(float16), tensor(double), tensor(bfloat16)</dt>
21166+
<dd>Constrain signal and output to float tensors.</dd>
21167+
<dt><tt>T2</tt> : tensor(int32), tensor(int64)</dt>
21168+
<dd>Constrain scalar length types to integer tensors (int32 or int64).</dd>
21169+
</dl>
21170+
2090821171
### <a name="SequenceMap-17"></a>**SequenceMap-17**</a>
2090921172

2091021173
Applies a sub-graph to each sample in the input sequence(s).
