**Session Date/Time:** 26 Jul 2023 00:00

# mlcodec

## Summary

The ML Codec working group held its first meeting, focusing on three main topics: Opus extension mechanisms, speech coding enhancements, and deep redundancy. The discussion covered proposals for extending Opus functionality while maintaining compatibility, explored methods for improving speech coding quality at lower bitrates, and addressed the challenge of making Opus robust to packet loss through deep redundancy techniques. The key decision was to adopt the Opus extension mechanism draft as a working group document. Discussions also highlighted areas needing further investigation, such as standardizing speech enhancement methods and addressing security considerations related to machine learning models in codecs.

## Key Discussion Points

*   **Opus Extension Mechanisms:**
    *   Proposal to transmit extensions within Opus padding to maintain compatibility with older decoders.
    *   Format of the extension header, including a 7-bit ID and a length flag.
    *   Reservation of extension IDs for different purposes (padding, frame separators, experimental use).
    *   SDP parameters for signaling supported extensions.
    *   Discussion about allowing repeated IDs within the same frame.
    *   Consideration of unsafe extensions.
*   **Speech Coding Enhancements:**
    *   Proposed method for improving speech coding quality at lower bitrates using a linear adaptive coding enhancer with a neural network.
    *   Challenges of heterogeneous inputs, maintaining decoder integrity, and interoperability.
    *   Discussion about standardizing speech enhancement methods and the potential impact on encoder behavior and compatibility with legacy Opus decoders.
    *   Consideration of defining quality requirements for enhancement models.
*   **Deep Redundancy:**
    *   Deep Redundancy aims to improve robustness against packet loss by encoding features from past frames using ML.
    *   Method for achieving deep redundancy using feature extraction, compression with machine learning, and a neural vocoder.
    *   Format of the redundancy packets, including latent features and initial states.
    *   Normative aspects of the deep redundancy implementation, focusing on the decoder weights and acoustic features.
    *   Concerns about adversarial input and potential security implications of using machine learning models.
    *   Discussion of what redundancy duration would be reasonable.
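
As an illustration of the extension header noted under Opus Extension Mechanisms (a 7-bit ID and a length flag), the following is a minimal sketch of packing and unpacking such a header byte. The bit placement (ID in the upper seven bits, flag in the lowest bit) and the names `ExtensionHeader`, `parse_extension_header`, and `build_extension_header` are assumptions made for this example; the draft itself is the authority on the actual layout.

```python
from typing import NamedTuple


class ExtensionHeader(NamedTuple):
    """Hypothetical container for the two header fields."""
    ext_id: int       # 7-bit extension ID, 0..127
    length_flag: int  # 1-bit length flag, 0 or 1


def parse_extension_header(byte: int) -> ExtensionHeader:
    """Split one header byte into (ID, length flag).

    Assumption: the ID occupies the upper 7 bits and the
    length flag is the least significant bit.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("header must fit in a single byte")
    return ExtensionHeader(ext_id=byte >> 1, length_flag=byte & 1)


def build_extension_header(ext_id: int, length_flag: int) -> int:
    """Inverse of parse_extension_header, under the same assumed layout."""
    if not 0 <= ext_id <= 0x7F:
        raise ValueError("extension ID must fit in 7 bits")
    if length_flag not in (0, 1):
        raise ValueError("length flag must be 0 or 1")
    return (ext_id << 1) | length_flag


if __name__ == "__main__":
    header = build_extension_header(ext_id=32, length_flag=1)
    print(hex(header), parse_extension_header(header))
```

A single byte holds both fields, which is why the ID space is limited to 128 values and why reserving IDs for padding, frame separators, and experimental use came up in the discussion.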

## Decisions and Action Items

*   **Decision:** The working group adopted draft-valin-opus-extension as a working group document.

## Next Steps

*   Further discussion is needed on how to standardize speech enhancement methods, focusing on defining requirements and ensuring compatibility.
*   Investigate and address security concerns related to adversarial input in machine learning models used in codecs.
*   Solicit feedback on the draft specification for deep redundancy, particularly regarding the proposed format and parameters.
*   Explore the possibility of defining a mechanism to validate decoded output, confirming that the model has not changed the original intent of the stream.