=========================
ALSA Compress-Offload API
=========================

Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

Vinod Koul <vinod.koul@linux.intel.com>


Overview
========
Since its early days, the ALSA API was defined with PCM support or
constant-bitrate payloads such as IEC61937 in mind. Arguments and
returned values in frames are the norm, making it a challenge to
extend the existing API to compressed data streams.

In recent years, audio digital signal processors (DSPs) were integrated
in system-on-chip designs, and DSPs are also integrated in audio
codecs. Processing compressed data on such DSPs results in a dramatic
reduction of power consumption compared to host-based
processing. Support for such hardware has not been very good in Linux,
mostly because of a lack of a generic API available in the mainline
kernel.

Rather than requiring a compatibility break with an API change of the
ALSA PCM interface, a new 'Compressed Data' API is introduced to
provide a control and data-streaming interface for audio DSPs.

The design of this API was inspired by the 2-year experience with the
Intel Moorestown SOC, with many corrections required to upstream the
API in the mainline kernel instead of the staging tree and make it
usable by others.


Requirements
============
The main requirements are:

- Separation between byte counts and time. Compressed formats may have
  a header per file, per frame, or no header at all. The payload size
  may vary from frame to frame. As a result, it is not possible to
  reliably estimate the duration of audio buffers when handling
  compressed data. Dedicated mechanisms are required to allow for
  reliable audio-video synchronization, which requires precise
  reporting of the number of samples rendered at any given time.

- Handling of multiple formats. PCM data only requires a specification
  of the sampling rate, number of channels and bits per sample. In
  contrast, compressed data comes in a variety of formats. Audio DSPs
  may also provide support for a limited number of audio encoders and
  decoders embedded in firmware, or may support more choices through
  dynamic download of libraries.

- Focus on main formats. This API provides support for the most
  popular formats used for audio and video capture and playback.
  It is likely that new formats will be added as audio compression
  technology advances.

- Handling of multiple configurations. Even for a given format like
  AAC, some implementations may support AAC multichannel but HE-AAC
  stereo. Likewise WMA10 level M3 may require too much memory and CPU
  cycles. The new API needs to provide a generic way of listing these
  formats.

- Rendering/Grabbing only. This API does not provide any means of
  hardware acceleration, where PCM samples are provided back to
  user-space for additional processing. This API focuses instead on
  streaming compressed data to a DSP, with the assumption that the
  decoded samples are routed to a physical output or logical back-end.

- Complexity hiding. Existing user-space multimedia frameworks all
  have existing enums/structures for each compressed format. This new
  API assumes the existence of a platform-specific compatibility layer
  to expose, translate and make use of the capabilities of the audio
  DSP, e.g. Android HAL or PulseAudio sinks. By construction, regular
  applications are not supposed to make use of this API.


Design
======
The new API shares a number of concepts with the PCM API for flow
control. Start, pause, resume, drain and stop commands have the same
semantics no matter what the content is.
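
As a rough illustration of these shared flow-control commands, the
sketch below writes the transitions drawn in the State Machine section
of this document as a lookup table. It is a user-space model in Python,
not kernel code: ``State``, ``TRANSITIONS``, ``next_state()`` and
``run()`` are names made up for this example, and only the operation
names are taken from the diagram.

.. code-block:: python

   from enum import Enum


   class State(Enum):
       OPEN = "OPEN"
       SETUP = "SETUP"
       PREPARE = "PREPARE"
       RUNNING = "RUNNING"
       PAUSE = "PAUSE"
       DRAIN = "DRAIN"
       FREE = "FREE"


   # (current state, operation) -> next state, transcribed from the
   # State Machine diagram; anything not listed is treated as invalid.
   TRANSITIONS = {
       (State.OPEN, "compr_set_params"): State.SETUP,
       (State.SETUP, "compr_write"): State.PREPARE,
       (State.SETUP, "compr_free"): State.FREE,
       (State.PREPARE, "compr_start"): State.RUNNING,
       (State.PREPARE, "compr_free"): State.FREE,
       (State.RUNNING, "compr_pause"): State.PAUSE,
       (State.RUNNING, "compr_drain"): State.DRAIN,
       (State.RUNNING, "compr_stop"): State.SETUP,
       (State.PAUSE, "compr_resume"): State.RUNNING,
       (State.PAUSE, "compr_stop"): State.SETUP,
       (State.DRAIN, "compr_drain_notify"): State.SETUP,
       (State.DRAIN, "compr_stop"): State.SETUP,
   }


   def next_state(state, op):
       """Return the state reached from `state` by `op`, or raise ValueError."""
       try:
           return TRANSITIONS[(state, op)]
       except KeyError:
           raise ValueError(f"{op}() is not valid in state {state.name}") from None


   def run(ops, state=State.OPEN):
       """Apply the operations in order and return the final state."""
       for op in ops:
           state = next_state(state, op)
       return state

For example, ``run(["compr_set_params", "compr_write", "compr_start"])``
returns ``State.RUNNING``, while applying ``compr_start`` to a stream
that is still in ``OPEN`` raises ``ValueError``.
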

The concept of a memory ring buffer divided into a set of fragments is
borrowed from the ALSA PCM API. However, only sizes in bytes can be
specified.

Seeks/trick modes are assumed to be handled by the host.

The notion of rewinds/forwards is not supported. Data committed to the
ring buffer cannot be invalidated, except when dropping all buffers.

The Compressed Data API does not make any assumptions on how the data
is transmitted to the audio DSP. DMA transfers from main memory to an
embedded audio cluster or to an SPI interface for external DSPs are
possible. As in the ALSA PCM case, a core set of routines is exposed;
each driver implementer will have to write support for a set of
mandatory routines and possibly make use of optional ones.

The main additions are:

get_caps
  This routine returns the list of audio formats supported. Querying the
  codecs on a capture stream will return encoders; decoders will be
  listed for playback streams.

get_codec_caps
  For each codec, this routine returns a list of
  capabilities. The intent is to make sure all the capabilities
  correspond to valid settings, and to minimize the risks of
  configuration failures. For example, for a complex codec such as AAC,
  the number of channels supported may depend on a specific profile.
  If the capabilities were exposed with a single descriptor, a specific
  combination of profiles/channels/formats might turn out not to be
  supported. Likewise, embedded DSPs have limited memory and CPU cycles;
  it is likely that some implementations make the list of capabilities
  dynamic and dependent on existing workloads. In addition to codec
  settings, this routine returns the minimum buffer size handled by the
  implementation. This information can be a function of the DMA buffer
  sizes, the number of bytes required to synchronize, etc., and can be
  used by userspace to define how much needs to be written in the ring
  buffer before playback can start.

set_params
  This routine sets the configuration chosen for a specific codec. The
  most important field in the parameters is the codec type; in most
  cases decoders will ignore other fields, while encoders will strictly
  comply with the settings.

get_params
  This routine returns the actual settings used by the DSP. Changes to
  the settings should remain the exception.

get_timestamp
  The timestamp becomes a multi-field structure. It lists the number
  of bytes transferred, the number of samples processed and the number
  of samples rendered/grabbed.
  All these values can be used to determine the average bitrate, to
  figure out whether the ring buffer needs to be refilled, or to
  estimate the delay due to decoding/encoding/IO on the DSP.

Note that the list of codecs/profiles/modes was derived from the
OpenMAX AL specification instead of reinventing the wheel.
Modifications include:

- Addition of FLAC and IEC formats
- Merge of encoder/decoder capabilities
- Profiles/modes listed as bitmasks to make descriptors more compact
- Addition of set_params for decoders (missing in OpenMAX AL)
- Addition of AMR/AMR-WB encoding modes (missing in OpenMAX AL)
- Addition of format information for WMA
- Addition of encoding options when required (derived from OpenMAX IL)
- Addition of rateControlSupported (missing in OpenMAX AL)

State Machine
=============

The compressed audio stream state machine is described below ::

                                       +----------+
                                       |          |
                                       |   OPEN   |
                                       |          |
                                       +----------+
                                            |
                                            |
                                            | compr_set_params()
                                            |
                                            v
          compr_free()                 +----------+
  +------------------------------------|          |
  |                                    |   SETUP  |
  |          +-------------------------|          |<-------------------------+
  |          |       compr_write()     +----------+                          |
  |          |                              ^                                |
  |          |                              | compr_drain_notify()           |
  |          |                              |        or                      |
  |          |                              |     compr_stop()               |
  |          |                              |                                |
  |          |                         +----------+                          |
  |          |                         |          |                          |
  |          |                         |   DRAIN  |                          |
  |          |                         |          |                          |
  |          |                         +----------+                          |
  |          |                              ^                                |
  |          |                              |                                |
  |          |                              | compr_drain()                  |
  |          |                              |                                |
  |          v                              |                                |
  |    +----------+                    +----------+                          |
  |    |          |    compr_start()   |          |        compr_stop()      |
  |    | PREPARE  |------------------->|  RUNNING |--------------------------+
  |    |          |                    |          |                          |
  |    +----------+                    +----------+                          |
  |          |                            |    ^                             |
  |          |compr_free()                |    |                             |
  |          |              compr_pause() |    | compr_resume()              |
  |          |                            |    |                             |
  |          v                            v    |                             |
  |    +----------+                    +----------+                          |
  |    |          |                    |          |        compr_stop()      |
  +--->|   FREE   |                    |  PAUSE   |--------------------------+
       |          |                    |          |
       +----------+                    +----------+


Gapless Playback
================
When playing through an album, the decoders have the ability to skip the
encoder delay and padding and directly move from one track content to
another. The end user can perceive this as gapless playback as we don't
have silence while switching from one track to another.

Also, there might be low-intensity noises due to encoding. Perfect gapless is
difficult to reach with all types of compressed data, but works fine with most
music content.
The decoder needs to know the encoder delay and encoder padding, so this
information must be passed to the DSP. This metadata is extracted from
ID3/MP4 headers and is not present by default in the bitstream, hence the
need for a new interface to pass this information to the DSP. Also, the DSP
and userspace need to switch from one track to another and start using data
for the second track.

The main additions are:

set_metadata
  This routine sets the encoder delay and encoder padding. This can be used by
  the decoder to strip the silence. This needs to be set before the data in
  the track is written.

set_next_track
  This routine tells the DSP that metadata and write operations sent after
  this would correspond to the subsequent track.

partial drain
  This is called when end of file is reached. Userspace can inform the DSP
  that EOF is reached and now the DSP can start skipping padding delay.
  Also, the next write data would belong to the next track.

Sequence flow for gapless would be:

- Open
- Get caps / codec caps
- Set params
- Set metadata of the first track
- Fill data of the first track
- Trigger start
- User-space finishes sending all the data of the first track
- Indicate next track data by sending set_next_track
- Set metadata of the next track
- Call partial_drain to flush most of the buffer in the DSP
- Fill data of the next track
- DSP switches to second track

(note: order for partial_drain and write for next track can be reversed as well)

Gapless Playback SM
===================

For Gapless, we move from running state to partial drain and back, along
with setting of meta_data and signalling for next track ::


                            +----------+
    compr_drain_notify()    |          |
  +------------------------>|  RUNNING |
  |                         |          |
  |                         +----------+
  |                              |
  |                              |
  |                              | compr_next_track()
  |                              |
  |                              V
  |                         +----------+
  |    compr_set_params()   |          |
  |             +-----------|NEXT_TRACK|
  |             |           |          |
  |             |           +--+-------+
  |             |              | |
  |             +--------------+ |
  |                              |
  |                              | compr_partial_drain()
  |                              |
  |                              V
  |                         +----------+
  |                         |          |
  +-------------------------| PARTIAL_ |
                            |  DRAIN   |
                            +----------+

Not supported
=============
- Support for VoIP/circuit-switched calls is not the target of this
  API. Support for dynamic bit-rate changes would require a tight
  coupling between the DSP and the host stack, limiting power savings.

- Packet-loss concealment is not supported. This would require an
  additional interface to let the decoder synthesize data when frames
  are lost during transmission. This may be added in the future.

- Volume control/routing is not handled by this API. Devices exposing a
  compressed data interface will be considered as regular ALSA devices;
  volume changes and routing information will be provided with regular
  ALSA kcontrols.

- Embedded audio effects. Such effects should be enabled in the same
  manner, no matter if the input was PCM or compressed.

- Multichannel IEC encoding. Unclear if this is required.

- Encoding/decoding acceleration is not supported as mentioned
  above. It is possible to route the output of a decoder to a capture
  stream, or even implement transcoding capabilities.
  This routing would be enabled with ALSA kcontrols.

- Audio policy/resource management. This API does not provide any
  hooks to query the utilization of the audio DSP, nor any preemption
  mechanisms.

- No notion of underrun/overrun. Since the bytes written are compressed
  in nature and data written/read doesn't translate directly to
  rendered output in time, this API does not deal with underrun/overrun;
  these may be dealt with in a user-space library.


Credits
=======
- Mark Brown and Liam Girdwood for discussions on the need for this API
- Harsha Priya for her work on intel_sst compressed API
- Rakesh Ughreja for valuable feedback
- Sing Nallasellan, Sikkandar Madar and Prasanna Samaga for
  demonstrating and quantifying the benefits of audio offload on a
  real platform.
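

Example: Reading Timestamps
===========================
As a rough illustration of the get_timestamp description in the Design
section, the sketch below derives an average bitrate and a DSP delay
from the three counters listed there. It is a Python model under stated
assumptions, not the kernel structure: the ``Timestamp`` class, its
field names and both helper functions are hypothetical names chosen for
this example, and the sampling rate is assumed to be known from the
stream parameters.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass
   class Timestamp:
       """Hypothetical snapshot of the counters reported by get_timestamp."""
       bytes_transferred: int  # compressed bytes handed to the DSP
       samples_processed: int  # samples decoded (or encoded) so far
       samples_rendered: int   # samples actually rendered or grabbed


   def average_bitrate(ts, sampling_rate):
       """Average bits per second over the audio processed so far."""
       if ts.samples_processed == 0:
           return 0.0  # nothing processed yet, no meaningful average
       seconds = ts.samples_processed / sampling_rate
       return ts.bytes_transferred * 8 / seconds


   def dsp_delay(ts, sampling_rate):
       """Seconds of audio processed by the DSP but not yet rendered."""
       return (ts.samples_processed - ts.samples_rendered) / sampling_rate

With 160000 bytes transferred, 441000 samples processed and 436590
samples rendered at 44100 Hz, ``average_bitrate()`` gives 128000.0
bits per second and ``dsp_delay()`` gives about 0.1 seconds. The bitrate
is only an estimate, since some transferred bytes may still be waiting
to be decoded.
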