.. SPDX-License-Identifier: GPL-2.0

=======================
STM32 DMA-MDMA chaining
=======================


Introduction
------------

  This document describes the STM32 DMA-MDMA chaining feature. But before going
  further, let's introduce the peripherals involved.

  To offload data transfers from the CPU, STM32 microprocessors (MPUs) embed
  direct memory access controllers (DMA).

  STM32MP1 SoCs embed both STM32 DMA and STM32 MDMA controllers. STM32 DMA
  request routing capabilities are enhanced by a DMA request multiplexer
  (STM32 DMAMUX).

  **STM32 DMAMUX**

  STM32 DMAMUX routes any DMA request from a given peripheral to any STM32 DMA
  controller (STM32MP1 counts two STM32 DMA controllers) channel.

  **STM32 DMA**

  STM32 DMA is mainly used to implement central data buffer storage (usually in
  the system SRAM) for different peripherals. It can access external RAMs, but
  without the ability to generate convenient burst transfers ensuring the best
  load of the AXI.

  **STM32 MDMA**

  STM32 MDMA (Master DMA) is mainly used to manage direct data transfers between
  RAM data buffers without CPU intervention. It can also be used in a
  hierarchical structure that uses STM32 DMA as first level data buffer
  interfaces for AHB peripherals, while the STM32 MDMA acts as a second level
  DMA with better performance. As an AXI/AHB master, STM32 MDMA can take control
  of the AXI/AHB bus.


Principles
----------

  The STM32 DMA-MDMA chaining feature relies on the strengths of the STM32 DMA
  and STM32 MDMA controllers.

  STM32 DMA has a circular Double Buffer Mode (DBM). At each end of transaction
  (when the DMA data counter - DMA_SxNDTR - reaches 0), the memory pointers
  (configured with DMA_SxM0AR and DMA_SxM1AR) are swapped and the DMA data
  counter is automatically reloaded. This allows the SW or the STM32 MDMA to
  process one memory area while the second memory area is being filled/used by
  the STM32 DMA transfer.

  With STM32 MDMA linked-list mode, a single request initiates the data array
  (collection of nodes) to be transferred until the linked-list pointer for the
  channel is null. The channel transfer complete of the last node is the end of
  transfer, unless first and last nodes are linked to each other, in which case
  the linked-list loops on to create a circular MDMA transfer.

  STM32 MDMA has direct connections with STM32 DMA. This enables autonomous
  communication and synchronization between peripherals, thus saving CPU
  resources and bus congestion. The Transfer Complete signal of a STM32 DMA
  channel can trigger a STM32 MDMA transfer. STM32 MDMA can clear the request
  generated by the STM32 DMA by writing to its Interrupt Clear register (whose
  address is stored in MDMA_CxMAR, and bit mask in MDMA_CxMDR).

  .. table:: STM32 MDMA interconnect table with STM32 DMA

     +--------------+----------------+-----------+------------+
     | STM32 DMAMUX | STM32 DMA      | STM32 DMA | STM32 MDMA |
     | channels     | channels       | Transfer  | request    |
     |              |                | complete  |            |
     |              |                | signal    |            |
     +==============+================+===========+============+
     | Channel *0*  | DMA1 channel 0 | dma1_tcf0 | *0x00*     |
     +--------------+----------------+-----------+------------+
     | Channel *1*  | DMA1 channel 1 | dma1_tcf1 | *0x01*     |
     +--------------+----------------+-----------+------------+
     | Channel *2*  | DMA1 channel 2 | dma1_tcf2 | *0x02*     |
     +--------------+----------------+-----------+------------+
     | Channel *3*  | DMA1 channel 3 | dma1_tcf3 | *0x03*     |
     +--------------+----------------+-----------+------------+
     | Channel *4*  | DMA1 channel 4 | dma1_tcf4 | *0x04*     |
     +--------------+----------------+-----------+------------+
     | Channel *5*  | DMA1 channel 5 | dma1_tcf5 | *0x05*     |
     +--------------+----------------+-----------+------------+
     | Channel *6*  | DMA1 channel 6 | dma1_tcf6 | *0x06*     |
     +--------------+----------------+-----------+------------+
     | Channel *7*  | DMA1 channel 7 | dma1_tcf7 | *0x07*     |
     +--------------+----------------+-----------+------------+
     | Channel *8*  | DMA2 channel 0 | dma2_tcf0 | *0x08*     |
     +--------------+----------------+-----------+------------+
     | Channel *9*  | DMA2 channel 1 | dma2_tcf1 | *0x09*     |
     +--------------+----------------+-----------+------------+
     | Channel *10* | DMA2 channel 2 | dma2_tcf2 | *0x0A*     |
     +--------------+----------------+-----------+------------+
     | Channel *11* | DMA2 channel 3 | dma2_tcf3 | *0x0B*     |
     +--------------+----------------+-----------+------------+
     | Channel *12* | DMA2 channel 4 | dma2_tcf4 | *0x0C*     |
     +--------------+----------------+-----------+------------+
     | Channel *13* | DMA2 channel 5 | dma2_tcf5 | *0x0D*     |
     +--------------+----------------+-----------+------------+
     | Channel *14* | DMA2 channel 6 | dma2_tcf6 | *0x0E*     |
     +--------------+----------------+-----------+------------+
     | Channel *15* | DMA2 channel 7 | dma2_tcf7 | *0x0F*     |
     +--------------+----------------+-----------+------------+

  The STM32 DMA-MDMA chaining feature then uses a SRAM buffer. STM32MP1 SoCs
  embed three fast access static internal RAMs of various size, used for data
  storage. Due to STM32 DMA legacy (within microcontrollers), STM32 DMA
  performance is poor with DDR, while it is optimal with SRAM. Hence the SRAM
  buffer used between STM32 DMA and STM32 MDMA. This buffer is split into two
  equal periods and STM32 DMA uses one period while STM32 MDMA uses the other
  period simultaneously.
  ::

                        dma[1:2]-tcf[0:7]
                       .------------------.
         ____________  '   __________     V____________
        | STM32 DMA  |    / __|>__   \    | STM32 MDMA |
        |------------|   |  /      \  |   |------------|
        | DMA_SxM0AR |<=>|  | SRAM |  |<=>| []-[]...[] |
        | DMA_SxM1AR |   |  \______/  |   |            |
        |____________|    \____<|____/    |____________|

  STM32 DMA-MDMA chaining uses (struct dma_slave_config).peripheral_config to
  exchange the parameters needed to configure MDMA. These parameters are
  gathered into a u32 array with three values:

  * the STM32 MDMA request (which is actually the DMAMUX channel ID),
  * the address of the STM32 DMA register to clear the Transfer Complete
    interrupt flag,
  * the mask of the Transfer Complete interrupt flag of the STM32 DMA channel.
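
  As a reference, here is a minimal sketch of how these three values can be
  represented on the receiving side. The structure and field names below are
  purely illustrative (they are not part of the dmaengine API); only the three
  u32 values and their order come from the description above.
  ::

    /*
     * Illustrative layout of the u32 array passed through
     * (struct dma_slave_config).peripheral_config.
     */
    struct foo_dma_mdma_config {
            u32 request; /* STM32 DMAMUX channel ID feeding the STM32 DMA */
            u32 cmar;    /* STM32 DMA interrupt flag clear register address */
            u32 cmdr;    /* STM32 DMA channel Transfer Complete flag mask */
    };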

Device Tree updates for STM32 DMA-MDMA chaining support
--------------------------------------------------------

  **1. Allocate a SRAM buffer**

  The SRAM device tree node is defined in the SoC device tree. You can refer to
  it in your board device tree to define your SRAM pool.
  ::

    &sram {
            my_foo_device_dma_pool: dma-sram@0 {
                    reg = <0x0 0x1000>;
            };
    };

  Be careful of the start index, in case there are other SRAM consumers.
  Define your pool size strategically: for optimal chaining, STM32 DMA and
  STM32 MDMA should be able to work simultaneously, each on its own period of
  the SRAM buffer.
  If the SRAM period is greater than the expected DMA transfer, then STM32 DMA
  and STM32 MDMA will work sequentially instead of simultaneously. It is not a
  functional issue but it is not optimal.

  Don't forget to refer to your SRAM pool in your device node. You need to
  define a new property.
  ::

    &my_foo_device {
            [...]
            my_dma_pool = &my_foo_device_dma_pool;
    };

  Then get this SRAM pool in your foo driver and allocate your SRAM buffer.
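
  For instance, here is a minimal allocation sketch using the genalloc API
  (declared in linux/genalloc.h), assuming the "my_dma_pool" property defined
  above, a 4 KiB pool split into two periods, and dev being the foo driver
  device. Variable names are illustrative and error handling is reduced to the
  minimum:
  ::

    struct gen_pool *sram_pool;
    void *sram_buf;                 /* CPU address of the SRAM buffer */
    dma_addr_t sram_dma_buf;        /* DMA address of the SRAM buffer */
    size_t sram_buf_size = 0x1000;  /* matches the "reg" size above */
    size_t sram_period;

    /* Get the SRAM pool referenced by the "my_dma_pool" property */
    sram_pool = of_gen_pool_get(dev->of_node, "my_dma_pool", 0);
    if (!sram_pool)
            return -ENODEV;

    /* Allocate the SRAM buffer and retrieve its DMA address */
    sram_buf = gen_pool_dma_alloc(sram_pool, sram_buf_size, &sram_dma_buf);
    if (!sram_buf)
            return -ENOMEM;

    /* Each period is one half of the SRAM buffer */
    sram_period = sram_buf_size / 2;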

  **2. Allocate a STM32 DMA channel and a STM32 MDMA channel**

  You need to define an extra channel in your device tree node, in addition to
  the one you should already have for "classic" DMA operation.

  This new channel must be taken from STM32 MDMA channels, so the phandle of
  the DMA controller to use is the MDMA controller's one.
  ::

    &my_foo_device {
            [...]
            my_dma_pool = &my_foo_device_dma_pool;
            dmas = <&dmamux1 ...>,                 // STM32 DMA channel
                   <&mdma1 0 0x3 0x1200000a 0 0>;  // + STM32 MDMA channel
    };

  Concerning STM32 MDMA bindings:

  1. The request line number: whatever the value here, it will be overwritten
     by the MDMA driver with the STM32 DMAMUX channel ID passed through
     (struct dma_slave_config).peripheral_config.

  2. The priority level: choose Very High (0x3) so that your channel will take
     priority over the others during request arbitration.

  3. A 32bit mask specifying the DMA channel configuration: source and
     destination address increment, block transfer with 128 bytes per single
     transfer.

  4. The 32bit value specifying the register to be used to acknowledge the
     request: it will be overwritten by the MDMA driver, with the DMA channel
     interrupt flag clear register address passed through
     (struct dma_slave_config).peripheral_config.

  5. The 32bit mask specifying the value to be written to acknowledge the
     request: it will be overwritten by the MDMA driver, with the DMA channel
     Transfer Complete flag passed through
     (struct dma_slave_config).peripheral_config.


Driver updates for STM32 DMA-MDMA chaining support in foo driver
-----------------------------------------------------------------

  **0. (optional) Refactor the original sg_table if dmaengine_prep_slave_sg()**

  In case of dmaengine_prep_slave_sg(), the original sg_table can't be used as
  is. Two new sg_tables must be created from the original one: one for the
  STM32 DMA transfer (where the memory address now targets the SRAM buffer
  instead of the DDR buffer) and one for the STM32 MDMA transfer (where the
  memory address targets the DDR buffer).

  The new sg_list items must fit the SRAM period length. Here is an example
  for DMA_DEV_TO_MEM:
  ::

    /*
     * Assuming sgl and nents, respectively the initial scatterlist and its
     * length.
     * Assuming sram_dma_buf and sram_period, respectively the DMA address of
     * the memory allocated from the pool for DMA usage, and the length of the
     * period, which is half of the SRAM buffer size.
     */
    struct sg_table new_dma_sgt, new_mdma_sgt;
    struct scatterlist *s, *_sgl;
    dma_addr_t ddr_dma_buf;
    u32 new_nents = 0, len;
    int i, ret;

    /* Count the number of entries needed */
    for_each_sg(sgl, s, nents, i)
            if (sg_dma_len(s) > sram_period)
                    new_nents += DIV_ROUND_UP(sg_dma_len(s), sram_period);
            else
                    new_nents++;

    /* Create the sg table for the STM32 DMA channel */
    ret = sg_alloc_table(&new_dma_sgt, new_nents, GFP_ATOMIC);
    if (ret)
            dev_err(dev, "DMA sg table alloc failed\n");

    _sgl = sgl;
    len = sg_dma_len(_sgl);
    for_each_sg(new_dma_sgt.sgl, s, new_dma_sgt.nents, i) {
            sg_dma_len(s) = min_t(u32, len, sram_period);
            /*
             * Even items target the first half of the SRAM buffer, odd items
             * target the second half (Double Buffer Mode).
             */
            sg_dma_address(s) = sram_dma_buf;
            if (i & 1)
                    sg_dma_address(s) += sram_period;

            len -= sg_dma_len(s);
            if (!len && sg_next(_sgl)) {
                    _sgl = sg_next(_sgl);
                    len = sg_dma_len(_sgl);
            }
    }

    /* Create the sg table for the STM32 MDMA channel */
    ret = sg_alloc_table(&new_mdma_sgt, new_nents, GFP_ATOMIC);
    if (ret)
            dev_err(dev, "MDMA sg table alloc failed\n");

    _sgl = sgl;
    len = sg_dma_len(_sgl);
    ddr_dma_buf = sg_dma_address(_sgl);
    for_each_sg(new_mdma_sgt.sgl, s, new_mdma_sgt.nents, i) {
            size_t bytes = min_t(size_t, len, sram_period);

            sg_dma_len(s) = bytes;
            sg_dma_address(s) = ddr_dma_buf;
            len -= bytes;

            if (!len && sg_next(_sgl)) {
                    _sgl = sg_next(_sgl);
                    len = sg_dma_len(_sgl);
                    ddr_dma_buf = sg_dma_address(_sgl);
            } else {
                    ddr_dma_buf += bytes;
            }
    }

  Don't forget to release these new sg_tables after getting the descriptors
  with dmaengine_prep_slave_sg().
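
  For instance, a short sketch reusing the sg_table names from the example
  above, once both descriptors have been obtained (see steps 2 and 3 below):
  ::

    sg_free_table(&new_dma_sgt);
    sg_free_table(&new_mdma_sgt);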

  **1. Set controller specific parameters**

  First, use dmaengine_slave_config() with a struct dma_slave_config to
  configure the STM32 DMA channel. You just have to take care of the DMA
  addresses: the memory address (depending on the transfer direction) must
  point to your SRAM buffer. Also set
  (struct dma_slave_config).peripheral_size != 0.

  The STM32 DMA driver checks (struct dma_slave_config).peripheral_size to
  determine whether chaining is being used or not. If it is used, then the
  STM32 DMA driver fills (struct dma_slave_config).peripheral_config with an
  array of three u32: the first one containing the STM32 DMAMUX channel ID,
  the second one the channel interrupt flag clear register address, and the
  third one the channel Transfer Complete flag mask.

  Then, use dmaengine_slave_config() with another struct dma_slave_config to
  configure the STM32 MDMA channel. Take care of the DMA addresses: the device
  address (depending on the transfer direction) must point to your SRAM
  buffer, and the memory address must point to the buffer originally used for
  "classic" DMA operation. Reuse the .peripheral_size and .peripheral_config
  values that the STM32 DMA driver updated in the previous
  struct dma_slave_config to fill the same fields of the
  struct dma_slave_config used to configure the STM32 MDMA channel.
  ::

    struct dma_slave_config dma_conf;
    struct dma_slave_config mdma_conf;

    memset(&dma_conf, 0, sizeof(dma_conf));
    [...]
    dma_conf.direction = DMA_DEV_TO_MEM;
    dma_conf.dst_addr = sram_dma_buf;       // SRAM buffer
    dma_conf.peripheral_size = 1;           // peripheral_size != 0 => chaining

    dmaengine_slave_config(dma_chan, &dma_conf);

    memset(&mdma_conf, 0, sizeof(mdma_conf));
    mdma_conf.direction = DMA_DEV_TO_MEM;
    mdma_conf.src_addr = sram_dma_buf;      // SRAM buffer
    mdma_conf.dst_addr = rx_dma_buf;        // original memory buffer
    mdma_conf.peripheral_size = dma_conf.peripheral_size;     // <- dma_conf
    mdma_conf.peripheral_config = dma_conf.peripheral_config; // <- dma_conf

    dmaengine_slave_config(mdma_chan, &mdma_conf);

  **2. Get a descriptor for STM32 DMA channel transaction**

  In the same way you get your descriptor for your "classic" DMA operation,
  you just have to replace the original sg_list (in case of
  dmaengine_prep_slave_sg()) with the new sg_list using the SRAM buffer, or to
  replace the original buffer address, length and period (in case of
  dmaengine_prep_dma_cyclic()) with the new SRAM buffer.

  **3. Get a descriptor for STM32 MDMA channel transaction**

  If you previously got the descriptor (for STM32 DMA) with

  * dmaengine_prep_slave_sg(), then use dmaengine_prep_slave_sg() for
    STM32 MDMA;
  * dmaengine_prep_dma_cyclic(), then use dmaengine_prep_dma_cyclic() for
    STM32 MDMA.

  Use the new sg_list targeting the original DDR buffer (in case of
  dmaengine_prep_slave_sg()) or, depending on the transfer direction, either
  the original DDR buffer (in case of DMA_DEV_TO_MEM) or the SRAM buffer (in
  case of DMA_MEM_TO_DEV), the source address having been set previously with
  dmaengine_slave_config().

  **4. Submit both transactions**

  Before submitting your transactions, you may need to define on which
  descriptor you want a callback to be called at the end of the transfer
  (dmaengine_prep_slave_sg()) or at each period (dmaengine_prep_dma_cyclic()).
  Depending on the direction, set the callback on the descriptor that finishes
  the overall transfer:

  * DMA_DEV_TO_MEM: set the callback on the "MDMA" descriptor
  * DMA_MEM_TO_DEV: set the callback on the "DMA" descriptor

  Then, submit the descriptors in any order, with dmaengine_submit().

  **5. Issue pending requests (and wait for callback notification)**

  As the STM32 MDMA channel transfer is triggered by STM32 DMA, you must issue
  the STM32 MDMA channel before the STM32 DMA channel.

  If set, your callback will be called to notify you about the end of the
  overall transfer or the period completion.

  Don't forget to terminate both channels. The STM32 DMA channel is configured
  in cyclic Double-Buffer mode, so it won't be disabled by HW: you need to
  terminate it. The STM32 MDMA channel will be stopped by HW in case of sg
  transfer, but not in case of cyclic transfer. You can terminate it whatever
  the kind of transfer.
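
  Putting steps 2 to 5 together for the DMA_DEV_TO_MEM sg case, here is a
  minimal sketch. It reuses the illustrative names from the previous examples
  (dma_chan, mdma_chan, new_dma_sgt, new_mdma_sgt); foo_dma_callback and foo
  are hypothetical driver symbols, and error handling is omitted:
  ::

    struct dma_async_tx_descriptor *dma_desc, *mdma_desc;

    /* Step 2: STM32 DMA descriptor, sg_list targeting the SRAM buffer */
    dma_desc = dmaengine_prep_slave_sg(dma_chan, new_dma_sgt.sgl,
                                       new_dma_sgt.nents, DMA_DEV_TO_MEM,
                                       DMA_PREP_INTERRUPT);

    /* Step 3: STM32 MDMA descriptor, sg_list targeting the DDR buffer */
    mdma_desc = dmaengine_prep_slave_sg(mdma_chan, new_mdma_sgt.sgl,
                                        new_mdma_sgt.nents, DMA_DEV_TO_MEM,
                                        DMA_PREP_INTERRUPT);

    /* Step 4: DMA_DEV_TO_MEM, so the callback goes on the MDMA descriptor */
    mdma_desc->callback = foo_dma_callback;
    mdma_desc->callback_param = foo;

    dmaengine_submit(dma_desc);
    dmaengine_submit(mdma_desc);

    /*
     * Step 5: issue the STM32 MDMA channel first, as it is triggered by the
     * STM32 DMA Transfer Complete signal.
     */
    dma_async_issue_pending(mdma_chan);
    dma_async_issue_pending(dma_chan);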

  **STM32 DMA-MDMA chaining DMA_MEM_TO_DEV special case**

  STM32 DMA-MDMA chaining in DMA_MEM_TO_DEV is a special case. Indeed, the
  STM32 MDMA feeds the SRAM buffer with the DDR data, and the STM32 DMA reads
  data from the SRAM buffer. So some data (the first period) has to be copied
  into the SRAM buffer before the STM32 DMA starts to read.

  A trick could be to pause the STM32 DMA channel (which will raise a Transfer
  Complete signal, triggering the STM32 MDMA channel), but the first data read
  by the STM32 DMA could be "wrong". The proper way is to prepare the first
  SRAM period with dmaengine_prep_dma_memcpy(). Then this first period should
  be "removed" from the sg or the cyclic transfer.

  Due to this complexity, rather use the STM32 DMA-MDMA chaining for
  DMA_DEV_TO_MEM and keep the "classic" DMA usage for DMA_MEM_TO_DEV, unless
  you're not afraid.

Resources
---------

  Application note, datasheet and reference manual are available on the ST
  website (STM32MP1_).

  A dedicated focus is provided by three application notes (AN5224_, AN4031_
  and AN5001_), dealing with STM32 DMAMUX, STM32 DMA and STM32 MDMA
  respectively.

.. _STM32MP1: https://www.st.com/en/microcontrollers-microprocessors/stm32mp1-series.html
.. _AN5224: https://www.st.com/resource/en/application_note/an5224-stm32-dmamux-the-dma-request-router-stmicroelectronics.pdf
.. _AN4031: https://www.st.com/resource/en/application_note/dm00046011-using-the-stm32f2-stm32f4-and-stm32f7-series-dma-controller-stmicroelectronics.pdf
.. _AN5001: https://www.st.com/resource/en/application_note/an5001-stm32cube-expansion-package-for-stm32h7-series-mdma-stmicroelectronics.pdf

:Authors:

- Amelie Delaunay <amelie.delaunay@foss.st.com>