#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License, Version 1.0 only
# (the "License"). You may not use this file except in compliance
# with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
/*
 * Copyright 2003 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 */

#pragma ident "%Z%%M% %I% %E% SMI"

                        SOLARIS USB BANDWIDTH ANALYSIS

1.Introduction

        This document discusses the USB bandwidth allocation scheme and the
        protocol overheads used by both the full speed and the high speed
        host controller drivers. This information is derived from the USB 2.0
        specification, the "Bandwidth Analysis Whitepaper" posted on
        www.usb.org, and other resources.

        The target audience for this whitepaper is USB software and hardware
        designers and engineers, and other interested people. The reader
        should be familiar with the Universal Serial Bus Specification
        version 2.0, the OpenHCI Specification 1.0a and the Enhanced HCI
        Specification 1.0.

2.Full speed bus

        The following overheads, formulas and scheme apply both to full speed
        host controllers and to high speed hub Transaction Translators (TT),
        which perform full/low speed transactions.

        o Timing and data rate calculations

          - Timing calculations

                1 sec                   1000 ms or 1000000000 ns
                1 ms                    1 frame

          - Data rate calculations

                1 ms                    1500 bytes or 12000 bits (per frame)
                668 ns                  1 byte or 8 bits

                1 full speed bit time   83.54 ns

        o Protocol Overheads and Bandwidth numbers

          - Protocol Overheads

            (Refer to section 5.11.3 of the USB 2.0 specification and page 2
            of the USB Bandwidth Analysis document)

                Non Isochronous         9107 ns                 14 bytes
                Isochronous Output      6265 ns                 10 bytes
                Isochronous Input       7268 ns                 11 bytes
                Low-speed overhead      64060 ns                97 bytes
                Hub LS overhead*        668 ns                  1 byte
                SOF                     4010 ns                 6 bytes
                EOF                     2673 ns                 4 bytes

                Host Delay*             Specific to hardware    18 bytes
                Low-Speed clock*        Slower than Full speed  8 times

          - Bandwidth numbers

            (Refer to section 7.3.5 of the OHCI specification 1.0a and page 2
            of the USB Bandwidth Analysis document)

                Maximum bandwidth available         1500 bytes/frame
                Maximum Non Periodic bandwidth      197 bytes/frame
                Maximum Periodic bandwidth          1293 bytes/frame

            NOTE:

                1.Hub specific low speed overhead

                  This is the time the host controller provides for hubs to
                  enable low speed ports. It is a minimum of 4 full speed bit
                  times.

                  overhead = 2 x Hub_LS_Setup
                           = 2 x (4 x 83.54) = 668.32 nanoseconds = 1 byte.

                2.Host delay is specific to the particular hardware. The
                  following host delay is for the RIO USB OHCI host
                  controller (provided by Ken Ward, RIO USB hardware
                  person). It is just an example of how to calculate the
                  "host delay" for a given USB host controller
                  implementation.

                  Ex: Assuming ED (Endpoint Descriptor)/TD (Transfer
                  Descriptor) accesses are not streaming in Schizo (PCI
                  bridge) and there are no cache hits for an ED or TD:

                  To read an ED, a TD or data:

                        PCI_ARB_DELAY + PCI_ADDRESS + SCHIZO_RETRY +
                        PCI_ARB_DELAY + PCI_ADDRESS + SCHIZO_TRDY +
                        DATA + Core_overhead

                  Where,

                        PCI_ARB_DELAY = 2000ns
                        PCI_ADDRESS   = 30ns
                        SCHIZO_RETRY  = 60ns
                        SCHIZO_TRDY   = 60ns
                        DATA          = 240ns (Always read 64 bytes ...)
                        Core_overhead = 240 + 30 * (MPS/4) + 83.54 * (MPS/4)
                                        + 4 * 83.54
                                      = ~3400ns

                  Now multiply by 3 for ED+TD+DATA = 10200ns = ~128 bits
                  or 16 bytes.

                  This is probably on the optimistic side, since it only
                  uses 2us for the PCI_ARB_DELAY.

                  If there is a USB cache hit, the time it takes for an ED
                  or TD is:

                        CORE SYNC DELAY + CACHE_HIT CHECK + 30 * (MPS/4) +
                        CORE OVERHEAD

                        240 + 30 + 120 + 1000ns = ~1400ns, or ~2 bytes

                  Total Host delay will be 16 + 2 = 18 bytes.

                3.The Low-Speed clock is eight times slower than full speed,
                  i.e. 1/8th of the full speed clock.

                4.For non-periodic transfers, reserve bandwidth for at least
                  one low-speed device transaction per frame. According to
                  the USB Bandwidth Analysis white paper and the OHCI
                  Specification 1.0a, section 7.3.5, page 123, one low-speed
                  transaction takes 0x628 full speed bits (197 bytes), which
                  is around 13% of the USB frame time.

                5.Maximum Periodic bandwidth is calculated using the
                  following formula

                  Maximum Periodic bandwidth = Maximum bandwidth available
                        - SOF - EOF - Maximum Non Periodic bandwidth.

                  i.e. 1500 - 6 - 4 - 197 = 1293 bytes/frame.
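        As a quick check of the arithmetic in notes 4 and 5, the full speed
        frame budget can be written down as C constants. This is a minimal
        sketch. The macro and function names (FS_MAX_BANDWIDTH,
        fs_non_periodic_percent and so on) are hypothetical and are not
        taken from the Solaris OHCI driver.

```c
/* Full speed budget in bytes per 1 ms frame, from the tables above. */
#define FS_MAX_BANDWIDTH     1500    /* 12000 bits per frame */
#define FS_SOF_OVERHEAD      6
#define FS_EOF_OVERHEAD      4

/* Non periodic reserve: one low speed transaction of 0x628 full speed bits. */
#define FS_LS_TRANS_BITS     0x628
#define FS_MAX_NON_PERIODIC  (FS_LS_TRANS_BITS / 8)    /* 197 bytes */

/* Maximum Periodic = available - SOF - EOF - Maximum Non Periodic. */
#define FS_MAX_PERIODIC      (FS_MAX_BANDWIDTH - FS_SOF_OVERHEAD - \
                                 FS_EOF_OVERHEAD - FS_MAX_NON_PERIODIC)

/* Share of the frame kept for non periodic traffic, in whole percent. */
static int
fs_non_periodic_percent(void)
{
        return ((FS_MAX_NON_PERIODIC * 100) / FS_MAX_BANDWIDTH);
}
```

        With these values FS_MAX_NON_PERIODIC is 197, FS_MAX_PERIODIC is
        1293 and fs_non_periodic_percent() returns 13, matching the table.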

        o Bus Transaction Formulas

          (Refer to section 5.11.3 of the USB 2.0 specification)

          - Full-Speed:

                Protocol overhead + ((MaxPacketSize * 7) / 6) + Host_Delay

          - Low-Speed:

                Protocol overhead + Hub LS overhead +
                (Low-Speed clock * ((MaxPacketSize * 7) / 6)) + Host_Delay

        o Periodic Schedule

          Figure 5.5 in the OHCI specification 1.0a shows the periodic
          schedule, the different polling intervals that are supported, and
          other details for the OHCI host controller.

          - The host controller processes one interrupt endpoint descriptor
            list every frame. The lower five bits of the current frame number
            are used as an index into an array of 32 interrupt endpoint
            descriptor lists, or periodic frame lists, found in the HCCA (Host
            Controller Communication Area). This means each list is revisited
            once every 32ms. The host controller driver sets up the interrupt
            lists to visit any given endpoint descriptor in as many lists as
            necessary to provide the interrupt granularity required for that
            endpoint. See figure 5.5 in OHCI specification 1.0a.

          - Isochronous endpoint descriptors are added at the end of the 1ms
            interrupt endpoint descriptors.

          - The host controller driver maintains an array of 32 frame
            bandwidth lists to save the bandwidth allocated in each USB frame.

          Please refer to section 5.2.7.2 of OHCI specification 1.0a, page 61
          for more details.

        o Bandwidth Allocation Scheme

          The OHCI host controller driver goes through the following steps to
          allocate the bandwidth needed for an interrupt or isochronous
          endpoint:

          - Calculate the bandwidth required for the given endpoint using the
            bus transaction formula and protocol overhead calculations
            mentioned in the previous section.

          - Compare the bandwidth available in the least allocated frame list
            out of the 32 frame bandwidth lists against the bandwidth required
            by this endpoint. If the required bandwidth exceeds the limit,
            return an error.

          - Find the static node to which the given endpoint needs to be
            linked so that it will be polled at the required polling
            interval. This value varies based on the polling interval and the
            current bandwidth load on the schedule. See figure 5.5 in OHCI
            specification 1.0a.

            Ex: If the polling interval is 4ms, the endpoint will be linked
            to one of the four static nodes (range 3-6) in the 4ms column of
            figure 5.5 in OHCI specification 1.0a.

          - Depending on the polling interval, add the calculated bandwidth
            to one or more frame bandwidth lists. Before adding, double check
            the availability of bandwidth in those lists. If the limit would
            be exceeded, return an error. Otherwise, add this bandwidth to all
            the required frame bandwidth lists.

            Ex: Assume a polling interval of 4ms and a static node value of
            3. In this case, add the required bandwidth to frame bandwidth
            lists 0,4,8,12,16,20,24 and 28.
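        The full speed formulas and allocation steps above can be sketched in
        C as follows. This is an illustration under stated assumptions, not
        the driver's code. fs_bus_time(), fs_least_allocated(),
        fs_pick_offset(), fs_allocate() and the frame_bw array are
        hypothetical names. All values are bytes from the tables above. The
        low speed case assumes the "Low-speed overhead" entry (97 bytes) is
        the protocol overhead. The static node of figure 5.5 is reduced to a
        starting frame offset, chosen as the offset whose busiest frame list
        is least loaded; that is one possible heuristic, not necessarily the
        driver's.

```c
#include <stdbool.h>

#define NUM_FRAME_LISTS      32
#define FS_MAX_PERIODIC_BW   1293    /* bytes per frame */
#define HUB_LS_OVERHEAD      1       /* bytes */
#define LS_CLOCK_FACTOR      8       /* low speed is 8 times slower */

/* Bytes already allocated in each of the 32 frame bandwidth lists. */
static int frame_bw[NUM_FRAME_LISTS];

/*
 * Bus transaction time in bytes. (mps * 7) / 6 is the worst case bit
 * stuffing allowance; integer division truncates.
 */
static int
fs_bus_time(int overhead, int mps, int host_delay, bool low_speed)
{
        int data = (mps * 7) / 6;

        if (low_speed)
                return (overhead + HUB_LS_OVERHEAD +
                    LS_CLOCK_FACTOR * data + host_delay);
        return (overhead + data + host_delay);
}

/* Allocation in the least allocated frame list (most bandwidth left). */
static int
fs_least_allocated(void)
{
        int i, min = frame_bw[0];

        for (i = 1; i < NUM_FRAME_LISTS; i++)
                if (frame_bw[i] < min)
                        min = frame_bw[i];
        return (min);
}

/* Heuristic: pick the offset whose busiest frame list is least loaded. */
static int
fs_pick_offset(int interval)
{
        int off, i, best = 0, best_max = -1;

        for (off = 0; off < interval; off++) {
                int max = 0;

                for (i = off; i < NUM_FRAME_LISTS; i += interval)
                        if (frame_bw[i] > max)
                                max = frame_bw[i];
                if (best_max < 0 || max < best_max) {
                        best_max = max;
                        best = off;
                }
        }
        return (best);
}

/*
 * Reserve bw bytes in every interval-th frame list starting at offset
 * (interval is 1, 2, 4, 8, 16 or 32 ms). Returns 0, or -1 on failure.
 */
static int
fs_allocate(int bw, int interval, int offset)
{
        int i;

        /* Early check against the least allocated frame list. */
        if (fs_least_allocated() + bw > FS_MAX_PERIODIC_BW)
                return (-1);

        /* Double check every list first so a failure changes nothing. */
        for (i = offset; i < NUM_FRAME_LISTS; i += interval)
                if (frame_bw[i] + bw > FS_MAX_PERIODIC_BW)
                        return (-1);

        for (i = offset; i < NUM_FRAME_LISTS; i += interval)
                frame_bw[i] += bw;
        return (0);
}
```

        For example, a full speed interrupt endpoint with a MaxPacketSize of
        8 needs fs_bus_time(14, 8, 18, false) = 14 + 9 + 18 = 41 bytes, and
        fs_allocate(41, 4, 0) adds 41 bytes to frame lists 0,4,8,...,28, as
        in the 4ms example. Checking every list before adding means a failed
        request leaves the lists unchanged.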


3.High speed bus

        o Timing and data rate calculations

          - Timing calculations

                1 sec                   1000 ms
                125 us                  1 uframe
                1 ms                    1 frame or 8 uframes

          - Data rate calculations

                125 us                  7500 bytes (per uframe)
                16.66 ns                1 byte or 8 bits

                1 high speed bit time   2.083 ns

        o Protocol Overheads and Bandwidth numbers

          - Protocol Overheads

            (Refer to sections 5.11.3, 8.4.2.2 and 8.4.2.3 of the USB 2.0
            specification)

                Non Isochronous             917 ns                  55 bytes
                Isochronous                 634 ns                  38 bytes

                Start split overhead        67 ns                   4 bytes
                Complete split overhead     67 ns                   4 bytes

                SOF                         200 ns                  12 bytes
                EOF                         1667 ns                 70 bytes

                Host Delay*                 Specific to hardware    18 bytes

          - Bandwidth numbers

            (Refer to section 5.5.4 of the USB 2.0 specification)

                Maximum bandwidth available         7500 bytes/uframe
                Maximum Non Periodic bandwidth*     1500 bytes/uframe
                Maximum Periodic bandwidth*         5918 bytes/uframe

            NOTE:

                1.Host delay is specific to the particular hardware.

                2.As per USB 2.0 specification section 5.5.4, 20% of the bus
                  time is reserved for non-periodic high-speed transfers,
                  whereas periodic high-speed transfers get 80% of the bus
                  time. In one micro-frame (125us) we can transfer 7500 bytes
                  or 60,000 bits, so 20% of 7500 is 1500 bytes.

                3.Maximum Periodic bandwidth is calculated using the
                  following formula

                  Maximum Periodic bandwidth = Maximum bandwidth available
                        - SOF - EOF - Maximum Non Periodic bandwidth.

                  i.e. 7500 - 12 - 70 - 1500 = 5918 bytes/uframe.
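        The high speed budget in notes 2 and 3 can be checked the same way,
        starting from the 480 Mb/s bit rate and 8000 micro frames per second.
        A minimal sketch; the macro names are hypothetical.

```c
/* High speed budget in bytes per 125 us micro frame. */
#define HS_BITS_PER_SEC      480000000L
#define HS_UFRAMES_PER_SEC   8000L           /* one micro frame every 125 us */
#define HS_BITS_PER_UFRAME   (HS_BITS_PER_SEC / HS_UFRAMES_PER_SEC)
#define HS_BYTES_PER_UFRAME  (HS_BITS_PER_UFRAME / 8)

#define HS_SOF_OVERHEAD      12
#define HS_EOF_OVERHEAD      70

/* USB 2.0 section 5.5.4: periodic gets at most 80%, so keep 20% aside. */
#define HS_MAX_NON_PERIODIC  ((HS_BYTES_PER_UFRAME * 20) / 100)

#define HS_MAX_PERIODIC      (HS_BYTES_PER_UFRAME - HS_SOF_OVERHEAD - \
                                 HS_EOF_OVERHEAD - HS_MAX_NON_PERIODIC)
```

        HS_BITS_PER_UFRAME evaluates to 60000, HS_BYTES_PER_UFRAME to 7500,
        HS_MAX_NON_PERIODIC to 1500 and HS_MAX_PERIODIC to 5918.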

        o Bus Transaction Formulas

          (Refer to sections 5.11.3, 8.4.2.2 and 8.4.2.3 of the USB 2.0
          specification)

          - High-Speed (Non-Split transactions):

                (Protocol overhead + ((MaxPacketSize * 7) / 6) +
                Host_Delay) x Number of transactions per micro-frame

          - High-Speed (Split transaction - Device to Host):

                Start Split transaction:

                        Protocol overhead + Host_Delay + Start split overhead

                Complete Split transaction:

                        Protocol overhead + ((MaxPacketSize * 7) / 6) +
                        Host_Delay + Complete split overhead

          - High-Speed (Split transaction - Host to Device):

                Start Split transaction:

                        Protocol overhead + ((MaxPacketSize * 7) / 6) +
                        Host_Delay + Start split overhead

                Complete Split transaction:

                        Protocol overhead + Host_Delay + Complete split overhead


        o Interrupt schedule or Start and Complete split masks

          (Refer to sections 3.6.2 and 4.12.2 of the EHCI 1.0 specification)

          - Interrupt schedule or Start split mask

            This field is used for high, full and low speed USB device
            interrupt and isochronous endpoints. It tells the host controller
            in which micro-frame of a given USB frame to initiate a high speed
            interrupt or isochronous transaction. For full/low speed devices,
            it tells the host controller when to initiate a "start split"
            transaction.

            ehci_start_split_mask[15] = {       /* One byte field */
                /*
                 * For all low/full speed devices, and for high speed
                 * devices with a polling interval greater than or equal
                 * to 8 micro frames (8 x 125us = 1ms).
                 */
                0x01,   /* 00000001 */
                0x02,   /* 00000010 */
                0x04,   /* 00000100 */
                0x08,   /* 00001000 */
                0x10,   /* 00010000 */
                0x20,   /* 00100000 */
                0x40,   /* 01000000 */
                0x80,   /* 10000000 */

                /*
                 * For high speed devices with a polling interval of 4
                 * micro frames.
                 */
                0x11,   /* 00010001 */
                0x22,   /* 00100010 */
                0x44,   /* 01000100 */
                0x88,   /* 10001000 */

                /* For high speed devices with a polling interval of 2 micro frames. */
                0x55,   /* 01010101 */
                0xaa,   /* 10101010 */

                /* For high speed devices with a polling interval of 1 micro frame. */
                0xff }; /* 11111111 */

          - Complete split mask

            This field is used only for full/low speed USB device interrupt
            and isochronous endpoints. It tells the host controller in which
            micro frame to initiate a "complete split" transaction. Complete
            split transactions can be retried up to 3 times, so bandwidth for
            the complete split transaction is reserved in 3 consecutive micro
            frames.

            ehci_complete_split_mask[8] = {     /* One byte field */
                /* Only full/low speed devices */
                0x0e,           /* 00001110 */
                0x1c,           /* 00011100 */
                0x38,           /* 00111000 */
                0x70,           /* 01110000 */
                0xe0,           /* 11100000 */
                Reserved,       /* Need FSTN feature */
                Reserved,       /* Need FSTN feature */
                Reserved };     /* Need FSTN feature */

        o Periodic Schedule

          Figure 4.8 in the EHCI specification shows the periodic schedule,
          the different polling intervals that are supported, and other
          details for the EHCI host controller.

          - The high speed host controller can support 256, 512 or 1024
            periodic frame lists, and 1024 is the default for all host
            controllers. Our implementation supports 1024 frame lists: it
            first constructs 32 periodic frame lists and then duplicates them
            32 times (32 x 32 = 1024). See figure 4.8 in the EHCI
            specification.

          - The host controller traverses the periodic schedule by
            constructing an array offset reference from the PERIODICLISTBASE
            and FRINDEX registers. It fetches the element and begins
            traversing the graph of linked schedule data structures. See
            figure 4.8 in the EHCI specification.

          - The host controller processes one interrupt endpoint descriptor
            list every micro frame (125us). This means the same list is
            revisited 8 times in a frame.

          - The host controller driver sets up the interrupt lists to visit
            any given endpoint descriptor in as many lists as necessary to
            provide the interrupt granularity required for that endpoint.

          - For isochronous transfers, we use only transfer descriptors, not
            endpoint descriptors as in OHCI. Transfer descriptors are added
            at the beginning of the periodic schedule.

          - For EHCI, the bandwidth requirement depends on the USB device
            speed:

            For a high speed USB device, you only need high speed bandwidth.
            For a full/low speed device connected through a high speed hub,
            you need both high speed bandwidth and TT (transaction
            translator) bandwidth.

            High speed bandwidth information is saved in an EHCI data
            structure, and TT bandwidth is saved in the high speed hub's USB
            device data structure. Each TT acts as a full speed host
            controller, and its bandwidth allocation scheme, overhead
            calculations and other details are similar to those of a full
            speed host controller. Refer to the "Full speed bus" section for
            more details.

          - The EHCI host controller driver maintains an array of 32 frame
            lists to store the high speed bandwidth allocated in each frame.
            Each frame list also has eight micro frame lists, which save the
            bandwidth allocated in each micro frame of that frame.

        o Bandwidth Allocation Scheme

          (Refer to sections 3.6.2 and 4.12.2 of the EHCI 1.0 specification)

          High speed Non Split Transaction (for high speed devices only):

          For a given high speed interrupt or isochronous endpoint, the EHCI
          host controller driver goes through the following steps to
          allocate the bandwidth needed for this endpoint.

          - Calculate the bandwidth required for the given endpoint using the
            formula and overhead calculations mentioned in the previous
            section.

          - Compare the bandwidth available in the least allocated frame list
            out of the 32 frame lists against the bandwidth required by this
            endpoint. If the required bandwidth exceeds the limit, return an
            error.

          - Map the high speed endpoint's polling interval, expressed in
            micro frames, to an interrupt list path based on a millisecond
            value. For example, an endpoint with a polling interval of 16
            micro frames (16 x 125us = 2ms) maps to an interrupt list path of
            2ms.

          - Find the static node to which the given endpoint needs to be
            linked so that it will be polled at its required polling
            interval. This varies based on the polling interval and the
            current bandwidth load on the schedule.

            Ex: If the polling interval is 32 micro frames, its corresponding
            frame polling interval is 4ms, and the endpoint will be linked to
            one of the four static nodes (range 3-6) in the 4ms column of
            figure 4.8 in the EHCI specification.

          - Depending on the polling interval, add the calculated bandwidth
            to one or more frame bandwidth lists, and also to one or more
            micro frame bandwidth lists of each of those frame bandwidth
            lists. Before adding, double check the availability of bandwidth
            in those lists. If the needed bandwidth is not available, return
            an error. Otherwise, add this bandwidth to all the required frame
            and micro frame lists.

            Ex: Assume the endpoint's polling interval is 32 micro frames and
            the static node value is 3. In this case, add the required
            bandwidth to frame bandwidth lists 0,4,8,12,16,20,24 and 28; the
            micro frame bandwidth information is saved using the
            ehci_start_split_mask array. Since the polling interval is at
            least 8 micro frames, any one of the first 8 entries can be used
            to save the micro frame bandwidth.
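        The non split steps above can be sketched in C under a few
        assumptions. The names hs_bus_time(), hs_interval_to_ms(),
        hs_smask(), hs_allocate() and uframe_bw are hypothetical, and
        start_split_mask is simply a copy of the ehci_start_split_mask table.
        Polling intervals are assumed to be powers of two in micro frames.
        Each micro frame is limited to the 5918 byte maximum periodic
        bandwidth. The static node is reduced to a starting frame offset and
        the micro frame position to an index within the matching group of
        mask entries. The early "least allocated frame list" check is left
        out for brevity.

```c
#include <stdint.h>

#define NUM_FRAME_LISTS      32
#define UFRAMES_PER_FRAME    8
#define HS_MAX_PERIODIC_BW   5918    /* bytes per micro frame */

/* Copy of the ehci_start_split_mask table above. */
static const uint8_t start_split_mask[15] = {
        0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, /* >= 8 micro frames */
        0x11, 0x22, 0x44, 0x88,                         /* every 4 micro frames */
        0x55, 0xaa,                                     /* every 2 micro frames */
        0xff                                            /* every micro frame */
};

/* Bytes reserved in each micro frame of each of the 32 frame lists. */
static int uframe_bw[NUM_FRAME_LISTS][UFRAMES_PER_FRAME];

/* Non split bus time in bytes: (overhead + (MPS * 7) / 6 + host delay) x n. */
static int
hs_bus_time(int overhead, int mps, int host_delay, int ntrans)
{
        return ((overhead + (mps * 7) / 6 + host_delay) * ntrans);
}

/* Map a polling interval in micro frames to an interrupt list path in ms. */
static int
hs_interval_to_ms(int uframes)
{
        int ms = uframes / UFRAMES_PER_FRAME;

        if (ms < 1)
                return (1);     /* polled every frame, several times per frame */
        return (ms > NUM_FRAME_LISTS ? NUM_FRAME_LISTS : ms);
}

/* Select a start split mask entry; uoffset picks an entry within the group. */
static uint8_t
hs_smask(int uframes, int uoffset)
{
        if (uframes >= UFRAMES_PER_FRAME)
                return (start_split_mask[uoffset % 8]);
        if (uframes == 4)
                return (start_split_mask[8 + uoffset % 4]);
        if (uframes == 2)
                return (start_split_mask[12 + uoffset % 2]);
        return (start_split_mask[14]);
}

/*
 * Reserve bw bytes in every micro frame set in the start split mask, in
 * every ms-th frame list starting at frame_offset. Returns 0 on success
 * or -1 if any of those micro frames lacks the bandwidth.
 */
static int
hs_allocate(int bw, int uframes, int frame_offset, int uoffset)
{
        int ms = hs_interval_to_ms(uframes);
        uint8_t smask = hs_smask(uframes, uoffset);
        int f, u;

        /* Check every affected micro frame first; nothing changes on failure. */
        for (f = frame_offset; f < NUM_FRAME_LISTS; f += ms)
                for (u = 0; u < UFRAMES_PER_FRAME; u++)
                        if ((smask & (1 << u)) &&
                            uframe_bw[f][u] + bw > HS_MAX_PERIODIC_BW)
                                return (-1);

        for (f = frame_offset; f < NUM_FRAME_LISTS; f += ms)
                for (u = 0; u < UFRAMES_PER_FRAME; u++)
                        if (smask & (1 << u))
                                uframe_bw[f][u] += bw;
        return (0);
}
```

        For a 32 micro frame endpoint with a MaxPacketSize of 64 and one
        transaction per micro frame, hs_bus_time(55, 64, 18, 1) = 55 + 74 +
        18 = 147 bytes, and hs_allocate(147, 32, 0, 0) reserves it in micro
        frame 0 (mask 0x01) of frame lists 0,4,8,...,28.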

          High speed split transactions (for full and low speed devices only):

          For a given full/low speed interrupt or isochronous endpoint, we
          need both high speed and TT bandwidth. The TT bandwidth allocation
          is the same as the full speed bus bandwidth allocation. Please
          refer to the "Full speed bus" bandwidth allocation section for more
          details.

          The EHCI driver goes through the following steps to allocate the
          high speed bandwidth needed for this full/low speed endpoint.

          - Calculate the bandwidth required for the given endpoint using the
            formula and overhead calculations mentioned in the previous
            section. In this case, calculate the bandwidth needed for the
            Start and Complete split transactions separately.

          - Compare the bandwidth available in the least allocated frame list
            out of the 32 frame lists against the bandwidth required by this
            endpoint. If the required bandwidth exceeds the limit, return an
            error.

          - Find the static node to which the given endpoint needs to be
            linked so that it will be polled at the required polling
            interval. This value varies based on the polling interval and the
            current bandwidth load on the schedule.

            Ex: If the polling interval is 4ms, the endpoint will be linked
            to one of the four static nodes (range 3-6) in the 4ms column of
            figure 4.8 in the EHCI specification.

          - Depending on the polling interval, add the calculated Start and
            Complete split transaction bandwidth to one or more frame
            bandwidth lists, and also to one or more micro frame bandwidth
            lists of each of those frame bandwidth lists. In this case, the
            Start split transaction needs bandwidth in one micro frame,
            whereas the Complete split transaction needs bandwidth in the
            next three micro frames of that frame or the next frame. Before
            adding, double check the availability of bandwidth in those
            lists.
            If the needed bandwidth is not available, return an error.
            Otherwise, add this bandwidth to all the required lists.

            Ex: Assume the polling interval is 4ms and the static node value
            is 3. In this case, add the required Start and Complete split
            bandwidth to frame bandwidth lists 0,4,8,12,16,20,24 and 28. The
            micro frame bandwidth is stored using the ehci_start_split_mask
            and ehci_complete_split_mask arrays. In this case, any of the
            first 8 entries can be used to save the micro frame bandwidth
            (entries 5 to 7 of ehci_complete_split_mask are reserved until
            the FSTN feature is supported).

            Assume we found that the first micro frame of frame lists
            0,4,8,12,16,20,24 and 28 can be used for this endpoint. This
            means the "start split transaction" is initiated in the first
            micro frame of frames 0,4,8,12,16,20,24 and 28.

                Start split mask = 0x01,        /* 00000001 */

            For this "start split mask", the "complete split mask" should be

                Complete split mask = 0x0e,     /* 00001110 */

            This means the "complete split transactions" are tried in the
            second, third or fourth micro frames of frames 0,4,8,12,16,20,24
            and 28.

4.Reference

        - USB 2.0, OHCI and EHCI Specifications

          http://www.usb.org/developers/docs

        - USB bandwidth analysis from Intel

          http://www.usb.org/developers/whitepapers