#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
/*
 * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */


                    SOLARIS USB BANDWIDTH ANALYSIS

1.Introduction

  This document discusses the USB bandwidth allocation scheme and the protocol
  overheads used for both full and high speed host controller drivers. This
  information is derived from the USB 2.0 specification, the "Bandwidth Analysis
  Whitepaper" which is posted on www.usb.org, and other resources.

  The target audience for this whitepaper is USB software and hardware
  designers and engineers, and other interested people. The reader should be
  familiar with the Universal Serial Bus Specification version 2.0, the
  OpenHCI Specification 1.0a and the Enhanced HCI Specification 1.0.

2.Full speed bus

  The following overheads, formulas and scheme apply both to full speed
  host controllers and to high speed hub Transaction Translators (TT),
  which perform full/low speed transactions.

  o Timing and data rate calculations

    - Timing calculations

        1 sec                   1000 ms or 1000000000 ns
        1 ms                    1 frame

    - Data rate calculations

        1 ms                    1500 bytes or 12000 bits (per frame)
        668 ns                  1 byte or 8 bits

        1 full speed bit time   83.54 ns

  o Protocol Overheads and Bandwidth numbers

    - Protocol Overheads

      (Refer to section 5.11.3 of the USB 2.0 specification and page 2 of the
      USB Bandwidth Analysis document)

        Non Isochronous          9107 ns                  14 bytes
        Isochronous Output       6265 ns                  10 bytes
        Isochronous Input        7268 ns                  11 bytes
        Low-speed overhead      64060 ns                  97 bytes
        Hub LS overhead*          668 ns                   1 byte
        SOF                      4010 ns                   6 bytes
        EOF                      2673 ns                   4 bytes

        Host Delay*             Specific to hardware      18 bytes
        Low-Speed clock*        Slower than Full speed     8x

    - Bandwidth numbers

      (Refer to section 7.3.5 of the OHCI specification 1.0a and page 2 of the
      USB Bandwidth Analysis document)

        Maximum bandwidth available       1500 bytes/frame
        Maximum Non Periodic bandwidth     197 bytes/frame
        Maximum Periodic bandwidth        1293 bytes/frame

    NOTE:

      1.Hub specific low speed overhead

        The time provided by the Host Controller for hubs to enable Low Speed
        ports. This is a minimum of 4 full speed bit times.

          overhead = 2 x Hub_LS_Setup
                   = 2 x (4 x 83.54) = 668.32 nanoseconds = 1 byte.

      2.Host delay is specific to the particular hardware. The following host
        delay is for the RIO USB OHCI host controller (provided by Ken Ward -
        RIO USB hardware person). It is only an example of how to calculate
        the "host delay" for a given USB host controller implementation.

        Ex: Assuming EDs (Endpoint Descriptors) and TDs (Transfer Descriptors)
            are not streaming in Schizo (PCI bridge) and there are no cache
            hits for an ED or TD:

            To read an ED or TD or data:

              PCI_ARB_DELAY + PCI_ADDRESS + SCHIZO_RETRY
              PCI_ARB_DELAY + PCI_ADDRESS + SCHIZO_TRDY +
              DATA + Core_overhead

            Where,

              PCI_ARB_DELAY = 2000ns
              PCI_ADDRESS   = 30ns
              SCHIZO_RETRY  = 60ns
              SCHIZO_TRDY   = 60ns
              DATA          = 240ns (Always read 64 bytes ...)
              Core_overhead = 240 + 30 * (MPS/4) + 83.54 * (MPS/4) + 4 * 83.54
                            = ~3400ns

            Now multiply by 3 for ED+TD+DATA = 10200ns = ~128 bits or 16 bytes.

            This is probably on the optimistic side, since it only uses 2us
            for the PCI_ARB_DELAY.

            If there is a USB cache hit, the time it takes for an ED or TD is:

              CORE SYNC DELAY + CACHE_HIT CHECK + 30 * (MPS/4) + Core_overhead

              240 + 30 + 120 + 1000ns ~ 1400ns, or ~ 2 bytes

            Total host delay will be 18 bytes.

      3.The Low-Speed clock is eight times slower than full speed, i.e. 1/8th
        of the full speed.

      4.For non-periodic transfers, reserve bandwidth for at least one
        low-speed device transaction per frame. According to the USB Bandwidth
        Analysis white paper and the OHCI Specification 1.0a, section 7.3.5,
        page 123, one low-speed transaction takes 0x628 full speed bit times
        (197 bytes), which comes to around 13% of the USB frame time.

      5.Maximum Periodic bandwidth is calculated using the following formula

          Maximum Periodic bandwidth = Maximum bandwidth available
                        - SOF - EOF - Maximum Non Periodic bandwidth.
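
        The arithmetic in notes 4 and 5 can be checked with a short C sketch.
        The constant and function names below are illustrative only and are
        not taken from the driver sources; the byte values are the ones
        listed in the full speed tables above.

```c
/* Byte values from the full speed tables; all names here are hypothetical. */
enum {
    FS_MAX_BANDWIDTH    = 1500, /* bytes per 1ms frame */
    FS_SOF              = 6,    /* start of frame overhead, bytes */
    FS_EOF              = 4,    /* end of frame overhead, bytes */
    FS_LS_TRANSACTION   = 0x628 /* one low-speed transaction, in FS bit times */
};

/* Note 4: bytes reserved per frame for non-periodic transfers. */
int
fs_max_non_periodic_bandwidth(void)
{
    return (FS_LS_TRANSACTION / 8);
}

/* Note 5: bytes per frame left over for periodic transfers. */
int
fs_max_periodic_bandwidth(void)
{
    return (FS_MAX_BANDWIDTH - FS_SOF - FS_EOF -
        fs_max_non_periodic_bandwidth());
}
```

        With the table values this gives 197 bytes for the non-periodic
        reservation and 1293 bytes for the periodic budget, matching the
        "Bandwidth numbers" table.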

  o Bus Transaction Formulas

    (Refer to section 5.11.3 of the USB 2.0 specification)

    - Full-Speed:

        Protocol overhead + ((MaxPacketSize * 7) / 6) + Host_Delay

    - Low-Speed:

        Protocol overhead + Hub LS overhead +
        (Low-Speed clock * ((MaxPacketSize * 7) / 6)) + Host_Delay

  o Periodic Schedule

    Figure 5.5 in the OHCI specification 1.0a describes periodic scheduling,
    the polling intervals that are supported, and other details for the OHCI
    host controller.

    - The host controller processes one interrupt endpoint descriptor list
      every frame. The lower five bits of the current frame number are used
      as an index into an array of 32 interrupt endpoint descriptor lists
      (periodic frame lists) found in the HCCA (Host Controller
      Communication Area). This means each list is revisited once every
      32ms. The host controller driver sets up the interrupt lists to visit
      any given endpoint descriptor in as many lists as necessary to provide
      the interrupt granularity required for that endpoint. See figure 5.5
      in the OHCI specification 1.0a.

    - Isochronous endpoint descriptors are added at the end of the 1ms
      interrupt endpoint descriptors.

    - The host controller driver maintains an array of 32 frame bandwidth
      lists to record the bandwidth allocated in each USB frame.

    Please refer to section 5.2.7.2 of the OHCI specification 1.0a, page 61,
    for more details.

  o Bandwidth Allocation Scheme

    The OHCI host controller driver goes through the following steps to
    allocate the bandwidth needed for an interrupt or isochronous endpoint:

    - Calculate the bandwidth required for the given endpoint using the bus
      transaction formula and protocol overhead calculations mentioned in
      the previous section.

    - Compare the bandwidth available in the least allocated frame list out
      of the 32 frame bandwidth lists against the bandwidth required by this
      endpoint. If the required bandwidth exceeds what is available, return
      an error.

    - Find the static node to which the given endpoint needs to be linked
      so that it will be polled at the required polling interval. This value
      varies based on the polling interval and the current bandwidth load on
      this schedule. See figure 5.5 in the OHCI specification 1.0a.

      Ex: If the polling interval is 4ms, the endpoint will be linked to one
          of the four static nodes (range 3-6) in the 4ms column of figure
          5.5 in the OHCI specification 1.0a.

    - Depending on the polling interval, add the bandwidth calculated above
      to one or more frame bandwidth lists. Before adding, double check the
      availability of bandwidth in those lists. If adding it would exceed
      the limit in any of them, return an error. Otherwise, add this
      bandwidth to all the required frame bandwidth lists.

      Ex: Assume a given polling interval of 4ms and a static node value of
          3. In this case, we need to add the required bandwidth to frame
          bandwidth lists 0,4,8,12,16,20,24,28.
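
    The formulas and allocation steps above can be sketched in C as follows.
    This is a minimal illustration, not the driver's actual code: every name
    is hypothetical, the byte values come from the full speed tables, and
    the mapping from a static node to its first frame list (figure 5.5 of
    the OHCI specification) is assumed to be done by the caller and passed
    in as `first`.

```c
#include <assert.h>

#define NUM_FRAME_LISTS     32      /* frame bandwidth lists */
#define FS_MAX_PERIODIC     1293    /* bytes per frame */
#define FS_HOST_DELAY       18      /* bytes, hardware specific */
#define HUB_LS_OVERHEAD     1       /* bytes */
#define LS_CLOCK            8       /* low speed is 8x slower */

/* Full speed: protocol overhead + bit-stuffed payload + host delay. */
unsigned int
fs_transaction_bytes(unsigned int overhead, unsigned int max_packet_size)
{
    return (overhead + ((max_packet_size * 7) / 6) + FS_HOST_DELAY);
}

/* Low speed: adds the hub overhead and scales the payload by the clock. */
unsigned int
ls_transaction_bytes(unsigned int overhead, unsigned int max_packet_size)
{
    return (overhead + HUB_LS_OVERHEAD +
        (LS_CLOCK * ((max_packet_size * 7) / 6)) + FS_HOST_DELAY);
}

/*
 * Reserve `bytes` in every `interval`-th frame bandwidth list, starting at
 * list `first`. Returns 0 on success. Returns -1 and leaves all lists
 * untouched if any affected list would exceed the periodic budget.
 */
int
fs_allocate(unsigned int lists[NUM_FRAME_LISTS], unsigned int first,
    unsigned int interval, unsigned int bytes)
{
    unsigned int i;

    assert(interval > 0 && first < NUM_FRAME_LISTS);

    /* First pass: check availability in every affected list. */
    for (i = first; i < NUM_FRAME_LISTS; i += interval) {
        if (lists[i] + bytes > FS_MAX_PERIODIC)
            return (-1);
    }

    /* Second pass: commit the reservation. */
    for (i = first; i < NUM_FRAME_LISTS; i += interval)
        lists[i] += bytes;

    return (0);
}
```

    For the example above (polling interval 4ms, lists 0,4,...,28), a full
    speed interrupt endpoint with an 8 byte MaxPacketSize and the 14 byte
    non-isochronous overhead would need 14 + 9 + 18 = 41 bytes in each of
    those eight lists. Checking all lists before modifying any keeps a
    failed request from leaving a partial reservation behind.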


3.High speed bus

  o Timing and data rate calculations

    - Timing calculations

        1 sec                   1000 ms
        125 us                  1 uframe
        1 ms                    1 frame or 8 uframes

    - Data rate calculations

        125 us                  7500 bytes (per uframe)
        16.66 ns                1 byte or 8 bits

        1 high speed bit time   2.083 ns

  o Protocol Overheads and Bandwidth numbers

    - Protocol Overheads

      (Refer to sections 5.11.3, 8.4.2.2 and 8.4.2.3 of the USB 2.0
      specification)

        Non Isochronous           917 ns                  55 bytes
        Isochronous               634 ns                  38 bytes

        Start split overhead       67 ns                   4 bytes
        Complete split overhead    67 ns                   4 bytes

        SOF                       200 ns                  12 bytes
        EOF                      1167 ns                  70 bytes

        Host Delay*             Specific to hardware      18 bytes

    - Bandwidth numbers

      (Refer to section 5.5.4 of the USB 2.0 specification)

        Maximum bandwidth available        7500 bytes/uframe
        Maximum Non Periodic bandwidth*    1500 bytes/uframe
        Maximum Periodic bandwidth*        5918 bytes/uframe

    NOTE:

      1.Host delay is specific to the particular hardware.

      2.As per section 5.5.4 of the USB 2.0 specification, 20% of the bus
        time is reserved for non-periodic high-speed transfers, whereas
        periodic high-speed transfers get 80% of the bus time. In one
        micro-frame (125us) we can transfer 7500 bytes or 60,000 bits, so
        20% of 7500 is 1500 bytes.

      3.Maximum Periodic bandwidth is calculated using the following formula

          Maximum Periodic bandwidth = Maximum bandwidth available
                        - SOF - EOF - Maximum Non Periodic bandwidth.
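
        As a cross-check of notes 2 and 3, the same subtraction can be
        written out in C. The names below are illustrative only and are not
        taken from the driver sources; the byte values are those listed in
        the high speed tables above.

```c
/* Byte values from the high speed tables; all names here are hypothetical. */
enum {
    HS_MAX_BANDWIDTH        = 7500, /* bytes per 125us micro-frame */
    HS_SOF                  = 12,   /* start of frame overhead, bytes */
    HS_EOF                  = 70,   /* end of frame overhead, bytes */
    HS_NON_PERIODIC_PERCENT = 20    /* share reserved for non-periodic */
};

/* Note 2: 20% of each micro-frame is reserved for non-periodic transfers. */
int
hs_max_non_periodic_bandwidth(void)
{
    return (HS_MAX_BANDWIDTH * HS_NON_PERIODIC_PERCENT / 100);
}

/* Note 3: bytes per micro-frame left over for periodic transfers. */
int
hs_max_periodic_bandwidth(void)
{
    return (HS_MAX_BANDWIDTH - HS_SOF - HS_EOF -
        hs_max_non_periodic_bandwidth());
}
```

        With the table values this yields 1500 bytes and 5918 bytes per
        micro-frame respectively. Note that the periodic figure is slightly
        below a plain 80% of 7500 because SOF and EOF are also subtracted.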

  o Bus Transaction Formulas

    (Refer to sections 5.11.3, 8.4.2.2 and 8.4.2.3 of the USB 2.0
    specification)

    - High-Speed (Non-Split transactions):

        (Protocol overhead + ((MaxPacketSize * 7) / 6) +
            Host_Delay) x Number of transactions per micro-frame

    - High-Speed (Split transaction - Device to Host):

        Start Split transaction:

          Protocol overhead + Host_Delay + Start split overhead

        Complete Split transaction:

          Protocol overhead + ((MaxPacketSize * 7) / 6) +
              Host_Delay + Complete split overhead

    - High-Speed (Split transaction - Host to Device):

        Start Split transaction:

          Protocol overhead + ((MaxPacketSize * 7) / 6) +
              Host_Delay + Start split overhead

        Complete Split transaction:

          Protocol overhead + Host_Delay + Complete split overhead


  o Interrupt schedule or Start and Complete split masks

    (Refer to sections 3.6.2 and 4.12.2 of the EHCI 1.0 specification)

    - Interrupt schedule or Start split mask

      This field is used for high, full and low speed usb device interrupt
      and isochronous endpoints. It tells the host controller in which
      micro-frame of a given usb frame to initiate a high speed interrupt or
      isochronous transaction. For full/low speed devices, it tells the host
      controller when to initiate a "start split" transaction.

      In the comments below, polling intervals are given in micro-frames
      (one micro-frame is 125us).

        ehci_start_split_mask[15] =     /* One byte field */
                /*
                 * For all low/full speed devices, and for high speed
                 * devices with a polling interval greater than or equal
                 * to 8 micro-frames.
                 */
                {0x01,  /* 00000001 */
                0x02,   /* 00000010 */
                0x04,   /* 00000100 */
                0x08,   /* 00001000 */
                0x10,   /* 00010000 */
                0x20,   /* 00100000 */
                0x40,   /* 01000000 */
                0x80,   /* 10000000 */

                /* High speed devices, polling interval of 4 micro-frames. */
                0x11,   /* 00010001 */
                0x22,   /* 00100010 */
                0x44,   /* 01000100 */
                0x88,   /* 10001000 */

                /* High speed devices, polling interval of 2 micro-frames. */
                0x55,   /* 01010101 */
                0xaa,   /* 10101010 */

                /* High speed devices, polling interval of 1 micro-frame. */
                0xff }; /* 11111111 */

    - Complete split mask

      This field is used only for full/low speed usb device interrupt and
      isochronous endpoints. It tells the host controller in which micro
      frames to initiate a "complete split" transaction. Complete split
      transactions can be retried up to 3 times, so bandwidth for the
      complete split transaction is reserved in 3 consecutive micro frames.

        ehci_complete_split_mask[8] =   /* One byte field */
                /* Only full/low speed devices */
                {0x0e,          /* 00001110 */
                0x1c,           /* 00011100 */
                0x38,           /* 00111000 */
                0x70,           /* 01110000 */
                0xe0,           /* 11100000 */
                Reserved,       /* Need FSTN feature */
                Reserved,       /* Need FSTN feature */
                Reserved};      /* Need FSTN feature */

  o Periodic Schedule

    Figure 4.8 in the EHCI specification describes periodic scheduling, the
    polling intervals that are supported, and other details for the EHCI
    host controller.

    - The high speed host controller can support 256, 512 or 1024 periodic
      frame lists. By default all host controllers support 1024 frame lists.
      In our implementation, we support 1024 frame lists, and we do this by
      first constructing 32 periodic frame lists and then duplicating those
      lists for a total of 32 times. See figure 4.8 in the EHCI
      specification.

    - The host controller traverses the periodic schedule by constructing an
      array offset reference from the PERIODICLISTBASE and FRINDEX
      registers. It fetches the element and begins traversing the graph of
      linked schedule data structures. See figure 4.8 in the EHCI
      specification.

    - The host controller processes one interrupt endpoint descriptor list
      every micro frame (125us). This means the same list is revisited 8
      times in a frame.

    - The host controller driver sets up the interrupt lists to visit any
      given endpoint descriptor in as many lists as necessary to provide the
      interrupt granularity required for that endpoint.

    - For isochronous transfers, we use only transfer descriptors and no
      endpoint descriptors, unlike OHCI. Transfer descriptors are added at
      the beginning of the periodic schedule.

    - For EHCI, the bandwidth requirement depends on the usb device speed,
      i.e.

        For a high speed usb device, you only need high speed bandwidth. For
        a full/low speed device connected through a high speed hub, you need
        both high speed bandwidth and TT (transaction translator) bandwidth.

      High speed bandwidth information is saved in an EHCI data structure
      and TT bandwidth is saved in the high speed hub's usb device data
      structure. Each TT acts as a full speed host controller, and its
      bandwidth allocation scheme, overhead calculations and other details
      are similar to those of a full speed host controller. Refer to the
      "Full speed bus" section for more details.

    - The EHCI host controller driver maintains an array of 32 frame lists
      to store the high speed bandwidth allocated in each frame. Each frame
      list also has eight micro frame lists, which record the bandwidth
      allocated in each micro frame of that particular frame.

  o Bandwidth Allocation Scheme

    (Refer to sections 3.6.2 and 4.12.2 of the EHCI 1.0 specification)

    High speed Non Split Transaction (for High speed devices only):

    For a given high speed interrupt or isochronous endpoint, the EHCI host
    controller driver goes through the following steps to allocate the
    bandwidth needed for this endpoint.

    - Calculate the bandwidth required for the given endpoint using the
      formula and overhead calculations mentioned in the previous section.

    - Compare the bandwidth available in the least allocated frame list out
      of the 32 frame lists against the bandwidth required by this endpoint.
      If the required bandwidth exceeds what is available, return an error.

    - Map the given high speed endpoint's polling interval in micro-frames
      to an interrupt list path based on a millisecond value. For example,
      an endpoint with a polling interval of 16 micro-frames will map to an
      interrupt list path of 2ms.

    - Find the static node to which the given endpoint needs to be linked
      so that it will be polled at its required polling interval. This
      varies based on the polling interval and the current bandwidth load on
      this schedule.

      Ex: If the polling interval is 32 micro-frames, its corresponding
          frame polling interval is 4ms, and the endpoint will be linked to
          one of the four static nodes (range 3-6) in the 4ms column of
          figure 4.8 in the EHCI specification.

    - Depending on the polling interval, add the bandwidth calculated above
      to one or more frame bandwidth lists, and also to one or more micro
      frame bandwidth lists for each of those frame bandwidth lists. Before
      adding, double check the availability of bandwidth in those lists. If
      the needed bandwidth is not available, return an error. Otherwise, add
      this bandwidth to all the required frame and micro frame lists.

      Ex: Assume the given endpoint's polling interval is 32 micro-frames
          and the static node value is 3. In this case, we need to add the
          required bandwidth to frame bandwidth lists 0,4,8,12,16,20,24,28,
          and the micro frame bandwidth information is saved using the
          ehci_start_split_mask matrix. Because the polling interval is at
          least 8 micro-frames, we need to use one of the first 8 entries
          of that matrix to save the micro frame bandwidth.
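
    The interval mapping and mask selection described above can be sketched
    in C as follows. This is only an illustration under stated assumptions:
    the function names and the `_copy` array name are hypothetical, the
    array simply repeats the ehci_start_split_mask values listed earlier,
    and polling intervals are taken in micro-frames and assumed to be
    powers of two.

```c
/* Copy of the ehci_start_split_mask values listed in this document. */
const unsigned char ehci_start_split_mask_copy[15] = {
    0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, /* interval >= 8 */
    0x11, 0x22, 0x44, 0x88,                         /* interval == 4 */
    0x55, 0xaa,                                     /* interval == 2 */
    0xff                                            /* interval == 1 */
};

/* Non-split high speed formula, in bytes per micro-frame. */
unsigned int
hs_transaction_bytes(unsigned int overhead, unsigned int max_packet_size,
    unsigned int host_delay, unsigned int transactions)
{
    return ((overhead + ((max_packet_size * 7) / 6) + host_delay) *
        transactions);
}

/* Frame-level polling interval (ms) for an interval in micro-frames. */
unsigned int
hs_frame_interval_ms(unsigned int uframes)
{
    return ((uframes <= 8) ? 1 : uframes / 8);
}

/*
 * Report which entries of the start split mask array are candidates for
 * the given micro-frame interval. Returns 0 and fills in *first and
 * *count, or -1 for an interval the table does not cover.
 */
int
hs_mask_range(unsigned int uframes, unsigned int *first, unsigned int *count)
{
    switch (uframes) {
    case 1:
        *first = 14; *count = 1;
        break;
    case 2:
        *first = 12; *count = 2;
        break;
    case 4:
        *first = 8; *count = 4;
        break;
    default:
        if (uframes < 8)
            return (-1);
        *first = 0; *count = 8;
        break;
    }
    return (0);
}
```

    For the 32 micro-frame example above, hs_frame_interval_ms() gives 4ms
    and hs_mask_range() reports entries 0 through 7, from which the driver
    would pick the micro frame with enough free bandwidth.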

    High speed split transactions (for full and low speed devices only):

    For a given full/low speed interrupt or isochronous endpoint, we need
    both high speed and TT bandwidth. The TT bandwidth allocation is the
    same as the full speed bus bandwidth allocation. Please refer to the
    "Full speed bus" bandwidth allocation section for more details.

    The EHCI driver goes through the following steps to allocate the high
    speed bandwidth needed for this full/low speed endpoint.

    - Calculate the bandwidth required for the given endpoint using the
      formula and overhead calculations mentioned in the previous section.
      In this case, we need to calculate the bandwidth needed for the Start
      split and Complete split transactions separately.

    - Compare the bandwidth available in the least allocated frame list out
      of the 32 frame lists against the bandwidth required by this endpoint.
      If the required bandwidth exceeds what is available, return an error.

    - Find the static node to which the given endpoint needs to be linked
      so that it will be polled at the required polling interval. This value
      varies based on the polling interval and the current bandwidth load on
      this schedule.

      Ex: If the polling interval is 4ms, the endpoint will be linked to
          one of the four static nodes (range 3-6) in the 4ms column of
          figure 4.8 in the EHCI specification.

    - Depending on the polling interval, add the Start split and Complete
      split transaction bandwidth calculated above to one or more frame
      bandwidth lists, and also to one or more micro frame bandwidth lists
      for each of those frame bandwidth lists. The Start split transaction
      needs bandwidth in one micro frame, whereas the Complete split
      transaction needs bandwidth in the next three micro frames of that
      frame or of the next frame. Before adding, double check the
      availability of bandwidth in those lists. If the needed bandwidth is
      not available, return an error. Otherwise, add this bandwidth to all
      the required lists.

      Ex: Assume the given polling interval is 4ms and the static node value
          is 3. In this case, we need to add the required Start and Complete
          split bandwidth to frame bandwidth lists 0,4,8,12,16,20,24,28.
          The micro frame bandwidth is stored using the
          ehci_start_split_mask and ehci_complete_split_mask matrices. In
          this case, we need to use one of the first 8 entries to save the
          micro frame bandwidth.

          Assume we found that the first micro frame of frames
          0,4,8,12,16,20,24,28 can be used for this endpoint. This means we
          need to initiate the "start split" transaction in the first micro
          frame of frames 0,4,8,12,16,20,24,28.

            Start split mask = 0x01,      /* 00000001 */

          For this "start split mask", the "complete split mask" should be

            Complete split mask = 0x0e,   /* 00001110 */

          This means the "complete split" transaction is tried in the
          second, third or fourth micro frame of frames
          0,4,8,12,16,20,24,28.

4.Reference

  - USB2.0, OHCI and EHCI Specifications

    http://www.usb.org/developers/docs

  - USB bandwidth analysis from Intel

    http://www.usb.org/developers/whitepapers