#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License, Version 1.0 only
# (the "License").  You may not use this file except in compliance
# with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
/*
 * Copyright 2003 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

#pragma ident	"%Z%%M%	%I%	%E% SMI"

			SOLARIS USB BANDWIDTH ANALYSIS
			
1.Introduction

  This document discusses the USB bandwidth allocation scheme and the protocol
  overheads used for both full and high speed host controller drivers. This
  information is derived from the USB 2.0 specification, the "Bandwidth Analysis
  Whitepaper" which is posted on www.usb.org, and other resources.

  The target audience for this whitepaper is USB software and hardware designers
  and engineers, and other interested people. The reader should be familiar with
  the Universal Serial Bus Specification version 2.0, the OpenHCI Specification
  1.0a and the Enhanced HCI Specification 1.0.

2.Full speed bus

  The following overheads, formulas and scheme are applicable both to full speed
  host controllers and also to high speed hub Transaction Translators (TT),
  which perform full/low speed transactions.

  o Timing and data rate calculations

    - Timing calculations

      1 sec			1000 ms or 1000000000 ns
      1 ms			1 frame

    - Data rate calculations

      1 ms			1500 bytes or 12000 bits (per  frame)
      668 ns			1 byte or 8 bits

      1 full speed bit time	83.54 ns
              
  o Protocol Overheads and Bandwidth numbers

    - Protocol Overheads

      (Refer 5.11.3 section of USB2.0 specification & page 2 of USB Bandwidth
       Analysis document)

      Non Isochronous	  	9107 ns			14 bytes
      Isochronous Output	6265 ns			10 bytes
      Isochronous Input	        7268 ns			11 bytes
      Low-speed overhead       64060 ns			97 bytes
      Hub LS overhead*	         668 ns	 	 	 1 byte
      SOF		        4010 ns 	 	 6 bytes
      EOF		        2673 ns		 	 4 bytes

      Host Delay*		Specific to hardware    18 bytes
      Low-Speed clock*		Slower than Full speed	 8 times
	
    - Bandwidth numbers

      (Refer 7.3.5 section of OHCI specification 1.0a & page 2 of USB Bandwidth
       Analysis document) 

      Maximum bandwidth available		      1500 bytes/frame
      Maximum Non Periodic bandwidth	  	       197 bytes/frame
      Maximum Periodic bandwidth		      1293 bytes/frame

      NOTE:

      1.Hub specific low speed overhead

        The time provided by the Host Controller for hubs to enable Low Speed
        ports. This is a minimum of 4 full speed bit times.

        overhead = 2 x Hub_LS_Setup
                 = 2 x (4 x 83.54 ns) = 668.32 ns = 1 byte.
              
      2.Host delay is specific to the particular hardware. The following host
        delay is for the RIO USB OHCI host controller (provided by Ken Ward, RIO
        USB hardware person). It is just an example of how to calculate the
        "host delay" for a given USB host controller implementation.

        Ex: Assuming ED (Endpoint Descriptor)/TD's (Transfer Descriptor) are not
            streaming in Schizo (PCI bridge) and no cache hits for an ED or TD:

            To read an ED or TD or data:

            PCI_ARB_DELAY + PCI_ADDRESS + SCHIZO_RETRY +
            PCI_ARB_DELAY + PCI_ADDRESS + SCHIZO_TRDY +
                        DATA + CORE_OVERHEAD

            Where,

            PCI_ARB_DELAY = 2000ns
            PCI_ADDRESS   = 30ns
            SCHIZO_RETRY  = 60ns
            SCHIZO_TRDY   = 60ns
            DATA          = 240ns (Always read 64 bytes ...)
            CORE_OVERHEAD = 240 + 30 * (MPS/4) + 83.54 * (MPS/4) + 4 * 83.54
            =  ~3400ns

            Now multiply by 3 for ED + TD + DATA = 10200ns = ~128 bits or
            16 bytes.

            This is probably on the optimistic side, since it uses only 2us for
            the PCI_ARB_DELAY.

        If there is a USB cache hit, the time it takes for an ED or TD is:

        CORE_SYNC_DELAY + CACHE_HIT_CHECK + 30 * (MPS/4) + CORE_OVERHEAD

        240 + 30 + 120 + 1000ns = ~1400ns, or ~2 bytes

        Total Host delay will be 16 + 2 = 18 bytes.

      3.The Low-Speed clock is eight times slower than the full speed clock,
        i.e. it runs at 1/8th of the full speed rate.

      4.For non-periodic transfers, reserve bandwidth for at least one low-speed
        device transaction per frame. According to the USB Bandwidth Analysis
        white paper and the OHCI Specification 1.0a, section 7.3.5, page 123,
        one low-speed transaction takes 0x628 full speed bit times (197 bytes),
        which comes to around 13% of the USB frame time.
              
      5.Maximum Periodic bandwidth is calculated using the following formula

        Maximum Periodic bandwidth  = Maximum bandwidth available
        - SOF - EOF -  Maximum Non Periodic bandwidth.
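
      The arithmetic in notes 4 and 5 can be checked with a short C sketch.
      The macro and function names below are illustrative only and are not
      taken from the driver sources; the values are the ones tabulated above.

```c
#include <stdint.h>

/* Full speed budget, in bytes per 1 ms frame (values from the tables above). */
#define	FS_MAX_BANDWIDTH	1500
#define	FS_SOF			6
#define	FS_EOF			4
#define	FS_LS_TRANSACTION_BITS	0x628	/* one low-speed transaction, in bits */

/* Bytes reserved per frame for non-periodic transfers: 0x628 bits / 8. */
uint32_t
fs_max_non_periodic(void)
{
	return (FS_LS_TRANSACTION_BITS / 8);
}

/* Maximum periodic bandwidth = total - SOF - EOF - non-periodic reserve. */
uint32_t
fs_max_periodic(void)
{
	return (FS_MAX_BANDWIDTH - FS_SOF - FS_EOF - fs_max_non_periodic());
}
```

      With these inputs the non-periodic reserve comes to 197 bytes and the
      periodic budget to 1293 bytes per frame, matching the table.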

  o Bus Transaction Formulas

    (Refer 5.11.3 section of USB2.0 specification)

    - Full-Speed:

      Protocol overhead + ((MaxPacketSize * 7) / 6 ) + Host_Delay

    - Low-Speed:

      Protocol overhead + Hub LS overhead + 
		(Low-Speed clock  * ((MaxPacketSize * 7) / 6 )) + Host_Delay
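
    As a minimal sketch, the two formulas above can be written as C helpers
    that work in bytes. The function names are hypothetical, the protocol
    overhead is passed in by the caller (for example 14 bytes for a
    non-isochronous transfer), and the host delay of 18 bytes is the example
    value derived above for one particular host controller.

```c
#include <stdint.h>

#define	FS_HOST_DELAY		18	/* bytes; specific to the hardware */
#define	FS_HUB_LS_OVERHEAD	1	/* bytes */
#define	FS_LS_CLOCK_FACTOR	8	/* low speed clock is 8 times slower */

/* Worst-case payload size after bit stuffing: (MaxPacketSize * 7) / 6. */
uint32_t
bit_stuffed_bytes(uint32_t max_packet_size)
{
	return ((max_packet_size * 7) / 6);
}

/* Full speed: Protocol overhead + stuffed payload + Host_Delay. */
uint32_t
fs_transaction_bytes(uint32_t protocol_overhead, uint32_t max_packet_size)
{
	return (protocol_overhead + bit_stuffed_bytes(max_packet_size) +
	    FS_HOST_DELAY);
}

/* Low speed: the payload term is scaled by the low-speed clock factor. */
uint32_t
ls_transaction_bytes(uint32_t protocol_overhead, uint32_t max_packet_size)
{
	return (protocol_overhead + FS_HUB_LS_OVERHEAD +
	    FS_LS_CLOCK_FACTOR * bit_stuffed_bytes(max_packet_size) +
	    FS_HOST_DELAY);
}
```

    For example, a full speed endpoint with a MaxPacketSize of 8 bytes and a
    14 byte protocol overhead needs 14 + 9 + 18 = 41 bytes per transaction.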

  o Periodic Schedule

    Figure 5.5 in the OHCI specification 1.0a gives information on periodic
    scheduling, the different polling intervals that are supported, and other
    details for the OHCI host controller.

    - The host controller processes one interrupt endpoint descriptor list every
      frame. The lower five bits of the current frame number are used as an
      index into an array of 32 interrupt endpoint descriptor lists or periodic
      frame lists found in the HCCA (Host Controller Communication Area). This
      means each list is revisited once every 32ms. The host controller driver
      sets up the interrupt lists to visit any given endpoint descriptor in as
      many lists as necessary to provide the interrupt granularity required for
      that endpoint. See figure 5.5 in OHCI specification 1.0a.
        
    - Isochronous endpoint descriptors are added at the end of 1ms interrupt 
      endpoint descriptors.

    - The host controller driver maintains an array of 32 frame bandwidth lists 
      to save bandwidth allocated in each USB frame.

      Please refer section 5.2.7.2 of OHCI specification 1.0a, page 61 for more 
      details.
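
    The list selection described in the first item above amounts to masking
    the frame number. A minimal sketch, using a hypothetical helper name:

```c
#include <stdint.h>

#define	NUM_INTR_LISTS	32	/* interrupt lists in the HCCA */

/*
 * Index of the interrupt endpoint descriptor list processed in a given
 * frame: the lower five bits of the frame number.
 */
uint32_t
interrupt_list_index(uint32_t frame_number)
{
	return (frame_number & (NUM_INTR_LISTS - 1));
}
```

    Frames 0 and 32 therefore select the same list, which is why each list
    is revisited once every 32ms.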

  o Bandwidth Allocation Scheme

    The OHCI host controller driver goes through the following steps to
    allocate the bandwidth needed for an interrupt or isochronous endpoint:

    - Calculate the bandwidth required for the given endpoint using the bus
      transaction formula and protocol overhead calculations mentioned in
      previous section.

    - Compare the bandwidth available in the least allocated frame list out of
      the 32 frame bandwidth lists against the bandwidth required by this
      endpoint. If the requirement exceeds the limit, return an error.

    - Find out the static node to which the given endpoint needs to be linked
      so that it will be polled as per the required polling interval. This value
      varies based on polling interval and current bandwidth load on this
      schedule. See figure 5.5 in OHCI specification 1.0a.

      Ex: If a polling interval is 4ms, then, the endpoint will be linked to one
          of the four static nodes (range 3-6) in the 4ms column of figure 5.5
          in OHCI specification 1.0a.

    - Depending on the polling interval, we need to add the above calculated
      bandwidth to one or more frame bandwidth lists. Before adding, we need to 
      double check the availability of bandwidth in those respective lists. If
      this exceeds the limit, then, return an error. Add this bandwidth to all
      the required frame bandwidth lists.

      Ex: Assume a given polling interval of 4ms and a static node value of 3.
          In this case, we need to add the required bandwidth to the
          0,4,8,12,16,20,24,28 frame bandwidth lists.
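
    The steps above can be sketched in C as follows. This is an illustration
    under stated assumptions, not the driver's code: the function names are
    hypothetical, and the mapping from a static node to the first frame list
    it covers is not shown. Instead the caller passes that first frame list
    directly as first_frame (0 in the example above).

```c
#include <stdint.h>

#define	NUM_FRAME_LISTS		32
#define	FS_MAX_PERIODIC		1293	/* bytes per frame */

/* Return the smallest allocation among the 32 frame bandwidth lists. */
uint32_t
least_allocated(const uint32_t frame_bw[NUM_FRAME_LISTS])
{
	uint32_t i, min = frame_bw[0];

	for (i = 1; i < NUM_FRAME_LISTS; i++) {
		if (frame_bw[i] < min)
			min = frame_bw[i];
	}
	return (min);
}

/* Returns 0 on success, -1 if the bandwidth is not available. */
int
fs_allocate_bandwidth(uint32_t frame_bw[NUM_FRAME_LISTS], uint32_t needed,
    uint32_t interval, uint32_t first_frame)
{
	uint32_t i;

	if (interval == 0 || interval > NUM_FRAME_LISTS ||
	    first_frame >= interval)
		return (-1);

	/* Quick rejection: even the emptiest frame list cannot take it. */
	if (least_allocated(frame_bw) + needed > FS_MAX_PERIODIC)
		return (-1);

	/* Double check every frame list the endpoint will be polled in. */
	for (i = first_frame; i < NUM_FRAME_LISTS; i += interval) {
		if (frame_bw[i] + needed > FS_MAX_PERIODIC)
			return (-1);
	}

	/* Commit only after all checks have passed. */
	for (i = first_frame; i < NUM_FRAME_LISTS; i += interval)
		frame_bw[i] += needed;

	return (0);
}
```

    Checking all lists before updating any of them means a failed request
    leaves the frame bandwidth lists unchanged.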


3.High speed bus

  o Timing and data rate calculations

    - Timing calculations

      1 sec			1000 ms
      125 us			1 uframe
      1 ms			1 frame or 8  uframes

    - Data rate calculations

      125 us			7500 bytes (per uframe)
      16.66 ns			1 byte or 8 bits

      1 high speed bit time	2.083 ns

  o Protocol Overheads and Bandwidth numbers

    - Protocol Overheads

      (Refer 5.11.3, 8.4.2.2 and 8.4.2.3 sections of USB2.0 specification)

      Non Isochronous	  	917 ns			55 bytes
      Isochronous 		634 ns			38 bytes

      Start split  overhead 	 67 ns		  	 4 bytes
      Complete split  overhead 	 67 ns		  	 4 bytes

      SOF		  	200 ns			12 bytes
      EOF		       1167 ns 			70 bytes

      Host Delay*		 Specific to hardware 	18 bytes

    - Bandwidth numbers

      (Refer 5.5.4 section of USB2.0 specification)

      Maximum bandwidth available		      7500 bytes/uframe
      Maximum Non Periodic bandwidth*		      1500 bytes/uframe
      Maximum Periodic bandwidth*		      5918 bytes/uframe

      NOTE:

      1.Host delay will be specific to particular hardware. 

      2.As per USB 2.0 specification section 5.5.4, 20% of bus time is reserved
        for non-periodic high-speed transfers, whereas periodic high-speed
        transfers get 80% of the bus time. In one micro-frame or 125us, we
        can transfer 7500 bytes or 60,000 bits. So 20% of 7500 is 1500 bytes.

      3.Maximum Periodic bandwidth is calculated using the following formula

        Maximum Periodic bandwidth  = Maximum bandwidth available
		- SOF - EOF -  Maximum Non Periodic bandwidth.
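
      As a quick check of notes 2 and 3, the same arithmetic in C. The names
      are illustrative only; the values are the byte counts tabulated above.

```c
#include <stdint.h>

/* High speed budget, in bytes per 125 us micro-frame. */
#define	HS_MAX_BANDWIDTH		7500
#define	HS_SOF				12
#define	HS_EOF				70
#define	HS_NON_PERIODIC_PERCENT		20

/* 20% of the micro-frame is reserved for non-periodic transfers. */
uint32_t
hs_max_non_periodic(void)
{
	return (HS_MAX_BANDWIDTH * HS_NON_PERIODIC_PERCENT / 100);
}

/* Maximum periodic bandwidth = total - SOF - EOF - non-periodic reserve. */
uint32_t
hs_max_periodic(void)
{
	return (HS_MAX_BANDWIDTH - HS_SOF - HS_EOF - hs_max_non_periodic());
}
```

      This yields 1500 and 5918 bytes per micro-frame respectively.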

  o Bus Transaction Formulas

    (Refer 5.11.3, 8.4.2.2 and 8.4.2.3 sections of USB2.0 specification)

    - High-Speed (Non-Split transactions):

      (Protocol overhead + ((MaxPacketSize * 7) / 6 ) +
		Host_Delay) x Number of transactions per micro-frame

    - High-Speed (Split transaction - Device to Host):

      Start Split transaction:

      Protocol overhead  + Host_Delay + Start split overhead

      Complete Split transaction:

      Protocol overhead  + ((MaxPacketSize * 7) / 6 ) +
		Host_Delay + Complete split overhead

    - High-Speed (Split transaction - Host to Device):

      Start Split transaction:

      Protocol overhead + ((MaxPacketSize * 7) / 6 ) +
		Host_Delay + Start split overhead

      Complete Split transaction:

      Protocol overhead  + Host_Delay + Complete split overhead
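
    A minimal C sketch of the high speed formulas above, working in bytes.
    The function names are hypothetical, the protocol overhead is supplied by
    the caller (55 bytes non-isochronous, 38 bytes isochronous), and the host
    delay is the 18 byte figure from the table, which is specific to the
    hardware.

```c
#include <stdint.h>

#define	HS_HOST_DELAY		18	/* bytes; specific to the hardware */
#define	HS_START_SPLIT		4	/* bytes */
#define	HS_COMPLETE_SPLIT	4	/* bytes */

/* Worst-case payload size after bit stuffing. */
static uint32_t
hs_stuffed(uint32_t max_packet_size)
{
	return ((max_packet_size * 7) / 6);
}

/* Non-split transaction, repeated n times per micro-frame. */
uint32_t
hs_nonsplit_bytes(uint32_t overhead, uint32_t max_packet_size, uint32_t n)
{
	return ((overhead + hs_stuffed(max_packet_size) + HS_HOST_DELAY) * n);
}

/* Device to host: data travels with the complete split. */
uint32_t
hs_in_start_split_bytes(uint32_t overhead)
{
	return (overhead + HS_HOST_DELAY + HS_START_SPLIT);
}

uint32_t
hs_in_complete_split_bytes(uint32_t overhead, uint32_t max_packet_size)
{
	return (overhead + hs_stuffed(max_packet_size) + HS_HOST_DELAY +
	    HS_COMPLETE_SPLIT);
}

/* Host to device: data travels with the start split. */
uint32_t
hs_out_start_split_bytes(uint32_t overhead, uint32_t max_packet_size)
{
	return (overhead + hs_stuffed(max_packet_size) + HS_HOST_DELAY +
	    HS_START_SPLIT);
}

uint32_t
hs_out_complete_split_bytes(uint32_t overhead)
{
	return (overhead + HS_HOST_DELAY + HS_COMPLETE_SPLIT);
}
```

    The only difference between the two split directions is which half of
    the split carries the bit-stuffed payload term.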


  o Interrupt schedule or Start and Complete split masks

    (Refer 3.6.2 & 4.12.2 sections of EHCI 1.0 specification)

    - Interrupt schedule or Start split mask

      This field is used for high, full and low speed usb device interrupt
      and isochronous endpoints. For high speed devices, it tells the host
      controller in which micro-frames of a given usb frame to initiate a high
      speed interrupt or isochronous transaction. For full/low speed devices,
      it tells the host controller when to initiate a "start split" transaction.

	ehci_start_split_mask[15] = /* One byte field */
	/*
	 * For all low/full speed devices, and for high speed devices with
	 * a polling interval greater than or equal to 8 uframes (1 uframe
	 * is 125us).
	 */
	{0x01,	/*  00000001 */
	0x02,	/*  00000010 */
	0x04,	/*  00000100 */
	0x08,	/*  00001000 */
	0x10,	/*  00010000 */
	0x20,	/*  00100000 */
	0x40,	/*  01000000 */
	0x80,	/*  10000000 */

	/* For high speed devices with a polling interval of 4 uframes. */
	0x11,	/* 00010001 */
	0x22,	/* 00100010 */
	0x44,	/* 01000100 */
	0x88,	/* 10001000 */

	/* For high speed devices with a polling interval of 2 uframes. */
	0x55,	/* 01010101 */
	0xaa,	/* 10101010 */

	/* For high speed devices with a polling interval of 1 uframe. */
	0xff };	/* 11111111 */
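
      The 15 entries above follow a regular pattern: one bit is set for each
      micro-frame in which the endpoint is serviced. The sketch below
      regenerates an entry from a polling interval expressed in micro-frames
      (units of 125us) and the first micro-frame used. The function is a
      hypothetical illustration, not part of the driver, and it assumes the
      interval is a power of two as in the table.

```c
#include <stdint.h>

#define	UFRAMES_PER_FRAME	8

/* Returns the one byte mask, or 0 if the arguments are out of range. */
uint8_t
build_start_split_mask(uint32_t interval_uframes, uint32_t first_uframe)
{
	uint8_t mask = 0;
	uint32_t u;

	if (interval_uframes == 0 || first_uframe >= UFRAMES_PER_FRAME ||
	    first_uframe >= interval_uframes)
		return (0);

	/* Intervals of a frame or longer use one micro-frame per frame. */
	if (interval_uframes >= UFRAMES_PER_FRAME)
		return ((uint8_t)(1u << first_uframe));

	/* Shorter intervals repeat within the frame. */
	for (u = first_uframe; u < UFRAMES_PER_FRAME; u += interval_uframes)
		mask |= (uint8_t)(1u << u);

	return (mask);
}
```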

    - Complete split mask

      This field is used only for full/low speed usb device interrupt and
      isochronous endpoints. It tells the host controller in which micro frames
      to initiate a "complete split" transaction. Complete split transactions
      can be retried up to 3 times, so bandwidth for the complete split
      transaction is reserved in 3 consecutive micro frames.

	ehci_complete_split_mask[8] = /* One byte field */
	/* Only full/low speed devices */ 
	{0x0e,	/*  00001110 */
	0x1c,	/*  00011100 */
	0x38,	/*  00111000 */
	0x70,	/*  01110000 */
	0xe0,	/*  11100000 */
	Reserved ,	/*  Need FSTN feature  */
	Reserved ,	/*  Need FSTN feature  */
	Reserved};	/*  Need FSTN feature */
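
      Each usable entry above is the three micro frames that follow the
      corresponding start split micro frame. A minimal sketch of that
      relationship, using a hypothetical helper that returns 0 for the
      reserved entries (the 0 return value is an assumption made for this
      illustration only):

```c
#include <stdint.h>

/*
 * Complete split mask for a start split issued in micro frame
 * start_uframe (0 to 7). Entries 5 to 7 are reserved because they
 * would need the FSTN feature, so 0 is returned for them.
 */
uint8_t
complete_split_mask_for(uint32_t start_uframe)
{
	if (start_uframe > 4)
		return (0);

	/* Three consecutive micro frames after the start split. */
	return ((uint8_t)(0x0eu << start_uframe));
}
```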
      
  o Periodic Schedule

    Figure 4.8 in the EHCI specification gives information on periodic
    scheduling, the different polling intervals that are supported, and other
    details for the EHCI host controller.

    - The high speed host controller can support 256, 512 or 1024 periodic frame
      lists. By default all host controllers will support 1024 frame lists. In
      our implementation, we support 1024 frame lists and we do this by first
      constructing 32 periodic frame lists and duplicating the same periodic
      frame lists for a total of 32 times. See figure 4.8 in EHCI specification.

    - The host controller traverses the periodic schedule by constructing an
      array offset reference from the PERIODICLISTBASE & the FRINDEX registers.
      It fetches the element and begins traversing the graph of linked schedule
      data structure. See fig 4.8 in EHCI specification.

    - The host controller processes one interrupt endpoint descriptor list every
      micro frame (125us). This means the same list is revisited 8 times in a
      frame.

    - The host controller driver sets up the interrupt lists to visit any given 
      endpoint descriptor in as many lists as necessary to provide the interrupt
      granularity required for that endpoint.

    - For isochronous transfers, we use only transfer descriptors but no
      endpoint descriptors as in OHCI. Transfer descriptors are added at the
      beginning of the periodic schedule.

    - For EHCI, the bandwidth requirement depends on the usb device speed:

      For a high speed usb device, you only need high speed bandwidth. For a
      full/low speed device connected through a high speed hub, you need both
      high speed bandwidth and TT (transaction translator) bandwidth.

      High speed bandwidth information is saved in an EHCI data structure and TT
      bandwidth is saved in the high speed hub's usb device data structure. Each
      TT acts as a full speed host controller, and its bandwidth allocation
      scheme, overhead calculations and other details are similar to those of a
      full speed host controller. Refer to the "Full speed bus" section for more
      details.

    - The EHCI host controller driver maintains an array of 32 frame lists to
      store the high speed bandwidth allocated in each frame. Each frame list
      in turn has eight micro frame lists, which store the bandwidth allocated
      in each micro frame of that particular frame.

  o Bandwidth Allocation Scheme

    (Refer 3.6.2 & 4.12.2 sections of EHCI 1.0 specification)

    High speed Non Split Transaction (for High speed devices only):

    For a given high speed interrupt or isochronous endpoint, the EHCI host
    controller driver will go through the following steps to allocate
    bandwidth needed for this endpoint.

    - Calculate the bandwidth required for given endpoint using the formula and 
      overhead calculations mentioned in previous section.

    - Compare the bandwidth available in the least allocated frame list out of
      the 32 frame lists against the bandwidth required by this endpoint. If
      this exceeds the limit, then, return an error.

    - Map a given high speed endpoint's polling interval in micro frames to an
      interrupt list path based on a millisecond value. For example, an endpoint
      with a polling interval of 16 micro frames (16 x 125us) will map to an
      interrupt list path of 2ms.

    - Find out the static node to which the given endpoint needs to be linked
      so that it will be polled at its required polling interval. This varies
      based on polling interval and current bandwidth load on this schedule.

      Ex: If a polling interval is 32 micro frames, its corresponding frame
          polling interval will be 4ms, and the endpoint will be linked to one
          of the four static nodes (range 3-6) in the 4ms column of figure 4.8
          in EHCI specification.

    - Depending on the polling interval, we need to add the above calculated
      bandwidth to one or more frame bandwidth lists, and also to one or more
      micro frame bandwidth lists for that particular frame bandwidth list.
      Before adding, we need to double check the availability of bandwidth in
      those respective lists. If needed bandwidth is not available, then,
      return an error. Otherwise add this bandwidth to all the required frame
      and micro frame lists.

      Ex: Assume the given endpoint's polling interval is 32 micro frames and
          the static node value is 3. In this case, we need to add the required
          bandwidth to the 0,4,8,12,16,20,24,28 frame bandwidth lists, and the
          micro frame bandwidth information is saved using the
          ehci_start_split_mask matrix. For this example, we need to use one of
          the first 8 of its 15 entries to save micro frame bandwidth.
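
    The non-split steps above can be sketched in C as follows, assuming the
    polling interval is given in micro frames (units of 125us). All names are
    hypothetical. The mapping from a static node to its first frame list is
    not shown, so the caller passes first_frame directly, together with the
    chosen one byte micro frame mask.

```c
#include <stdint.h>

#define	NUM_FRAMES		32
#define	UFRAMES_PER_FRAME	8
#define	HS_MAX_PERIODIC		5918	/* bytes per micro frame */

/* Map a polling interval in micro frames to a frame (ms) interval. */
uint32_t
uframes_to_frame_interval(uint32_t interval_uframes)
{
	return (interval_uframes < UFRAMES_PER_FRAME ? 1 :
	    interval_uframes / UFRAMES_PER_FRAME);
}

/* Returns 0 on success, -1 if the bandwidth is not available. */
int
hs_allocate_bandwidth(uint32_t bw[NUM_FRAMES][UFRAMES_PER_FRAME],
    uint32_t needed, uint32_t interval_uframes, uint32_t first_frame,
    uint8_t uframe_mask)
{
	uint32_t step = uframes_to_frame_interval(interval_uframes);
	uint32_t f, u;

	if (interval_uframes == 0 || step > NUM_FRAMES || first_frame >= step)
		return (-1);

	/* Check every affected micro frame list before changing anything. */
	for (f = first_frame; f < NUM_FRAMES; f += step) {
		for (u = 0; u < UFRAMES_PER_FRAME; u++) {
			if ((uframe_mask & (1u << u)) &&
			    bw[f][u] + needed > HS_MAX_PERIODIC)
				return (-1);
		}
	}

	/* Commit the allocation. */
	for (f = first_frame; f < NUM_FRAMES; f += step) {
		for (u = 0; u < UFRAMES_PER_FRAME; u++) {
			if (uframe_mask & (1u << u))
				bw[f][u] += needed;
		}
	}
	return (0);
}
```

    For the example above (interval of 32 micro frames, first frame 0, mask
    0x01) the bandwidth is added to micro frame 0 of frames 0,4,8,...,28.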

      High speed split transactions (for full and low speed devices only):

      For a given full/low speed interrupt or isochronous endpoint, we need both
      high speed and TT bandwidths. The TT bandwidth allocation is same as full
      speed bus bandwidth allocation. Please refer to the "full speed bus"
      bandwidth allocation section for more details.

      The EHCI driver will go through the following steps to allocate high speed
      bandwidth needed for  this full/low speed endpoint.

      - Calculate the bandwidth required for a given endpoint using the formula
        and overhead calculations mentioned in previous section. In this case,
        we need to calculate the bandwidth needed for the Start split and
        Complete split transactions separately.

      - Compare the bandwidth available in the least allocated frame list out of
        32 frame lists against the bandwidth required by this endpoint. If this
        exceeds the limit, then, return an error.

      - Find out the static node to which the given endpoint needs to be linked 
        so that it will be polled as per the required polling interval. This
        value varies based on polling interval and current bandwidth load on
        this schedule.

        Ex: If a polling interval is  4ms, then the endpoint will be linked to
            one of the four static nodes (range 3-6) in the 4ms column of figure
            4.8 in EHCI specification.

      - Depending on the polling interval, we need to add the above calculated
        Start and Complete split transaction bandwidth to one or more frame
        bandwidth lists and also to one or more micro frame bandwidth lists for
        that particular frame bandwidth list. In this case, the Start split
        transaction needs bandwidth in one micro frame, whereas the Complete
        split transaction needs bandwidth in the next three micro frames of
        that particular frame or of the next frame. Before adding, we need to
        double check the availability of bandwidth in those respective lists.
        If the needed bandwidth is not available, return an error. Otherwise
        add this bandwidth to all the required lists.

        Ex: Assume the given polling interval is 4ms and the static node value
            is 3. In this case, we need to add the required Start and Complete
            split bandwidth to the 0,4,8,12,16,20,24,28 frame bandwidth lists.
            The micro frame bandwidth information is stored using the
            ehci_start_split_mask & ehci_complete_split_mask matrices. In this
            case, we need to use any of the first 8 entries to save micro frame
            bandwidth.

            Assume we found that the first micro frame bandwidth list of the
            0,4,8,12,16,20,24,28 frame lists can be used for this endpoint.
            This means we need to initiate the "start split transaction" in the
            first micro frame of the 0,4,8,12,16,20,24,28 frames.

            Start split mask = 0x01,	/*  00000001 */

            For this "start split mask",  the "complete split mask" should be

	    Complete split mask = 0x0e, /*  00001110 */

	    It means try "complete split transactions" in second, third or
            fourth micro frames of 0,4,8,12,16,20,24,28 frames.
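
        The split transaction example above can be sketched in C as follows.
        This is an illustration only, with hypothetical names: it reserves
        high speed bandwidth for the start split in one micro frame and for
        the complete split in the three micro frames that follow it. It does
        not cover the separate TT bandwidth allocation, and it rejects start
        micro frames above 4, whose complete splits would need the FSTN
        feature.

```c
#include <stdint.h>

#define	NUM_FRAMES		32
#define	UFRAMES_PER_FRAME	8
#define	HS_MAX_PERIODIC		5918	/* bytes per micro frame */

/* Returns 0 on success, -1 if the bandwidth is not available. */
int
split_allocate_bandwidth(uint32_t bw[NUM_FRAMES][UFRAMES_PER_FRAME],
    uint32_t ss_bytes, uint32_t cs_bytes, uint32_t interval_ms,
    uint32_t first_frame, uint32_t start_uframe)
{
	uint8_t smask, cmask;
	uint32_t f, u, pass, add;

	if (interval_ms == 0 || interval_ms > NUM_FRAMES ||
	    first_frame >= interval_ms || start_uframe > 4)
		return (-1);

	smask = (uint8_t)(1u << start_uframe);		/* e.g. 0x01 */
	cmask = (uint8_t)(0x0eu << start_uframe);	/* e.g. 0x0e */

	/* Pass 0 checks every affected list; pass 1 commits. */
	for (pass = 0; pass < 2; pass++) {
		for (f = first_frame; f < NUM_FRAMES; f += interval_ms) {
			for (u = 0; u < UFRAMES_PER_FRAME; u++) {
				if (smask & (1u << u))
					add = ss_bytes;
				else if (cmask & (1u << u))
					add = cs_bytes;
				else
					continue;

				if (pass == 0) {
					if (bw[f][u] + add > HS_MAX_PERIODIC)
						return (-1);
				} else {
					bw[f][u] += add;
				}
			}
		}
	}
	return (0);
}
```

        With interval_ms = 4, first_frame = 0 and start_uframe = 0, this
        reserves start split bandwidth in the first micro frame and complete
        split bandwidth in the second, third and fourth micro frames of the
        0,4,8,12,16,20,24,28 frames.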
             
4.Reference

  - USB2.0, OHCI and EHCI Specifications

    http://www.usb.org/developers/docs

  - USB bandwidth analysis from Intel

    http://www.usb.org/developers/whitepapers