.. SPDX-License-Identifier: GPL-2.0

===========================
The Spidernet Device Driver
===========================

Written by Linas Vepstas <linas@austin.ibm.com>

Version of 7 June 2007

Abstract
========
This document sketches the structure of portions of the spidernet
device driver in the Linux kernel tree. The spidernet is a gigabit
ethernet device built into the Toshiba southbridge commonly used
in the SONY Playstation 3 and the IBM QS20 Cell blade.

The Structure of the RX Ring.
=============================
The receive (RX) ring is a circular linked list of RX descriptors,
together with three pointers into the ring that are used to manage its
contents.

The elements of the ring are called "descriptors" or "descrs"; they
describe the received data. This includes a pointer to a buffer
containing the received data, the buffer size, and various status bits.

There are three primary states that a descriptor can be in: "empty",
"full" and "not-in-use".  An "empty" or "ready" descriptor is ready
to receive data from the hardware. A "full" descriptor has data in it,
and is waiting to be emptied and processed by the OS. A "not-in-use"
descriptor is neither empty nor full; it is simply not ready. It may
not even have a data buffer attached, or may be otherwise unusable.

During normal operation, on device startup, the OS (specifically, the
spidernet device driver) allocates a set of RX descriptors and RX
buffers. These are all marked "empty", ready to receive data. This
ring is handed off to the hardware, which sequentially fills in the
buffers, and marks them "full". The OS follows up, taking the full
buffers, processing them, and re-marking them empty.

This filling and emptying is managed by three pointers: the "head"
and "tail" pointers, managed by the OS, and a hardware current
descriptor pointer (GDACTDPA). The GDACTDPA points at the descr
currently being filled. When this descr is filled, the hardware
marks it full, and advances the GDACTDPA by one.  Thus, when there is
flowing RX traffic, every descr behind it should be marked "full",
and everything in front of it should be "empty".  If the hardware
discovers that the current descr is not empty, it will signal an
interrupt, and halt processing.

The tail pointer tails or trails the hardware pointer. When the
hardware is ahead, the tail pointer will be pointing at a "full"
descr. The OS will process this descr, and then mark it "not-in-use",
and advance the tail pointer.  Thus, when there is flowing RX traffic,
all of the descrs in front of the tail pointer should be "full", and
all of those behind it should be "not-in-use". When RX traffic is not
flowing, then the tail pointer can catch up to the hardware pointer.
The OS will then note that the current tail is "empty", and halt
processing.

The head pointer (somewhat mis-named) follows after the tail pointer.
When traffic is flowing, then the head pointer will be pointing at
a "not-in-use" descr. The OS will perform various housekeeping duties
on this descr. This includes allocating a new data buffer and
dma-mapping it so as to make it visible to the hardware. The OS will
then mark the descr as "empty", ready to receive data. Thus, when there
is flowing RX traffic, everything in front of the head pointer should
be "not-in-use", and everything behind it should be "empty". If no
RX traffic is flowing, then the head pointer can catch up to the tail
pointer, at which point the OS will notice that the head descr is
"empty", and it will halt processing.

Thus, in an idle system, the GDACTDPA, tail and head pointers will
all be pointing at the same descr, which should be "empty". All of the
other descrs in the ring should be "empty" as well.
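
The interplay of the three pointers can be sketched in a few lines of C.
This is a toy model and not driver code: the structure, the function
names and the ring size of 8 are invented for illustration, and buffer
allocation and DMA mapping are reduced to a simple state change::

    #include <stddef.h>

    #define RING_SIZE 8   /* illustrative only */

    enum state { EMPTY, FULL, NOT_IN_USE };

    struct rx_ring {
        enum state descr[RING_SIZE];
        size_t hw;     /* models the hardware pointer (GDACTDPA) */
        size_t tail;   /* OS: next descr to process */
        size_t head;   /* OS: next descr to refill */
    };

    static size_t ring_next(size_t i)
    {
        return (i + 1) % RING_SIZE;
    }

    /* Hardware: fill the current descr; return 0 (halt) if it is not empty. */
    static int hw_receive(struct rx_ring *r)
    {
        if (r->descr[r->hw] != EMPTY)
            return 0;
        r->descr[r->hw] = FULL;
        r->hw = ring_next(r->hw);
        return 1;
    }

    /* OS: process "full" descrs at the tail; stop at the first non-full one. */
    static int os_process_tail(struct rx_ring *r)
    {
        int n = 0;

        while (r->descr[r->tail] == FULL) {
            r->descr[r->tail] = NOT_IN_USE;
            r->tail = ring_next(r->tail);
            n++;
        }
        return n;
    }

    /* OS: refill "not-in-use" descrs at the head; stop at the first other one. */
    static int os_refill_head(struct rx_ring *r)
    {
        int n = 0;

        while (r->descr[r->head] == NOT_IN_USE) {
            r->descr[r->head] = EMPTY;   /* stands in for alloc + DMA map */
            r->head = ring_next(r->head);
            n++;
        }
        return n;
    }

Starting from an all-empty ring, three calls to hw_receive() followed by
os_process_tail() and os_refill_head() leave all three pointers on the
same descr with every descr "empty" again, which is the idle state
described above.
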

The show_rx_chain() routine will print out the locations of the
GDACTDPA, tail and head pointers. It will also summarize the contents
of the ring, starting at the tail pointer, and listing the status
of the descrs that follow.

A typical example of the output, for a nearly idle system, might be::

    net eth1: Total number of descrs=256
    net eth1: Chain tail located at descr=20
    net eth1: Chain head is at 20
    net eth1: HW curr desc (GDACTDPA) is at 21
    net eth1: Have 1 descrs with stat=x40800101
    net eth1: HW next desc (GDACNEXTDA) is at 22
    net eth1: Last 255 descrs with stat=xa0800000

In the above, the hardware has filled in one descr, number 20. Both
head and tail are pointing at 20, because it has not yet been emptied.
Meanwhile, the hardware pointer is at 21, which is free.

The "Have nnn descrs" refers to the descrs starting at the tail: in this
case, nnn=1 descr, starting at descr 20. The "Last nnn descrs" refers
to all of the rest of the descrs, from the last status change. The "nnn"
is a count of how many descrs have exactly the same status.
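
The grouping behind these counts amounts to a run-length summary of the
status words, walked once around the ring from the tail. The following
is a minimal sketch of that idea only; summarize_ring() and struct
status_run are hypothetical names, and this is not the body of
show_rx_chain()::

    #include <stddef.h>
    #include <stdint.h>

    struct status_run {
        uint32_t status;   /* status word shared by every descr in the run */
        size_t count;      /* number of consecutive descrs with that status */
    };

    /*
     * Walk the ring once, starting at tail, and group consecutive descrs
     * that have identical status. Returns the number of runs stored in
     * out, which is never more than max_runs.
     */
    static size_t summarize_ring(const uint32_t *status, size_t n, size_t tail,
                                 struct status_run *out, size_t max_runs)
    {
        size_t runs = 0;

        for (size_t i = 0; i < n; i++) {
            uint32_t s = status[(tail + i) % n];

            if (runs > 0 && out[runs - 1].status == s) {
                out[runs - 1].count++;
            } else {
                if (runs == max_runs)
                    break;
                out[runs].status = s;
                out[runs].count = 1;
                runs++;
            }
        }
        return runs;
    }

For the ring in the example above (256 descrs, tail at 20, only descr 20
holding x40800101), this yields two runs: one descr with x40800101,
then 255 descrs with xa0800000.
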

The status x4... corresponds to "full" and status xa... corresponds
to "empty". The actual value printed is RXCOMST_A.

In the device driver source code, a different set of names is
used for these same concepts, so that::

    "empty"      == SPIDER_NET_DESCR_CARDOWNED  == 0xa
    "full"       == SPIDER_NET_DESCR_FRAME_END  == 0x4
    "not-in-use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf
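
As a hedged illustration of how a printed status word maps onto these
states, the sketch below assumes that the state is carried in the top
four bits of the 32-bit status, which is what the x4... and xa...
prefixes above suggest. The enum and function names are invented for
this example and are not the driver's identifiers::

    #include <stdint.h>

    /* Values from the table above; names are illustrative only. */
    enum rx_descr_state {
        RX_DESCR_EMPTY      = 0xa,   /* SPIDER_NET_DESCR_CARDOWNED  */
        RX_DESCR_FULL       = 0x4,   /* SPIDER_NET_DESCR_FRAME_END  */
        RX_DESCR_NOT_IN_USE = 0xf,   /* SPIDER_NET_DESCR_NOT_IN_USE */
    };

    /* Assumption: the state lives in the top nibble of the status word. */
    static unsigned int rx_descr_state_nibble(uint32_t status)
    {
        return status >> 28;
    }

    static const char *rx_descr_state_name(uint32_t status)
    {
        switch (rx_descr_state_nibble(status)) {
        case RX_DESCR_EMPTY:
            return "empty";
        case RX_DESCR_FULL:
            return "full";
        case RX_DESCR_NOT_IN_USE:
            return "not-in-use";
        default:
            return "other";
        }
    }

With this, the two status words from the example output decode as
x40800101 giving "full" and xa0800000 giving "empty".
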


The RX RAM full bug/feature
===========================

As long as the OS can empty out the RX buffers at a rate faster than
the hardware can fill them, there is no problem. If, for some reason,
the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
pointer will catch up to the head, notice the not-empty condition,
and stop. However, RX packets may still continue arriving on the wire.
The spidernet chip can save some limited number of these in local RAM.
When this local RAM fills up, the spider chip will issue an interrupt
indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
will be set in GHIINT1STS).  When the RX RAM full condition occurs,
a certain bug/feature is triggered that has to be specially handled.
This section describes the special handling for this condition.

When the OS finally has a chance to run, it will empty out the RX ring.
In particular, it will clear the descriptor on which the hardware had
stopped. However, once the hardware has decided that a certain
descriptor is invalid, it will not restart at that descriptor; instead
it will restart at the next descr. This potentially will lead to a
deadlock condition, as the tail pointer will be pointing at this descr,
which, from the OS point of view, is empty; the OS will be waiting for
this descr to be filled. However, the hardware has skipped this descr,
and is filling the next descrs. Since the OS doesn't see this, there
is a potential deadlock, with the OS waiting for one descr to fill,
while the hardware is waiting for a different set of descrs to become
empty.

A call to show_rx_chain() at this point indicates the nature of the
problem. A typical print when the network is hung shows the following::

    net eth1: Spider RX RAM full, incoming packets might be discarded!
    net eth1: Total number of descrs=256
    net eth1: Chain tail located at descr=255
    net eth1: Chain head is at 255
    net eth1: HW curr desc (GDACTDPA) is at 0
    net eth1: Have 1 descrs with stat=xa0800000
    net eth1: HW next desc (GDACNEXTDA) is at 1
    net eth1: Have 127 descrs with stat=x40800101
    net eth1: Have 1 descrs with stat=x40800001
    net eth1: Have 126 descrs with stat=x40800101
    net eth1: Last 1 descrs with stat=xa0800000

Both the tail and head pointers are pointing at descr 255, which is
marked xa... which is "empty". Thus, from the OS point of view, there
is nothing to be done. In particular, there is the implicit assumption
that everything in front of the "empty" descr must surely also be empty,
as explained in the last section. The OS is waiting for descr 255 to
become non-empty, which, in this case, will never happen.

The HW pointer is at descr 0. This descr is marked 0x4.. or "full".
Since it's already full, the hardware can do nothing more, and thus has
halted processing. Notice that descrs 0 through 253 are all marked
"full", while descr 254 and 255 are empty. (The "Last 1 descrs" is
descr 254, since tail was at 255.) Thus, the system is deadlocked,
and there can be no forward progress; the OS thinks there's nothing
to do, and the hardware has nowhere to put incoming data.

This bug/feature is worked around with the spider_net_resync_head_ptr()
routine. When the driver receives RX interrupts, but an examination
of the RX chain seems to show it is empty, then it is probable that
the hardware has skipped a descr or two (sometimes dozens under heavy
network conditions). The spider_net_resync_head_ptr() subroutine will
search the ring for the next full descr, and the driver will resume
operations there.  Since this will leave "holes" in the ring, there
is also a spider_net_resync_tail_ptr() that will skip over such holes.
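
The search for the next full descr can be sketched as follows. This is
only an illustration of the idea described above, not the code of
spider_net_resync_head_ptr(); find_next_full() is a hypothetical helper,
and the sketch assumes that "full" is the value 0x4 in the top four bits
of the status word, as in the printed x4... values::

    #include <stddef.h>
    #include <stdint.h>

    #define STATE_FULL 0x4u   /* assumed top nibble of a "full" status */

    /*
     * Starting at tail, return the index of the first descr whose status
     * marks it "full", or -1 if no descr in the ring is full.
     */
    static long find_next_full(const uint32_t *status, size_t n, size_t tail)
    {
        for (size_t i = 0; i < n; i++) {
            size_t idx = (tail + i) % n;

            if ((status[idx] >> 28) == STATE_FULL)
                return (long)idx;
        }
        return -1;
    }

Applied to the hung ring printed above (tail at 255, descrs 254 and 255
"empty", the rest "full"), the search skips descr 255 and stops at
descr 0, which is where the hardware had resumed filling.
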

As of this writing, the spider_net_resync() strategy seems to work very
well, even under heavy network loads.


The TX ring
===========
The TX ring uses a low-watermark interrupt scheme to make sure that
the TX queue is appropriately serviced for large packet sizes.

For packet sizes greater than about 1 KB, the kernel can fill
the TX ring faster than the device can drain it. Once the ring
is full, the netdev is stopped. When there is room in the ring,
the netdev needs to be reawakened, so that more TX packets are placed
in the ring. The hardware can empty the ring about four times per jiffy,
so it's not appropriate to wait for the poll routine to refill, since
the poll routine runs only once per jiffy.  The low-watermark mechanism
marks a descr about one quarter of the way from the bottom of the queue,
so that an interrupt is generated when the descr is processed. This
interrupt wakes up the netdev, which can then refill the queue.
For large packets, this mechanism generates a relatively small number
of interrupts, about 1K/sec. For smaller packets, this will drop to zero
interrupts, as the hardware can empty the queue faster than the kernel
can fill it.
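
One way to picture the placement of the low-watermark is the index
calculation below. It is a sketch under a stated assumption: "one
quarter of the way from the bottom" is read here as leaving roughly a
quarter of the queued descrs still pending when the marked descr is
processed. The function name and parameters are hypothetical and do not
come from the driver::

    #include <stddef.h>

    /*
     * Return the ring index of the descr to mark, given the index of the
     * oldest queued descr (tail), the number of descrs currently queued,
     * and the ring size. The mark sits three quarters of the way into
     * the queued descrs, so about one quarter remain after it.
     */
    static size_t low_watermark_index(size_t tail, size_t queued,
                                      size_t ring_size)
    {
        size_t offset = (queued * 3) / 4;

        return (tail + offset) % ring_size;
    }

For example, with a 256-entry ring, tail at 10 and 200 descrs queued,
the marked descr is number 160, leaving about 50 descrs for the hardware
to drain while the netdev is woken and refills the queue.
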