#
1a85141a |
| 13-Feb-2013 |
Adrian Chadd <adrian@FreeBSD.org> |
Pull out the if_transmit() work and revert back to ath_start().
My changed had some rather significant behavioural changes to throughput. The two issues I noticed:
* With if_start and the ifnet mbu
Pull out the if_transmit() work and revert back to ath_start().
My changed had some rather significant behavioural changes to throughput. The two issues I noticed:
* With if_start and the ifnet mbuf queue, any temporary latency would get eaten up by some mbufs being queued. With ath_transmit() queuing things to ath_buf's, I'd only get 512 TX buffers before I couldn't queue any further frames.
* There's also some non-zero latency involved with TX being pushed into a taskqueue via direct dispatch. Any time the scheduler didn't immediately schedule the ath TX task would cause extra latency. Various 1ge/10ge drivers implement both direct dispatch (if the TX lock can be acquired) and deferred task transmission (if the TX lock can't be acquired), with frames being pushed into a drbd queue. I'll have to do this at some point, but until I figure out how to deal with 802.11 fragments, I'll have to wait a while longer.
So what I saw:
* lots of extra latency, specially under load - if the taskqueue wasn't immediately scheduled, things went pear shaped;
* any extra latency would result in TX ath_buf's taking their sweet time being replenished, so any further calls to ath_transmit() would drop mbufs.
* .. yes, there's no explicit backpressure here - things are just dropped. Eek.
With this, the general performance has gone up, but those subtle if_start() related race conditions are back. For some reason, this is doubly-obvious with the AR5416 NIC and I don't quite understand why yet.
There's an unrelated issue with AR5416 performance in STA mode (it's fine in AP mode when bridging frames, weirdly..) that requires a little further investigation. Specifically - it works fine on a Lenovo T40 (single core CPU) running a March 2012 9-STABLE kernel, but a Lenovo T60 (dual core) running an early November 2012 kernel behaves very poorly. The same hardware with an AR9160 or AR9280 behaves perfectly.
show more ...
|
#
d9a44755 |
| 08-Feb-2013 |
David E. O'Brien <obrien@FreeBSD.org> |
Sync with HEAD.
|
#
21bca442 |
| 07-Feb-2013 |
Adrian Chadd <adrian@FreeBSD.org> |
Methodize the process of adding the software TX queue to the taskqueue.
Move it (for now) to the TX taskqueue.
|
#
f28a5520 |
| 26-Jan-2013 |
Adrian Chadd <adrian@FreeBSD.org> |
Migrate the TX sending code out from under the ath0 taskq and into the separate ath0 TX taskq.
Whilst here, make sure that the TX software scheduler is also running out of the TX task, rather than t
Migrate the TX sending code out from under the ath0 taskq and into the separate ath0 TX taskq.
Whilst here, make sure that the TX software scheduler is also running out of the TX task, rather than the ath0 taskqueue.
Make sure that the tx taskqueue is blocked/unblocked as necessary.
This allows for a little more parallelism on multi-core machines, as well as (eventually) supporting a higher task priority for TX tasks, allowing said TX task to preempt an already running RX or TX completion task.
Tested:
* AR5416, AR9280 hostap and STA modes
show more ...
|
#
f74d878f |
| 21-Jan-2013 |
Adrian Chadd <adrian@FreeBSD.org> |
Fix this routine to acutally break out and not set clrdmask if any of the TIDs are currently marked as "filtered."
|
#
4f25ddbb |
| 21-Jan-2013 |
Adrian Chadd <adrian@FreeBSD.org> |
Migrate CLRDMASK to be a per-node flag, rather than a per-TID flag.
This is easily possible now that the TX is protected by a single lock, rather than a per-TXQ (and thus per-TID) lock.
Only set CL
Migrate CLRDMASK to be a per-node flag, rather than a per-TID flag.
This is easily possible now that the TX is protected by a single lock, rather than a per-TXQ (and thus per-TID) lock.
Only set CLRDMASK if none of the destinations are filtered. This likely will need some tuning when it comes time to do UASPD/PS-POLL TX, however at that point it should be manually set anyway.
Tested:
* AR9280, STA mode
TODO:
* More thorough testing in AP mode * test other chipsets, just to be safe/sure.
show more ...
|
#
c2217b98 |
| 17-Jan-2013 |
Neel Natu <neel@FreeBSD.org> |
IFC @ r245509
|
#
c5239edb |
| 15-Jan-2013 |
Adrian Chadd <adrian@FreeBSD.org> |
Implement frame (data) transmission using if_transmit(), rather than if_start().
This removes the overlapping data path TX from occuring, which solves quite a number of the potential TX queue races
Implement frame (data) transmission using if_transmit(), rather than if_start().
This removes the overlapping data path TX from occuring, which solves quite a number of the potential TX queue races in ath(4). It doesn't fix the net80211 layer TX queue races and it doesn't fix the raw TX path yet, but it's an important step towards this.
This hasn't dropped the TX performance in my testing; primarily because now the TX path can quickly queue frames and continue along processing.
This involves a few rather deep changes:
* Use the ath_buf as a queue placeholder for now, as we need to be able to support queuing a list of mbufs (ie, when transmitting fragments) and m_nextpkt can't be used here (because it's what is joining the fragments together)
* if_transmit() now simply allocates the ath_buf and queues it to a driver TX staging queue.
* TX is now moved into a taskqueue function.
* The TX taskqueue function now dequeues and transmits frames.
* Fragments are handled correctly here - as the current API passes the fragment list as one mbuf list (joined with m_nextpkt) through to the driver if_transmit().
* For the couple of places where ath_start() may be called (mostly from net80211 when starting the VAP up again), just reimplement it using the new enqueue and taskqueue methods.
What I don't like (about this work and the TX code in general):
* I'm using the same lock for the staging TX queue management and the actual TX. This isn't required; I'm just being slack.
* I haven't yet moved TX to a separate taskqueue (but the taskqueue is created); it's easy enough to do this later if necessary. I just need to make sure it's a higher priority queue, so TX has the same behaviour as it used to (where it would preempt existing RX..)
* I need to re-review the TX path a little more and make sure that ieee80211_node_*() functions aren't called within the TX lock. When queueing, I should just push failed frames into a queue and when I'm wrapping up the TX code, unlock the TX lock and call ieee80211_node_free() on each.
* It would be nice if I could hold the TX lock for the entire TX and TX completion, rather than this release/re-acquire behaviour. But that requires that I shuffle around the TX completion code to handle actual ath_buf free and net80211 callback/free outside of the TX lock. That's one of my next projects.
* the ic_raw_xmit() path doesn't use this yet - so it still has sequencing problems with parallel, overlapping calls to the data path. I'll fix this later.
Tested:
* Hostap - AR9280, AR9220 * STA - AR5212, AR9280, AR5416
show more ...
|
#
46b1c55d |
| 04-Jan-2013 |
Neel Natu <neel@FreeBSD.org> |
IFC @ r244983.
|
#
fc56c9c5 |
| 11-Dec-2012 |
Adrian Chadd <adrian@FreeBSD.org> |
There's no need to use a TXQ pointer here; we specifically need the hardware queue ID when queuing to EDMA descriptors.
This is a small part of trying to reduce the size of ath_buf entries.
|
#
c6499ecc |
| 04-Dec-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Mechanically substitute flags from historic mbuf allocator with malloc(9) flags in sys/dev.
|
#
32531ccb |
| 04-Dec-2012 |
Neel Natu <neel@FreeBSD.org> |
IFC @r243836
|
#
974185bb |
| 02-Dec-2012 |
Adrian Chadd <adrian@FreeBSD.org> |
Don't grab the PCU lock inside the TX lock.
|
#
375307d4 |
| 02-Dec-2012 |
Adrian Chadd <adrian@FreeBSD.org> |
Delete the per-TXQ locks and replace them with a single TX lock.
I couldn't think of a way to maintain the hardware TXQ locks _and_ layer on top of that per-TXQ software queuing and any other kind o
Delete the per-TXQ locks and replace them with a single TX lock.
I couldn't think of a way to maintain the hardware TXQ locks _and_ layer on top of that per-TXQ software queuing and any other kind of fine-grained locks (eg per-TID, or per-node locks.)
So for now, to facilitate some further code refactoring and development as part of the final push to get software queue ps-poll and u-apsd handling into this driver, just do away with them entirely.
I may eventually bring them back at some point, when it looks slightly more architectually cleaner to do so. But as it stands at the present, it's not really buying us much:
* in order to properly serialise things and not get bitten by scheduling and locking interactions with things higher up in the stack, we need to wrap the whole TX path in a long held lock. Otherwise we can end up being pre-empted during frame handling, resulting in some out of order frame handling between sequence number allocation and encryption handling (ie, the seqno and the CCMP IV get out of sequence);
* .. so whilst that's the case, holding the lock for that long means that we're acquiring and releasing the TXQ lock _inside_ that context;
* And we also acquire it per-frame during frame completion, but we currently can't hold the lock for the duration of the TX completion as we need to call net80211 layer things with the locks _unheld_ to avoid LOR.
* .. the other places were grab that lock are reset/flush, which don't happen often.
My eventual aim is to change the TX path so all rejected frame transmissions and all frame completions result in any ieee80211_free_node() calls to occur outside of the TX lock; then I can cut back on the amount of locking that goes on here.
There may be some LORs that occur when ieee80211_free_node() is called when the TX queue path fails; I'll begin to address these in follow-up commits.
show more ...
|
Revision tags: release/9.1.0 |
|
#
491e1248 |
| 28-Nov-2012 |
Adrian Chadd <adrian@FreeBSD.org> |
Until I figure out what to do here, remind myself that this needs some rate control 'adjustment' when NOACK is set.
|
#
300675f6 |
| 27-Nov-2012 |
Alexander Motin <mav@FreeBSD.org> |
MFC
|
#
e477abf7 |
| 27-Nov-2012 |
Alexander Motin <mav@FreeBSD.org> |
MFC @ r241285
|
#
32484645 |
| 17-Nov-2012 |
Neel Natu <neel@FreeBSD.org> |
IFC @ r243164
|
#
bb327d28 |
| 16-Nov-2012 |
Adrian Chadd <adrian@FreeBSD.org> |
ALQ logging enhancements:
* upon setup, tell the alq code what the chip information is. * add TX/RX path logging for legacy chips. * populate the tx/rx descriptor length fields with a best-estimate.
ALQ logging enhancements:
* upon setup, tell the alq code what the chip information is. * add TX/RX path logging for legacy chips. * populate the tx/rx descriptor length fields with a best-estimate. It's overly big (96 bytes when AH_SUPPORT_AR5416 is enabled) but it'll do for now.
Whilst I'm here, add CURVNET_RESTORE() here during probe/attach as a partial solution to fixing crashes during attach when the attach fails. There are other attach failures that I have to deal with; those'll come later.
show more ...
|
#
bbdf3df1 |
| 15-Nov-2012 |
Adrian Chadd <adrian@FreeBSD.org> |
Make sure the final descriptor in an aggregate has rate control information.
This was broken by me when merging the 802.11n aggregate descriptor chain setup with the default descriptor chain setup,
Make sure the final descriptor in an aggregate has rate control information.
This was broken by me when merging the 802.11n aggregate descriptor chain setup with the default descriptor chain setup, in preparation for supporting AR9380 NICs.
The corner case here is quite specific - if you queue an aggregate frame with >1 frames in it, and the last subframe has only one descriptor making it up, then that descriptor won't have the rate control information copied into it. Look at what happens inside ar5416FillTxDesc() if both firstSeg and lastSeg are set to 1.
Then when ar5416ProcTxDesc() goes to fill out ts_rate based on the transmit index, it looks at the rate control fields in that descriptor and dutifully sets it to be 0.
It doesn't happen for non-aggregate frames - if they have one descriptor, the first descriptor already has rate control info.
I removed the call to ath_hal_setuplasttxdesc() when I migrated the code to use the "new" style aggregate chain routines from the HAL. But I missed this particular corner case.
This is a bit inefficient with MIPS boards as it involves a few redundant writes into non-cachable memory. I'll chase that up when it matters.
Tested:
* AR9280 STA mode, TCP iperf traffic * Rui Paulo <rpaulo@> first reported this and has verified it on his AR9160 based AP.
PR: kern/173636
show more ...
|
#
7d3d462b |
| 13-Nov-2012 |
Neel Natu <neel@FreeBSD.org> |
IFC @ r242940
|
#
7d9dd2ac |
| 13-Nov-2012 |
Adrian Chadd <adrian@FreeBSD.org> |
Add some debugging to try and catch an invalid TX rate (0x0) that is being reported.
|
#
58c82ec4 |
| 11-Nov-2012 |
Adrian Chadd <adrian@FreeBSD.org> |
Remove this; i incorrectly committed the wrong (debug) changes in my previous commit.
|
#
a10c6f55 |
| 11-Nov-2012 |
Neel Natu <neel@FreeBSD.org> |
IFC @ r242684
|
#
04cdca73 |
| 11-Nov-2012 |
Adrian Chadd <adrian@FreeBSD.org> |
Don't call av_set_tim() if it's NULL.
This happens during a scan in STA mode; any queued data frames will be power save queued but as there's no TIM in STA mode, it panics.
This was introduced by m
Don't call av_set_tim() if it's NULL.
This happens during a scan in STA mode; any queued data frames will be power save queued but as there's no TIM in STA mode, it panics.
This was introduced by me when I disabled my driver-aware power save handling support.
show more ...
|