History log of /freebsd/sys/dev/ath/if_ath_tx.c (Results 126 – 150 of 272)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 1a85141a 13-Feb-2013 Adrian Chadd <adrian@FreeBSD.org>

Pull out the if_transmit() work and revert back to ath_start().

My changed had some rather significant behavioural changes to throughput.
The two issues I noticed:

* With if_start and the ifnet mbu

Pull out the if_transmit() work and revert back to ath_start().

My changed had some rather significant behavioural changes to throughput.
The two issues I noticed:

* With if_start and the ifnet mbuf queue, any temporary latency
would get eaten up by some mbufs being queued. With ath_transmit()
queuing things to ath_buf's, I'd only get 512 TX buffers before I
couldn't queue any further frames.

* There's also some non-zero latency involved with TX being pushed
into a taskqueue via direct dispatch. Any time the scheduler didn't
immediately schedule the ath TX task would cause extra latency.
Various 1ge/10ge drivers implement both direct dispatch (if the TX
lock can be acquired) and deferred task transmission (if the TX lock
can't be acquired), with frames being pushed into a drbd queue.
I'll have to do this at some point, but until I figure out how to
deal with 802.11 fragments, I'll have to wait a while longer.

So what I saw:

* lots of extra latency, specially under load - if the taskqueue
wasn't immediately scheduled, things went pear shaped;

* any extra latency would result in TX ath_buf's taking their sweet time
being replenished, so any further calls to ath_transmit() would drop
mbufs.

* .. yes, there's no explicit backpressure here - things are just dropped.
Eek.

With this, the general performance has gone up, but those subtle if_start()
related race conditions are back. For some reason, this is doubly-obvious
with the AR5416 NIC and I don't quite understand why yet.

There's an unrelated issue with AR5416 performance in STA mode (it's
fine in AP mode when bridging frames, weirdly..) that requires a little
further investigation. Specifically - it works fine on a Lenovo T40
(single core CPU) running a March 2012 9-STABLE kernel, but a Lenovo T60
(dual core) running an early November 2012 kernel behaves very poorly.
The same hardware with an AR9160 or AR9280 behaves perfectly.

show more ...


# d9a44755 08-Feb-2013 David E. O'Brien <obrien@FreeBSD.org>

Sync with HEAD.


# 21bca442 07-Feb-2013 Adrian Chadd <adrian@FreeBSD.org>

Methodize the process of adding the software TX queue to the taskqueue.

Move it (for now) to the TX taskqueue.


# f28a5520 26-Jan-2013 Adrian Chadd <adrian@FreeBSD.org>

Migrate the TX sending code out from under the ath0 taskq and into
the separate ath0 TX taskq.

Whilst here, make sure that the TX software scheduler is also
running out of the TX task, rather than t

Migrate the TX sending code out from under the ath0 taskq and into
the separate ath0 TX taskq.

Whilst here, make sure that the TX software scheduler is also
running out of the TX task, rather than the ath0 taskqueue.

Make sure that the tx taskqueue is blocked/unblocked as necessary.

This allows for a little more parallelism on multi-core machines,
as well as (eventually) supporting a higher task priority for TX
tasks, allowing said TX task to preempt an already running RX or
TX completion task.

Tested:

* AR5416, AR9280 hostap and STA modes

show more ...


# f74d878f 21-Jan-2013 Adrian Chadd <adrian@FreeBSD.org>

Fix this routine to acutally break out and not set clrdmask if any
of the TIDs are currently marked as "filtered."


# 4f25ddbb 21-Jan-2013 Adrian Chadd <adrian@FreeBSD.org>

Migrate CLRDMASK to be a per-node flag, rather than a per-TID flag.

This is easily possible now that the TX is protected by a single
lock, rather than a per-TXQ (and thus per-TID) lock.

Only set CL

Migrate CLRDMASK to be a per-node flag, rather than a per-TID flag.

This is easily possible now that the TX is protected by a single
lock, rather than a per-TXQ (and thus per-TID) lock.

Only set CLRDMASK if none of the destinations are filtered.
This likely will need some tuning when it comes time to do UASPD/PS-POLL
TX, however at that point it should be manually set anyway.

Tested:

* AR9280, STA mode

TODO:

* More thorough testing in AP mode
* test other chipsets, just to be safe/sure.

show more ...


# c2217b98 17-Jan-2013 Neel Natu <neel@FreeBSD.org>

IFC @ r245509


# c5239edb 15-Jan-2013 Adrian Chadd <adrian@FreeBSD.org>

Implement frame (data) transmission using if_transmit(), rather than
if_start().

This removes the overlapping data path TX from occuring, which
solves quite a number of the potential TX queue races

Implement frame (data) transmission using if_transmit(), rather than
if_start().

This removes the overlapping data path TX from occuring, which
solves quite a number of the potential TX queue races in ath(4).
It doesn't fix the net80211 layer TX queue races and it doesn't
fix the raw TX path yet, but it's an important step towards this.

This hasn't dropped the TX performance in my testing; primarily
because now the TX path can quickly queue frames and continue
along processing.

This involves a few rather deep changes:

* Use the ath_buf as a queue placeholder for now, as we need to be
able to support queuing a list of mbufs (ie, when transmitting
fragments) and m_nextpkt can't be used here (because it's what is
joining the fragments together)

* if_transmit() now simply allocates the ath_buf and queues it to
a driver TX staging queue.

* TX is now moved into a taskqueue function.

* The TX taskqueue function now dequeues and transmits frames.

* Fragments are handled correctly here - as the current API passes
the fragment list as one mbuf list (joined with m_nextpkt) through
to the driver if_transmit().

* For the couple of places where ath_start() may be called (mostly
from net80211 when starting the VAP up again), just reimplement
it using the new enqueue and taskqueue methods.

What I don't like (about this work and the TX code in general):

* I'm using the same lock for the staging TX queue management and the
actual TX. This isn't required; I'm just being slack.

* I haven't yet moved TX to a separate taskqueue (but the taskqueue is
created); it's easy enough to do this later if necessary. I just need
to make sure it's a higher priority queue, so TX has the same
behaviour as it used to (where it would preempt existing RX..)

* I need to re-review the TX path a little more and make sure that
ieee80211_node_*() functions aren't called within the TX lock.
When queueing, I should just push failed frames into a queue and
when I'm wrapping up the TX code, unlock the TX lock and
call ieee80211_node_free() on each.

* It would be nice if I could hold the TX lock for the entire
TX and TX completion, rather than this release/re-acquire behaviour.
But that requires that I shuffle around the TX completion code
to handle actual ath_buf free and net80211 callback/free outside
of the TX lock. That's one of my next projects.

* the ic_raw_xmit() path doesn't use this yet - so it still has
sequencing problems with parallel, overlapping calls to the
data path. I'll fix this later.

Tested:

* Hostap - AR9280, AR9220
* STA - AR5212, AR9280, AR5416

show more ...


# 46b1c55d 04-Jan-2013 Neel Natu <neel@FreeBSD.org>

IFC @ r244983.


# fc56c9c5 11-Dec-2012 Adrian Chadd <adrian@FreeBSD.org>

There's no need to use a TXQ pointer here; we specifically need the
hardware queue ID when queuing to EDMA descriptors.

This is a small part of trying to reduce the size of ath_buf entries.


# c6499ecc 04-Dec-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags in sys/dev.


# 32531ccb 04-Dec-2012 Neel Natu <neel@FreeBSD.org>

IFC @r243836


# 974185bb 02-Dec-2012 Adrian Chadd <adrian@FreeBSD.org>

Don't grab the PCU lock inside the TX lock.


# 375307d4 02-Dec-2012 Adrian Chadd <adrian@FreeBSD.org>

Delete the per-TXQ locks and replace them with a single TX lock.

I couldn't think of a way to maintain the hardware TXQ locks _and_ layer
on top of that per-TXQ software queuing and any other kind o

Delete the per-TXQ locks and replace them with a single TX lock.

I couldn't think of a way to maintain the hardware TXQ locks _and_ layer
on top of that per-TXQ software queuing and any other kind of fine-grained
locks (eg per-TID, or per-node locks.)

So for now, to facilitate some further code refactoring and development
as part of the final push to get software queue ps-poll and u-apsd handling
into this driver, just do away with them entirely.

I may eventually bring them back at some point, when it looks slightly more
architectually cleaner to do so. But as it stands at the present, it's
not really buying us much:

* in order to properly serialise things and not get bitten by scheduling
and locking interactions with things higher up in the stack, we need to
wrap the whole TX path in a long held lock. Otherwise we can end up
being pre-empted during frame handling, resulting in some out of order
frame handling between sequence number allocation and encryption handling
(ie, the seqno and the CCMP IV get out of sequence);

* .. so whilst that's the case, holding the lock for that long means that
we're acquiring and releasing the TXQ lock _inside_ that context;

* And we also acquire it per-frame during frame completion, but we currently
can't hold the lock for the duration of the TX completion as we need
to call net80211 layer things with the locks _unheld_ to avoid LOR.

* .. the other places were grab that lock are reset/flush, which don't happen
often.

My eventual aim is to change the TX path so all rejected frame transmissions
and all frame completions result in any ieee80211_free_node() calls to occur
outside of the TX lock; then I can cut back on the amount of locking that
goes on here.

There may be some LORs that occur when ieee80211_free_node() is called when
the TX queue path fails; I'll begin to address these in follow-up commits.

show more ...


Revision tags: release/9.1.0
# 491e1248 28-Nov-2012 Adrian Chadd <adrian@FreeBSD.org>

Until I figure out what to do here, remind myself that this needs some
rate control 'adjustment' when NOACK is set.


# 300675f6 27-Nov-2012 Alexander Motin <mav@FreeBSD.org>

MFC


# e477abf7 27-Nov-2012 Alexander Motin <mav@FreeBSD.org>

MFC @ r241285


# 32484645 17-Nov-2012 Neel Natu <neel@FreeBSD.org>

IFC @ r243164


# bb327d28 16-Nov-2012 Adrian Chadd <adrian@FreeBSD.org>

ALQ logging enhancements:

* upon setup, tell the alq code what the chip information is.
* add TX/RX path logging for legacy chips.
* populate the tx/rx descriptor length fields with a best-estimate.

ALQ logging enhancements:

* upon setup, tell the alq code what the chip information is.
* add TX/RX path logging for legacy chips.
* populate the tx/rx descriptor length fields with a best-estimate.
It's overly big (96 bytes when AH_SUPPORT_AR5416 is enabled)
but it'll do for now.

Whilst I'm here, add CURVNET_RESTORE() here during probe/attach as a
partial solution to fixing crashes during attach when the attach fails.
There are other attach failures that I have to deal with; those'll come
later.

show more ...


# bbdf3df1 15-Nov-2012 Adrian Chadd <adrian@FreeBSD.org>

Make sure the final descriptor in an aggregate has rate control information.

This was broken by me when merging the 802.11n aggregate descriptor chain
setup with the default descriptor chain setup,

Make sure the final descriptor in an aggregate has rate control information.

This was broken by me when merging the 802.11n aggregate descriptor chain
setup with the default descriptor chain setup, in preparation for supporting
AR9380 NICs.

The corner case here is quite specific - if you queue an aggregate frame
with >1 frames in it, and the last subframe has only one descriptor making
it up, then that descriptor won't have the rate control information
copied into it. Look at what happens inside ar5416FillTxDesc() if
both firstSeg and lastSeg are set to 1.

Then when ar5416ProcTxDesc() goes to fill out ts_rate based on the
transmit index, it looks at the rate control fields in that descriptor
and dutifully sets it to be 0.

It doesn't happen for non-aggregate frames - if they have one descriptor,
the first descriptor already has rate control info.

I removed the call to ath_hal_setuplasttxdesc() when I migrated the
code to use the "new" style aggregate chain routines from the HAL.
But I missed this particular corner case.

This is a bit inefficient with MIPS boards as it involves a few redundant
writes into non-cachable memory. I'll chase that up when it matters.

Tested:

* AR9280 STA mode, TCP iperf traffic
* Rui Paulo <rpaulo@> first reported this and has verified it on
his AR9160 based AP.

PR: kern/173636

show more ...


# 7d3d462b 13-Nov-2012 Neel Natu <neel@FreeBSD.org>

IFC @ r242940


# 7d9dd2ac 13-Nov-2012 Adrian Chadd <adrian@FreeBSD.org>

Add some debugging to try and catch an invalid TX rate (0x0) that is
being reported.


# 58c82ec4 11-Nov-2012 Adrian Chadd <adrian@FreeBSD.org>

Remove this; i incorrectly committed the wrong (debug) changes in my
previous commit.


# a10c6f55 11-Nov-2012 Neel Natu <neel@FreeBSD.org>

IFC @ r242684


# 04cdca73 11-Nov-2012 Adrian Chadd <adrian@FreeBSD.org>

Don't call av_set_tim() if it's NULL.

This happens during a scan in STA mode; any queued data frames will
be power save queued but as there's no TIM in STA mode, it panics.

This was introduced by m

Don't call av_set_tim() if it's NULL.

This happens during a scan in STA mode; any queued data frames will
be power save queued but as there's no TIM in STA mode, it panics.

This was introduced by me when I disabled my driver-aware power save
handling support.

show more ...


1234567891011