/freebsd/sys/dev/rtwn/usb/ |
H A D | rtwn_usb_tx.h | diff d99eb8230eb717ab0b2eba948614d0f2f2b5dd2b Tue Nov 12 05:48:12 CET 2024 Adrian Chadd <adrian@FreeBSD.org> rtwn: change the USB TX transfers to only do one pending transfer per endpoint
I found I was getting constant device timeouts when doing anything more complicated than a single SSH on laptop with RTL8811AU.
After digging into it, i found a variety of fun situations, including traffic stalls that would recover w/ a shorter (1 second) USB transfer timeout. However, the big one is a straight up hang of any TX endpoint until the NIC was reset. The RX side kept going just fine; only the TX endpoints would hang.
Reproducing it was easy - just start up a couple of traffic streams on different WME AC's - eg a best effort + bulk transfer, like browsing the web and doing an ssh clone - throw in a ping -i 0.1 to your gateway, and it would very quickly hit device timeouts every couple of seconds.
I put everything into a single TX EP and the hangs went away. Well, mostly.
So after some MORE digging, I found that this driver isn't checking if the transfers are going into the correct EPs for the packet WME access category / 802.11 TID; and would frequently be able to schedule multiple transfers into the same endpoint.
Then there's a second problem - there's an array of endpoints used for setting up the USB device, with .endpoint = UE_ADDR_ANY, however they're also being setup with the same endpoint configured in multiple transfer configs. Eg, a NIC with 3 or 4 bulk TX endpoints will configure the BK and BE endpoints with the same physical endpoint ID. This also leads to timed out transfers.
My /guess/ was that the firmware isn't happy with one or both of the above, and so I solved both.
* drop the USB transfer timeout to 1 second, not 5 seconds - that way we'll either get a 1 second traffic pause and USB transfer failure, or a 5 second device timeout. Having both the TX timeout and the USB transfer timeout made recovery from a USB transfer timeout (without a NIC reset) almost impossible.
* enforce one transfer per endpoint; * separate pending/active buffer tracking per endpoint; * each endpoint now has its own TX callback to make sure the queue / end point ID is known; * and only frames from a given endpoint pending queue is going into the active queue and into that endpoint. * Finally, create a local wme2qid array and populate it with the endpoint mapping that ensures unique physical endpoint use.
Locally tested:
* rtl8812AU, 11n STA mode * rtl8192EU, 11n STA mode (with diffs to fix the channel config / power timeouts.)
Differential Revision: https://reviews.freebsd.org/D47522
|
H A D | rtwn_usb_tx.c | diff d99eb8230eb717ab0b2eba948614d0f2f2b5dd2b Tue Nov 12 05:48:12 CET 2024 Adrian Chadd <adrian@FreeBSD.org> rtwn: change the USB TX transfers to only do one pending transfer per endpoint
I found I was getting constant device timeouts when doing anything more complicated than a single SSH on laptop with RTL8811AU.
After digging into it, i found a variety of fun situations, including traffic stalls that would recover w/ a shorter (1 second) USB transfer timeout. However, the big one is a straight up hang of any TX endpoint until the NIC was reset. The RX side kept going just fine; only the TX endpoints would hang.
Reproducing it was easy - just start up a couple of traffic streams on different WME AC's - eg a best effort + bulk transfer, like browsing the web and doing an ssh clone - throw in a ping -i 0.1 to your gateway, and it would very quickly hit device timeouts every couple of seconds.
I put everything into a single TX EP and the hangs went away. Well, mostly.
So after some MORE digging, I found that this driver isn't checking if the transfers are going into the correct EPs for the packet WME access category / 802.11 TID; and would frequently be able to schedule multiple transfers into the same endpoint.
Then there's a second problem - there's an array of endpoints used for setting up the USB device, with .endpoint = UE_ADDR_ANY, however they're also being setup with the same endpoint configured in multiple transfer configs. Eg, a NIC with 3 or 4 bulk TX endpoints will configure the BK and BE endpoints with the same physical endpoint ID. This also leads to timed out transfers.
My /guess/ was that the firmware isn't happy with one or both of the above, and so I solved both.
* drop the USB transfer timeout to 1 second, not 5 seconds - that way we'll either get a 1 second traffic pause and USB transfer failure, or a 5 second device timeout. Having both the TX timeout and the USB transfer timeout made recovery from a USB transfer timeout (without a NIC reset) almost impossible.
* enforce one transfer per endpoint; * separate pending/active buffer tracking per endpoint; * each endpoint now has its own TX callback to make sure the queue / end point ID is known; * and only frames from a given endpoint pending queue is going into the active queue and into that endpoint. * Finally, create a local wme2qid array and populate it with the endpoint mapping that ensures unique physical endpoint use.
Locally tested:
* rtl8812AU, 11n STA mode * rtl8192EU, 11n STA mode (with diffs to fix the channel config / power timeouts.)
Differential Revision: https://reviews.freebsd.org/D47522
|
H A D | rtwn_usb_var.h | diff d99eb8230eb717ab0b2eba948614d0f2f2b5dd2b Tue Nov 12 05:48:12 CET 2024 Adrian Chadd <adrian@FreeBSD.org> rtwn: change the USB TX transfers to only do one pending transfer per endpoint
I found I was getting constant device timeouts when doing anything more complicated than a single SSH on laptop with RTL8811AU.
After digging into it, i found a variety of fun situations, including traffic stalls that would recover w/ a shorter (1 second) USB transfer timeout. However, the big one is a straight up hang of any TX endpoint until the NIC was reset. The RX side kept going just fine; only the TX endpoints would hang.
Reproducing it was easy - just start up a couple of traffic streams on different WME AC's - eg a best effort + bulk transfer, like browsing the web and doing an ssh clone - throw in a ping -i 0.1 to your gateway, and it would very quickly hit device timeouts every couple of seconds.
I put everything into a single TX EP and the hangs went away. Well, mostly.
So after some MORE digging, I found that this driver isn't checking if the transfers are going into the correct EPs for the packet WME access category / 802.11 TID; and would frequently be able to schedule multiple transfers into the same endpoint.
Then there's a second problem - there's an array of endpoints used for setting up the USB device, with .endpoint = UE_ADDR_ANY, however they're also being setup with the same endpoint configured in multiple transfer configs. Eg, a NIC with 3 or 4 bulk TX endpoints will configure the BK and BE endpoints with the same physical endpoint ID. This also leads to timed out transfers.
My /guess/ was that the firmware isn't happy with one or both of the above, and so I solved both.
* drop the USB transfer timeout to 1 second, not 5 seconds - that way we'll either get a 1 second traffic pause and USB transfer failure, or a 5 second device timeout. Having both the TX timeout and the USB transfer timeout made recovery from a USB transfer timeout (without a NIC reset) almost impossible.
* enforce one transfer per endpoint; * separate pending/active buffer tracking per endpoint; * each endpoint now has its own TX callback to make sure the queue / end point ID is known; * and only frames from a given endpoint pending queue is going into the active queue and into that endpoint. * Finally, create a local wme2qid array and populate it with the endpoint mapping that ensures unique physical endpoint use.
Locally tested:
* rtl8812AU, 11n STA mode * rtl8192EU, 11n STA mode (with diffs to fix the channel config / power timeouts.)
Differential Revision: https://reviews.freebsd.org/D47522
|
H A D | rtwn_usb_ep.c | diff d99eb8230eb717ab0b2eba948614d0f2f2b5dd2b Tue Nov 12 05:48:12 CET 2024 Adrian Chadd <adrian@FreeBSD.org> rtwn: change the USB TX transfers to only do one pending transfer per endpoint
I found I was getting constant device timeouts when doing anything more complicated than a single SSH on laptop with RTL8811AU.
After digging into it, i found a variety of fun situations, including traffic stalls that would recover w/ a shorter (1 second) USB transfer timeout. However, the big one is a straight up hang of any TX endpoint until the NIC was reset. The RX side kept going just fine; only the TX endpoints would hang.
Reproducing it was easy - just start up a couple of traffic streams on different WME AC's - eg a best effort + bulk transfer, like browsing the web and doing an ssh clone - throw in a ping -i 0.1 to your gateway, and it would very quickly hit device timeouts every couple of seconds.
I put everything into a single TX EP and the hangs went away. Well, mostly.
So after some MORE digging, I found that this driver isn't checking if the transfers are going into the correct EPs for the packet WME access category / 802.11 TID; and would frequently be able to schedule multiple transfers into the same endpoint.
Then there's a second problem - there's an array of endpoints used for setting up the USB device, with .endpoint = UE_ADDR_ANY, however they're also being setup with the same endpoint configured in multiple transfer configs. Eg, a NIC with 3 or 4 bulk TX endpoints will configure the BK and BE endpoints with the same physical endpoint ID. This also leads to timed out transfers.
My /guess/ was that the firmware isn't happy with one or both of the above, and so I solved both.
* drop the USB transfer timeout to 1 second, not 5 seconds - that way we'll either get a 1 second traffic pause and USB transfer failure, or a 5 second device timeout. Having both the TX timeout and the USB transfer timeout made recovery from a USB transfer timeout (without a NIC reset) almost impossible.
* enforce one transfer per endpoint; * separate pending/active buffer tracking per endpoint; * each endpoint now has its own TX callback to make sure the queue / end point ID is known; * and only frames from a given endpoint pending queue is going into the active queue and into that endpoint. * Finally, create a local wme2qid array and populate it with the endpoint mapping that ensures unique physical endpoint use.
Locally tested:
* rtl8812AU, 11n STA mode * rtl8192EU, 11n STA mode (with diffs to fix the channel config / power timeouts.)
Differential Revision: https://reviews.freebsd.org/D47522
|
H A D | rtwn_usb_attach.c | diff d99eb8230eb717ab0b2eba948614d0f2f2b5dd2b Tue Nov 12 05:48:12 CET 2024 Adrian Chadd <adrian@FreeBSD.org> rtwn: change the USB TX transfers to only do one pending transfer per endpoint
I found I was getting constant device timeouts when doing anything more complicated than a single SSH on laptop with RTL8811AU.
After digging into it, i found a variety of fun situations, including traffic stalls that would recover w/ a shorter (1 second) USB transfer timeout. However, the big one is a straight up hang of any TX endpoint until the NIC was reset. The RX side kept going just fine; only the TX endpoints would hang.
Reproducing it was easy - just start up a couple of traffic streams on different WME AC's - eg a best effort + bulk transfer, like browsing the web and doing an ssh clone - throw in a ping -i 0.1 to your gateway, and it would very quickly hit device timeouts every couple of seconds.
I put everything into a single TX EP and the hangs went away. Well, mostly.
So after some MORE digging, I found that this driver isn't checking if the transfers are going into the correct EPs for the packet WME access category / 802.11 TID; and would frequently be able to schedule multiple transfers into the same endpoint.
Then there's a second problem - there's an array of endpoints used for setting up the USB device, with .endpoint = UE_ADDR_ANY, however they're also being setup with the same endpoint configured in multiple transfer configs. Eg, a NIC with 3 or 4 bulk TX endpoints will configure the BK and BE endpoints with the same physical endpoint ID. This also leads to timed out transfers.
My /guess/ was that the firmware isn't happy with one or both of the above, and so I solved both.
* drop the USB transfer timeout to 1 second, not 5 seconds - that way we'll either get a 1 second traffic pause and USB transfer failure, or a 5 second device timeout. Having both the TX timeout and the USB transfer timeout made recovery from a USB transfer timeout (without a NIC reset) almost impossible.
* enforce one transfer per endpoint; * separate pending/active buffer tracking per endpoint; * each endpoint now has its own TX callback to make sure the queue / end point ID is known; * and only frames from a given endpoint pending queue is going into the active queue and into that endpoint. * Finally, create a local wme2qid array and populate it with the endpoint mapping that ensures unique physical endpoint use.
Locally tested:
* rtl8812AU, 11n STA mode * rtl8192EU, 11n STA mode (with diffs to fix the channel config / power timeouts.)
Differential Revision: https://reviews.freebsd.org/D47522
|
/freebsd/sys/dev/rtwn/ |
H A D | if_rtwnvar.h | diff d99eb8230eb717ab0b2eba948614d0f2f2b5dd2b Tue Nov 12 05:48:12 CET 2024 Adrian Chadd <adrian@FreeBSD.org> rtwn: change the USB TX transfers to only do one pending transfer per endpoint
I found I was getting constant device timeouts when doing anything more complicated than a single SSH on laptop with RTL8811AU.
After digging into it, i found a variety of fun situations, including traffic stalls that would recover w/ a shorter (1 second) USB transfer timeout. However, the big one is a straight up hang of any TX endpoint until the NIC was reset. The RX side kept going just fine; only the TX endpoints would hang.
Reproducing it was easy - just start up a couple of traffic streams on different WME AC's - eg a best effort + bulk transfer, like browsing the web and doing an ssh clone - throw in a ping -i 0.1 to your gateway, and it would very quickly hit device timeouts every couple of seconds.
I put everything into a single TX EP and the hangs went away. Well, mostly.
So after some MORE digging, I found that this driver isn't checking if the transfers are going into the correct EPs for the packet WME access category / 802.11 TID; and would frequently be able to schedule multiple transfers into the same endpoint.
Then there's a second problem - there's an array of endpoints used for setting up the USB device, with .endpoint = UE_ADDR_ANY, however they're also being setup with the same endpoint configured in multiple transfer configs. Eg, a NIC with 3 or 4 bulk TX endpoints will configure the BK and BE endpoints with the same physical endpoint ID. This also leads to timed out transfers.
My /guess/ was that the firmware isn't happy with one or both of the above, and so I solved both.
* drop the USB transfer timeout to 1 second, not 5 seconds - that way we'll either get a 1 second traffic pause and USB transfer failure, or a 5 second device timeout. Having both the TX timeout and the USB transfer timeout made recovery from a USB transfer timeout (without a NIC reset) almost impossible.
* enforce one transfer per endpoint; * separate pending/active buffer tracking per endpoint; * each endpoint now has its own TX callback to make sure the queue / end point ID is known; * and only frames from a given endpoint pending queue is going into the active queue and into that endpoint. * Finally, create a local wme2qid array and populate it with the endpoint mapping that ensures unique physical endpoint use.
Locally tested:
* rtl8812AU, 11n STA mode * rtl8192EU, 11n STA mode (with diffs to fix the channel config / power timeouts.)
Differential Revision: https://reviews.freebsd.org/D47522
|