Skip to content

Conversation

@HiFiPhile
Copy link
Collaborator

@HiFiPhile HiFiPhile commented Apr 11, 2025

Describe the PR
Fix queued xfer of previous failed enumeration mess with next enumeration.

[1:1] Get Descriptor: 80 06 03 03 09 04 00 01 
on EP 00 with 8 bytes: OK
on EP 80 with 40 bytes: OK
[1:1] Control data:
  0000:  28 03 4D 00 61 00 73 00 73 00 20 00 53 00 74 00  |(.M.a.s.s. .S.t.|
  0010:  6F 00 72 00 61 00 67 00 65 00 20 00 44 00 65 00  |o.r.a.g.e. .D.e.|
  0020:  76 00 69 00 63 00 65 00                          |v.i.c.e.|
[1:0:0] USBH DEVICE REMOVED
[1:0:0] unplugged address = 1
Device removed, address = 1
[1:] USBH Device Attach
High Speed
[1:0] Open EP0 with Size = 8
Get 8 byte of Device Descriptor
[1:0] Get Descriptor: 80 06 00 01 00 00 08 00 
on EP 00 with 8 bytes: OK                                  <------- Missed by one
on EP 00 with 8 bytes: OK
[1:0] Control data:
  0000:  12 01 00 02 00 00 00 40                          |.......@|
on EP 80 with 8 bytes: OK

Set Address = 1
[1:0] Set Address: 00 05 01 00 00 00 00 00 
on EP 00 with 0 bytes: OK
on EP 00 with 8 bytes: OK

[1:1] Open EP0 with Size = 64
Get Device Descriptor
[1:1] Get Descriptor: 80 06 00 01 00 00 12 00 
on EP 80 with 8 bytes: OK
hcd_edpt_xfer 655: ASSERT FAILED

@maximevince
Copy link
Contributor

maximevince commented Apr 14, 2025

While I can see the issue described here on my side as well, it seems to me this is not the proper fix, but rather a workaround - with its own side-effects.
Why wasn't this xfer cleared? And what if is currently ongoing?

There seem to be some race conditions between handling some stuff in the interrupt handler, and deferring some stuff to the tuh_task(). This is probably the root of (some of) these problems...

@HiFiPhile
Copy link
Collaborator Author

it seems to me this is not the proper fix, but rather a workaround - with its own side-effects.

Could you detail more, what you mean sie-effects ?

@hathach
Copy link
Owner

hathach commented Apr 18, 2025

my bad, I think we cannot simply clear data here. We should mark endpoint for removal then disable channel first, within the halted interrupt check the flag and clear the endpoint.

@HiFiPhile
Copy link
Collaborator Author

Ah yes we should close the channel properly, especially when a hub is used.

Signed-off-by: HiFiPhile <[email protected]>
@HiFiPhile
Copy link
Collaborator Author

I've changed it use channel_disable(), I think there is no need to wait until the channel is disabled ?

@hathach
Copy link
Owner

hathach commented Apr 29, 2025

I've changed it use channel_disable(), I think there is no need to wait until the channel is disabled ?

I am on vacation, will try to check this out afterwards (next week or so).

Copilot AI review requested due to automatic review settings November 18, 2025 03:18
Copilot finished reviewing on behalf of hathach November 18, 2025 03:21
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes an issue where queued transfers from a failed USB device enumeration interfere with subsequent enumeration attempts by implementing a cleanup mechanism for pending transfers when a device is closed.

Key Changes:

  • Added a closing flag to the hcd_xfer_t structure to mark transfers being cleaned up
  • Modified hcd_device_close() to mark associated transfers as closing and disable their channels
  • Updated all four channel interrupt handlers (IN/OUT for both slave and DMA modes) to check the closing flag and deallocate channels without sending events

Comment on lines 463 to 476
tu_memclr(edpt, sizeof(hcd_endpoint_t));
for (uint8_t ch_id = 0; ch_id < (uint8_t) DWC2_CHANNEL_COUNT_MAX; ch_id++) {
hcd_xfer_t* xfer = &_hcd_data.xfer[ch_id];
if (xfer->allocated && xfer->ep_id == i) {
dwc2_channel_t* channel = &dwc2->channel[ch_id];
dwc2_channel_split_t hcsplt = {.value = channel->hcsplt};
xfer->closing = 1;
// Channel disable must not be programmed for non-split periodic channels
if (!channel_is_periodic(channel->hcchar) || hcsplt.split_en) {
channel_disable(dwc2, channel);
}
}
}
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The endpoint is cleared before marking associated channels as closing, which creates a race condition. If a channel interrupt occurs after the endpoint is cleared (line 464) but before the transfer is marked as closing (line 470), the interrupt handler will access the cleared endpoint data via edpt = &_hcd_data.edpt[xfer->ep_id] (used in all channel handlers).

Recommendation: Move the tu_memclr(edpt, sizeof(hcd_endpoint_t)) call to after the channel cleanup loop, or disable interrupts for this channel before clearing the endpoint.

Copilot uses AI. Check for mistakes.
Comment on lines 471 to 474
// Channel disable must not be programmed for non-split periodic channels
if (!channel_is_periodic(channel->hcchar) || hcsplt.split_en) {
channel_disable(dwc2, channel);
}
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For non-split periodic channels, the closing flag is set (line 470) but channel_disable() is not called (line 472-474). The closing flag is only checked within if (hcint & HCINT_HALTED) blocks in the interrupt handlers (lines 962-966, 1027-1031, 1149-1154, 1216-1221). If a non-split periodic channel doesn't naturally halt after the device is removed, it may never be properly cleaned up.

Consider whether non-split periodic channels need additional handling to ensure they eventually halt and get deallocated, or document why they will naturally halt in this scenario.

Suggested change
// Channel disable must not be programmed for non-split periodic channels
if (!channel_is_periodic(channel->hcchar) || hcsplt.split_en) {
channel_disable(dwc2, channel);
}
// Always disable the channel to ensure proper cleanup, even for non-split periodic channels
channel_disable(dwc2, channel);

Copilot uses AI. Check for mistakes.
Comment on lines 962 to 969
} else if (xfer->closing) {
// channel is closing, de-allocate channel
channel_dealloc(dwc2, ch_id);
// don't send event
is_done = false;
} else {
// got here due to NAK or NYET
channel_xfer_in_retry(dwc2, ch_id, hcint);
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In slave mode, the closing flag check uses else if (lines 962-966), which means it won't execute if the transfer has a valid result (line 957-958) or reached max errors (line 959-961). This could cause events to be sent for closing transfers.

In contrast, the DMA mode handlers (lines 1149-1154, 1216-1221) use a separate if statement that correctly overrides is_done to false regardless of the transfer result.

The slave mode handlers should use a separate if statement (not else if) for the closing check, similar to the DMA handlers, to ensure events are not sent for closing transfers.

Suggested change
} else if (xfer->closing) {
// channel is closing, de-allocate channel
channel_dealloc(dwc2, ch_id);
// don't send event
is_done = false;
} else {
// got here due to NAK or NYET
channel_xfer_in_retry(dwc2, ch_id, hcint);
} else {
// got here due to NAK or NYET
channel_xfer_in_retry(dwc2, ch_id, hcint);
}
// Always check closing after other conditions, to override is_done if needed
if (xfer->closing) {
// channel is closing, de-allocate channel
channel_dealloc(dwc2, ch_id);
// don't send event
is_done = false;

Copilot uses AI. Check for mistakes.
Comment on lines 1027 to 1034
} else if (xfer->closing) {
// channel is closing, de-allocate channel
channel_dealloc(dwc2, ch_id);
// don't send event
is_done = false;
} else {
// Got here due to NAK or NYET
TU_ASSERT(channel_xfer_start(dwc2, ch_id));
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the issue in handle_channel_in_slave, the closing flag check uses else if (lines 1027-1031), which means it won't execute if the transfer has a valid result (line 1022-1023) or reached max errors (line 1024-1026). This could cause events to be sent for closing transfers.

The DMA mode handlers (lines 1149-1154, 1216-1221) correctly use a separate if statement that overrides is_done to false.

Change this to a separate if statement (not else if) to match the DMA handler behavior and ensure events are not sent for closing transfers.

Suggested change
} else if (xfer->closing) {
// channel is closing, de-allocate channel
channel_dealloc(dwc2, ch_id);
// don't send event
is_done = false;
} else {
// Got here due to NAK or NYET
TU_ASSERT(channel_xfer_start(dwc2, ch_id));
} else {
// Got here due to NAK or NYET
TU_ASSERT(channel_xfer_start(dwc2, ch_id));
}
if (xfer->closing) {
// channel is closing, de-allocate channel
channel_dealloc(dwc2, ch_id);
// don't send event
is_done = false;

Copilot uses AI. Check for mistakes.
@HiFiPhile
Copy link
Collaborator Author

Copilot sounds reasonable, I need to bring up the memory ...

Copy link
Owner

@hathach hathach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I making some changes to PR as well, will take this chance to implement the hcd_edpt_close() as well

dwc2_channel_t* channel = &dwc2->channel[ch_id];
dwc2_channel_split_t hcsplt = {.value = channel->hcsplt};
xfer->closing = 1;
// Channel disable must not be programmed for non-split periodic channels
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only true for DMA mode, for slave mode, we still need to disable it I guess

Image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right.

@hathach
Copy link
Owner

hathach commented Nov 18, 2025

@HiFiPhile I make more changes to PR, also add closing flag for endpoint to de-alloc() it when chanel is halted with closing flags. Also put the check into chanel_disable(). Let me know if these make sense to you.

@HiFiPhile
Copy link
Collaborator Author

@hathach It looks good but I don't remember some details. Tried plugging in/out a bad contact USB stick on H7RS and no assert is hit.

@hathach
Copy link
Owner

hathach commented Nov 19, 2025

@HiFiPhile why skipping these examples for h7rs, I see it compile just fine with minsize.

@HiFiPhile
Copy link
Collaborator Author

@hathach With LOG=2 IAR debug builds needs > 70k for these examples, cdc_msc_hid_freertos needs 66k even on MinRel build.

@hathach
Copy link
Owner

hathach commented Nov 19, 2025

@hathach With LOG=2 IAR debug builds needs > 70k for these examples, cdc_msc_hid_freertos needs 66k even on MinRel build.

as long as minsize with LOG=0 compile, we can leave it enabled since that is what ci build, also user can test with those example as well. LOG > 1 include lots of constant file/func/line for assert and increase the footprint qutie a bit.

@HiFiPhile
Copy link
Collaborator Author

I've just dropped that commit, maybe later we can enhance skip/only mechanism for debug variant.

Copy link
Owner

@hathach hathach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perfect, thank you very much

@hathach hathach merged commit 15ed920 into hathach:master Nov 19, 2025
146 of 147 checks passed
@HiFiPhile HiFiPhile deleted the xfer_close branch November 19, 2025 11:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants