RP2040: intermittent USB CDC monitor hang on dev after #5447 under XIP pressure #5464

@rdon-key

With -scheduler=cores, I see an intermittent RP2040 USB CDC monitor hang that appears after:

ca584de8 machine/usb: support bidirectional endpoints by dynamic registration (#5447)

Since this is still pre-release and the regression is reproducible, I'm flagging it as a possible release blocker. The hang is intermittent (2 of 5 runs below), so my conclusion is that this looks like a regression from #5447, not that it certainly is one.

I'd suggest either investigating #5447 or temporarily reverting it before release, unless the hang can be explained quickly. I'm not trying to root-cause it here, and I'm happy to run more tests.

Method

To rule out the older RP2 CDC TX race fixed in d9d19e81 / #5391, I tested both revisions with the same usbcdc.go restored from d9d19e81:

git restore --source=d9d19e81 -- src/machine/usb/cdc/usbcdc.go

So the source difference under test is the change introduced by #5447:

594be6db + usbcdc.go@d9d19e81   (previous revision)
ca584de8 + usbcdc.go@d9d19e81   (#5447)

Environment: tinygo 0.42.0-dev-18033ebc (go1.26.1, LLVM 20.1.1). Current upstream/dev already includes both #5391 and #5447. I checked out the two historical commits above only to isolate the regression point.

Reproduction

Target: Raspberry Pi Pico / RP2040

tinygo flash -target=pico -scheduler=cores -monitor 25_min_usb_xip_atomic_use_64kb.go

The test prints many lines with println while another goroutine repeatedly reads a large const string from flash. The hang appears to require this flash/XIP pressure.

Expected final output:

test finished

On a hang, the USB CDC monitor output stops before that line.

25_min_usb_xip_atomic_use_64kb.go
//go:build tinygo && rp2040

package main

import (
        "sync/atomic"
        "time"
)

const lines = 5000
const reads = 8192
const payload = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ--usb-xip-atomic-use-min--"

const chunk256 = "" +
        "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef" +
        "fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210" +
        "00112233445566778899aabbccddeeffffeeddccbbaa99887766554433221100" +
        "rp2040xipcacheflashworkerrandomaccesstestdataAAAAAAAAAAAAAAAAAAA"

const block4k = chunk256 + chunk256 + chunk256 + chunk256 +
        chunk256 + chunk256 + chunk256 + chunk256 +
        chunk256 + chunk256 + chunk256 + chunk256 +
        chunk256 + chunk256 + chunk256 + chunk256

const flashData = block4k + block4k + block4k + block4k +
        block4k + block4k + block4k + block4k +
        block4k + block4k + block4k + block4k +
        block4k + block4k + block4k + block4k

var (
        workerDone uint32
        workerSum  uint32
)

func xipWorker() {
        x := uint32(0x12345678)
        s := uint32(0)
        for {
                v := atomic.LoadUint32(&workerDone)
                s ^= v // use the atomic value, but do not branch on it
                for i := 0; i < reads; i++ {
                        x = x*1664525 + 1013904223
                        s += uint32(flashData[int(x%uint32(len(flashData)))])
                }
                workerSum = s ^ x
        }
}

func main() {
        time.Sleep(2 * time.Second)
        println("start usb xip stall atomic-use")
        println("flashData size:", len(flashData))
        println("lines:", lines)

        go xipWorker()
        time.Sleep(10 * time.Millisecond)

        for i := 0; i < lines; i++ {
                println("line:", i, payload)
        }

        workerDone = 1
        println("worker sum:", workerSum)
        println("test finished")

        for {
        }
}

Results

594be6db + usbcdc.go@d9d19e81 :  OK OK OK OK OK   (5/5 OK)
ca584de8 + usbcdc.go@d9d19e81 :  OK NG OK OK NG   (3/5 OK, 2/5 NG)

OK = the monitor reached "test finished"; NG = the monitor output stopped before that line.

With the same CDC TX fix on both, the previous revision did not reproduce the hang while #5447 did. The sample is small, so I can't rule out variance. I can run more iterations and report a reproduction rate if that helps the decision.
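
To give a rough sense of how much weight five runs per revision can carry, here is a small host-side Go sketch. It is not part of the repro and is not meant to run on the Pico. The helper names `choose` and `fisherOneSided` are hypothetical, and the calculation assumes the runs are independent. It applies a one-sided Fisher exact test to the OK/NG counts above.

```go
package main

import "fmt"

// choose returns the binomial coefficient C(n, k) as a float64.
func choose(n, k int) float64 {
	if k < 0 || k > n {
		return 0
	}
	r := 1.0
	for i := 1; i <= k; i++ {
		r = r * float64(n-k+i) / float64(i)
	}
	return r
}

// fisherOneSided returns the probability, assuming both revisions hang at the
// same underlying rate, that at least ngB of the observed hangs fall in group B
// given the observed totals (hypergeometric tail).
func fisherOneSided(okA, ngA, okB, ngB int) float64 {
	total := okA + ngA + okB + ngB
	hangs := ngA + ngB
	runsB := okB + ngB

	// Group B cannot contain more hangs than it has runs.
	maxK := hangs
	if runsB < maxK {
		maxK = runsB
	}

	p := 0.0
	for k := ngB; k <= maxK; k++ {
		p += choose(hangs, k) * choose(total-hangs, runsB-k) / choose(total, runsB)
	}
	return p
}

func main() {
	// Group A: 594be6db (5 OK, 0 NG). Group B: ca584de8 (3 OK, 2 NG).
	p := fisherOneSided(5, 0, 3, 2)
	fmt.Printf("one-sided p = %.3f\n", p)
}
```

For the counts above this prints `one-sided p = 0.222`, so the table on its own does not separate a regression from chance. A larger number of runs per revision would be needed for the counts to be convincing.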

Request

Before release, please either investigate this regression or, if it can't be explained quickly, consider temporarily reverting #5447.

I'm glad to help with more runs, a narrower repro, or capturing state at the time of the hang.
