
Scheduling-Related Rx Latency Affects Slave Response Timing #30

@kbader94

Description

In addition to the FIFO-related rx latency issue, there is another issue that affects slave response timing on some devices. It stems from the way the Linux tty subsystem schedules the handling of rx data as it is handed off from serial drivers to the line discipline, and it is especially noticeable on resource-constrained devices, such as single-core systems. When a serial driver receives an rx interrupt, it pushes the data toward the line discipline via tty_flip_buffer_push(), which queues work on an unbound workqueue to flush the data to the line discipline for consumption. On systems with limited CPU resources, or under heavy CPU utilization, this can introduce additional latency, sometimes on the order of tens of milliseconds. That is more than enough to exceed the LIN specification's frame timing requirements, which state:

The maximum space between the bytes is additional 40% duration compared to the
nominal transmission time
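
For background, here is a minimal sketch of the driver-side rx path, assuming a simplified, hypothetical UART driver (my_uart_port, my_uart_rx_isr and the register offset are made up for illustration); the relevant part is the real tty_insert_flip_char() / tty_flip_buffer_push() pair. tty_flip_buffer_push() only queues flush work on an unbound workqueue, it does not deliver the data to the line discipline synchronously:

```c
/*
 * Minimal sketch of a driver rx interrupt handler (hypothetical driver).
 * The handler only copies bytes into the tty flip buffer;
 * tty_flip_buffer_push() queues flush work on an unbound workqueue, so
 * the hand-off to the line discipline happens whenever the scheduler
 * gets around to running that work.
 */
#include <linux/interrupt.h>
#include <linux/io.h>
#include <linux/tty.h>
#include <linux/tty_flip.h>

#define MY_UART_RX_REG 0x00            /* hypothetical rx data register offset */

struct my_uart_port {
	struct tty_port port;          /* tty_port backing this UART */
	void __iomem *regs;            /* mapped device registers */
};

static irqreturn_t my_uart_rx_isr(int irq, void *dev_id)
{
	struct my_uart_port *up = dev_id;
	u8 ch = readb(up->regs + MY_UART_RX_REG);

	/* Stash the byte in the flip buffer; nothing reaches the ldisc yet. */
	tty_insert_flip_char(&up->port, ch, TTY_NORMAL);

	/*
	 * Queue delivery to the line discipline. On a busy single-core
	 * system this work may not run for tens of milliseconds.
	 */
	tty_flip_buffer_push(&up->port);

	return IRQ_HANDLED;
}
```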

Testing and Reproducing

This can easily be reproduced by limiting a system to a single core and subjecting it to high CPU utilization. To test this, I used a Raspberry Pi 4B running the latest Bookworm Raspberry Pi OS with a 6.17 kernel patched to eliminate the FIFO rx latency by disabling the FIFO. A physical loopback was set up and the round-trip time (RTT) of a single-byte transmission was measured under three test cases:

  1. Control - All 4 cores enabled with normal CPU utilization.
  2. Only 1 core enabled with normal CPU utilization.
  3. Only 1 core enabled with heavy CPU utilization.

For test cases 2 & 3, I disabled the other cores by setting maxcpus=1 in cmdline.txt and set the CPU frequency to 600 MHz in config.txt.

For test case 3, stress-ng was also run to generate heavy CPU load.

Note: The round-trip test was performed at 19200 bps, so a single byte (10 bits including start and stop bits) has a nominal transmission time of approximately 520 microseconds. Anything above this is additional latency introduced by the system.
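
For reference, here is a rough sketch of the loopback RTT measurement described above, assuming TX is physically wired to RX and the port is /dev/ttyAMA0 (the device path and iteration count are assumptions, not the exact harness used for the numbers below):

```c
/*
 * Rough loopback RTT sketch: write one byte, read it back, time the
 * round trip. Nominal time for one byte at 19200 bps is
 * 10 bits / 19200 bps ~= 520 us; anything above that is latency added
 * by the FIFO, scheduling, etc.
 */
#include <fcntl.h>
#include <stdio.h>
#include <termios.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	const char *dev = "/dev/ttyAMA0";   /* assumed port, TX wired to RX */
	int fd = open(dev, O_RDWR | O_NOCTTY);
	if (fd < 0) { perror("open"); return 1; }

	struct termios tio;
	if (tcgetattr(fd, &tio) < 0) { perror("tcgetattr"); return 1; }
	cfmakeraw(&tio);                    /* raw 8N1, no echo/translation */
	cfsetispeed(&tio, B19200);
	cfsetospeed(&tio, B19200);
	tio.c_cc[VMIN] = 1;                 /* block until 1 byte arrives */
	tio.c_cc[VTIME] = 0;
	if (tcsetattr(fd, TCSANOW, &tio) < 0) { perror("tcsetattr"); return 1; }

	long total_ns = 0;
	const int iterations = 100;         /* assumed sample count */

	for (int i = 0; i < iterations; i++) {
		unsigned char tx = 0x55, rx = 0;
		struct timespec t0, t1;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		if (write(fd, &tx, 1) != 1) { perror("write"); return 1; }
		if (read(fd, &rx, 1) != 1) { perror("read"); return 1; }
		clock_gettime(CLOCK_MONOTONIC, &t1);

		total_ns += (t1.tv_sec - t0.tv_sec) * 1000000000L
			  + (t1.tv_nsec - t0.tv_nsec);
	}

	printf("average RTT: %ld us (nominal ~520 us at 19200 bps)\n",
	       total_ns / iterations / 1000);
	close(fd);
	return 0;
}
```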


Test Case 1 - 4 Cores Enabled, Normal CPU Utilization

[Screenshot: RTT measurements, test case 1]

Average RTT was 572 microseconds, i.e. approximately 52 microseconds of added latency.


Test Case 2 - 1 Core Enabled, Normal CPU Utilization

[Screenshot: RTT measurements, test case 2]

Average RTT was 1318 microseconds, i.e. approximately 798 microseconds of added latency. That already exceeds the inter-byte margin quoted above (40% of the nominal byte time, roughly 208 microseconds at 19200 bps), so some LIN slave responses may be dropped.


Test Case 3 - 1 Core Enabled, Heavy CPU Utilization

[Screenshot: RTT measurements, test case 3]

Average RTT was 6924 microseconds, i.e. approximately 6404 microseconds of added latency. Many LIN slave responses may be dropped with such latency.


Workarounds

As noted by @trnila in this PR comment:

setting cpuidle.off=1 or writing 1 to /sys/devices/system/cpu/cpu0/cpuidle/state*/disable disables idle states and reduces the delay.
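
For completeness, a small sketch of the runtime (sysfs) variant of that workaround, assuming root privileges; it simply writes 1 to each cpuidle state's disable file for cpu0, as in the quoted comment:

```c
/*
 * Sketch of the sysfs workaround: disable every cpuidle state on cpu0
 * by writing "1" to its disable file. Must run as root; the effect is
 * similar to booting with cpuidle.off=1 but can be toggled at runtime.
 */
#include <glob.h>
#include <stdio.h>

int main(void)
{
	glob_t g;

	if (glob("/sys/devices/system/cpu/cpu0/cpuidle/state*/disable",
		 0, NULL, &g) != 0) {
		fprintf(stderr, "no cpuidle state files found\n");
		return 1;
	}

	for (size_t i = 0; i < g.gl_pathc; i++) {
		FILE *f = fopen(g.gl_pathv[i], "w");
		if (!f) { perror(g.gl_pathv[i]); continue; }
		fputs("1", f);          /* 1 = disable this idle state */
		fclose(f);
		printf("disabled %s\n", g.gl_pathv[i]);
	}

	globfree(&g);
	return 0;
}
```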

TTY Low Latency History & Related Kernel Discussions

Once upon a time, the tty subsystem had a low_latency flag, which allowed serial drivers to push data directly to the line discipline without going through a workqueue. It was deprecated because of unsafe use within certain line disciplines, which might sleep or hold locks even though serial drivers typically call tty_flip_buffer_push() from IRQ context. There was a proposal to use a dedicated kthread; it was rejected at the time, but similar approaches have been suggested since. The most recent of these received a response from Linus, who suggested a dedicated high-priority workqueue or avoiding the workqueue altogether. Finally, an RFC in 2022 proposed modifying the workqueue for real-time use specifically for low-latency tty rx.
