
Scheduling-Related Rx Latency Affects Slave Response Timing #30

@kbader94

Description

In addition to the FIFO-related rx latency issue, there is another issue that affects slave response timing on some devices. It stems from the way the Linux tty subsystem schedules the handling of rx data as it is handed off from serial drivers to the line discipline, and it is especially noticeable on resource-constrained devices, such as single-core systems. When a serial driver receives an rx interrupt, it pushes the data toward the line discipline via tty_flip_buffer_push(), which queues work on an unbound workqueue to flush the data to the line discipline for consumption. On systems with limited CPU resources, or under heavy CPU utilization, this can introduce additional latency, sometimes on the order of tens of milliseconds. That is more than enough to exceed the LIN specification's frame timing requirements, which state:

The maximum space between the bytes is additional 40% duration compared to the
nominal transmission time
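
For background, here is a minimal sketch of the driver-side rx path, assuming a simplified, hypothetical UART driver (my_uart_port, my_uart_rx_isr and the register offset are made up for illustration); the relevant part is the real tty_insert_flip_char() / tty_flip_buffer_push() pair. tty_flip_buffer_push() only queues flush work on an unbound workqueue, it does not deliver the data to the line discipline synchronously:

```c
/*
 * Minimal sketch of a driver rx interrupt handler (hypothetical driver).
 * The handler only copies bytes into the tty flip buffer;
 * tty_flip_buffer_push() queues flush work on an unbound workqueue, so
 * the hand-off to the line discipline happens whenever the scheduler
 * gets around to running that work.
 */
#include <linux/interrupt.h>
#include <linux/io.h>
#include <linux/tty.h>
#include <linux/tty_flip.h>

#define MY_UART_RX_REG 0x00            /* hypothetical rx data register offset */

struct my_uart_port {
	struct tty_port port;          /* tty_port backing this UART */
	void __iomem *regs;            /* mapped device registers */
};

static irqreturn_t my_uart_rx_isr(int irq, void *dev_id)
{
	struct my_uart_port *up = dev_id;
	u8 ch = readb(up->regs + MY_UART_RX_REG);

	/* Stash the byte in the flip buffer; nothing reaches the ldisc yet. */
	tty_insert_flip_char(&up->port, ch, TTY_NORMAL);

	/*
	 * Queue delivery to the line discipline. On a busy single-core
	 * system this work may not run for tens of milliseconds.
	 */
	tty_flip_buffer_push(&up->port);

	return IRQ_HANDLED;
}
```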

Testing and Reproducing

This can easily be reproduced by limiting a system to a single core and subjecting it to high CPU utilization. To test this, I used a Raspberry Pi 4B running the latest Bookworm Raspberry Pi OS with a 6.17 kernel patched to eliminate the FIFO rx latency by disabling the FIFO. A physical loopback was set up and the round-trip time (RTT) of a single-byte transmission was measured under three test cases:

  1. Control - All 4 cores enabled with normal CPU utilization.
  2. Only 1 core enabled with normal CPU utilization.
  3. Only 1 core enabled with heavy CPU utilization.

For test cases 2 & 3, I disabled the other cores by setting maxcpus=1 in cmdline.txt and set the CPU frequency to 600 MHz in config.txt.

For test case 3, stress-ng was also run to generate heavy CPU load.

Note: The round-trip test was performed at 19200 bps, so a single byte (10 bits including start and stop bits) has a nominal transmission time of approximately 520 microseconds. Anything above this is additional latency introduced by the system.
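
For reference, here is a rough sketch of the loopback RTT measurement described above, assuming TX is physically wired to RX and the port is /dev/ttyAMA0 (the device path and iteration count are assumptions, not the exact harness used for the numbers below):

```c
/*
 * Rough loopback RTT sketch: write one byte, read it back, time the
 * round trip. Nominal time for one byte at 19200 bps is
 * 10 bits / 19200 bps ~= 520 us; anything above that is latency added
 * by the FIFO, scheduling, etc.
 */
#include <fcntl.h>
#include <stdio.h>
#include <termios.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	const char *dev = "/dev/ttyAMA0";   /* assumed port, TX wired to RX */
	int fd = open(dev, O_RDWR | O_NOCTTY);
	if (fd < 0) { perror("open"); return 1; }

	struct termios tio;
	if (tcgetattr(fd, &tio) < 0) { perror("tcgetattr"); return 1; }
	cfmakeraw(&tio);                    /* raw 8N1, no echo/translation */
	cfsetispeed(&tio, B19200);
	cfsetospeed(&tio, B19200);
	tio.c_cc[VMIN] = 1;                 /* block until 1 byte arrives */
	tio.c_cc[VTIME] = 0;
	if (tcsetattr(fd, TCSANOW, &tio) < 0) { perror("tcsetattr"); return 1; }

	long total_ns = 0;
	const int iterations = 100;         /* assumed sample count */

	for (int i = 0; i < iterations; i++) {
		unsigned char tx = 0x55, rx = 0;
		struct timespec t0, t1;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		if (write(fd, &tx, 1) != 1) { perror("write"); return 1; }
		if (read(fd, &rx, 1) != 1) { perror("read"); return 1; }
		clock_gettime(CLOCK_MONOTONIC, &t1);

		total_ns += (t1.tv_sec - t0.tv_sec) * 1000000000L
			  + (t1.tv_nsec - t0.tv_nsec);
	}

	printf("average RTT: %ld us (nominal ~520 us at 19200 bps)\n",
	       total_ns / iterations / 1000);
	close(fd);
	return 0;
}
```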


Test Case 1 - 4 Cores Enabled, Normal CPU Utilization

[Screenshot: RTT measurements, test case 1]

Average RTT was 572 microseconds, i.e. approximately 52 microseconds of added latency.


Test Case 2 - 1 Core Enabled, Normal CPU Utilization

[Screenshot: RTT measurements, test case 2]

Average RTT was 1318 microseconds, i.e. approximately 798 microseconds of added latency. That already exceeds the inter-byte margin quoted above (40% of the nominal byte time, roughly 208 microseconds at 19200 bps), so some LIN slave responses may be dropped.


Test Case 3 - 1 Core Enabled, Heavy CPU Utilization

[Screenshot: RTT measurements, test case 3]

Average RTT was 6924 microseconds, i.e. approximately 6404 microseconds of added latency. Many LIN slave responses may be dropped with such latency.


Workarounds

As noted by @trnila in this PR comment:

setting cpuidle.off=1 or writing 1 to /sys/devices/system/cpu/cpu0/cpuidle/state*/disable disables idle states and reduces the delay.
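
For completeness, a small sketch of the runtime (sysfs) variant of that workaround, assuming root privileges; it simply writes 1 to each cpuidle state's disable file for cpu0, as in the quoted comment:

```c
/*
 * Sketch of the sysfs workaround: disable every cpuidle state on cpu0
 * by writing "1" to its disable file. Must run as root; the effect is
 * similar to booting with cpuidle.off=1 but can be toggled at runtime.
 */
#include <glob.h>
#include <stdio.h>

int main(void)
{
	glob_t g;

	if (glob("/sys/devices/system/cpu/cpu0/cpuidle/state*/disable",
		 0, NULL, &g) != 0) {
		fprintf(stderr, "no cpuidle state files found\n");
		return 1;
	}

	for (size_t i = 0; i < g.gl_pathc; i++) {
		FILE *f = fopen(g.gl_pathv[i], "w");
		if (!f) { perror(g.gl_pathv[i]); continue; }
		fputs("1", f);          /* 1 = disable this idle state */
		fclose(f);
		printf("disabled %s\n", g.gl_pathv[i]);
	}

	globfree(&g);
	return 0;
}
```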

TTY Low Latency History & Related Kernel Discussions

Once upon a time, the tty subsystem had a low_latency flag, which allowed serial drivers to push data directly to the line discipline without going through a workqueue. It was deprecated because of unsafe use within certain line disciplines, which might sleep or hold locks even though serial drivers typically call tty_flip_buffer_push() from IRQ context. There was a proposal to use a dedicated kthread; it was rejected at the time, but similar approaches have been suggested since. The most recent of these received a response from Linus, who suggested a dedicated high-priority workqueue or avoiding the workqueue altogether. Finally, an RFC in 2022 proposed modifying the workqueue for real-time use specifically for low-latency tty rx.
