Conversation

@trnila (Contributor) commented Sep 21, 2025

A LIN slave receives the LIN header and must react quickly to send the response back. Some drivers have fixed FIFO thresholds that make sllin unusable, because characters are buffered for too long and the response is sent too late. There is no generic serial API in the Linux kernel to configure the low-level UART FIFO trigger.

This PR introduces rx_trigger in struct uart_ops in the Linux kernel, as discussed in the comment (#27), in #13 and on the mailing list.

  • rx_trigger sets, queries, or rounds up/down the low-level RX trigger condition based on:
    • number of characters in FIFO
    • idle time expressed in nanoseconds

Idle time is currently expressed in nanoseconds, as initially proposed. When the baudrate changes, the configuration must be updated; however, the new values may not be representable in hardware. Should the driver automatically round up/down to the nearest supported value? If so, the ROUND API modes might be unnecessary.
The same issue may occur if the API specifies the trigger in terms of characters while the hardware only supports time-based idle timeouts.
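For concreteness, here is a minimal sketch of how the proposed hook could look in struct uart_ops. The exact signature, the trigger struct and the mode names other than UART_RX_TRIGGER_MODE_SET (used below) are assumptions for illustration, not the final patch:

/* assumed mode flags for the rx_trigger call */
#define UART_RX_TRIGGER_MODE_SET        0  /* apply the requested trigger */
#define UART_RX_TRIGGER_MODE_GET        1  /* query the current trigger */
#define UART_RX_TRIGGER_MODE_ROUND_UP   2  /* round up to a supported value */
#define UART_RX_TRIGGER_MODE_ROUND_DOWN 3  /* round down to a supported value */

struct uart_rx_trigger {
        unsigned int bytes;        /* RX FIFO threshold in characters, 0 = unused */
        u64          idle_time_ns; /* idle timeout in nanoseconds, 0 = unused */
};

struct uart_ops {
        /* ... existing callbacks ... */
        int (*rx_trigger)(struct uart_port *port, unsigned int mode,
                          struct uart_rx_trigger *trig);
};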

sllin uses this rx_trigger API to configure the RX FIFO trigger to 1 character when using a serial-based tty:

  • configuration is optional and may fail with just a warning
  • the configuration code is compiled only if UART_RX_TRIGGER_MODE_SET is defined in the kernel headers (see the sketch below)
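A hypothetical sketch of that optional configuration from the sllin side, reusing the assumed signature from the sketch above; only UART_RX_TRIGGER_MODE_SET is taken from the actual patch, the rest is illustrative:

#ifdef UART_RX_TRIGGER_MODE_SET
static void sllin_set_rx_trigger(struct tty_struct *tty)
{
        /* for serial-core ttys, driver_data points to struct uart_state */
        struct uart_state *state = tty->driver_data;
        struct uart_port *port = state->uart_port;
        struct uart_rx_trigger trig = { .bytes = 1 };

        if (!port->ops->rx_trigger ||
            port->ops->rx_trigger(port, UART_RX_TRIGGER_MODE_SET, &trig))
                pr_warn("sllin: unable to set RX trigger to 1 character\n");
}
#endif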

Test setup

The test setup uses the tested device configured as the sllin slave, while the master on the other side requests frame 0x14 from it. The tested device must respond in time, and the master must correctly receive the response:

slave$ insmod ./sllin.ko master=0
slave$ ldattach 28 /dev/ttyXXXXX
slave$ ip link set sllin0 up
slave$ cansend sllin0 000000d4#1122  # configure response to [0x11, 0x22] for frame 0x14  

I have briefly tested the Raspberry Pi, STM32MP1 and BeagleBone Black; the patch wasn't needed there, as the master received the response correctly, but I will check the timings and hardware capabilities in more detail.

imx6

sllin in slave mode on the latest kernel v6.17-rc6 (commit b236920) doesn't react in time, and the late response is not accepted by the master:
[image: before]

The FIFO trigger is set to 8 bytes with AGTIM enabled, which raises an interrupt once 8 bytes are received or after 8 characters of idle time.

The patch implements the rx_trigger API in drivers/tty/serial/imx.c (see the sketch after this list):

  • rx_trigger_bytes in the range 1..31
  • rx_trigger_idle_time is not supported, as the AGTIM timeout is fixed to 8 characters and already in use
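A minimal sketch of the byte-count part on i.MX: the RX trigger level lives in the RXTL field (bits [5:0]) of the UFCR register, per the i.MX reference manual. The function name below is a placeholder for illustration, not the actual code from the patch:

static int imx_uart_set_rx_trigger_bytes(struct imx_port *sport,
                                         unsigned int bytes)
{
        u32 ufcr;

        if (bytes < 1 || bytes > 31)
                return -ERANGE;

        ufcr = imx_uart_readl(sport, UFCR);
        ufcr &= ~0x3f;   /* clear RXTL[5:0] */
        ufcr |= bytes;   /* interrupt once 'bytes' characters are in the FIFO */
        imx_uart_writel(sport, ufcr, UFCR);

        return 0;
}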

With the FIFO trigger configured to 1 character, the master accepts the response, but there is still some delay:
[image: after]

LIN slave must react immediately when a 3-byte header is received and
cannot wait for the RX FIFO to fill or for an idle timeout, which
would be driver-dependent.
@kbader94 (Contributor) commented Sep 21, 2025

Is this just for imx6 devices? Universal support for changing the FIFO levels requires changes to the underlying drivers for nearly every device which supports programmable FIFO levels at the chipset level.

I've been working on a universal solution; you can see the commits here.

My patch series supports the 16550A, 16650, 16750, 16950 and PL011 (RPi), as well as changes to support common chipsets such as the CH38x and AX99100.

Perhaps we can integrate your patch into my patch series for wider support. I'm not familiar with imx6 and don't have a device to test.

I'm not sure how your patch would work with the RPi. The PL011 doesn't support FIFO trigger levels per se, BUT disabling the FIFO altogether effectively gives us a one-byte trigger level. This is why in my patch series I expose a separate interface for disabling the FIFO.
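To illustrate the PL011 point above (a sketch for discussion, not code from the patch series): clearing the FIFO-enable bit FEN in LCR_H drops the UART back to character mode, which behaves like a one-byte trigger level. The offset and bit follow the PL011 TRM; base is assumed to be the ioremapped register block:

#define PL011_LCRH      0x2c        /* line control register */
#define PL011_LCRH_FEN  (1 << 4)    /* FIFO enable bit */

static void pl011_disable_rx_fifo(void __iomem *base)
{
        u32 lcrh = readl(base + PL011_LCRH);

        /* FIFOs off: the RX interrupt fires on every received character */
        writel(lcrh & ~PL011_LCRH_FEN, base + PL011_LCRH);
}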

@kbader94 (Contributor) commented Sep 21, 2025

Also, 2 ms still seems like quite the delay. You should be able to get a response that looks like this:

[image: IMG_20250828_214128212]

This was on an AX99100, which originally looked like this:

[image: IMG_20250828_214111856]

What device were you using for this?

@trnila (Contributor, Author) commented Sep 21, 2025

@kbader94 The patch (sllin/linux-patches/0001-serial-add-rx_trigger-API-to-uart_ops.patch) adds a generic rx_trigger function to uart_ops, so any driver can implement it, including amba-pl011 etc. This rx_trigger is then called from the sllin line discipline driver to configure the FIFO directly on its own.
The second patch just implements this API to add support for imx processors.

@kbader94 (Contributor) commented:

@kbader94 The patch (sllin/linux-patches/0001-serial-add-rx_trigger-API-to-uart_ops.patch) adds a generic rx_trigger function to uart_ops, so any driver can implement it, including amba-pl011 etc. This rx_trigger is then called from the sllin line discipline driver to configure the FIFO directly on its own. The second patch just implements this API to add support for imx processors.

I see. If I implement your proposed changes in my patch series, would you be willing to test? My patch series does not currently support imx, so this would definitely be a welcome addition. Did you find setting the idle time necessary? In my testing I found that setting the FIFO levels and/or disabling the FIFO was sufficient.

@trnila (Contributor, Author) commented Sep 21, 2025

Of course, I can test it. I have the following hardware available:

  • imx6
  • Raspberry Pi
  • STM32MP1
  • BeagleBone black

I have briefly tested the other devices mentioned and had no issue receiving the response on the master side without any patches. But I haven't checked the delay between the header and the response yet; maybe the master side was just very tolerant.
I want to find out where the 2 ms delay on imx is coming from first.

The API is implemented as @ppisa proposed; still, a few things should be discussed further:

  • one rx_trigger function, or a split into two functions that set the RX FIFO threshold and the idle timeout separately?
  • should the idle timeout be expressed in nanoseconds or in a number of idle characters?

@kbader94 (Contributor) commented Sep 21, 2025

@trnila I don't believe there is a timeout for the sllin master, but real-world LIN masters would have one. IIRC it's about 1.4x the nominal response transmission time, so on a single-byte frame this would be far less than 2 ms; at 19200 bps it's somewhere around 0.8 ms. 2 ms is consistent with the default timeout for most UARTs, which typically time out after 4 character times of idle activity. In other words, it looks like the FIFO level is not being set to a single byte, and the driver is returning the data based on the default timeout instead.
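(Back-of-the-envelope check of those numbers, assuming 8N1 framing, i.e. 10 bits per character: one character at 19200 bps takes 10 / 19200 ≈ 0.52 ms, so 1.4 × one character ≈ 0.73 ms, while 4 character times of idle ≈ 2.1 ms, which lines up with the observed 2 ms.)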

I had originally based my patch series on @ppisa's proposal; however, I don't believe setting the idle timeout is necessary as long as we can actually set the FIFO level to a single byte. The trigger-level interrupt will always occur before an idle timeout. Not every chipset supports single-byte RX FIFO trigger levels; however, every chipset I've analyzed does support disabling the FIFO, sometimes referred to as '16450 mode', which provides a single-byte RX trigger level. Most devices I've analyzed, especially in the 8250 class of drivers, don't support programmable idle timeouts anyway.

For the record, I'm not completely satisfied with the API I've chosen yet, which is why I haven't completely integrated it into the sllin driver or sent it to the kernel mailing list, but it does work across a broad range of devices that I've tested. I'm certainly open to feedback, and further testing would definitely be appreciated as well!

I'll have a better look at integrating your proposal for imx devices later today. If you feel up to it, you're also more than welcome to check out my patch series. I'd be more than willing to lend a hand in integrating it into the patch series if you'd prefer to spearhead that effort.

@kbader94 (Contributor) commented:

@trnila I've had a better look at your code. It looks like it should work. I'm not 100% sure why you're still getting a 2 ms delay, but I have some theories. It's interesting that you do get a reduction from 7 ms. This could have something to do with either DMA, or with context switching and scheduling when the serial driver pushes received data to the line discipline via tty_flip_buffer_push. I lean towards scheduling, because the default RX trigger level on imx is 8 bytes, which should be about 4 ms nominal at 19200 bps, not 7 ms, and you are seeing a reduction to 2 ms after changing the trigger level.

Can you give me more details about exactly which imx6 variant you have? How many cores does it have? Is the timing fairly consistent (i.e. always around 2 ms), or does it vary? If you run a heavy load, does the timing get worse? I've seen some latency on the RPi which seemed to come from tty_flip_buffer_push. If this is a single-core or maybe even dual-core device under moderate load, that could explain the added latency.

@trnila (Contributor, Author) commented Sep 24, 2025

@kbader94 It's an imx6sx, a single-core ARM Cortex-A9.

The main issue was that power-saving CPU idle states were enabled, which delayed the UART interrupt handler by 2–3 ms. Booting with cpuidle.off=1 or writing 1 to /sys/devices/system/cpu/cpu0/cpuidle/state*/disable disables idle states and reduces the delay between receiving a character and the imx_uart_int interrupt invocation to around 30 us.

With idle states disabled, the average latency between the LIN header and the sllin response is about 350 us (time between PID's last bit and START bit of first response byte). Enabling the performance governor can reduce this by an additional 200 us:

$ echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

With both power-saving optimizations and the rx_trigger patch applied, the latency between the header and the response is around 160 us:
[image: no_idlestates]
[image: latencies]

Running stress --cpu 1 doesn't influence latencies.

@kbader94 (Contributor) commented Sep 25, 2025

@trnila IMO this confirms it's a scheduling issue. Disabling cpuidle and enabling the performance governor is a workaround, but not acceptable for an upstream solution. IIRC the CPU load does have an effect on multi-core devices. The imx6ul is very resource-constrained to begin with, so you might not see much of a difference with added load. When I last traced the issue, it seemed the root cause is in tty_buffer.c, which uses a system workqueue to schedule pushing RX data to the ldisc here. The tty_flip_buffer_push function is called from the serial drivers themselves, such as imx.c.
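For context, a simplified sketch (not verbatim kernel code) of the RX hand-off path being discussed; the function names are from recent mainline kernels:

/*
 * IRQ context, e.g. the RX interrupt handler in drivers/tty/serial/imx.c:
 *         tty_insert_flip_char(port, ch, flag);
 *         tty_flip_buffer_push(port);      // does not push now, queues work
 *
 * tty_flip_buffer_push() in drivers/tty/tty_buffer.c roughly does:
 *         queue_work(system_unbound_wq, &buf->work);
 *
 * Workqueue context, flush_to_ldisc():
 *         ld->ops->receive_buf(...);       // this is where sllin finally sees the header
 *
 * The gap between the IRQ and the workqueue actually running is where the
 * extra latency accumulates.
 */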

Older kernel versions used a low_latency flag to push from the UART drivers directly to the ldisc; this was deprecated because it's not safe to push directly to the ldisc from IRQ context, as some ldisc_ops receive_buf implementations may sleep. See the discussion here.

We should probably open a separate issue for this. I'm not convinced a higher-priority workqueue, as suggested in the linked article above, would guarantee us the RT latency we need. I have some ideas for a proper fix, such as adding an additional tty_ldisc_ops function specifically for pushing to the ldisc, which would be opt-in with an emphasis on not sleeping. I'm definitely open to other/better proposals though.

@kbader94 (Contributor) commented:

@trnila Just to follow up: I've replicated the additional latency related to workqueue scheduling on an RPi and opened a new issue for that. I've also identified some patches that were submitted to the kernel mailing list which attempt to solve it. Perhaps we can try some of them and then advocate for their inclusion in mainline if they solve the issue. In some of the discussions Linus didn't seem to see the real-world use cases for low-latency tty RX hand-off to the ldisc, but ours is clearly one of them.

As far as the status of this PR goes, I don't see any issues with the code itself as far as an imx solution is concerned. However, I do think our final API should include something to disable the FIFO, as this is necessary on drivers such as PL011. I also think we'd stand a better chance of upstream inclusion if we integrate with the existing rx_trig_bytes sysfs attribute, so as not to duplicate and fragment similar functionality (this is the approach I take in the patch series I linked above).

I've gotten a bit side-tracked with the workqueue scheduling issue, but I still intend to implement your work in my own patch series (which still needs some cleanup). There are a number of discussion points on my patch series I'd be happy to receive your feedback on when I do submit it. Thanks for all your help so far!
