
XDMA C2H Speed Capped at ~120MB/s on i.MX8MP with Kintex KCU105 FPGA #328

krithick14 opened this issue Mar 14, 2025 · 1 comment

krithick14 commented Mar 14, 2025

Setup Details:

  • FPGA: Kintex KCU105
  • Host Board: i.MX8M Plus EVK (i.MX8MP)
  • Connection: M.2 to x8 adapter board

PCIe Link Speeds Tested:

  • Gen1 x1 (2.5GT/s) → ~120MB/s

[Image: Gen1 x1 throughput test output]

  • Gen2 x1 (5GT/s) → ~120MB/s

[Image: Gen2 x1 throughput test output]

XDMA Transfer: C2H (FPGA to i.MX)

Data Type: RAW RGB32 video

i.MX Linux Kernel Version: 6.6.52

Vivado Version: 2022.2

Issue Description:

When using the XDMA driver for C2H transfers, the observed speed is consistently capped at ~120MB/s, regardless of whether the PCIe link is operating at Gen1 x1 (2.5GT/s) or Gen2 x1 (5GT/s). This suggests a possible bottleneck in the driver, DMA engine, or PCIe configuration.

Steps Taken:

  1. Verified PCIe link speed using lspci -vvv (confirms 5GT/s Gen2 x1 operation; a quick check is sketched after this list). lspci_xdma_log.txt

  2. Ensured XDMA module is correctly loaded and initialized.
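
For step 1, the negotiated link state can be read directly from the LnkCap/LnkSta capability lines (the 01:00.0 bus address below is only an example; substitute the endpoint address from your own lspci output):

    # Compare the advertised capability against the negotiated status
    lspci -vvv -s 01:00.0 | grep -E 'LnkCap:|LnkSta:'
    # LnkCap: ... Speed 5GT/s, Width x1 ...
    # LnkSta: ... Speed 5GT/s, Width x1 ...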

Expected Behavior:

  • At Gen2 x1 (5GT/s), throughput should be roughly double the Gen1 x1 figure rather than identical to it.

  • Performance should scale with the PCIe link speed.
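
For reference, a back-of-envelope upper bound on the raw link bandwidth (both Gen1 and Gen2 use 8b/10b encoding):

    Gen1 x1: 2.5 GT/s x 8/10 = 2.0 Gbit/s = 250 MB/s
    Gen2 x1: 5.0 GT/s x 8/10 = 4.0 Gbit/s = 500 MB/s

After TLP/DLLP protocol overhead, a well-tuned DMA engine typically reaches 70-90% of these figures, so ~120MB/s on a Gen2 x1 link leaves substantial headroom.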

Questions:

  1. Is there any known limitation in the XDMA driver for i.MX8MP?
  2. Are there additional tuning parameters for increasing throughput?

Would appreciate any insights or recommendations for debugging this further.

Logs and additional details can be provided upon request.

dmitrym1 commented

Hi @krithick14, try the following recommendations:

  1. Set up an ILA to watch the XDMA AXI bus, and perform multiple transactions.
  2. Your transfer rate depends on the AXI clock frequency, so raise it as far as your design allows.
  3. If you are using AXI MM, make sure you are using it in AXI Full mode, not AXI Lite, because only AXI Full supports burst transactions.
  4. Check whether your data source can supply data continuously without dropping the VALID signal; otherwise it becomes a bottleneck.
  5. Check whether your data source can provide data immediately when requested; otherwise the added latency also reduces transfer speed. If the source is both immediate and continuous, a single transaction should run near the maximum rate for your AXI frequency, which means any remaining bottleneck is between transactions.
  6. A large transfer is split into smaller chunks by the driver. When one chunk finishes, the driver receives an interrupt and requests the next chunk. If interrupt latency is high on your system, try switching the driver to polling mode (a quick test comparing the two modes is sketched after this list):
    insmod xdma.ko poll_mode=1
  7. Also, make sure you are using the latest patch set: [xdma] alonbl's stable patchset #240
  8. Make sure that debug output in the XDMA driver is disabled.
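
A minimal raw-throughput check, independent of the video pipeline (a sketch: /dev/xdma0_c2h_0 is the driver's default C2H character device, but the XDMA instance and channel numbers may differ on your system, and in AXI-MM mode the engine should target a memory region that always has data ready):

    # Reload the driver in polling mode (recommendation 6) to rule out interrupt latency
    rmmod xdma
    insmod xdma.ko poll_mode=1

    # Read 1 GiB from the first C2H channel; dd prints the achieved rate on exit
    dd if=/dev/xdma0_c2h_0 of=/dev/null bs=1M count=1024

Comparing the dd rate in interrupt mode against poll_mode=1 quickly shows whether interrupt servicing on the i.MX8MP is the limiting factor.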

Below are my results with different AXI frequencies and PCIe modes, mostly in AXI Lite mode with one result in AXI Full mode, on an iMX8MM with the community patch set, MSI interrupt mode, XDMA 2018.2, and small bottlenecks in the data source. If you use these results in a scientific paper, please add a link back to this comment; I'd appreciate it.

[Image: measured transfer rates vs. AXI frequency and PCIe mode]
