Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drivers: w1: HACK do dummy reads while delaying #5846

Closed
wants to merge 830 commits into from

Conversation

P33M
Copy link
Contributor

@P33M P33M commented Jan 11, 2024

This is the first, probably of many cases, where variable latency to RP1 breaks timing-sensitive GPIO bitbash.

See https://forums.raspberrypi.com/viewtopic.php?t=362339

Some other alternative approaches would be to

  • Ask users to set "performance" in /sys/module/pcie_aspm/parameters/policy to force all links to L0 while bitbashing (also affects NVMe, if in use)
  • Clobber ASPM on PCIE2 via overlay fragments, which means that specifying dtoverlay=w1-gpio in config.txt costs you ~400mW
  • Add a pile of heuristic warts to the RP1 firmware to inhibit endpoint L1 entry if it detects RIO activity

So pick your poison...

6by9 and others added 30 commits January 11, 2024 15:37
The code was assuming that it was a single buffer with offsets,
when kmstest uses separate buffers and 0 offsets for each plane.

Signed-off-by: Dave Stevenson <[email protected]>
The audio clock is used by the HDMI controller driver and we were using
it to get its audio rate and compute the dividers needed to reach a
given audio sample rate.

However, we were never enabling it, which was resulting in lockups on
the BCM2712.

Fixes: 632ee3a ("drm/vc4: hdmi: Add audio-related callbacks")
Signed-off-by: Maxime Ripard <[email protected]>
The VC4 HDMI driver has a bunch of accessors to read from a register.
The read accessor was warning when accessing an unknown register, but
the write one was just returning silently.

Let's make sure we warn also when writing to an unknown register.

Signed-off-by: Maxime Ripard <[email protected]>
DLIST generation can get pretty tricky and there's not a lot of debug in
the driver to help. Let's add a few more to track the generated DLIST
size.

Signed-off-by: Maxime Ripard <[email protected]>
We need to allocate a few additional structures when checking our
atomic_state, especially related to hardware SRAM that will hold the
plane descriptors (DLIST) and the current line context (LBM) during
composition.

Since those allocation can fail, let's add some error message in that
case to help debug what goes wrong.

Signed-off-by: Maxime Ripard <[email protected]>
LBM allocations need a different size depending on the line length,
format, etc.

This can get tricky, and fail. Let's add some more prints to ease the
debugging when it does.

Signed-off-by: Maxime Ripard <[email protected]>
The vc4_plane_atomic_check() directly returns the result of the final
function it calls.

Using the already defined ret variable to check its content on error,
and a separate return 0 on success, makes it easier to extend.

Signed-off-by: Maxime Ripard <[email protected]>
We access multiple times the vc4_crtc_state->assigned_channel variable
in the vc4_crtc_get_scanout_position() function, so let's store it in a
local variable.

Signed-off-by: Maxime Ripard <[email protected]>
With the introduction of the BCM2712 support, we will get yet another
generation of display engine to support.

The binary check of whether it's VC5 or not thus doesn't work anymore,
especially since some parts of the driver will have changed with BCM2711,
and some others with BCM2712.

Let's introduce an enum to store the generation the driver is running
on, which should provide more flexibility.

Signed-off-by: Maxime Ripard <[email protected]>
The V3D IP has been separate since BCM2711, so let's make sure we issue
a WARN if we're running not only on BCM2711, but also anything newer.

Signed-off-by: Maxime Ripard <[email protected]>
…output

Since we'll support BCM2712 soon, let's move the logic behind
vc4_hvs_get_fifo_from_output() to a switch to extend it more easily.

Signed-off-by: Maxime Ripard <[email protected]>
Since we'll support BCM2712 soon, let's move the logic to enable and
disable the end-of-frame interrupts to a switch to extend it more
easily.

Signed-off-by: Maxime Ripard <[email protected]>
We currently enable the EOF interrupts through the CRTC destroy_state
implementation.

However, nothing guarantees that we can't call destroy_state multiple
times in a row, and therefore before the EOF interrupt even happens.

This means we would enable the interrupt multiple times but disable it
only once. It wasn't an issue so far since the interrupts were only
enabled by setting a bit in a register, but with BCM2712 we will use an
external interrupt controller, with a refcounted interrupt.

Signed-off-by: Maxime Ripard <[email protected]>
Since the BCM2712 will feature a significantly different HVS, let's move
the hardware initialisation part of our bind function into a separate
function.

That way, it will be easier to extend in the future.

Signed-off-by: Maxime Ripard <[email protected]>
Just like the HVS itself, the COB parameters will be fairly different in
the BCM2712.

Let's move the COB parameters computation and its initialisation to a
separate function that will be easier to extend in the future.

Signed-off-by: Maxime Ripard <[email protected]>
The HVS register set has been heavily modified in the BCM2712, and we'll
thus need a separate debugfs_reg32 array for it.

The name hvs_regs is thus a bit too generic, so let's rename it to
something more specific.

Signed-off-by: Maxime Ripard <[email protected]>
The BCM2712 will have a fairly different dlist, that will feature one
Pointer 0 word for each plane.

Let's prepare by changing the ptr0_offset variable that holds the offset
in a dlist of the pointer 0 word to an array.

Signed-off-by: Maxime Ripard <[email protected]>
With the introduction of the support for BCM2712, the check of whether
we're running on vc5 or not to compute the LBM alignment requirement
doesn't work anymore.

Moreover, the LBM size will need to be computed in words for the
BCM2712, while we've had sizes in bytes so far.

Aligning on either 64 or 32 words is thus fairly harmful on BCM2712, so
let's just explicitly align the size when needed, and then call
drm_mm_insert_node_generic() with an alignment of 1.

Signed-off-by: Maxime Ripard <[email protected]>
The BCM2712 HVS has registers to report the size of the various SRAM the
driver uses, and their size actually differ depending on the stepping.

The initialisation of the memory pools happen in the __vc4_hvs_alloc()
function that also allocates the main HVS structure, that will then hold
the pointer to the memory mapping of the registers.

This creates some kind of circular dependency that we can break by
passing the mapping pointer as an argument for __vc4_hvs_alloc() to use
to query to get the SRAM sizes and initialise the memory pools
accordingly.

Signed-off-by: Maxime Ripard <[email protected]>
It has been observed that a YUV422 unity scaled plane isn't displayed.
Enabling vertical scaling on the UV planes solves this. There is
already a similar clause to always enable horizontal scaling on the
UV planes.

Signed-off-by: Dave Stevenson <[email protected]>
Trying to read /sys/kernel/debug/dri/1/hdmi1_regs
when the hdmi is disconnected results in a fatal system hang.

This is due to the pm suspend code disabling the dvp clock.
That is just a gate of the 108MHz clock in DVP_HT_RPI_MISC_CONFIG,
which results in accesses hanging AXI bus.

Protect against this.

Signed-off-by: Dom Cobley <[email protected]>
The offset fields in vc4_plane_state are described as being
the offset for each buffer in the bo, however it is used to
store the complete DMA address that is then written into the
register.

The DMA address including the fb ofset  can be retrieved
using drm_fb_dma_get_gem_addr, and the offset adjustment due to
clipping is local to vc4_plane_mode_set.
Drop the offset field from the state, and compute the complete
DMA address in vc4_plane_mode_set.

Signed-off-by: Dave Stevenson <[email protected]>
The debug function to display the dlists didn't reset next_entry_start
when starting each display, so resulting in not stopping the
list at the correct place.

Signed-off-by: Dave Stevenson <[email protected]>
The debugfs function to dump dlists aborted at 256 bytes,
when actually the dlist memory is generally significantly
larger but varies based on SoC.

We already have the correct limit in __vc4_hvs_alloc, so
store it for use in the debugfs dlist function.

Signed-off-by: Dave Stevenson <[email protected]>
ABORT_ON_EMPTY chooses whether the HVS abandons the current frame
when it experiences an underflow, or attempts to continue.

In theory the frame should be black from the point of underflow,
compared to a shift of sebsequent pixels to the left.

Unfortunately it seems to put the HVS is a bad state where it is not
possible to recover simply. This typically requires a reboot
following the 'flip done timed out message'.

Discussion with Broadcom has suggested we don't use this flag.
All their testing is done with it disabled.

Additionally setting BLANK_INSERT_EN causes the HDMI to output
blank pixels on an underflow which avoids it losing sync.

After this change a 'flip done timed out' due to sdram bandwidth
starvation or too low a clock is recoverable once the situation improves.

Signed-off-by: Dom Cobley <[email protected]>
Always enable SCALER_CONTROL before attempting other HVS
operations. It's safe to write to some parts of the HVS but
in general it's dangerous to do this because it can cause bus
lockups.

Signed-off-by: Tim Gover <[email protected]>
The BCM2712 HDMI controller uses a slightly different HDMI controller
than the BCM2711, and a completely different PHY.

Let's introduce a new compatible for it.

Signed-off-by: Maxime Ripard <[email protected]>
The BCM2712 has a completely different HVS than the previous
generations, so let's add a new compatible for it.

Signed-off-by: Maxime Ripard <[email protected]>
The BCM2712 has 3 different pixelvalves that are similar to the ones
found in the previous generations but with slightly different
capabilities.

Express that using a new set of compatibles.

Signed-off-by: Maxime Ripard <[email protected]>
The BCM2712 has a MOP controller which is basically a new revision of
the TXP.

Express that by adding a new compatible for it.

Signed-off-by: Maxime Ripard <[email protected]>
pelwell and others added 23 commits January 11, 2024 15:37
The driver makes use of the fact that the firmware node is its parent,
so we'd better make it so.

Signed-off-by: Phil Elwell <[email protected]>
HTS was still using the raw register ID.

Fixes: dd26d43 ("media: i2c: imx219: make HBLANK r/w to allow longer exposures")
Signed-off-by: Dave Stevenson <[email protected]>
Add a dtparam "rtc", so that "dtparam=rtc=off" can be used to disable
the Pi 5's onboard RTC.

See: https://forums.raspberrypi.com/viewtopic.php?t=361813

Signed-off-by: Phil Elwell <[email protected]>
The i2c-designware driver supports reading precise timing values from
ACPI, but the Device Tree support relies on a combination of standard
rise and fall times and hard-coded minimum timings. The result of this
is that it is difficult to get optimum timings, particularly given that
the values are bus speed-specific and only one set can be stored in
DT at a time.

Add support for initialisation from DT that is similar to that for
ACPI.

Signed-off-by: Phil Elwell <[email protected]>
Signed-off-by: Phil Elwell <[email protected]>
This configuration hindered performance by ~74% measured from RPI 4B
from ~680Mbps to ~390Mbps when benchmarking wireguard locally using
netns and iperf3. Remove it by default for better performance.

Signed-off-by: Yangyu Chen <[email protected]>
bclk_ratio is only a factor in clock producer mode, and needs to
override the default value of num_channels * sample_size.
Move the bclk_ratio handling into the hw_params method, only latching
the value in set_bclk_ratio, to address both of those matters.

See: raspberrypi#5817

Signed-off-by: Phil Elwell <[email protected]>
The frame count values moved within registers DISPSTAT1 and
DISPSTAT2 with GEN5, so update the accessor function to
accommodate that.

Fixes: b51cd7a ("drm/vc4: hvs: Fix frame count register readout")
Signed-off-by: Dave Stevenson <[email protected]>
Noted that if we get "node link is not enabled", then we also
get the videobuf2 splat for the driver not cleaning up correctly
on a failed start_streaming, and indeed we weren't returning the
buffers.

Checking the other error paths, noted that the "FE enabled, but
FE_CONFIG node is not" path was not calling pm_runtime_put.

Fix both paths.

Signed-off-by: Dave Stevenson <[email protected]>
CSI2 devices are meant to use the 1Xnn formats rather than 2Xnn
such as MEDIA_BUS_FMT_UYVY8_2X8.

For devices with ADV7180_FLAG_MIPI_CSI2 set, use
MEDIA_BUS_FMT_UYVY8_1X16.

Signed-off-by: Dave Stevenson <[email protected]>
For CSI2 receivers that need to know the link frequency,
add it as a control to the driver.
Interlaced modes are 216Mbp/s or 108MHz, whilst going through
the I2P to deinterlace gives 432Mb/s or 216MHz.

Signed-off-by: Dave Stevenson <[email protected]>
Seeing as we now have the CSI2 data types defined, make use of
them instead of hardcoding the values.

Signed-off-by: Dave Stevenson <[email protected]>
Raw 16bit formats didn't have a csi_dt value defined, which
presumably would trip the WARN_ON(!fmt->csi_dt); in
cfe_start_channel.

The value is defined in CSI2 v2.0 as 0x2e, so set it accordingly.

Signed-off-by: Dave Stevenson <[email protected]>
Include the dtparams controlling the Ethernet jack LEDs, as used on
other Pis.

See: raspberrypi#5825

Signed-off-by: Phil Elwell <[email protected]>
Add dtparams for adjusting the Pi 5 cooling fan speeds and temperature
thresholds.

See: raspberrypi#5820

Signed-off-by: Phil Elwell <[email protected]>
Currently, when using non-blocking commits, we can see the following
kernel warning:

[  110.908514] ------------[ cut here ]------------
[  110.908529] refcount_t: underflow; use-after-free.
[  110.908620] WARNING: CPU: 0 PID: 1866 at lib/refcount.c:87 refcount_dec_not_one+0xb8/0xc0
[  110.908664] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device cmac algif_hash aes_arm64 aes_generic algif_skcipher af_alg bnep hid_logitech_hidpp vc4 brcmfmac hci_uart btbcm brcmutil bluetooth snd_soc_hdmi_codec cfg80211 cec drm_display_helper drm_dma_helper drm_kms_helper snd_soc_core snd_compress snd_pcm_dmaengine fb_sys_fops sysimgblt syscopyarea sysfillrect raspberrypi_hwmon ecdh_generic ecc rfkill libaes i2c_bcm2835 binfmt_misc joydev snd_bcm2835(C) bcm2835_codec(C) bcm2835_isp(C) v4l2_mem2mem videobuf2_dma_contig snd_pcm bcm2835_v4l2(C) raspberrypi_gpiomem bcm2835_mmal_vchiq(C) videobuf2_v4l2 snd_timer videobuf2_vmalloc videobuf2_memops videobuf2_common snd videodev vc_sm_cma(C) mc hid_logitech_dj uio_pdrv_genirq uio i2c_dev drm fuse dm_mod drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[  110.909086] CPU: 0 PID: 1866 Comm: kodi.bin Tainted: G         C         6.1.66-v8+ raspberrypi#32
[  110.909104] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
[  110.909114] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  110.909132] pc : refcount_dec_not_one+0xb8/0xc0
[  110.909152] lr : refcount_dec_not_one+0xb4/0xc0
[  110.909170] sp : ffffffc00913b9c0
[  110.909177] x29: ffffffc00913b9c0 x28: 000000556969bbb0 x27: 000000556990df60
[  110.909205] x26: 0000000000000002 x25: 0000000000000004 x24: ffffff8004448480
[  110.909230] x23: ffffff800570b500 x22: ffffff802e03a7bc x21: ffffffecfca68c78
[  110.909257] x20: ffffff8002b42000 x19: ffffff802e03a600 x18: 0000000000000000
[  110.909283] x17: 0000000000000011 x16: ffffffffffffffff x15: 0000000000000004
[  110.909308] x14: 0000000000000fff x13: ffffffed577e47e0 x12: 0000000000000003
[  110.909333] x11: 0000000000000000 x10: 0000000000000027 x9 : c912d0d083728c00
[  110.909359] x8 : c912d0d083728c00 x7 : 65646e75203a745f x6 : 746e756f63666572
[  110.909384] x5 : ffffffed579f62ee x4 : ffffffed579eb01e x3 : 0000000000000000
[  110.909409] x2 : 0000000000000000 x1 : ffffffc00913b750 x0 : 0000000000000001
[  110.909434] Call trace:
[  110.909441]  refcount_dec_not_one+0xb8/0xc0
[  110.909461]  vc4_bo_dec_usecnt+0x4c/0x1b0 [vc4]
[  110.909903]  vc4_cleanup_fb+0x44/0x50 [vc4]
[  110.910315]  drm_atomic_helper_cleanup_planes+0x88/0xa4 [drm_kms_helper]
[  110.910669]  vc4_atomic_commit_tail+0x390/0x9dc [vc4]
[  110.911079]  commit_tail+0xb0/0x164 [drm_kms_helper]
[  110.911397]  drm_atomic_helper_commit+0x1d0/0x1f0 [drm_kms_helper]
[  110.911716]  drm_atomic_commit+0xb0/0xdc [drm]
[  110.912569]  drm_mode_atomic_ioctl+0x348/0x4b8 [drm]
[  110.913330]  drm_ioctl_kernel+0xec/0x15c [drm]
[  110.914091]  drm_ioctl+0x24c/0x3b0 [drm]
[  110.914850]  __arm64_sys_ioctl+0x9c/0xd4
[  110.914873]  invoke_syscall+0x4c/0x114
[  110.914897]  el0_svc_common+0xd0/0x118
[  110.914917]  do_el0_svc+0x38/0xd0
[  110.914936]  el0_svc+0x30/0x8c
[  110.914958]  el0t_64_sync_handler+0x84/0xf0
[  110.914979]  el0t_64_sync+0x18c/0x190
[  110.914996] ---[ end trace 0000000000000000 ]---

This happens because, although `prepare_fb` and `cleanup_fb` are
perfectly balanced, we cannot guarantee consistency in the check
plane->state->fb == state->fb. This means that sometimes we can increase
the refcount in `prepare_fb` and don't decrease it in `cleanup_fb`. The
opposite can also be true.

In fact, the struct drm_plane .state shouldn't be accessed directly
but instead, the `drm_atomic_get_new_plane_state()` helper function should
be used. So, we could stick to this check, but using
`drm_atomic_get_new_plane_state()`. But actually, this check is not really
needed. We can increase and decrease the refcount symmetrically without
problems.

This is going to make the code more simple and consistent.

Signed-off-by: Maíra Canal <[email protected]>
As of 6.6, the names of the labels on the Pi LEDs was swapped to match
the upstream code, i.e. led_act rather than act_led.

Apply the same change to Pi 5.

Signed-off-by: Phil Elwell <[email protected]>
Users report about the unexpected behavior for setting timeouts above
15 sec on Raspberry Pi. According to watchdog-api.rst the ioctl
WDIOC_SETTIMEOUT shouldn't fail because of hardware limitations.
But looking at the code shows that max_timeout based on the
register value PM_WDOG_TIME_SET, which is the maximum.

Since 664a392 ("watchdog: Introduce hardware maximum heartbeat
in watchdog core") the watchdog core is able to handle this problem.

This fix has been tested with watchdog-test from selftests.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217374
Fixes: 664a392 ("watchdog: Introduce hardware maximum heartbeat in watchdog core")
Signed-off-by: Stefan Wahren <[email protected]>
The forced conversion of native CS lines into software CS lines is done
whether or not the controller has been given any CS lines to use. This
breaks the use of the spi0-0cs overlay to prevent SPI from claiming any
CS lines, particularly with spidev which doesn't pass in the SPI_NO_CS
flag at creation.

Use the presence of an empty cs-gpios property as an indication that no
CS lines should be used, bypassing the native CS conversion code.

See: raspberrypi#5835

Signed-off-by: Phil Elwell <[email protected]>
Add V4L2_CID_LINK_FREQ as a read-only control with a value of 408 Mhz.
This will be used by the CFE driver to corretly setup the DPHY timing
parameters in the CSI-2 block.

Signed-off-by: Lee Jackson <[email protected]>
Add V4L2_CID_LINK_FREQ as a read-only control with a value of 456 Mhz.
This will be used by the CFE driver to corretly setup the DPHY timing
parameters in the CSI-2 block.

Signed-off-by: Lee Jackson <[email protected]>
On Pi 5, the link to RP1 will bounce in and out of L1 depending on
inactivity timers at both the RC and EP end. Unfortunately for
bitbashing 1-wire, this means that on an otherwise idle Pi 5 many of the
reads/writes to GPIO registers are delayed by up to 8us which causes
mis-sampling of read data and trashes write bits.

By issuing dummy reads at a rate greater than the link inactivity
timeout while spinning on a delay, PCIe stays in L0 which does not incur
additional latency.

There are probably better ways to solve this problem.

Signed-off-by: Jonathan Bell <[email protected]>
@JamesH65
Copy link
Contributor

Just out of interest, why do you need to do the dummy reads during the delay? (Should that be commented in the code?)

@pelwell
Copy link
Contributor

pelwell commented Jan 11, 2024

To keep the link out of a powersave mode.

@pelwell
Copy link
Contributor

pelwell commented Jan 12, 2024

FYI, this gives me a reliable temperature from a DS18B20.

@popcornmix popcornmix force-pushed the rpi-6.6.y branch 2 times, most recently from 7df1d3b to 60781d6 Compare January 22, 2024 12:43
@P33M
Copy link
Contributor Author

P33M commented Jan 22, 2024

Closing - rebasing offline doesn't seem to want to work, and I have a better version of the fix

@P33M P33M closed this Jan 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.