Enable Big/Super Pages for v3d #7285
Open
mairacanal wants to merge 683 commits into raspberrypi:rpi-6.18.y from
Users are reporting running out of DLIST memory. Add a debugfs file to dump out all the allocations. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
We have a read-modify-write race when updating SCALER_DISPCTRL for underrun and end-of-frame interrupts. Ideally it would be fixed via a spinlock or similar, but that will require a reasonable amount of study to ensure we don't get deadlocks. The underrun reporting is only for debug, so disable it for now. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The dmabuf import already checks that the backing buffer is contiguous and rejects it if it isn't. vc4 also requires that the buffer is in the bottom 1GB of RAM, and this is all correctly defined via dma-ranges. However the kernel silently uses swiotlb to bounce dma buffers around if they are in the wrong region. This relies on dma sync functions to be called in order to copy the data to/from the bounce buffer. DRM is based on all memory allocations being coherent with the GPU so that any updates to a framebuffer will be acted on without the need for any additional update. This is fairly fundamentally incompatible with needing to call dma_sync_ to handle the bounce buffer copies, and therefore we have to detect and reject mappings that use bounce buffers. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
DSI0 is misbehaving and needs to action things on vblank to work around it. Add a new hook to call across during vblank. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The initialisation sequence differs slightly from the documentation in that the clocks are meant to be running before resets and similar. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
vc4_dsi_bridge_disable wasn't resetting things during shutdown, so add that in. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The block must be enabled for the FIFO resets to be actioned, so ensure this is the case. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The pixel to byte FIFO appears to not always reset correctly, which can lead to colour errors and/or horizontal shifts. Reset on every vblank to work around the issue. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The TC358762 bridge and panel decodes the mode differently on DSI0 to DSI1 for no obvious reason, and results in a shift off the screen. Whilst it would be possible to change the compatible used for the panel, that then messes up Pi5. As it appears to be restricted to vc4 DSI0, fix up the mode in vc4_dsi. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Some DSI peripheral drivers wish to send commands in the post_disable or panel unprepare callback. These are called after the DSI host's disable call, but before the host's post_disable if pre_enable_prev_first is set. Don't reset the block until post_disable to allow these commands to be sent. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The DSI block appears to be able to come up stuck in a condition where it leaves the lanes in HS mode or just jabbering. This stops LP transfers from completing as there is no LP time available. This is signalled via the LP1 contention error. Enabling video briefly clears that condition, so if we detect the error condition, enable video mode and then retry. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Similar to the ch7006 and nouveau drivers, introduce a "tv_mode" module parameter that allows setting the TV norm by specifying vc4.tv_norm= on the kernel command line. If that is not specified, try inferring one of the most popular norms (PAL or NTSC) from the video mode specified on the command line. On Raspberry Pis, this causes the most common cases of the sdtv_mode setting in config.txt to be respected. Signed-off-by: Mateusz Kwiatkowski <kfyatek+publicgit@gmail.com>

drm/vc4: Do not reset tv mode as this is already handled by framework

In vc4_vec_connector_reset, the tv mode is already reset to the property default by drm_atomic_helper_connector_tv_reset, so there is no need for a local fixup to potentially some other default. Fixes: 96922af ("drm/vc4: Allow setting the TV norm via module parameter") Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
With the command line parser now providing the information about the tv mode, use that as the preferred choice for initialising the default of the tv_mode property. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
If you disable HDR metadata, then the hardware should stop sending the infoframe, and that is implemented by the clear_infoframe hook which wasn't implemented. Add it. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
See: https://forum.libreelec.tv/thread/24783-tv-avr-turns-back-on-right-after-turning-them-off While the kernel provides a :D flag for assuming device is connected, it doesn't stop this function from being called and generating a cec_phys_addr_invalidate message when hotplug is deasserted. That message provokes a flurry of CEC messages which for many users results in the TV switching back on again and it's very hard to get it to stay switched off. It seems to only occur with an AVR and TV connected but has been observed across a number of manufacturers. The issue started with raspberrypi#4371 and this provides an optional way of getting back the old behaviour Signed-off-by: Dom Cobley <popcornmix@gmail.com>
The intention of the vc4.force_hotplug setting is to ignore hotplug completely. It can be used when a display toggles hotplug when switching AV inputs, going into standby or changing a KVM switch, and some side effect of that is unwanted. It turns out that while vc4.force_hotplug currently makes hotplug always read as asserted, that isn't enough to stop drm doing lots of stuff, including re-reading the edid. An example of what drm does with a hotplug deassert/assert and vc4.force_hotplug=1 currently is: https://paste.debian.net/hidden/dc07434b/ That is unwanted. Let's ignore the hotplug interrupt completely so drm is blissfully unaware of the hotplug change. Signed-off-by: Dom Cobley <popcornmix@gmail.com>
There appears to be a requirement for some devices (I'm testing with an 8K VRROOM 40Gbps HDMI switch) for a measurable delay between removing the hdmi phy output from the old mode and enabling the hdmi phy output for the new mode. Without the delay, a mode switch has a small chance of getting a permanent 'no signal', which requires a subsequent mode switch or an unplug/replug to redetect. Switching between 4kp24/25/30 modes fails about 5% of the time in my testing. Add a delay to make it impossible to switch faster than this. Signed-off-by: Dom Cobley <popcornmix@gmail.com>
The body of this function was missing so we don't reset the phy when disabling it. Signed-off-by: Dom Cobley <popcornmix@gmail.com>
The current reset code doesn't actually stop the hdmi output. That makes it difficult for displays to handle a mode set. Powering down the PLL does actually remove the hdmi signal and makes mode sets more reliable Signed-off-by: Dom Cobley <popcornmix@gmail.com>
There are no MEDIA_BUS_FMT_* defines for GRB or BRG, and adding them is a pain. Add a DT override to allow setting the order. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
Seeing as the HVS can be configured with regard to the scaling filter, and DRM now supports selecting scaling filters at a per CRTC or per plane level, we can implement it. Default remains as the Mitchell/Netravali filter, but nearest neighbour is now also implemented. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The documentation says that the TPZ filter can not upscale, and requesting a scaling factor > 1:1 will output the original image in the top left, and repeat the right/bottom most pixels thereafter. That fits perfectly with upscaling a 1x1 image which is done a fair amount by some compositors to give solid colour, and it saves a large amount of LBM (TPZ is based on src size, whilst PPF is based on dest size). Select TPZ filter for images with source rectangle <=1. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The register to enable/disable background fill was being set from atomic flush, however that will be applied immediately and can be a while before the vblank. If it was required for the current frame but not for the next one, that can result in corruption for part of the current frame. Store the state in vc4_hvs, and update it on vblank. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The HVS can accept an arbitrary number of planes, provided that the overall pixel read load is within limits, and the display list can fit into the dlist memory. Now that DRM will support 64 planes per device, increase the number of overlay planes from 16 to 48 so that the dlist complexity can be increased (eg 4x4 video wall on each of 3 displays). Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Instead of having 48 generic overlay planes, assign 32 to the writeback connector so that there is no ambiguity in wlroots when trying to find a plane for composition using the writeback connector vs display. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The transposer/writeback connector should be running with a lower priority, so shouldn't be factored into the load calculations. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
As the writeback connector doesn't have the same realtime constraints of a live display, drop the panic priority for it. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The txp block can implement transpose as it writes out the image data, so expose that through the new connector rotation property. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>

drm: vc4: txp: Do not allow 24bpp formats when transposing

The hardware doesn't support transposing to 24bpp (RGB888/BGR888) formats. There's no way to advertise this through DRM, so block it from atomic_check instead. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Currently, booting with no hdmi connected has:

    pi@pi4:~ $ vcgencmd measure_clock hdmi pixel
    frequency(9)=120010256
    frequency(29)=74988280

After connecting hdmi we get:

    pi@pi4:~ $ vcgencmd measure_clock hdmi pixel
    frequency(9)=300005856
    frequency(29)=149989744

and that persists after disconnecting hdmi.

I can measure this on a power supply as 10mA@5.2V (52mW). We should always remove clk_set_min_rate requests when we no longer need them. Signed-off-by: Dom Cobley <popcornmix@gmail.com>
On current firmware versions, RPI_FIRMWARE_SET_CLOCK_STATE doesn't actually power off the clock. To achieve meaningful power savings, the clock rate must be set to the minimum before disabling. This might be fixed in future firmware releases. Rather than pushing rate management to clock consumers, handle it directly in the clock framework's prepare/unprepare callbacks. In unprepare, set the rate to the firmware-reported minimum before disabling the clock. In prepare, for clocks marked with `maximize` (currently v3d), restore the rate to the maximum after enabling. Signed-off-by: Maíra Canal <mcanal@igalia.com>
If PIXEL_CLK or HEVC_CLK is disabled during boot, the firmware will skip HSM initialization, which would result in a bus lockup. However, those clocks are consumed by drivers (vc4 and HEVC decoder drivers, respectively), which means that they can be enabled/disabled by the drivers. Mark those clocks as CLK_IGNORE_UNUSED to allow them to be disabled by drivers when appropriate. Acked-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com>
The bcm2835_asb_control() function uses a tight polling loop to wait for the ASB bridge to acknowledge a request. During intensive workloads, this handshake intermittently fails for V3D's master ASB on BCM2711, resulting in "Failed to disable ASB master for v3d" errors during runtime PM suspend. As a consequence, the failed power-off leaves V3D in a broken state, leading to bus faults or system hangs on later accesses. As the timeout is insufficient in some scenarios, increase the polling timeout from 1us to 5us, which is still negligible in the context of a power domain transition. Also, move the start timestamp to after the MMIO write, as the write latency is currently counted against the timeout, reducing the effective wait time for the hardware to respond. Signed-off-by: Maíra Canal <mcanal@igalia.com>
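The effect of moving the start timestamp can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct fake_asb`, `asb_write()`, `asb_acked()` and `asb_request()` are hypothetical stand-ins with a simulated clock, not the bcm2835-power driver code, and the latencies used are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the bridge with a simulated clock (nanoseconds). */
struct fake_asb {
	uint64_t now_ns;        /* simulated current time */
	uint64_t write_cost_ns; /* latency of the MMIO write itself */
	uint64_t ack_delay_ns;  /* time the bridge needs after the write lands */
	uint64_t ack_at_ns;     /* filled in by asb_write() */
};

/* The write consumes time; the ack becomes visible ack_delay_ns later. */
static void asb_write(struct fake_asb *asb)
{
	asb->now_ns += asb->write_cost_ns;
	asb->ack_at_ns = asb->now_ns + asb->ack_delay_ns;
}

/* Each poll of the status register costs 100 ns in this model. */
static bool asb_acked(struct fake_asb *asb)
{
	asb->now_ns += 100;
	return asb->now_ns >= asb->ack_at_ns;
}

/*
 * Issue a request and poll for the ack. When start_after_write is false,
 * the write latency is charged against the timeout (old behaviour as
 * described in the commit message); when true, only the polling time is.
 */
static bool asb_request(struct fake_asb *asb, uint64_t timeout_ns,
			bool start_after_write)
{
	uint64_t start = asb->now_ns;

	asb_write(asb);
	if (start_after_write)
		start = asb->now_ns;

	while (!asb_acked(asb)) {
		if (asb->now_ns - start >= timeout_ns)
			return false; /* timed out */
	}
	return true;
}
```

With an 800 ns write, a 500 ns acknowledge delay and a 1000 ns timeout, the model times out when the timestamp is taken before the write and succeeds when it is taken afterwards, because the full timeout is then available to the hardware.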
Simplify optional reset handling by using the function devm_reset_control_get_optional_exclusive(). Reviewed-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com>
Move all resource allocation operations before actually enabling the clock, as those operations don't require the GPU to be powered on. This is a preparation for runtime PM support. The next commit will move all code related to powering on and initiating the GPU into the runtime PM resume callback and all resource allocation will happen before resume(). Reviewed-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com>
Commit 90a64ad ("drm/v3d: Get rid of pm code") removed the last bits of power management code that V3D had, which were actually never hooked. Therefore, currently, the GPU clock is enabled during probe and only disabled when removing the driver. Implement proper power management using the kernel's Runtime PM framework. Signed-off-by: Maíra Canal <mcanal@igalia.com>
commit 237577e upstream

The GEM MAC provides four read-only, clear-on-read LPI statistics registers at offsets 0x270-0x27c:

    GEM_RXLPI (0x270): RX LPI transition count (16-bit)
    GEM_RXLPITIME (0x274): cumulative RX LPI time (24-bit)
    GEM_TXLPI (0x278): TX LPI transition count (16-bit)
    GEM_TXLPITIME (0x27c): cumulative TX LPI time (24-bit)

Add register offset definitions, extend struct gem_stats with corresponding u64 software accumulators, and register the four counters in gem_statistics[] so they appear in ethtool -S output. Because the hardware counters clear on read, the existing macb_update_stats() path accumulates them into the u64 fields on every stats poll, preventing loss between userspace reads. These registers are present on SAMA5D2, SAME70, PIC32CZ, and RP1 variants of the Cadence GEM IP and have been confirmed on RP1 via devmem reads. Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> Reviewed-by: Théo Lebrun <theo.lebrun@bootlin.com> Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
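The accumulate-on-poll idea for clear-on-read counters can be sketched in plain C. This is a minimal model, not the macb driver: `read_and_clear()`, `struct lpi_stats` and `update_lpi_stats()` are hypothetical names, and the register is simulated by an ordinary variable.

```c
#include <stdint.h>

/* Simulates a clear-on-read hardware register: reading returns the
 * current count and resets it to zero. */
static uint32_t read_and_clear(uint32_t *reg)
{
	uint32_t val = *reg;

	*reg = 0;
	return val;
}

/* Hypothetical software accumulator, wide enough not to wrap in practice. */
struct lpi_stats {
	uint64_t rx_lpi_transitions;
};

/* On every stats poll, fold the 16-bit hardware count into the u64 total,
 * so nothing is lost between userspace reads. */
static void update_lpi_stats(struct lpi_stats *stats, uint32_t *rxlpi_reg)
{
	stats->rx_lpi_transitions += read_and_clear(rxlpi_reg) & 0xffff;
}
```

Polling twice with hardware counts of 5 and 7 leaves the accumulator at 12 and the simulated register at 0.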
commit 0cc425f upstream

The GEM MAC has hardware LPI registers (NCR bit 19: TXLPIEN) but no built-in idle timer, so asserting TXLPIEN blocks all TX immediately with no automatic wake. A software idle timer is required, as noted in Microchip documentation (section 40.6.19): "It is best to use firmware to control LPI."

Implement phylink managed EEE using the mac_enable_tx_lpi and mac_disable_tx_lpi callbacks:

- macb_tx_lpi_set(): sets or clears TXLPIEN; requires bp->lock to be held by the caller (asserted with lockdep_assert_held). Returns bool indicating whether the register actually changed, avoiding redundant writes and unnecessary udelay on the xmit fast path.
- macb_tx_lpi_work_fn(): delayed_work handler that enters LPI if all TX queues are idle and EEE is still active. Takes bp->lock with irqsave before calling macb_tx_lpi_set().
- macb_tx_lpi_schedule(): arms the work timer using the LPI timer value provided by phylink (default 250 ms). Called from macb_tx_complete() after each TX drain so the idle countdown restarts whenever the ring goes quiet.
- macb_tx_lpi_wake(): called from macb_start_xmit() under bp->lock, immediately before TSTART. Returns early if eee_active is false to avoid a register read on the common path when EEE is disabled. Clears TXLPIEN and applies a 50 us udelay for PHY wake (IEEE 802.3az Tw_sys_tx is 16.5 us for 1000BASE-T / 30 us for 100BASE-TX; GEM has no hardware enforcement). Only delays when TXLPIEN was actually set. The delay is placed after tx_head is advanced so the work_fn's queue-idle check sees a non-empty ring and cannot race back into LPI before the frame is transmitted.
- mac_enable_tx_lpi: stores the timer and sets eee_active under bp->lock, then defers the first LPI entry by 1 second per IEEE 802.3az section 22.7a.
- mac_disable_tx_lpi: cancels the work (sync, without the lock to avoid deadlock with the work_fn), then takes bp->lock to clear eee_active and deassert TXLPIEN.

Populate phylink_config lpi_interfaces (MII, GMII, RGMII variants) and lpi_capabilities (MAC_100FD | MAC_1000FD) so phylink can negotiate EEE with the PHY and call the callbacks appropriately. Set lpi_timer_default to 250000 us and eee_enabled_default to true. Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> Reviewed-by: Théo Lebrun <theo.lebrun@bootlin.com> Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
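The "returns whether the register actually changed" behaviour described for macb_tx_lpi_set() can be sketched as follows. This is a simplified user-space model: `tx_lpi_set_sketch()` is a hypothetical name, the NCR register is a plain variable, and locking is omitted. Only the bit position (NCR bit 19) comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define TXLPIEN_BIT (1u << 19) /* NCR bit 19, per the commit message */

/*
 * Set or clear TXLPIEN in a simulated NCR value. Returns true only when
 * the register content changed, so a caller can skip the PHY wake delay
 * when LPI was not actually asserted.
 */
static bool tx_lpi_set_sketch(uint32_t *ncr, bool enable)
{
	uint32_t old = *ncr;
	uint32_t val = enable ? (old | TXLPIEN_BIT) : (old & ~TXLPIEN_BIT);

	if (val == old)
		return false; /* nothing to write */

	*ncr = val;
	return true;
}
```

A transmit path built on this shape would apply its wake delay only when clearing the bit returns true, which is the redundant-write and udelay saving the commit message refers to.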
commit 61332b7 upstream Implement get_eee and set_eee ethtool ops for GEM as simple passthroughs to phylink_ethtool_get_eee() and phylink_ethtool_set_eee(). No MACB_CAPS_EEE guard is needed: phylink returns -EOPNOTSUPP from both ops when mac_supports_eee is false, which is the case when lpi_capabilities and lpi_interfaces are not populated. Those fields are only set when MACB_CAPS_EEE is present (previous patch), so phylink already handles the unsupported case correctly. Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> Reviewed-by: Théo Lebrun <theo.lebrun@bootlin.com> Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
commit 92ba330 upstream Set MACB_CAPS_EEE for the Raspberry Pi 5 RP1 southbridge (Cadence GEM_GXL rev 0x00070109 paired with BCM54213PE PHY). EEE has been verified on RP1 hardware: the LPI counter registers at 0x270-0x27c return valid data, the TXLPIEN bit in NCR (bit 19) controls LPI transmission correctly, and ethtool --show-eee reports the negotiated state after link-up. Other GEM variants that share the same LPI register layout (SAMA5D2, SAME70, PIC32CZ) can be enabled by adding MACB_CAPS_EEE to their respective config entries once tested. Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> Reviewed-by: Théo Lebrun <theo.lebrun@bootlin.com> Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
BCM54213PE (PHY_ID = 0x600d84a2) and BCM54210E share the same model ID when masked by PHY_ID_MATCH_MODEL_MASK (0xfffffff0): both reduce to 0x600d84a0. The dispatch switch in bcm54xx_config_init() switches on phydev->drv->phy_id & PHY_ID_MATCH_MODEL_MASK, so the separate case PHY_ID_BCM54213PE: could never match — the expression always evaluated to 0x600d84a0, not 0x600d84a2. bcm54213pe_config_init() was silently never called; BCM54213PE instead fell through to the BCM54210E path. Replace the dead case label with an exact driver ID check inside the BCM54210E case, which already handles the same model ID family. Fixes: 1001c6f ("phy: broadcom: Add bcm54213pe configuration") Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
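The masking arithmetic behind this bug can be checked in a few lines of C. The sketch below uses the ID and mask values quoted in the commit message; the `enum init_path` type and the two dispatch functions are hypothetical illustrations, and the real fix may well run additional BCM54210E setup that is not modelled here.

```c
#include <stdint.h>

#define PHY_ID_MATCH_MODEL_MASK 0xfffffff0u
#define PHY_ID_BCM54210E        0x600d84a0u
#define PHY_ID_BCM54213PE       0x600d84a2u

enum init_path { INIT_NONE, INIT_BCM54210E, INIT_BCM54213PE };

/* Old shape: the switch operand is masked, so it can never equal
 * 0x600d84a2 and the BCM54213PE case is dead code. */
static enum init_path dispatch_old(uint32_t drv_phy_id)
{
	switch (drv_phy_id & PHY_ID_MATCH_MODEL_MASK) {
	case PHY_ID_BCM54210E:
		return INIT_BCM54210E;
	case PHY_ID_BCM54213PE: /* unreachable */
		return INIT_BCM54213PE;
	default:
		return INIT_NONE;
	}
}

/* Fixed shape: compare the exact (unmasked) driver ID inside the
 * BCM54210E case, which covers the shared model ID. */
static enum init_path dispatch_fixed(uint32_t drv_phy_id)
{
	switch (drv_phy_id & PHY_ID_MATCH_MODEL_MASK) {
	case PHY_ID_BCM54210E:
		if (drv_phy_id == PHY_ID_BCM54213PE)
			return INIT_BCM54213PE;
		return INIT_BCM54210E;
	default:
		return INIT_NONE;
	}
}
```

Passing 0x600d84a2 to the old dispatch lands on the BCM54210E path, while the fixed dispatch selects the BCM54213PE path.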
BCM54xx PHYs default to AutogrEEEn mode (MII_BUF_CNTL_0 bit 0), which manages EEE Low Power Idle autonomously without forwarding LPI signaling to the MAC over the RGMII interface. This prevents the MAC from tracking LPI activity and controlling TX LPI entry/exit. Unconditionally clear the AutogrEEEn enable bit during config_init to switch all BCM54xx PHYs to Native EEE mode. In Native EEE mode the MAC controls TX LPI and the PHY forwards received LPI on the RGMII interface, allowing MACs with IEEE 802.3az support to observe RX LPI transitions. Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Reverts 85b196f ("dts: cm5/pi5: Disable EEE on rp1"). Now that the macb driver supports EEE via the phylink managed EEE API, the BCM54213PE PHY can advertise EEE correctly and the MAC will handle TX LPI entry/exit properly. Remove the eee-broken-1000t and eee-broken-100tx properties that were added as a workaround when the macb driver lacked EEE support. Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Add an eee dtparam to allow users to disable EEE on the BCM54213PE PHY without recompiling the device tree. Setting dtparam=eee=off adds the eee-broken-1000t and eee-broken-100tx properties to the PHY node which prevents EEE advertisement and negotiation. EEE is enabled by default (dtparam=eee=on). Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Reverts 42b77dd ("net: bcmgenet: Add 'eee' module parameter") for the DTS side. The downstream driver patch can be dropped as the same functionality is now handled entirely in device tree, removing one downstream-only driver change from the tree. Move the eee dtparam from the shared bcm2711-rpi-ds.dtsi into the individual Pi4 and CM4 device trees (where phy1 is defined), using eee-broken-1000t and eee-broken-100tx properties on the PHY node. This prevents EEE advertisement at the PHY level without requiring driver modifications. CM4S is excluded as it has no onboard Ethernet. Setting dtparam=eee=off adds the eee-broken properties to the PHY node which prevents EEE negotiation entirely. EEE is enabled by default (dtparam=eee=on). Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Some Goodix controllers continuously report that the buffer isn't ready when there are no touch points to report. That triggers the retry mechanism within the driver, which is there because the data can supposedly be ready 10ms after the interrupt occurs. Seeing as we don't have an interrupt, there is little point in retrying, and we can wait for the next poll event. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The [AX]RGB8888 and [AX]BGR8888 formats were all in the translation list, but RGB[AX]8888 and BGR[AX]8888 weren't. Seeing as the writeback connector had them in the list of supported formats, that meant it could generate content that couldn't be input. Add the relevant translations. Whilst T-Format support should be possible, it has not been added at this stage. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
BCM2711 fixes PLLC, so there's no need to avoid it. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
This reverts commit 44d0986.
This reverts commit 627b64f.
Retain the original compatible strings as fallbacks. See: raspberrypi#7023 Signed-off-by: Phil Elwell <phil@raspberrypi.com>
On pi this was getting set to 0 which was hanging the firmware Signed-off-by: Dom Cobley <popcornmix@gmail.com>
Commit de9e2b3d88af upstream. Currently DIV_ROUND_CLOSEST() is only available for the kernel via include/linux/math.h. Expose it to userland as well by adding __KERNEL_DIV_ROUND_CLOSEST() as a common definition in uapi. Additionally, ensure it allows building ISO C applications by switching from the 'typeof' GNU extension to the ISO-friendly __typeof__. Reviewed-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Tested-by: Diederik de Haas <diederik@cknow-tech.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://patch.msgid.link/20260303-rk3588-bgcolor-v8-1-fee377037ad1@collabora.com Signed-off-by: Daniel Stone <daniels@collabora.com>
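For readers unfamiliar with the rounding the macro performs, here is a deliberately simplified sketch using the `__typeof__` spelling mentioned above. `DIV_ROUND_CLOSEST_SKETCH` is a hypothetical name, not the kernel or uapi definition: it only handles non-negative operands and, unlike the kernel macro, evaluates its arguments more than once.

```c
/*
 * Simplified round-to-nearest integer division for non-negative values.
 * Adding half the divisor before dividing rounds halves upwards.
 * __typeof__ keeps the result in the type of the dividend and is accepted
 * by GCC and Clang even in strict ISO modes where plain 'typeof' is not.
 */
#define DIV_ROUND_CLOSEST_SKETCH(x, divisor) \
	((__typeof__(x))(((x) + (divisor) / 2) / (divisor)))
```

For example, 7 / 2 rounds to 4, 10 / 4 rounds to 3, and 9 / 4 rounds to 2.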
Commit 4c684596cde4 upstream. Some display controllers can be hardware programmed to show non-black colors for pixels that are either not covered by any plane or are exposed through transparent regions of higher planes. This feature can help reduce memory bandwidth usage, e.g. in compositors managing a UI with a solid background color while using smaller planes to render the remaining content. To support this capability, introduce the BACKGROUND_COLOR standard DRM mode property, which can be attached to a CRTC through the drm_crtc_attach_background_color_property() helper function. Additionally, define a 64-bit ARGB format value to be built with the help of a couple of dedicated DRM_ARGB64_PREP*() helpers. Individual color components can be extracted with desired precision using the corresponding DRM_ARGB64_GET*() macros. Co-developed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Tested-by: Diederik de Haas <diederik@cknow-tech.com> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://patch.msgid.link/20260303-rk3588-bgcolor-v8-2-fee377037ad1@collabora.com Signed-off-by: Daniel Stone <daniels@collabora.com>
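As a rough illustration of what packing a 64-bit ARGB value involves, the sketch below stores four 16-bit components in one `uint64_t` and extracts red at full and reduced precision. The function names and the exact bit layout (alpha in the top 16 bits, then red, green, blue) are assumptions made for this example; they are not the DRM_ARGB64_PREP*() or DRM_ARGB64_GET*() macros, whose definitions should be checked in the uapi headers.

```c
#include <stdint.h>

/* Hypothetical helper: pack 16-bit-per-component ARGB into 64 bits.
 * Assumed layout: A[63:48] R[47:32] G[31:16] B[15:0]. */
static uint64_t argb64_pack(uint16_t a, uint16_t r, uint16_t g, uint16_t b)
{
	return ((uint64_t)a << 48) | ((uint64_t)r << 32) |
	       ((uint64_t)g << 16) | (uint64_t)b;
}

/* Extract the full 16-bit red component. */
static uint16_t argb64_red(uint64_t argb)
{
	return (uint16_t)((argb >> 32) & 0xffff);
}

/* Extract red at 8 bits of precision by keeping the most significant
 * bits (plain truncation; a real helper might round instead). */
static uint8_t argb64_red_8bpc(uint64_t argb)
{
	return (uint8_t)(argb64_red(argb) >> 8);
}
```

Under these assumptions, opaque black is `argb64_pack(0xffff, 0, 0, 0)`, which matches the "defaults to solid black" behaviour a driver would program when the property is left untouched.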
When adding the register definitions for the GEN_6D hardware, 6 defines managed to get added twice. Remove that duplication. Fixes: 3ca2940 ("drm/vc4: hvs: Add in support for 2712 D-step.") Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Since a previous patch introduced the BACKGROUND_COLOR CRTC property, which defaults to solid black, take it into account when programming the hardware. The exact registers used vary between the hardware generations, but the feature is supported by all of them. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Considering that the Raspberry Pi is an embedded device with limited memory, memory fragmentation is an important aspect for performance. Using Big/Super Pages has clear benefits when it comes to reducing TLB misses, but also has an impact on memory fragmentation as we need to allocate aligned contiguous memory, increasing compaction pressure and memory waste for small BOs. As Big/Super Pages only have benefits for larger BOs, create a minimum BO size to use the THP partition. After testing different thresholds, 512KB provides the most balanced results with clear improvements and no significant regressions. This means that Big/Super Pages will only be used for BOs of at least 512KB. Signed-off-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Considering that the Raspberry Pi is an embedded device with limited memory, memory fragmentation is an important aspect for performance. Using Big/Super Pages has clear benefits when it comes to reducing TLB misses, but also has an impact on memory fragmentation as we need to allocate aligned contiguous memory, increasing compaction pressure and memory waste for small BOs.
As Big/Super Pages only have benefits for larger BOs, create a minimum BO size to use the THP partition. After testing different thresholds, 512KB provides the most balanced results with clear improvements and no significant regressions. This means that Big/Super Pages will only be used for BOs of at least 512KB.
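The size threshold described above amounts to a simple predicate. The following is a minimal sketch, assuming hypothetical names (`BIG_PAGE_MIN_BO_SIZE`, `bo_wants_big_pages()`); it does not reproduce the v3d driver code, only the 512KB cut-off stated in the description.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimum buffer-object size for using Big/Super Pages: 512KB,
 * the threshold named in the PR description. */
#define BIG_PAGE_MIN_BO_SIZE ((size_t)512 * 1024)

/* Small BOs stay on regular pages to limit fragmentation and waste;
 * BOs of at least 512KB are eligible for the huge-page backing. */
static bool bo_wants_big_pages(size_t bo_size)
{
	return bo_size >= BIG_PAGE_MIN_BO_SIZE;
}
```

With this predicate a 4KB BO and a BO one byte short of 512KB both use regular pages, while a BO of exactly 512KB is eligible.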
Here are some benchmark results. Each trace has been run twice to gather the results.