Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

standardize CMake linting and auto-formatting across RAPIDS #62

Open
jameslamb opened this issue Jan 24, 2025 · 1 comment
Open

standardize CMake linting and auto-formatting across RAPIDS #62

jameslamb opened this issue Jan 24, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@jameslamb
Copy link
Member

Description

As of this writing, CMake formatting varies somewhat across RAPIDS. It would be great to standardize the look and feel of CMake code across RAPIDS, and pre-commit seems like a good system to do that.

This could involve creating new hooks or choosing from the existing third-party open-source projects.

Benefits of this work

  • standardization enforced by tools = less effort put into style-related review comments
  • static analysis = reduced risks of bugs making it through review (especially helpful for code paths not touched by CI)
  • creating an easier-to-adopt approach for CMake linting should improve the likelihood that projects adopt it

Acceptance Criteria

  • RAPIDS projects use pre-commit hook(s) for CMake autoformatting and linting

Approach

1. Review the current approaches used across RAPIDS

Several RAPIDS projects use the following approach:

  1. vendor shell script that invokes cmake-format and cmake-lint in pre-commit-friendly ways (example: cudf/cpp/scripts/ run-cmake-format.sh)
  2. vendor a config file for cmake-format (example: cudf/cpp/cmake/config.json)
  3. at runtime in CI, download more configuration from the rapids-cmake repo like this:
RAPIDS_VERSION_MAJOR_MINOR="$(rapids-version-major-minor)"
FORMAT_FILE_URL="https://raw.githubusercontent.com/rapidsai/rapids-cmake/branch-${RAPIDS_VERSION_MAJOR_MINOR}/cmake-format-rapids-cmake.json"
export RAPIDS_CMAKE_FORMAT_FILE=/tmp/rapids_cmake_ci/cmake-formats-rapids-cmake.json
mkdir -p $(dirname ${RAPIDS_CMAKE_FORMAT_FILE})
wget -O ${RAPIDS_CMAKE_FORMAT_FILE} ${FORMAT_FILE_URL}

(example: cudf/ci/check_style.sh)

  1. Invoke those scripts, using that configuration, using "local" pre-commit hooks, like this:
      - id: cmake-format
        name: cmake-format
        entry: ./cpp/scripts/run-cmake-format.sh cmake-format
        language: python
        types: [cmake]
        additional_dependencies:
          - cmakelang==0.6.13
        verbose: true
        require_serial: true
        exclude: |
          (?x)^(
            cpp/cmake/Modules/FindCUDAToolkit[.]cmake$
          )
      - id: cmake-lint
        name: cmake-lint
        entry: ./cpp/scripts/run-cmake-format.sh cmake-lint
        language: python
        types: [cmake]
          - cmakelang==0.6.13
        verbose: true
        require_serial: true
        exclude: |
          (?x)^(
            cpp/cmake/Modules/FindCUDAToolkit[.]cmake$
          )

(example: cudf/.pre-commit-config.yaml)

2. Review open-source options

Both cmake-format and cmake-lint have official pre-commit hooks.

These might not have existed or might not have fits RAPIDS needs when the approach described above was first introduced.

3. Pick a path and implement it

Some functionality I think we want:

Notes

Historical context:

@vyasr
Copy link
Contributor

vyasr commented Jan 24, 2025

I can provide some context that will hopefully answer some of your questions. rapids-cmake was (unsurprisingly) the first place in RAPIDS to use the cmakelang package for formatting. The basic formatting was added way back in rapidsai/rapids-cmake#42. Even before that, since rapids-cmake also defines a large number of CMake functions, the library started vendoring its own format file in rapidsai/rapids-cmake#29 so that consuming projects could format their CMake if they used rapids-cmake functions.

rapidsai/cudf#9484 was the original source for pre-commit-based CMake linting in RAPIDS. The comment at the top of the run-cmake-format.sh script has remained largely unchanged since then and explains why this approach is necessary and we cannot use the existing hooks directly. To answer

These might not have existed or might not have fits RAPIDS needs when the approach described above was first introduced.

These hooks did exist then. The problem is fundamentally that since we use a lot of rapids-cmake functions everywhere, if we do not have the format file from rapids-cmake available we will incorrectly reformat all of our CMake in local runs of the pre-commit hook. To prevent this, the wrapper script is designed to exit out (without failing!) if that file is not available. The assumption is that the number of people who actually modify the CMake is limited and those people will know that they need this file available locally in order to match remote formatting. The script makes some best effort attempts at finding the file based on the repository root and the build directory where the file will be cloned into.

In the time since then we have also started including additional configuration, as you have noted. So to answer

Some functionality I think we want:

The two main things that this hook should do beyond the hooks that are already available is:

  • To standardize and store the common configuration in our custom pre-commit hook, to be used across all of RAPIDS. This configuration can include standardized formatting for common functions that we use that are not base CMake functions, in particular CPM functions since rapids-cmake does not (and probably should not) provide formatting rules for those functions it does not define itself.
  • A way for the hook to maintain a reasonably up-to-date version of the rapids-cmake cmake-format JSON file that can be used independently of a project having a build directory. To minimize network requests, that file could be refreshed based on how stale it is. That is no worse than the current approach, which assumes that the current build directory contains an up-to-date file. The main difference would be the mechanism by which the file is refreshed. Currently, it is refreshed when the build directory is cleaned and rebuilt from scratch. If the hook managed the file, the easy but expensive option would be a pre-commit clean, which clears the pre-commit cache and requires redownloading all hooks. A more surgical option would be supporting some environment variable by which the hook could be told to force redownload the file. I'm open to other options here too. Given that the current approach already allows for stale files if someone never nukes their build directory (and many people never do) I am not overly concerned with that being a big problem with the pre-commit approach either.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants