Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable building C++ unittests, convert to rattler-build, remove pytorch dependency #29

Merged
merged 27 commits into from
Dec 30, 2024

Conversation

mgorny
Copy link
Contributor

@mgorny mgorny commented Dec 16, 2024

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

This is primarily for the purpose of testing rattler-build, so I'm fine with you rejecting it. However, the first commit also makes sense for conda-build:

The package build has been implicitly enabling building unittests. Besides being unnecessary (as setup.py does not run them), it caused googletest to be implicitly downloaded.

Note that I had to change deps per prefix-dev/rattler-build#1265.

Disable building C++ unittests, since they are not run.  Furthermore,
they cause googletest to be implicitly git cloned (which breaks building
with rattler-build).
Converted using `conda-recipe-manager convert`, with a few hand fixes
afterwards.
@conda-forge-admin
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/recipe.yaml) and found some lint.

Here's what I've got...

For recipe/recipe.yaml:

  • ❌ The top level meta key schema_version is unexpected

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12359371824. Examine the logs at this URL for more detail.

No clue why `conda-recipe-manager convert` adds that, but the linter
does not like it.
@conda-forge-admin
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/recipe.yaml) and found it was in an excellent condition.

rattler-build uses `${{ hash }}` instead of `${{ PKG_HASH }}`,
requires you to pass build number explicitly and construct Python
version directly.  It also provides a version_to_buildstring filter
that replaces the inline substitutions.
@h-vetinari
Copy link
Member

I don't mind the change to rattler, but for now CI is red, so I haven't looked more closely yet... ;-)

@mgorny
Copy link
Contributor Author

mgorny commented Dec 19, 2024

Hmm, CI says it ran out of space. I wonder if rattler actually uses more space during the build, or it is just incidental.

@h-vetinari
Copy link
Member

In either case you can add

azure:
  free_disk_space: true

to conda-forge.yml and rerender.

(it's not enabled by default because it takes ~2-10minutes, which would be wasted time on the large majority of feedstocks where this isn't necessary)

@mgorny
Copy link
Contributor Author

mgorny commented Dec 19, 2024

Done, thanks! I've also filed #30 prior to reading your comment, if anything I think it's worth reporting if rattler indeed consumes more space.

@h-vetinari
Copy link
Member

@conda-forge-admin, please rerender

@mgorny mgorny mentioned this pull request Dec 27, 2024
5 tasks
Per the discussion on pytorch-cpu-feedstock:
conda-forge/pytorch-cpu-feedstock#166 (comment)
(and below)
@mgorny mgorny changed the title Disable building C++ unittests, convert to rattler-build Disable building C++ unittests, convert to rattler-build, remove pytorch dependency Dec 27, 2024
@mgorny
Copy link
Contributor Author

mgorny commented Dec 27, 2024

I've removed the pytorch dependency per conda-forge/pytorch-cpu-feedstock#166 (comment) (and below).

@h-vetinari
Copy link
Member

Seems like you'll need a run-dep on setuptools for 3.13:

 │ │   File "$PREFIX/lib/python3.13/site-packages/triton/backends/amd/driver.py", line 7, in <module>
 │ │     from triton.runtime.build import _build
 │ │   File "$PREFIX/lib/python3.13/site-packages/triton/runtime/build.py", line 8, in <module>
 │ │     import setuptools
 │ │ ModuleNotFoundError: No module named 'setuptools'

lit is only used for the C++ test suite.
@mgorny
Copy link
Contributor Author

mgorny commented Dec 28, 2024

Hmm, it seems to be used in all Python versions, so I guess it's just 3.13 not pulling it implicitly. I've also noticed that the dependency on lit is unnecessary. filelock does not seem to be used either, though it is listed as install_requires — should I patch it out or leave?

@mgorny
Copy link
Contributor Author

mgorny commented Dec 28, 2024

Hmm, now that the PyTorch dep is eliminated/reversed, should I try enabling Windows build as well? I don't have a Windows system, so the best I can do is guess work and seeing whether CI works.

@h-vetinari
Copy link
Member

Sounds like we could give it a shot with windows!

@mgorny mgorny marked this pull request as draft December 28, 2024 15:03
@mgorny
Copy link
Contributor Author

mgorny commented Dec 28, 2024

Converting back to draft while I experiment with Windows builds. I'm going to also limit CI targets to avoid wasting resources while I try to get it initially working.

@h-vetinari
Copy link
Member

I've also noticed that the dependency on lit is unnecessary. filelock does not seem to be used either, though it is listed as install_requires — should I patch it out or leave?

No strong preference. If you're reasonably sure that this doesn't get used or imported anywhere, then we can patch it out, but not necessary from my POV either. lit should be removed if not needed

@mgorny
Copy link
Contributor Author

mgorny commented Dec 28, 2024

Okay, I'm out of ideas.

Building with MSVC fails:

 │ │   %SRC_DIR%\lib\Analysis\Utility.cpp(818): error C2672: 'mlir::Operation::walk': no matching overloaded function found
 │ │   %SRC_DIR%\lib\Analysis\Utility.cpp(818): error C2783: 'enable_if<llvm::function_traits<decay<FnT>::type,std::is_class<T>::value>::num_args==1,RetT>::type mlir::Operation::walk(FnT &&)': could not deduce template argument for 'RetT'
 │ │           with
 │ │           [
 │ │               T=decay<FnT>::type
 │ │           ]

Plus flags passed seem to indicate upstream didn't expect MSVC there.

Building with clang fails too:

 │ │   -- Found zstd: %PREFIX%/Library/lib/libzstd.lib
 │ │   -- Found LibXml2: %PREFIX%/Library/lib/libxml2.lib (found version "2.13.5")
 │ │   -- Adding Python module
 │ │   -- Triton backends tuple: nvidia,amd
 │ │   -- Configuring done (8.7s)
 │ │   CMake Error at %BUILD_PREFIX%/Library/lib/cmake/llvm/LLVMExports.cmake:64 (set_target_properties):
 │ │     The link interface of target "LLVMSupport" contains:
 │ │       zstd::libzstd_shared
 │ │     but the target was not found.  Possible reasons include:
 │ │       * There is a typo in the target name.
 │ │       * A find_package call is missing for an IMPORTED target.
 │ │       * An ALIAS target is missing.
 │ │   Call Stack (most recent call first):
 │ │     %BUILD_PREFIX%/Library/lib/cmake/llvm/LLVMConfig.cmake:349 (include)
 │ │     %BUILD_PREFIX%/Library/lib/cmake/mlir/MLIRConfig.cmake:10 (find_package)
 │ │     CMakeLists.txt:61 (find_package)

which seems to indicate it expected to use shared libzstd, but found static library instead.

@mgorny mgorny marked this pull request as ready for review December 28, 2024 16:21
@mgorny
Copy link
Contributor Author

mgorny commented Dec 28, 2024

Confirmed that the last filelock use was removed in 2023, and backported an upstream commit removing the dep. Switched Windows back off, but left the progress so far in, so it can be reused in the future.

@h-vetinari
Copy link
Member

which seems to indicate it expected to use shared libzstd, but found static library instead.

This was for the very mundane reason that the recipe does not specify a dependency on zstd, which does contain the shared lib.

@mgorny
Copy link
Contributor Author

mgorny commented Dec 29, 2024

Hmm, please correct me if I'm missing something obvious, but FWICS zstd did end up in the host (and build) environment:

2024-12-28T16:08:58.7898645Z  │ │ │ zstd            ┆ 1.5.6        ┆ h0ea2cb4_0         ┆ conda-forge ┆ 340.96 KiB │

Particularly, the same package does contain libzstd.lib that was found instead. The LLVM's zstd CMake logic is pretty horrifying, and I suspect we're either seeing a bug there, or triton is doing something that breaks it.

@mgorny
Copy link
Contributor Author

mgorny commented Dec 30, 2024

Hmm, interesting is that find_library() seems to work differently with the two compilers. With MSVC, it finds:

 │ │   -- Found ZLIB: %PREFIX%/Library/lib/z.lib (found version "1.3.1")
 │ │   -- Found zstd: %PREFIX%/Library/lib/zstd.lib
 │ │   -- Found LibXml2: %PREFIX%/Library/lib/xml2.lib (found version "2.13.5")

while with clang, it prefers libraries with lib prefix instead:

 │ │   -- Found ZLIB: %PREFIX%/Library/lib/z.lib (found version "1.3.1")
 │ │   -- Found zstd: %PREFIX%/Library/lib/libzstd.lib
 │ │   -- Found LibXml2: %PREFIX%/Library/lib/libxml2.lib (found version "2.13.5")

Now, this is all my best guess since I don't have a Windows system to test, and I haven't been able to get a Conda environment working correctly with wine, but note the CMake code:

https://github.com/llvm/llvm-project/blob/e21dc4bd5474d04b8e62d7331362edcc5648d7e5/llvm/cmake/modules/Findzstd.cmake

When MSVC is used zstd_STATIC_LIBRARY_SUFFIX is _static.lib$, otherwise it's just .lib$. So when using MSVC, zstd.lib (and libzstd.lib would too) triggers the second clause in the zstd_FOUND condition, causing it to perform explicit lookup for bin/zstd.dll. On the other hand, when using Clang, it just treats libzstd.lib as a static library (the first clause) and doesn't bother beyond that. So effectively, with MSVC it finds both static and shared libraries, and with Clang only the former.

Now, since LLVM in Conda is built with MSVC, it uses shared zstd library and includes references to the dependency on this library in the installed CMake files. However, these files don't include the zstd library target itself — instead, they're triggering another lookup in reverse dependencies. Since the lookup is now performed with Clang and it doesn't find the shared library, the dependencies are unsatisfied.

And perhaps the worst part, it is possible that I've actually broken it there. I specifically see the shared library logic for MSVC has been added in llvm/llvm-project@e7fc754 — perhaps the condition there should check for Windows rather than MSVC. But it's outside my expertise to change that. Either way, I'm going to report a bug for our LLVM feedstock and upstream bug tracker.

@mgorny
Copy link
Contributor Author

mgorny commented Dec 30, 2024

@h-vetinari
Copy link
Member

Thanks for chasing this down and filing the issues! Let's get this in then!

@h-vetinari h-vetinari merged commit 6790d7b into conda-forge:main Dec 30, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants