HADOOP-19636. [JDK17] Remove EOL OS Support and Clean Up Dockerfile. #7822

Open: slfan1989 wants to merge 1 commit into base: trunk

@slfan1989 (Contributor) commented Jul 23, 2025

Description of PR

JIRA: HADOOP-19636. [JDK17] Remove EOL OS Support and Clean Up Dockerfile.

In the Apache Hadoop project, we have historically supported multiple Linux distributions, including CentOS 7, CentOS 8, and Debian 10, as part of our build and test environments. However, all three distributions have now reached End-of-Life (EOL) status and are no longer officially maintained or supported.

To ensure long-term maintainability and security, we propose to deprecate and clean up build support related to these EOL platforms. This includes:

  • Removing associated Dockerfiles
  • Cleaning up Jenkins pipeline configurations
  • Dropping any platform-specific build logic or scripts

This cleanup will simplify our CI infrastructure and reduce the maintenance burden going forward.

This PR is intended to remove support for CentOS 7, CentOS 8, and Debian 10—all of which have reached EOL—in a single submission, including related Dockerfiles and Jenkins build configurations. The goal is to avoid multiple follow-up cleanup PRs, thereby improving efficiency and reducing review overhead.

How was this patch tested?

junit test.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

(!) A patch to the testing environment has been detected.
Re-executing against the patched versions to perform further tests.
The console is at https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7822/1/console in case of problems.

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 2m 21s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 🆗 | shelldocs | 0m 1s | | Shelldocs was not available. |
| +0 🆗 | jsonlint | 0m 1s | | jsonlint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| | | | | _ trunk Compile Tests _ |
| +0 🆗 | mvndep | 8m 42s | | Maven dependency ordering for branch |
| +1 💚 | shadedclient | 40m 51s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 32s | | Maven dependency ordering for patch |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | hadolint | 0m 2s | | No new issues. |
| +1 💚 | shellcheck | 0m 0s | | No new issues. |
| +1 💚 | shadedclient | 40m 8s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 37s | | The patch does not generate ASF License warnings. |
| | | 95m 21s | | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7822/1/artifact/out/Dockerfile |
| GITHUB PR | #7822 |
| Optional Tests | dupname asflicense codespell detsecrets hadolint shellcheck shelldocs jsonlint |
| uname | Linux 6fb52aab7b2a 5.15.0-143-generic #153-Ubuntu SMP Fri Jun 13 19:10:45 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / d4650c0 |
| Max. process+thread count | 564 (vs. ulimit of 5500) |
| modules | C: U: |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7822/1/console |
| versions | git=2.25.1 maven=3.6.3 hadolint=1.11.1-0-g0e692dd shellcheck=0.7.0 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@slfan1989 slfan1989 marked this pull request as ready for review July 24, 2025 00:05
@slfan1989 (Contributor Author) commented Jul 24, 2025

@ayushtkn @GauthamBanasandra @Hexiaoqiao Could you please help review this PR? Thank you very much!

cc: @pan3793

// This stage serves as a means of cross platform validation, which is
// really needed to ensure that any C++ related/platform change doesn't
// break the Hadoop build on Centos 7.
stage ('precommit-run Centos 7') {
@pan3793 (Member) commented Jul 24, 2025

There's leftover CentOS 7 content at line 86.

Contributor

+1

# Dockerfile for installing the necessary dependencies for building Hadoop.
# See BUILDING.txt.

FROM centos:8
Member

I don't think we should remove CentOS 8. Instead, we should migrate it to Rocky Linux 8 (or another RHEL-like OS) in place, and then to 9 or 10.

Contributor Author

From a personal perspective, I don't agree with your suggestion. I believe we should completely remove operating systems that have reached their End of Life (EOL). If we need to support CentOS 9 or Debian 12 in the future, it should be done by submitting a new PR for a thorough evaluation. Rather than maintaining multiple Dockerfiles, I prefer a more lightweight approach, such as providing support through documentation. As the number of supported operating systems increases, if we have to maintain Dockerfiles for each one, we could end up managing dozens, which is neither cost-effective nor sustainable.

Member

> I believe we should completely remove operating systems that have reached their End of Life (EOL). If we need to support CentOS 9 or Debian 12 in the future, it should be done by submitting a new PR for a thorough evaluation.

I don't see much benefit in your proposal. Upgrading in place is straightforward, and it leaves a clear diff in the commit history that shows users what they need to change when planning a Hadoop cluster OS upgrade.

> Rather than maintaining multiple Dockerfiles, I prefer a more lightweight approach, such as providing support through documentation.

Documentation can easily become outdated (try "Building on macOS (without Docker)" in BUILDING.txt). As I replied here, I think the Dockerfile itself is the best documentation for setting up the build environment.

https://lists.apache.org/thread/2ypqcrnsth3jk21rpjvjv53tntz21ht8
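
To make the "clear diff" argument concrete, here is a minimal sketch of an in-place base image swap. It assumes the Dockerfile starts from `FROM centos:8` as quoted above, and that `rockylinux:8` would be the replacement image; that image name and the `swap_base_image` helper are illustrative assumptions, not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the Hadoop tree): print a Dockerfile with its
# base image swapped, leaving the original file untouched.
swap_base_image() {
  local dockerfile="$1" old_image="$2" new_image="$3"
  # Only a line that is exactly "FROM <old_image>" is rewritten.
  sed "s|^FROM ${old_image}\$|FROM ${new_image}|" "${dockerfile}"
}

# Demo on a throwaway file that mimics the header quoted in this thread.
tmp="$(mktemp)"
printf '%s\n' \
  '# Dockerfile for installing the necessary dependencies for building Hadoop.' \
  '# See BUILDING.txt.' \
  '' \
  'FROM centos:8' > "${tmp}"

# diff exits non-zero when the inputs differ, so "|| true" keeps the demo going.
swap_base_image "${tmp}" 'centos:8' 'rockylinux:8' | diff "${tmp}" - || true
rm -f "${tmp}"
```

With an in-place edit, the diff is limited to the lines that actually changed. A delete-and-recreate under a new file name would instead show the whole file as removed and re-added, unless the review tool detects the rename.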

Contributor Author

The choice of operating system should be made by the user, and therefore, the resolution of compilation issues should also be handled by the user.

Take CentOS 7 as an example, which has multiple versions (such as 7.2, 7.3, 7.9, etc.). Different versions may have configuration or dependency differences (e.g., glibc, gcc versions), which can lead to compilation issues, such as with protobuf or native package compilation. For these issues, we should not add extra workarounds, as that would make the project redundant.

If we were to upgrade to CentOS 9, we would change the Dockerfile name from Dockerfile_centos_8 to Dockerfile_centos_9. Users comparing the diff would see that Dockerfile_centos_8 has been deleted and replaced with Dockerfile_centos_9, which contains entirely new content.

Member

I understand that we cannot enumerate all Linux distributions and versions. I believe most enterprises use the Debian/RHEL families of Linux distributions to run Hadoop. Given the limited developer resources in the Hadoop community, how about keeping only two OS Dockerfiles and CI pipelines: the latest (or second-latest) version of Ubuntu (the default environment for building, testing, and releasing) and Rocky Linux (to verify compilation only)? They will serve as a reference for users who want to set up a build environment based on their preferred Linux distribution.

Contributor

My point is that we should remove dependencies that are EOL, just as we do in other modules. In this case, CentOS 8 has reached its EOL and its packages are no longer available on the mirror.centos.org site (https://www.centos.org/centos-linux-eol/). So +1 to Shilun's comments from my side. cc @pan3793 What do you think? Thanks.

Member

@Hexiaoqiao If you agree to retain at least one RHEL-family OS Dockerfile for building Hadoop, I suggest keeping CentOS 8, because it still works well for the Hadoop project build as of today (the mirror.centos.org site was replaced by vault.centos.org, see dev-support/docker/pkg-resolver/set-vault-as-baseurl-centos.sh). I plan to migrate it to Rocky Linux 8 soon.
https://endoflife.date/rocky-linux
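
For readers unfamiliar with the vault workaround mentioned above, the following is a rough sketch of the general technique. It is not the contents of `set-vault-as-baseurl-centos.sh`, which is not shown in this thread. The `use_vault_baseurl` function, the sample repo file, and the reliance on GNU `sed -i` are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: rewrite yum .repo files so they stop consulting the retired
# mirror list and use vault.centos.org as the base URL instead.
# Assumes GNU sed (in-place editing via "-i" without a suffix argument).
use_vault_baseurl() {
  local repo_dir="$1"
  local repo
  for repo in "${repo_dir}"/CentOS-*.repo; do
    [ -e "${repo}" ] || continue   # no matching files: nothing to do
    sed -i \
      -e 's|^mirrorlist=|#mirrorlist=|' \
      -e 's|^#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|' \
      "${repo}"
  done
}

# Demo against a throwaway directory rather than /etc/yum.repos.d.
demo_dir="$(mktemp -d)"
cat > "${demo_dir}/CentOS-Linux-BaseOS.repo" <<'EOF'
[baseos]
name=CentOS Linux $releasever - BaseOS
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=BaseOS&infra=$infra
#baseurl=http://mirror.centos.org/$contentdir/$releasever/BaseOS/$basearch/os/
gpgcheck=1
EOF
use_vault_baseurl "${demo_dir}"
cat "${demo_dir}/CentOS-Linux-BaseOS.repo"
```

In a real image the function would be pointed at `/etc/yum.repos.d` before any `yum` or `dnf` call. The sketch deliberately works on a temporary directory so it can be run safely on any machine.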

Contributor

I'm supportive of a RHEL variant. There's also the option of an Amazon Linux container image, which uses yum.

# Dockerfile for installing the necessary dependencies for building Hadoop.
# See BUILDING.txt.

FROM debian:10
Member

Same here: we should upgrade it to Debian 12 or 13 in place.

@Hexiaoqiao (Contributor) left a comment

Thanks @slfan1989 for your work. +1 from my side.

// This stage serves as a means of cross platform validation, which is
// really needed to ensure that any C++ related/platform change doesn't
// break the Hadoop build on Centos 8.
stage ('precommit-run Centos 8') {
Contributor

Instead of removing CentOS 8, I would suggest replacing it with another supported RHEL8 clone, like Rocky Linux.

@stoty (Contributor) left a comment

I think we should replace CentOS 8 instead of dropping it outright.

@slfan1989 (Contributor Author)

> I think we should replace CentOS 8 instead of dropping it outright.

Thank you for your feedback. Feel free to continue sharing your thoughts in this email thread. So far, I’ve received comments from @ayushtkn , @Hexiaoqiao , @cnauroth, @pan3793. We are still in the discussion phase, and a final decision will be made based on the collective input.

https://lists.apache.org/thread/2ypqcrnsth3jk21rpjvjv53tntz21ht8

@slfan1989 (Contributor Author)

@GauthamBanasandra Thank you, and I look forward to hearing your thoughts on this issue.

@stoty (Contributor) commented Jul 28, 2025

Can you please forward the last email to [email protected] @slfan1989 so that I can reply?
I am not subscribed to commons-dev yet.

@slfan1989 (Contributor Author)

> Can you please forward the last email to [email protected] @slfan1989 so that I can reply?
> I am not subscribed to commons-dev yet.

I’ve cc’d you on the email—please have a look when it’s convenient for you.
