-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-19236. Incorporate VolcanoEngine Cloud TOS File System Implementation. Part 1: add core code and docs. #7294
Conversation
@wojiaodoubao If this part of the code is to be merged, it must ensure compatibility with JDK 17, as we are currently working on the upgrade. |
@slfan1989 thanks for your kindly reminder. Let me test it with JDK17. |
hi @slfan1989 , I take a look at HADOOP-15984, nice work! If I understand right, the blocker is upgrading jersey. The hadoop-tos dependencies include jersey through hadoop-common and hadoop-yarn-common, both provided, no direct or indirect use, so I think it's safe to merge. I notice the HADOOP-15984 is going to be merged within 2 days. It is a real hard work, we don't want any accidents. So I will |
💔 -1 overall
This message was automatically generated. |
<include>com.volcengine:*</include> | ||
<include>com.squareup.okhttp3:okhttp</include> | ||
<include>com.squareup.okio:okio-jvm</include> | ||
<include>org.jetbrains.kotlin:kotlin-stdlib*</include> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it safe to relocate a language runtime?
I know that OKHttp was a popular HTTP client in the Java world, but since 4.x, it switched from Java to Kotlin, that means every downstream project that used OKHttp client needs to pull a new JVM language runtime, this do sounds crazy for a HTTP client!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @pan3793 , thanks your comments. The intention of shade is to ensure the tos sdk doesn't conflict with any user's classes, so the hadoop-tos could be simple and safe to use.
Here I shaded tos sdk, okhttp, okio and kotlin-stdlib, the final shaded jar is 6,226,348 bytes. So if the downstream project uses okhttp too, the classes will doubled to 12MB+. The default meta space size is 21MB, in the worst case, 12MB+ are all loaded and occupy 57% meta space, which is a problem(may cause long gc).
The calculation above is not accurate, since the classes in file is smaller than classes in JVM, and classes won't be loaded until used. But its enough to expose the problem.
Your concern is reasonable. We can leave the potential conflicts for downstream projects. They can shade hadoop-tos and it's dependencies if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Size and class increase does not look like a big deal, concerns are relocation itself.
Class relocation is not a silver bullet, for example, it may break reflection invocations, and does not work with JNI, relocating a JVM language runtime library sounds like risky stuff.
AFAIK, the hadoop-client-runtime
was not fully tested in the Hadoop project, and several issues caused by relocation were identified after Spark adoption. I think the hadoop-tos-shade
stays in the same situation
How hard is it for TOS to switch to another HTTP client? i.e. Apache HTTP Client, a pure Java HTTP client, and is easy to relocate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an alternative: publish ve-tos-java-sdk-shaded
, and downstream projects like Hadoop simply consume the single jar, this will address the test coverage concern I mentioned above and save a lot of users for handling class conflict.
I glanced TOS repo README and docs, seems OKHTTP/OKIO is not leaked to public API, so either switching to another HTTP client or creating a ve-tos-java-sdk-shaded
with all 3rd classes relocated to com.volcengine.tos.
package is feasible, thought?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pan3793 for your reply.
AFAIK, the hadoop-client-runtime was not fully tested in the Hadoop project, and several issues caused by relocation were identified after Spark adoption. I think the hadoop-tos-shade stays in the same situation
This is a new input for me. Could you help giving more details about this issue, thanks a lot.
How hard is it for TOS to switch to another HTTP client?
I'm not sure whether we can push the tos sdk to switch to another HTTP client. Even they would like to do this, it will take a lot of time. So I'd prefer to use the existed tos sdk.
publish ve-tos-java-sdk-shaded, and downstream projects like Hadoop simply consume the single jar, this will address the test coverage concern I mentioned above and save a lot of users for handling class conflict.
In the current version, the ve-tos-java-sdk-shaded is done by hadoop-tos-shade, and the hadoop-tos-core module depends on the hadoop-tos-shade.
If I publish a ve-tos-java-sdk-shaded, what is the difference between the published ve-tos-java-sdk-shaded ant the orignal hadoo-tos-shade module? May be because I don't understand the reflection stuff, I'm not clear at here. Please help if I'm wrong, thanks a lot.
I prefer to make hadoop-tos depending on tos sdk directly, without any shade. The users can shade hadoop-tos and it's dependencies if necessary. This should be the most flexible way, without any side effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you help giving more details about this issue
I just pick some tickets to demonstrate the issue, there may be other issues I missed or are not identified yet
- https://issues.apache.org/jira/browse/HADOOP-16080
- https://issues.apache.org/jira/browse/HADOOP-17324
- https://issues.apache.org/jira/browse/HADOOP-17891
If I publish a ve-tos-java-sdk-shaded, what is the difference between the published ve-tos-java-sdk-shaded ant the orignal hadoo-tos-shade module?
have you checked the error stacktrace of unit/integration tests, which package show? the vanilla kotlin package or the relocated one?
AFAIK, maven has limited support consuming a shaded module in the same project, that's why hadoop and some other apache projects which use maven as the building system have a dedicated repo for 3rd relocated libs https://github.com/apache/hadoop-thirdparty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pan3793 thanks your reply.
I just pick some tickets to demonstrate the issue...
Thanks your patience. I read the provided issues and yes shade does cause anomalies. If I understand correct, relocate the same class respectively by different modules might cause NoSuchMethod or ClassNotFound. Single relocate may cause 'breaking native code' or 'breaking reflection' issues, depending on the implementation of the relocated class.
have you checked the error stacktrace of unit/integration tests...
In my situation, it is the relocated one.
Thanks your help. Now I more believe we should make hadoop-tos depending on tos sdk directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to make hadoop-tos depending on tos sdk directly, without any shade.
This is the source of dependency hell.
To summarize my points:
- it's too crazy to pull a new JVM language runtime for an HTTP client.
- shading and relocation is a hack solution that should be used carefully.
0d3b07c
to
413f3e7
Compare
💔 -1 overall
This message was automatically generated. |
413f3e7
to
a62d2ae
Compare
💔 -1 overall
This message was automatically generated. |
a62d2ae
to
547fb01
Compare
💔 -1 overall
This message was automatically generated. |
…ntation. Contributed by: ZhengHu, SunXin, XianyinXin, Rascal Wu, FangBo, Yuanzhihuan.
547fb01
to
f45b76b
Compare
🎊 +1 overall
This message was automatically generated. |
This is the first part of hadoop-tos module, and it's ready for merge now. After that I'll submit the final part of hadoop-tos, which includes all the unit tests. Let's wait 2 days for further comments. |
-1. @wojiaodoubao I'm very sorry, I want to roll back this PR. This PR does not have any committer review. We cannot submit code to the trunk at will. According to our process, such a significant change requires initiating the [DISCUSS] and [VOTE] stages. fix the checkstyle problem first. This PR has been reverted until all processes meet the requirements. https://hadoop.apache.org/bylaws.html
For this PR, I don't have a strong preference as I am not very familiar with this part. We need to comply with the standards, and if this causes any inconvenience, please understand. |
Hi @slfan1989 , no need to feel sorry. I should be sorry for breaking the process. Let me start a DISCUSSION, please get involved if you have time, thanks. |
Description of PR
JIRA: HADOOP-19236. Incorporate VolcanoEngine Cloud TOS File System Implementation. Part 1: add core code and docs.
Please refer to #7194.
The tos integration work is too big. Split it into 2 parts and this is the first part.
How was this patch tested?
With unit tests.
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?