HDFS-17772. Fix JournaledEditsCache int overflow causing the maximum capacity to always be Integer.MAX_VALUE. #7617
Conversation
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
Hi @Hexiaoqiao @ayushtkn @goiri @ZanderXu @simbadzina @slfan1989, could you please help review this PR when you have free time? Thanks a lot~
LGTM. +1 from my side. Thanks @gyz-web.
Thank you very much for your review, @Hexiaoqiao!
Committed to trunk. Thanks @gyz-web for your contribution.
Thanks a lot for merging to trunk, @Hexiaoqiao. Thank you very much.
JIRA: HDFS-17772.
Description of PR
When using RBF SBN READ in our production environment, we found the following issue.
HDFS-16550 introduced the parameter dfs.journalnode.edit-cache-size.fraction to size the cache as a fraction of the JournalNode's memory, but it has an int overflow problem. When this parameter is used to control the cache capacity, the initialization of capacity in org.apache.hadoop.hdfs.qjournal.server.JournaledEditsCache#JournaledEditsCache suffers a long-to-int overflow. For instance, when the heap memory is configured as 32 GB (where Runtime.getRuntime().maxMemory() returns 30,542,397,440 bytes), the overflow results in capacity being truncated to Integer.MAX_VALUE (2,147,483,647). This renders the parameter setting ineffective, as the intended proportional cache capacity cannot be achieved. To resolve this, capacity should be declared as a long, and the totalSize variable should also be converted to a long, so that neither overflows when capacity exceeds 2,147,483,647 and both can accurately represent large values.
The error situation is as follows:
The dfs.journalnode.edit-cache-size.fraction parameter uses its default value of 0.5f. I configured the JournalNode heap to 30,542,397,440 bytes and expected a capacity of 15,271,198,720 bytes, but the capacity is always Integer.MAX_VALUE = 2,147,483,647 bytes:
2025-04-15 14:14:03,970 INFO server.Journal (JournaledEditsCache.java:<init>(144)) - Enabling the journaled edits cache with a capacity of bytes: 2147483647
The repaired result is as follows and meets the expectation:
2025-04-15 16:04:44,840 INFO server.Journal (JournaledEditsCache.java:<init>(144)) - Enabling the journaled edits cache with a capacity of bytes: 15271198720
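For illustration, here is a minimal, self-contained sketch of the arithmetic (not the actual JournaledEditsCache source; the class, method names, and constants below are made up from the numbers in this description). It shows how an int capacity gets clamped to Integer.MAX_VALUE while a long capacity preserves the configured fraction:

```java
// Hypothetical demo class; not part of Hadoop.
public class CacheCapacityDemo {
  // Heap size from the description: Runtime.getRuntime().maxMemory() == 30,542,397,440 bytes.
  private static final long MAX_MEMORY = 30_542_397_440L;
  // Default value of dfs.journalnode.edit-cache-size.fraction.
  private static final float FRACTION = 0.5f;

  // Before the fix: an int capacity cannot exceed Integer.MAX_VALUE,
  // so the configured fraction is effectively ignored on large heaps.
  static int intCapacity() {
    return (int) Math.min((long) (MAX_MEMORY * FRACTION), Integer.MAX_VALUE);
  }

  // After the fix: keeping the capacity as a long preserves the intended size.
  static long longCapacity() {
    return (long) (MAX_MEMORY * FRACTION);
  }

  public static void main(String[] args) {
    System.out.println("int capacity : " + intCapacity());   // 2147483647
    System.out.println("long capacity: " + longCapacity());  // 15271198720
  }
}
```

The same reasoning applies to totalSize, which is compared against capacity and must therefore cover the same range.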
How was this patch tested?
Since Runtime.getRuntime().maxMemory() cannot be adjusted in unit tests, it is not easy to write a unit test for this change, but the existing unit tests pass with the code change.
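As a side note, one hypothetical way to make this path unit-testable (not part of this patch) would be to let the capacity calculation take the memory figure as a parameter instead of reading Runtime.getRuntime().maxMemory() directly. A sketch with made-up names:

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

// Hypothetical helper and test; the real JournaledEditsCache reads
// Runtime.getRuntime().maxMemory() directly, which is why such a test was not added here.
public class TestEditsCacheCapacity {

  // Hypothetical extraction of the capacity math so the memory value can be injected.
  static long computeCapacity(long maxMemory, float fraction) {
    return (long) (maxMemory * fraction);
  }

  @Test
  public void testCapacityIsNotLimitedToIntRange() {
    long capacity = computeCapacity(30_542_397_440L, 0.5f);
    assertEquals(15_271_198_720L, capacity);
    assertTrue(capacity > Integer.MAX_VALUE);
  }
}
```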