-
-
Notifications
You must be signed in to change notification settings - Fork 809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
get rid of String.intern #332
Comments
get rid of String.intern
@vladimirdolzhenko what about other property sizes? Like 10, 100, 1000, 100_000? |
get rid of String.intern
GC pauses in ms
|
What about |
PerfInternCache in us/op:
|
There is |
@cowtowncoder I know about that flag - the problem that it is enabled by default and that kind of optimisation harms app more neither having it switched off. What's the point to have such kind of degradation flag rather to fix it and make apps (bearing in mind the spread around of json) using jackon feel a bit better ? |
disabled flag INTERN_FIELD_NAMES PerfInternCache2.java (the numbers from another box - no any reasons to compare absolute values with prev one)
I would really consider to disable that flag by default. |
@vladimirdolzhenko , would you please add |
@vlsi that's all for Enabled/Disable flag only
|
That does look strange. It sounds like "string intern makes memory allocation higher", that is counter-intuitive, at least at the first sight |
This benchmark was very short in terms of time, there are results by @isopov: https://gist.github.com/isopov/b215f473deb4ef40118c00f054dc204c
|
@vladimirdolzhenko I don't agree with our assessment of its harmfulness, based on invalid (as general statement) assumptions on usage. Performance to test is for databinding; and databind does in fact benefit from equality checks. You would have known this if you had asked; or perhaps dug around. I think it is very arrogant of you to make a demand of removing a feature that has served users well for, what, 7 years now. If you do not want the feature, turn it off. I keep mine on. |
@cowtowncoder my test case is based a typical pattern of json we use over several teams. As well I pretty sure that lots of stuff have been changed over past 7 years, json patterns and its use-cases - and it is worth to review it again. If you pay attention a bit more - my original point is to stop using |
maybe in that case turn it on for databinding while keep off for other use-cases ? |
@cowtowncoder so we need test for object mapper read/write use cases? |
@IRus At this point you should be disabling this for your use case if that makes most sense for you, based on tests. @vladimirdolzhenko if you want to discuss this further, issue tracker is not the best way: As to defaulting for some use cases, that could be one option if consensus opinion is that interning in addition to canonicalization (handled by symbol table, not |
Thank you for raising this on mailing list. I'll add comments there soon too, but I thought I'll add one more data point. https://github.com/FasterXML/jackson-benchmarks and I added a no-interning variant. With my quick test there, disabling interning for test case (which is original from |
ok, let me have a look into that |
as JsonFactory uses that has hard-coded enabled I added to private static final boolean INTERN = Boolean.getBoolean("internCache");
////
result = INTERN ? input.intern() : input;
put(result, result);
return result;
}
//// modified JsonInternReadVanilla as that benchmark uses
|
@vladimirdolzhenko that |
@cowtowncoder As for me - test reaches Appreciate if you have a look into that as well. |
Ok, that is peculiar. Tests look legit to me. And I assume this is with 2.8(.5) (or master) |
Actually I think there are calls to Calls are via |
@cowtowncoder sorry I don't get the point Do you consider benchmark results ? |
@vladimirdolzhenko Results I have gotten on my tests are sufficient for me to consider benefits of intern()ing as an option. At this point I see no compelling reason for removing support for intern()ing; I am open to possibility of changing defaults for 3.0. |
We are also facing issue in our Production Application. this issue gets aggravated as we reach more CPU Usage. Stack Trace Count Duration |
@amanhigh This is very likely due to sub-optimal usage of But if you have synthetic names, on the other hand, you do want to disable both (disabling canonicalization is actually sufficient). |
@cowtowncoder We used single instance of Object Mapper with readValue operating on byte stream. We did get a significant improment in 99th Percentile after disabling intern caching. We haved Java Models for all classes and don't deserialize as map hence property names are bounded. I am not sure if same applies to Keys of hashMap as well as some properties are HashMaps with varying keys. |
@amanhigh I do not doubt that you saw a significant improvement, but everything I know about interning and canonicalization suggests that the problem is because Symbol table is not being reused. Now... there are a few reasons why underlying, per-JsonFactory symbol table might not be reused. But there is another possibility: if number of distinct symbols (property names) is really large (in order of tens of thousands), there is size limit (something like 40,000 or so), symbol table is not reused. I suppose that in latter case you could get into trouble because My main concern here is still the fact that I believe there are many use cases where intern()ing is beneficial, and that change to better support other use cases would have detrimental effect on these. I am also open to being proven wrong on benefits: for example, it is possible that newest JVMs may have different kinds of trade-offs. I hope to find a way to validate my knowledge here. |
@amanhigh One more thing I just realized: your mention of Would there be a way for you to locally modify |
@cowtowncoder I tried to persuade you that you don't understand String.intern properly and misuse it - with examples, benchmarks and so on - and I see that you would not take any of them into account |
Could you tell those use cases please? Maybe publish your benchmarks as well if you get some time. |
@relgames testing I did was by modifying existing tests like: https://github.com/FasterXML/jackson-benchmarks by forcible disabling interning and comparing results to ones I get when interning is enabled. While I would not mind sharing results (or someone else running those), I do not see enough value for me to spend more time on general question; and specifically not in being convinced to remove ability to intern() keys. Users are free to change the setting as they see fit; same goes for frameworks using Jackson. |
Hi there, FYI https://shipilev.net/jvm-anatomy-park/10-string-intern/ |
"In almost every project we were taking care of, removing String.intern() from the hotpaths, or optionally replacing it with a handrolled deduplicator, was the very profitable performance optimization. Do not use String.intern() without thinking very hard about it, okay?" Summary from JDK developer. |
@relgames Yes? I have thought about it more than people commenting here. Article itself is fine and properly discusses challenges. It may indeed be the case that for some usage turning off intern()ing makes sense -- if you end up having hundreds of thousands of symbols. I do not think this applies to majority of JSON usage. Given that interning can easily be turned off by those who want to do that:
and since there is clear performance improvement by test I have ( |
For what it's worth, there's follow up issue: #378 -- and I have some good news to share there. |
And for TL;DNR observers, Jackson 3.0 DOES NOT use |
InternCache is used in jackson for json property names only and solves the problem of extra memory footprint.
In the same time it uses
String.intern
- that is not an issue rather than misusage of it. The purpose of using string.intern is jvm specific: usually it maintains string literal pool (and covers internal jvm cases).There are several known drawbacks of using intern:
There is no use cases of check the equality of strings using
==
and therefore no any reasons of usingString.intern
- the biggest profit is to have string deduplication (already achieved).patch
Test case
The general idea behind syntenic json is used in test (a flat json structure with 10000 properties with prefix someQName) is provide lots of different property names to trigger usage of
string.intern
withinInternCache
- that is close to real apps use cases where amount of unique property names is on a scale from hundreds to thousands.JMH test
Let's measure performance of a single json parse with interting and w/o it: PerfInternCache.java
GC
Another test is to measure how intern is affecting implicitly via GC: handle 10k the same jsons as we use in previous test (the use case is very very close to real one in real apps like web service / microsevices):
InternCache GC timings java test
Run it with
-verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log
and after that get the total GC pause time
$ grep "GC" gc.log | grep "Times: " | awk '{S+=$8}END{print S}'
Conclusion
Using intern harms application performance as explicitly via more expencive
InternCache.intern
and implicitly via GC. In the same time we keep memory footprint on the same low possible level.The text was updated successfully, but these errors were encountered: