[PDB] Analysis never finishes #7920

Open
covers1624 opened this issue Mar 17, 2025 · 9 comments
@covers1624

covers1624 commented Mar 17, 2025

Describe the bug
I am attempting to dig through parts of the Factorio demo for research purposes.
Analyzing Factorio for Windows with a PDB never finishes. It stays stuck forever on PDB: Processing 1857365 data type components...

I have left Analysis running for over 24 hours without it completing, or any meaningful progress being reported.

This bug is reproducible with the free demo from their website (I assume the Steam demo also reproduces this bug, but have not tested.)

Please see the attached VisualVM snapshots.

To Reproduce
Steps to reproduce the behavior:

  1. Download the Factorio demo.
  2. Import factorio.exe
  3. Ensure the PDB Universal analyzer is enabled. (this appears to be the default)
  4. Run Analysis; it never finishes.

Expected behavior
I expect Analysis to finish eventually and provide some form of progress report to the user.

Screenshots
N/A

Attachments
VisualVM sampler snapshot of a fresh analysis: snapshot-1742197521982.nps.zip
VisualVM profiler snapshot, once it's 'stuck': snapshot-1742198213675.nps.zip

Environment:

  • OS: Arch Linux
  • Java Version: Temurin 21
  • Ghidra Version: 11.4 5a31ded
  • Ghidra Origin: AUR ghidra-git

Additional context
From a brief look at the VisualVM snapshots, it appears to get stuck in the pdbapplicator.CompositeTypeApplier.applyCpp tree, inside DefaultCompositeMember.removeUnnecessaryPadding, via some insert/delete calls. These appear to be repacking structs and computing alignments.

I'm not well versed enough in the internal structure of Ghidra to provide any insight into resolving this or any deeper analysis. However, looking at the profiler snapshot, the invocation count of these methods is quite interesting compared to that of the inner call of AlignedComponentPacker.addComponent.

Please let me know if there is any further information/testing required, or debugging steps I can perform.

@ryanmkurtz ryanmkurtz added the labels Feature: PDB and Status: Triage (Information is being gathered) on Mar 17, 2025
@ghizard
Contributor

ghizard commented Mar 17, 2025

For me, I see it getting hung up in a resolve stage at progress 12% (445153 of 3714730).
Note that this progress total is double the number of data type components (1857365) because there are two stages.
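
As a quick check of the arithmetic above, here is a minimal sketch (not Ghidra code; the class and method names are hypothetical). It assumes each component counts once per stage and that the displayed percentage is rounded to the nearest integer, which is an assumption that happens to match the figures quoted in this thread.

```java
// Toy arithmetic only; not taken from Ghidra. All names are hypothetical.
final class PdbProgressSketch {
    /** Total progress units if every component is counted once per stage. */
    static long totalUnits(long components, int stages) {
        return components * stages;
    }

    /** Percentage rounded to the nearest integer (rounding mode is an assumption). */
    static long percent(long done, long total) {
        return Math.round(100.0 * done / total);
    }

    public static void main(String[] args) {
        long total = totalUnits(1_857_365L, 2); // two stages, per the comment above
        System.out.println(total);
        System.out.println(percent(445_153L, total) + "%");
    }
}
```

A total that is not exactly twice the reported component count would then suggest a different binary or a different Ghidra version.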

Interestingly, it is not in the DefaultCompositeMember for me, but maybe it will hang at your progress location as well. Do you happen to know at what progress number your hang happens?

Edit: maybe the progress number doesn't matter, as we likely have different versions.

@covers1624
Author

covers1624 commented Mar 17, 2025

Apologies, I should have added this to my initial report. Yes, it also hangs there for a little while; you should see this reflected in my sampler snapshot. After a short time it does make it past this stage and then behaves as described in my issue report.

I will note though, that during this resolve stage, the UI is completely unresponsive, which might also be considered a bug, but not the focus of this report. I'm happy to open another issue report about this if you would like.

I'm not sure which progress number specifically you are referring to; where would I find this information? I see, it's on hover. I'll re-run and get you the specific numbers.

@covers1624
Author

The UI has become responsive again at 25% (932519/3748120). VisualVM shows the application is inside removeUnnecessaryPadding as described in my initial report.

I have left it running for several minutes after this and have not seen those numbers change.

@ghizard
Copy link
Contributor

ghizard commented Mar 17, 2025

I'm running within Eclipse, investigating some of the earlier hang-ups, which mainly appear to be long resolve() stages that I'm not thrilled about either. Ghidra's data type system tries to build complete data types and also tries to deal with conflicting types, which can take more time, but the time being spent on these seems unreasonable to me, so these need extra investigation as well.

Regarding your record number, I'm probably on about the same one, but with slightly different progress/total numbers: 25% (929951 of 3714730). Your total of 3748120 is now more than double 1857365, so you probably changed versions since your earlier run?

Regardless, I have a spot that seems to be in the insert/delete cycle.

@ghizard
Contributor

ghizard commented Mar 17, 2025

Ultimately, the issue is with the algorithm that is used to try to enable packing on structures.

I first investigated changing the second argument in the call (line 979) DefaultCompositeMember.applyDataTypeMembers(composite, true, false, size, sm, msg -> Msg.warn(this, msg), monitor)) within CppCompositeType.createMembersOnlyClassLayout from false to true. This argument was recently added, but I've not had time to compare the resulting composite types and possibly make more tweaks.

Making this change allowed the processing to get past the composite at progress 25% (929951 of 3714730), which happened to be quite large. But then it ran into a larger composite that was taking even more time. I paused the processing and saw that it was making progress at the lower levels, but the processing is very slow because of the degeneracy of the algorithm.

I'm not recommending this change, as I do not know if the resulting composites will have issues, plus, it doesn't really fix the degeneracy. I'm reassigning @ghidra1 to this ticket, as that is his bailiwick.

Underlying cause:

For the first composite that was getting hung up, it was trying to place the first structure member at offset 1048632, so the algorithm first tries to place padding up to that point, only to later try to remove the padding in ways that ensure the structure can pack. One part of the change that @ghidra1 would make would be to shortcut the packing process if more than 8 contiguous bytes of padding would be needed.
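
To make the degeneracy concrete, here is a toy cost model. It is not Ghidra's DefaultCompositeMember logic, and every name in it is hypothetical. It assumes the padding entries sit in an array-backed list and are deleted one at a time from the front, so each delete shifts all remaining entries; whether Ghidra's real data structure behaves this way is an assumption. The guard only illustrates the idea of skipping packing once the gap exceeds 8 contiguous bytes.

```java
// Toy cost model only; this is not Ghidra's implementation. Names are hypothetical.
final class PaddingCostSketch {
    /**
     * Counts element shifts when n padding entries are deleted one at a time
     * from the front of an array-backed list (each delete shifts the rest).
     */
    static long shiftsToRemovePadding(long n) {
        long shifts = 0;
        for (long remaining = n; remaining > 0; remaining--) {
            shifts += remaining - 1; // entries that move down by one slot
        }
        return shifts;
    }

    /** Hypothetical guard: only attempt packing when the padding gap is small. */
    static boolean shouldAttemptPacking(long gapBytes, long maxContiguousPadding) {
        return gapBytes <= maxContiguousPadding;
    }

    public static void main(String[] args) {
        long firstMemberOffset = 1_048_632L; // offset quoted in this thread
        System.out.println(shiftsToRemovePadding(firstMemberOffset));
        System.out.println(shouldAttemptPacking(firstMemberOffset, 8));
    }
}
```

Under this model the work grows quadratically with the offset of the first member, which is why a single composite with a very large leading gap can dominate the run time.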

The larger structure was trying to place the first of its 6 members at offset 4194672 within the composite. The offsets in both of these composites seemed to imply that they are actually classes whose base classes are responsible for the earlier members. So I looked and saw that the larger of these had a parent, and the grandparent was a stream reader of some sort, so it presumably had a large buffer at the start.

Work-around:

I was able to get the processing of your binary to complete by turning on experimental class processing. Note that this work is not complete and is subject to change, but you can try it out to make progress.

To do this in the current master repo (this will change), you need to change the Ghidra launch properties in support/launch.properties or, if you are using Eclipse, Debug Configurations... -> Ghidra -> Arguments, to include -Dghidra.pdb.developerMode=true. Then, when running analysis, change the PDB Universal option Composite Layout Choice to Class Hierarchy (Experimental).
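
For the support/launch.properties route, the added line might look like the fragment below. This is a sketch that assumes the file accepts an additional VMARGS= entry alongside its existing ones; only the -D property itself comes from this comment.

```properties
# support/launch.properties (sketch; leave the existing VMARGS lines as they are)
# Assumption: extra JVM arguments are given as additional VMARGS= lines.
VMARGS=-Dghidra.pdb.developerMode=true
```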

More:

I still need to take a look at some of the slowness in the resolve stage of processing, but it is likely just the nature of processing very large composites.

@covers1624
Author

Thanks for looking into this and the deep analysis.

I left things running overnight and during work, and it has gotten to 29% (1099931 of 3748120).

It appears that in my haste last night, I re-ran on the Steam release variant of Factorio instead of the demo. I had been switching between them last night whilst originally investigating this and preparing my initial report. My apologies for this again.

I have enabled developer mode as described, and this appears to work around the larger padding hang up, but not the initial resolve slowdown, which I presume is expected.

I wish I could be of more help here and potentially contribute back a solution for this issue. However, things here seem a bit too far out of my depth, and the project seems to be un-importable into IntelliJ without a lot of effort (given that the recommended environment is Eclipse, this is probably expected).

@ghizard
Contributor

ghizard commented Mar 18, 2025

Yes, the resolve slowdown is expected for larger structures, but I still need to check that those that were slow are actually large or complex.

Thank you for your help in triaging this issue. Your sincerity, willingness, and understanding are appreciated.

@Zicandar

Thank you ghizard for the suggestions on how to make this work!
I managed to decompile Factorio.exe (version 2.0.42) with the pre-compiled Ghidra release ghidra_11.3.1_PUBLIC_20250219 and some non-default settings.

I did something similar to what was suggested here, but set the Composite Layout Choice to "Complex with Simple Fallback". This is because the option mentioned above didn't appear for me.
I did get a lot of warnings, but it seems to have (mostly) worked for me at least. Some stuff it skipped due to names being >2000 characters... I am running into issues when trying to find the uses of certain members, but I can at least do most things!

@ghizard
Contributor

ghizard commented Apr 2, 2025

@Zicandar Good choice. The "Complex with Simple Fallback" in 11.3.1 is the same mode as "Class Hierarchy (Experimental)" in current source control. I'm not sure what the final incantation will look like as the work is ongoing.
