calculated text checksum is different from the header one #58

Open
ktKongTong opened this issue Dec 14, 2024 · 6 comments

Comments

@ktKongTong

Thank you for your great work. I am building another AppInfo.vdf editor (a web version). The documentation has helped me a lot.

I parse an AppInfo object into a custom object.
When I serialize it into binary format, the calculated hash matches the binary checksum in the header perfectly.

But when I serialize it into text format and hash it, most of the results differ from the hash field in the binary header.

To investigate, I debugged appinfo.py and found that it also has the same issue.

It seems that replacing \\ with \\\\ doesn't solve all of the problems.

if __name__ == "__main__":
    appinfo = Appinfo('./appinfo.vdf')
    checksum = appinfo.update_app(10)
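For readers following along, here is a minimal sketch of the kind of text serialization and backslash escaping being discussed. The layout (tab indentation, two tabs between key and value, a newline after every line) and the helper names `escape` and `to_text_vdf` are assumptions made for illustration; they are not confirmed to match appinfo.py or whatever Valve hashes.

```python
from hashlib import sha1


def escape(s: str) -> bytes:
    # Double every backslash, which is the only sanitizing step discussed so far.
    return s.replace("\\", "\\\\").encode("utf-8")


def to_text_vdf(data: dict, depth: int = 0) -> bytes:
    # Hypothetical layout: one tab per nesting level, key and value in quotes.
    tabs = b"\t" * depth
    out = b""
    for key, value in data.items():
        if isinstance(value, dict):
            out += tabs + b'"' + escape(key) + b'"\n' + tabs + b"{\n"
            out += to_text_vdf(value, depth + 1)
            out += tabs + b"}\n"
        else:
            out += tabs + b'"' + escape(key) + b'"\t\t"' + escape(str(value)) + b'"\n'
    return out


# Made-up sample data; the value contains a single backslash.
sample = {"appinfo": {"appid": 10, "path": "C:\\Games"}}
text = to_text_vdf(sample)
digest = sha1(text).hexdigest()
print(text.decode("utf-8"))
print(digest)
```

Any byte-level difference in this layout (whitespace, escaping, trailing characters) changes the SHA-1, which is why a serializer that looks sane can still produce a different checksum.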
@ktKongTong changed the title from "calculated text checksum is different with the header one" to "calculated text checksum is different from the header one" on Dec 14, 2024
@romatthe

romatthe commented Feb 21, 2025

Indeed, I just stumbled upon the exact same issue. I was also writing some code to manipulate the format, and didn't succeed in getting the hash to match the original hash for all apps.

I then started playing around with the code here, and I noticed that it also doesn't manage to get the original hash.

So, it turns out that my code (which is written in Rust, but that's beside the point) does pretty much the exact same thing as this tool does. What's interesting, though, is that for some of the apps, calculating the hash DOES work. Using the appinfo.vdf on my local machine, I get the correct hash for about 600 apps and an incorrect one for roughly 5000, all using what I believe to be more or less the exact same technique as seen in this repository.

For an example of an app I'm pretty sure you will have in your local appinfo.vdf as well, try appid 1007 (Steamworks SDK Redist). You'll find that this one hashes just fine.

So far, I have not noticed anything obvious that distinguishes the apps that hash correctly from the ones that don't. I assume we might need to sanitize the textual representation of the VDF a bit more, just as was necessary for the backslash characters.

Did you make any additional discoveries surrounding this @tralph3?

@tralph3
Owner

tralph3 commented Feb 21, 2025

Are you guys implementing parsers for the latest version of appinfo? Now all strings are stored at the end of the file, and in the metadata itself there are only indices that point to each string's position in the "list of strings" at the end of the file.

That would explain why binary data works, but text vdf doesn't.

Although it would be pretty clear that you're doing something wrong if you take a look at the decoded strings and just see garbage. So it's likely something else.
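As a rough illustration of the index-based layout described above, the sketch below resolves key indices against a string table. The table layout used here (a little-endian 32-bit count followed by NUL-terminated UTF-8 strings) and the name `read_string_table` are assumptions for the example, not a statement of the exact on-disk format.

```python
import struct


def read_string_table(blob: bytes) -> list[str]:
    # Assumed layout: uint32 count, then that many NUL-terminated strings.
    (count,) = struct.unpack_from("<I", blob, 0)
    strings = []
    pos = 4
    for _ in range(count):
        end = blob.index(b"\x00", pos)
        strings.append(blob[pos:end].decode("utf-8", errors="replace"))
        pos = end + 1
    return strings


# Synthetic table with three keys; real tables are far larger.
table = struct.pack("<I", 3) + b"appinfo\x00appid\x00common\x00"
keys = read_string_table(table)

# A key stored in an app's binary section as index 1 resolves to "appid".
print(keys[1])
```

If the indices were resolved incorrectly, the decoded keys would be visibly wrong, which matches the point that this kind of mistake would be easy to spot.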

@romatthe

romatthe commented Feb 21, 2025

Yes, I'm parsing the latest AppInfo format (v29). Parsing the binary format is very easy thanks to the people who've documented it, including yourself.

Rather, I'm talking about generating the correct hash from the textual representation of the binary vdf. I can confirm that what @ktKongTong is reporting is correct. The hash you generate in this application is often incorrect as well.

Here's a quick example (I'm not a Python programmer):

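import os
from hashlib import sha1

# Appinfo is the class defined in this repository's appinfo.py.

def main():
    path = os.path.join(
        "/home/romatthe/.local/share/Steam", "appcache", "appinfo.vdf"
    )
    appinfo = Appinfo(path, False, None)

    # Team Fortress 2 (appid 440)
    app = appinfo.parsedAppInfo[440]
    # Re-serialize the parsed sections to text VDF without any edits.
    formatted = appinfo.dict_to_text_vdf(app["sections"])

    # Checksum stored in appinfo.vdf, as a byte list and as hex.
    print(list(app['checksum_text']))
    print(app['checksum_text'].hex())
    # SHA-1 of the regenerated text VDF, in the same two forms.
    print(list(sha1(formatted).digest()))
    print(sha1(formatted).hexdigest())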

Result:

[225, 195, 108, 211, 159, 139, 245, 12, 125, 231, 89, 87, 93, 182, 131, 25, 94, 54, 230, 127]
e1c36cd39f8bf50c7de759575db683195e36e67f
[239, 26, 120, 182, 112, 140, 157, 65, 144, 194, 37, 189, 239, 239, 42, 154, 127, 149, 7, 155]
ef1a78b6708c9d4190c225bdefef2a9a7f95079b

To clarify: I'm just parsing the appinfo.vdf, picking a single app (Team Fortress 2 in this case, which should be easy to verify yourself because pretty much everyone owns it), generating the textual VDF without applying any edits, taking the hash, and then comparing that to the checksum originally found in appinfo.vdf. As you can see, the checksums do not match.

Like I pointed out above, in my code (completely separate codebase) I generate the exact same checksums as you do, but again, these appear to be incorrect.

As I said, for about 10% of all the apps in my local appinfo.vdf this does actually work, but for the other 90% I get conflicting checksums. All of the output from my vdf-to-text routines looks very sane: every result looks like a perfectly acceptable textual VDF file with no obvious issues.
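The comparison described above can be sketched as a small tally over all apps. The input shape (a mapping from appid to a pair of stored checksum and regenerated text VDF bytes) and the name `split_by_checksum` are hypothetical, and the demo data is synthetic rather than taken from a real appinfo.vdf.

```python
from hashlib import sha1


def split_by_checksum(apps: dict) -> tuple:
    # apps maps appid -> (stored 20-byte SHA-1, regenerated text VDF bytes).
    good, bad = [], []
    for appid, (stored, text) in apps.items():
        if sha1(text).digest() == stored:
            good.append(appid)
        else:
            bad.append(appid)
    return good, bad


# Synthetic demo: the first entry matches, the second deliberately does not.
text_a = b'"appinfo"\n{\n\t"appid"\t\t"1"\n}\n'
text_b = b'"appinfo"\n{\n\t"appid"\t\t"2"\n}\n'
apps = {
    1: (sha1(text_a).digest(), text_a),
    2: (sha1(text_b + b"\n").digest(), text_b),
}
good, bad = split_by_checksum(apps)
print(len(good), "matching,", len(bad), "mismatching")
```

Dumping the text for each group separately makes it easier to diff the matching apps against the mismatching ones.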

@romatthe

For those interested, I quickly dumped the textual VDF of all the apps I was able to generate a matching checksum for (good.txt) and of some example apps where I failed to get a matching checksum (bad.txt). Unfortunately I couldn't include all the bad ones, as that file was 29 MB in size, so I had to cut it down a little.

good.txt
bad.txt

@romatthe

So after trying to wrap my head around this for a few days, I can't quite get a satisfying answer. Here's more or less what I think the conclusion is:

  1. My (wip) tool, this tool, and SteamEdit all seem to use pretty much precisely the same technique for getting the checksum of the textual VDF. I haven't verified this 100%, but all my tests across the three tools so far have shown they generate the same hashes.
  2. For most apps, these hashes DO NOT match the ones that Valve ships in a clean appinfo.vdf. In other words, it appears Valve likely does something else with the textual VDF representation. I still presume they do some extra string sanitation, but I wouldn't know what it is.
  3. Curiously though: Steam does not seem to reject an altered appinfo.vdf with these (incorrect) hashes. I also can't quite understand why this doesn't happen.

For the last point, I simply tested this in my own tool by parsing an existing 'clean' appinfo.vdf, and then sent it through the packing routines without actually altering any of the content. That results in precisely the same appinfo.vdf as before EXCEPT that it now has the new hashes.

And from what I've seen, Steam does not reject these. It's possible that the Steam client never uses that hash to check the integrity of the file. Or it's possible it only uses it at certain points, which means it could still reject the file somewhere down the line... I don't quite know.

Either way, I'm fairly confident in concluding that pretty much no one actually knows how to correctly generate these checksums, except some folks over at Valve. I've found no code in the public space that does it correctly ("correctly" here meaning matching Valve's method).

@tralph3
Owner

tralph3 commented Feb 23, 2025

Well, if Steam doesn't reject the hashes, and the changes are reflected, then it's still working, which is the important thing.

I'd look to actual text VDF files, shipped by Valve, and try to spot any discrepancies between them and the ones we generate in our programs.

Maybe there's an extra newline character at the end, or the beginning, or some escape sequence. Who knows.
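One way to test guesses like these is to hash several byte-level variants of the generated text and see whether any of them reproduces the stored checksum. The sketch below does that; the variant list is only a set of guesses (the CRLF entry is an extra guess not mentioned in this thread), the name `find_matching_variants` is made up, and none of these transforms is known to be what Valve actually does.

```python
from hashlib import sha1

# Candidate transformations of the generated text VDF (all guesses).
VARIANTS = {
    "as generated": lambda t: t,
    "extra trailing newline": lambda t: t + b"\n",
    "no trailing newline": lambda t: t.rstrip(b"\n"),
    "leading newline": lambda t: b"\n" + t,
    "CRLF line endings": lambda t: t.replace(b"\n", b"\r\n"),
}


def find_matching_variants(text: bytes, expected: bytes) -> list:
    # Return the names of all variants whose SHA-1 equals the stored checksum.
    return [
        name
        for name, transform in VARIANTS.items()
        if sha1(transform(text)).digest() == expected
    ]


# Synthetic demo: pretend the stored checksum was taken over text plus "\n".
text = b'"appinfo"\n{\n}\n'
expected = sha1(text + b"\n").digest()
matches = find_matching_variants(text, expected)
print(matches)
```

Running this over the apps whose checksums do not match, with more variants added as new ideas come up, would quickly confirm or rule out each hypothesis.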


3 participants