Major refactoring by gdesmar · Pull Request #3 · DissectMalware/batch_deobfuscator

gdesmar · 2022-07-22T12:42:33Z

Hi,
I think it could be of interest to merge back the improvements that I've made to the project. As you will see, I modified quite a lot of things. It would be hard to point to just a few, as I modified a large percentage of the project. I also added some experimental features that you may not want, so I don't know how we'll be able to merge if you don't want those.
I'll go over a few of the things I've done that I think are worthwhile:

Added multiple tests. For fun, you could download my two test files and run them against the original code to see what my changes improved. I generated the tests by running the commands on a VM, but if you disagree with the expected result of a test, I would be very interested to know.
The get_commands function now tries to beautify IF and FOR statements by splitting them into multiple lines for easier interpretation. One advantage is that a statement such as IF "A"=="A" (set A=1) else (set A=0) now yields a valid value for the variable A. If I recall correctly, in the original code the variable A would be equal to 1) else (set A=0). Now, it will be equal to 0. The beautifying process splits this statement into four lines, and since they are parsed in sequence, A gets the value 1, then 0, before the parser moves on to the next lines of the script.
I modified get_value to handle special characters in variable names, to better handle slicing, and to handle replacement (with a possible asterisk wildcard).
I created a function to handle the set command, with a state machine inside it. I believe it is greatly improved. There are quite a lot of use cases where it made a difference, and running the tests on the previous version and on this one is probably the best way to see it. I interpret the options at the moment, but decided against doing an eval() on the content of the set value when /a is used. I just surround it with parentheses.
I skip any line that starts with REM, as we don't want to interpret such lines even if they contain code. I had some cases where they contained invalid code and batch_deobfuscator did not handle it well.
I changed interpret_command to be recursive on CALL, instead of allowing it as a prefix in front of a command. That way, there can be any number of them.
I parse the curl command options; I'll explain why later.
Lots of modifications to the normalized_command function, with recursion when getting a variable's value.
Lastly, the one you may not be interested in: a traits dictionary.
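
To make the slicing and replacement forms mentioned in the list above concrete, here is a minimal Python sketch of how cmd.exe evaluates %VAR:~start,length% and %VAR:search=replacement% (including the leading asterisk wildcard). The function names slice_value and replace_value are hypothetical and are not taken from the library; the actual get_value implementation may handle more edge cases.

```python
import re


def slice_value(value: str, start: int, length=None) -> str:
    """Approximate %VAR:~start,length% (hypothetical helper, not the library's)."""
    size = len(value)
    # A negative start counts from the end of the string.
    begin = max(size + start, 0) if start < 0 else min(start, size)
    if length is None:
        end = size
    elif length < 0:
        # A negative length drops that many characters from the end.
        end = max(size + length, begin)
    else:
        end = min(begin + length, size)
    return value[begin:end]


def replace_value(value: str, search: str, replacement: str) -> str:
    """Approximate %VAR:search=replacement% (hypothetical helper)."""
    if not search:
        return value
    if search.startswith("*") and len(search) > 1:
        needle = search[1:]
        # Assumes lowercasing keeps the string length (true for ASCII input).
        index = value.lower().find(needle.lower())
        if index == -1:
            return value
        # Everything up to and including the first match is replaced.
        return replacement + value[index + len(needle):]
    # Plain replacement is case-insensitive and applies to every occurrence.
    return re.sub(re.escape(search), lambda _m: replacement, value,
                  flags=re.IGNORECASE)


if __name__ == "__main__":
    print(slice_value("abcdefgh", 0, 3))
    print(slice_value("abcdefgh", -3))
    print(replace_value("key=value", "*=", ""))
```

The replacement is passed through a lambda so that backslashes in Windows paths are not treated as regex escape sequences.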

The traits dictionary is a structure that I want to use to store curious or interesting features of the script being analyzed. Currently, it doesn't hold much, but it could be improved over time. You can see that one of my traits is populated in the interpret_curl function, where I store the location of the exact line that did the download, the source, and the destination. The second place where I populate two other traits is at the end of normalized_command. It fills in the start_with_var and var_used traits, which are a boolean and an integer, respectively. The goal is to flag lines that start with a variable, a possible sign of obfuscation, and to record the number of variables used on a single line. If that number is very high, it is another possible sign of obfuscation. You can see a clear example of those two in test_unittest.py -> test_single_quote_var_name_rewrite_2.
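
As a rough illustration of the start_with_var and var_used traits described above, the sketch below computes both for a single line. The function name line_traits and the regular expression are assumptions made for this example; the library computes these traits inside normalized_command and its matching rules may differ (for instance, this simplified pattern ignores variable names containing spaces).

```python
import re

# Simplified pattern for %name% and !name! references (an assumption:
# real batch variable names may contain spaces, which this ignores).
VAR_REF = re.compile(r"%[^%\s]+%|![^!\s]+!")


def line_traits(line: str) -> dict:
    """Return the two per-line traits for one command line (hypothetical helper)."""
    stripped = line.lstrip()
    return {
        # True when the very first token of the line is a variable reference.
        "start_with_var": VAR_REF.match(stripped) is not None,
        # Number of variable references found on the line.
        "var_used": len(VAR_REF.findall(stripped)),
    }


if __name__ == "__main__":
    print(line_traits("%a%%b% hello %c%"))
    print(line_traits("echo %PATH%"))
```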

I did not modify the main function, and barely modified the two interpret_logical_line* functions; I believe there was a typo and a missing clear() call. I personally don't use those, and made my own. I could try to merge my interpret_logical_line if there is interest. Both of the ones currently in the file end up printing to the console, as the _str variant calls the non-_str variant for child executions. In my interpret_logical_line, at a higher level, I also keep track of a few things that allow me to generate two or three other traits.

Take your time to look into it, and if you have any questions or want to discuss it, I am more than happy to chat with you.

DissectMalware · 2022-07-23T21:51:13Z

Thank you very much for the PR. However, it may take a while for me to review the PR and merge. Please bear with me.

@wmetcalf can you also take a look?

gdesmar · 2022-08-12T17:33:43Z

Hi, I refactored my code to merge all my traits into this library.

Detecting if the script is a one-liner, even if there is one or more empty lines in the file.
Detecting if the one-liner expands into too many lines, which flags it as a complex one-liner. The number of lines is configurable.
Detection of LOLBAS usage.
Detection of command grouping, when command splitting is different before and after normalization.
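
The first two detections in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name one_liner_traits, the trait keys, and the default threshold of 4 are all hypothetical, and the deobfuscated lines are passed in as a plain list rather than produced by the library.

```python
def one_liner_traits(script: str, expanded_lines: list, max_lines: int = 4) -> dict:
    """Flag one-liners and complex one-liners (hypothetical helper).

    script is the raw batch file content; expanded_lines is the list of
    commands obtained after splitting and normalization; max_lines is the
    configurable threshold (the default here is an arbitrary assumption).
    """
    # Empty or whitespace-only lines do not count toward the line total.
    non_empty = [line for line in script.splitlines() if line.strip()]
    is_one_liner = len(non_empty) == 1
    return {
        "one_liner": is_one_liner,
        # A one-liner that expands past the threshold is considered complex.
        "complex_one_liner": is_one_liner and len(expanded_lines) > max_lines,
    }


if __name__ == "__main__":
    raw = "\n\ncmd /c a&b&c&d&e\n\n"
    print(one_liner_traits(raw, ["a", "b", "c", "d", "e"]))
```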

I also added my replacement functions for the different interpret_logical_line functions, a combination of the analyze and the analyze_logical_line functions. I do recursive calls to analyze_logical_line to handle command grouping. A big difference from the interpret_logical_line functions that are already present is that I generate the extracted children and the deobfuscated script as distinct files. On top of the previous batch children, I also try to search for powershell children. The logic regarding powershell children is not perfect yet, and I have examples where I want to improve it. It may end up in the interpret_command function with the other command interpreters.

Unless you want to use my analyze function somewhere, this last commit should have no impact on the code that was already there.

Thanks for reviewing.
I know that the script was first and foremost doing variable management for replacement and resolution, and that my additions may have changed the focus of the project. There may be a way to create another library to separate out the analytic part of my additions, if you'd prefer to concentrate on variables.

wmetcalf · 2022-10-20T13:05:42Z

I finally had some time to go through all of this and play with it a bit. Looks good to me! I say ship it!

DissectMalware · 2022-10-24T18:56:48Z

Thank you @gdesmar for your PR.
Thank you @wmetcalf for the review.

The PR is merged.

gdesmar and others added 14 commits May 5, 2022 15:28

Improve SET, IF and idempotence.

ec937b2

Variable substring and replacement handling

e0b491e

Variable normalization and arguments

7a06897

Remove duplicate character

6392c07

Fix child command clearing in utility

6655d77

Fix long lines

43e866b

Start extracting interesting traits

3f1039d

Improve start_with_var trait

4e8d515

Ignore lines starting with rem

abe8fa5

Add improvement tests

81e8619

Expand FOR and interpret curl.exe

22f9039

Fix download trait's cmd

4885b38

Merge remote-tracking branch 'upstream/master'

b8f4566

Adding test for double-double-quote-elimination

b4b10a9

Analyze function, new traits & powershell

1bab655

gdesmar added 3 commits August 16, 2022 13:27

powershell parsing improvement

9392c3f

Fix multiple usage of f

d48ea7c

Fix infinite recursion, curl and var substring

e5598ec

DissectMalware merged commit f080599 into DissectMalware:master Oct 24, 2022

DissectMalware merged 18 commits into DissectMalware:master from gdesmar:master
