Major refactoring #3
|
Thank you very much for the PR. However, it may take a while for me to review the PR and merge. Please bear with me. @wmetcalf can you also take a look? |
|
Hi, I refactored my code to merge all my traits into this library.
I also added my replacement for the different interpret_logical_line functions: a combination of the analyze and analyze_logical_line functions. I make recursive calls to analyze_logical_line to handle command grouping. A big difference from the interpret_logical_line functions that are already present is that I generate the extracted children and the deobfuscated script as distinct files. On top of the previous batch children, I also try to search for powershell children. The logic for powershell children is not perfect yet, and I have examples where I want to improve it. It may end up in the interpret_command function with the other command interpreters. Unless you want to use my analyze function somewhere, this last commit should have no impact on the code that was already there. Thanks for reviewing. |
|
I finally had some time to go through all of this and play with it a bit. Looks good to me! I say ship it! |
Hi,
I think it could be of interest to merge back the improvements I've made to the project. As you will see, I modified quite a lot. It would be hard to point to just a few things, as I modified a large percentage of the project. I also added some experimental features that you may not want, so I don't know how we'll be able to merge if you don't want those.
I'll go over a few things I've done that I think are worth mentioning:
A statement such as IF "A"=="A" (set A=1) else (set A=0) now gets a valid value for the variable A. If I recall, in the original code, the variable A would be equal to 1) else (set A=0). Now, it will be equal to 0. The beautifying process will split this statement into four lines, and since they are parsed in sequence, A will get the value 1, then 0, then the parser moves on to the next lines of the script.
The Traits dictionary is a structure that I want to use to store curious or interesting features of the script being analyzed. Currently, it doesn't hold much, but it could be improved over time. You can see that one of my traits is populated in the interpret_curl function, where I store the location of the exact line that did the download, the source, and the destination. The second place where I populate two other traits is at the end of normalized_command. It fills in the start_with_var and var_used traits, which are a boolean and an integer, respectively. The goal is to flag lines that start with a variable, a possible sign of obfuscation, and to count the number of variables used on a single line. If the number is very high, it is another possible sign of obfuscation. You can see a clear example of those two in test_unittest.py -> test_single_quote_var_name_rewrite_2.
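To make the start_with_var and var_used traits more concrete, here is a minimal sketch of how such values could be computed for one line. It is an illustration only, not the code in normalized_command: the helper name line_traits and the regular expression VAR_PATTERN are hypothetical, and the pattern only covers simple %NAME% and !NAME! references.

```python
import re

# Hypothetical pattern (an assumption, not taken from the library):
# matches simple %NAME% and !NAME! batch variable references.
VAR_PATTERN = re.compile(r"%[^%\s]+%|![^!\s]+!")


def line_traits(line: str) -> dict:
    """Return the two per-line traits described above for a single line."""
    stripped = line.lstrip()
    return {
        # True when the line begins with a variable reference.
        "start_with_var": VAR_PATTERN.match(stripped) is not None,
        # Number of variable references found on the line.
        "var_used": len(VAR_PATTERN.findall(stripped)),
    }


if __name__ == "__main__":
    print(line_traits("%a%%b% /c calc.exe"))
    print(line_traits("echo %PATH%"))
```

A line that starts with a variable, or that uses an unusually high number of variables, would then be a candidate to flag as possibly obfuscated; any threshold for "very high" is left to the caller.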
I did not modify the main function, and barely modified the two interpret_logical_line* functions; I believe there was a typo and a missing clear() call. I personally don't use those, and made my own. I could try to merge my interpret_logical_line if there is interest. Both of the ones currently in the file end up printing to the console, as the _str variant calls the non-_str variant for child executions. In my interpret_logical_line, at a higher level, I also keep track of a few things, which allows me to generate two or three other traits.
Take your time to look into it, and if you have any questions or want to discuss it, I am more than happy to chat with you.