Commit 2497bc4

0.5.0 - progress to source decompilation, bit of refactoring, and first unit tests
1 parent 1d29c19 commit 2497bc4

51 files changed, +3025 -829 lines changed

.classpath

+3
@@ -1,11 +1,14 @@
 <?xml version="1.0" encoding="UTF-8"?>
 <classpath>
     <classpathentry kind="src" path="src"/>
+    <classpathentry kind="src" path="test"/>
     <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER">
         <attributes>
             <attribute name="module" value="true"/>
         </attributes>
     </classpathentry>
     <classpathentry kind="lib" path="C:/Users/TeamworkGuy2/Documents/Java/Libraries/java/eclipse/jar/ecj-4.4M2.jar"/>
+    <classpathentry kind="con" path="org.eclipse.jdt.junit.JUNIT_CONTAINER/5"/>
+    <classpathentry kind="lib" path="C:/Users/TeamworkGuy2/Documents/Java/Libraries/jprimitive-collections/bin/jprimitive_collections.jar" sourcepath="/JPrimitiveCollections"/>
     <classpathentry kind="output" path="bin"/>
 </classpath>

CHANGELOG.md

+24-1
@@ -4,7 +4,30 @@ This project does its best to adhere to [Semantic Versioning](http://semver.org/
 
 
 --------
-### [0.4.0](N/A) - 2020-06-29
+### [0.5.0](N/A) - 2020-12-05
+__Decompilation to source code in progress and first round-trip compile/decompile unit tests__
+#### Added
+* A new `twg2.jbcm.ir` package with helper classes for tracking state and data related to decompilation
+* `CodeOffsetGetter` interface implemented by `ChangeCpIndex`
+* `CodeFlow` for helping analyze potential code flow paths through a method
+* Unit tests that compile source code and check decompilation back to an intermediate view
+
+#### Changed
+* `CodeToSource` is in progress: simple code can be converted; loops, switch statements, and interface/dynamic method calls are still TODO
+* Split `IoUtility` into new `CodeUtility` class and moved static no-op `CpIndexChanger` and `CodeOffsetChanger` into their respective interfaces
+* Renamed `ByteCodeConsumer` -> `BytecodeConsumer`
+* Moved `MethodStack` to new `twg2.jbcm.ir` package
+* Lots of new functionality in `Opcodes` to support decompilation to source
+* Extensive `CompileSource` changes/improvements to support simple use cases like compiling in memory (although a physical temp file still gets written under the hood)
+* Moved `LookupswitchOffsetModifier` and `TableswitchOffsetModifier` out of `IoUtility` and into their own classes
+
+#### Removed
+* `CpIndexChanger.shiftIndex()` interface method since it was unused and `CodeOffsetChanger` is equivalent
+* Unused classes: `Offset`, `OffsetOpcode`, `Opcode`, `OpcodeObject`
+
+
+--------
+### [0.4.0](https://github.com/TeamworkGuy2/ClassLoading/commit/1d29c1923096438571a751511cd1f2085d708bb9) - 2020-06-29
 #### Added
 * `ClassFileToSource` and `CodeToSource` work-in-progress to convert class file back into Java source code
 * `CONSTANT_CP_Info.toShortString()` added interface method and to implementations

README.md

+10-8
@@ -1,22 +1,24 @@
 ClassLoading
 ==========
-version: 0.3.0
 
-Java class file parsing and manipulation library.
+Java class file parsing, manipulation, and decompilation library.
 This library is mostly experimental for my own personal learning.
-It can load and save class files and lookup class file dependencies, but contains very little code for making changes to class files or validating those changes.
-See the `twg2.jbcm.main.UsageCliMain` class for a simple command line interface you can use to load and print info about class files.
-Reference: [Java Virtual Machine Spec (Java 9)](http://docs.oracle.com/javase/specs/jvms/se9/html/index.html)
+It can load and save class files and look up class file dependencies, but contains few helpers for making changes to class files or validating those changes.
+The class file format is fully implemented in [twg2.jbcm.classFormat](./ClassLoading/tree/master/src/twg2/jbcm/classFormat) and sub packages, including all [constant pool entry types](https://docs.oracle.com/javase/specs/jvms/se9/html/jvms-4.html#jvms-4.4) and [class file attributes](https://docs.oracle.com/javase/specs/jvms/se9/html/jvms-4.html#jvms-4.7).
+See the `twg2.jbcm.main.UsageCliMain` class for a simple command line interface you can use to load and print class file info.
+
+Reference: [Java Virtual Machine Spec (Java 9)](https://docs.oracle.com/javase/specs/jvms/se9/html/index.html)
 
 ### `twg2.jbcm.classFormat`
-Contains implementation of the [class file format](http://docs.oracle.com/javase/specs/jvms/se9/html/jvms-4.html)
+Contains implementation of the [class file format](https://docs.oracle.com/javase/specs/jvms/se9/html/jvms-4.html)
 with related attributes (`twg2.jbcm.classFormat.attributes`) and constant pool types (`twg2.jbcm.classFormat.constantPool`).
 
 ### `twg2.jbcm` and `twg2.jbcm.modify`
 Interfaces and utilities for searching and modifying class files.
 
-### `twg2.jbcm.opcode`
-Partial implementation of [Java instruction set opcodes](http://docs.oracle.com/javase/specs/jvms/se9/html/jvms-6.html#jvms-6.5).
+### `twg2.jbcm`
+Utilities and the `Opcodes` enum containing detailed, programmatic information about the [Java instruction set opcodes](https://docs.oracle.com/javase/specs/jvms/se9/html/jvms-6.html#jvms-6.5).
+Also see the [extract-opcodes.js] file for how the enum literals in `Opcodes` are generated.
 
 ### `twg2.jbcm.dynamicModification` and `twg2.jbcm.parserExamples`
 Classes used by the example and test packages.
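
The README describes `Opcodes` as an enum carrying programmatic information about each instruction, and `CodeFlow` in this commit queries it through `hasBehavior(...)` and `getOperandCount()`. The sketch below is a hypothetical, heavily reduced stand-in showing how such behavior flags could be modeled with an `EnumSet`. The names `OpcodeFlagsDemo` and `MiniOpcode` and the three-value `Type` enum are assumptions for illustration, not the project's classes. The opcode values and operand counts are the ones the JVM specification assigns to `ifeq`, `goto`, and `ireturn`.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical, reduced illustration of an opcode enum with behavior flags. */
final class OpcodeFlagsDemo {

    /** Assumed subset of behavior flags; the project's enum has many more. */
    enum Type { CONDITION, JUMP, RETURN }

    enum MiniOpcode {
        IFEQ(0x99, 2, EnumSet.of(Type.CONDITION, Type.JUMP)),
        GOTO(0xA7, 2, EnumSet.of(Type.JUMP)),
        IRETURN(0xAC, 0, EnumSet.of(Type.RETURN));

        private final int opcode;
        private final int operandCount;
        private final Set<Type> behaviors;

        MiniOpcode(int opcode, int operandCount, Set<Type> behaviors) {
            this.opcode = opcode;
            this.operandCount = operandCount;
            this.behaviors = behaviors;
        }

        int getOperandCount() {
            return operandCount;
        }

        boolean hasBehavior(Type type) {
            return behaviors.contains(type);
        }

        /** Look up an opcode by its unsigned byte value. */
        static MiniOpcode get(int opcode) {
            for (MiniOpcode op : values()) {
                if (op.opcode == opcode) {
                    return op;
                }
            }
            throw new IllegalArgumentException("unknown opcode: " + opcode);
        }
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xA7;                      // bytes are signed in Java
        MiniOpcode op = MiniOpcode.get(raw & 0xFF);  // mask to the 0..255 range
        System.out.println(op + " jump=" + op.hasBehavior(Type.JUMP)
                + " conditional=" + op.hasBehavior(Type.CONDITION));
        // prints "GOTO jump=true conditional=false"
    }
}
```

Keeping the flags in a set lets one instruction carry several behaviors at once, which is how a conditional branch can be both a condition and a jump.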

bin/class_loading.jar

40.1 KB
Binary file not shown.

extract-opcodes.js

+13-6
@@ -48,33 +48,40 @@ res.map((rr, idx, ary) => {
     var isCondition = rr.name.startsWith("if");
     var isJump = rr.operation.startsWith("Branch ") || rr.name === "jsr" || rr.name === "jsr_w";
     var isCpIndex = rr.description.indexOf("index into the run-time constant pool of the current class") > -1;
+    var isCompareNumeric = rr.operation.startsWith("Compare ");
+    var isMathOp = ["Add ", "Subtract ", "Multiply ", "Divide ", "Remainder ", "Negate ", "Shift ", "Arithmetic shift ", "Logical shift ", "Boolean AND ", "Boolean OR ", "Boolean XOR "].some((str) => rr.operation.startsWith(str));
     var isStackManipulate = rr.name.startsWith("dup") || rr.name === "swap";
+    var isTypeConvert = rr.operation.startsWith("Convert ");
     var isVariableStackPop = rr.operandStack[opStackOffset].indexOf("[arg") > -1;
     var types;
     var opUtils;
-    return "\t/* " + String(rr.opCode).padStart(2, ' ') + " " + ("0x" + rr.opCode.toString(16).toUpperCase()).padStart(4, ' ') + " */" +
+    return "\t/* " + ("0x" + rr.opCode.toString(16).toUpperCase()).padStart(4, ' ') + " */" +
         rr.name.toUpperCase().padEnd(16, ' ') +
         "(" + rr.opCode + ", " + operandCount + ", " +
         ((types = [
             (isStackManipulate ? "Type.STACK_MANIPULATE" : null),
             (isVariableStackPop ? "Type.POP_UNPREDICTABLE" : null),
             (stackPopCount > 0 && !isStackManipulate && !isVariableStackPop ? "Type.POP" + stackPopCount : null),
             (stackPushCount > 0 && !isStackManipulate ? "Type.PUSH" + stackPushCount : null),
+            (rr.name.indexOf("const_") === 1 ? "Type.CONST_LOAD" : null),
             (rr.name.indexOf("load") === 1 ? "Type.VAR_LOAD" : null),
             (rr.name.indexOf("store") === 1 ? "Type.VAR_STORE" : null),
             (rr.name.indexOf("aload") === 1 ? "Type.ARRAY_LOAD" : null),
             (rr.name.indexOf("astore") === 1 ? "Type.ARRAY_STORE" : null),
             (rr.name.indexOf("return") > -1 ? "Type.RETURN" : null),
             (isCondition ? "Type.CONDITION" : null),
             (isJump ? "Type.JUMP" : null),
-            (isCpIndex ? "Type.CP_INDEX" : null)
+            (isCpIndex ? "Type.CP_INDEX" : null),
+            (isCompareNumeric ? "Type.COMPARE_NUMERIC" : null),
+            (isMathOp ? "Type.MATH_OP" : null),
+            (isTypeConvert ? "Type.TYPE_CONVERT" : null)
         ].filter(s => s != null)).length > 0 ? "enums(" + types.join(", ") + ")" : "none(Type.class)") +
         ", " +
         ((opUtils = [
-            (isCondition || isJump ? "IoUtility.offsetModifier(1, " + (rr.name.endsWith("_w") ? 4 : 2) + ")" : null),
-            (isCpIndex ? "IoUtility.cpIndex(1, " + (rr.name.endsWith("_w") ? 4 : 2) + ")" : null),
-            (rr.name === "tableswitch" ? "IoUtility.TableswitchOffsetModifier" : null),
-            (rr.name === "lookupswitch" ? "IoUtility.LookupswitchOffsetModifier" : null)
+            (isCondition || isJump ? "CodeUtility.offsetModifier(1, " + (rr.name.endsWith("_w") ? 4 : 2) + ")" : null),
+            (isCpIndex ? "CodeUtility.cpIndex(1, " + (rr.name.endsWith("_w") ? 4 : 2) + ")" : null),
+            (rr.name === "tableswitch" ? "TableswitchOffsetModifier.defaultInst" : null),
+            (rr.name === "lookupswitch" ? "LookupswitchOffsetModifier.defaultInst" : null)
         ].filter(s => s != null)).length > 0 ? "Op.of(" + opUtils.join(", ") + ")" : "null") +
         ")" + (idx < ary.length - 1 ? "," : ";") +
         " // " + rr.operation + "," + (!isStackManipulate ? " stack: " + JSON.stringify(rr.operandStack, undefined, " ").split("\n").map(s => s.trim()).join(" ") + "," : "") + (rr.hashLink != null ? " link: " + baseUrl + "#" + rr.hashLink : "");

package-lib.json

+3-2
@@ -1,11 +1,12 @@
 {
-  "version" : "0.4.0",
+  "version" : "0.5.0",
   "name" : "class-loading",
   "description" : "Java class file parsing, manipulation, and to human readable representation",
   "homepage" : "https://github.com/TeamworkGuy2/ClassLoading",
   "license" : "MIT",
   "main" : "./bin/class_loading.jar",
   "dependencies" : {
-    "ecj-4.4M2" : "*"
+    "ecj-4.4M2" : "*",
+    "jprimitive-collections": "*"
   }
 }

src/twg2/jbcm/CodeFlow.java

+72
@@ -0,0 +1,72 @@
+package twg2.jbcm;
+
+import twg2.collections.primitiveCollections.IntArrayList;
+import twg2.collections.primitiveCollections.IntListReadOnly;
+import twg2.jbcm.Opcodes.Type;
+
+/** Trace all possible paths through the code in a method. A code flow follows jump, branch/condition, return, and throw instructions.
+ * Circular paths end at the first jump/branch destination which already exists in the code flow.
+ * @author TeamworkGuy2
+ * @since 2020-12-03
+ */
+public class CodeFlow {
+
+    /** Starting at a given point in a bytecode array, follow code jumps and branches to all termination (return/throw) points potentially reachable from the starting point
+     * @param idx the starting point
+     * @param instr the bytecode array
+     * @param dstPath the list to add jumps, branches/conditions, returns, and throw instruction locations to
+     * @return a list of {@code instr} points at which jumps, branches, returns, and throws occur.
+     * Non-terminating points (jumps and branches) are represented as bitwise-complemented indexes (i.e. {@code ~value})
+     * and can easily be converted back by applying {@code ~} again. This differentiates non-terminal indexes from all
+     * valid terminal indexes because valid code indexes cannot be less than 0.
+     */
+    public static IntArrayList getFlowPaths(int idx, byte[] instr, IntArrayList dstPath) {
+        for(int i = idx, size = instr.length; i < size; i++) {
+            Opcodes opc = Opcodes.get(instr[i] & 0xFF);
+            int numOperands = opc.getOperandCount();
+
+            // Type.JUMP instruction set includes all Type.CONDITION instructions
+            if(opc.hasBehavior(Type.JUMP)) {
+                // follow the jump path only if it has not already been followed (to avoid endless recursion on loops)
+                if(!dstPath.contains(~i)) {
+                    dstPath.add(~i);
+                    int jumpDst = opc.getJumpDestination(instr, i);
+                    getFlowPaths(jumpDst, instr, dstPath);
+                }
+
+                // end this code path if the jump path is unconditional (i.e. GOTO or JSR)
+                if(!opc.hasBehavior(Type.CONDITION)) {
+                    break;
+                }
+            }
+            // end this code flow path once a terminal instruction is reached
+            else if(opc.hasBehavior(Type.RETURN) || opc == Opcodes.ATHROW) {
+                dstPath.add(i);
+                break;
+            }
+
+            i += (numOperands < 0 ? 0 : numOperands);
+        }
+
+        return dstPath;
+    }
+
+
+    public static String flowPathToString(byte[] instr, IntListReadOnly codeFlow) {
+        var sb = new StringBuilder();
+        for(int i = 0, size = codeFlow.size(); i < size; i++) {
+            var idx = codeFlow.get(i);
+            // a conditional/jump point
+            if(idx < 0) {
+                var opc = Opcodes.get(instr[~idx] & 0xFF);
+                sb.append(~idx).append(' ').append(opc).append(" -> ");
+            }
+            // a terminal point
+            else {
+                var opc = Opcodes.get(instr[idx] & 0xFF);
+                sb.append(idx).append(' ').append(opc).append("], ");
+            }
+        }
+        return sb.toString();
+    }
+}
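
The `getFlowPaths` Javadoc says jump and branch locations are stored as `~index` so they can share one int list with terminal (return/throw) locations. A minimal standalone sketch of that encoding follows, using a plain `List<Integer>` instead of the project's `IntArrayList`. The class and method names (`FlowPathEncodingDemo`, `encodeJump`, `isJump`, `codeIndex`, `describe`) are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo of the ~index encoding; not part of the commit. */
final class FlowPathEncodingDemo {

    /** Encode a non-terminal (jump/branch) code index as a negative value. */
    static int encodeJump(int codeIndex) {
        return ~codeIndex; // ~0 == -1 and ~5 == -6, so the result is always < 0
    }

    /** True if the entry marks a jump/branch rather than a return/throw. */
    static boolean isJump(int entry) {
        return entry < 0;
    }

    /** Recover the original code index from either kind of entry. */
    static int codeIndex(int entry) {
        return entry < 0 ? ~entry : entry;
    }

    /** Render each entry with its decoded index and kind. */
    static String describe(List<Integer> flow) {
        StringBuilder sb = new StringBuilder();
        for (int entry : flow) {
            sb.append(codeIndex(entry)).append(isJump(entry) ? " (jump) " : " (end) ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        List<Integer> flow = new ArrayList<>();
        flow.add(encodeJump(0)); // branch at code index 0
        flow.add(12);            // return at code index 12
        flow.add(encodeJump(7)); // goto at code index 7
        flow.add(20);            // athrow at code index 20
        System.out.println(describe(flow));
        // prints "0 (jump) 12 (end) 7 (jump) 20 (end)"
    }
}
```

Bitwise complement is used instead of arithmetic negation because `-0 == 0`, so a jump at index 0 would otherwise be indistinguishable from a terminal at index 0.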

src/twg2/jbcm/CodeUtility.java

+143
@@ -0,0 +1,143 @@
+package twg2.jbcm;
+
+import twg2.jbcm.modify.BytecodeConsumer;
+import twg2.jbcm.modify.ChangeCpIndex;
+import twg2.jbcm.modify.CodeOffsetChanger;
+import twg2.jbcm.modify.CpIndexChanger;
+
+/** Utilities for dealing with byte code arrays
+ * @author TeamworkGuy2
+ * @since 2020-12-3
+ */
+public class CodeUtility {
+
+    /** Shift a 1-byte offset value associated with a specific instruction in a chunk of code.
+     * For example, shifting the 1-byte value that follows the opcode at position 55 by 12 might look like:<br/>
+     * {@code shift1Offset(12, 1, code, 55);}<br/>
+     * Or shifting the 1-byte value two bytes after the opcode at position 25 by 3:<br/>
+     * {@code shift1Offset(3, 2, code, 25);}
+     * @param offset the amount to add to the instruction's existing offset value
+     * @param offsetOffset the number of bytes ahead of the opcode at which the offset to adjust starts (1 for an offset that immediately follows an opcode)
+     * @param code the code array containing the instruction
+     * @param codeOffset the offset into the code array at which to update the opcode's offset value
+     * @return the location after the opcode's offset value, calculated as {@code codeOffset + offsetOffset + 1}
+     */
+    public static int shift1Offset(int offset, final int offsetOffset, byte[] code, int codeOffset) {
+        codeOffset += offsetOffset;
+        byte curOffset = code[codeOffset];
+        if(curOffset + offset < 0) {
+            throw new ArithmeticException("byte overflow: " + curOffset + "+" + offset + "=" + (curOffset+offset));
+        }
+        curOffset += offset;
+        code[codeOffset] = curOffset;
+        return codeOffset + 1;
+    }
+
+
+    /** Shift a 2-byte offset value associated with a specific instruction in a chunk of code.
+     * For example, shifting a goto offset at position 55 by 12 might look like:<br/>
+     * {@code shift2Offset(12, 1, code, 55);}<br/>
+     * Or shifting an ifeq offset at position 25 by 160:<br/>
+     * {@code shift2Offset(160, 1, code, 25);}
+     * @param offset the amount to add to the instruction's existing offset value
+     * @param offsetOffset the number of bytes ahead of the opcode at which the offset to adjust starts (1 for an offset that immediately follows an opcode)
+     * @param code the code array containing the instruction
+     * @param codeOffset the offset into the code array at which to update the opcode's offset value
+     * @return the location after the opcode's offset value, calculated as {@code codeOffset + offsetOffset + 2}
+     */
+    public static int shift2Offset(int offset, int offsetOffset, byte[] code, int codeOffset) {
+        codeOffset += offsetOffset;
+        short curOffset = IoUtility.readShort(code, codeOffset);
+        if(curOffset + offset < 0) {
+            throw new ArithmeticException("short overflow: " + curOffset + "+" + offset + "=" + (curOffset+offset));
+        }
+        curOffset += offset;
+        IoUtility.writeShort(curOffset, code, codeOffset);
+        return codeOffset + 2;
+    }
+
+
+    /** Shift a 4-byte offset value associated with a specific instruction in a chunk of code.
+     * For example, shifting a goto_w offset at position 25 by 160 might look like:<br/>
+     * {@code shift4Offset(160, 1, code, 25);}<br/>
+     * Or shifting a jsr_w offset at position 55 by 12:<br/>
+     * {@code shift4Offset(12, 1, code, 55);}
+     * @param offset the amount to add to the instruction's existing offset value
+     * @param offsetOffset the number of bytes ahead of the opcode at which the offset to adjust starts (1 for an offset that immediately follows an opcode)
+     * @param code the code array containing the instruction
+     * @param codeOffset the offset into the code array at which to update the opcode's offset value
+     * @return the location after the opcode's offset value, calculated as {@code codeOffset + offsetOffset + 4}
+     */
+    public static int shift4Offset(int offset, int offsetOffset, byte[] code, int codeOffset) {
+        codeOffset += offsetOffset;
+        int curOffset = IoUtility.readInt(code, codeOffset);
+        if(curOffset + offset < 0) {
+            throw new ArithmeticException("integer overflow: " + curOffset + "+" + offset + "=" + (curOffset+offset));
+        }
+        curOffset += offset;
+        IoUtility.writeInt(curOffset, code, codeOffset);
+        return codeOffset + 4;
+    }
+
+
+    /** Call the specified {@code BytecodeConsumer} for each instruction in the specified code array
+     * @param code the code array
+     * @param offset the offset into the code array at which to start finding instructions
+     * @param length the number of bytes of the code array to check through
+     * @param cbFunc the function to call for each instruction found in specified code array range
+     */
+    public static void forEach(byte[] code, int offset, int length, BytecodeConsumer cbFunc) {
+        int numOperands = 0;
+        @SuppressWarnings("unused")
+        int operand = 0;
+
+        for(int i = offset, size = offset + length; i < size; i++) {
+            numOperands = Opcodes.get((code[i] & 0xFF)).getOperandCount();
+            // Read following bytes of code and convert them to an operand depending on the number of operands specified for the current command
+            operand = CodeUtility.loadOperands(numOperands, code, i);
+            // Special handling for instructions with unpredictable byte code lengths
+            if(numOperands == Opcodes.Const.UNPREDICTABLE) {
+                if(Opcodes.WIDE.is(code[i])) {
+                    cbFunc.accept(Opcodes.get((code[i] & 0xFF)), code, i);
+                    i++; // because wide operations are nested around other operations
+                    numOperands = Opcodes.get((code[i] & 0xFF)).getOperandCount();
+                }
+                else if(Opcodes.TABLESWITCH.is(code[i])) {
+                    throw new IllegalStateException("tableswitch code handling not implemented");
+                }
+                else if(Opcodes.LOOKUPSWITCH.is(code[i])) {
+                    throw new IllegalStateException("lookupswitch code handling not implemented");
+                }
+            }
+            cbFunc.accept(Opcodes.get((code[i] & 0xFF)), code, i);
+            i+= (numOperands < 0) ? 0 : numOperands;
+        }
+    }
+
+    /** Extract from [0, 4] operands following a specified index in big-endian order (first byte most significant):
+     * <pre>{@code
+     * (((a & 0xFF) << 24) | ((b & 0xFF) << 16) | ((c & 0xFF) << 8) | (d & 0xFF))
+     * }</pre>
+     * @param numOperands the number of bytes to read as operand(s)
+     * @param code the byte code array
+     * @param index the index of the instruction immediately preceding the operand(s)
+     * @return The binary OR'ed value of the operand bytes with the first operand in the most significant position or -1 if {@code numOperands <= 0}
+     */
+    public static int loadOperands(int numOperands, byte[] code, int index) {
+        return (numOperands > 3 ? (((code[index+1] & 0xFF) << 24) | ((code[index+2] & 0xFF) << 16) | ((code[index+3] & 0xFF) << 8) | (code[index+4] & 0xFF)) :
+            (numOperands > 2 ? (((code[index+1] & 0xFF) << 16) | ((code[index+2] & 0xFF) << 8) | (code[index+3] & 0xFF)) :
+                (numOperands > 1 ? (((code[index+1] & 0xFF) << 8) | (code[index+2] & 0xFF)) :
+                    (numOperands > 0 ? ((code[index+1] & 0xFF)) : -1))));
+    }
+
+
+    public static CpIndexChanger cpIndex(int offset, int len) {
+        return new ChangeCpIndex(offset, len);
+    }
+
+
+    public static CodeOffsetChanger offsetModifier(int offset, int len) {
+        return new ChangeCpIndex(offset, len);
+    }
+
+}
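
The offset-shifting helpers in `CodeUtility` read a multi-byte operand, add a delta, and write it back. The standalone sketch below walks through that for the 2-byte case on a `goto` (opcode `0xA7`), whose operand is a signed 16-bit big-endian branch offset per the JVM specification. It is an illustration under assumptions, not the commit's implementation: `OperandBytesDemo`, `readShort`, `writeShort`, and `shiftBranchOffset` are hypothetical names, and the range check deliberately differs from the commit's `< 0` check by rejecting any result that does not fit in 16 bits while still allowing negative (backward) offsets.

```java
/** Hypothetical demo of reading, shifting, and rewriting a 2-byte branch offset. */
final class OperandBytesDemo {

    /** Read a signed 16-bit big-endian value starting at index. */
    static short readShort(byte[] code, int index) {
        return (short) (((code[index] & 0xFF) << 8) | (code[index + 1] & 0xFF));
    }

    /** Write a 16-bit value in big-endian order starting at index. */
    static void writeShort(short value, byte[] code, int index) {
        code[index] = (byte) (value >>> 8); // high byte first
        code[index + 1] = (byte) value;     // low byte second
    }

    /**
     * Add delta to the 2-byte branch offset that immediately follows the
     * opcode at opcodeIndex and return the index just past the operand.
     */
    static int shiftBranchOffset(int delta, byte[] code, int opcodeIndex) {
        int operandIndex = opcodeIndex + 1;
        int shifted = readShort(code, operandIndex) + delta;
        if (shifted < Short.MIN_VALUE || shifted > Short.MAX_VALUE) {
            throw new ArithmeticException("offset does not fit in 16 bits: " + shifted);
        }
        writeShort((short) shifted, code, operandIndex);
        return operandIndex + 2;
    }

    public static void main(String[] args) {
        // goto +10 followed by a nop (0x00)
        byte[] code = { (byte) 0xA7, 0x00, 0x0A, 0x00 };
        int next = shiftBranchOffset(12, code, 0);
        System.out.println("next index = " + next + ", new offset = " + readShort(code, 1));
        // prints "next index = 3, new offset = 22"
    }
}
```

Doing the addition in `int` before narrowing makes the overflow visible; adding directly to a `short` would silently wrap.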
