Wasmgen

Bernard Teo edited this page Apr 24, 2020 · 8 revisions

Wasmgen is a WebAssembly binary encoder: it encodes a WebAssembly program into the binary format defined by the WebAssembly specification. A simple example illustrates it best:

use wasmgen::*;

// The wrapper function (its name is illustrative) makes the final return well-formed.
fn build_module() -> Box<[u8]> {
    let mut module: WasmModule = WasmModule::new_builder().build();
    let functype = FuncType::new(Box::new([]), Box::new([ValType::I32]));
    let (_type_idx, func_idx) = module.register_func(&functype);
    let mut code_builder = CodeBuilder::new(functype);
    {
        let (_locals_builder, expr_builder) = code_builder.split();
        expr_builder.i32_const(42); // put 42 onto the stack
        expr_builder.end(); // end of the function body
    }
    module.commit_func(func_idx, code_builder);
    module.export_func(func_idx, "main".to_string());
    let mut receiver = Vec::<u8>::new();
    module.wasm_serialize(&mut receiver);
    receiver.into_boxed_slice()
}

The code above creates a WebAssembly binary containing a single function that takes no parameters and returns one i32 value, which is always 42.

Wasmgen is intended to encode everything expressible in WebAssembly, including tables, memories, imports, and globals.

Abstraction of type section

Wasmgen abstracts away the type section of the WebAssembly binary, which many of the other sections refer to. Instead of adding types to the type section separately, you pass the actual type (a list of WebAssembly value types: i32, i64, f32, f64) to any instruction or entity that would normally store a type index. Users of Wasmgen should never need to access the type section directly.
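One way to picture this abstraction is a table that interns function signatures and hands back an index, so the same signature is never stored twice. The sketch below is only an illustration under that assumption: `TypeTable`, `intern`, and the slice-based `FuncType::new` are hypothetical names and are not Wasmgen's actual internals or API.

```rust
use std::collections::HashMap;

/// WebAssembly value types (numeric types only, for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ValType {
    I32,
    I64,
    F32,
    F64,
}

/// A function signature: parameter types and result types.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct FuncType {
    pub params: Box<[ValType]>,
    pub results: Box<[ValType]>,
}

impl FuncType {
    /// Hypothetical constructor taking slices (Wasmgen itself takes boxes).
    pub fn new(params: &[ValType], results: &[ValType]) -> Self {
        FuncType { params: params.into(), results: results.into() }
    }
}

/// Hypothetical type table: callers pass signatures, never indices.
#[derive(Default)]
pub struct TypeTable {
    types: Vec<FuncType>,
    lookup: HashMap<FuncType, u32>,
}

impl TypeTable {
    /// Returns the index of `ty`, appending it only if it is new.
    pub fn intern(&mut self, ty: &FuncType) -> u32 {
        if let Some(&idx) = self.lookup.get(ty) {
            return idx;
        }
        let idx = self.types.len() as u32;
        self.types.push(ty.clone());
        self.lookup.insert(ty.clone(), idx);
        idx
    }

    /// Number of distinct signatures stored so far.
    pub fn len(&self) -> usize {
        self.types.len()
    }
}
```

Interning the same signature twice returns the same index, which is why callers can keep passing full types without growing the type section.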

Encoding things

Function/Code

In WebAssembly, functions are spread over three sections:

  • Type Section - contains the signature of the function (i.e. the types of its parameters and return values)
  • Function Section - contains the declaration of the function (it is just an index into the Type Section)
  • Code Section - contains the function body

Wasmgen exposes a straightforward API to add functions; it does the appropriate additions to all three sections under the hood.
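To make the three sections concrete, here is a hand-encoded version of the module from the first example, written without Wasmgen. It is a sketch for illustration only: the helper names (`write_u32_leb`, `write_section`, `encode_answer_module`) are made up here, and the byte values follow the binary format in the WebAssembly specification.

```rust
/// Appends `value` as an unsigned LEB128 integer.
fn write_u32_leb(out: &mut Vec<u8>, mut value: u32) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

/// Appends one section: id byte, payload size, then the payload.
fn write_section(out: &mut Vec<u8>, id: u8, payload: &[u8]) {
    out.push(id);
    write_u32_leb(out, payload.len() as u32);
    out.extend_from_slice(payload);
}

/// Hand-encodes a module exporting `main: [] -> [i32]` that returns 42.
pub fn encode_answer_module() -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(b"\0asm"); // magic number
    out.extend_from_slice(&[1, 0, 0, 0]); // version 1

    // Type section (id 1): one signature, no params, one i32 result.
    write_section(&mut out, 1, &[0x01, 0x60, 0x00, 0x01, 0x7f]);
    // Function section (id 3): one function, using type index 0.
    write_section(&mut out, 3, &[0x01, 0x00]);
    // Export section (id 7): export function 0 under the name "main".
    write_section(&mut out, 7, &[0x01, 0x04, b'm', b'a', b'i', b'n', 0x00, 0x00]);
    // Code section (id 10): one 4-byte body: no locals, i32.const 42, end.
    write_section(&mut out, 10, &[0x01, 0x04, 0x00, 0x41, 0x2a, 0x0b]);
    out
}
```

The signature, the declaration, and the body each end up in a different section, which is the bookkeeping that Wasmgen's function API hides.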

Adding a function is a two-step process: you first register the function, then commit it. Instructions are usually encoded between the register step and the commit step, but this ordering is not strictly necessary.

We first create a type signature for the function:

let functype = FuncType::new(Box::new([]), Box::new([ValType::I32]));

Then we register it with the module:

let (type_idx, func_idx) = module.register_func(&functype);

Registering a function creates a new entry in the Function Section and a corresponding entry in the Code Section (which stays empty until the function is committed). It also adds the signature to the Type Section if it is not already there. Registration is somewhat like declaring a function in a language like C. It returns a pair (TypeIdx, FuncIdx). The TypeIdx is the type index of the newly registered function's signature, which may be used to encode call_indirect instructions; usually this value is ignored. The FuncIdx is more important: it is a handle to the newly created function and must be used later to commit the function. When encoding call instructions, you also need to supply the FuncIdx of the callee. These indices are structs containing a single integer, but most of the time it suffices to treat them as opaque handles.

After registration, we construct a CodeBuilder which handles encoding of instructions into a byte buffer. Notice that this CodeBuilder does not interact with the WasmModule in any way.

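let mut code_builder = CodeBuilder::new(functype);
{
    let (locals_builder, expr_builder) = code_builder.split();
    // encode the function body here...
    // For example (ir_source_vartype and wasm_localidx come from the
    // surrounding compiler code; an f64 is assumed to be on the stack already):
    expr_builder.i32_const(ir_source_vartype.tag()); // push the type tag
    expr_builder.local_set(wasm_localidx); // store the tag
    expr_builder.i64_reinterpret_f64(); // reinterpret the f64 bits as an i64
    expr_builder.local_set(wasm_localidx + 1); // store the reinterpreted value
}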

To use the CodeBuilder, we first split() it into a LocalsManager and an ExprBuilder. The LocalsManager is for adding local variables, and the ExprBuilder is for encoding instructions. As seen in the example above, ExprBuilder has a method for each instruction type. The methods map one-to-one to the instructions specified in WebAssembly.

Note: The ExprBuilder does not validate (in the WebAssembly sense, i.e. type-check) the encoded instructions in any way, so it is possible to encode a function body that fails WebAssembly validation. It might be possible to (ab)use Rust generics to design a different CodeBuilder that is aware of the types of local variables and the types of things on the protected stack, and that performs WebAssembly validation statically at compile time.

After encoding your function, remember to encode the end() instruction as required by the WebAssembly specification. CodeBuilder does not automatically append it for you.
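The idea of one method per instruction, each appending bytes to a buffer without any validation, can be sketched in a few lines. `MiniExprBuilder` below is a hypothetical miniature written for this page and is not Wasmgen's ExprBuilder; the opcodes and LEB128 immediates follow the WebAssembly specification.

```rust
/// Hypothetical miniature of an instruction encoder: it writes raw opcodes
/// and immediates into a byte buffer and performs no validation.
#[derive(Default)]
pub struct MiniExprBuilder {
    bytes: Vec<u8>,
}

impl MiniExprBuilder {
    /// i32.const: opcode 0x41 followed by a signed LEB128 immediate.
    pub fn i32_const(&mut self, value: i32) {
        self.bytes.push(0x41);
        let mut v = value;
        loop {
            let byte = (v & 0x7f) as u8;
            v >>= 7; // arithmetic shift keeps the sign
            let sign_bit_clear = (byte & 0x40) == 0;
            if (v == 0 && sign_bit_clear) || (v == -1 && !sign_bit_clear) {
                self.bytes.push(byte);
                break;
            }
            self.bytes.push(byte | 0x80); // more bytes follow
        }
    }

    /// local.set: opcode 0x21 followed by an unsigned LEB128 local index.
    pub fn local_set(&mut self, mut idx: u32) {
        self.bytes.push(0x21);
        loop {
            let byte = (idx & 0x7f) as u8;
            idx >>= 7;
            if idx == 0 {
                self.bytes.push(byte);
                break;
            }
            self.bytes.push(byte | 0x80);
        }
    }

    /// end: opcode 0x0b. Nothing appends it automatically.
    pub fn end(&mut self) {
        self.bytes.push(0x0b);
    }

    /// Hands back the encoded instruction bytes.
    pub fn into_bytes(self) -> Vec<u8> {
        self.bytes
    }
}
```

Because nothing here tracks the stack, forgetting `end()` or setting a local from an empty stack still encodes without complaint; the mistake only surfaces when the module is validated.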

Then, we commit the function into the module:

module.commit_func(func_idx, code_builder);

Committing a function adds the provided function body (via the CodeBuilder) to the module. The FuncIdx parameter should contain the value returned during registration. Committing a function consumes the CodeBuilder (i.e. code_builder will no longer be usable after calling commit_func()).

Example:

let functype = FuncType::new(Box::new([]), Box::new([ValType::I32]));
let (type_idx, func_idx) = module.register_func(&functype);
let mut code_builder = CodeBuilder::new(functype);
{
    let (locals_builder, expr_builder) = code_builder.split();
    // encode the function body here...
}
module.commit_func(func_idx, code_builder);

Notes:

  1. Since each registered function is identified by its FuncIdx, multiple functions may be registered first, and only committed much later.
  2. During serialization, all registered functions must have already been committed; otherwise serialization will panic.
  3. The same function can be committed multiple times. Only the last commit will be serialized; all earlier commits will be discarded.
  4. A call/call_indirect instruction may call any function that has been registered (even though the function may not have been committed yet).
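The notes above can be modelled with a small stand-in that tracks only function bodies. This is a sketch under stated assumptions: `MiniModule` and `serialize_bodies` are hypothetical, and the `FuncIdx` here is a local newtype rather than Wasmgen's own.

```rust
/// Opaque function handle (hypothetical stand-in for Wasmgen's FuncIdx).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FuncIdx(u32);

/// Hypothetical module that tracks only function bodies.
#[derive(Default)]
pub struct MiniModule {
    bodies: Vec<Option<Vec<u8>>>, // None until the function is committed
}

impl MiniModule {
    /// Reserves a slot and returns its handle; the body comes later.
    pub fn register_func(&mut self) -> FuncIdx {
        self.bodies.push(None);
        FuncIdx((self.bodies.len() - 1) as u32)
    }

    /// Stores the body; a later commit for the same handle replaces it.
    pub fn commit_func(&mut self, idx: FuncIdx, body: Vec<u8>) {
        self.bodies[idx.0 as usize] = Some(body);
    }

    /// Returns bodies in index order; panics if any is still uncommitted.
    pub fn serialize_bodies(&self) -> Vec<Vec<u8>> {
        self.bodies
            .iter()
            .enumerate()
            .map(|(i, body)| {
                body.clone()
                    .unwrap_or_else(|| panic!("function {} was never committed", i))
            })
            .collect()
    }
}
```

Since a handle exists as soon as a function is registered, a caller can be encoded against it before the callee's body is written, which is what makes mutually recursive functions straightforward.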

Possible Future Extensions

WebAssembly has well-defined validation rules here. We might be able to design a different kind of CodeBuilder that knows the WebAssembly types of its locals and stack variables. By (ab)using Rust's generic types and trait system, we might be able to validate the generated WebAssembly code at compilation time.
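As a rough feasibility sketch of that idea, the builder below tracks the operand stack in a type parameter, so that a stack mismatch becomes a Rust compile error. Everything here is hypothetical (`TypedBuilder`, the `I32` marker, the nested-tuple stack encoding), only two instructions are covered, and immediates are limited to 0..=63 so that they fit in one LEB128 byte.

```rust
use std::marker::PhantomData;

/// Type-level marker for the WebAssembly i32 value type.
pub struct I32;

/// Hypothetical builder whose parameter `S` models the operand stack:
/// `()` is empty, `(I32, ())` holds one i32, and so on.
pub struct TypedBuilder<S> {
    bytes: Vec<u8>,
    _stack: PhantomData<S>,
}

impl TypedBuilder<()> {
    /// Starts with an empty operand stack.
    pub fn new() -> Self {
        TypedBuilder { bytes: Vec::new(), _stack: PhantomData }
    }
}

impl<S> TypedBuilder<S> {
    /// Appends raw bytes and moves to the stack type `T`.
    fn push_op<T>(mut self, op: &[u8]) -> TypedBuilder<T> {
        self.bytes.extend_from_slice(op);
        TypedBuilder { bytes: self.bytes, _stack: PhantomData }
    }

    /// i32.const pushes one i32 (value must be in 0..=63 in this sketch).
    pub fn i32_const(self, value: u8) -> TypedBuilder<(I32, S)> {
        assert!(value < 64, "sketch only supports single-byte immediates");
        self.push_op(&[0x41, value])
    }
}

impl<S> TypedBuilder<(I32, (I32, S))> {
    /// i32.add pops two i32 values and pushes one.
    pub fn i32_add(self) -> TypedBuilder<(I32, S)> {
        self.push_op(&[0x6a])
    }
}

impl TypedBuilder<(I32, ())> {
    /// Finishes a body whose result is exactly one i32.
    pub fn end(mut self) -> Vec<u8> {
        self.bytes.push(0x0b);
        self.bytes
    }
}
```

With this shape, `TypedBuilder::new().i32_const(40).i32_const(2).i32_add().end()` compiles, while `TypedBuilder::new().i32_add()` is rejected by the Rust compiler because `i32_add` only exists when two i32 values are on the stack. Control flow and locals would need considerably more machinery than this sketch shows.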

Imports

WebAssembly modules may import functions, tables, memories, and globals from the host environment. For our compiler, only function imports are used. Function imports allow WebAssembly to call functions that are defined in the host environment (i.e. JavaScript).

WebAssembly uses the same index space for function imports and normal functions declared in WebAssembly. This means that the FuncIdx used to identify functions may refer to either an imported function or a normal function. Specifically, the imported functions precede the normal functions (i.e. imported functions have smaller FuncIdx values than normal functions): if there are n imported functions and m normal functions, then indices [0, n) refer to imported functions, and indices [n, n + m) refer to normal functions. For Wasmgen, it means that we must know the total number of imported functions before the first normal function may be registered (otherwise we can't give the normal function a definite FuncIdx value).

We solve this problem by introducing a new WasmImportBuilderModule type that can be used to add imports before constructing the actual WasmModule. Imports are fixed once the WasmModule is constructed. This is an example where we import two functions (note that WebAssembly imports have two-level names, i.e. the module name followed by the entity name within that module):

let mut wasm_builder: WasmImportBuilderModule = WasmModule::new_builder();
let assert_failed_i32_func = wasm_builder.import_func(
    "platform".to_string(),
    "assert_fail".to_string(),
    &FuncType::new(
        Box::new([ValType::I32, ValType::I32, ValType::I32]),
        Box::new([]),
    ),
);
let test_failed_func = wasm_builder.import_func(
    "platform".to_string(),
    "test_fail".to_string(),
    &FuncType::new(Box::new([]), Box::new([])),
);
let mut wasm_module: WasmModule = wasm_builder.build();
/* ... Add normal functions and other stuff here, but you cannot add any imports now ... */

This enforces the precedence constraint for imported functions at compile time, because imports can only be added through the WasmImportBuilderModule, before the WasmModule exists.
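The two-phase design can be sketched with two plain types, where building the second one consumes the first. The names `ImportPhase` and `DefinePhase` are hypothetical and stand in for WasmImportBuilderModule and WasmModule; the sketch tracks only function indices, under the assumption that the builder is moved into the module on build.

```rust
/// Opaque function handle (hypothetical stand-in for Wasmgen's FuncIdx).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FuncIdx(pub u32);

/// Phase 1: only imports may be added.
#[derive(Default)]
pub struct ImportPhase {
    imports: Vec<(String, String)>, // (module name, entity name)
}

/// Phase 2: only normal functions may be added.
pub struct DefinePhase {
    imports: Vec<(String, String)>,
    num_defined: u32,
}

impl ImportPhase {
    /// Imported functions take indices 0, 1, 2, ... in import order.
    pub fn import_func(&mut self, module: &str, name: &str) -> FuncIdx {
        self.imports.push((module.to_string(), name.to_string()));
        FuncIdx((self.imports.len() - 1) as u32)
    }

    /// Consumes the builder, so no import can be added afterwards.
    pub fn build(self) -> DefinePhase {
        DefinePhase { imports: self.imports, num_defined: 0 }
    }
}

impl DefinePhase {
    /// Normal functions are numbered after all imports.
    pub fn register_func(&mut self) -> FuncIdx {
        let idx = self.imports.len() as u32 + self.num_defined;
        self.num_defined += 1;
        FuncIdx(idx)
    }
}
```

Because `build` takes `self` by value, any attempt to call `import_func` on the builder afterwards is a use-after-move error, so the ordering rule is checked by the Rust compiler rather than at run time.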

Other sections

A WebAssembly binary has other sections as well, such as the memory, export, and table sections. Wasmgen's support for them is a thin abstraction over the WebAssembly format, and should be self-explanatory once you have read the relevant section of the WebAssembly specification.
