
Commit 55c4e8c

lambdageek and lewing authored
[wasm] Webcil-in-WebAssembly (#85932)
Define a WebAssembly module wrapper for Webcil assemblies.

Contributes to #80807

### Why

In some settings, serving `application/octet-stream` data or files with unusual extensions will trigger firewalls or AV tools. But we can assume that if you're interested in deploying a .NET WebAssembly app, you're in an environment that can at least serve WebAssembly modules.

### How

Essentially we serve this WebAssembly module:

```wat
(module
  (data "\0f\00\00\00") ;; data segment 0: payload size
  (data "webcil Payload\cc") ;; data segment 1: webcil payload
  (memory (import "webcil" "memory") 1)
  (global (export "webcilVersion") i32 (i32.const 0))
  (func (export "getWebcilSize") (param $destPtr i32) (result)
    local.get $destPtr
    i32.const 0
    i32.const 4
    memory.init 0)
  (func (export "getWebcilPayload") (param $d i32) (param $n i32) (result)
    local.get $d
    i32.const 0
    local.get $n
    memory.init 1))
```

The module exports two WebAssembly functions, `getWebcilSize` and `getWebcilPayload`, that write some bytes (the size or the payload of the webcil assembly, respectively) to linear memory at a given offset. The module also exports the constant `webcilVersion` to version the wrapper format.
So a runtime or tool that wants to consume the webcil module can do something like:

```js
const wasmModule = new WebAssembly.Module (...);
const wasmMemory = new WebAssembly.Memory ({initial: 1});
const wasmInstance =
      new WebAssembly.Instance(wasmModule, {webcil: {memory: wasmMemory}});
const { getWebcilPayload, webcilVersion, getWebcilSize } = wasmInstance.exports;
console.log (`Version ${webcilVersion.value}`);
getWebcilSize(0);
const size = new Int32Array (wasmMemory.buffer)[0];
console.log (`Size ${size}`);
console.log (new Uint8Array(wasmMemory.buffer).subarray(0, 20));
getWebcilPayload(4, size);
console.log (new Uint8Array(wasmMemory.buffer).subarray(0, 20));
```

### How (Part 2)

But actually, we define the wrapper to consist of exactly 2 data segments in the WebAssembly data section: segment 0 is 4 bytes and encodes the webcil payload size; segment 1 is of variable size and contains the webcil payload.

So to load a webcil-in-wasm module, the runtime gets the _raw bytes_ of the WebAssembly module (i.e., without instantiating it), parses them to find the data section, asserts that there are 2 segments, ensures they're both passive, and gets the data directly from segment 1.

---

* Add option to emit webcil inside a wasm module wrapper
* [mono][loader] implement a webcil-in-wasm reader
* reword WebcilWasmWrapper summary comment
* update the Webcil spec to include the WebAssembly wrapper module
* Adjust RVA map offsets to account for wasm prefix

  MonoImage:raw_data is used as a base when applying the RVA map to map virtual addresses to physical offsets in the assembly. With webcil-in-wasm there's an extra wasm prefix before the webcil payload starts, so we need to account for this extra data when creating the mapping.

  An alternative is to compute the correct offsets as part of generating the webcil, but that would entangle the wasm module and the webcil payload. The current (somewhat hacky) approach keeps them logically separate.
* Add a note about the rva mapping to the spec
* Serve webcil-in-wasm as .wasm
* remove old .webcil support from Sdk Pack Tasks
* Implement support for webcil in wasm in the managed WebcilReader
* align webcil payload to a 4-byte boundary within the wasm module

  Add padding to data segment 0 to ensure that data segment 1's payload (i.e., the webcil content itself) is 4-byte aligned.

* assert that webcil raw data is 4-byte aligned
* add 4-byte alignment requirement to the webcil spec
* Don't modify MonoImageStorage:raw_data

  Instead, just keep track of the webcil offset in the MonoImageStorage. This introduces a situation where MonoImage:raw_data is different from MonoImageStorage:raw_data. The one to use for accessing IL and metadata is MonoImage:raw_data. The storage pointer is just used by the image loading machinery.

---------

Co-authored-by: Larry Ewing <[email protected]>
1 parent 4c23ac2 commit 55c4e8c

28 files changed (+906, -48 lines)

docs/design/mono/webcil.md

Lines changed: 74 additions & 10 deletions
````diff
@@ -2,21 +2,83 @@
 
 ## Version
 
-This is version 0.0 of the Webcil format.
+This is version 0.0 of the Webcil payload format.
+This is version 0 of the WebAssembly module Webcil wrapper.
 
 ## Motivation
 
 When deploying the .NET runtime to the browser using WebAssembly, we have received some reports from
 customers that certain users are unable to use their apps because firewalls and anti-virus software
 may prevent browsers from downloading or caching assemblies with a .DLL extension and PE contents.
 
-This document defines a new container format for ECMA-335 assemblies
-that uses the `.webcil` extension and uses a new WebCIL container
-format.
+This document defines a new container format for ECMA-335 assemblies that uses the `.wasm` extension
+and uses a new WebCIL metadata payload format wrapped in a WebAssembly module.
 
 
 ## Specification
 
+### Webcil WebAssembly module
+
+Webcil consists of a standard [binary WebAssembly version 0 module](https://webassembly.github.io/spec/core/binary/index.html) containing the following WAT module:
+
+``` wat
+(module
+  (data "\0f\00\00\00") ;; data segment 0: payload size as a 4 byte LE uint32
+  (data "webcil Payload\cc") ;; data segment 1: webcil payload
+  (memory (import "webcil" "memory") 1)
+  (global (export "webcilVersion") i32 (i32.const 0))
+  (func (export "getWebcilSize") (param $destPtr i32) (result)
+    local.get $destPtr
+    i32.const 0
+    i32.const 4
+    memory.init 0)
+  (func (export "getWebcilPayload") (param $d i32) (param $n i32) (result)
+    local.get $d
+    i32.const 0
+    local.get $n
+    memory.init 1))
+```
+
+That is, the module imports linear memory 0 and exports:
+* a global `i32` `webcilVersion` encoding the version of the WebAssembly wrapper (currently 0),
+* a function `getWebcilSize : i32 -> ()` that writes the size of the Webcil payload to the specified
+  address in linear memory as a `u32` (that is: 4 LE bytes).
+* a function `getWebcilPayload : i32 i32 -> ()` that writes `$n` bytes of the content of the Webcil
+  payload at the specified address `$d` in linear memory.
+
+The Webcil payload size and payload content are stored in the data section of the WebAssembly module
+as passive data segments 0 and 1, respectively. The module must not contain additional data
+segments. The module must store the payload size in data segment 0, and the payload content in data
+segment 1.
+
+The payload content in data segment 1 must be aligned on a 4-byte boundary within the web assembly
+module. Additional trailing padding may be added to the data segment 0 content to correctly align
+data segment 1's content.
+
+(**Rationale**: With this wrapper it is possible to split the WebAssembly module into a *prefix*
+consisting of everything before the data section, the data section, and a *suffix* that consists of
+everything after the data section. The prefix and suffix do not depend on the contents of the
+Webcil payload and a tool that generates Webcil files could simply emit the prefix and suffix from
+constant data. The data section is the only variable content between different Webcil-encoded .NET
+assemblies)
+
+(**Rationale**: Encoding the payload in the data section in passive data segments with known indices
+allows a runtime that does not include a WebAssembly host or a runtime that does not wish to
+instantiate the WebAssembly module to extract the payload by traversing the WebAssembly module and
+locating the Webcil payload in the data section at segment 1.)
+
+(**Rationale**: The alignment requirement is due to ECMA-335 metadata requiring certain portions of
+the physical layout to be 4-byte aligned, for example ECMA-335 Section II.25.4 and II.25.4.5.
+Aligning the Webcil content within the wasm module allows tools that directly examine the wasm
+module without instantiating it to properly parse the ECMA-335 metadata in the Webcil payload.)
+
+(**Note**: the wrapper may be versioned independently of the payload.)
+
+
+### Webcil payload
+
+The webcil payload contains the ECMA-335 metadata, IL and resources comprising a .NET assembly.
+
 As our starting point we take section II.25.1 "Structure of the
 runtime file format" from ECMA-335 6th Edition.
 
````
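The second rationale above says a runtime can extract the payload by traversing the module's data section without instantiating it. Below is a minimal JavaScript sketch of that traversal. It assumes the whole `.wasm` file is already in a `Uint8Array`. The helper names `readULEB128` and `extractWebcilPayload` are illustrative only and do not belong to the Mono loader or the managed `WebcilReader`. The sketch checks the `\0asm` magic and the binary format version field, which is `1` in the module header and is also what `WasmModuleReader.Visit` checks. It then skips every section except the data section (id 11) and requires exactly two passive segments (flag `1`).

```js
// Sketch: find the Webcil payload in a wasm wrapper without instantiating it.
// Assumes the complete module is already loaded into a Uint8Array.

const DATA_SECTION_ID = 11;

// Decode an unsigned LEB128 value (at most 5 bytes for a u32).
// Returns [value, positionAfterValue].
function readULEB128(bytes, pos) {
  let value = 0;
  let shift = 0;
  for (;;) {
    if (pos >= bytes.length) throw new Error("truncated LEB128");
    const b = bytes[pos++];
    value += (b & 0x7f) * 2 ** shift; // multiply instead of << to stay unsigned
    if ((b & 0x80) === 0) return [value, pos];
    shift += 7;
    if (shift >= 35) throw new Error("LEB128 value too large for u32");
  }
}

function extractWebcilPayload(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (bytes.length < 8 || view.getUint32(0, true) !== 0x6d736100) // "\0asm"
    throw new Error("not a WebAssembly module");
  if (view.getUint32(4, true) !== 1)
    throw new Error("unsupported binary format version");

  let pos = 8;
  while (pos < bytes.length) {
    const id = bytes[pos++];
    let sectionSize;
    [sectionSize, pos] = readULEB128(bytes, pos);
    if (id !== DATA_SECTION_ID) {
      pos += sectionSize; // skip everything except the data section
      continue;
    }
    let count;
    [count, pos] = readULEB128(bytes, pos);
    if (count !== 2) throw new Error(`expected 2 data segments, got ${count}`);
    const segs = [];
    for (let i = 0; i < 2; i++) {
      let flags, len;
      [flags, pos] = readULEB128(bytes, pos);
      if (flags !== 1) throw new Error(`data segment ${i} is not passive`);
      [len, pos] = readULEB128(bytes, pos);
      segs.push({ start: pos, length: len });
      pos += len;
    }
    // Segment 0: payload size as a u32 LE, possibly followed by padding bytes.
    if (segs[0].length < 4) throw new Error("size segment too short");
    const payloadSize = view.getUint32(segs[0].start, true);
    if (payloadSize > segs[1].length) throw new Error("declared size exceeds segment 1");
    const start = segs[1].start;
    return {
      offset: start,            // where the payload begins within the .wasm file
      aligned: start % 4 === 0, // the spec requires 4-byte alignment
      payload: bytes.subarray(start, start + payloadSize),
    };
  }
  throw new Error("no data section");
}
```

The `aligned` flag only reports whether segment 1 starts on a 4-byte boundary. A stricter loader could reject a misaligned module outright.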
````diff
@@ -40,12 +102,12 @@ A Webcil file follows a similar structure
 | CLI Data |
 | |
 
-## Webcil Headers
+### Webcil Headers
 
 The Webcil headers consist of a Webcil header followed by a sequence of section headers.
 (All multi-byte integers are in little endian format).
 
-### Webcil Header
+#### Webcil Header
 
 ``` c
 struct WebcilHeader {
````
````diff
@@ -75,11 +137,11 @@ The next pairs of integers are a subset of the PE Header data directory specifyi
 of the CLI header, as well as the directory entry for the PE debug directory.
 
 
-### Section header table
+#### Section header table
 
 Immediately following the Webcil header is a sequence (whose length is given by `coff_sections`
 above) of section headers giving their virtual address and virtual size, as well as the offset in
-the Webcil file and the size in the file. This is a subset of the PE section header that includes
+the Webcil payload and the size in the file. This is a subset of the PE section header that includes
 enough information to correctly interpret the RVAs from the webcil header and from the .NET
 metadata. Other information (such as the section names) are not included.
 
````
````diff
@@ -92,11 +154,13 @@ struct SectionHeader {
 };
 ```
 
-### Sections
+(**Note**: the `st_raw_data_ptr` member is an offset from the beginning of the Webcil payload, not from the beginning of the WebAssembly wrapper module.)
+
+#### Sections
 
 Immediately following the section table are the sections. These are copied verbatim from the PE file.
 
-## Rationale
+### Rationale
 
 The intention is to include only the information necessary for the runtime to locate the metadata
 root, and to resolve the RVA references in the metadata (for locating data declarations and method IL).
````
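`st_raw_data_ptr` is relative to the start of the Webcil payload. A reader that keeps the whole `.wasm` file in memory therefore has to add the payload's offset within the module when it resolves an RVA. The commit message describes the Mono loader tracking this webcil offset separately from `raw_data`. Below is a minimal sketch that treats section headers as plain objects. The full `SectionHeader` struct is not shown in this diff, so apart from `st_raw_data_ptr`, the field names `st_virtual_address`, `st_virtual_size` and `st_raw_data_size` are assumptions. `rvaToFileOffset` is a hypothetical helper.

```js
// Sketch: resolve an RVA to a byte offset inside the .wasm file.
// st_raw_data_ptr is relative to the start of the Webcil payload (per the
// spec note), so the payload's offset inside the wrapper is added on top.
// Field names other than st_raw_data_ptr are assumptions.

function rvaToFileOffset(rva, sectionHeaders, webcilOffset) {
  for (const s of sectionHeaders) {
    const delta = rva - s.st_virtual_address;
    // The RVA must fall inside the section and inside its raw (file) data.
    if (delta >= 0 && delta < s.st_virtual_size && delta < s.st_raw_data_size) {
      return webcilOffset + s.st_raw_data_ptr + delta;
    }
  }
  return -1; // not backed by file data in any section
}
```

With `webcilOffset` set to 0, the same function handles a bare Webcil payload. Keeping the offset as a separate value, rather than rewriting the section headers, matches the commit's stated goal of keeping the wasm prefix and the payload logically separate.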
Lines changed: 148 additions & 0 deletions
```diff
@@ -0,0 +1,148 @@
+// Licensed to the .NET Foundation under one or more agreements.
+// The .NET Foundation licenses this file to you under the MIT license.
+
+using System;
+using System.Collections.Immutable;
+using System.IO;
+using System.Reflection;
+using System.Runtime.InteropServices;
+using System.Text;
+
+namespace Microsoft.NET.WebAssembly.Webcil;
+
+internal class WasmModuleReader : IDisposable
+{
+    public enum Section : byte
+    {
+        // order matters: enum values must match the WebAssembly spec
+        Custom,
+        Type,
+        Import,
+        Function,
+        Table,
+        Memory,
+        Global,
+        Export,
+        Start,
+        Element,
+        Code,
+        Data,
+        DataCount,
+    }
+
+    private readonly BinaryReader _reader;
+
+    private readonly Lazy<bool> _isWasmModule;
+
+    public bool IsWasmModule => _isWasmModule.Value;
+
+    public WasmModuleReader(Stream stream)
+    {
+        _reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
+        _isWasmModule = new Lazy<bool>(this.GetIsWasmModule);
+    }
+
+
+    public void Dispose()
+    {
+        Dispose(true);
+    }
+
+
+    protected virtual void Dispose(bool disposing)
+    {
+        if (disposing)
+        {
+            _reader.Dispose();
+        }
+    }
+
+    protected virtual bool VisitSection (Section sec, out bool shouldStop)
+    {
+        shouldStop = false;
+        return true;
+    }
+
+    private const uint WASM_MAGIC = 0x6d736100u; // "\0asm"
+
+    private bool GetIsWasmModule()
+    {
+        _reader.BaseStream.Seek(0, SeekOrigin.Begin);
+        try
+        {
+            uint magic = _reader.ReadUInt32();
+            if (magic == WASM_MAGIC)
+                return true;
+        } catch (EndOfStreamException) {}
+        return false;
+    }
+
+    public bool Visit()
+    {
+        if (!IsWasmModule)
+            return false;
+        _reader.BaseStream.Seek(4L, SeekOrigin.Begin); // skip magic
+
+        uint version = _reader.ReadUInt32();
+        if (version != 1)
+            return false;
+
+        bool success = true;
+        while (success) {
+            success = DoVisitSection (out bool shouldStop);
+            if (shouldStop)
+                break;
+        }
+        return success;
+    }
+
+    private bool DoVisitSection(out bool shouldStop)
+    {
+        shouldStop = false;
+        byte code = _reader.ReadByte();
+        Section section = (Section)code;
+        if (!Enum.IsDefined(typeof(Section), section))
+            return false;
+        uint sectionSize = ReadULEB128();
+
+        long savedPos = _reader.BaseStream.Position;
+        try
+        {
+            return VisitSection(section, out shouldStop);
+        }
+        finally
+        {
+            _reader.BaseStream.Seek(savedPos + (long)sectionSize, SeekOrigin.Begin);
+        }
+    }
+
+    protected uint ReadULEB128()
+    {
+        uint val = 0;
+        int shift = 0;
+        while (true)
+        {
+            byte b = _reader.ReadByte();
+            val |= (b & 0x7fu) << shift;
+            if ((b & 0x80u) == 0) break;
+            shift += 7;
+            if (shift >= 35)
+                throw new OverflowException();
+        }
+        return val;
+    }
+
+    protected bool TryReadPassiveDataSegment (out long segmentLength, out long segmentStart)
+    {
+        segmentLength = 0;
+        segmentStart = 0;
+        byte code = _reader.ReadByte();
+        if (code != 1)
+            return false; // not passive
+        segmentLength = ReadULEB128();
+        segmentStart = _reader.BaseStream.Position;
+        // skip over the data
+        _reader.BaseStream.Seek (segmentLength, SeekOrigin.Current);
+        return true;
+    }
+}
```
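`WasmModuleReader` walks a module by reading each section id and its ULEB128 size and handing the section to the overridable `VisitSection`. It then seeks past the section no matter how much the override consumed. Below is the same loop sketched in JavaScript over an in-memory buffer. `visitSections` and its callback signature are illustrative only. One deliberate difference: this version leaves the loop when it reaches the end of the buffer. The C# loop instead relies on a `VisitSection` override setting `shouldStop`, or on an unknown section id, because `BinaryReader.ReadByte` throws `EndOfStreamException` at the end of the stream.

```js
// Sketch of WasmModuleReader's visiting loop over an in-memory module.
// The callback receives the section name and its contents and returns true
// to stop early, loosely mirroring VisitSection's `shouldStop` out parameter.

const SECTION_NAMES = ["Custom", "Type", "Import", "Function", "Table", "Memory",
  "Global", "Export", "Start", "Element", "Code", "Data", "DataCount"];

// Same 5-byte limit as ReadULEB128 in the C# reader. Returns [value, nextPos].
function readULEB128(bytes, pos) {
  let value = 0;
  for (let shift = 0; shift < 35; shift += 7) {
    if (pos >= bytes.length) throw new Error("truncated LEB128");
    const b = bytes[pos++];
    value += (b & 0x7f) * 2 ** shift;
    if ((b & 0x80) === 0) return [value, pos];
  }
  throw new Error("LEB128 value too large for u32");
}

function visitSections(bytes, visit) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (bytes.length < 8 || view.getUint32(0, true) !== 0x6d736100) return false; // "\0asm"
  if (view.getUint32(4, true) !== 1) return false; // binary format version
  let pos = 8;
  while (pos < bytes.length) {
    const id = bytes[pos++];
    if (id >= SECTION_NAMES.length) return false; // unknown section id
    let size;
    [size, pos] = readULEB128(bytes, pos);
    const stop = visit(SECTION_NAMES[id], bytes.subarray(pos, pos + size));
    pos += size; // resume after the section no matter what the callback read
    if (stop) break;
  }
  return true;
}
```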

src/libraries/Microsoft.NET.WebAssembly.Webcil/src/Webcil/WebcilConverter.cs

Lines changed: 23 additions & 1 deletion
```diff
@@ -42,6 +42,8 @@ FilePosition SectionStart
 
     private string InputPath => _inputPath;
 
+    public bool WrapInWebAssembly { get; set; } = true;
+
     private WebcilConverter(string inputPath, string outputPath)
     {
         _inputPath = inputPath;
```
```diff
@@ -62,6 +64,26 @@ public void ConvertToWebcil()
         }
 
         using var outputStream = File.Open(_outputPath, FileMode.Create, FileAccess.Write);
+        if (!WrapInWebAssembly)
+        {
+            WriteConversionTo(outputStream, inputStream, peInfo, wcInfo);
+        }
+        else
+        {
+            // if wrapping in WASM, write the webcil payload to memory because we need to discover the length
+
+            // webcil is about the same size as the PE file
+            using var memoryStream = new MemoryStream(checked((int)inputStream.Length));
+            WriteConversionTo(memoryStream, inputStream, peInfo, wcInfo);
+            memoryStream.Flush();
+            var wrapper = new WebcilWasmWrapper(memoryStream);
+            memoryStream.Seek(0, SeekOrigin.Begin);
+            wrapper.WriteWasmWrappedWebcil(outputStream);
+        }
+    }
+
+    public void WriteConversionTo(Stream outputStream, FileStream inputStream, PEFileInfo peInfo, WCFileInfo wcInfo)
+    {
         WriteHeader(outputStream, wcInfo.Header);
         WriteSectionHeaders(outputStream, wcInfo.SectionHeaders);
         CopySections(outputStream, inputStream, peInfo.SectionHeaders);
```
```diff
@@ -210,7 +232,7 @@ private static void WriteStructure<T>(Stream s, T structure)
     }
 #endif
 
-    private static void CopySections(FileStream outStream, FileStream inputStream, ImmutableArray<SectionHeader> peSections)
+    private static void CopySections(Stream outStream, FileStream inputStream, ImmutableArray<SectionHeader> peSections)
     {
         // endianness: ok, we're just copying from one stream to another
         foreach (var peHeader in peSections)
```
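`ConvertToWebcil` buffers the payload in a `MemoryStream` because the wrapper needs the payload length before it can write the data section. The length determines the widths of the ULEB128 section size and of segment 1's length prefix. Those widths decide where the payload bytes land, which in turn decides how much padding segment 0 needs to satisfy the spec's 4-byte alignment rule. The sketch below shows one way to compute this. `buildDataSection` and its `prefixLength` parameter (the number of module bytes before the data section) are hypothetical, and `WebcilWasmWrapper` may do this differently.

```js
// Sketch: encode a data section holding two passive segments (size, payload)
// and pad segment 0 so that the payload starts on a 4-byte boundary.
// prefixLength = number of module bytes before the data section (assumed known).

function encodeULEB128(value) {
  const out = [];
  do {
    let b = value % 128;
    value = Math.floor(value / 128);
    if (value !== 0) b |= 0x80; // more bytes follow
    out.push(b);
  } while (value !== 0);
  return out;
}

function u32le(n) {
  return [n & 0xff, (n >>> 8) & 0xff, (n >>> 16) & 0xff, (n >>> 24) & 0xff];
}

function buildDataSection(payload, prefixLength) {
  // One more pad byte normally moves the payload by one byte, but by two when
  // the section-size LEB128 grows a byte; 8 candidates cover that single jump.
  for (let pad = 0; pad < 8; pad++) {
    const seg0 = [...u32le(payload.length), ...new Array(pad).fill(0)];
    const body = [
      ...encodeULEB128(2),                          // segment count
      0x01, ...encodeULEB128(seg0.length), ...seg0, // passive segment 0: size + padding
      0x01, ...encodeULEB128(payload.length),       // passive segment 1 header
    ];
    const header = [0x0b, ...encodeULEB128(body.length + payload.length)]; // id 11 + size
    const payloadOffset = prefixLength + header.length + body.length;
    if (payloadOffset % 4 !== 0) continue;
    const section = new Uint8Array(header.length + body.length + payload.length);
    section.set(header, 0);
    section.set(body, header.length);
    section.set(payload, header.length + body.length); // payload is never spread
    return { section, payloadOffset };
  }
  throw new Error("no padding in 0..7 aligned the payload");
}
```

Because the prefix and suffix are constant (per the spec's first rationale), `prefixLength` is a fixed number for a given wrapper version. Only the payload length varies between assemblies.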
