Strings in WebAssembly

#frontend#tech

Reading and writing strings in WebAssembly (WASM) is more complicated than working with numbers. At the time of writing, WASM functions only accept and return numeric values (i32/i64/f32/f64), which means we can't pass a string directly into a WASM function and get a string back. Fortunately, we can still work with strings by accessing the module's memory directly. I couldn't find great documentation on WASM memory access on MDN, so I'm documenting it here in case anyone else goes looking for it.

Let's assume we want to write a WASM function in Zig that takes an input string argument and returns an output string. Let's start with a basic WASM example using integers.

export fn getData(a: usize) usize {
    return if (a == 0) 0 else a - 1;
}

To compile it:

zig build-exe wasm.zig -target wasm32-freestanding -fno-entry -femit-bin=get_data.wasm --export=getData

Then to call this binary from the browser:

const source = fetch("https://example.com/get_data.wasm");

WebAssembly.instantiateStreaming(source).then((mod) => {
  const getData = mod.instance.exports.getData;
  console.log(getData(101)); // returns 100
});

To work with strings, we first need to understand how they are stored in memory. In most programming languages, a string is a contiguous array of characters, and each character can be encoded as one or more integers (with UTF-8, ASCII characters take one byte each and other characters take two to four). Once we have the string as an array of integers, we can read and write those integers in WASM memory. Putting this all together, the basic flow for using strings in a WASM module looks like this:

  1. JS gets the input string and writes it to WASM memory
  2. JS calls the WASM module's export with the input string's memory address and byte length
  3. The WASM module reads the input string from memory using the address and length
  4. The WASM module writes the output string to memory and returns the output length
  5. JS reads the output string from WASM memory using the address and output length
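To see the "characters as integers" idea from the steps above concretely, here is a small sketch using the standard TextEncoder and TextDecoder (available in browsers, Deno and Node). TextEncoder always produces UTF-8, so ASCII characters become one byte each while a character like "é" needs two, which is why the length passed to WASM should be a byte count rather than the JavaScript string's length:

```javascript
const encoder = new TextEncoder(); // always encodes to UTF-8
const decoder = new TextDecoder(); // decodes UTF-8 by default

// Each ASCII character becomes one byte; "é" (U+00E9) needs two.
const bytes = encoder.encode("Hi, é");
console.log(bytes.join(", ")); // 72, 105, 44, 32, 195, 169
console.log("Hi, é".length, bytes.length); // 5 6

// Decoding the same integers gives the original string back.
console.log(decoder.decode(bytes)); // Hi, é
```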

We access WASM memory via the instantiated module's exports. The Zig build above exports the module's linear memory as memory by default (in general, a WASM module can also import its memory or not export it at all). This export is a WebAssembly.Memory object whose buffer is an ArrayBuffer covering the module's memory, so we can use a TextEncoder to write the input string into it as UTF-8 bytes. On the Zig side, a many-item pointer refers to the string's memory address; we can turn it into a slice and use it to build whatever output string we'd like. Once we have an output string, we write it to the same memory address and return the output length to JS. Finally, JS uses a TextDecoder to decode those bytes back into a string.
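If you're unsure whether a given binary exports its memory (and under what name), WebAssembly.Module.exports() lists a compiled module's exports without instantiating it. A minimal sketch, using a tiny hand-assembled module for illustration (an assumption; in practice you'd compile get_data.wasm) whose only content is one exported memory of a single page:

```javascript
// A tiny hand-assembled module (for illustration only): its sole content is
// one linear memory with an initial size of 1 page, exported as "memory".
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x05, 0x03, 0x01, 0x00, 0x01, // memory section: 1 memory, min 1 page, no max
  0x07, 0x0a, 0x01, // export section: 10 bytes long, 1 export
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, // export name "memory" (6 bytes)
  0x02, 0x00, // export kind: memory, index 0
]);

// Compile synchronously (fine for a module this small) and list its exports.
const wasmModule = new WebAssembly.Module(bytes);
for (const { name, kind } of WebAssembly.Module.exports(wasmModule)) {
  console.log(name, kind); // memory memory
}

// After instantiation the export is a WebAssembly.Memory whose buffer is
// an ArrayBuffer over the module's linear memory.
const instance = new WebAssembly.Instance(wasmModule);
console.log(instance.exports.memory instanceof WebAssembly.Memory); // true
console.log(instance.exports.memory.buffer.byteLength); // 65536
```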

Here's what this looks like in practice:

export fn getData(input_addr: [*]u8, input_len: usize) usize {
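    // convert the many-item pointer to a slice to make things simpler
    const input = input_addr[0..input_len];
    _ = input; // Zig rejects unused locals and parameters; remove this line once you use input

    // do something with the input and create the output
    const output = "Hello, JS!";

    // write the output to memory starting at the input address (this overwrites the input)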
    for (output, 0..) |c, i| {
        input_addr[i] = c;
    }
    return output.len;
}

And here's the JavaScript side:

// instantiateStreaming() requires the .wasm file to be served with "Content-Type: application/wasm"
const source = fetch("https://example.com/get_data.wasm");

// or, in Deno, build the Response yourself with the right header
// const source = new Response(
//   await Deno.readFile("get_data.wasm"),
//   { headers: { "Content-type": "application/wasm" } },
// );

WebAssembly.instantiateStreaming(source).then((mod) => {
  const memory = mod.instance.exports.memory;
  const getData = mod.instance.exports.getData;

  const input = "Hello, WASM!";

  const memoryView = new Uint8Array(memory.buffer);
  const { written: memoryInputLength } = new TextEncoder().encodeInto(
    input,
    memoryView,
  );

  // the input starts at address 0; this assumes nothing else in the module uses the start of memory
  const outputLength = getData(0, memoryInputLength);

  const outputView = new Uint8Array(memory.buffer, 0, outputLength);
  const output = new TextDecoder().decode(outputView);

  console.log(output); // Hello, JS!
});
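The example above relies on the input fitting in memory: encodeInto() stops silently when the destination view is full, and its written count is a UTF-8 byte count that only equals input.length for ASCII text. A slightly more defensive version of both halves might look like this; it's a sketch that assumes a standalone WebAssembly.Memory in place of mod.instance.exports.memory, and writeString and readString are hypothetical helper names:

```javascript
// Standalone memory standing in for mod.instance.exports.memory.
const memory = new WebAssembly.Memory({ initial: 1 });

// Hypothetical helper: encode str as UTF-8 at offset and return the byte
// count, which is the length the WASM export should receive.
function writeString(mem, offset, str) {
  const view = new Uint8Array(mem.buffer, offset);
  const { read, written } = new TextEncoder().encodeInto(str, view);
  // encodeInto stops silently when the view is full, so check that the
  // whole string (all of its UTF-16 code units) was consumed.
  if (read !== str.length) {
    throw new RangeError(`only ${read} of ${str.length} code units fit`);
  }
  return written;
}

// Hypothetical helper: decode byteLength UTF-8 bytes starting at offset.
function readString(mem, offset, byteLength) {
  const view = new Uint8Array(mem.buffer, offset, byteLength);
  return new TextDecoder().decode(view);
}

const text = "Grüße, WASM!";
const byteLength = writeString(memory, 0, text);
console.log(text.length, byteLength); // 12 14
console.log(readString(memory, 0, byteLength)); // Grüße, WASM!
```

With these helpers, the call in the example above would become getData(0, writeString(memory, 0, input)), and the result would be read back with readString(memory, 0, outputLength).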

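A WASM memory page is 64 KiB, and I'm assuming the input is small enough to fit in the memory the module already has. If you need more room, you can call memory.grow() to allocate more pages. Note that growing detaches the old memory.buffer, so any Uint8Array views created before the call have to be recreated afterwards.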
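One option for larger inputs is to grow the memory from JS before writing. The sketch below assumes a standalone WebAssembly.Memory in place of the module's export, and ensureCapacity is a hypothetical helper. It sizes the memory for the worst case, since UTF-8 needs at most 3 bytes per UTF-16 code unit. memory.grow() takes a number of pages, returns the previous page count, and throws a RangeError if the memory's maximum would be exceeded.

```javascript
const PAGE_SIZE = 64 * 1024; // one WASM page is 64 KiB

// Standalone memory standing in for the module's exported memory.
const memory = new WebAssembly.Memory({ initial: 1 });

// Hypothetical helper: make sure memory has at least bytesNeeded bytes.
function ensureCapacity(mem, bytesNeeded) {
  const missing = bytesNeeded - mem.buffer.byteLength;
  if (missing > 0) {
    mem.grow(Math.ceil(missing / PAGE_SIZE)); // grow() counts pages, not bytes
  }
}

const input = "x".repeat(100_000); // 100,000 bytes of ASCII, more than one page

// Worst case UTF-8 needs 3 bytes per UTF-16 code unit.
ensureCapacity(memory, input.length * 3);

// Create the view only after growing: grow() detaches the previous buffer.
const view = new Uint8Array(memory.buffer);
const { written } = new TextEncoder().encodeInto(input, view);
console.log(memory.buffer.byteLength / PAGE_SIZE, written); // 5 100000
```

Sizing for the worst case over-allocates for mostly-ASCII text; an alternative is to try encodeInto() first and grow only when its read count falls short of the string's length.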