Rework syscall ABI.

Syscalls now accept two parameters, allowing for things like "int count = read(buf, len)"
Rather than providing safe signatures for syscalls, the user is now given helper functions to safely parse incoming values, c-strings and slices.
This commit is contained in:
Toby Jaffey 2025-12-09 21:51:35 +00:00
parent f046a590c0
commit 76fba39a21
28 changed files with 543 additions and 537 deletions

104
README.md
View file

@ -56,7 +56,7 @@ uint8_t bytecode[] = { /* ... */ }; // some compiled bytecode
uvm32_state_t vmst; // execution state of the vm
uvm32_evt_t evt; // events passed from vm to host
uvm32_init(&vmst, NULL, 0); // setup vm and pass in handlers for host functions
uvm32_init(&vmst); // setup vm
uvm32_load(&vmst, bytecode, sizeof(bytecode)); // load the bytecode
uvm32_run(&vmst, &evt, 100); // run up to 100 instructions
@ -86,7 +86,7 @@ If the bytecode attempts to execute more instructions than the the passed value
## Internals
uvm32 emulates a RISC-V 32bit CPU using [mini-rv32ima](https://github.com/cnlohr/mini-rv32ima). All IO from vm bytecode to the host is performed using `ecall` syscalls. Each syscall provided by the host requires a unique syscall value. A syscall passes a single `uint32_t` from bytecode to the host and may receive a returned `uint32_t`. The host may treat the value as a pointer and modify memory.
uvm32 emulates a RISC-V 32bit CPU using [mini-rv32ima](https://github.com/cnlohr/mini-rv32ima). All IO from vm bytecode to the host is performed using `ecall` syscalls. Each syscall provided by the host requires a unique syscall value. A syscall passes two values and receives one on return.
uvm32 is always in one of 4 states, paused, running, ended or error.
@ -101,42 +101,48 @@ stateDiagram
## Boot
At boot, the whole memory is zeroed. The user program is placed at the start. The stack pointer is set to the start of the CPU registers and grows downwards. No heap region is setup and all code is in RAM.
At boot, the whole memory is zeroed. The user program is placed at the start. The stack pointer is set to the end of memory and grows downwards. No heap region is setup and all code is in RAM.
## syscall ABI
All communication between bytecode and the vm host is performed via syscalls.
To make a syscall, register `a7` is set with the syscall number (a `UVM32_SYSCALL_x`) and `a0` is set with the syscall parameter. The response is returned in `a1`.
To make a syscall, register `a7` is set with the syscall number (a `UVM32_SYSCALL_x`) and `a0`, `a1` are set with the syscall parameters. The response is returned in `a2`.
[target.h](common/uvm32_target.h#L12)
```c
static uint32_t syscall(uint32_t id, uint32_t param) {
register uint32_t a0 asm("a0") = (uint32_t)(param);
register uint32_t a1 asm("a1");
static uint32_t syscall(uint32_t id, uint32_t param1, uint32_t param2) {
register uint32_t a0 asm("a0") = (uint32_t)(param1);
register uint32_t a1 asm("a1") = (uint32_t)(param2);
register uint32_t a2 asm("a2");
register uint32_t a7 asm("a7") = (uint32_t)(id);
asm volatile (
"ecall"
: "=r"(a1) // output
: "r"(a0), "r"(a7) // input
: "=r"(a2) // output
: "r"(a7), "r"(a0), "r"(a1) // input
: "memory"
);
return a1;
return a2;
}
```
The [RISC-V SBI](https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/riscv-sbi.adoc) is not followed, a simpler approach is taken.
## syscalls
There are two system syscalls used by uvm32, `halt()` and `yield()`.
`halt()` tells the host that the program has ended normally. `yield()` tells the host that the program requires more instructions to be executed.
`halt()` tells the host that the program has ended normally. `yield()` tells the host that the program requires more instructions to be executed. These are handled internally to uvm32.
New syscalls can be added to the host via `uvm32_init()`.
Each syscall maps a syscall number to a value understood by the host (`F_PRINTD` below) and has an associated type which tells the host how to interpret the data passed to the syscall.
Syscalls are handled in the host by reading the syscall identifier, then using the provided functions to get arguments and set a return response. Direct access to the VM's memory space is not allowed, to avoid memory corruption issues.
The following functions are used to access syscall parameters safely:
uint32_t uvm32_getval(uvm32_state_t *vmst, uvm32_evt_t *evt, uvm32_arg_t);
const char *uvm32_getcstr(uvm32_state_t *vmst, uvm32_evt_t *evt, uvm32_arg_t);
void uvm32_setval(uvm32_state_t *vmst, uvm32_evt_t *evt, uvm32_arg_t, uint32_t val);
uvm32_evt_syscall_buf_t uvm32_getbuf(uvm32_state_t *vmst, uvm32_evt_t *evt, uvm32_arg_t argPtr, uvm32_arg_t argLen);
Here is a full example of a working VM host from [apps/host-mini](apps/host-mini)
@ -151,38 +157,28 @@ Here is a full example of a working VM host from [apps/host-mini](apps/host-mini
uint8_t rom[] = { // mandel.bin
0x23, 0x26, 0x11, 0x00, 0xef, 0x00, 0xc0, 0x00, 0xb7, 0x08, 0x00, 0x01,
0x73, 0x00, 0x00, 0x00, 0x37, 0xf5, 0xff, 0xff, 0xb7, 0x15, 0x00, 0x00,
0x37, 0xe6, 0xff, 0xff, 0x13, 0x07, 0xf0, 0x01, 0xb7, 0x47, 0x00, 0x00,
0xb7, 0x06, 0x00, 0x01, 0x13, 0x08, 0xd5, 0xcc, 0x93, 0x82, 0x35, 0x33,
0x13, 0x03, 0x76, 0xe6, 0x93, 0x83, 0x35, 0xb3, 0x13, 0x86, 0x16, 0x00,
0x63, 0xc8, 0x02, 0x09, 0x13, 0x0e, 0x03, 0x00, 0x63, 0xca, 0x63, 0x06,
0x93, 0x0e, 0x00, 0x00, 0x93, 0x05, 0x00, 0x00, 0x13, 0x0f, 0x00, 0x00,
0x93, 0x0f, 0x00, 0x00, 0x93, 0x06, 0x00, 0x02, 0x13, 0x85, 0x06, 0xfe,
0x63, 0x62, 0xa7, 0x04, 0x33, 0x85, 0xd5, 0x01, 0x63, 0xee, 0xa7, 0x02,
0x13, 0x05, 0x00, 0x00, 0x93, 0x08, 0x06, 0x00, 0x33, 0x8f, 0xef, 0x03,
0xb3, 0x8f, 0xd5, 0x41, 0x73, 0x00, 0x00, 0x00, 0x13, 0x5f, 0xbf, 0x40,
0xb3, 0x8f, 0xcf, 0x01, 0x33, 0x0f, 0x0f, 0x01, 0xb3, 0x85, 0xff, 0x03,
0x93, 0xd5, 0xc5, 0x00, 0x33, 0x05, 0xef, 0x03, 0x93, 0x5e, 0xc5, 0x00,
0x93, 0x86, 0x16, 0x00, 0x6f, 0xf0, 0xdf, 0xfb, 0x13, 0x85, 0x06, 0x00,
0x93, 0x08, 0x00, 0x00, 0x73, 0x00, 0x00, 0x00, 0x13, 0x0e, 0x1e, 0x09,
0xe3, 0xda, 0xc3, 0xf9, 0x13, 0x05, 0xa0, 0x00, 0x93, 0x08, 0x00, 0x00,
0x73, 0x00, 0x00, 0x00, 0x13, 0x08, 0x98, 0x19, 0xe3, 0xdc, 0x02, 0xf7,
0x37, 0x05, 0x00, 0x80, 0x13, 0x05, 0x05, 0x0e, 0x93, 0x08, 0x30, 0x00,
0x73, 0x00, 0x00, 0x00, 0x67, 0x80, 0x00, 0x00, 0x48, 0x65, 0x6c, 0x6c,
0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64, 0x00
};
// Create an identifier for our host handler
// syscalls exposed to vm environement
typedef enum {
F_PUTC,
F_PRINTLN,
} f_code_t;
// Map exposed syscalls to syscalls
const uvm32_mapping_t env[] = {
{ .syscall = UVM32_SYSCALL_PRINTLN, .typ = UVM32_SYSCALL_TYP_BUF_TERMINATED_WR, .code = F_PRINTLN },
{ .syscall = UVM32_SYSCALL_PUTC, .typ = UVM32_SYSCALL_TYP_U32_WR, .code = F_PUTC },
0x73, 0x00, 0x00, 0x00, 0x13, 0x01, 0x01, 0xff, 0x23, 0x26, 0x81, 0x00,
0x37, 0xf5, 0xff, 0xff, 0xb7, 0x15, 0x00, 0x00, 0x37, 0xe6, 0xff, 0xff,
0x13, 0x07, 0xf0, 0x01, 0xb7, 0x47, 0x00, 0x00, 0xb7, 0x06, 0x00, 0x01,
0x13, 0x08, 0xd5, 0xcc, 0x93, 0x82, 0x35, 0x33, 0x13, 0x03, 0x76, 0xe6,
0x93, 0x83, 0x35, 0xb3, 0x13, 0x86, 0x16, 0x00, 0x63, 0xce, 0x02, 0x09,
0x13, 0x0e, 0x03, 0x00, 0x63, 0xce, 0x63, 0x06, 0x93, 0x0f, 0x00, 0x00,
0x13, 0x0f, 0x00, 0x00, 0x13, 0x04, 0x00, 0x00, 0x93, 0x0e, 0x00, 0x00,
0x93, 0x06, 0x00, 0x02, 0x13, 0x85, 0x06, 0xfe, 0x63, 0x64, 0xa7, 0x04,
0x33, 0x05, 0xff, 0x01, 0x63, 0xe0, 0xa7, 0x04, 0x13, 0x05, 0x00, 0x00,
0x93, 0x05, 0x00, 0x00, 0x93, 0x08, 0x06, 0x00, 0xb3, 0x8e, 0x8e, 0x02,
0x33, 0x0f, 0xff, 0x41, 0x13, 0xd4, 0xbe, 0x40, 0xb3, 0x0e, 0xcf, 0x01,
0x73, 0x00, 0x00, 0x00, 0x33, 0x04, 0x04, 0x01, 0x33, 0x85, 0xde, 0x03,
0x13, 0x5f, 0xc5, 0x00, 0x33, 0x05, 0x84, 0x02, 0x93, 0x5f, 0xc5, 0x00,
0x93, 0x86, 0x16, 0x00, 0x6f, 0xf0, 0x9f, 0xfb, 0x13, 0x85, 0x06, 0x00,
0x93, 0x05, 0x00, 0x00, 0x93, 0x08, 0x00, 0x00, 0x73, 0x00, 0x00, 0x00,
0x13, 0x0e, 0x1e, 0x09, 0xe3, 0xd6, 0xc3, 0xf9, 0x13, 0x05, 0xa0, 0x00,
0x93, 0x05, 0x00, 0x00, 0x93, 0x08, 0x00, 0x00, 0x73, 0x00, 0x00, 0x00,
0x13, 0x08, 0x98, 0x19, 0xe3, 0xd6, 0x02, 0xf7, 0x37, 0x05, 0x00, 0x80,
0x13, 0x05, 0x05, 0x10, 0x93, 0x08, 0x30, 0x00, 0x93, 0x05, 0x00, 0x00,
0x73, 0x00, 0x00, 0x00, 0x03, 0x24, 0xc1, 0x00, 0x13, 0x01, 0x01, 0x01,
0x67, 0x80, 0x00, 0x00, 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f,
0x72, 0x6c, 0x64, 0x00
};
int main(int argc, char *argv[]) {
@ -190,7 +186,7 @@ int main(int argc, char *argv[]) {
uvm32_evt_t evt;
bool isrunning = true;
uvm32_init(&vmst, env, sizeof(env) / sizeof(env[0]));
uvm32_init(&vmst);
uvm32_load(&vmst, rom, sizeof(rom));
while(isrunning) {
@ -201,12 +197,16 @@ int main(int argc, char *argv[]) {
isrunning = false;
break;
case UVM32_EVT_SYSCALL: // vm has paused to handle UVM32_SYSCALL
switch((f_code_t)evt.data.syscall.code) {
case F_PUTC:
printf("%c", evt.data.syscall.val.u32);
switch(evt.data.syscall.code) {
case UVM32_SYSCALL_PUTC:
printf("%c", uvm32_getval(&vmst, &evt, ARG0));
break;
case F_PRINTLN:
printf("%.*s\n", evt.data.syscall.val.buf.len, evt.data.syscall.val.buf.ptr);
case UVM32_SYSCALL_PRINTLN: {
const char *str = uvm32_getcstr(&vmst, &evt, ARG0);
printf("%s\n", str);
} break;
default:
printf("Unhandled syscall 0x%08x\n", evt.data.syscall.code);
break;
}
break;