Conversation
3f51fdf to
f37bb13
Compare
f37bb13 to
d8d82b1
Compare
|
#21 concerns the system call overhead in kbox, which has been observed to be as high as 33× compared to native (bare-metal) execution. Addressing this overhead is a prerequisite for improving overall system performance. To this end, I plan to prioritize closing #21 by exploring a solution based on binary rewriting. Given current infrastructure constraints and to ensure efficient validation, the initial implementation will focus on x86-64 and arm64 architectures, both of which are fully supported by the machines provided through GitHub Actions. After the proposed solution has been thoroughly implemented and validated on these architectures, we can proceed to extend support to RISC-V. |
src/seccomp-defs.h
Outdated
| { \ | ||
| (unsigned short) (c), (unsigned char) (t), (unsigned char) (f), \ | ||
| (unsigned int) (val) \ | ||
| } |
There was a problem hiding this comment.
Looks like unrelated style changes?
Please keep the changes to a minimum.
Otherwise, this will make git blame messy.
There was a problem hiding this comment.
This change was made because the source code wasn't properly formatted with clang-format, which caused it to be blocked by the new commit-hook. Did you mean I have to separate it to another commit?
There was a problem hiding this comment.
Ideally, a single patch should contain only one logical change.
However, it looks like this patch is doing two things:
- Fixing a pre-existing clang-format error.
- Adding RISC-V support.
So, yes, I would prefer splitting this into two patches.
src/seccomp-supervisor.c
Outdated
| const struct kbox_host_nrs *host_nrs = &HOST_NRS_X86_64; | ||
| #elif defined(__aarch64__) | ||
| #elif defined(__aarch64__) || (defined(__riscv) && __riscv_xlen == 64) | ||
| const struct kbox_host_nrs *host_nrs = &HOST_NRS_AARCH64; |
There was a problem hiding this comment.
Reusing the HOST_NRS_AARCH64 struct directly for RISC-V is semantically confusing for future maintenance. Maybe consider adding a preparatory refactoring patch to rename it to something more generic before wiring up RISC-V support?
d8d82b1 to
0000234
Compare
|
I have implemented the seccomp mode for riscv64 in this commit, and trap mode and rewrite mode are left unimplemented. This can be build with Or using cross-compile toolchain on the x86-64 machine: The seccomp mode is tested for all architectures(aarch64, x86-64, riscv64), and the result is as same as the result before this commit: If the trap mode is specified when running on riscv64 architecture, the following error message will be shown currently: I am currently working on the trap and rewrite modes for riscv64. Would you prefer I include them in this PR, or should I submit them in a follow-up PR? |
0000de4 to
0000846
Compare
|
@jserv I have implemented both the seccomp mode and the trap mode for riscv64 architecture and update the unit test. Also, I have run the unit tests, integration tests and stress tests for riscv64 architecture. For the unit test, all of the 276 tests passed. For the integration tests, there are two FAILs, signal-test and rewrite-smoke. For the stress tests, there is one FAIL, concurrent-io. The signal-test and concurrent-io failed before these commits (Even on x86-64 machine), so it should be fixed in another PR. The reason that rewrite-smoke failed is that the rewrite mode is unimplemented for riscv64. For not making this PR too big, I think it should be included in another PR. My test process is as follows. $ make ARCH=riscv64 CC=riscv64-linux-gnu-gcc
$ make ARCH=riscv64 CC=riscv64-linux-gnu-gcc check-unitPut $ ./run-tests.sh ./kbox alpine.ext4Use the following command to run the stress tests: $ ./run-stress.sh ./kbox alpine.ext4 |
Agree. Update docs/ directory accordingly. |
0000ef3 to
00007be
Compare
src/syscall-trap.c
Outdated
| int64_t kbox_syscall_trap_host_rt_sigprocmask_unblock(const uint64_t *mask, | ||
| size_t sigset_size) | ||
| { | ||
| } |
There was a problem hiding this comment.
Won't these empty functions trigger compile warnings for non-void return types but omitting the return statements?
There was a problem hiding this comment.
These functions will never be called at that commit and was just implemented to avoid error during link stage.
In the commit I rebased just now, the warnings are fixed by returning -1 for every function.
| p_align = read_le64(buf + off + P_ALIGN_OFF); | ||
|
|
||
| if (p_filesz > p_memsz) | ||
| if (p_memsz && p_filesz > p_memsz) |
There was a problem hiding this comment.
Looks like a drive-by fix that is not mentioned in the commit message?
There was a problem hiding this comment.
This is because only those PT_LOAD sections which has non-zero p_memsz will be loaded into the memory. However, the p_filesz > p_memsz will be checked for every section. Thus, those sections with p_memsz (,such as .riscv.attributes,) will make it fail. This checks whether the p_memsz is zero before comparing the p_filesz and the p_memsz.
I have added this to the commit message.
There was a problem hiding this comment.
I'm not an expert in the ELF parser area, but based on reading the commit message, I have the following questions:
-
Is this a riscv specific issue, or has it always existed? From the new commit message, it isn't immediately clear to me why this problem is unique to riscv.
-
It looks very much like this commit is doing two separate things:
a) Fixing a bug
b) Adding support for a new feature
Should these be split into separate commits? If not, could you please explain the reasoning behind keeping them together?
There was a problem hiding this comment.
Is this a riscv specific issue, or has it always existed?
It is not a riscv specific issue. However, so far, it has only been triggered by riscv busybox.
The things that kbox_build_elf_load_plan do are the following two things:
- Parse flags in non-LOAD sections
- Find out all the LOAD sections, and output their informations (such as base and size)
All the non-LOAD sections are filtered out with the following check[1]:
if (p_type != PT_LOAD || p_memsz == 0)
continue;
However, all the sections (,including non-LOAD sections,) will be checked with the following if statement[2]:
if (p_filesz > p_memsz)
return -1;
Thus, if there is any non-LOAD sections with p_memsz equals to 0 and p_filesz bigger than 0, the kbox_build_elf_load_plan will fail. The reason why this error was not found is this kind of section was only found in riscv so far.
It looks very much like this commit is doing two separate things
OK, I separated them in rebased commits.
[1] https://github.com/sysprog21/kbox/blob/main/src/elf.c#L452C9-L453C22
[2] https://github.com/sysprog21/kbox/blob/main/src/elf.c#L420
Here are some additional descriptions about ELF sections.
A possible question is:
Why there are some non-LOAD sections whose
p_memszare non-zero
This is because they will actually be loaded into the memory. Using the following commands can see the informations about sections:
$ readelf -l busybox
...
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x0000000000000230 0x0000000000000230 R 0x8
...
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000000c59a0 0x00000000000c59a0 R E 0x1000
It is found that a section with type PHDR is a non-LOAD section with memsz greater than 0, and its range is totally included by the LOAD section below it. This is the reason why the PHDR section should be loaded into the memory but it was skipped during kbox_build_elf_load_plan.
This introduces the riscv64 architecture to kbox and just implements seccomp mode first. The trap mode and the rewrite mode are left unimplemented at this moment. Change-Id: Ie95e7d74dbef45a05df8c81c37362b9d119520c4
00007be to
0000254
Compare
Previously while ELF parsing, for every sections, it will check if its size in file is less or equals to its size in memory. However, for those sections that don't have to be loaded into the memory (,such as .riscv.attributes), it will fail. This fixs it by checking if that section have to be loaded into the memory before comparing the filesz and memsz. Change-Id: I4cd6cd33c0c1681cbfc02a563c8f42e74726a1d3
This implements the trap mode for the riscv64 architecture. The rewrite mode is left unimplemented at this moment. Change-Id: Icadc0741b8405b5ee4a39fc803ee11d55cf6f238
The are some arch-related implementations in the unit test. This updates them. Change-Id: Ic97b0639aa70f4499d6f5e4b95ff6b512baf0db1
The rewrite mode of riscv64 is currently not supported. This commit updates the README and the document to state this limitation. Change-Id: I7cf2ef5dc7e6a01547b9a8a31a25dba600db74fc
0000254 to
0000392
Compare
This PR did the following things:
According to the kernel document[1], syscalls are specified in
arch/*/kernel/Makefile.syscallsandscripts/syscall.tbl.It is found that all the aarch64 host syscalls used in kbox has either common, 64 or rlimit tag in
scripts/syscall.tbl[2], which means riscv64 share the same host syscall table.Also, there are two errors found in the host syscall table. (Also according to
scripts/syscall.tbl[2]):The build script of rootfs is updated, so the following command can produce a usable alpine.ext4:
For the
fetch-lkl.shscript, the riscv64 support is added. However, the pre-built lkl binary has to be updated on GitHub (The github action has to be updated), so for now, it can only be compiled by explicitly assigning the LKL path in the config. The LKL build process is as follows (on x86-64 ubuntu 24.04):With the LKL path set in the config, the following command can successfully build the kbox:
The compiled kbox is successfully run on QEMU and execute /bin/sh.[3]
[1] https://docs.kernel.org/process/adding-syscalls.html#since-6-11
[2] https://github.com/torvalds/linux/blob/master/scripts/syscall.tbl
[3] https://hackmd.io/@rota1001/kbox-rv64#Run-it-on-riscv64
Summary by cubic
Add riscv64 support for seccomp and trap modes by sharing a generic 64-bit host syscall table with
aarch64, wiring up loader entry/transfer, and documenting that rewrite mode isn’t supported on riscv64.New Features
HOST_NRS_GENERIC; reuse theaarch64BPF deny list; set audit arch0xc00000f3; reportuname -mas "riscv64"; add loader entry/transfer; implement trap syscall6 and fast paths forfutexwait/wake,exit_group,execve/execveat,clone/clone3, andrt_sigprocmaskunblock.riscv64SHA256; pass-Wl,--no-relaxto speed riscv64 linking; listriscv64in README; note riscv64 rewrite-mode limitation in README and architecture docs.Bug Fixes
p_memszis non-zero and smaller thanp_filesz.HOST_NRS_GENERICfor 64-bit; definepidfd_*,faccessat2, andstatxfor riscv64; correctaarch64syscall numbers; update unit tests for riscv64 trap decoding, syscall IP ranges, runtime paths, and result handling.Written for commit 0000392. Summary will update on new commits.