boot: convert paging from assembly to Zig (take 2) #23

Merged May 3, 2024 (54 commits)

mewmew (Collaborator) commented Apr 15, 2024

The main focus of this PR is to make it easier to work with paging.

For this reason, the kernel code and data segments are treated separately by dedicated functions, and each function has a start index (e.g. P2_KERNEL_STACK_FIRST_INDEX) to make it easier to rearrange the relative order of page mappings in the future.
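
As a rough illustration of the start-index idea, the sketch below converts between a P2 index and the address of the page it maps. It assumes 2 MiB pages (consistent with the 0x200000 granularity of the dumps in this thread). The index values, `P2_KERNEL_CODE_FIRST_INDEX`, and the helper names are hypothetical and not taken from the PR.

```python
# Sketch only: index values are hypothetical, not the ones used in Lappis.
PAGE_SIZE_2M = 2 * 1024 * 1024  # one P2 entry maps a 2 MiB huge page

# One start index per mapping function; reordering the mappings then only
# requires changing these constants.
P2_KERNEL_CODE_FIRST_INDEX = 32   # hypothetical name and value
P2_KERNEL_STACK_FIRST_INDEX = 35  # name from the PR, value hypothetical


def p2_index_to_addr(index: int) -> int:
    """Return the address at which the page of the given P2 entry starts."""
    return index * PAGE_SIZE_2M


def addr_to_p2_index(addr: int) -> int:
    """Return the P2 index of the 2 MiB page containing addr."""
    return addr // PAGE_SIZE_2M
```

With these assumed values, `p2_index_to_addr(P2_KERNEL_CODE_FIRST_INDEX)` evaluates to `0x4000000`.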

This PR supersedes #19.

mewmew (Collaborator, Author) commented Apr 22, 2024

Zig 0.12 managed to cause a rockslide in Lappis, hehe; the boot-asm rewrite no longer works.

mewmew (Collaborator, Author) commented Apr 27, 2024

Userland is now working with the following paging table.
In particular, user accessibility has now been removed from the kernel code/data and stack segments.

gef➤  pt -phys_verbose
             Address :         Length :         Phys | Permissions          
                 0x0 :      0x4000000 :          0x0 | W:1 X:1 S:1 UC:0 WB:1
           0x4000000 :       0x200000 :    0x4000000 | W:0 X:0 S:1 UC:0 WB:1
           0x4200000 :       0x400000 :    0x4200000 | W:1 X:0 S:1 UC:0 WB:1
           0x4600000 :       0x200000 :    0x4600000 | W:0 X:0 S:1 UC:0 WB:1
           0x4800000 :       0x200000 :    0x4800000 | W:0 X:1 S:0 UC:0 WB:1
           0x4a00000 :      0xfc00000 :    0x4a00000 | W:1 X:1 S:0 UC:0 WB:1
          0x14600000 :       0x200000 :   0x14600000 | W:0 X:1 S:0 UC:0 WB:1
          0x14800000 :       0x600000 :   0xfd000000 | W:1 X:0 S:1 UC:0 WB:1
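
A table like the one above can be checked mechanically. The following is a minimal sketch, assuming the row format shown here (address, length, physical address, then `KEY:VALUE` permission pairs). The function names are hypothetical and it does not use any gef API. It lists regions that are both writable and executable.

```python
import re

# Matches one data row of the table, for example:
#   0x4000000 : 0x200000 : 0x4000000 | W:0 X:0 S:1 UC:0 WB:1
ROW_RE = re.compile(
    r"^\s*(0x[0-9a-fA-F]+)\s*:\s*(0x[0-9a-fA-F]+)\s*:\s*(0x[0-9a-fA-F]+)\s*\|\s*(.*)$"
)


def parse_row(line):
    """Parse one row into (virt, length, phys, flags), or return None."""
    m = ROW_RE.match(line)
    if m is None:
        return None  # header line or anything else that is not a data row
    virt, length, phys = (int(g, 16) for g in m.group(1, 2, 3))
    flags = {k: int(v) for k, v in (f.split(":") for f in m.group(4).split())}
    return virt, length, phys, flags


def writable_and_executable(lines):
    """Yield (virt, length) for every region marked both W:1 and X:1."""
    for line in lines:
        row = parse_row(line)
        if row and row[3]["W"] == 1 and row[3]["X"] == 1:
            yield row[0], row[1]
```

Fed the eight rows above, this reports the regions starting at 0x0 and 0x4a00000.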

mewmew (Collaborator, Author) commented Apr 27, 2024

CI build requires lld. From https://github.com/karlek/lappis/actions/runs/8858873900/job/24328021167?pr=23#step:9:41:

make: ld.lld: No such file or directory

karlek (Owner) commented Apr 29, 2024

I'll fix the CI build and then merge after a short review.

Run the equivalent of `zig fmt` to automate the update.
Note: used gdb to get the latest hardcoded value of tss_addr xD

This was needed for debugging before the jump to long mode was
fully functional.

This was used to switch from 16-bit segmented mode to
32-bit protected mode.

As Lappis is now using Multiboot, the switch from 16-bit to
32-bit mode is done by the Multiboot boot loader.

This was used for debug printing in 16-bit mode.

As Lappis is now using Multiboot, the boot loader handles the switch
from 16-bit to 32-bit mode.

Run the equivalent of `zig fmt` to automate the update.
Also, fix build for Zig version 0.11.0

And bump hardcoded tss64 address value.
Prior to this commit, set_up_page_tables was used to both initialize the
page table tree mapping and to map kernel code and data segment pages.

Split out the latter two into dedicated functions.
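
The split described above can be pictured with a small model. This is a sketch under stated assumptions, not the code in `paging_32.zig`: it models a single P2 table as a Python list, identity-maps 2 MiB pages, and uses the standard x86-64 entry bits. All function names are hypothetical.

```python
# Standard x86-64 page-table entry bits.
PRESENT = 1 << 0
WRITABLE = 1 << 1
HUGE = 1 << 7          # PS bit: the P2 entry maps a 2 MiB page directly
NO_EXECUTE = 1 << 63   # only honored when EFER.NXE is enabled

PAGE_SIZE_2M = 2 * 1024 * 1024
ENTRIES_PER_TABLE = 512


def init_page_tables():
    """Tree initialisation step, reduced here to one empty P2 table."""
    return [0] * ENTRIES_PER_TABLE


def map_identity(p2, first_index, count, flags):
    """Identity-map `count` 2 MiB pages starting at P2 index `first_index`."""
    for i in range(first_index, first_index + count):
        p2[i] = (i * PAGE_SIZE_2M) | flags | PRESENT | HUGE


def map_kernel_code(p2, first_index, count):
    # Code pages: read-only and executable.
    map_identity(p2, first_index, count, 0)


def map_kernel_data(p2, first_index, count):
    # Data pages: writable and not executable.
    map_identity(p2, first_index, count, WRITABLE | NO_EXECUTE)
```

Keeping the code and data mappings in separate functions means each one only needs its own start index and flag set.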

The foo_32.zig files are compiled for a 32-bit architecture, and
functions are exported in the object file using the `export` keyword.

To avoid re-exporting functions and facilitate re-use of declared
constants, move paging constants to their own source file and
include this source file where needed.

The absence of the NX bit when using the ld linker exposed
a bug in the ld linker, as further described in [1].
In particular, this bug left the OS fully functional,
but with pages loaded as RWX even though they were supposed
to have the NX bit set. Very sneaky, and almost too good to
be true, in the sense of "trusting trust" kind of bugs in
compilers and linkers.

In either case, future commits will create dedicated pages
for userland code and data segments, and then make the
kernel data segment NX again.

Prior to this commit, `info tlb` gave the following:

	(qemu) info tlb
	0000000000000000: 0000000000000000 --PDA--UW
	0000000000200000: 0000000000200000 --PDA--UW
	0000000000400000: 0000000000400000 --PDA--UW
	0000000000600000: 0000000000600000 --PDA--UW
	0000000000800000: 0000000000800000 --PDA--UW
	0000000000a00000: 0000000000a00000 --PDA--UW
	0000000000c00000: 0000000000c00000 --PDA--UW
	0000000000e00000: 0000000000e00000 --PDA--UW
	0000000001000000: 0000000001000000 --P----UW
	0000000001200000: 0000000001200000 --P----UW
	0000000001400000: 0000000001400000 --P----UW
	0000000001600000: 0000000001600000 --P----UW
	0000000001800000: 0000000001800000 --P----UW
	0000000001a00000: 0000000001a00000 --P----UW
	0000000001c00000: 0000000001c00000 --P----UW
	0000000001e00000: 0000000001e00000 --P----UW
	0000000002000000: 0000000002000000 --P----UW
	0000000002200000: 0000000002200000 --P----UW
	0000000002400000: 0000000002400000 --P----UW
	0000000002600000: 0000000002600000 --P----UW
	0000000002800000: 0000000002800000 --P----UW
	0000000002a00000: 0000000002a00000 --P----UW
	0000000002c00000: 0000000002c00000 --P----UW
	0000000002e00000: 0000000002e00000 --P----UW
	0000000003000000: 0000000003000000 --P----UW
	0000000003200000: 0000000003200000 --P----UW
	0000000003400000: 0000000003400000 --P----UW
	0000000003600000: 0000000003600000 --P----UW
	0000000003800000: 0000000003800000 --P----UW
	0000000003a00000: 0000000003a00000 --P----UW
	0000000003c00000: 0000000003c00000 --P----UW
	0000000003e00000: 0000000003e00000 --P----UW
	0000000004000000: 0000000004000000 --P----U-
	0000000004200000: 0000000004200000 --P----UW
	0000000004400000: 0000000004400000 --PDA--UW
	0000000004600000: 0000000004600000 --P----U-
	0000000004800000: 00000000fd000000 --PDA--UW
	0000000004a00000: 00000000fd200000 --PDA--UW
	0000000004c00000: 00000000fd400000 --PDA--UW

And after the commit, `info tlb` gives the following:

	(qemu) info tlb
	0000000000000000: 0000000000000000 --PDA--UW
	0000000000200000: 0000000000200000 --PDA--UW
	0000000000400000: 0000000000400000 --PDA--UW
	0000000000600000: 0000000000600000 --PDA--UW
	0000000000800000: 0000000000800000 --PDA--UW
	0000000000a00000: 0000000000a00000 --PDA--UW
	0000000000c00000: 0000000000c00000 --PDA--UW
	0000000000e00000: 0000000000e00000 --PDA--UW
	0000000001000000: 0000000001000000 --P----UW
	0000000001200000: 0000000001200000 --P----UW
	0000000001400000: 0000000001400000 --P----UW
	0000000001600000: 0000000001600000 --P----UW
	0000000001800000: 0000000001800000 --P----UW
	0000000001a00000: 0000000001a00000 --P----UW
	0000000001c00000: 0000000001c00000 --P----UW
	0000000001e00000: 0000000001e00000 --P----UW
	0000000002000000: 0000000002000000 --P----UW
	0000000002200000: 0000000002200000 --P----UW
	0000000002400000: 0000000002400000 --P----UW
	0000000002600000: 0000000002600000 --P----UW
	0000000002800000: 0000000002800000 --P----UW
	0000000002a00000: 0000000002a00000 --P----UW
	0000000002c00000: 0000000002c00000 --P----UW
	0000000002e00000: 0000000002e00000 --P----UW
	0000000003000000: 0000000003000000 --P----UW
	0000000003200000: 0000000003200000 --P----UW
	0000000003400000: 0000000003400000 --P----UW
	0000000003600000: 0000000003600000 --P----UW
	0000000003800000: 0000000003800000 --P----UW
	0000000003a00000: 0000000003a00000 --P----UW
	0000000003c00000: 0000000003c00000 --P----UW
	0000000003e00000: 0000000003e00000 --P----UW
	0000000004000000: 0000000004000000 X-P----U-
	0000000004200000: 0000000004200000 X-P----UW
	0000000004400000: 0000000004400000 X-PDA--UW
	0000000004600000: 0000000004600000 X-P----U-
	0000000004800000: 00000000fd000000 X-PDA--UW
	0000000004a00000: 00000000fd200000 X-PDA--UW
	0000000004c00000: 00000000fd400000 X-PDA--UW

[1]: 9cf1a8a#r883015557
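
The two dumps above differ only in the first flag column. The sketch below is a hypothetical helper, not part of the PR. It assumes the flag column order printed by QEMU's x86 `info tlb` (NX, G, PSE, D, A, PCD, PWT, U, W) and reports which pages gained the NX flag between two dumps. Only mapping lines should be passed in, not the `(qemu) info tlb` prompt line.

```python
# Assumed column order of the 9-character flag field in `info tlb` output.
FLAG_NAMES = ("NX", "G", "PSE", "D", "A", "PCD", "PWT", "U", "W")


def parse_tlb_line(line):
    """Parse 'virt: phys flags' into (virt, phys, set of flag names)."""
    virt, phys, flags = line.replace(":", " ").split()
    names = {name for name, ch in zip(FLAG_NAMES, flags) if ch != "-"}
    return int(virt, 16), int(phys, 16), names


def pages_gaining_nx(before, after):
    """Return virtual addresses whose NX flag is set only in `after`."""
    old = {virt: flags for virt, _, flags in map(parse_tlb_line, before)}
    return [
        virt
        for virt, _, flags in map(parse_tlb_line, after)
        if "NX" in flags and "NX" not in old.get(virt, set())
    ]
```

For the dumps above, this yields the seven pages from 0x4000000 through 0x4c00000.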
mewmew and others added 13 commits April 29, 2024 22:20
The refactor of paging_32.zig now allows for virtual addresses of up to 4 GB.
Now implementing a state-of-the-art ASCII compatible virtual address
scheme for next-gen debugging performance.

Specifically map the "bootstrap" page to start the kernel. In time we'll
probably zero out this segment and page-out the null page.

Move the non-temporary stack position calculation to Zig.
Since we are using a very naive virtual paging algorithm which basically
maps virtual to physical 1-to-1, we use a _lot_ of memory.

A clear improvement would be to track how many pages we actually use
instead of calculating the base address from the page index. That way we
can reduce the memory footprint to probably less than 1G.
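
To make the proposed improvement concrete, here is a minimal sketch contrasting the two approaches. It assumes 2 MiB pages. The `FrameCounter` class and both function names are hypothetical and not code from Lappis.

```python
PAGE_SIZE_2M = 2 * 1024 * 1024


def frame_from_index(page_index: int) -> int:
    """1-to-1 scheme: the physical base is derived from the page index."""
    return page_index * PAGE_SIZE_2M


class FrameCounter:
    """Hand out physical frames in order, tracking how many are in use."""

    def __init__(self, base: int = 0):
        self.base = base  # physical address of the first free frame
        self.used = 0     # number of frames handed out so far

    def alloc(self) -> int:
        """Return the next unused physical frame."""
        frame = self.base + self.used * PAGE_SIZE_2M
        self.used += 1
        return frame

    def footprint(self) -> int:
        """Bytes of physical memory actually handed out."""
        return self.used * PAGE_SIZE_2M
```

With the 1-to-1 scheme, a page at index 1024 needs physical memory at 2 GiB, whereas the counter hands out frames contiguously from `base` regardless of where they are mapped virtually.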
Previously, Ctrl-C would cancel the script, which made gdb interrupts
impossible.
debug.sh Outdated
@@ -22,7 +22,7 @@ kitty --class qemu-starter \
 -drive media=disk,index=0,file=bin/zipfs.img,format=raw,if=ide \
 -cdrom bin/kernel.iso &

-gdb \
+kitty --class lappis-serial-raw -e gdb \
Collaborator Author

Awesome quality of life!

karlek added 14 commits May 1, 2024 22:55
Enable Supervisor Mode Execution Prevention (SMEP) and lay the groundwork
for a future where we can also enable Supervisor Mode Access Prevention
(SMAP).
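
For reference, both features are controlled by bits in CR4 and advertised through CPUID leaf 7 (sub-leaf 0, EBX). The sketch below only models the bit arithmetic; the function name is hypothetical, and the actual kernel code that reads CPUID and writes CR4 is not shown in this PR.

```python
CR4_SMEP = 1 << 20  # CR4 bit 20: Supervisor Mode Execution Prevention
CR4_SMAP = 1 << 21  # CR4 bit 21: Supervisor Mode Access Prevention

# CPUID leaf 7, sub-leaf 0 reports support in EBX.
CPUID_7_EBX_SMEP = 1 << 7
CPUID_7_EBX_SMAP = 1 << 20


def cr4_with_protections(cr4: int, cpuid_7_ebx: int, want_smap: bool = False) -> int:
    """Return cr4 with SMEP (and optionally SMAP) set where supported."""
    if cpuid_7_ebx & CPUID_7_EBX_SMEP:
        cr4 |= CR4_SMEP
    if want_smap and cpuid_7_ebx & CPUID_7_EBX_SMAP:
        cr4 |= CR4_SMAP
    return cr4
```

Leaving `want_smap` off by default mirrors the commit: SMEP is enabled now, SMAP later.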
- remove nested fish for tailing `serial.raw`
- move sleep after qemu to give focus to gdb
- give class names similar style (`lappis-` prefix)
We parse and handle the ELF file as long as possible in kernel memory
before calling `userland_malloc` and `userland_memcpy`. It quickly gets
murky when mixing kernel & userland objects.
Lots of things:
- `swapgs` to swap between KernelGSBase and GSBase. Basically, save
  arbitrary offset in `gs` segment. We use this to point to `tss64` for
  quickly and easily accessing a scratch space for .rsp0 (kernel) and .rsp2
  (userland).

- syscall calling conventions are wonky when maintaining C ABI. Since
  rcx contains the return address of userland, the 4th argument is
  instead passed in r10. Also, r11 holds the saved RFLAGS, and RFLAGS
  itself is masked by MSR_FMASK on entry; we use this mask to turn off
  interrupts when calling syscalls. That's why we no longer need `cli`
  and `sti`.

- MSR_STAR is wonky. Its upper bits denote which selectors to load
  on `syscall` and `sysret`. But they use different offsets, so it's very
  confusing:
  ```
  // syscall offsets
  CS.Selector ← IA32_STAR[47:32] AND FFFCH (* Operating system provides CS; RPL forced to 0 *)
  SS.Selector ← IA32_STAR[47:32] + 8;

  // sysret offsets
  CS.Selector ← IA32_STAR[63:48]+16;
  SS.Selector ← (IA32_STAR[63:48]+8) OR 3;
  ```
  This is why `IA32_STAR[63:48]` is set to the kernel data segment
  selector: once the `sysret` offsets are applied, CS actually points to
  the userland code segment instead. Why the offsets are 8 and 16 instead
  of 0 and 8 like for CPL 0 is beyond me.

- Could probably use some more documentation about the machine specific
  registers and their magic values. However, there's a lot of
  refactoring left before syscalls work with SMAP! (Like copying
  userland buffers into the kernel heap).
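
The selector arithmetic quoted above is easier to follow with concrete numbers. The sketch below evaluates exactly the quoted formulas, plus the RFLAGS masking done on `syscall`. The GDT layout in the usage note (kernel code 0x08, kernel data 0x10, user data 0x18, user code 0x20) is an assumption for illustration and may not match the one in Lappis; the function names are hypothetical.

```python
RFLAGS_IF = 1 << 9  # interrupt enable flag


def make_star(syscall_base: int, sysret_base: int) -> int:
    """Build an IA32_STAR value from the two 16-bit selector bases."""
    return (sysret_base << 48) | (syscall_base << 32)


def syscall_selectors(star: int):
    """Return (CS, SS) as loaded by `syscall`, per the quoted formulas."""
    base = (star >> 32) & 0xFFFF
    return base & 0xFFFC, (base + 8) & 0xFFFF


def sysret_selectors(star: int):
    """Return (CS, SS) as loaded by `sysret`, per the quoted formulas."""
    base = (star >> 48) & 0xFFFF
    return (base + 16) & 0xFFFF, ((base + 8) | 3) & 0xFFFF


def rflags_after_syscall(rflags: int, fmask: int) -> int:
    """On `syscall`, every RFLAGS bit that is set in the mask is cleared."""
    return rflags & ~fmask
```

With the assumed layout, `make_star(0x08, 0x10)` gives `syscall` selectors `(0x08, 0x10)` and `sysret` selectors `(0x20, 0x1b)`: the upper field holds the kernel data selector, yet `sysret` lands on the user code and user data segments. Setting `RFLAGS_IF` in the mask turns `0x202` into `0x2`, so interrupts are off on entry.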
Also, changed the immediate value to 1 to _not_ catch the sys_print
(since SMAP would crash us).
Basically:
```
ptr = kmalloc(SIZE_OF_USERLAND_OBJ);
userland_memcpy(ptr, userland_obj, SIZE_OF_USERLAND_OBJ);
```
Also fix missing register clobbering of `rsi`.
Dear god it actually works!
karlek (Owner) commented May 3, 2024

I'll merge this now, huge success!

karlek merged commit 9665f3e into main on May 3, 2024
1 check passed