Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support GOT.func imports in mo-ld #1811

Merged
merged 2 commits into from
Aug 18, 2020
Merged

Support GOT.func imports in mo-ld #1811

merged 2 commits into from
Aug 18, 2020

Conversation

osa1
Copy link
Contributor

@osa1 osa1 commented Aug 6, 2020

Fixes #1810

See comments for how this works.

Other changes:

  • The test 'closures' is updated to generate more than one table element in the
    Wasm module. Not necessary to test this feature, but it helps to have more
    than fun elements when working on element sections in the linker.

  • Make rule to build C files in linker tests updated:

    • The parameter --std=c11 is removed as it's already the default.
    • Updated the target to wasm32-emscripten as otherwise -fPIC doesn't work
      (fails with "compile with -fPIC" errors even though the flag is already
      passed).

@osa1 osa1 marked this pull request as draft August 6, 2020 11:21
@dfinity-ci
Copy link

dfinity-ci commented Aug 6, 2020

This PR does not affect the produced WebAssembly code.

test/ld/ok/fun-ptr.linked.wat.ok Outdated Show resolved Hide resolved
test/ld/Makefile Show resolved Hide resolved
@osa1
Copy link
Contributor Author

osa1 commented Aug 6, 2020

@nomeata does the test input and outputs make sense to you?

It'd be good if we had an example that updates value of the symbol f. I'll see if I can come up with that in C.

test/ld/ok/fun-ptr.linked.wat.ok Outdated Show resolved Hide resolved
test/ld/fun-ptr.wat Show resolved Hide resolved
test/ld/fun-ptr.wat Outdated Show resolved Hide resolved
@nomeata
Copy link
Collaborator

nomeata commented Aug 6, 2020

It'd be good if we had an example that updates value of the symbol f. I'll see if I can come up with that in C.

Thinking more about it, maybe it’s not so much about mutability, but simply the fact that in C all pointers (even function pointers) need to be pointers into memory (so that you can do arithmetic, comparisons etc.), and maybe that’s the reason for the GOP indirection.

@osa1
Copy link
Contributor Author

osa1 commented Aug 6, 2020

Thinking more about it, maybe it’s not so much about mutability, but simply the fact that in C all pointers (even function pointers) need to be pointers into memory (so that you can do arithmetic, comparisons etc.),

I'm not sure. Looking at C11 spec, it doesn't seem to mention arithmetic on function pointers. It just says function pointers may be converted to different function types and back. Doing anything else with a function pointer seems to be illegal, though the compilers currently seem to allow basically everything. I can do fun + 1 for example and it compiles to integer addition on Wasm. I have no idea why fun + 1 is not rejected by default.

@osa1
Copy link
Contributor Author

osa1 commented Aug 6, 2020

I figured this out: https://groups.google.com/g/llvm-dev/c/yjGdJadhO9g/m/K-TXukQ6AwAJ

I'll write a more detailed response on Monday but the link explains why we need the "GOT.func" import and how it's used.

@nomeata
Copy link
Collaborator

nomeata commented Aug 7, 2020

Did some more investigation on my, copying it here (just for my own reference maybe):

function pointers on the stack are just i32 containing the table index.
function pointers as top-level C variables have an indirection through the memory:

int (*imported)(int x, int y);

__attribute__ ((visibility("default")))
int exported2(int (*ptr)(int, int)) {
    return (*imported)(3,4) + (*ptr)(5,6);
}

when compiled to a dynamic library with

$WASM_CLANG --compile -fPIC --target=wasm32-unknown-emscripten --optimize=3 fun-ptr.c -o fun-ptr.wasm && $WASM_LD --shared fun-ptr.wasm -o fun-ptr.so.wasm && wasm2wat fun-ptr.so.wasm

produces

(module
  (type (;0;) (func (param i32 i32) (result i32)))
  (type (;1;) (func))
  (type (;2;) (func (param i32) (result i32)))
  (import "env" "memory" (memory (;0;) 0))
  (import "env" "__indirect_function_table" (table (;0;) 0 funcref))
  (import "env" "__memory_base" (global (;0;) i32))
  (import "env" "__table_base" (global (;1;) i32))
  (import "GOT.mem" "imported" (global (;2;) (mut i32)))
  (func $__wasm_call_ctors (type 1)
    call $__wasm_apply_relocs)
  (func $__wasm_apply_relocs (type 1))
  (func $exported2 (type 2) (param i32) (result i32)
    i32.const 3
    i32.const 4
    global.get 2
    i32.load
    call_indirect (type 0)
    i32.const 5
    i32.const 6
    local.get 0
    call_indirect (type 0)
    i32.add)
  (global (;3;) i32 (i32.const 0))
  (export "__wasm_call_ctors" (func $__wasm_call_ctors))
  (export "exported2" (func $exported2))
  (export "imported" (global 3))
  (data (;0;) (global.get 0) "\00\00\00\00"))

and there is a load when using imported, but not when using the ptr argument.

@osa1
Copy link
Contributor Author

osa1 commented Aug 10, 2020

function pointers as top-level C variables have an indirection through the memory:

This is not correct. A function pointer in C is something like:

int imported(int x, int y);

Here the value of symbol imported is the pointer to the function (e.g. in native compilation the memory address stored in imported contains the code for the function imported).

If you declare something like:

int (*imported)(int x, int y);

This is one level of indirection that you introduce, i.e. it's a pointer to a function pointer. That's why you see a GOT.mem import for this, instead of a GOT.func.


I did some reading and experimenting with GOT.mem and GOT.func imports. Here's some notes and how I think we should implement GOT.mem and GOT.func imports:

  • Note: GOT.func imports resolve to table indices, GOT.mem imports resolve to memory indices.

  • Note: Referring to a function/data in a function or other data does not generate GOT.func/GOT.mem imports, only referring to a function/data address does.

  • The goal is to replace (import "GOT.func" "x" (global (;N;) (mut i32))) with (global (;N;) i32 <x's table index>) and (import "GOT.mem" "x" (global (;N;) (mut i32))) with (global (;N;) i32 <x's mem index>).

    Note that normally we may have multiple GOT.func/GOT.mem imports to the same symbol in different modules and those imports may have different global numbers in their modules, but moc-generated code doesn't have GOT.func/GOT.mem imports (as we don't refer to function or data addresses in moc-rts in the moc-generated code) currently so we can assume that the RTS module will have GOT imports.

    (This will have to change if we decide to e.g. store function addresses or refer to arrays in moc-generated code)

  • We simply merge two modules as we do today, without changing GOT.func/GOT.mem imports. This mostly works, except the linker panics if the RTS has a GOT.mem import (I don't know why yet).

  • To resolve a GOT.func import, we simply find the index of the function in the final module, and see if it's in the table already. If it is then we replace the import with the table index. If not then we add it to the table and replace the import with its index.

  • To resolve a GOT.mem import, we need to find the symbol's address in the module's "data" section. As an example, if we have this in the RTS:

    __attribute__ ((visibility("default")))
    int32_t array[100] = {0};
    

    This generates a GOT.mem import:

    (import "GOT.mem" "array" (global (;4;) (mut i32)))
    

    When we see a "GOT.mem" import we also expect to see an export for it:
    (I'm not sure if this must always hold, but as far as I can see the export is the only way to find the data's offset in module's memory)

    (export "array" (global 5))
    

    Now global 5 will give us the offset in "data" section:

    (global (;5;) i32 (i32.const 544))
    

    So we have to replace the global 4 import (GOT.mem) with the i32 constant <RTS memory base + 544>.

@nomeata
Copy link
Collaborator

nomeata commented Aug 10, 2020

Sounds good!

table (elem section) we add it to the table, and then replace the import
with

(global (;N;) i32 (i32.const M'_i))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this needs to be a mutable global.
OT1H, the global was mutable before, and maybe some code actually mutates them?
OTOH, that would be quite odd, probably not intended, and maybe good to produce invalid wasm modules in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these globals don't need to be mutable. It was mutable before because LLVM generates mutable globals for GOT.func imports, but maybe that's just laziness. I don't think address of a function can change in C.

test/run/closures.mo Outdated Show resolved Hide resolved
@osa1
Copy link
Contributor Author

osa1 commented Aug 12, 2020

Here's a summary of this and why this has been tricky to implement correctly.

We have imports

(import "GOT.func" "x_1" (;g_1;) (global))
(import "GOT.func" "x_2" (;g_2;) (global))
...
(import "GOT.func" "x_N" (;g_N;) (global))

and in the same module, the functions and exports:

(export "x_1" (fun y_1))
(export "x_2" (fun y_2))
...
(export "x_N" (fun y_N))

We need to add functions y_1..y_N to the table in the final module, remove the imports, and add globals g_1..g_N for the table indices.

For that we need to create a map { g_1 :-> y_1; ..; g_N :-> y_N }. We can ignore xs and simply remove all GOT.func imports, and exports are already removed anyway (not sure how/where).

The tricky bit is to update global indices (keys) and function indices (values) in that map as we update the RTS module. I think I mostly figured this out, but it took a while. I'll hopefully finish this by the end of tomorrow.

@osa1
Copy link
Contributor Author

osa1 commented Aug 13, 2020

I can now build & run an example!

@osa1 osa1 force-pushed the osa1/GOT_func_linking branch 2 times, most recently from faa58ff to 93daf7c Compare August 13, 2020 11:12
@osa1
Copy link
Contributor Author

osa1 commented Aug 13, 2020

I confirmed that with this patch I can use some of the Rust core functions in #1750 that I couldn't before.

@osa1 osa1 marked this pull request as ready for review August 13, 2020 13:23
@osa1 osa1 requested a review from nomeata August 13, 2020 13:46
Copy link
Collaborator

@nomeata nomeata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job, almost there. I think we can actually simplify this:

  • allocate table space based on old_elem_size of the first module
  • always create table entries for these fucntion.

This way things don’t break once tables become mutable, and the linker code might actually be smaller and simpler.

src/linking/linkModule.ml Outdated Show resolved Hide resolved
src/linking/linkModule.ml Outdated Show resolved Hide resolved
src/linking/linkModule.ml Show resolved Hide resolved
src/linking/linkModule.ml Outdated Show resolved Hide resolved
src/linking/linkModule.ml Outdated Show resolved Hide resolved
src/linking/linkModule.ml Outdated Show resolved Hide resolved
src/linking/linkModule.ml Outdated Show resolved Hide resolved
Comment on lines +728 to +717
let global_idx =
if is_global_import import.it.idesc.it then
Int32.add global_idx (Int32.of_int 1)
else
global_idx
in
( global_idx, imports )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let global_idx =
if is_global_import import.it.idesc.it then
Int32.add global_idx (Int32.of_int 1)
else
global_idx
in
( global_idx, imports )
if is_global_import import.it.idesc.it
then (Int32.add global_idx (Int32.of_int 1), imports)
else (global_idx, imports)

seems simpler, at the expense of symmetry with the previous. Up to you.

src/linking/linkModule.ml Outdated Show resolved Hide resolved
src/linking/linkModule.ml Outdated Show resolved Hide resolved
@osa1 osa1 force-pushed the osa1/GOT_func_linking branch 4 times, most recently from de83479 to 6885dd8 Compare August 17, 2020 09:34
(global (;1;) i32 (i32.const 2))
(global (;2;) i32 (i32.const 65536))
(start $link_start)
(elem (;0;) (i32.const 1) func $f0 $f1))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offset is 1 here because the first module (fun-ptr.wat above) has a table with size 1. I think this could be 0 because the table in the first module is imported; it doesn't define any elems.

@osa1
Copy link
Contributor Author

osa1 commented Aug 17, 2020

I think I addressed all comments and fixed the issues.

We currently create one (elem ...) section for the GOT.func imports, after the elem sections of the merged module. The offset of the section is calculated as

table size of first module + padding for alignment + table size of the second module

Table alignment and table size are read from the dynlink section (see "tablesize" and "tablealignment").

Copy link
Contributor

@ggreif ggreif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! A few nits only. Excellent note. (We will need it!)

Thanks for doing this, it opens doors.

src/lib/lib.ml Outdated Show resolved Hide resolved
src/linking/linkModule.ml Outdated Show resolved Hide resolved
src/linking/linkModule.ml Outdated Show resolved Hide resolved
test/ld/Makefile Outdated Show resolved Hide resolved
@osa1 osa1 added the automerge-squash When ready, merge (using squash) label Aug 18, 2020
@mergify mergify bot merged commit 4a17463 into master Aug 18, 2020
@mergify mergify bot deleted the osa1/GOT_func_linking branch August 18, 2020 07:12
@mergify mergify bot removed the automerge-squash When ready, merge (using squash) label Aug 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mo-ld should understand GOT.func imports
4 participants