Do function pointers behave like data pointers (wrt provenance and other aspects)? #340
It would be pretty nice if they don't, as for a while now we've taught that the way to invert the
Have we?
That's fair. I think a large part of the programmers who write unsafe code also want to understand the model, so we shouldn't make it more complicated than absolutely necessary. However, the majority of programmers will probably never look at the model, so there is also value in making it "do the expected thing", and it is worth spending some complexity on that. So to me this depends on how complicated things have to get to support this. I would also like to distinguish the two directions:
This isn't just about fnptr2int transmutes, so maybe I should bring this up somewhere else, but I really think it is a bad idea to make these transmutes UB instead of simply stripping the provenance. (That is, when bytes with provenance are converted to a value of integer type, the provenance is lost, and when that value is stored again the provenance is not recovered.) I see no gain in making it immediate UB.
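To make the two fnptr2int conversions concrete, here is a small sketch; `answer`, `addr_via_cast` and `addr_via_transmute` are hypothetical names. The `as` cast is the ordinary conversion, while the transmute reinterprets the pointer's bytes as an integer, which is the operation whose status (immediate UB vs. silently dropping provenance) is being debated. Natively both produce the same number, so this only illustrates the two operations; it says nothing about which semantics the abstract machine should pick. The transmute variant is marked `unsafe fn` because, under the stricter reading, it could be UB.

```rust
use std::mem::transmute;

fn answer() -> i32 {
    42
}

/// fnptr2int via an `as` cast: the uncontroversial conversion.
fn addr_via_cast(f: fn() -> i32) -> usize {
    f as usize
}

/// fnptr2int via transmute: reinterprets the pointer's bytes as an integer.
/// Whether this is UB (the bytes carry provenance) or just drops the
/// provenance is the open question here.
///
/// # Safety
/// Depends on how that question is resolved.
unsafe fn addr_via_transmute(f: fn() -> i32) -> usize {
    // Only compiles where fn pointers and usize have the same size.
    unsafe { transmute::<fn() -> i32, usize>(f) }
}

fn main() {
    // Coerce once so that both conversions see the very same pointer.
    let f: fn() -> i32 = answer;
    let a = addr_via_cast(f);
    let b = unsafe { addr_via_transmute(f) };
    assert_eq!(a, b);
    println!("answer is at {a:#x}");
}
```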
That is basically #286. That thread is so huge by now, though, that I'm not sure how useful it still is... It's definitely off-topic for this thread though. :)
I think function pointers on CHERI do have provenance (you cannot simply make up function pointers from integers), which is an argument for function pointers having provenance.
#309 got folded into this issue, so I generalized the title a bit so that it is not just about provenance: there are also questions around whether these types even have the same size, etc.
Somewhat related to this: what even are the semantics of the function {item,pointer} to {pointer,address} casts? They are named in the Reference but don't appear in the semantics section that follows. AFAIK the example in the
Function items have no data, so casting them to function pointers is a very special operation that synthesizes a suitable pointer "out of thin air". This operation is non-deterministic: executing it multiple times for the same function can produce different pointers. Function pointers are either like
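The casts in question can be spelled out in a short sketch (`answer` and `item_size` are hypothetical names). It shows that a function item is zero-sized, that coercing it yields a pointer-sized value, and the fn-pointer-to-raw-pointer and fn-pointer-to-address casts. Both addresses are derived from a single coercion, because, as described above, coercing the same item twice is not guaranteed to produce the same pointer.

```rust
use std::mem::{size_of, size_of_val};

fn answer() -> i32 {
    42
}

/// Size of the function *item* `answer`: its type is unique and zero-sized.
fn item_size() -> usize {
    let item = answer;
    size_of_val(&item)
}

fn main() {
    assert_eq!(item_size(), 0);

    // Function item -> function pointer: the coercion synthesizes an address.
    let ptr: fn() -> i32 = answer;
    assert_eq!(size_of_val(&ptr), size_of::<fn() -> i32>());

    // Function pointer -> raw pointer, and function pointer -> address.
    let raw = ptr as *const ();
    let addr = ptr as usize;
    assert_eq!(raw as usize, addr);
    assert_eq!(ptr(), 42);
}
```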
I believe that we need function pointers to have provenance if we consider that code can be modified at runtime (e.g. by a JIT compiler). Consider this somewhat contrived example:

```rust
use std::mem::transmute;

// Assume code is runtime-generated and mapped with RWX permissions.
fn patch_and_run(code: &mut [u8]) {
    // Modify a byte in the code.
    code[20] = 0x34;
    // Execute the modified code.
    let f: fn() = unsafe { transmute(code.as_ptr()) };
    f();
    // Modify the byte to something else.
    code[20] = 0x56;
}
```

It would be very surprising to users if the execution of the code did not see
When it comes to concurrent modifications, modifying code requires platform-specific extra fences; normal fences do not suffice, as the hardware will otherwise fetch outdated code. I have no clue whether the hardware guarantees that it will immediately see updated code within a single thread, but I would not be surprised if a fence were required even there. And anyway, I don't think we want to model self-modifying code in the opsem. This can only be done with inline asm.
Some architectures do require explicit instruction cache invalidation and fences, but at least x86 does not: it guarantees that the instruction and data caches are coherent and that code modifications are immediately visible. On x86, we need to guarantee that the example I showed works as expected from the point of view of compiler optimizations.
I think it is reasonable for the compiler to assume that it can reorder noalias memory accesses and fn ptr calls. So I disagree with the claim that we need to guarantee this; I think we should require an inline asm block (that takes the mutable ref as input) for code like this, even if the hardware does not require a fence.
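As a sketch of what such a requirement might look like (assuming a target with stable `asm!` support; `code_barrier` is a hypothetical helper, not an existing API): an `asm!` block that is not marked `nomem` or `readonly` and receives a pointer into the buffer forces the compiler to assume it may read or write that memory, so the patch before it cannot be moved past it. The template is only a comment, so it emits no instructions and does *not* perform the instruction-cache maintenance or serialization some architectures need.

```rust
use std::arch::asm;

/// Hypothetical helper: a compiler-only barrier for a code buffer.
fn code_barrier(code: &mut [u8]) {
    // SAFETY: the template is only a comment, so no instructions are emitted.
    unsafe {
        // No `nomem`/`readonly`: the compiler must assume this block may
        // read or write memory reachable from the pointer it is given.
        // The operand appears in a comment because `asm!` rejects operands
        // that the template never uses.
        asm!("/* {0} */", in(reg) code.as_mut_ptr(), options(nostack, preserves_flags));
    }
}

fn main() {
    let mut code = vec![0u8; 32];
    code[20] = 0x34;
    code_barrier(&mut code);
    // A call through a fn pointer into `code` would go here.
    code[20] = 0x56;
    code_barrier(&mut code);
    assert_eq!(code[20], 0x56);
}
```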
It seems LLVM already optimizes this incorrectly(?) today: https://rust.godbolt.org/z/9E6o3qcz9
Actually it works if I cast it to an integer first with exposed provenance: https://rust.godbolt.org/z/fYKTYqb83
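The linked snippet isn't reproduced in this thread, so the following is only a guess at the general shape of "cast to an integer first with exposed provenance"; `fn_ptr_via_exposed_addr` is a hypothetical name. A pointer-to-integer `as` cast exposes the pointer's provenance, and an integer-to-pointer `as` cast may pick it up again before the address is reinterpreted as a fn pointer. The example only inspects the resulting pointer and never calls it, since ordinary heap memory is not executable.

```rust
use std::mem::transmute;

/// Hypothetical helper: derive a fn pointer to `code` by way of an integer
/// whose provenance has been exposed.
///
/// # Safety
/// The result may only be called if `code` really is executable memory
/// holding valid machine code.
unsafe fn fn_ptr_via_exposed_addr(code: &[u8]) -> fn() {
    // Pointer -> integer with `as` exposes the pointer's provenance.
    let addr = code.as_ptr() as usize;
    // Integer -> pointer with `as` may pick up previously exposed provenance.
    let raw = addr as *const ();
    // Raw pointer -> fn pointer; this only reinterprets the address.
    unsafe { transmute::<*const (), fn()>(raw) }
}

fn main() {
    // Ordinary, non-executable memory: we only look at the pointer value.
    let code = vec![0xC3u8; 16];
    let f = unsafe { fn_ptr_via_exposed_addr(&code) };
    assert_eq!(f as usize, code.as_ptr() as usize);
}
```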
I'd be surprised if LLVM supported self-modifying code without inline assembly, though by exposing enough provenance you can probably confuse the optimizer enough that it won't touch the code any more.
Miri currently treats fn ptrs and data ptrs very similarly, in particular with regard to provenance. When calling a function pointer, its provenance is consulted to identify which function to invoke. This makes int2fnptr transmutes a problem (see rust-lang/rust#97321). fnptr2int transmutes are also UB, because fn ptrs carry provenance and integers must not.
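For reference, the two int2fnptr routes available in Rust look roughly like this (`int2fnptr_transmute` and `int2fnptr_via_ptr` are hypothetical names). Rust has no `as` cast from an integer to a fn pointer, so a transmute is always involved somewhere. The first route transmutes the integer directly, which is the case that a provenance-based lookup at call time has trouble with; the second goes through an `as` cast to a raw pointer first. Natively both calls simply jump to `answer`; this sketch makes no claim about which route Miri accepts.

```rust
use std::mem::transmute;

fn answer() -> i32 {
    42
}

/// int2fnptr by transmuting the integer directly. If integers carry no
/// provenance, neither does the result, so a call that looks up the target
/// function via provenance has nothing to go on.
///
/// # Safety
/// `addr` must be the address of a function with signature `fn() -> i32`.
unsafe fn int2fnptr_transmute(addr: usize) -> fn() -> i32 {
    unsafe { transmute::<usize, fn() -> i32>(addr) }
}

/// int2fnptr via an `as` cast to a raw pointer, then a pointer-to-fn-pointer
/// transmute.
///
/// # Safety
/// Same requirement as `int2fnptr_transmute`.
unsafe fn int2fnptr_via_ptr(addr: usize) -> fn() -> i32 {
    unsafe { transmute::<*const (), fn() -> i32>(addr as *const ()) }
}

fn main() {
    let f: fn() -> i32 = answer;
    let addr = f as usize;
    let g = unsafe { int2fnptr_transmute(addr) };
    let h = unsafe { int2fnptr_via_ptr(addr) };
    assert_eq!(g(), 42);
    assert_eq!(h(), 42);
}
```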
However, the trouble with provenance for data pointers comes from multiple pointers with the same address but different provenance. Function pointers can't be offset and don't have aliasing restrictions or a "one-past-the-end" rule, so none of this applies. Hence we could potentially make them not carry provenance, and do the mapping from pointer to function without it (basically, doing the int2ptr cast at the time the call is made).
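The provenance-free alternative can be illustrated with a hypothetical toy model (not Miri's actual data structures): a function pointer is just an address, and a call resolves the target from that address alone. Under this model int2fnptr is trivially lossless, because there is no provenance that would need to be recovered.

```rust
use std::collections::HashMap;

/// Hypothetical toy model: a function pointer is just an address. (A data
/// pointer in such a model would additionally carry provenance.)
#[derive(Clone, Copy, Debug, PartialEq)]
struct FnPtr {
    addr: usize,
}

/// Hypothetical abstract-machine state: which function lives at which address.
struct Machine {
    functions: HashMap<usize, &'static str>,
}

impl Machine {
    /// int2fnptr is lossless here: there is no provenance to recover.
    fn int2fnptr(addr: usize) -> FnPtr {
        FnPtr { addr }
    }

    /// A call resolves the target from the address alone, i.e. the
    /// "int2ptr cast" happens at call time. `None` models calling an address
    /// where no function lives (UB in the real machine).
    fn resolve_call(&self, f: FnPtr) -> Option<&'static str> {
        self.functions.get(&f.addr).copied()
    }
}

fn main() {
    let mut m = Machine { functions: HashMap::new() };
    m.functions.insert(0x1000, "answer");

    // Any FnPtr with the right address works, however it was obtained.
    let from_int = Machine::int2fnptr(0x1000);
    assert_eq!(m.resolve_call(from_int), Some("answer"));
    assert_eq!(m.resolve_call(Machine::int2fnptr(0x1004)), None);
}
```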
Beyond these formal details, there are pragmatic concerns on niche architectures, such as whether data and function pointers even have the same size and representation.
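A quick way to check the size question on a given target is to compare the sizes directly (`pointer_sizes` is a hypothetical helper). On common targets all three match; the point of the concern above is that nothing in this sketch guarantees that on niche architectures. The `Option<fn()>` check relies on the documented guarantee that function pointers are non-null.

```rust
use std::mem::size_of;

/// Sizes of a function pointer, a data pointer and `usize` on this target.
fn pointer_sizes() -> (usize, usize, usize) {
    (size_of::<fn()>(), size_of::<*const ()>(), size_of::<usize>())
}

fn main() {
    let (fn_size, data_size, usize_size) = pointer_sizes();
    println!("fn(): {fn_size} bytes, *const (): {data_size} bytes, usize: {usize_size} bytes");
    if fn_size != data_size {
        println!("function and data pointers differ in size on this target");
    }

    // Function pointers are never null, so `Option<fn()>` can use the null
    // value for `None` and stays the size of a bare fn pointer.
    assert_eq!(size_of::<Option<fn()>>(), fn_size);
}
```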
Also see this Zulip discussion.