Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify offset safety rules #44688

Closed
joshlf opened this issue Sep 18, 2017 · 13 comments
Closed

Clarify offset safety rules #44688

joshlf opened this issue Sep 18, 2017 · 13 comments
Labels
A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one. E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. P-medium Medium priority T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@joshlf
Copy link
Contributor

joshlf commented Sep 18, 2017

According to the nightly offset docs, in order to avoid UB, "both the starting and resulting pointer must be either in bounds or one byte past the end of an allocated object."

However, it's not clear to me what constitutes an "allocated object." This confusion caused this issue about whether it's safe to use offset in the implementation of an allocator.

cc @kennytm @gankro @japaric @ezrosent

@joshlf
Copy link
Contributor Author

joshlf commented Sep 18, 2017

One way of looking at this might be: does "allocated object" mean "allocated by any Alloc implementation" or "from the heap", or something else entirely?

@Gankra
Copy link
Contributor

Gankra commented Sep 18, 2017

I think the precise definition depends on memory model work, but my intuitive model is "thing you got from malloc".

@jonas-schievink
Copy link
Contributor

Note that stack allocations are fine as well

@joshlf
Copy link
Contributor Author

joshlf commented Sep 19, 2017

I think the precise definition depends on memory model work, but my intuitive model is "thing you got from malloc".

I'm curious about this definition because I can't imagine how Rust or LLVM would formalize the notion of an allocation. Does anybody know what LLVM offset is implemented in terms of? I tried to look for the intrinsic implementation in the compiler but I'm not familiar with where those are in the compiler source, and I couldn't find it by grepping.

@jonas-schievink
Copy link
Contributor

@joshlf You're looking for this part of trans:

"offset" => {
let ptr = llargs[0];
let offset = llargs[1];
bcx.inbounds_gep(ptr, &[offset])
}

So basically, the invariants of getelementptr inbounds must hold.

@joshlf
Copy link
Contributor Author

joshlf commented Sep 19, 2017

Awesome, thanks so much @jonas-schievink !

I'm still a little unsure of what the semantics are, though. Quoting from that link:

If the inbounds keyword is present, the result value of the getelementptr is a poison value if the base pointer is not an in bounds address of an allocated object, or if any of the addresses that would be formed by successive addition of the offsets implied by the indices to the base address with infinitely precise signed arithmetic are not an in bounds address of that allocated object.

I'm not sure on how LLVM defines "allocated object." Do you have any insight on this?

@jonas-schievink
Copy link
Contributor

I don't, sorry. Not really an LLVM expert.

@aidanhs aidanhs added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Sep 19, 2017
@QuietMisdreavus QuietMisdreavus added the A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools label Sep 21, 2017
@steveklabnik
Copy link
Member

@rust-lang/lang and/or @rust-lang/compiler , some guidance here on the wording would be great. We could always just do what we do with aliasing and link to that LLVM page, but that's not as great.

@cramertj
Copy link
Member

cramertj commented Oct 31, 2017

Someone who knows more about LLVM could probably answer this better, but I'd expect that LLVM is using "allocated object" to mean that the starting and resulting pointer must both point to memory that can be legally dereferenced (without e.g. segfaulting). I'd imagine LLVM is using this additional knowledge to know that it can safely prefetch the memory behind the pointer.

Edit: I was wrong! See below.

@Gankra
Copy link
Contributor

Gankra commented Oct 31, 2017

No, I don't think the important part is that the result can be accessed right away, but rather that the pointer doesn't point at something seemingly unrelated. I believe this condition is designed to avoid having to treat every pointer offset as "welp, now it can point to anything, give up on all alias analysis".

For instance, if I modify an element of a Vec, it sure as heck shouldn't invalidate the values of my local stack variables! But without [inbounds], it technically could.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Oct 31, 2017

Yeah @gankro is right. Code like this is UB:

let x: u8 = ...;
let y: u8 = ...;
// assume &x as usize < &y as usize, to avoid worrying about wrapping in the offset calculation
let ptr_to_y = (&x as *const _).offset(&y as usize - &x as usize);

(even if the x and y are actually allocated right next to each other at runtime)

@cramertj
Copy link
Member

That makes sense-- thanks for the clarification!

@steveklabnik steveklabnik added the P-medium Medium priority label Nov 21, 2017
@steveklabnik steveklabnik added the E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. label May 29, 2018
@jonas-schievink jonas-schievink added T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Mar 6, 2020
@Dylan-DPC
Copy link
Member

Revisiting this as part of a triage run. There is a link added to the docs which links you to this part. So it should be fine to resolve this issue and if anything is lacking, a new issue might be a better idea than this being buried deep into the issue tracker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one. E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. P-medium Medium priority T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants