-
-
Notifications
You must be signed in to change notification settings - Fork 379
fix Issue 18016 - using uninitialized value is considered @safe but has undefined behavior #2260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Thanks for your pull request, @WalterBright! Bugzilla references
|
$(UNDEFINED_BEHAVIOR If a void initialized variable's value is | ||
used before it is set, the behavior is undefined. | ||
$(IMPLEMENTATION_DEFINED If a void initialized variable's value is | ||
used before it is set, the value is implementation defined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation defined means that the implementation must define the behavior, meaning that all D compilers must have this defined in their docs, which is obviously not the case. Hence why I think this should remain undefined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the bug report in question, I think void initialized variables should not be usable in @safe
code, unless we add data flow to the frontend which can prove the safety, which is not planned for near future.
(Because while it can be argued that this is memory safe, it is neither type-safe, nor something that we should allow to be easily overlooked in code review, due to the huge potential for logic errors.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that the compilers don't define the behavior doesn't make it undefined according to the language spec. That's a bug against the implementation, not the spec.
As for safety, that means memory safety. Ints with random values are not memory unsafe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation defined behavior is something that you can rely on, given that you make sure you understand the differences between the implementations (if you want to make your code portable). Are you saying that you can rely to find particular values in a void initialized static array like char[128] arr = void;
even when compiling with optimizations? I suspect that some optimizing backends like GCC and LLVM may consider such code (after data flow analysis) as having undefined behavior and subsequently generate arbitrary behaving programs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation defined value means the implementation defines what the value is, which may include "arbitrary bit pattern". Undefined behavior is altogether different, as it would include "execute arbitrary code". If the implementation does the latter, it is bug in that implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to imply that the implementation is free to define the contents of a void
-initialized variable. The implementation may choose to define it as 0
. I suppose a security-hardened implementation may choose to do that. The implementation could also define it to be "whatever value was last written to that memory location either by the currently running process or some other process", but that would still be defined. I'm leaning towards @WalterBright's interpretation.
I suspect that some optimizing backends like GCC and LLVM may consider such code (after data flow analysis) as having undefined behavior and subsequently generate arbitrary behaving programs.
That's what's kindof weird about this. If it is implementation-defined, then isn't the implementation free to define it as undefined behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm nervous that you think this is @safe
... why would you say code that interacts with verifiably uninitialised data is cool and normal? It could do anything as a side-effect, and it's basically a gift-wrapped invitation to anyone crafting exploits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WalterBright If the value is implementation defined, the implementation needs to define what the value is for each type will be if you use void
-initialization. The point of void
-initialization is to not initialize the memory. How can an implementation correctly specify what the value should be if it reuses random memory from other parts of the program? This change is bad, it requires implementations to default-initialize void
-initialized variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C compilers leave it implementation defined, and indeed some compilers leave it as whatever happens to be in that memory location, and others set it to zero. I think that's quite reasonable, and it isn't necessary to define it further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C compilers leave it implementation defined
Sorry to drop in at random (the PR got linked from a forum discussion thread), but just as a point of fact: the C99 spec §6.7.8 "Initialization" para 10 states: If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.
§3.17.2 defines an "indeterminate value" as "either an unspecified value or a trap representation". And §3.17.1 explicitly defines "implementation defined" as "unspecified value where each implementation documents how the choice is made".
With this change, compilers would still be allowed to leave the variable uninitialized, right? That means using uninitialized values would be allowed in D (as long as they're not interpreted as pointers). That would be a problem for |
…s undefined behavior
Pure functions returning void initialized data is still implementation defined behavior, even in @safe code. |
I don't think it's reasonable to allow turning |
It's implementation defined, not undefined. I don't think this is fundamentally different from returning allocated pointers from pure functions. The pointer values will be different each time, even though the function's arguments are identical. |
What's the status on this? |
I think this is the wrong way to fix this problem, because it leaves gaping @safe-ty holes. Also see: Note that issue 19968 is a symptom of a general problem. Allowing uninitialized memory to be read in @safe functions is a bad idea, and shouldn't be allowed by default. |
As I noted in the issue, it is a problem specific to bools. |
This is NOT a problem specific to bools!
|
Indeed, and hence this PR to fix that.
I would appreciate an example of UB from other non-pointer types, as I can't think of one. There's a reason I push on this. Consider:
It's a performance issue. I don't want to relegate this to @trusted code only. |
module data;
int[N] array;
static assert(0<N);
struct VerifiedIndex{
private int index_=0;
@disable this(int); // shouldn't be necessary, but it is
@property int index()@safe{ return index_; }
@property void index(int i)@trusted{ // no, this can't be @safe
enforce(0<=x&&x<N, "index out of bounds");
index_=i;
}
int read()@trusted{ return array.ptr[index]; }
void write(int x)@trusted{ array.ptr[index]=x; }
} module app;
import data;
void main()@safe{
VerifiedIndex i = void;
i.write(1337);
}
I understand, but no memory corruption in
|
Just directed here from a recent forum discussion. Reading Timon's example, it makes it difficult to justify this change. There are a couple options I can think of to address this. First, and a good possibility, is to allow only POD types to be initialized as void. Anything that has methods, or contains any types that have methods, would be prevented from being void initialized in Second option is to somehow allow the type to opt-in to void initializability. Can't think of a good syntax, but this only slightly gives more discretion. Currently, if you void initialize a member, it does nothing, as the whole type is initialized anyway. However, it may simply be prudent to keep the current rules and fully implement them in the compiler (i.e. void initialization is not allowed in safe code), because Timon is right -- any possibility of corruption with |
As I stated earlier, you could do this: struct TmpBuffer(T,size_t n){
private T[n] buffer=void;
// ... add @trusted interface here that ensures data is initialized before use
} Also, given that accessing uninitialized memory is made defined (but non-deterministic) behavior, |
I don't know if that works, though. When I declare an instance of |
|
Right, but that just means you can't use your proposed workaround/solution. Again, is there a way around it? |
Then wrap the rest of the function in a safe function to enable checking again:
|
@ntrel a possibility, but not practical for templates where you want everything inferred. |
@schveiguy sounds like a job for your |
I'm not sure that solves the problem. For example, if you want |
@schveiguy Yes, you're right that doesn't solve it. I think it could be done something like this: // user code
void parent(...) {
void work(alias safeVoidBuffer) {...}
mixin UseSafeVoidBuffer!(T, n, work);
}
// library
module safevoidbuf;
struct SafeVoidBuffer(T, size_t n) {...}
template UseSafeVoidBuffer(T, size_t n, alias worker) {
() @trusted {
SafeVoidBuffer!(T, n) buf = void;
buf.initSafely();
worker!buf();
}();
static if (isSystem!(worker!(...))) // assuming we can implement isSystem
if (0) doSomethingSystem; // force system inference for parent function
} So it might be doable. But supporting void initialization for struct fields seems like the best fix, avoiding the need for a nested function and mixin. |
Yes. |
I don't understand how we can honestly make an argument that |
@edi33416 and I have master student that tries to find holes in D's safety guarantees. He has been able to obtain random code execution from overlapping the stack with the heap by allocating a large void initialized static array on the stack. Something similar to: https://issues.dlang.org/show_bug.cgi?id=17566 . @WalterBright please reconsider this. At the very least, it should be illegal to have void initialized stack allocated static arrays. |
A
|
I agree. Pure in D is becoming meaningless with this type of behaviours. void main() @safe
{
auto voidInitialize(size_t Tsize)() @safe pure
{
ubyte[Tsize] _ = void;
void[Tsize] ret = _[];
return ret;
}
import std.stdio : writeln;
voidInitialize!100.writeln;
voidInitialize!100.writeln;
voidInitialize!100.writeln;
} This should simply not be an accepted code. Weak purity rules can be checked if any reference is passed to the function, but void initializers in a
You miss understanding the My opinion on this is that |
--- | ||
) | ||
|
||
$(UNDEFINED_BEHAVIOR If a void initialized variable is a reference type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does reference type
wording cover types with indirections too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A better way to word this paragraph would be:
Void-initialization of variables whose types have unsafe values is not allowed in
@safe
code.
I would still consider implementation defined or defined behaviour as this can be checked at runtime. Take the example of GCC flag |
Not having a safe interface doesn't mean every call can be considered undefined behavior. Undefined behavior only comes up when the function is called incorrectly. I'm not sure what code you're referring to. If it's issue 17566, then the only relevant call is the one to |
True, but there's a possibility. ("that may be considered undefined behaviour")
No. If the function internally does something against D rules it may cause undefined behaviour like dereferencing a pointer that is pointing to an invalid object. See 12.1.1.5. point in the specification.
In the void* dst = stackBottom - sz;
|
No, there isn't a different kind of "possibility" of UB for unsafe interfaces. When used incorrectly, both safe and unsafe interfaces can exhibit UB. When used correctly, they both still can, but only due to bugs. The difference between safe and unsafe interfaces is how a correct call is defined. For safe interfaces, it's a bunch of language rules. For unsafe interfaces, it's an agreement between the author of the function and the user.
If an instance of UB is not due to an incorrect call, then it's a bug. Bugs can cause UB in both safe and unsafe interfaces, so they're not relevant here.
That's an unsafe operation. Doesn't mean it has undefined behavior. Passing a garbage pointer to an
It doesn't. You can read here what I maintain: |
What do you mean used incorrectly? A safe interface cannot exhibit undefined behaviour. Do you mean called with unsafe values? I just assumed that possibility as
You are right. I thought that |
From Note that the spec doesn't say that a safe interface must be called only with safe values. Passing unsafe values is allowed. It's just the programmer's responsibility then to ensure no UB, as with all unsafe operations. Example: import std.stdio;
void main() @system
{
char* p = cast(char*) 0xDEADBEEF; /* This is unsafe value. */
char[] a = p[0 .. 3]; /* This is also an unsafe value. */
writeln(p); /* Correct call of safe interface with unsafe value. Prints "DEADBEEF". */
writeln(a); /* Incorrect call exhibits UB. Usually segfaults. */
} (Assuming that I'm not missing some obscure unsafety in |
Yes, that's it.
When calling a safe interface with unsafe values there is a possibility of UB. Unless you know the exact instructions of it, you can't be ensured of no UB, just trusting the documentation, especially when that interface is external. Anyway, my major point on this is being against making void initializers unsafe for types without indirections. I think https://issues.dlang.org/show_bug.cgi?id=17566 can be fixed with runtime stack checking on the compiler, unless there are more edge cases that stack checking can't cover. AFAIK, this is also environment-specific. I'm still a bit sceptical about how the code provided in the issue is 100% following D rules, but I can't prove it, given your counterarguments about the |
I don't think I agree 100% with that, but let's not derail this PR discussion further.
I agree that stack probing is a solution for issue 17566.
Even if there is something wrong with the |
This surely needs a solution. Maybe stack checking can be enforced when void initializers are being used in @safe code. |
No description provided.