3C loses struct definition as part of a field nested in another struct definition #531
Yep, this:
struct outer {
  struct inner {
    int x;
  } *i;
};
becomes:
struct outer {
  _Ptr<struct inner> i;
};
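A minimal plain-C sketch of why the dropped definition could live either in place or at the top level. In C (unlike C++), a struct tag declared inside another struct's member list has the same scope as the enclosing struct, so `struct inner` stays nameable on its own. Checked C annotations are omitted so any C compiler accepts it. The helpers `read_inner` and `via_outer` are hypothetical and only exercise the types.

```c
#include <assert.h>
#include <stddef.h>

/* Original shape: struct inner is defined inside struct outer's member list. */
struct outer {
    struct inner {
        int x;
    } *i;
};

/* In C, the tag `inner` declared above has file scope here (a member list is
 * not a block), so it can be used directly outside struct outer. */
static int read_inner(const struct inner *p) {
    assert(p != NULL);
    return p->x;
}

/* Stores `in` in the outer struct's field and reads x back through it. */
static int via_outer(struct outer *o, struct inner *in) {
    o->i = in;
    return read_inner(o->i);
}
```

Because the tag is already visible at file scope, emitting `struct inner { int x; };` before `struct outer` and keeping only `struct inner *i;` (or its `_Ptr` form) in the field would not change the meaning of the program.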
(Previous title: "3c -alltypes loses struct definition as part of an array field nested in another struct definition")
I updated the issue title to reflect that there are examples that do not require array fields and thus do not require …
This problem may be worse than we thought: … converts to … which isn't even correct syntax or a correct use of the name "inner".
It looks like several other error message templates that currently occur in our benchmarks may be caused by this issue when the nested struct type has no name (not to be confused with an "anonymous" struct, where the variable has no name and the fields enter the surrounding scope). With this slight modification of the original example:
struct outer {
  struct inner {
    int x;
  } i_arr[1];
};
int foo(struct outer *o) {
  return o->i_arr[0].x;
}
3C produces:
struct outer {
  // Now it looks like `i_arr` is the type name rather than the field name, so we get:
  // error: expected member name or ';' after declaration specifiers
  // error: array has incomplete element type 'struct i_arr'
  struct i_arr _Checked[1];
};
int foo(_Ptr<struct outer> o) {
  // error: no member named 'i_arr' in 'struct outer'
  return o->i_arr[0].x;
}
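For reference, a plain-C sketch of what a correct translation of this example could look like once `struct inner` is hoisted out of `struct outer`. The Checked C annotations seen in the output above (`_Ptr<struct outer>`, `_Checked[1]`) are omitted so an ordinary C compiler accepts it; the point is only that the definition survives and the field keeps its name `i_arr`.

```c
#include <assert.h>

/* Hoisted definition: previously nested inside struct outer. */
struct inner {
    int x;
};

/* The field now refers to the struct type by name and keeps its own name. */
struct outer {
    struct inner i_arr[1];
};

/* Unchanged from the input: member access still resolves. */
int foo(struct outer *o) {
    assert(o != 0);
    return o->i_arr[0].x;
}
```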
Summary of discussion from today's meeting: I suggested that one approach that should completely avoid this problem is to move all struct definitions to the top level and leave the field declarations referring to them by name. Kyle pointed out that 3C tries to minimize its edits to the user's code and this would be a dramatic edit, but Mike said that nested struct definitions could be considered a poor coding practice that 3C is justified in preemptively cleaning up.

As Kyle pointed out, the next implementation problem is that the Clang API makes it straightforward to get the proper source locations either to rewrite the inner struct's fields in place or to move the entire inner struct definition as a chunk of text, but it would be tricky to do both in the same rewriting pass. I proposed a workaround: perform an initial pass that only moves all struct definitions to the top level, then re-parse all the source files so we have the correct source locations to rewrite the fields of the moved struct definitions. This could all be done within a single run of the 3C executable. (If we wanted, 3C could avoid moving structs that it won't need to rewrite by solving for the annotations, figuring out which structs it will need to rewrite, moving them, and then starting over.)

Further ideas from me after the meeting: We could consider additionally addressing #542 by inventing names for unnamed struct definitions and moving them in the same way. But I wonder if we could just fix the rewriter to bring the struct body along with the type name wherever it needs to go in the rewritten code for the first variable declaration. For a multi-variable declaration, the subsequent variables can reference the struct type by name. For example, 3C could convert this:
struct mystruct {
  int *x;
} (*f)(struct { int *y; } *p), b;
to this:
_Ptr<struct mystruct {
  _Ptr<int> x;
} (struct { _Ptr<int> y; } *p)> f;
struct mystruct b;
Yes, it's valid Checked C to define a named struct type inside a function pointer type like this! It looks crazy, but the user is the one who decided to combine the struct and variable definitions in the first place. 🙂 This could work the same way whether the code is at the top level or nested in an outer struct. The same approach would work for single-variable declarations using unnamed struct types (in effect, automating in 3C what the 3C warning in #542 currently tells the user to do manually) and even for multi-variable declarations if we invent a name for the struct type.

Update: I realize that the "keep the body with the name" approach has a similar problem to the "move to the top level" approach: we may have to rewrite the fields of the struct as well as the surrounding declaration, and it's unclear whether it would be feasible to break this into multiple passes as I proposed for the "move to the top level" approach. In simpler examples like the one above, we might be able to fiddle with the syntax surrounding the struct definitions while preserving the source locations of the struct fields so we can rewrite them in parallel. (In this case, we would add the …
struct mystruct2 {
  int *x;
} (*(*g)(struct { double *z; } *q))(struct { int *y; } *p);
might have to become this:
_Ptr<_Ptr<struct mystruct2 {
  _Ptr<int> x;
} (struct { _Ptr<int> y; } *p)> (struct { _Ptr<double> z; } *q)> g;
Note that the order of …
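A plain-C sketch of the "keep the body with the name" rule on a hypothetical, simplified multi-variable declaration without the function-pointer part, `struct mystruct { int *x; } *f, b;`. The first declarator keeps the struct body and each later declarator refers to the type by its tag. Checked C annotations are omitted, and `read_through_f` is a hypothetical helper that only checks that `f` and `b` still have compatible types after the split.

```c
#include <assert.h>

/* First declarator keeps the struct body... */
struct mystruct {
    int *x;
} *f;
/* ...and the later declarator refers to the type by its tag. */
struct mystruct b;

/* Points b.x at *v and f at b, then reads the int back through f. */
static int read_through_f(int *v) {
    assert(v != 0);
    b.x = v;
    f = &b;
    return *f->x;
}
```

With the function-pointer declarator from the example above, the same idea would put the body inside the `_Ptr<...>` of the first variable, which is where the source-location difficulty described in the update comes from.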
The latest plan is that I'm going to work on de-nesting structs, not Kyle.
It's been a while, but I'm just seeing the above update now, so I wanted to chime in to say that, depending on how much work it is, we could "keep the body with the name" and ignore the complex function-pointer example as uncommon code. I prefer smaller diffs, and we don't need to support every possibility (though I would like to if we had the time). But I don't know which solution you favor or how much work each would be.
Our current plan after discussion at today's meeting is essentially what I previously suggested: move all structs to the top level (or, if they are inside a function, maybe to the top level of the function instead) as a first pass; then, if any structs actually needed to be moved, re-parse the ASTs before proceeding with the normal inference. Implement struct de-nesting first and then decide whether to change ordinary multi-decl rewriting to use it (see #647). 3C would accept a flag to stop after struct de-nesting (and any similar refactorings we implement in the future) and not do inference.

The re-parsing will have some performance cost. We hope that users who are committed to porting their program will be willing and able to accept the struct de-nesting refactoring up front (assuming we can make it reliable enough); subsequent 3C runs will then detect that no structs need to be de-nested, and the performance cost will be minimal. However, users who want to make small changes to their original program and see the corresponding changes to the 3C output would have to pay the performance cost repeatedly.

An additional thought from me: if there are multiple levels of nested structs:
struct A {
  struct B {
    struct C { ... } c;
  } b;
};
The most straightforward approach would be to move only the second-level structs such as …
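A toy sketch of de-nesting every level at once, assuming a hypothetical in-memory tree of struct definitions rather than Clang source ranges (the source ranges are what make the real problem hard). Emitting the definitions in post-order, innermost first, means every struct is complete before a field refers to it by name. `struct def` and `denest` are illustrative names, and each struct is limited to a single member to keep the sketch short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model: a struct has either one nested struct field
 * (child != NULL) or one plain member given as source text. */
struct def {
    const char *tag;          /* struct tag, e.g. "A" */
    const char *field_name;   /* name of this struct's field in its parent */
    const char *plain_member; /* used only when child == NULL, e.g. "int z;" */
    const struct def *child;  /* nested struct definition, if any */
};

/* Appends de-nested definitions for d to out, starting at offset len,
 * innermost first. Returns the new length. Requires len < cap; the sketch
 * asserts instead of handling truncation. */
static size_t denest(const struct def *d, char *out, size_t cap, size_t len) {
    int n;
    if (d->child != NULL) {
        /* Emit the nested definition first so it is complete before use. */
        len = denest(d->child, out, cap, len);
        /* The parent's field now refers to the child struct by name. */
        n = snprintf(out + len, cap - len, "struct %s { struct %s %s; };\n",
                     d->tag, d->child->tag, d->child->field_name);
    } else {
        n = snprintf(out + len, cap - len, "struct %s { %s };\n",
                     d->tag, d->plain_member);
    }
    assert(n >= 0 && (size_t)n < cap - len);
    return len + (size_t)n;
}
```

For `A` containing `B` containing `C` (with a hypothetical member `int z;`), this produces `struct C { int z; };`, then `struct B { struct C c; };`, then `struct A { struct B b; };`, one per line.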
I had a new idea about the general problem of moving a code block when rewrites also need to be made inside it. This problem comes up both when de-nesting three or more levels of structs, as in my previous comment (there are actually a few examples of this in libarchive), and if we want to de-nest structs and perform the normal inference in the same run of the 3c tool.

My originally proposed approach of re-parsing the ASTs has a number of downsides. A new one I realized is that 3C might issue a diagnostic against an intermediate version of a file that differs from both versions available to the user (the original and the final version in the output directory). This would be confusing, though I don't know how often it would happen in practice or how confusing it would actually be. It would also complicate the integration with the diagnostic verifier, for what that's worth. More broadly, if something goes wrong in producing an intermediate version of a file and that leads to later problems, it might be more work to troubleshoot; we'd need an option to dump all intermediate file versions.

My new idea is to buffer all the requested rewrites and then have code that looks at the source ranges to find an order in which to perform the rewrites so that all rewrites inside a code block are done before the block is moved. When we move the block, we call …
Any thoughts on which approach is better? If I don't get any feedback, I'll plan to try the new one first. Migrating all of 3C to buffer rewrites might become a somewhat invasive change, but I can start by implementing it only for multi-level struct de-nesting and see how that goes.
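A minimal sketch of the ordering step in the buffered-rewrite idea, assuming each buffered rewrite is reduced to a half-open range of file offsets. The `struct rewrite` type and `order_rewrites` are hypothetical; 3C itself would work with Clang `SourceRange`s and its `Rewriter`. Sorting by range length places every rewrite that lies strictly inside a block before the rewrite that moves that block. The sketch does not detect partially overlapping ranges or decide between two edits on an identical range; a real implementation would need to handle both.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffered rewrite: half-open offset range [begin, end). */
struct rewrite {
    unsigned begin;
    unsigned end;
    const char *what; /* label for illustration only */
};

/* Shorter ranges first; a range nested strictly inside another is shorter,
 * so inner edits sort before the edit that moves the enclosing block.
 * Ties are broken by start offset to keep the order deterministic. */
static int by_length(const void *pa, const void *pb) {
    const struct rewrite *a = pa, *b = pb;
    unsigned la = a->end - a->begin, lb = b->end - b->begin;
    if (la != lb)
        return la < lb ? -1 : 1;
    if (a->begin != b->begin)
        return a->begin < b->begin ? -1 : 1;
    return 0;
}

/* Sorts the buffered rewrites into a safe application order (in place). */
static void order_rewrites(struct rewrite *rs, size_t n) {
    for (size_t i = 0; i < n; i++)
        assert(rs[i].begin <= rs[i].end);
    qsort(rs, n, sizeof *rs, by_length);
}
```

For three levels of nesting, the field rewrite inside `C` comes first, then the move of `C`, then the move of `B`, so each move copies text that already contains the inner edits.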
I'd prefer to avoid the temporary files if possible. I don't know if you checked, but can you add text multiple times at the same location to get multiple lines added?
You mean you prefer the second approach? Great, then I will try it.
Yes, that works: I have seen it happen in my tests so far on libarchive.
I don't understand how this relates to the problem I described. In the example:
struct A {
  struct B {
    struct C { ... } c;
  } b;
};
3C would generate two rewrites: …
Maybe your point was that we need to ensure that we get:
struct C { ... };
struct B {
  struct C c;
};
struct A {
  struct B b;
};
and not the following, which is what we would get if we copied the content of the source range for …:
struct C { ... };
struct B {
  struct C { ... } c;
};
struct A {
  struct B b;
};
Either the AST re-parsing approach or the approach that ensures rewrite (2) is performed first and then uses …
multi-decl rewriting splits them rather than losing them. Fixes #531 except for a "declaration does not declare anything" compiler warning that will be addressed by de-nesting the inline structs.
Example reduced from libarchive: …
3c -alltypes produces: …
We lost the definition of struct inner, so we get the following compile error: …
I believe this also explains at least some of the "incomplete definition of type" errors in libarchive, though I haven't confirmed that. I haven't looked into whether there are other variants of the example that trigger the bug without using an array.
It looks like 3C has code somewhere that splits
struct inner { ... } i_arr[1];
into
struct inner { ... };
and
struct inner i_arr[1];
when it occurs at the top level, but this doesn't work for nested structs.
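A string-level sketch of the top-level split described above, assuming the declaration arrives as a single string of the form `struct TAG { BODY } DECLARATORS;`. This is not how 3C implements it (3C works on Clang AST nodes and source ranges); `split_decl` is a hypothetical name. Brace counting finds the end of the body, so nested braces inside it are tolerated. Unnamed structs are rejected, since splitting them would require inventing a tag, as discussed for #542.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Splits "struct TAG { BODY } DECLARATORS;" into "struct TAG { BODY };"
 * and "struct TAG DECLARATORS;". Returns 0 on success, or -1 if the input
 * does not have that simple shape or an output buffer is too small. */
static int split_decl(const char *decl, char *def_out, size_t def_cap,
                      char *var_out, size_t var_cap) {
    if (strncmp(decl, "struct ", 7) != 0)
        return -1;
    const char *tag = decl + 7;
    const char *p = tag;
    while (*p && *p != ' ' && *p != '{')
        p++;
    size_t tag_len = (size_t)(p - tag);
    if (tag_len == 0)
        return -1; /* unnamed struct: would need an invented tag */
    const char *q = strchr(p, '{');
    if (q == NULL)
        return -1;
    /* Find the brace that closes the body, counting nested braces. */
    int depth = 0;
    for (; *q; q++) {
        if (*q == '{')
            depth++;
        else if (*q == '}' && --depth == 0)
            break;
    }
    if (*q != '}')
        return -1;
    const char *rest = q + 1; /* declarators, e.g. " i_arr[1];" */
    while (*rest == ' ')
        rest++;
    /* Definition: everything up to and including the closing brace, plus ';'. */
    int n1 = snprintf(def_out, def_cap, "%.*s;", (int)(q + 1 - decl), decl);
    /* Variables: the same tag, now used by name, followed by the declarators. */
    int n2 = snprintf(var_out, var_cap, "struct %.*s %s", (int)tag_len, tag, rest);
    if (n1 < 0 || (size_t)n1 >= def_cap || n2 < 0 || (size_t)n2 >= var_cap)
        return -1;
    return 0;
}
```

The same split handles multi-variable declarations such as `struct inner { int x; } *a, b;`, producing `struct inner { int x; };` and `struct inner *a, b;`.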