-
Notifications
You must be signed in to change notification settings - Fork 340
[libclang] Add API to generate a reproducer for the explicitly-built modules. #10577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: next
Are you sure you want to change the base?
[libclang] Add API to generate a reproducer for the explicitly-built modules. #10577
Conversation
Right now I'd like to get a high-level opinion if such an approach for reproducers is reasonable. This feature will have more commits, to get an idea of how it is supposed to look like you can check users/vsapsai/reproducer-explicitly-built-modules branch. |
An example of API usage is in swiftlang/swift-build#455 |
By the way, this doesn't work as a reproducer for dependency scanning, I'm just capturing the scanning results. My hope that with including the original command [used for scanning] and with the captured files you can still do something like
After landing the proposed level of reproducing we can think how to support scan-deps better. I was considering an alternative of invoking clang-scan-deps from the shell script and building modules based on the scanning results. But I've decided that embedding a little build system in a shell script is too much complexity and not justified at this stage. Maybe we'll see a need for something like that, then we can bite the bullet and add the required complexity. |
I had an idea of adding |
5185b66
to
dd4cc19
Compare
Ping. |
I'd like the API changes from users/vsapsai/reproducer-explicitly-built-modules (and the addition of |
…modules. (#10577) Capturing a single `-cc1` command to reproduce a bug with the explicitly-built modules isn't sufficient, we need to know what .pcm files were built and how. rdar://59743925
dd4cc19
to
9977985
Compare
Tools using the reproducers can specify a custom location to support scenarios similar to those covered by `-fcrash-diagnostics-dir` and `CLANG_CRASH_DIAGNOSTICS_DIR`.
…modules. (#10577) Capturing a single `-cc1` command to reproduce a bug with the explicitly-built modules isn't sufficient, we need to know what .pcm files were built and how. rdar://59743925
Tools using the reproducers can specify a custom location to support scenarios similar to those covered by `-fcrash-diagnostics-dir` and `CLANG_CRASH_DIAGNOSTICS_DIR`.
9977985
to
d440bf1
Compare
Added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but let's see if @Bigcheese or @benlangmuir have any objections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this have an entry point in clang-scan-deps? It's unusual to have something in CDependencies that doesn't have an equivalent clang-scan-deps command-line interface, though it's not strictly necessary
* for a success case but is allowed to be empty when encountered an error. | ||
*/ | ||
CINDEX_LINKAGE enum CXErrorCode | ||
clang_experimental_DependencyScanner_generateReproducer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should use CXDependencyScannerWorkerScanSettings
instead of int argc, const char *const *argv, const char *ModuleName, const char *WorkingDirectory
for the basic compilation setup.
How sure are we that const char *ReproducerLocation, bool UseUniqueReproducerName,
are the only two reproducer-specific options we will ever want here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should use
CXDependencyScannerWorkerScanSettings
instead ofint argc, const char *const *argv, const char *ModuleName, const char *WorkingDirectory
for the basic compilation setup.
I was going back and forth about it and can be persuaded either way. My [weak] justification is that we don't pass the settings to a dependency service/worker/tool. So it's not a self-sufficient entity but more a convenience container that happens to be the same between different functions. So I avoid treating it as an independent entity.
How sure are we that
const char *ReproducerLocation, bool UseUniqueReproducerName,
are the only two reproducer-specific options we will ever want here?
For the potential enhancements I expect only something to help capture the scanner state. Unfortunately, I don't know what that would be. Maybe .pcm cache, maybe dumping stat cache, maybe something else. So far it is too vague and not the most important direction for the next steps. That's why I'd like to land something reasonably future-proof and iterate on it to see what else we might need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My [weak] justification is that we don't pass the settings to a dependency service/worker/tool.
Isn't that exactly what we do? The difference is that you're creating a new one instead of using an existing one.
hat's why I'd like to land something reasonably future-proof and iterate on it to see what else we might need.
If you want to make this future proof the C API should probably be much more opaque about the configuration, similar to what we do with the worker scan settings where the settings become their own opaque type and we can add new C functions over time to add new settings to that configuration object without breaking the existing API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My [weak] justification is that we don't pass the settings to a dependency service/worker/tool.
Isn't that exactly what we do? The difference is that you're creating a new one instead of using an existing one.
I mean that in clang_experimental_DependencyScannerWorker_getDepGraph
, for instance, Settings.MLO
and Settings.MLOContext
goes into DependencyScanningTool::createActionController
. Settings.argc
, Settings.argv
, Settings.WorkingDirectory
goes into computeDependencies
. And Settings.ModuleName
affects the function's logic. We don't use Settings
as a single entity anywhere and taking it apart is the first thing we do.
hat's why I'd like to land something reasonably future-proof and iterate on it to see what else we might need.
If you want to make this future proof the C API should probably be much more opaque about the configuration, similar to what we do with the worker scan settings where the settings become their own opaque type and we can add new C functions over time to add new settings to that configuration object without breaking the existing API.
Do you suggest using CXDependencyScannerWorkerScanSettings
or introducing something like CXDependencyScannerReproducerSettings
? I believe a single function with simple arguments is easier to work with and introducing clang_experimental_DependencyScanner_generateReproducer_v2
when necessary is not a big deal. But it's not a belief worth fighting, so I'm fine using whatever settings wrapper you think would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Settings.argc, Settings.argv, Settings.WorkingDirectory goes into computeDependencies. And Settings.ModuleName affects the function's logic.
The use of getTranslationUnitDependencies
is basically the same here as the use of computeDependencies
elsewhere. The fact you're not using the lookup callback could be a possible reason not to use CXDependencyScannerWorkerScanSettings
.
Incidentally, why do you accept ModuleName
and not use it?
Do you suggest using CXDependencyScannerWorkerScanSettings or introducing something like CXDependencyScannerReproducerSettings? I believe a single function with simple arguments is easier to work with and introducing clang_experimental_DependencyScanner_generateReproducer_v2 when necessary is not a big deal. But it's not a belief worth fighting, so I'm fine using whatever settings wrapper you think would be better.
I'm trying to understand what kind of changes are likely to come to this API over time, how it fits with the other scanner APIs, and so what makes sense to make it stable. Introducing a v2 is not necessarily a big deal, but one of these APIs made it to v5 very quickly and was annoying to work with in client code that needs to support multiple versions of the dylib, like swift-build does. Having explicit opaque settings objects is harder to work with when you have one version of an API, but it's easier to factor if we are supporting N versions of the API.
If you are confident that this API will not need more parameters, then I'm fine with keeping it like it is and we can make it opaque in a hypothetical v2 if there is an unexpected need to add one. If we already expect there is a good chance of needing changes to the API in future, then I'm inclined to make it opaque from the start.
DependencyScanningService DepsService( | ||
ScanningMode::DependencyDirectivesScan, ScanningOutputFormat::Full, | ||
CASOpts, /*CAS=*/nullptr, /*ActionCache=*/nullptr, FS); | ||
DependencyScanningTool DepsTool(DepsService); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this API create its own service/worker/tool instead of taking a CXDependencyScannerWorker
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this API create its own service/worker/tool instead of taking a
CXDependencyScannerWorker
?
I wanted the reproducer scan to be independent from previous scans, that's why don't use an existing worker. It is a trade-off, not a uniquely correct approach. Reproducers should be useful in general, not only for debugging issues with EBM, that's why I'm leaning towards from-scratch scanning. And I hope it is easier to reason about than to take into account an invisible worker's state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
independent from previous scans
Can you clarify what specifically you want to be independent from? Is it the file system cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's less about reusing specific data but more about creating API that is easier to reason about. For example, I don't need to know implementation details that CXDependencyScannerWorker
doesn't maintain state that affects the reproducer results. That generating a reproducer doesn't affect CXDependencyScannerWorker
state in unpredictable way. That I don't need to be concerned about CXDependencyScannerWorker
being accessed concurrently. In practice, passing CXDependencyScannerWorker
works despite all of these concerns but there is no guarantee it won't change in the future.
llvm::SmallString<128> ReproScriptPath; | ||
int ScriptFD; | ||
if (ReproducerLocation) { | ||
if (!llvm::sys::fs::exists(ReproducerLocation)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create_directories
has bool IgnoreExisting = true
I agree it is a good idea. But when I've started working on it, I've realized that for the proper API I need a build system ready to use it. And I don't have such a build system. I'm not strongly against it but I'd rather avoid developing functionality nobody uses. |
Tools using the reproducers can specify a custom location to support scenarios similar to those covered by `-fcrash-diagnostics-dir` and `CLANG_CRASH_DIAGNOSTICS_DIR`.
d440bf1
to
fc7ae9f
Compare
Capturing a single
-cc1
command to reproduce a bug with the explicitly-built modules isn't sufficient, we need to know what .pcm files were built and how.rdar://59743925