-
Notifications
You must be signed in to change notification settings - Fork 340
[libclang] Add API to generate a reproducer for the explicitly-built modules. #10577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: next
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
// Test generating a reproducer for a modular build where required modules are | ||
// built explicitly as separate steps. | ||
|
||
// RUN: rm -rf %t | ||
// RUN: split-file %s %t | ||
// | ||
// RUN: c-index-test core -gen-deps-reproducer -working-dir %t \ | ||
// RUN: -- clang-executable -c %t/reproducer.c -o %t/reproducer.o \ | ||
// RUN: -fmodules -fmodules-cache-path=%t | FileCheck %t/reproducer.c | ||
|
||
// Test a failed attempt at generating a reproducer. | ||
// RUN: not c-index-test core -gen-deps-reproducer -working-dir %t \ | ||
// RUN: -- clang-executable -c %t/failed-reproducer.c -o %t/reproducer.o \ | ||
// RUN: -fmodules -fmodules-cache-path=%t 2>&1 | FileCheck %t/failed-reproducer.c | ||
|
||
// Test the content of a reproducer script. | ||
// RUN: c-index-test core -gen-deps-reproducer -working-dir %t -o %t/repro-content \ | ||
// RUN: -- clang-executable -c %t/reproducer.c -o %t/reproducer.o \ | ||
// RUN: -fmodules -fmodules-cache-path=%t | ||
// RUN: FileCheck %t/script-expectations.txt --input-file %t/repro-content/reproducer.sh | ||
|
||
//--- modular-header.h | ||
void fn_in_modular_header(void); | ||
|
||
//--- module.modulemap | ||
module Test { header "modular-header.h" export * } | ||
|
||
//--- reproducer.c | ||
// CHECK: Sources and associated run script(s) are located at: | ||
#include "modular-header.h" | ||
|
||
void test(void) { | ||
fn_in_modular_header(); | ||
} | ||
|
||
//--- failed-reproducer.c | ||
// CHECK: fatal error: 'non-existing-header.h' file not found | ||
#include "non-existing-header.h" | ||
|
||
//--- script-expectations.txt | ||
CHECK: clang-executable | ||
CHECK: -fmodule-file=Test=reproducer.cache/explicitly-built-modules/Test-{{.*}}.pcm |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -317,6 +317,150 @@ void clang_experimental_DependencyScannerWorkerScanSettings_dispose( | |
delete unwrap(Settings); | ||
} | ||
|
||
namespace { | ||
// Helper class to capture a returnable error code and to return a formatted | ||
// message in a provided CXString pointer. | ||
class MessageEmitter { | ||
const CXErrorCode ErrorCode; | ||
CXString *OutputString; | ||
std::string Buffer; | ||
llvm::raw_string_ostream Stream; | ||
|
||
public: | ||
MessageEmitter(CXErrorCode Code, CXString *Output) | ||
: ErrorCode(Code), OutputString(Output), Stream(Buffer) {} | ||
~MessageEmitter() { | ||
if (OutputString) | ||
*OutputString = clang::cxstring::createDup(Buffer.c_str()); | ||
} | ||
|
||
operator CXErrorCode() const { return ErrorCode; } | ||
|
||
template <typename T> MessageEmitter &operator<<(const T &t) { | ||
Stream << t; | ||
return *this; | ||
} | ||
}; | ||
} // end anonymous namespace | ||
|
||
enum CXErrorCode clang_experimental_DependencyScanner_generateReproducer( | ||
int argc, const char *const *argv, const char *ModuleName, | ||
const char *WorkingDirectory, const char *ReproducerLocation, | ||
bool UseUniqueReproducerName, CXString *MessageOut) { | ||
auto Report = [MessageOut](CXErrorCode ErrorCode) -> MessageEmitter { | ||
return MessageEmitter(ErrorCode, MessageOut); | ||
}; | ||
auto ReportFailure = [&Report]() -> MessageEmitter { | ||
return Report(CXError_Failure); | ||
}; | ||
|
||
if (argc < 2 || !argv) | ||
return Report(CXError_InvalidArguments) << "missing compilation command"; | ||
if (!WorkingDirectory) | ||
return Report(CXError_InvalidArguments) << "missing working directory"; | ||
if (!UseUniqueReproducerName && !ReproducerLocation) | ||
return Report(CXError_InvalidArguments) | ||
<< "non-unique reproducer is allowed only in a custom location"; | ||
|
||
CASOptions CASOpts; | ||
IntrusiveRefCntPtr<llvm::cas::CachingOnDiskFileSystem> FS; | ||
DependencyScanningService DepsService( | ||
ScanningMode::DependencyDirectivesScan, ScanningOutputFormat::Full, | ||
CASOpts, /*CAS=*/nullptr, /*ActionCache=*/nullptr, FS); | ||
DependencyScanningTool DepsTool(DepsService); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why does this API create its own service/worker/tool instead of taking a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I wanted the reproducer scan to be independent from previous scans, that's why don't use an existing worker. It is a trade-off, not a uniquely correct approach. Reproducers should be useful in general, not only for debugging issues with EBM, that's why I'm leaning towards from-scratch scanning. And I hope it is easier to reason about than to take into account an invisible worker's state. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Can you clarify what specifically you want to be independent from? Is it the file system cache? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's less about reusing specific data but more about creating API that is easier to reason about. For example, I don't need to know implementation details that |
||
|
||
llvm::SmallString<128> ReproScriptPath; | ||
int ScriptFD; | ||
if (ReproducerLocation) { | ||
if (auto EC = llvm::sys::fs::create_directories(ReproducerLocation)) | ||
return ReportFailure() << "failed to create a reproducer location '" | ||
<< ReproducerLocation << "'\n" | ||
<< EC.message(); | ||
SmallString<128> Path(ReproducerLocation); | ||
llvm::sys::path::append(Path, "reproducer"); | ||
const char *UniqueSuffix = UseUniqueReproducerName ? "-%%%%%%" : ""; | ||
if (auto EC = llvm::sys::fs::createUniqueFile(Path + UniqueSuffix + ".sh", | ||
ScriptFD, ReproScriptPath)) | ||
return ReportFailure() << "failed to create a reproducer script file\n" | ||
<< EC.message(); | ||
} else { | ||
if (auto EC = llvm::sys::fs::createTemporaryFile( | ||
"reproducer", "sh", ScriptFD, ReproScriptPath)) { | ||
return ReportFailure() << "failed to create a reproducer script file\n" | ||
<< EC.message(); | ||
} | ||
} | ||
SmallString<128> FileCachePath = ReproScriptPath; | ||
llvm::sys::path::replace_extension(FileCachePath, ".cache"); | ||
|
||
std::string FileCacheName = llvm::sys::path::filename(FileCachePath).str(); | ||
auto LookupOutput = [&FileCacheName](const ModuleDeps &MD, | ||
ModuleOutputKind MOK) -> std::string { | ||
if (MOK != ModuleOutputKind::ModuleFile) | ||
return ""; | ||
return FileCacheName + "/explicitly-built-modules/" + | ||
MD.ID.ModuleName + "-" + MD.ID.ContextHash + ".pcm"; | ||
}; | ||
|
||
std::vector<std::string> Compilation{argv, argv + argc}; | ||
llvm::DenseSet<ModuleID> AlreadySeen; | ||
auto TUDepsOrErr = DepsTool.getTranslationUnitDependencies( | ||
Compilation, WorkingDirectory, AlreadySeen, std::move(LookupOutput)); | ||
if (!TUDepsOrErr) | ||
return ReportFailure() << "failed to generate a reproducer\n" | ||
<< toString(TUDepsOrErr.takeError()); | ||
|
||
TranslationUnitDeps TU = *TUDepsOrErr; | ||
llvm::raw_fd_ostream ScriptOS(ScriptFD, /*shouldClose=*/true); | ||
ScriptOS << "# Original command:\n#"; | ||
for (StringRef Arg : Compilation) | ||
ScriptOS << ' ' << Arg; | ||
ScriptOS << "\n\n"; | ||
|
||
ScriptOS << "# Dependencies:\n"; | ||
std::string ReproExecutable = std::string(argv[0]); | ||
auto PrintArguments = [&ReproExecutable, | ||
&FileCacheName](llvm::raw_fd_ostream &OS, | ||
ArrayRef<std::string> Arguments) { | ||
OS << ReproExecutable; | ||
for (int I = 0, E = Arguments.size(); I < E; ++I) | ||
OS << ' ' << Arguments[I]; | ||
OS << " -ivfsoverlay \"" << FileCacheName << "/vfs/vfs.yaml\""; | ||
OS << '\n'; | ||
}; | ||
for (ModuleDeps &Dep : TU.ModuleGraph) | ||
PrintArguments(ScriptOS, Dep.getBuildArguments()); | ||
ScriptOS << "\n# Translation unit:\n"; | ||
for (const Command &BuildCommand : TU.Commands) | ||
PrintArguments(ScriptOS, BuildCommand.Arguments); | ||
|
||
SmallString<128> VFSCachePath = FileCachePath; | ||
llvm::sys::path::append(VFSCachePath, "vfs"); | ||
std::string VFSCachePathStr = VFSCachePath.str().str(); | ||
llvm::FileCollector FileCollector(VFSCachePathStr, | ||
/*OverlayRoot=*/VFSCachePathStr); | ||
for (const auto &FileDep : TU.FileDeps) { | ||
FileCollector.addFile(FileDep); | ||
} | ||
for (ModuleDeps &ModuleDep : TU.ModuleGraph) { | ||
ModuleDep.forEachFileDep([&FileCollector](StringRef FileDep) { | ||
FileCollector.addFile(FileDep); | ||
}); | ||
} | ||
if (FileCollector.copyFiles(/*StopOnError=*/true)) | ||
return ReportFailure() | ||
<< "failed to copy the files used for the compilation"; | ||
SmallString<128> VFSOverlayPath = VFSCachePath; | ||
llvm::sys::path::append(VFSOverlayPath, "vfs.yaml"); | ||
if (FileCollector.writeMapping(VFSOverlayPath)) | ||
return ReportFailure() << "failed to write a VFS overlay mapping"; | ||
|
||
return Report(CXError_Success) | ||
<< "Created a reproducer. Sources and associated run script(s) are " | ||
"located at:\n " | ||
<< FileCachePath << "\n " << ReproScriptPath; | ||
} | ||
|
||
enum CXErrorCode clang_experimental_DependencyScannerWorker_getDepGraph( | ||
CXDependencyScannerWorker W, | ||
CXDependencyScannerWorkerScanSettings CXSettings, CXDepGraph *Out) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should use
CXDependencyScannerWorkerScanSettings
instead ofint argc, const char *const *argv, const char *ModuleName, const char *WorkingDirectory
for the basic compilation setup.How sure are we that
const char *ReproducerLocation, bool UseUniqueReproducerName,
are the only two reproducer-specific options we will ever want here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going back and forth about it and can be persuaded either way. My [weak] justification is that we don't pass the settings to a dependency service/worker/tool. So it's not a self-sufficient entity but more a convenience container that happens to be the same between different functions. So I avoid treating it as an independent entity.
For the potential enhancements I expect only something to help capture the scanner state. Unfortunately, I don't know what that would be. Maybe .pcm cache, maybe dumping stat cache, maybe something else. So far it is too vague and not the most important direction for the next steps. That's why I'd like to land something reasonably future-proof and iterate on it to see what else we might need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that exactly what we do? The difference is that you're creating a new one instead of using an existing one.
If you want to make this future proof the C API should probably be much more opaque about the configuration, similar to what we do with the worker scan settings where the settings become their own opaque type and we can add new C functions over time to add new settings to that configuration object without breaking the existing API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that in
clang_experimental_DependencyScannerWorker_getDepGraph
, for instance,Settings.MLO
andSettings.MLOContext
goes intoDependencyScanningTool::createActionController
.Settings.argc
,Settings.argv
,Settings.WorkingDirectory
goes intocomputeDependencies
. AndSettings.ModuleName
affects the function's logic. We don't useSettings
as a single entity anywhere and taking it apart is the first thing we do.Do you suggest using
CXDependencyScannerWorkerScanSettings
or introducing something likeCXDependencyScannerReproducerSettings
? I believe a single function with simple arguments is easier to work with and introducingclang_experimental_DependencyScanner_generateReproducer_v2
when necessary is not a big deal. But it's not a belief worth fighting, so I'm fine using whatever settings wrapper you think would be better.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of
getTranslationUnitDependencies
is basically the same here as the use ofcomputeDependencies
elsewhere. The fact you're not using the lookup callback could be a possible reason not to useCXDependencyScannerWorkerScanSettings
.Incidentally, why do you accept
ModuleName
and not use it?I'm trying to understand what kind of changes are likely to come to this API over time, how it fits with the other scanner APIs, and so what makes sense to make it stable. Introducing a v2 is not necessarily a big deal, but one of these APIs made it to v5 very quickly and was annoying to work with in client code that needs to support multiple versions of the dylib, like swift-build does. Having explicit opaque settings objects is harder to work with when you have one version of an API, but it's easier to factor if we are supporting N versions of the API.
If you are confident that this API will not need more parameters, then I'm fine with keeping it like it is and we can make it opaque in a hypothetical v2 if there is an unexpected need to add one. If we already expect there is a good chance of needing changes to the API in future, then I'm inclined to make it opaque from the start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I've added it mostly because Jan asked about it. I can see there might be a noticeable enough divergence between for-TU scanning and for-module scanning in the future, so it can be useful. But I'm not using it now because there is no data indicating it is useful. So this is "let's try and see how it works" approach. Having
ModuleName
in API seems like a plausible enough future-proofing step.Fair enough, that v5 doesn't give off good vibes.
I think the main question to you would be if you see a need for some CAS special handling?
As for the known problems (that we need to be prepared to fix) we have
The last two are a little bit of a stretch because reproducers don't promise to help with reproducing intermittent issues. If you cannot reproduce an issue in a clean build, there is no magic in reproducers that makes intermittent issues predictable.
It is possible there are unknown issues that would require API changes but as far as I know, API is fairly stable in its current shape.