cmd/link, cmd/go: emit split DWARF on darwin #62577

Open
@cherrymui

Overview

On macOS, we currently generate debug information combined into the executable. This is not Apple's convention, and it's been difficult to make the platform toolchain happy with the combined debug info. With Apple's new linker to be released in Xcode 15, it is even harder. We propose to generate debug info in a separate file on macOS, following the system convention.

Background and context

Platform conventions

Currently, on macOS, for DWARF debug information, the Go toolchain generates it in the executable as a __DWARF segment, similar to what we do on other platforms. However, this is not Apple's convention for its C toolchain. Instead, the C toolchain often creates debug info in a separate file/directory.

Specifically, for C compilation with debug info enabled,

  • the C compiler generates object files that contain debug info in debug_* sections
  • the C linker links the object files into an executable without debug info, but with STAB symbols referencing the object files
  • optionally, another program, dsymutil, can be run on the executable; it extracts the debug info from the object files and stores it in either a dSYM directory or a single file. Once this is done, the STAB symbols and object files are no longer needed and can be stripped/deleted.

Combined DWARF in Go

For the Go toolchain, we currently generate debug information combined into the executable, similar to what we do on other platforms. In internal linking mode, the Go linker directly produces a binary with debug info combined into the executable as a __DWARF segment. In external linking mode, the Go linker

  • passes Go and C object files with debug info to the C linker, which produces an executable with STAB symbols
  • runs dsymutil to extract the debug info
  • strips the STAB symbols (which contain object file paths, which are nondeterministic)
  • post-edits the executable, combining the debug info into it
  • (deletes the temp directory containing Go and C objects)

While it is simple in internal linking mode, in external linking mode this process is a bit convoluted.

Combining DWARF into the executable requires post-editing the executable to add a __DWARF segment, which in turn requires editing the program header and some other data. The Mach-O loaders in the platform's static linker and dynamic linker have a number of integrity checks for the program, and they generally don't like an extra unmapped segment. The code in the Go linker that adds the segment has been revised several times to make the dynamic linker happy.

With Apple's new linker to be released in Xcode 15, there are even more checks, and it is hard to work around all the requirements. Currently, if one builds Go code into a c-shared object and then links it with C code using Apple's new linker, the linker rejects the shared object produced by the Go toolchain (see also #61229). We could potentially try harder to work around more checks (if possible). But this may get harder and harder in the future, and we may eventually be forced to change anyway.

Debugger support

For the debugger side, the system's default debugger, LLDB, understands the C toolchain's convention. When debugging an executable (say x),

  • it can automatically find debug info combined in the executable
  • it can automatically find debug info from object files referenced by the STAB symbols
  • it can automatically find debug info from the dSYM directory x.dSYM
  • or the debug info file can be specified with the target create --symfile command.

Notably, LLDB doesn't understand compressed DWARF, which we generate by default. So currently Go programs do not work out of the box with LLDB. (An easy workaround is -ldflags=-compressdwarf=0.)

Delve, a commonly used debugger for Go programs, understands DWARF combined in the executable, including compressed DWARF. So Delve works for Go programs out of the box.

Proposed changes

We propose that the Go toolchain switch to generating split DWARF on macOS, following the platform conventions. This would make the Go toolchain more consistent with Apple's convention and behave more similarly to the system C toolchain. We would no longer need to "fight against" the checks in the Mach-O loaders of the system static and dynamic linkers, so it will be more forward compatible with platform updates.

Naming convention

Following the system convention, for an executable named x we will generate a directory named x.dSYM, which contains a DWARF file at x.dSYM/Contents/Resources/DWARF/x. In the system convention, there are other files in the dSYM directory (an Info.plist file and a relocation file), which are irrelevant to DWARF. We may skip them for now and consider generating them if they are needed in the future. (For c-archive build mode, we produce C objects, which contain combined DWARF in the C toolchain's convention, so we will continue to do so.)
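As a minimal sketch of the layout described above, the DWARF file location can be derived from the executable path alone. The helper names dsymDir and dwarfFile are hypothetical and exist only for illustration; they are not part of the Go toolchain.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// dsymDir returns the dSYM directory for an executable path,
// i.e. the executable path with ".dSYM" appended.
// (Hypothetical helper for illustration only.)
func dsymDir(exe string) string {
	return exe + ".dSYM"
}

// dwarfFile returns the path of the DWARF file inside the dSYM
// directory: <exe>.dSYM/Contents/Resources/DWARF/<base name of exe>.
// (Hypothetical helper for illustration only.)
func dwarfFile(exe string) string {
	return filepath.Join(dsymDir(exe), "Contents", "Resources", "DWARF", filepath.Base(exe))
}

func main() {
	fmt.Println(dsymDir("bin/x"))
	fmt.Println(dwarfFile("bin/x"))
}
```

Note that only the base name of the executable appears as the final path component, while the dSYM directory sits next to the executable.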

We could also consider using a different naming convention, e.g. for an executable named x we would generate a single DWARF file named x.dwarf. LLDB would not load it automatically, so one would need to pass the --symfile flag. But as LLDB already does not work out of the box (due to compression), maybe this is not too bad. Feedback welcome.

Go linker

The Go linker will generate split DWARF on macOS.

  • In internal linking mode the Go linker will emit an executable (without DWARF) and a separate DWARF file.
  • In external linking mode the Go linker will invoke the C linker to emit an executable and invoke dsymutil to generate a DWARF file; this is the same as before, but the Go linker will not post-edit the executable to combine the DWARF back into the executable.

The go command

The go command needs to understand that we now generate two output files, the executable and the DWARF file (in the case of c-shared build mode, three files: the shared object, the C header file, and the DWARF file). It needs to copy them from the temporary directory where the build is performed to the output directory. Specifically for file naming,

  • go build without the -o flag will generate executable <exe> (which is the default name matching the main module or .go file name) and a DWARF file in <exe>.dSYM
  • go build -o <exe> will generate executable <exe> and a DWARF file in <exe>.dSYM
  • go build -o <dir> will generate executable <dir>/<exe> and a DWARF file in <dir>/<exe>.dSYM (where <exe> is the default name based on the main module or .go file name)
  • a special case for go build -o /dev/null, which generates no file
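The naming rules in the list above can be sketched as a small pure function. This is only an illustration of the proposed behavior, not the go command's actual implementation: plannedOutputs, its parameters, and the outputs type are hypothetical names, and whether -o names a directory is passed in as a flag rather than detected.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// outputs holds the planned output paths. Both fields are empty
// when nothing is written. (Hypothetical type for illustration.)
type outputs struct {
	Exe  string // path of the executable
	DSYM string // path of the dSYM directory holding the DWARF file
}

// plannedOutputs mirrors the proposed naming rules:
//   - no -o flag:     <defaultName> and <defaultName>.dSYM
//   - -o <exe>:       <exe> and <exe>.dSYM
//   - -o <dir>:       <dir>/<defaultName> and <dir>/<defaultName>.dSYM
//   - -o /dev/null:   no files
//
// oFlag is the value of -o ("" if absent), oIsDir reports whether it
// names a directory, and defaultName is the default executable name.
func plannedOutputs(oFlag string, oIsDir bool, defaultName string) outputs {
	var exe string
	switch {
	case oFlag == "/dev/null":
		return outputs{}
	case oFlag == "":
		exe = defaultName
	case oIsDir:
		exe = filepath.Join(oFlag, defaultName)
	default:
		exe = oFlag
	}
	return outputs{Exe: exe, DSYM: exe + ".dSYM"}
}

func main() {
	fmt.Printf("%+v\n", plannedOutputs("", false, "hello"))
	fmt.Printf("%+v\n", plannedOutputs("myexe", false, "hello"))
	fmt.Printf("%+v\n", plannedOutputs("bin", true, "hello"))
	fmt.Printf("%+v\n", plannedOutputs("/dev/null", false, "hello"))
}
```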

go test -c will follow a similar naming convention.

In order not to clutter directories that contain installed binaries, like $HOME/bin, we propose that go install will have DWARF disabled by default (by passing the -w flag to the linker). One can still explicitly ask for DWARF by passing -ldflags=-w=0 (the -w flag disables DWARF, -w=0 negates it).

There is prior art for emitting two output files: in c-shared build mode, the go build command generates a C shared object (usually named with .so) and a C header file (usually named with .h). So outputting two files isn't completely new, and the DWARF file could perhaps be handled similarly.

go clean will also understand the naming convention, and remove the DWARF file if it is invoked to remove the executable file.

Build cache

Executables are not cached. So the DWARF file will not be cached, either. However, for executables the go command checks if the output file already exists and contains the expected build ID, and if so, it will assume it is up to date and not relink it. With split DWARF, we propose that it will also check if the DWARF file is up to date (the DWARF file will probably also contain the build ID so it can be checked, details TBD). If either the executable or the DWARF file is not up to date, it will relink and generate both.

Debugger support

With this change, LLDB understands the naming convention so it should still be able to load the DWARF info automatically (if it is not compressed). If either the executable or the DWARF file is moved or renamed, it can still be loaded with the --symfile flag.

Delve will need to be updated to understand the naming convention and find the DWARF file in the dSYM directory. We suggest it also provide a way (e.g. a command line flag, if it does not already have one) to explicitly specify the DWARF file's location, in case the user wants to move or rename the file.

debug/macho package

Currently, for a Mach-O executable with combined DWARF, the debug/macho.(*File).DWARF function can load the debug information. With split DWARF, the binary will not contain DWARF, so it cannot be loaded from the same macho.File. One could open another macho.File for the DWARF file.

If the macho.File is from an OS file (e.g. opened with macho.Open), the macho package could automatically try to find the split DWARF in the DWARF file following the naming convention. Then the user wouldn't need to open another file. On the other hand, automatically opening another file seems a bit magic. Feedback welcome.
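Until the package does anything automatically, a caller could do the lookup by hand. The following is a rough sketch under the proposed naming convention: it uses the existing macho.Open, (*File).Section, and (*File).DWARF APIs, while loadDWARF and splitDWARFPath are hypothetical helpers. It assumes combined DWARF can be detected by the presence of a __debug_info (or compressed __zdebug_info) section.

```go
package main

import (
	"debug/dwarf"
	"debug/macho"
	"fmt"
	"os"
	"path/filepath"
)

// splitDWARFPath returns where the DWARF file would live under the
// proposed convention: <exe>.dSYM/Contents/Resources/DWARF/<base>.
// (Hypothetical helper for illustration only.)
func splitDWARFPath(exe string) string {
	return filepath.Join(exe+".dSYM", "Contents", "Resources", "DWARF", filepath.Base(exe))
}

// loadDWARF first looks for DWARF combined in the executable and,
// if there is none, falls back to the split DWARF file.
// (Hypothetical helper for illustration only.)
func loadDWARF(exe string) (*dwarf.Data, error) {
	f, err := macho.Open(exe)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	// Assumption: combined DWARF shows up as a (possibly compressed)
	// debug info section in the executable itself.
	if f.Section("__debug_info") != nil || f.Section("__zdebug_info") != nil {
		return f.DWARF()
	}

	// No combined DWARF: open the separate DWARF file as another macho.File.
	df, err := macho.Open(splitDWARFPath(exe))
	if err != nil {
		return nil, fmt.Errorf("no combined DWARF and no split DWARF file: %w", err)
	}
	defer df.Close()
	return df.DWARF()
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: loaddwarf <executable>")
		return
	}
	if _, err := loadDWARF(os.Args[1]); err != nil {
		fmt.Println("failed to load DWARF:", err)
		return
	}
	fmt.Println("loaded DWARF for", os.Args[1])
}
```

Doing the fallback in user code keeps the second file access explicit, which is the trade-off raised above against having the package open another file on its own.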

If accepted, we plan to implement this in Go 1.22.

Thanks.

cc @golang/compiler @rsc @bcmills @aarzilli @derekparker @archanaravindar
