Commit 9718ce2

Initial start of documenting the ABI.
1 parent 80a02b3 commit 9718ce2

4 files changed

+233
-0
lines changed

ABIDoc/abi.tex

+226
@@ -0,0 +1,226 @@
1+
\documentclass[a4paper]{report}
2+
3+
\usepackage[utf8x]{inputenc}
4+
\usepackage{listings}
5+
\usepackage{fancyref}
6+
\newcommand*{\fancyreflstlabelprefix}{lst}
7+
\fancyrefaddcaptions{english}{%
8+
\providecommand*{\freflstname}{listing}%
9+
\providecommand*{\Freflstname}{Listing}%
10+
}
11+
\frefformat{plain}{\fancyreflstlabelprefix}{\freflstname\fancyrefdefaultspacing#1}
12+
\Frefformat{plain}{\fancyreflstlabelprefix}{\Freflstname\fancyrefdefaultspacing#1}
13+
14+
\frefformat{vario}{\fancyreflstlabelprefix}{%
15+
\freflstname\fancyrefdefaultspacing#1#3%
16+
}
17+
\Frefformat{vario}{\fancyreflstlabelprefix}{%
18+
\Freflstname\fancyrefdefaultspacing#1#3%
19+
}
20+
21+
22+
\usepackage[svgnames]{xcolor}
23+
\usepackage{hyperref}
24+
25+
\lstset{
26+
basicstyle={\footnotesize \ttfamily},
27+
breaklines=true,
28+
commentstyle={\color{Blue}},
29+
extendedchars=true,
30+
keywordstyle={[0]\color{Green}},
31+
keywordstyle={[1]\color{Brown}},
32+
keywordstyle={[2]\color{DarkMagenta}},
33+
keywordstyle={[3]\color{Maroon}},
34+
keywordstyle={[4]\color{Blue}},
35+
showspaces=false,
36+
showstringspaces=false,
37+
stringstyle={\color{IndianRed}},
38+
tabsize=2,
39+
}
40+
41+
42+
\newcommand{\file}[1]{\textsf{#1}}
43+
\newcommand{\keyword}[1]{\textit{#1}}
44+
\newcommand{\ccode}[1]{\lstinline[language={C}]{#1}}
45+
\newcommand{\objc}[1]{\lstinline[language={[Objective]C}]{#1}}
46+
47+
\newcommand{\inccode}[4]{
48+
\lstinputlisting[language=C,
49+
rangebeginprefix =//\ begin:\ ,
50+
rangeendprefix =//\ end:\ ,
51+
includerangemarker=false,
52+
linerange=#3-#3,
53+
numbers = left,
54+
label={lst:#2},
55+
float,
56+
caption={#4 {\small [From #1] }}
57+
]{../#1}
58+
}
59+
60+
61+
\title{GNUstep Objective-C ABI version 10}
62+
\author{David Chisnall}
63+
64+
\begin{document}
65+
\maketitle{}
66+
\tableofcontents
67+
68+
\chapter{Introduction}
69+
70+
The GNUstep Objective-C runtime has a complicated history.
71+
It began as the Étoilé Objective-C runtime, a research prototype that adopted a lot of ideas from the VPRI Combined Object-Lambda Architecture (COLA) model and was intended to support languages like Self (prototype-based, multiple inheritance) as well as Objective-C.
72+
This code was repurposed as \file{libobjc2} to provide an Objective-C runtime that clang could use.
73+
At the time, the GCC Objective-C runtime had a GPLv2 exemption that applied only to code compiled with GCC, and so any code compiled with clang and linked against the GCC runtime had to be licensed as GPLv2 or later.
74+
75+
At this time, GCC's Objective-C support lacked a number of features of more modern Objective-C (for example, declared properties) and showed no signs of improving.
76+
77+
Eventually \file{libobjc2} was adopted by the GNUstep project and became the GNUstep Objective-C runtime.
78+
It was intended as a drop-in replacement for the GCC runtime and so adopted the GCC Objective-C ABI and extended it in a variety of backwards-compatible ways.
79+
80+
The GCC ABI was, itself, inherited from the original NeXT Objective-C runtime.
81+
The Free Software Foundation used the GPL and the threat of legal action to force NeXT to release their GCC changes to support Objective-C.
82+
They were left with some shockingly bad code, which was completely useless without an Objective-C runtime.
83+
The GCC team committed the shockingly bad code and wrote a runtime that was almost, but not quite, compatible with the NeXT one.
84+
In particular, it did not implement \ccode{objc_msgSend}, which requires hand-written assembly, and instead modified the compiler to call a function to look up the method and then to call the result, giving a portable pure-C design.
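The lookup-then-call pattern can be sketched in portable C as follows. This is only an illustration of the idea, not the GCC runtime's code: the \ccode{demo_} types and functions are hypothetical, method names are plain strings, and the dispatch table is a linear array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic function pointer type, cast to the real signature before calling. */
typedef void (*demo_imp)(void);

struct demo_method {
    const char *name;
    demo_imp imp;
};

struct demo_obj {
    const struct demo_method *methods;
    size_t method_count;
    int value;
};

/* A method implementation: receives the receiver and the method name. */
static int demo_get_value(struct demo_obj *self, const char *cmd)
{
    (void)cmd;
    return self->value;
}

static const struct demo_method demo_methods[] = {
    { "value", (demo_imp)demo_get_value },
};

/* Step 1: look the method up. Returns NULL if the receiver has no such method. */
demo_imp demo_lookup(struct demo_obj *receiver, const char *name)
{
    for (size_t i = 0; i < receiver->method_count; i++) {
        if (strcmp(receiver->methods[i].name, name) == 0)
            return receiver->methods[i].imp;
    }
    return NULL;
}

/* Step 2: call the result. This is roughly what the compiler emits at a call site. */
int demo_send_get_value(struct demo_obj *receiver)
{
    demo_imp imp = demo_lookup(receiver, "value");
    assert(imp != NULL);
    return ((int (*)(struct demo_obj *, const char *))imp)(receiver, "value");
}

/* Builds an object that uses the demo method table. */
struct demo_obj demo_make_obj(int value)
{
    struct demo_obj obj = { demo_methods, sizeof demo_methods / sizeof demo_methods[0], value };
    return obj;
}
```

Because the lookup returns an ordinary function pointer, no assembly trampoline is needed. The price is two calls per message send instead of one.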
85+
86+
As such, the ABI supported by the GNUstep Objective-C runtime dates back to 1988 and is starting to show its age.
87+
It includes a number of hacks and misfeatures that are in dire need of replacing.
88+
This document describes the new ABI used by version 2.0 of the runtime.
89+
90+
\section{Mistakes}
91+
92+
Supporting a non-fragile ABI was one of the early design goals of the GNUstep Objective-C runtime.
93+
When Apple switched to a new runtime, they were able to require that everyone recompile all of their code to support the non-fragile ABI and, in particular, were able to support only the new ABI on ARM and on 64-bit platforms.
94+
95+
At the time, it was not possible to persuade everyone to recompile all of their code for a new GNUstep runtime, and so I made a number of questionable design decisions to allow classes compiled with the non-fragile ABI to subclass ones compiled with the fragile ABI.
96+
These decisions have led to some issues where code using the non-fragile ABI ends up being fragile.
97+
98+
The new ABI makes no attempt to support mixing old and new ABI code.
99+
The runtime will work with either, but not with both at the same time.
100+
It will upgrade the old structures to the new on load (consuming more memory and providing an incentive to recompile) and will then use only the new structures internally.
101+
102+
\section{Changed circumstances}
103+
104+
When the original NeXT runtime was released, linkers were designed primarily to work with C.
105+
C guarantees that each symbol is defined in precisely one compilation unit.
106+
In contrast, C++ (10 years away from standardisation at the time the NeXT runtime was released) has a number of language features that rely on symbols appearing in multiple compilation units.
107+
The original Cfront C++ compiler worked by compiling without emitting any of these, parsing the linker errors, and then recompiling to add the missing ones.
108+
109+
More modern implementations of C++ emit these symbols in every compilation unit that references them and rely on the linker to discard duplicates.
110+
Modern linkers support \keyword{COMDATs} for this purpose.
111+
112+
The NeXT runtime was able to work slightly differently.
113+
The Mach-O binary format (used by NeXT and Apple) provides a mechanism for registering code that will handle loading for certain sections, thus delegating some linker functionality to the runtime.
114+
115+
In addition to COMDATs, modern linkers support generating symbols that correspond to the start and end of sections.
116+
This makes it possible for the new ABI to emit all declarations of a particular kind in a section and for the runtime to then receive an array of all of the objects in that section.
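A minimal sketch of the section start and end symbols, assuming an ELF target linked with GNU ld or LLD, which synthesise \ccode{__start_X} and \ccode{__stop_X} for any section whose name \ccode{X} is a valid C identifier. The section name \ccode{demo_entries}, the \ccode{demo_entry} type, and the helper functions are hypothetical and are not part of the ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record type; the real ABI stores selectors, classes, and so on. */
struct demo_entry {
    const char *name;
};

/* Emit one record into the custom section. Every record has the same type,
 * so the section's contents can be walked as an array. */
#define DEMO_ENTRY(ident, str) \
    static const struct demo_entry ident \
        __attribute__((used, section("demo_entries"))) = { str }

DEMO_ENTRY(demo_entry_a, "alpha");
DEMO_ENTRY(demo_entry_b, "beta");
DEMO_ENTRY(demo_entry_c, "gamma");

/* Defined by the linker: the bounds of the demo_entries section. */
extern const struct demo_entry __start_demo_entries[];
extern const struct demo_entry __stop_demo_entries[];

/* Number of records that ended up in the section after linking. */
size_t demo_entry_count(void)
{
    return (size_t)(__stop_demo_entries - __start_demo_entries);
}

/* Returns 1 if a record with the given name is present, 0 otherwise. */
int demo_entry_present(const char *name)
{
    for (const struct demo_entry *e = __start_demo_entries;
         e < __stop_demo_entries; e++) {
        if (strcmp(e->name, name) == 0)
            return 1;
    }
    return 0;
}
```

Records contributed by other compilation units would land in the same section, so the loop sees them all without any per-unit registration call.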
117+
118+
\chapter{Entry point}
119+
120+
The legacy GCC ABI provided a \ccode{__objc_exec_class} function that registered all of the Objective-C data for a single compilation unit.
121+
This has two downsides:
122+
123+
\begin{itemize}
124+
\item It means that the \objc{+load} methods will be called one at a time, as classes are loaded, because the runtime has no way of knowing when an entire library has been loaded.
125+
\item It prevents any deduplication between compilation units, and so a selector used in 100 \file{.m} files and linked into a single binary will occur 100 times and be passed to the runtime for merging 100 times.
126+
\end{itemize}
127+
128+
\section{The new entry point}
129+
130+
The new runtime provides a \ccode{__objc_load} function for loading an entire library at a time.
131+
This function takes a pointer to the structure shown in \Fref{lst:initobjc}.
132+
133+
For the current ABI, the \ccode{version} field must always be zero.
134+
This field exists to allow future versions of the ABI to add new fields to the end, which can be ignored by older runtime implementations.
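The versioning scheme can be illustrated with a small C sketch. Only the leading \ccode{version} field is taken from the text above. Both layouts below, and the \ccode{extra} field in particular, are hypothetical stand-ins for the real structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical version-0 layout: the version word plus one begin/end pair. */
struct demo_init_v0 {
    uint64_t version;
    const char **sel_begin;
    const char **sel_end;
};

/* Hypothetical later layout: the same prefix with a new field appended. */
struct demo_init_v1 {
    struct demo_init_v0 base;
    const void *extra;
};

/* A loader that only knows the version-0 fields. It reads through the
 * version-0 prefix, so anything appended by a newer ABI is never touched. */
size_t demo_count_selectors(const struct demo_init_v0 *init, uint64_t *version_out)
{
    if (version_out != NULL)
        *version_out = init->version;
    return (size_t)(init->sel_end - init->sel_begin);
}
```

Appending fields only at the end keeps the offsets of the existing fields stable, which is what lets the older loader keep working.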
135+
136+
The remaining fields all contain pointers to the start and end of a section.
137+
The sections are listed in \fref{tab:sections}.
138+
139+
\inccode{loader.c}{initobjc}{objc_init}{The Objective-C library description structure.}
140+
141+
The \ccode{__objc_selectors} section contains all of the selectors referenced by this library.
142+
As described in \fref{chap:selectors}, these are deduplicated by the linker, so each library should contain only one copy of any given selector.
143+
144+
\begin{table}
145+
\begin{center}
146+
\begin{tabular}{l|l}
147+
Prefix & Section \\\hline
148+
\ccode{sel_} & \ccode{__objc_selectors}\\
149+
\ccode{cls_} & \ccode{__objc_classes}\\
150+
\ccode{cls_ref_} & \ccode{__objc_class_refs}\\
151+
\ccode{cat_} & \ccode{__objc_cats}\\
152+
\ccode{proto_} & \ccode{__objc_protocols}\\
153+
\end{tabular}
154+
\caption{\label{tab:sections}Section names for Objective-C components.}
155+
\end{center}
156+
\end{table}
157+
158+
Similarly, the \ccode{__objc_classes}, \ccode{__objc_cats}, and \ccode{__objc_protocols} sections contain classes, categories, and protocols: the three top-level structural components in an Objective-C program.
159+
These are all described in later chapters.
160+
161+
The \ccode{__objc_class_refs} section contains variables that are used for accessing classes.
162+
These are described in \Fref{sec:classref} and provide loose coupling between the representation of the class and accesses to it.
163+
164+
\section{Compiler responsibilities}
165+
166+
For each compilation unit, the compiler must emit a copy of both the \ccode{objc_init} structure and a function that passes it to the runtime, in such a way that the linker will preserve a single copy.
167+
On ELF platforms, these are hidden weak symbols with a COMDAT matching their name.
168+
The load function is called \ccode{.objcv2_load_function} and the initializer structure is called \ccode{.objc_init} (the dot prefix preventing conflicts with any C symbols).
169+
The compiler also emits a \ccode{.objc_ctor} variable in the \ccode{.ctors} section, with a \ccode{.objc_ctor} COMDAT.
170+
171+
The end result after linking is a single copy of the \ccode{.objc_ctor} variable in the \ccode{.ctors} section, which causes a single copy of the \ccode{.objcv2_load_function} to be called, passing a single copy of the \ccode{.objc_init} structure to the runtime on binary load.
172+
173+
The \ccode{.objc_init} structure is initialised by the \ccode{__start_\{section name\}} and \ccode{__stop_\{section name\}} symbols, which the linker will replace with relocations describing the start and end of each section.
174+
175+
The linker does not automatically initialise these variables if the sections do not exist, so compilation units that do not include any entries for one or more of them must emit a zero-filled section.
176+
The runtime will then ignore the zero entry.
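The following C sketch approximates what a compilation unit contributes, assuming an ELF target with GCC or clang. It is not what the compiler literally emits: C cannot name symbols with a leading dot or place them in a COMDAT, so the sketch uses the \ccode{constructor} attribute in place of the \ccode{.ctors} entry. All \ccode{demo_} names and the \ccode{demo_sels} section are hypothetical, as is the choice of a null \ccode{name} to mark a zero-filled entry.

```c
#include <assert.h>
#include <stddef.h>

struct demo_sel {
    const char *name;
    const char *types;
};

/* Stand-in for the runtime: counts the entries it was asked to register. */
static size_t demo_registered;

static void demo_runtime_load(struct demo_sel *begin, struct demo_sel *end)
{
    for (struct demo_sel *s = begin; s < end; s++) {
        if (s->name == NULL)
            continue; /* zero-filled placeholder, nothing to register */
        demo_registered++;
    }
}

/* Zero-filled entry: guarantees that the section exists, so the linker
 * defines the start and stop symbols even in a unit with no real entries. */
static struct demo_sel demo_placeholder
    __attribute__((used, section("demo_sels"))) = { NULL, NULL };

/* One real entry; the type encoding string is only illustrative. */
static struct demo_sel demo_sel_count
    __attribute__((used, section("demo_sels"))) = { "count", "Q16@0:8" };

extern struct demo_sel __start_demo_sels[];
extern struct demo_sel __stop_demo_sels[];

/* Weak and hidden, so a linked binary keeps one copy even if every
 * compilation unit emits it. Runs once when the binary is loaded. */
__attribute__((weak, visibility("hidden"), constructor))
void demo_load_function(void)
{
    demo_runtime_load(__start_demo_sels, __stop_demo_sels);
}

size_t demo_registered_count(void)
{
    return demo_registered;
}
```

By the time \ccode{main} runs, the constructor has already walked the whole section, so the placeholder costs one skipped entry per contributing unit.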
177+
178+
\chapter{Selectors}
179+
\label{chap:selectors}
180+
181+
Typed selectors are one of the largest differences between the GNU family of runtimes (GCC, GNUstep, ObjFW) and the NeXT (NeXT, macOS, iOS) family.
182+
In the NeXT design, selectors are just (interned) strings representing the selector name.
183+
This can cause stack corruption when different leaves in the class hierarchy implement methods with the same name but different types and some code sends a message to one of them using a variable of type \objc{id}.
184+
In the GNU family, they are a pair of the method name and the type encoding.
185+
186+
The GNUstep ABI represents selectors using the structure described in \Fref{lst:selector}.
187+
The first field is a union of the value emitted by the compiler and the value used by the runtime.
188+
The compiler initialises the \ccode{name} field with the string representation of the selector name, but when the runtime registers the selector it will replace this with an integer value that uniquely identifies the selector (it will also store the name in a table at this index so selectors can be mapped back to names easily).
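A simplified C sketch of this registration step is shown below. The union mirrors the description above, but everything else is an assumption made for illustration: the fixed-size table, the linear search, the \ccode{demo_} names, and the choice to key uniqueness on the pair of name and type encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_selector {
    union {
        const char *name;  /* as emitted by the compiler */
        uintptr_t index;   /* after registration by the runtime */
    };
    const char *types;
};

#define DEMO_MAX_SELECTORS 64

/* Tables indexed by selector index, so an index maps back to its name. */
static const char *demo_names[DEMO_MAX_SELECTORS];
static const char *demo_types[DEMO_MAX_SELECTORS];
static uintptr_t demo_count = 1; /* index 0 is left unused */

static int demo_same_types(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Replaces sel->name with a unique index and returns that index.
 * Must be called exactly once per selector structure. */
uintptr_t demo_register_selector(struct demo_selector *sel)
{
    const char *name = sel->name; /* read before the union is overwritten */
    uintptr_t i;
    for (i = 1; i < demo_count; i++) {
        if (strcmp(demo_names[i], name) == 0 &&
            demo_same_types(demo_types[i], sel->types))
            break;
    }
    if (i == demo_count) {
        assert(demo_count < DEMO_MAX_SELECTORS);
        demo_names[i] = name;
        demo_types[i] = sel->types;
        demo_count++;
    }
    sel->index = i;
    return i;
}

/* Maps a registered selector back to its name. */
const char *demo_selector_name(const struct demo_selector *sel)
{
    return demo_names[sel->index];
}
```

After registration, comparing two selectors is an integer comparison rather than a string comparison, and duplicate copies from different compilation units collapse to the same index.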
189+
190+
\inccode{selector.h}{selector}{objc_selector}{The selector structure.}
191+
192+
\section{Symbol naming}
193+
194+
In this ABI, unlike the GCC ABI, we try to ensure that the linker removes as much duplicate data as possible.
195+
As such, each selector, selector name, and selector type encoding is emitted as a weak symbol with a well-known name and hidden visibility.
196+
When linking, the linker will discard all except for one (though different shared libraries will have different copies).
197+
198+
The selector names are emitted as \ccode{.objc_sel_name_\{selector name\}}, the type encodings as \ccode{.objc_sel_types_\{mangled type encoding\}} and the selectors themselves as \ccode{.objc_selector_\{selector name\}_\{mangled type encoding\}}.
199+
The \textit{mangled} type encoding replaces the @ character with a \ccode{'\\1'} byte.
200+
This mangling prevents conflicts with symbol versioning (which uses the @ character to separate the symbol name from its version).
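The mangling rule is small enough to sketch directly in C. The function names and the \ccode{prefix} parameter are hypothetical. The sketch only shows the character replacement and the concatenation of a prefix, a selector name, and a mangled type encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copies `types` into `out`, replacing every '@' with a '\1' byte.
 * Returns 0 on success or -1 if `out` is too small. */
int demo_mangle_types(const char *types, char *out, size_t out_size)
{
    size_t len = strlen(types);
    if (len + 1 > out_size)
        return -1;
    for (size_t i = 0; i <= len; i++) /* <= also copies the terminator */
        out[i] = (types[i] == '@') ? '\1' : types[i];
    return 0;
}

/* Builds "{prefix}{selector name}_{mangled type encoding}" in `out`.
 * Returns 0 on success or -1 if a buffer is too small. */
int demo_symbol_name(const char *prefix, const char *sel_name,
                     const char *types, char *out, size_t out_size)
{
    char mangled[256];
    if (demo_mangle_types(types, mangled, sizeof mangled) != 0)
        return -1;
    int n = snprintf(out, out_size, "%s%s_%s", prefix, sel_name, mangled);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

The \ccode{'\1'} byte cannot appear in a type encoding, so the replacement is reversible and two distinct encodings never mangle to the same string.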
201+
202+
This deduplication is not required for correctness (the runtime ensures that selectors have unique indexes), but it should reduce the binary size.
203+
204+
\chapter{Classes}
205+
206+
\inccode{class.h}{class}{objc_class}{The class structure.}
207+
208+
\section{Class references}
209+
\label{sec:classref}
210+
211+
Each entry in the \ccode{__objc_class_refs} section is a symbol (in a COMDAT of the same name) called \ccode{_OBJC_CLASS_REF_\{class name\}}, which is initialised to point to a variable called \ccode{_OBJC_CLASS_\{class name\}}, which is the symbol for the class.
212+
This is the \textit{only} place where the \ccode{_OBJC_CLASS_\{class name\}} symbols may be referenced.
213+
214+
All other accesses to the class (i.e. from message sends to classes or to \objc{super}) must be via a load of the \ccode{_OBJC_CLASS_REF_\{class name\}} variable.
215+
216+
The current version of the runtime ignores this section, but if a future runtime changes the class structure then it can update these pointers to heap-allocated versions of the new structure.
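The indirection can be sketched in C as follows. The \ccode{demo_class} structure, its \ccode{abi_version} field, and the retargeting function are hypothetical. The sketch only shows why routing every access through the reference variable lets a runtime substitute a different class structure.

```c
#include <assert.h>
#include <stddef.h>

struct demo_class {
    const char *name;
    int abi_version;
};

/* Stands in for _OBJC_CLASS_{class name}: the class structure itself. */
static struct demo_class demo_class_Foo = { "Foo", 1 };

/* Stands in for _OBJC_CLASS_REF_{class name}: the only place where the
 * class symbol is named directly. */
struct demo_class *demo_class_ref_Foo = &demo_class_Foo;

/* What compiled code does to reach the class: load the reference variable. */
struct demo_class *demo_get_Foo(void)
{
    return demo_class_ref_Foo;
}

/* What a hypothetical future runtime could do: walk the array of references
 * and point each one that targets old_cls at a replacement structure. */
void demo_retarget_refs(struct demo_class **begin, struct demo_class **end,
                        struct demo_class *old_cls, struct demo_class *new_cls)
{
    for (struct demo_class **ref = begin; ref < end; ref++) {
        if (*ref == old_cls)
            *ref = new_cls;
    }
}
```

Since no other code holds the address of the original structure, updating the reference is enough to redirect every later access.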
217+
218+
219+
\chapter{Categories}
220+
221+
\chapter{Protocols}
222+
223+
\chapter{Message sending}
224+
225+
\end{document}
226+

class.h

+2
@@ -37,6 +37,7 @@ static inline BOOL objc_bitfield_test(uintptr_t bitfield, uint64_t field)
37 37
}
38 38

39 39

40+
// begin: objc_class
40 41
struct objc_class
41 42
{
42 43
/**
@@ -131,6 +132,7 @@ struct objc_class
131 132
*/
132 133
struct objc_property_list *properties;
133 134
};
135+
// end: objc_class
134 136

135 137
struct legacy_gnustep_objc_class
136 138
{

loader.c

+2
@@ -91,6 +91,7 @@ static void init_runtime(void)
91 91
}
92 92
}
93 93

94+
// begin: objc_init
94 95
struct objc_init
95 96
{
96 97
uint64_t version;
@@ -105,6 +106,7 @@ struct objc_init
105 106
struct objc_protocol2 *proto_begin;
106 107
struct objc_protocol2 *proto_end;
107 108
};
109+
// end: objc_init
108 110
#include <dlfcn.h>
109 111

110 112
void registerProtocol(Protocol *proto);

selector.h

+3
@@ -20,6 +20,7 @@ struct sel_type_list
20 20
/**
21 21
* Structure used to store selectors in the list.
22 22
*/
23+
// begin: objc_selector
23 24
struct objc_selector
24 25
{
25 26
union
@@ -40,6 +41,8 @@ struct objc_selector
40 41
*/
41 42
const char * types;
42 43
};
44+
// end: objc_selector
45+
43 46

44 47
/**
45 48
* Returns the untyped variant of a selector.
