Description
Consider the following simplistic test case:
extern int v1, v2, v3, v4, v5;
void func(int x)
{
    v1 = x;
    v2 = x;
    v3 = x;
    v4 = x;
    v5 = x;
}
arc-linux-uclibc-gcc -c -fPIC -O3 lib.c --save-temps
A snippet of the generated code:
...
; END PROLOGUE
ld r2,[pcl,@v5@gotpc] ; 8 bytes
ld r3,[pcl,@v1@gotpc] ; 8 bytes
st r0,[r2]
ld r2,[pcl,@v2@gotpc] ; 8 bytes
st r0,[r3]
ld r3,[pcl,@v3@gotpc] ; 8 bytes
st r0,[r2]
ld r2,[pcl,@v4@gotpc] ; 8 bytes
st r0,[r3]
st r0,[r2]
; EPILOGUE
...
This is not optimal: gcc could compute the GOT base once and then
address every symbol relative to it, as GOT[x] - GOT[0], using GOT32
relocs.
Something roughly as follows:
...
add gp,pcl,_DYNAMIC@gotpc ; 8 bytes
st_s r0,[gp, @v1@got32] ; 2 bytes or worst case 4 bytes
st_s r0,[gp, @v2@got32] ; 2 bytes or worst case 4 bytes
...
In the original code each GOT load is 8 bytes: the GOT and the code are
typically in different virtual pages, so the computed offset is typically
0x2NNN, which forces the long insn encoding. The newer code could most
certainly be shorter.
Note that the PIC-related asm annotations we currently have are
=@gotpc (GOT base itself, or offset to the symbol's GOT slot)
=@gotoff (offset from GOT base)
=@plt (offset to the symbol's PLT stub)
It seems there is support for GOT32 relocs in binutils; gcc probably
needs to generate them.