Description
Compiling on tip:
func Slice(x []int, y int) []int {
	return x[y:]
}
and cutting out the function prologue on amd64 we get:
CMPQ BX, DI
JCS panicpath
SUBQ DI, CX
SUBQ DI, BX
MOVQ CX, DX
NEGQ CX
SHLQ $3, DI
SARQ $63, CX
ANDQ CX, DI
ADDQ DI, AX
MOVQ DX, CX
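For reference, the NEGQ/SARQ/ANDQ part of this sequence can be modelled in Go roughly as below. This is only a sketch: `sliceMask` and `maskedDelta` are hypothetical names, and it assumes an 8-byte element size (int on amd64) and a non-negative new capacity. The point of the mask is that the pointer offset is forced to 0 when the new capacity is 0, so the result never points past the end of the backing array.

```go
package main

import "fmt"

// sliceMask models NEGQ followed by SARQ $63: the result is 0 when
// newCap is 0 and -1 (all bits set) when newCap is positive.
// Assumption: newCap is never negative here.
func sliceMask(newCap int64) int64 {
	return -newCap >> 63
}

// maskedDelta models SHLQ $3 followed by ANDQ: the byte offset y*8 is
// kept when the new capacity is non-zero and cleared otherwise.
func maskedDelta(y, newCap int64) int64 {
	return (y << 3) & sliceMask(newCap)
}

func main() {
	fmt.Println(maskedDelta(2, 3)) // 16
	fmt.Println(maskedDelta(5, 0)) // 0
}
```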
We should be able to slim that down to (not tested):
CMPQ BX, DI
JCS panicpath
SUBQ DI, CX
SUBQ DI, BX
SHLQ $3, DI
TESTQ CX, CX
CMOVZ CX, DI
ADDQ DI, AX
by merging the AND and OpSlicemask operations emitted in the SSA generation phase into a single new OpSlicedelta operation:
before:
mask := s.newValue1(ssa.OpSlicemask, types.Types[types.TINT], rcap)
delta = s.newValue2(andOp, types.Types[types.TINT], delta, mask)
after:
delta = s.newValue2(ssa.OpSlicedelta, types.Types[types.TINT], delta, rcap)
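The intended meaning of the proposed op can be sketched in plain Go. OpSlicedelta does not exist yet, so `sliceDelta` below is a hypothetical model of it, and `maskAndDelta` is a hypothetical model of the current OpSlicemask plus AND pair. Assuming a non-negative capacity, the two should agree for every input, which is what would let the backend pick a conditional move instead of building a mask.

```go
package main

import "fmt"

// sliceDelta models the proposed (hypothetical) OpSlicedelta:
// a select that yields delta when newCap is non-zero and 0 otherwise.
func sliceDelta(delta, newCap int64) int64 {
	if newCap == 0 {
		return 0
	}
	return delta
}

// maskAndDelta models the existing lowering: build an all-ones or
// all-zeros mask from newCap and AND it with delta.
// Assumption: newCap is never negative.
func maskAndDelta(delta, newCap int64) int64 {
	return delta & (-newCap >> 63)
}

func main() {
	cases := []struct{ delta, newCap int64 }{{16, 3}, {40, 0}, {0, 7}}
	for _, c := range cases {
		// Report whether both formulations agree for this input.
		fmt.Println(sliceDelta(c.delta, c.newCap) == maskAndDelta(c.delta, c.newCap))
	}
}
```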
By either making the compiler's SSA optimizations smarter or pulling even more operations into a special SSA op, we could drop the TESTQ (the second SUBQ already sets the flags) and get to:
CMPQ BX, DI
JCS panicpath
SUBQ DI, BX
SUBQ DI, CX
CMOVE CX, DI
SHLQ $3, DI
ADDQ DI, AX
However, without benchmarking it is unclear whether this is any faster (or worth the complexity), since the scaling of the index for the delta now happens after the CMOV.
A further reduction in instructions is possible by dropping the CMPQ and making the panic jump depend on the flags set by the SUB instruction:
SUBQ DI, BX
JCS panicpath
SUBQ DI, CX
CMOVE CX, DI
SHLQ $3, DI
ADDQ DI, AX
That would then need extra handling in the panicpath to recover the original slice len/cap, because the SUB has already overwritten it.
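A minimal sketch of that recovery, assuming the panicpath still has y available in a register: because the subtraction wraps modulo 2^64, adding y back restores the original length. The names `subLen` and `recoverLen` are hypothetical and only model the arithmetic, not actual compiler code.

```go
package main

import "fmt"

// subLen models SUBQ DI, BX: it returns the wrapped difference and
// whether the unsigned subtraction borrowed, i.e. y > length, which
// is the out-of-range condition.
func subLen(length, y uint64) (diff uint64, outOfRange bool) {
	return length - y, y > length
}

// recoverLen models the hypothetical fix-up in the panicpath:
// adding y back undoes the wrapped subtraction.
func recoverLen(diff, y uint64) uint64 {
	return diff + y
}

func main() {
	length, y := uint64(3), uint64(5)
	diff, outOfRange := subLen(length, y)
	fmt.Println(outOfRange, recoverLen(diff, y)) // true 3
}
```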
Finally, for this specific case the SHL and ADD can be folded into a LEA:
SUBQ DI, BX
JCS panicpath
SUBQ DI, CX
CMOVE CX, DI
LEAQ (AX)(DI*8), AX