cmd/compile: no automatic use of fused multiply-add on amd64 even with GOAMD64=v3 #71204

Closed
@dominikh

Description

Go version

go version go1.23.4 linux/amd64

Output of go env in your module/workspace:

-

What did you do?

Compile the following program with GOARCH=amd64 GOAMD64=v3 go build -gcflags=-S:

package pkg

import "math"

func fooImplicit(x, y, z float64) float64 {
	return x*y + z
}

func fooExplicit(x, y, z float64) float64 {
	return math.FMA(x, y, z)
}

What did you see happen?

command-line-arguments.fooImplicit STEXT nosplit size=9 args=0x18 locals=0x0 funcid=0x0 align=0x0
	0x0000 00000 (/home/dominikh/prj/src/example.com/bar.go:5)	TEXT	command-line-arguments.fooImplicit(SB), NOSPLIT|NOFRAME|ABIInternal, $0-24
	0x0000 00000 (/home/dominikh/prj/src/example.com/bar.go:5)	FUNCDATA	$0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
	0x0000 00000 (/home/dominikh/prj/src/example.com/bar.go:5)	FUNCDATA	$1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
	0x0000 00000 (/home/dominikh/prj/src/example.com/bar.go:5)	FUNCDATA	$5, command-line-arguments.fooImplicit.arginfo1(SB)
	0x0000 00000 (/home/dominikh/prj/src/example.com/bar.go:5)	FUNCDATA	$6, command-line-arguments.fooImplicit.argliveinfo(SB)
	0x0000 00000 (/home/dominikh/prj/src/example.com/bar.go:5)	PCDATA	$3, $1
	0x0000 00000 (/home/dominikh/prj/src/example.com/bar.go:6)	MULSD	X1, X0
	0x0004 00004 (/home/dominikh/prj/src/example.com/bar.go:6)	ADDSD	X2, X0
	0x0008 00008 (/home/dominikh/prj/src/example.com/bar.go:6)	RET
	0x0000 f2 0f 59 c1 f2 0f 58 c2 c3                       ..Y...X..
command-line-arguments.fooExplicit STEXT nosplit size=9 args=0x18 locals=0x0 funcid=0x0 align=0x0
	0x0000 00000 (/home/dominikh/prj/src/example.com/bar.go:9)	TEXT	command-line-arguments.fooExplicit(SB), NOSPLIT|NOFRAME|ABIInternal, $0-24
	0x0000 00000 (/home/dominikh/prj/src/example.com/bar.go:9)	FUNCDATA	$0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
	0x0000 00000 (/home/dominikh/prj/src/example.com/bar.go:9)	FUNCDATA	$1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
	0x0000 00000 (/home/dominikh/prj/src/example.com/bar.go:9)	FUNCDATA	$5, command-line-arguments.fooExplicit.arginfo1(SB)
	0x0000 00000 (/home/dominikh/prj/src/example.com/bar.go:9)	FUNCDATA	$6, command-line-arguments.fooExplicit.argliveinfo(SB)
	0x0000 00000 (/home/dominikh/prj/src/example.com/bar.go:9)	PCDATA	$3, $1
	0x0000 00000 (/home/dominikh/prj/src/example.com/bar.go:10)	VFMADD231SD	X1, X0, X2
	0x0005 00005 (/home/dominikh/prj/src/example.com/bar.go:10)	MOVUPS	X2, X0
	0x0008 00008 (/home/dominikh/prj/src/example.com/bar.go:10)	RET
	0x0000 c4 e2 f9 b9 d1 0f 10 c2 c3                       .........

What did you expect to see?

I expected fooImplicit and fooExplicit to generate identical code when setting GOAMD64=v3.

On arm64, the compiler detects the x*y + z pattern and automatically uses FMA. On amd64, math.FMA uses runtime feature detection unless the GOAMD64 environment variable is set to v3 or higher, in which case calls to math.FMA compile directly to VFMADD231SD. However, the x*y + z pattern is not recognized on amd64: it compiles to separate MULSD and ADDSD instructions regardless of the value of GOAMD64.

Metadata

Labels

NeedsFix: The path to resolution is known, but the work has not been done.
Performance
compiler/runtime: Issues related to the Go compiler and/or runtime.
