-march=native knows what is best for your machine better than you do (even if you feel experienced).

Zbigniew Luszpinski zbiggy at o2.pl
Mon Sep 27 00:13:16 CEST 2010


Hello,

here is a short proof of concept showing that the native processor target
is a better choice than selecting the processor type manually.

Here is a short test you can use to see which flags your gcc will _really_
use for compilation (execute it in an empty directory):

export TESTFLAGS="-march=athlon-xp"; export OUTPUT=athlon-xp;
touch $OUTPUT.cc; gcc $TESTFLAGS -fverbose-asm $OUTPUT.cc -S;
cat $OUTPUT.s; rm -f $OUTPUT.cc; unset OUTPUT TESTFLAGS

(You can put any flags you wish to test in the TESTFLAGS variable, and the
name of the file where the results will be stored in the OUTPUT variable.)
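The same test can also be wrapped in a small shell function, which avoids retyping the whole line for every set of flags. This is only a sketch: the names dump_flags and CC are made up for this example, and it assumes a gcc that accepts -S together with -o.

```shell
#!/bin/sh
# dump_flags NAME FLAGS...
# Compile an empty C++ file with FLAGS and leave the verbose assembly,
# whose comments list the flags gcc really used, in NAME.s.
# (dump_flags and CC are hypothetical names chosen for this sketch.)
dump_flags() {
    name=$1
    shift
    : > "$name.cc" || return 1          # empty translation unit
    # -S stops after generating assembly; -fverbose-asm adds the flag comments
    "${CC:-gcc}" "$@" -fverbose-asm -S "$name.cc" -o "$name.s"
    status=$?
    rm -f "$name.cc"                    # clean up before forgetting the name
    return "$status"
}
```

For example: dump_flags amdfam10 -march=amdfam10; dump_flags native -march=native; diff -u amdfam10.s native.s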

I have an AMD Phenom(tm) 9550 Quad-Core Processor, so let's select the
"best" arch for it, amdfam10:
export TESTFLAGS="-march=amdfam10"; export OUTPUT=amdfam10;
touch $OUTPUT.cc; gcc $TESTFLAGS -fverbose-asm $OUTPUT.cc -S;
cat $OUTPUT.s; rm -f $OUTPUT.cc; unset OUTPUT TESTFLAGS

and repeat the test for the native arch:
export TESTFLAGS="-march=native"; export OUTPUT=native;
touch $OUTPUT.cc; gcc $TESTFLAGS -fverbose-asm $OUTPUT.cc -S;
cat $OUTPUT.s; rm -f $OUTPUT.cc; unset OUTPUT TESTFLAGS

Then compare the outputs with diff -u amdfam10.s native.s:
--- amdfam10.s  2010-09-26 21:33:51.000000000 +0000
+++ native.s    2010-09-26 21:35:07.000000000 +0000
@@ -1,8 +1,10 @@
-       .file   "amdfam10.cc"
+       .file   "native.cc"
 # GNU C++ (GCC) version 4.5.1 (i686-pc-linux-gnu)
 #      compiled by GNU C version 4.5.1, GMP version 5.0.1, MPFR version 2.4.2, MPC version 0.8.2
 # GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
-# options passed:  -D_GNU_SOURCE amdfam10.cc -march=amdfam10 -fverbose-asm
+# options passed:  -D_GNU_SOURCE native.cc -march=amdfam10 -mcx16 -msahf
+# -mpopcnt -mabm --param l1-cache-size=64 --param l1-cache-line-size=64
+# --param l2-cache-size=512 -mtune=amdfam10 -fverbose-asm
 # options enabled:  -falign-loops -fargument-alias -fauto-inc-dec
 # -fbranch-count-reg -fcommon -fdelete-null-pointer-checks -fdwarf2-cfi-asm
 # -fearly-inlining -feliminate-unused-debug-types -fexceptions

As you can see in this diff output, -march=native not only correctly
selected -march=amdfam10 but also added more flags and set the cache
parameters for this CPU:
-mcx16 -msahf -mpopcnt -mabm --param l1-cache-size=64
--param l1-cache-line-size=64 --param l2-cache-size=512 -mtune=amdfam10
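To look only at what was passed, without the rest of the dump, the "options passed" comment block can be pulled out of a .s file and printed one flag per line. The following is a sketch that assumes the comment layout shown in this post (GCC 4.5.1); other versions may format the header differently. The function name passed_options is made up for the example.

```shell
#!/bin/sh
# passed_options FILE
# Print the flags from the "# options passed:" comment block of a
# -fverbose-asm dump, one per line. A "--param" is kept together with
# its value even if the two ended up on different comment lines.
# (passed_options is a hypothetical helper name.)
passed_options() {
    awk '
        /^# options enabled:/ { inside = 0 }
        /^# options passed:/  { inside = 1; sub(/^# options passed:/, "") }
        inside {
            sub(/^#/, "")                       # drop the comment marker
            for (i = 1; i <= NF; i++) {
                if ($i == "--param") { pending = "--param "; continue }
                print pending $i
                pending = ""
            }
        }
    ' "$1"
}
```

For example: passed_options amdfam10.s > amdfam10.txt; passed_options native.s > native.txt; diff amdfam10.txt native.txt. The source file name is part of the list, so it will always show up as one differing line.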

As you can see, manually selecting the processor type in the Lunar optimize
menu is always a worse choice than using the native target.

In the attachments you will find the raw dumps from the experiment, so you
can run diff -u yourself or see which other flags were enabled
automatically (such as sse and sse2).
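To check which flags ended up enabled in each dump, the "options enabled" block can be turned into a sorted list, one flag per line, so that two dumps can be compared as sets. Again this is only a sketch assuming the GCC 4.5.1 comment layout shown in this post; enabled_options is a made-up name.

```shell
#!/bin/sh
# enabled_options FILE
# Print the flags from the "# options enabled:" comment block of a
# -fverbose-asm dump, one per line, sorted. The block is taken to end
# at the first line that is not a "#" comment.
# (enabled_options is a hypothetical helper name.)
enabled_options() {
    awk '
        inside && !/^#/        { exit }
        /^# options enabled:/  { inside = 1; sub(/^# options enabled:/, "") }
        inside {
            sub(/^#/, "")                       # drop the comment marker
            for (i = 1; i <= NF; i++) print $i
        }
    ' "$1" | LC_ALL=C sort
}
```

For example: enabled_options amdfam10.s > a.txt; enabled_options native.s > b.txt; diff a.txt b.txt. For the two dumps attached to this mail the "options enabled" lists read the same, so the difference shown by diff -u stays in the "options passed" block.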

Have a nice day,
Zbigniew Luszpinski
-------------- next part --------------
	.file	"amdfam10.cc"
# GNU C++ (GCC) version 4.5.1 (i686-pc-linux-gnu)
#	compiled by GNU C version 4.5.1, GMP version 5.0.1, MPFR version 2.4.2, MPC version 0.8.2
# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed:  -D_GNU_SOURCE amdfam10.cc -march=amdfam10 -fverbose-asm
# options enabled:  -falign-loops -fargument-alias -fauto-inc-dec
# -fbranch-count-reg -fcommon -fdelete-null-pointer-checks -fdwarf2-cfi-asm
# -fearly-inlining -feliminate-unused-debug-types -fexceptions
# -ffunction-cse -fgcse-lm -fident -finline-functions-called-once
# -fira-share-save-slots -fira-share-spill-slots -fivopts
# -fkeep-static-consts -fleading-underscore -fmath-errno
# -fmerge-debug-strings -fmove-loop-invariants -fpcc-struct-return
# -fpeephole -fsched-critical-path-heuristic -fsched-dep-count-heuristic
# -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic
# -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic
# -fsched-stalled-insns-dep -fshow-column -fsigned-zeros
# -fsplit-ivs-in-unroller -ftrapping-math -ftree-cselim -ftree-forwprop
# -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize
# -ftree-parallelize-loops= -ftree-phiprop -ftree-pta -ftree-reassoc
# -ftree-scev-cprop -ftree-slp-vectorize -ftree-vect-loop-version
# -funit-at-a-time -fvect-cost-model -fverbose-asm
# -fzero-initialized-in-bss -m32 -m3dnow -m80387 -m96bit-long-double -mabm
# -maccumulate-outgoing-args -malign-stringops -mcx16 -mfancy-math-387
# -mfp-ret-in-387 -mfused-madd -mglibc -mieee-fp -mmmx -mno-red-zone
# -mno-sse4 -mpopcnt -mpush-args -msahf -msse -msse2 -msse3 -msse4a
# -mtls-direct-seg-refs

# Compiler executable checksum: c9911e0fd9fbc35683fc629a6242e6b9

	.ident	"GCC: (GNU) 4.5.1"
	.section	.note.GNU-stack,"",@progbits
-------------- next part --------------
	.file	"native.cc"
# GNU C++ (GCC) version 4.5.1 (i686-pc-linux-gnu)
#	compiled by GNU C version 4.5.1, GMP version 5.0.1, MPFR version 2.4.2, MPC version 0.8.2
# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed:  -D_GNU_SOURCE native.cc -march=amdfam10 -mcx16 -msahf
# -mpopcnt -mabm --param l1-cache-size=64 --param l1-cache-line-size=64
# --param l2-cache-size=512 -mtune=amdfam10 -fverbose-asm
# options enabled:  -falign-loops -fargument-alias -fauto-inc-dec
# -fbranch-count-reg -fcommon -fdelete-null-pointer-checks -fdwarf2-cfi-asm
# -fearly-inlining -feliminate-unused-debug-types -fexceptions
# -ffunction-cse -fgcse-lm -fident -finline-functions-called-once
# -fira-share-save-slots -fira-share-spill-slots -fivopts
# -fkeep-static-consts -fleading-underscore -fmath-errno
# -fmerge-debug-strings -fmove-loop-invariants -fpcc-struct-return
# -fpeephole -fsched-critical-path-heuristic -fsched-dep-count-heuristic
# -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic
# -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic
# -fsched-stalled-insns-dep -fshow-column -fsigned-zeros
# -fsplit-ivs-in-unroller -ftrapping-math -ftree-cselim -ftree-forwprop
# -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize
# -ftree-parallelize-loops= -ftree-phiprop -ftree-pta -ftree-reassoc
# -ftree-scev-cprop -ftree-slp-vectorize -ftree-vect-loop-version
# -funit-at-a-time -fvect-cost-model -fverbose-asm
# -fzero-initialized-in-bss -m32 -m3dnow -m80387 -m96bit-long-double -mabm
# -maccumulate-outgoing-args -malign-stringops -mcx16 -mfancy-math-387
# -mfp-ret-in-387 -mfused-madd -mglibc -mieee-fp -mmmx -mno-red-zone
# -mno-sse4 -mpopcnt -mpush-args -msahf -msse -msse2 -msse3 -msse4a
# -mtls-direct-seg-refs

# Compiler executable checksum: c9911e0fd9fbc35683fc629a6242e6b9

	.ident	"GCC: (GNU) 4.5.1"
	.section	.note.GNU-stack,"",@progbits