-march=native knows better than you what is best for your machine (even if you feel experienced).
Zbigniew Luszpinski
zbiggy at o2.pl
Mon Sep 27 00:13:16 CEST 2010
Hello,
here is a short proof of concept showing that the native processor target is
better than manually selecting the processor type.
Here is a short test you can use to see which flags your gcc will _really_
use for compilation (execute in an empty directory; if a command appears
wrapped across lines in this mail, unwrap it first):
export TESTFLAGS="-march=athlon-xp"; export OUTPUT=athlon-xp
touch $OUTPUT.cc; gcc $TESTFLAGS -fverbose-asm $OUTPUT.cc -S; cat $OUTPUT.s
rm -f $OUTPUT.cc; unset OUTPUT TESTFLAGS
(You can put any flags you wish to test in the TESTFLAGS variable. The
OUTPUT variable sets the base name of the file the results are stored in,
$OUTPUT.s. The empty source file is removed before the variables are unset,
because afterwards $OUTPUT.cc would no longer name it.)
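The same steps can be wrapped in a small shell function, so that the output name and the flags become arguments instead of exported variables. This is only a sketch: the function name dump_flags and the optional CC override are made up for this illustration and are not part of gcc or Lunar.

```shell
# dump_flags NAME FLAGS...
# Hypothetical helper: compile an empty C++ source file with FLAGS and
# leave the annotated assembly in NAME.s, as the one-liner above does.
dump_flags() {
    name=$1
    shift
    : > "$name.cc" || return 1                      # create the empty source file
    "${CC:-gcc}" "$@" -fverbose-asm -S "$name.cc"   # gcc writes NAME.s
    status=$?
    rm -f "$name.cc"                                # clean up the empty source
    return $status
}

# Example (needs gcc on the PATH):
#   dump_flags amdfam10 -march=amdfam10
#   dump_flags native -march=native
#   diff -u amdfam10.s native.s
```

Because the name is held in a variable until the function returns, the empty .cc file is always removed, whatever flags are being tested.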
I have an AMD Phenom(tm) 9550 Quad-Core Processor, so let's select the "best"
arch for it, amdfam10:
export TESTFLAGS="-march=amdfam10"; export OUTPUT=amdfam10
touch $OUTPUT.cc; gcc $TESTFLAGS -fverbose-asm $OUTPUT.cc -S; cat $OUTPUT.s
rm -f $OUTPUT.cc; unset OUTPUT TESTFLAGS
and repeat the test for the native arch:
export TESTFLAGS="-march=native"; export OUTPUT=native
touch $OUTPUT.cc; gcc $TESTFLAGS -fverbose-asm $OUTPUT.cc -S; cat $OUTPUT.s
rm -f $OUTPUT.cc; unset OUTPUT TESTFLAGS
Then compare the outputs with diff -u amdfam10.s native.s:
--- amdfam10.s 2010-09-26 21:33:51.000000000 +0000
+++ native.s 2010-09-26 21:35:07.000000000 +0000
@@ -1,8 +1,10 @@
- .file "amdfam10.cc"
+ .file "native.cc"
# GNU C++ (GCC) version 4.5.1 (i686-pc-linux-gnu)
# compiled by GNU C version 4.5.1, GMP version 5.0.1, MPFR version 2.4.2, MPC version 0.8.2
# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
-# options passed: -D_GNU_SOURCE amdfam10.cc -march=amdfam10 -fverbose-asm
+# options passed: -D_GNU_SOURCE native.cc -march=amdfam10 -mcx16 -msahf
+# -mpopcnt -mabm --param l1-cache-size=64 --param l1-cache-line-size=64
+# --param l2-cache-size=512 -mtune=amdfam10 -fverbose-asm
# options enabled: -falign-loops -fargument-alias -fauto-inc-dec
# -fbranch-count-reg -fcommon -fdelete-null-pointer-checks -fdwarf2-cfi-asm
# -fearly-inlining -feliminate-unused-debug-types -fexceptions
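As you can see in this diff output, -march=native not only correctly set
-march=amdfam10 but also passed additional flags explicitly and set the
cache parameters to match this CPU:
-mcx16 -msahf -mpopcnt -mabm --param l1-cache-size=64
--param l1-cache-line-size=64 --param l2-cache-size=512 -mtune=amdfam10
(The "options enabled" lists in the attached dumps show that -mcx16, -msahf,
-mpopcnt and -mabm end up enabled with plain -march=amdfam10 as well, so on
this CPU the visible addition is the explicit cache parameters.)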
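The cache values can be cross-checked against what the operating system reports. A minimal sketch, assuming a glibc-based Linux system where getconf knows the LEVEL1_DCACHE_SIZE, LEVEL1_DCACHE_LINESIZE and LEVEL2_CACHE_SIZE variables (on other systems they may be missing or report 0). getconf prints bytes, while GCC documents l1-cache-size and l2-cache-size in kilobytes and l1-cache-line-size in bytes, hence the conversion. The helper name to_kib is invented for this example.

```shell
# Hypothetical helper: convert a byte count to whole KiB.
# Anything that is not a plain non-negative integer is reported as 0.
to_kib() {
    case $1 in
        ''|*[!0-9]*) echo 0 ;;
        *) echo $(( $1 / 1024 )) ;;
    esac
}

# Ask the C library for the cache geometry; fall back to 0 on failure.
l1=$(getconf LEVEL1_DCACHE_SIZE 2>/dev/null || echo 0)
line=$(getconf LEVEL1_DCACHE_LINESIZE 2>/dev/null || echo 0)
l2=$(getconf LEVEL2_CACHE_SIZE 2>/dev/null || echo 0)

# Print the values in the form gcc shows them in "options passed".
echo "--param l1-cache-size=$(to_kib "$l1")"
echo "--param l1-cache-line-size=${line:-0}"
echo "--param l2-cache-size=$(to_kib "$l2")"
```

If the printed values agree with the --param options in native.s, the detection done by -march=native matches what the system itself reports.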
So when you compile for the machine you build on, manually selecting the
processor type in the Lunar optimize menu is a worse choice than using the
native target.
In the attachments you will find the raw dumps from the experiment, so you
can run diff -u yourself or read which other flags were enabled
automatically (such as -msse and -msse2).
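For reading the dumps, the two comment blocks of interest can be pulled out with standard tools. A sketch, assuming the layout seen in this mail (a "# options passed:" block followed by a "# options enabled:" block, every line starting with "#"); the function names passed_options and enabled_m_flags are invented for this example.

```shell
# Print the options from the "# options passed:" block, one per line.
# A "--param" is kept together with its value when both are on one line.
passed_options() {
    awk '
        /^# options enabled:/ { inblock = 0 }
        /^# options passed:/  { inblock = 1; sub(/^# options passed:/, "#") }
        inblock {
            for (i = 2; i <= NF; i++) {
                if ($i == "--param" && i < NF) { print $i, $(i + 1); i++ }
                else print $i
            }
        }
    ' "$1"
}

# Print only the -m... (machine-specific) flags from "# options enabled:".
enabled_m_flags() {
    awk '
        /^# options enabled:/ { inblock = 1; sub(/^# options enabled:/, "#") }
        inblock && !/^#[ \t]+-/ { inblock = 0 }   # block ends at first non-option line
        inblock { for (i = 2; i <= NF; i++) if ($i ~ /^-m/) print $i }
    ' "$1"
}

# Example, with the two dumps in the current directory:
#   passed_options amdfam10.s > amdfam10.passed
#   passed_options native.s   > native.passed
#   diff -u amdfam10.passed native.passed
#   enabled_m_flags native.s
```

With one option per line, diff -u shows exactly which options -march=native passed in addition, without the rest of the dump.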
Have a nice day,
Zbigniew Luszpinski
-------------- next part --------------
.file "amdfam10.cc"
# GNU C++ (GCC) version 4.5.1 (i686-pc-linux-gnu)
# compiled by GNU C version 4.5.1, GMP version 5.0.1, MPFR version 2.4.2, MPC version 0.8.2
# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed: -D_GNU_SOURCE amdfam10.cc -march=amdfam10 -fverbose-asm
# options enabled: -falign-loops -fargument-alias -fauto-inc-dec
# -fbranch-count-reg -fcommon -fdelete-null-pointer-checks -fdwarf2-cfi-asm
# -fearly-inlining -feliminate-unused-debug-types -fexceptions
# -ffunction-cse -fgcse-lm -fident -finline-functions-called-once
# -fira-share-save-slots -fira-share-spill-slots -fivopts
# -fkeep-static-consts -fleading-underscore -fmath-errno
# -fmerge-debug-strings -fmove-loop-invariants -fpcc-struct-return
# -fpeephole -fsched-critical-path-heuristic -fsched-dep-count-heuristic
# -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic
# -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic
# -fsched-stalled-insns-dep -fshow-column -fsigned-zeros
# -fsplit-ivs-in-unroller -ftrapping-math -ftree-cselim -ftree-forwprop
# -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize
# -ftree-parallelize-loops= -ftree-phiprop -ftree-pta -ftree-reassoc
# -ftree-scev-cprop -ftree-slp-vectorize -ftree-vect-loop-version
# -funit-at-a-time -fvect-cost-model -fverbose-asm
# -fzero-initialized-in-bss -m32 -m3dnow -m80387 -m96bit-long-double -mabm
# -maccumulate-outgoing-args -malign-stringops -mcx16 -mfancy-math-387
# -mfp-ret-in-387 -mfused-madd -mglibc -mieee-fp -mmmx -mno-red-zone
# -mno-sse4 -mpopcnt -mpush-args -msahf -msse -msse2 -msse3 -msse4a
# -mtls-direct-seg-refs
# Compiler executable checksum: c9911e0fd9fbc35683fc629a6242e6b9
.ident "GCC: (GNU) 4.5.1"
.section .note.GNU-stack,"",@progbits
-------------- next part --------------
.file "native.cc"
# GNU C++ (GCC) version 4.5.1 (i686-pc-linux-gnu)
# compiled by GNU C version 4.5.1, GMP version 5.0.1, MPFR version 2.4.2, MPC version 0.8.2
# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed: -D_GNU_SOURCE native.cc -march=amdfam10 -mcx16 -msahf
# -mpopcnt -mabm --param l1-cache-size=64 --param l1-cache-line-size=64
# --param l2-cache-size=512 -mtune=amdfam10 -fverbose-asm
# options enabled: -falign-loops -fargument-alias -fauto-inc-dec
# -fbranch-count-reg -fcommon -fdelete-null-pointer-checks -fdwarf2-cfi-asm
# -fearly-inlining -feliminate-unused-debug-types -fexceptions
# -ffunction-cse -fgcse-lm -fident -finline-functions-called-once
# -fira-share-save-slots -fira-share-spill-slots -fivopts
# -fkeep-static-consts -fleading-underscore -fmath-errno
# -fmerge-debug-strings -fmove-loop-invariants -fpcc-struct-return
# -fpeephole -fsched-critical-path-heuristic -fsched-dep-count-heuristic
# -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic
# -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic
# -fsched-stalled-insns-dep -fshow-column -fsigned-zeros
# -fsplit-ivs-in-unroller -ftrapping-math -ftree-cselim -ftree-forwprop
# -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize
# -ftree-parallelize-loops= -ftree-phiprop -ftree-pta -ftree-reassoc
# -ftree-scev-cprop -ftree-slp-vectorize -ftree-vect-loop-version
# -funit-at-a-time -fvect-cost-model -fverbose-asm
# -fzero-initialized-in-bss -m32 -m3dnow -m80387 -m96bit-long-double -mabm
# -maccumulate-outgoing-args -malign-stringops -mcx16 -mfancy-math-387
# -mfp-ret-in-387 -mfused-madd -mglibc -mieee-fp -mmmx -mno-red-zone
# -mno-sse4 -mpopcnt -mpush-args -msahf -msse -msse2 -msse3 -msse4a
# -mtls-direct-seg-refs
# Compiler executable checksum: c9911e0fd9fbc35683fc629a6242e6b9
.ident "GCC: (GNU) 4.5.1"
.section .note.GNU-stack,"",@progbits