
Performance of serial code

The benchmark numbers given here have been measured with a benchmark designed to mimic the behavior of VASP. Three separate programs make up the benchmark: the first measures matrix-matrix performance (Lincom-TPP), the second matrix-vector performance (matrix-vec), and the third the performance of 3D-FFTs (fft). The mixture of the three parts is supposed to be similar to what one would encounter when simulating a large system (40-100 transition metal atoms). For the matrix$\times$matrix part DGEMM is used; for the matrix$\times$vector part, DGEMV, do-loop, or DGEMM results are reported (depending on where the machine scores highest). The fft benchmark either uses an optimized routine supplied by the manufacturer, or a routine written and optimized by J. Furthmüller.
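
To give an idea of how the Mflops figures in the tables are obtained, the following minimal Fortran sketch times one DGEMM call and a repeated DGEMV call and converts the operation count into Mflops. This is not the actual dgemmtest source: the matrix dimension, the repeat count, and the use of CPU_TIME are illustrative assumptions, and an optimized BLAS (for instance the Atlas or Goto libraries mentioned in the footnotes below) must be linked.

     program flops_sketch
       ! Hedged sketch, NOT the actual dgemmtest source: estimate the
       ! matrix-matrix (DGEMM) and matrix-vector (DGEMV) Mflops as
       ! operation count divided by elapsed time.
       implicit none
       integer, parameter :: n = 1000     ! matrix dimension: an assumption
       integer, parameter :: nrep = 100   ! repetitions for the short DGEMV
       real(8), allocatable :: a(:,:), b(:,:), c(:,:), x(:), y(:)
       real(8) :: t1, t2
       integer :: i

       allocate(a(n,n), b(n,n), c(n,n), x(n), y(n))
       call random_number(a)
       call random_number(b)
       call random_number(x)

       ! matrix-matrix part: C = A*B costs about 2*n**3 operations
       call cpu_time(t1)
       call dgemm('N', 'N', n, n, n, 1d0, a, n, b, n, 0d0, c, n)
       call cpu_time(t2)
       write(*,'(A,F10.1)') ' DGEMM Mflops: ', 2d0*real(n,8)**3/(t2-t1)/1d6

       ! matrix-vector part: y = A*x costs about 2*n**2 operations per call;
       ! repeated nrep times so the timer resolution does not dominate
       call cpu_time(t1)
       do i = 1, nrep
         call dgemv('N', n, n, 1d0, a, n, x, 1, 0d0, y, 1)
       end do
       call cpu_time(t2)
       write(*,'(A,F10.1)') ' DGEMV Mflops: ', nrep*2d0*real(n,8)**2/(t2-t1)/1d6
     end program flops_sketch

Since Mflops is by definition the operation count divided by the elapsed time, poor timer resolution is the main pitfall for the fast matrix-vector operation; this is why the DGEMV call is repeated in a loop above.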

The table also shows the timings for the bench.Hg and bench.PdO benchmarks, which are located on the VASP server in the src directory (bench.Hg.tar.gz and bench-PdO.tar.gz). The numbers shown are those written in the ``LOOP+'' line of the OUTCAR file (type: grep 'LOOP+' OUTCAR).

You can test your own machine by compiling ffttest and dgemmtest in the VASP.4.X (X$>$3) directory and typing


 dgemmtest <lincom.table 
 dgemmtest <rpro.table
 ffttest
This will execute the tests ``Lincom-TPP'', ``matrix-vec'' and ``fft'', in this order (serial version only). Note that with the present algorithms the matrix-vector part is less important than the synthetic mix of ``Lincom-TPP'', ``matrix-vec'' and ``fft'' suggests. In addition, for the bench.Hg benchmark the matrix-matrix part plays a more significant role than in the synthetic benchmark.

Currently, all high performance machines run VASP fairly well. The cheapest option (best value at the lowest price) is presently an AMD Athlon-64 or Intel P4 based PC; for compilation we recommend the ifc compiler. Which processor (clock speed) to buy depends a little on the budget and the available space. If you need a high packing density, dual Opteron machines are a good option. IBM Power 4 based machines and Intel Itanium machines (SGI Altix, HP-UX) remain competitive, but at a somewhat steeper price than PCs.

                    IBM RS6000  IBM RS6000  IBM RS6000  IBM RS6000  IBM RS6000  IBM SP3
                    590         3CT         595$^{++}$  595$^{++}$  397         High Node
lincom-TPP(Mflops)  245         237         389         389         580         1220
matrix-vec(Mflops)  110         73/128      110         110         300         300/400
Lincom-TPP          40.6 s      42.7 s      25.0 s      21.4 s      17.8 s      8.4 s
matrix-vec          32.3 s      40.4 s      32.3 s      19.4 s      15.3 s      12.1 s
fft                 31.4 s      35.0 s      24.0 s      17.3 s      14.4 s      5.1 s
TOTAL               103 s       117 s       81.3 s      58.3 s      47.5 s      26.8 s
RATING              1           0.9         1.3         1.8         2.2         3.8
bench.Hg            1663        1920        1380        1000        809         356
                    IBM RS6000  IBM SP4        ITANIUM 2  ITANIUM 2  Altix 350   Altix 3700 Bx2
                    590         1300           1300       1600       1600
                                               HP-UX      LINUX      SUSE SLES9  SUSE SLES 9
lincom-TPP(Mflops)  245         3100           5000       4300       5932        6129
matrix-vec(Mflops)  110         600/800        1200/2300  1200/1500  1378/2021   2671/3135
Lincom-TPP          40.6 s      3.2 s          2.0 s      2.3 s      1.7 s       1.7 s
matrix-vec          32.3 s      6.0 s          2.3 s      2.6 s      3.1 s       1.9 s
fft                 31.4 s      2.8 s          1.7 s      2.1 s      1.1 s       1.1 s
TOTAL               103 s       12.0 s         6.0 s      7.2 s      5.9 s       4.7 s
RATING              1           8.5            16.3       14.8       17.5        21.9
bench.Hg            1663        181/50$^*$     127        135        81          76
bench.PdO                       4000/1129$^*$  2758       2900       1733        1625/450$^*$
                    SGI           SGI           SUN         DEC-SX   DEC-LX
                    Power C.      Origin        USparc 366  ev5/530  ev5/530
lincom-TPP(Mflops)  300           430           290         439      650
matrix-vec(Mflops)  38            100/150       42/65       74/108   67/100
Lincom-TPP          32.0 s        22.0 s        19.7 s      21.8 s   14.3 s
matrix-vec          90.2 s        31.0 s        59 s        40.3 s   48.8 s
fft                 41.0 s        17.0 s        24 s        26.1 s   17.8 s
TOTAL               163 s         70 s          111 s       90 s     81 s
RATING              0.64          1.47          0.9         1.12     1.3
bench.Hg            2200/653$^*$  1200/330$^*$  1660        1424     1140
                    DS20     DS20$^2$  DS20e$^2$  UP2000   UP2000$^2$  UP 1000
                    ev6/500  ev6/500   ev6/666    ev6/666  ev6/666     ev6/600
lincom-TPP(Mflops)  800      1000      1200       1100     1100        800
matrix-vec(Mflops)  135/200  135/200   135/200    170/260  140/200
Lincom-TPP          12.0 s   10.6 s    8.4 s      9.3 s    9.0 s       11.4 s
matrix-vec          19.8 s   20.8 s    17.6 s     17.9 s   17.1 s      30.0 s
fft                 9.8 s    8.6 s     6.7 s      8.5 s    7.7 s       10.9 s
TOTAL               41.4 s   40.0 s    33.7 s     35.7 s   34 s        52 s
RATING              2.4      2.6       3.1        2.8      3.0         2.0
bench.Hg            546      536       385        465      453         786
bench.Hg$^1$        584      564       395        516      485
bench.PdO           10792              8151
                    CRAY T3D$^+$  CRAY T3E$^+$  CRAY T3E$^+$  CRAY    CRAY   VPP
                    ev4           ev5           1200          C90     J90    500
lincom-TPP(Mflops)  96            400           579           800     188    1500
matrix-vec(Mflops)  28/42         101           101           459     50     600
Lincom-TPP          99.5 s        25 s          16.5 s        12.0 s  53 s   7.1 s
matrix-vec          110.0 s       33 s          33 s          8.3 s   74 s   5.0 s
fft                 174.0 s       42 s          34 s          6.9 s   43 s   5.4 s
TOTAL               400 s         100 s         100 s         27.2 s  170 s  17.5 s
RATING              0.25          1.0           1.2           4.1     0.6    6.5
bench.Hg            639$^+$       420$^+$                                    220
LINUX               Xeon GX  Xeon GX  PIII BX  PIII BX  PIII
based PCs           450      550/512  450      500      700c
lincom-TPP(Mflops)  268      378      303      324      500
matrix-vec(Mflops)  70/100   90/120   80/105   90/118   90/118
Lincom-TPP          36 s     27.3 s   34.0 s   32.9 s   29.6 s
matrix-vec          44 s     37.1 s   43.2 s   41.9 s   30.0 s
fft                 27 s     22.4 s   26.6 s   24.6 s   25.1 s
TOTAL               107 s    87 s     104 s    100 s    84 s
RATING              1        1.18     1.0      0.9      0.9
bench.Hg            1631              2000     1866     1789
LINUX$^{**}$        Athlon   Athlon   Athlon   Athlon$^x$  Athlon$^x$  Athlon$^x$
based PCs           550 TB   800 TB   850 TB   850 TB      900         1200
lincom-TPP(Mflops)  700      770      800      850         890         1100
matrix-vec(Mflops)  100/142  115/190  115/190  130/210     120/200     200/300
Lincom-TPP          16.8 s   12.8 s   12.3 s   11.6 s      11.3 s      8.6 s
matrix-vec          30.6 s   26.3 s   25.8 s   22.6 s      24.6 s      18.7 s
fft                 19.5 s   18.7 s   18.0 s   17.3 s      14.0 s      10.9 s
TOTAL               67 s     57.8 s   56 s     51.5 s      50 s        38.3 s
RATING              1.5      1.8      1.8      2.0         2.1         2.5
bench.Hg            1350 s   1131 s   1124 s   1045 s      959 s       818 s
LINUX               Athlon$^i$  Athlon$^i$   Opteron$^j$  Opteron$^k$  Opteron$^k$  Opteron$^p$
based PCs           1400$^b$    XP/1900$^b$  244          246          250          246
                    SDRAM       DDR          32 bit       32 bit       32 bit       64 bit
lincom-TPP(Mflops)  1200        2200         2900         3300         3800         3300
matrix-vec(Mflops)  200/300     230/370      650/850      700/950      750/1050     700/950
Lincom-TPP          5.9 s       4.9 s        3.5 s        3.1 s        2.7 s        3.2 s
matrix-vec          17.3 s      13.1 s       5.4 s        4.3 s        4.2 s        3.9 s
fft                 9.8 s       7.3 s        3.3 s        3.0 s        2.6 s        2.6 s
TOTAL               39.3 s      25.3 s       12.2 s       10.4 s       9.5 s        9.8 s
RATING
bench.Hg            644         455          248          203          177          211
bench.PdO                       8412         4840         4256         3506         4172
LINUX$^{**}$        Ath-64$^k$
based PCs           3700+
                    DDRAM
lincom-TPP(Mflops)  3400
matrix-vec(Mflops)  700/1050
Lincom-TPP          2.9 s
matrix-vec          4.3 s
fft                 2.6 s
TOTAL               9.8 s
RATING
bench.Hg            173
bench.PdO           3550
LINUX               P4$^i$   XEON$^i$  XEON$^j$       XEON$^j$       P4 nrthw$^k$  P4 nrthw$^j$
based PCs           1700     2400      2800           2800           3200          3400
                    RAMBUS   RAMBUS    RAMBUS         DDR            FSB 800       FSB 800
lincom-TPP(Mflops)  2000     3030      4100           4200           4700          5400
matrix-vec(Mflops)  422/555  600/750   566/880        650/950        890/1300      1200/1500
Lincom-TPP          5.5 s    3.5 s     2.6 s          2.5 s          2.3 s         2.0 s
matrix-vec          7.6 s    5.3 s     5.6 s          5.0 s          3.9 s         3.8 s
fft                 7.5 s    4.9 s     3.1 s          2.9 s          2.6 s         2.4 s
TOTAL               20.6 s   13.7 s    11.3 s         10.5 s         8.8 s         8.2 s
RATING              5        7.5       9.4            10             11.7          12.5
bench.Hg            384      298       226/94$^*$     208/85$^*$     175           165
bench.PdO           7600     6335      4790/1801$^*$  4542/1787$^*$  3784          3250
LINUX               P4 pres$^k$   P4 pres$^j$  P4 pres$^k$  P4 940s$^k$  P4 940s$^l$
based PCs           3200          3400         3400         2x3200       2x3200
                    FSB800/DDR1   FSB800/DDR2  FSB800/DDR2  FSB800/DDR2  FSB800/DDR2
lincom-TPP(Mflops)  5200          5200         5200         5500         5500
matrix-vec(Mflops)  1000/1300     1000/1300    1000/1300    1100/1400    1100/1400
Lincom-TPP          2.0 s         2.0 s                     1.9 s        1.9 s
matrix-vec          3.1 s         3.1 s                     2.8 s        2.8 s
fft                 2.0 s         2.0 s                     1.8 s        1.7 s
TOTAL               7.1 s         7.1 s        7.1 s        6.5 s        6.5 s
RATING              14.5          14.5         14.5         16.5         16.5
bench.Hg            148/47$^*$    144          129          129          111
bench.PdO           3224/939$^*$  2850         2580                      2270
$^+$ VASP.4.4, hardware data streaming enabled; bench.Hg was run on 4 nodes, all other data are per node
$^{++}$ system equipped with 2 (first 595$^{++}$ column) or 4 (second 595$^{++}$ column) memory boards
$^*$ second value is for 4 nodes
$^{**}$ all Athlon results use the Atlas-based BLAS (http://www.netlib.org/atlas/)
$^x$ pgf90 -tp athlon, Atlas optimised BLAS for TB, 133 MHz memory
$^1$ benchmark executed twice (on dual processor SMP machines)
$^2$ Tru64; the other Alpha benchmarks were performed under LINUX
$^i$ Intel compiler, ifc, mkl performance lib on P4, Atlas on Athlon
$^A$ VIA KT 266A, other XP benchmarks performed with VIA KT 266
$^j$ Intel compiler, ifc7.1, libgoto_p4_512-r0.6.so or libgoto_p4_1024-r0.96.so on P4 and libgoto_opt32-r0.92.so on Athlon, fftw.3.0.1
$^k$ Intel compiler, ifc7.1, libgoto_p4_1024-r0.96.so on P4 or libgoto_opt32-r0.92.so on Opteron, fftw.3.0.1 and -Duse_cray_ptr
$^l$ ia64, Intel compiler, ifc9.1, libgoto_prescott64p-r1.00.so, fftw.3.1.2 and -Duse_cray_ptr
$^p$ pgi compiler

IMPORTANT: on ALPHA-LINUX the two options

     export MALLOC_MMAP_MAX_=0
     export MALLOC_TRIM_THRESHOLD_=-1
improve the performance by 10-20%!

NOTE: sometimes the tables show very different timings for similar machines with similar clock rates. This is often related to an upgrade of the compiler or of the motherboard.

