

scaLAPACK

If specified, VASP will use scaLAPACK instead of LAPACK for the LU decomposition (timing ORTHCH) and the diagonalisation (timing SUBROT) of the subspace matrix ( $N_{\rm bands} \times N_{\rm bands}$). In the serial version these operations are very fast (roughly $2 \%$ of the computing time). On massively parallel machines, however, they become a bottleneck for systems with many electrons, because the serial LAPACK routines do not benefit from additional processors. If scaLAPACK is installed on a massively parallel machine (T3E, SGI, IBM SPX), use this switch. On the T3E, scaLAPACK can be used starting from programming environment 3.0.1.0 (version 3.0.0.0, for instance, does not offer the required routines). On the T3D (but not on the T3E) the additional switch


 -DT3D_SCA
must be specified, at least for the scaLAPACK version we have tested (the T3D scaLAPACK is not compatible with the standard scaLAPACK routines).
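To make the two subspace operations concrete, here is a minimal serial sketch in Python/NumPy. It is not VASP code. It assumes that the coefficients of the $N_{\rm bands}$ bands are stored as the columns of a matrix `C` and that the Hamiltonian is available as an explicit Hermitian matrix `H`, which is a simplification for illustration only. The orthonormalisation uses a Cholesky factorisation of the overlap matrix (the Hermitian positive-definite counterpart of an LU factorisation); whether VASP uses exactly this scheme is an assumption here. Both factorisations act on $N_{\rm bands} \times N_{\rm bands}$ matrices on a single process, which is the work scaLAPACK distributes.

```python
import numpy as np

# Toy sketch, not VASP code. Assumption: the band coefficients are the columns
# of C (n_pw x n_bands) and the Hamiltonian is an explicit Hermitian matrix H.

def orthonormalize(C):
    """Orthonormalise the columns of C via a Cholesky factorisation of the
    n_bands x n_bands overlap matrix S = C^H C (cf. timing ORTHCH)."""
    S = C.conj().T @ C               # subspace overlap matrix
    L = np.linalg.cholesky(S)        # S = L L^H, cost ~ n_bands^3
    # C_new = C (L^H)^{-1}, hence C_new^H C_new = L^{-1} S L^{-H} = 1
    return np.linalg.solve(L, C.conj().T).conj().T

def subspace_rotate(C, H):
    """Diagonalise H_sub = C^H H C (n_bands x n_bands) and rotate C onto its
    eigenvectors (cf. timing SUBROT)."""
    H_sub = C.conj().T @ H @ C       # subspace matrix
    eps, U = np.linalg.eigh(H_sub)   # serial LAPACK eigensolver, ~ n_bands^3
    return eps, C @ U

# Small random example: real symmetric H, random starting vectors.
rng = np.random.default_rng(0)
n_pw, n_bands = 200, 16
A = rng.standard_normal((n_pw, n_pw))
H = (A + A.T) / 2
C = orthonormalize(rng.standard_normal((n_pw, n_bands)))
eps, C_rot = subspace_rotate(C, H)
```

The factorisation and the eigendecomposition both work on $N_{\rm bands} \times N_{\rm bands}$ matrices, so their cost grows roughly as $N_{\rm bands}^3$: doubling the number of bands makes these steps about eight times more expensive.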

On slow networks and PC clusters (100 Mbit Ethernet and even 1 Gbit Ethernet), scaLAPACK is not recommended: the performance improvements are small, or scaLAPACK is even slower than LAPACK. If you still want to give it a try, download the required source files from www.netlib.org/SCALAPACK. Compilation is fairly straightforward, but requires familiarity with MPI, Fortran, C and UNIX makefiles (always make sure that the underlying BLACS routines work correctly!).

ScaLAPACK can be switched off at runtime by specifying


 LSCALAPACK = .FALSE.
in the INCAR file. Use this as a fallback if you encounter problems with scaLAPACK. Furthermore, in some cases the LU decomposition (timing ORTHCH) based on scaLAPACK is slower than the serial LU decomposition. It is therefore also possible to switch off only the parallel LU decomposition by specifying

 LSCALU = .FALSE.
in the INCAR file (the subspace rotation is still done with scaLAPACK in this case).
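The interplay between the pre-compiler flag and the two INCAR tags can be summarised as a small decision function. The sketch below is a hypothetical helper written for illustration only (it is not part of VASP); the argument `compiled_with_scalapack` stands for whether the code was built with this flag, and no default values are assumed for `LSCALAPACK` or `LSCALU`.

```python
def subspace_backends(compiled_with_scalapack: bool,
                      lscalapack: bool,
                      lscalu: bool) -> dict:
    """Hypothetical helper (not part of VASP): which library performs each
    subspace step, given the build flag and the INCAR tags LSCALAPACK/LSCALU."""
    # scaLAPACK is only available if the code was compiled with it,
    # and LSCALAPACK = .FALSE. switches it off entirely at runtime.
    use_scalapack = compiled_with_scalapack and lscalapack
    return {
        # LSCALU = .FALSE. only reverts the LU decomposition to serial LAPACK.
        "ORTHCH": "scaLAPACK" if use_scalapack and lscalu else "LAPACK",
        # The subspace diagonalisation depends on LSCALAPACK alone.
        "SUBROT": "scaLAPACK" if use_scalapack else "LAPACK",
    }

print(subspace_backends(True, True, False))
# {'ORTHCH': 'LAPACK', 'SUBROT': 'scaLAPACK'}
```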


Georg Kresse
2009-04-23