If specified, VASP will use scaLAPACK instead of LAPACK
for the LU decomposition (timing ORTHCH) and diagonalisation (timing SUBROT)
of the subspace matrix.
These operations are very fast in
the serial version but become a bottleneck on
massively parallel machines for systems with many electrons.
If scaLAPACK is installed
on a massively parallel machine, use this switch (T3E, SGI, IBM SPX).
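To make concrete what these two steps compute, here is a minimal NumPy sketch. It is written under our own assumptions and is not VASP's code. It assumes the bands are stored as the columns of a coefficient array `psi` (one row per plane-wave coefficient). It treats ORTHCH as a Cholesky factorization of the overlap matrix, which is the Hermitian special case of an LU decomposition, and it uses a random Hermitian matrix `h` as a stand-in for the Hamiltonian. The function names are hypothetical. Both steps factorize or diagonalize a dense subspace matrix whose dimension is the number of bands (NBANDS in VASP), at a cost of O(NBANDS³). This is presumably why they are cheap for small systems but become the serial bottleneck when many processors share a system with many electrons.

```python
import numpy as np


def orthonormalize_cholesky(psi):
    """ORTHCH-like step (hypothetical): orthonormalize the band columns of psi."""
    s = psi.conj().T @ psi                 # NBANDS x NBANDS overlap matrix
    l = np.linalg.cholesky(s)              # S = L L^H, O(NBANDS^3)
    # psi L^{-H} = (L^{-1} psi^H)^H has overlap L^{-1} S L^{-H} = identity
    return np.linalg.solve(l, psi.conj().T).conj().T


def subspace_rotate(psi, h):
    """SUBROT-like step (hypothetical): diagonalize h in the span of psi and rotate."""
    h_sub = psi.conj().T @ h @ psi         # NBANDS x NBANDS subspace matrix
    eps, u = np.linalg.eigh(h_sub)         # dense eigensolver, O(NBANDS^3)
    return eps, psi @ u                    # eigenvalues (ascending), rotated bands


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    npw, nbands = 200, 16                  # toy sizes, far smaller than real runs
    psi = rng.standard_normal((npw, nbands)) + 1j * rng.standard_normal((npw, nbands))
    a = rng.standard_normal((npw, npw)) + 1j * rng.standard_normal((npw, npw))
    h = (a + a.conj().T) / 2               # random Hermitian stand-in for H
    q = orthonormalize_cholesky(psi)
    eps, rotated = subspace_rotate(q, h)
    print(np.max(np.abs(q.conj().T @ q - np.eye(nbands))))
```

In a scaLAPACK build, it is these two dense NBANDS × NBANDS operations that are distributed over the processors instead of being repeated serially.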
scaLAPACK can be used on
the T3E starting from programming environment 3.0.1.0 (3.0.0.0, for
instance, does not offer the required routines). On the
T3D (but not the T3E), the additional switch
-DT3D_SCA must be specified, at least for the scaLAPACK version we have tested (the T3D scaLAPACK is not compatible with the standard scaLAPACK routines).
On slow networks and PC clusters (100 Mbit Ethernet and even 1 Gbit Ethernet), scaLAPACK is not recommended: the performance improvements are small, or scaLAPACK is even slower than LAPACK. If you still want to give it a try, please download the required source files from www.netlib.org/SCALAPACK. Compilation is fairly straightforward but requires familiarity with MPI, Fortran, C and UNIX makefiles (always make sure that the underlying BLACS routines are working correctly!).
ScaLAPACK can be switched off at runtime by specifying
LSCALAPACK = .FALSE. in the INCAR file. Use this as a fallback when you encounter problems with scaLAPACK. Furthermore, in some cases the LU decomposition (timing ORTHCH) based on scaLAPACK is slower than the serial LU decomposition. It is therefore also possible to switch off only the parallel LU decomposition by specifying
LSCALU = .FALSE. in the INCAR file (the subspace rotation is still done with scaLAPACK in this case).
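The interplay of the two INCAR switches can be summarized as a small decision table. The sketch below is a hypothetical helper, not part of VASP. It assumes a binary compiled with scaLAPACK support and reads the two tags from INCAR text, handling only one `TAG = value` per line. It also assumes that `#` and `!` start comments and that a logical value is decided by its first letter (`.TRUE.`/`T` versus `.FALSE.`/`F`). The defaults are passed in as parameters because we assume, rather than know, that both switches are on unless set to `.FALSE.`; defaults may differ between VASP versions.

```python
def scalapack_plan(incar_text, default_scalapack=True, default_scalu=True):
    """Hypothetical: report which library handles ORTHCH (LU) and SUBROT."""
    flags = {"LSCALAPACK": default_scalapack, "LSCALU": default_scalu}
    for raw in incar_text.splitlines():
        # Assumed comment characters: strip anything after '#' or '!'.
        line = raw.split("#", 1)[0].split("!", 1)[0]
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip().upper()
        if key in flags:
            # '.FALSE.' -> 'FALSE.', 'T' -> 'T': first letter decides (simplification).
            flags[key] = value.strip().lstrip(".").upper().startswith("T")
    if not flags["LSCALAPACK"]:
        # scaLAPACK switched off entirely: both steps fall back to serial LAPACK.
        return {"ORTHCH": "LAPACK", "SUBROT": "LAPACK"}
    # LSCALU only affects the LU decomposition; SUBROT stays on scaLAPACK.
    orthch = "scaLAPACK" if flags["LSCALU"] else "LAPACK"
    return {"ORTHCH": orthch, "SUBROT": "scaLAPACK"}


if __name__ == "__main__":
    print(scalapack_plan("LSCALU = .FALSE.  # serial LU only\n"))
```

Setting LSCALU = .FALSE. is the finer-grained choice: it keeps the parallel subspace rotation while moving only the LU decomposition back to LAPACK.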
Georg Kresse
2009-04-23