1.1 Requirements: Cygwin (gcc, g++, g77, make, gdb)
(Note: start Cygwin with cygwin.bat. Do not use RXVT, the VT102 terminal emulator, or you will not be able to get past step 1.5.6.)
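Before starting, it can help to confirm that the tools listed in 1.1 are actually on the PATH. The following is only a minimal sketch; the helper name missing_tools is hypothetical and is not part of Cygwin or ATLAS.

```shell
# Print the name of every listed tool that is not found on the PATH.
# missing_tools is a hypothetical helper, not part of Cygwin or ATLAS.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools named in step 1.1.
missing=$(missing_tools gcc g++ g77 make gdb)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All required tools found."
fi
```

Any tool reported as missing can be added by rerunning the Cygwin installer and selecting the corresponding package.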
1.2 Go to http://www.netlib.org/atlas/ and download atlas3.6.0.gz.
1.3 After downloading atlas3.6.0.gz to /cygwin/usr/local/, unpack it with
$ gunzip -c atlas3.6.0.gz | tar xv
or
$ tar xvfz atlas3.6.0.gz
1.4 The current directory is still /cygwin/usr/local/, so change to /cygwin/usr/local/ATLAS/ with
$ cd ATLAS
1.5 Build xconfig.exe by running make, and answer the questions below as appropriate for your system.
1.5.1 160
159
158
...
3
2
1
Enter number at top left of screen [0]: 160
1.5.2 Have you scoped the errata file? [y]: y
1.5.3 Are you ready to continue? [y]: y
1.5.4 Enter your machine type:
1. Other/Unknown
2. AMD Athlon
3. 32 bit AMD Hammer
4. 64 bit AMD Hammer
5. Pentium PRO
6. Pentium Ⅱ
7. Pentium Ⅲ
8. Pentium 4
Enter machine number [1]: 2
1.5.5 enable Posix threads support? [n]: y
1.5.6 Enter the number processors in system [0]: 2
1.5.7 use express setup? [y]: y
1.5.8 Enter Architecture name (ARCH) [WinNT_ATHLONSSE2_2]:
WinNT_ATHLONSSE2_2
1.5.9 Enter Maximum cache size (KB) [4096]: 4096
1.5.10 Enter File creation delay in seconds [0]: 0
1.5.11 Tune the Level 1 BLAS? [y]: y
1.6 Build ATLAS with
# make install arch=WinNT_ATHLONSSE2_2
The build takes more than 2.5 hours to complete.
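Because the build runs for hours, it is convenient to keep its output in a file and to record how long it took. A minimal sketch, assuming a date command that supports +%s (as GNU date on Cygwin does); run_logged and the log file name atlas-build.log are hypothetical.

```shell
# Run a command, save stdout and stderr to a log file, and report the
# exit status and elapsed wall-clock time. run_logged is a hypothetical
# helper, not part of ATLAS.
run_logged() {
    log=$1
    shift
    start=$(date +%s)
    status=0
    "$@" >"$log" 2>&1 || status=$?
    end=$(date +%s)
    echo "exit status $status after $((end - start)) s, output in $log"
    return "$status"
}

# Example for step 1.6 (left commented out so nothing is built by accident):
# run_logged atlas-build.log make install arch=WinNT_ATHLONSSE2_2
```

If the build fails, the last lines of the log file usually show which stage stopped.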
1.7 Copy the library files into /lib and index them with ranlib:
$ cd ./lib/WinNT_ATHLONSSE2_2
# cp *.a /lib
$ ranlib /lib/liblapack.a
$ ranlib /lib/libatlas.a
$ ranlib /lib/libcblas.a
$ ranlib /lib/libf77blas.a
$ ranlib /lib/libptf77blas.a
$ ranlib /lib/libptcblas.a
$ ranlib /lib/libtstatlas.a
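The copy and the seven ranlib calls above can also be written as one loop. This is a sketch under the assumption that every .a file in the build directory should be installed; install_libs is a hypothetical helper.

```shell
# Copy every .a archive from a source directory into a destination
# directory and run ranlib on each copy. install_libs is a hypothetical
# helper; the RANLIB variable may override the ranlib command.
install_libs() {
    src=$1
    dest=$2
    for lib in "$src"/*.a; do
        [ -e "$lib" ] || continue        # the glob matched nothing
        name=$(basename "$lib")
        cp "$lib" "$dest/$name" || return 1
        "${RANLIB:-ranlib}" "$dest/$name" || return 1
    done
}

# Example for step 1.7, run from the ATLAS directory:
# install_libs ./lib/WinNT_ATHLONSSE2_2 /lib
```

The loop stops at the first failure, so a permission problem in /lib is reported immediately instead of after several partial copies.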
1.8 You can now link programs against ATLAS, for example:
$ g77 -o file01 file01.f -llapack -lf77blas -lcblas -latlas -lg2c -lm
or
$ g95 -o file01 file01.f90 -llapack -lf77blas -lcblas -latlas -lg2c -lm
file01 is only an example name; any other Fortran 77 or Fortran 95 program can be built the same way.
2.1 About the LINPACK benchmark
Go to http://www.netlib.org/benchmark/ to obtain the benchmark programs 1000s, 1000d, linpacks, and linpackd. Each benchmark was run several times, and the trial with the largest mflops value is reported.
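Picking the best of several trials can be scripted once the mflops figures have been collected one per line. A minimal sketch; best_mflops is a hypothetical helper, and the three input values are placeholders rather than measurements from this guide.

```shell
# Print the largest value from a list of mflops figures, one per line.
# best_mflops is a hypothetical helper; awk converts strings such as
# 3.0E+02 to numbers when they are used in arithmetic.
best_mflops() {
    awk 'NF { v = $1 + 0; if (!seen || v > max) { max = v; seen = 1 } }
         END { if (seen) printf "%.1f\n", max }'
}

# Placeholder values, not measurements from this guide.
printf '%s\n' 1.0E+02 3.0E+02 2.0E+02 | best_mflops
```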
2.1.1 The results from LINPACK 1000s benchmark
$ g77 -o 1000s 1000s.f
norm resid | resid | machep |
9.56832123E+00 | 5.70633099E-04 | 1.19209290E-07 |
X(1) | X(n) | |
1.00003088E+00 | 9.99999046E-01 | |
factor | solve | total |
2.418E+00 | 0.000E+00 | 2.418E+00 |
mflops | unit | ratio |
2.765E+02 | 7.232E-03 | 4.318E+01 |
2.1.2 The results from LINPACK 1000d benchmark
$ g77 -o 1000d 1000d.f
norm resid | resid | machep |
1.05174252E+01 | 1.16766853E-12 | 2.22044605E-16 |
X(1) | X(n) | |
1.00000000E+00 | 1.00000000E+00 | |
factor | solve | total |
2.995E+00 | 0.000E+00 | 2.995E+00 |
mflops | unit | ratio |
2.233E+02 | 8.958E-03 | 5.348E+01 |
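The mflops, unit, and ratio columns can be recomputed from the total time. The sketch below assumes the usual LINPACK conventions: an operation count of 2n^3/3 + 2n^2 for a matrix of order n, unit = 2/mflops, and ratio = total/0.056. These formulas and the 0.056 s reference time are assumptions that are not stated in this guide, although they are consistent with the tables in 2.1.1 and 2.1.2. linpack_rates is a hypothetical helper.

```shell
# Recompute mflops, unit, and ratio from the matrix order n and the
# total time in seconds. linpack_rates is a hypothetical helper; the
# formulas are assumptions, not taken from this guide.
linpack_rates() {
    awk -v n="$1" -v total="$2" 'BEGIN {
        ops    = 2 * n^3 / 3 + 2 * n^2      # floating-point operations
        mflops = ops / (1.0e6 * total)
        unit   = 2 / mflops
        ratio  = total / 0.056              # assumed reference time (s)
        printf "%.1f %.6f %.2f\n", mflops, unit, ratio
    }'
}

linpack_rates 1000 2.418    # total time of the 1000s run in 2.1.1
linpack_rates 1000 2.995    # total time of the 1000d run in 2.1.2
```

For the two runs this gives about 276.5 and 223.3 mflops, which agrees with the reported 2.765E+02 and 2.233E+02.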
2.2 About the DGEMM benchmark
Go to http://www.mcs.anl.gov/index.php, follow Software > MPICH > Win IA32 Binary (1.2.1p1), and download mpich2-1.2.1p1-win-ia32.msi. Install it on Windows, then add C:\Program Files\MPICH2\bin to the Path environment variable. Copy libfmpich2g.a, libmpi.a, and libmpicxx.a into /lib and run the following commands.
$ ranlib /lib/libfmpich2g.a
$ ranlib /lib/libmpi.a
$ ranlib /lib/libmpicxx.a
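Inside the Cygwin shell, the Windows directory C:\Program Files\MPICH2\bin is reached through the /cygdrive prefix. As an alternative to editing the Windows environment variables, the directory can be added to PATH for the current session only. A minimal sketch, assuming Cygwin's default /cygdrive mount point; add_to_path is a hypothetical helper.

```shell
# Append a directory to PATH unless it is already listed.
# add_to_path is a hypothetical helper.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

# Assumes Cygwin's default /cygdrive mount point for Windows drives.
add_to_path "/cygdrive/c/Program Files/MPICH2/bin"
```

Placing the add_to_path call in ~/.bashrc would make the change apply to every new Cygwin shell.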
Next, go to https://computecanada.org/, follow Committees > TECC > Working groups > Benchmarking > Benchmark Collection > Microbenchmarks > DGEMM, and download dgemm-1.0.0.tar.gz. Unpack it and copy the makefile template into the top-level directory:
$ tar xvfz dgemm-1.0.0.tar.gz
$ cd dgemm-1.0.0
$ cp ./setup/Make.Linux_AtlasFBLAS_Lam ./
We edit the Message Passing library (MPI) setting in Make.Linux_AtlasFBLAS_Lam so that the MPICH2 libraries are used at link time.
$ vi Make.Linux_AtlasFBLAS_Lam
MPlib = -lfmpich2g -lmpi -lmpicxx
# The build also succeeded when MPlib was left empty.
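The same edit can be made without opening an editor. This is a sketch that assumes the makefile contains a single line beginning with MPlib; set_mplib is a hypothetical helper.

```shell
# Replace the MPlib assignment in a makefile with the MPICH2 libraries.
# set_mplib is a hypothetical helper; it writes to a temporary file
# first so the original is replaced only if sed succeeds.
set_mplib() {
    file=$1
    tmp="$file.tmp"
    sed 's/^MPlib[[:space:]]*=.*/MPlib = -lfmpich2g -lmpi -lmpicxx/' "$file" >"$tmp" &&
        mv "$tmp" "$file"
}

# Example, run inside dgemm-1.0.0:
# set_mplib Make.Linux_AtlasFBLAS_Lam
```

Lines that do not start with MPlib are copied through unchanged.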
We build dgemm-1.0.0 with
$ make arch=Linux_AtlasFBLAS_Lam
Now mpiexec.exe and hpcc-dgemm.exe are available. We use the benchmark to compare the performance of ATLAS+BLAS with that of the BLAS included in LAPACK.
$ mpiexec -np 1 hpcc-dgemm 100
Figure 1 shows that, measured in Single DGEMM Gflop/s, ATLAS+BLAS is approximately 5.6 times faster than the BLAS included in LAPACK. Differences in calculation accuracy are not examined here.
Figure 1 The DGEMM Performance
Finally, I hope this guide helps you install ATLAS and the DGEMM benchmark and evaluate the performance differences between software libraries.