Serial processing programming

This page summarizes how to port your programs from the former system to the current system.

Compile Commands

  • ifort
    • for Fortran programs
  • icc
    • for C programs
  • icpc
    • for C++ programs

We recommend the Intel compilers, which deliver good performance on Xeon processors. The GNU compilers can also be used.

Compiler Options

  • Optimization Options
    • Recommended optimization options
      The following optimization options are recommended for programs that are free of bugs.
      • eic or eich
        -O3 -xAVX
      • eicp
        -O3 -xCORE-AVX2
    • Optimization levels
      -O0  Disables all optimizations.
      -O1  Enables optimizations for speed and disables some optimizations that increase code size and affect speed.
      -O2  Enables optimizations for speed. This is the generally recommended optimization level (default).
      -O3  Performs O2 optimizations and enables more aggressive loop transformations.
  • Code Generation Options
    -xAVX        May generate AVX instructions.
    -xCORE-AVX2  May generate AVX2 instructions.

    The AVX instructions operate on 256-bit vectors. One vector holds either of the following:
    • eight single-precision floating-point numbers (8 x 32 bit = 256 bit)
    • four double-precision floating-point numbers (4 x 64 bit = 256 bit)

    The AVX2 instruction set includes FMA (Fused Multiply-Add).
    FMA computes a multiply followed by an add (a x b + c) in a single instruction.
    • eic and eich support the -xAVX option.
    • eicp supports the -xAVX and -xCORE-AVX2 options.
  • Floating-Point Operation Options
    -no-prec-div          [Improves performance] Enables optimization of floating-point divides.
    -fp-model fast[=1|2]  [Improves performance] Enables more aggressive optimizations on floating-point data.
    -fp-model precise     [Improves precision] Disables optimizations that are not value-safe on floating-point data.
    Default: -fp-model fast=1
  • Debug Options
    • ifort only
      -traceback -g                When a severe error occurs, the source file, routine name, and line number are displayed along with the call stack hexadecimal addresses (program counter trace).
      -traceback -g -check bounds  Checks array subscript and character substring expressions against their bounds at run time.
      -traceback -g -fpe0          Aborts execution if a floating-point invalid, divide-by-zero, or overflow exception occurs.
      (*) Specifying -g turns off -O2 and makes -O0 the default unless -O2 is explicitly specified on the same command line.
      Debug options may slow down your programs, so remove them once debugging is done.

Specific Memory Model

By default, the compiler restricts code and data to the first 2GB of address space. If your program needs more than that and you do not specify the appropriate memory model and dynamic library options, linking fails with an error message of this form:

relocation truncated to fit: R_X86_64_32S against `.bss'

Specifying -mcmodel=medium or -mcmodel=large also sets the -shared-intel option.

-mcmodel=small (default)  Tells the compiler to restrict code and data to the first 2GB of address space.
-mcmodel=medium           Tells the compiler to restrict code to the first 2GB; it places no memory restriction on data.
-mcmodel=large            Places no memory restriction on code or data.
-shared-intel             Causes Intel-provided libraries to be linked in dynamically.

Math Kernel Library (MKL)

  • MKL provides the following math routines:
    • BLAS
    • Sparse solvers
    • Vector Math (VML)
    • Vector Statistics (VSL)
    • Fast Fourier Transform
    • FFTW interface for Fast Fourier Transform
  • How to link the serial version or the multi-threaded version
    • Serial version
      $ ifort -o a.out -mkl=sequential
    • Multi-threaded version
      $ ifort -o a.out -mkl=parallel
  • ex.1) Vector inner product calculation using the SDOT routine
    $ cat test1.f
        program test1
        real x(10), y(10), sdot, res
        integer n, incx, incy, i
        external sdot
        n = 5
        incx = 2
        incy = 1
        do i = 1, 10
           x(i) = real(i)
           y(i) = 1.0e0
        enddo
        res = sdot(n, x, incx, y, incy)
        print*,'SDOT = ', res
        end
    $ ifort  -O3 -xAVX test1.f  -mkl=sequential
    $ dplace ./a.out
    SDOT =    25.00000
  • ex.2) Using the FFTW interface to the FFT in MKL
    $ cat test2.f
    .... FFTW source code ......
    $ ifort -O3 -xAVX test2.f -I${MKLROOT}/include/fftw  -mkl=sequential
    $ dplace ./a.out

Time Functions

  • Fortran
    • dclock
      Returns the elapsed time in seconds since 0:00 of the current day. The return value is of type real(8).
      real(8) time1, dclock
      time1 = dclock()
      $ cat test3.f
          program test3
          real*8 dclock, t1, t2
          t1 = dclock()
          call sub()
          t2 = dclock()
          write(6,*) "time :", t2 - t1
          end
          subroutine sub()
          call system("sleep 3")
          end
      $ ifort -O3  -xAVX test3.f
      $ dplace ./a.out
       time :   3.01978499999677

      (*) To measure the elapsed time of Fortran programs that keep running into the following day, we provide a wrapper routine around gettimeofday. dclock.c converts the value returned by gettimeofday, which is given in seconds and microseconds, to seconds.

      $ cat dclock.c
       #include <sys/time.h>
       double  dclock_()
       {
           struct timeval tp;
           struct timezone tzp;
           /* combine seconds and microseconds into seconds */
           gettimeofday(&tp, &tzp);
           return (double)tp.tv_sec + (double)tp.tv_usec * 1.0e-6;
       }
      Below is an example of compiling test3.f and linking it with dclock.c.
      $ icc  -c  dclock.c
      $ ifort  -O3 -xAVX  -o  a.out  test3.f  dclock.o
      $ dplace  ./a.out
       time :   3.01569199562073

      In the above example, gettimeofday measures the elapsed time of test3.f.
  • C/C++
    • gettimeofday
      Returns the seconds and microseconds since 00:00 Jan 1, 1970.
      The return value is of type int: 0 on success, -1 if an error occurs.
      ex) The following function returns the elapsed time in seconds.
      $ cat elapsed.c
           #include <sys/time.h>
           double elapsed()
           {
               struct timeval tp;
               struct timezone tzp;
               gettimeofday(&tp, &tzp);
               return (double)tp.tv_sec + (double)tp.tv_usec * 1.0e-6;
           }
      $ cat test5.c
           int main(void)
           {
               int i;
               float s=0;
               double ts, te;
               double elapsed();
               ...
           }

Performance Analyzing Tool

  • SGI Perfsuite
    Detects which functions, routines, and source lines consume run time. Run your program under the psrun command. When the job is done, the result files of psrun are available in the current directory. Use psprocess to format those result files.
    $ dplace psrun ./a.out
    $ ls *.xml
    The formatted result of the psrun command looks like this:
    $ psprocess a.out.78075.eicp1.xml
    Samples   Self %   Total %  Function
       3770   69.43%   69.43%  FUNC1__
        420    7.73%   77.16%  FUNC3_
        412    7.59%   84.75%  FUNC-tmp4_
        262    4.83%   89.58%  SUB-diff_
        133    2.45%   92.03%  SUB_init_
         11    2.03%   94.05%  SUB_out_
    Samples : The number of samples taken in the function
    Self % : Percentage of all samples that fall in the function
    Total % : Cumulative percentage up to and including this row
    Function : The function name

    If the program is compiled with -g, psrun also provides source-line profiling.
    Samples    Self%   Total%   Function:  File:Line
        601  10.20%  10.20%   FUNC1:/home/t2.f:556
        466   7.91%  18.12%   FUNC1:/home/t2.f:389
        258   4.38%  22.50%   FUNC1:/home/t2.f:383
        252   4.28%  26.77%   FUNC1:/home/t2.f:451
        233   3.96%  30.73%   FUNC1:/home/t2.f:178
