High Performance Computing

  • There is an ever increasing demand for computation power to improve the speed or accuracy of solutions to real-world problems through faster computations and/or bigger simulations
  • Important factors to consider in parallel programming:
    • Physical limitations: the speed of light, CPU heat dissipation
    • Economic factors: cheaper components can be used to achieve comparable levels of performance in total
    • Scalability: allows data to be divided over CPUs to increase performance
    • Memory: allow aggregate memory bandwidth to be increased together with processing power
  • Why not parallelize all programs?
    • Bad written parallel programs can be worse than their sequential counterparts
      • Slower because of communication overhead
      • Some parallel algorithms are only faster when problem size is really large
    • Not all problems are a good fit for parallelism
    • Some algorithms are inherently sequential
    • There is no linear speedup for parallel algorithms compared to sequential versions
      • Due to overhead of communication, resource contention and synchronization


  • The speedup of an algorithm using \(P\) processors is defined as

$$ S_P = \frac{t_S}{t_P} $$

  • with \(t_S\) = execution time of the best sequential algorithm
  • and \(t_P\) = execution time of the parallel algorithm
  • The speedup is
    • ideal if \(S_P = P\)
    • linear if \(S_P \approx P\)
    • superlinear if, for some \(P\), \(S_P > P\)
  • Superlinear speedup might be caused by cache effects
    • When data is partitioned and distributed over multiple processors, then the individual data items are (much) smaller and may fit entirely in the data cache of each processor


  • The efficiency of an algorithm using \(P\) processors is defined as

$$ E_P = \frac{S_P}{P} $$

  • With efficiency we estimate how well-utilized the processors are in solving the problem, compared to how much effort is lost by the overhead
  • Ideal efficiency means \(E_P = 1\)


  • With speedup we describe how the parallel algorithm’s performance changes with increasing \(P\)
  • Scalability is about the efficiency of an algorithm with changing problem size \(N\) by choosing \(P\) dependent on \(N\)

Amdahl’s Law

  • Several factors limit speedup
    • Processors may idle
    • Extra computation steps might be performed in the parallel version
    • Communication and synchronization overhead
  • Informally: Amdahl’s law states that speedup for parallel algorithms is always inherently limited
  • There’s also Gustanfson’s law


  • Memory hierarchy forms a barrier to performance when locality is poor
  • Temporal locality
    • The same memory location is accessed frequently and repeatedly
    • Poor temporal locality results from frequent access to fresh new memory locations
  • Spatial locality
    • Consecutive (or “near enough”) memory locations are accessed
    • Poor spatial locality means that memory locations are accessed in a more random pattern

Future of Computing

  • Increased transistor density causes huge CPU energy consumption and heat dissipation issues
    • This fundamentally limits CPU clock frequencies
    • CPU performance will be relatively flat
  • Computers will get a lot cheaper but not faster