Jabir Hussain



Text: Week 1

What is HPC?

  • Definition: High-Performance Computing uses supercomputers and clusters to solve advanced computational problems.
  • Supercomputer: Performs at much higher levels than general-purpose computers.
  • Performance metrics:
    • FLOPS (floating point operations per second)
    • Memory bandwidth
    • Storage and interconnect speed
  • Architecture:
    • Many racks of compute nodes.
    • Each node has multiple CPUs (sockets) and cores.

Number Representation

Integers

  • Example: 121 → 1-byte integer 01111001
  • Representation is exact, whether the encoding is signed or unsigned.
  • Arithmetic is exact.
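A quick sketch of these points in Python (note Python's `int` is arbitrary-precision, so fixed-width behaviour is only illustrated here via explicit 1-byte patterns):

```python
# Integer representation is exact: 121 as an 8-bit pattern.
value = 121
bits = format(value, "08b")
print(bits)  # 01111001

# The same 8-bit pattern means different things read unsigned vs. signed:
pattern = 0b11111001  # 249 as unsigned ...
signed = int.from_bytes(pattern.to_bytes(1, "big"), "big", signed=True)
print(signed)  # ... but -7 in two's complement

# Integer arithmetic is exact (unlike fixed-width C integers, which
# wrap on overflow, Python ints simply grow):
assert (2**100 + 1) - 2**100 == 1
```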

Reals (Floating Point)

  • Scientific notation: 123456 ≈ 1.235 × 10⁵
  • Computers use base-2 exponent + mantissa representation.
  • Rounding errors:
    • 0.9 cannot be represented exactly.
    • 1 + 1e−8 = 1 (in 32-bit float)
  • Example:

    123456 in 32-bit (IEEE 754 single-precision) representation:

    0 10001111 11100010010000000000000
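These bit patterns can be inspected directly with Python's standard `struct` module, which also demonstrates both rounding claims above:

```python
import struct

def float32_bits(x: float) -> str:
    """Bit pattern of x stored as an IEEE 754 single-precision float."""
    (n,) = struct.unpack(">I", struct.pack(">f", x))
    return format(n, "032b")

bits = float32_bits(123456.0)
print(bits[0], bits[1:9], bits[9:])  # sign, exponent, mantissa

# 1 + 1e-8 is indistinguishable from 1 in 32-bit precision, because a
# float32 mantissa carries only ~7 decimal digits:
(rounded,) = struct.unpack(">f", struct.pack(">f", 1.0 + 1e-8))
assert rounded == 1.0

# 0.9 has no finite binary expansion, so even a 64-bit float stores it
# inexactly:
print(f"{0.9:.20f}")  # 0.90000000000000002220
```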


FLOPS

  • FLOPS = Floating Point Operations Per Second.
  • Primary metric for scientific compute performance.
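A crude way to get a feel for the unit is to time floating-point work directly. This toy measurement is a sketch only: interpreter overhead dominates, so it lands far below hardware peak, whereas real benchmarks (e.g. LINPACK) use compiled, vectorised kernels.

```python
import time

# Time a loop of floating-point multiply-adds and divide work by time.
n = 200_000
x = 1.0
start = time.perf_counter()
for _ in range(n):
    x = x * 1.0000001 + 1e-9  # 2 floating-point operations per iteration
elapsed = time.perf_counter() - start
flops = 2 * n / elapsed
print(f"~{flops:,.0f} FLOP/s from pure Python")
```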

Example: Apollo Guidance Computer (1969)

  • 72 KB ROM, 4 KB RAM
  • 14,245 FLOPS @ 2 MHz
  • Power: 40 W

Modern Supercomputer: El Capitan (LLNL)

  • 11,039,616 compute cores
  • 2 GHz, 5.7 PB memory
  • Power: 30 MW
  • Performance: 2.8 exaFLOPS (2.8×10¹⁸)

Your smartphone can reach several teraFLOPS!


Moore’s Law

  • Observation (1965): Transistors on a chip double every ~2 years.
  • Achieved by scaling down feature size (20 µm → 3–5 nm).
  • Predicted to end around 2025.
  • Led to exponential growth in compute power.
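The observation reduces to a simple doubling formula. As a back-of-envelope check (starting point assumed here: the Intel 4004 of 1971, with roughly 2,300 transistors):

```python
# Moore's law: transistor count doubles roughly every 2 years.
def moore(year, base_year=1971, base_count=2300, doubling_years=2):
    return base_count * 2 ** ((year - base_year) / doubling_years)

estimate = moore(2020)
print(f"{estimate:.2e}")  # ~5e10: the right order of magnitude for a 2020 GPU
```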

Inside a Chip: MOSFET

  • MOSFET = Metal-Oxide-Semiconductor Field-Effect Transistor
  • Gate voltage controls current between source and drain.
  • Building block of logic gates.
  • Almost no current flows except during switching.

CMOS Logic

  • CMOS = Complementary MOSFET design.
  • No power dissipated except during switching.
  • Smaller devices → smaller capacitance → faster switching.

    Logic   Inputs     Output behaviour
    NOT     1 input    Inverts the signal
    NAND    2 inputs   Output = 1 except when both inputs = 1
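The table above can be written as truth functions. NAND is also "universal": any other gate can be built from it, e.g. NOT(a) = NAND(a, a).

```python
# CMOS logic gates as truth functions.
def NOT(a: int) -> int:
    return 1 - a

def NAND(a: int, b: int) -> int:
    return 0 if (a and b) else 1

for a in (0, 1):
    assert NAND(a, a) == NOT(a)  # NOT built from NAND
    for b in (0, 1):
        # Output = 1 except when both inputs = 1:
        assert NAND(a, b) == (0 if a == 1 and b == 1 else 1)
```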

Dennard Scaling (1974)

  • As transistors shrink, power density remains constant.
  • Voltage & current scale with length → faster switching at same power.
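The arithmetic behind "power density remains constant" is worth spelling out. Shrinking all dimensions by a factor k scales capacitance and voltage as 1/k and switching frequency as k, so dynamic power per transistor (P = C·V²·f) falls as 1/k², exactly cancelling the k² increase in transistors per unit area:

```python
# Dennard scaling sketch for a shrink factor k.
k = 2.0
cap, volt, freq = 1 / k, 1 / k, k          # C ~ 1/k, V ~ 1/k, f ~ k
power_per_transistor = cap * volt**2 * freq  # P = C * V^2 * f  ~  1/k^2
transistors_per_area = k**2
power_density = power_per_transistor * transistors_per_area
assert abs(power_density - 1.0) < 1e-12    # unchanged by the shrink
```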

Ceased Around 2005

  • Leakage currents (thin oxide layer) increased.
  • Power consumption & heat dissipation rose.
  • Result: Clock speed increases stopped.
  • Performance now scales via multi-core parallelism.

Era           Main drivers
1940s–1960s   Science, military, mainframes
1970s–1990s   Rise of personal computers
1990s–2010s   GPUs & gaming revolution
2000s–now     Mobile & cloud computing
2010s–???     Cryptocurrency mining

The Cost of HPC

Example: Archer2 (UK Tier 1 Supercomputer)

  • 28 PFLOPS (2020)
  • £48M for 4 years
  • Energy: ~25 GWh/year (~£6M)
  • Compute nodes: 85% of energy cost

Carbon footprint:

  • Hardware: 7,320 tons CO₂
  • Electricity (if non-renewable): +7,000 tons/year
  • Using renewable energy reduces this drastically.
  • Reducing clock speed (2.3 → 2.0 GHz):
    • Performance: 74–95%
    • Energy use: 80–93%

Towards Zettascale Computing

  • Goal: 10²¹ FLOPS (zettascale).
  • On current tech: ~21 GW (≈21 nuclear reactors).
  • Focus now on performance per watt and energy efficiency.
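As a sanity check on that power figure, performance-per-watt arithmetic using the El Capitan numbers above (2.8 exaFLOPS at ~30 MW) gives a somewhat lower bound; the ~21 GW estimate presumably assumes an earlier, less efficient hardware generation:

```python
# Scale El Capitan's efficiency up to a zettascale (1e21 FLOPS) machine.
flops_per_watt = 2.8e18 / 30e6         # ~9.3e10 FLOPS per watt
zetta_watts = 1e21 / flops_per_watt
print(f"{zetta_watts / 1e9:.1f} GW")   # ~10.7 GW even at today's best efficiency
```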

HPC Challenges

As systems scale, problems grow:

Challenge           Description
Efficiency          Using the available FLOPS effectively
Latency             Getting data where it is needed, on time
Pipelining          Keeping processor pipelines busy
Parallelism         Exploiting many cores at once
Load balance        Distributing work fairly across resources
Scalability         Adapting to the size of the machine
Energy efficiency   Controlling power usage
Memory capacity     Fitting the whole problem in memory
Data locality       Keeping data close to the CPU that needs it
Storage             Managing input/output
Reproducibility     Producing verifiable results
Reliability         Ensuring correctness of the computation
Visualization       Making outputs interpretable

CPU vs GPU

Feature    CPU                            GPU
Purpose    General-purpose tasks          High-throughput maths
Design     Fewer, more powerful cores     Thousands of simple cores
Memory     Access to main system RAM      Own on-board fast memory
Best for   Serial / control-heavy tasks   Parallel / matrix-heavy tasks

Programming Languages

  • Compiled (C, Fortran): Fast, efficient, low-level memory control.
  • Interpreted (Python): Slower, but flexible and easy to use.
  • Domain-specific Python libraries (NumPy, PyTorch, TensorFlow) can match or beat hand-written low-level code for certain workloads, because they dispatch to highly optimised compiled kernels.
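The interpreted-vs-compiled gap is easy to demonstrate without any external library: the same reduction run as an interpreted Python loop and via the built-in `sum()`, which executes in compiled C. The same principle is why NumPy and friends can outperform naive low-level code.

```python
import time

data = list(range(1_000_000))

# Interpreted: the loop body is dispatched by the interpreter each pass.
start = time.perf_counter()
total_loop = 0
for v in data:
    total_loop += v
loop_time = time.perf_counter() - start

# Compiled: sum() iterates in C, with no per-element interpreter overhead.
start = time.perf_counter()
total_builtin = sum(data)
builtin_time = time.perf_counter() - start

assert total_loop == total_builtin
print(f"loop: {loop_time:.3f}s  builtin: {builtin_time:.3f}s")
```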

Summary

  • HPC = complex ecosystem of hardware, software, and human optimization.
  • Single-core performance has stagnated.
  • Modern improvements = parallelism + efficiency.
  • Optimisation remains critical for effective scientific computing.