This lesson is in the early stages of development (Alpha version)

Introduction to High-Performance Computing: Glossary

Key Points

Why use a Cluster?
  • High-Performance Computing (HPC) typically involves connecting to very large computing systems elsewhere in the world.

  • These other systems can be used to do work that would be either impossible or much slower on smaller systems.

  • HPC resources are shared by multiple users.

  • The standard method of interacting with such systems is via a command line interface.

Connecting to a remote HPC system
  • An HPC system is a set of networked machines.

  • HPC systems typically provide login nodes and a set of worker (compute) nodes.

  • The resources found on independent (worker) nodes can vary in volume and type (amount of RAM, processor architecture, availability of network mounted filesystems, etc.).

  • Files saved on a shared filesystem from one node are available on all nodes.

Exploring Remote Resources
  • An HPC system is a set of networked machines.

  • HPC systems typically provide login nodes and a set of compute nodes.

  • The resources found on independent (worker) nodes can vary in volume and type (amount of RAM, processor architecture, availability of network mounted filesystems, etc.).

  • Files saved on shared storage are available on all nodes.

  • The login node is a shared machine: be considerate of other users.
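
One way to see how nodes differ is to print the same short report on each of them. The sketch below assumes a Linux node with the usual GNU tools (uname, nproc, df, awk); node_summary is a made-up helper name, not part of any HPC toolkit.

```shell
#!/usr/bin/env bash
# node_summary is a hypothetical helper name used only for this illustration.
node_summary() {
    echo "host:  $(uname -n)"
    echo "cores: $(nproc)"
    # /proc/meminfo exists on Linux only; skip the memory line elsewhere.
    if [ -r /proc/meminfo ]; then
        # MemTotal is reported in kB, so divide by 1024*1024 to get GiB.
        awk '/^MemTotal:/ {printf "mem:   %.1f GiB\n", $2 / 1048576}' /proc/meminfo
    fi
    # -P keeps each filesystem on one line; column 4 is the free space.
    df -Ph . | awk 'NR == 2 {print "disk:  " $4 " free on " $6}'
}

node_summary
```

Running it once on the login node and once inside a job lets you compare the two; the core count and memory will often differ.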

Scheduler Fundamentals
  • The scheduler handles how compute resources are shared between users.

  • A job is just a shell script.

  • Request slightly more resources than you will need.
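
To make "a job is just a shell script" concrete, here is a minimal sketch. It assumes the scheduler is Slurm, which this page does not actually specify; the job name and the requested resources are placeholder values. Because the #SBATCH lines are ordinary comments as far as bash is concerned, the same file can be run directly as a quick local check.

```shell
#!/usr/bin/env bash
# Write a minimal job script into a scratch directory.
# Assumption: the cluster uses Slurm, so requests go in #SBATCH lines.
workdir="$(mktemp -d)"
cat > "$workdir/example-job.sh" <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name=example-job
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --time=00:05:00

echo "This job is running on $(uname -n)"
EOF

# Local dry run: bash ignores the #SBATCH comments, no scheduler needed.
bash "$workdir/example-job.sh"

# On a Slurm cluster you would submit and monitor it instead:
#   sbatch example-job.sh
#   squeue -u "$USER"
```

The time and memory values are the kind of request the last key point refers to: a little above what the job is expected to use, so that it is not killed early but also does not wait in the queue for resources it will never touch.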

Accessing software via Modules
  • Load software with module load softwareName.

  • Unload software with module unload softwareName.

  • The module system handles software versioning and package conflicts for you automatically.
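
A typical module session might look like the sketch below. The module name python is a placeholder; the names and versions actually on offer vary from site to site, so check the output of module avail first. The guard at the top is only there so the snippet exits cleanly on a machine that has no module system installed.

```shell
#!/usr/bin/env bash
# demo_modules is a hypothetical wrapper; "python" is a placeholder module name.
demo_modules() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not available on this machine"
        return 0
    fi
    module avail            # list software that can be loaded
    module load python      # make the chosen software available in this shell
    module list             # show what is currently loaded
    module unload python    # remove it again
}

demo_modules
```

On many systems the module command is defined only in login or interactive shells, so the four commands are usually typed directly at the prompt or placed inside a job script.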

Transferring files with remote computers
  • wget and curl -O download a file from the internet.

  • scp and rsync transfer files to and from your computer.

  • You can use an SFTP client like FileZilla to transfer files through a GUI.
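
The commands above can be wrapped in small helper functions to show their usual argument order. This is only a sketch: the account yourUsername, the host cluster.example.org and the function names are all hypothetical, and nothing is transferred until one of the functions is called.

```shell
#!/usr/bin/env bash
REMOTE="yourUsername@cluster.example.org"   # hypothetical account and host

# Download one file from a URL into the current directory
# (wget "$1" does much the same job).
fetch() { curl -O "$1"; }

# Copy a local file into your home directory on the remote system.
upload() { scp "$1" "$REMOTE:"; }

# Copy a file from your remote home directory into the current directory.
download() { scp "$REMOTE:$1" .; }

# Copy a directory recursively, skipping files that are already up to date
# and showing progress as it goes.
sync_up() { rsync -avP "$1" "$REMOTE:"; }
```

For example, upload results.tar.gz sends one file, while sync_up results sends a whole directory and can be re-run after an interruption without starting from scratch.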

Running a parallel job
  • Parallel programming allows applications to take advantage of parallel hardware.

  • The queuing system facilitates executing parallel tasks.

  • Performance improvements from parallel execution do not scale linearly.
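
One common way to see why scaling is not linear is Amdahl's law, which this page does not mention by name: if a fraction p of the runtime can be parallelised, the best possible speedup on n cores is 1 / ((1 - p) + p / n). The sketch below evaluates that formula with awk; the helper name amdahl and the value p = 0.9 are chosen purely for illustration.

```shell
#!/usr/bin/env bash
# amdahl P N: idealised speedup on N cores when fraction P of the
# runtime can be parallelised (Amdahl's law; real codes usually do worse).
amdahl() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.2f\n", 1 / ((1 - p) + p / n) }'
}

for cores in 1 2 4 8 16; do
    echo "$cores cores: $(amdahl 0.9 "$cores")x"
done
```

With 90% of the work parallelisable, 16 cores give a speedup of only 6.40x under this model, because the remaining serial 10% takes the same time no matter how many cores are added.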

Using resources effectively
  • Accurate job scripts help the queuing system efficiently allocate shared resources.

Using shared resources responsibly
  • Be careful how you use the login node: it is shared with other users, so keep heavy computation off it.

  • Your data on the system is your responsibility.

  • Plan and test large data transfers.

  • It is often best to convert many files to a single archive file before transferring.
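
The last point can be sketched with tar. The example builds a throwaway directory of 100 small files as a stand-in for real results, packs it into one compressed archive, and lists the contents to check it. The remote account and host in the closing comments are hypothetical.

```shell
#!/usr/bin/env bash
# Build a scratch directory holding many small files (stand-in for results).
workdir="$(mktemp -d)"
mkdir "$workdir/results"
i=1
while [ "$i" -le 100 ]; do
    echo "sample $i" > "$workdir/results/run-$i.txt"
    i=$((i + 1))
done

# Pack the whole directory into a single gzip-compressed archive.
tar -czf "$workdir/results.tar.gz" -C "$workdir" results

# Count the archived .txt files without unpacking anything.
tar -tzf "$workdir/results.tar.gz" | grep -c '\.txt$'

# Then transfer the single file and unpack it on the other side:
#   scp "$workdir/results.tar.gz" yourUsername@cluster.example.org:
#   tar -xzf results.tar.gz
```

Transferring one archive avoids paying the per-file overhead a hundred times over, which is usually what makes copying many small files slow.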

Glossary

FIXME