The Linux side of Bragg is being replaced by the new Bracewell cluster.

The Bragg Windows HPC cluster will still be operational for the time being.

System Overview

The CSIRO Accelerator Cluster Bragg consists of the following components:

  • 128 dual 8-core Xeon E5-2650 compute nodes (a total of 2048 compute cores), each with 128 GB of RAM, 500 GB SATA storage and FDR10 InfiniBand interconnect
  • 384 Kepler Tesla K20 GPUs (a total of 950,976 CUDA cores)
  • 162-port FDR InfiniBand switch
  • Large shared NFS/Windows file systems

The cluster is supplied by Xenon Systems of Melbourne and is located in Canberra, Australia.

The Bragg cluster is currently (November 2014) at position 154 on the Top500 list of supercomputers.

Using the Cluster

The Accelerator cluster supports both Linux and Windows HPC environments.

See: Using Linux HPC systems, Using Windows HPC systems and Data Handling.

The Linux system's full external network name is:

bragg-gpu.hpc.csiro.au
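
From a terminal you would typically connect with ssh, for example (ident is a placeholder for your CSIRO ident):

ssh ident@bragg-gpu.hpc.csiro.au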

The Windows system's Remote Desktop login node network name is:

bragg-w.csiro.au

If you need to develop software for the GPUs, the bragg-w-test node is available with the same hardware as the GPU compute nodes. Use the Microsoft Remote Desktop utility and connect to:

bragg-w-test.csiro.au
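
For example, the standard Remote Desktop client can be launched from a Windows command prompt with:

mstsc /v:bragg-w-test.csiro.au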

File systems

The filesystem setup on the Accelerator cluster follows SC filesystem conventions.

The same filesystems are available from both the Windows and Linux environments, but environment variables exist only in the Linux environment. From outside the cluster, the filesystems can be accessed as network shares via \\braggxxxx\xxxx (xxxx = home, data or flush); from inside the cluster the same files are available over the faster internal network via \\braggxxxx.cluster\xxxx (xxxx = home, data or flush). Inside the cluster the filesystems are also mounted as c:\xxxx (xxxx = home, data or flush), which can be used in place of the full UNC path.
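
As a sketch of access from a Windows machine outside the cluster (yourident and the file path are placeholders), a share can be mapped to a drive letter or written to directly from a command prompt:

net use Z: \\braggdata\data\yourident
copy C:\work\results.csv \\braggdata\data\yourident\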

The table below lists the mapping between the Linux and Windows environments.

bragg-gpu variable name   bragg-w share (NEXUS)             bragg-w share (Internal to Cluster)
$HOME                     \\bragghome\home\yourident        c:\home\yourident
$FLUSH1DIR                \\braggflush1\flush1\yourident    c:\flush1\yourident
$FLUSH2DIR                \\braggflush2\flush2\yourident    c:\flush2\yourident
$DATADIR                  \\braggdata\data\yourident        c:\data\yourident
$TMPDIR / $LOCALDIR       not available                     not available
$MEMDIR                   not available                     not available
$STOREDIR *               \\hpc.csiro.au\yourident          \\hpc.csiro.au\yourident

* $STOREDIR is only available via an IO job; see below for details.

Even if you use the Windows environment, it may be useful to transfer files using the Linux environment and services. Checking your quota is also most easily done by logging into the Linux environment and running 'quota'.
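
For example, from your own machine you could push a file to your Bragg home directory and then check your quota (ident and the file name are placeholders):

scp results.tar.gz ident@bragg-gpu.hpc.csiro.au:~/    # copy a file into your home directory
ssh ident@bragg-gpu.hpc.csiro.au quota                # check your quota on the Linux side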

Up to 16 GB can be used for $MEMDIR on each node in the Linux environment.
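
As a sketch, a Linux job could stage a heavily re-read input file into the memory-backed $MEMDIR before running (my_program and the paths are placeholders):

cp $DATADIR/inputs/params.dat $MEMDIR/    # stage the input into memory-backed storage
./my_program $MEMDIR/params.dat           # read it from $MEMDIR during the run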

Copy files between Bragg and Ruby

Access to $STOREDIR on Bragg is currently only available via the pearcey-dm data mover node. This is because Bragg is located in Canberra while the data store is in Docklands, Victoria, and the cluster's current network design is not optimised for long-distance data access; a network issue over this link could cause nodes on Bragg to hang.

The recommended way to access the data mover node is by starting an interactive batch shell via a command like this:

sinteractive -p io

See "sinteractive --help" for other options.

In the current setup, the data mover node, pearcey-dm, is used by both Bragg and Pearcey, so when you run an IO job on Bragg you will be given access to pearcey-dm. Note that $HOME on Pearcey is different from $HOME on Bragg, while $DATADIR, $FLUSH1DIR and $FLUSH2DIR are shared across the two clusters.

From within the interactive shell you can then copy your files from and to $STOREDIR. If you need access to multiple offline files, pre-fetch them with the "dmget" command on pearcey-dm (or on ruby); this will save you a lot of time by recalling them concurrently as a group rather than consecutively, one at a time. See DMF - dmget command - recall files.
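
A typical IO session might look like this (the file paths are placeholders):

sinteractive -p io                                    # request an interactive shell on the data mover node
dmget $STOREDIR/myproject/run1/*.nc                   # recall the offline files as a group
cp $STOREDIR/myproject/run1/*.nc $DATADIR/myproject/  # copy the recalled files to $DATADIR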

Batch System

On bragg-gpu, you can see which queues are available via the general information on the Job queues page.
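
Assuming the scheduler is SLURM (which the sinteractive command suggests), a summary of the available partitions can also be listed directly with:

sinfo -s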
