
UBELIX 101


Description

This page provides a general overview of the UBELIX cluster. It describes the different components of the cluster that you will interact with. Subsequent pages in UBELIX 101 take you on a tour of important topics and provide the information you need to get up and running with UBELIX. You will learn how to log in to the cluster, how to move files between the cluster and your local workstation, and how to submit your first job. If you have never worked on a shared cluster and/or never used Slurm before, this page and the subsequent pages are your starting point.

Some remarks about Linux:

UBELIX is a Linux cluster, meaning that all nodes in the cluster run Linux (CentOS). We assume that you have at least a basic knowledge of how to get things done on a Linux operating system. This includes using a text editor, manipulating files and writing simple bash scripts. While we have tutorials on selected topics, we will not provide a crash course in Linux; instead, we provide links to useful resources.


Introduction

UBELIX (University of Bern Linux Cluster) is an HPC cluster that consists of about 266 compute nodes/4'408 cores and a software-defined storage infrastructure providing ~580 TB of net disk storage. Compute nodes, front-end servers and the storage are interconnected through a high-speed InfiniBand network. The front-end servers also provide the link to the outside world. UBELIX is used for scientific research by various institutes and research groups in chemistry, biology, physics, astronomy, computer science, geography, medical radiology and other fields, as well as by students working on their theses.

UBELIX System Overview


Login Server AKA Frontend Server AKA Submit Server

A user connects to the cluster by logging in to the submit host via SSH. You can use this host for light- to medium-weight tasks, e.g. to edit files or to compile programs. Resource-demanding/high-performance tasks must be submitted to the batch-queueing system as jobs and will eventually run on one or more compute nodes. Even long-running compile tasks are better submitted as a job to a compute node than run on the submit host.
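As a sketch of the typical workflow, the commands below show how you might connect and copy a file from your local workstation. The hostname and account are placeholders: replace <submit-host> with the submit host name given in the login instructions and <user> with your Campus Account username.

    # open an SSH session on the UBELIX submit host
    ssh <user>@<submit-host>

    # copy a local file to your home directory on the cluster
    scp ./myjob.sh <user>@<submit-host>:~/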

Batch-Queueing System 

On UBELIX we use the open-source batch-queueing system Slurm for executing jobs on a pool of cooperating compute nodes. Slurm manages the distributed resources provided by the compute nodes and is responsible for accepting, scheduling, dispatching, and managing the remote and distributed execution of sequential, parallel or interactive user jobs.
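As a first taste of how jobs are handed to Slurm, the sketch below shows a minimal batch script; the resource values are illustrative only, and the job simply prints the name of the compute node it runs on.

    #!/bin/bash
    # minimal Slurm batch script (illustrative resource values)
    #SBATCH --job-name=hello          # name shown in the queue
    #SBATCH --partition=all           # default partition (see partition table below)
    #SBATCH --time=00:10:00           # wall clock limit (hh:mm:ss)
    #SBATCH --mem-per-cpu=1G          # memory per allocated CPU
    #SBATCH --cpus-per-task=1         # CPUs for this task

    # the actual workload: print the compute node the job landed on
    srun hostname

Submit the script with "sbatch hello.sh" and check its state with "squeue -u $USER"; later pages of UBELIX 101 cover these commands in detail.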

Compute Nodes

Compute nodes are the machines that ultimately execute user jobs. After you submit a job to the cluster, the scheduler checks the job's resource requirements and dispatches it to one or more compute nodes that can fulfill those requirements at that time. The following table lists the hardware details of the different compute node classes:

Class           #Nodes   CPU Type                              #Cores   RAM                   Local Scratch
anodes          108      Intel Xeon CPU E5-2630 v4 @ 2.0GHz    20       125GB                 850GB
enodes          28       Intel Xeon CPU X5550 @ 2.67GHz,       8        30GB                  100GB
                         Intel Xeon CPU E5620 @ 2.40GHz
fnodes          28       Intel Xeon CPU X5680 @ 3.33GHz,       12       46GB                  250GB
                         Intel Xeon CPU X5650 @ 2.67GHz,
                         Intel Xeon CPU E5649 @ 2.53GHz
hnodes[01-42]   42       Intel Xeon CPU E5-2665 0 @ 2.40GHz    16       78GB                  250GB
hnodes[43-49]   7        Intel Xeon CPU E5-2695 v2 @ 2.40GHz   24       94GB                  500GB
jnodes          21       Intel Xeon CPU E5-2665 0 @ 2.40GHz    16       252GB                 500GB
knodes          36       Intel Xeon CPU E5-2650 v2 @ 2.60GHz   16       125GB                 850GB
knlnodes        4        Intel Xeon Phi CPU 7210 @ 1.30GHz     64       108GB + 16GB on CPU   170GB (SSD)
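If you want to verify these figures on the cluster itself, Slurm can report them directly. The sketch below uses standard Slurm query commands; <nodename> is a placeholder for an actual node name.

    # per-partition overview: partition, node count, CPUs per node, memory per node (in MB)
    sinfo -o "%P %D %c %m"

    # full hardware description of a single node
    scontrol show node <nodename>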

Cluster Partitions (Queues) 

A partition is a container for a class of jobs. Choose a partition according to your job's requirements. UBELIX provides four partitions, shown in the following table:

Partition   Max runtime (wall clock)   Max memory per node   Max cores/threads per node (shared-memory jobs)    Compute nodes
all         96h                        252 GB                20 cores | 1 thread per core = 20 "slurm cpus"     anodes, enodes, fnodes, hnodes[01-34], jnodes, knodes
empi        24h                        125 GB                20 cores | 1 thread per core = 20 "slurm cpus"     anodes
long 1)     360h                       94 GB                 24 cores | 1 thread per core = 24 "slurm cpus"     hnodes[43-49]
phi 2)      24h                        108 GB                64 cores | 4 threads per core = 256 "slurm cpus"   knlnodes[01-04]

1) Due to the limited resources and the potentially long job runtimes, access to the long partition must be requested explicitly once.
2) The phi partition is currently open to all users. Use it only with code that can benefit from the Xeon Phi architecture.

The all partition is the default partition if you do not specify one explicitly.
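To use a non-default partition, request it explicitly when submitting, either on the command line or with a directive inside the batch script. This is a sketch; it assumes you have been granted access where noted above.

    # submit to the long partition from the command line
    sbatch --partition=long myjob.sh

    # or equivalently, inside the batch script
    #SBATCH --partition=long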

Storage Infrastructure

A modular, software-defined storage system (IBM Spectrum Scale) provides a shared, parallel file system that is mounted on all frontend servers and compute nodes. UBELIX also provides a limited amount of space on the Campus Storage. The different storage locations are summarized in the table below. For more information, see the page on the storage infrastructure.

Path                           Connection   Availability   Backup   Quota
/home/ubelix/<group>/<user>    Network      global         no       yes 1)
/home/storage/<group>/<user>   Network      submit host    yes      yes 2)

1) Default: 3TB/user, 15TB/group

2) Default: 50GB/user
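To keep an eye on your quota, you can check how much space your data occupies. The paths follow the table above; <group> and <user> are placeholders for your own group and user name.

    # summarize the space used in your UBELIX home directory
    du -sh /home/ubelix/<group>/<user>

    # summarize the space used on the Campus Storage (accessible from the submit host only)
    du -sh /home/storage/<group>/<user>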
