Clusters at CÉCI
The aim of the Consortium is to provide researchers with access to powerful computing equipment (clusters). Clusters are installed and managed locally at the different sites of the universities taking part in the Consortium, but they are accessible by all researchers from the member universities. A single login/passphrase is used to access all clusters through SSH.
All of them run Linux and use Slurm as the job manager. Basic parallel computing libraries (OpenMP, MPI, etc.) are installed, as well as optimized computing subroutine libraries (BLAS, LAPACK, etc.). Common interpreters such as R, Octave and Python are also installed. See each cluster's FAQ for more details.
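Once a CÉCI account has been created and the private key installed, connecting to any cluster follows the same pattern. The login name and gateway host below are placeholders; the exact front-end names are listed in the per-cluster sections further down.
# connect with the CÉCI private key; "myuser" is a placeholder login
ssh -i ~/.ssh/id_rsa.ceci myuser@nic5.uliege.be
# some front-ends are only reachable through a gateway; with OpenSSH this can be
# done with ProxyJump (the gateway name here is purely illustrative)
ssh -i ~/.ssh/id_rsa.ceci -J myuser@gateway.example.org myuser@lemaitre4.cism.ucl.ac.be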
Cluster | Host | CPU type | CPU count* | RAM/node | Network | Filesystem** | Accelerator | Max time | Preferred jobs*** |
---|---|---|---|---|---|---|---|---|---|
Lemaitre4 | UCLouvain | Genoa 2.4 GHz | 5120 (40 x 128) | 766GB | HDR Ib | BeeGFS 320 TB | None | 2 days | MPI |
NIC5 | ULiège | Rome 2.9 GHz | 4672 (73 x 64) | 256 GB..1 TB | HDR Ib | BeeGFS 520 TB | None | 2 days | MPI |
Hercules2 | UNamur | Naples 2 GHz | 1024 (28 x 32 + 2 x 64) | 256 GB..2 TB | 10 GbE | NFS 80 TB | 8x NVidia A40, 4x NVidia A6000 | 15 days | serial / SMP |
Dragon2 | UMons | SkyLake 2.60 GHz | 592 (17 x 32 + 2 x 24) | 192..384 GB | 10 GbE | RAID0 3.3 TB | 4x Volta V100 | 21 days | serial / SMP |
Dragon1 | UMons | SandyBridge 2.60 GHz | 416 (26 x 16) + 32 (2 x 16) | 128 GB | GbE | RAID0 1.1 TB | 4x Tesla C2075, 4x Tesla Kepler K20m | 41 days | serial / SMP |
Lemaitre3* | UCL | SkyLake 2.3 GHz / Haswell 2.6 GHz | 1872 (78 x 24) / 112 (4 x 28) | 95 GB / 64 GB | Omnipath | BeeGFS 440 TB | None | 2 days / 6 hours | MPI |
NIC4* | ULiège | SandyBridge 2.0 GHz / IvyBridge 2.0 GHz | 2048 (120 x 16 + 8 x 16) | 64 GB | QDR Ib | FHGFS 144 TB | None | 3 days | MPI |
Vega* | ULB | Bulldozer 2.1 GHz | 896 (14 x 64) | 256 GB | QDR Ib | GPFS 70 TB | None | 14 days | serial / SMP / MPI |
Hercules* | UNamur | SandyBridge 2.20 GHz | 512 (32 x 16) | 64..128 GB | GbE | NFS 20 TB | None | 63 days | serial / SMP |
Lemaitre2* | UCL | Westmere 2.53 GHz | 1380 (115 x 12) | 48 GB | QDR Ib | Lustre 120 TB | 3x Quadro Q4000 | 3 days | MPI |
Hmem* | UCL | MagnyCours 2.2 GHz | 816 (17 x 48) | 128..512 GB | QDR Ib | FHGFS 30 TB | None | 15 days | SMP |
Tier-1 Cluster
The Consortium also provides users with access to Tier-1 facilities, which are not operated by the universities. How to get access
Cluster | Host | CPU type | CPU count* | RAM/node | Network | Filesystem** | Accelerator | Max time | Preferred jobs*** |
---|---|---|---|---|---|---|---|---|---|
Lucia | Cenaero | Milan 2.45 GHz / Milan 2.6 GHz | 38400 (300 x 128) / 1600 (50 x 32) | 241 GB | HDR Ib | GPFS 3.2 PB | None / 200 (50 x 4) Tesla A100 | 48 hours | MPI / GPU |
Zenobe | Cenaero | Haswell 2.50 GHz / IvyBridge 2.7 GHz | 5760 (240 x 24) / 8208 (342 x 24) | 64..256 GB / 64 GB | QDR Ib / FDR Ib + QDR Ib | GPFS 350 TB | t.b.a. | 24 hours | MPI |
CÉCI clusters capabilities comparison
[Polar plots comparing the capabilities of Lemaitre4, Dragon2, NIC5 and Hercules2]
The CÉCI clusters have been designed to accommodate the large diversity of workloads and needs of the researchers from the five universities. The polar plots above (also known as spider plots) represent the capabilities of the CÉCI clusters. At one end of the spectrum is the sequential workload. That type of workload needs very fast CPUs, accelerators, and often a large maximum job time (several weeks, or months!), which requires limiting the number of jobs a user can run simultaneously to allow a fair sharing of the cluster. At the other end is the massively parallel workload. For such workloads, individual core performance is less crucial, as long as many cores are available. A job is allowed to use a very large number of CPUs, but only for a limited period of time (a few days at most) to ensure a fair sharing of the cluster. Parallel workloads, of course, also require a fast, low-latency network and a large parallel filesystem. Finally, some workloads need huge amounts of memory, be it RAM or local disk. Such workloads often also need many CPUs on the same node to take advantage of the large available memory (so-called "fat nodes").
The clusters have been installed gradually since early 2011, first at UCL, with HMEM being a proof of concept. At that time, the whole account infrastructure was designed and deployed so that every researcher from any member university was able to create an account and log in to HMEM. Then, LEMAITRE2 was set up as the first cluster entirely funded by the F.N.R.S. for the CÉCI. DRAGON1, HERCULES, VEGA and NIC4 followed, in that order, as shown in the timeline below.
Common storage
We provide a central storage solution which is visible from all the frontends and compute nodes of all CÉCI clusters. This system is deployed on a private, dedicated, fast (10 Gbps) network connecting all CÉCI sites. To move to your personal share on this common storage, simply run
cd $CECIHOME
from any of the CÉCI clusters. As that common share is mounted on all of them, each file you copy there will be accessible from any CÉCI cluster.
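For instance (the file name is illustrative), a file copied to the common storage from one cluster is immediately visible on the others:
# on NIC5: copy a result archive to the common CÉCI storage
cp results.tar.gz $CECIHOME/
# later, on Lemaitre4: the same file is available at the same path
ls -lh $CECIHOME/results.tar.gz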
Please take a careful look at the documentation to learn about the other shares for fast transfer of big files between clusters and for group projects.
Lemaitre4
Hosted at UCLouvain (CISM), this cluster consists of more than 5,000 AMD Epyc Genoa cores at 3.7 GHz. All the nodes are interconnected with a 100 Gbps Infiniband HDR network. The compute nodes have access to a 320 TB fast BeeGFS /scratch space.
Suitable for:
MPI Parallel jobs (several dozens of cores) with many communications and/or a lot of parallel disk I/O, and SMP/OpenMP parallel jobs; 2 days max.
Resources
- Home directory (100 GB quota per user)
- Global working directory /scratch ($GLOBALSCRATCH)
- Node local working directory $LOCALSCRATCH dynamically defined in jobs
- default batch queue*
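As an illustration, a minimal Slurm submission script for an MPI job using the resources above could look like the following sketch; the module name, memory request and executable are placeholders to be adapted.
#!/bin/bash
#SBATCH --job-name=mpi_case
#SBATCH --ntasks=128              # number of MPI ranks
#SBATCH --mem-per-cpu=4000M       # illustrative per-core memory request
#SBATCH --time=2-00:00:00         # 2 days, the maximum on this cluster

module load OpenMPI               # placeholder; check 'module avail' for the exact name
cd $GLOBALSCRATCH/my_case         # run from the parallel /scratch filesystem
srun ./my_mpi_program             # srun starts one process per allocated task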
Access/Support:
SSH to lemaitre4.cism.ucl.ac.be (port 22) through your university gateway, with the appropriate login and id_rsa.ceci file.
SUPPORT: CISM
Server SSH key fingerprint: (What's this?)
- ECDSA:
SHA256:krYWLlE32ygG0u8uYbXUNBRTpbxDoDVyCvg3B1zLvGQ
- ED25519:
SHA256:mWlgUkE+tBNbklXLgvrt7pL/3Ohn7uidqFfBUU0fSkQ
- RSA:
SHA256:NIhjzqQgxgkG7K1x4kqoFnNSGrbc9b8AUG8+JT68jg4
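On the first connection, the fingerprint displayed by the SSH client can be compared with the values above; it can also be queried explicitly with standard OpenSSH tools (assuming the front-end is reachable from where the command is run):
# display the ED25519 host key fingerprint of the front-end
ssh-keygen -lf <(ssh-keyscan -t ed25519 lemaitre4.cism.ucl.ac.be 2>/dev/null)
# the printed SHA256 value should match the ED25519 fingerprint listed above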


NIC5
Hosted at the University of Liège (SEGI), this cluster consists of 4672 cores spread across 73 compute nodes, each with two 32-core AMD Epyc Rome 7542 CPUs at 2.9 GHz. The default partition holds 70 nodes with 256 GB of RAM, and a second "hmem" partition with 3 nodes with 1 TB of RAM is also available. All the nodes are interconnected with 100 Gbps Infiniband HDR (blocking factor 1.2:1). The compute nodes have access to a 520 TB fast BeeGFS /scratch space.
Suitable for:
MPI Parallel jobs (several dozens of cores) with many communications and/or a lot of parallel disk I/O, and SMP/OpenMP parallel jobs; 2 days max.
Resources
- Home directory (100 GB quota per user)
- Global working directory /scratch ($GLOBALSCRATCH)
- Node local working directory $LOCALSCRATCH dynamically defined in jobs
- default batch queue* (Max 2 days, 256GB of RAM nodes)
- hmem queue* (Max 2 days, 1 TB RAM nodes, only for jobs that cannot run on the 256 GB nodes; see the example below)
- Max 320 cpus per user
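As a minimal sketch (memory value and executable are illustrative), a job that genuinely needs more than 256 GB on a node can request the hmem partition explicitly:
#!/bin/bash
#SBATCH --partition=hmem          # 1 TB nodes, only for jobs that cannot fit in 256 GB
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=500G                # total memory requested on the node (illustrative)
#SBATCH --time=2-00:00:00         # 2 days maximum

./my_large_memory_program         # placeholder executable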
Access/Support:
SSH to nic5.uliege.be (port 22) through your university gateway, with the appropriate login and id_rsa.ceci file.
FAQ: https://www.campus.uliege.be/nic5
SUPPORT: CECI support form
Server SSH key fingerprint: (What's this?)
- ECDSA:
SHA256:xKYPziAtsf0FwtIYYa3NDL1ibZGbhUCf9B5A8p0MR30
- ED25519:
SHA256:27uhpA+zocCxLayg5g1ogej/6zJnx3kLNOftg1IOXpE
- RSA:
SHA256:oHCr1TlkQb+4Sjq/9wzBmsd8v2QfP9jJJRO+L2284gU


HERCULES2
Hosted at the University of Namur, this system currently consists of 1536 cores spread across 30 AMD Epyc Naples and 32 Intel Sandy Bridge compute nodes. The AMD group comprises 24 nodes with a single 32-core AMD Epyc 7551P CPU at 2.0 GHz and 256 GB of RAM, 4 nodes with the same CPU and 512 GB of RAM, and 2 nodes with dual 32-core AMD Epyc 7501 CPUs at 2.0 GHz and 2 TB of RAM. The Intel nodes have dual 8-core Xeon E5-2660 CPUs at 2.2 GHz and 64 or 128 GB of RAM (8 nodes). All the nodes are interconnected by a 10 Gigabit Ethernet network and have access to three NFS file systems with a total capacity of 100 TB.
Suitable for:
Shared-memory parallel jobs (OpenMP or Pthreads), or resource-intensive sequential jobs, especially those requiring a lot of memory.
Resources
- Home directory (200 GB quota per user)
- Working directory /workdir (400 GB per user) ($WORKDIR)
- Local working directory /scratch ($LOCALSCRATCH) dynamically defined in jobs
- Nodes have access to internet
- default queue* (Max 15 days)
- hmem queue* (at least 64GB per core, Max 15 days)
- Max 128 cpus/user on all partitions
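A minimal sketch of a shared-memory (OpenMP) submission script matching these limits, with a placeholder executable:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32        # one full 32-core AMD Epyc node
#SBATCH --mem-per-cpu=7G          # illustrative; stays within a 256 GB node
#SBATCH --time=15-00:00:00        # 15 days maximum

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_openmp_program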
Access/Support:
SSH to hercules2.ptci.unamur.be (port 22) with the appropriate login and id_rsa.ceci file.
SUPPORT: ptci.support@unamur.be
Server SSH key fingerprint: (What's this?)
MD5:66:50:e1:67:91:d8:17:1e:b7:be:48:00:e2:2c:7a:9f
- ED25519:
SHA256:fHuc0Y+QuAZW2FrI9NXrfDt2CeDmVWD6wHeDW4I3ztw
- RSA:
SHA256:SyLaaBe7CuO7Dpa6vJa0vbAUxnYSpl30xaJo5yBF//c


DRAGON2
Hosted at the University of Mons, this cluster is made of 17 compute nodes, each with two 16-core Intel Skylake Xeon 6142 processors at 2.6 GHz; 15 nodes have 192 GB of RAM and 2 have 384 GB, all of them with 3.3 TB of local scratch disk space. Two additional nodes, each with two 12-core Intel Skylake Xeon 6126 processors at 2.6 GHz, host two high-end NVidia Tesla V100 GPUs each (5120 CUDA cores/16 GB HBM2/7.5 TFlops double precision). The compute nodes are interconnected with a 10 Gigabit Ethernet network.
Suitable for:
Long (max. 21 days) shared-memory parallel jobs (OpenMP or Pthreads), or resource-intensive (cpu speed and memory) sequential jobs.
Resources
- Home directory (40GB quota per user)
- Local working directory $LOCALSCRATCH (/scratch)
- Global working directory $GLOBALSCRATCH (/globalscratch)
- No internet access from nodes
- long queue* (Max 21 days, 48 cpus/user)
- gpu queue* (Max 5 days, 24 cpus/user, 1 gpu/user; see the example script below)
- debug queue* (Max 30 minutes, 48 cpus/user)
- Generic resource*: gpu
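A minimal sketch of a submission script for the gpu queue; the partition and module names are placeholders to be checked against the cluster documentation:
#!/bin/bash
#SBATCH --partition=gpu           # the gpu queue mentioned above (name assumed)
#SBATCH --gres=gpu:1              # at most one GPU per user
#SBATCH --cpus-per-task=12
#SBATCH --time=5-00:00:00         # 5 days maximum on the gpu queue

module load CUDA                  # placeholder; check 'module avail'
./my_cuda_program                 # placeholder executable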
Access/Support:
SSH to dragon2.umons.ac.be (port 22) with the appropriate login and id_rsa.ceci file.
SUPPORT: CECI Support form
Server SSH key fingerprint: (What's this?)
MD5:0e:a7:21:df:a5:a0:27:6c:47:ba:61:57:76:d0:82:ad
SHA256:LEX1JwKes2Sg1P+95Ymf+uwwrVyZaEjUMts5xejtW9A


LEMAITRE3
This cluster was decommissioned in 2024.
Lemaitre3 was installed to replace Lemaitre2. It is hosted at Université catholique de Louvain (CISM). It features 78 compute nodes with two 12-core Intel SkyLake 5118 processors at 2.3 GHz and 95 GB of RAM (3970 MB/core), interconnected with an OmniPath network (OPA-56Gbps), and having exclusive access to a fast 440 TB BeeGFS parallel filesystem.
Suitable for:
Massively parallel jobs (MPI, several dozens of cores) with many communications and/or a lot of parallel disk I/O, 2 days max.
Resources
- Home directory (100G quota per user)
- Working directory /scratch ($GLOBALSCRATCH)
- Nodes have access to internet
- Max 100 running jobs per user
- Default queue* (max 2 days walltime per job, SkyLake processors) and debug queue (max 6 hours, Haswell processors)
Access/Support:
SSH to lemaitre3.cism.ucl.ac.be (port 22) with the appropriate login and id_rsa.ceci file.
SUPPORT: egs-cism@listes.uclouvain.be
Server SSH key fingerprints: (What's this?)
- ECDSA:
SHA256:1Z6M2WISLylvdH9gD8vHqJ9Z7bCDdJ03avlEXO9BKsc
- ED25519:
SHA256:63mf1cm89YoPvZnpVnUXn4JjNiIpafSCfuXG+Z/LzrI
- RSA:
SHA256:eWHb7N10/Wn+sdG2ED8NqudyZ2kcWTiR33BCq2PKD7Y


NIC4
New CÉCI accounts are no longer created on NIC4, and existing accounts are no longer automatically renewed. Existing users are strongly encouraged to back up their important data (/home and /scratch), delete unneeded files, and migrate to NIC5.
Hosted at the University of Liège (SEGI facility), it features 128 compute nodes with two 8-core Intel E5-2650 processors at 2.0 GHz and 64 GB of RAM (4 GB/core), interconnected with a QDR Infiniband network, and having exclusive access to a fast 144 TB FHGFS parallel filesystem.
Suitable for:
Massively parallel jobs (MPI, several dozens of cores) with many communications and/or a lot of parallel disk I/O, 3 days max.
Resources
- Home directory (20 GB quota per user)
- Working directory /scratch ($GLOBALSCRATCH)
- Nodes have access to internet
- Default queue* (3 days, 448 cores max per user, 64 jobs max per user, among which max 32 running, 256 CPUs max per job)
Access/Support:
SSH to login-nic4.segi.ulg.ac.be (port 22) from your CECI gateway with the appropriate login and id_rsa.ceci file.
FAQ: https://www.campus.uliege.be/nic4
SUPPORT: CECI support form
Server SSH key fingerprint: (What's this?)
MD5:94:6c:d6:cc:f8:ca:b2:d0:79:38:3c:e9:d3:e3:a7:6f
SHA256:5mQYQTjeW1XVYDFhIfMaGyFEJiTen56r2Kyz5ocj72I


VEGA
This cluster was decommissioned in October 2020.
Hosted at the University of Brussels, it features 14 fat compute nodes with 64 cores (four 16-core AMD Bulldozer 6272 processors at 2.1 GHz) and 256 GB of RAM, interconnected with a QDR Infiniband network, and 70 TB of high-performance GPFS storage.
Suitable for:
Many-cores (SMP and MPI) and many single core jobs, 14 days max.
Resources
- Home/Working directory /home ($GLOBALSCRATCH=$HOME, 200GB quota)
- Nodes have access to internet
- Def queue* (Max 14 days, 400 cpus/user, 350 running jobs/user, 1000 jobs in queue per user)


HERCULES
This cluster was decommissioned in August 2019.
Hosted at the University of Namur, this system currently consists of 512 cores spread across 32 Intel Sandy Bridge compute nodes, each with two 8-core E5-2660 processors at 2.2 GHz and 64 or 128 GB of RAM (8 nodes). All the nodes are interconnected by a Gigabit Ethernet network and have access to three NFS file systems for a total capacity of 100 TB.
Suitable for:
Long (max. 63 days) shared-memory parallel jobs (OpenMP or Pthreads), or resource-intensive sequential jobs.
Resources
- Home directory (200 GB quota per user)
- Working directory /workdir (400 GB per user) ($WORKDIR)
- Local working directory /scratch ($TMPDIR) dynamically defined in jobs
- No internet access from nodes
- cpu queue* (Max 63 days, 48 cpus/user)


DRAGON1
Hosted at the University of Mons, this cluster is made of 28 compute nodes: 26 nodes with two 8-core Intel Sandy Bridge E5-2670 processors at 2.6 GHz and 2 nodes with two 8-core Intel Sandy Bridge E5-2650 processors at 2.0 GHz, each node with 128 GB of RAM and 1.1 TB of local scratch disk space. The compute nodes are interconnected with a Gigabit Ethernet network (10 Gigabit for the 36 TB NFS file server). Two of the compute nodes have 2 Tesla M2075 GPUs each (512 GFlops float64) and two others have 2 Tesla Kepler K20m GPUs each (1.1 TFlops float64).
Suitable for:
Long (max. 41 days) shared-memory parallel jobs (OpenMP or Pthreads), or resource-intensive (cpu speed and memory) sequential jobs.
Resources
- Home directory (20GB quota per user)
- Local working directory /scratch ($LOCALSCRATCH)
- No internet access from nodes
- Long queue*: long (Max 41 days, 40 cpus/user, 500 jobs/user)
- Def queue*: batch (Max 5 days, 40 cpus/user, 500 jobs/user)
- Generic resource*: gpu (Max 15 days, gres=gpu:kepler:1 or gres=gpu:tesla:1)
- Generic resource*: lgpu (Max 21 days gres=gpu:1)
Access/Support:
SSH to dragon1.umons.ac.be (port 22) with the appropriate login and id_rsa.ceci file.
FAQ: http://dragon1.umons.ac.be/
SUPPORT: CECI Support form
Server SSH key fingerprint: (What's this?)
MD5: 2e:98:38:cf:99:68:89:2c:1f:6a:0e:19:fb:3b:02:d1
SHA256: dbPE5/40W2M7mF7B+pc4pSo00/bqYwuv4QycU5yv+IQ


LEMAITRE2
This cluster was decommissioned in July 2018.
Hosted at Université catholique de Louvain, it comprises 112 compute nodes with two 6-core Intel E5649 processors at 2.53 GHz and 48 GB of RAM (4 GB/core). The cluster has exclusive access to a fast 120 TB Lustre parallel filesystem. All compute nodes and management (NFS, Lustre, frontend, etc.) are interconnected with a fast QDR Infiniband network.
Suitable for:
Massively parallel jobs (MPI, several dozens of cores) with many communications and/or a lot of parallel disk I/O, 3 days max.
Resources


HMEM
This cluster was decommissioned in July 2020.
Hosted at the Université catholique de Louvain, it mainly comprises 12 fat nodes with 48 cores (four 12-core AMD Opteron 6174 processors at 2.2 GHz). 2 nodes have 512 GB of RAM, 7 nodes have 256 GB and 3 nodes have 128 GB. All the nodes are interconnected with a fast Infiniband QDR network and have a 1.7 TB fast RAID setup for scratch disk space. All the local disks are furthermore gathered in a global 12 TB BeeGFS filesystem.
Suitable for:
Large shared-memory jobs (100+GB of RAM and 24+ cores), 15 days max.
Resources


LUCIA
Hosted at, and operated by, Cenaero, it features a total of 38,400 cores (AMD Milan) with up to 512 GB of RAM and 200 NVidia Tesla A100 GPUs, interconnected with an HDR Infiniband network, and having access to a fast 2.5 PB GPFS (Spectrum Scale) parallel filesystem.
Suitable for:
Massively parallel jobs (MPI, several hundred cores) with many communications and/or a lot of parallel disk I/O, 2 days max.
Resources
- Home directory (200 GB quota per user)
- Working directory /gpfs/scratch
- Project directory /gpfs/projects
- Batch queue + GPU queue (whole node allocation)
Access/Support:
SSH to frontal.lucia.cenaero.be (port 22), from a CÉCI SSH gateway, with the appropriate login and id_rsa.ceci file.
ABOUT: tier1.cenaero.be
DOC: https://doc.lucia.cenaero.be/overview/
GETTING ACCESS: FAQ
CREATE A TIER-1 PROJECT: How to create a Tier-1 project
SUPPORT: https://support.lucia.cenaero.be
Server SSH key fingerprint: (What's this?)
ED25519: SHA256:iO2HH1V1uHUGMEEj2yvSx2TfVUNhUwqdtqdIi31jxEA
ECDSA: SHA256:a5Zv6m0RJsJR4CLDmva2RrUWQea+aUC3/RWyeLYJPdg


ZENOBE
Hosted at, and operated by, Cenaero, it features a total of 13,536 cores (Haswell and IvyBridge) with up to 64 GB of RAM, interconnected with a mixed QDR/FDR Infiniband network, and having access to a fast 350 TB GPFS parallel filesystem.
Suitable for:
Massively parallel jobs (MPI, several hundred cores) with many communications and/or a lot of parallel disk I/O, 1 day max.
Resources
- Home directory (50 GB quota per user)
- Working directory /SCRATCH
- Project directory /projects
- Large queue (1 day max walltime, 96 CPUs minimum and 4320 CPUs maximum per job, whole node allocation)
- Default queue (no time limit but jobs must be restartable)
Access/Support:
SSH to zenobe.hpc.cenaero.be (port 22) with the appropriate login and id_rsa.ceci file.
QUICKSTART: www.ceci-hpc.be/zenobe.html
DOC: tier1.cenaero.be/en/faq-page
ABOUT: tier1.cenaero.be
SUPPORT: it@cenaero.be
Server SSH key fingerprint: (What's this?)
MD5: 47:b1:ab:3a:f7:76:48:05:44:d9:15:f7:2b:42:b7:30
SHA256: 8shVbcnKHt861M4Duwcxpgug6l8mjj+KZu/lmYyYgpY

