Carme-Docu

Documentation Project for Carme

Carme Multi-Node and Multi-GPU Jobs

CARME currently supports several engines that can be used for multi-node or multi-GPU jobs. Within the Singularity containers, users have access to ssh (depending on the rules set by the system administrator) and therefore to a large variety of frameworks for distributed multi-node and/or multi-GPU training.

Among the frameworks that can be used within CARME are

How to get the nodes of a running job

echo $CARME_NODES

See also Carme ENVs.
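
Distributed frameworks usually want the node list one hostname per line rather than as a single string. The following is a minimal sketch of how the value of `CARME_NODES` could be split up. It assumes that the variable holds hostnames separated by commas and/or spaces, which this page does not confirm, so check the actual format on your system first. The helper name `list_carme_nodes` is hypothetical and not part of CARME.

```shell
# Assumption: CARME_NODES holds hostnames separated by commas and/or spaces.
# list_carme_nodes is a hypothetical helper, not something CARME provides.
list_carme_nodes() {
    # Turn commas into spaces, put each hostname on its own line,
    # and drop any empty lines that remain.
    printf '%s\n' "${CARME_NODES:-}" | tr ',' ' ' | tr -s ' ' '\n' | sed '/^$/d'
}

# Print one hostname per line, followed by the number of nodes.
list_carme_nodes
echo "number of nodes: $(list_carme_nodes | wc -l | tr -d ' ')"
```

The output of `list_carme_nodes` can be redirected into a file if a framework expects a hostfile, for example `list_carme_nodes > hostfile.txt`.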