Carme-Docu

Documentation Project for Carme

Logging

So far CARME provides different types of logging - for the proxy, the web frontend and SLURM (at different levels). Depending on which logs we are talking about, some folders and additional files have to be prepared on the respective nodes.

Proxy and Webfrontend

The login node is the node that runs the proxy and the web frontend. Note that if you run the proxy and the web frontend on different nodes, you have to repeat these steps on each of the respective nodes.

Both services create log files, so we have to take care of rotating those logs. To do so, create the file /etc/logrotate.d/carme with the following content:

/var/log/Carme/Carme-Apache-Logs/*.log {
  daily
  rotate 7
  missingok
  compress
  compresscmd /usr/bin/xz
  uncompresscmd /usr/bin/unxz
  compressext .xz
  compressoptions -9
  delaycompress
  notifempty
}

/var/log/Carme/Carme-Traefik-Logs/*.log {
  daily
  rotate 7
  missingok
  maxsize 100M
  compress
  compresscmd /usr/bin/xz
  uncompresscmd /usr/bin/unxz
  compressext .xz
  compressoptions -9
  delaycompress
  notifempty
}

This tells the logrotate process to rotate the generated files under certain conditions: both sets of logs are rotated daily and the last 7 rotations are kept, and in the case of the proxy a file is additionally rotated whenever it grows larger than 100MB.
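Before relying on the nightly logrotate run, it can be worth checking that rules like the ones above parse cleanly. A minimal sketch: copy one rule block to a scratch file and run logrotate in debug mode (`-d` parses the config and prints the planned actions without rotating anything; the scratch state file is only there so the dry run does not depend on the system state file).

```shell
# Guard: logrotate usually ships with the distro, but bail out cleanly if not.
command -v logrotate >/dev/null || { echo "logrotate is not installed"; exit 0; }

# Scratch copy of one rule block (shortened; see the full config above).
conf=$(mktemp)
cat > "$conf" <<'EOF'
/var/log/Carme/Carme-Traefik-Logs/*.log {
  daily
  rotate 7
  missingok
  maxsize 100M
  notifempty
}
EOF

# -d = debug/dry run: parse the rules and print what would happen.
logrotate -d -s "$(mktemp)" "$conf" && echo "config OK"
rm -f "$conf"
```

The same dry run against the real file (`logrotate -d /etc/logrotate.d/carme`) shows what the next scheduled rotation would do.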

SLURM

In the case of SLURM we have different levels of logging: slurmctld, slurmd and the job output.

The job output - both the regular output and the errors that occur inside a job - is written to the user's home folder, in ${HOME}/.local/share/carme/job-log-dir.

By logging for slurmctld and slurmd we mean the respective output of the slurmctld prolog and epilog scripts, and likewise of the slurmd prolog and epilog scripts. This is helpful for debugging, as we require certain steps to be performed in those scripts, and if one of these steps fails the job aborts.

slurmctld

(Note that if you have a backup controller, you have to carefully duplicate these steps there.)

The output of the slurmctld prolog and epilog is written to /var/log/carme/slurmctld/prolog and /var/log/carme/slurmctld/epilog respectively, so make sure that these folders exist!
Inside these folders, every job creates a file ${SLURM_JOB_ID}.log. To clean (or archive) those logs we recommend a simple cronjob, as this fits better in this case.

An example of such a cronjob would be

00 11 * * 6 root find /var/log/carme/slurmctld/prolog -maxdepth 1 -type f -mtime +7 -delete
00 11 * * 6 root find /var/log/carme/slurmctld/epilog -maxdepth 1 -type f -mtime +7 -delete

This deletes all files older than 7 days every Saturday at 11:00 a.m.; you can of course adjust this to your needs.
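The find expression itself can be tried out safely before putting it in cron. A small sketch of the retention logic in a scratch directory (the job IDs 100 and 101 are made up; `touch -d` is GNU coreutils):

```shell
# Simulate the cron job's retention rule in a scratch directory.
logdir=$(mktemp -d)
touch "$logdir/100.log" "$logdir/101.log"
touch -d '10 days ago' "$logdir/100.log"   # make 100.log look 10 days old

# Same expression as in the cron job: delete regular files older than 7 days.
find "$logdir" -maxdepth 1 -type f -mtime +7 -delete

ls "$logdir"   # only 101.log should remain
```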

slurmd

As with the slurmctld prolog and epilog scripts, we also log the output of the slurmd prolog and epilog scripts. Those files are written to /var/log/carme/slurmd/prolog and /var/log/carme/slurmd/epilog (make sure those folders exist on each compute node).

As before, we recommend a cronjob. An example would be

30 11 * * 6 root find /var/log/carme/slurmd/prolog -maxdepth 1 -type f -mtime +7 -delete
30 11 * * 6 root find /var/log/carme/slurmd/epilog -maxdepth 1 -type f -mtime +7 -delete
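These entries use the system crontab format (with the `root` user field), so one way to install them on each compute node is a cron drop-in file; the file name below is only a suggestion:

```
# /etc/cron.d/carme-slurmd-logs  (file name is only a suggestion)
# Weekly cleanup of slurmd prolog/epilog logs, Saturdays at 11:30
30 11 * * 6 root find /var/log/carme/slurmd/prolog -maxdepth 1 -type f -mtime +7 -delete
30 11 * * 6 root find /var/log/carme/slurmd/epilog -maxdepth 1 -type f -mtime +7 -delete
```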