Carme-Docu

Documentation Project for Carme

Install Zabbix

Zabbix is our preferred monitoring tool because it fits well into our setup. To install Zabbix on the cluster, we strongly recommend following the Zabbix documentation, which provides all the information you need to install and configure it.

Most Linux repositories contain nearly all the packages needed for the installation. To use Zabbix in combination with GPUs, you have to install an additional plugin. In contrast to the plugin's installation instructions, we suggest placing the script named get_gpus_info.sh in a folder that is accessible to all nodes in the cluster. This results in one simple change to the plugin's UserParameter line:

UserParameter=gpu.discovery,/etc/zabbix/scripts/get_gpus_info.sh -> UserParameter=gpu.discovery,/PATH-TO-THE-SCRIPT/get_gpus_info.sh

Note that this change is made purely for administrative reasons.
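The change above could be applied with a small shell helper. This is only a sketch under stated assumptions: the function name `set_gpu_script_path` is hypothetical, the location of the agent configuration file that holds the `UserParameter` line depends on your installation and is passed in as an argument, and the shared directory must not contain the characters `|` or `&`, which have a special meaning in the `sed` expression.

```shell
# Hypothetical helper (not part of Zabbix or the GPU plugin):
# point the gpu.discovery UserParameter at a script in a shared folder.
#   $1 = agent config file containing the UserParameter line (location is an assumption)
#   $2 = shared directory that holds get_gpus_info.sh
set_gpu_script_path() {
    conf_file="$1"
    shared_dir="$2"

    # Rewrite only the gpu.discovery line. '|' is used as the sed delimiter
    # so that the slashes in the paths need no escaping.
    sed "s|^UserParameter=gpu\.discovery,.*|UserParameter=gpu.discovery,${shared_dir}/get_gpus_info.sh|" \
        "$conf_file" > "${conf_file}.tmp" \
        && mv "${conf_file}.tmp" "$conf_file"
}
```

It would be called as `set_gpu_script_path <agent-config-file> /PATH-TO-THE-SCRIPT`. The agent usually has to re-read its configuration, typically via a restart, before the new path takes effect.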

Another handy plugin monitors disk performance.

These are only two examples of useful plugins. Many more are available on Zabbix Share, and you can always write your own.
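As a rough illustration of what a self-written plugin might look like, the sketch below prints a single value that an agent could collect through a `UserParameter=<key>,<command>` line. Everything in it is an assumption made for illustration: the function name `load_avg_1min`, the idea of reading `/proc/loadavg`, and any item key you would choose for it (for example a hypothetical `custom.load1`).

```shell
# Hypothetical custom check (illustration only): print the 1-minute load
# average as a single number, the kind of value a UserParameter command returns.
#   $1 = file to read (optional, defaults to /proc/loadavg on Linux)
load_avg_1min() {
    loadavg_file="${1:-/proc/loadavg}"

    # The first whitespace-separated field of /proc/loadavg is the 1-minute average.
    awk '{ print $1 }' "$loadavg_file"
}
```

Saved as a script in the shared folder and ending with a call to `load_avg_1min`, it could be referenced from a `UserParameter` line in the same way as get_gpus_info.sh.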