#+title: Prometheus
#+setupfile: ../org-templates/page.org

** Download and install

Go to [[https://prometheus.io/download/]] and download the latest version.

#+BEGIN_SRC shell
export PROM_VER="2.54.0"
wget "https://github.com/prometheus/prometheus/releases/download/v${PROM_VER}/prometheus-${PROM_VER}.linux-amd64.tar.gz"
#+END_SRC

Verify the checksum is correct, then unpack the tarball:

#+BEGIN_SRC shell
tar xvfz prometheus-*.tar.gz
rm prometheus-*.tar.gz
#+END_SRC

Create two directories for Prometheus to use: ~/etc/prometheus~ for configuration files and ~/var/lib/prometheus~ for application data.

#+BEGIN_SRC shell
sudo mkdir /etc/prometheus /var/lib/prometheus
#+END_SRC

Move the ~prometheus~ and ~promtool~ binaries to ~/usr/local/bin~:

#+BEGIN_SRC shell
cd prometheus-*
sudo mv prometheus promtool /usr/local/bin
#+END_SRC

Move the configuration file to the configuration directory:

#+BEGIN_SRC shell
sudo mv prometheus.yml /etc/prometheus/prometheus.yml
#+END_SRC

Move the remaining files to their appropriate directories:

#+BEGIN_SRC shell
sudo mv consoles/ console_libraries/ /etc/prometheus/
#+END_SRC

Verify that Prometheus is installed:

#+BEGIN_SRC shell
prometheus --version
#+END_SRC

** Configure prometheus.service

Create a prometheus user and assign it ownership of the directories:

#+BEGIN_SRC shell
sudo useradd -rs /bin/false prometheus
sudo chown -R prometheus: /etc/prometheus /var/lib/prometheus
#+END_SRC

Save the following contents to a file at ~/etc/systemd/system/prometheus.service~:

#+BEGIN_SRC systemd
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.listen-address=0.0.0.0:9090 \
    --web.enable-lifecycle \
    --log.level=info

[Install]
WantedBy=multi-user.target
#+END_SRC

Reload the systemd daemon:

#+BEGIN_SRC shell
sudo systemctl daemon-reload
#+END_SRC

Start and enable ~prometheus.service~:

#+BEGIN_SRC shell
sudo systemctl enable --now prometheus.service
#+END_SRC

#+BEGIN_QUOTE
For systems running SELinux, the following policy settings must be applied.
#+END_QUOTE

Save the following module to a file named ~prometheus.te~:

#+BEGIN_SRC selinux
module prometheus 1.0;

require {
    type init_t;
    type websm_port_t;
    type user_home_t;
    type unreserved_port_t;
    type hplip_port_t;
    class file { execute execute_no_trans map open read };
    class tcp_socket name_connect;
}

#============= init_t ==============
allow init_t hplip_port_t:tcp_socket name_connect;
allow init_t unreserved_port_t:tcp_socket name_connect;
allow init_t user_home_t:file { execute execute_no_trans map open read };
allow init_t websm_port_t:tcp_socket name_connect;
#+END_SRC

Now compile and import the module:

#+BEGIN_SRC shell
sudo checkmodule -M -m -o prometheus.mod prometheus.te
sudo semodule_package -o prometheus.pp -m prometheus.mod
sudo semodule -i prometheus.pp
#+END_SRC

Restart ~prometheus.service~. If it does not start, ensure all SELinux policies have been applied:

#+BEGIN_SRC shell
sudo grep "prometheus" /var/log/audit/audit.log | sudo audit2allow -M prometheus
sudo semodule -i prometheus.pp
#+END_SRC

Restart ~prometheus.service~ again.
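At this point it is worth confirming that the service is healthy and the configuration parses cleanly. A minimal check, assuming the default paths and port used above:

#+BEGIN_SRC shell
# Check the service and Prometheus's built-in health endpoint.
sudo systemctl status prometheus.service
curl -s http://localhost:9090/-/healthy

# Validate the configuration file with promtool.
promtool check config /etc/prometheus/prometheus.yml
#+END_SRC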
The Prometheus web interface and dashboard should now be browsable at [[http://localhost:9090]].

** Install and configure Node Exporter on each client using Ansible

Install the prometheus.prometheus collection from Ansible Galaxy.

#+BEGIN_SRC shell
ansible-galaxy collection install prometheus.prometheus
#+END_SRC

Ensure you have an inventory file listing the clients on which to set up Node Exporter.

#+BEGIN_SRC yaml
---
prometheus-clients:
  hosts:
    host0:
      ansible_user: user0
      ansible_host: host0 ip address or hostname
      ansible_python_interpreter: /usr/bin/python3
    host1:
      ...
    host2:
      ...
#+END_SRC

Create ~node_exporter-setup.yml~.

#+BEGIN_SRC yaml
---
- hosts: prometheus-clients
  tasks:
    - name: Import the node_exporter role
      import_role:
        name: prometheus.prometheus.node_exporter
#+END_SRC

The default values for the node_exporter role variables should be fine. Run ansible-playbook.

#+BEGIN_SRC shell
ansible-playbook -i inventory.yml node_exporter-setup.yml
#+END_SRC

Node Exporter should now be installed, started, and enabled on each host in the ~prometheus-clients~ group of the inventory. To confirm that statistics are being collected on each host, navigate to ~http://host_url:9100~. A page entitled Node Exporter should be displayed containing a link for Metrics. Click the link and confirm that statistics are being collected.

Note that each node_exporter host must be accessible through the firewall on port 9100. Firewalld can be configured for the ~internal~ zone on each host.

#+BEGIN_SRC shell
sudo firewall-cmd --zone=internal --permanent --add-source=
sudo firewall-cmd --zone=internal --permanent --add-port=9100/tcp
#+END_SRC

#+BEGIN_QUOTE
Note: I have to configure the ~internal~ zone on Firewalld to allow traffic from my IP address on ports HTTP, HTTPS, SSH, and 1965 in order to access, for example, my web services on the node_exporter host.
#+END_QUOTE

** Install Node Exporter on FreeBSD

As of FreeBSD 14.1-RELEASE, the version of Node Exporter available in packages, v1.6.1, is outdated. To install the latest version, ensure the ports tree is checked out before running the commands below.

#+BEGIN_SRC shell
sudo cp -v /usr/ports/sysutils/node_exporter/files/node_exporter.in /usr/local/etc/rc.d/node_exporter
sudo chmod +x /usr/local/etc/rc.d/node_exporter
sudo chown root:wheel /usr/local/etc/rc.d/node_exporter
sudo pkg install gmake go
#+END_SRC

Download the latest release's source code from [[https://github.com/prometheus/node_exporter]]. Unpack the tarball, build, and install:

#+BEGIN_SRC shell
tar xvf v1.8.2.tar.gz
cd node_exporter-1.8.2
gmake build
sudo mv node_exporter /usr/local/bin/
sudo chown root:wheel /usr/local/bin/node_exporter
sudo sysrc node_exporter_enable="YES"
sudo service node_exporter start
#+END_SRC
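To confirm the FreeBSD service came up, a quick check (a sketch assuming the rc.d script name and default port ~9100~ used above):

#+BEGIN_SRC shell
# Confirm the rc.d service is running and serving metrics locally.
sudo service node_exporter status
curl -s http://localhost:9100/metrics | head
#+END_SRC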
** Configure Prometheus to monitor the client nodes

Edit ~/etc/prometheus/prometheus.yml~. My Prometheus configuration looks like this:

#+BEGIN_SRC yaml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "remote_collector"
    scrape_interval: 10s
    static_configs:
      - targets: ["hyperreal.coffee:9100", "box.moonshadow.dev:9100", "10.0.0.26:9100", "bttracker.nirn.quest:9100"]
#+END_SRC

The ~remote_collector~ job scrapes metrics from each of the hosts running node_exporter. Ensure that port ~9100~ is open in the firewall and, if it is a public-facing node, that port ~9100~ can only be accessed from my IP address.

** Configure Prometheus to monitor qBittorrent client nodes

For each qBittorrent instance you want to monitor, set up a Docker or Podman container with [[https://github.com/caseyscarborough/qbittorrent-exporter]]. The containers run on the machine running Prometheus, so they are accessible at localhost. Let's say I have three qBittorrent instances I want to monitor.

#+BEGIN_SRC shell
podman run \
    --name=qbittorrent-exporter-0 \
    -e QBITTORRENT_USERNAME=username0 \
    -e QBITTORRENT_PASSWORD=password0 \
    -e QBITTORRENT_BASE_URL=http://localhost:8080 \
    -p 17871:17871 \
    --restart=always \
    caseyscarborough/qbittorrent-exporter:latest

podman run \
    --name=qbittorrent-exporter-1 \
    -e QBITTORRENT_USERNAME=username1 \
    -e QBITTORRENT_PASSWORD=password1 \
    -e QBITTORRENT_BASE_URL=https://qbittorrent1.tld \
    -p 17872:17871 \
    --restart=always \
    caseyscarborough/qbittorrent-exporter:latest

podman run \
    --name=qbittorrent-exporter-2 \
    -e QBITTORRENT_USERNAME=username2 \
    -e QBITTORRENT_PASSWORD=password2 \
    -e QBITTORRENT_BASE_URL=https://qbittorrent2.tld \
    -p 17873:17871 \
    --restart=always \
    caseyscarborough/qbittorrent-exporter:latest
#+END_SRC

*** Using systemd quadlets

Alternatively, each exporter can run as a Podman quadlet. Save a unit like the following to ~/etc/containers/systemd/qbittorrent-exporter.container~, run ~sudo systemctl daemon-reload~, then start ~qbittorrent-exporter.service~.

#+BEGIN_SRC systemd
[Unit]
Description=qbittorrent-exporter
After=network-online.target

[Container]
Image=docker.io/caseyscarborough/qbittorrent-exporter:latest
ContainerName=qbittorrent-exporter
Environment=QBITTORRENT_USERNAME=username
Environment=QBITTORRENT_PASSWORD=password
Environment=QBITTORRENT_BASE_URL=http://localhost:8080
PublishPort=17871:17871

[Install]
WantedBy=multi-user.target default.target
#+END_SRC

Now add this to the ~scrape_configs~ section of ~/etc/prometheus/prometheus.yml~ to configure Prometheus to scrape these metrics.

#+BEGIN_SRC yaml
  - job_name: "qbittorrent"
    static_configs:
      - targets: ["localhost:17871", "localhost:17872", "localhost:17873"]
#+END_SRC

** Monitor Caddy with Prometheus and Loki

*** Caddy: metrics activation

Add the ~metrics~ global option and ensure the admin endpoint is enabled.

#+BEGIN_SRC caddyfile
{
    admin 0.0.0.0:2019
    servers {
        metrics
    }
}
#+END_SRC

Restart Caddy:

#+BEGIN_SRC shell
sudo systemctl restart caddy
sudo systemctl status caddy
#+END_SRC
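Caddy exposes Prometheus metrics from its admin endpoint, so a quick way to confirm they are available is to query it directly (assuming the ~0.0.0.0:2019~ admin address configured above):

#+BEGIN_SRC shell
# Run on the Caddy host; the admin endpoint serves metrics at /metrics.
curl -s http://localhost:2019/metrics | head
#+END_SRC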
*** Caddy: logs activation

I have my Caddy configuration modularized, with ~/etc/caddy/Caddyfile~ as the central file. It looks something like this:

#+BEGIN_SRC caddyfile
{
    admin 0.0.0.0:2019
    servers {
        metrics
    }
}

## hyperreal.coffee
import /etc/caddy/anonoverflow.caddy
import /etc/caddy/breezewiki.caddy
import /etc/caddy/cdn.caddy
...
#+END_SRC

Each imported file is a virtual host with its own separate configuration, corresponding to a subdomain of hyperreal.coffee. I have logging disabled on most of them, except the ones for which troubleshooting with logs is convenient, such as the one for my Mastodon instance.

For ~/etc/caddy/fedi.caddy~, I've added these lines to enable logging:

#+BEGIN_SRC caddyfile
fedi.hyperreal.coffee {
    log {
        output file /var/log/caddy/fedi.log {
            roll_size 100MiB
            roll_keep 5
            roll_keep_for 100d
        }
        format json
        level INFO
    }
}
#+END_SRC

Restart Caddy:

#+BEGIN_SRC shell
sudo systemctl restart caddy
sudo systemctl status caddy
#+END_SRC

Ensure port ~2019~ can only be accessed by my IP address, using Firewalld's internal zone:

#+BEGIN_SRC shell
sudo firewall-cmd --zone=internal --permanent --add-port=2019/tcp
sudo firewall-cmd --reload
sudo firewall-cmd --info-zone=internal
#+END_SRC

Add the Caddy configuration to the ~scrape_configs~ section of ~/etc/prometheus/prometheus.yml~:

#+BEGIN_SRC yaml
  - job_name: "caddy"
    static_configs:
      - targets: ["hyperreal.coffee:2019"]
#+END_SRC

Restart Prometheus on the monitor host:

#+BEGIN_SRC shell
sudo systemctl restart prometheus.service
#+END_SRC

*** Loki and Promtail setup

On the node running Caddy, install the loki and promtail packages:

#+BEGIN_SRC shell
sudo apt install -y loki promtail
#+END_SRC

Edit the Promtail configuration file at ~/etc/promtail/config.yml~ and add a scrape job for Caddy:

#+BEGIN_SRC yaml
  - job_name: caddy
    static_configs:
      - targets:
          - localhost
        labels:
          job: caddy
          __path__: /var/log/caddy/*.log
          agent: caddy-promtail
    pipeline_stages:
      - json:
          expressions:
            duration: duration
            status: status
      - labels:
          duration:
          status:
#+END_SRC

The entire Promtail configuration should look like this:

#+BEGIN_SRC yaml
# This minimal config scrapes only a single log file.
# Primarily used in rpm/deb packaging where the promtail service can be started during the system init process.
# Too much scraping during the init process can overload the complete system.
# https://github.com/grafana/loki/issues/11398
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          # NOTE: Needs to be modified to scrape any additional logs of the system.
          __path__: /var/log/messages

  - job_name: caddy
    static_configs:
      - targets:
          - localhost
        labels:
          job: caddy
          __path__: /var/log/caddy/*.log
          agent: caddy-promtail
    pipeline_stages:
      - json:
          expressions:
            duration: duration
            status: status
      - labels:
          duration:
          status:
#+END_SRC

Ensure the promtail user has permission to read the Caddy logs:

#+BEGIN_SRC shell
sudo usermod -aG caddy promtail
sudo chmod g+r /var/log/caddy/*.log
#+END_SRC

Restart the Promtail and Loki services:

#+BEGIN_SRC shell
sudo systemctl restart promtail
sudo systemctl restart loki
#+END_SRC

The [[http://localhost:9090/targets][Prometheus dashboard]] should now show the Caddy target with a state of "UP".

** Monitor Tor node

Edit ~/etc/tor/torrc~ to add the metrics configuration. ~x.x.x.x~ is the IP address where Prometheus is running.

#+BEGIN_SRC shell
## Prometheus exporter
MetricsPort 0.0.0.0:9035 prometheus
MetricsPortPolicy accept x.x.x.x
#+END_SRC

Restart Tor so the changes take effect.

Configure Firewalld to allow inbound traffic to port ~9035~ on the internal zone. Ensure the internal zone's source is the IP address of the server where Prometheus is running, and that port ~443~ is accessible from the Internet on Firewalld's public zone.

#+BEGIN_SRC shell
sudo firewall-cmd --zone=internal --permanent --add-source=x.x.x.x
sudo firewall-cmd --zone=internal --permanent --add-port=9035/tcp
sudo firewall-cmd --zone=public --permanent --add-service=https
sudo firewall-cmd --reload
#+END_SRC
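Before pointing Prometheus at the relay, it can be worth confirming from the Prometheus host that the MetricsPort answers. A rough check, where ~y.y.y.y~ stands for the relay's address (as in the scrape config below); Tor serves the metrics over plain HTTP at ~/metrics~:

#+BEGIN_SRC shell
# Run from the Prometheus host; replace y.y.y.y with the relay's address.
curl -s http://y.y.y.y:9035/metrics | head
#+END_SRC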
Edit ~/etc/prometheus/prometheus.yml~ to add the Tor scrape config. ~y.y.y.y~ is the IP address where Tor is running.

#+BEGIN_SRC yaml
scrape_configs:
  - job_name: "tor-relay"
    static_configs:
      - targets: ["y.y.y.y:9035"]
#+END_SRC

Restart Prometheus.

#+BEGIN_SRC shell
sudo systemctl restart prometheus.service
#+END_SRC

Go to Grafana and import [[https://files.hyperreal.coffee/grafana/tor_stats.json][tor_stats.json]] as a new dashboard, using the Prometheus datasource.

** Monitor Synapse homeserver

On the server running Synapse, edit ~/etc/matrix-synapse/homeserver.yaml~ to enable metrics.

#+BEGIN_SRC yaml
enable_metrics: true
#+END_SRC

Add a new listener to ~/etc/matrix-synapse/homeserver.yaml~ for Prometheus metrics.

#+BEGIN_SRC yaml
listeners:
  - port: 9400
    type: metrics
    bind_addresses: ['0.0.0.0']
#+END_SRC

Restart Synapse to apply the changes.

On the server running Prometheus, add a target for Synapse.

#+BEGIN_SRC yaml
  - job_name: "synapse"
    scrape_interval: 1m
    metrics_path: "/_synapse/metrics"
    static_configs:
      - targets: ["hyperreal:9400"]
#+END_SRC

Also add the Synapse recording rules.

#+BEGIN_SRC yaml
rule_files:
  - /etc/prometheus/synapse-v2.rules
#+END_SRC

On the server running Prometheus, download the Synapse recording rules.

#+BEGIN_SRC shell
sudo wget https://files.hyperreal.coffee/prometheus/synapse-v2.rules -O /etc/prometheus/synapse-v2.rules
#+END_SRC

Restart Prometheus. Use [[https://files.hyperreal.coffee/grafana/synapse.json][synapse.json]] for the Grafana dashboard.
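To double-check the Synapse wiring, the downloaded rules file and the updated configuration can be validated with ~promtool~, and the metrics listener queried directly; a quick sketch, assuming the paths and the ~hyperreal:9400~ target used above:

#+BEGIN_SRC shell
# Validate the recording rules and the Prometheus configuration.
promtool check rules /etc/prometheus/synapse-v2.rules
promtool check config /etc/prometheus/prometheus.yml

# From the Prometheus host, confirm the Synapse metrics listener answers.
curl -s http://hyperreal:9400/_synapse/metrics | head
#+END_SRC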