Prometheus

Prometheus
-open source
-metric based monitoring system
-it does one thing and does it well
-simple text format makes it easy to expose metrics to Prometheus.
-the data model identifies each time series by an unordered set of key-value pairs called labels
-scraped data is stored in a local time-series database
-no built-in notifications (alerts are routed and sent by the separate Alertmanager)
- Caution! -> If you need 100% accuracy, such as for per-request billing, Prometheus is not a good choice as the collected data will likely not be detailed and complete enough.
-pull-based system

1. PromQL expression language allows easy metrics selection and aggregation

- create graphs
- set alert rules
- expose data
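To make "selection and aggregation" concrete, here is a minimal Python sketch of what a query like `sum by (job) (http_requests_total)` does conceptually: keep the series whose labels match, then add up their values per distinct `job` label. It only models instant values. The helper names (`series`, `select`, `sum_by`) and the sample data are hypothetical, and the real PromQL engine also handles time ranges, regex matchers and much more.

```python
from collections import defaultdict


def series(name, **labels):
    """Build a series key: the metric name (stored as __name__) plus labels."""
    return frozenset({"__name__": name, **labels}.items())


def select(samples, **matchers):
    """Keep series whose labels equal every matcher, like {job="api"}."""
    return {
        key: value
        for key, value in samples.items()
        if all(dict(key).get(k) == v for k, v in matchers.items())
    }


def sum_by(samples, *by):
    """Sum values per distinct combination of the `by` labels, like sum by (job)."""
    groups = defaultdict(float)
    for key, value in samples.items():
        labels = dict(key)
        group = tuple((label, labels.get(label, "")) for label in by)
        groups[group] += value
    return dict(groups)


samples = {
    series("http_requests_total", job="api", instance="a:8080"): 10,
    series("http_requests_total", job="api", instance="b:8080"): 5,
    series("http_requests_total", job="web", instance="c:8080"): 7,
}
# Roughly what sum by (job) (http_requests_total) returns for these samples.
print(sum_by(select(samples, __name__="http_requests_total"), "job"))
# → {(('job', 'api'),): 15.0, (('job', 'web'),): 7.0}
```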

2. Architecture

3. How to gather data?

- pull based system
-regular plain text -> HTTP exposed
-metrics exposition format

each record: metric name, labels, value, e.g. http_requests_total{method="post",code="200"} 1027
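A rough sketch of how one sample line breaks down into metric name, labels and value (plus the optional timestamp in milliseconds). This is a simplified regex parser written for illustration only: it skips `# HELP`/`# TYPE` comment lines and does not unescape label values. For real use, the official Python client ships a full parser (`prometheus_client.parser.text_string_to_metric_families`).

```python
import re

# metric_name{label="value",...} value [timestamp_ms]
LINE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>.*)\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+(?P<ts>-?\d+))?\s*$"
)
# label="value", where the value may contain escaped characters like \"
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Parse one sample line; return None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(m.group("labels") or ""))
    ts = m.group("ts")
    # float() also accepts the special values +Inf, -Inf and NaN
    return m.group("name"), labels, float(m.group("value")), int(ts) if ts else None


print(parse_sample('http_requests_total{method="post",code="200"} 1027'))
# → ('http_requests_total', {'method': 'post', 'code': '200'}, 1027.0, None)
```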

-embed into software
*official client libraries:
** Go
** Java or Scala
** Python
** Ruby

*unofficial third-party client libraries:
** Bash
** C++
** Common Lisp
** Elixir
** Erlang
** Haskell
** Lua for Nginx
** Lua for Tarantool
** .NET / C#
** Node.js
** PHP
** Rust
-or use metrics exporters

## Core components starting at 9090

* 9090 - Prometheus server
* 9091 - Pushgateway
* 9093 - Alertmanager
* 9094 - Alertmanager clustering

## Exporters starting at 9100

* 9100 - Node exporter
* 9101 - HAProxy exporter
* 9102 - StatsD exporter
* 9103 - Collectd exporter
* 9108 - Graphite exporter
* 9110 - Blackbox exporter

-sample exporter in python with 2 metrics (runtime of build and timestamp)
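A minimal sketch of such an exporter, using only the Python standard library so it runs without extra packages. The metric names `build_duration_seconds` and `build_last_run_timestamp_seconds`, the `record_build` hook and port 8000 are assumptions made for this example; in practice the official `prometheus_client` library (`Gauge`, `start_http_server`) would normally do this work.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Last build result; hypothetical state that a CI hook would update.
STATE = {"duration": 0.0, "finished_at": 0.0}


def record_build(duration_seconds, finished_at=None):
    """Store the runtime of the last build and when it finished (Unix time)."""
    STATE["duration"] = float(duration_seconds)
    STATE["finished_at"] = time.time() if finished_at is None else float(finished_at)


def render_metrics():
    """Render both metrics in the Prometheus text exposition format."""
    lines = [
        "# HELP build_duration_seconds Runtime of the last build.",
        "# TYPE build_duration_seconds gauge",
        f"build_duration_seconds {STATE['duration']}",
        "# HELP build_last_run_timestamp_seconds Unix time the last build finished.",
        "# TYPE build_last_run_timestamp_seconds gauge",
        f"build_last_run_timestamp_seconds {STATE['finished_at']}",
    ]
    return "\n".join(lines) + "\n"  # the last line must end with a line feed


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Expose /metrics on the given port until interrupted."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Call `record_build(...)` when a build finishes and `serve()` to start the HTTP endpoint; Prometheus can then scrape `localhost:8000` like any other target.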

4. Visualization

a) Promdash

b) Grafana
-full support for PromQL
-Prometheus Integration:
*datasource support
*Prometheus dashboard
*PromQL autocomplete
*Alerts

5. Prometheus Alerts

Alertmanager
-Alertmanager handles alerts sent by client applications such as Prometheus, Grafana, etc.

-Functions:
*deduplication
*grouping
*routing
*sending
*silencing
*inhibition - making one alert depend on another (notifications for an alert are suppressed while a related alert is already firing)
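A deliberately simplified Python model of three of these functions: deduplication (identical label sets count once), grouping (alerts sharing the `group_by` label values go into one notification) and inhibition (a firing "source" alert mutes matching "target" alerts that share the `equal` labels). It is an illustration of the concepts, not Alertmanager's implementation, which also uses fingerprints, timers such as `group_wait`, regex matchers and more; the function names are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Deduplicate alerts, then group them by the values of `group_by` labels."""
    groups = defaultdict(list)
    seen = set()
    for labels in alerts:
        fingerprint = frozenset(labels.items())
        if fingerprint in seen:  # deduplication: identical label set already seen
            continue
        seen.add(fingerprint)
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)  # grouping: one notification per key
    return dict(groups)


def inhibit(alerts, source, target, equal):
    """Drop `target` alerts while a `source` alert fires with the same `equal` labels."""

    def matches(labels, matchers):
        return all(labels.get(k) == v for k, v in matchers.items())

    sources = [a for a in alerts if matches(a, source)]
    return [
        a
        for a in alerts
        if not (
            matches(a, target)
            and any(
                s is not a and all(a.get(n) == s.get(n) for n in equal)
                for s in sources
            )
        )
    ]
```

For example, `inhibit(alerts, {"severity": "critical"}, {"severity": "warning"}, ["instance"])` hides warnings for an instance that already has a critical alert.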

-Alertmanager supports a mesh configuration to create a cluster for High Availability. Warning: High Availability is under active development

6. Installation

Method

Recommended
-source
-pre-compiled binary
-docker container

Don't do this
-apt-get install prometheus
-yum install prometheus
-any installation from package

Binary

cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.2.0/prometheus-2.2.0.linux-amd64.tar.gz
tar -xzf prometheus-2.2.0.linux-amd64.tar.gz

sudo chmod +x prometheus-2.2.0.linux-amd64/{prometheus,promtool}
sudo cp prometheus-2.2.0.linux-amd64/{prometheus,promtool} /usr/local/bin/
sudo chown root:root /usr/local/bin/{prometheus,promtool}

sudo mkdir -p /etc/prometheus
sudo vim /etc/prometheus/prometheus.yml
promtool check config /etc/prometheus/prometheus.yml
Checking /etc/prometheus/prometheus.yml
SUCCESS: 0 rule files found

prometheus --config.file=/etc/prometheus/prometheus.yml &

Repeat for every component (prometheus, alertmanager, node_exporter, blackbox_exporter, *_exporter) on multiple nodes every month or so

Problems
-too many operations
-won't survive reboot
-no dedicated user
-try changing config
-troublesome upgrade
-SELinux anyone?

Manage (aka why Ansible?) - Cloud Alchemy
https://github.com/cloudalchemy/ansible-prometheus

Goals for Ansible roles
-zero-configuration deployment
-easy management of multiple nodes
-error checking
-multiple CPU architecture support

Where is my config?
-command line parameters
-main configuration file (in YAML)
-files included from the main file (e.g. alert rules or file_sd configs for file-based service discovery)
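A file_sd file holds a list of target groups (`targets` plus optional `labels`) in YAML or JSON, and Prometheus picks up changes to it without a restart. The sketch below writes such a file as JSON using only the Python standard library, replacing it atomically so Prometheus never reads a half-written file. The function name, path and label are hypothetical examples; the scrape config would then point at e.g. `files: ["/etc/prometheus/file_sd/*.json"]`.

```python
import json
import os
import tempfile


def write_file_sd(path, targets, labels=None):
    """Atomically write a file_sd target list as JSON to `path`."""
    entry = {"targets": list(targets)}
    if labels:
        entry["labels"] = dict(labels)
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then rename it over the
    # target; the ".tmp" suffix keeps it out of a "*.json" file_sd glob.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump([entry], f, indent=2)
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; Prometheus must read it
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Example call: `write_file_sd("/etc/prometheus/file_sd/node.json", ["localhost:9100"], {"env": "lab"})`.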

Main config

a) Prometheus

global:
  evaluation_interval: 15s
  scrape_interval: 15s
  scrape_timeout: 10s

  external_labels:
    environment: localhost.localdomain

scrape_configs:
  - job_name: "prometheus"
    metrics_path: "/metrics"
    static_configs:
      - targets:
          - "localhost:9090"
  - job_name: node
    file_sd_configs:
      - files:
          - "/etc/prometheus/file_sd/node.yml"

b) Ansible

Main config (extended)

a) Prometheus

global:
  evaluation_interval: 15s
  scrape_interval: 15s
  scrape_timeout: 10s

  external_labels:
    environment: localhost.localdomain

rule_files:
  - /etc/prometheus/rules/*.rules

alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - localhost:9093

scrape_configs:
  - job_name: "prometheus"
    metrics_path: "/metrics"
    static_configs:
      - targets:
          - "localhost:9090"
  - job_name: node
    file_sd_configs:
      - files:
          - "/etc/prometheus/file_sd/node.yml"

b) Ansible

prometheus_alertmanager_config:
  - scheme: http
    static_configs:
      - targets:
          - "localhost:9093"

prometheus_scrape_configs:
  - job_name: "node"
    file_sd_configs:
      - files:
          - "/etc/prometheus/file_sd/node.yml"

prometheus_targets:
  node:
    - targets:
        - "localhost:9100"

Command line parameters

#Ansible managed file. Be wary of possible overwrites.
[Unit]
Description=Prometheus
After=network.target

[Service]
Type=simple
Environment="GOMAXPROCS=1"
User=prometheus
Group=prometheus
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --storage.tsdb.retention=30d \
  --web.listen-address=0.0.0.0:9090 \
--web.external-url=http://demo.cloudalchemy.org:9090

SyslogIdentifier=prometheus
Restart=always

[Install]
WantedBy=multi-user.target

Failures when changing the configuration:
-preflight checks included in the role use 'promtool' in Ansible's 'validate' directive // validation before the old file is replaced
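The validate-before-replace idea can be sketched in Python: write the new config to a temporary file next to the real one (so relative `rule_files` paths still resolve), run `promtool check config` on it, and move it into place only if the check passes. This illustrates the pattern, not the role's actual code; `install_config` and `promtool_check` are hypothetical names, and `promtool` is assumed to be on `PATH`.

```python
import os
import subprocess
import tempfile


def promtool_check(path):
    """Return True if `promtool check config <path>` exits with status 0."""
    result = subprocess.run(
        ["promtool", "check", "config", path], capture_output=True, text=True
    )
    return result.returncode == 0


def install_config(path, content, validate=promtool_check):
    """Validate new config in a temp file; replace `path` only if it passes."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".yml")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
        if not validate(tmp_path):
            return False  # keep the old, working config untouched
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
        tmp_path = None
        return True
    finally:
        if tmp_path is not None:
            os.unlink(tmp_path)  # clean up a rejected or failed write
```

After a successful `install_config(...)`, Prometheus still needs a reload (e.g. `systemctl reload prometheus`, which sends SIGHUP through `ExecReload`).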

Gathering system metrics from many nodes with multiple CPU architectures?

node_exporter
-one binary
-simple configuration with cli flags
-ansible bonuses:
*versioning
*system user management
*CPU architecture auto-detection
*systemd service files
*Linux capabilities support // the service runs as a non-root user that is granted only selected capabilities normally reserved for root
*basic SELinux support
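CPU architecture auto-detection boils down to mapping the machine name reported by the host to the suffix used in release tarball names. A minimal Python sketch with a partial, hypothetical mapping table; the URL pattern follows the official `node_exporter` GitHub release naming (`node_exporter-<version>.linux-<arch>.tar.gz`), and the version is whatever you pass in.

```python
import platform

# Partial mapping from `uname -m` style machine names to the architecture
# suffixes used in Prometheus release tarballs (linux-amd64, linux-arm64, ...).
ARCH_MAP = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
    "armv7l": "armv7",
    "armv6l": "armv6",
    "i386": "386",
    "i686": "386",
    "ppc64le": "ppc64le",
    "s390x": "s390x",
}


def release_arch(machine=None):
    """Translate a machine name (default: this host) to a release arch."""
    machine = (machine or platform.machine()).lower()
    try:
        return ARCH_MAP[machine]
    except KeyError:
        raise ValueError(f"unsupported architecture: {machine}") from None


def node_exporter_url(version, machine=None):
    """Build the GitHub release URL for a node_exporter version, e.g. '0.15.2'."""
    arch = release_arch(machine)
    return (
        "https://github.com/prometheus/node_exporter/releases/download/"
        f"v{version}/node_exporter-{version}.linux-{arch}.tar.gz"
    )
```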

Example
-demo.cloudalchemy.org
-daily Ansible deploy with Travis CI

Resources:
-presentation.cloudalchemy.org
-github.com/cloudalchemy
-prometheus.io/docs
-www.safaribooksonline.com/library/view/prometheus-up/9781492034131
-prometheus.io/webtools/alerting/routing-tree-editor
-prometheusbook.com

Better performance in Prometheus 2.x

InfluxDB integrates well -> remote write and read

Pushgateway - for short-lived jobs
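Short-lived jobs may finish before Prometheus ever scrapes them, so they push their metrics to the Pushgateway, which Prometheus then scrapes instead. A minimal standard-library sketch of such a push to the `/metrics/job/<job>{/<label>/<value>}` endpoint; the job name, grouping label and metric name are hypothetical. POST replaces metrics with the same names within that group, and label values containing `/` would need the Pushgateway's base64 encoding, which this sketch does not handle.

```python
import urllib.parse
import urllib.request


def build_push_request(gateway, job, metrics_text, **grouping):
    """Build a POST to the Pushgateway's /metrics/job/<job>{/<label>/<value>} path."""
    path = "/metrics/job/" + urllib.parse.quote(job, safe="")
    for name, value in grouping.items():
        path += "/" + urllib.parse.quote(name, safe="")
        path += "/" + urllib.parse.quote(value, safe="")
    # The text format requires every line, including the last, to end with "\n".
    body = metrics_text if metrics_text.endswith("\n") else metrics_text + "\n"
    return urllib.request.Request(
        gateway.rstrip("/") + path,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


def push(gateway, job, metrics_text, **grouping):
    """Send the metrics and return the HTTP status (2xx on success)."""
    req = build_push_request(gateway, job, metrics_text, **grouping)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

A nightly backup script could, for instance, call `push("http://localhost:9091", "backup", "backup_last_success_timestamp_seconds 1.5e9", instance="db1")` as its last step.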

service discovery

https://www.robustperception.io/tag/prometheus

https://www.youtube.com/watch?v=cNjKWOk4YPU
https://prometheus.io/docs/prometheus/latest/querying/basics/

DevSecOps Notes
