Prometheus
-open source
-metric-based monitoring system
-it does one thing and does it well
-simple text format makes it easy to expose metrics to Prometheus
-the data model identifies each time series by an unordered set of key-value pairs called labels
-scraped data is stored in a local time-series database
-no built-in automatic notifications (alerts are delivered through Alertmanager)
-Caution! -> If you need 100% accuracy, such as for per-request billing, Prometheus is not a good choice, as the collected data will likely not be detailed and complete enough.
-pull-based system
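To make the label-based data model concrete, here is a minimal Python sketch. It assumes a series is identified by its metric name plus its label set; the metric name and label values are made up for illustration.

```python
def series_id(name, labels):
    """Identity of a time series: metric name plus an unordered label set."""
    # frozenset ignores ordering, so the same labels always give the same identity
    return (name, frozenset(labels.items()))


# Hypothetical metric and labels, for illustration only
a = series_id("http_requests_total", {"method": "GET", "code": "200"})
b = series_id("http_requests_total", {"code": "200", "method": "GET"})
c = series_id("http_requests_total", {"code": "500", "method": "GET"})

print(a == b)  # same labels in a different order: same series
print(a == c)  # one label value differs: a different series
```

Changing any label value creates a new series, which is why labels with many distinct values quickly multiply the number of stored series.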
1. PromQL expression language allows easy metrics selection and aggregation
- create graphs
- set alert rules
- expose data
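A PromQL expression can also be sent to the server over its HTTP API. The sketch below only builds the URL for an instant query against `/api/v1/query`; the server address matches the default port listed in these notes, and the metric `http_requests_total` is a hypothetical example.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL for an instant PromQL query against the HTTP API."""
    query_string = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


# Selection (metric name), rate over 5 minutes, then aggregation per job.
# http_requests_total is an assumed metric name.
expr = "sum by (job) (rate(http_requests_total[5m]))"
url = instant_query_url("http://localhost:9090", expr)
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) would return a JSON document with the query result, assuming a server is running at that address.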
2. Architecture
3. How to gather data?
-pull-based system
-regular plain text exposed over HTTP
-metrics exposition format: each record is a metric name, labels, and a value
-embed into software
*official client libraries:
** Go
** Java or Scala
** Python
** Ruby
*unofficial third-party client libraries:
** Bash
** C++
** Common Lisp
** Elixir
** Erlang
** Haskell
** Lua for Nginx
** Lua for Tarantool
** .NET / C#
** Node.js
** PHP
** Rust
-or use metrics exporters
## Core components starting at 9090
* 9090 - Prometheus server
* 9091 - Pushgateway
* 9093 - Alertmanager
* 9094 - Alertmanager clustering
## Exporters starting at 9100
* 9100 - Node exporter
* 9101 - HAProxy exporter
* 9102 - StatsD exporter
* 9103 - Collectd exporter
* 9108 - Graphite exporter
* 9110 - Blackbox exporter
-sample exporter in Python with 2 metrics (runtime of build and timestamp)
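A sketch of what such an exporter could look like, using only the Python standard library so that it stays self-contained (in practice the official Python client library would normally be used). The metric names, the sample values, and port 8000 are assumptions, not taken from the original talk.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(build_runtime_seconds, build_timestamp_seconds):
    """Render two gauges in the plain-text exposition format."""
    lines = [
        "# HELP build_runtime_seconds Runtime of the last build in seconds.",
        "# TYPE build_runtime_seconds gauge",
        f"build_runtime_seconds {build_runtime_seconds}",
        "# HELP build_last_timestamp_seconds Unix time of the last build.",
        "# TYPE build_last_timestamp_seconds gauge",
        f"build_last_timestamp_seconds {build_timestamp_seconds}",
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Placeholder values; a real exporter would read them from the build system
        body = render_metrics(12.5, time.time()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics until interrupted (port 8000 is an arbitrary choice)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; Prometheus would then scrape it like any other target, for example `localhost:8000` in a `static_configs` entry.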
4. Visualization
a) Promdash
b) Grafana
-full support for PromQL
-Prometheus Integration:
*datasource support
*Prometheus dashboard
*PromQL autocomplete
*Alerts
5. Prometheus Alerts
Alertmanager
-Alertmanager handles alerts sent by client applications such as the Prometheus server, Grafana, etc.
-Functions:
*deduplication
*grouping
*routing
*sending
*silencing
*inhibition - making one alert dependent on another (an alert is suppressed while a related alert is firing)
-Alertmanager supports a mesh configuration to create a cluster for High Availability. Warning: High Availability is under active development
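The deduplication and grouping functions listed above can be illustrated with a small Python sketch. This is only a toy model of the idea, not how Alertmanager is implemented; the alert names and labels are invented.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Drop alerts with identical label sets, then group the rest by chosen labels."""
    groups = defaultdict(list)
    seen = set()
    for labels in alerts:
        identity = frozenset(labels.items())
        if identity in seen:  # deduplication: this exact alert was already received
            continue
        seen.add(identity)
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)  # grouping: one notification per key
    return dict(groups)


# Hypothetical alerts; the second one is a duplicate of the first
alerts = [
    {"alertname": "InstanceDown", "instance": "a:9100", "severity": "page"},
    {"alertname": "InstanceDown", "instance": "a:9100", "severity": "page"},
    {"alertname": "InstanceDown", "instance": "b:9100", "severity": "page"},
    {"alertname": "DiskFull", "instance": "a:9100", "severity": "warn"},
]
groups = group_alerts(alerts, ["alertname"])
for key, members in groups.items():
    print(key, len(members))
```

Grouping by `alertname` here means that two instances going down at once would produce a single notification instead of two.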
6. Installation
Method
Recommended
-source
-pre-compiled binary
-docker container
Don't do this
-apt-get install prometheus
-yum install prometheus
-any installation from package
Binary
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.2.0/prometheus-2.2.0.linux-amd64.tar.gz
tar -xzf prometheus-2.2.0.linux-amd64.tar.gz
sudo chmod +x prometheus-2.2.0.linux-amd64/{prometheus,promtool}
sudo cp prometheus-2.2.0.linux-amd64/{prometheus,promtool} /usr/local/bin
sudo chown root:root /usr/local/bin/{prometheus,promtool}
sudo mkdir -p /etc/prometheus
sudo vim /etc/prometheus/prometheus.yml
promtool check config /etc/prometheus/prometheus.yml
Checking /etc/prometheus/prometheus.yml
SUCCESS: 0 rule files found
prometheus --config.file="/etc/prometheus/prometheus.yml" &
Repeat for every component (prometheus, alertmanager, node_exporter, blackbox_exporter, *_exporter) on multiple nodes every month or so
Problems
-too many operations
-won't survive reboot
-no dedicated user
-try changing config
-troublesome upgrade
-SELinux anyone?
Manage (aka why Ansible?) - Cloud Alchemy
https://github.com/cloudalchemy/ansible-prometheus
Goals for Ansible roles
-zero-configuration deployment
-easy management of multiple nodes
-error checking
-multiple CPU architecture support
Where is my config?
-command line parameters
-main configuration file (in YAML)
-files included from the main file (e.g. alert rules or file_sd config, i.e. file-based service discovery)
Main config
a) Prometheus
global:
  evaluation_interval: 15s
  scrape_interval: 15s
  scrape_timeout: 10s
  external_labels:
    environment: localhost.localdomain
scrape_configs:
  - job_name: "prometheus"
    metrics_path: "/metrics"
    static_configs:
      - targets:
          - "localhost:9090"
  - job_name: node
    file_sd_configs:
      - files:
          - "/etc/prometheus/file_sd/node.yml"
b) Ansible
Main config (extended)
a) Prometheus
global:
  evaluation_interval: 15s
  scrape_interval: 15s
  scrape_timeout: 10s
  external_labels:
    environment: localhost.localdomain
rule_files:
  - /etc/prometheus/rules/*.rules
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - localhost:9093
scrape_configs:
  - job_name: "prometheus"
    metrics_path: "/metrics"
    static_configs:
      - targets:
          - "localhost:9090"
  - job_name: node
    file_sd_configs:
      - files:
          - "/etc/prometheus/file_sd/node.yml"
b) Ansible
prometheus_alertmanager_config:
  - scheme: http
    static_configs:
      - targets:
          - "localhost:9093"
prometheus_scrape_configs:
  - job_name: "node"
    file_sd_configs:
      - files:
          - "/etc/prometheus/file_sd/node.yml"
prometheus_targets:
  node:
    - targets:
        - "localhost:9100"
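The target files read by file_sd are plain lists of target groups, so they are easy to generate from any inventory. A minimal sketch, assuming JSON output (file_sd also reads `.json` files) and a made-up `env` label:

```python
import json


def render_file_sd(groups):
    """Turn (targets, labels) pairs into the text of a file_sd target file."""
    doc = [
        {"targets": list(targets), "labels": dict(labels)}
        for targets, labels in groups
    ]
    return json.dumps(doc, indent=2)


# One group with the node exporter target from the notes;
# the "env" label is a hypothetical example.
text = render_file_sd([(["localhost:9100"], {"env": "demo"})])
print(text)
```

Writing this text to a file matched by `file_sd_configs` lets Prometheus pick up target changes without a restart, since it watches those files for modifications.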
Command line parameters
# Ansible managed file. Be wary of possible overwrites.
[Unit]
Description=Prometheus
After=network.target
[Service]
Type=simple
Environment="GOMAXPROCS=1"
User=prometheus
Group=prometheus
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --storage.tsdb.retention=30d \
  --web.listen-address=0.0.0.0:9090 \
  --web.external-url=http://demo.cloudalchemy.org:9090
SyslogIdentifier=prometheus
Restart=always
[Install]
WantedBy=multi-user.target
Failures when changing configuration:
-preflight checks included in the role use 'promtool' in the Ansible 'validate' directive (the new file is validated before it replaces the old one)
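The validate-before-replace idea can be sketched in Python as follows. This is an illustration of the pattern only, not the role's actual implementation; the helper name `replace_if_valid` is made up, and the validator command is passed in by the caller.

```python
import os
import subprocess
import tempfile


def replace_if_valid(path, new_content, validate_cmd):
    """Write new_content to a temp file, validate it, and swap it in only on success."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_content)
        # The candidate file path is appended as the last argument of the validator
        result = subprocess.run(validate_cmd + [tmp_path])
        if result.returncode != 0:
            return False  # the old file stays untouched
        os.replace(tmp_path, path)  # atomic rename within the same directory
        return True
    finally:
        # Clean up the temp file if it was not moved into place
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

With Prometheus this could be called as `replace_if_valid("/etc/prometheus/prometheus.yml", new_text, ["promtool", "check", "config"])`, so a broken config never reaches the running server.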
Gathering system metrics from many nodes with multiple CPU architectures?
node_exporter
-one binary
-simple configuration with CLI flags
-Ansible bonuses:
*versioning
*system user management
*CPU architecture auto-detection
*systemd service files
*Linux capabilities support (the service gets selected root privileges without running as the root user)
*basic SELinux support
Example
-demo.cloudalchemy.org
-daily Ansible deploy with Travis CI
Resources:
-presentation.cloudalchemy.org
-github.com/cloudalchemy
-prometheus.io/docs
-www.safaribookonline.com/library/view/prometheus-up/9781492034131
-prometheus.io/webtools/alerting/routing-tree-editor
-prometheusbook.com
Better performance in Prometheus 2.x
InfluxDB integrates well -> remote write and read
Pushgateway - for short-lived systems
service discovery
https://www.robustperception.io/tag/prometheus
https://www.youtube.com/watch?v=cNjKWOk4YPU
https://prometheus.io/docs/prometheus/latest/querying/basics/