PostgreSQL — Extend Your Monitoring Infrastructure with Pgwatch2
Hi all, in this blog post, together with my colleague Hamit Uğur Çelebier, we will explain the pgwatch2 monitoring tool for PostgreSQL databases. The main goal of this study is to extend our current monitoring infrastructure and provide a solution that other DBAs can use to build automated healthcheck reports for PostgreSQL databases.
In the Trendyol database team, we use Prometheus and Grafana to monitor and analyze our PostgreSQL infrastructure, so we are already familiar with Prometheus exporters.
Prometheus exporters are not native healthcheck tools, so deeper healthcheck analysis was painful for us. To overcome this, we decided to implement a solution capable of deeper analysis and healthchecks, and it had to be built on top of our current monitoring infrastructure. In other words, we needed to combine Prometheus and pgwatch2. This article explains how we implemented pgwatch2 on top of our monitoring system.
In short, you will see the technical stack as well as the infrastructure we use for PostgreSQL monitoring.
Pgwatch2 Architecture, Installation, and Configuration
Pgwatch2 Architecture
Pgwatch2 supports different deployment models, but we decided to use the push model combined with our current Prometheus infrastructure.
In this architecture, the PostgreSQL database nodes push their metrics to our TimescaleDB infrastructure, where we deployed a schema for the metrics. TimescaleDB operations and maintenance practices are covered in the configuration steps below.
Finally, we visualize the TimescaleDB metrics in Grafana through an additional data source. The architecture consists of three components:
- Pgwatch2 daemon
- TimescaleDB
- Grafana
In this part of the article, I will explain how to configure these components in a production environment.
Pgwatch2 Daemon Deployment and Configuration
The pgwatch2 daemon is responsible for collecting metrics and sending them to the TimescaleDB store, so every PostgreSQL node runs the daemon to ship its own metrics.
To install and configure pgwatch2, we first have to prepare the instance metric and meta files. Pgwatch2 uses the instances.yaml and preset-configs.yaml files to manage instance metadata and push it to TimescaleDB. preset-configs.yaml defines which metrics are collected and sent to the metric database, while instances.yaml stores connection information, database include/exclude patterns, the metric preset to use (from preset-configs.yaml), and so on.
First, we are going to configure the following files:
- /opt/pgwatch2/instances.yaml
- /opt/pgwatch2/preset-configs.yaml
- unique_name: initial_example_hostname
  dbtype: postgres-continuous-discovery
  host: localhost
  port: 5432
  dbname: postgres
  user: pgwatch_example_user
  password: pgwatch_example_pass
  sslmode: disable
  stmt_timeout: 5
  is_superuser: false
  preset_metrics: d-platform-default
  dbname_exclude_pattern: (trtemplate|postgres)
  is_enabled: true
  group: default
  custom_tags:
    _environment: initial_example_environment
    _team: initial_example_team
    _cluster: initial_example_cluster
    _responsible: initial_example_responsible
  sslrootcert: ''
  sslcert: ''
  sslkey: ''
---
- name: basic
  description: only the most important metrics - WAL, DB-level statistics (size, tx and backend counts)
  metrics:
    db_stats: 60
    db_size: 300
    wal: 60
- name: pgbouncer
  description: per DB stats
  metrics:
    pgbouncer_stats: 60
- name: pgpool
  description: pool global stats, 1 row per node ID
  metrics:
    pgpool_stats: 60
- name: standard
  description: basic level + table, index, stat_statements stats
  metrics:
    cpu_load: 60
    db_stats: 60
    db_size: 300
    index_stats: 900
    sequence_health: 3600
    sproc_stats: 180
    stat_statements: 180
    table_stats: 300
    wal: 60
- name: d-platform-default
  description: almost all available metrics for an even deeper performance understanding
  metrics:
    change_events: 300
    db_stats: 60
    db_size: 600
    index_stats: 900
    instance_up: 60
    locks: 60
    locks_mode: 60
    replication: 120
    sequence_health: 3600
    stat_statements: 180
    stat_statements_calls: 180
    table_io_stats: 600
    table_stats: 300
    wal: 60
    wal_size: 120
    kpi: 120
    wal_receiver: 120
The preset-configs.yaml file can be written according to your requirements; in other words, you can choose which metrics to collect and create your own metric and interval profile. We created the d-platform-default profile to collect the metrics we need at the intervals we want.
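For example, a new profile can be defined by appending another entry to preset-configs.yaml. The sketch below is only illustrative: the profile name and intervals are made up, while the metric names are the ones already used in the presets above.
cat >> /opt/pgwatch2/preset-configs.yaml <<'EOF'
- name: my-minimal-profile   # hypothetical profile name
  description: example custom profile - DB statistics, size and WAL only
  metrics:
    db_stats: 120
    db_size: 600
    wal: 120
EOF
The new profile can then be referenced from instances.yaml through the preset_metrics field, just like d-platform-default above.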
Note: We assume you have created a directory to store the pgwatch2 files with the right permissions. If not, you can use the /opt directory as we did:
mkdir -p /opt/pgwatch2/
chown postgres:postgres /opt/pgwatch2/ -R
chmod 750 /opt/pgwatch2/ -R
Secondly, we have to install the pgwatch2 daemon binary on the operating system. You can use the Debian package to install the daemon on DEB-based distros:
wget https://github.com/cybertec-postgresql/pgwatch2/releases/download/v1.8.0/pgwatch2_v1.8.0-SNAPSHOT-064fdaf_linux_64-bit.deb
sudo dpkg -i pgwatch2_v1.8.0-SNAPSHOT-064fdaf_linux_64-bit.deb
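As a quick sanity check, you can confirm that the package is registered and that the daemon binary (the path referenced later in the systemd unit) is in place:
# Confirm the package and the daemon binary are installed
dpkg -l | grep pgwatch2
ls -l /usr/bin/pgwatch2-daemon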
After installing the daemon, we create a systemd service file so that the daemon can be managed with systemd.
# This is an example of a systemD config file for pgwatch2.
# You can copy it to "/etc/systemd/system/pgwatch2.service", adjust as necessary and then call
# systemctl daemon-reload && systemctl start pgwatch2 && systemctl enable pgwatch2
# to start and also enable auto-start after reboot.
[Unit]
Description=Pgwatch2 Gathering Daemon
After=network-online.target
# If you're using the config DB approach and when on the same machine then it's a good idea to launch after Postgres
#After=postgresql@12-main.service
[Service]
Environment="PW2_PG_METRIC_STORE_CONN_STR=postgresql://pgwatch2:secret@metric_db_conn_ip:5432/pgwatch2_metric"
Environment="PW2_MIN_DB_SIZE_MB=1"
Environment="PW2_PG_RETENTION_DAYS=7"
User=root
Type=notify
ExecStart=/usr/bin/pgwatch2-daemon -c /opt/pgwatch2/instances.yaml -m /opt/pgwatch2/pgwatch2/metrics --datastore=postgres
Restart=on-failure
TimeoutStartSec=0
[Install]
WantedBy=multi-user.target
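Assuming the unit above was saved as /etc/systemd/system/pgwatch2.service, register and enable it with the usual systemd flow (we will start it later, once the metric database is ready):
# Reload systemd so it picks up the new unit and enable auto-start after reboot
sudo systemctl daemon-reload
sudo systemctl enable pgwatch2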
The daemon setup is now complete, so we can continue with the databases that need to be monitored. There are a few steps that have to be applied to each database in the PostgreSQL cluster we are working on.
We applied the following bash script to the primary nodes in the cluster. The script prepares the pgwatch2 role and its permissions on the databases.
#!/bin/bash
# Create the monitoring role once per cluster (run this on the primary node).
psql postgres -c "CREATE ROLE pgw2_user WITH LOGIN PASSWORD 'secret_pw'"
psql postgres -c "ALTER ROLE pgw2_user CONNECTION LIMIT 100"
psql postgres -c "GRANT pg_monitor TO pgw2_user"
# Grant the required privileges in every non-template database except postgres.
all="SELECT datname FROM pg_database WHERE datistemplate = false AND datname != 'postgres'"
psql postgres --no-align -t -c "${all}" | while read -r bd ; do
    echo "Processing ${bd}..."
    psql "${bd}" -c "GRANT pg_monitor TO pgw2_user"
    psql "${bd}" -c "GRANT CONNECT ON DATABASE ${bd} TO pgw2_user"
    psql "${bd}" -c "GRANT USAGE ON SCHEMA public TO pgw2_user"
    psql "${bd}" -c "GRANT EXECUTE ON FUNCTION pg_stat_file(text) TO pgw2_user"
    psql "${bd}" -c "CREATE EXTENSION IF NOT EXISTS pg_stat_statements"
done
Note: We assume that pg_stat_statements is already in your shared_preload_libraries and ready to use. If it is not, you need to add it and restart the PostgreSQL service.
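Two quick checks after running the script can save debugging time later: confirm that pg_stat_statements is preloaded and that the new role can actually log in (the password below is the placeholder used in the script):
# Check that pg_stat_statements is preloaded (changing it requires a restart)
psql postgres -c "SHOW shared_preload_libraries"
# Verify that the monitoring role can connect and run a trivial query
psql "host=localhost dbname=postgres user=pgw2_user password=secret_pw" -c "SELECT version()"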
TimescaleDB Preparation
Before starting, we also assume that you have a running TimescaleDB (PostgreSQL) cluster that will store the metrics sent by the pgwatch2 daemon, so we can move on to the schema deployment.
CREATE DATABASE pgwatch2_metric;
\c pgwatch2_metric
CREATE EXTENSION timescaledb;
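You can quickly confirm that the extension is active in the metric database before loading the pgwatch2 schema:
# The timescaledb extension should be listed with its version
psql -d pgwatch2_metric -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'timescaledb'"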
After creating the database and the extension, we can apply the pgwatch2 schema deployment. The schema can be found in the GitHub repository of the project, so we clone the repository and apply the psql scripts.
cd /opt/pgwatch2
git clone https://github.com/cybertec-postgresql/pgwatch2.git
chown postgres:postgres /opt/pgwatch2/ -R
chmod 750 /opt/pgwatch2/ -R
su - postgres
psql -f roll_out_timescale.psql pgwatch2_metric
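To make sure the rollout succeeded, you can list the schemas and tables the script created; the exact object names may differ between pgwatch2 versions, so treat this only as a quick visual check:
# List the schemas and tables created by the rollout script
psql pgwatch2_metric -c "\dn"
psql pgwatch2_metric -c "\dt"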
Now we are ready to start the pgwatch2 daemon and see the results in the metric database created above. To start collecting metrics:
- Make sure your pgwatch2 daemon service file contains the right metric database connection string.
- Ensure that you can access TimescaleDB (the metric DB) from your PostgreSQL node:
psql -h metric_db_connection_ip -p postgresql_port -U pgwatch2 -d pgwatch2_metric
If you can connect to the database with the command above, you can start the pgwatch2 daemon on the PostgreSQL instance:
systemctl start pgwatch2
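Standard systemd checks are enough to see whether the daemon came up cleanly and is shipping metrics:
# Check the service state and follow the daemon logs
systemctl status pgwatch2
journalctl -u pgwatch2 -f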
After starting the daemon, we can see that the metric tables are ready and the metrics can be queried.
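As a quick illustration, assuming the db_stats metric from the preset above is enabled and the metric tables follow the default time/dbname/data layout of the Postgres metric store, the freshest rows can be inspected directly in TimescaleDB:
# Peek at the most recent db_stats rows in the metric database
psql -h metric_db_connection_ip -U pgwatch2 -d pgwatch2_metric -c "SELECT time, dbname, data FROM db_stats ORDER BY time DESC LIMIT 5"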
To summarize so far:
- Create directories for the pgwatch2 daemon on the PostgreSQL instance
- Create the instances.yaml and preset-configs.yaml files on the PostgreSQL instance
- Download and install the pgwatch2 daemon
- Prepare the systemd unit file for the pgwatch2 daemon
- Create and configure TimescaleDB for metrics
- Ensure that the database connection from the PostgreSQL node to the TimescaleDB cluster works
- Start the pgwatch2 daemon service on the PostgreSQL node
Because pgwatch2 can be extended according to your database environment, the steps and configuration may differ in your setup. To dig deeper, you can always check the official documentation.
Grafana
Grafana is the last part of this blog post. We created a data source in Grafana and added the panels we want to analyze. You can find all dashboard templates in the GitHub repository of pgwatch2.
Adding a PostgreSQL data source to Grafana is not a complex configuration as long as Grafana can reach the metric database.
After that, you can add your panels based on the dashboards in the official pgwatch2 repository.
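To give an idea of what a panel query can look like, here is a hypothetical example against the db_stats metric, run through psql for testing before putting it into a panel (numbackends comes from pg_stat_database, which db_stats is based on; adjust the column and database name to your own setup):
# Hypothetical panel-style query: backend count over the last hour for one database
psql -h metric_db_connection_ip -U pgwatch2 -d pgwatch2_metric -c "SELECT time, (data ->> 'numbackends')::int AS connections FROM db_stats WHERE dbname = 'your_database' AND time > now() - interval '1 hour' ORDER BY time"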
Conclusion
To sum up, in this article we tried to explain how we extended our PostgreSQL monitoring infrastructure using the pgwatch2 tool and adapted it to our needs.
- Integrated with Prometheus, our monitoring infrastructure has become more detailed; database- and VM-specific metrics are now visible.
- We built an infrastructure that can be used for as-code and custom healthcheck solutions. This way, we can define database- and metric-specific healthcheck processes and put them into code.
- We can add our own metrics to the monitoring infrastructure and keep improving it, and these metrics work integrated with our existing setup.
Pgwatch2 is one of the best monitoring tools available for PostgreSQL and is flexible enough in its configuration to meet your needs.
Apart from the architecture and technologies we used while designing this infrastructure, there are many other architectural options and technologies that can be used with this tool. You can find more details in the product's official documentation and GitHub repo. In our study, we chose the architecture that, after our evaluations, we considered the best fit for the Trendyol production environment.
Regards.