Monitor your node with Prometheus and Grafana
When running a node, it's crucial to set up effective monitoring to ensure quick response times in case of failures and maintain high uptime. This topic contains instructions for installing and configuring Prometheus and Grafana to monitor your node effectively.
Installing Prometheus and Grafana
One popular monitoring solution is the open-source combination of Prometheus and Grafana. Prometheus collects metrics from Postchain and other sources, while Grafana visualizes these metrics and triggers alerts for undesirable states. Follow the installation guides for Grafana and Prometheus.
Configure Prometheus
You configure the Prometheus server by editing the prometheus.yml file, which contains the various settings for the Prometheus tool. For more information, see the configuration guide.
- Start Postchain with Prometheus metrics exposed:
  - Use the following command to expose Prometheus metrics on port 9190:

        chr node start -p metrics.prometheus.port=9190

  - Alternatively, set metrics.prometheus.port=9190 in your node.properties file and start Postchain with:

        chr node start -np node.properties

- Configure Prometheus to collect metrics from Postchain:
  - With subnodes: If Postchain runs with subnodes enabled, each Docker container (master node and subnodes) exposes its own Prometheus metrics. To collect metrics from all containers, you can use the metrics-collector utility. Run the utility as a Docker container:

        docker run \
          --detach \
          --name metrics-collector \
          --restart unless-stopped \
          --volume /var/run/docker.sock:/var/run/docker.sock \
          --publish 8080:8080/tcp \
          --env PROMETHEUS_PORT_SERVER=9190 \
          --env PROMETHEUS_PORT_SUBNODE=9190 \
          registry.gitlab.com/chromaway/core-tools/metrics-collector:latest

    Then, configure Prometheus to scrape the metrics from the utility by adding the following to your prometheus.yml file:

        scrape_configs:
          - job_name: appnet
            static_configs:
              - targets:
                  - 127.0.0.1:9190

  - Without subnodes: If Postchain runs without subnodes, you can configure Prometheus to scrape the metrics directly from the single Docker container by adding the following to your prometheus.yml file:

        scrape_configs:
          - job_name: appnet
            static_configs:
              - targets:
                  - 127.0.0.1:9190

    Replace 127.0.0.1 with your node's hostname or IP address, and ensure the port number is correct.

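If you monitor several nodes, writing the target list by hand becomes error-prone. The following is a minimal sketch, not part of the Chromia tooling, that renders a scrape_configs fragment in the same shape as the examples above. The function name render_scrape_config and the example hostnames are assumptions made for illustration.

```python
def render_scrape_config(targets, job_name="appnet"):
    """Return a prometheus.yml scrape_configs fragment for host:port targets."""
    if not targets:
        raise ValueError("at least one target is required")
    lines = [
        "scrape_configs:",
        f"  - job_name: {job_name}",
        "    static_configs:",
        "      - targets:",
    ]
    for target in targets:
        # Every target must look like host:port, for example 127.0.0.1:9190.
        host, sep, port = target.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {target!r}")
        lines.append(f"          - {target}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical hostnames; replace them with your own nodes.
    print(render_scrape_config(["node1.example.com:9190", "node2.example.com:9190"]), end="")
```

Paste the printed fragment into prometheus.yml, or merge the targets into an existing job if you already have one.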
Start Prometheus
Enter the following command to expose Prometheus metrics on a specific port (for example, 9190).
chr node start -p metrics.prometheus.port=9190
The command line provides details on the startup process and other services. It should also indicate that the service is listening on port 9190.
You can also use the following command:
chr node start -np node.properties
In this case, make sure that the metrics.prometheus.port=9190 property is set in the node.properties file.
You can verify that the node serves Prometheus metrics by navigating to its metrics endpoint: localhost:9190/metrics.
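The same check can be scripted. The sketch below assumes the endpoint returns the standard Prometheus text exposition format and uses only the Python standard library. The helper names fetch_metrics and parse_metrics, as well as the sample lines, are illustrative, and the parser is deliberately simplified (it ignores metric metadata and any optional timestamps).

```python
import urllib.request


def fetch_metrics(url="http://localhost:9190/metrics", timeout=5):
    """Download the raw metrics text from the node's endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_metrics(text):
    """Return (series, value) pairs from Prometheus text exposition output."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        end_of_labels = line.rfind("}")
        if end_of_labels != -1:
            series = line[: end_of_labels + 1]
            rest = line[end_of_labels + 1 :].split()
        else:
            series, *rest = line.split()
        # rest[0] is the sample value; an optional timestamp may follow it.
        samples.append((series, float(rest[0])))
    return samples


# Hypothetical sample output, shaped after the metrics table on this page.
SAMPLE = """\
# TYPE blockchains gauge
blockchains{node_pubkey="02ab"} 3.0
transaction_queue_size{chainIID="1",blockchainRID="AB12",node_pubkey="02ab"} 0.0
"""

if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE):
        print(series, value)
```

Against a running node, call parse_metrics(fetch_metrics()) instead of parsing the sample text.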
Postchain metrics
Postchain nodes expose metrics directly to Prometheus. You can configure this using the metrics.prometheus.port property in the node configuration or the POSTCHAIN_PROMETHEUS_PORT environment variable. These metrics show the individual status and performance of each node.
Metrics can differ between nodes, so aggregating them across the cluster might not accurately reflect overall performance.
Name | Type | Description | Tags |
---|---|---|---|
blockchains | gauge | Number of blockchains currently operational on the node. | node_pubkey |
subnodes | gauge | Count of subnodes anticipated to be running on the node. | node_pubkey |
containers | gauge | Quantity of containers expected to be active on the node. | node_pubkey |
submitted_transactions,result=OK | timer | Transactions successfully submitted by clients and queued. | chainIID, blockchainRID, node_pubkey |
submitted_transactions,result=INVALID | timer | Transactions submitted by clients but rejected due to invalidity. | chainIID, blockchainRID, node_pubkey |
submitted_transactions,result=DUPLICATE | timer | Transactions rejected for being duplicates. | chainIID, blockchainRID, node_pubkey |
submitted_transactions,result=FULL | timer | Transactions rejected due to a full queue. | chainIID, blockchainRID, node_pubkey |
transaction_queue_size | gauge | Current size of the transaction queue. | chainIID, blockchainRID, node_pubkey |
processed_transactions,result=ACCEPTED | timer | Transactions extracted from the queue and appended to an incomplete block. | chainIID, blockchainRID, node_pubkey |
processed_transactions,result=REJECTED | timer | Transactions extracted from the queue but rejected. | chainIID, blockchainRID, node_pubkey |
blocks | timer | Blocks built by the node. | chainIID, blockchainRID, node_pubkey |
signedBlocks | timer | Blocks signed by the node. | chainIID, blockchainRID, node_pubkey |
confirmedBlocks | timer | Blocks confirmed, reflecting consensus among cluster nodes. | chainIID, blockchainRID, node_pubkey |
confirmedTransactions | counter | Confirmed transactions, indicative of cluster-wide consensus. | chainIID, blockchainRID, node_pubkey |
blockHeight | counter | Current height of the blockchain. | chainIID, blockchainRID, node_pubkey |
RevoltsOnNode | counter | Count of revolts directed towards the node. | node_pubkey |
RevoltsByNode | counter | Number of revolts initiated by the node. | node_pubkey |
RevoltsBetweenOtherNodes | counter | Revolts occurring between other nodes. | node_pubkey |
queries,result=success | timer | Time taken for successful queries. | chainIID, blockchainRID, queryName, node_pubkey |
queries,result=failure | timer | Time taken for failed queries. | chainIID, blockchainRID, queryName, node_pubkey |
validatorFastSyncSwitch | counter | Instances of fast sync initiated by the validator. | chainIID, blockchainRID, nodeBlockState, node_pubkey |
ebftResponseTime | timer | Response time for EBFT messages. (Enabled if tracked_ebft_message_max_keep_time_ms is set) | chainIID, blockchainRID, sourceNode, targetNode, messageType, node_pubkey |
statusChangeTime | timer | Duration of status changes for different nodes. (Enabled if tracked_ebft_message_max_keep_time_ms is set) | chainIID, blockchainRID, sourceNode, targetNode, nodeBlockState, node_pubkey |
Metric tags
- node_pubkey: Hex-encoded node public key.
- chainIID: Numeric blockchain IID (specific to the node).
- blockchainRID: Hex-encoded global blockchain RID.
- queryName: Name of the query.
- nodeBlockState: State of the node block.
- messageType: Type of EBFT message.
- sourceNode: Public key of the source node.
- targetNode: Public key of the target node.
In addition to these metrics, a standard set of JVM and machine metrics is also exposed.
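The names in the table are not always the exact series names you query. Judging by the sample dashboard on this page, counters appear with a _total suffix (for example blockHeight_total) and timers appear as <name>_seconds_count, with the result part of the name exposed as a label. The sketch below builds PromQL expressions in that style. The helper names selector and per_second_rate are assumptions for illustration, so check the series names against your own metrics endpoint.

```python
def selector(metric, **labels):
    """Build a PromQL selector such as name{key="value"}."""
    if not labels:
        return metric
    parts = []
    for key, value in sorted(labels.items()):
        # Escape backslashes and double quotes inside label values.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{key}="{escaped}"')
    return metric + "{" + ",".join(parts) + "}"


def per_second_rate(metric, window="1m", **labels):
    """Wrap a selector in rate() over the given time window."""
    return f"rate({selector(metric, **labels)}[{window}])"


if __name__ == "__main__":
    # Timer from the table: processed_transactions,result=ACCEPTED
    print(per_second_rate("processed_transactions_seconds_count",
                          blockchainRID="AB12", result="ACCEPTED"))
    # Counter from the table: blockHeight
    print(per_second_rate("blockHeight_total", window="5m", blockchainRID="AB12"))
```

The value AB12 is a placeholder for a real blockchain RID.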
Explore available metrics
Once Prometheus is configured and collecting metrics from Postchain, you can explore the available metrics on the Grafana Explore page or when creating dashboards and alerts. Some valuable metrics include blockHeight_total (insights into running blockchains and their heights) and tx_count_total (total transactions processed).
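You can also list the metric names that Prometheus has collected without opening Grafana. The sketch below uses the Prometheus HTTP API endpoint /api/v1/label/__name__/values and assumes Prometheus listens on its default port 9090 on the local machine. The function names and the sample payload are illustrative.

```python
import json
import urllib.request


def metric_names_url(base_url="http://localhost:9090"):
    """URL of the Prometheus API call that lists all metric names."""
    return base_url.rstrip("/") + "/api/v1/label/__name__/values"


def extract_names(payload, prefix=""):
    """Return the sorted metric names from an API response, filtered by prefix."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus returned status {payload.get('status')!r}")
    return sorted(name for name in payload.get("data", []) if name.startswith(prefix))


def list_metric_names(base_url="http://localhost:9090", prefix="", timeout=5):
    """Query a running Prometheus server for its metric names."""
    with urllib.request.urlopen(metric_names_url(base_url), timeout=timeout) as response:
        return extract_names(json.load(response), prefix)


if __name__ == "__main__":
    # Hypothetical response body, used here so the example runs offline.
    sample = {"status": "success", "data": ["up", "blockHeight_total", "blocks_seconds_count"]}
    print(extract_names(sample, prefix="block"))
```

With a running server, list_metric_names(prefix="block") returns the live names instead of the sample.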
Import a sample dashboard
To get started with visualizing Postchain metrics, you can import the following sample dashboard into Grafana:
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"target": {
"limit": 100,
"matchAny": false,
"tags": [],
"type": "dashboard"
},
"type": "dashboard"
}
]
},
"description": "",
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": 89,
"links": [],
"liveNow": false,
"panels": [
{
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 0
},
"id": 58,
"panels": [
{
"datasource": {
"type": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 7,
"w": 6,
"x": 0,
"y": 1
},
"id": 59,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"repeat": "blockchainRID",
"repeatDirection": "h",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "ec3962df-ff53-4482-95c9-3704c248c0eb"
},
"disableTextWrap": false,
"editorMode": "code",
"exemplar": false,
"expr": "blockHeight_total{blockchainRID=\"$blockchainRID\"}",
"fullMetaSearch": false,
"includeNullMetadata": true,
"interval": "",
"legendFormat": "{{ instance }}",
"range": true,
"refId": "A",
"useBackend": false
}
],
"title": "$blockchainRID",
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "(.*?)\\..*?chromia\\.dev:[0-9]?.*",
"renamePattern": "$1"
}
}
],
"type": "timeseries"
}
],
"title": "blockchain height",
"type": "row"
},
{
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 1
},
"id": 175,
"panels": [
{
"datasource": {
"type": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "blocks/s",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"min": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 6,
"x": 0,
"y": 2
},
"id": 180,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"repeat": "blockchainRID",
"repeatDirection": "h",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "ec3962df-ff53-4482-95c9-3704c248c0eb"
},
"editorMode": "code",
"expr": "rate(blockHeight_total{blockchainRID=\"$blockchainRID\"}[5m])",
"legendFormat": "{{ instance }}",
"range": true,
"refId": "A"
}
],
"title": "$blockchainRID",
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "(.*?)\\..*",
"renamePattern": "$1"
}
}
],
"type": "timeseries"
}
],
"title": "blockchain height rate",
"type": "row"
},
{
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 2
},
"id": 202,
"panels": [
{
"datasource": {
"type": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "number of transactions in queue",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 6,
"x": 0,
"y": 3
},
"id": 203,
"options": {
"legend": {
"calcs": ["min", "mean", "max"],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"repeat": "blockchainRID",
"repeatDirection": "h",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "ec3962df-ff53-4482-95c9-3704c248c0eb"
},
"editorMode": "code",
"expr": "transaction_queue_size{blockchainRID=\"$blockchainRID\"}",
"legendFormat": "{{ instance }}, {{ container }}",
"range": true,
"refId": "A"
}
],
"title": "$blockchainRID",
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "(.*?)\\..*?chromia\\.dev:[0-9]?.*, (.*)",
"renamePattern": "$1 ($2)"
}
}
],
"type": "timeseries"
}
],
"title": "transaction queue size",
"type": "row"
},
{
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 3
},
"id": 38,
"panels": [
{
"datasource": {
"type": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "transactions/s",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 6,
"x": 0,
"y": 4
},
"id": 93,
"options": {
"legend": {
"calcs": ["min", "mean", "max"],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"repeat": "blockchainRID",
"repeatDirection": "h",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "ec3962df-ff53-4482-95c9-3704c248c0eb"
},
"editorMode": "code",
"expr": "rate(processed_transactions_seconds_count{blockchainRID=\"$blockchainRID\", result=\"ACCEPTED\"}[1m])",
"legendFormat": "{{ instance }}, {{ container }}",
"range": true,
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "ec3962df-ff53-4482-95c9-3704c248c0eb"
},
"editorMode": "code",
"expr": "sum(rate(processed_transactions_seconds_count{blockchainRID=\"$blockchainRID\", result=\"ACCEPTED\"}[1m]))",
"hide": false,
"instant": false,
"legendFormat": "Accepted Txs - all nodes",
"range": true,
"refId": "B"
},
{
"datasource": {
"type": "prometheus",
"uid": "ec3962df-ff53-4482-95c9-3704c248c0eb"
},
"editorMode": "code",
"expr": "sum(rate(submitted_transactions_seconds_count{blockchainRID=\"$blockchainRID\", result=\"OK\"}[1m]))",
"hide": false,
"instant": false,
"legendFormat": "Submitted Txs - all nodes",
"range": true,
"refId": "C"
}
],
"title": "$blockchainRID",
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "(.*?)\\..*?chromia\\.dev:[0-9]?.*, (.*)",
"renamePattern": "$1 ($2)"
}
}
],
"type": "timeseries"
}
],
"title": "accepted transactions",
"type": "row"
},
{
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 4
},
"id": 186,
"panels": [
{
"datasource": {
"type": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "transactions/s",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 6,
"x": 0,
"y": 5
},
"id": 192,
"options": {
"legend": {
"calcs": ["min", "mean", "max"],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"repeat": "blockchainRID",
"repeatDirection": "h",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "ec3962df-ff53-4482-95c9-3704c248c0eb"
},
"editorMode": "code",
"expr": "rate(confirmedTransactions_total{blockchainRID=\"$blockchainRID\"}[1m])",
"legendFormat": "{{ instance }}, {{ container }}",
"range": true,
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "ec3962df-ff53-4482-95c9-3704c248c0eb"
},
"editorMode": "code",
"expr": "avg(rate(confirmedTransactions_total{blockchainRID=\"$blockchainRID\"}[1m]))",
"hide": false,
"instant": false,
"legendFormat": "tps",
"range": true,
"refId": "B"
}
],
"title": "$blockchainRID",
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "(.*?)\\..*?chromia\\.dev:[0-9]?.*, (.*)",
"renamePattern": "$1 ($2)"
}
}
],
"type": "timeseries"
}
],
"title": "confirmed transactions",
"type": "row"
}
],
"refresh": "10s",
"schemaVersion": 38,
"tags": [],
"templating": {
"list": [
{
"current": {
"selected": true,
"text": ["All"],
"value": ["$__all"]
},
"datasource": {
"type": "prometheus",
"uid": "ec3962df-ff53-4482-95c9-3704c248c0eb"
},
"definition": "label_values(processed_transactions_seconds_count,blockchainRID)",
"hide": 0,
"includeAll": true,
"label": "BRID",
"multi": true,
"name": "blockchainRID",
"options": [],
"query": {
"qryType": 1,
"query": "label_values(processed_transactions_seconds_count,blockchainRID)",
"refId": "PrometheusVariableQueryEditor-VariableQuery"
},
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
}
]
},
"time": {
"from": "now-30m",
"to": "now"
},
"timepicker": {},
"timezone": "utc",
"title": "my postchain node",
"version": 4,
"weekStart": ""
}
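The sample dashboard pins its Prometheus queries to a specific data source UID (ec3962df-ff53-4482-95c9-3704c248c0eb). If the Prometheus data source in your Grafana instance has a different UID, the panels may not resolve their data source after import. The sketch below rewrites every Prometheus data source reference in a dashboard document before you import it. The function name set_datasource_uid and the file names in the usage note are assumptions.

```python
import json


def set_datasource_uid(node, new_uid, ds_type="prometheus"):
    """Point every datasource of ds_type at new_uid; return how many were changed."""
    changed = 0
    if isinstance(node, dict):
        datasource = node.get("datasource")
        if isinstance(datasource, dict) and datasource.get("type") == ds_type:
            datasource["uid"] = new_uid
            changed += 1
        # Recurse into nested panels, targets, and templating entries.
        for value in node.values():
            changed += set_datasource_uid(value, new_uid, ds_type)
    elif isinstance(node, list):
        for item in node:
            changed += set_datasource_uid(item, new_uid, ds_type)
    return changed


if __name__ == "__main__":
    # Small stand-in for the dashboard JSON above.
    dashboard = {
        "annotations": {"list": [{"datasource": {"type": "grafana", "uid": "-- Grafana --"}}]},
        "panels": [{
            "datasource": {"type": "prometheus"},
            "targets": [{"datasource": {"type": "prometheus", "uid": "old-uid"}}],
        }],
    }
    count = set_datasource_uid(dashboard, "my-prometheus-uid")
    print(count, json.dumps(dashboard["panels"][0]["targets"][0]["datasource"]))
```

To apply it to the full dashboard, load the JSON with json.load from a file such as dashboard.json, call set_datasource_uid with the UID of your own data source, and write the result back with json.dump.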
Set up alerts
Grafana allows you to set up alerts based on specific metrics and conditions. For example, you can create an alert to notify you if a node is not producing or retrieving blocks. While alert definitions cannot be imported directly into Grafana, you can use the following alert definition as a reference to set up your own alert using the Grafana web interface or API:
{
"apiVersion": 1,
"groups": [
{
"orgId": 1,
"name": "1m",
"folder": "alert",
"interval": "1m",
"rules": [
{
"title": "node is not producing or retrieving blocks",
"condition": "D",
"data": [
{
"refId": "A",
"relativeTimeRange": {
"from": 600,
"to": 0
},
"model": {
"editorMode": "code",
"exemplar": false,
"expr": "rate(blockHeight_total[10m])",
"instant": true,
"intervalMs": 1000,
"legendFormat": "__auto",
"maxDataPoints": 43200,
"range": false,
"refId": "A"
}
},
{
"refId": "D",
"relativeTimeRange": {
"from": 600,
"to": 0
},
"datasourceUid": "__expr__",
"model": {
"conditions": [
{
"evaluator": {
"params": [0, 0],
"type": "within_range"
},
"operator": {
"type": "and"
},
"query": {
"params": []
},
"reducer": {
"params": [],
"type": "avg"
},
"type": "query"
}
],
"datasource": {
"name": "Expression",
"type": "__expr__",
"uid": "__expr__"
},
"expression": "A",
"intervalMs": 1000,
"maxDataPoints": 43200,
"refId": "D",
"type": "threshold"
}
}
],
"noDataState": "OK",
"execErrState": "Error",
"for": "10m",
"annotations": {
"description": "Node {{ $labels.instance }} is not producing or retrieving blocks on {{ $labels.blockchainRID }}.",
"runbook_url": "",
"summary": ""
},
"isPaused": false
}
]
}
]
}
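Before you build the alert in Grafana, it can help to test the underlying query against Prometheus directly. The sketch below runs the same expression as query A of the definition above, rate(blockHeight_total[10m]), through the Prometheus HTTP API and lists the series whose rate is zero. It assumes Prometheus is reachable at localhost:9090, and it does not reproduce Grafana's evaluation interval, its "for" period, or its threshold semantics. The function names and the sample payload are illustrative.

```python
import json
import urllib.parse
import urllib.request

STALL_QUERY = "rate(blockHeight_total[10m])"


def query_url(base_url="http://localhost:9090", query=STALL_QUERY):
    """Build the instant-query URL for the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": query})


def stalled_series(payload):
    """Return (instance, blockchainRID) pairs whose block height did not grow."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus returned status {payload.get('status')!r}")
    stalled = []
    for series in payload["data"]["result"]:
        _timestamp, raw_value = series["value"]  # the value arrives as a string
        if float(raw_value) == 0.0:
            labels = series["metric"]
            stalled.append((labels.get("instance", "?"), labels.get("blockchainRID", "?")))
    return stalled


def check(base_url="http://localhost:9090", timeout=5):
    """Run the query against a live Prometheus server."""
    with urllib.request.urlopen(query_url(base_url), timeout=timeout) as response:
        return stalled_series(json.load(response))


if __name__ == "__main__":
    # Hypothetical response, used here so the example runs offline.
    sample = {"status": "success", "data": {"resultType": "vector", "result": [
        {"metric": {"instance": "node1:9190", "blockchainRID": "AB12"}, "value": [1700000000, "0"]},
        {"metric": {"instance": "node2:9190", "blockchainRID": "AB12"}, "value": [1700000000, "0.5"]},
    ]}}
    print(stalled_series(sample))
```

If check() reports a series for a node you know is healthy, revisit the query window before turning it into an alert rule.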