Monitor your node with Prometheus and Grafana

When running a node, it's crucial to set up effective monitoring to ensure quick response times in case of failures and maintain high uptime. This topic contains instructions for installing and configuring Prometheus and Grafana to monitor your node effectively.

Installing Prometheus and Grafana

One popular monitoring solution is the open-source combination of Prometheus and Grafana. Prometheus collects metrics from Postchain and other sources, while Grafana visualizes these metrics and triggers alerts for undesirable states. Follow the installation guides for Grafana and Prometheus.

Configure Prometheus

You configure the Prometheus server by editing the prometheus.yml file, which defines settings such as the scrape targets and scrape intervals. For more information, see the configuration guide.

  1. Start Postchain with Prometheus metrics exposed:

    • Use the command chr node start -p metrics.prometheus.port=9190 to expose Prometheus metrics on port 9190.

    • Alternatively, you can set metrics.prometheus.port=9190 in your node.properties file and start Postchain with chr node start -np node.properties.

  2. Configure Prometheus to collect metrics from Postchain:

    • With subnodes: If your Postchain is running with subnodes enabled, each Docker container (master node and subnodes) will expose its own Prometheus metrics. To collect metrics from all containers, you can use the metrics-collector utility. Run the utility as a Docker container:

      docker run \
      --detach \
      --name metrics-collector \
      --restart unless-stopped \
      --volume /var/run/docker.sock:/var/run/docker.sock \
      --publish 8080:8080/tcp \
      --env PROMETHEUS_PORT_SERVER=9190 \
      --env PROMETHEUS_PORT_SUBNODE=9190 \
      registry.gitlab.com/chromaway/core-tools/metrics-collector:latest

      Then, configure Prometheus to scrape the metrics from the utility by adding the following to your prometheus.yml file:

      scrape_configs:
        - job_name: appnet
          static_configs:
            - targets:
                - 127.0.0.1:9190

    • Without subnodes: If your Postchain is running without subnodes, you can configure Prometheus to scrape the metrics directly from the single Docker container by adding the following to your prometheus.yml file:

      scrape_configs:
        - job_name: appnet
          static_configs:
            - targets:
                - 127.0.0.1:9190

      Replace 127.0.0.1 with your node's hostname or IP address, and ensure the port number is correct.

Start Prometheus

Enter the following command to start the node and expose its Prometheus metrics on a specific port (for example, 9190).

chr node start -p metrics.prometheus.port=9190

The command output shows details of the startup process and other services. It should also indicate that the service is listening on port 9190.

You can also use the following command:

chr node start -np node.properties

In this case, make sure that the metrics.prometheus.port=9190 property is set in the node.properties file.

You can verify that the node serves metrics by navigating to its metrics endpoint: localhost:9190/metrics.
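The same check can be scripted. The following is a minimal sketch, not part of the official tooling: it assumes the endpoint returns the standard Prometheus text format, and the helper names (fetch_metrics, metric_names, missing_metrics) and the SAMPLE text are hypothetical, included only so the sketch runs without a live node.

```python
import urllib.request


def fetch_metrics(url="http://localhost:9190/metrics", timeout=5.0):
    """Download the raw text exposition from the node's metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like: name{label="value"} 123  or  name 123
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


def missing_metrics(exposition_text, expected):
    """List the expected metric names that the endpoint did not report."""
    found = metric_names(exposition_text)
    return sorted(name for name in expected if name not in found)


# Hypothetical sample output (made-up label values), not captured from a real node.
SAMPLE = """\
# HELP blockHeight_total
# TYPE blockHeight_total counter
blockHeight_total{blockchainRID="AB12",chainIID="0"} 42.0
transaction_queue_size{blockchainRID="AB12",chainIID="0"} 3.0
"""

if __name__ == "__main__":
    # Against a running node, use: missing_metrics(fetch_metrics(), [...])
    print(missing_metrics(SAMPLE, ["blockHeight_total", "transaction_queue_size"]))
```

Against a running node, pass the result of fetch_metrics() instead of SAMPLE. An empty list means every expected metric was reported.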

Postchain metrics

Postchain nodes directly provide metrics to Prometheus. You can configure this using the metrics.prometheus.port property in the node configuration or the POSTCHAIN_PROMETHEUS_PORT environment variable. These metrics show the individual status and performance of each node.

Note: Metrics can differ between nodes, so aggregating them across the cluster might not accurately reflect overall performance.

| Name | Type | Description | Tags |
| --- | --- | --- | --- |
| blockchains | gauge | Number of blockchains currently operational on the node. | node_pubkey |
| subnodes | gauge | Count of subnodes anticipated to be running on the node. | node_pubkey |
| containers | gauge | Quantity of containers expected to be active on the node. | node_pubkey |
| submitted_transactions,result=OK | timer | Transactions successfully submitted by clients and queued. | chainIID, blockchainRID, node_pubkey |
| submitted_transactions,result=INVALID | timer | Transactions submitted by clients but rejected due to invalidity. | chainIID, blockchainRID, node_pubkey |
| submitted_transactions,result=DUPLICATE | timer | Transactions rejected for being duplicates. | chainIID, blockchainRID, node_pubkey |
| submitted_transactions,result=FULL | timer | Transactions rejected due to a full queue. | chainIID, blockchainRID, node_pubkey |
| transaction_queue_size | gauge | Current size of the transaction queue. | chainIID, blockchainRID, node_pubkey |
| processed_transactions,result=ACCEPTED | timer | Transactions extracted from the queue and appended to an incomplete block. | chainIID, blockchainRID, node_pubkey |
| processed_transactions,result=REJECTED | timer | Transactions extracted from the queue but rejected. | chainIID, blockchainRID, node_pubkey |
| blocks | timer | Blocks built by the node. | chainIID, blockchainRID, node_pubkey |
| signedBlocks | timer | Blocks signed by the node. | chainIID, blockchainRID, node_pubkey |
| confirmedBlocks | timer | Blocks confirmed, reflecting consensus among cluster nodes. | chainIID, blockchainRID, node_pubkey |
| confirmedTransactions | counter | Confirmed transactions, indicative of cluster-wide consensus. | chainIID, blockchainRID, node_pubkey |
| blockHeight | counter | Current height of the blockchain. | chainIID, blockchainRID, node_pubkey |
| RevoltsOnNode | counter | Count of revolts directed towards the node. | node_pubkey |
| RevoltsByNode | counter | Number of revolts initiated by the node. | node_pubkey |
| RevoltsBetweenOtherNodes | counter | Revolts occurring between other nodes. | node_pubkey |
| queries,result=success | timer | Time taken for successful queries. | chainIID, blockchainRID, queryName, node_pubkey |
| queries,result=failure | timer | Time taken for failed queries. | chainIID, blockchainRID, queryName, node_pubkey |
| validatorFastSyncSwitch | counter | Instances of fast sync initiated by the validator. | chainIID, blockchainRID, nodeBlockState, node_pubkey |
| ebftResponseTime | timer | Response time for EBFT messages. (Enabled if tracked_ebft_message_max_keep_time_ms is set) | chainIID, blockchainRID, sourceNode, targetNode, messageType, node_pubkey |
| statusChangeTime | timer | Duration of status changes for different nodes. (Enabled if tracked_ebft_message_max_keep_time_ms is set) | chainIID, blockchainRID, sourceNode, targetNode, nodeBlockState, node_pubkey |

Metric tags

  • node_pubkey: Hex-encoded node public key.
  • chainIID: Numeric blockchain IID (specific to the node).
  • blockchainRID: Hex-encoded global blockchain RID.
  • queryName: Name of the query.
  • nodeBlockState: State of the node block.
  • messageType: Type of EBFT message.
  • sourceNode: Public key of the source node.
  • targetNode: Public key of the target node.

In addition to these metrics, a standard set of JVM and machine metrics is also exposed.

Explore available metrics

Once Prometheus is configured and collecting metrics from Postchain, you can explore the available metrics on the Grafana Explore page or when creating dashboards and alerts. Some valuable metrics include blockHeight_total (the height of each running blockchain) and confirmedTransactions_total (the total number of confirmed transactions).
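The names you query in Prometheus differ slightly from the names in the metrics table: the sample dashboard on this page queries blockHeight_total for the blockHeight counter and processed_transactions_seconds_count for the processed_transactions timer. The sketch below captures that mapping as a rough guide. The prometheus_series helper is hypothetical, and the _seconds_sum and _seconds_max suffixes are an assumption based on common Micrometer-style naming, so confirm the exact names against your node's /metrics output.

```python
# Only "_seconds_count" appears in the sample dashboard; the other two
# suffixes are ASSUMED from common Micrometer-style timer naming.
TIMER_SUFFIXES = ("_seconds_count", "_seconds_sum", "_seconds_max")


def prometheus_series(name, metric_type):
    """Map a table name and type to the series names Prometheus likely exposes."""
    if metric_type == "counter":
        # Seen in the dashboard: blockHeight -> blockHeight_total
        return [name + "_total"]
    if metric_type == "timer":
        # Seen in the dashboard: processed_transactions -> processed_transactions_seconds_count
        return [name + suffix for suffix in TIMER_SUFFIXES]
    if metric_type == "gauge":
        # Seen in the dashboard: transaction_queue_size is queried unchanged
        return [name]
    raise ValueError(f"unknown metric type: {metric_type}")


if __name__ == "__main__":
    for table_name, table_type in [
        ("blockHeight", "counter"),
        ("processed_transactions", "timer"),
        ("transaction_queue_size", "gauge"),
    ]:
        print(table_name, "->", prometheus_series(table_name, table_type))
```

Tags such as result=ACCEPTED become label matchers in a query, for example processed_transactions_seconds_count{result="ACCEPTED"}, as the dashboard queries show.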

Import a sample dashboard

To get started with visualizing Postchain metrics, you can import the following sample dashboard into Grafana:

{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"target": {
"limit": 100,
"matchAny": false,
"tags": [],
"type": "dashboard"
},
"type": "dashboard"
}
]
},
"description": "",
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": 89,
"links": [],
"liveNow": false,
"panels": [
{
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 0
},
"id": 58,
"panels": [
{
"datasource": {
"type": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 7,
"w": 6,
"x": 0,
"y": 1
},
"id": 59,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"repeat": "blockchainRID",
"repeatDirection": "h",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "ec3962df-ff53-4482-95c9-3704c248c0eb"
},
"disableTextWrap": false,
"editorMode": "code",
"exemplar": false,
"expr": "blockHeight_total{blockchainRID=\"$blockchainRID\"}",
"fullMetaSearch": false,
"includeNullMetadata": true,
"interval": "",
"legendFormat": "{{ instance }}",
"range": true,
"refId": "A",
"useBackend": false
}
],
"title": "$blockchainRID",
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "(.*?)\\..*?chromia\\.dev:[0-9]?.*",
"renamePattern": "$1"
}
}
],
"type": "timeseries"
}
],
"title": "blockchain height",
"type": "row"
},
{
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 1
},
"id": 175,
"panels": [
{
"datasource": {
"type": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "blocks/s",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"min": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 6,
"x": 0,
"y": 2
},
"id": 180,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"repeat": "blockchainRID",
"repeatDirection": "h",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "ec3962df-ff53-4482-95c9-3704c248c0eb"
},
"editorMode": "code",
"expr": "rate(blockHeight_total{blockchainRID=\"$blockchainRID\"}[5m])",
"legendFormat": "{{ instance }}",
"range": true,
"refId": "A"
}
],
"title": "$blockchainRID",
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "(.*?)\\..*",
"renamePattern": "$1"
}
}
],
"type": "timeseries"
}
],
"title": "blockchain height rate",
"type": "row"
},
{
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 2
},
"id": 202,
"panels": [
{
"datasource": {
"type": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "number of transactions in queue",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 6,
"x": 0,
"y": 3
},
"id": 203,
"options": {
"legend": {
"calcs": ["min", "mean", "max"],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"repeat": "blockchainRID",
"repeatDirection": "h",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "ec3962df-ff53-4482-95c9-3704c248c0eb"
},
"editorMode": "code",
"expr": "transaction_queue_size{blockchainRID=\"$blockchainRID\"}",
"legendFormat": "{{ instance }}, {{ container }}",
"range": true,
"refId": "A"
}
],
"title": "$blockchainRID",
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "(.*?)\\..*?chromia\\.dev:[0-9]?.*, (.*)",
"renamePattern": "$1 ($2)"
}
}
],
"type": "timeseries"
}
],
"title": "transaction queue size",
"type": "row"
},
{
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 3
},
"id": 38,
"panels": [
{
"datasource": {
"type": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "transactions/s",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 6,
"x": 0,
"y": 4
},
"id": 93,
"options": {
"legend": {
"calcs": ["min", "mean", "max"],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"repeat": "blockchainRID",
"repeatDirection": "h",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "ec3962df-ff53-4482-95c9-3704c248c0eb"
},
"editorMode": "code",
"expr": "rate(processed_transactions_seconds_count{blockchainRID=\"$blockchainRID\", result=\"ACCEPTED\"}[1m])",
"legendFormat": "{{ instance }}, {{ container }}",
"range": true,
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "ec3962df-ff53-4482-95c9-3704c248c0eb"
},
"editorMode": "code",
"expr": "sum(rate(processed_transactions_seconds_count{blockchainRID=\"$blockchainRID\", result=\"ACCEPTED\"}[1m]))",
"hide": false,
"instant": false,
"legendFormat": "Accepted Txs - all nodes",
"range": true,
"refId": "B"
},
{
"datasource": {
"type": "prometheus",
"uid": "ec3962df-ff53-4482-95c9-3704c248c0eb"
},
"editorMode": "code",
"expr": "sum(rate(submitted_transactions_seconds_count{blockchainRID=\"$blockchainRID\", result=\"OK\"}[1m]))",
"hide": false,
"instant": false,
"legendFormat": "Submitted Txs - all nodes",
"range": true,
"refId": "C"
}
],
"title": "$blockchainRID",
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "(.*?)\\..*?chromia\\.dev:[0-9]?.*, (.*)",
"renamePattern": "$1 ($2)"
}
}
],
"type": "timeseries"
}
],
"title": "accepted transactions",
"type": "row"
},
{
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 4
},
"id": 186,
"panels": [
{
"datasource": {
"type": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "transactions/s",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 6,
"x": 0,
"y": 5
},
"id": 192,
"options": {
"legend": {
"calcs": ["min", "mean", "max"],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"repeat": "blockchainRID",
"repeatDirection": "h",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "ec3962df-ff53-4482-95c9-3704c248c0eb"
},
"editorMode": "code",
"expr": "rate(confirmedTransactions_total{blockchainRID=\"$blockchainRID\"}[1m])",
"legendFormat": "{{ instance }}, {{ container }}",
"range": true,
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "ec3962df-ff53-4482-95c9-3704c248c0eb"
},
"editorMode": "code",
"expr": "avg(rate(confirmedTransactions_total{blockchainRID=\"$blockchainRID\"}[1m]))",
"hide": false,
"instant": false,
"legendFormat": "tps",
"range": true,
"refId": "B"
}
],
"title": "$blockchainRID",
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "(.*?)\\..*?chromia\\.dev:[0-9]?.*, (.*)",
"renamePattern": "$1 ($2)"
}
}
],
"type": "timeseries"
}
],
"title": "confirmed transactions",
"type": "row"
}
],
"refresh": "10s",
"schemaVersion": 38,
"tags": [],
"templating": {
"list": [
{
"current": {
"selected": true,
"text": ["All"],
"value": ["$__all"]
},
"datasource": {
"type": "prometheus",
"uid": "ec3962df-ff53-4482-95c9-3704c248c0eb"
},
"definition": "label_values(processed_transactions_seconds_count,blockchainRID)",
"hide": 0,
"includeAll": true,
"label": "BRID",
"multi": true,
"name": "blockchainRID",
"options": [],
"query": {
"qryType": 1,
"query": "label_values(processed_transactions_seconds_count,blockchainRID)",
"refId": "PrometheusVariableQueryEditor-VariableQuery"
},
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
}
]
},
"time": {
"from": "now-30m",
"to": "now"
},
"timepicker": {},
"timezone": "utc",
"title": "my postchain node",
"version": 4,
"weekStart": ""
}
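
The sample dashboard references a Prometheus data source by a fixed uid (ec3962df-ff53-4482-95c9-3704c248c0eb), which is unlikely to match the data source in your own Grafana instance. One way to handle this is to rewrite the uid before importing. The following is a minimal sketch under that assumption; the function names and the "my-prometheus-uid" value are hypothetical placeholders, and you would substitute the uid shown for your data source in Grafana.

```python
import json


def set_prometheus_uid(node, new_uid):
    """Recursively set the uid on every Prometheus datasource reference."""
    if isinstance(node, dict):
        datasource = node.get("datasource")
        # Only touch references of type "prometheus"; the built-in
        # "grafana" annotation datasource is left unchanged.
        if isinstance(datasource, dict) and datasource.get("type") == "prometheus":
            datasource["uid"] = new_uid
        for value in node.values():
            set_prometheus_uid(value, new_uid)
    elif isinstance(node, list):
        for item in node:
            set_prometheus_uid(item, new_uid)
    return node


def prepare_dashboard(dashboard_json, new_uid):
    """Return the dashboard JSON text with all Prometheus uids replaced."""
    dashboard = json.loads(dashboard_json)
    set_prometheus_uid(dashboard, new_uid)
    return json.dumps(dashboard, indent=2)


if __name__ == "__main__":
    # Tiny stand-in for the full dashboard JSON shown above.
    example = json.dumps({
        "panels": [{
            "datasource": {"type": "prometheus"},
            "targets": [{"datasource": {"type": "prometheus", "uid": "old-uid"}}],
        }]
    })
    print(prepare_dashboard(example, "my-prometheus-uid"))
```

Panels that name a Prometheus data source without a uid also receive the new uid, so every panel points at the same data source after the rewrite.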

Set up alerts

Grafana allows you to set up alerts based on specific metrics and conditions. For example, you can create an alert to notify you if a node is not producing or retrieving blocks. While alert definitions cannot be imported directly into Grafana, you can use the following alert definition as a reference to set up your own alert using the Grafana web interface or API:

{
"apiVersion": 1,
"groups": [
{
"orgId": 1,
"name": "1m",
"folder": "alert",
"interval": "1m",
"rules": [
{
"title": "node is not producing or retrieving blocks",
"condition": "D",
"data": [
{
"refId": "A",
"relativeTimeRange": {
"from": 600,
"to": 0
},
"model": {
"editorMode": "code",
"exemplar": false,
"expr": "rate(blockHeight_total[10m])",
"instant": true,
"intervalMs": 1000,
"legendFormat": "__auto",
"maxDataPoints": 43200,
"range": false,
"refId": "A"
}
},
{
"refId": "D",
"relativeTimeRange": {
"from": 600,
"to": 0
},
"datasourceUid": "__expr__",
"model": {
"conditions": [
{
"evaluator": {
"params": [0, 0],
"type": "within_range"
},
"operator": {
"type": "and"
},
"query": {
"params": []
},
"reducer": {
"params": [],
"type": "avg"
},
"type": "query"
}
],
"datasource": {
"name": "Expression",
"type": "__expr__",
"uid": "__expr__"
},
"expression": "A",
"intervalMs": 1000,
"maxDataPoints": 43200,
"refId": "D",
"type": "threshold"
}
}
],
"noDataState": "OK",
"execErrState": "Error",
"for": "10m",
"annotations": {
"description": "Node {{ $labels.instance }} is not producing or retrieving blocks on {{ $labels.blockchainRID }}.",
"runbook_url": "",
"summary": ""
},
"isPaused": false
}
]
}
]
}
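
Before creating the alert, it can help to run its condition by hand. The alert definition above fires when rate(blockHeight_total[10m]) stays at 0, and the sketch below runs the equivalent instant query through the Prometheus HTTP API and lists the affected series. It is an illustration only: the base URL http://localhost:9090 assumes a default local Prometheus installation, the helper names are hypothetical, and the script does not reproduce the alert's 10-minute pending period ("for": "10m").

```python
import json
import urllib.parse
import urllib.request

# Same expression as the alert, with the threshold folded into the query.
QUERY = "rate(blockHeight_total[10m]) == 0"


def query_prometheus(base_url, query, timeout=5.0):
    """Run an instant query against the Prometheus HTTP API and parse the JSON reply."""
    params = urllib.parse.urlencode({"query": query})
    url = base_url.rstrip("/") + "/api/v1/query?" + params
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def stalled_chains(api_response):
    """Extract sorted (instance, blockchainRID) pairs from an instant-vector result."""
    if api_response.get("status") != "success":
        raise RuntimeError("Prometheus query failed")
    pairs = []
    for series in api_response["data"]["result"]:
        labels = series["metric"]
        pairs.append((labels.get("instance", "?"), labels.get("blockchainRID", "?")))
    return sorted(pairs)


if __name__ == "__main__":
    # Assumed Prometheus address; adjust to your installation.
    reply = query_prometheus("http://localhost:9090", QUERY)
    for instance, brid in stalled_chains(reply):
        print(f"Node {instance} is not producing or retrieving blocks on {brid}.")
```

An empty result means every scraped chain advanced during the last 10 minutes. Series that Prometheus could not scrape at all do not appear in the result.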