
Set up Prometheus monitoring

This topic explains how to install and configure Prometheus. Prometheus is an open-source monitoring system built around time series data. It collects data by sending HTTP requests to the metrics endpoints of hosts and services, and makes that data available for analysis and alerting through its query language. In short, the Prometheus server collects time series data, stores it, makes it available for querying, and sends alerts based on it.

Install Prometheus

You can install Prometheus in several ways. You can use the Prometheus Docker image or a configuration management system such as Ansible, Chef, Puppet, or SaltStack. For more information, see the official installation guide.

Configure Prometheus

To configure the Prometheus server, edit the prometheus.yml file. It contains the server settings, such as which targets to scrape and how often to scrape them. For more information, see the configuration guide.
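As a rough illustration, the following sketch writes a minimal prometheus.yml that scrapes a single node. The job name postchain-node, the 15-second scrape interval, and the target localhost:9190 (the example port used in this topic) are assumptions chosen for illustration, not required values. Note that the command overwrites any prometheus.yml in the current directory.

```shell
# Write a minimal prometheus.yml that scrapes one metrics endpoint.
# Assumptions: the job name, the interval, and the target address are
# placeholders; replace them with values that match your setup.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "postchain-node"
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:9190"]
EOF
```

If promtool is installed, running promtool check config prometheus.yml can validate the file before you start the server with prometheus --config.file=prometheus.yml.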

Start Prometheus

Enter the following command to start a node that exposes Prometheus metrics on a specific port (for example, 9190).

chr node start -p metrics.prometheus.port=9190

The command output shows details about the startup process and other services. It should also indicate that the service is listening on port 9190.

You can also use the following command:

chr node start -np node.properties

In this case, make sure that the metrics.prometheus.port=9190 property is set in the node.properties file.

You can verify that metrics are being served by navigating to the metrics endpoint: localhost:9190/metrics.
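The same check can be done from a terminal. The sketch below defines a small helper, metric_samples (a hypothetical name, not part of any tool), that reads Prometheus text-format output on standard input, drops the "# HELP" and "# TYPE" comment lines, and optionally keeps only lines starting with a given prefix. It assumes the node listens on port 9190 as in the example above. The exposed metric names may not match the documented names exactly, so it is worth inspecting the unfiltered output first.

```shell
# Print only sample lines from Prometheus text-format input on stdin.
# Comment lines (starting with '#') and blank lines are dropped.
# Optional argument: a prefix (treated as a regular expression) that the
# line must start with.
metric_samples() {
  prefix="${1:-}"
  grep -v '^#' | grep -v '^[[:space:]]*$' | grep -- "^${prefix}"
}

# Usage against a running node (assumes port 9190):
#   curl -s http://localhost:9190/metrics | metric_samples
#   curl -s http://localhost:9190/metrics | metric_samples blockchains
```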

Postchain metrics

Postchain nodes directly provide metrics to Prometheus. You can configure this using the metrics.prometheus.port property in the node configuration or the POSTCHAIN_PROMETHEUS_PORT environment variable. These metrics show the individual status and performance of each node.

Note: Metrics can differ between nodes, so aggregating them across the cluster might not accurately reflect overall performance.

| Name | Type | Description | Tags |
|---|---|---|---|
| blockchains | gauge | Number of blockchains currently operational on the node. | node_pubkey |
| subnodes | gauge | Count of subnodes anticipated to be running on the node. | node_pubkey |
| containers | gauge | Quantity of containers expected to be active on the node. | node_pubkey |
| submitted_transactions,result=OK | timer | Transactions successfully submitted by clients and queued. | chainIID, blockchainRID, node_pubkey |
| submitted_transactions,result=INVALID | timer | Transactions submitted by clients but rejected due to invalidity. | chainIID, blockchainRID, node_pubkey |
| submitted_transactions,result=DUPLICATE | timer | Transactions rejected for being duplicates. | chainIID, blockchainRID, node_pubkey |
| submitted_transactions,result=FULL | timer | Transactions rejected due to a full queue. | chainIID, blockchainRID, node_pubkey |
| transaction_queue_size | gauge | Current size of the transaction queue. | chainIID, blockchainRID, node_pubkey |
| processed_transactions,result=ACCEPTED | timer | Transactions extracted from the queue and appended to an incomplete block. | chainIID, blockchainRID, node_pubkey |
| processed_transactions,result=REJECTED | timer | Transactions extracted from the queue but rejected. | chainIID, blockchainRID, node_pubkey |
| blocks | timer | Blocks built by the node. | chainIID, blockchainRID, node_pubkey |
| signedBlocks | timer | Blocks signed by the node. | chainIID, blockchainRID, node_pubkey |
| confirmedBlocks | timer | Blocks confirmed, reflecting consensus among cluster nodes. | chainIID, blockchainRID, node_pubkey |
| confirmedTransactions | counter | Confirmed transactions, indicative of cluster-wide consensus. | chainIID, blockchainRID, node_pubkey |
| blockHeight | counter | Current height of the blockchain. | chainIID, blockchainRID, node_pubkey |
| RevoltsOnNode | counter | Count of revolts directed towards the node. | node_pubkey |
| RevoltsByNode | counter | Number of revolts initiated by the node. | node_pubkey |
| RevoltsBetweenOtherNodes | counter | Revolts occurring between other nodes. | node_pubkey |
| queries,result=success | timer | Time taken for successful queries. | chainIID, blockchainRID, queryName, node_pubkey |
| queries,result=failure | timer | Time taken for failed queries. | chainIID, blockchainRID, queryName, node_pubkey |
| validatorFastSyncSwitch | counter | Instances of fast sync initiated by the validator. | chainIID, blockchainRID, nodeBlockState, node_pubkey |
| ebftResponseTime | timer | Response time for EBFT messages. (Enabled if tracked_ebft_message_max_keep_time_ms is set) | chainIID, blockchainRID, sourceNode, targetNode, messageType, node_pubkey |
| statusChangeTime | timer | Duration of status changes for different nodes. (Enabled if tracked_ebft_message_max_keep_time_ms is set) | chainIID, blockchainRID, sourceNode, targetNode, nodeBlockState, node_pubkey |

Metric tags

  • node_pubkey: Hex-encoded node public key.
  • chainIID: Numeric blockchain IID (specific to the node).
  • blockchainRID: Hex-encoded global blockchain RID.
  • queryName: Name of the query.
  • nodeBlockState: State of the node block.
  • messageType: Type of EBFT message.
  • sourceNode: Public key of the source node.
  • targetNode: Public key of the target node.
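To see which values a tag actually takes on a given node, the scraped output can be filtered for that tag. The sketch below defines a hypothetical helper, tag_values, that reads Prometheus text-format output on standard input and prints the distinct values of one tag. It assumes tags appear as labels in the usual name="value" form and that the exposed label names match the tag names listed above, which may not hold for every exporter setup.

```shell
# Print the distinct values of one tag (label) found in Prometheus
# text-format input on stdin, one value per line, sorted.
# Argument: the tag name, for example chainIID or node_pubkey.
tag_values() {
  tag="$1"
  grep -v '^#' \
    | grep -o "${tag}=\"[^\"]*\"" \
    | sort -u \
    | sed "s/^${tag}=\"//; s/\"\$//"
}

# Usage against a running node (assumes port 9190):
#   curl -s http://localhost:9190/metrics | tag_values blockchainRID
```

Listing the blockchainRID or node_pubkey values this way is a quick check that a node reports the chains and identity you expect before building per-node queries or dashboards.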

In addition to these metrics, a standard set of JVM and machine metrics is also exposed.