Set up Prometheus monitoring
This topic describes how to install and configure Prometheus, an open-source monitoring system built around time series data. Prometheus collects data by sending HTTP requests to the metrics endpoints of hosts and services, stores it, makes it available for analysis through its query language (PromQL), and sends alerts based on it.
Install Prometheus
You can install Prometheus in several ways, for example with the Prometheus Docker image or with a configuration management system such as Ansible, Chef, Puppet, or SaltStack. For more information, see the official installation guide.
Configure Prometheus
To configure the Prometheus server, edit the prometheus.yml file, which holds its settings, such as scrape targets and scrape intervals. For more information, see the configuration guide.
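As an illustration of what such a file can look like, the following sketch renders a minimal prometheus.yml with one scrape job. The keys (global, scrape_interval, scrape_configs, job_name, static_configs, targets) are standard Prometheus configuration keys. The function name, the job name postchain_node, and the 15s interval are assumptions chosen for this example, and the target reuses the example port 9190 from this topic.

```python
def render_scrape_config(targets, job_name="postchain_node", scrape_interval="15s"):
    """Return a minimal prometheus.yml body that scrapes the given host:port targets.

    job_name and scrape_interval are illustrative defaults, not required values.
    """
    if not targets:
        raise ValueError("at least one target is required")
    target_list = ", ".join(f'"{t}"' for t in targets)
    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",
        "",
        "scrape_configs:",
        f'  - job_name: "{job_name}"',
        "    # metrics_path defaults to /metrics, so it is not set here",
        "    static_configs:",
        f"      - targets: [{target_list}]",
    ]
    return "\n".join(lines) + "\n"
```

Calling render_scrape_config(["localhost:9190"]) returns text that can be saved as prometheus.yml. Adjust the targets to the hosts and ports your nodes actually expose.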
Start Prometheus
Enter the following command to start the node and expose Prometheus metrics on a specific port (for example, 9190):
chr node start -p metrics.prometheus.port=9190
The command output shows details of the startup process and other services. It should also indicate that the service is listening on port 9190.
You can also use the following command:
chr node start -np node.properties
In this case, make sure that the metrics.prometheus.port=9190 property is set in the node.properties file.
You can verify that the node serves Prometheus metrics by navigating to its metrics endpoint: localhost:9190/metrics.
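The same check can be scripted. The sketch below is one possible approach using only the Python standard library. The helper names are hypothetical, and the default URL assumes the example port 9190 on the local machine. It downloads the endpoint text and keeps the lines that carry samples, relying on the fact that lines starting with # in the Prometheus text format are comments (HELP and TYPE).

```python
import urllib.request


def fetch_metrics(url="http://localhost:9190/metrics", timeout=5.0):
    """Download the raw text served at the metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def sample_lines(text):
    """Return the lines that carry samples, skipping blank and '#' comment lines."""
    return [
        line
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
```

With a node running, sample_lines(fetch_metrics()) should return a non-empty list. A connection error from fetch_metrics suggests that the port is not exposed or the node is not running.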
Postchain metrics
Postchain nodes directly provide metrics to Prometheus. You can configure this using the metrics.prometheus.port property in the node configuration or the POSTCHAIN_PROMETHEUS_PORT environment variable. These metrics show the individual status and performance of each node.
Metrics can differ between nodes, so aggregating them across the cluster might not accurately reflect overall performance.
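To make this caveat concrete, the following sketch compares a cluster-wide average with a per-node view of a single metric. The node names and block heights are made-up values for illustration, and summarize_per_node is a hypothetical helper, not part of any Postchain tooling.

```python
from statistics import mean


def summarize_per_node(values_by_node):
    """Summarize one metric per node instead of collapsing it into a single number."""
    if not values_by_node:
        raise ValueError("no node values given")
    highest = max(values_by_node.values())
    return {
        "mean": mean(values_by_node.values()),
        "highest": highest,
        # How far each node is behind the highest reported value.
        "lag": {node: highest - value for node, value in values_by_node.items()},
    }


# Hypothetical blockHeight readings from three nodes.
heights = {"node_a": 100, "node_b": 100, "node_c": 40}
summary = summarize_per_node(heights)
```

Here the mean works out to 80, a height that no node actually reports, while the lag entry shows that node_c is 60 blocks behind the other two. Keeping the node_pubkey tag in queries and dashboards preserves this kind of detail.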
Name | Type | Description | Tags |
---|---|---|---|
blockchains | gauge | Number of blockchains currently operational on the node. | node_pubkey |
subnodes | gauge | Count of subnodes anticipated to be running on the node. | node_pubkey |
containers | gauge | Quantity of containers expected to be active on the node. | node_pubkey |
submitted_transactions,result=OK | timer | Transactions successfully submitted by clients and queued. | chainIID, blockchainRID, node_pubkey |
submitted_transactions,result=INVALID | timer | Transactions submitted by clients but rejected due to invalidity. | chainIID, blockchainRID, node_pubkey |
submitted_transactions,result=DUPLICATE | timer | Transactions rejected for being duplicates. | chainIID, blockchainRID, node_pubkey |
submitted_transactions,result=FULL | timer | Transactions rejected due to a full queue. | chainIID, blockchainRID, node_pubkey |
transaction_queue_size | gauge | Current size of the transaction queue. | chainIID, blockchainRID, node_pubkey |
processed_transactions,result=ACCEPTED | timer | Transactions extracted from the queue and appended to an incomplete block. | chainIID, blockchainRID, node_pubkey |
processed_transactions,result=REJECTED | timer | Transactions extracted from the queue but rejected. | chainIID, blockchainRID, node_pubkey |
blocks | timer | Blocks built by the node. | chainIID, blockchainRID, node_pubkey |
signedBlocks | timer | Blocks signed by the node. | chainIID, blockchainRID, node_pubkey |
confirmedBlocks | timer | Blocks confirmed, reflecting consensus among cluster nodes. | chainIID, blockchainRID, node_pubkey |
confirmedTransactions | counter | Confirmed transactions, indicative of cluster-wide consensus. | chainIID, blockchainRID, node_pubkey |
blockHeight | counter | Current height of the blockchain. | chainIID, blockchainRID, node_pubkey |
RevoltsOnNode | counter | Count of revolts directed towards the node. | node_pubkey |
RevoltsByNode | counter | Number of revolts initiated by the node. | node_pubkey |
RevoltsBetweenOtherNodes | counter | Revolts occurring between other nodes. | node_pubkey |
queries,result=success | timer | Time taken for successful queries. | chainIID, blockchainRID, queryName, node_pubkey |
queries,result=failure | timer | Time taken for failed queries. | chainIID, blockchainRID, queryName, node_pubkey |
validatorFastSyncSwitch | counter | Instances of fast sync initiated by the validator. | chainIID, blockchainRID, nodeBlockState, node_pubkey |
ebftResponseTime | timer | Response time for EBFT messages. (Enabled if tracked_ebft_message_max_keep_time_ms is set) | chainIID, blockchainRID, sourceNode, targetNode, messageType, node_pubkey |
statusChangeTime | timer | Duration of status changes for different nodes. (Enabled if tracked_ebft_message_max_keep_time_ms is set) | chainIID, blockchainRID, sourceNode, targetNode, nodeBlockState, node_pubkey |
Metric Tags
node_pubkey: Hex-encoded node public key.
chainIID: Numeric blockchain IID (specific to the node).
blockchainRID: Hex-encoded global blockchain RID.
queryName: Name of the query.
nodeBlockState: State of the node block.
messageType: Type of EBFT message.
sourceNode: Public key of the source node.
targetNode: Public key of the target node.
In addition to these metrics, a standard set of JVM and machine metrics is also exposed.
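When reading the endpoint output programmatically, the tags can be used to narrow down samples. The sketch below assumes that tags appear as labels in the Prometheus text format, in the form name{tag="value",...} value. The helper names are hypothetical, and the exact metric names exposed by a node may differ from the names in the table, so treat the sample line in the usage note as illustrative only.

```python
import re

# name{labels} value ; the label block is optional and a trailing timestamp is ignored.
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# key="value" pairs; escape sequences inside values are kept as written.
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into (name, tags, value); return None for comments or blanks."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = _SAMPLE.match(line)
    if match is None:
        return None
    tags = dict(_LABEL.findall(match.group("labels") or ""))
    return match.group("name"), tags, float(match.group("value"))


def filter_by_tag(text, tag, wanted):
    """Return the parsed samples whose given tag equals the wanted value."""
    samples = (parse_sample(line) for line in text.splitlines())
    return [s for s in samples if s is not None and s[1].get(tag) == wanted]
```

For example, filter_by_tag(text, "chainIID", "1") keeps only the samples of one chain on the node, such as a line of the form transaction_queue_size{chainIID="1",node_pubkey="02ab"} 7.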