Node health and maintenance
The maintenance page takes you through the following node issues:
- Is your baker node working?
- Update your node manually
- Check your IP accessibility and peers
- How to monitor node performance
- Interpret log messages and debug problems: see if your node is signing blocks
- For ZK and reader nodes: Sorting logs of nginx proxy and acme
- Confirm that your BYOC endpoints are working
- How to migrate your node to a different VPS
- Install Network Time Protocol (NTP) to avoid time drift
Is your baker node working?
If your node is left unattended for too long it can run into problems that affect your node's earning potential
and the safety of your stake. Your node has to be up-to-date to participate in the committee. Every time a new
committee is formed from registered nodes, only nodes running the newest version of Partisia Software can be included,
so a node that is not updated regularly is bound to fall out of the committee. Your node can only perform services, and
by extension earn rewards, while it is in the committee. After you are included you want to make sure your node is able
to continue to participate.
To optimize your node's earning potential you should implement automatic updates and check up on the node's performance
regularly.
Your baker node is working when:
- Your node is producing blocks when chosen as producer. At the moment nodes take turns based on their index in the list of committee members. This can be confirmed in How to monitor node performance below.
- Your node is signing blocks. This can be checked in the logs as explained below.
- Your node is running the newest version of Partisia Software. The easiest way to ensure this is by implementing automatic updates.
You can confirm that your node software is up-to-date with the following command:
docker inspect --format='{{.Image}}' YOUR_CONTAINER_NAME
The number must match the latest configuration digest.
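The comparison can be scripted. The following is a minimal sketch, assuming the newest image has already been pulled locally so that `docker image inspect` can report its ID. The helper name `image_ids_match` and the placeholder `YOUR_IMAGE_NAME` are hypothetical, and the result only tells you whether the container matches the image on your machine, not what the latest published digest is.

```shell
# Compare two image IDs and report whether the container is current.
# image_ids_match is a hypothetical helper name, not part of Docker.
image_ids_match() {
    running_id=$1
    latest_id=$2
    if [ -n "$running_id" ] && [ "$running_id" = "$latest_id" ]; then
        echo "up-to-date"
    else
        echo "outdated"
        return 1
    fi
}

# Usage (YOUR_CONTAINER_NAME and YOUR_IMAGE_NAME are placeholders):
#   running=$(docker inspect --format='{{.Image}}' YOUR_CONTAINER_NAME)
#   latest=$(docker image inspect --format='{{.Id}}' YOUR_IMAGE_NAME)
#   image_ids_match "$running" "$latest"
```

The function returns a non-zero exit status when the IDs differ, so it can be used in a scheduled check that alerts you.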
Updating your node manually
You should always have automatic updates enabled on your node. However, there can be situations where you want to update it manually, for example after you have had a problem on the node.
In the following it is assumed you are using ~/pbc as the directory for your docker-compose.yml.
Updating the PBC node is a simple 3-step process:
If you are running more than one container in your docker-compose.yml, read this:
To update a specific service in docker-compose.yml, specify which service you want to update.
E.g. update only the service pbc:
docker compose pull pbc
docker compose up -d pbc
cd ~/pbc
docker compose pull
docker compose up -d
First you change the directory to where you put your docker-compose.yml. You then pull the newest image and start it
again. You should now be running the newest version of the software.
Check your connection to the peers in the network and your uptime
Your node can only get registered as a block producer and participate in the committee if your host IP is reachable. Replace the placeholder in the URL below with the IP of the server hosting your node. This should take you to a page showing JSON with the following information:
http://PUBLIC_IP_OF_SERVER_HOSTING_THIS_NODE:9888/status
{
  "versionIdentifier": "Version number of PBC",
  "uptime": 11552567,
  "systemTime": 1700491419888,
  "knownPeersNetworkKeys": [
    "network addresses (Base64) of connected baker nodes"
  ],
  "networkKey": "Network address(Base64)",
  "blockchainAddress": "account address (Hex)",
  "finalizationKey": "finalization publicKey (base64)",
  "numberOfProcessors": 8,
  "systemLoad": 0.51,
  "freeMemory": 1574211176,
  "totalMemory": 2701131776,
  "maxMemory": 17179869184
}
Uptime is measured in milliseconds and shows how long your server has been running uninterrupted.
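Because the raw millisecond value is hard to read, it can help to convert it. This is a small sketch with a hypothetical helper name, `format_uptime_ms`, using only shell arithmetic:

```shell
# Convert an uptime in milliseconds to days, hours, minutes and seconds.
# format_uptime_ms is a hypothetical helper name.
format_uptime_ms() {
    total_s=$(( $1 / 1000 ))
    days=$(( total_s / 86400 ))
    hours=$(( total_s % 86400 / 3600 ))
    minutes=$(( total_s % 3600 / 60 ))
    seconds=$(( total_s % 60 ))
    echo "${days}d ${hours}h ${minutes}m ${seconds}s"
}

# The sample uptime value from the status JSON:
format_uptime_ms 11552567   # prints "0d 3h 12m 32s"
```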
If you cannot open your status endpoint, there is probably a problem with the ports opened on the VPS. See which ports are allowed through the firewall:
sudo ufw status
Make sure you have opened ports 9888-9897. If not, consult the instructions here.
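If you prefer a scripted check, the firewall output can be filtered. The sketch below assumes the ports were opened as a single TCP range rule, which `ufw status` lists as `9888:9897/tcp`. Rules added one port at a time are not detected, and the helper name `ports_allowed` is hypothetical.

```shell
# Read `ufw status` output on stdin; succeed if the 9888:9897/tcp range is allowed.
# Assumes the rule was added as one range; per-port rules are not detected.
ports_allowed() {
    grep -Eq '^9888:9897/tcp[[:space:]]+ALLOW'
}

# Usage:
#   sudo ufw status | ports_allowed && echo "ports open" || echo "range rule not found"
```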
How to monitor node performance
The node operator community has made several tools that can help you monitor the performance of your node:
Check block production and finalization time with MPC Node Stats:
Compare your received stake delegations with that of other committee members:
Logs and storage
Use the docker logs to see activity on the chain and whether your node is signing blocks. The logs of the node are written to the standard output of the container and are therefore managed using the tools provided by Docker. You can read about configuring Docker logs here.
The storage of the node is based on RocksDB. It is write-heavy and will increase in size for the foreseeable future. The number and size of reads and writes are entirely dependent on the traffic on the network.
Common log messages
Signing BlockState - All is well.
Not signing as shutdown is active - You may assume all is well. Shutdown happens when the chosen producer fails to produce a block: a reset block is made, and then a new node is chosen for the role of producer.
Not signing - This is not a good sign: you are not signing blocks. First, check if you are on the list of current committee members. If you are not, and you have already sent the Register Transaction, search for your PBC account address in the state of the Block Producer Orchestration Contract (BPOC). Each producer has a field called "status", which is either "CONFIRMED" or "PENDING". Confirmed means you are registered as a block producer and are formally eligible to participate in the committee. Pending means your public information is still awaiting manual approval from the team cross-checking the information you have given. If you cannot find your address in the BPOC at all, you need to resend your registration. Alternatively, if you are on the list of committee members and still get persistent "Not signing" messages, then you almost certainly have a problem in your config.json. Probably you have a wrong or missing key in one of the fields networkKey, accountKey or finalizationKey, or you forgot to add the host IP address.
Got a message with wrong protocol identifier - This message appears every time a shutdown has occurred (in other words, whenever a producer has not produced the block it was supposed to). So, on its own, the message does not indicate a problem. But if the log just repeats it and does not change to a new message saying Executing Block…, it could suggest you are running an outdated version of our software, one that does not pull the newest docker image automatically.
WebApplicationException. Status=404 - You may assume all is well. You may encounter different types of not found errors in the logs. Most of them are not indicative of a problem at your end. They occur when a node in the network has not received what it expected. In most cases you can see the address or producer index of the nodes related to the error.
Sorting Baker node logs
Latest logs:
docker logs -f nameOfDockerContainer
This will show the latest logs after they have caught up to present.
Sorting by time:
docker logs --since 1h nameOfDockerContainer
This will give you the last hour of logs.
Sorting by number of lines:
docker logs --tail 1000 nameOfDockerContainer
This will give you the latest 1000 lines of logs.
docker logs --tail 1000 -f nameOfDockerContainer
You can add -f to a command to keep following the logs afterwards.
Sort for specific messages:
You can use the grep command to get logs containing a specific string.
docker logs --since 1h pbc-mainnet | grep "Signing BlockState"
This will give you the blocks you have signed in the last hour. You might also want to look for blocks you created when you
were chosen as producer, using | grep "Created Block".
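To get a quick overview instead of reading every line, the grep filters can be combined into counts. This is a rough sketch with a hypothetical helper name, `summarize_signing`. It counts the signing messages described under Common log messages and treats a "Not signing" line as a problem only when it is not the harmless shutdown variant.

```shell
# Count the common signing messages in log text read from stdin.
# summarize_signing is a hypothetical helper name.
summarize_signing() {
    logs=$(cat)
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    signed=$(printf '%s\n' "$logs" | grep -c "Signing BlockState" || true)
    shutdown=$(printf '%s\n' "$logs" | grep -c "Not signing as shutdown is active" || true)
    not_signing=$(printf '%s\n' "$logs" | grep "Not signing" | grep -vc "as shutdown is active" || true)
    echo "signed=$signed shutdown=$shutdown not_signing=$not_signing"
}

# Usage (2>&1 also captures anything the container writes to stderr):
#   docker logs --since 1h pbc-mainnet 2>&1 | summarize_signing
```

A non-zero `not_signing` count over a longer period is the case worth investigating.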
For ZK and reader nodes: Sorting logs of nginx proxy and acme
Your nginx reverse proxy and acme certificate renewal run in docker containers (pbc-nginx and pbc-acme), the same way
your baker or reader node service runs in the container pbc-mainnet. You can use the same docker commands to get the nginx and acme container logs. When sorting the logs, it is recommended to use relevant keywords, e.g. keywords related to your SSL/TLS certificate.
The docker logs of a node service are stored as one category in the same file, and displayed in the same color when you
print the container logs with the docker logs command. nginx, by contrast, stores logs in two separate categories, each
displayed in its own color: access logs (white text) and error logs (red text). The access logs show client requests
received by nginx. The error logs show messages related to the functioning of nginx, including processes started and
ended, requests processed or skipped, and shutdowns.
The SSL/TLS certificate renewal, done with acme-companion, shows up in the nginx logs because the pbc-nginx and pbc-acme
containers communicate (challenge, proof and certificate) on a container with port 80 (details in the proxy server guide).
Use the same commands as for baker logs
You can use the same docker commands, but remember to specify the name you used for the nginx docker container in your docker-compose.yml. The container name used in our docker-compose.yml template is pbc-nginx.
Find out if you have downloaded the SSL/TLS certificate (to limit logs to a recent period use --since 1h for the last hour):
docker logs pbc-nginx | grep "Downloading cert."
Check if your SSL/TLS certificate has been renewed:
docker logs pbc-nginx | grep "renewal"
Check for the nginx container shutting down and restarting:
docker logs pbc-nginx | grep "starting nginx"
Note
It is a sign of a problem if restarts of your proxy server happen very frequently (not counting the restarts related to your automatic update schedule defined in your auto-update script). Read the log statements leading up to the shutdown and find out what happened.
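To make frequent restarts easier to spot, the grep result can be turned into a count and compared with a limit. This is a sketch only: the helper name `check_restarts` and the limit you pass are hypothetical, since what counts as "very frequently" depends on your update schedule.

```shell
# Read log text on stdin and warn when "starting nginx" appears more often than a limit.
# check_restarts and the chosen limit are hypothetical.
check_restarts() {
    limit=$1
    # grep reads the function's stdin; "|| true" keeps a zero count from failing.
    count=$(grep -c "starting nginx" || true)
    if [ "$count" -gt "$limit" ]; then
        echo "warning: $count restarts (limit $limit)"
        return 1
    fi
    echo "ok: $count restarts"
}

# Usage, allowing up to 2 restarts in the last 24 hours:
#   docker logs --since 24h pbc-nginx 2>&1 | check_restarts 2
```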
Confirm that your BYOC endpoints are working
Nodes must have working BYOC REST endpoints to participate in oracle service.
Bad or outdated endpoints cause serious problems
- Can make your price oracle node start a wrongful dispute causing slashing
- Two nodes with bad endpoints in a deposit or withdrawal oracle will crash the bridge
Check if your BYOC endpoints for other chains in config.json are working:
curl "ENDPOINT_YOU_WANT_TO_CHECK" \
-X POST \
-H "Content-Type: application/json" \
--data '{"method":"eth_chainId","params":[],"id":1,"jsonrpc":"2.0"}'
Alternatively:
curl -X POST "ENDPOINT_YOU_WANT_TO_CHECK" \
-H 'Content-Type: application/json' \
--data '{"method":"eth_blockNumber", "jsonrpc":"2.0", "params":[],"id":1}'
If the block number is way off, or if you don't get anything with either command, there is likely a problem with the endpoint. Replace it!
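The JSON-RPC methods used here return the value as a hexadecimal string in the "result" field, so it has to be converted before you can compare it with a block explorer. The sketch below uses two hypothetical helper names, `rpc_result` and `hex_to_dec`, and assumes the response arrives as compact JSON with a string result.

```shell
# Extract the "result" field from a JSON-RPC response read on stdin.
# rpc_result and hex_to_dec are hypothetical helper names.
rpc_result() {
    sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Convert a hex quantity such as 0x1b4 to decimal.
hex_to_dec() {
    printf '%d\n' "$1"
}

# Usage:
#   hex=$(curl -s -X POST "ENDPOINT_YOU_WANT_TO_CHECK" \
#     -H 'Content-Type: application/json' \
#     --data '{"method":"eth_blockNumber","jsonrpc":"2.0","params":[],"id":1}' | rpc_result)
#   hex_to_dec "$hex"
```

An empty result from `rpc_result` means the endpoint returned an error or nothing at all.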
How to migrate your node to a different VPS
When changing VPS there are a few important precautions to take to ensure a problem-free migration.
You may never run two nodes performing baker services at the same time
Running two nodes with the same config can be interpreted as malicious behavior. You can start
a reader node on the new VPS. Then, when you are ready to change the config.json to the BP version,
you stop the node from running on the old server:
docker stop nameOfDockerContainer
If you change host IP, you need to correct your config.json
In config.json correct the IPv4:
"host": "PUBLIC_IP_OF_SERVER_HOSTING_THIS_NODE",
You must migrate certain files for your node to participate in voting on a new committee (Large Oracle)
From the storage directory /opt/pbc-mainnet/storage of your old node host, move the 3 files below to the storage
directory of the new server:
large-oracle-backup-database.db, large-oracle-database.db and peers.json
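Before starting the node on the new server, it can be worth verifying that all three files arrived. This is a minimal sketch with a hypothetical helper name, `check_migration_files`. It only checks that the files exist, not that their contents are intact.

```shell
# Report which of the three migration files are missing from a storage directory.
# check_migration_files is a hypothetical helper name.
check_migration_files() {
    dir=$1
    missing=0
    for f in large-oracle-backup-database.db large-oracle-database.db peers.json; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the new server:
#   check_migration_files /opt/pbc-mainnet/storage && echo "all files present"
```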
Install Network Time Protocol
To avoid time drift use Network Time Protocol (NTP). First install:
sudo apt-get update
sudo apt-get install ntp ntpdate
Stop NTP service and point to NTP server:
sudo service ntp stop
sudo ntpdate pool.ntp.org
Start NTP service and check status:
sudo service ntp start
sudo systemctl status ntp
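Besides the service status, you may want to confirm that the daemon has actually selected a time source. In the peer list printed by `ntpq -p`, the currently selected source is marked with a leading `*`. The sketch below checks for that marker; the helper name `ntp_synced` is hypothetical, and it can take a few minutes after a restart before a source is selected.

```shell
# Read `ntpq -p` output on stdin; succeed if a peer line starts with '*',
# the marker ntpq uses for the currently selected time source.
# ntp_synced is a hypothetical helper name.
ntp_synced() {
    grep -q '^\*'
}

# Usage:
#   ntpq -p | ntp_synced && echo "time source selected" || echo "not synchronized yet"
```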