This is the 11th in a series of weekly reports on our off-chain score-keeping system for the elected validator / community RPC set. Each week on Sunday, we will publish our next actions for the score management multi-sig, and open up the dispute period.
Changed Scores
We only report deviations from a full score. If you are not listed here, no action is needed and your validator score remains 1.0 (full payment, claimable each epoch).
StakeCapital (0xbe505Db3e26655cd0a5488f396E391BFA523BeA4), RPC downtime for the week was above 40%, set score to 0.6
Staking Fund #2 (0x5e687A6A1017d4E199921fd01f2dA91f737837cA), RPC uptime for the week was above the required threshold, set score back to 1.0
RPC is in a degraded state
TPT2 (0xea11740878D662A4d4e9b9f0c7C04378ACf1E869), last reported block: 38048719 (Jun 15 2025 03:04:37 AM UTC+02:00), set score to 0
TPT3 (0x3a8431fa7810dE768c6468Dd0dB83E695c3BaEC9), last reported block: 38049357 (Jun 15 2025 03:15:15 AM UTC+02:00), set score to 0
Next Steps
The dispute period is now open for approximately 24 hours, after which we will execute the above score changes if there are no challenges. The changes will then apply for another week until we review the scores again on 29/06/2025.
TPT (The Passive Trust) 2 and 3 stopped syncing around 7 days ago (at the time of this post); newer chain data is not available from them, so they are marked as down. We urge operators to keep an eye on the growing state size and system requirements of the chain.
Heads up to fellow validators: we recently experienced an edge case where both of our nodes (TPT2/3) appeared healthy on monitoring (responding to RPC), yet had silently stalled at a past block for over a week. The nodes were technically “up” - eth_syncing returned false and HTTP 200 responses were still flowing - but they were stuck returning the same stale block, trailing behind the L1 safe block. No telltale error logs appeared on either node, but they were running software from the repository (GitHub - celo-org/celo-l2-node-docker-compose: The easiest way to run a Celo node) that had not been updated since the migration to L2, which may have caused the issue, considering it happened on both nodes.
As a result, we were set to a 0% score for the week. Both our nodes are currently operational again.
We’re sharing this to help others avoid the same fate. Make sure your alerting stack accounts for these edge cases (a rough check sketch follows the list):
RPC responding ≠ healthy state
Responding with a stale block height is still “up” but unusable
eth_syncing: false doesn’t mean you’re synced - a stalled node may still return that
Compare latest_block or safe_block height against a known-good reference (e.g., the L1 head)
Lagging behind L1 safe block? Consider it down
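For a concrete starting point, here is a minimal check along these lines - a sketch with assumed values, not the committee's official probe. It asks the node for eth_syncing and eth_blockNumber, then compares the head against a known-good reference endpoint instead of trusting eth_syncing alone. The localhost URL, the Forno reference, and the lag threshold are placeholders to adapt.

```python
# Minimal staleness check: compare the node's head against a trusted reference
# instead of trusting eth_syncing alone. URLs and threshold are placeholders.
import requests

LOCAL_RPC = "http://localhost:8545"       # node under test (adjust port)
REFERENCE_RPC = "https://forno.celo.org"  # known-good reference endpoint
MAX_LAG_BLOCKS = 50                       # tolerated lag before we call it down

def rpc_call(url, method, params=None):
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}
    return requests.post(url, json=payload, timeout=10).json()["result"]

def block_number(url):
    # eth_blockNumber returns a hex string, e.g. "0x2449f4f"
    return int(rpc_call(url, "eth_blockNumber"), 16)

def node_is_healthy():
    syncing = rpc_call(LOCAL_RPC, "eth_syncing")   # can be False even on a stalled node
    local = block_number(LOCAL_RPC)
    reference = block_number(REFERENCE_RPC)
    lag = reference - local
    print(f"eth_syncing={syncing} local={local} reference={reference} lag={lag}")
    return lag <= MAX_LAG_BLOCKS

if __name__ == "__main__":
    if not node_is_healthy():
        print("ALERT: node is trailing the reference; treat it as down")
```

Run periodically and paged on repeated failures, a check like this catches a stall in minutes rather than days.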
The issue seems to stem from a sync-related reorg or a possible restart edge case that was not caught due to weak alert conditions. We’ve since:
Pulled latest repo changes
Restarted and re-synced the nodes
Added alerting to detect stale block numbers and syncing inconsistencies
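As a rough sketch of the kind of staleness alert we mean (illustrative only; the poll interval, threshold, and alert hook are placeholders, not our production config), the loop below flags a node whose head stops advancing:

```python
# Rough stall detector: alert when the head block has not advanced across
# several polls. Interval, threshold, and the alert hook are placeholders.
import time
import requests

RPC_URL = "http://localhost:8545"   # node to watch (adjust port)
POLL_SECONDS = 60                   # how often to poll
STALL_AFTER_POLLS = 5               # alert after this many polls with no progress

def latest_block(url):
    payload = {"jsonrpc": "2.0", "id": 1, "method": "eth_blockNumber", "params": []}
    return int(requests.post(url, json=payload, timeout=10).json()["result"], 16)

def watch():
    last_seen = -1
    stalled_polls = 0
    while True:
        head = latest_block(RPC_URL)
        if head > last_seen:
            last_seen, stalled_polls = head, 0
        else:
            stalled_polls += 1
        if stalled_polls >= STALL_AFTER_POLLS:
            # Replace with your real alert hook (PagerDuty, Telegram, etc.).
            print(f"ALERT: head stuck at {last_seen} for {stalled_polls} polls")
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    watch()
```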
Big thanks to @Thylacine for reaching out to us in DMs. @clemens, is there documentation on how the Score Management Committee determines uptime?
Celo OP Geth has received two bugfix updates, along with updates to the docker-compose configs, since the L2 migration. Either of them could have been the cause of the stall.
Generally, if an RPC node meets a minimum usability standard for an end user, it is considered up. For example, an end user should be able to use, say, squidrouter without any issues: send transactions and wait for a result, see history, etc.
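As a loose illustration of that usability bar (an assumption on my part, not the committee's actual test), a probe could exercise the basic reads a dapp depends on: chain id, the latest block, and a receipt lookup.

```python
# Illustrative end-user-style probe (not the official scoring check):
# a node is only useful if basic dapp flows work, e.g. reading the chain id,
# fetching the latest block, and resolving a transaction receipt.
import requests

RPC_URL = "http://localhost:8545"  # node under test (adjust as needed)

def rpc(method, params=None):
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}
    resp = requests.post(RPC_URL, json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json().get("result")

def node_is_usable():
    chain_id = rpc("eth_chainId")
    block = rpc("eth_getBlockByNumber", ["latest", True])  # full tx objects
    if not chain_id or not block:
        return False
    txs = block.get("transactions", [])
    if txs:
        # History should resolve too, not just a live socket.
        return rpc("eth_getTransactionReceipt", [txs[0]["hash"]]) is not None
    return True

if __name__ == "__main__":
    print("usable" if node_is_usable() else "not usable")
```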
We will publish detailed guidelines on the forum sometime this week covering operational recommendations, how exactly scores are determined, and future expectations.