This is the 11th in a series of weekly reports on our off-chain score-keeping system for the elected validator / community RPC set. Each week on Sunday, we will publish our next actions for the score management multi-sig, and open up the dispute period.
Changed Scores
We only report deviations from a full score. If you are not listed here, no action is needed and your validator score remains 1.0 (full payment, claimable each epoch).
StakeCapital (0xbe505Db3e26655cd0a5488f396E391BFA523BeA4), RPC downtime for the week was above 40%, set score to 0.6
Staking Fund #2 (0x5e687A6A1017d4E199921fd01f2dA91f737837cA), RPC uptime for the week was above the required threshold, set score back to 1.0
RPC is in a degraded state
TPT2 (0xea11740878D662A4d4e9b9f0c7C04378ACf1E869), last reported block: 38048719 (Jun 15 2025 03:04:37 AM UTC+02:00), set score to 0
TPT3 (0x3a8431fa7810dE768c6468Dd0dB83E695c3BaEC9), last reported block: 38049357 (Jun 15 2025 03:15:15 AM UTC+02:00), set score to 0
Next Steps
The dispute period is now open for approximately 24 hours, after which we will execute the above score changes if there are no challenges. The changes will then apply for another week until we review the scores again on 29/06/2025.
TPT (The Passive Trust) 2 and 3 stopped syncing around 7 days ago (at the time of this post); newer chain data is not available from them, so they are marked as down. We urge operators to keep an eye on the growing state size and system requirements of the chain.
Heads up to fellow validators: we recently experienced an edge case where both of our nodes (TPT2/3) appeared healthy on monitoring (responding to RPC), yet had silently stalled at a past block for over a week. The nodes were technically “up” - eth_syncing returned false and HTTP 200 responses were still flowing - but they were stuck returning the same stale block, trailing behind the L1 safe block. No telltale error logs appeared on either node, but they were running software from the repository (GitHub - celo-org/celo-l2-node-docker-compose: The easiest way to run a Celo node) that had not been updated since the migration to L2, which may have caused the issue, considering it happened on both nodes.
As a result, we were set to a 0% score for the week. Both our nodes are currently operational again.
We’re sharing this to help others avoid the same fate. Make sure your alerting stack accounts for these edge cases (a rough check sketch follows the list):
RPC responding ≠ healthy state
Responding with a stale block height is still “up” but unusable
eth_syncing: false doesn’t mean you’re synced - a stalled node may still return that
Compare latest_block or safe_block height against a known-good reference (e.g., the L1 head)
Lagging behind L1 safe block? Consider it down
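For a concrete starting point, here is a minimal check along these lines - a sketch with assumed values, not the committee's official probe. It asks the node for eth_syncing and eth_blockNumber, then compares the head against a known-good reference endpoint instead of trusting eth_syncing alone. The localhost URL, the Forno reference, and the lag threshold are placeholders to adapt.

```python
# Minimal staleness check: compare the node's head against a trusted reference
# instead of trusting eth_syncing alone. URLs and threshold are placeholders.
import requests

LOCAL_RPC = "http://localhost:8545"       # node under test (adjust port)
REFERENCE_RPC = "https://forno.celo.org"  # known-good reference endpoint
MAX_LAG_BLOCKS = 50                       # tolerated lag before we call it down

def rpc_call(url, method, params=None):
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}
    return requests.post(url, json=payload, timeout=10).json()["result"]

def block_number(url):
    # eth_blockNumber returns a hex string, e.g. "0x2449f4f"
    return int(rpc_call(url, "eth_blockNumber"), 16)

def node_is_healthy():
    syncing = rpc_call(LOCAL_RPC, "eth_syncing")   # can be False even on a stalled node
    local = block_number(LOCAL_RPC)
    reference = block_number(REFERENCE_RPC)
    lag = reference - local
    print(f"eth_syncing={syncing} local={local} reference={reference} lag={lag}")
    return lag <= MAX_LAG_BLOCKS

if __name__ == "__main__":
    if not node_is_healthy():
        print("ALERT: node is trailing the reference; treat it as down")
```

Run periodically and paged on repeated failures, a check like this catches a stall in minutes rather than days.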
The issue seems to stem from a sync-related reorg or a possible restart edge case that was not caught due to weak alert conditions. We’ve since:
Pulled latest repo changes
Restarted and re-synced the nodes
Added alerting to detect stale block numbers and syncing inconsistencies
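As a rough sketch of the kind of staleness alert we mean (illustrative only; the poll interval, threshold, and alert hook are placeholders, not our production config), the loop below flags a node whose head stops advancing:

```python
# Rough stall detector: alert when the head block has not advanced across
# several polls. Interval, threshold, and the alert hook are placeholders.
import time
import requests

RPC_URL = "http://localhost:8545"   # node to watch (adjust port)
POLL_SECONDS = 60                   # how often to poll
STALL_AFTER_POLLS = 5               # alert after this many polls with no progress

def latest_block(url):
    payload = {"jsonrpc": "2.0", "id": 1, "method": "eth_blockNumber", "params": []}
    return int(requests.post(url, json=payload, timeout=10).json()["result"], 16)

def watch():
    last_seen = -1
    stalled_polls = 0
    while True:
        head = latest_block(RPC_URL)
        if head > last_seen:
            last_seen, stalled_polls = head, 0
        else:
            stalled_polls += 1
        if stalled_polls >= STALL_AFTER_POLLS:
            # Replace with your real alert hook (PagerDuty, Telegram, etc.).
            print(f"ALERT: head stuck at {last_seen} for {stalled_polls} polls")
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    watch()
```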
Big thanks to @Thylacine for reaching out to us in DMs. @clemens, is there documentation on how the Score Management Committee determines uptime?
Celo OP Geth has received two bugfix updates, along with updates to the docker-compose configs, since the L2 migration. Either of them could have been the cause of the stall.
Generally, if an RPC node meets a minimum usability standard for an end user, it is considered up. For example, an end user should be able to use, say, squidrouter without any issues: send transactions and wait for a result, see history, etc.
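As a loose illustration of that usability bar (an assumption on my part, not the committee's actual test), a probe could exercise the basic reads a dapp depends on: chain id, the latest block, and a receipt lookup.

```python
# Illustrative end-user-style probe (not the official scoring check):
# a node is only useful if basic dapp flows work, e.g. reading the chain id,
# fetching the latest block, and resolving a transaction receipt.
import requests

RPC_URL = "http://localhost:8545"  # node under test (adjust as needed)

def rpc(method, params=None):
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}
    resp = requests.post(RPC_URL, json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json().get("result")

def node_is_usable():
    chain_id = rpc("eth_chainId")
    block = rpc("eth_getBlockByNumber", ["latest", True])  # full tx objects
    if not chain_id or not block:
        return False
    txs = block.get("transactions", [])
    if txs:
        # History should resolve too, not just a live socket.
        return rpc("eth_getTransactionReceipt", [txs[0]["hash"]]) is not None
    return True

if __name__ == "__main__":
    print("usable" if node_is_usable() else "not usable")
```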
We will publish detailed guidelines on the forum sometime this week covering operational recommendations, how exactly scores are determined, and future expectations.