To all validator operators:
The previous mainnet stall uncovered a defect in the celo-blockchain implementation of the Istanbul PBFT consensus, which was temporarily mitigated by the 1.5.8 patch and a hotfix to the per-block gas limit. To raise the gas limit again, the implementation of the Istanbul message types needs to change.
Technical TL;DR of the issue: in the proposal message, a RoundChangeCertificate is included if the validator is proposing for round > 0. This certificate holds at least a quorum of RoundChange messages from validators. Each of those RoundChange messages may in turn carry a PreparedCertificate, if its sender has one, and that certificate contains the previous proposal that may have passed prepare consensus. In the worst case, this means the RoundChangeCertificate in a single proposal message can hold at least [(validators * ⅓) + 1] different blocks.
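To make the nesting concrete, here is a minimal Python sketch of the legacy message layout. The dataclasses (Block, PreparedCertificate, RoundChange, RoundChangeCertificate, Proposal) and the helper embedded_blocks are hypothetical models that keep only the fields relevant here; they are not the actual celo-blockchain types. The sketch counts how many full blocks a single proposal message ends up carrying.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    # Hypothetical stand-in for a full block; only its hash and size matter here.
    hash: str
    size_bytes: int


@dataclass
class PreparedCertificate:
    # Legacy form: embeds the FULL block that may have passed prepare consensus.
    proposal: Block
    round: int


@dataclass
class RoundChange:
    sender: str
    round: int
    prepared_certificate: Optional[PreparedCertificate] = None


@dataclass
class RoundChangeCertificate:
    # Holds at least a quorum of RoundChange messages.
    round_changes: List[RoundChange] = field(default_factory=list)


@dataclass
class Proposal:
    block: Block
    round: int
    # Only attached when proposing for round > 0.
    round_change_certificate: Optional[RoundChangeCertificate] = None


def embedded_blocks(p: Proposal) -> List[Block]:
    """Return every full block carried by a legacy proposal message."""
    blocks = [p.block]
    if p.round > 0 and p.round_change_certificate is not None:
        for rc in p.round_change_certificate.round_changes:
            if rc.prepared_certificate is not None:
                # Each PreparedCertificate drags a whole block into the proposal.
                blocks.append(rc.prepared_certificate.proposal)
    return blocks
```

Each RoundChange that carries a PreparedCertificate adds one more full block, so the proposal's size grows with the number of prepared blocks reported in the certificate rather than staying at roughly one block.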
The change we plan to implement adds to each RoundChange message a signed slim certificate that references the prepared block by its hash, rather than carrying the full block via the PreparedCertificate. These slim certificates can then be grouped in the new RoundChangeCertificate, since only the block from the highest-round PreparedCertificate is used.
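A minimal sketch of the slim-certificate idea follows, again using hypothetical Python types rather than the real implementation. It assumes the proposer already holds the referenced blocks locally (the hypothetical known_blocks map) and it omits signature verification. Round changes are grouped as before, but at most one full block is attached: the one referenced by the highest-round slim certificate.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Block:
    # Hypothetical stand-in for a full block.
    hash: str
    size_bytes: int


@dataclass
class SlimPreparedCertificate:
    # Hypothetical slim certificate: references the prepared block by hash only.
    block_hash: str
    round: int
    signature: bytes  # signed by the sending validator; verification is omitted here


@dataclass
class RoundChange:
    sender: str
    round: int
    slim_certificate: Optional[SlimPreparedCertificate] = None


@dataclass
class RoundChangeCertificate:
    round_changes: List[RoundChange]
    # At most one full block: the one referenced by the highest-round slim certificate.
    prepared_block: Optional[Block] = None


def build_round_change_certificate(
    round_changes: List[RoundChange], known_blocks: Dict[str, Block]
) -> RoundChangeCertificate:
    """Group round changes and attach only the highest-round prepared block."""
    slim = [rc.slim_certificate for rc in round_changes if rc.slim_certificate is not None]
    if not slim:
        # Nobody reported a prepared block, so no full block is needed.
        return RoundChangeCertificate(round_changes)
    highest = max(slim, key=lambda cert: cert.round)
    return RoundChangeCertificate(round_changes, known_blocks[highest.block_hash])
```

Whatever the number of validators, the certificate then carries a list of small hash-based entries plus a single block, instead of one block per prepared RoundChange.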
Such a change is backwards-incompatible, so a careful approach must be taken to preserve the liveness guarantees of the blockchain.
After some thought, we at cLabs currently have two different upgrade paths that would apply the fix while maintaining blockchain uptime:
1. Two-phase rollout
A phase1 client that uses both message types for both sending and receiving, at the cost of increased bandwidth usage.
A phase2 client that uses only the new message types, bringing bandwidth usage back to normal; at this point the fix is fully applied.
It is important to note that phase2 can only be rolled out once at least a quorum of validators are running phase1, and the more validators on phase1, the safer. There is no particular rush to switch to phase2, beyond the community's need to have the issue fixed.
Also, separate version patches are not necessary, since phase2 could easily be activated by a configuration flag or even an admin RPC call.
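The two-phase behaviour can be sketched as a small state switch. Everything here is hypothetical: the message-type labels, the Node.set_phase hook (standing in for whatever configuration flag or admin RPC would flip the phase), and the quorum formula, which this sketch assumes to be ceil(2n/3) for n validators.

```python
from enum import Enum
from typing import List, Set

# Hypothetical identifiers for the two RoundChange encodings.
LEGACY = "legacy_round_change"
NEW = "slim_round_change"


class Phase(Enum):
    PHASE1 = 1  # send and accept both legacy and new message types
    PHASE2 = 2  # send and accept only the new message type


def types_to_send(phase: Phase) -> List[str]:
    # Phase1 broadcasts both encodings, which is where the extra bandwidth comes from.
    return [LEGACY, NEW] if phase is Phase.PHASE1 else [NEW]


def accepted_types(phase: Phase) -> Set[str]:
    return {LEGACY, NEW} if phase is Phase.PHASE1 else {NEW}


def quorum(num_validators: int) -> int:
    # Assumption: quorum is ceil(2n/3), computed with integer arithmetic.
    return (2 * num_validators + 2) // 3


def phase2_is_safe(num_validators: int, num_on_phase1: int) -> bool:
    """Phase2 needs at least a quorum of validators already running phase1."""
    return num_on_phase1 >= quorum(num_validators)


class Node:
    def __init__(self) -> None:
        self.phase = Phase.PHASE1

    def set_phase(self, phase: Phase) -> None:
        # Hypothetical hook a config flag or admin RPC handler could call.
        self.phase = phase
```

Under this assumed quorum rule, with 100 validators phase2 becomes possible once 67 of them run phase1; any margin above that makes the switch safer.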
2. Consensus fork
A block number is chosen at which Istanbul starts using the new message types, marking a clear point in time when the change is activated. This sets a firm deadline, prompting operators to upgrade, at the risk of stalling the network if quorum is not reached.
It’s important to note that this is not a chain hard fork but only a consensus fork; full nodes won’t even be aware that it is happening.
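The activation rule of a consensus fork reduces to a height check performed by validators while running consensus. In this minimal sketch, FORK_BLOCK and the message-type labels are hypothetical placeholders, not values that have been decided.

```python
# Hypothetical activation height; the real value would be announced ahead of time.
FORK_BLOCK = 1_000_000

# Hypothetical identifiers for the two RoundChange encodings.
LEGACY = "legacy_round_change"
NEW = "slim_round_change"


def round_change_type(block_number: int, fork_block: int = FORK_BLOCK) -> str:
    """Pick the RoundChange encoding used while agreeing on block_number."""
    # Every upgraded validator switches at the same height, so upgraded
    # validators never mix the two encodings within one consensus instance.
    return NEW if block_number >= fork_block else LEGACY
```

Because the rule only affects which consensus messages validators exchange, and not the blocks or state that full nodes process, full nodes never evaluate it.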
A mixed approach is not viable because, if old and new RoundChange messages were mixed, machines that have not yet been updated would be unable to see or understand the new RoundChange messages present in the RoundChangeCertificate.
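A simplified model shows why mixing breaks un-updated nodes. The sketch assumes, hypothetically, that an old node simply cannot verify RoundChange messages of the new type and that a certificate is accepted only if at least a quorum (assumed here to be ceil(2n/3)) of its messages are verifiable. The real client's failure mode may differ in detail.

```python
from typing import List, Set

# Hypothetical identifiers for the two RoundChange encodings.
LEGACY = "legacy_round_change"
NEW = "slim_round_change"


def quorum(num_validators: int) -> int:
    # Assumption: quorum is ceil(2n/3).
    return (2 * num_validators + 2) // 3


def certificate_accepted(understood: Set[str], certificate: List[str], num_validators: int) -> bool:
    """Simplified model: a node can only count RoundChange messages whose type it understands."""
    verifiable = sum(1 for msg_type in certificate if msg_type in understood)
    return verifiable >= quorum(num_validators)
```

With four validators, a mixed certificate of two legacy messages and one new message falls short of the quorum of three for an un-updated node, which understands only the legacy type, while a node that understands both types accepts it.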
We would like to hear your input on which approach you think is best for applying the update.