Wow! What happened?
On Thursday April 1st, Coinbase Earn launched an Advanced Task to educate users on how to easily send Celo assets using Valora. The popularity of this task exceeded cLabs’ wildest expectations. We saw tens of thousands of users attempt to onboard to the network within the space of a few minutes. This load, combined with a cloud provider outage, prevented some users from completing the task, so Coinbase temporarily paused the program.
Most operational issues are a perfect storm of several different underlying problems. First, the dramatic spike in user onboarding temporarily exceeded what the Celo network could support in those short moments. Within minutes, tens of thousands of users started the verification flow, leading to unusually high load on Komenci, a cLabs-operated service that pays network onboarding transaction fees for new Valora users. Komenci deploys a smart contract wallet for each user, which is a gas-intensive operation. The network was unable to process all of these operations at the rate at which they arrived, so some Valora users were unable to complete the verification flow. Second, Microsoft Azure, on which several components supporting Valora are hosted, experienced an unexpected, simultaneous multi-data center networking outage. Third, Coinbase experienced an issue with withdrawals to the Celo network at the same time, now resolved, which delayed some transfers appearing for users. A perfect storm indeed.
At all points, the Celo network remained available and continued to process transactions. The gas price minimum, part of Celo’s implementation of EIP-1559, rose under the congestion as intended, which is what we would want to see in a healthy network experiencing an unexpected spike in volume.
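To illustrate why a gas price minimum rises under congestion, here is a minimal sketch of an EIP-1559-style update rule. It is a simplified model, not the on-chain contract: the function name, the parameter names (`target_density`, `adjustment_speed`, `floor`) and the numbers used are assumptions chosen for illustration.

```python
def next_gas_price_minimum(current, gas_used, gas_limit,
                           target_density, adjustment_speed, floor):
    """Return the gas price minimum for the next block (simplified model).

    All names and values are hypothetical. The idea: compare how full the
    last block was (its density) with a target, and move the minimum price
    up or down in proportion to the difference.
    """
    density = gas_used / gas_limit  # 0.0 = empty block, 1.0 = full block
    updated = current * (1 + (density - target_density) * adjustment_speed)
    return max(floor, updated)      # never drop below the configured floor


# Full blocks (density above target) push the minimum up.
congested = next_gas_price_minimum(
    current=100.0, gas_used=10_000_000, gas_limit=10_000_000,
    target_density=0.5, adjustment_speed=0.5, floor=1.0)

# Blocks exactly at the target leave it unchanged.
steady = next_gas_price_minimum(
    current=100.0, gas_used=5_000_000, gas_limit=10_000_000,
    target_density=0.5, adjustment_speed=0.5, floor=1.0)

# Empty blocks let it fall again.
quiet = next_gas_price_minimum(
    current=100.0, gas_used=0, gas_limit=10_000_000,
    target_density=0.5, adjustment_speed=0.5, floor=1.0)
```

In this model, a sustained run of full blocks compounds the increase block after block, which matches the behaviour described above during the onboarding spike.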
Although a less than ideal experience is never the goal, cLabs has made efforts to be in constant contact with Valora users experiencing delays and unexpected errors. We have absolutely appreciated the support of Valora and the tremendous popularity of the campaign, and the teams are working together to resolve these issues as quickly as possible.
We have learnt a lot in the last 24 hours. cLabs will be sharing and engaging the community in dialog to build a detailed incident report in the coming days. The high loads tested several platform components, and these events are always opportunities to improve Celo for everyone.
Calling all validators: Accelerating a higher performance blockchain
cLabs would love the community to help rapidly get the network to a point where it can deliver greater throughput and enable more users to concurrently onboard. A number of changes have been made and deployed to Komenci and Valora already.
The Celo Blockchain community teams have been working steadily on performance improvements, but are actually in a position to dramatically shorten the timetable to release the first of these, given that there is a clear user need to do so.
A new point release of Celo Blockchain, v1.2.5, is now released! It contains select performance improvements backported from v1.3.0-beta and master, plus some focused new code. The release is currently undergoing testing on Alfajores and on cLabs-operated validators on Baklava.
We are calling on validator operators to upgrade their validator instances as soon as possible. Other node operators are encouraged to upgrade, but it is not essential that they do.
If the response from the validator community is positive, and a new surge in demand is anticipated in the following days, cLabs will submit a new Celo Governance Proposal (CGP) as soon as Tuesday April 6th, to do two things:
- increase the block gas limit (essentially, the throughput of the blockchain)
- adjust the parameters associated with the gas price minimum to allow higher sustained load without causing a rise in gas prices, and to make any rise occur more steadily
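The intended effect of the two changes above can be sketched with a small simulation. This is a simplified EIP-1559-style model under stated assumptions: the function, the parameter names and every number below are hypothetical, and the actual values would be defined in the CGP itself. The sketch assumes that "higher sustained load without a rise" corresponds to a higher block gas limit and target density, and that "more steadily" corresponds to a lower adjustment speed.

```python
def simulate(demand_gas, gas_limit, target_density,
             adjustment_speed, start_price, blocks):
    """Return the gas price minimum after each block under constant demand.

    Simplified model with hypothetical parameters; not the on-chain logic.
    """
    prices = []
    price = start_price
    for _ in range(blocks):
        gas_used = min(demand_gas, gas_limit)  # a block cannot exceed its limit
        density = gas_used / gas_limit
        price = price * (1 + (density - target_density) * adjustment_speed)
        prices.append(price)
    return prices


# Same demand per block, two hypothetical parameter sets.
before = simulate(demand_gas=12_000_000, gas_limit=10_000_000,
                  target_density=0.5, adjustment_speed=0.5,
                  start_price=1.0, blocks=5)
after = simulate(demand_gas=12_000_000, gas_limit=20_000_000,
                 target_density=0.7, adjustment_speed=0.25,
                 start_price=1.0, blocks=5)

print("before:", [round(p, 3) for p in before])
print("after: ", [round(p, 3) for p in after])
```

With the first parameter set, every block is full and the minimum climbs each block. With the second, the same demand fits below the target density, so the minimum does not rise at all in this model.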
Unusually, cLabs is proposing that these changes be made by governance hotfix, a mechanism that allows a two-thirds majority of validators to apply changes to Celo Core Contracts without the usual 10 days that a governance proposal takes. The CGP would be public, of course, and verifiable by anyone, but instead of CELO holders voting on it, cLabs would ask every validator to participate and decide whether or not to support the change.
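As a rough illustration of the two-thirds threshold, the check below counts validator approvals against the size of the validator set. It is a sketch only: the function names are hypothetical, and it assumes "two-thirds majority" means at least two-thirds of validators, rounded up. The exact quorum rule is defined by the Celo Governance contracts, not by this code.

```python
def required_approvals(validator_count):
    """Smallest whole number of validators that is at least two-thirds of the set."""
    # Integer ceiling of 2n/3, avoiding floating-point rounding.
    return -(-2 * validator_count // 3)


def hotfix_has_quorum(approvals, validator_count):
    """True if enough validators have approved (hypothetical helper)."""
    return approvals >= required_approvals(validator_count)


# Example with a hypothetical set of 100 validators.
needed = required_approvals(100)
print(needed, hotfix_has_quorum(66, 100), hotfix_has_quorum(67, 100))
```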
This mechanism is usually reserved for security hotfixes, but in this case it feels appropriate to use it to rapidly adjust non-contentious parameters to support broader usage of the Celo platform. It has been 12 months since the validators last tested the process, during The Great Celo Stake Off, so this could also act as a ‘dry run’ to ensure readiness in the event of an urgent security issue.
Comments, suggestions and feedback on this community effort are welcome, as always.