While fixing a race condition in the Istanbul engine, we introduced a deadlock that occurs when using the istanbul_startValidatingAtBlock management API. We do exercise this management API in an end-to-end test, but that test runs in the celo-monorepo repository rather than the celo-blockchain repository. The test did fail as expected, but the blockchain team did not see the failure.
This bug is only present in version 1.4.0 of the celo-blockchain client.
We will be releasing version 1.4.1 with this fix, but in the meantime, please do not use the hotswap start validating API with version 1.4.0.
This PR introduced a read lock on coreMu around sb.replicaState.NewChainHead(consensusBlock), which then calls backend.StartValidating, which in turn tries to take a lock on coreMu.
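The call pattern described above can be reproduced in isolation. The following is a minimal sketch, not the celo-blockchain code: the backend type and its methods are hypothetical stand-ins, only the coreMu name comes from this post, and it assumes the inner call needs the write lock. It uses TryLock (available from Go 1.18) so the program reports the failed acquisition instead of hanging the way a blocking Lock would.

```go
package main

import (
	"fmt"
	"sync"
)

// backend is a hypothetical stand-in for the Istanbul backend.
// Only the coreMu field name is taken from the post.
type backend struct {
	coreMu sync.RWMutex
}

// startValidating mimics a callee that needs the write lock on coreMu.
// It uses TryLock so the demo returns false where a blocking Lock
// would wait forever.
func (b *backend) startValidating() bool {
	if !b.coreMu.TryLock() {
		return false
	}
	defer b.coreMu.Unlock()
	return true
}

// newChainHeadLocked calls startValidating while still holding the
// read lock, which is the problematic pattern.
func (b *backend) newChainHeadLocked() bool {
	b.coreMu.RLock()
	defer b.coreMu.RUnlock()
	return b.startValidating()
}

// newChainHeadUnlocked releases the read lock before making the call.
// This is one possible way to avoid the problem, not necessarily the
// approach taken in the actual fix.
func (b *backend) newChainHeadUnlocked() bool {
	b.coreMu.RLock()
	// ... read whatever state needs the lock ...
	b.coreMu.RUnlock()
	return b.startValidating()
}

func main() {
	b := &backend{}
	fmt.Println("write lock acquired while holding read lock:", b.newChainHeadLocked())
	fmt.Println("write lock acquired after releasing read lock:", b.newChainHeadUnlocked())
}
```

With a blocking Lock in place of TryLock, the first call would never return, because the write lock waits for a read lock that the same goroutine is still holding.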
There is a follow-up PR that switches coreStarted to be an atomic, but that PR did not fully remove the locking, and the deadlock is still present on master (as of 11/13/21).
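For readers unfamiliar with the idea, here is a rough sketch of what keeping a started flag as an atomic can look like. The engine type and method names are hypothetical and only the coreStarted name comes from this post. An atomic flag removes the need to take a lock just to read that one value, but any other state still guarded by a mutex keeps its locking, so a nested lock acquisition elsewhere can still deadlock.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// engine is a hypothetical stand-in; only the coreStarted name is
// taken from the post.
type engine struct {
	// coreStarted is 0 when stopped and 1 when started.
	// It must only be accessed through sync/atomic.
	coreStarted int32
}

// setCoreStarted updates the flag without taking any mutex.
func (e *engine) setCoreStarted(started bool) {
	var v int32
	if started {
		v = 1
	}
	atomic.StoreInt32(&e.coreStarted, v)
}

// isCoreStarted reads the flag without taking any mutex.
func (e *engine) isCoreStarted() bool {
	return atomic.LoadInt32(&e.coreStarted) == 1
}

func main() {
	e := &engine{}
	fmt.Println("started:", e.isCoreStarted())
	e.setCoreStarted(true)
	fmt.Println("started:", e.isCoreStarted())
}
```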
For now, if you use hotswap, do not upgrade to version 1.4.0; wait for the release of version 1.4.1 instead.
The cLabs Blockchain Team
Update: 1.4.1 with a fix for this issue has been released here.