Deadlock in 1.4.0 When Using Hotswap

Hello Validators,

While fixing a race condition in the Istanbul engine, we introduced a deadlock that occurs when the istanbul_startValidatingAtBlock management API is used. An end-to-end test does exercise this management API, but it runs in the celo-monorepo repository rather than the celo-blockchain repository. The test failed as it should have, but the blockchain team did not see the failure.

This bug is only present in version 1.4.0 of the celo-blockchain client.

We will be releasing version 1.4.1 with this fix, but in the meantime, please do not use the hotswap start validating API with version 1.4.0.

Technical Details

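This PR introduced a read lock on coreMu around the call to sb.replicaState.NewChainHead(consensusBlock). That call in turn invokes backend.StartValidating, which tries to take a lock on coreMu while the read lock is still held. Neither side can proceed, which is the deadlock.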
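For anyone who wants to see the lock pattern in isolation, here is a minimal sketch. It is not the celo-blockchain code: the backend type and the method bodies are hypothetical stand-ins, only the coreMu name is taken from the description above, and it assumes the inner call wants the exclusive lock. It uses TryLock (Go 1.18+) so the program reports the conflict instead of hanging; with a plain Lock() the inner call would block forever.

```go
package main

import (
	"fmt"
	"sync"
)

// backend is a hypothetical stand-in for the Istanbul backend.
// Only the coreMu name comes from the post.
type backend struct {
	coreMu sync.RWMutex
}

// startValidating mimics a callee that needs exclusive access to coreMu
// (an assumption for this sketch). TryLock is used so the demo can report
// failure; a plain Lock() here would never return while a read lock is held.
func (b *backend) startValidating() bool {
	if !b.coreMu.TryLock() {
		return false
	}
	defer b.coreMu.Unlock()
	return true
}

// newChainHead mimics a caller that holds the read lock across the call.
func (b *backend) newChainHead() bool {
	b.coreMu.RLock()
	defer b.coreMu.RUnlock()
	return b.startValidating()
}

func main() {
	b := &backend{}
	fmt.Println("acquired while read lock held:", b.newChainHead())
	fmt.Println("acquired with no lock held:", b.startValidating())
}
```

The first call cannot get the lock because the same goroutine still holds the read lock; the second succeeds once the read lock has been released.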

A follow-up PR switches coreStarted to an atomic, but it did not fully remove the locking, so the deadlock is still present on master (as of 11/13/21).
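As a rough illustration of what making a flag atomic buys you, here is a small sketch. The engine type and its methods are hypothetical and only the coreStarted name comes from the post; it assumes Go 1.19+ for atomic.Bool. The point is that the flag can be read without taking any mutex, so a reader cannot deadlock on it. It does not help any other code path that still takes the mutex.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// engine is a hypothetical stand-in; only the coreStarted name comes from the post.
type engine struct {
	coreStarted atomic.Bool // requires Go 1.19+
}

// start flips the flag exactly once and reports whether this call did the flip.
func (e *engine) start() bool {
	return e.coreStarted.CompareAndSwap(false, true)
}

// isStarted reads the flag without taking any mutex, so it is safe to call
// from code that already holds some other lock.
func (e *engine) isStarted() bool {
	return e.coreStarted.Load()
}

func main() {
	e := &engine{}
	fmt.Println("started before start():", e.isStarted())
	fmt.Println("first start() flipped the flag:", e.start())
	fmt.Println("second start() flipped the flag:", e.start())
	fmt.Println("started after start():", e.isStarted())
}
```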

So for now, if you use hotswap, do not upgrade to version 1.4.0; wait for the release of version 1.4.1.

Thank you,

The cLabs Blockchain Team

Update: 1.4.1 with a fix for this issue has been released here.
