This PR enables code generation to proceed in parallel with further elaboration. It does not attempt further refinements, such as generating code for different declarations in parallel or removing the dependency on kernel checking.
This is #3014 with cad5cce reverted for testing.