The kernel support opportunistic hash consing.
At commit e63c79c81e we have enabled this feature during
tactic execution, and fixed performance problems in the Certigrad
project (by @dselsam).
However, this change produced unexpected performance problems in
other Lean files (example: @JasonGross example at issue #1646 was 10x
slower after the commit above).
After analyzing the performance logs, I conjecture the hash_consing
may produce a substantial performance overhead if the following property
doesn't hold.
- Let `cache` be the hash_consing cache, then for each `e` in `cache`,
all children of `e` are also in the `cache`.
If the property above is not true, we may have problem whenever
we add an expression `t` containing multiple copies of a big term `T`.
For example, support we insert `f T_1 T_2`, where `T_1` and `T_2` are
structurally equal but are not pointer equal. Then, if we try to insert
`f T_2 T_1`, we will have to compare the huge terms twice, and the
comparison will be proportional to the size of `T_i`.
This commit tries to address this performance problem by enforcing
the property above. This is not a perfect solution since we may keep
trying to create terms using big terms created before hash_consing
has been enabled. After this commit, the example at issue #1646
is only 1.4x slower. It didn't impact the standard library
compilation (memory nor time).
I'm using this commit is a temporary workaround. We should probably
remove the hash_consing support from the kernel, and implement it
in key places. For example, for Certigrad, we can control memory
consumption by using hash_consing only during `simp`,
`instantiate_mvars` and `elaborator` finalization procedure.
@gebner @kha Any ideas/suggestions for this hash_consing issue?