Compactification

From: Compactification (physics) - Wikipedia

== Pretraining Data Set Size ↔ Relative Performance ==
 * 1) Figure 1: Overall learning curves for the four probing methods. For each method, we compute overall performance for each RoBERTa model tested as the macro average over sub-task performances after normalization. We fit a logistic curve, scaled to have a maximum value of 1.
 * 2) This learning-curve behavior is analogous to "compactification".
 * 3) At around 10k spaced repetition items, I started to notice the concept network growing on its own, e.g. learning new things via already-stored concepts 99% of the time, or combining existing concepts.
 * 4) No longer requiring (as much) activation energy.
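The fitting step described in the figure caption (macro-average the normalized sub-task scores, fit a logistic curve, then rescale it so its maximum is 1) can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's actual code; the data points and initial parameters below are invented.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, L, k, x0):
    """Standard logistic curve: ceiling L, steepness k, midpoint x0."""
    return L / (1.0 + np.exp(-k * (x - x0)))

# Illustrative points: log10(pretraining data size) vs. macro-averaged,
# normalized probing performance (made-up numbers, not from the paper).
log_size = np.array([6.0, 7.0, 8.0, 9.0, 10.0])
perf     = np.array([0.15, 0.40, 0.70, 0.85, 0.90])

# Fit the three logistic parameters to the averaged scores.
(L, k, x0), _ = curve_fit(logistic, log_size, perf, p0=[1.0, 1.0, 8.0])

# Rescale so the fitted curve's maximum value (its ceiling L) equals 1.
def scaled(x):
    return logistic(x, L, k, x0) / L
```

Dividing by the fitted ceiling `L` is one plausible reading of "scale to have a maximum value of 1": it puts every method's curve on a common [0, 1] range so their shapes can be compared directly.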