Breaking the Bubble: Asynchronous Pipeline Parallel Training with Bounded Weight Inconsistency
…In GPT-style language-model pretraining, PACI matches the stability and final perplexity of synchronous 1F1B-flush, retains the same peak memory footprint, achieves fully utilized pipeline throughput, and improves training time…
