Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention
…To validate this claim and identify its causes, we study the effects of model scaling on a synthetic setup consisting of a mixture of tasks that show monotonic scaling curves. The results…