Paper page - Audio-Visual Intelligence in Large Foundation Models
…Furthermore, we curate representative datasets, benchmarks, and evaluation metrics, offering a structured comparison across task families and identifying open challenges in synchronization, spatial reasoning, controllability, and safety. By consolidating this rapidly expanding…