Paper page - When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels
…Sushant Gautam , , , , , , , , Abstract Comparative safety scoring without labeled benchmarks relies on scenario-based audits with validity chains measuring responsiveness, variance dominance, and stability to establish deployment evidence. AI-generated summary Many deployments…
