Position: LLM Inference Should Be Evaluated as Energy-to-Token Production
… We therefore call for inference papers and benchmarks to report Joules/token, active binding constraint, PUE-adjusted delivered power, and utilization-adjusted token output alongside accuracy and latency. …
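The excerpt does not say exactly how these quantities are computed. The sketch below assumes two things. First, Joules/token is the measured IT energy of a run (average power × wall-clock time) divided by the output tokens produced. Second, the PUE adjustment multiplies IT energy by the facility's power usage effectiveness, where PUE is total facility energy divided by IT equipment energy. The `InferenceRun` class and `joules_per_token` function are hypothetical names used only for illustration. Utilization adjustment and the active binding constraint are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class InferenceRun:
    """Measurements from one inference run (hypothetical field names)."""
    avg_it_power_w: float  # mean power drawn by the serving hardware, in watts
    duration_s: float      # wall-clock duration of the run, in seconds
    tokens_generated: int  # output tokens produced during the run
    pue: float             # facility power usage effectiveness (>= 1.0)


def it_energy_j(run: InferenceRun) -> float:
    # Energy in joules = average power in watts x time in seconds
    return run.avg_it_power_w * run.duration_s


def joules_per_token(run: InferenceRun, pue_adjusted: bool = True) -> float:
    """Joules per output token; assumed definition, not the paper's formula."""
    if run.tokens_generated <= 0:
        raise ValueError("tokens_generated must be positive")
    energy = it_energy_j(run)
    if pue_adjusted:
        # PUE = facility energy / IT energy, so facility energy = IT energy x PUE
        energy *= run.pue
    return energy / run.tokens_generated


# Illustrative numbers only, not measurements from the paper
run = InferenceRun(avg_it_power_w=700.0, duration_s=60.0,
                   tokens_generated=42_000, pue=1.5)
print(joules_per_token(run, pue_adjusted=False))  # 1.0
print(joules_per_token(run))                      # 1.5
```

Reporting both the IT-only and PUE-adjusted figures keeps hardware efficiency separate from facility overhead. The same server can then be compared across data centers with different cooling and power-delivery losses.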