Coding agents are useful for quickly making complex code changes. Federated learning (FL) experiments differ from ordinary local model tuning because the correctness of the experiment depends on a contract among the server, clients, model updates, metadata, data splits, and evaluation logic. A candidate can raise the reported score while quietly changing what is being compared, for example by altering the evaluation data, model capacity, communication budget, local compute, or server-client update semantics. Auto-FL makes the research loop explicit. The agent begins with program.md, which acts as the control plane
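The comparability contract described above can be pictured as a set of protected settings that a candidate must leave unchanged. The following is a minimal sketch under stated assumptions, not Auto-FL's actual mechanism: the `ExperimentContract` class, its field names, and the `MUTABLE_FIELDS` set are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, asdict

# Hypothetical field names chosen for illustration; not Auto-FL's schema.
@dataclass(frozen=True)
class ExperimentContract:
    eval_split: str      # which data split the score is reported on
    model_params: int    # model capacity
    num_rounds: int      # communication budget
    local_epochs: int    # local compute per round
    aggregation: str     # server-side strategy under study

# Assumption: only the aggregation strategy may vary between candidates.
MUTABLE_FIELDS = frozenset({"aggregation"})

def contract_violations(baseline, candidate):
    """Return the protected fields whose values differ from the baseline."""
    base, cand = asdict(baseline), asdict(candidate)
    return sorted(
        name for name in base
        if name not in MUTABLE_FIELDS and base[name] != cand[name]
    )

baseline = ExperimentContract("test", 1_200_000, 20, 2, "fedavg")
fair = ExperimentContract("test", 1_200_000, 20, 2, "median")
unfair = ExperimentContract("test", 1_200_000, 40, 2, "fedavg")

print(contract_violations(baseline, fair))    # []
print(contract_violations(baseline, unfair))  # ['num_rounds']
```

In this sketch, the second candidate doubles the number of rounds, so any score gain it reports would not be comparable with the baseline.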
NVIDIA FLARE Auto-FL is an automated, AI-driven research loop designed to test and optimize federated learning strategies. The idea is straightforward: start with a comparable benchmark task, give the agent a clear research control plane, set a fixed training budget, constrain the mutation surface, and record every result in an experiment ledger. From there, the agent can autonomously iterate through candidate FL strategies while preserving the FLARE Client API and Recipe API contracts. Rather than handing an agent an open-ended research problem, Auto-FL begins with a fair, comparable benchma
Beyond the default CIFAR-10 simulation, the Auto-FL pattern is highly adaptable. Because the primary control plane is decoupled from the task profile (which specifies the dataset, metrics, and mutation constraints), researchers can apply the same autonomous experiment discipline to various model families without rebuilding the underlying harness. To demonstrate this flexibility, the example includes a medical visual language model (VLM) task, which integrates a federated Qwen3-VL LoRA training workflow into the NVIDIA FLARE Client API and Recipe API. The setup simulates three distinct medical
Auto-FL packages the components needed to run that operating model in a single place. It includes a ready-to-run experimental harness within a task profile: FLARE baseline recipes in job.py, a Client API training loop in client.py, custom FL aggregation hooks, additional model and training utilities, and mutation guardrails. The package also includes run scripts, plotting utilities, templates, and a reporting skill for completed campaigns. A task profile can define a supported strategy surface with FedAvg, FedOpt-style server updates, FedAdam, SCAFFOLD, median aggregation, and FedProx hooks
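Two of the strategies named above, FedAvg and median aggregation, differ in how the server combines client updates. The sketch below illustrates that difference in plain Python on toy parameter vectors. It is an assumption-laden illustration, not the aggregation hooks shipped with Auto-FL or FLARE: the function names `fedavg` and `coordinate_median` and the toy data are hypothetical.

```python
from statistics import median

def fedavg(updates, num_samples):
    """Sample-weighted mean of client parameter vectors (FedAvg-style)."""
    total = sum(num_samples)
    dim = len(updates[0])
    return [
        sum(u[i] * n for u, n in zip(updates, num_samples)) / total
        for i in range(dim)
    ]

def coordinate_median(updates):
    """Coordinate-wise median of client parameter vectors."""
    return [median(column) for column in zip(*updates)]

# Toy data: two clients agree, a third reports an outlying update
# and also holds the most samples.
updates = [[1.0, 2.0], [1.0, 2.0], [10.0, -6.0]]
num_samples = [10, 10, 20]

print(fedavg(updates, num_samples))   # [5.5, -2.0]
print(coordinate_median(updates))     # [1.0, 2.0]
```

The weighted mean is pulled toward the outlying client, while the coordinate-wise median stays with the majority. Differences of this kind are why a change of aggregation strategy needs to be evaluated under an otherwise fixed budget.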