Search

Showing top 67 results for "Kubernetes r/kubernetes" · filtered from 75 indexed

Filtered by topic: Kubernetes Clear ✕

Tracked topic

Kubernetes

187 articles indexed Last updated 19h ago See topic hub

People also ask

What is the benefit of running Slurm on Kubernetes?

The operational payoff of running Slurm on Kubernetes comes from the ecosystem. Rather than building and maintaining separate toolchains for GPU management, monitoring, networking, and node lifecycle, you can use the Kubernetes tooling that already exists for these problems. Platform teams manage clusters with declarative YAML, Helm deployments, rolling updates, and Prometheus or Grafana for observability.

Running Large-Scale GPU Workloads on Kubernetes with Slurm | NVIDIA Technical Blog
How does Slinky slurm-operator work?

Slinky slurm-operator represents each Slurm component (slurmctld for scheduling, slurmdbd for accounting, slurmd for compute workers, slurmrestd for API access) as a Kubernetes Custom Resource Definition (CRD). A Slurm cluster is defined using Custom Resources, and Slinky creates containerized Slurm daemons running in their own pods, configured to belong to their respective cluster. Slinky ensures high availability (HA) of the Slurm control plane (slurmctld) through pod regeneration, with no need for the Slurm native HA mechanism. Configuration changes propagate automatically: Kubernetes synch

Running Large-Scale GPU Workloads on Kubernetes with Slurm | NVIDIA Technical Blog

Top stories

Discussions and forums

developer.nvidia.com › ko-kr › blog

계층화되고 재현 가능한 레시피를 통한 GPU 인프라용 Kubernetes 검증하기

… AWS에서는 Amazon EKS 팀의 창립 멤버로 참여해 EKS, Karpenter, 그리고 오픈소스 생태계를 통해 Kubernetes 기반 서비스를 정의하는 데 핵심 역할을 했습니다. NVIDIA에서는 GPU 가속 Kubernetes 환경과 대규모 AI 인프라를 위한 헬스 자동화 패턴을 설계하며, 클라우드 사업자와 고객이 프로덕션 환경에서 GPU 워크로드를 안정적으로 운영할 수 있도록 방향을 제시하고 있습니다. …

Mar 20, 2026 · Mark Chmarny