Ref: #73246

Senior HPC Platform Engineer – Remote

  • Practice: Cloud & Infrastructure

  • Technologies: Infrastructure & Cloud

  • Location: Stockholm, Sweden

  • Type: Contract

Senior HPC Platform Engineer - Remote - 3 months+

For a global IT infrastructure services client, we are looking for an HPC Infrastructure & Scheduler Integration Engineer to design, build, and operate a PBS-based high-performance computing platform.
This is a 3-month contract with the possibility of extension.
You will work remotely, with occasional travel to Stockholm, Sweden (you must be eligible to travel).

This role focuses on integrating compute, storage, and orchestration layers with the scheduler, ensuring reliable job execution, efficient scaling, and seamless integration with modern platforms such as cloud environments, Kubernetes, and MLOps tools.

The Role

  • Developing and maintaining scheduler integrations, including hooks, prolog/epilog scripts, and custom automation
  • Automating the full job lifecycle from submission through execution to teardown
  • Designing and managing HPC environments across bare metal, virtualized, and hybrid cloud setups
  • Integrating the scheduler with storage systems (e.g. Lustre), networking (InfiniBand/Ethernet), and identity services (LDAP/Kerberos)
  • Bridging HPC workloads with modern platforms such as Kubernetes, MLOps frameworks, and cloud bursting solutions
  • Optimizing scheduling performance, resource allocation, and cluster utilization
  • Implementing observability (logging, metrics, dashboards) and supporting incident response and root cause analysis

Skills Required

Core Skills

  • Strong Linux systems engineering (RHEL, Rocky, or SLES)
  • Experience with HPC schedulers (PBS Pro/OpenPBS preferred; Slurm/Torque acceptable)
  • Proficiency in scripting and automation (Python and Bash required; Go or Rust a plus)
  • Solid understanding of distributed systems and cluster operations

HPC Expertise

  • Experience with MPI workloads (OpenMPI, MPICH)
  • Familiarity with GPU scheduling (NVIDIA stack, MIG/MPS)
  • Knowledge of parallel file systems (Lustre strongly preferred)
  • Understanding of scheduling concepts (queues, priorities, backfill, fairshare, reservations)

Infrastructure & Integration

  • Experience with configuration management (Ansible, Puppet, etc.)
  • Exposure to CI/CD for infrastructure and API-driven integrations
  • Familiarity with cloud platforms and hybrid HPC architectures

Preferred Experience

  • Building custom PBS hooks or scheduler extensions in production
  • Designing hybrid HPC + Kubernetes or cloud bursting solutions
  • Operating at scale (10k+ cores, multi-petabyte storage)
  • Experience with security/compliance frameworks (e.g. NIST, STIGs)
  • Strong cross-layer debugging skills (network, storage, scheduler)

If you are interested, please share your up-to-date CV and the best contact number to reach you.
choe.carr@next-ventures.com // +44(0)2038689173
