Apple's GGML team gives developers access to the power of the GPU across all of Apple's innovative products, from iPhone, iPad, Apple TV, and Apple Watch to the Mac product line. The Apple Silicon GPU Driver Scheduler team within the Graphics, Games and ML group is seeking a senior/principal engineer to lead the design of GPU scheduling mechanisms that drive peak utilization and orchestrate distributed inference across multi-node clusters for server-side ML acceleration. This is the compute infrastructure foundation that will deliver Apple Intelligence on Private Cloud Compute at unprecedented scale.
Description
The Apple Silicon GPU Driver Scheduler team is directly responsible for GPU workload management, including scheduling commands on the GPU, managing resources and dependencies, and maintaining responsiveness and quality of service for applications using the GPU. The GPU Scheduler team directly impacts the performance and power efficiency of all Apple products that use Apple Silicon GPUs. We are looking for an engineer with a strong engineering background who is excited to work with engineers and other leaders at Apple to deliver Apple GPUs across all Apple devices, build and ship exciting new GPU-focused features, and work with other teams to prototype future HW and SW GPU features.
In this role, you'll architect the GPU driver scheduling layer underneath Apple's largest server-side ML and LLM workloads. You'll design parallelism strategies that scale from a single GPU to clusters of nodes, build the synchronization and communication primitives that hold them together, and shape the HW/SW interfaces for next-generation GPU designs. You will work at the intersection of cutting-edge ML systems, systems programming, and hardware acceleration. You will partner with world-class teams across Apple's software and hardware organizations to co-design scheduling primitives for next-generation GPUs, collaborate with framework and infrastructure teams to expose scheduling control where it matters, and contribute to the performance and reliability characteristics that ultimately determine inference latency and cost.
We are seeking an individual with curiosity and passion to learn and innovate.
The people here at Apple don’t just create products - they create the kind of wonder that’s revolutionized entire industries. It’s the diversity of those people and their ideas that inspires the innovation that runs through everything we do, from amazing technology to industry-leading environmental efforts. Join Apple, and help us leave the world better than we found it.
Responsibilities
Design and implement low-level GPU driver and scheduler features optimized for ML/LLM workloads
Design, implement, and optimize scheduling strategies for efficient parallelism across one or more GPUs - data, model, and pipeline parallelism
Co-design scheduling primitives with hardware, performance-architecture, and software teams to achieve peak compute utilization and optimal memory throughput on next-generation GPU designs
Design and implement multi-GPU communication and synchronization using RDMA technologies, integrating with SoC, networking, and GPU front-end primitives, and influencing API/framework usage
Design and implement scalable ML serving infrastructure with first-class support for security, load balancing, and fault tolerance
Contribute to the design of APIs and abstractions that expose scheduling control to higher layers of the ML stack
Drive debug, performance analysis, and optimization for ML workloads - identifying bottlenecks in compute, memory, and distributed/network subsystems
Preferred Qualifications
Experience with GPU programming (CUDA/ROCm/Metal) and high-performance computing, including successfully optimizing large-scale parallel workloads
Experience with inter-node communication technologies (InfiniBand, RDMA, NCCL) in the context of ML training/inference
Minimum Qualifications
Technical BS/MS degree or equivalent experience
Excellent systems programming knowledge with C or C++
Strong experience with operating systems and/or knowledge of scheduling policies
Experience with, or deep understanding of, distributed systems and parallel computing architectures
Understanding of systems architecture, compilers, and algorithms
Excellent written and oral communication skills
Pay & Benefits
At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $147,400 and $220,900, and your base pay will depend on your skills, qualifications, experience, and location.
Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and reimbursement for certain educational expenses, including tuition, for formal education related to advancing your career at Apple. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.
Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.