Member of Technical Staff - ML Performance

Modal · New York, United States · 11 days ago

ABOUT US

AI needs a new infrastructure layer. We're building it at Modal.

Every era of computing brought new workloads that previous infrastructure couldn't support: mainframes, databases, and the cloud. Each time, the company that rebuilt the layer underneath defined the decade. AI is no different, except it touches everything instead of one slice, and the window to build the layer underneath it is open right now.

Our customers include category-defining companies like Lovable (https://modal.com/blog/lovable-case-study), Ramp (https://modal.com/blog/how-ramp-built-a-full-context-background-coding-agent-on-modal), Cognition, DoorDash, and Suno. They rely on Modal for instant GPU access, sub-second container starts, and native storage, so it's simple to serve low-latency inference, fine-tune models, and access production-ready sandboxes at scale.

We recently raised a $355M Series C (https://modal.com/blog/modal-series-c) at a $4.65B valuation, led by General Catalyst and Redpoint Ventures. We've crossed $300M in ARR and grown fivefold since September.

Our team includes creators of popular open-source projects (e.g., Seaborn https://github.com/mwaskom/seaborn and Luigi https://github.com/spotify/luigi), academic researchers, international olympiad medalists, and engineering and product leaders with decades of experience.

THE ROLE

We are looking for strong engineers with experience in making ML systems performant at scale. If you are interested in contributing to open-source projects and Modal’s container runtime to push language and diffusion models towards higher throughput and lower latency, we’d love to hear from you!

REQUIREMENTS

5+ years of experience writing high-quality, high-performance code.
Experience working with torch, high-level ML frameworks, and inference engines (vLLM or TensorRT).
Familiarity with NVIDIA GPU architecture and CUDA.
Experience with ML performance engineering (tell us a story about boosting GPU performance: debugging SM occupancy issues, rewriting an algorithm to be compute-bound, eliminating host overhead, etc.).
Nice-to-have: familiarity with low-level operating system foundations (Linux kernel, file systems, containers, etc.).

Headquarters

New York, United States

Work Location

On-site

Job Category

Software Development

Application Deadline

Not specified

Job Type

Full Time

Experience Level

Lead

Application Method

Apply via Website

Salary

Not specified
