
FriendliAI · San Francisco, United States, US · 10 days ago
FriendliAI is looking for a GPU Kernel Engineer to design, build, and optimize the low-level compute kernels that power our large-scale, GPU-accelerated AI inference platform. You will deliver world-class inference speed across NVIDIA and AMD GPUs. With our recent $20M funding, we are scaling our team to meet market demand.
This is a deeply technical, high-impact role in which you will write GPU code and implement advanced optimizations. As part of our engine team, you will contribute directly to the company’s proprietary inference engine, which supports over 450,000 models on Hugging Face. You will work with the inventors of continuous batching and collaborate with the platform team to deploy your work into production.
FriendliAI is building the world’s best AI inference platform that makes large language and multi-modal models fast, efficient, and deployable at scale. We power high-throughput, low-latency AI workloads for organizations worldwide and integrate directly with Hugging Face, giving developers instant access to over 500,000 open-source models.
We are a small, fast-moving team doing work that matters at one of the most exciting moments in the history of technology. With our world-class inference engine, we are building a platform that the AI industry can actually rely on.
Headquarters: San Francisco, United States
Work Location: On-site
Job Category: Software Development
Application Deadline: Not specified
Job Type: Full-time
Experience Level: Not specified
Application Method: Apply via Website
Salary: Not specified