AI Scale-up Switch System Design Engineer

AMD · New Jersey, United States · 8 days ago

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE

We are looking for a hands-on, technically sharp system design engineer to join our growing team and lead the bring-up of cutting-edge scale-up switches at the heart of next-generation AI rack infrastructure. As a key contributor, you will bring deep expertise in high-speed Ethernet, server management, and platform validation to drive switch platforms from initial power-on through full system qualification. In this role, you will take full ownership of bring-up execution, apply your debugging skills to solve complex multi-layer problems, and collaborate closely with hardware, firmware, and software teams to deliver production-ready systems.

THE PERSON

You are a highly motivated team player with a strong development background, a problem-solving mentality, excellent communication skills, and the ability to prioritize tasks, along with a willingness to learn and adapt. You work well in a team and are equally capable of working independently.

KEY RESPONSIBILITIES

Lead the system bring-up and validation of state-of-the-art AI scale-up switches purpose-built for high-density GPU compute racks, from initial power-on through full system validation

Perform high-speed SerDes and link bring-up, including configuring and validating Auto-Negotiation/Link Training (AN/LT), tuning TX equalization, and characterizing signal integrity across 200G/400G/800G interfaces

Execute comprehensive link qualification testing using PRBS (Pseudo-Random Binary Sequence), Snake Traffic loopback testing, and FEC (Forward Error Correction) analysis to validate BER performance at scale

Utilize LinkCAT and Broadcom SDK tools to characterize port performance, diagnose link failures, and validate PHY configurations across large port counts

Integrate and validate server management infrastructure including BMC/IPMI, Redfish API, and out-of-band management workflows for automated bring-up and health monitoring

Develop and maintain bring-up scripts and test automation (Python) to accelerate validation coverage across chassis configurations

Debug complex system-level failures spanning hardware, firmware, and software, including signal integrity issues, firmware crashes, and management plane anomalies, and drive each issue to root cause

Collaborate with hardware, firmware, and software teams to reproduce failures, document findings, and verify fixes across platform revisions

Maintain detailed bring-up documentation, test reports, and issue tracking throughout the product development lifecycle

PREFERRED EXPERIENCE

Extensive hands-on experience in hardware bring-up, platform validation, or high-speed networking silicon characterization
Experience with high-speed switch ASICs (Broadcom TH6/Tomahawk series preferred) and familiarity with Broadcom's SDK/DAPI frameworks
Deep understanding of high-speed Ethernet standards (400GbE, 800GbE) including AN/LT (IEEE 802.3), RS-FEC / KP4-FEC, and PAM4 SerDes technology
Hands-on experience with PRBS testing, BER measurement, eye diagram analysis, and Snake/loopback traffic validation methodologies
Familiarity with LinkCAT or equivalent PHY/link characterization tools
Experience with server management protocols: IPMI, Redfish/OpenBMC, KCS, IPMB, and PLDM for out-of-band control and telemetry
Proficiency in Python for test automation, log parsing, and data analysis
Strong debugging skills — comfortable working across hardware (oscilloscope, protocol analyzer), firmware logs, and software traces to isolate root cause
Experience reading schematics and PCB layout to correlate signal integrity observations with hardware design
Excellent communication skills with the ability to document findings clearly and collaborate across multidisciplinary teams
Experience with high-density switch/router platforms or AI/ML fabric infrastructure is a strong plus

ACADEMIC CREDENTIALS

Bachelor’s or Master’s degree in Computer Science or a related field strongly preferred

This role is not eligible for visa sponsorship.

Benefits offered are described in "AMD benefits at a glance."

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.

Headquarters: New Jersey, United States
Work Location: on-site
Job Category: Other Engineering
Application Deadline: Not specified
Job Type: full-time
Experience Level: Not specified
Application Method: Apply via Website
Salary: Not specified
