System Reliability Engineer / T2 Support Engineer

Teleport · Gurugram, Haryana, India, IN · 2 months ago

About the Role

We are looking for an engineer who enjoys understanding how systems behave in real production, not just writing features. This role is responsible for maintaining the reliability, stability, and smooth functioning of our live platform running on Google Cloud.

You will act as the first technical owner of production systems — monitoring services, investigating alerts, resolving issues, and performing controlled configuration and operational changes. This role works closely with backend developers, QA, and infrastructure teams to prevent incidents and reduce downtime.

This is not a call-center support role and not a pure development role — it is a hands-on technical position focused on debugging, incident handling, and system operations.

Tech Stack

  • Google Cloud Platform (Compute, Logging, Monitoring)
  • Java (Spring Boot based microservices)
  • MongoDB
  • Apache Kafka (event-driven architecture)
  • Redis cache
  • Linux servers

Key Responsibilities

Production Monitoring & Alert Handling

  • Monitor application health, latency, errors, consumer lag, database connections, and resource utilization
  • Acknowledge and investigate monitoring alerts
  • Perform first-level troubleshooting and stabilize services
  • Identify whether an issue is infrastructure, application, database, or messaging related

Incident Response

  • Participate in the on-call rotation
  • Diagnose production incidents and restore services with minimal downtime
  • Safely restart services, scale instances, or roll back deployments when required
  • Communicate incident status to stakeholders

Technical Support & Operational Changes

  • Handle technical support tickets requiring engineering understanding
  • Update configurations and feature flags
  • Manage scheduled jobs / cron triggers
  • Trigger or replay events in Kafka
  • Assist in minor Java configuration/code fixes when needed
  • Coordinate production releases

Database & Messaging Operations

  • Investigate MongoDB performance issues and slow queries
  • Monitor and resolve Kafka consumer lag and stuck messages
  • Manage Redis cache behavior (TTL, eviction, connection issues)

Logs & RCA

  • Analyze logs and metrics to determine the root cause of issues
  • Prepare basic Root Cause Analysis (RCA) reports
  • Suggest preventive actions to reduce recurring incidents

Requirements

Required Skills

Core Technical Skills

  • Good understanding of Linux commands and server behavior
  • Experience analyzing application logs and debugging runtime issues
  • Basic Java knowledge (stack trace reading, configuration changes, rebuild & deploy)
  • Practical experience with MongoDB (indexes, connections, slow queries)
  • Understanding of Kafka concepts (consumer, offset, lag, partitions)
  • Basic Redis knowledge (caching behavior, TTL)

Cloud & Tools

  • Hands-on experience with any cloud platform (GCP preferred / AWS acceptable)
  • Experience using monitoring tools (GCP Monitoring, Prometheus, Grafana, ELK, or similar)
  • Understanding of REST APIs and HTTP status codes

What We Expect From You

  • Ability to investigate problems logically rather than randomly restarting services
  • Comfort working with live production systems
  • Willingness to participate in on-call support
  • Strong ownership mindset and attention to detail
  • Good communication during incidents

Good to Have

  • Experience in e-commerce, fintech, logistics, or high-traffic systems
  • Exposure to CI/CD pipelines and deployments
  • Basic scripting (Shell or Python)
  • Experience writing RCA documents

Experience

3 – 6 years of relevant experience in production support, application support, SRE, DevOps operations, or similar roles.

Benefits

Why Join Us

  • Direct exposure to real distributed systems
  • Hands-on production debugging experience
  • Opportunity to learn system architecture deeply
  • Close interaction with development and platform teams

Important Note

This role involves handling live production systems and occasional on-call responsibilities. Candidates interested only in feature development or pure infrastructure automation may not find this role suitable.

Headquarters: Gurugram, Haryana, India
Work Location: On-site
Job Category: IT - Network / Systems / DB Admin
Application Deadline: Not specified
Job Type: Per-diem
Experience Level: Senior-level
Application Method: Apply via Website
Salary: Not specified
