Sr. Site Reliability Engineer (SRE)

NBCUniversal has opened a vacancy for the position of Sr. Site Reliability Engineer (SRE).

If this job matches your qualifications, please submit your application directly through this site. Review the qualifications and requirements below to confirm that you meet the company's criteria before applying.

Company Description

NBCUniversal is one of the world's leading media and entertainment companies. We create world-class content, which we distribute across our portfolio of film, television, and streaming, and bring to life through our theme parks and consumer experiences. We own and operate leading entertainment and news brands, including NBC, NBC News, MSNBC, CNBC, NBC Sports, Telemundo, NBC Local Stations, Bravo, USA Network, and Peacock, our premium ad-supported streaming service. We produce and distribute premier filmed entertainment and programming through Universal Filmed Entertainment Group and Universal Studio Group, and have world-renowned theme parks and attractions through Universal Destinations & Experiences. NBCUniversal is a subsidiary of Comcast Corporation.

Our impact is rooted in improving the communities where our employees, customers, and audiences live and work. We have a rich tradition of giving back and ensuring our employees have the opportunity to serve their communities. We champion an inclusive culture and strive to attract and develop a talented workforce to create and deliver a wide range of content reflecting our world.

Comcast NBCUniversal has announced its intent to create a new publicly traded company ('Versant') comprised of most of NBCUniversal's cable television networks, including USA Network, CNBC, MSNBC, Oxygen, E!, SYFY and Golf Channel along with complementary digital assets Fandango, Rotten Tomatoes, GolfNow, GolfPass, and SportsEngine. The well-capitalized company will have significant scale as a pure-play set of assets anchored by leading news, sports and entertainment content. The spin-off is expected to be completed during 2025.

Job Description

As a Principal Site Reliability Engineer (SRE) overseeing our digital application portfolio, you will lead efforts to ensure the reliability, scalability, and performance of the platforms behind our web, mobile, and OTT experiences. You'll work across a diverse ecosystem of products and technologies—helping with architectural decisions, shaping reliability standards, and championing operational excellence at scale.

You will serve as a strategic partner to engineering, product, security, and infrastructure teams, guiding system design for high availability, leading incident response across critical services, and embedding SRE best practices across the software development lifecycle. Your role will include evolving observability frameworks, advancing infrastructure-as-code maturity, and automating tooling to accelerate delivery while maintaining stability.

Success in this role is defined by your ability to influence engineering culture, mentor teams, and drive systemic improvements that raise the bar for operational resilience. You'll take a proactive, data-driven approach to identifying and addressing risks before they impact users. Collaboration across teams—including video engineering, content delivery, data, and customer experience—is key to delivering digital products that are not only innovative but consistently reliable.

What We Value

Site Reliability Engineers are the champions of reliability and customer trust in production. We value engineers who are driven by a desire to deliver the best possible customer experience—ensuring that every interaction across our web, mobile, CTV, and video platforms is fast, seamless, and dependable. We look for systems thinkers who act with urgency, collaborate deeply, and apply a data-driven mindset to everything they do. Curiosity, clear communication, and continuous improvement are at the heart of our culture. As a Principal SRE, you'll lead by example—mentoring others, shaping best practices, and helping us build resilient systems that scale.

Responsibilities:

Design and implement tools, processes, and frameworks to proactively monitor, measure, and improve the performance, availability, and reliability of production applications.
Define and maintain key Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to uphold system reliability and user experience targets.
Evaluate applications and services for production readiness—ensuring they meet operational, security, and customer experience requirements before launch.
Establish comprehensive observability practices—including real-time monitoring, alerting, and telemetry—to ensure deep visibility into system health and user impact.
Serve as a feedback loop to engineering teams—analyzing production behavior, identifying reliability gaps, and driving architectural and operational improvements.
Collaborate with security and infrastructure teams to proactively address vulnerabilities and maintain compliance across production systems.
Partner with product and platform teams to ensure operational insights inform development priorities and release strategies.
Lead post-incident reviews and foster a culture of continuous learning, improvement, and resilience.
Participate in a 24/7 on-call rotation to support critical services and ensure rapid incident response.

Qualifications

Must-Haves:

Willingness to work onsite and participate in a 24/7 on-call rotation, including evenings, overnights, weekends, and holidays with minimal notice.
Demonstrated experience supporting digital news and content platforms across web, mobile, CTV, and video-rich environments, with a strong focus on performance and user experience.
10+ years of experience managing and optimizing large-scale, high-traffic websites.
10+ years of hands-on experience with application deployment processes and CI/CD pipelines.
5+ years improving performance and reliability for OTT (Connected TV) and mobile applications.
5+ years supporting microservices and multi-tier distributed systems.
5+ years implementing software automation frameworks for reliability and operational efficiency.
5+ years of experience with cloud platforms, including AWS and Google Cloud Platform (GCP).
5+ years working with observability and APM tools such as Datadog, New Relic, AppDynamics, Sysdig, or Zabbix.
3+ years working with reverse proxies like Varnish and Content Delivery Networks (CDNs) such as Akamai.
5+ years scripting with languages such as Bash, Python, Perl, or Groovy.
5+ years using configuration management tools such as Ansible, SaltStack, Chef, or Puppet.
5+ years configuring and managing application servers (e.g., Tomcat, NGINX, Apache).
5+ years of extensive experience with load and performance testing tools/frameworks such as JMeter, k6, or similar.
Hands-on experience using tools like Charles Proxy or Fiddler to triage and debug issues with web apps, mobile apps, and OTT devices.
High-level understanding of video streaming techniques and ability to triage issues with mobile and OTT streaming applications.
3+ years using performance validation tools such as Selenium, TestNG, or equivalent to drive improvements in production.

Preferred Qualifications:

3+ years implementing and monitoring application/infrastructure security controls, including WAFs, site shields, and other perimeter protections.
3+ years applying code and infrastructure security practices, including vulnerability remediation and secure deployment pipelines.
Relevant certifications in Performance Engineering or Site Reliability Engineering (SRE) are a plus.

Hybrid: This position has been designated as hybrid, generally contributing from the office a minimum of three days per week.

What we'll offer:

At CNBC Headquarters in Englewood Cliffs, NJ, you'll have access to great perks and amenities: 

Sweat it out -- Free onsite fitness center with state-of-the-art equipment, plus daily group classes 
Eat up -- Gourmet cafeteria with daily specials plus soup and salad bars 
Extras -- Dry cleaning, shoe shining and sneak peeks

Don't have a car? No problem! We offer free shuttle transportation to and from multiple locations in Manhattan, Brooklyn, Hoboken and Jersey City.

This position is eligible for company sponsored benefits, including medical, dental and vision insurance, 401(k), paid leave, tuition reimbursement, and a variety of other discounts and perks. Learn more about the benefits offered by NBCUniversal by visiting the Benefits page of the Careers website.

Salary Range: $155,000 - $175,000

How to Submit an Application:

After reviewing the criteria and minimum qualifications described above for the Sr. Site Reliability Engineer (SRE) position at NBCUniversal in Englewood Cliffs, NJ (posted 2025-07-25), job seekers who meet the requirements and are interested in the vacancy should promptly prepare their application materials, such as a cover letter, CV, copies of diplomas and transcripts, and any other supporting documents, and submit them through the form below.

NBCUniversal | Englewood Cliffs, NJ US 2025-07-25 | Closed Date : 2025-08-24
