First American Financial Corporation

(Remote) Senior Observability and Monitoring Engineer

Santa Ana, California-Remote; Boise, Idaho-Remote; Charlotte, North Carolina-Remote; Chicago, Illinois-Remote; Dallas, Texas-Remote; Des Moines, Iowa-Remote; Fort Myers, Florida-Remote; Houston, Texas-Remote; Irvine, California-Remote; Jacksonville, Florida-Remote; Madison, Wisconsin-Remote; Minneapolis, Minnesota-Remote; New York, New York-Remote; Phoenix, Arizona-Remote; Portland, Oregon-Remote; Sacramento, California-Remote; Salt Lake City, Utah-Remote; San Antonio, Texas-Remote; San Francisco, California-Remote; Seattle, Washington-Remote; South Orange, New Jersey

Job ID: R040352
Date posted: Oct. 11, 2023
Category: Information Technology
Employment Type: Full Time

Who We Are

Join a team that puts its People First! Since 1889, First American (NYSE: FAF) has held an unwavering belief in its people. They are passionate about what they do, and we are equally passionate about fostering an environment where all feel welcome, supported, and empowered to be innovative and reach their full potential. Our inclusive, people-first culture has earned our company numerous accolades, including being named to the Fortune 100 Best Companies to Work For® list for eight consecutive years. We have also earned awards as a best place to work for women, diversity and LGBTQ+ employees, and have been included on more than 50 regional best places to work lists. First American will always strive to be a great place to work, for all. For more information, please visit www.careers.firstam.com.

What We Do

Job Summary

** Remote Work Welcome **

First American is seeking a Senior Observability and Monitoring Engineer who will play a pivotal role in ensuring reliability, robustness, and performance of First American's mission-critical software systems. This transformative role focuses on implementing, managing, and optimizing observability solutions to gain deep insight into system behavior, troubleshoot issues proactively, and enhance overall operational efficiency. The ideal candidate will exhibit a growth and automation mindset. 

The Opportunity

  • Measure application health and performance against baselines to anticipate failures. 

  • Define service level objectives and supporting service level indicators to capture baselines.

  • Automate application observability and reporting wherever practical.

  • Improve predictive incident response, using automated solutions for issue resolution where applicable and maintaining a well-defined process flow for human intervention.

  • Instill repeatable patterns across First American's portfolio of applications to ensure consistent practices are in place.

  • Influence and train software teams on observability and instrumentation, including adopting observability frameworks.

  • Identify key processes that involve toil-based activities and develop plans to remediate through automation.

  • Document and implement incident response processes and procedures to drive consistent mitigation and remediation in case of failure.

  • Address application architecture needs, pushing towards solutions that are fault tolerant, resilient, and easy to manage.

What You’ll Bring 

  • Proven experience with observability tooling, application performance monitoring, infrastructure monitoring and log management. 

  • Proficient in configuring alerting rules and automated responses that trigger when predefined thresholds are crossed or anomalies are detected.

  • Scripting and automation skills for customizing and extending observability solutions.  

  • Strong knowledge of cloud platforms and container orchestration.

  • Skilled in defining service level objectives, measuring service level indicators, and setting up error budgets. 

  • Strong understanding of SRE practices: incident response, change/release management, capacity planning, infrastructure automation, elastic environments, chaos engineering and blameless postmortems. 

  • Excellent problem-solving skills, attention to detail, and strong communication abilities. 

Technology Stack: 

  • Cloud Computing Platform: AWS (Lambda, EC2, ECS, EKS, Fargate, RDS, S3, DynamoDB, SQS)

  • Observability: OpenTelemetry, AppDynamics, Grafana, ELK Stack, AWS CloudWatch and X-Ray

  • Programming/Scripting: C# .NET, PowerShell, Python, YAML, Bash

  • Code Repos: Azure Repos, GitHub

  • Infrastructure as code: Terraform, Ansible

Pay Range: $108,240 - $191,125 Annually

This hiring range is a reasonable estimate of the base pay range for this position at the time of posting.  Pay is based on a number of factors which may include job-related knowledge, skills, experience, business requirements and geographic location.

What We Offer

By choice, we don’t simply accept individuality – we embrace it, we support it, and we thrive on it! Our People First Culture celebrates diversity, equity and inclusion not simply because it’s the right thing to do, but also because it’s the key to our success. We are proud to foster an authentic and inclusive workplace For All. You are free and encouraged to bring your entire, unique self to work. First American is an equal opportunity employer in every sense of the term.

Based on eligibility, First American offers a comprehensive benefits package including medical, dental, vision, 401(k), PTO/paid sick leave and other great benefits like an employee stock purchase plan.
