
UKG is seeking a Site Reliability Engineer (SRE) with a robust and diverse background in software engineering, software design, and systems architecture, with a focus on automation, reliability, and system integration. Site Reliability Engineering is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. An SRE ensures that UKG’s services, both our internally critical systems and our externally visible ones, have reliability and uptime appropriate to users' needs and a fast rate of improvement, while keeping an ever-watchful eye on capacity and performance. At UKG, our SREs come from both development and operations backgrounds and share a common passion for running products at scale in production. Our SREs always seek to understand how our systems work end to end, without boundaries.


Primary/Essential Duties and Key Responsibilities:



  • Engage in and improve the whole lifecycle of services, from inception and system design through build and deployment

  • Define and implement standards and best practices for system architecture, deployment, metrics, and operational tasks

  • Support services through activities such as availability monitoring, system health checks, and incident response

  • Improve system performance, application delivery, and efficiency through automation, process refinement, postmortem reviews, and in-depth configuration analysis

  • Communicate with teams across all areas of the organization


Required Qualifications:



  • Knowledge of resilient systems and anti-fragile design patterns

  • Knowledge of distributed systems

  • Knowledge of service-oriented architectures

  • Knowledge of microservice architectures

  • Experience in one or more of the following: Python, Go, Angular, .NET Core (C#), Java, Node.js

  • Experience with Unix/Linux operating systems internals and administration (e.g., filesystems, inodes, system calls) and networking (e.g., TCP/IP, routing, network topologies).

  • Experience with containerization and related tooling, such as Docker, Kubernetes, and BOSH

  • Experience with configuration management tools (Puppet, Chef, Ansible)

  • Ability to adapt quickly to changing priorities

  • Ability and willingness to work occasional evenings and nights as part of an on-call rotation


Preferred Qualifications:



  • Experience with algorithms, data structures, complexity analysis, and software design.

  • Experience with OpenStack

  • Experience administering Elasticsearch, MySQL, MongoDB, RabbitMQ, or Redis in a production environment

  • Experience with Amazon Web Services or Google Cloud Platform products

  • Exposure to writing SQL scripts

  • Technical writing skills

  • Strong communication skills

  • Auditing experience

  • Software development background

  • Experience with chaos engineering tools such as Gremlin


Check out how we give our employees the chance to work on whatever project they want for 48 hours! https://youtu.be/2Aw55CP1IO8  


Typical Interview Process:



  • If your application is selected, a Talent Acquisition Team Member will reach out to schedule a phone screen with you.

  • If selected to move forward, you will complete a HackerRank Coding Assessment.

  • If you pass, you will move forward either to an additional technical phone screen or directly to an onsite interview.

  • Offer stage.
