Site Reliability Engineer
Job Posted: 2025-04-15
hireVouch
Canada
Category: Information Technology
Remote
Job Description
Senior Site Reliability Engineer
Position Overview
We are a mid-size entertainment company delivering captivating digital experiences to millions of customers worldwide. Our IT organization powers the infrastructure and systems behind our cutting-edge payroll and accounting applications. We are seeking a Senior Site Reliability Engineer (SRE) to enhance the performance, scalability, and reliability of our infrastructure and help bring our next-generation solutions to life.
As a Senior Site Reliability Engineer, you will ensure the reliability and scalability of our infrastructure. You will apply your skills in cloud technologies, infrastructure operations, Kubernetes orchestration, application development, database administration, and Oracle E-Business Suite (EBS) to maintain robust infrastructure that supports business-critical platforms. The role also involves collaborating with cross-functional teams to implement engineering best practices, monitoring, and automation while exploring opportunities to enhance operations with emerging AI technologies.
Key Responsibilities
Infrastructure as Code: Develop and maintain automated infrastructure provisioning with Terraform for hybrid cloud environments.
Cloud Expertise: Design and manage robust multi-cloud environments using AWS and Azure, with a focus on optimizing Kubernetes clusters (EKS and AKS).
Oracle E-Business Suite (EBS): Support, optimize, and ensure the reliability of Oracle EBS deployments, integrating them with other IT systems to maintain smooth business operations.
Operating Systems Management: Administer and optimize Linux (RHEL) and Windows Server environments to ensure high availability and security.
Application Performance: Collaborate with development teams to enhance applications built on React, Node.js, .NET, C#, and Java for reliability and performance.
Networking & Security: Leverage advanced AWS networking skills to implement secure and scalable architectures, including VPC design, load balancing, and advanced routing.
Database Optimization: Monitor and tune database performance and manage relational and NoSQL databases to support high-traffic entertainment services.
Monitoring & Troubleshooting: Implement observability tools and proactively address performance issues using platforms like Prometheus, Grafana, Splunk, or CloudWatch.
Incident Response & Automation: Lead incident management, postmortem reviews, and automation efforts to prevent recurrence and improve overall resilience.
Cross-Team Collaboration: Work closely with developers, system administrators, and security teams to align infrastructure needs with business and technical goals.
Qualifications
Required Technical Skills
Expert-level knowledge of Terraform for infrastructure automation.
Hands-on experience managing Azure Kubernetes Service (AKS) and Amazon Elastic Kubernetes Service (EKS) clusters.
Advanced knowledge of AWS and Azure cloud ecosystems, including networking, security, and cost optimization.
Proficiency in Linux (RHEL) and Windows Server environments.
Proven experience supporting and optimizing Oracle E-Business Suite (EBS) in a complex IT environment.
Proven application development experience with React, Node.js, .NET, C#, and Java.
Strong database administration and performance-tuning skills for both relational (e.g., MySQL, PostgreSQL, MSSQL) and NoSQL (e.g., DynamoDB, MongoDB) databases.
Advanced networking skills, including VPC design, transit gateways, and hybrid cloud connectivity.
Expertise in monitoring, logging, and troubleshooting tools such as New Relic, Prometheus, Grafana, Splunk, and CloudWatch.
Desired Soft Skills
Strategic thinking to design scalable and reliable systems for high-demand entertainment platforms.
Strong collaboration and mentorship abilities to guide teams in adopting SRE best practices.
Excellent communication skills to work with technical and non-technical stakeholders.
Adaptability to a fast-paced, dynamic environment.
Nice-to-Have Skills
Experience with AI-powered Operations (AIOps) to automate troubleshooting and predictive maintenance.
Experience in high-traffic or live-streaming applications.
Certifications such as AWS Certified Solutions Architect or Azure Solutions Architect Expert.
Familiarity with industry-specific compliance standards, e.g., SOC 2, GDPR.