What You'll Do
Design and implement solutions that enhance application reliability, performance, scalability, and resilience.
Build and maintain monitoring, alerting, observability, and telemetry to drive proactive detection and incident analysis.
Lead incident management efforts, perform root cause analysis, and implement action-based improvements.
Implement operational workflows using scripting, IaC, and configuration management tools.
Manage capacity, performance, and scaling solutions to forecast demand and optimize infrastructure.
Collaborate with engineering teams to embed operability, resilience, and security into application architectures.
Build and automate reliable deployments through CI/CD pipelines, release governance, and version control systems.
Maintain clear runbooks, architecture diagrams, and operational documentation that enable efficient production support.
Experience Required
Managing Kubernetes and containerized workloads (EKS, AKS, GKE), including scaling, networking, upgrades, and monitoring.
Experience with cloud platforms (AWS, Azure, or Google Cloud Platform) across compute, storage, networking, IAM, and cost governance.
Using observability and APM tools such as Dynatrace, Splunk, Prometheus, Grafana, Datadog, and Elastic/ELK.
Strengthening security and compliance controls in regulated environments (e.g., PCI DSS, SOC 2), including secure management of workloads.
Infrastructure automation experience using Terraform, CloudFormation, Ansible, or similar tools.
Designing and maintaining CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or Azure DevOps.
Scripting and automation using Bash, PowerShell, or Python.
Experience in electricity, engineering, or military-related environments (preferred).
Good to Have
Certifications such as AWS SysOps Administrator, AWS DevOps Engineer, Google Cloud DevOps Engineer, or CKA.
Experience with legacy applications, IBM iSeries, and/or library systems.
Hands-on database operations and performance tuning (Oracle, SQL Server, PostgreSQL).
Prior experience as a major incident commander, stakeholder communicator, or ops lead/coordinator.
Experience with ITIL and ServiceNow (change, incident, and configuration management).