Important lessons and valuable experiences while developing Shoreline's Azure Agent.
DevOps leaders can apply infrastructure as code lessons and tooling to production ops, use solutions like Terraform + Shoreline to automate repeatable tasks, and make hero-level institutional knowledge accessible to anyone.
Learn how to filter, sort, analyze, and act upon remediation incidents with Shoreline Events.
Jupyter Notebooks for DevOps
Learn how to resolve Kubernetes DNS issues with Shoreline's CoreDNS Op Pack.
For our Intern Spotlight series, we’ll showcase the work of a summer intern at Shoreline with a technical deep dive.
Shoreline’s Argo Op Pack is purpose-built to remediate IP exhaustion related to Argo workflows automatically.
Execute thousands of alarms on-box, with one second of delay.
Learn how to rapidly debug and resolve issues across your entire infrastructure.
Anurag had the opportunity to chat with Jeff Meyerson on his podcast, Software Engineering Daily, today to discuss why we're still in the "dial-up" age of cloud computing, how he thought about strategy when he was at AWS, and the operational pain that consumed his team's time at AWS and served as the inspiration for starting Shoreline.
The increasing fleet size and complexity of production environments have created an explosion in on-call incidents. You can dramatically reduce on-call fatigue and improve availability using Shoreline’s incident automation platform.
Shoreline makes it easy to collect diagnostic information when you're doing a root-cause analysis of an issue. This example shows how to automatically capture debugging information for slow Java garbage collection and then automatically bounce the process to alleviate customer pain.
Niall Murphy, former SRE at Google and Microsoft and co-author of the O'Reilly book Site Reliability Engineering, shares his experience of using Shoreline's Incident Automation Platform.
Shoreline’s metrics team has applied two machine learning technologies from Google, JAX and XLA, to accelerate metric queries and data analysis so SREs can run ad hoc queries in real time.
DevOps automation is the process of getting machines to handle repetitive work in the software deployment and operations lifecycle so that operators can deploy iterative updates faster and their systems operate more reliably.
Anurag Gupta spoke at the CTO Summit on Reliability to share his new talk “Why systems fail and what you can do about it.” The talk covers four categories of system failures and mitigation approaches for each based on Anurag’s background at AWS running analytic and database services.
This guide shows you how to automatically decommission Kubernetes worker nodes and replace them with new hosts using a Shoreline Op Pack.
Yesterday, Shoreline’s founder and CEO, Anurag Gupta, talked with Daniel Bryant on the InfoQ podcast about responding to incidents at AWS, GitOps in Day 1 vs Day 2 Operations, and what it takes to build the capability to automatically remediate issues.
SRE and Backend Engineering have a lot of overlap, and you can swap between roles relatively easily. This post addresses the pros and cons of leaving SRE for Backend work.
The terms runbooks and playbooks are often used interchangeably by SREs. They are similar, but this post explains the differences so you can pair the two together as part of your operational excellence.
Creating your runbooks is only the first step. Automating runbook execution to run based on an alarm, without human intervention, is the real goal.
Every iteration of automating and streamlining operational procedures has been advertised as the cure-all solution to every ailment, including resolving incidents. While declarative infrastructure, programmatic deployments, and repeatable automations are desirable, they aren’t capable