When exploring how to run your organization in the cloud, no matter which path you take, having a team of site reliability engineers (SREs) to support your journey is essential to success.
But there is a huge difference between an average SRE and a great one. With an average team of SREs, you may experience time-consuming IT failures that affect the entire business. With a great team, tools and processes will be in place so that fixes can be implemented before the rest of the business even realizes there was an issue.
If you want to find great SREs, you need to make sure you set clear expectations and provide as much detail as possible when describing who you want to fill your open role.
In this post, we will explore best practices for building a clear, enticing, and effective SRE job description, to help SRE leaders improve how they recruit team members.
The Importance of Level-Setting
First, some level-setting: It’s tough to be an SRE. It’s also tough for employers to find an excellent one, and even harder to fill out a full team. It can be tempting to cast a wide net to find engineers to hire for an SRE role, but not every engineer makes a good SRE.
While dealing with the mild chaos that is on-call operations, SREs constantly work in complex environments requiring them to learn as they go. SREs also tend to face criticism when the business’s service or product goes down and don’t receive the credit they deserve when it stays up.
A great SRE must be dedicated to their role and believe in the true value of what they do. Folks who pursue SRE work need to be just as comfortable responding to a ticket relayed via text on a holiday weekend as they are writing or updating code in the middle of the work day.
When searching for an SRE candidate, employers should keep in mind that SREs need a positive outlook on the role, clear knowledge from their past career experiences, and a variety of technical skills to succeed in the job.
Don’t Narrow the Field Too Much
Employers should avoid narrowing the field by asking SRE candidates whether they prefer specific tools, or by leaning too heavily on individual levels of cloud versus on-prem experience.
There are also certain phrases, like “IT issues” and “solve problems in the fastest way possible,” that turn away SREs and other candidates with transferable skills.
Experienced IT professionals don’t want the sole purpose of their role to be fixing tedious and repetitive IT issues, and they don’t want to feel pressured to take the fastest route to solve a problem (as it usually won’t fully get the job done).
The overarching goal for SRE recruiters should be to weed out folks who truly don’t know what they’re doing and folks who feel that there is only one tool to rule them all.
Now that you have an idea of how to set expectations within your job description and how to reach enough potential applicants, let’s look at an example of a strong, associate-level SRE job description.
The Ideal Job Description
Job: Site Reliability Engineer (full-time, hybrid, or remote)
Experience in the field: At least 1-2 years in an on-call or shift-style SRE or DevOps role
Our tech stack: (Include all tools and platforms an SRE will be required to use in this role.)
Who you are
- Fast learner
- Enjoys fixing critical issues and debugging at scale
- Ability to prioritize and re-prioritize depending on workload
- Proven experience handling incidents impacting end-users
- Prepared to help the team succeed as a whole
- Values providing a great user experience and is prepared to help customers achieve it
What you’ll do
- Contribute to design reviews
- Write and produce code
- Respond whenever a disruption in the end-user experience calls for it
- Become fluent in our offering(s) and how to efficiently fix potential disruptions
- Actively seek opportunities to eliminate future incidents through improved approaches, including automation for relevant processes to enhance the work of humans
What you’ll bring to day one on the job
- A thirst for learning new tools and expanding SRE capabilities
- Familiarity with Kubernetes and other container orchestration systems
- Interest in taking ownership of CI/CD and incident response needs, including debugging production issues
- A goal of building efficient problem-solving skills and documenting them for others to leverage
This example of an SRE job description should help you attract the right candidates to join your team. But even better than just attracting candidates, what if there were a tool you could provide that made them really want to work for you?
Make the SREs come to you
By leveraging vendors that specialize in automation solutions for SRE teams (like Shoreline), you can provide access to ready-to-use solutions and pre-built automations that reduce development costs and minimize toil.
Be on the lookout for a future blog post where we will explore what kinds of questions you should ask during an SRE interview and provide some possible answers the interviewee could (or should) give.