Gustavo Franco // SRE
Site Reliability Engineering

"SRE is what you get when you treat operations as if it’s a software problem." (Ben Treynor, VP SRE at Google)


SRE, DevOps and Dev

Google SRE is a concrete and opinionated implementation of DevOps. Because DevOps can be implemented in multiple ways, many implementations differ substantially from SRE, and especially from SRE as practiced at Google, where it originated.

Coincidentally, Google "devs", meaning Software Engineers who focus on feature [dev]elopment (as opposed to security, privacy or reliability), work as de facto DevOps. In the absence of SRE support, devs at Google operate their systems themselves. SREs are generally in high demand.

About Me

I have 15 years of SRE-related experience, and I've been technically leading and managing organizations at Google for a total of 11 years.

My teams have been responsible for the reliability and scalability of product launches such as Cloud Identity, Compute Engine, Hangouts Chat and Hangouts Meet. I've also been responsible for teams working on cluster turnup automation, incident management (processes and systems) and chaos engineering services. For the last of these, I’d rather use the term disaster recovery testing or reliability testing: we want to avoid chaos, not cause it.

Keeping me Busy

I'm a member of the Customer Reliability Engineering (CRE) team within Google SRE. CREs are responsible for bringing SRE best practices, including processes and software, to the world. While we've been prioritizing work with Google Cloud Platform customers, we are also very much interested in enabling SRE everywhere.

I'd highly recommend reading the second SRE book, The Site Reliability Workbook, which contains concrete implementation examples. Google SRE's first book was a broader reference title with fewer concrete examples of how to implement our recommendations.