Gustavo Franco // SRE
Site Reliability Engineering


"SRE is what you get when you treat operations as if it’s a software problem." (Ben Treynor, VP of SRE at Google)


SRE, DevOps and Dev

SRE is an implementation of DevOps.

At Google, "devs" are software engineers who focus on feature development (as opposed to security, privacy or reliability), and they work de facto as DevOps. In the absence of SRE support, devs at Google operate their systems themselves. SREs are generally in high demand.

About Me

I have 15 years of SRE-related experience, and I've been technically leading and managing organizations at Google for a total of 11 years.

My teams have been responsible for the reliability and scalability of product launches such as Cloud Identity, Compute Engine, Hangouts Chat and Hangouts Meet. I've also been responsible for teams working on cluster turnup automation, incident management (processes and systems) and chaos engineering services, although I’d rather use the term disaster recovery testing or reliability testing. After all, we want to prevent chaos.

Keeping Me Busy

I'm a member of the Customer Reliability Engineering (CRE) team within Google SRE. CREs are responsible for bringing SRE best practices, including processes and software, to the world. While we've been prioritizing work with Google Cloud Platform customers, we are also very interested in enabling SRE everywhere.

I'd highly recommend reading the second SRE book, The Site Reliability Workbook, which contains concrete implementation examples. Google SRE's first book was a broader reference title, with fewer concrete examples of how to implement our recommendations.