Production release checklist

Release to production is owned by the person who owns the system. The code cannot go into production without the knowledge of the system owner. In fact, the system owner should be the one coordinating release of code to production.

Types of people involved:

  • System (or microservice) owner
  • Developer – whose code is going into production
  • SRE – Site reliability engineer
  • Engineering head/VP

The release process contains the following stages:

  • Verify the code being released
  • Deploy the code
  • Monitor production
  • Handle a failed production deploy

People who need to be present during production release:

| Role | Verify code | Deploy | Monitor | Failed deploy |
| --- | --- | --- | --- | --- |
| System owner | needed | needed | needed | needed |
| SRE |  | needed for deployments where system owner lacks confidence | needed for deployments where system owner lacks confidence | needed |
| Engineering head | needed if system owner is not able to verify code | if SRE lacks confidence | if SRE lacks confidence | needed |
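
The presence rules above can be sketched as a small lookup. This is only an illustration: the names `Stage` and `required_people` and the three boolean flags are hypothetical, and the sketch assumes that the SRE is not required for the verify stage and that the confidence flags are judged independently.

```python
from enum import Enum


class Stage(Enum):
    VERIFY = "verify code"
    DEPLOY = "deploy"
    MONITOR = "monitor"
    FAILED_DEPLOY = "failed deploy"


def required_people(
    stage: Stage,
    owner_confident: bool = True,   # assumption: owner's confidence in this deployment
    owner_can_verify: bool = True,  # assumption: owner is able to verify the code
    sre_confident: bool = True,     # assumption: SRE's confidence in this deployment
) -> list[str]:
    """Return who needs to be present for a given release stage."""
    people = ["system owner"]  # the system owner is needed at every stage

    if stage is Stage.FAILED_DEPLOY:
        # Everyone is needed when a deploy has failed.
        return people + ["SRE", "engineering head"]

    if stage is Stage.VERIFY:
        if not owner_can_verify:
            people.append("engineering head")
        return people

    # Deploy and monitor follow the same rules.
    if not owner_confident:
        people.append("SRE")
    if not sre_confident:
        people.append("engineering head")
    return people
```

For example, `required_people(Stage.DEPLOY, owner_confident=False)` returns the system owner and the SRE, while a routine deploy needs only the system owner.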

Some points about releasing code to production:

  • The code going into production needs to be reviewed by you, the system owner, before it is released.
  • If you don’t completely grasp what is going into the system, take the time to understand it. It’s your system. You can’t let random stuff into a system that you own.
  • If you don’t have time to learn the unknown code before release, make sure someone who does understand it is with you to monitor the release. Then make sure you learn it.
  • The developer whose code is being released needs to be present for the release process.
  • Monitor the live production server for 30 minutes to an hour after release to make sure there are no issues.
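
The post-release monitoring window could be partly automated along these lines. This is a minimal sketch, not a prescribed tool: `monitor_release`, the `is_healthy` callback, and the 60-second polling interval are all assumptions, and what counts as "healthy" is left to whatever check your system already has.

```python
import time
from typing import Callable


def monitor_release(
    is_healthy: Callable[[], bool],  # hypothetical health check supplied by the caller
    duration_s: float = 30 * 60,     # watch for at least 30 minutes after release
    interval_s: float = 60,          # assumption: poll once a minute
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the health check until the window ends.

    Returns False as soon as one check fails, True if every check passed.
    """
    deadline = clock() + duration_s
    while clock() < deadline:
        if not is_healthy():
            return False  # stop early so a human can start the failure process
        sleep(interval_s)
    return True
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without waiting half an hour. A script like this supplements the people watching production, it does not replace them.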

In case of production failure:

  • If the cause of the production failure is not clear to you within 30 seconds:
    • Escalate to more senior people.
      • Call them and make sure they are aware that you are facing production problems.
    • Proactively tell customers/clients that we are facing issues.
    • Then dig into finding a fix for the problem.
  • If you are completely clear about the cause and the correction:
    • Fix it immediately.
    • Escalate to more senior people.
      • Call by phone and make sure they are aware.
      • SREs and engineering heads can choose to stay alert if they feel more attention is required.
    • Create and submit a failure report.
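
The two branches above can be written down as an ordered runbook. This is a rough sketch: the function name `failure_response`, the step strings, and the "keep diagnosing" step during the first 30 seconds are illustrative assumptions, not part of any existing tool.

```python
DIAGNOSIS_BUDGET_S = 30  # time allowed to identify the cause before escalating

CLEAR_CAUSE_STEPS = [
    "fix immediately",
    "escalate to senior people by phone",
    "create and submit a failure report",
]

UNCLEAR_CAUSE_STEPS = [
    "escalate to senior people by phone",
    "proactively inform customers/clients",
    "dig into finding a fix",
]


def failure_response(cause_clear: bool, seconds_elapsed: float) -> list[str]:
    """Return the ordered steps to take for a failed production deploy."""
    if cause_clear:
        return list(CLEAR_CAUSE_STEPS)
    if seconds_elapsed < DIAGNOSIS_BUDGET_S:
        # Assumption: inside the 30-second budget you are still diagnosing.
        return ["keep diagnosing"]
    return list(UNCLEAR_CAUSE_STEPS)
```

The point of the ordering is that when the cause is unclear, escalation and customer communication come before any deep investigation.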

Alex J V