‘Chaos’ is structured and contained only if an organization has mature processes and capabilities across the following areas:
Culture of experimentation and learning – psychological safety
Risk Management and Governance
Monitoring and Observability
Incident response and resolution practices and capabilities
The Practice
Set the baseline – ‘What can’t be measured can’t be improved’ (Six Sigma)
We establish a baseline of the current state and articulate how the system must operate under normal conditions. That is, we define what “normal” means.
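As an illustration, a baseline can be captured as a handful of summary statistics over samples taken under normal load. The sketch below is a minimal example, not a prescribed method: the function name `summarize_baseline`, the chosen metrics (mean, p95 and p99 latency, error rate), and the input shape are all assumptions made for illustration.

```python
from statistics import mean, quantiles


def summarize_baseline(latencies_ms, errors, requests):
    """Summarize 'normal' behavior from samples collected under normal load.

    Hypothetical helper: the metric names and inputs are illustrative only.
    Requires at least two latency samples.
    """
    # quantiles(n=100) returns 99 cut points; index 94 is p95, index 98 is p99.
    cuts = quantiles(latencies_ms, n=100)
    return {
        "mean_latency_ms": mean(latencies_ms),
        "p95_latency_ms": cuts[94],
        "p99_latency_ms": cuts[98],
        "error_rate": errors / requests if requests else 0.0,
    }


# Example with synthetic data: 100 latency samples and 5 errors in 1000 requests.
baseline = summarize_baseline(list(range(1, 101)), errors=5, requests=1000)
```

In practice these numbers would come from the monitoring and observability stack rather than an in-memory list; the point is only that “normal” is written down as concrete values before any experiment starts.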
Form a hypothesis
Define the scope of the test, which must be specific rather than broad or generic. For example, it could be “What will happen if a large traffic spike occurs?” or “What if Infrastructure as Code (IaC) provisioning fails at a specific level?”
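One way to keep a hypothesis specific is to record it as data: the fault to inject, the metric to watch, and the limit that metric is expected to stay under. The sketch below assumes a hypothetical `Hypothesis` class; the field names and the example values (a 3x traffic spike, a 300 ms limit) are invented for illustration and do not come from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A narrowly scoped, testable statement (illustrative structure only)."""

    description: str   # what we expect to remain true
    fault: str         # the single condition we will introduce
    metric: str        # the metric that decides the outcome
    threshold: float   # upper bound the metric must stay under

    def holds(self, observed: float) -> bool:
        # The hypothesis holds if the observed value stays within the bound.
        return observed <= self.threshold


# Hypothetical example values, chosen only to show the shape of a hypothesis.
traffic_spike = Hypothesis(
    description="p95 latency stays under 300 ms during a traffic spike",
    fault="3x normal request rate for 10 minutes",
    metric="p95_latency_ms",
    threshold=300.0,
)
```

Writing the hypothesis this way makes an overly broad one easy to spot: if the fault or the metric cannot be filled in with a single concrete value, the scope probably needs narrowing.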
Conduct the Test
The experiment can run in pre-production or production, depending on organizational maturity, and is governed automatically throughout its life by various measures and metrics.
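Automatic governance can be sketched as a loop that injects the fault, checks guardrail metrics at each step, aborts when any limit is breached, and always rolls back. Everything in the example below is an assumption for illustration: `run_experiment`, its callback parameters, and the `abort_limits` mapping are hypothetical names, not the API of any chaos engineering tool.

```python
import time


def run_experiment(inject_fault, rollback, read_metrics, abort_limits,
                   max_checks=10, interval_s=0.0):
    """Run a fault injection under guardrails (hypothetical sketch).

    inject_fault, rollback: callables that start and undo the fault.
    read_metrics: callable returning a dict of current metric values.
    abort_limits: dict of metric name -> maximum tolerated value.
    """
    observations = []
    aborted = False
    try:
        inject_fault()
        for _ in range(max_checks):
            sample = read_metrics()
            observations.append(sample)
            # Stop early if any guardrail metric exceeds its limit.
            if any(sample.get(name, 0.0) > limit
                   for name, limit in abort_limits.items()):
                aborted = True
                break
            time.sleep(interval_s)
    finally:
        # Roll back whether the run finished, aborted, or raised an error.
        rollback()
    return {"aborted": aborted, "observations": observations}
```

The `finally` block is the important design choice here: the fault is removed on every exit path, which is what keeps the blast radius contained when an experiment runs in production.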
Evaluate the Results
Evaluate the metrics during and after the experiment, decide how the hypothesis has fared, and determine the weak points to be strengthened.
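The evaluation can be made concrete by comparing each metric observed during the experiment with its baseline value and flagging those that moved more than an agreed tolerance. The sketch below is one possible approach under stated assumptions: `evaluate`, the relative-change rule, and the sample numbers are all hypothetical.

```python
def evaluate(baseline, during, tolerances):
    """Compare experiment metrics with the baseline (illustrative sketch).

    tolerances maps a metric name to the maximum allowed relative increase,
    e.g. 0.5 means the metric may rise by up to 50% over its baseline.
    """
    findings = {}
    for metric, allowed in tolerances.items():
        base = baseline[metric]
        observed = during[metric]
        if base == 0:
            # Avoid division by zero: any increase from zero is unbounded.
            change = 0.0 if observed == 0 else float("inf")
        else:
            change = (observed - base) / base
        findings[metric] = {
            "change": change,
            "within_tolerance": change <= allowed,
        }
    return findings


# Hypothetical numbers: latency degrades mildly, error rate degrades badly.
findings = evaluate(
    baseline={"p95_latency_ms": 200.0, "error_rate": 0.01},
    during={"p95_latency_ms": 260.0, "error_rate": 0.05},
    tolerances={"p95_latency_ms": 0.5, "error_rate": 1.0},
)
weak_points = [m for m, f in findings.items() if not f["within_tolerance"]]
```

With these sample values, only the error rate falls outside its tolerance, so it would be the weak point to strengthen before the next experiment.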