Netflix is pioneering an approach to system resilience that it calls ‘Chaos Engineering.’ The method involves deliberately introducing failures into a running system to verify that it can withstand unexpected disruptions. Netflix has developed a tool called ‘Chaos Monkey,’ which triggers such failures by randomly terminating server instances, helping engineers identify and fix weaknesses in the company’s infrastructure.
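To make the idea concrete, here is a minimal sketch of random failure injection against a simulated fleet. It is not Netflix’s Chaos Monkey code; the names `Instance`, `Fleet`, `min_healthy`, and `unleash_monkey` are hypothetical and exist only for illustration, and the "termination" simply flips a flag in memory.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    """A simulated server instance (hypothetical, in-memory only)."""
    name: str
    healthy: bool = True


@dataclass
class Fleet:
    """A group of instances that needs `min_healthy` of them to serve traffic."""
    instances: list
    min_healthy: int

    def healthy_count(self) -> int:
        return sum(1 for inst in self.instances if inst.healthy)

    def serves_traffic(self) -> bool:
        return self.healthy_count() >= self.min_healthy


def unleash_monkey(fleet: Fleet, rng: random.Random):
    """Mark one randomly chosen healthy instance as failed.

    Returns the affected instance, or None if nothing was left to fail.
    """
    candidates = [inst for inst in fleet.instances if inst.healthy]
    if not candidates:
        return None
    victim = rng.choice(candidates)
    victim.healthy = False
    return victim


if __name__ == "__main__":
    fleet = Fleet([Instance(f"web-{n}") for n in range(3)], min_healthy=2)
    victim = unleash_monkey(fleet, random.Random())
    print(f"terminated {victim.name}; still serving: {fleet.serves_traffic()}")
```

With three instances and a minimum of two, the fleet survives one injected failure but not two, which is the kind of weakness such an experiment is meant to surface before a real outage does.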

The company believes this approach is crucial in the digital age, where system failures can have significant consequences. By creating failures deliberately, Netflix can confirm that its systems are robust and recover quickly from disruptions.

The ‘Chaos Monkey’ tool is freely available, and Netflix encourages other companies to use it. The company believes that sharing the tool will help improve system resilience across the industry.

Netflix is also working on a ‘Chaos Kong’ tool, which simulates an outage of an entire Amazon Web Services (AWS) region. This tool will help Netflix ensure it can continue to operate even if a significant portion of its infrastructure fails.
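A region-level experiment can be sketched in the same spirit. The code below is an illustrative assumption, not a description of how Chaos Kong works: `simulate_region_outage` and `route_request` are hypothetical helpers, the region names are used only as sample labels, and the fallback rule (first healthy region in alphabetical order) is chosen purely to keep the example deterministic.

```python
def simulate_region_outage(regions: dict, failed: str) -> dict:
    """Return a copy of the region health map with `failed` marked as down."""
    updated = dict(regions)
    updated[failed] = False
    return updated


def route_request(regions: dict, preferred: str) -> str:
    """Pick the region that serves a request.

    Uses `preferred` when it is healthy; otherwise falls back to the first
    healthy region in alphabetical order (an arbitrary rule for this sketch).
    """
    if regions.get(preferred, False):
        return preferred
    for name in sorted(regions):
        if regions[name]:
            return name
    raise RuntimeError("no healthy region available")


if __name__ == "__main__":
    regions = {"us-east-1": True, "us-west-2": True, "eu-west-1": True}
    degraded = simulate_region_outage(regions, "us-east-1")
    print(route_request(degraded, preferred="us-east-1"))
```

The experiment passes if requests that preferred the failed region are still served somewhere else, and fails loudly if no healthy region remains.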

The company’s commitment to ‘Chaos Engineering’ demonstrates a proactive approach to system resilience. This strategy is a significant departure from traditional approaches, which typically react to failures only after they occur.

Go to source article: http://readwrite.com/2014/09/17/netflix-chaos-engineering-for-everyone