The Butterfly Effect of Uncontrolled Cloud Operations Change

The rapid development of SaaS applications in the cloud means that more and more people, features and open source applications are combining into one huge platform. Because software development moves so quickly, a major challenge for software vendors is auditing changes in their application environments, such as system configurations. The problem is that any change performed at any stage of an operation can impact other areas of an application. This digital butterfly effect can be very detrimental in today’s fast-paced, on-demand environment. It becomes even more destructive when these changes can’t be found due to inadequate tracking. Failure to track changes can greatly affect a company’s revenues and success.

A Real-Life Story

A company with a large-scale SaaS platform that provides quality buyers to large e-commerce businesses noticed that its revenues had dropped about 12% from the previous day. The DevOps team started checking all of the platform components and logs for malfunctions. After 12 exhausting hours, they started searching for changes that had been made over the past 36 hours and rolling those changes back. Eventually, they found out that one of the expert analysts had made a change to one of the campaigns.

Triggers for Failure

In today’s market, everything has to be as fast as possible. The process of downloading a program and running it is outdated. Instead, users expect instantaneous reactions to their demands. Companies that can’t meet demands and the new requirements of immediate results put their business at risk.
Behind the scenes, it takes a great deal of work to generate these instant results on demand. Ultimately, everyone involved in operations, production and even marketing requires the flexibility to influence application behavior according to user requests. To do so, these parties need a certain level of access to configuration data so that they can make changes in the application modules. While company-wide access to application data and configuration has its benefits, it can also cause a great deal of damage if not managed correctly.

The 3 Scenarios

The following are three scenarios that demonstrate the pitfalls of inadequate monitoring.

1 – Marketing Employees

Unlike rule-oriented machines, humans are free to do as they see fit at any point. A member of the marketing team who makes a change to a marketing campaign can unknowingly upset the apple cart by creating malfunctions in one or more units across the board. Proper documentation of every change therefore allows other workers to track changes and fix bugs or malfunctions in their own work environment. A method or feature that searches for all changes made in the last hour, for example, would prevent a lot of headaches when trying to discover the source of a problem in the system.
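
To make the "changes in the last hour" idea concrete, here is a minimal sketch in Python. It assumes every change is already being written to some kind of log; the `ChangeRecord` fields, the `recent_changes` function and the sample entries are all hypothetical names invented for illustration, not part of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """One audited change. All field names are hypothetical."""
    timestamp: datetime
    author: str
    team: str
    target: str       # what was changed, e.g. a campaign or a config key
    description: str


def recent_changes(log, window, now=None):
    """Return the changes made within `window` before `now`, newest first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - window
    hits = [c for c in log if cutoff <= c.timestamp <= now]
    return sorted(hits, key=lambda c: c.timestamp, reverse=True)


# Illustrative data only; none of these entries come from a real system.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
log = [
    ChangeRecord(now - timedelta(hours=5), "dana", "support",
                 "ticket-routing", "changed routing rule"),
    ChangeRecord(now - timedelta(minutes=40), "alex", "marketing",
                 "campaign-17", "edited campaign targeting"),
    ChangeRecord(now - timedelta(minutes=10), "sam", "dev",
                 "billing-module", "emergency patch"),
]

# Everything that changed in the last hour, regardless of which team did it.
for change in recent_changes(log, timedelta(hours=1), now=now):
    print(change.timestamp.isoformat(), change.team, change.target,
          change.description)
```

The point of the sketch is that the query cuts across teams: marketing, support and development changes all land in the same list, so whoever is debugging sees them side by side.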

2 – Support Employees

In another likely scenario, a member of the support team also runs the risk of upsetting a company’s balance and igniting the digital butterfly effect within SaaS operations. Support queries, which work in sync with production, are another likely cause of production malfunctions on the other end of the SaaS business. As in the marketing team example, an audit of changes made to the system would allow developers and employees in different units to isolate the problem and ultimately fix it.

3 – The Developer

The final scenario involves fixing a bug in the system on demand. In order to keep the SaaS application up and running, bugs must be caught and dealt with within a very short time frame. Emergency patches are implemented in specific modules to fix glitches in the system as quickly as possible, but can, again, affect other modules and cause them to malfunction. An audit of emergency patches would be invaluable to members of other units trying to determine the cause of their malfunction. Aside from tightly monitoring cloud operations changes, implementing a DevOps culture, including automated testing, can greatly serve the company.

What Do We Need?

Returning to our real-life story, the change that was made caused a butterfly effect: a single component began ignoring the campaign traffic, and all of the traffic and revenues for the campaign were lost. The DevOps engineers worked according to the protocols. They searched for exceptions and errors, looked for any kind of alerts in the monitoring tools and business graphs, and only then started to roll back the changes.
When tracking issues, one of the first steps should include validating and analyzing the list of recent configuration changes. A tool that marks changes that are made to the platform can save time, money and trust in critical events. Do you know of a good one? Let me know.
Ultimately, transparency, sharing and collaboration within the company are vital to its performance across the board. The ability to audit and track changes would save many employees a great deal of heartache. It is not enough to simply record changes. Software vendors also require the ability to quickly roll back changes in order to restore the system to good working order. Implementing these features would help put a company on the path to success and is key to satisfied users.
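
As a rough illustration of pairing an audit trail with fast rollback, the sketch below keeps a history entry for every configuration change and can undo the most recent ones in reverse order. It is a toy, in-memory example under stated assumptions: the `AuditedConfig` class, its methods and the configuration keys are hypothetical, and a real platform would also need persistence, access control and timestamps.

```python
_MISSING = object()  # marks "this key did not exist before the change"


class AuditedConfig:
    """Hypothetical config store that records every change and can undo it."""

    def __init__(self, initial=None):
        self._values = dict(initial or {})
        # Each entry is (author, key, old_value, new_value), oldest first.
        self._history = []

    def get(self, key, default=None):
        return self._values.get(key, default)

    def set(self, author, key, value):
        """Apply a change and record who made it and what it replaced."""
        old = self._values.get(key, _MISSING)
        self._history.append((author, key, old, value))
        self._values[key] = value

    def history(self):
        """Return a copy of the audit trail, oldest change first."""
        return list(self._history)

    def rollback(self, steps=1):
        """Undo the last `steps` changes, newest first; return what was undone."""
        undone = []
        for _ in range(min(steps, len(self._history))):
            author, key, old, _new = self._history.pop()
            if old is _MISSING:
                del self._values[key]      # the key was created by this change
            else:
                self._values[key] = old    # restore the previous value
            undone.append((author, key))
        return undone


# Illustrative keys and authors only.
config = AuditedConfig({"campaign-17.accept_traffic": True})
config.set("alex", "campaign-17.accept_traffic", False)
config.set("sam", "billing.retry_limit", 3)

undone = config.rollback(steps=2)
print(undone)
print(config.get("campaign-17.accept_traffic"))
```

Because each history entry stores the previous value, rolling back does not require anyone to remember what the setting used to be, which is exactly the information that was missing during the 12-hour search described earlier.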

Roy Brihand, CTO & Co-founder of MoovingON. MoovingON’s mission is to provide operational effectiveness to enterprises and startups for every application and service at all times. With constantly growing data volumes and IT-oriented challenges, an accurate analysis of business-wise service behavior is of high importance to any IT management, especially in the cloud environment.
