ClickSoftware – Great Case of an AWS Cloud Adoption: Part 1, Operations

Over the last year I have had endless conversations with companies that strive to adopt the cloud, specifically the Amazon cloud. Of those I met, I can say that ClickSoftware is one of the leading traditional ISVs that have managed to adopt the cloud. The Amazon cloud is without doubt the most advanced cloud computing facility and leads the market. In my previous job I was involved in the ClickSoftware cloud initiative, from the decision making regarding the Amazon cloud all the way to taking the initial steps to educate and support the company’s different parties in providing an On-Demand SaaS offering.

ClickSoftware provides a comprehensive range of workforce management software solutions designed to help service organizations face the challenges of inefficiency head-on. The company holds that maximizing the utilization of resources is the lifeblood of a service organization, and it has developed a suite of solutions and services that reach the heart of the problem.
The leaders of the ClickSoftware cloud offering are Udi Keidar, VP of Cloud Services, and Igal Korach, Director of Cloud Services. Thanks to their cooperation, I can bring you this series of posts focusing on the ClickSoftware Cloud adoption process. The posts will include interviews with the team and actually delve into the environment deployment. This is the first one.
My first interview was with Mr. Korach, discussing operations and his “cloud day” as the senior DevOps of this cloud service leader. ClickSoftware started by utilizing the AWS cloud for non-critical services such as demos and proofs of concept for new opportunities, customer training, and development staging environments. According to Mr. Korach, there is a clear separation between the company’s internal needs on the Amazon cloud and the cloud service offering. The former are consumed by the relevant department (Sales, Professional Services, R&D) for which they were provisioned, and are managed and monitored by the company IT department. The latter (i.e., the company’s cloud service offerings) are deployed, managed and monitored by Korach’s DevOps team.
The first project, set up by the new cloud team more than two years ago, helped the training department provision the relevant AWS resources only for the specific time when a training session takes place, hence paying only for what they need and use. The system supports scheduled, fully automatic provisioning and deployment of an out-of-the-box application environment, ready at the press of a button. With this project, Korach’s team gained a clear understanding of how the AWS cloud works and got to put their hands on the “Amazon cloud irons”.
The cloud trend has led more and more of ClickSoftware’s potential customers to ask for a cloud offering. Veteran ClickSoftware customers who ask to upgrade are being educated on the company’s cloud offering. ClickSoftware, a truly innovative company that I consider an early adopter, finds itself in the position of providing it all out-of-the-box for its customers, including fast delivery and end-to-end management of the application environment. The ClickSoftware story exemplifies a large enterprise looking to innovate its application offering through cloud adoption.

“Our objective is first and foremost customer satisfaction. In the cloud you should be focused on your customers’ needs and make sure that every single thing you develop is in order to make your customers happy,” said Korach.

> > > The AWS Cloud Navy Seal
Korach’s team includes two cloud specialists and a four-member NOC team. Korach also counts as part of his team the outsourced system integrators that help accelerate development and adoption. Currently, he employs eMind to support advanced deployment projects, while ERA is in charge of security monitoring and deployment.
His internal team is responsible for ongoing administration of the infrastructure, application components provisioning, and optimization projects that extend automation of processes to support efficient customer on-boarding.
> > > DevOps in Work: Version Deployment Process
The company’s core R&D members don’t have access to the cloud servers, and deployment is fully managed by Korach’s team. A new deployment cycle kicks off when R&D publishes a new application version and forwards it to the cloud team. Once the version is installed on the staging cloud environment, the environment is handed to a specific “Cloud On-Boarding” team that is in charge of certifying the new version for production. It was interesting to learn that there are specific features (drops) available only in the cloud offering, which are delivered by this special cloud team. Once the deployment is approved for release, the cloud customers are notified of the upcoming version via their cloud operations dashboard.
> > > Cost & Elasticity
ClickSoftware’s enterprise customers that enjoy the company cloud offering get their “Service Optimization” servers on the Amazon cloud, aligned with their specific current real-time us

“The customer behavior patterns are known and we are able today to forecast the capacity needed. There are specific instances that can never go down and auto-scaling will never be relevant for them. You need to have a clear understanding of your own application needs,” noted Korach.

Interestingly enough, I found that Korach’s team developed a custom auto-scaling mechanism that is based not on standard parameters such as CPU utilization peaks but on their own application-specific requirements. Korach estimates that the current auto-scale processes generate cost savings of about 20-30% in comparison to the “always on” option. No doubt such ISVs have to recalculate their production operations costs and present a new ROI to their paying customers. Korach mentioned that they recently moved some of the instances to reserved instances, following recommendations from Newvem’s analytics insights. He also added that Newvem’s weekly email report makes him confident in what he knows about the amount of resources and the overall costs of the company’s AWS production environment.
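To make the idea of application-driven scaling more concrete, here is a minimal sketch in Python. It is not ClickSoftware's implementation, which the interview did not detail. The metric (pending jobs), the per-instance capacity and the instance limits are all hypothetical values chosen for illustration. The floor on the instance count reflects Korach's point that some instances can never go down.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    # All values below are hypothetical, for illustration only.
    jobs_per_instance: int = 50  # application load one instance can handle
    min_instances: int = 1       # floor: instances that must always stay up
    max_instances: int = 10      # ceiling, to cap costs


def desired_instances(pending_jobs: int, policy: ScalingPolicy) -> int:
    """Derive the instance count from an application metric, not from CPU."""
    needed = math.ceil(pending_jobs / policy.jobs_per_instance)
    # Clamp the result between the always-on floor and the cost ceiling.
    return max(policy.min_instances, min(policy.max_instances, needed))


policy = ScalingPolicy()
for jobs in (0, 120, 10_000):
    print(jobs, "pending jobs ->", desired_instances(jobs, policy), "instances")
```

In a real deployment the result of such a function would be fed to whatever provisioning mechanism the team uses; that part is deliberately left out here.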
> > > Security
Currently Korach’s team maintains a weekly deployment, making sure that no one besides his team touches the sacred production environment by implementing strict procedures. Every team member that is permitted to log in to the production environment must sign a non-disclosure agreement.

“Among all considerations, we choose Amazon cloud because it is compliant with the leading security and compliance standards. However, we recognize the importance of our own liability and we want our deployment and internal processes to be certified as well.”

ClickSoftware has over 200 customers from all over the world, spanning many different vertical industries such as Telco, Electric and Gas companies. Potential customers carefully examine the leading critical online services, including making sure that a service holds the relevant compliance certifications. Today, Korach’s team invests heavily in the security of their offering; they are in the advanced stages of ISO 27001 certification for the cloud offering.
> > > SLA, DRP
Korach’s approach is based on an important tenet – that ClickSoftware (and not its service providers and infrastructure vendors) bears sole responsibility for the company’s cloud service performance and availability. The company’s cloud customers have a high-level view into the operations, including metrics (real-time SLA KPIs, number of currently licensed users and more), using a dashboard application that was built upon the

“You must make sure to set your customers’ expectations with regards to your SLA. There will always be ‘crises’ such as outages and performance issues, yet you need to make sure to share the status with your customers in real time.”

To support their customers’ mission-critical environments, the ClickSoftware cloud team commits to 99.9% uptime (a maximum of about 10.1 minutes of downtime a week!). To achieve this goal, Korach’s team implemented cross-region DR on AWS.

“We are not holding any running resources in other regions. Basically, with a press of a button I can replicate all our AWS environments between regions… it took us two weeks to develop,” noted Korach with pride.

It was great to learn that building that option required only minimal effort from the company’s DevOps team, while the result is pretty great. However, I assume that in other application cases (such as e-commerce) the cloud user must have running resources in place in order to avoid even slight downtime.
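As a quick check on the 99.9% uptime commitment mentioned above, the weekly downtime budget can be worked out directly. The short Python sketch below does the arithmetic; the function name is only illustrative.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes in a week


def allowed_downtime_minutes(uptime_percent: float,
                             period_minutes: int = MINUTES_PER_WEEK) -> float:
    """Return the downtime budget, in minutes, for an uptime commitment."""
    return period_minutes * (1 - uptime_percent / 100)


weekly_budget = allowed_downtime_minutes(99.9)
print(f"{weekly_budget:.2f} minutes of downtime allowed per week")
```

A week has 10,080 minutes, so 0.1% of it is roughly 10.08 minutes, which matches the "about 10.1 minutes a week" figure.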
> > > What are the tools you use?
Once we got to this point, I was interested to learn how many external applications the team utilizes in order to run an efficient and scalable operation. Here is a list of services that support the operations:
QlikView – SLA and performance dashboard for the single customer. Integrated by DataMind.
Newvem – Watching and reporting on costs and security issues. Integrated by DataMind.
OpManager – Provided by ManageEngine (a Zoho company) to monitor the instances and network performance.
Snort – (IDS/IPS) All traffic goes through a network intrusion prevention and detection system.
Telerik – Deep understanding of the application process performance. Supports monitoring of critical page browsing.
CA SiteMinder – User authentication that supports Single Sign On (SSO) using Active Directory integration and enables customers’ access and user provisioning.
> > > Future

“If Amazon becomes too costly, we are open to other options. However, the alternative must comply with our high standards. Currently Amazon is the leader with regards to the strict compliance standards it supports, such as ISO and FISMA.”
Discussing the next steps, Korach mentioned that the main focus is on optimizing costs, concentrating on the most expensive cloud infrastructure components and trying to find alternatives.
Korach routinely investigates the cloud operation costs (once every two weeks) to pinpoint areas and services that are insufficiently controlled, with the goal of finding ways to achieve better control and optimization, including the use of additional advanced AWS services and better automation and monitoring.
