Managing Secure Cloud Operations

In May 2017, the Journal of Information Security Research, sponsored by the State Information Center, planned a special “Microsoft China” issue on global-domestic partnership in information security. The journal invited Tony Tang (General Manager, Technical Operations, 21Vianet Blue Cloud) and other industry experts to submit papers. In his paper, “Managing Secure Cloud Operations”, Tony Tang discusses how to build a world-class secure cloud platform.

Background
In January 2015, the State Council of China issued the “Opinions on Promoting Cloud Computing Innovation and Information Technology Industry” (hereinafter the “Opinions”). The Opinions emphasized that the Chinese government will continue to promote cloud computing innovation and thereby boost the domestic economy. In recent years, cloud computing has grown rapidly in China and fueled waves of start-ups, because today’s entrepreneurs have powerful computing resources at their fingertips at minimal cost. The impact of cloud computing across industries is unprecedented: it creates new business opportunities and helps traditional enterprises and government agencies operate more efficiently. Cloud computing will clearly play an important role in China’s digital transformation in the years to come.

Cloud computing is an emerging computing model, rooted in network computing, that offers a wide spectrum of services and applications to enterprises and consumers. Technologies such as machine learning and big data have become part of the cloud-computing ecosystem. Cloud computing is a pay-per-use service model that gives users convenient, on-demand network access to a shared pool of computing resources. These resources, such as compute, networks, servers, storage, and applications, can be provisioned or released with minimal investment or interaction with the cloud service provider.

Cloud computing will inevitably influence and change many aspects of the software life cycle, from architecture and development to testing, operation, and maintenance (hereafter “operations”). By the way resources are shared, clouds can be categorized as public, private, or hybrid. This article focuses on the technical operations of public cloud, its security and compliance, and its relationship with traditional IT operations.

The Evolution of Cloud Operations
In the traditional IT service management model, hardware, server OS, and application software come from their respective vendors, and all of them are operated and maintained by the enterprise’s IT department. In the cloud model, however, these operations are divided between the cloud service provider and the enterprise IT department. The cloud service provider manages the data center infrastructure, server rooms, networking hardware, servers, platform OS, storage, and virtual machines, whereas the customer manages the OS and applications inside the virtual machines. Cloud computing is made up of three main layers: SaaS, PaaS, and IaaS. The cloud service provider and the customer are responsible for different parts of the stack depending on the layer, as shown below:

In IaaS, the cloud service provider operates and maintains the infrastructure, while the customer manages the OS inside the virtual machines and every layer above it. In PaaS, the customer manages only the applications and data. SaaS is the simplest model for customers, because the cloud service provider manages the entire stack.

Therefore, in the cloud model, traditional IT operations have changed fundamentally in both IaaS and PaaS. The operations stack is split into two parts (blue and green in the figure above): the blue layers are managed by the cloud service provider, and the green layers above them are managed by the customer’s IT department. In SaaS, the entire stack is managed by the cloud service provider.
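The division of responsibility described above can be expressed as a simple lookup table. The following is a minimal sketch only: the layer names, the `responsibility` helper, and the exact boundary chosen for each model are illustrative assumptions that follow the text, not any provider’s official responsibility matrix.

```python
from enum import Enum


class Owner(Enum):
    PROVIDER = "cloud service provider"
    CUSTOMER = "customer"


# Stack layers from the physical bottom to the data on top.
# The layer names are an illustrative assumption.
LAYERS = [
    "data center", "networking", "servers", "storage",
    "virtualization", "operating system", "runtime/middleware",
    "applications", "data",
]

# Index of the lowest layer the customer manages in each service model.
_CUSTOMER_STARTS_AT = {
    "IaaS": LAYERS.index("operating system"),  # customer runs the guest OS upward
    "PaaS": LAYERS.index("applications"),      # customer runs applications and data
    "SaaS": len(LAYERS),                        # provider manages the whole stack
}


def responsibility(model: str) -> dict[str, Owner]:
    """Map every stack layer to its owner for the given service model (hypothetical helper)."""
    start = _CUSTOMER_STARTS_AT[model]
    return {
        layer: Owner.CUSTOMER if i >= start else Owner.PROVIDER
        for i, layer in enumerate(LAYERS)
    }
```

For example, `[l for l, o in responsibility("PaaS").items() if o is Owner.CUSTOMER]` yields only the application and data layers, matching the “green” part of the stack for PaaS.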

To keep up with customer growth and requirements, cloud computing infrastructure and data centers must keep expanding. The investment in data center hardware, such as server rooms, racks, network routing devices, and servers, is therefore substantial, which makes cloud operations very different from those of a traditional Internet Data Center (IDC). Because cloud services need massive physical infrastructure, IDC operators have an advantage in transforming themselves into cloud-computing platforms, and the rapid growth of the cloud industry is pushing traditional IDCs toward this transformation.

Operations: Cloud Platform versus Traditional Enterprise IT
In the traditional IT model, the IT department manages only individual servers or server clusters. In cloud computing, operations span an entire data center, or even multiple data centers treated as one complete platform on which the cloud OS and customers’ applications are deployed, as illustrated below:

Traditional IT vs. Cloud Operation & Maintenance

Thus, moving from traditional IT to cloud operations is not simply a change from managing a single server to managing many servers. It spans multiple layers and aspects in both breadth and depth, such as large-scale hardware, the cloud operating system, virtual hosts, virtual networks, and collaboration across virtual hosts. The cloud business model also drives differences between traditional IT and cloud operations. Essentially, cloud operations combine IDC infrastructure operations with enterprise-level IT operations, covering IDC infrastructure, utilities, hardware, the cloud OS, virtual networks, virtual machines, and so on.

Furthermore, because the cloud business model differs greatly from the traditional IDC and enterprise IT business models, additional common cloud management systems are required to support the cloud business. A typical example is the Business and Operations Support System (BOSS), which is often split into a Business Support System (BSS) and an Operations Support System (OSS), as shown below:

With cloud computing, the operations model has changed fundamentally. Because of the nature of cloud services, the underlying application systems must be upgraded continually while they run live in production, so the cloud service provider needs a larger team to manage package deployment. In enterprise IT operations, an upgrade can be done by taking a subset of servers offline at a convenient time. Cloud operations, by contrast, require dedicated roles for resource coordination and management, some of which do not exist in traditional IT operations. Cloud products are no longer operated and maintained by users after delivery; the cloud service provider operates them instead. The provider therefore also needs additional roles for services that span users, including cloud services, business services, customer support, deployment management, transition and migration management, operations management, and security and risk management, most of which require round-the-clock (24x7x365) support.
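Upgrading live systems without stopping the service is commonly done with a rolling upgrade: the fleet is split into small batches, and each batch is upgraded and health-checked before the next one starts. The sketch below illustrates that idea under stated assumptions; `plan_rolling_upgrade`, `run_rolling_upgrade`, and the `upgrade`/`healthy` callbacks are hypothetical names, not part of any real deployment tool.

```python
import math
from typing import Callable


def plan_rolling_upgrade(servers: list[str], max_unavailable: float) -> list[list[str]]:
    """Split a fleet into sequential upgrade batches.

    At most `max_unavailable` (a fraction in (0, 1]) of the servers is taken
    out of service at a time, so the remainder keeps serving live traffic.
    """
    if not 0 < max_unavailable <= 1:
        raise ValueError("max_unavailable must be in (0, 1]")
    batch_size = max(1, math.floor(len(servers) * max_unavailable))
    return [servers[i:i + batch_size] for i in range(0, len(servers), batch_size)]


def run_rolling_upgrade(
    servers: list[str],
    max_unavailable: float,
    upgrade: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> list[str]:
    """Upgrade batch by batch; halt and return the failing servers on a bad health check."""
    for batch in plan_rolling_upgrade(servers, max_unavailable):
        for server in batch:
            upgrade(server)  # hypothetical callback: drain, install package, restart
        failed = [s for s in batch if not healthy(s)]
        if failed:
            return failed  # stop before touching the next batch
    return []
```

Halting on the first unhealthy batch limits the impact of a bad package to a small fraction of the fleet. This is one reason cloud operations need explicit deployment-management roles rather than ad hoc maintenance windows.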

Cloud-service users no longer need a full-time operations team as the traditional IT model does. In the cloud-computing model, they only need to care about running their services at the application level; for example, they no longer need to procure hardware. Technical staff for cloud operations, by contrast, must take more into consideration, such as deploying cloud service components, allocating resources among multiple tenants, and coordinating virtual hosts and networks.

Cloud Operations
The complexity of cloud operations places high demands on the quality of the technical operations staff. To ensure the stability and reliability of a cloud platform, the operations team must put a series of high-standard cloud management systems in place. These systems fall into the following four categories:

1. Operation and maintenance support system

This system mainly supports cloud platform operations, such as monitoring and event management, change and configuration management, capacity and performance management, IT asset and license management, and platform and infrastructure management.

2. Business support service system

The business support service system includes customer-related business and service systems, such as customer management, agreement management, subscription management, and pricing management.

3. Business process management and API management system

Because cloud services involve a wide variety of processes, including cross-department processes, business process management and API management, both across systems and within a single system, are very important. Some business processes can also be integrated into the cloud operations platform.

4. Internal management system or partner’s cloud management system

Cloud operations also need systems for managing third-party partners and for internal management, such as internal office and mail systems. These are similar to traditional enterprise internal information management systems.
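Monitoring and event management, listed under the first category, typically turns raw metrics into events only when a threshold is breached for a sustained period, so that a single spike does not page the on-call team. The following is a minimal sketch of that idea. `AlertRule`, `EventMonitor`, and the metric names are hypothetical, and real monitoring systems add deduplication, severity levels, and routing on top.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    metric: str
    threshold: float
    window: int  # consecutive breaching samples required before raising an event


class EventMonitor:
    """Raises an event when a host's metric stays above its threshold for `window` samples."""

    def __init__(self, rules: list[AlertRule]) -> None:
        self.rules = {r.metric: r for r in rules}
        self.history: dict[tuple[str, str], deque] = {}
        self.events: list[tuple[str, str, float]] = []

    def observe(self, host: str, metric: str, value: float) -> None:
        rule = self.rules.get(metric)
        if rule is None:
            return  # metric not covered by any rule
        hist = self.history.setdefault((host, metric), deque(maxlen=rule.window))
        hist.append(value > rule.threshold)
        if len(hist) == rule.window and all(hist):
            self.events.append((host, metric, value))
            hist.clear()  # re-arm, so one sustained breach yields one event
```

Keeping history per (host, metric) pair means a noisy host cannot trigger events on behalf of a healthy one.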

Cloud Security and Compliance
Cloud security is a broad topic. It covers protection of the cloud platform, such as anti-virus, anti-attack, and anti-penetration measures, as well as data security, such as data leakage prevention and insider threat prevention.

A cloud platform usually needs multiple tools and measures to defend against DDoS attacks and hacker penetration. Common industry practices for DDoS mitigation include hardware and software firewalls, protocol analysis, traffic scrubbing, and black-holing. Because cloud platform users are diverse and complex, attacks come not only from the outside in but also from the inside out: a user’s virtual machine may be hijacked, or a malicious user may try to launch DDoS attacks on external targets from the cloud platform. Approaches to protecting a cloud platform from hacker attacks usually include regular vulnerability scanning, timely patching, and tracking potential vulnerabilities in open source technologies. To protect accounts and services from being hijacked, operators should harden bastion hosts in addition to enforcing two-factor authentication, build a threat analysis model to analyze all potential threats comprehensively, and run attack simulations, including white-hat scanning, as necessary.
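Per-source rate limiting is one simple building block of the traffic analysis described above. Because it applies equally to inbound packets and to outbound packets from a possibly hijacked VM, it addresses both attack directions. The sketch below uses a token bucket, a common rate-limiting technique that is swapped in here for illustration. It is not a full scrubbing pipeline, and `TokenBucket`, `PerSourceLimiter`, and the chosen rates are assumptions. Time is passed in explicitly (in seconds) to keep the example deterministic; a real limiter would read a monotonic clock.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate: float      # tokens refilled per second
    capacity: float  # maximum burst size
    tokens: float    # tokens currently available
    last: float      # time of the last refill, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerSourceLimiter:
    """Keeps one bucket per source address; packets over the limit are dropped."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate, self.capacity = rate, capacity
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, src: str, now: float) -> bool:
        bucket = self.buckets.get(src)
        if bucket is None:
            # A new source starts with a full bucket.
            bucket = TokenBucket(self.rate, self.capacity, self.capacity, now)
            self.buckets[src] = bucket
        return bucket.allow(now)
```

For example, with `PerSourceLimiter(rate=10, capacity=5)`, a burst of seven packets from one address at the same instant lets the first five through and drops the last two, while other sources are unaffected.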

It is also very important to isolate users from one another. For example, Microsoft Azure provides the following isolation measures:
• Logical network isolation prevents any user account from accessing another account’s intranet IP addresses.
• Logical isolation of user access rights.
• Encryption and read/write isolation ensure that data logically deleted by one user cannot be accessed by another user.
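The first isolation measure can be illustrated with a toy overlay network in which private addresses are resolved only within the sender’s own tenant. With this design, another tenant’s intranet IP is unreachable by construction, even if two tenants use overlapping address ranges. This is a minimal sketch under assumptions: `VNic`, `VirtualSwitch`, and the tenant-scoped lookup are hypothetical and do not describe Azure’s actual implementation. Only the standard `ipaddress` module is used.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VNic:
    tenant_id: str
    private_ip: ipaddress.IPv4Address


class VirtualSwitch:
    """Toy overlay switch: private addresses are looked up per tenant."""

    def __init__(self) -> None:
        self._table: dict[tuple[str, ipaddress.IPv4Address], VNic] = {}

    def attach(self, tenant_id: str, ip: str) -> VNic:
        nic = VNic(tenant_id, ipaddress.IPv4Address(ip))
        self._table[(tenant_id, nic.private_ip)] = nic
        return nic

    def deliver(self, src: VNic, dst_ip: str) -> Optional[VNic]:
        dst = ipaddress.IPv4Address(dst_ip)
        if not dst.is_private:
            return None  # Internet-bound traffic would go through a separate gateway
        # Resolve the destination only inside the sender's own tenant, so a
        # private address belonging to another tenant is simply not found.
        return self._table.get((src.tenant_id, dst))
```

Keying the table by (tenant, address) rather than by address alone is what allows different tenants to reuse the same private ranges safely.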

Furthermore, to improve user data security, the provider should offer customers multiple encryption options for their stored data. Logging all data access and letting customers view those logs is another important measure for protecting user data. Remote data synchronization and remote disaster recovery can also be used to protect user data. Cloud operations engineers have no access to customer data unless the customer has given written authorization, and the written authorization letters and corresponding operations logs must be archived properly to ensure traceability. Together, these measures are designed to meet the certification requirements of Level 3 of the national information security classified protection scheme and the trusted cloud certification of the Ministry of Industry and Information Technology. They help prevent insider abuse and give customers full confidence in the security of their data in the cloud.
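The rule that engineers may touch customer data only with written authorization, and that every access must be logged for traceability, can be sketched as an access gate. The gate checks an authorization record and writes an audit entry for every attempt, whether granted or denied. The `Authorization` schema, the ticket IDs, and `CustomerDataGate` are hypothetical and are meant only to show the shape of such a control.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Authorization:
    """A customer's written authorization as recorded in a ticket (hypothetical schema)."""
    ticket_id: str
    customer_id: str
    engineer: str
    expires_at: datetime


@dataclass
class CustomerDataGate:
    authorizations: dict[str, Authorization] = field(default_factory=dict)
    audit_log: list[dict] = field(default_factory=list)

    def register(self, auth: Authorization) -> None:
        self.authorizations[auth.ticket_id] = auth

    def access(self, engineer: str, customer_id: str, ticket_id: str,
               action: str, now: Optional[datetime] = None) -> bool:
        if now is None:
            now = datetime.now(timezone.utc)
        auth = self.authorizations.get(ticket_id)
        # Require a matching, unexpired authorization for this engineer and customer.
        granted = (
            auth is not None
            and auth.engineer == engineer
            and auth.customer_id == customer_id
            and now < auth.expires_at
        )
        # Log every attempt, granted or denied, so the trail can be archived and audited.
        self.audit_log.append({
            "time": now.isoformat(),
            "engineer": engineer,
            "customer": customer_id,
            "ticket": ticket_id,
            "action": action,
            "granted": granted,
        })
        return granted
```

Recording denied attempts as well as granted ones is deliberate: the archived log then shows not only what was accessed but also who tried to access data without valid authorization.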

As for cloud compliance, national laws and regulations require a cloud service provider not only to store data physically within the country, but also to explain precisely to customers what they can and cannot do under the relevant laws, regulations, and policies. User data is stored and managed in accordance with these compliance requirements, and independent third-party audits are carried out regularly to verify compliance. Because the cloud platform is the underlying platform on which customers deploy their application systems, cloud service providers may also need to support compliance requirements raised by customers.
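The in-country storage requirement lends itself to an automated pre-provisioning check. Such a check should cover not only each resource’s primary region but also its replicas, since remote synchronization and disaster-recovery copies also hold user data. This is a minimal sketch under assumptions: the region identifiers, `StorageResource`, and `residency_violations` are hypothetical, and a real allow-list would come from the provider’s own compliance policy.

```python
from dataclasses import dataclass

# Hypothetical identifiers for in-country regions; a real allow-list would
# come from the provider's compliance policy.
IN_COUNTRY_REGIONS = frozenset({"cn-east", "cn-north"})


@dataclass(frozen=True)
class StorageResource:
    name: str
    region: str
    replica_regions: tuple[str, ...] = ()  # geo-replication / disaster-recovery targets


def residency_violations(resources: list[StorageResource]) -> list[tuple[str, str]]:
    """Return (resource name, region) pairs that would place data outside the country."""
    violations = []
    for r in resources:
        for region in (r.region, *r.replica_regions):
            if region not in IN_COUNTRY_REGIONS:
                violations.append((r.name, region))
    return violations
```

Running a check like this before provisioning, rather than only during periodic audits, gives the third-party auditors a consistent record that the policy was enforced.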

In Closing
The quality of cloud services depends not only on advanced technologies but also on the quality of cloud operations. Starting from the evolution of cloud operations, this paper briefly describes the differences between cloud operations and traditional operations, and then details each management system and its basic functions. It also introduces cloud security and the basic requirements of cloud compliance, all of which are important aspects of day-to-day cloud operations. Shanghai Blue Cloud Network Technology Co., Ltd., a wholly-owned subsidiary of 21Vianet, has been operating Microsoft Azure and Office 365 cloud services for more than three years, providing customers in China with world-class cloud-computing technology and services.