Natural Language to Risk Score for CIS Benchmarks using Deep Neural Networks

Risk management is a fundamental part of Cloud Service Management. An up-to-date understanding of the cloud environment's risk posture is a desirable capability for today's complex IT infrastructure.

In today's IT world, different security processes are used to make sure that the cloud environment is safe, secure and compliant. Patch Management and Health Checks are two major examples of such compliance processes. Quantifying the risk for these compliance processes is critical in order to understand the current risk posture of the IT environment. While risk quantification is well-defined and standardized in certain compliance domains, no common approach yet applies across domains because a standard process is lacking.

This blog post explains how an AI-based approach that uses natural language processing (NLP) can standardize risk quantification across different domains. In particular, we use Center for Internet Security (CIS) Benchmarks as a case study and apply the approach in an IBM product, IBM Cloud Pak for Multicloud Management.

Background: Common Vulnerabilities and Exposures

Since 1998, system and software vulnerabilities have been collected in one place, the National Vulnerability Database (NVD), hosted by the National Institute of Standards and Technology (NIST). Software vendors release patches for the vulnerabilities that are found. A patch often addresses one or more Common Vulnerabilities and Exposures (CVE) entries. A security analyst assigns each CVE a vulnerability score in the range of 0–10 based on the Common Vulnerability Scoring System (CVSS).
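To make the 0–10 range concrete, here is a minimal sketch that maps a score to a qualitative severity rating. It assumes the CVSS v3.x rating bands (None, Low, Medium, High, Critical); the function name cvss_severity is ours and is not part of any product API.

```python
def cvss_severity(score: float) -> str:
    """Map a CVSS score (0.0-10.0) to its CVSS v3.x qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"       # 0.1 - 3.9
    if score < 7.0:
        return "Medium"    # 4.0 - 6.9
    if score < 9.0:
        return "High"      # 7.0 - 8.9
    return "Critical"      # 9.0 - 10.0


if __name__ == "__main__":
    for s in (0.0, 3.9, 5.5, 7.0, 9.8):
        print(s, cvss_severity(s))
```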

Patch management is one example of the security processes in today's Cloud Service Management. Another process is Health Checking. Health Checks are usually performed against standards. Center for Internet Security (CIS) Benchmarks are an example of such standards that are widely used today. CIS Benchmarks aim to make sure that managed systems are secure, safe and compliant. However, today there is no standard way of assigning a risk score to each failed health check, as there is for each missing patch.

Classification with Deep Neural Networks

Deep Learning is considered to be a subfield of machine learning, which itself is a subfield of Artificial Intelligence. Deep Learning has been shown to be an effective method of machine learning when a large amount of training data is available. In our case, we are particularly interested in classifying a given text description into a Common Vulnerability Scoring System (CVSS) score.

Data

Our primary source of data is the set of vulnerability descriptions and their associated CVSS scores from the National Vulnerability Database (NVD). As of November 2019, more than 123K vulnerabilities have been reported in NVD. This constitutes a considerable amount of data. Our second source of data is IBM Technical Specification Standard Documents.
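As a rough illustration of how such records could become training examples, the sketch below pairs a description with per-metric labels parsed from a CVSS v3.x vector string. It is an assumption that labels are derived from the vector string; the post does not say how the training labels were built. The sample description is made up, and the function name parse_cvss_vector is hypothetical.

```python
from typing import Dict

# The eight CVSS v3.x base metrics, in vector-string order.
BASE_METRICS = ("AV", "AC", "PR", "UI", "S", "C", "I", "A")


def parse_cvss_vector(vector: str) -> Dict[str, str]:
    """Split a CVSS v3.x vector string into {metric: value} labels."""
    prefix, *parts = vector.split("/")
    if not prefix.startswith("CVSS:3"):
        raise ValueError(f"not a CVSS v3.x vector: {vector!r}")
    labels = dict(part.split(":", 1) for part in parts)
    missing = [m for m in BASE_METRICS if m not in labels]
    if missing:
        raise ValueError(f"missing base metrics: {missing}")
    # Keep only the base metrics; temporal/environmental ones are ignored.
    return {m: labels[m] for m in BASE_METRICS}


if __name__ == "__main__":
    # Made-up description text; only the vector format is standard.
    description = "Hypothetical: remote attacker can execute arbitrary code."
    vector = "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"
    print(description, parse_cvss_vector(vector))
```

Each parsed metric then serves as one classification target for the text.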

Model

Our approach is based on Deep Learning. The first layer is the input layer, which turns the given text into an embedding. We use state-of-the-art embedding techniques (including language modeling) to achieve the best performance. Next come the hidden layers, where we use CNN, LSTM and other state-of-the-art sequence models. Finally, the output layer captures the different dimensions of CVSS. Our approach is to build an initial model with NVD data (Vanilla NVD Model), and later use transfer learning with the data from IBM Technical Specification Standard Documents and CIS Benchmarks.
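If the output layer predicts one class per CVSS dimension, those predictions still have to be combined into a single number. The sketch below does this with the CVSS v3.1 base score equations and metric weights from the public specification. It is an assumption that the model targets v3.1 base metrics, since the post does not name the CVSS version; the function names are ours.

```python
import math
from typing import Dict

# Metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on Scope (U = unchanged, C = changed).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in CVSS v3.1."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(m: Dict[str, str]) -> float:
    """Compute the CVSS v3.1 base score from predicted metric labels."""
    scope = m["S"]
    iss = 1 - ((1 - WEIGHTS["C"][m["C"]])
               * (1 - WEIGHTS["I"][m["I"]])
               * (1 - WEIGHTS["A"][m["A"]]))
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
                      * PR_WEIGHTS[scope][m["PR"]] * WEIGHTS["UI"][m["UI"]])
    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


if __name__ == "__main__":
    predicted = {"AV": "N", "AC": "L", "PR": "N", "UI": "N",
                 "S": "U", "C": "H", "I": "H", "A": "H"}
    print(base_score(predicted))  # 9.8
```

Predicting the metrics rather than the raw number keeps the output interpretable: a reviewer can see which dimension the model got wrong.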

Feedback

AI models can learn and capture important features and output the information we want to receive. However, AI models rarely achieve 100% accuracy. Precision is especially important for mission-critical models, which have a higher impact when they go wrong. Hence, we developed a UI where Subject Matter Experts (SMEs) can easily verify (or change) the mapping assigned by the AI model.
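The verify-or-change step can be pictured as a small record that keeps the model's score and the SME's decision side by side. This is only a sketch of the idea, not the actual UI or its data model; the class RiskMapping, its fields, and the check identifier are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RiskMapping:
    """A model-assigned risk score awaiting SME review (hypothetical schema)."""
    check_id: str
    model_score: float
    sme_score: Optional[float] = None
    verified: bool = False

    def confirm(self) -> None:
        """SME accepts the model's score as-is."""
        self.verified = True

    def override(self, score: float) -> None:
        """SME replaces the model's score with their own."""
        if not 0.0 <= score <= 10.0:
            raise ValueError(f"score out of range: {score}")
        self.sme_score = score
        self.verified = True

    @property
    def final_score(self) -> float:
        """The SME's score wins whenever one has been recorded."""
        return self.sme_score if self.sme_score is not None else self.model_score


if __name__ == "__main__":
    mapping = RiskMapping(check_id="example-check", model_score=6.5)
    mapping.override(7.2)
    print(mapping.final_score, mapping.verified)
```

Keeping both scores also leaves a trail of corrections that could later be used as additional training data.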

IBM Cloud Pak for Multicloud Management

IBM Cloud Pak for Multicloud Management, running on Red Hat OpenShift, provides consistent visibility, governance, and automation from on premises to the edge. CIS Policy Controller is one of its policy controllers, and it implements CIS Kubernetes Benchmark 1.4.0. Each control in this benchmark comes with a score that is quantified using the AI risk framework explained above.

Conclusion

We have shown how AI-based risk quantification can be done in a standard way for a given natural language description. Today, CIS Policy Controller is one of the policy controllers in IBM Cloud Pak for Multicloud Management, and each failed CIS Kubernetes Benchmark 1.4.0 check comes with a risk score assigned by our AI model and verified by a subject matter expert.

Authors: Muhammed Fatih Bulut (Senior Research Engineer, IBM Research), Jinho Hwang (Research Staff Member, IBM Research), Milton Hernandez (Distinguished Engineer, IBM Research)