Abstract
The cloud market is growing every day. So are cloud crimes. To investigate crimes that happen in a cloud environment, an investigation is carried out adhering to the court of law. Forensics investigations require evidence from the cloud. Evidence acquisition in the cloud requires formidable efforts because of physical inaccessibility and the lack of cloud forensics tools. Time is very crucial in any forensic investigation. If the evidence is preserved before the cloud forensic investigation, it can give the investigators a head start. To identify and preserve such potential evidence in the cloud, we propose a system with an artificial intelligence (AI)-based agent, equipped for binary classification that monitors and profiles the virtual machine (VM) from hypervisor level activities. The proposed system classifies and preserves evidence data generated in the cloud. The evidence repository module of the system uses a novel blockchain model approach to maintain the data provenance. The proposed system works at the hypervisor level, which makes it robust for anti-forensics techniques in the cloud. The proposed system identifies potential evidence reducing the effective storage space requirement of the evidence repository. Data provenance incorporated in the proposed system reduces trust dependencies on the cloud service provider (CSP).
1 Introduction
Cloud forensics is an investigation of a cyber-crime committed in a cloud cyber-space. Cyber-crimes in cloud-cyber space can be committed by using the cloud as a subject, as an object, or as an environment. The rate at which data is being generated is as huge as 2.5 quintillion bytes per day [1]. The majority of these data are generated in a cloud platform. A cloud forensic investigation needs to collect evidence from such a huge pile of data. Knowing only the precise data that can potentially contribute as evidence to the forensic investigation can save a lot of data processing as well as time. This gives the advantage of the time to the investigation. Such classification is also valuable when it is used to preserve volatile data as evidence. Evidence acquisition presents challenges of data gravity and secure storage [2,3,4]. The chain of custody must be preserved throughout all the forensic investigation stages, which can also be difficult in a cloud scenario.
We propose a system equipped with an AI agent, which incorporates a Binary classification of Evidence using Smart Hypervisor Monitoring (BiSHM) to classify the data as evidence data. Our novel approach, unlike most of the current solutions in the market, does not require an agent to be installed inside the virtual machine (VM) to monitor data or metadata. The monitoring sensors of the modelled AI agent reside at the hypervisor level. This placement of the sensors of the AI agent ensures minimum privacy intrusion in the cloud. Malicious behaviour of the VM is detected with hypervisor level monitoring [5]. All the data including volatile data of the VM generated during the time of the event is collected as evidence. The evidence is then stored securely with a blockchain framework to ensure its integrity.
The BiSHM model will preserve only the required evidence data generated during an attack originating from or terminating at a VM. The chain of custody of the evidence is safeguarded by the blockchain framework. The authors have also contributed a novel dataset for cloud forensics.
The rest of this article is organized as follows: Section 2 discusses the cloud forensic scenario in the cloud. Section 3 details the problem statement. Section 4 describes the proposed system. Modelling of the proposed agent is also elaborated in Section 4. In Section 5, we have explained the dataset, which was used as a part of the experimentation. Section 6 discusses the results from the experiments. The conclusion and future scope of the research are explained in Section 7.
2 Forensics in cloud
According to Gartner, the fastest-growing market will be one of the infrastructure as a service (IaaS) cloud segment, with a projected growth of up to $38.9 billion by 2022, and the platform as a service (PaaS) market will be reaching around $31.8 billion by 2022 [1]. The adoption of the cloud is increasing day by day. This fast adoption also attracts criminals to the cloud platform [6]. According to the 2015 Trustwave Global Security Report [7], the volume of attacks on cloud services more than doubled in 2019 and accounted for 20% of the investigated incidents. The IaaS and PaaS adoptions are widely used in the IT industry. There is a high priority need for forensics tools and techniques for these cloud models [8,9], and while we are at it, we need to understand if and why the forensics investigation approach for each of the cloud service models is different.
Each service model of the cloud has a different level of virtualisation implementation [10]. The level of virtualisation specifies which components of the cloud are under the control of the cloud service provider (CSP) and which components are under the control of the cloud user [11,12]. The relative term used to specify this level of control is called the degree of control. In the PaaS model of the cloud, the CSP has a control over and up to the platform level component, which includes networking, storage, servers, operating system, middleware, and runtime environment of the entire stack of cloud servers. However, applications and data related to the applications are managed by the cloud tenant. The degree of control in PaaS is comparatively more than the IaaS and less than the software as a service (SaaS) model of the cloud. We are proposing a solution for the IaaS model based on the considerations of the least degree of control and the most popular adoption in IT industries. However, the proposed system can also serve the need for the SaaS and PaaS models, which are discussed in the future scope section at the end of this article.
In a cloud model, artefacts such as disk snapshots, application logs, access logs, memory dumps, network logs, and network packets can potentially contribute as evidence for the investigation [13,14,15]. Cyber-attacks in the cloud scenario leave their inkling in some or all of these artefacts. Hence, these artefacts collected at the time of the event are potential evidence. The sources where this evidence is found are virtualized networks, virtualized disks, and virtualized memory. If this evidence data is to be collected and preserved continuously as proposed in refs. [11,16,17], it will take up large amounts of cloud space. Not to mention acquiring some of these artefacts from the VM may demand elevated system privileges [12,18]. To add to these existential problems, sophisticated anti-forensics techniques can obfuscate and corrupt the artefact at sources of evidence.
To mitigate and overcome the aforementioned problems, we propose a system with BiSHM, with a novel evidence detection approach. BiSHM monitors activities linked to sources of evidence. The monitoring is done at the hypervisor level, so for an undesirable event, there are fewer chances of escaping the monitoring. This also mitigates anti-forensics techniques at the VM level. Based on the hypervisor monitoring, only the artefact that is classified as evidence is preserved instead of all the data generated in the cloud. The artefacts to be preserved vary as per the cloud service model because of the degree of control [19].
3 Problem definition
Not all the data which are generated in the cloud are evidence. The data that are generated during an attack or malpractice in the cloud have the potential to contribute as evidence for the investigation. If D H is the data (including logs) generated by the hypervisor and D V is data generated by the virtual machine in a PaaS or IaaS cloud, then the total data generated in the cloud D C can be depicted in equation (1):
Potential evidence data D E is a subset of D C. Referring to equation (1), we can depict this relation with equation (2):
Evidence detection in traditional cloud forensics is a manual process. Data generated in cloud (D E) are huge. This makes the detection of D E from D C a tedious process. It is like finding a needle in the haystack problem [19]. To differentiate D E from D C, we propose a novel approach to evidence detection and preservation using an AI agent. The proposed AI agent will use a classifier to predict D E given D C using the monitored data generated at the hypervisor level. This classification can be defined in a function depicted in equation (3):
The objectives of this research problem are as follows:
To model an AI agent for evidence detection and preservation in a cloud environment.
To propose a novel feature set to improve the performance of the proposed AI agent.
To use a blockchain to preserve the chain of custody of acquired evidence in the cloud.
4 Proposed system
The proposed system consists of a hypervisor monitoring module (HMM), an AI agent, and an evidence provenance module (EPM) as shown in Figure 1. The HMM continuously monitors the memory, disk, and network activities from the virtual device drivers. This is possible with the use of the Libvirt-domain virtualisation Application Programming Library (API) library [20]. Libvirt is a virtualisation library that has drivers for Xen, ESX, Hyper-V, VMware, etc. The AI agent is equipped with a classification module that uses the output from the live monitoring module and classifies the data generated in the cloud as evidence. The evidence acquisition module triggers evidence acquisition and preservation. Acquisition and preservation of the evidence are taken care of by the EPM.

Proposed system.
The instrumental key components of our system are the AI Agent, HMM, and EPM. The placement of the cloud is shown in Figure 1. The hypervisor level monitoring is done using two different libraries, namely, OpenNebula and Kernel virtual machine (KVM). KVM provides more detailed activity monitoring than OpenNebula, which is explained in Section 5. These monitoring data that are generated are stored in the monitoring dataset. We propose a novel feature set, which is derived from the existing features of KVM and the OpenNebula dataset. The monitoring data are used for training the AI agent for evidence detection. Our novel feature-set acts as sensor data for the AI agent. The AI agent is modelled for evidence data detection. The trained classification model enables the AI agent to trigger the evidence acquisition using Libvirt libraries for evidence acquisition. The evidence is pushed to the provenance module where it is securely stored using a simple blockchain mechanism. Thus, continuous monitoring of the data yields selective evidence detection and acquisition.
4.1 Hypervisor monitoring module
The HMM is the core component of the framework which monitors the hypervisor activities and profiles the virtualized resources of the cloud. The HMM module monitors each VM. There are two submodules of the HMM. Monitoring is done using OpenNebula and KVM. In a private cloud deployment, VM monitoring is done at the KVM hypervisor level using the Libvirt API. Libvirt supports KVM and Xen virtualisation. The API provides diverse parameter monitoring of a VM. The pseudocode of the monitoring using OpenNebula and KVM is shown in Algorithms 1 and 2, respectively. The parameters and their descriptions, which are collected as a dataset, are given in Section 4.2.


4.2 Model of the proposed AI agent
In designing an agent, the first step must be always to specify the task environment as fully as possible [21]. We have modelled the AI agent by specifying the performance measure, the environment, and the agent’s actuators and sensors (PEAS). The precise definition of the task environment is given in Sections 4.2.1–4.2.4 by the PEAS description.
4.2.1 Performance
Response time and evidence acquisition time are the performance parameters of our proposed agent. The agent has two submodules, namely, classification and evidence acquisition. The classification module identifies evidence data from the generated cloud data. Response time of the identification of evidence depends on the detection of an attack in the cloud environment [22]. The detection of evidence data triggers the acquisition module. The evidence acquisition module acquires the data and stores it as evidence. The performance of evidence acquisition is technology dependent. The size of the evidence and the specifications of the hardware components limit the performance of the evidence acquisition process.
4.2.2 Environment
Environment profiling of the agent is documented in this section. The agent’s sensors are deployed at the hypervisor level. The agent environment is fully observable. It is a single-agent model where only a single agent is in the action. The state of the environment is categorized as deterministic. The classification agent experience is atomic; hence, the environment state is episodic. The unchanging state of the environment makes it static. The parameters of the environment, which are being monitored continuously, have a continuous distinction that is applied to the state of the environment making it a continuous environment. The environment is known for the outcomes of all actions.
4.2.3 Actuators
If evidence is identified by the agent, it is stored in the evidence repository. This is done by using triggers to fetch the volatile data from the hypervisor level API. The actuators for the proposed AI agent are the evidence detection and evidence acquisition submodules.
4.2.4 Sensors
The activities at the sources of evidence are used for the placement of the sensors. These activities can be quantified using the HMM. Hypervisor level APIs are used by the sensors to collect the monitoring data about the resources. The collected data are used to predict D E from the main sources of the virtualized resources such as a virtualized disk, network, and memory.
4.3 Classification module
The classification module of an AI agent is trained on the generated training dataset. Machine learning modules have a binary classification agent. The agent takes the monitoring data Mt at every time instance t from the HMM and uses it to classify t to be an attack scenario or a normal scenario. This classification triggers the evidence tagging module. The evidence tagging module starts to acquire, tag, and store the data (D C) generated at the t time instance as evidence data (D E). We have explored the following models for the evidence classification problem:
k-Nearest neighbour (kNN)
Support vector machine (SVM)
Naive Bayes
Tree
Random forest
4.3.1 kNN
Euclidean distance and Mahalanobis distance are commonly used distance metrics. The most basic instance-based method is the kNN algorithm [23]. This algorithm assumes that all instances correspond to points in the n-dimensional space. The nearest neighbours of an instance are defined in terms of the standard Euclidean distance. More precisely, let an arbitrary instance x be described by the feature vector described by equation (4):
where a 1(x) denotes the value of the rth attribute of instance x. Then, the distance between the two instances x i and x j is defined to be d(x i , x j ), as shown in equation (5):
Unlike the Euclidean distance though, the Mahalanobis distance accounts for how correlated the variables are to one another. The Mahalanobis distance calculation differs only slightly from Euclidean distance as shown in equation (6):
where
4.3.2 SVM
SVMs are particular linear classifiers that are based on the margin maximisation principle. They perform structural risk minimization, which improves the complexity of the classifier intending to achieve excellent generalization performance [24]. The SVM accomplishes the classification task by constructing, in a higher-dimensional space, the hyperplane that optimally separates the data into two categories. SVM does not deliver as good results for our problem data as other algorithms. The reason could be that the size of the dataset and the dimensions are larger.
4.3.3 Naïve Bayes
Naïve Bayes is a simple learning algorithm that utilizes the Bayes rule together with a strong assumption that the attributes are conditionally independent, given the class. This classifier refers to a group of probabilistic classifiers. It implements the Bayes theorem for classification problems. The first step of the Naive Bayes classifier is to determine the total number of classes (outputs) and calculate the conditional probability for each dataset class. Then, the conditional probability is calculated for each attribute.
4.3.4 Decision tree
A decision tree refers to both a concrete decision model used to support decision-making and a method to construct such models automatically from data. As a model, a decision tree refers to concrete information or a knowledge structure to support decision-making [25]. Selection of the best attribute for the root node and subnodes is crucial while implementing a decision tree. Attribute selection measure (ASM) is used to select the nodes and subnodes. We have used the information gain technique for ASM to select attributes, and the results are explained in Section 6.
4.3.5 Random forest
A random forest is an ensemble of random decision tree classifiers, which makes predictions by combining predictions of individual trees [26]. There are different approaches to introduce randomness in the decision tree construction method. A random forest can be used to make predictions over nominal (classification) or numeric target attributes (regression). Random forests are one of the best performing predictive models. For our datasets, the random forest technique has shown improvement in the sensitivity evaluation metrics with the help of a combinational approach of random forest.
4.4 Evidence provenance module
Memory is stored in a secure folder with limited access policies using the Libvirt memory dump API using the raw format [20]. The file location is also recorded against the memory dump. The algorithm for evidence provenance using blockchain is explained in Algorithm 3.

4.4.1 Evidence acquisition module
Electronic evidence can be found in a disk, network, and memory. However, in a cloud environment, evidence data in memory are highly volatile in nature and difficult to acquire [19]. Volatile memory in most cases no longer exists if the VM is shut down or undeployed [8,27]. But volatile data are very crucial in forensics for crime scene reconstruction. Important artefacts such as process id, password, and unencrypted data can be found in the volatile memory. However, fetching volatile memory data are difficult mainly because of the following reasons: (i) The software approach of fetching volatile memory needs a program to run in the memory affecting the original contents of the memory. (ii) Volatile memory acquisition requires live forensics. Memory being the critical information source and highly volatile, we have discussed the implementation of evidence collection with memory dumps acquisition module. Nonetheless, disk and network data acquisition modules can also work smoothly with the proposed system.
The evidence detection agent of the proposed system triggers the evidence acquisition module. The memory acquisition module uses a kernel-level API to acquire and dump the memory data associated with the VM; hence, it does not require any agent installed on the VM. When the BiSHM agent is running at the hypervisor level, it can trigger the acquisition module to take memory dumps of the VM at a specific time instance t. This dumped data are preserved as evidence in the evidence repository. The evidence acquisition module of the proposed system uses the Libvirt API [20] to acquire the dump. The dump is then processed for evidence provenance. The algorithm for memory evidence acquisition is shown in Algorithm 4.

4.4.2 Blockchain
Blockchain is a tamper-resistant timestamp-based distributed ledger that often adapts for sharing and storing data. The fundamental elements of the blockchain are listed as follows:
Decentralization: In blockchain architecture, control is not retained by a centralized authority. Instead, it is distributed among peers. In the chain, each node is free to join or leave the blockchain network [28].
Collective verification: All other nodes verify the transactions made on the blockchain. In addition, it also provides tamper assistance, which enables the chain of custody of evidence. The data that are stored in the blockchain cannot be modified or deleted.

Blockchain model for evidence repository.
The file size of the evidence file for a VM is as huge as the memory size of the VM, which is several Giga Bytes. To store such a huge file in the blockchain would require calculating the hash of these large files. To tackle this problem, the EPM compresses the evidence and stores it at an encrypted location. Before compressing the memory file, it also calculates the hash value of the file using SHA-256. The metadata of the memory dump including the access timestamp (atime), the modified timestamp (mtime), and the changed timestamp (ctime) and path are stored in the repository dataset. A snapshot of the generated data is shown in Figure 3. These data are then stored in a blockchain. A snapshot of the saved data is shown in Figure 4.

Snapshot of the data to be stored in the blockchain.

Blockchain database snapshot.
5 Datasets
We have deployed a private cloud setup using OpenNebula [30]. There are two ways of monitoring resources using OpenNebula. One of the ways is to use Abstract APIs provided by OpenNebula to monitor the VM. The second way is to use the Libvirt API to monitor the virtualized domains. One of the datasets is generated using OpenNebula Monitoring API and the other dataset using Libvirt API. We have considered both datasets for the experimentation. The OpenNebula dataset has fewer parameters as compared to the Libvirt dataset. The list and the description of the parameters are provided in Tables 1 and 2 for OpenNebula and KVM, respectively. The OpenNebula API provides seven parameters, whereas the KVM parameter provides 43 parameters. These parameters are associated with the disk, network, and memory attributes of the VM. The datasets are stored in the MYSQL database at a regular time interval. The time interval of OpenNebula is greater than the time interval of KVM. Both the datasets showed good classification accuracy. However, for the final evaluation, we have considered the KVM dataset as it gives more detailed features of the disk, memory, and network. We have published both datasets on IEEE Dataport [31,32].
Category and description of features of OpenNebula dataset
Sr No | Category | Parameter | Description |
---|---|---|---|
1 | Meta-data | VMID | The ID of the VM |
2 | LAST_POLL | EPOCH timestamp | |
3 | STATE | State of the VM (explained below) | |
4 | MAC | The physical address of the VM | |
5 | IP | IPv4 addresses associated with the VM at LAST_POLL time instant | |
6 | Memory | CPU | Percentage of CPU consumed |
7 | MEMORY | Memory consumption in kilobytes | |
8 | Network | NETRX | Received bytes from the network |
9 | NETTX | Sent bytes to the network | |
10 | Disk | DISKRDBYTES | Read bytes from all disks since last VM start |
11 | DISKWRBYTES | Written bytes from all disks since last VM start | |
12 | DISKRDIOPS | Read IO operations from all disks since the last VM start | |
13 | DISKWRIOPS | Written IO operations all disks since the last VM start |
Category and description of features of KVM dataset
Sr No | Category | Feature | Description |
---|---|---|---|
1 | Meta-data | LAST_POLL | Epoch timestamp |
2 | VMID | The ID of the VM | |
3 | UUID | Unique identifier of the domain | |
4 | Dom | Domain name | |
5 | Network | Rxbytes | Received bytes from the network |
6 | rxpackets | Received packets from the network | |
7 | rxerrors | Number of received errors from the network | |
8 | rxdrops | Number of received packets dropped from the network | |
9 | txbytes | Transmitted bytes from the network | |
10 | txpackets | Transmitted packets from the network | |
11 | txerrors | Number of transmission errors from the network | |
12 | txdrops | Number of transmitted packets dropped from the network | |
13 | Memory | timecpu | Time spent by vCPU threads executing guest code |
14 | timesys | Time spent in kernel space | |
15 | timeusr | Time spent in userspace | |
16 | state | Running state | |
17 | memmax | Maximum memory in kilobytes | |
18 | mem | Memory used in kilobytes | |
19 | cpus | Number of virtual CPUs | |
20 | cputime | CPU time used in nanoseconds | |
21 | memactual | Current balloon value (in KiB) | |
22 | memswap_in | The amount of data read from swap space (in KiB) | |
23 | memswap_out | The amount of memory written out to swap space (in KiB) | |
24 | memmajor_fault | The number of page faults where disk IO was required | |
25 | memminor_fault | The number of other page faults | |
26 | memunused | The amount of memory left unused by the system (in KiB) | |
27 | memavailable | The amount of usable memory as seen by the domain (in KiB) | |
28 | memusable | The amount of memory that can be reclaimed by balloon without causing host swapping (in KiB) | |
29 | memlast_update | The timestamp of the last update of statistics (in seconds) | |
30 | memdisk_cache | The amount of memory that can be reclaimed without additional I/O, typically disk caches (in KiB) | |
31 | memhugetlb_pgalloc | The number of successful huge page allocations initiated from within the domain | |
32 | memhugetlb_pgfail | The number of failed huge page allocations initiated from within the domain | |
33 | memrss | Resident set size of the running domain’s process (in KiB) | |
34 | Disk | vdard_req | Number of read-requests on the vda block device |
35 | vdard_bytes | Number of read-bytes on the vda block device | |
36 | vdawr_reqs | Number of write requests on the vda block device | |
37 | vdawr_bytes | Number of write requests on vda the block device | |
38 | vdaerror | Number of errors in the vda block device | |
39 | hdard_req | Number of read requests on the hda block device | |
40 | hdard_bytes | Number of read bytes on the had block device | |
41 | hdawr_reqs | Number of write requests on the hda block device | |
42 | hdawr_bytes | Number of write bytes on the hda block device | |
43 | hdaerror | Number of errors in the hda block device |
6 Experimentation
6.1 Setup
The author did not find any suitable and feasible existing dataset, which has the parametric data of a cloud platform along with a mechanism to store the evidence data. So, to generate a synthetic dataset, a private cloud was set up for the proof of concept. The system configuration included Intel® Core™ i5-4590 Processor with 12 GB of RAM with 1 TB of HDD. The private cloud setup was done using a KVM type-1 hypervisor along with OpenNebula (version 5.12) as the cloud management platform. To simulate a real-time cloud environment a script generating, a synthetic workload was deployed on the virtual machines of the cloud.
6.2 Proposed modified feature set
Both the generated datasets, OpenNebula and KVM, did not have any missing values. The numeric features of the dataset represent respective cumulative values till time t. To obtain the rate of the activities at virtualized resources at a specific time instance t, the difference of consecutive data values should be considered. Numeric features representing disk and network activities are not used as they are received from the HMM. The calculation steps of the proposed new feature set are as represented in equation (7):
where Δ(F old) represents the difference of the consecutive values of the feature and Δ(LAST_POLL) represents the time duration of these consecutive F values.
6.3 Pre-processing
Normalization of each of the feature parameters is done to mitigate any dominant effect of any specific parameter. Normalization adjusts values to a common scale by using equation (8).
6.4 Feature ranking
The ranking experiment is carried out on both datasets using information gain [33]. Given a data set X with n cases, X = {x 1, x 2, …, x n }, where each case x i belongs to exactly one of the k predefined classes c j ∈ {c 1, c 2, …, c k }. Then, the information gain of attribute a is the reduction in entropy caused by partitioning X based on the values v i of that attribute a; it is given by equation (9):
Tables 3 and 4 present the information gain of different parameters of OpenNebula and KVM datasets, respectively, in descending order of the information gain.
Information gain ranking of features from OpenNebula dataset
Sr No | Parameter | Information gain |
---|---|---|
1 | NETTX | 0.545711414 |
2 | DISKWRBYTES | 0.208363871 |
3 | DISKWRIOPS | 0.205281549 |
4 | NETRX | 0.165385768 |
5 | CPU | 0.020941001 |
6 | DISKRDBYTES | 0.004170941 |
7 | DISKRDIOPS | 0.003862696 |
Information gain ranking of features from KVM dataset
Sr No | Parameter | Information gain |
---|---|---|
1 | txpackets | 0.623739395 |
2 | txbytes | 0.507514798 |
3 | rxpackets | 0.195322086 |
4 | rxbytes | 0.193009701 |
5 | vdawr_reqs | 0.186630402 |
6 | timesys | 0.163283136 |
7 | vdawr_bytes | 0.167101488 |
8 | timecpu | 0.095914082 |
9 | cputime | 0.095914082 |
10 | memlast_update | 0.000636577 |
11 | timeusr | 0.056836573 |
12 | hdard_req | 0.00047371 |
13 | hdard_bytes | 0.00047371 |
14 | memminor_fault | 0.000294259 |
15 | state | 0.002081396 |
16 | memunused | 0.000247844 |
17 | memusable | 0.000247844 |
18 | memrss | 0.01690513 |
19 | vdard_bytes | 0.001638032 |
20 | vdard_req | 0.001422896 |
6.5 Feature selection
Correlation is a bivariate analysis that measures the strength of association between two variables and the direction of the relationship. In terms of the strength of the relationship, the value of the correlation coefficient varies between +1 and −1 [34]. Pearson r correlation is the most widely used correlation statistic to measure the degree of the relationship between linearly related variables. So, we have experimented with both datasets to find the top parameters for effective machine learning performance. Tables 5 and 6 present the correlation score for the OpenNebula and KVM datasets.
Pearson correlation scores for parameter pairs from the OpenNebula dataset
Sr No | Pearson correlation | Parameter pair | |
---|---|---|---|
1 | 0.99 | DISKRDBYTES | DISKRDIOPS |
2 | 0.969 | DISKRDBYTES | DISKWRIOPS |
3 | 0.944 | DISKRDIOPS | DISKWRIOPS |
4 | 0.94 | DISKRDIOPS | NETRX |
5 | 0.937 | DISKRDBYTES | NETRX |
6 | 0.878 | DISKWRIOPS | NETRX |
7 | 0.778 | DISKWRBYTES | NETRX |
8 | 0.722 | DISKRDIOPS | DISKWRBYTES |
9 | 0.678 | DISKRDBYTES | DISKWRBYTES |
10 | 0.574 | DISKWRBYTES | DISKWRIOPS |
11 | 0.31 | NETRX | NETTX |
12 | 0.251 | DISKRDIOPS | NETTX |
13 | 0.211 | DISKRDBYTES | NETTX |
14 | 0.199 | DISKWRIOPS | NETTX |
15 | 0.186 | CPU | DISKWRIOPS |
16 | 0.179 | CPU | DISKRDBYTES |
17 | 0.171 | CPU | DISKRDIOPS |
18 | 0.129 | CPU | NETRX |
19 | 0.126 | DISKWRBYTES | NETTX |
20 | 0.046 | CPU | DISKWRBYTES |
21 | 0.046 | CPU | NETTX |
Pearson correlation scores for parameter pairs from the KVM dataset
Sr No | Pearson correlation | Parameter pair | |
---|---|---|---|
1 | 0.998 | hdard_bytes | hdard_req |
2 | 0.992 | rxbytes | rxpackets |
3 | 0.789 | vdard_bytes | vdard_req |
4 | 0.697 | vdawr_bytes | vdawr_reqs |
5 | 0.654 | timeusr | txpackets |
6 | 0.593 | hdard_bytes | vdard_req |
7 | 0.579 | hdard_req | vdard_req |
8 | 0.575 | txbytes | txpackets |
9 | 0.518 | timesys | txpackets |
10 | 0.376 | timeusr | txbytes |
11 | 0.306 | timesys | txbytes |
12 | 0.303 | timesys | timeusr |
13 | 0.255 | hdard_bytes | vdard_bytes |
14 | 0.248 | hdard_req | vdard_bytes |
15 | 0.166 | vdard_req | vdawr_reqs |
16 | −0.125 | txpackets | vdawr_reqs |
17 | 0.122 | vdard_bytes | vdawr_bytes |
18 | 0.119 | vdard_bytes | vdawr_reqs |
19 | 0.104 | vdard_req | vdawr_bytes |
20 | 0.088 | rxpackets | vdawr_bytes |
21 | 0.085 | timecpu | txpackets |
22 | −0.075 | timeusr | vdawr_reqs |
23 | −0.074 | txbytes | vdawr_reqs |
24 | 0.061 | timecpu | txbytes |
25 | 0.055 | rxpackets | vdawr_reqs |
26 | 0.054 | rxbytes | vdawr_bytes |
27 | −0.052 | timesys | vdawr_reqs |
28 | −0.051 | txpackets | vdawr_bytes |
29 | 0.048 | rxbytes | vdawr_reqs |
30 | 0.039 | timecpu | timesys |
31 | −0.037 | timecpu | vdawr_bytes |
32 | −0.032 | rxpackets | txpackets |
33 | −0.03 | timeusr | vdawr_bytes |
34 | −0.029 | txbytes | vdawr_bytes |
35 | 0.025 | timecpu | timeusr |
36 | −0.019 | rxpackets | txbytes |
37 | −0.018 | timesys | vdawr_bytes |
38 | −0.018 | txpackets | vdard_req |
39 | −0.018 | rxpackets | timeusr |
40 | −0.017 | txpackets | vdard_bytes |
41 | −0.017 | rxbytes | txpackets |
42 | −0.015 | rxpackets | timesys |
43 | −0.013 | rxpackets | timecpu |
44 | 0.011 | timecpu | vdawr_reqs |
45 | −0.01 | txbytes | vdard_req |
46 | −0.01 | rxbytes | txbytes |
47 | −0.01 | txbytes | vdard_bytes |
48 | 0.009 | rxpackets | vdard_bytes |
49 | 0.009 | rxbytes | vdard_bytes |
50 | −0.008 | hdard_req | txpackets |
51 | -0.008 | hdard_bytes | txpackets |
52 | −0.008 | rxbytes | timeusr |
53 | 0.008 | rxpackets | vdard_req |
54 | −0.007 | rxbytes | timesys |
55 | 0.007 | rxbytes | vdard_req |
56 | 0.007 | hdard_req | vdawr_bytes |
57 | 0.006 | hdard_bytes | vdawr_bytes |
58 | −0.006 | timesys | vdard_bytes |
59 | 0.006 | rxbytes | timecpu |
60 | 0.006 | hdard_req | vdawr_reqs |
61 | −0.006 | timeusr | vdard_bytes |
62 | 0.005 | hdard_bytes | vdawr_reqs |
63 | −0.005 | timesys | vdard_req |
64 | −0.005 | hdard_req | txbytes |
65 | −0.005 | hdard_bytes | txbytes |
66 | 0.004 | timecpu | vdard_bytes |
67 | −0.003 | hdard_bytes | timecpu |
68 | −0.003 | hdard_req | timecpu |
69 | −0.002 | hdard_req | rxpackets |
70 | 0.002 | timecpu | vdard_req |
71 | 0.002 | hdard_bytes | timeusr |
72 | -0.002 | hdard_bytes | rxpackets |
73 | −0.002 | timeusr | vdard_req |
74 | 0.002 | hdard_req | timeusr |
75 | −0.001 | hdard_req | rxbytes |
76 | −0.001 | hdard_bytes | rxbytes |
From the results presented in Table 5, we have selected seven parameters for further experiments using the OpenNebula dataset, and referring to Table 6, we have considered eight parameters for the KVM dataset. This parameter selection is based on high-decorrelated features with high information gain. The distribution of these features is explained in Section 6.6.
6.6 Distributions
The simplest form of Gaussian distribution is the one-dimensional standard Gaussian distribution [25]. All the selected parameters of the dataset show Gaussian distribution. Box plot is another way to discover any anomalies in the dataset. Figures 5 and 6 represent the distribution and box plot of the top features that were selected from the earlier stage. The distribution happens to be a Gaussian distribution. Figure 5 shows the results for the OpenNebula dataset, and Figure 6 shows the results for the KVM dataset.

Distribution and box plot of OpenNebula dataset.

Distribution and box plot of KVM dataset.
6.7 Evaluation results
The OpenNebula dataset [31] and KVM dataset [32] have a total of 4869 and 9610 records, respectively. By using 10% of the fixed sampling method, we have trained SVM, Naïve Bayes, Tree, and Random forest models. The parameter setting, which gives better performance, is selected through extensive empirical experimentation as shown in Table 7. The focus of the research is not to modify any of these classification models. However, the proposed modified feature set used to train the models has revealed better classification results ultimately improving the performance of the AI agent in the proposed system.
Parameter setting for classification models
Model | Parameter setting for KVM dataset | Parameter setting for OpenNebula dataset |
---|---|---|
Random forest | Train:test split = 10%:90%, no of features = 8, no of trees = 8, growth control = do not split subset smaller than 5 | Train:test split = 10%:90%, no of features = 7, no of trees = 7, growth control = do not split subset smaller than 5 |
kNN | Train:test split = 10%:90%, no of features = 8, no of neighbours = 5, metric = uclidean, weight = uniform | Train:test split = 10%:90%, no of features = 7, no of neighbours = 5, metric = uclidean, weight = uniform |
SVM | Train:test split = 10%:90%, no of features = 8, SVM type: SVM, C = 1.0, ε = 0.1, Kernel: RBF, exp(−0.35|x − y|²), Numerical tolerance: 0.001, Iteration limit: unlimited | Train:test split = 10%:90%, no of features = 7, SVM type: SVM, C = 1.0, ε = 0.1, Kernel: RBF, exp(−0.35|x − y|²), Numerical tolerance: 0.001, Iteration limit: unlimited |
The proposed system is tested with training data sampling of 10, 20, 30, 40, 50, 60, and 70%. The proposed agent was trained using SVM, Naïve Bayes, Tree, and random forest models out of which the random forest model gives better classification accuracy in almost every training iteration from 10% to 70%. The data exhibit a class imbalance problem. However, without using any explicit control over the data balance ratio, the Naïve Bayes provides better overall classification accuracy for both the OpenNebula and KVM datasets. Because of the class imbalance problem, we have considered sensitivity and specificity to measure the performance through the evaluation results. Table 8 presents the classification evaluation results for the OpenNebula dataset, and Table 9 shows the classification evaluation results for the KVM dataset. Figures 7 and 8 show the respective graphical depiction of the results for OpenNebula and KVM.
Classification evaluation results for OpenNebula dataset
Model | AUC | CA | F1 | Precision | Recall |
---|---|---|---|---|---|
Random forest | 0.984 | 0.987 | 0.987 | 0.987 | 0.987 |
Tree | 0.942 | 0.975 | 0.974 | 0.975 | 0.975 |
kNN (Mahalanobis distance) | 0.931 | 0.921 | 0.922 | 0.925 | 0.921 |
Naive Bayes | 0.974 | 0.914 | 0.915 | 0.917 | 0.914 |
SVM | 0.74 | 0.874 | 0.855 | 0.891 | 0.874 |
Logistic regression (Lasso) | 0.498 | 0.764 | 0.661 | 0.583 | 0.764 |
Classification evaluation results for KVM dataset
Model | AUC | CA | F1 | Precision | Recall |
---|---|---|---|---|---|
Random forest | 0.998 | 0.998 | 0.996 | 0.998 | 0.994 |
Logistic regression (Ridge) | 0.959 | 0.889 | 0.701 | 0.995 | 0.541 |
SVM | 0.997 | 0.949 | 0.883 | 0.995 | 0.793 |
Logistic regression (Lasso) | 0.995 | 0.992 | 0.983 | 0.995 | 0.971 |
kNN (Mahalanobis distance) | 0.996 | 0.995 | 0.991 | 0.993 | 0.988 |

Classification evaluation for KVM dataset.

Classification evaluation for KVM dataset.
Extensive experimentation shows that the random forest classifier gives the best classification accuracy, sensitivity, and specificity combination altogether. Random forest handles the imbalance of the dataset well. Upon increasing the training data sample size, the AI agent shows improvement in the performance parameters as shown in Figure 9.

Graph showing the comparison of evaluation parameters for classification using KVM dataset.
6.8 Storage efficiency
The evidence repository module saves only the evidence data which is generated during the attack. This reduces the space required to preserve volatile evidence. For the experiment we carried out, we implemented the volatile data dump module which generated around 360 VM memory dump images with an average size of 1 GB each. These data files are compressed to a 79.5 GB memory evidence dataset. Of these preserved and stored memory dump datasets, 79 files of size 17.3 GB were generated during the attack. This means that only 21.76% of the data (in size) is potential evidence, and this can improve the storage and the processing time during the forensics investigation. However, these statistics only depict the current scenario of the simulation and are subject to change as per the duration of the attack. The results of our experiment regarding the storage requirement are presented in Table 10.
Improvement in the storage efficiency
Description | Number of memory dumps | Size in GB |
---|---|---|
Data generated in cloud | 360 | 281 |
Data generated during attack (potential evidence) | 78 | 60 |
7 Conclusion and future scope
A traditional cloud forensics process involves manual detection and acquisition of evidence. In this article, we proposed a system equipped with an AI agent that automates evidence detection and acquisition. We have modelled the AI agent. The system classifies the data generated in the cloud as evidence or non-evidence data by monitoring and profiling the VM at the hypervisor level. We propose a novel feature-set of KVM-based hypervisor monitoring, which is highly de-correlated, giving better classification. The hypervisor-level monitoring ensures minimum privacy intrusion to a naïve user’s VM and decreases the chances of malicious activities escaping the monitoring. The acquired evidence is stored using a simple blockchain mechanism to safeguard the evidence from tampering.
The proposed system is implemented and evaluated on the KVM private cloud platform. The evaluation results show that our system reduces the size requirement of the evidence repository by using AI agents and selective evidence preservation. The process provenance enhances the trustworthiness of the chain of custody for cloud forensics.
The current scope of the research presented in this article is limited to the acquisition of memory evidence. Future work can be undertaken to extend the scope to disk and network artefact acquisition using the Libvirt library. Autosklearn [35] can also be integrated to develop a cloud forensics tool that will aid the AI agent to select classification models and hyperparameter tuning for better results. Our next work is focused on developing a web-based cloud forensics tool that will include the proposed AI agent model for memory, disk, and network evidence detection and acquisition.
-
Conflict of interest: Authors state no conflict of interest.
-
Data availability statement: The datasets generated and analysed during the current study are available in the IEEE Dataport repository, “https://ieee-dataport.org/open-access/evidence-detection-cloud-forensics.” DOI: https://dx.doi.org/10.21227/2yr5-7z67.
References
[1] Gartner. Gartner forecasts worldwide public cloud revenue to grow 17.5 percent in 2019. [Online]. 2019. https://www.gartner.com/en/newsroom/press-releases/2019-04-02-gartner-forecasts-worldwide-public-cloud-revenue-to-gSearch in Google Scholar
[2] M. M. H. Onik, C. S. Kim, N. Y. Lee, and J. Yang, “Privacy-aware blockchain for personal data sharing and tracking,” Open. Computer Sci., vol. 9, pp. 80–91, 2019.10.1515/comp-2019-0005Search in Google Scholar
[3] D. Quick and K.-K. R. Choo, “IoT device forensics and data reduction,” IEEE Access, vol. 6, pp. 47566–47574, 2018.10.1109/ACCESS.2018.2867466Search in Google Scholar
[4] Y. Wu, Z. Zhang, C. Wu, C. Guo, Z. Li, and F. Lau, “Orchestrating bulk data transfers across geo-distributed datacenters,” IEEE Trans. Cloud Comput., vol. 5, no. 1, pp. 112–125, 2015.10.1109/TCC.2015.2389842Search in Google Scholar
[5] A. Aldribi, I. Traoré, B. Moa, and O. Nwamuo, “Hypervisor-based cloud intrusion detection through online multivariate statistical change tracking,” Computers Sec., vol. 88, p. 101646, 2020.10.1016/j.cose.2019.101646Search in Google Scholar
[6] K. Shaukat, S. Luo, V. Varadharajan, I. A. Hameed, and M. Xu, “A survey on machine learning techniques for cyber security in the last decade,” IEEE Access, vol. 8, pp. 222310–222354, 2020.10.1109/ACCESS.2020.3041951Search in Google Scholar
[7] Trustwave global Security Report 2015. https://www2.trustwave.com/rs/815-RFM-693/images/2015_TrustwaveGlobalSecurityReport.pdf.Search in Google Scholar
[8] A. T. Lo’ai and G. Saldamli, “Reconsidering big data security and privacy in cloud and mobile cloud systems,” J. King Saud. Univ-Computer Inf. Sci., vol. 33, no. 7, pp. 810–819, 2021.10.1016/j.jksuci.2019.05.007Search in Google Scholar
[9] Z. Inayat, A. Gani, N. B. Anuar, S. Anwar, and M. K. Khan, “Cloud-based intrusion detection and response system: open research issues, and solutions,” Arab. J. Sci. Eng., vol. 42, pp. 399–423, 2017.10.1007/s13369-016-2400-3Search in Google Scholar
[10] M. Compastié, R. Badonnel, O. Festor, and R. Hev “From virtualization security issues to cloud protection opportunities: An in-depth analysis of system virtualization models,” Computers Sec., vol. 97, p. 101905, 2020.10.1016/j.cose.2020.101905Search in Google Scholar
[11] G. Meera, B. K. S. P. Kumar Raju Alluri, and G. Geethakumari, “CEAT: a cloud evidence acquisition tool for aiding forensic investigations in cloud systems,” Int. J. Trust. Manag. Comput. Commun., vol. 3, no. 34, pp. 360–372, 2016.10.1504/IJTMCC.2016.084562Search in Google Scholar
[12] D. Gonzales, J. M. Kaplan, E. Saltzman, Z. Winkelman, and D. Woods, “Cloud-trust – A security assessment model for infrastructure as a service (IaaS) clouds,” IEEE Trans. Cloud Comput., vol. 5, no. 3, pp. 523–536, 2015.10.1109/TCC.2015.2415794Search in Google Scholar
[13] A. Atamli, G. Petracca, and J. Crowcroft, “IO-Trust: an out-of-band trusted memory acquisition for intrusion detection and forensics investigations in cloud IOMMU based systems,” Proceedings of the 14th International Conference on Availability, Reliability and Security, 2019.10.1145/3339252.3340511Search in Google Scholar
[14] Z. Qi, C. Xiang, R. Ma, J. Li, H. Guan, and D. S. L. Wei. “ForenVisor: A tool for acquiring and preserving reliable data in cloud live forensics,” IEEE Trans. Cloud Comput., vol. 5, no. 3, pp. 443–456, 2016.10.1109/TCC.2016.2535295Search in Google Scholar
[15] L. A. Holt and M. Hammoudeh, “Cloud forensics: A technical approach to virtual machine acquisition,” 2013 European Intelligence and Security Informatics Conference, IEEE, 2013.10.1109/EISIC.2013.59Search in Google Scholar
[16] S. Zawoad, R. Hasan, and A. Skjellum, “OCF: an open cloud forensics model for reliable digital forensics,” 2015 IEEE 8th International Conference on Cloud Computing, IEEE, 2015.10.1109/CLOUD.2015.65Search in Google Scholar
[17] S. Simou, C. Kalloniatis, H. Mouratidis, and S. Gritzalis, A Meta-model for Assisting a Cloud Forensics Process, vol. 9572, Springer Verlag, 2016, pp. 177–187.Search in Google Scholar
[18] S. Khan, A. Gani, AWA Wahab, M. A. Bagiwa, M. Shiraz, S. U. Khan, et al., “Cloud log forensics: Foundations, state of the art, and future directions,” ACM Comput. Surv. (CSUR), vol. 49, no. 1, pp. 1–42, 2016.10.1145/2906149Search in Google Scholar
[19] P. Purnaye and V. Kulkarni, “A Comprehensive study of cloud forensics,” Arch. Comput. Methods Eng., vol. 29, pp. 1–14, 2021.10.1007/s11831-021-09575-wSearch in Google Scholar
[20] W. D. Ashley, Foundations of Libvirt Development, New york: Apress, 2019.10.1007/978-1-4842-4862-1Search in Google Scholar
[21] S. Russell, P. Norvig, and E. Davis, Artificial Intelligence: A Modern Approach, 3rd ed., Upper Saddle River, NJ, Prentice Hall, 2010. Print.Search in Google Scholar
[22] N. Rakotondravony, B. Taubmann, W. Mandarawi, E. Weishäupl, P. Xu, B. Kolosnjaji, et al., “Classifying malware attacks in IaaS cloud environments,” J. Cloud Comput., vol. 6, pp. 1–12, 2017.10.1186/s13677-017-0098-8Search in Google Scholar
[23] G. Guo, H. Wang, D. Bell, Y. Bi, and K. Greer, et al., “KNN model-based approach in classification.” OTM Confederated International Conferences “On the Move to Meaningful Internet Systems,” Berlin, Heidelberg, Springer, pp. 986–996, 2003.10.1007/978-3-540-39964-3_62Search in Google Scholar
[24] A. Hamoudzadeh and S. Behzadi, “Predicting user’s next location using machine learning algorithms,” Spat. Inf. Res., vol. 29, pp. 379–387, 2021.10.1007/s41324-020-00358-2Search in Google Scholar
[25] H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining, Vol. 454, New York, NY, Springer Science & Business Media, 2012.Search in Google Scholar
[26] V. Y. Kulkarni, P. K. Sinha, and M. C. Petare, “Weighted hybrid decision tree model for random forest classifier,” J. Inst. Eng. (India): Ser. B, vol. 97, pp. 209–217, 2016.10.1007/s40031-014-0176-ySearch in Google Scholar
[27] W.-Z. Zhang, H.-C. Xie, and C.-H. Hsu, “Automatic memory control of multiple virtual machines on a consolidated server,” IEEE Trans. Cloud Comput., vol. 5, no. 1, pp. 2–14, 2015.10.1109/TCC.2014.2378794Search in Google Scholar
[28] M. Pourvahab and G. Ekbatanifard, “Digital forensics architecture for evidence collection and provenance preservation in iaas cloud environment using sdn and blockchain technology,” IEEE Access, vol. 7, pp. 153349–153364, 2019.10.1109/ACCESS.2019.2946978Search in Google Scholar
[29] K. DeviPriya and S. Lingamgunta, “Multi factor two-way hash-based authentication in cloud computing,” Int. J. Cloud Appl. Comput. (IJCAC), vol. 10.2, pp. 56–76, 2020.10.4018/IJCAC.2020040104Search in Google Scholar
[30] P. Kalyanaraman, K. R. Jothi, P. Balakrishnan, R. G. Navya, A. Shah, and V. Pandey, “Implementing hadoop container migrations in OpenNebula private Cloud Environment,” Role Edge Analytics Sustainable Smart City Development: Challenges and Solutions. USA, Wiley, pp. 85–103, 2020.10.1002/9781119681328.ch5Search in Google Scholar
[31] P. Purnaye and V. Kulkarni, “OpenNebula virtual machine profiling for intrusion detection system,” IEEE Dataport, 2020, 10.21227/24mb-vt61.Search in Google Scholar
[32] P. Purnaye and V. Kulkarni, “Memory dumps of virtual machines for cloud forensics,” IEEE Dataport, 2020, 10.21227/ft6c-2915.Search in Google Scholar
[33] T. Ito, Y. Kim, and N. Fukuta. I. C. Society, I. A. for Computer Information Science, I. of Electrical, and E. Engineers, 2015 IEEE/ACIS 14th International Conference on Computer and Information Science (ICIS): Proceedings: June 28–July 1, 2015, Las Vegas, USA.Search in Google Scholar
[34] W. Kirch, “Pearson’s correlation coefficient,” Encycl. Public. Health, vol. 1, pp. 1090–1091, 2008.10.1007/978-1-4020-5614-7_2569Search in Google Scholar
[35] F. Fabris and A. A. Freitas, “Analysing the overfit of the auto-sklearn automated machine learning tool,” International Conference on Machine Learning, Optimization, and Data Science, Cham, Springer, 2019.10.1007/978-3-030-37599-7_42Search in Google Scholar
© 2022 Prasad Purnaye and Vrushali Kulkarni, published by De Gruyter
This work is licensed under the Creative Commons Attribution 4.0 International License.