Introduction:


Cloud computing has evolved from the traditional use of cloud platforms as a
single platform for data storage and virtual machines (VMs) to Software as a
Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service
(IaaS). As the number of cloud providers in the market grows, providers face
intense competitive pressure to offer the best prices and complement them with
the best Quality of Service (QoS). QoS depends on factors such as latency,
acceptance rate and reliability. Cloud providers must meet all these
requirements while keeping running costs as low as possible. The pattern of
access to these services varies with the time of access: for a given period the
number of users accessing a service concurrently might be high, while for some
other period it might be minimal. Because of this, static resource allocation
to the VMs can prove ineffective, especially during periods of low resource
utilization. This problem can be addressed by dynamically providing resources
to the VMs as needed, or by predicting each VM's behavior. Dynamic provisioning
on its own, however, proves ineffective. By the time the resources are
allocated to the VMs, the workload on the VMs has already increased and users
are served more slowly. Also, most of the need for the extra resources has
passed by the time they are allocated. This only creates extra creation and
termination of VMs. A better approach to load prediction is to apply machine
learning to historical data, since history can prove to be an effective
indicator of the future. The lack of production data on cloud server workloads
is now somewhat addressed by the data Microsoft has published for its Azure
platform. Instead of running the machine learning online on the VMs, it can be
done offline on the client side. The results of this offline learning can then
be applied online to allocate resources correctly. The data available from
these predictions can help in deciding which VMs can be safely oversubscribed.
Health management and maintenance of the VMs can also be done without
explicitly bringing the VMs offline. This prediction data is especially
important during migration, which involves high data volumes and a higher
allocation of resources. Resource Central is a large-scale example of a machine
learning implementation that produces, stores and uses such predictions. The
prediction models are kept small enough that they can run efficiently on the
client machine, allowing for offline predictions. This model was applied to
Azure's VM scheduler, which selects a physical server for each new VM. Using
the predictions, the VM scheduler's server selection was



Characteristics of VM:

The characteristics that can be recorded and used for prediction are workload,
distribution of size, lifetime, resource consumption, utilization pattern and
deployment size. After recording these characteristics, it becomes clear that
VM characteristics and behavior patterns repeat over multiple lifetimes. VM
workloads can be divided into two main types, first party and third party, each
of which can be further split into Infrastructure as a Service and Platform as
a Service VMs. Workloads from internal VMs, such as research and development
and infrastructure management, as well as first-party services provided to
clients, such as communication, data storage and gaming, are characterized as
first-party workload. On the other hand, VMs created by clients or other third
parties are characterized as third-party workload. By their very nature, only
vague information is usually available about the kind of workload third-party
VMs perform. Thus, direct data for third-party VMs is either unavailable or
available only to a very limited extent. Regardless of its nature, clients can
create workload of any kind, first party or third party. To make the
characterization easier, a group of VMs created by a single customer is called
a VM deployment. The customer deploys the VMs in an area in a data center. Each
VM in a deployment has its own role, depending on the type of service the VM
provides. To understand how each characteristic changes and affects the
workload on resource management, resource usage and workload need to be
compared against each of the workload characteristics mentioned earlier. These
characteristics can be further scrutinized by dividing the workload into first
party and third party, and by looking at the load during maximum resource
utilization. For the study, each subscription is counted as one instance. A
subscription can be a user, an organization or any other entity that places a
logical workload on the resources.
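The grouping described above (VMs belong to deployments, and deployments belong to subscriptions) can be sketched as a small data structure. This is only an illustrative sketch: the `VMRecord` type, its field names and the sample records are hypothetical and do not come from the Azure dataset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VMRecord:
    """Hypothetical record for one VM; field names are assumptions."""
    vm_id: str
    subscription_id: str  # one subscription counts as one instance in the study
    deployment_id: str    # group of VMs created by a single customer
    party: str            # "first" or "third"


def deployment_sizes(vms):
    """Count how many VMs belong to each deployment."""
    sizes = defaultdict(int)
    for vm in vms:
        sizes[vm.deployment_id] += 1
    return dict(sizes)


def deployments_per_subscription(vms):
    """Map each subscription to the set of deployments it owns."""
    owned = defaultdict(set)
    for vm in vms:
        owned[vm.subscription_id].add(vm.deployment_id)
    return dict(owned)


# Made-up sample data, for illustration only.
sample_vms = [
    VMRecord("vm-1", "sub-a", "dep-1", "first"),
    VMRecord("vm-2", "sub-a", "dep-1", "first"),
    VMRecord("vm-3", "sub-a", "dep-2", "first"),
    VMRecord("vm-4", "sub-b", "dep-3", "third"),
]

print(deployment_sizes(sample_vms))
print(deployments_per_subscription(sample_vms))
```

Keeping the subscription and deployment identifiers on every record makes it straightforward to aggregate any per-VM measurement at either level.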

For each characteristic studied, the implications of the results must also be
taken into consideration before applying machine learning prediction to the
cloud system.

VM Type:



As discussed earlier, both first-party and third-party VMs can be split into
IaaS and PaaS types.

Looking at the data, the workload is almost equally split between IaaS and
PaaS. The graph plots the CDF (Cumulative Distribution Function) of the CPU
utilizations of the VMs.
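To make the CDF concrete, the following is a minimal sketch of how an empirical CDF can be built from per-VM average CPU utilizations. The function names and the utilization values are made up for illustration and are not taken from the study.

```python
def empirical_cdf(values):
    """Return (value, cumulative fraction) pairs in ascending order."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


def fraction_at_or_below(values, threshold):
    """Fraction of values that are <= threshold (one point on the CDF)."""
    if not values:
        raise ValueError("need at least one value")
    return sum(1 for v in values if v <= threshold) / len(values)


# Hypothetical average CPU utilizations (percent), one per VM.
avg_cpu = [5.0, 12.0, 18.0, 25.0, 40.0]

for value, cumulative in empirical_cdf(avg_cpu):
    print(f"{value:5.1f}%  ->  {cumulative:.2f}")

print(fraction_at_or_below(avg_cpu, 20.0))
```

Reading a point off the curve answers questions of the form "what share of VMs stays at or below a given utilization".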


P95 Max is the 95th percentile of the maximum CPU utilizations. It is used
because the readings for each VM were taken at 5-minute intervals.
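As a rough sketch of how such a value can be computed for one VM, the example below takes the maximum CPU reading of each 5-minute interval and returns their 95th percentile. The nearest-rank percentile method and the sample readings are assumptions made for illustration; the source does not say which percentile definition was used.

```python
def percentile_nearest_rank(values, p):
    """Nearest-rank percentile for an integer p in 1..100 (assumed method)."""
    if not values:
        raise ValueError("need at least one value")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical per-interval maximum CPU utilizations (percent) for one VM:
# mostly quiet, with one moderate and one rare extreme spike.
interval_max_cpu = [10.0] * 18 + [30.0, 90.0]

p95_max = percentile_nearest_rank(interval_max_cpu, 95)
print(p95_max, max(interval_max_cpu))
```

In this made-up series the absolute maximum is dominated by a single spike, whereas the 95th percentile is not, which illustrates why a high percentile can be a steadier summary than the raw maximum.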

Some observations that can be drawn from the graph:

The average usage for the first-party and third-party groups is almost the
same.

P95 Max for the first-party group is higher than that of the third-party group.

Almost 60% of VMs have less than 20% CPU utilization on average.

Although it is not shown in the graph, it was also observed that PaaS VMs
require far more total core hours than IaaS VMs (approximately 85%).
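Core hours can be understood as the number of cores allocated to a VM multiplied by the hours it runs. The sketch below totals core hours per VM type under that assumption; the tuple layout and the sample figures are hypothetical and are not the measurements from the study.

```python
from collections import defaultdict


def core_hours_by_type(vms):
    """Sum cores * lifetime_hours for each VM type.

    `vms` is an iterable of (vm_type, cores, lifetime_hours) tuples,
    a layout assumed here purely for illustration.
    """
    totals = defaultdict(float)
    for vm_type, cores, lifetime_hours in vms:
        totals[vm_type] += cores * lifetime_hours
    return dict(totals)


# Made-up sample: (type, allocated cores, lifetime in hours).
sample = [
    ("PaaS", 4, 10.0),
    ("PaaS", 2, 5.0),
    ("IaaS", 1, 10.0),
]

totals = core_hours_by_type(sample)
grand_total = sum(totals.values())
for vm_type, hours in totals.items():
    print(f"{vm_type}: {hours:.1f} core hours ({hours / grand_total:.0%} of total)")
```

Weighting by core hours rather than by VM count gives larger or longer-running VMs proportionally more influence on the total.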



VMs are more likely to be customer facing (the front end for some service).
These instances require the best possible performance, as any delay would
immediately be noticed by the users.

Since no information of this type is revealed by third-party VMs, more careful
consideration is required when planning for them.

Virtual Resource Usage:



VM resources consist of two main metrics: the storage, which is usually
measured in GB, and the number of active cores dedicated to the VMs,
which provide the

