Contemporary researchers observe that almost all of humanity, barring a small
percentage, is creating, storing and using data on a very large scale and without
pause. The human race has become data dependent as never before. If infinity had
defined limits, big data would be a synonym for it, or would at least tend towards
it in due course of time. Researchers are developing various methods for analysing
this big data in order to make it as useful as possible. There is a growing demand
to reduce the time taken by the analytics process as far as possible.
Cloud computing is a computing infrastructure model that carries out complex
processing on a massive scale. It eliminates the need to maintain costly computing
hardware and the large space that such hardware requires. The basic aim of the cloud
computing model is to offer processing power, data storage and applications in the
form of services.
Clustering is a powerful big data analytics and prediction technique. The process divides a dataset into groups, called clusters. Elements of the same cluster are as close as possible to one another, and elements of different clusters are as far as possible from one another. Clustering uncovers hidden information in a dataset, and this information is vital for an organization to take the right decisions. For example, in trade and business, clustering helps to identify different groups of customers by analyzing their purchasing patterns and choices. Similarly, clustering helps in categorizing different species of plants and animals according to their various properties. There are many clustering methods for solving different types of problems.
K-means is widely used for clustering. It groups homogeneous objects on the basis of distance and is suited to small datasets. The process takes two inputs: a pre-specified number of clusters and a dataset. A suitable number of clusters for a given dataset is typically found by trial and error. The first step of the algorithm is initialization, in which the initial centres are selected randomly. The second step is classification, which measures the Euclidean distance between each object and these centres; each object is allocated to its closest centre. Then the average of the points in each cluster is calculated, and these averages, or means, become the new centres of the clusters. Classification and averaging are repeated with the new centres. The final step is convergence: the process stops as soon as no points migrate from one cluster to another.
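The K-means steps described above (random initialization, classification by Euclidean distance, recomputation of the means, and the convergence check) can be illustrated with a minimal sketch. It is not a reference implementation: the function name `kmeans`, the parameters `max_iter` and `seed`, and the six sample points are assumptions introduced here for illustration only.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into `k` groups.

    Hypothetical helper for illustration; `max_iter` and `seed` are
    assumed parameters, not part of the description in the text.
    """
    rng = random.Random(seed)
    # Initialization: pick k objects at random as the initial centres.
    centres = rng.sample(points, k)
    labels = [None] * len(points)

    for _ in range(max_iter):
        # Classification: allocate each object to its closest centre,
        # measured by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        # Convergence: stop as soon as no object changes cluster.
        if new_labels == labels:
            break
        labels = new_labels

        # Update: the mean of each cluster becomes its new centre.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = tuple(sum(x) / len(members) for x in zip(*members))

    return labels, centres


# Toy dataset (assumed): two well-separated groups of three points each.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centres = kmeans(data, k=2)
print(labels)
print(centres)
```

For this toy dataset the first three points share one label and the last three share the other; which group receives label 0 depends on the random choice of initial centres. On less clearly separated data, different random starts can lead to different final clusters, which is one reason the number of clusters and the starting centres are usually explored by trial and error.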