In an era where enormous amounts of data are generated from various sources in
different formats, it is necessary to organize this data properly in order to extract
useful knowledge that can be used effectively. Clustering is one of the most effective
and popular techniques for segregating data by abstracting its underlying structure.
It organizes the data either as a grouping of individuals or as a hierarchy of groups.
Clustering has become an important technique for analyzing large amounts of data and is
frequently applied in various domains of engineering and science, including biology,
marketing, psychology, medicine, remote sensing, and computer vision. The representation
of the data produced by cluster analysis is then examined in order to articulate and
justify the grouping. This investigation checks whether the observed clustering agrees
with preconceived ideas and experiments.
Data mining is the domain in which data is retrieved, processed, and transformed
into information. In data mining, clustering is one of the most frequently used forms of
exploratory data analysis; it is the unsupervised classification of patterns
into groups. Clustering divides data into groups on the basis of similarity
and dissimilarity: entities with similar properties are placed in the same group, and
entities with different properties are placed in different groups.
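As a rough illustration of grouping by similarity, the sketch below partitions a few
two-dimensional points with a minimal k-means procedure. The choice of k-means, the
Euclidean distance as the dissimilarity measure, the sample points, and the function
name `kmeans` are all assumptions made for this example; the text above does not
prescribe any particular algorithm.

```python
import math

def kmeans(points, k, iterations=10):
    """Group points into k clusters by Euclidean distance (illustrative sketch)."""
    # Assumption: the first k points act as initial centers. This keeps the
    # example deterministic but is not a robust initialization strategy.
    centers = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster whose center is nearest.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers

# Hypothetical sample data: two visibly separated groups of points.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.9, 8.2), (0.9, 1.1), (8.1, 7.8)]
labels, centers = kmeans(points, k=2)
print(labels)
```

Points that lie close together receive the same label, while points that lie far
apart receive different labels, which is the similarity and dissimilarity criterion
described above.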
In most cases, clusters are formed by exploiting the internal homogeneity and external
separation of the dataset: patterns in the same cluster are similar to one another,
while patterns in different clusters are dissimilar. Data analysis underlies many
computing applications, either in the design phase or as part of their online
operations. Data analysis procedures can be categorized as either exploratory or
confirmatory, depending on the models that are appropriate for the source of the data,
but a key element in both types of procedure is the grouping, or classification, of measurements.