Classification, in its widest sense, has to do with forms of relatedness and with the organization and display of those relations in a useful manner. The items to be studied could be anything: people, bacteria, religions, books, *etc.* The attributes in each case would be those features of the items that are of interest for the purpose of the study [1]. Classifications are generally pictured in the form of hierarchical trees, also called dendrograms. A dendrogram is the graphical representation of an ultrametric (= cophenetic) matrix, so dendrograms can be compared to one another by comparing their cophenetic matrices [2].

Cluster Analysis (CA), Principal Components Analysis (PCA) and Discriminant Analysis (DA) are three of the primary methods of modern multivariate analysis. Because of its utility, clustering has emerged as one of the leading methods of multivariate analysis [3].

Cluster analysis is a multivariate statistical technique which was originally developed for biological classification. Biologists Robert Sokal and Peter Sneath published their seminal text ‘*Principles of Numerical Taxonomy*’ in 1963. Sokal and Sneath demonstrated that cluster analysis could be used to efficiently classify a data set which contained all relevant characteristics of an organism. Once the organisms had been classified on the basis of these characteristics, it could be determined in which way they differed and whether they belonged to different species. In this way, Sokal and Sneath asserted, researchers could trace the path of evolution from one species to another [4].

In clustering, two types of measure of cluster ‘goodness’ or quality are distinguished. The first type allows different sets of clusters to be compared without reference to external knowledge and is called an internal quality measure; it serves as a measure of ‘overall similarity’ based on the pairwise similarity of the documents in a cluster. The second type evaluates how well the clustering is working by comparing the groups produced by the clustering techniques with known classes. This type is called an external quality measure and is outside the scope of this study [5].

The joining or tree clustering method uses the dissimilarities (similarities) or distances (Euclidean distance, squared Euclidean distance, city-block (Manhattan) distance, Chebychev distance, power distance, Mahalanobis distance, *etc.*) between objects when forming the clusters. Similarities are a set of rules that serve as criteria for grouping or separating items. These distances (similarities) can be based on a single dimension or multiple dimensions, with each dimension representing a rule or condition for grouping objects. The joining algorithm does not ‘care’ whether the distances that are ‘fed’ to it are actual real distances, or some other derived measure of distance that is more meaningful to the researcher; and it is up to the researcher to select the right method for his/her specific application [6].
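As a rough illustration of how some of the distance measures listed above differ, the following sketch computes four of them for a pair of points and builds a pairwise distance matrix of the kind a joining algorithm takes as input. The function names and the sample points are assumptions made for this example only; they do not come from any particular statistical package.

```python
import math


def euclidean(a, b):
    """Ordinary Euclidean distance between two points."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def squared_euclidean(a, b):
    """Squared Euclidean distance; gives more weight to distant objects."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def city_block(a, b):
    """City-block (Manhattan) distance: sum of absolute differences."""
    return sum(abs(u - v) for u, v in zip(a, b))


def chebychev(a, b):
    """Chebychev distance: largest absolute difference on any dimension."""
    return max(abs(u - v) for u, v in zip(a, b))


def distance_matrix(points, metric):
    """Symmetric matrix of pairwise distances under the chosen metric."""
    n = len(points)
    return [[metric(points[i], points[j]) for j in range(n)] for i in range(n)]


# Hypothetical sample data: three objects measured on two dimensions.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = distance_matrix(points, euclidean)
```

Passing a different `metric` to `distance_matrix` changes the input to the clustering step without changing the clustering algorithm itself, which is the sense in which the joining algorithm does not ‘care’ how the distances were obtained.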

The next step is to identify how one can find the natural clusters among items characterized by many attributes. A number of cluster analysis procedures (single linkage (nearest neighbor), complete linkage (furthest neighbor), unweighted pair-group average (UPGMA), weighted pair-group average (WPGMA), unweighted pair-group centroid (UPGMC), weighted pair-group centroid (median), Ward’s method, *etc.*) are available; many of these begin with an *n*-dimensional space in which each entity is represented by a single point. The dimensions of the space represent the characteristics upon which the entities are to be compared. Similarity between entities can be measured by: (1) the correlation of the entities’ scores on the dimensions (cophenetic correlation) or (2) the distance between points in the space (points closest to each other are most similar) [7, 8].
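The difference between several of these linkage rules can be sketched with a deliberately naive agglomerative procedure that works on a precomputed distance matrix. This is only an illustrative sketch, not the procedure used in this study: the function name `agglomerate`, the method labels and the toy data are assumptions, and the centroid, median and Ward rules are omitted because they are usually defined on squared Euclidean distances.

```python
def agglomerate(D, method="average"):
    """Naive agglomerative clustering on a symmetric distance matrix D.

    Returns a list of merges (members_a, members_b, height), in the
    order in which the clusters were joined.
    """
    clusters = {i: [i] for i in range(len(D))}
    # Between-cluster distances, keyed by (smaller id, larger id).
    dist = {(i, j): D[i][j] for i in clusters for j in clusters if i < j}
    merges = []
    next_id = len(D)
    while len(clusters) > 1:
        # Join the two clusters that are currently closest.
        (a, b), height = min(dist.items(), key=lambda kv: kv[1])
        merges.append((sorted(clusters[a]), sorted(clusters[b]), height))
        na, nb = len(clusters[a]), len(clusters[b])
        updated = {}
        for k in clusters:
            if k in (a, b):
                continue
            dak = dist[(min(a, k), max(a, k))]
            dbk = dist[(min(b, k), max(b, k))]
            if method == "single":        # nearest neighbor
                d = min(dak, dbk)
            elif method == "complete":    # furthest neighbor
                d = max(dak, dbk)
            elif method == "average":     # UPGMA: weighted by cluster size
                d = (na * dak + nb * dbk) / (na + nb)
            elif method == "weighted":    # WPGMA: plain mean of the two
                d = (dak + dbk) / 2
            else:
                raise ValueError("unknown method: " + method)
            updated[k] = d
        # Drop all distances involving the two merged clusters.
        dist = {p: v for p, v in dist.items() if a not in p and b not in p}
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        for k, d in updated.items():
            dist[(k, next_id)] = d        # k is always smaller than next_id
        next_id += 1
    return merges


# Hypothetical toy data: three objects at positions 0, 1 and 5 on a line.
D = [[0.0, 1.0, 5.0],
     [1.0, 0.0, 4.0],
     [5.0, 4.0, 0.0]]
single_merges = agglomerate(D, "single")
complete_merges = agglomerate(D, "complete")
average_merges = agglomerate(D, "average")
```

All three rules join objects 0 and 1 first; they differ only in the height at which the third object is attached, which is where the choice of linkage starts to shape the dendrogram.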

Suppose that the original data $\{X_i\}$ have been modeled using a cluster method to produce a dendrogram $\{T_i\}$; that is, a simplified model in which data that are ‘close’ have been grouped into a hierarchical tree. Define the following distance measures.

$x(i,j)=|X_i-X_j|$, the ordinary Euclidean distance between the *i*th and *j*th observations.

$t(i,j)=$ the dendrogrammatic distance between the model points $T_i$ and $T_j$. This distance is the height of the node at which these two points are first joined together.

Then, letting $\bar{x}$ be the average of the $x(i,j)$, and letting $\bar{t}$ be the average of the $t(i,j)$, the cophenetic correlation coefficient *c* is defined as in (1) [9].

$$c=\frac{\sum_{i<j}\bigl(x(i,j)-\bar{x}\bigr)\bigl(t(i,j)-\bar{t}\bigr)}{\sqrt{\Bigl[\sum_{i<j}\bigl(x(i,j)-\bar{x}\bigr)^{2}\Bigr]\Bigl[\sum_{i<j}\bigl(t(i,j)-\bar{t}\bigr)^{2}\Bigr]}}. \tag{1}$$
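A minimal sketch of how (1) could be evaluated is given below. It assumes that the dendrogram is available as a list of merges, each recording the two groups of observations joined and the height of the joining node; this representation, the function names and the toy data are assumptions made for the illustration.

```python
import math


def cophenetic_matrix(n, merges):
    """t(i, j): height of the node at which i and j are first joined.

    `merges` is a list of (members_a, members_b, height) tuples.
    """
    T = [[0.0] * n for _ in range(n)]
    for left, right, height in merges:
        for i in left:
            for j in right:
                T[i][j] = T[j][i] = height
    return T


def cophenetic_correlation(D, T):
    """Pearson correlation between x(i, j) and t(i, j) over all i < j."""
    n = len(D)
    x = [D[i][j] for i in range(n) for j in range(i + 1, n)]
    t = [T[i][j] for i in range(n) for j in range(i + 1, n)]
    x_bar = sum(x) / len(x)
    t_bar = sum(t) / len(t)
    num = sum((a - x_bar) * (b - t_bar) for a, b in zip(x, t))
    den = math.sqrt(sum((a - x_bar) ** 2 for a in x)
                    * sum((b - t_bar) ** 2 for b in t))
    return num / den


# Hypothetical toy data: three observations at positions 0, 1 and 5 on a
# line, joined by average linkage (0 and 1 at height 1.0, then 2 at 4.5).
D = [[0.0, 1.0, 5.0],
     [1.0, 0.0, 4.0],
     [5.0, 4.0, 0.0]]
merges = [([0], [1], 1.0), ([2], [0, 1], 4.5)]
T = cophenetic_matrix(3, merges)
c = cophenetic_correlation(D, T)
```

Values of `c` close to 1 indicate that the dendrogram heights reproduce the original pairwise distances closely. The coefficient is undefined when either set of distances is constant, a case this sketch does not guard against.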

Since its introduction by Sokal and Rohlf [10], the cophenetic correlation coefficient has been widely used in numerical phenetic studies, both as a measure of degree of fit of a classification to a set of data and as a criterion for evaluating the efficiency of various clustering techniques [11]. In statistics, and especially in biostatistics, cophenetic correlation (more precisely, the cophenetic correlation coefficient) is a measure of how faithfully a dendrogram preserves the pairwise distances between the original unmodeled data points. Although it has been most widely applied in the field of biostatistics (typically to assess cluster-based models of DNA sequences, or other taxonomic models), it can also be used in other fields of inquiry where raw data tend to occur in clumps, or clusters. This coefficient has also been proposed for use as a test for nested clusters [12].

The problem of comparing classifications with numerical methods is not new; the first effective numerical method known to us is the ‘cophenetic correlation’ technique of Sokal and Rohlf [10]. Since the development of cophenetic correlation, methods for the comparison of dendrograms have been the object of strong interest. Baker [13] investigated the impact of observational errors on the dendrograms produced by the complete linkage and single linkage hierarchical grouping techniques. The goodness of fit of the dendrograms was measured by means of the Goodman-Kruskal gamma coefficient. The gamma coefficients indicated that the single linkage grouping technique was more sensitive to the type of data errors employed than the complete linkage technique. Hubert [14] compared two rank orderings of the object pairs. He tested the hypothesis that the given set of proximity values had been assigned randomly by referring the Goodman-Kruskal rank correlation *γ* statistic to an approximate permutation distribution. Kuiper and Fisher [15] compared six hierarchical clustering procedures (single linkage, complete linkage, median, average linkage, centroid and Ward’s method) for multivariate normal data, assuming that the true number of clusters was known. The authors used the Rand index, which gives the proportion of correct groupings, to compare the clustering methods. In their study, Ward’s method and complete linkage performed best for clusters of equal size, whereas centroid and average linkage performed best for very unequal cluster sizes. Blashfield [16] compared four types of hierarchical clustering methods (single linkage, complete linkage, average linkage and Ward’s method) for accuracy in recovering the original population clusters. He used Cohen’s statistic to measure the accuracy of the clustering methods. According to his results, Ward’s method performed significantly better than the other clustering procedures and average linkage gave relatively poor results. According to Milligan [17], complete linkage and Ward’s method reacted badly when outliers were introduced into the simulated data.
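The Goodman-Kruskal gamma coefficient used in [13] and [14] can be sketched as follows for two paired lists of values, for example the input proximities and the corresponding dendrogram distances of the same object pairs. The function name and the sample values are assumptions for this illustration, and the permutation test described by Hubert [14] is not reproduced here.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Gamma = (concordant - discordant) / (concordant + discordant).

    A pair of observations is concordant when x and y order it the same
    way and discordant when they order it oppositely; ties are ignored.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical values: original distances and dendrogram heights for the
# same three object pairs.
original = [1.0, 5.0, 4.0]
dendrogram = [1.0, 4.5, 4.5]
gamma = goodman_kruskal_gamma(original, dendrogram)
```

Because gamma depends only on rank order, it is insensitive to monotone rescaling of either set of distances, in contrast to the cophenetic correlation coefficient.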

Hands and Everitt [18] compared five hierarchical clustering techniques (single linkage, complete linkage, average, centroid, and Ward’s method) on multivariate binary data. They found that Ward’s method performed best overall among the hierarchical methods. Yao [19] discussed six classical clustering algorithms (*k*-means, SOM, EM-based clustering, classification EM clustering, fuzzy *k*-means and leader clustering) and different combination scenarios of these algorithms. He used a count of cluster categories, classification accuracy and cluster entropy. Ferreira and Hitchcock [20] compared the performance of four major hierarchical methods (single linkage, complete linkage, average linkage and Ward’s method) for clustering functional data. They used the Rand index to compare the performance of each clustering method. According to their study, Ward’s method was usually the best, while average linkage performed best in some special situations, in particular when the number of clusters was overspecified. Milligan and Cooper [21] used four agglomerative hierarchical clustering methods to generate partition solutions; the clustering method formed one factor in the overall design. These were the single link, complete link, group average (UPGMA) and Ward’s minimum variance methods. They found that the single link technique was least effective, while the group average and Ward’s methods gave the best overall recovery.
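The Rand index used in [15] and [20] can be illustrated with a short sketch. It counts the proportion of object pairs on which two partitions agree, that is, pairs placed together in both partitions or apart in both. The function name and the example labels are assumptions for this illustration.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Proportion of object pairs on which two partitions agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical example: four objects in two known classes, and a
# clustering that splits the second class into two clusters.
known = [0, 0, 1, 1]
found = [0, 0, 1, 2]
ri = rand_index(known, found)
```

The index depends only on which objects are grouped together, so relabeling the clusters does not change its value.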

In view of the studies in the literature and the importance of using the most suitable clustering method under different conditions (sample size, number of variables and distance measure), a detailed simulation study is undertaken. This study gives more insight into the functioning of the clustering methods under these conditions. The purpose of this research is to identify the best clustering method under each of them.