Explanation of Clustering
Data clustering is the process of dividing data elements into classes or clusters so that items in the same class are as similar as possible, and items in different classes are as dissimilar as possible. Depending on the nature of the data and the purpose for which clustering is being used, different measures of similarity may be used to place items into classes, where the similarity measure controls how the clusters are formed. Some examples of measures that can be used as in clustering include distance, connectivity, and intensity.
In hard clustering, data is divided into distinct clusters, where each data element belongs to exactly one cluster. In fuzzy clustering (also referred to as soft clustering), data elements can belong to more than one cluster, and associated with each element is a set of membership levels. These indicate the strength of the association between that data element and a particular cluster. Fuzzy clustering is a process of assigning these membership levels, and then using them to assign data elements to one or more clusters.
One of the most widely used fuzzy clustering algorithms is the Fuzzy C-Means (FCM) Algorithm (Bezdek 1981). The FCM algorithm attempts to partition a finite collection of n elements into a collection of c fuzzy clusters with respect to some given criterion. Given a finite set of data, the algorithm returns a list of c cluster centres and a partition matrix, where each element uij tells the degree to which element xi belongs to cluster cj . Like the k-means algorithm, the FCM aims to minimize an objective function. The standard function is:
which differs from the k-means objective function by the addition of the membership values uij and the fuzzifier m. The fuzzifier m determines the level of cluster fuzziness. A large m results in smaller memberships uij and hence, fuzzier clusters. In the limit m = 1, the memberships uij converge to 0 or 1, which implies a crisp partitioning. In the absence of experimentation or domain knowledge, m is commonly set to 2. The basic FCM Algorithm, given n data points (x1, . . ., xn) to be clustered, a number of c clusters with (c1, . . ., cc) the center of the clusters, and m the level of cluster fuzziness with,
Read more about this topic: Fuzzy Clustering
Famous quotes containing the words explanation of and/or explanation:
“To develop an empiricist account of science is to depict it as involving a search for truth only about the empirical world, about what is actual and observable.... It must involve throughout a resolute rejection of the demand for an explanation of the regularities in the observable course of nature, by means of truths concerning a reality beyond what is actual and observable, as a demand which plays no role in the scientific enterprise.”
—Bas Van Fraassen (b. 1941)
“What causes adolescents to rebel is not the assertion of authority but the arbitrary use of power, with little explanation of the rules and no involvement in decision-making. . . . Involving the adolescent in decisions doesnt mean that you are giving up your authority. It means acknowledging that the teenager is growing up and has the right to participate in decisions that affect his or her life.”
—Laurence Steinberg (20th century)