Title: | Bootstrap a Clustering Solution to Establish the Stability of the Clusters |
---|---|
Description: | Given a cluster allocation for n samples, supplied with either an $n \times p$ data matrix or an $n \times n$ distance matrix, a bootstrap procedure is performed. The proportion of bootstrap replicates in which a pair of samples falls in the same cluster indicates how tightly the samples in a particular cluster group together. |
Authors: | Sugnet Lubbe [aut, cre, cph] |
Maintainer: | Sugnet Lubbe |
License: | MIT + file LICENSE |
Version: | 1.2.2 |
Built: | 2024-11-06 03:10:40 UTC |
Source: | https://github.com/cran/ClusBoot |
Heatmap of the proportion of bootstrap replicates where objects cluster together
boot.proportions( x, col = grDevices::heat.colors(101, rev = TRUE), show.vals = F, text.col = "black", cluster.col = "firebrick", ... )
Argument | Description |
---|---|
x | an object of class clusboot |
col | vector of colours for shading to indicate proportion values |
show.vals | logical value indicating whether proportion values should be added to individual cells |
text.col | colour of the text used for the proportion values when show.vals = TRUE |
cluster.col | colour of lines demarcating cluster membership |
... | more arguments to be passed to |
library(ClusBoot)
out <- clusboot(scale(case.study.psychiatrist), B = 100, k = 6, clustering.func = complete.linkage)
boot.proportions(out)
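The idea behind this heatmap can be sketched in base R: reorder the proportion matrix so that members of the same cluster are adjacent, shade each cell by its proportion, and draw lines between clusters. The function name proportion_heatmap is hypothetical, and ordering objects by cluster only is a simplifying assumption; this is not the package's boot.proportions code.

```r
# Hypothetical base-R sketch of a co-clustering heatmap; not ClusBoot's code.
# ASSUMPTION: objects are ordered by cluster label only.
proportion_heatmap <- function(P, clustering,
                               col = grDevices::heat.colors(101, rev = TRUE),
                               cluster.col = "firebrick") {
  ord <- order(clustering)                 # group objects of the same cluster
  Po <- P[ord, ord]
  n <- nrow(Po)
  # image() draws z[i, j] at (x[i], y[j]); reversing columns puts object 1 on top
  graphics::image(seq_len(n), seq_len(n), Po[, n:1], col = col, zlim = c(0, 1),
                  axes = FALSE, xlab = "", ylab = "")
  sizes <- table(clustering[ord])          # cluster sizes in plotting order
  breaks <- cumsum(sizes)[-length(sizes)] + 0.5
  if (length(breaks) > 0) {
    # vertical and horizontal lines between consecutive clusters
    graphics::abline(v = breaks, h = n - breaks + 1, col = cluster.col)
  }
  invisible(Po)                            # the reordered proportion matrix
}
```

With a clusboot result, a call such as proportion_heatmap(out$proportions, out$clustering) would shade the same quantities, although the object ordering will generally differ from the package's.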
Produces silhouette plots
boot.silhouette(clusboot.out, ...)
Argument | Description |
---|---|
clusboot.out | an object of class clusboot |
... | more arguments to be passed to |
Returns a list of silhouette widths.
library(ClusBoot)
out <- clusboot(scale(case.study.psychiatrist), B = 100, k = 6, clustering.func = complete.linkage)
boot.silhouette(out)
Computes the silhouette values based on the proportion of times items cluster together
calc.silhouette(clusboot.out)
Argument | Description |
---|---|
clusboot.out | an object of class clusboot |
Returns an object of class clusboot.
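The exact formula is not given here, so the following is only one plausible reading, stated as an assumption: treat 1 - P as a dissimilarity between objects and apply Rousseeuw's silhouette, s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean dissimilarity of object i to its own cluster and b(i) the smallest mean dissimilarity to another cluster. The function name prop_silhouette is hypothetical and may not match calc.silhouette numerically.

```r
# Hypothetical sketch: silhouette widths from a bootstrap proportion matrix P.
# ASSUMPTION: dissimilarity = 1 - P, combined with Rousseeuw's silhouette.
prop_silhouette <- function(P, clustering) {
  stopifnot(nrow(P) == length(clustering), length(unique(clustering)) >= 2)
  D <- 1 - P
  s <- numeric(length(clustering))
  for (i in seq_along(clustering)) {
    own <- clustering == clustering[i]
    own[i] <- FALSE
    if (!any(own)) next                    # singleton cluster: s(i) stays 0
    a <- mean(D[i, own])                   # mean dissimilarity to own cluster
    others <- setdiff(unique(clustering), clustering[i])
    # smallest mean dissimilarity to any other cluster
    b <- min(vapply(others, function(k) mean(D[i, clustering == k]), numeric(1)))
    s[i] <- if (max(a, b) > 0) (b - a) / max(a, b) else 0
  }
  s
}
```

Under this reading, objects that always co-cluster with their own cluster and never with others get s(i) = 1, while values near 0 or below flag unstable allocations.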
Presence/absence ratings of 24 psychiatric symptoms in 30 psychiatric inpatients made by an individual psychiatrist. The data have been collected in a case study of an individual psychiatrist to identify his implicit taxonomy.
case.study.psychiatrist
A data frame with 30 observations on the following 28 variables:
Variable | Description |
---|---|
V1 | inappropriate affect, appearance or behavior; binary vector |
V2 | interview belligerence - negativism; binary vector |
V3 | agitation - excitement; binary vector |
V4 | retardation; binary vector |
V5 | lack of emotions; binary vector |
V6 | speech disorganization; binary vector |
V7 | grandiosity; binary vector |
V8 | suspicion - ideas of persecution; binary vector |
V9 | hallucinations - delusions; binary vector |
V10 | overt anger; binary vector |
V11 | depression; binary vector |
V12 | anxiety; binary vector |
V13 | obsession - compulsion; binary vector |
V14 | suicide; binary vector |
V15 | self injury; binary vector |
V16 | somatic concerns; binary vector |
V17 | social isolation; binary vector |
V18 | daily routine impairment; binary vector |
V19 | leisure time impairment; binary vector |
V20 | antisocial impulses or acts; binary vector |
V21 | alcohol abuse; binary vector |
V22 | drug abuse; binary vector |
V23 | disorientation; binary vector |
V24 | memory impairment; binary vector |
V25 | rating on Global Assessment Scale, a 101-point scale for overall severity of psychiatric disturbance; a numeric vector |
V26 | Affective (Affective Disorder or Anxiety Disorder); binary vector |
V27 | Psychotic (Schizophrenic Disorder or Paranoid Disorder); binary vector |
V28 | Substance abuse (Substance Use Disorder or Substance-Induced Disorder); binary vector |
The data set forms part of the International Federation of Classification Societies (IFCS) Cluster Benchmark Data Repository.
Van Mechelen, I., & De Boeck, P. (1989). Implicit taxonomy in psychiatric diagnosis: A case study. Journal of Social and Clinical Psychology, 8, 276-287. https://ifcs.boku.ac.at/repository/data/case_study_psychiatrist/index.html
Performs a bootstrap on a cluster analysis solution
clusboot(datmat, B = 1000, clustering.func = complete.linkage, ...)
Argument | Description |
---|---|
datmat | a data matrix or distance object which will be the input to the clustering function |
B | number of bootstrap replicates |
clustering.func | the function which will perform the clustering and output a vector of cluster memberships |
... | more arguments to be passed to the clustering function |
Any R function performing cluster analysis can be specified in clustering.func, although a wrapper function is typically needed to extract only the vector of cluster memberships from its output. See ?complete.linkage for an example.
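As an illustration of such a wrapper, the hypothetical function kmeans.wrapper below (not part of ClusBoot) runs stats::kmeans and returns only the membership vector. It copies the (X, k) signature of complete.linkage on the assumption that clusboot passes the data as the first argument and forwards k through its ... argument.

```r
# Hypothetical wrapper (not part of ClusBoot): k-means clustering that returns
# only the integer vector of cluster memberships, one entry per row of X.
kmeans.wrapper <- function(X, k) {
  fit <- stats::kmeans(X, centers = k, nstart = 10)  # 10 random starts
  unname(fit$cluster)
}
```

It would then be used as clusboot(scale(case.study.psychiatrist), B = 100, k = 6, clustering.func = kmeans.wrapper). Because kmeans needs coordinates, this wrapper suits a data matrix but not a distance object.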
Should users prefer resampling schemes other than the bootstrap, Hennig (2007) discusses a variety of options, which can be accessed by specifying clustering.func = fpc.clusterboot. In that case the sampling method is specified in the argument bootmethod, and any additional arguments required by the function clusterboot in the package fpc must also be given. Only the resampling facilities of clusterboot are used: the computation of proportions and silhouette widths, and therefore the output object of class clusboot, remain unchanged.
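The bootstrap procedure can be sketched in base R under two stated assumptions: each replicate clusters a sample of n rows drawn with replacement, and the proportion for a pair (i, j) is the number of replicates in which they share a cluster divided by the number of replicates in which both appear. The function boot_coclustering is hypothetical and is not ClusBoot's implementation.

```r
# Hypothetical sketch of bootstrap co-clustering proportions (not ClusBoot's code).
# For each pair (i, j): times clustered together / times both were resampled.
boot_coclustering <- function(X, B, cluster_fun, ...) {
  X <- as.matrix(X)
  n <- nrow(X)
  together <- matrix(0, n, n)
  both_in  <- matrix(0, n, n)
  for (b in seq_len(B)) {
    idx  <- sample(n, n, replace = TRUE)          # nonparametric bootstrap sample
    memb <- cluster_fun(X[idx, , drop = FALSE], ...)
    keep <- !duplicated(idx)                      # one entry per distinct object
    obj  <- idx[keep]
    same <- outer(memb[keep], memb[keep], "==")   # TRUE where clustered together
    both_in[obj, obj]  <- both_in[obj, obj] + 1
    together[obj, obj] <- together[obj, obj] + same
  }
  together / pmax(both_in, 1)                     # 0 for pairs never co-sampled
}
```

Any membership-returning function with a (X, k) signature, such as a complete linkage wrapper built on stats::hclust and stats::cutree, can be passed as cluster_fun.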
Returns an object of class clusboot, which is a list with the following components:
Component | Description |
---|---|
proportions | an n x n matrix whose cell ij contains the proportion of bootstrap replicates in which object i and object j clustered together. |
clustering | a vector of length n containing the cluster membership of the n input objects. |
sil | a vector with one element per cluster containing the bootstrap-silhouette values for the clusters. |
indv.sil | a vector of length n containing the bootstrap-silhouette values for the individual objects. |
sil.order | a vector of length n containing the ordering of the n objects used by the functions |
ave.sil.width | the overall stability of the clustering solution, obtained by averaging over the individual object bootstrap-silhouette values. |
Hennig, C., 2007. Cluster-wise assessment of cluster stability. Computational Statistics & Data Analysis, 52(1), pp.258-271.
Lubbe, S., 2024. Bootstrapping Cluster Analysis Solutions with the R Package ClusBoot. Austrian Journal of Statistics, 53(3), pp.1-19.
library(ClusBoot)
clusboot(scale(case.study.psychiatrist), B = 100, k = 6, clustering.func = complete.linkage)
library(fpc)
clusboot(scale(case.study.psychiatrist), B = 100, k = 6, clustering.func = fpc.clusterboot,
         clustermethod = hclustCBI, method = "complete", bootmethod = "subset", subtuning = 10)
Wrapper function for performing complete linkage clustering
complete.linkage(X, k)
Argument | Description |
---|---|
X | samples x variables data matrix |
k | number of clusters |
Returns a vector of cluster memberships.
library(ClusBoot)
complete.linkage(scale(case.study.psychiatrist), k = 6)
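A wrapper of this kind can plausibly be written with stats::hclust and stats::cutree. The sketch below is an assumption about how such a wrapper might look, not the package's source; complete.linkage.sketch is a hypothetical name, and it additionally accepts a dist object since clusboot allows distance input.

```r
# Hypothetical re-implementation; ClusBoot's complete.linkage may differ.
complete.linkage.sketch <- function(X, k) {
  d <- if (inherits(X, "dist")) X else stats::dist(X)  # Euclidean by default
  tree <- stats::hclust(d, method = "complete")        # complete linkage tree
  stats::cutree(tree, k = k)                           # one label per object
}
```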
Resampling according to the methods discussed in Hennig (2007)
fpc.clusterboot( data, B, distances = (inherits(data, "dist")), bootmethod = "boot", bscompare = TRUE, multipleboot = FALSE, jittertuning = 0.05, noisetuning = c(0.05, 4), subtuning = floor(nrow(data)/2), clustermethod, noisemethod = FALSE, count = TRUE, seed = NULL, datatomatrix = TRUE, ... )
Argument | Description |
---|---|
data | a data matrix or distance object which will be the input to the clustering function |
B | number of bootstrap replicates |
distances | see |
bootmethod | see |
bscompare | see |
multipleboot | see |
jittertuning | see |
noisetuning | see |
subtuning | see |
clustermethod | see |
noisemethod | see |
count | see |
seed | see |
datatomatrix | see |
... | additional arguments to be sent to the function specified in clustermethod |
Returns a list with two components: boot.out contains the computations for clusboot, and out contains the clustering solution of the original data set.
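To show how two of the resampling schemes differ, the hypothetical helper resample_indices contrasts bootmethod = "boot" (n draws with replacement) with bootmethod = "subset" (subtuning distinct objects drawn without replacement, defaulting to floor(n/2) as in the fpc.clusterboot signature). It only illustrates the index sets; it is not fpc's code and does not cover the jitter or noise schemes.

```r
# Hypothetical helper illustrating two resampling schemes (not fpc's code).
resample_indices <- function(n, method = c("boot", "subset"),
                             subtuning = floor(n / 2)) {
  method <- match.arg(method)
  switch(method,
    boot   = sample(n, n, replace = TRUE),  # bootstrap: n draws, duplicates allowed
    subset = sample(n, subtuning)           # subsetting: distinct objects only
  )
}
```

With "subset", every object appears at most once per replicate, so a pair's co-clustering proportion is based only on the replicates in which both objects were drawn.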
MDS plot of similarities given by the proportion of bootstrap replicates where objects cluster together
## S3 method for class 'clusboot'
plot(x, col, show.silhouette = TRUE, ...)
Argument | Description |
---|---|
x | an object of class clusboot |
col | single colour or a vector specifying a colour for each object |
show.silhouette | logical indicating whether the plotting character size should represent the individual silhouette values |
... | more arguments to be passed to |
Returns the matrix of similarities (proportions).
library(ClusBoot)
out <- clusboot(scale(case.study.psychiatrist), B = 100, k = 6, clustering.func = complete.linkage)
plot(out)
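A comparable two-dimensional map can be sketched by converting the proportions to dissimilarities and applying classical (Torgerson) MDS with stats::cmdscale. Both the 1 - P transformation and the choice of classical MDS are assumptions, since the package may use a different transformation or MDS variant; mds_from_proportions is a hypothetical name.

```r
# Hypothetical sketch: 2-D classical MDS of bootstrap co-clustering proportions.
# ASSUMPTION: similarities P are turned into dissimilarities as 1 - P.
mds_from_proportions <- function(P, k = 2) {
  D <- stats::as.dist(1 - P)   # lower triangle of 1 - P as a dist object
  stats::cmdscale(D, k = k)    # n x k matrix of coordinates
}
```

For a clusboot result, xy <- mds_from_proportions(out$proportions) followed by plot(xy, col = out$clustering, pch = 19) gives a rough analogue of plot(out) without the silhouette-scaled symbol sizes.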