Sparse cluster analysis of large-scale discrete variables with application to single nucleotide polymorphism data

Current extremely large scale genetic data presents significant challenges for cluster analysis. Most existing clustering methods are typically built on Euclidean distance and geared toward analyzing continuous response. They work well for clustering, e.g., microarray gene expression data, but often...

Bibliographic Details
Published in: Journal of applied statistics. - 1991. - 40(2013), issue 2, 1 Feb., pages 358-367
Main author: Wu, Baolin (author)
Format: Article
Language: English
Published: 2013
Access to parent work: Journal of applied statistics
Subjects: Journal Article; Clustering; Expectation-Maximization algorithm; K-means; Lasso; Latent class model; Principal components; Single nucleotide polymorphism; Sparse clustering
LEADER 01000caa a22002652 4500
001 NLM226104559
003 DE-627
005 20250215043327.0
007 tu
008 231224s2013 xx ||||| 00| ||eng c
028 5 2 |a pubmed25n0753.xml 
035 |a (DE-627)NLM226104559 
035 |a (NLM)23526332 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Wu, Baolin  |e verfasserin  |4 aut 
245 1 0 |a Sparse cluster analysis of large-scale discrete variables with application to single nucleotide polymorphism data 
264 1 |c 2013 
336 |a Text  |b txt  |2 rdacontent 
337 |a ohne Hilfsmittel zu benutzen  |b n  |2 rdamedia 
338 |a Band  |b nc  |2 rdacarrier 
500 |a Date Revised 21.10.2021 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Current extremely large scale genetic data presents significant challenges for cluster analysis. Most existing clustering methods are typically built on Euclidean distance and geared toward analyzing continuous response. They work well for clustering, e.g., microarray gene expression data, but often perform poorly for clustering, e.g., large scale single nucleotide polymorphism data. In this paper, we study the penalized latent class model for clustering extremely large scale discrete data. The penalized latent class model takes into account the discrete nature of the response using appropriate generalized linear models and adopts the lasso penalized likelihood approach for simultaneous model estimation and selection of important covariates. We develop very efficient numerical algorithms for model estimation based on the iterative coordinate descent approach and further develop the Expectation-Maximization algorithm to incorporate and model missing values. We use simulation studies and applications to the international HapMap single nucleotide polymorphism data to illustrate the competitive performance of the penalized latent class model 
650 4 |a Journal Article 
650 4 |a Clustering 
650 4 |a Expectation-Maximization algorithm 
650 4 |a K-means 
650 4 |a Lasso 
650 4 |a Latent class model 
650 4 |a Principal components 
650 4 |a Single nucleotide polymorphism 
650 4 |a Sparse clustering 
773 0 8 |i Enthalten in  |t Journal of applied statistics  |d 1991  |g 40(2013), 2 vom: 01. Feb., Seite 358-367  |w (DE-627)NLM098188178  |x 0266-4763  |7 nnns 
773 1 8 |g volume:40  |g year:2013  |g number:2  |g day:01  |g month:02  |g pages:358-367 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 40  |j 2013  |e 2  |b 01  |c 02  |h 358-367