Foundational Principles for Large-Scale Inference : Illustrations Through Correlation Mining

When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, a...

Bibliographic Details
Published in: Proceedings of the IEEE. Institute of Electrical and Electronics Engineers. - 1998. - 104(2016), no. 1, 14 Jan., pp. 93-110
Main author: Hero, Alfred O (author)
Other authors: Rajaratnam, Bala
Format: Article
Language: English
Published: 2016
Access to parent work: Proceedings of the IEEE. Institute of Electrical and Electronics Engineers
Subjects: Journal Article; Big Data; asymptotic regimes; correlation estimation; correlation mining; correlation screening; correlation selection; graphical models; large scale inference; purely high dimensional; sample complexity; triple asymptotic framework; unifying learning theory
LEADER 01000caa a22002652 4500
001 NLM259490660
003 DE-627
005 20250220004040.0
007 tu
008 231224s2016 xx ||||| 00| ||eng c
028 5 2 |a pubmed25n0864.xml 
035 |a (DE-627)NLM259490660 
035 |a (NLM)27087700 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Hero, Alfred O  |e verfasserin  |4 aut 
245 1 0 |a Foundational Principles for Large-Scale Inference  |b Illustrations Through Correlation Mining 
264 1 |c 2016 
336 |a Text  |b txt  |2 rdacontent 
337 |a ohne Hilfsmittel zu benutzen  |b n  |2 rdamedia 
338 |a Band  |b nc  |2 rdacarrier 
500 |a Date Revised 25.03.2024 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for "Big Data". Sample complexity however has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exascale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks. 
650 4 |a Journal Article 
650 4 |a Big Data 
650 4 |a asymptotic regimes 
650 4 |a correlation estimation 
650 4 |a correlation mining 
650 4 |a correlation screening 
650 4 |a correlation selection 
650 4 |a graphical models 
650 4 |a large scale inference 
650 4 |a purely high dimensional 
650 4 |a sample complexity 
650 4 |a triple asymptotic framework 
650 4 |a unifying learning theory 
700 1 |a Rajaratnam, Bala  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t Proceedings of the IEEE. Institute of Electrical and Electronics Engineers  |d 1998  |g 104(2016), 1 vom: 14. Jan., Seite 93-110  |w (DE-627)NLM098145274  |x 0018-9219  |7 nnns 
773 1 8 |g volume:104  |g year:2016  |g number:1  |g day:14  |g month:01  |g pages:93-110 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 104  |j 2016  |e 1  |b 14  |c 01  |h 93-110