Charting EDA : Characterizing Interactive Visualization Use in Computational Notebooks with a Mixed-Methods Formalism

Interactive visualizations are powerful tools for Exploratory Data Analysis (EDA), but how do they affect the observations analysts make about their data? We conducted a qualitative experiment with 13 professional data scientists analyzing two datasets with Jupyter notebooks, collecting a rich datas...


Bibliographic details
Published in: IEEE transactions on visualization and computer graphics. - 1996. - PP(2024), 10 Oct.
First author: Wootton, Dylan (author)
Other authors: Fox, Amy Rae, Peck, Evan, Satyanarayan, Arvind
Format: Online article
Language: English
Published: 2024
Access to the parent work: IEEE transactions on visualization and computer graphics
Subjects: Journal Article
LEADER 01000naa a22002652 4500
001 NLM378739913
003 DE-627
005 20241011232848.0
007 cr uuu---uuuuu
008 241011s2024 xx |||||o 00| ||eng c
024 7 |a 10.1109/TVCG.2024.3456217  |2 doi 
028 5 2 |a pubmed24n1564.xml 
035 |a (DE-627)NLM378739913 
035 |a (NLM)39388331 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Wootton, Dylan  |e verfasserin  |4 aut 
245 1 0 |a Charting EDA  |b Characterizing Interactive Visualization Use in Computational Notebooks with a Mixed-Methods Formalism 
264 1 |c 2024 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 11.10.2024 
500 |a published: Print-Electronic 
500 |a Citation Status Publisher 
520 |a Interactive visualizations are powerful tools for Exploratory Data Analysis (EDA), but how do they affect the observations analysts make about their data? We conducted a qualitative experiment with 13 professional data scientists analyzing two datasets with Jupyter notebooks, collecting a rich dataset of interaction traces and think-aloud utterances. By qualitatively coding participant utterances, we introduce a formalism that describes EDA as a sequence of analysis states, where each state is comprised of either a representation an analyst constructs (e.g., the output of a data frame, an interactive visualization, etc.) or an observation the analyst makes (e.g., about missing data, the relationship between variables, etc.). By applying our formalism to our dataset, we identify that interactive visualizations, on average, lead to earlier and more complex insights about relationships between dataset attributes compared to static visualizations. Moreover, by calculating metrics such as revisit count and representational diversity, we uncover that some representations serve more as "planning aids" during EDA rather than tools strictly for hypothesis-answering. We show how these measures help identify other patterns of analysis behavior, such as the "80-20 rule", where a small subset of representations drove the majority of observations. Based on these findings, we offer design guidelines for interactive exploratory analysis tooling and reflect on future directions for studying the role that visualizations play in EDA.
650 4 |a Journal Article 
700 1 |a Fox, Amy Rae  |e verfasserin  |4 aut 
700 1 |a Peck, Evan  |e verfasserin  |4 aut 
700 1 |a Satyanarayan, Arvind  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on visualization and computer graphics  |d 1996  |g PP(2024) vom: 10. Okt.  |w (DE-627)NLM098269445  |x 1941-0506  |7 nnns 
773 1 8 |g volume:PP  |g year:2024  |g day:10  |g month:10 
856 4 0 |u http://dx.doi.org/10.1109/TVCG.2024.3456217  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d PP  |j 2024  |b 10  |c 10