Notes from Data Visualisation and D3.js (Communicating with Data) MOOC by Udacity (highly recommended *****)
Visualization for Data Science
Priority 1 for a data scientist is to find a simple solution to the problem and to choose the right tool (chart)
Chart type = visual encoding applied to data types to show relationships
Data Type – continuous vs categorical
Dimension – 1D, 2D, 3D
Chart types
- Small multiples – Tufte – grid of simple graphs (scatter or line) for comparison
- Box plot – Tukey – distribution and quantiles, esp. comparing distribution
- Line plot – Change over time and patterns, usually over equally spaced time intervals
- Bar – Individual values, support comparisons, and can show rankings or deviations
- Pie – part-whole relationship, best suited for one category, poor for comparison
- Stacked bar – part-whole relationship and best suited for showing composition with categories and totals
- Bubbles – how three or more sets of values vary, show correlation
- Map – how two paired sets of values vary, show correlation; radical/dot plot (geo+shape), choropleth (geo+data), cartogram (geo+size)
- Table – summary
- New – sparkline (line without axes), cycle plot (for cycles in data), connected scatter plot (data trends with a tail), violin plot (similar to a box plot)
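A box plot is drawn from a five-number summary, so a rough sketch of that calculation may help. The function names and the sample values are made up for illustration, and the quantile rule (linear interpolation between closest ranks) is an assumption; other tools use slightly different conventions.

```javascript
// Quantile by linear interpolation between the two closest ranks.
// This is one common convention, assumed here for illustration.
function quantile(sorted, p) {
  const pos = (sorted.length - 1) * p;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  const weight = pos - lower;
  return sorted[lower] * (1 - weight) + sorted[upper] * weight;
}

// The five values a box plot encodes: whisker ends, box edges, median line.
function fiveNumberSummary(values) {
  const sorted = [...values].sort((a, b) => a - b);
  return {
    min: sorted[0],
    q1: quantile(sorted, 0.25),
    median: quantile(sorted, 0.5),
    q3: quantile(sorted, 0.75),
    max: sorted[sorted.length - 1],
  };
}

// Hypothetical sample, not course data.
const summary = fiveNumberSummary([7, 1, 3, 9, 5]);
console.log(summary); // { min: 1, q1: 3, median: 5, q3: 7, max: 9 }
```

Computing one summary per group and drawing them side by side is what makes the box plot good for comparing distributions.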
Pre-attentive processing – automatic processes of vision and perception – perceived instantly (colour, movement, form, spatial position)
Use of negative space, redundant coding
COLOUR
“Get it right in black and white” before you begin adding colour
If you use colour – prefer medium hues or pastels, avoid bright colours, use colour to highlight, and check your colours for colour blindness
https://color.adobe.com/create/color-wheel
“Indeed, so difficult and subtle that avoiding catastrophe becomes the first principle in bringing color to information: Above all, do no harm.” Edward Tufte
Gestalt principles – Proximity (Pisa), Similarity (look alike), Figure and Ground (Escher), Continuity (long arm), Closure (perception completes unfinished objects), Simplicity
Chart junk
- Heavy or dark grid lines, unnecessary text, ornamented chart axes, pictures within the graph, shading or 3D perspective
- Data-ink ratio = ink used to describe the data / total ink used in the graphic (high = good)
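A quick worked example of the data-ink ratio, assuming Tufte's usual definition (data ink divided by total ink in the graphic). The ink amounts are hypothetical, e.g. pixel counts, and `dataInkRatio` is just an illustrative name.

```javascript
// Data-ink ratio, assuming Tufte's definition: data ink / total ink.
function dataInkRatio(dataInk, totalInk) {
  if (totalInk <= 0) throw new RangeError("totalInk must be positive");
  return dataInk / totalInk;
}

// Hypothetical ink measurements (e.g. pixel counts), made up for illustration.
const withChartJunk = dataInkRatio(300, 1200); // heavy grid lines, 3D shading
const cleanedUp = dataInkRatio(300, 400);      // same data, chart junk removed

console.log(withChartJunk, cleanedUp); // 0.25 0.75
```

The data ink stays the same in both cases; removing chart junk only shrinks the denominator, which is why the ratio goes up.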
Lie factor (Edward Tufte)
- Lie factor = size of the effect shown in the graphic / size of the effect shown in the data. A lie factor of 1 is ideal (0.95 < lie factor < 1.05 is acceptable). Jitter can be thought of as noise.
- Size of effect in graphic (%) = ((2nd value – 1st value) / 1st value) * 100
- Size of effect in data (%) = ((2nd value – 1st value) / 1st value) * 100
- Graphic (%) / Data (%) = lie factor (more than 1 is very bad)
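The lie factor arithmetic above can be sketched in a few lines. The function names and the numbers are illustrative assumptions (a bar that grows from 0.6 to 5.3 inches while the underlying value grows from 18 to 27.5), not values from the course notes.

```javascript
// Percentage change from the first value to the second.
function percentChange(first, second) {
  return ((second - first) / first) * 100;
}

// Lie factor = effect shown in the graphic / effect present in the data.
function lieFactor(graphic, data) {
  return (
    percentChange(graphic.first, graphic.second) /
    percentChange(data.first, data.second)
  );
}

// Acceptable band from the notes: 0.95 < lie factor < 1.05.
function isHonest(factor) {
  return factor > 0.95 && factor < 1.05;
}

// Illustrative values only.
const factor = lieFactor(
  { first: 0.6, second: 5.3 }, // size drawn in the graphic
  { first: 18, second: 27.5 }  // values in the data
);

console.log(factor.toFixed(2), isHonest(factor)); // 14.84 false
```

Here the graphic shows a change of roughly 783% for a change of roughly 53% in the data, so the graphic exaggerates the effect almost fifteenfold.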
Grammar of graphics (GG)
- Aesthetic (graph), data (tabular)
- Separation of concerns
  - Independently transform data and present data
  - Delegate work and responsibilities
    - Engineer focuses on data manipulation
    - Designer focuses on visual encoding of data
  - Present multiple visual representations of a dataset
- Composable common elements
  - Coordinate system (Cartesian vs. radial/polar)
  - Scales (linear, logarithmic, etc.)
  - Text annotations
  - Shapes (lines, circles, etc.)
  - Data types (categorical, continuous, etc.)
GG pipeline – source, data, variables, algebra, scales, stats, geometry, co-ordinates, aesthetics, render graphic
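One way to picture the pipeline is as a chain of small functions, each handling one stage. This is a plain-JavaScript sketch under stated assumptions, not D3 and not a full grammar-of-graphics implementation: the records, the function names (`sumBy`, `linearScale`, `toBars`, `renderSvg`) and the 200-pixel width are all made up for illustration.

```javascript
// Source / data: hypothetical records, made up for illustration.
const records = [
  { fruit: "apple", qty: 4 },
  { fruit: "pear", qty: 2 },
  { fruit: "apple", qty: 6 },
];

// Variables + stats: total of one variable per category of another.
function sumBy(rows, keyField, valueField) {
  const totals = new Map();
  for (const row of rows) {
    const key = row[keyField];
    totals.set(key, (totals.get(key) || 0) + row[valueField]);
  }
  return [...totals].map(([key, value]) => ({ key, value }));
}

// Scales: map a data value onto a pixel length.
function linearScale([d0, d1], [r0, r1]) {
  return (x) => r0 + ((x - d0) / (d1 - d0)) * (r1 - r0);
}

// Geometry + coordinates: one horizontal bar per category (Cartesian).
function toBars(stats, scale, barHeight = 20) {
  return stats.map((d, i) => ({
    x: 0,
    y: i * barHeight,
    width: scale(d.value),
    height: barHeight - 2,
    label: d.key,
  }));
}

// Render: emit SVG markup as a string.
function renderSvg(bars) {
  const rects = bars.map(
    (b) =>
      `<rect x="${b.x}" y="${b.y}" width="${b.width}" height="${b.height}">` +
      `<title>${b.label}</title></rect>`
  );
  return `<svg xmlns="http://www.w3.org/2000/svg">${rects.join("")}</svg>`;
}

const stats = sumBy(records, "fruit", "qty");
const maxValue = Math.max(...stats.map((d) => d.value));
const scale = linearScale([0, maxValue], [0, 200]);
const svg = renderSvg(toBars(stats, scale));
console.log(svg);
```

Because the data steps (`sumBy`) and the presentation steps (`toBars`, `renderSvg`) do not know about each other, either side can be swapped out, which is the separation of concerns the notes describe.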
D3 – chaining of pipeline
d3.json = loads a data file and returns an array of JavaScript objects
d3.nest = groups data based on particular keys and returns an array of nested objects
d3.scale = converts data to a pixel or color value that can be displayed
d3.layout = applies common transformations on predefined chart objects
d3.selection.append = inserts HTML or SVG elements into a web page
d3.selection.attr = changes a characteristic of an element such as position or fill
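To show what nest-style grouping and method chaining look like without pulling in the library, here is a plain-JavaScript sketch. `nestBy` and `MiniSelection` are hypothetical stand-ins written for these notes, not D3 APIs, and the rows are made up; the real D3 functions do considerably more.

```javascript
// Nest-style grouping: returns [{ key, values }, ...] in first-seen order.
// (Keys keep their original type here; this is only an analogue.)
function nestBy(rows, keyFn) {
  const groups = new Map();
  for (const row of rows) {
    const key = keyFn(row);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  return [...groups].map(([key, values]) => ({ key, values }));
}

// Hypothetical stand-in for a selection. Every method returns a selection,
// which is what makes chaining possible.
class MiniSelection {
  constructor(node) {
    this.node = node;
  }
  // Adds a child element and returns a selection of that new child.
  append(tag) {
    const child = { tag, attrs: {}, children: [] };
    this.node.children.push(child);
    return new MiniSelection(child);
  }
  // Sets an attribute and returns the same selection.
  attr(name, value) {
    this.node.attrs[name] = value;
    return this;
  }
}

// Hypothetical rows, made up for illustration.
const rows = [
  { year: 2010, goals: 3 },
  { year: 2014, goals: 1 },
  { year: 2010, goals: 2 },
];
const nested = nestBy(rows, (d) => d.year);

const root = { tag: "svg", attrs: {}, children: [] };
new MiniSelection(root)
  .attr("width", 300)
  .append("circle") // from here on, the chain targets the circle
  .attr("r", 5)
  .attr("fill", "steelblue");

console.log(nested.length, root.children[0].attrs);
```

Note how `append` changes what the rest of the chain operates on, while `attr` does not; keeping track of that is most of what reading a D3 chain involves.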