Notes on D3 and data viz design principles

Notes from Udacity's MOOC "Data Visualisation and D3.js (Communicating with Data)" (highly recommended, 5/5)

Choosing a good graph

Visualization for Data Science

P1 for a data scientist is to find a simple solution to the problem and to choose the right tool (chart)

Chart type = a visual encoding applied to data types to show relationships

Data Type – continuous vs categorical

Dimension – 1D, 2D, 3D

Chart types

  • Small multiples (Tufte) – a grid of simple scatter or line graphs for side-by-side comparison
  • Box plot (Tukey) – distribution and quartiles, especially for comparing distributions
  • Line plot – change over time and patterns, usually over equally spaced time intervals
  • Bar – individual values; supports comparisons and can show rankings or deviations
  • Pie – part-whole relationship; best suited to a single category, poor for comparison
  • Stacked bar – part-whole relationship; best suited to showing composition across categories along with totals
  • Bubble – how three or more sets of values vary together (e.g. x, y and bubble size); shows correlation
  • Map – how two sets of values vary; shows correlation. Variants: radical/dot plot (geo + shape), choropleth (geo + data), cartogram (geo + size)
  • Table – summary
  • Newer types – sparkline (a small line chart with no axes), cycle plot (for cycles in data), connected scatter plot (data trends with a tail), violin plot (similar to a box plot)
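A box plot is drawn from a five-number summary (minimum, lower quartile, median, upper quartile, maximum). The sketch below computes that summary. It makes two assumptions. First, quartiles use linear interpolation between closest ranks, the same rule d3.quantile uses (R type 7); other quantile definitions exist. Second, it uses plain min/max whiskers, while many box plots stop the whiskers at 1.5 × IQR. The function names are hypothetical.

```typescript
// Hypothetical helpers: five-number summary for a simple min/max box plot.
// Quantiles use linear interpolation between closest ranks (R type 7).
function quantile(sorted: number[], p: number): number {
  const h = (sorted.length - 1) * p; // fractional rank
  const lo = Math.floor(h);
  const hi = Math.ceil(h);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (h - lo);
}

interface FiveNumberSummary {
  min: number;
  q1: number;
  median: number;
  q3: number;
  max: number;
}

function fiveNumberSummary(values: number[]): FiveNumberSummary {
  if (values.length === 0) throw new Error("need at least one value");
  // Numeric sort; the default sort would compare values as strings.
  const sorted = [...values].sort((a, b) => a - b);
  return {
    min: sorted[0],
    q1: quantile(sorted, 0.25),
    median: quantile(sorted, 0.5),
    q3: quantile(sorted, 0.75),
    max: sorted[sorted.length - 1],
  };
}

// Hypothetical sample data.
const summary = fiveNumberSummary([7, 1, 3, 9, 5]);
console.log(summary.min, summary.q1, summary.median, summary.q3, summary.max); // 1 3 5 7 9
```

To compare several distributions, compute one summary per group and draw the boxes side by side on a shared scale.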

Pre-attentive processing – automatic visual processing that happens almost instantly, before conscious attention; driven by attributes such as colour, movement, form and spatial position

Use negative space; use redundant coding (encode the same variable with more than one visual channel, e.g. both colour and shape)

“Get it right in black and white” before you begin adding colour

If you use colour: prefer medium hues or pastels, avoid bright colours, use colour to highlight, and check your colours for colour blindness

Rules for using colour

R colour palettes

“Indeed, so difficult and subtle that avoiding catastrophe becomes the first principle in bringing color to information: Above all, do no harm.” Edward Tufte

Gestalt laws of perception

Proximity (Pisa), Similarity (look-alike items are grouped together), Figure and Ground (Escher), Continuity (long arm), Closure (we perceive unfinished objects as complete), Simplicity

Chart junk

  • Examples: heavy or dark grid lines, unnecessary text, ornamented chart axes, pictures within the graph, shading or 3D perspective
  • Data-ink ratio = ink used to display data / total ink used in the graphic (high = good)

Lie factor (Edward Tufte)

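  • Lie factor = size of the effect shown in the graphic / size of the effect in the data. A lie factor of 1 is ideal; 0.95 < lie factor < 1.05 is acceptable, because deviations that small can be treated as noise
  • Effect size in the graphic (%) = ((2nd value – 1st value) / 1st value) × 100, measured on the graphic (e.g. bar lengths)
  • Effect size in the data (%) = ((2nd value – 1st value) / 1st value) × 100, using the data values
  • Lie factor = graphic (%) / data (%); values well above 1 (exaggeration) or well below 1 (understatement) are very bad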
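The lie factor arithmetic can be sketched as a pair of small functions. The function names and sample numbers are hypothetical, chosen only to illustrate the calculation. The 0.95 to 1.05 band is the acceptable range given in the notes.

```typescript
// Hypothetical helpers for Tufte's lie factor.
// Effect size = percentage change from the first value to the second.
function effectSizePercent(first: number, second: number): number {
  if (first === 0) throw new Error("first value must be non-zero");
  return ((second - first) / first) * 100;
}

// Lie factor = effect shown in the graphic / effect in the data.
function lieFactor(
  graphicFirst: number, graphicSecond: number, // e.g. bar heights in px
  dataFirst: number, dataSecond: number,       // the underlying data values
): number {
  const dataEffect = effectSizePercent(dataFirst, dataSecond);
  if (dataEffect === 0) throw new Error("data shows no change; lie factor undefined");
  return effectSizePercent(graphicFirst, graphicSecond) / dataEffect;
}

// Acceptable band from the notes: 0.95 < lie factor < 1.05.
function isFaithful(lf: number): boolean {
  return lf > 0.95 && lf < 1.05;
}

// Hypothetical example: the data rise 100 -> 120 (+20%),
// but the bars are drawn 50 px -> 80 px tall (+60%).
const lf = lieFactor(50, 80, 100, 120);
console.log(lf.toFixed(2), isFaithful(lf)); // 3.00 false
```

Here the graphic overstates the change threefold. Redrawing the second bar at 60 px would bring the lie factor back to 1.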

Grammar of Graphics

  • Aesthetics (graph), data (tabular)
  • Separation of concerns
    • Independently transform data and present data
    • Delegate work and responsibilities
      • Engineer focuses on data manipulation
      • Designer focuses on visual encoding of data
    • Present multiple visual representations of a dataset
    • Composable common elements
      • Coordinate System (cartesian vs. radial/polar)
      • Scales (linear, logarithmic, etc.)
      • Text annotations
      • Shape (lines, circles, etc.)
      • Data Types (Categorical, Continuous, etc.)

GG pipeline – source, data, variables, algebra, scales, stats, geometry, co-ordinates, aesthetics, render graphic
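As a rough illustration of separation of concerns, the pipeline can be sketched as independent functions. Data work (aggregation) and visual encoding (scale, geometry, aesthetics, render) each live in their own function. This is a simplified toy and not the API of D3 or ggplot2. It skips the variables and algebra stages, and it aggregates before mapping values to pixels. Every name is hypothetical.

```typescript
// Toy grammar-of-graphics pipeline; all names are hypothetical.
interface Row { category: string; value: number; }
interface Stat { category: string; total: number; }
interface Rect { x: number; y: number; width: number; height: number; }

// Stats stage (data engineer): sum raw rows per category, in first-seen order.
function sumByCategory(rows: Row[]): Stat[] {
  const totals = new Map<string, number>();
  for (const r of rows) totals.set(r.category, (totals.get(r.category) ?? 0) + r.value);
  return Array.from(totals, ([category, total]) => ({ category, total }));
}

// Scale stage (designer): map a value in [0, max] to pixels in [0, height].
function makeScale(max: number, height: number): (v: number) => number {
  return (v) => (max === 0 ? 0 : (v / max) * height);
}

// Geometry + coordinates: one bar per stat; SVG y is measured down from the top.
function toBars(stats: Stat[], height: number, barWidth: number): Rect[] {
  const max = Math.max(0, ...stats.map((s) => s.total));
  const scale = makeScale(max, height);
  return stats.map((s, i) => ({
    x: i * barWidth,
    y: height - scale(s.total),
    width: barWidth - 2, // leave a 2 px gap between bars
    height: scale(s.total),
  }));
}

// Aesthetics + render: apply fill last and emit an SVG string.
function render(bars: Rect[], height: number, fill = "steelblue"): string {
  const rects = bars
    .map((b) => `<rect x="${b.x}" y="${b.y}" width="${b.width}" height="${b.height}" fill="${fill}"/>`)
    .join("");
  return `<svg height="${height}">${rects}</svg>`;
}

// Hypothetical source data.
const rows: Row[] = [
  { category: "a", value: 1 },
  { category: "b", value: 4 },
  { category: "a", value: 1 },
];
const chart = render(toBars(sumByCategory(rows), 100, 20), 100);
console.log(chart);
```

Because each stage only consumes the previous stage's output, the same `sumByCategory` result could feed a different geometry (e.g. a dot plot) without touching the data code.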

D3 – the pipeline stages are chained together as method calls

d3.json = loads a JSON file and returns its parsed contents, often an array of JavaScript objects (in D3 v3 via a callback; in v5 and later as a Promise)

d3.nest = groups data by one or more keys; with .entries() it returns an array of { key, values } objects (removed in D3 v6, which uses d3.group and d3.rollup instead)
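To show what a single-key nest produces, here is a dependency-free stand-in for `d3.nest().key(fn).entries(data)`. The function `nestEntries` is hypothetical and not part of D3. This sketch returns groups in first-seen key order and, like d3.nest, uses string keys. In D3 v6 and later, `d3.group(data, fn)` gives the same grouping as a Map.

```typescript
// Hypothetical stand-in for d3.nest().key(keyFn).entries(data).
interface Entry<T> { key: string; values: T[]; }

function nestEntries<T>(data: T[], keyFn: (d: T) => string): Entry<T>[] {
  const groups = new Map<string, T[]>();
  for (const d of data) {
    const k = keyFn(d);
    const bucket = groups.get(k);
    if (bucket) bucket.push(d); // existing group: add the row
    else groups.set(k, [d]);    // new key: start a group
  }
  // Convert the Map into an array of { key, values } objects.
  return Array.from(groups, ([key, values]) => ({ key, values }));
}

// Hypothetical sample rows.
const flights = [
  { airline: "AA", delay: 5 },
  { airline: "UA", delay: 12 },
  { airline: "AA", delay: 0 },
];
const byAirline = nestEntries(flights, (d) => d.airline);
console.log(byAirline.map((e) => `${e.key}: ${e.values.length}`).join(", ")); // AA: 2, UA: 1
```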

d3.scale = maps data values (the domain) to displayable values such as pixel positions or colours (the range)
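A linear scale is the most common case. The sketch below reimplements the idea in plain TypeScript, in the spirit of d3.scale.linear() (v3) and d3.scaleLinear() (v4+). It is a hypothetical helper, not D3's code. Like D3's default, it interpolates linearly without clamping and offers an `invert` from pixels back to data.

```typescript
// Hypothetical linear scale: maps domain [d0, d1] onto range [r0, r1].
type Scale = ((x: number) => number) & { invert: (y: number) => number };

function scaleLinear(domain: [number, number], range: [number, number]): Scale {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  if (d0 === d1) throw new Error("domain must not be empty");
  // Forward mapping: how far x is through the domain, applied to the range.
  const scale = ((x: number) => r0 + ((x - d0) / (d1 - d0)) * (r1 - r0)) as Scale;
  // Inverse mapping: pixel value back to a data value.
  scale.invert = (y: number) => d0 + ((y - r0) / (r1 - r0)) * (d1 - d0);
  return scale;
}

// Map data values 0..500 onto a 300 px tall chart. SVG y grows downward,
// so the range is flipped to put 0 at the bottom.
const y = scaleLinear([0, 500], [300, 0]);
console.log(y(0), y(250), y(500)); // 300 150 0
console.log(y.invert(150));        // 250
```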

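d3.layout = computes the positions or angles that common chart types (e.g. pie, stack, force) need from the data; in D3 v4 and later these became top-level functions such as d3.pie and d3.stack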
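A pie layout is a good example of such a transformation. It turns each value into an arc whose start and end angles (in radians) are proportional to the value's share of the total. The helper below is a hypothetical sketch of that idea and not D3's implementation. One difference: D3's pie layout orders arcs by descending value by default, whereas this sketch lays them out in input order.

```typescript
// Hypothetical pie layout: values -> arcs with start/end angles in radians.
interface Arc<T> { data: T; value: number; startAngle: number; endAngle: number; }

function pieLayout<T>(data: T[], value: (d: T) => number): Arc<T>[] {
  const values = data.map(value);
  const total = values.reduce((sum, v) => sum + v, 0);
  let angle = 0; // running angle, starting at 0 radians
  return data.map((d, i) => {
    // Each arc's sweep is its share of the full circle (2π).
    const sweep = total === 0 ? 0 : (values[i] / total) * 2 * Math.PI;
    const arc = { data: d, value: values[i], startAngle: angle, endAngle: angle + sweep };
    angle += sweep;
    return arc;
  });
}

// Hypothetical shares: 3 "yes" to 1 "no".
const arcs = pieLayout([{ label: "yes", n: 3 }, { label: "no", n: 1 }], (d) => d.n);
for (const a of arcs) {
  console.log(a.data.label, (a.endAngle - a.startAngle).toFixed(3)); // yes 4.712, then no 1.571
}
```

The drawing step (turning angles into SVG paths) stays separate, which mirrors the split between layouts and rendering described above.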

d3.selection.append = inserts a new HTML or SVG element into each selected element and returns a selection of the new elements

d3.selection.attr = sets an attribute, such as position or fill, on each selected element and returns the same selection, so calls can be chained
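The chaining style comes from what each method returns. `append` returns the new child, while `attr` returns the node it was called on. The toy class below imitates this without a browser DOM. `SvgNode` and its methods are hypothetical and are not D3's selection API; real selections also operate on many elements at once.

```typescript
// Hypothetical, DOM-free imitation of D3-style chaining.
class SvgNode {
  private readonly attrs = new Map<string, string>(); // later set() overwrites
  private readonly children: SvgNode[] = [];

  constructor(readonly tag: string) {}

  // Like selection.append: create a child element and return the child.
  append(tag: string): SvgNode {
    const child = new SvgNode(tag);
    this.children.push(child);
    return child;
  }

  // Like selection.attr (setter form): set an attribute, return the same node.
  attr(name: string, value: string | number): SvgNode {
    this.attrs.set(name, String(value));
    return this;
  }

  // Serialise this node and its children as markup.
  toString(): string {
    const a = Array.from(this.attrs, ([k, v]) => ` ${k}="${v}"`).join("");
    const inner = this.children.map((c) => c.toString()).join("");
    return `<${this.tag}${a}>${inner}</${this.tag}>`;
  }
}

const root = new SvgNode("svg").attr("width", 200).attr("height", 100);
root.append("circle") // returns the new <circle>, not the <svg>
  .attr("cx", 50)
  .attr("cy", 50)
  .attr("r", 20)
  .attr("fill", "orange");
console.log(root.toString());
// <svg width="200" height="100"><circle cx="50" cy="50" r="20" fill="orange"></circle></svg>
```

A common slip follows from these return values. After `.append(...)`, later `.attr(...)` calls apply to the new child, not to the parent.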


Clarity or aesthetics