Data Scientist CheatSheet

I wrote this cheat sheet to help data scientists navigate and explain the confusion surrounding Big Data, CRMs, the Single Customer View and what a data scientist actually is. It is very difficult to correct someone who is explaining to you what they think you do, when they are often wrong, and to get them to understand that not everything can be solved with Hadoop :). I wrote this article to help me explain these ideas to myself and to others, and I hope it helps you too. An essential part of being a data scientist is listening to others, empathizing with them and creating solutions that will work for them and for the future viability of the business.

Big Data


Big data is not just about the size of data. It’s “about obtaining predictive, actionable, valuable insight from complex data at scale.”[i]

“Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy.”[ii]


Why Big Data?

Big data is important because organisations nowadays hold vast amounts of data, from simple Excel spreadsheets to large operational databases and every shape and size of data in between. But many organisations have the same issue: the data sources are spread across departments and are siloed. Little care has been taken over data sources when it comes to good documentation, data cleaning, access and re-use.

Benefits of Big Data

Big data initiatives are one way of bringing those data sources together to provide actionable, valuable insights that help businesses make better decisions and reach their objectives. The data is then ready to be discovered and understood.


Data Science


“Data Science is about discovering and showing people ‘things they didn’t know using data’.”[iii]

Data Science does not start by analysing big data but instead by asking key business questions.

What do we know?

What do we not know?

What has not been answered yet?

What are the known unknowns?

What are the known knowns?


 “Data Science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics”[iv]

“The reason it takes a science to convert a raw resource into something of value is because what is extracted is never in a useful form. ‘Data in the raw’ is littered with useless noise, irrelevant information, and misleading patterns. To convert this into that precious thing we are after requires a study of its properties and the discovery of a working model that captures the behavior we are interested in. Being in possession of a model despite the noise means an organization now owns the beginnings of further discovery and innovation. Something unique to their business that has given them the knowledge of what to look for, and the codified descriptions of a world that can now be mechanized and scaled.”[v]


What is a Data Scientist?


A data scientist uses scientific tools and techniques (mathematical, computational, visual, analytic, statistical, experimental, problem definition, model-building and validation, etc.) to derive discoveries, insights and value from collected data.

 “A traditional data analyst may look only at data from a single source – a CRM system, for example – a data scientist will most likely explore and examine data from multiple disparate sources. The data scientist will sift through all data with the goal of discovering a previously hidden insight, which in turn can provide a competitive advantage or address a pressing business problem. A data scientist does not simply collect and report on data, but also looks at it from many angles, determines what it means, then recommends ways to apply the data.”[vi]


What is the difference between Big Data and Data Science?

Both offer the potential to produce value from data. The fundamental difference is described in this statement: “Collecting Does Not Mean Discovering”[vii]

Big data concerns itself with collecting more data, which means investing in data-focused tools. When most people talk about Big Data, they are mainly referring to an IT engineering problem called ‘data plumbing’[viii]. Big Data looks to collect and manage large amounts of varied data to serve, for example, large-scale web applications and vast sensor networks.[ix]

Data science concerns itself with discovering insights from the data, which means investing time in converting data into valuable insights. (“Data Science main concern is not ‘data plumbing.’”[x]) “Data Science looks to create models that capture the underlying patterns of complex systems, and codify those models into working applications”[xi]. These are not mathematical models but process models for deriving insights.

Check out the great “Blacksmith analogy” from KDnuggets

This diagram summarises what the transition to data-driven business entails.


Leading on from Big Data and Data Science, the first business question many Big Data teams will face is “Who are our customers?”


Single Customer View

A Single Customer View (SCV) is an aggregated, consistent and holistic representation of customer data, created as part of the Big Data initiative.


“A Single Customer View provides businesses with the ability to track customers and their communications across every channel.” [xiii]

Benefits of Single Customer View

  • Improved customer service levels
  • Better customer retention
  • Higher conversion rates
  • Improved overall customer lifetime value.

“SCV also means being able to use the huge amount of data being pulled in from all these various channels into one place, and being able to use that data in a meaningful way. By building a fuller, personalised picture of the customer and their journey, a business will have a more insightful guide to improving future sales and make improvements to future customer interactions.”[xiv]

Why Single Customer View?

Having this view allows the business to understand who its customers are and to go further by producing groups, cohorts and types based on behaviours and motivations, in order to understand the preferences of individual visitors. A unique visitor identifier such as an e-mail address is used to link the visitor’s associated behavioural data from web, e-commerce, marketing, Wi-Fi and e-newsletter data sources; the linked data is held per individual and aggregated to form visitor groups.
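As a rough illustration of that linking step, here is a minimal Python sketch. It assumes each source is simply a list of records carrying an e-mail address; the source names, field names and values are all invented for the example, and a real SCV would need much more careful identity matching than lower-casing and trimming the address.

```python
from collections import defaultdict

# Hypothetical records from three of the sources named above; every field
# name and value here is invented for illustration.
web_visits = [
    {"email": "Ana@Example.com", "pages_viewed": 5},
    {"email": "ben@example.com", "pages_viewed": 2},
]
ecommerce_orders = [
    {"email": "ana@example.com ", "order_total": 40.0},
    {"email": "ana@example.com", "order_total": 15.5},
]
newsletter_events = [
    {"email": "ben@example.com", "opened": True},
]


def normalise_email(email):
    """Lower-case and trim an e-mail address so it can act as a join key."""
    return email.strip().lower()


def build_single_customer_view(sources):
    """Merge records from every source into one profile per e-mail address."""
    view = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for record in records:
            key = normalise_email(record["email"])
            # Keep everything except the key itself under the source's name.
            details = {k: v for k, v in record.items() if k != "email"}
            view[key][source_name].append(details)
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {email: dict(by_source) for email, by_source in view.items()}


scv = build_single_customer_view({
    "web": web_visits,
    "ecommerce": ecommerce_orders,
    "newsletter": newsletter_events,
})
```

Each entry in `scv` then holds everything known about one visitor, grouped by source, which is the starting point for building the groups and cohorts described above.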


Customer Relationship Management (CRM)

A customer relationship management (CRM) system is both a system and a strategy for managing all of a company’s interactions and relationships with current and potential customers.

“The system is a piece of technology used to organise, automate and synchronise all of the customer facing areas within your company: from marketing to sales to customer service to technical support.”[xv]

“The main components of CRM are building and managing customer relationships through marketing, observing relationships as they mature through distinct phases, managing these relationships at each stage and recognizing that the distribution of value of a relationship to the firm is not homogenous. When building and managing customer relationships through marketing, firms might benefit from using a variety of tools to help organizational design, incentive schemes, customer structures, and more to optimize the reach of its marketing campaigns.”[xvi]

What is the difference between a single customer view and CRM?

Single Customer View could be understood as a type of analytical CRM.

Its focus is NOT customer communication but an overview of all customers, their communications and their behavioural data, to provide rich insights into who the business’s customers are and to recognize patterns, trends and profile types. New insights from this holistic view of the customers can be fed into the business’s communication strategy. They can also support closed loop marketing and the provision of dynamic, personalized online content to achieve higher conversion and engagement levels.

The primary goal of a CRM (here meaning an operational CRM) is to integrate and automate sales, marketing and customer support, focusing on a dashboard for improved customer communication and relationships.

Its dashboards provide customer information and purchase history on an individual level, and it provides customer support through multiple channels such as email, phone, online ticketing and FAQs.


Closed Loop Marketing

“Closing the loop” relies on providing the marketing team with insights on sales and conversions from all marketing channels.

This helps inform them how best to continue marketing to customers and lets them effectively evaluate marketing channel and campaign performance (on an aggregated customer level).

Types of Marketing Channels

  • Email
  • Printed Adverts e.g. magazine, newspaper, tube adverts
  • PPC, e.g. Google AdWords
  • Social Media
  • Webpages

Insights from the single customer view, with its aggregated view of customer data from all online and offline channels and touch points, will help optimize marketing to new and target audiences.
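To make “closing the loop” a little more concrete, the following Python sketch aggregates conversions per marketing channel. It assumes the single customer view can supply one row per customer touch point with a channel name and a converted flag; the data and the `channel_performance` helper are hypothetical, and real attribution across several touch points per customer is considerably more involved.

```python
from collections import defaultdict

# Hypothetical touch points: (customer e-mail, marketing channel, converted?)
touch_points = [
    ("ana@example.com", "Email", True),
    ("ben@example.com", "Email", False),
    ("cat@example.com", "Social Media", True),
    ("dan@example.com", "PPC", False),
    ("eve@example.com", "PPC", False),
    ("fay@example.com", "Social Media", True),
]


def channel_performance(rows):
    """Return {channel: (contacts, conversions, conversion_rate)}."""
    contacts = defaultdict(int)
    conversions = defaultdict(int)
    for _customer, channel, converted in rows:
        contacts[channel] += 1
        if converted:
            conversions[channel] += 1
    return {
        channel: (
            contacts[channel],
            conversions[channel],
            conversions[channel] / contacts[channel],
        )
        for channel in contacts
    }


report = channel_performance(touch_points)
for channel, (n, won, rate) in sorted(report.items()):
    print(f"{channel}: {won}/{n} converted ({rate:.0%})")
```

A per-channel summary like this is the kind of aggregated feedback the marketing team can use to compare channels and campaigns.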


I hope this helps demystify the world of data a bit.