Big data is not just about the size of the data, as the term might suggest. It is commonly described in terms of the four Vs of big data: Volume, Variety, Veracity and Velocity. These refer to the size of the data, the different types of data (structured and unstructured), the trustworthiness of the data, and the frequency of incoming data that needs to be processed.
“Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy.”[ii]
Big data is “about obtaining predictive, actionable, valuable insight from complex data at scale.”[i]
Data sources are spread across organisations in data silos. Big data initiatives allow organisations to bring these data sources together to provide organisation-wide, actionable, valuable insights, so that better decisions can be reached.
“Data Science is about surprising people by discovering and showing ‘things they didn’t know from data’.”[iii] Data Science does not start by analysing big data but instead by asking key business questions.
What do we know? What do we not know? What has not been answered yet? What are the known unknowns? What are the known knowns?
“Data Science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics”[iv]
“The reason it takes a science to convert a raw resource into something of value is because what is extracted is never in a useful form. ‘Data in the raw’ is littered with useless noise, irrelevant information, and misleading patterns. To convert this into that precious thing we are after requires a study of its properties and the discovery of a working model that captures the behavior we are interested in. Being in possession of a model despite the noise means an organization now owns the beginnings of further discovery and innovation. Something unique to their business that has given them the knowledge of what to look for, and the codified descriptions of a world that can now be mechanized and scaled.”[v]
What is a Data Scientist?
A data scientist uses scientific tools and techniques (mathematical, computational, visual, analytic, statistical, experimental, problem definition, model-building and validation, etc.) to derive discoveries, insights, and value from collections of data.
“A data scientist is somebody who is inquisitive, who can stare at data and spot trends. It’s almost like a Renaissance individual who really wants to learn and bring change to an organization. Whereas a traditional data analyst may look only at data from a single source – a CRM system, for example – a data scientist will most likely explore and examine data from multiple disparate sources. The data scientist will sift through all data with the goal of discovering a previously hidden insight, which in turn can provide a competitive advantage or address a pressing business problem. A data scientist does not simply collect and report on data, but also looks at it from many angles, determines what it means, then recommends ways to apply the data.”[vi]
Big Data vs. Data Science
“Although both offer the potential to produce value from data, the fundamental difference between Data Science and Big Data can be summarized in one statement:
Collecting Does Not Mean Discovering”[vii]
When most people talk about Big Data, they are mainly referring to it as an IT engineering problem. “We call this ‘data plumbing.’ Data Science main concern is not ‘data plumbing.’”[viii] “Big Data looks to collect and manage large amounts of varied data to serve large-scale web applications and vast sensor networks.”
“Data Science looks to create models that capture the underlying patterns of complex systems, and codify those ” [ix]