- Discover open data science
- Machine learning and classification
- Visualisation and communication
Course – http://training.theodi.org/ods/#/id/co-0
Key Skills
- Venn Diagram (DS skills)
- Ethics – de-anomalies ?
- IBM advert – What are the values?
- Proxy measure – TfL – load on axial
- Chapter impact – Food Agency – view browsing history
“Part Analyst, part artist” – Anjul Bhambhri (VP of big data at IBM)
http://edsa-project.eu/resources/datasets/
Data Science skills are in demand
8 key areas
- Big Data (80-90% mention for DS jobs)
- Data Collection and Analysis
- Machine Learning prediction
- Maths and Statistics
- Interpretation and Visualisation
- Advanced computing and programming
- Business Intelligence and Domain Expertise
- Open Source Tools and Concepts & Open Innovation
Big Data
Filtering and processing (6 million rows of data in 5 minutes https://socrata.com/ )
Pivot tables on big data in the cloud : https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2/data
Commodity scalable cloud computing is ready to be used. Build and test small, then scale on demand. Amazon EC2, Heroku and Cloudflare are all great examples. Socrata and Tableau are others.
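The pivot-table idea above can be tried on a small scale before moving to a hosted service. The following is a minimal sketch in plain Python, not how Socrata works internally; the sample rows and the column names `year` and `primary_type` are assumptions made for illustration and are not taken from the Chicago dataset.

```python
from collections import Counter

# Hypothetical sample rows; the column names are assumptions for illustration.
rows = [
    {"year": 2014, "primary_type": "THEFT"},
    {"year": 2014, "primary_type": "BATTERY"},
    {"year": 2015, "primary_type": "THEFT"},
    {"year": 2015, "primary_type": "THEFT"},
]


def pivot_count(rows, row_key, col_key):
    """Count rows for each (row_key, col_key) combination, like a pivot table."""
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    table = {}
    for (row_value, col_value), n in counts.items():
        table.setdefault(row_value, {})[col_value] = n
    return table


pivot = pivot_count(rows, "year", "primary_type")
print(pivot)
```

The same build-small-then-scale approach applies: once the grouping logic is right on a few rows, the heavy lifting can be handed to a cloud service.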
Explain and use “data on the web” and the “web of data”
Finding data
Data.gov.XX – Advanced search operators – site: link: related: filetype:
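The search operators above can be combined into a single query string. This is a small sketch; `build_query` is a hypothetical helper, and `data.gov.uk` is used only as one example of a Data.gov.XX portal.

```python
def build_query(terms, site=None, filetype=None):
    """Combine free-text terms with the site: and filetype: search operators."""
    parts = [terms]
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


# Example: look for CSV files about air quality on one portal.
query = build_query("air quality", site="data.gov.uk", filetype="csv")
print(query)
```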
File types
- XLS – tabular
- CSV – tabular
- TSV – tabular
- XML – hierarchical
- JSON – hierarchical
- YAML
Tabular – table form
Hierarchical – one-way relation
Network – social – multiple directions
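The three data shapes above can be illustrated with Python's standard library. This is a sketch only; the station, line and people names are made-up sample values.

```python
import csv
import io
import json

# Tabular: every row has the same columns (CSV, TSV, XLS).
csv_text = "station,zone\nBank,1\nBrixton,2\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Hierarchical: parents contain children, a one-way relation (JSON, XML).
json_text = '{"line": "Victoria", "stations": [{"name": "Brixton", "zone": 2}]}'
doc = json.loads(json_text)

# Network: relations can point in many directions, as in a social graph.
graph = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob"},
}

print(rows[1]["zone"])             # CSV values are read as strings
print(doc["stations"][0]["name"])  # walk down the hierarchy
print(sorted(graph["bob"]))        # neighbours of one node
```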
Portal Aggregators
Transport API – http://www.transportapi.com/
Enigma.io – google for open data
Scraping
PDFTABLES.COM https://pdftables.com/pricing – free for only 50 pages
magic.import.io – hosted version can deal with pages that need log-in credentials. Terms of use – for research or reporting
Dbpedia http://147.228.127.146:9220/search/_all
Elasticsearch
Document and data
Webpages http://bbc.co.uk/news/
RSS feed – reduced due to advertising revenue http://feeds.bbci.co.uk/news/rss.xml
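An RSS feed is the data counterpart of the web page: the same stories as structured XML. A minimal sketch of reading one with the standard library follows; the feed text is a made-up sample with placeholder `example.org` links, not real BBC content, and `item_titles` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed; a real feed would be downloaded from the feed URL.
rss_text = """
<rss version="2.0">
  <channel>
    <title>Sample news</title>
    <item>
      <title>First story</title>
      <link>http://example.org/news/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.org/news/2</link>
    </item>
  </channel>
</rss>
"""


def item_titles(rss_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml.strip())
    return [(item.findtext("title"), item.findtext("link")) for item in root.iter("item")]


stories = item_titles(rss_text)
for title, link in stories:
    print(title, link)
```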
Identifiers
- ISBN
- Postcode
- MAC address
Cool identifier
- http://swtrains/trains/code
- Authority/data table/ identifier
- https://www.w3.org/Provider/Style/URI
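The authority / data table / identifier pattern above can be sketched as a small URI builder. `make_uri` is a hypothetical helper and `example.org` is a placeholder authority; the identifier values are made up.

```python
from urllib.parse import quote


def make_uri(authority, table, identifier):
    """Build a URI of the form http://authority/table/identifier."""
    # Percent-encode the path parts so identifiers with spaces stay valid.
    return "http://{}/{}/{}".format(
        authority, quote(table, safe=""), quote(identifier, safe="")
    )


train_uri = make_uri("example.org", "trains", "1A23")
postcode_uri = make_uri("example.org", "postcode", "SW1A 1AA")
print(train_uri)
print(postcode_uri)
```

Keeping the identifier as the last path segment means existing codes such as ISBNs or postcodes can be reused directly inside the URI.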
Instead of html
Data browsers
Add extension .xml
RDF browser – Q&D RDF browser
http://graphite.ecs.soton.ac.uk/browser/
View using Postman
.rdf
http://wiki.dbpedia.org/projects/bubble-navigator
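Asking for data instead of HTML can be sketched in two ways: appending an extension such as .xml or .rdf, as noted above, or sending an Accept header. The Accept header route is an assumption added here and is not in the notes; whether a given server supports either depends on that server. `example.org` is a placeholder host, and the request is built but never sent.

```python
import urllib.request


def data_url(page_url, extension):
    """Append a data extension such as 'rdf' or 'xml' to a resource URL."""
    return page_url.rstrip("/") + "." + extension.lstrip(".")


def data_request(page_url, media_type="application/rdf+xml"):
    """Build (but do not send) a request that asks for data instead of HTML."""
    return urllib.request.Request(page_url, headers={"Accept": media_type})


# example.org is a placeholder host, not a real data service.
rdf_url = data_url("http://example.org/trains/1A23", ".rdf")
request = data_request("http://example.org/trains/1A23")
print(rdf_url)
print(request.get_header("Accept"))
```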
Doc – html website
Data – building data
Query – building
5-star http://5stardata.info/
- Open license
- Readable
- Open format
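The three criteria listed above can be turned into a simple checklist. This sketch assumes that the stars are cumulative (a level only counts if the earlier ones are met) and that "Readable" means machine-readable; `star_rating` is a hypothetical helper covering only the three criteria in these notes.

```python
def star_rating(open_licence, machine_readable, open_format):
    """Return how many of the first three stars a dataset earns, in order."""
    stars = 0
    for criterion_met in (open_licence, machine_readable, open_format):
        if not criterion_met:
            break  # later stars do not count once one level is missed
        stars += 1
    return stars


# An openly licensed spreadsheet in a proprietary format stops at two stars.
print(star_rating(open_licence=True, machine_readable=True, open_format=False))
```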
Machine learning and Prediction
- Decision Tree – decision tree for audio guide
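A decision tree is a chain of yes/no questions that ends in an outcome. The sketch below is hand-written rather than learned from data, and the questions and outcomes are hypothetical; the notes do not record the actual audio guide tree.

```python
# Hypothetical tree: inner nodes ask a question, leaves are plain strings.
tree = {
    "question": "has_headphones",
    "yes": {
        "question": "speaks_english",
        "yes": "offer English audio guide",
        "no": "offer translated audio guide",
    },
    "no": "offer printed guide",
}


def decide(node, answers):
    """Walk the tree using a dict of question -> True/False answers."""
    while isinstance(node, dict):
        branch = "yes" if answers[node["question"]] else "no"
        node = node[branch]
    return node


visitor = {"has_headphones": True, "speaks_english": False}
print(decide(tree, visitor))
```

A learned decision tree has the same shape; the difference is that an algorithm picks the questions and their order from training examples.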
Types of ML
- Supervised learning
- Unsupervised learning
- Semi-supervised learning
- Reinforcement learning
Classification
- Clustering
- Regression
http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
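Two of the tasks above can be sketched in plain Python: classification (predict a label) and regression (predict a number), both forms of supervised learning. The nearest-neighbour rule and the least-squares line are chosen here as the simplest instances, not because the course used them, and the training data is made up.

```python
import math


def nearest_neighbour(train, point):
    """Classification: return the label of the closest training example."""
    _, label = min(train, key=lambda example: math.dist(example[0], point))
    return label


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up labelled examples: (features, label).
train = [((1.0, 1.0), "small"), ((5.0, 5.0), "large")]
label = nearest_neighbour(train, (1.5, 0.5))

slope, intercept = fit_line([0, 1, 2], [1, 3, 5])
print(label, slope, intercept)
```

Clustering differs from both in that no labels are supplied; the algorithm has to find the groups itself, which makes it unsupervised.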
D3 – Visualisation – Crossfilter
http://square.github.io/crossfilter/
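The idea behind cross-filtering is to filter on one field and watch the counts on another field update. The sketch below is a plain Python analogue of that idea, not the Crossfilter JavaScript API; the records, field names and helper functions are all hypothetical.

```python
from collections import Counter

# Made-up records for illustration.
records = [
    {"type": "tab", "total": 190},
    {"type": "visa", "total": 300},
    {"type": "cash", "total": 100},
    {"type": "tab", "total": 90},
]


def filter_range(records, field, low, high):
    """Keep records where low <= record[field] < high."""
    return [r for r in records if low <= r[field] < high]


def group_count(records, field):
    """Count records per distinct value of a field."""
    return Counter(r[field] for r in records)


# Filter on 'total', then see how the counts by 'type' change.
selected = filter_range(records, "total", 100, 200)
counts = group_count(selected, "type")
print(counts)
```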
CodePen online – experiment
http://codepen.io/davetaz/pen/NGprqg