Scalable Data Science
Introduction
Prelude of 2016 Version
Some Basics and Essentials
Week 1: Introduction to Scalable Data Science
Week 2: Introduction to Spark RDDs, Transformations and Actions and Word Count of the US State of the Union Addresses
Week 3: Introduction to Spark SQL, ETL and EDA of Diamonds, Power Plant and Wiki Click Streams Data
Week 4: Introduction to Machine Learning - Unsupervised Clustering and Supervised Classification
Week 5: Introduction to Non-distributed and Distributed Linear Algebra and Applied Linear Regression
- Linear Algebra Introduction
  - HOMEWORK: Breeze linear algebra cheat sheet
- Linear Regression Introduction
- Distributed Linear Algebra for Linear Regression Introduction
  - HOMEWORK: Spark Data Types for Distributed Linear Algebra
- Power Plant Pipeline: Model, Tune, Evaluate
Week 6: Introduction to Spark Streaming, Twitter Collector, Top Hashtag Counter and Streaming Model-Prediction Server
Week 7: Probabilistic Topic Modelling via Latent Dirichlet Allocation and Intro to XML-parsing of Old Bailey Online
- Probabilistic Topic Modelling
- HOMEWORK: Introduction to XML-parsing of Old Bailey Online
Week 8: Graph Querying in GraphFrames and Distributed Vertex Programming in GraphX
- Introduction to GraphFrames
- HOMEWORK: On-Time Flight Performance with GraphFrames
Week 9: Deep Learning, Convolutional Neural Nets, Sparkling Water and TensorFlow
Week 10: Scalable Geospatial Analytics with Magellan
- What is Scalable Geospatial Analytics
- Introduction to Magellan for Scalable Geospatial Analytics
Week 11 and 12: Student Projects
Extra Resources
- AWS Educate
- Databricksified Spark SQL Programming Guide 1.6
- Linear Algebra Cheat Sheet
- Databricksified Data Types in MLlib Programming Guide 1.6
- Introduction to XML-parsing of Old Bailey Online

Week 4: Introduction to Machine Learning - Unsupervised Clustering and Supervised Classification

Sections
