Scalable Data Science
Introduction
Prelude of 2016 Version
Some Basics and Essentials
Week 1: Introduction to Scalable Data Science
Week 2: Introduction to Spark RDDs, Transformations and Actions and Word Count of the US State of the Union Addresses
Week 3: Introduction to Spark SQL, ETL and EDA of Diamonds, Power Plant and Wiki Click Streams Data
Week 4: Introduction to Machine Learning - Unsupervised Clustering and Supervised Classification
Week 5: Introduction to Non-distributed and Distributed Linear Algebra and Applied Linear Regression
- Linear Algebra Introduction
  - HOMEWORK: Breeze linear algebra cheat sheet
- Linear Regression Introduction
- Distributed Linear Algebra for Linear Regression Introduction
  - HOMEWORK: Spark Data Types for Distributed Linear Algebra
- Power Plant Pipeline: Model, Tune, Evaluate
Week 6: Introduction to Spark Streaming, Twitter Collector, Top Hashtag Counter and Streaming Model-Prediction Server
Week 7: Probabilistic Topic Modelling via Latent Dirichlet Allocation and Intro to XML-parsing of Old Bailey Online
- Probabilistic Topic Modelling
- HOMEWORK: Introduction to XML-parsing of Old Bailey Online
Week 8: Graph Querying in GraphFrames and Distributed Vertex Programming in GraphX
- Introduction to GraphFrames
- HOMEWORK: On-Time Flight Performance with GraphFrames
Week 9: Deep Learning, Convolutional Neural Nets, Sparkling Water and TensorFlow
Week 10: Scalable Geospatial Analytics with Magellan
- What is Scalable Geospatial Analytics
- Introduction to Magellan for Scalable Geospatial Analytics
Week 11 and 12: Student Projects
Extra Resources
- AWS Educate
- Databricksified Spark SQL Programming Guide 1.6
- Linear Algebra Cheat Sheet
- Databricksified Data Types in MLlib Programming Guide 1.6
- Introduction to XML-parsing of Old Bailey Online

Week 3: Introduction to Spark SQL, ETL and EDA of Diamonds, Power Plant and Wiki Click Streams Data

Sections