Scalable Data Science
Introduction
Prelude of 2016 Version
Some Basics and Essentials
Week 1: Introduction to Scalable Data Science
Scalable Data Science
Why Spark?
Login to databricks
Scala Crash Course
Week 2: Introduction to Spark RDDs, Transformations and Actions and Word Count of the US State of the Union Addresses
RDDs, Transformations and Actions
HOMEWORK: RDDs, Transformations and Actions
Word Count: US State of the Union Addresses
EXTRA: Word Count: ETL of US State of the Union Addresses
Week 3: Introduction to Spark SQL, ETL and EDA of Diamonds, Power Plant and Wiki Click Stream Data
Spark SQL Introduction
HOMEWORK: overview
HOMEWORK: getting started
HOMEWORK: data sources
HOMEWORK: performance tuning
HOMEWORK: distributed sql engine
ETL and EDA of Diamonds Data
ETL and EDA of Power Plant Data
ETL and EDA of Wiki Click Stream Data
Week 4: Introduction to Machine Learning - Unsupervised Clustering and Supervised Classification
Introduction to Machine Learning
Unsupervised Clustering of 1 Million Songs via K-Means in 3 Stages
Stage 1: Extract-Transform-Load
Stage 2: Explore
Stage 3: Model
Supervised Classification of Hand-written Digits via Decision Trees
Week 5: Introduction to Non-distributed and Distributed Linear Algebra and Applied Linear Regression
Linear Algebra Introduction
HOMEWORK: breeze linear algebra cheat sheet
Linear Regression Introduction
Distributed Linear Algebra for Linear Regression Introduction
HOMEWORK: Spark Data Types for Distributed Linear Algebra
Local Vector
Labeled Point
Local Matrix
Distributed Matrix
Row Matrix
Indexed Row Matrix
Coordinate Matrix
Block Matrix
Power Plant Pipeline: Model, Tune, Evaluate
Week 6: Introduction to Spark Streaming, Twitter Collector, Top Hashtag Counter and Streaming Model-Prediction Server
Introduction to Spark Streaming
Tweet Collector - broken down
Tweet Collector - Generic
Tweet Hashtag Counter
Streaming Model-Prediction Server, the Full Power Plant Pipeline
Week 7: Probabilistic Topic Modelling via Latent Dirichlet Allocation and Intro to XML-parsing of Old Bailey Online
Probabilistic Topic Modelling
HOMEWORK: Introduction to XML-parsing of Old Bailey Online
Week 8: Graph Querying in GraphFrames and Distributed Vertex Programming in GraphX
Introduction to GraphFrames
HOMEWORK: On-Time Flight Performance with GraphFrames
Week 9: Deep Learning, Convolutional Neural Nets, Sparkling Water and TensorFlow
Deep Learning, A Crash Introduction
H2O Sparkling Water
H2O Sparkling Water: Ham or Spam Example
Setting up TensorFlow Spark Cluster
Scalable Object Identification with Sparkling TensorFlow
Week 10: Scalable Geospatial Analytics with Magellan
What is Scalable Geospatial Analytics
Introduction to Magellan for Scalable Geospatial Analytics
Week 11 and 12: Student Projects
Student Projects
Dillon George, Scalable Geospatial Algorithms
Scalable Spatio-temporal Constraint Satisfaction
Map-matching
OpenStreetMap to GraphX
Akinwande Atanda, Twitter Analytics
Chapter_Outline_and_Objectives
Unfiltered_Tweets_Collector_Set-up
Filtered_Tweets_Collector_Set-up_by_Keywords_and_Hashtags
Filtered_Tweets_Collector_Set-up_by_Class
ETL_Tweets
binary_classification
binary_classification_with_Loop
binary_classification_with_Loop_TweetDataSet
Yinnon Dolev, Deciphering Spider Vision
Xin Zhao, Higher-Order Spectral Clustering
Case-study
Shanshan Zhou, Exploring EEG
Shakira Suwan, Change Detection in Random Graph Series
Matthew Hendtlass, The ATP graph
Yuki Katoh, GSW Passing Analysis
Andrey Konstantinov, Keystroke Biometric
Dominic Lee, Random Matrices
References
Harry Wallace, Movie Recommender
Ivan Sadikov, Reading NetFlow Logs
Extra Resources
AWS Educate
Databricksified Spark SQL Programming Guide 1.6
overview
getting started
data sources
performance tuning
distributed sql engine
Linear Algebra Cheat Sheet
Databricksified Data Types in MLLib Programming Guide 1.6
Local Vector
Labeled Point
Local Matrix
Distributed Matrix
Row Matrix
Indexed Row Matrix
Coordinate Matrix
Block Matrix
Introduction to XML-parsing of Old Bailey Online
Introduction to Spark Streaming, Twitter Collector, Top Hashtag Counter and Streaming Model-Prediction Server
Sections
Introduction to Spark Streaming
Tweet Collector - broken down
Tweet Collector - Generic
Tweet Hashtag Counter
Streaming Model-Prediction Server, the Full Power Plant Pipeline