• Scalable Data Science
  • Introduction
  • Prelude of 2016 Version
  • Some Basics and Essentials
  • Week 1: Introduction to Scalable Data Science
    • Scalable Data Science
    • Why Spark?
    • Login to databricks
    • Scala Crash Course
  • Week 2: Introduction to Spark RDDs, Transformations and Actions and Word Count of the US State of the Union Addresses
    • RDDs, Transformations and Actions
    • HOMEWORK: RDDs, Transformations and Actions
    • Word Count: US State of the Union Addresses
    • EXTRA_Word Count: ETL of US State of the Union Addresses
  • Week 3: Introduction to Spark SQL, ETL and EDA of Diamonds, Power Plant and Wiki Click Stream Data
    • Spark SQL Introduction
      • HOMEWORK: overview
      • HOMEWORK: getting started
      • HOMEWORK: data sources
      • HOMEWORK: performance tuning
      • HOMEWORK: distributed sql engine
    • ETL and EDA of Diamonds Data
    • ETL and EDA of Power Plant Data
    • ETL and EDA of Wiki Click Stream Data
  • Week 4: Introduction to Machine Learning - Unsupervised Clustering and Supervised Classification
    • Introduction to Machine Learning
    • Unsupervised Clustering of 1 Million Songs via K-Means in 3 Stages
      • Stage 1: Extract-Transform-Load
      • Stage 2: Explore
      • Stage 3: Model
    • Supervised Classification of Hand-written Digits via Decision Trees
  • Week 5: Introduction to Non-distributed and Distributed Linear Algebra and Applied Linear Regression
    • Linear Algebra Introduction
      • HOMEWORK: breeze linear algebra cheat sheet
    • Linear Regression Introduction
    • Distributed Linear Algebra for Linear Regression Introduction
      • HOMEWORK: Spark Data Types for Distributed Linear Algebra
        • Local Vector
        • Labeled Point
        • Local Matrix
        • Distributed Matrix
        • Row Matrix
        • Indexed Row Matrix
        • Coordinate Matrix
        • Block Matrix
    • Power Plant Pipeline: Model, Tune, Evaluate
  • Week 6: Introduction to Spark Streaming, Twitter Collector, Top Hashtag Counter and Streaming Model-Prediction Server
    • Introduction to Spark Streaming
    • Tweet Collector - broken down
    • Tweet Collector - Generic
    • Tweet Hashtag Counter
    • Streaming Model-Prediction Server, the Full Power Plant Pipeline
  • Week 7: Probabilistic Topic Modelling via Latent Dirichlet Allocation and Intro to XML-parsing of Old Bailey Online
    • Probabilistic Topic Modelling
    • HOMEWORK: Introduction to XML-parsing of Old Bailey Online
  • Week 8: Graph Querying in GraphFrames and Distributed Vertex Programming in GraphX
    • Introduction to GraphFrames
    • HOMEWORK: On-Time Flight Performance with GraphFrames
  • Week 9: Deep Learning, Convolutional Neural Nets, Sparkling Water and TensorFlow
    • Deep Learning, A Crash Introduction
    • H2O Sparkling Water
    • H2O Sparkling Water: Ham or Spam Example
    • Setting up TensorFlow Spark Cluster
    • Scalable Object Identification with Sparkling TensorFlow
  • Week 10: Scalable Geospatial Analytics with Magellan
    • What is Scalable Geospatial Analytics
    • Introduction to Magellan for Scalable Geospatial Analytics
  • Week 11 and 12: Student Projects
    • Student Projects
    • Dillon George, Scalable Geospatial Algorithms
      • Scalable Spatio-temporal Constraint Satisfaction
      • Map-matching
      • OpenStreetMap to GraphX
    • Akinwande Atanda, Twitter Analytics
      • Chapter_Outline_and_Objectives
      • Unfiltered_Tweets_Collector_Set-up
      • Filtered_Tweets_Collector_Set-up_by_Keywords_and_Hashtags
      • Filtered_Tweets_Collector_Set-up_by_Class
      • ETL_Tweets
      • binary_classification
      • binary_classification_with_Loop
      • binary_classification_with_Loop_TweetDataSet
    • Yinnon Dolev, Deciphering Spider Vision
    • Xin Zhao, Higher Order Spectral Clustering
      • Case-study
    • Shanshan Zhou, Exploring EEG
    • Shakira Suwan, Change Detection in Random Graph Series
    • Matthew Hendtlass, The ATP graph
      • Yuki_Katoh_GSW_Passing_Analysis
    • Andrey Konstantinov, Keystroke Biometric
    • Dominic Lee, Random Matrices
      • References
    • Harry Wallace, Movie Recommender
    • Ivan Sadikov, Reading NetFlow Logs
  • Extra Resources
    • AWS Educate
    • Databricksified Spark SQL Programming Guide 1.6
      • overview
      • getting started
      • data sources
      • performance tuning
      • distributed sql engine
    • Linear Algebra Cheat Sheet
    • Databricksified Data Types in MLlib Programming Guide 1.6
      • Local Vector
      • Labeled Point
      • Local Matrix
      • Distributed Matrix
      • Row Matrix
      • Indexed Row Matrix
      • Coordinate Matrix
      • Block Matrix
    • Introduction to XML-parsing of Old Bailey Online

Week 4: Introduction to Machine Learning - Unsupervised Clustering and Supervised Classification

  • Sections
    • Introduction to Machine Learning
    • Unsupervised Clustering of 1 Million Songs via K-Means in 3 Stages
      • Stage 1: Extract-Transform-Load
      • Stage 2: Explore
      • Stage 3: Model
    • Supervised Classification of Hand-written Digits via Decision Trees
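The song-clustering section relies on K-Means. As a hedged illustration of what K-Means computes, here is a minimal, single-machine sketch of Lloyd's algorithm in plain Python. It does not use Spark and is not the course's notebook code; at the scale of a million songs, a Spark course would normally call Spark MLlib's `KMeans` instead. The function names `squared_distance`, `assign` and `kmeans`, the choice to initialise centers from randomly sampled input points, and the toy data are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Assignment step: index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]


def kmeans(points: List[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm for k-means (hypothetical helper); returns (centers, labels)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Assumed initialisation: k randomly sampled input points become the first centers.
    centers = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = assign(points, centers)
        # Update step: move each center to the mean of the points assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # empty cluster: keep its old center
        if new_centers == centers:  # no center moved, so the iteration has converged
            break
        centers = new_centers
    # Recompute labels so they always match the returned centers.
    return centers, assign(points, centers)
```

Called on the six toy points `(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)` with `k=2`, the first three points end up sharing one label and the last three the other. Lloyd's algorithm only reaches a local optimum that depends on the starting centers, which is why library implementations use smarter initialisation such as k-means++ or its parallel variant k-means||, the default in Spark MLlib.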
