SDS-2.2, Scalable Data Science
List of pointers to potential course projects
from 2016 and 2017
Theoretical Projects in Scalable Data Science
Exact Matrix Completion via Convex Optimization
Abstract
- Suppose that one observes an incomplete subset of entries selected from a low-rank matrix. When is it possible to complete the matrix and recover the entries that have not been seen? We demonstrate that in very general settings, one can perfectly recover all of the missing entries from most sufficiently large subsets by solving a convex programming problem that finds the matrix with the minimum nuclear norm agreeing with the observed entries. The techniques used in this analysis draw upon parallels in the field of compressed sensing, demonstrating that objects other than signals and images can be perfectly reconstructed from very limited information.
- As published in DOI:10.1145/2184319.2184343
Originally published in Foundations of Computational Mathematics (FoCM) 9, 6 (2009)
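The convex program described in the abstract can be written compactly. The notation here is assumed for illustration and does not appear in the listing above: $M$ is the unknown low-rank matrix, $\Omega$ is the set of index pairs whose entries were observed, and $X$ is the candidate completion.

```latex
\begin{aligned}
\underset{X}{\text{minimize}} \quad & \|X\|_{*} \\
\text{subject to} \quad & X_{ij} = M_{ij}, \qquad (i,j) \in \Omega ,
\end{aligned}
\qquad \text{where } \|X\|_{*} = \sum_{k} \sigma_{k}(X)
```

Here $\sigma_k(X)$ are the singular values of $X$, so the nuclear norm acts as a convex surrogate for the rank.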
Data Science and Prediction
- Big data promises automated actionable knowledge creation and predictive models for use by both humans and computers.
- DOI:10.1145/2500499
Applied Projects in Scalable Data Science
Content Recommendation on Web Portals
How to offer recommendations to users when they have not specified what they want.
- DOI:10.1145/2461256.2461277
- Techniques and Applications for Sentiment Analysis
- The main applications and challenges of one of the hottest research areas in computer science.
- DOI:10.1145/2436256.2436274
- Looking back at big data (Digital humanities)
- As computational tools open up new ways of understanding history, historians and computer scientists are working together to explore the possibilities.
- DOI:10.1145/2436256.2436263
- Computational Epidemiology
- The challenge of developing and using computer models to understand and control the diffusion of disease through populations.
- DOI:10.1145/2483852.2483871
- Replicated Data Consistency Explained Through Baseball
- A broader class of consistency guarantees can, and perhaps should, be offered to clients that read shared data.
- DOI:10.1145/2500500
- Community Sense and Response Systems: Your phone as a quake detector
- The Caltech CSN project collects sensor data from thousands of personal devices for real-time response to dangerous earthquakes.
- DOI:10.1145/2622628.2622633
- Reshaping (non-State) Terrorist Networks (fields: State/national security, social psychology, counter-recruitment, counter/de-radicalization...)
- To destabilize terrorist organizations, the STONE algorithms identify a set of operatives whose removal would maximally reduce lethality.
- DOI:10.1145/2632661.2632664
- Rise of Hate Groups in the US (fields: social psychology, understanding online emergence of "hate groups", ...)
- Watch the Democracy Now story on This Year (2016) in Hate and Extremism.
- Read https://www.splcenter.org/intelligence-report. The Intelligence Report is the Southern Poverty Law Center's award-winning magazine. The quarterly publication provides comprehensive updates to law enforcement agencies, the media and the general public. See several articles on different 'hate groups' published on February 17, 2016.
- New News Aggregator Apps (ML at work)
- How apps like Inkl and SmartNews are overcoming the challenges of aggregation to win over content publishers and users alike
- DOI:10.1145/2800445
- References:
- Himabindu Lakkaraju, Angshu Rai, Srujana Merugu, Smart news feeds for social networks using scalable joint latent factor models, Proceedings of the 20th international conference companion on World wide web, March 28-April 01, 2011, Hyderabad, India
- Benedict C. May, Nathan Korda, Anthony Lee, David S. Leslie, Optimistic Bayesian sampling in contextual-bandit problems, The Journal of Machine Learning Research, 13, p.2069-2106, 3/1/2012
- Eli Pariser, The Filter Bubble: What the Internet Is Hiding from You, The Penguin Group, 2011
- Rikiya Takahashi, Tetsuro Morimura, Predicting Preference Reversals via Gaussian Process Uncertainty Aversion
Natural Language Translation at the Intersection of AI and HCI
- Abstract: The fields of artificial intelligence (AI) and human-computer interaction (HCI) are influencing each other like never before. Widely used systems such as Google Translate, Facebook Graph Search, and RelateIQ hide the complexity of large-scale AI systems behind intuitive interfaces. But relations were not always so auspicious. The two fields emerged at different points in the history of computer science, with different influences, ambitions, and attendant biases. AI aimed to construct a rival, and perhaps a successor, to the human intellect. Early AI researchers such as McCarthy, Minsky, and Shannon were mathematicians by training, so theorem-proving and formal models were attractive research directions. In contrast, HCI focused more on empirical approaches to usability and human factors, both of which generally aim to make machines more useful to humans. Many attendees at the first CHI conference in 1983 were psychologists and engineers. Presented papers had titles such as "Design Principles for Human-Computer Interfaces" and "Psychological Issues in the Use of Icons in Command Menus," hardly appealing fare for mainstream AI researchers.
Since the 1960s, HCI has often been ascendant when setbacks in AI occurred, with successes and failures in the two fields redirecting mindshare and research funding [14]. Although early figures such as Allen Newell and Herbert Simon made fundamental contributions to both fields, the competition and relative lack of dialogue between AI and HCI are curious. Both fields are broadly concerned with the connection between machines and intelligent human agents. What has changed recently is the deployment and adoption of user-facing AI systems. These systems need interfaces, leading to natural meeting points between the two fields.
Sensing Emotions
- How computer systems detect the internal emotional states of users.
- DOI:10.1145/2800498
- Further Reading:
- Rosalind W. Picard, Affective computing, MIT Press, Cambridge, MA, 1997
- The latest scientific findings indicate that emotions play an essential role in decision making, perception, learning, and more—that is, they influence the very mechanisms of rational thinking. Not only too much, but too little emotion can impair decision making. According to Rosalind Picard, if we want computers to be genuinely intelligent and to interact naturally with us, we must give computers the ability to recognize, understand, even to have and express emotions.
- Rafael A. Calvo, Sidney D'Mello, Jonathan Gratch, Arvid Kappas, The Oxford Handbook of Affective Computing, Oxford University Press, Oxford, 2014
- The Oxford Handbook of Affective Computing is a definitive reference in the burgeoning field of affective computing (AC)
- Bartlett, M., Littlewort, G., Frank, M., and Lee, K. Automated Detection of Deceptive Facial Expressions of Pain, Current Biology, 2014.
- Highlights
- Untrained human observers cannot differentiate faked from genuine pain expressions
- With training, human performance is above chance but remains poor
- A computer vision system distinguishes faked from genuine pain better than humans
- The system detected distinctive dynamic features of expression missed by humans
- Carlos Busso, Zhigang Deng, Serdar Yildirim, Murtaza Bulut, Chul Min Lee, Abe Kazemzadeh, Sungbok Lee, Ulrich Neumann, Shrikanth Narayanan, Analysis of emotion recognition using facial expressions, speech and multimodal information, Proceedings of the 6th international conference on Multimodal interfaces, October 13-15, 2004, State College, PA, USA
- Abstract: The interaction between human beings and computers will be more natural if computers are able to perceive and respond to human non-verbal communication such as emotions. Although several approaches have been proposed to recognize human emotions based on facial expressions or speech, relatively limited work has been done to fuse these two, and other, modalities to improve the accuracy and robustness of the emotion recognition system. This paper analyzes the strengths and the limitations of systems based only on facial expressions or acoustic information. It also discusses two approaches used to fuse these two modalities: decision level and feature level integration. Using a database recorded from an actress, four emotions were classified: sadness, anger, happiness, and neutral state. By the use of markers on her face, detailed facial motions were captured with motion capture, in conjunction with simultaneous speech recordings. The results reveal that the system based on facial expression gave better performance than the system based on just acoustic information for the emotions considered. Results also show the complementarity of the two modalities and that when these two modalities are fused, the performance and the robustness of the emotion recognition system improve measurably.
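The two fusion strategies named in the abstract above can be contrasted with a small sketch. This is only an illustration under assumptions: the function names, the toy probability values, and the weighted-average combination rule are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

Probabilities = Dict[str, float]


def feature_level_fusion(face_features: List[float],
                         speech_features: List[float]) -> List[float]:
    """Feature-level fusion: concatenate the per-modality feature vectors
    so that a single classifier is trained on the joint vector."""
    return face_features + speech_features


def decision_level_fusion(face_probs: Probabilities,
                          speech_probs: Probabilities,
                          face_weight: float = 0.5) -> Probabilities:
    """Decision-level fusion: each modality has its own classifier and only
    their outputs are combined. A weighted average is assumed here."""
    return {
        label: face_weight * face_probs[label]
        + (1.0 - face_weight) * speech_probs[label]
        for label in face_probs
    }


def predict(probs: Probabilities) -> str:
    """Return the label with the highest combined score."""
    return max(probs, key=probs.get)


# Made-up classifier outputs for the four emotions in the abstract.
face_probs = {"sadness": 0.1, "anger": 0.5, "happiness": 0.3, "neutral": 0.1}
speech_probs = {"sadness": 0.1, "anger": 0.2, "happiness": 0.6, "neutral": 0.1}

fused = decision_level_fusion(face_probs, speech_probs)
joint_vector = feature_level_fusion([0.1, 0.2], [0.3])
print(predict(fused), len(joint_vector))
```

With equal weights the face classifier alone would pick "anger", while the fused scores favor "happiness", which shows how the second modality can change the decision.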
- Putting the Data Science into Journalism
- News organizations increasingly use techniques like data mining, Web scraping, and data visualization to uncover information that would be impossible to identify and present manually.
- DOI:10.1145/2742484
- Big Data Meets Big Science (Extra Reading)
- Next-generation scientific instruments are forcing researchers to question the limits of massively parallel computing.
- DOI:10.1145/2617660
- Big Data and its technical challenges (Extra reading)
- Exploring the inherent technical challenges in realizing the potential of Big Data.
- DOI:10.1145/2611567
- Exascale Computing and Big Data (Extra Reading)
- Scientific discovery and engineering innovation require unifying traditionally separated high-performance computing and big data analytics. The twin ecosystems of HPC and big data and the challenges facing both.
- DOI:10.1145/2699414
- Watch https://www.youtube.com/watch?list=PLn0nrSd4xjjbIHhktZoVlZuj2MbrBBC_f&v=eLMChVev6hw
- Battling Evil: Dark Silicon (Extra Reading)
- The changing nature of computing as chips with more transistors than can be concurrently activated become more commonplace.
- Read http://www.hpcdan.org/reeds_ruminations/2011/05/battling-evil-dark-silicon.html
- TensorFlow: Google Open Sources Their Machine Learning Tool (see InfoQ)
- TensorFlow is a machine learning library created by the Brain Team researchers at Google and now open sourced under the Apache License 2.0. TensorFlow is detailed in the whitepaper TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. The source code can be found on Google Git. It is a tool for writing and executing machine learning algorithms. Computations are done in a data flow graph where the nodes are mathematical operations and the edges are tensors (multidimensional data arrays) that are exchanged between nodes. A user constructs the graph and writes the algorithms that are executed on each node. TensorFlow takes care of executing the code asynchronously on different devices, cores, and threads.... TensorFlow is used by Google for Gmail (SmartReply), Search (RankBrain), Pictures (Inception Image Classification Model), Translator (Character Recognition), and other products.
- See https://databricks.com/blog/2016/01/25/deep-learning-with-spark-and-tensorflow.html
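The data flow graph idea in the TensorFlow description above can be illustrated with a toy sketch in plain Python. This is not TensorFlow's API: the `Node` class and the `constant` and `evaluate` helpers are hypothetical names, values are plain floats rather than tensors, and evaluation is sequential rather than asynchronous.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass
class Node:
    """One operation in the graph; `inputs` are its incoming edges."""
    name: str
    op: Callable[..., float]
    inputs: Sequence["Node"] = ()


def constant(name: str, value: float) -> Node:
    """A source node with no inputs that always produces `value`."""
    return Node(name, lambda: value)


def evaluate(node: Node, cache: Optional[Dict[str, float]] = None) -> float:
    """Evaluate `node` by first evaluating the nodes that feed into it.
    Results are cached by node name so shared nodes run only once."""
    if cache is None:
        cache = {}
    if node.name not in cache:
        args = [evaluate(parent, cache) for parent in node.inputs]
        cache[node.name] = node.op(*args)
    return cache[node.name]


# First construct the graph for y = (a + b) * b, then execute it.
a = constant("a", 2.0)
b = constant("b", 3.0)
total = Node("add", lambda x, y: x + y, (a, b))
y = Node("mul", lambda x, y: x * y, (total, b))

result = evaluate(y)
print(result)  # 15.0
```

The point of the sketch is the separation between building the graph and running it, which is what lets a real system decide where and when each node executes.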
Keep reading... I have not updated since early 2017!!!
- Communications of the ACM (Association for Computing Machinery) is a nice central point for a quick overview of current computationally focused mathematical sciences.
- PNAS/Science/Nature - usual popular science venues
- Hacker News
- ...
Shared Student Notebooks for sds-2.2
Several notebooks that students tried along the course are part of the course content.