HKUST

CSIC 5011: Topological and Geometric Data Reduction and Visualization
Fall 2017


Course Information

Synopsis (摘要)

This course is open to graduate students and senior undergraduates in applied mathematics, statistics, and engineering who are interested in learning from data. Students from other backgrounds, such as the life sciences, are also welcome, provided they have sufficient mathematical maturity. The course covers a broad range of topics in geometric data reduction (principal component analysis, manifold learning, etc.) and topological data reduction (clustering, computational homology groups, etc.).
Prerequisites: linear and abstract algebra, basic probability and multivariate statistics, basic stochastic processes (Markov chains), and convex optimization; familiarity with Matlab, R, and/or Python.

Reference (参考教材)

[pdf download]

Instructors:

Yuan YAO

Time and Place:

Monday 3:00pm-4:20pm, Friday 10:30am-11:50am Rm 1027, LSK Bldg
This term we will be using Piazza for class discussion. The system is designed to get you help quickly and efficiently from classmates and from me. Rather than emailing questions to the teaching staff, I encourage you to post your questions on Piazza. If you have any problems or feedback for the developers, email team@piazza.com.
Find our class page at: https://piazza.com/ust.hk/fall2017/csic5011/home

Homework and Projects:

Weekly homeworks(?), monthly mini-projects, and a final major project. No final exam.

Schedule (时间表)

Date Topic Instructor Scribe
01/09/2017, Fri Lecture 01: Introduction to Geometric and Topological Data Reduction [ syllabus ]
Y.Y.
04/09/2017, Mon Lecture 02: Principal Component Analysis [ lecture02.pdf ]
Y.Y.
08/09/2017, Fri Lecture 03: Multidimensional Scaling [ lecture02.pdf ]
Y.Y.
11/09/2017, Mon Lecture 04: High Dimensional PCA and Random Projections (Chap 3: 1,2,3) [ lecture04.pdf ]
Y.Y.
15/09/2017, Fri Lecture 05: Compressed Sensing and Random Projections (Chap 3: 4) [ new lecture notes updated on Sep 17, 2017 ]
Y.Y.
18/09/2017, Mon Lecture 06: Random Matrix Theory for PCA and Horn's Parallel Analysis (Chap 2: 3) [ new lecture notes updated on Sep 18, 2017 ] [ lecture notes on Parallel Analysis by LI, Zhen ]
Y.Y. LI, Zhen
22/09/2017, Fri Lecture 07: Sample Mean as MLE? James-Stein Estimator and Shrinkages (Chap 2: 1-2)[ new lecture notes updated on Sep 22, 2017 ]
    [Reference]:
  • Comparing Maximum Likelihood Estimator and James-Stein Estimator in R: [ JSE.R ]
Y.Y.
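
The comparison in Lecture 07 can be illustrated with a small simulation. The following is a minimal sketch in Python, not the course's JSE.R script: it assumes a single observation y ~ N(mu, I_p) with known unit variance and the plain (not positive-part) James-Stein estimator shrinking toward the origin. The function names `james_stein` and `compare_risk` are hypothetical, chosen for illustration only.

```python
import numpy as np

def james_stein(y, sigma2=1.0):
    """James-Stein estimate of the mean, shrinking y toward the origin.

    Assumes y ~ N(mu, sigma2 * I_p) with p >= 3; rows of a 2-D array are
    treated as independent observations.
    """
    p = y.shape[-1]
    norm2 = np.sum(y ** 2, axis=-1, keepdims=True)
    return (1.0 - (p - 2) * sigma2 / norm2) * y

def compare_risk(mu, n_trials=2000, seed=0):
    """Monte Carlo estimate of squared-error risk for the MLE and for JSE."""
    rng = np.random.default_rng(seed)
    # One noisy observation of mu per trial; the MLE is the observation itself.
    y = mu + rng.standard_normal((n_trials, mu.size))
    mle_risk = np.mean(np.sum((y - mu) ** 2, axis=1))
    jse_risk = np.mean(np.sum((james_stein(y) - mu) ** 2, axis=1))
    return mle_risk, jse_risk

if __name__ == "__main__":
    mle_risk, jse_risk = compare_risk(np.zeros(10))
    print(f"MLE risk: {mle_risk:.3f}, James-Stein risk: {jse_risk:.3f}")
```

Under these assumptions the MLE risk is close to the dimension p, and the James-Stein risk is smaller for every mu when p >= 3, with the largest gap when mu is near the shrinkage target.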
25/09/2017, Mon Lecture 08: Robust PCA and Sparse PCA (Chap 4: 1-5)[ new update on Sep 27, 2017 ]
Y.Y.
29/09/2017, Fri Lecture 09: MDS with Uncertainty -- Sensor Network Localization (Chap 4: 6-)[ new update on Sep 27, 2017 ]
    [Reference]:
  • Sensor Network Localization in Matlab: [ SNLSDP ]
Y.Y.
06/10/2017, Fri Lecture 10: Manifold Learning I: ISOMAP and LLE (Chap 5: 1-2)[ slides ]
    [Reference]:
  • [ISOMAP]: Tenenbaum's website on the Science paper, with datasets;
  • [LLE]: Roweis' website on the Science paper;
    [Matlab]:
  • IsomapR1 : isomap codes by Tenenbaum, de Silva (isomapII.m with sparsity, fast mex with dijkstra.cpp and fibheap.h)
  • lle.m : lle with k-nearest neighbors
  • kcenter.m : k-center algorithm to find 'landmarks' in a metric space
Y.Y.
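
Landmark selection in a metric space, as used by landmark variants of ISOMAP, can be sketched as follows. This is a minimal Python sketch of the greedy farthest-first traversal, a standard 2-approximation for the k-center problem. It is an assumption that kcenter.m follows the same strategy, and the function name `k_center_landmarks` is hypothetical.

```python
import numpy as np

def k_center_landmarks(D, k, first=0):
    """Pick k landmarks by greedy farthest-first traversal.

    D is an (n, n) matrix of pairwise distances. Returns the landmark indices
    and the covering radius (largest distance from any point to its nearest
    landmark).
    """
    landmarks = [first]
    # Distance from every point to its nearest landmark chosen so far.
    min_dist = D[first].copy()
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))  # point farthest from current landmarks
        landmarks.append(nxt)
        min_dist = np.minimum(min_dist, D[nxt])
    return landmarks, float(min_dist.max())

if __name__ == "__main__":
    x = np.arange(11, dtype=float)        # 11 points on a line
    D = np.abs(x[:, None] - x[None, :])   # pairwise distances
    print(k_center_landmarks(D, 3))
```

The choice of the first landmark is arbitrary here (index 0); a random starting point is an equally common convention.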
09/10/2017, Mon Lecture 11: Manifold Learning II: Extended LLEs (Chap 5: 3-6) [ new update on Oct 9, 2017 ][ slides ]
    [Reference]:
  • Mikhail Belkin & Partha Niyogi, Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering, Advances in Neural Information Processing Systems 14, 2001, pp. 585-591, MIT Press [nips link]
  • Zhang, Z. & Wang, J. MLLE: Modified Locally Linear Embedding Using Multiple Weights. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.70.382
  • Donoho, D. & Grimes, C. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proc Natl Acad Sci U S A. 100:5591 (2003). doi: 10.1073/pnas.1031596100
  • Zhang, Z. & Zha, H. (2005) Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM Journal on Scientific Computing. 26 (1): 313-338. doi:10.1137/s1064827502419154
Y.Y.
13/10/2017, Fri Lecture 12: Diffusion Map and Stochastic Neighbour Embedding [ slides ]
    [Matlab]
  • Matlab code to compare manifold learning algorithms [ mani.m ] : PCA, MDS, ISOMAP, LLE, Hessian LLE, LTSA, Laplacian, Diffusion (no SNE!)
Y.Y.
16/10/2017, Mon Lecture 13: Random Walk on Graphs: Perron-Frobenius Theory vs. PageRank and Fiedler Theory
    [Project 1 Report Repository]
  • GitHub Repository for reports of Project 1 [ GitHub ]
Y.Y.
20/10/2017, Fri Lecture 14: Random Walk on Graphs: Cheeger Inequality, Lumpability and Transition Path Theory etc.
Y.Y.
23/10/2017, Mon Lecture 15: Poster workshop on Project 1.
    [ Reference ]
  • Doodle voting for top 3 posters [ link ]
  • GitHub Repository of report collection for Project 1 [ GitHub ]
Jinshan ZENG
Haixia LIU
27/10/2017, Fri Lecture 16: Stochastic semidefinite optimization via low-rank factorization: algorithms, theory and applications [ slides ]
Jinshan ZENG
30/10/2017, Mon Lecture 17: Restricted Boltzmann Machine and Deep Belief Nets [ slides ]
Zhen LI
3/11/2017, Fri Lecture 18: Summary, Supervised PCA as Sufficient Dimensionality Reduction [ Chapter 2:2 in new update on Nov 3rd, 2017 ], and Project 2
Y.Y.
6/11/2017, Mon Lecture 19: Introduction to Topological Data Analysis I - Simplicial Complexes, Nerve, Reeb Graph, and Mapper [ slides ]
    [Reference]:
  • Topological Methods for Exploring Low-density States in Biomolecular Folding Pathways.
    Yuan Yao, Jian Sun, Xuhui Huang, Gregory Bowman, Gurjeet Singh, Michael Lesnick, Vijay Pande, Leonidas Guibas and Gunnar Carlsson.
    J. Chem. Phys. 130, 144115 (2009).
    [pdf][Online Publication][SimTK Link: Data and Mapper Matlab Codes] [Selected by Virtual Journal of Biological Physics Research, 04/15/2009].
  • Structural insight into RNA hairpin folding intermediates.
    Bowman, Gregory R., Xuhui Huang, Yuan Yao, Jian Sun, Gunnar Carlsson, Leonidas Guibas and Vijay Pande.
    Journal of the American Chemical Society, 2008, 130 (30): 9676-9678.
    [link]
  • Single-cell topological RNA-seq analysis reveals insights into cellular differentiation and development.
    Abbas H Rizvi, Pablo G Camara, Elena K Kandror, Thomas J Roberts, Ira Schieren, Tom Maniatis & Raul Rabadan.
    Nature Biotechnology. 2017 May. doi:10.1038/nbt.3854
  • Spatiotemporal genomic architecture informs precision oncology in glioblastoma.
    Lee JK, Wang J, Sa JK, Ladewig E, Lee HO, Lee IH, Kang HJ, Rosenbloom DS, Camara PG, Liu Z, van Nieuwenhuizen P, Jung SW, Choi SW, Kim J, Chen A, Kim KT, Shin S, Seo YJ, Oh JM, Shin YJ, Park CK, Kong DS, Seol HJ, Blumberg A, Lee JI, Iavarone A, Park WY, Rabadan R, Nam DH.
    Nat Genet. 2017 Apr. doi: 10.1038/ng.3806.
  • A Python Implementation of Mapper [ sakmapper ] in single cell data analysis.
  • Single Cell TDA [ scTDA ] with [ tutorial in html ]
    [Seminar]
  • Speaker: Prof. Raul RABADAN, Department of Systems Biology, Department of Biomedical Informatics, Center for Computational Biology & Bioinformatics, Columbia University
  • Title: Exploring biological dynamical processes using Topological Data Analysis applied to Single Cell Expression Data [ slides? ]
  • Time: 5:00-6:00pm
  • Venue: LTK (Lift 31/32)
  • Abstract: Transcriptional programs control cellular lineage commitment and differentiation during development. Understanding of cell fate has been advanced by studying single-cell RNA-sequencing (RNA-seq) but is limited by the assumptions of current analytic methods regarding the structure of data. We present single-cell topological data analysis (scTDA), an algorithm for topology-based computational analyses to study temporal, unbiased transcriptional regulation. Unlike other methods, scTDA is a nonlinear, model-independent, unsupervised statistical framework that can characterize transient cellular states. We applied scTDA to the analysis of murine embryonic stem cell (mESC) differentiation in vitro in response to inducers of motor neuron differentiation. scTDA resolved asynchrony and continuity in cellular identity over time and identified four transient states (pluripotent, precursor, progenitor, and fully differentiated cells) based on changes in stage-dependent combinations of transcription factors, RNA-binding proteins, and long noncoding RNAs (lncRNAs). scTDA can be applied to study asynchronous cellular responses to either developmental cues or environmental perturbations.
Y.Y.
10/11/2017, Fri Lecture 20: Introduction to Topological Data Analysis II: Persistent Homology [ slides ]
    [Reference]:
  • A Java package for persistent homology and barcodes: Javaplex Tutorial.
  • Guo-Wei Wei, Persistent Homology Analysis of Biomolecular Data, SIAM News 2017
  • Topological Data Analysis Generates High-Resolution, Genome-wide Maps of Human Recombination.
    Pablo G. Camara, Daniel I.S. Rosenbloom, Kevin J. Emmett, Arnold J. Levine, Raul Rabadan.
    Cell Systems. 2016 June. doi: 10.1016/j.cels.2016.05.008.
  • Topology of viral evolution.
    Chan JM, Carlsson G, Rabadan R.
    Proc Natl Acad Sci USA 2013 Oct 29. doi: 10.1073/pnas.1313480110.
  • Robert Ghrist's monograph on applied topology: Elementary Applied Topology
Y.Y.
17/11/2017, Fri Lecture 21: Applied Hodge Theory I: Social Choice, Crowdsourced Ranking, and Hodge Decomposition [ slides ]
Y.Y.
18/11/2017, Sat Lecture 22: Tutorial on Single Cell TDA [ pptx ]
Quanhua MU
20/11/2017, Mon Lecture 23: Applied Hodge Theory II: Hodge Decomposition for Pairwise Ranking and Game Theory [ slides ] and a Tutorial on Word Embedding [ MaoYe Report ]
    [Reference]:
  • Statistical Ranking and Combinatorial Hodge Theory.
    Xiaoye Jiang, Lek-Heng Lim, Yuan Yao and Yinyu Ye.
    Mathematical Programming, Volume 127, Number 1, Pages 203-244, 2011.
    [pdf][ arxiv.org/abs/0811.1067][ Matlab Codes]

  • Flows and Decompositions of Games: Harmonic and Potential Games
    Ozan Candogan, Ishai Menache, Asuman Ozdaglar, and Pablo A. Parrilo
    Mathematics of Operations Research, 36(3): 474 - 503, 2011
    [arXiv.org/abs/1005.2405][ doi:10.1287/moor.1110.0500 ]

  • HodgeRank on Random Graphs for Subjective Video Quality Assessment.
    Qianqian Xu, Qingming Huang, Tingting Jiang, Bowei Yan, Weisi Lin, and Yuan Yao.
    IEEE Transactions on Multimedia, 14(3):844-857, 2012
    [pdf][ Matlab codes in zip ]

  • Robust Evaluation for Quality of Experience in Crowdsourcing.
    Qianqian Xu, Jiechao Xiong, Qingming Huang, and Yuan Yao
    ACM Multimedia 2013.
    [pdf]

  • Online HodgeRank on Random Graphs for Crowdsourceable QoE Evaluation.
    Qianqian Xu, Jiechao Xiong, Qingming Huang, and Yuan Yao
    IEEE Transactions on Multimedia, 16(2):373-386, Feb. 2014.
    [pdf]

  • Analysis of Crowdsourced Sampling Strategies for HodgeRank with Sparse Random Graphs
    Braxton Osting, Jiechao Xiong, Qianqian Xu, and Yuan Yao
    Applied and Computational Harmonic Analysis, 41 (2): 540-560, 2016
    [ arXiv:1503.00164 ] [ ACHA online ] [Matlab codes to reproduce our results]

  • False Discovery Rate Control and Statistical Quality Assessment of Annotators in Crowdsourced Ranking
    Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Yuan Yao
    Proceedings of The 33rd International Conference on Machine Learning (ICML), New York, June 19-24, 2016.
    [ arXiv:1605.05860 ] [ pdf ] [ supplementary ]

  • Parsimonious Mixed-Effects HodgeRank for Crowdsourced Preference Aggregation
    Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Yuan Yao
    ACM Multimedia Conference (ACMMM), Amsterdam, Netherlands, October 15-19, 2016.
    [ arXiv:1607.03401 ] [ pdf ]

  • HodgeRank with Information Maximization for Crowdsourced Pairwise Ranking Aggregation
    Qianqian Xu, Jiechao Xiong, Xi Chen, Qingming Huang, Yuan Yao
    to appear in AAAI, 2018.
    [ arXiv:1711.05957 ] [ Matlab Source Codes ]

Y.Y.
Hongyu MAO
24/11/2017, Fri Lecture 24: Clustering Methods [ slides in pdf ] and Presentation on Bayesian Deep Learning
Y.Y.
Yushi Ye
27/11/2017, Mon Lecture 25: Final Project [ pdf ]
Y.Y.

Datasets (to-be-updated)


by YAO, Yuan.