Sunday, February 24, 2019

Recommended ML resources from Sam.  

  1. t-SNE: An easy way to do 2-D visualization of high-dimensional data.  One of the big problems with high-dimensional data is visualizing it to see whether there is any structure.  To tell whether there is clustering, we can sometimes run hierarchical clustering and see which split level (via a scree plot) gives a good improvement in cluster purity.  If we want to visualize in 2-D, our first step is often PCA or some other dimension-reduction procedure.  t-SNE is an easy way to embed high-dimensional data in 2-D.  It has been shown to do a good job of preserving local neighborhoods when plotting points.  The catch is that while points that are close to each other in the high-dimensional space are plotted close together in 2-D, points that are far apart in the high-dimensional space may be plotted either close together or far apart in 2-D.  It is also fast, even for large datasets, and has implementations in the major programming languages.  Here is a link from the creator with a Q&A and a link to a video: https://lvdmaaten.github.io/tsne/
  2. ML video lectures: These ML video lectures (https://work.caltech.edu/lectures.html) are very popular on YouTube, and I think they are aimed at a more advanced, graduate-level audience than, say, Andrew Ng's Coursera videos.  They are more mathematical, but the lecturer does a good job of explaining the material.  I found lecture 15, on kernel methods, particularly useful.
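As a rough illustration of the t-SNE workflow in item 1 (reduce with PCA first, then embed in 2-D), here is a minimal sketch. It assumes Python with NumPy and scikit-learn installed. The helper name `embed_2d`, the choice of 50 PCA components, the perplexity of 30 and the synthetic three-cluster dataset are illustrative assumptions, not settings recommended by the t-SNE author.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def embed_2d(X, n_pca=50, perplexity=30.0, seed=0):
    """Hypothetical helper: PCA down to at most n_pca dims, then t-SNE to 2-D."""
    # PCA cannot keep more components than min(n_samples, n_features).
    n_pca = min(n_pca, X.shape[0], X.shape[1])
    X_reduced = PCA(n_components=n_pca, random_state=seed).fit_transform(X)
    # Perplexity roughly sets the effective neighborhood size; it must be < n_samples.
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(X_reduced)


# Synthetic toy data (an assumption for illustration): three Gaussian clusters in 50-D.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 50))
labels = np.repeat(np.arange(3), 100)
X = centers[labels] + rng.normal(size=(300, 50))

Y = embed_2d(X)  # one 2-D point per row of X, ready to scatter-plot colored by label
print(Y.shape)
```

Consistent with the caveat above, the clusters should show up as separate groups in the plot, but the distances *between* the groups in `Y` should not be read as meaningful high-dimensional distances.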
--Sam
