Monday, August 13, 2018
ML crash directory
Are you familiar with regression? See https://m.youtube.com/watch?v=aq8VU5KLmkY (0:00-5:59). One way to view ML is as regression on steroids... which means a harder optimization problem (one that does not have a closed-form analytic solution and/or is not convex) with many parameters.
Let's consider supervised learning first. You are given n labeled data points, (x1, y1), ..., (xn, yn). Your objective is to find a function f(x) = y that best predicts y on a new batch of x's. When y is continuous the problem is called regression, and when it is discrete it is called classification.
There are two things to notice right away:
1. To solve this, an optimization problem is defined, e.g., minimization of the squared error in our original regression problem.
2. Trying to explain the given data completely (fitting the training set exactly) is actually a pitfall: you may capture random trends, and your predictive power may be hindered. This is called overfitting. Both points are illustrated in the sketch below.
The basic intuition underlying many approaches to the classification problem is this: had we known p(x, y), then given a new x we could calculate p(x, y) for each y and choose the y with the greatest probability. The difficulty is that it is not easy to estimate p(x, y).
View reference 2 below up to 8:15.
To estimate p(x y) we could proceed as follows. Recall that p(x y) = p(x |y)p(y) = p(y|x)p(x). Thus, estimating p(x), p(y) and p(x|y) from the training data will let us estimate p(x y) and p(y|x) and thus decide given a new x its class y. A simplifying independence assumption leads to the naive Bayes approach that is intuitively covered in the first part of Ariel Kleiner's crash course on ML at http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/ariel-kleiner-ampcamp-2012-machine-learning-part-1.pdf. (up to slide 25). This is an instance of what is referred to as generative models.
Yet another approach is to define an optimization that attempts to maximize performance on the training data while keeping f(x) simple. This is done in a variety of ways; one standard example is sketched below.
To deep dive on ML concepts, see reference 3 below. Iterate between reference 3 and a simple ML tutorial in Python or R to master the subject.
References
1. An introduction for programmers on why ML is useful to master: https://m.youtube.com/watch?v=0mK52UsOj-U. Notice that this introduction ignores the challenges of applying ML where it excels and of dealing with drift.
2. A nice overview that starts with classification: https://m.youtube.com/watch?v=z-EtmaFJieY. The only thing to be careful of is the claim that neural networks are not statistical models; estimating a neural network's performance should be done using the same standard statistical tools, e.g., cross-validation.
3. An intuitive deep dive into the concepts of machine learning is given by Hal Daumé III at http://ciml.info/dl/v0_8/ciml-v0_8-all.pdf
Sunday, July 29, 2018
We'll discuss latent variables and the intuition of the EM algorithm: https://drive.google.com/open?id=1PcESzNbZzzMej1nA0o5CHRzzGKTw2IAd
Sunday, July 22, 2018
We'll cover LDA in this week's meeting. Here are the slides: https://drive.google.com/open?id=1KRoCA4vo9H9oJOl3iD-qRqIHl9qQq9vf
This is part of our deep dive into generative models, which will eventually loop us back to BN but will also shed light on GAN approaches. Here is some background and relevant resources:
Generative models
Under the generative model approach we attempt to model the joint distribution p(x, y). Given x, we apply Bayes' rule to our model and classify x as the y for which p(y | x) is largest.
A straightforward application of Bayes' rule is to attempt to estimate the probabilities in p(y | x) p(x) = p(x | y) p(y). With the typically large number of dimensions of the vector x, density estimation of the required quantities is really hard. See the first 30 minutes of https://m.youtube.com/watch?v=_m7TMkzZzus for details.
As modeling the joint distribution p(x, y) is hard, simplifying assumptions are introduced, leading to different, more concrete classification techniques.
LDA
LDA models each p(x | y) as a Gaussian distribution. This StatQuest video describes how the class means and scatter are used to choose a new axis that maximizes the separation between the classes over the training set: https://m.youtube.com/watch?v=azXCzI57Yfc
The second 30 minutes of this lecture derives LDA and explains what happens when the covariance matrices of all classes are the identity I: https://m.youtube.com/watch?v=_m7TMkzZzus
The estimation of the covariance matrix of a random vector is explained in detail here: https://en.m.wikipedia.org/wiki/Estimation_of_covariance_matrices
See chapter 24 of the Understanding Machine Learning book for broader coverage of generative methods: https://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf
The background required for the Gaussian distribution and the covariance matrix is covered here: http://cs229.stanford.edu/section/gaussians.pdf