Sunday, July 29, 2018
Sunday, July 22, 2018
We'll cover LDA in tw's meeting. Here is the slide  https://drive.google.com/open?id=1KRoCA4vo9H9oJOl3iDqRqIHl9qQq9vf
This is part of our deep dive into generative models, which will eventually loop us back to BN but will also shed light on GAN approaches. Here is some background and relevant resources.
Generative models
Under the generative model approach we attempt to model the joint
distribution p(x, y). Given x, we apply Bayes' rule to our model and
classify x as the y for which p(y | x) is largest.
A straightforward application of Bayes' rule is to attempt to estimate
the probabilities that appear in the rule p(y | x) p(x) = p(x | y) p(y).
With the typically large number of dimensions of the vector x, density
estimation of the required quantities is really hard. See the first 30 mins
of https://m.youtube.com/watch?v=_m7TMkzZzus#fauxfullscreen for
details.
As modeling the joint distribution p(x, y) is hard, simplifying assumptions
are introduced, leading to different, more concrete classification
techniques.
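To make the generative recipe concrete, here is a minimal sketch for a case where x is discrete and low-dimensional, so the joint distribution can be estimated by simple counting. The toy data set and the names `joint`, `posterior` and `classify` are made up for illustration; they are not taken from the lecture.

```python
from collections import Counter

# Hypothetical toy data: x is a weather observation, y is the label.
data = [
    ("sunny", "play"), ("sunny", "play"), ("sunny", "stay"),
    ("rainy", "stay"), ("rainy", "stay"), ("rainy", "play"),
    ("rainy", "stay"),
]

joint_counts = Counter(data)  # number of times each (x, y) pair was seen
n = len(data)


def joint(x, y):
    """Empirical estimate of p(x, y)."""
    return joint_counts[(x, y)] / n


def posterior(y, x, labels=("play", "stay")):
    """p(y | x) = p(x, y) / p(x), where p(x) is the sum of p(x, y) over y."""
    p_x = sum(joint(x, label) for label in labels)
    return joint(x, y) / p_x


def classify(x, labels=("play", "stay")):
    """Return the y with the largest p(y | x).

    p(x) is the same for every y, so maximizing p(x, y) is enough.
    """
    return max(labels, key=lambda y: joint(x, y))


print(classify("sunny"), posterior("play", "sunny"))
```

Counting works here only because x takes two values. With a high-dimensional x most (x, y) combinations are never observed, which is the estimation difficulty described above.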
LDA
LDA models each p(x | y) as a Gaussian distribution. This StatQuest
video describes how the mean and standard deviation of the distributions are
chosen to maximize the separation between the classes over the training set https://m.youtube.com/watch?v=azXCzI57Yfc
The second 30 mins of this lecture derive LDA and explain what happens if
the covariance matrices of all classes are I https://m.youtube.com/watch?v=_m7TMkzZzus#
Here the estimation of the covariance matrix of a random vector is
explained in detail https://en.m.wikipedia.org/wiki/Estimation_of_covariance_matrices
See chapter 24 of the Understanding Machine Learning book for a broader coverage of
generative models https://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/understandingmachinelearningtheoryalgorithms.pdf
The background required for the Gaussian distribution and the covariance
matrix is covered here http://cs229.stanford.edu/section/gaussians.pdf
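As a rough sketch of how the pieces fit together, the code below fits one Gaussian per class and classifies with the resulting linear discriminant. It assumes the textbook form of LDA in which all classes share one pooled covariance matrix; the function names `fit_lda` and `predict_lda` and the synthetic data are illustrative only.

```python
import numpy as np


def fit_lda(X, y):
    """Estimate class priors, class means and a pooled covariance matrix."""
    classes = np.unique(y)
    n, d = X.shape
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    cov = np.zeros((d, d))
    for c, mu in zip(classes, means):
        diff = X[y == c] - mu
        cov += diff.T @ diff
    cov /= n - len(classes)  # unbiased pooled estimate
    return classes, priors, means, cov


def predict_lda(X, classes, priors, means, cov):
    """Pick the class k maximizing
    x^T S^-1 mu_k - 0.5 * mu_k^T S^-1 mu_k + log(prior_k),
    which is log p(x | y = k) p(y = k) up to terms that do not depend on k."""
    cov_inv = np.linalg.inv(cov)
    linear = X @ cov_inv @ means.T
    offset = -0.5 * np.sum(means @ cov_inv * means, axis=1) + np.log(priors)
    return classes[np.argmax(linear + offset, axis=1)]


# Synthetic two-class data (an assumption made for this demo).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], size=(50, 2)),
    rng.normal(loc=[4.0, 4.0], size=(50, 2)),
])
y = np.array([0] * 50 + [1] * 50)

params = fit_lda(X, y)
pred = predict_lda(np.array([[0.0, 0.0], [4.0, 4.0]]), *params)
train_accuracy = np.mean(predict_lda(X, *params) == y)
```

When the shared covariance is I and the priors are equal, the rule reduces to assigning x to the class with the nearest mean.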
Tuesday, July 3, 2018
AI is fundamentally concerned with the creation of higher, more abstract representations of the world from simpler representations, automatically by a machine. Ideally, such representations are required to be associated with statistical guarantees of their correctness.
Previous attempts to this end identified homomorphisms in algebraic structures as a fundamental tool for abstraction. Early AI attempts applied them to solve simple board games by abstracting the board states. In addition, more recent advances in image processing suggest that symmetries in groups are a good way to capture abstraction by ignoring unimportant changes to the image (https://www.microsoft.com/enus/research/video/symmetrybasedlearning/). More concretely, we say s is a symmetry of f if f(s(x)) = f(x) for every x. These two notions together suggest focusing on groups augmented with a probability measure to study the question of automatic abstraction.
We thus focus next on representation, symmetry, and homomorphism in groups.
https://m.youtube.com/watch?v=qpGDNKgfHHg# is a nice introduction to the concept of group representations with examples.
For any set X, the set of all one-to-one and onto functions f : X -> X with the composition operation forms a group. As mentioned above, a symmetry of f is an s : X -> X such that f(s(x)) = f(x). The first half of https://m.youtube.com/watch?v=MVoxtgVCo5g# by Alex Flournoy (up to ~32) motivates symmetries over transformations f : X -> X and introduces some relevant language such as continuous, discrete, infinite, compact, local and global symmetries. The associated lecture notes are here https://inside.mines.edu/~aflourno/Particle/Lecture2Groups%20and%20Representations.pdf.
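Both statements can be checked by brute force on a small set. The sketch below enumerates all bijections of X = {0, 1, 2}, verifies closure and inverses under composition, and then lists the symmetries of a made-up function f. The particular f and the helper names are assumptions for illustration.

```python
from itertools import permutations

X = (0, 1, 2)

# Every bijection X -> X, stored as a tuple p where p[i] is the image of i.
bijections = list(permutations(X))
identity = tuple(X)


def compose(p, q):
    """Return the bijection i -> p[q[i]] (apply q first, then p)."""
    return tuple(p[q[i]] for i in X)


def inverse(p):
    """Return the bijection that undoes p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


# Group properties, checked exhaustively.
closed = all(compose(p, q) in bijections for p in bijections for q in bijections)
has_inverses = all(compose(p, inverse(p)) == identity for p in bijections)


def f(x):
    """Hypothetical target function: it cannot tell 0 and 1 apart."""
    return "low" if x in (0, 1) else "high"


def is_symmetry(s, func):
    """s is a symmetry of func if func(s(x)) == func(x) for every x in X."""
    return all(func(s[x]) == func(x) for x in X)


symmetries = [s for s in bijections if is_symmetry(s, f)]
print(closed, has_inverses, symmetries)
```

For this f only the identity and the swap of 0 and 1 pass the check, and those two bijections form a group of their own inside the full group of six.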
Some highlights from the symmetry learning work by Pedro Domingos et al https://www.microsoft.com/enus/research/video/symmetrybasedlearning/
1. Symmetries are changes in the data, obtained by group operations such as rotating a chair, that you want the classifier to be invariant under.
2. Symmetries may reduce the number of features, so we can learn with less data and still achieve the golden ratios between the number of features and the size of the training set.
3. Symmetries may reduce a search space.
4. The approach does not depend on the ML method being used.
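One simple way to see the reduction in points 2 and 3 is to map every input to a canonical representative of its orbit before learning. The sketch below does this for 2x2 binary images under 90-degree rotations; the choice of images, group and canonical form is an illustrative assumption, not the method used in the talk.

```python
from itertools import product

import numpy as np


def rotations(img):
    """The orbit of img under rotations by 0, 90, 180 and 270 degrees."""
    return [np.rot90(img, k) for k in range(4)]


def canonical(img):
    """One fixed representative per orbit: the lexicographically smallest rotation."""
    return min(tuple(r.flatten().tolist()) for r in rotations(img))


# All 16 binary images of size 2x2.
images = [np.array(bits).reshape(2, 2) for bits in product([0, 1], repeat=4)]

# Distinct inputs left once rotations are factored out.
orbits = {canonical(img) for img in images}
print(len(images), len(orbits))
```

A classifier that only ever sees `canonical(img)` is rotation invariant by construction and has fewer distinct inputs to learn from, whatever learning method sits behind it.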
Study group meeting slide 
https://drive.google.com/file/d/1SSDcrvE5uCM6J9xsRI2lvwhtVb61S7XP/view?usp=sharing.
YouTube recordings (in Hebrew) of the ML study group meetings
https://urldefense.proofpoint.com/v2/url?u=https3A__www.youtube.com_playlist3Flist3DPLRPue8gCw662D8mizHl7s0ZQzATzdL8FZJ&d=DwIFAg&c=jf_iaSHvJObTbxsiA1ZOg&r=y_b69HJLyjn8lFwzYVfyxol578OEO4exeFDpgGN6MoQ&m=sxiQbE7RbDZBIVy4IS1MeIYDRpGBYK2rg4rlQDFzFQ&s=O8k45AgwGeBLm4lf2K25pZtg5uNVNWE8V7WSv4dlOhY&e=
Some related papers follow.
1. An algebraic abstraction approach to reinforcement learning is given here http://www.cse.iitm.ac.in/~ravi/papers/WALS03.pdf
2. Here is an approximate homomorphism approach https://web.eecs.umich.edu/~baveja/Papers/fp543jiang.pdf
3. Symmetry-based semantic meaning uses the concept of an orbit in a group to represent a set of paraphrases that implicitly defines the semantics of a sentence https://homes.cs.washington.edu/~pedrod/papers/sp14.pdf.
4. Work on deep symmetry network https://homes.cs.washington.edu/~pedrod/papers/nips14.pdf
Monday, July 2, 2018
In today's study group, we'll discuss group representation in the context of learning  https://drive.google.com/file/d/1SSDcrvE5uCM6J9xsRI2lvwhtVb61S7XP/view?usp=sharing.
Sunday, July 1, 2018
ML crash directory
Are you familiar with regression https://m.youtube.com/watch?v=aq8VU5KLmkY? One way to view ML is as regression on steroids, which means a harder optimization problem (one that does not have a closed-form analytic solution and/or is not convex) with many parameters.
Let's consider supervised learning first. You are given n labeled data points,
(x1, y1), ..., (xn, yn). Your objective is to find a function f(x) = y that best predicts y on a new batch of x's. When y is continuous it is called regression, and when it's discrete it is called classification.
There are two things to notice right away
1. To solve this, an optimization problem is defined, e.g., minimizing the squared error in our original regression problem.
2. Trying to explain the given data completely, which is sometimes called interpolation, is actually a pitfall: you may capture random trends and your prediction power may be hindered. This is called overfitting.
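Both points can be seen in a small experiment. The sketch below fits polynomials of degree 1 and degree 9 to a dozen noisy points by least squares and compares the error on the training points with the error on fresh points. The data-generating line, noise level and sample sizes are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample(n):
    """Noisy observations of the (assumed) true relation y = 2x."""
    x = rng.uniform(-1, 1, size=n)
    y = 2.0 * x + rng.normal(scale=0.3, size=n)
    return x, y


x_train, y_train = sample(12)
x_test, y_test = sample(200)


def mse(coeffs, x, y):
    """Mean squared error of the polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


results = {}
for degree in (1, 9):
    # np.polyfit solves the least squares problem for the coefficients.
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = {
        "train": mse(coeffs, x_train, y_train),
        "test": mse(coeffs, x_test, y_test),
    }

for degree, errors in results.items():
    print(degree, errors)
```

The degree 9 fit can never do worse than the line on the training points, because it contains the line as a special case. Its error on the fresh points is typically much larger, which is the overfitting described in point 2.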
The basic intuition underlying many approaches to the classification problem is that, had we known p(x, y), then given a new x we would calculate p(x, y) for each y and choose the y with the greatest probability. The difficulty is that it is not easy to estimate p(x, y).
A simplifying independence assumption leads to the naive Bayes approach that is intuitively covered in the first part of Ariel Kleiner's crash course on ML at http://ampcamp.berkeley.edu/wpcontent/uploads/2012/06/arielkleinerampcamp2012machinelearningpart1.pdf.
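As a minimal sketch of the idea, the code below implements a Bernoulli naive Bayes classifier: given the class, every binary feature is assumed independent of the others, so p(x | y) factors into one probability per feature. The toy data, the vocabulary and the Laplace smoothing constant `alpha` are assumptions for illustration and are not taken from the slides.

```python
import numpy as np

# Hypothetical toy data: rows are messages, columns are word-presence flags
# for the made-up vocabulary ["offer", "meeting", "free"]; label 1 = spam.
X = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 0],
])
y = np.array([1, 1, 1, 0, 0, 0])


def fit_naive_bayes(X, y, alpha=1.0):
    """Estimate log p(y) and theta[k, j] = p(x_j = 1 | y = classes[k])."""
    classes = np.unique(y)
    log_prior = np.log(np.array([np.mean(y == c) for c in classes]))
    # Laplace smoothing keeps every probability strictly between 0 and 1.
    theta = np.array([
        (X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
        for c in classes
    ])
    return classes, log_prior, theta


def predict_naive_bayes(X, classes, log_prior, theta):
    """Choose the class maximizing log p(y) + sum_j log p(x_j | y)."""
    log_lik = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
    return classes[np.argmax(log_prior + log_lik, axis=1)]


model = fit_naive_bayes(X, y)
new_messages = np.array([[1, 0, 1], [0, 1, 0]])
predictions = predict_naive_bayes(new_messages, *model)
```

With three features the model needs only three probabilities per class, instead of one probability for each of the eight possible feature vectors.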
Yet another approach is to define an optimization that attempts to maximize performance on the training data while keeping f(x) simple. This is done in a variety of ways.
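One common instance of this trade-off is ridge regression, used here only as an example: it minimizes the squared training error plus a penalty `lam` times the squared norm of the weights, so a larger `lam` favors a simpler f. The synthetic data and the value of `lam` below are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 10
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[:2] = [1.5, -2.0]  # only two features matter in this made-up example
y = X @ true_w + rng.normal(scale=0.5, size=n)


def ridge(X, y, lam):
    """Minimize ||X w - y||^2 + lam * ||w||^2 via the normal equations."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def train_error(w):
    """Mean squared error on the training data."""
    return float(np.mean((X @ w - y) ** 2))


w_plain = ridge(X, y, lam=0.0)   # ordinary least squares
w_ridge = ridge(X, y, lam=10.0)  # penalized fit

print(np.linalg.norm(w_plain), np.linalg.norm(w_ridge))
print(train_error(w_plain), train_error(w_ridge))
```

The penalized solution always has smaller weights and a training error at least as large as plain least squares. Whether it predicts better on new data depends on `lam`, which is usually chosen with held-out data.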
To deep dive on ML concepts, see reference three below. Iterate between reference three and a simple ML tutorial in Python or R to master the subject.
References
1. Introduction for programmers on why ML is useful to master 
https://m.youtube.com/watch?v=0mK52UsOjU
It ignores the challenges of applying ML where it excels and of dealing with drift.
2. Nice overview that starts with classification https://m.youtube.com/watch?v=zEtmaFJieY. The only thing to be careful of is the claim that neural networks are not statistical models. Estimating a neural network's performance should be done using the same standard statistical tools, e.g., cross-validation.
3. An intuitive deep dive on the concepts of machine learning is given by Hal Daumé III at http://ciml.info/dl/v0_8/cimlv0_8all.pdf