type: Post
Created: Jun 16, 2022 01:41 PM
category: Data Science
tags: Machine Learning
status: Published
TOC (organized by what / why / how):
- What is maximum likelihood estimate (MLE)?
- When is it used?
- The basic intuition behind using maximum likelihood to fit a logistic regression model
- What is its function name?
- Why MLE?
- Q1. Why MLE?
- Q2. Why is log better?
- How do we do this process mathematically?
- Example 1: Finding the mean given variance (Fleshman)
- Example 2: 55 heads out of 100 flips, find the MLE for the probability p of heads on a single toss (MIT)
- Example 3: 9 heads out of 13 flips, find the MLE for the probability p of heads on a single toss (didl)
- Example: ETC3250 (Brendi)
- Math
- Shorten the log likelihood (ETC3250)
- Reference
What is maximum likelihood estimate (MLE)?
From (MIT):
- There are many methods for estimating unknown parameters from data; MLE is one of them.
- It answers the question: for which parameter value does the observed data have the biggest probability?
- It is a point estimate because it gives a single value for the unknown parameter.
When is it used?
- When working with a probabilistic model with unknown parameters, the parameter values that make the data have the highest probability are the most likely ones. (didl)
The basic intuition behind using maximum likelihood to fit a logistic regression model:
- We try to find $\hat{\beta}_0$ and $\hat{\beta}_1$ such that plugging these estimates into the model for $p(X)$ yields a number close to one for all individuals who defaulted, and a number close to zero for all individuals who did not. (statl)
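The likelihood these estimates maximize can be written out explicitly. A sketch in statl's notation, where $p(x_i)$ is the modeled probability that observation $i$ belongs to the positive class:

$$
\ell(\beta_0, \beta_1) \;=\; \prod_{i:\, y_i = 1} p(x_i) \prod_{i':\, y_{i'} = 0} \bigl(1 - p(x_{i'})\bigr)
$$

The estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are chosen to make this product as large as possible.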
What is its function name?
- It is called the maximum likelihood estimator. (ETC2420 Lecture week 4)
Why MLE?
Q1. Why MLE?
- Simple to compute. (MIT)
- Intuitively appealing:
    - We try to find the value of the parameter that would have most likely produced the data we in fact observed. (Libretexts)
    - We find the maximum of that function, i.e. the parameters which are most likely to have produced the observed data. (Fleshman)
- Better than the (non-linear) least squares method because it has better statistical properties. (statl)
- The least squares approach is in fact a special case of maximum likelihood. (statl)
Q2. Why is log better?
- It is often easier to work with the log likelihood. (MIT)
- Problem: (Fleshman)
    - Taking the product of many small numbers creates even smaller numbers, which computers can struggle to represent with finite precision.
- Solution: (Fleshman)
    - To alleviate these numerical issues (and for other conveniences mentioned later), we often work with the log of the likelihood function, aptly named the log-likelihood.
- Why does taking the log help? The log turns the product of probabilities into a sum, which avoids numerical underflow, and since the log is monotonically increasing, maximizing the log-likelihood yields the same parameter values as maximizing the likelihood itself.
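A minimal sketch of the underflow problem (illustrative values of my own, not from any of the sources): the raw product of 1,100 probabilities of 0.5 underflows to zero in double precision, while the sum of their logs is perfectly well behaved.

```python
import numpy as np

# 1,100 i.i.d. observations, each with probability 0.5.
probs = np.full(1100, 0.5)

# Raw likelihood: 0.5**1100 is below the smallest representable
# double (~5e-324), so the product underflows to exactly 0.0.
likelihood = np.prod(probs)

# Log-likelihood: a modest, well-represented negative number.
log_likelihood = np.sum(np.log(probs))  # 1100 * ln(0.5) ~= -762.46

print(likelihood)       # 0.0
print(log_likelihood)   # -762.46...
```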
How do we do this process mathematically?
Both sources tell us to set the derivative to zero:
- We could find the MLE by finding the values of θ where the derivative is zero, and finding the one that gives the highest probability. (didl)
- If you’re familiar with calculus, you’ll know that you can find the maximum of a function by taking its derivative and setting it equal to 0. The derivative of a function represents the rate of change of the original function. (Fleshman)
There are 3 examples, 2 of which are about flipping coins, with the same intuition and logic but from different sources (Examples 2 and 3). The other is about finding the mean given a known variance (Example 1).
Example 1 : Finding the mean given variance (Fleshman)
Imagine we have some data generated from a Gaussian distribution with a variance of 4, but we don’t know the mean.
I like to think of MLE as taking the Gaussian, sliding it over all possible means, and choosing the mean which causes the model to fit the data best.
If you look at the log-likelihood curve (plotted in Fleshman's post), we see that
- initially it’s changing in the positive direction (moving up). It reaches a peak,
- and then it starts changing in a negative direction (moving down).
The key is that at the peak, the rate of change is 0.
So if we know the functional form of the derivative, we can set it equal to 0 and solve for the best parameters.
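A sketch of that computation for this example, assuming $n$ i.i.d. samples $x_1, \dots, x_n$ from $\mathcal{N}(\mu, 4)$ with unknown mean $\mu$:

$$
\ell(\mu) = -\frac{n}{2}\log(8\pi) - \frac{1}{8}\sum_{i=1}^{n}(x_i - \mu)^2,
\qquad
\frac{d\ell}{d\mu} = \frac{1}{4}\sum_{i=1}^{n}(x_i - \mu) = 0
\;\Rightarrow\;
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i
$$

So the MLE for the unknown mean is simply the sample mean.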
Example 2 : 55 heads out of 100 flips, find the MLE for the probability p of heads on a single toss. (MIT)
A coin is flipped 100 times. Given that there were 55 heads, find the maximum likelihood estimate for the probability p of heads on a single toss.
Definition of what we are finding: the maximum likelihood estimate is the value $\hat{p}$ of $p$ that maximizes the likelihood $P(\text{data} \mid p)$.
Step 1: understand the notation.
Step 2: knowing the definition, compute $\hat{p}$, which is the notation for the MLE.
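A worked version of that computation, following the MIT notes' approach (binomial likelihood, then log, then derivative set to zero):

$$
P(55 \text{ heads} \mid p) = \binom{100}{55} p^{55} (1-p)^{45}
$$

$$
\ln P = \ln\binom{100}{55} + 55 \ln p + 45 \ln(1-p),
\qquad
\frac{d}{dp} \ln P = \frac{55}{p} - \frac{45}{1-p} = 0
\;\Rightarrow\;
\hat{p} = \frac{55}{100} = 0.55
$$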
Example 3 : 9 heads out of 13 flips, find the MLE for the probability p of heads on a single toss. (didl)
Question in layman's terms:
I flipped 13 coins, and 9 came up heads; what is our best guess for the probability that the coin comes up heads?
Step 1 : understand the annotations and scenario
Assumption
- Suppose that we have a single parameter θ representing the probability that a coin flip is heads. Then the probability of getting a tails is 1−θ.
- If our observed data $X$ is a sequence with $n_H$ heads and $n_T$ tails, we can use the fact that independent probabilities multiply to see that $P(X \mid \theta) = \theta^{n_H}(1-\theta)^{n_T}$.
Scenario: here we flipped the coin 13 times and observed $n_H = 9$ heads and $n_T = 4$ tails, so $P(X \mid \theta) = \theta^{9}(1-\theta)^{4}$.
Step 2: Compute $\hat{\theta}$, which is the notation for the MLE.
RMB : We could find the MLE by finding the values of θ where the derivative is zero, and finding the one that gives the highest probability.
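Carrying that out for this scenario (a sketch using the likelihood above):

$$
\log P(X \mid \theta) = 9\log\theta + 4\log(1-\theta),
\qquad
\frac{d}{d\theta}\log P(X \mid \theta) = \frac{9}{\theta} - \frac{4}{1-\theta} = 0
\;\Rightarrow\;
\hat{\theta} = \frac{9}{13}
$$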
So the answer is 9/13, which makes sense.
Code can be found in the book.
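For reference, a minimal numerical sketch (not the book's exact code) that evaluates the likelihood on a grid of θ values and confirms the maximum sits at 9/13:

```python
import numpy as np

# Likelihood of observing 9 heads and 4 tails, as a function of theta.
theta = np.linspace(0.001, 0.999, 999)
likelihood = theta**9 * (1 - theta)**4

# The grid value maximizing the likelihood should be ~9/13 ~= 0.692.
theta_hat = theta[np.argmax(likelihood)]
print(theta_hat)
```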
Example: ETC3250 (Brendi)
Math
The general concept and its mathematical representation can be found in Fleshman's post. (Fleshman)
Shorten the log likelihood (ETC3250)
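The simplification in question is presumably the standard one: for $n$ independent observations, taking the log turns the likelihood's product into a sum, which is shorter to write and easier to differentiate:

$$
\ell(\theta) = \log L(\theta) = \log \prod_{i=1}^{n} f(x_i \mid \theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta)
$$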
Reference:
- didl: Zhang, A. (2021). Dive into Deep Learning. (Here)
- statl: James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics) (2nd ed.). Springer. (Here)
- MIT: Orloff, J., & Bloom, J. MLE Intro. (Here)
- Libretexts. (2020, August 10). 7.3: Maximum Likelihood. Statistics LibreTexts. https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/07%3A_Point_Estimation/7.03%3A_Maximum_Likelihood
- Fleshman, W. (2019, March 9). Fundamentals of Machine Learning (Part 2). Towards Data Science. https://towardsdatascience.com/maximum-likelihood-estimation-984af2dcfcac