An introduction to statistical inference
Dr. Abhay Pratap Pandey
University of Delhi
What is inference?
Inference defined:
• An everyday meaning…
We infer a conclusion based on evidence and reasoning
• A statistical meaning…
We infer a property of a population from a sample
Why inference?
The aim of inference is to determine the characteristics of a population
from a sample.
Population and sample
In statistical analysis, a population is a collection of all the
people, items, or events about which one wants to make
inferences; equivalently, any well-defined group of subjects, which could be
individuals, firms, cities, or many other possibilities
(for example, university students in India).
In statistical analysis, a sample is a subset of the population
(i.e. the people, items, or events) that one collects and
analyzes to make inferences. (For example, 200 randomly
chosen university students.)
Statistical sample - a subset of the population chosen to represent the
population in a statistical analysis; denoted as (X1, X2, ..., Xn).
Random sample - a sample of individuals chosen at random from the
population.
In the case of random sampling, the following techniques can be used:
Independent sampling (draw with replacement) - after each draw the
unit returns to the population.
Dependent sampling (draw without replacement) - after each draw the
unit does not return to the population (it no longer takes part in further
draws).
In statistical analysis, an observation is an element of the sample. (For
example, Helena, a student at Central University.)
Statistical inference, built on sampling, has two main branches:
estimation and testing of hypotheses.
Aim of statistical inference
The aim of statistical inference is to learn about the population using the observed
data.
This involves:
• computing something from the data (a statistic: a function of the data)
• interpreting the result in probabilistic terms (via the sampling distribution of the statistic)
Estimation
• Determination of the population parameter by the calculation of a
sample statistic…
A characteristic of the population is described by a parameter (e.g. the
population mean μ); the corresponding sample statistic (e.g. the sample
mean x̅) is used to estimate it.
A sampling distribution is a probability distribution of a statistic obtained
through a large number of samples drawn from a specific population.
Each sample drawn from the population yields its own value of the
statistic (x̅1, x̅2, x̅3, ...). Because estimates are not perfect, this
sample-to-sample uncertainty is exactly what the sampling distribution
describes.
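As a concrete illustration (an added sketch, not part of the original slides), the short Python simulation below draws repeated samples from a hypothetical population, records each sample mean, and shows that the sample means scatter around the population mean with spread roughly σ/√n.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical population: scores with true mean 70 and standard deviation 10
population_mean, population_sd = 70, 10

# Draw many samples of size n and record the mean of each sample
n_samples, n = 5000, 25
sample_means = np.array([
    rng.normal(population_mean, population_sd, size=n).mean()
    for _ in range(n_samples)
])

# The recorded means form the (simulated) sampling distribution of the mean:
# centred on the population mean, with spread about sigma / sqrt(n)
print("mean of the sample means:", round(sample_means.mean(), 2))      # ~70
print("SD of the sample means  :", round(sample_means.std(ddof=1), 2)) # ~10/sqrt(25) = 2
```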
Types of estimators in statistics
Estimator
An estimator is a statistic (a function of the data) that produces a guess of a
population parameter.
By the “best” estimator we usually mean one whose sampling distribution is more
concentrated about the population parameter value than that of other
estimators.
The two main types of estimators in statistics are
• Point estimators
• Interval estimators
Point estimation: Point estimators are functions used to find an
approximate value of a population parameter from random samples of the
population. They use the sample data to calculate a single value (a point
estimate, or statistic) that serves as the best estimate of the unknown
population parameter.
Ex. an average, some measure of variation, a minimum, a maximum, a quantile, etc.
• Interval estimation
Interval estimation uses sample data to calculate an interval of
plausible values for an unknown population parameter. The
interval is constructed so that it contains the parameter with a
chosen probability, typically 95% or higher; such an interval is
known as a confidence interval. The confidence interval is
used to indicate how reliable an estimate is, and it is
calculated from the observed data. The endpoints of the
interval are referred to as the lower and upper confidence
limits.
Properties of Point Estimators
• Unbiasedness
• Consistency
• Sufficiency
• Efficiency
Unbiasedness
An estimator of a given parameter is said to be unbiased if its expected
value is equal to the true value of the parameter.
The bias of a point estimator is defined as the difference between
the expected value of the estimator and the value of the parameter being
estimated: bias(θ̂) = E(θ̂) − θ. When E(θ̂) = θ, the bias is zero and the
estimator is unbiased.
Also, the closer the expected value of the estimator is to the value of the
parameter being estimated, the smaller the bias.
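To make this concrete, here is a small simulation (an added sketch, not from the slides) comparing the sample variance with divisor n, which is biased downward, to the usual divisor n − 1, which is unbiased.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
true_variance = 4.0      # population variance (sigma = 2)
n, reps = 10, 20000

biased, unbiased = [], []
for _ in range(reps):
    x = rng.normal(loc=0.0, scale=2.0, size=n)
    biased.append(np.var(x))             # divisor n:   E[estimate] = (n-1)/n * sigma^2
    unbiased.append(np.var(x, ddof=1))   # divisor n-1: E[estimate] = sigma^2

print("average of the biased estimator  :", round(np.mean(biased), 3))    # ~3.6
print("average of the unbiased estimator:", round(np.mean(unbiased), 3))  # ~4.0
```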
Consistency
Consistency tells us how close the point estimator stays to the value of
the parameter as the sample size increases: a consistent estimator
requires a large sample to become accurate. You can also check whether
a point estimator is consistent by looking at its expected value and
variance; for the estimator to be consistent, its expected value should
move toward the true value of the parameter (and its variance should
shrink) as the sample size grows.
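The sketch below (an added illustration with invented numbers) shows consistency in action: as the sample size n grows, the sample mean of a hypothetical exponential population both centres on the true mean and has a shrinking spread.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
true_mean = 5.0   # mean of a hypothetical exponential population

for n in (10, 100, 1000, 10000):
    estimates = [rng.exponential(scale=true_mean, size=n).mean()
                 for _ in range(1000)]
    # Both the average estimate and its spread improve as n increases
    print(f"n={n:6d}  average estimate={np.mean(estimates):.3f}  "
          f"SD of estimate={np.std(estimates):.3f}")
```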
Maximum likelihood estimator
The maximum likelihood method of point estimation finds the values of
the unknown parameters that maximize the likelihood function. It takes
a known model (a parametric family of distributions) and chooses the
parameter values under which the observed data are most probable.
For example, a researcher may be interested in knowing the average
weight of babies born prematurely. Since it would be impossible to
measure all babies born prematurely in the population, the researcher
can take a sample from one location. Since the weight of pre-term
babies follows a normal distribution, the researcher can use the
maximum likelihood estimator to find the average weight of the entire
population of pre-term babies based on the sample data.
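A minimal numerical sketch of this idea (added here; the birth-weight figures are invented for illustration) maximizes the normal log-likelihood with scipy and confirms that, for a normal model, the MLE of the mean is simply the sample mean.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(seed=4)
# Hypothetical sample of pre-term birth weights in kg, assumed normally distributed
weights = rng.normal(loc=2.4, scale=0.5, size=50)

def negative_log_likelihood(params, data):
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    # Negative of the normal log-likelihood: -sum_i log N(x_i | mu, sigma^2)
    return -np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (data - mu) ** 2 / (2 * sigma**2))

result = minimize(negative_log_likelihood, x0=[1.0, 1.0],
                  args=(weights,), method="Nelder-Mead")
mu_hat, sigma_hat = result.x
print("MLE of the mean :", round(mu_hat, 3), "(sample mean:", round(weights.mean(), 3), ")")
print("MLE of sigma    :", round(sigma_hat, 3))
```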
Method of moments
The method of moments for estimating parameters was introduced in
1887 by the Russian mathematician Pafnuty Chebyshev. It starts by taking
known facts about a population and then applying those facts to a sample
of the population. The first step is to derive equations that relate the
population moments to the unknown parameters.
The next step is to draw a sample from the population and use it to
estimate the population moments. The equations derived in step one
are then solved with the sample moments substituted for the population
moments. This yields estimates of the unknown population parameters.
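As an added sketch of these two steps (with simulated data), consider a gamma distribution, whose mean is shape × scale and whose variance is shape × scale²; solving those two moment equations with the sample moments in place of the population moments gives the estimates.

```python
import numpy as np

rng = np.random.default_rng(seed=5)
# Simulated data from a Gamma(shape=3, scale=2) population
data = rng.gamma(shape=3.0, scale=2.0, size=5000)

# Step 1 (population moments in terms of the parameters):
#   mean = shape * scale,  variance = shape * scale**2
# Step 2: substitute the sample moments and solve the equations
m1 = data.mean()
m2 = data.var(ddof=0)

scale_hat = m2 / m1          # variance / mean
shape_hat = m1 / scale_hat   # = mean**2 / variance

print("method-of-moments estimates: shape ≈", round(shape_hat, 2),
      ", scale ≈", round(scale_hat, 2))
```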
What is Confidence Interval?
A confidence interval is an interval estimate that may contain a
population parameter. The unknown population parameter is
estimated through a sample statistic calculated from the sampled data.
For example, the population mean μ is estimated using the sample mean x̅.
The interval is defined by its lower and upper bounds. The associated
confidence level is expressed as a percentage (the most frequently
quoted levels are 90%, 95%, and 99%).
The concept of the confidence interval is very important in statistics
(hypothesis testing) since it is used as a measure of uncertainty. The
concept was introduced by Polish mathematician and statistician, Jerzy
Neyman in 1937.
Confidence Interval
We can also quantify the uncertainty (sampling distribution) of our
point estimate.
One way of doing this is by constructing an interval that is likely to
contain the population parameter.
One such interval, which is computed on the basis of the data, is
called a confidence interval.
The sampling probability that the confidence interval will indeed
contain the parameter value is called the confidence level.
We construct confidence intervals for a given confidence level.
Interpretation of Confidence Interval
The proper interpretation of a confidence interval is probably the most
challenging aspect of this statistical concept. One example of the most
common interpretation of the concept is the following:
There is a 95% probability that, in the future, the true value of the
population parameter (e.g., the mean) will fall within the interval between
X [lower bound] and Y [upper bound].
In addition, we may interpret the confidence interval using the statement
below:
We are 95% confident that the interval between X [lower bound] and Y
[upper bound] contains the true value of the population parameter.
However, it would be inappropriate to state the following:
There is a 95% probability that the interval between X [lower bound] and
Y [upper bound] contains the true value of the population parameter.
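The "confidence" statement can be checked by simulation (an added sketch with made-up population values): when many samples are drawn and a 95% interval is built from each, roughly 95% of those intervals contain the true mean.

```python
import numpy as np

rng = np.random.default_rng(seed=6)
mu, sigma, n, reps = 50.0, 8.0, 30, 10000
z = 1.96  # z-score for a 95% confidence level

covered = 0
for _ in range(reps):
    sample = rng.normal(mu, sigma, size=n)
    half_width = z * sigma / np.sqrt(n)          # known-sigma interval
    if sample.mean() - half_width <= mu <= sample.mean() + half_width:
        covered += 1

print("proportion of intervals covering the true mean:", covered / reps)  # ≈ 0.95
```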
How to Calculate the Confidence Interval?
The interval is calculated using the following steps:
• Gather the sample data.
• Calculate the sample mean x̅.
• Determine whether a population’s standard deviation is known or
unknown.
• If a population’s standard deviation is known, we can use a z-score for
the corresponding confidence level.
• If a population's standard deviation is unknown, we can use a
t-statistic for the corresponding confidence level.
• Find the lower and upper bounds of the confidence interval using the
following formulas:
a. Known population standard deviation: x̅ ± Zα/2 · σ/√n
b. Unknown population standard deviation: x̅ ± tα/2, n−1 · s/√n (t with n − 1 degrees of freedom)
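The sketch below (added for illustration, with invented numbers) computes a 95% interval both ways using scipy: with a z-score when σ is treated as known, and with a t-statistic when only the sample standard deviation s is available.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)
sample = rng.normal(loc=100.0, scale=15.0, size=25)   # hypothetical data
n, x_bar, s = sample.size, sample.mean(), sample.std(ddof=1)
conf = 0.95

# a. Known population standard deviation (sigma = 15): z-interval
sigma = 15.0
z = stats.norm.ppf(1 - (1 - conf) / 2)                # 1.96
print(f"z-interval: ({x_bar - z * sigma / np.sqrt(n):.2f}, "
      f"{x_bar + z * sigma / np.sqrt(n):.2f})")

# b. Unknown population standard deviation: t-interval with n - 1 df
t = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)         # ≈ 2.064 for df = 24
print(f"t-interval: ({x_bar - t * s / np.sqrt(n):.2f}, "
      f"{x_bar + t * s / np.sqrt(n):.2f})")
```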
Examples
• Suppose we conduct a poll to try and get a sense of the outcome of an
upcoming election with two candidates. We poll 1000 people, and 550 of
them respond that they will vote for candidate A.
How confident can we be that a given person will cast their vote for
candidate A?
Sol.
1. Select our desired levels of confidence. We will use the 90%, 95%, and 99% levels.
2. Calculate α and α/2. Our α values are 0.1, 0.05, and 0.01 respectively;
our α/2 values are 0.05, 0.025, and 0.005.
3. Look up the corresponding z-scores. Our Zα/2 values are 1.645, 1.96,
and 2.58.
4. Multiply the z-score by the standard error to find the margin of error.
First we need to calculate the standard error.
5. Find the interval by adding and subtracting this product from the mean.
In this case, we are working with a distribution we have not previously
discussed: the vote is binomial (a voter chooses candidate A or B), and for a
sample this large the sample proportion is approximately normally distributed.
We have a probability estimate from our sample: the proportion of individuals
in our sample voting for candidate A was found to be 550/1000, or 0.55.
For such a distribution, the standard error can be estimated using
S.E. = √(p̂(1 − p̂)/n) = √(0.55 × 0.45/1000) ≈ 0.0157
• We can now multiply this value by the z-scores to calculate the
margins of error for each confidence level.
• We then add and subtract each margin of error from the estimate
(0.55 in this case) to find the bounds of our confidence intervals at
each level of confidence:
CI Zα/2 Margin of error Lower Bounds Upper Bounds
90% 1.645 0.026 0.524 0.576
95% 1.96 0.031 0.519 0.581
99% 2.58 0.041 0.509 0.591
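The table above can be reproduced with a few lines of Python (an added sketch; scipy is used only to look up the z-scores):

```python
import numpy as np
from scipy import stats

n, votes_for_a = 1000, 550
p_hat = votes_for_a / n                          # 0.55
se = np.sqrt(p_hat * (1 - p_hat) / n)            # ≈ 0.0157

for conf in (0.90, 0.95, 0.99):
    z = stats.norm.ppf(1 - (1 - conf) / 2)       # 1.645, 1.960, 2.576
    margin = z * se
    print(f"{conf:.0%} CI: {p_hat - margin:.3f} to {p_hat + margin:.3f} "
          f"(margin of error {margin:.3f})")
```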
What is Hypothesis Testing?
Hypothesis testing is a method of statistical inference. It is used to test
whether a statement regarding a population parameter is statistically
supported by the data. Hypothesis testing is a powerful tool for
evaluating predictions.
For example: a statistician might want to make a prediction about the
mean value customers would pay for a firm's product. He can then
formulate a hypothesis, for example, “The average value that
customers will pay for my product is larger than $5”. To test this
statement statistically, the firm owner could use hypothesis testing.
Hypothesis testing is formulated in terms of two hypotheses:
• H0: the null hypothesis;
• H1: the alternative hypothesis.
The hypothesis we want to test is whether H1 is “likely” true.
So, there are two possible outcomes:
• Reject H0 and accept H1 because of sufficient evidence in the sample
in favor of H1;
• Do not reject H0 because of insufficient evidence to support H1.
Null Hypothesis and Alternative Hypothesis
• Null Hypothesis
• Alternative Hypothesis
The Null Hypothesis is usually set as what we don’t want to be true. It is
the hypothesis to be tested. Therefore, the Null Hypothesis is considered
to be true, until we have sufficient evidence to reject it. If we reject the
null hypothesis, we are led to the alternative hypothesis.
Example of the business owner who is looking for some customer insight.
His null hypothesis would be:
H0: The average value customers are willing to pay for my product is
smaller than or equal to $5, or H0: µ ≤ 5 (where µ is the population mean).
The alternative hypothesis would then be what we are evaluating, so, in
this case, it would be:
Ha : The average value customers are willing to pay for the product is
greater than $5 or Ha : µ > 5
Type I and Type II Errors
A Type I Error arises when a true Null Hypothesis is rejected. The
probability of making a Type I Error is also known as the level of
significance of the test, which is commonly referred to as alpha (α). So,
for example, if a test has its alpha set at 0.01, there is a 1%
probability of rejecting a true null hypothesis, i.e. a 1% probability of
making a Type I Error.
A Type II Error arises when you fail to reject a False Null Hypothesis.
The probability of making a Type II Error is commonly denoted by the
Greek letter beta (β). β is used to define the Power of a Test, which is
the probability of correctly rejecting a false null hypothesis.
The Power of a Test is defined as 1-β. A test with more Power is more
desirable, as there is a lower probability of making a Type II Error.
However, there is a tradeoff between the probability of making a Type I
Error and the probability of making a Type II Error.
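A short sketch (added, with made-up numbers) makes this trade-off visible for a one-sided z-test: as α is lowered, the power 1 − β falls, so β, the Type II error probability, rises.

```python
import numpy as np
from scipy import stats

# Hypothetical setting: H0: mu = 5 vs H1: mu = 5.5, sigma = 1.5, sample size n = 30
mu0, mu1, sigma, n = 5.0, 5.5, 1.5, 30
se = sigma / np.sqrt(n)

for alpha in (0.10, 0.05, 0.01):
    critical_value = mu0 + stats.norm.ppf(1 - alpha) * se          # reject H0 if x-bar exceeds this
    power = 1 - stats.norm.cdf(critical_value, loc=mu1, scale=se)  # P(reject H0 | H1 true)
    print(f"alpha = {alpha:.2f}   power = {power:.3f}   beta = {1 - power:.3f}")
```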
Properties of hypothesis testing
• Significance level – the maximum probability of committing a Type I
error. This probability is symbolized by α:
P(Type I error | H0 is true) = α.
• Critical or Rejection Region – the range of values for the test value
that indicate a significant difference and that the null hypothesis
should be rejected.
• Non-critical or Non-rejection Region – the range of values for the test
value that indicates that the difference was probably due to chance
and that the null hypothesis should not be rejected.
Depending on the alternative hypothesis, the rejection region lies in the
right tail (a right-tailed, one-tail test), the left tail (a left-tailed test), or
in both tails (a two-tailed test).
Steps in hypothesis testing
Testing a hypothesis about the mean of a population
We have the following steps:
1. Data: determine the variable, the sample size (n), the sample mean (x̅),
and the population standard deviation (σ), or the sample standard
deviation (s) if σ is unknown.
2. Assumptions: we have two cases:
Case 1: Population is normally or approximately normally distributed
with known or unknown variance (sample size n may be small or large);
Case 2: Population is not normal, with known or unknown variance (n is
large, i.e. n ≥ 30).
3. Hypotheses: we have three cases
Case I: H0: μ = μ0 vs HA: μ ≠ μ0
e.g. we want to test that the population mean is different from 50
Case II: H0: μ = μ0 vs HA: μ > μ0
e.g. we want to test that the population mean is greater than 50
Case III: H0: μ = μ0 vs HA: μ < μ0
e.g. we want to test that the population mean is less than 50
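4. Test statistic: the statistic used for these cases (and in the worked example below) is the usual one,
Z = (x̅ − μ0) / (σ/√n),
which follows the standard normal distribution when H0 is true and σ is known (or n is large). If σ is unknown and the sample is small, s replaces σ and the t distribution with n − 1 degrees of freedom is used instead. The decision rule and the decision then follow, as illustrated in the example below.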
Example
• Researchers are interested in the mean age of a certain population.
• A random sample of 10 individuals drawn from the population of
interest has a mean of 27.
• Assuming that the population is approximately normally distributed
with variance 20, can we conclude that the mean is different from 30
years? (α = 0.05)
• If the p-value is 0.0340, how can we use it in making a decision?
Solution
1. Data: the variable is age, n = 10, x̅ = 27, σ² = 20, α = 0.05
2. Assumptions: the population is approximately normally distributed with
variance 20
3. Hypotheses:
• H0: μ = 30
• HA: μ ≠ 30
4. Test statistic:
• Z = (x̅ − μ0)/(σ/√n) = (27 − 30)/√(20/10) = −2.12
5. Decision rule
The alternative hypothesis is HA: μ ≠ 30.
Hence we reject H0 if Z > Z(1−0.025) = Z(0.975)
• or Z < −Z(1−0.025) = −Z(0.975)
• Z(0.975) = 1.96 (from table D)
6. Decision:
• We reject H0, since −2.12 is in the rejection region.
• We can conclude that μ is not equal to 30.
• Using the p-value, we note that p-value = 0.0340 < 0.05; therefore we
reject H0.
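The arithmetic in this solution can be verified with a short script (an added sketch using scipy for the normal quantiles):

```python
import numpy as np
from scipy import stats

x_bar, mu0, variance, n, alpha = 27.0, 30.0, 20.0, 10, 0.05

z = (x_bar - mu0) / np.sqrt(variance / n)     # (27 - 30) / sqrt(2) ≈ -2.12
critical = stats.norm.ppf(1 - alpha / 2)      # Z(0.975) = 1.96
p_value = 2 * stats.norm.cdf(-abs(z))         # two-sided p-value ≈ 0.034

print(f"z = {z:.2f}, critical value = ±{critical:.2f}, p-value = {p_value:.4f}")
print("reject H0" if abs(z) > critical else "do not reject H0")
```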
Thank you