Statistical Inference and Hypothesis Testing
by
Dr. Priyanka Dixit
TISS, Mumbai
Descriptive and Inferential Statistics
• Descriptive statistics is the term given to the analysis
of data that helps describe, show or summarize data
in a meaningful way so that patterns might emerge
from the data. It does not, however, allow us to draw
conclusions beyond the data we have analysed or
reach conclusions regarding any hypotheses we might
have made.
• It is used to describe data properly through
statistics and graphs.
Inferential Statistics
• Inferential statistics are techniques that allow
us to use samples to make generalizations
about the populations from which the samples
were drawn.
Statistical Inference
The process of generalization in a prescribed
manner from a sample to its universe is known as
Statistical Inference.
[Figure: a sample drawn from the universe/population]
Population Parameters
µ: Population mean
σ: Population standard deviation
Sample Statistics
x̄: Sample mean
s: Sample standard deviation
Statistical Inference
• Inductive Inference: Extension from the particular
to the general is called inductive inference.
• Inductive inference involves an element of
uncertainty in the conclusions.
• Deductive Inference: Deductive inference can be
described as a method of deriving information from
accepted facts; it involves no uncertainty in the
conclusions. The conclusions reached by deductive
inference are conclusive.
Population and Sample
• The population is an abstract term that refers
to the totality of all conceptually possible
observations, measurements or outcomes of
some specified kind.
• The number of conceptually possible
observations is called the size of the
population.
• The size varies according to the population
being investigated.
Contd…
• For example, a study of monthly income may be
conducted at the district, state or country level.
• In the first case, the population consists of the
incomes of all residents of the district; in the second
case, of all residents of the state; and in the third
case, of all citizens of the country.
• A population is finite when it consists of a
given number of observations and infinite when
it includes an infinite number of observations.
Sample
• A sample is a set of observations selected from
the population.
• The number of observations included in the
sample is called the size of the sample.
• In a finite population, a random sample is obtained
by giving every individual in the population an
equal chance of being chosen.
• In the case of an infinite population, a sample is random
if each observation is independent of every other
observation.
Parameter/Statistics
• Population and samples are studied through
their characteristics. The most important of
these characteristics are the Mean, the
Variance and the Standard deviation.
• The characteristics of a population are called
parameters.
• The characteristics of a sample are called
statistics.
Parameters (Population)          Statistics (Sample)
Population Mean                  Sample Mean
Population Variance              Sample Variance
Population Standard Deviation    Sample Standard Deviation
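The distinction between parameters and statistics can be sketched in a few lines of Python. The ten values below are the heights used later in this lecture; treating them once as a sample and once as a complete population is an assumption made only for illustration.

```python
import statistics

# Ten observations (the height data used later in the lecture).
values = [70, 67, 62, 68, 61, 68, 70, 64, 64, 66]

# Treated as a SAMPLE: statistics use the divisor n - 1.
sample_mean = statistics.mean(values)      # x̄
sample_var = statistics.variance(values)   # s²
sample_sd = statistics.stdev(values)       # s

# Treated as a whole POPULATION (illustrative assumption): divisor n.
pop_mean = statistics.mean(values)         # µ
pop_var = statistics.pvariance(values)     # σ²
pop_sd = statistics.pstdev(values)         # σ

print(f"sample:     mean={sample_mean}, variance={sample_var}, sd={sample_sd:.2f}")
print(f"population: mean={pop_mean}, variance={pop_var}, sd={pop_sd:.2f}")
```

The two sets of numbers differ only in the divisor, which matters most when the number of observations is small.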
• The purpose of statistical inference is to make a
judgment about particular population parameters on
the basis of sample statistics.
• Judgments relating to population parameters
are of two types: one concerns estimation of
a parameter, the other concerns testing a hypothesis
about the parameter.
Hypothesis Testing
Hypothesis testing in inferential statistics involves
making inferences about the nature of the
population on the basis of observations of a
sample drawn from the population. The
hypothesis is tested against the information
provided by the sample in the form of a test statistic.
What is Statistical Hypothesis?
A Hypothesis is a statement about one or more
population parameters.
Null Hypothesis
What is null hypothesis?
A null hypothesis (H0) is a hypothesis of no
relationship or no difference.
Steps in hypothesis testing
1. State the Hypothesis
2. Set the criterion for rejecting H0
3. Compute the test statistic
4. Decide whether to reject H0
1. State the Hypothesis
In inferential statistics, the term hypothesis has a very
specific meaning: a conjecture about one or more
population parameters.
The hypothesis to be tested is called the null hypothesis
and is given the symbol H0.
Example: We use a null hypothesis that the mean
quantitative GRE score of the population of MPH
students is 455.
Thus, our null hypothesis, written in symbols, is
H0: µ = 455 OR H0: µ-455 = 0
Where
µ = population mean
455= Hypothesis value to be tested
We test the null hypothesis (H0) against the
alternative hypothesis (symbolized H1), which
includes the possible outcomes not covered by the
null hypothesis.
For the above example we will use the alternative
hypothesis as
H1 : µ ≠ 455
The alternative hypothesis, often considered the
research hypothesis, can be supported only by
rejecting the null hypothesis.
2. Set the Criterion for Rejecting H0
After stating the hypothesis the next step in hypothesis testing is
determining how different the sample statistic must be from
the hypothesized population parameter (µ) before the null
hypothesis can be rejected.
For our example, suppose we randomly select 144 MPH students
from the population and find the sample mean to be 535. Is
this sample mean (x̄ = 535) sufficiently different from what we
hypothesize for the population mean (µ = 455) to warrant rejecting
the null hypothesis?
Before answering this question, we need to consider three
concepts: (i) errors in hypothesis testing, (ii) level of significance,
and (iii) region of rejection.
Properties of Normal Distribution
8. The areas of a normal curve are measured in standard deviation units.
The proportions of cases in specified areas of a normal curve, as
marked by standard deviations, are constant as detailed below:
Number of standard deviations    Results lying outside
from the mean                    this (%)
1.00                             31.74
1.64                             10.00
1.96                             5.00
2.58                             1.00
3.29                             0.10
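The percentages in this table can be checked numerically. The sketch below uses the complementary error function from the Python standard library; the helper name percent_outside is hypothetical, and the computed values agree with the table only up to the rounding used on the slide.

```python
import math

def percent_outside(z: float) -> float:
    """Percentage of a normal distribution lying more than z standard
    deviations from the mean, counting both tails."""
    return 100.0 * math.erfc(z / math.sqrt(2.0))

for z in (1.00, 1.64, 1.96, 2.58, 3.29):
    print(f"{z:.2f}  {percent_outside(z):6.2f}")
```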
i. Errors in hypothesis testing
When we decide to reject or not reject the null
hypothesis, there are four possible situations:
a. A true hypothesis is rejected.
b. A true hypothesis is not rejected.
c. A false hypothesis is not rejected
d. A false hypothesis is rejected
In a specific situation, we may make one of two types
of errors, as shown in the figure below:
Decision made                    State of nature
                                 Null hypothesis is true    Null hypothesis is false
Reject null hypothesis           Type I error               Correct decision
Do not reject null hypothesis    Correct decision           Type II error
Example
Verdict of Jury    Defendant
                   Guilty              Innocent
Not Guilty         Incorrect           Correct decision
Guilty             Correct decision    Incorrect
Contd… Errors
Type I error is when we reject a true null
hypothesis.
Type II error is when we do not reject a false
null hypothesis
ii. Level of significance
• To choose the criterion for rejecting H0, the
researcher must first select what is called the level of
significance.
• The level of significance or alpha (α) level is defined
as the probability of making a Type I error when
testing a null hypothesis.
• The level of significance is the probability of making a
Type I error: rejecting H0 when it is true.
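The meaning of α as a long-run error rate can be illustrated with a small simulation. This is only a sketch: the population (normal, mean 455, standard deviation 100), the sample size of 144, the number of trials and the random seed are all assumptions chosen for illustration. Because H0 is true by construction, every rejection is a Type I error.

```python
import random
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so that the run is repeatable

MU0, SIGMA, N = 455.0, 100.0, 144  # assumed population; H0: µ = 455 is true
ALPHA = 0.05
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96 for a two-tailed test
se = SIGMA / N ** 0.5                         # standard error of the mean

trials = 2000
rejections = 0
for _ in range(trials):
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    z = (fmean(sample) - MU0) / se
    if abs(z) > z_crit:
        rejections += 1  # H0 is true, so this rejection is a Type I error

type_i_rate = rejections / trials
print(f"Observed Type I error rate: {type_i_rate:.3f} (alpha = {ALPHA})")
```

The observed rate should be close to α, though it will not equal it exactly in any finite number of trials.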
Power of the Test
• Type II error involves failing to reject H0 when it is actually false,
that is, not finding an effect when there actually is an effect.
• β is the probability of a Type II error.
• (1-β) is called the power of the test: the probability of finding an
effect when there actually is an effect.
• Power of a statistical test is analogous to the sensitivity of a
diagnostic test.
• α corresponds to the false-positive rate.
• β corresponds to the false-negative rate.
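As a rough illustration of β and power, the sketch below computes the power of a two-tailed z test analytically. The function name and the "true" mean of 475 are hypothetical and not taken from the lecture; σ = 100 and n = 144 follow the GRE example.

```python
from statistics import NormalDist

def power_two_tailed_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-tailed z test of H0: mu = mu0
    when the population mean is actually mu_true."""
    std_normal = NormalDist()
    se = sigma / n ** 0.5                       # standard error of the mean
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # critical value, about 1.96
    shift = (mu_true - mu0) / se                # true effect in standard errors
    # beta: probability that z falls between the critical values anyway
    beta = std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)
    return 1 - beta

# Hypothetical true mean of 475 against H0: mu = 455
power = power_two_tailed_z(mu0=455, mu_true=475, sigma=100, n=144)
print(f"power = {power:.3f}, beta = {1 - power:.3f}")
```

When mu_true equals mu0 the function returns α, which is consistent with the definition of the level of significance.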
iii. Region of Rejection
• The region of rejection is the area of the sampling
distribution that represents those values of the sample
mean that are improbable if the null hypothesis is true.
• The critical values of the test statistic are those values in
the sampling distribution that represent the beginning of the
region of rejection.
• When the alternative hypothesis is non-directional, the
region of rejection is located in both tails of the sampling
distribution. The test of the null hypothesis against this non-
directional alternative is called a two-tailed test.
• The probability of obtaining a mean as extreme as or more
extreme than the observed sample mean (x̄), given that
the null hypothesis is true, is called the p-value of the test or
p.
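For a two-tailed z test, the p-value defined above can be computed from the standard normal distribution. This is a minimal sketch; the function name two_tailed_p is hypothetical.

```python
from statistics import NormalDist

def two_tailed_p(z: float) -> float:
    """Probability of a z score at least as extreme as the one observed,
    in either tail, when the null hypothesis is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (1.20, 1.96, 2.58):
    print(f"z = {z:.2f}  p = {two_tailed_p(z):.4f}")
```

A test statistic exactly at the critical value 1.96 gives a p-value of about 0.05, which links the p-value to the level of significance.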
Region of rejection for sampling distribution of the mean for null
hypothesis H0 : µ = 455 and S.D. (σx) = 8.33
3. Compute the Test Statistic
In our example
µ = 455, the hypothesized value for the parameter
n = 144, the size of the sample
x̄ = 535, the observed value for the sample statistic
σ = 100, the value of the standard deviation in the population
First, using the concept of z scores, we determine how
different x̄ is from µ, that is, the number of standard errors
(standard deviation units) the observed sample value is
from the hypothesized value.
In symbols,
z = (x̄ − µ) / σx̄ = (535 − 455) / 8.33 = 9.60
where σx̄ = σ/√n = 100/√144 = 8.33 is the standard error of the mean.
Calculating the z score using the above formula is called
computing the test statistic.
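The same calculation can be written as a short Python sketch, using the values from the GRE example; the variable names are chosen here for illustration.

```python
import math

mu0 = 455      # hypothesized population mean
n = 144        # sample size
x_bar = 535    # observed sample mean
sigma = 100    # population standard deviation

standard_error = sigma / math.sqrt(n)   # σ/√n
z = (x_bar - mu0) / standard_error      # number of standard errors from µ

print(f"standard error = {standard_error:.2f}")
print(f"z = {z:.2f}")
```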
4. Decide about H0
Suppose we had found that the sample mean
for 144 students was not 535, but 465. Our
hypotheses, sampling distribution, and critical
values (+1.96 and -1.96) remain the same, but
now the test statistic is
z = (465 − 455) / 8.33 = 1.20
In other words, the observed sample mean (x̄ = 465) is 1.20
standard errors above the hypothesized value of the
population mean.
Theoretical sampling distribution for the hypothesis H0: µ = 455,
illustrating the values of the test statistic when x̄ = 465
[Figure axis: critical values -1.96 and +1.96; test statistics 1.20 and 9.60 marked]
Note that the test statistic (1.20) does not exceed the critical value; it does not fall
into the region of rejection; and we should not reject the null hypothesis.
• This test statistic (1.20) is then compared to
the critical value (1.96).
• If the test statistic exceeds the critical value
in absolute value, then the null hypothesis is
rejected.
• If the test statistic does not exceed the
critical value in absolute value, then the null
hypothesis is not rejected.
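The decision rule for the two-tailed test can be sketched as a small function. The name decide is hypothetical; the critical value 1.96 and the two test statistics are those of the GRE example.

```python
def decide(z: float, z_crit: float = 1.96) -> str:
    """Two-tailed decision rule: reject H0 only when |z| exceeds the critical value."""
    return "reject H0" if abs(z) > z_crit else "do not reject H0"

print(decide(1.20))   # sample mean 465
print(decide(9.60))   # sample mean 535
```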
Region of rejection : Directional Alternative Hypothesis
In the GRE example, we tested the null hypothesis against a
non-directional alternative:
H0 : µ = 455
H1 : µ ≠ 455
This test is called two-tailed or non-directional because the
region of rejection was located in both tails of the sampling
distribution of the mean.
Suppose a direction of the results is anticipated. A directional
hypothesis states that a parameter is either greater or less than
the hypothesized value.
For instance, in the GRE example we might use the alternative
hypothesis that the mean GRE level of our population is greater
than 455, in symbols,
H0 : µ = 455
H1 : µ > 455
An alternative hypothesis can be either non-directional
or directional.
A directional alternative hypothesis states that the
parameter is greater than or less than the
hypothesized value.
A non-directional alternative hypothesis merely
states that the parameter is different from (not equal
to) the hypothesized value.
The test of the null hypothesis against a directional
alternative is called a one-tailed test, the region of
rejection is located in one of the two tails of the
sampling distribution. The specific tail of the
distribution is determined by the direction of the
alternative hypothesis.
Now suppose the alternative hypothesis states that the
mean GRE is less than 455. In symbols, the
hypotheses are
H0 : µ = 455
H1 : µ < 455
Here the critical region lies in the left tail of the
distribution.
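How the region of rejection moves with the form of the alternative hypothesis can be sketched as follows. The function name and the labels "two-sided", "greater" and "less" are assumptions made for this illustration.

```python
from statistics import NormalDist

def critical_values(alpha: float, alternative: str):
    """Return (lower, upper) critical z values; None marks a tail that has
    no region of rejection for the given alternative hypothesis."""
    std_normal = NormalDist()
    if alternative == "two-sided":   # H1: µ ≠ µ0, alpha split over both tails
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    if alternative == "greater":     # H1: µ > µ0, right tail only
        return (None, std_normal.inv_cdf(1 - alpha))
    if alternative == "less":        # H1: µ < µ0, left tail only
        return (std_normal.inv_cdf(alpha), None)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

for alt in ("two-sided", "greater", "less"):
    print(alt, critical_values(0.05, alt))
```

At α = 0.05 the one-tailed critical value (about 1.645) is smaller in absolute value than the two-tailed one (about 1.96), because the whole of α is placed in a single tail.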
Type-I and Type-II Errors in Decision Making
In a specific situation, we may make one of two types of
errors, as shown in the figure below:
Decision taken by       Existing Reality
the investigator        Group A = Group B          Group A ≠ Group B
Group A ≠ Group B       P[Type-I Error]            Correct Decision
                        (Level of significance)    (Power of the study)
Group A = Group B       Correct Decision           Type-II Error
                        (Level of confidence)
Testing of Hypothesis
Q=1 A random sample of 100 observations from a
population with standard deviation 60 yielded a
sample mean of 100.
(a) Test the null hypothesis that µ=100 against the
alternative hypothesis (µ≠100) using α=0.05.
(b) Test the null hypothesis that µ=100 against the
alternative hypothesis (µ>100) using α=0.05
Testing of Hypothesis
Ex=1 A random sample of 200 observations from a
population with standard deviation 80 yielded a
sample mean of 150.
(a) Test the null hypothesis that µ=100 against the
alternative hypothesis (µ≠100) using α=0.05.
(b) Test the null hypothesis that µ=100 against the
alternative hypothesis (µ>100) using α=0.05
• Ex=2 A random sample of 100 observations
from a population with standard deviation 60
yielded a sample mean of 100.
• (a)Test the null hypothesis that µ=111 against
the alternative hypothesis (µ≠111) using α=0.05.
• (b) Test the null hypothesis that µ<=111 against
the alternative hypothesis (µ>111) using α=0.05
• Explain why the results differ.
Q=2 The heights of 10 males of a given locality
are found to be as follows:
70, 67, 62, 68, 61, 68, 70, 64, 64, 66 inches.
Is it reasonable to believe that the average height
is greater than 64 inches?
What will be the finding if the alternative hypothesis
is two-tailed?
Contd.. Answer
Mean = 66; S.D. = 3.16; Variance = 10.00; t = 2.00
• The tabulated value of the t statistic at 9 d.f. and α=0.05
(one-tailed) is 1.833.
• Since the calculated value is greater than the tabulated
value, we reject the null hypothesis and conclude that
the mean height is greater than 64 inches.
• What will be the finding if the alternative hypothesis is
two-tailed? (Answer it.)
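The figures in this answer can be reproduced with a short sketch. The critical value 1.833 is the tabulated one quoted above; it is hard-coded because the Python standard library has no t distribution.

```python
import math
import statistics

heights = [70, 67, 62, 68, 61, 68, 70, 64, 64, 66]  # inches
mu0 = 64                                            # hypothesized mean

n = len(heights)
mean = statistics.mean(heights)
sd = statistics.stdev(heights)        # sample S.D., divisor n - 1
t = (mean - mu0) / (sd / math.sqrt(n))
df = n - 1

T_CRIT_ONE_TAILED = 1.833             # tabulated, 9 d.f., alpha = 0.05
reject_h0 = t > T_CRIT_ONE_TAILED     # H1: µ > 64, right tail only

print(f"mean = {mean}, sd = {sd:.2f}, t = {t:.2f}, d.f. = {df}")
print("reject H0" if reject_h0 else "do not reject H0")
```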
Student’s t Distributions
Does the adjustment of using s to estimate σ have an effect on the
statistical test? Actually, it does, especially for small samples.
The effect is that the normal distribution is inappropriate as the
sampling distribution of the mean.
At the beginning of the 20th century, William S. Gosset found that,
for small samples, the sampling distribution departed substantially
from the normal distribution and that, as sample sizes changed, the
distributions changed.
This gave rise to not one distribution but a family of distributions.
The t distributions are a family of symmetrical, bell-shaped
distributions that change as the sample size changes.
Degrees of Freedom
Degrees of Freedom : The number of degrees
of freedom is a mathematical concept defined
as the number of observations less the
number of restrictions placed on them.
Student’s t distribution for 1, 2, 5, 10, and
∞ degrees of freedom
Point Estimates and Interval
Estimates
A point estimate is a single value that represents the
best estimate of the population value. If we are
estimating the mean of a population (µ), then the
sample mean is the best point estimate.
Interval estimation builds on point estimation to arrive
at a range of values that are tenable for the
parameter and that define an interval we are
confident contains the parameter.
Confidence Interval
CI = x̄ ± (ZCV) (σx̄)
Where
x̄ = Sample mean
ZCV = Critical value using the normal distribution and
σx̄ = Standard error of the mean
Confidence Interval
CI = x̄ ± (tCV) (sx̄)
Where
x̄ = Sample mean
tCV = Critical value using the appropriate t distribution and
sx̄ = Estimated standard error of the mean from the
sample
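Both interval formulas have the same shape, which the sketch below makes explicit. The function name is hypothetical. The z-based interval uses the GRE example (x̄ = 535, σ = 100, n = 144); the t-based interval uses the height data (x̄ = 66, s = 3.16, n = 10). The 95% critical values 1.96 and 2.262 (9 d.f., two-tailed) are assumed from standard tables, and 2.262 does not appear in the lecture.

```python
import math

def confidence_interval(center, critical_value, standard_error):
    """Return (lower, upper) = center ± critical_value × standard_error."""
    margin = critical_value * standard_error
    return center - margin, center + margin

# z-based 95% interval: population S.D. known (GRE example)
z_ci = confidence_interval(535, 1.96, 100 / math.sqrt(144))

# t-based 95% interval: S.D. estimated from the sample (height data)
t_ci = confidence_interval(66, 2.262, 3.16 / math.sqrt(10))

print(f"z interval: {z_ci[0]:.2f} to {z_ci[1]:.2f}")
print(f"t interval: {t_ci[0]:.2f} to {t_ci[1]:.2f}")
```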
Comparison of Two Means
• Q=As part of an investigation of the development of infant sleep patterns,
the sleep of 20 infants (10 male and 10 female) was monitored on several
occasions between 1 week and 6 months of age. The quiet sleep results
(in minutes) at 1 week of age for the 20 study infants follow.
• Is there evidence of a difference in quiet sleep behavior between two
genders?
• Is there evidence that male mean quiet sleep behavior is higher than
female?
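Quiet sleep (male):   85 129 215 143 44 173 230 198 105 127   Mean = 144.90
Quiet sleep (female): 140 155 33 209 166 72 116 131 97 124    Mean = 124.30
t = (x̄m − x̄f) / √[Sp² (1/nm + 1/nf)], where Sp² = [(nm − 1) Sm² + (nf − 1) Sf²] / (nm + nf − 2)
Sp² is the pooled variance; Sm² and Sf² are the variances of the two samples.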
Contd… Answer
For male: S1 = 59.35; S1² = 3522.54; Mean = 144.90
For female: S2 = 49.48; S2² = 2448.011; Mean = 124.30
• t=0.843 at 18 d.f.
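The reported t value can be reproduced with a pooled-variance (equal-variance) two-sample t statistic. This is a sketch; the pooled form is assumed because it matches the 18 degrees of freedom and the mention of a pooled variance.

```python
import math
import statistics

male = [85, 129, 215, 143, 44, 173, 230, 198, 105, 127]
female = [140, 155, 33, 209, 166, 72, 116, 131, 97, 124]

n_m, n_f = len(male), len(female)
var_m = statistics.variance(male)     # sample variance, divisor n - 1
var_f = statistics.variance(female)

# Pooled variance: the two sample variances weighted by their degrees of freedom
pooled_var = ((n_m - 1) * var_m + (n_f - 1) * var_f) / (n_m + n_f - 2)
se_diff = math.sqrt(pooled_var * (1 / n_m + 1 / n_f))

t = (statistics.mean(male) - statistics.mean(female)) / se_diff
df = n_m + n_f - 2

print(f"S1^2 = {var_m:.2f}, S2^2 = {var_f:.3f}")
print(f"t = {t:.3f} at {df} d.f.")
```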
Paired-t-test
• As part of a study to determine the effects of a certain oral contraceptive
on weight gain, nine healthy females were weighed at the beginning of a
course of oral contraceptive use. They were reweighed after 3 months.
Results are given below. Do the results suggest evidence of weight gain?
• Longitudinal Study/Real-Cohort Study
Subject   Initial weight (LBS)   3-months weight (LBS)
1 120 123
2 141 143
3 130 140
4 150 145
5 135 140
6 140 143
7 120 118
8 140 141
9 130 132
• Contd… Answer
• t=1.509
• One-tailed
• Tabulated value of t at α=0.05 and d.f. =8 is 1.860 (one-
tailed).
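The paired t statistic is a one-sample t test applied to the within-subject differences, which the sketch below makes explicit. The critical value 1.860 is the tabulated one quoted above.

```python
import math
import statistics

initial = [120, 141, 130, 150, 135, 140, 120, 140, 130]        # lb
three_months = [123, 143, 140, 145, 140, 143, 118, 141, 132]   # lb

# Within-subject weight change; a positive value is a gain
diffs = [after - before for before, after in zip(initial, three_months)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)
t = mean_diff / (sd_diff / math.sqrt(n))
df = n - 1

T_CRIT_ONE_TAILED = 1.860             # tabulated, 8 d.f., alpha = 0.05
reject_h0 = t > T_CRIT_ONE_TAILED     # H1: mean gain > 0

print(f"mean difference = {mean_diff:.3f}, t = {t:.3f}, d.f. = {df}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Since 1.509 is below 1.860, the comparison does not lead to rejection of the null hypothesis of no weight gain at the 5% level.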
Male Female
42.1 41.3 42.4 43.2 41.8 42.7 43.8 42.5 43.1 44.0
41.0 41.8 42.8 42.3 42.7 43.6 43.3 43.5 41.7 44.1
Do the data provide sufficient evidence to conclude that, on the
average, the male weight is greater than female weight? Perform
the required hypothesis test at the 5% level of significance.
Proportion Test
• Q=1 In a sample of 1000 people in Maharashtra, 540 are rice
eaters and the rest are wheat eaters. Can we assume that
both rice and wheat are equally popular in this state at 1%
level of significance?
Z tabulated at 1% level of significance is 2.58 (two-tailed).
Q=2 Twenty people were attacked by a disease and only 18
survived. Will you reject the hypothesis that the survival rate,
if attacked by this disease, is 85% in favour of the hypothesis
that it is more, at 5% level?
Z tabulated at 5% level of significance is 1.645 (one-tailed).
Q=3 In a year there are 956 births in a town A of which 52.5%
were males, while in towns A and B combined, this proportion
in a total of 1406 births was 0.496. Is there any significant
difference in the proportion of male births in the two towns?
Z tabulated at 5% level of significance is 1.96 (two-tailed).
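The three test statistics can be computed with the usual large-sample z tests for proportions. This is a sketch under stated assumptions: the function names are hypothetical, town B's figures in Q=3 are derived from the combined totals, and with only 20 cases in Q=2 the normal approximation is rough.

```python
import math

def one_sample_prop_z(successes, n, p0):
    """z statistic for H0: p = p0, using the standard error under H0."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

def two_sample_prop_z(p1, n1, p2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion."""
    p_pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z1 = one_sample_prop_z(540, 1000, 0.50)   # Q=1: rice eaters
z2 = one_sample_prop_z(18, 20, 0.85)      # Q=2: survivors

# Q=3: recover town B from the combined figures
n_a, p_a = 956, 0.525
n_total, p_total = 1406, 0.496
n_b = n_total - n_a
p_b = (p_total * n_total - p_a * n_a) / n_b
z3 = two_sample_prop_z(p_a, n_a, p_b, n_b)

print(f"Q1: z = {z1:.2f}  (compare with 2.58, two-tailed)")
print(f"Q2: z = {z2:.2f}  (compare with 1.645, one-tailed)")
print(f"Q3: z = {z3:.2f}  (compare with 1.96, two-tailed)")
```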
Ad

More Related Content

Similar to Lecture_Hypothesis_Testing statistics .pptx (20)

Topic 7 stat inference
Topic 7 stat inferenceTopic 7 stat inference
Topic 7 stat inference
Sizwan Ahammed
 
Day-2_Presentation for SPSS parametric workshop.pptx
Day-2_Presentation for SPSS parametric workshop.pptxDay-2_Presentation for SPSS parametric workshop.pptx
Day-2_Presentation for SPSS parametric workshop.pptx
rjaisankar
 
6 estimation hypothesis testing t test
6 estimation hypothesis testing t test6 estimation hypothesis testing t test
6 estimation hypothesis testing t test
Penny Jiang
 
Testing Of Hypothesis
Testing Of HypothesisTesting Of Hypothesis
Testing Of Hypothesis
SWATI SINGH
 
Estimation and hypothesis testing 1 (graduate statistics2)
Estimation and hypothesis testing 1 (graduate statistics2)Estimation and hypothesis testing 1 (graduate statistics2)
Estimation and hypothesis testing 1 (graduate statistics2)
Harve Abella
 
inferencial statistics
inferencial statisticsinferencial statistics
inferencial statistics
anjaemerry
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
Kaori Kubo Germano, PhD
 
hypothesis testing
hypothesis testinghypothesis testing
hypothesis testing
ilona50
 
20200519073328de6dca404c.pdfkshhjejhehdhd
20200519073328de6dca404c.pdfkshhjejhehdhd20200519073328de6dca404c.pdfkshhjejhehdhd
20200519073328de6dca404c.pdfkshhjejhehdhd
HimanshuSharma723273
 
312320.pptx
312320.pptx312320.pptx
312320.pptx
YogeshPatel28169
 
teast mean one and two sample
teast mean one and two sampleteast mean one and two sample
teast mean one and two sample
Muzamil Hussain
 
Confidence intervals, hypothesis testing and statistical tests of significanc...
Confidence intervals, hypothesis testing and statistical tests of significanc...Confidence intervals, hypothesis testing and statistical tests of significanc...
Confidence intervals, hypothesis testing and statistical tests of significanc...
Subramani Parasuraman
 
Data Science : Unit-I -Hypothesis and Inferences.pptx
Data Science : Unit-I -Hypothesis and Inferences.pptxData Science : Unit-I -Hypothesis and Inferences.pptx
Data Science : Unit-I -Hypothesis and Inferences.pptx
subhashchandra197
 
hypothesis testing
 hypothesis testing hypothesis testing
hypothesis testing
zoheb khan
 
Ds 2251 -_hypothesis test
Ds 2251 -_hypothesis testDs 2251 -_hypothesis test
Ds 2251 -_hypothesis test
Khulna University
 
Statistical inference concept, procedure of hypothesis testing
Statistical inference   concept, procedure of hypothesis testingStatistical inference   concept, procedure of hypothesis testing
Statistical inference concept, procedure of hypothesis testing
AmitaChaudhary19
 
Testing of hypothesis
Testing of hypothesisTesting of hypothesis
Testing of hypothesis
Sanjay Basukala
 
hypothesis test
 hypothesis test hypothesis test
hypothesis test
Unsa Shakir
 
Testing of Hypothesis using Z dist..pptx
Testing of Hypothesis  using Z dist..pptxTesting of Hypothesis  using Z dist..pptx
Testing of Hypothesis using Z dist..pptx
UVAS
 
7 hypothesis testing
7 hypothesis testing7 hypothesis testing
7 hypothesis testing
AASHISHSHRIVASTAV1
 
Topic 7 stat inference
Topic 7 stat inferenceTopic 7 stat inference
Topic 7 stat inference
Sizwan Ahammed
 
Day-2_Presentation for SPSS parametric workshop.pptx
Day-2_Presentation for SPSS parametric workshop.pptxDay-2_Presentation for SPSS parametric workshop.pptx
Day-2_Presentation for SPSS parametric workshop.pptx
rjaisankar
 
6 estimation hypothesis testing t test
6 estimation hypothesis testing t test6 estimation hypothesis testing t test
6 estimation hypothesis testing t test
Penny Jiang
 
Testing Of Hypothesis
Testing Of HypothesisTesting Of Hypothesis
Testing Of Hypothesis
SWATI SINGH
 
Estimation and hypothesis testing 1 (graduate statistics2)
Estimation and hypothesis testing 1 (graduate statistics2)Estimation and hypothesis testing 1 (graduate statistics2)
Estimation and hypothesis testing 1 (graduate statistics2)
Harve Abella
 
inferencial statistics
inferencial statisticsinferencial statistics
inferencial statistics
anjaemerry
 
hypothesis testing
hypothesis testinghypothesis testing
hypothesis testing
ilona50
 
20200519073328de6dca404c.pdfkshhjejhehdhd
20200519073328de6dca404c.pdfkshhjejhehdhd20200519073328de6dca404c.pdfkshhjejhehdhd
20200519073328de6dca404c.pdfkshhjejhehdhd
HimanshuSharma723273
 
teast mean one and two sample
teast mean one and two sampleteast mean one and two sample
teast mean one and two sample
Muzamil Hussain
 
Confidence intervals, hypothesis testing and statistical tests of significanc...
Confidence intervals, hypothesis testing and statistical tests of significanc...Confidence intervals, hypothesis testing and statistical tests of significanc...
Confidence intervals, hypothesis testing and statistical tests of significanc...
Subramani Parasuraman
 
Data Science : Unit-I -Hypothesis and Inferences.pptx
Data Science : Unit-I -Hypothesis and Inferences.pptxData Science : Unit-I -Hypothesis and Inferences.pptx
Data Science : Unit-I -Hypothesis and Inferences.pptx
subhashchandra197
 
hypothesis testing
 hypothesis testing hypothesis testing
hypothesis testing
zoheb khan
 
Statistical inference concept, procedure of hypothesis testing
Statistical inference   concept, procedure of hypothesis testingStatistical inference   concept, procedure of hypothesis testing
Statistical inference concept, procedure of hypothesis testing
AmitaChaudhary19
 
hypothesis test
 hypothesis test hypothesis test
hypothesis test
Unsa Shakir
 
Testing of Hypothesis using Z dist..pptx
Testing of Hypothesis  using Z dist..pptxTesting of Hypothesis  using Z dist..pptx
Testing of Hypothesis using Z dist..pptx
UVAS
 

Recently uploaded (20)

Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
hersh's midterm project.pdf music retail and distribution
hersh's midterm project.pdf music retail and distributionhersh's midterm project.pdf music retail and distribution
hersh's midterm project.pdf music retail and distribution
hershtara1
 
定制学历(美国Purdue毕业证)普渡大学电子版毕业证
定制学历(美国Purdue毕业证)普渡大学电子版毕业证定制学历(美国Purdue毕业证)普渡大学电子版毕业证
定制学历(美国Purdue毕业证)普渡大学电子版毕业证
Taqyea
 
Deloitte - A Framework for Process Mining Projects
Deloitte - A Framework for Process Mining ProjectsDeloitte - A Framework for Process Mining Projects
Deloitte - A Framework for Process Mining Projects
Process mining Evangelist
 
50_questions_full.pptxdddddddddddddddddd
50_questions_full.pptxdddddddddddddddddd50_questions_full.pptxdddddddddddddddddd
50_questions_full.pptxdddddddddddddddddd
emir73065
 
AWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATA
AWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATAAWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATA
AWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATA
SnehaBoja
 
新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办
新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办
新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办
Taqyea
 
RAG Chatbot using AWS Bedrock and Streamlit Framework
RAG Chatbot using AWS Bedrock and Streamlit FrameworkRAG Chatbot using AWS Bedrock and Streamlit Framework
RAG Chatbot using AWS Bedrock and Streamlit Framework
apanneer
 
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
disnakertransjabarda
 
定制(意大利Rimini毕业证)布鲁诺马代尔纳嘉雷迪米音乐学院学历认证
定制(意大利Rimini毕业证)布鲁诺马代尔纳嘉雷迪米音乐学院学历认证定制(意大利Rimini毕业证)布鲁诺马代尔纳嘉雷迪米音乐学院学历认证
定制(意大利Rimini毕业证)布鲁诺马代尔纳嘉雷迪米音乐学院学历认证
Taqyea
 
spssworksho9035530-lva1-app6891 (1).pptx
spssworksho9035530-lva1-app6891 (1).pptxspssworksho9035530-lva1-app6891 (1).pptx
spssworksho9035530-lva1-app6891 (1).pptx
clarkraal
 
4. Multivariable statistics_Using Stata_2025.pdf
4. Multivariable statistics_Using Stata_2025.pdf4. Multivariable statistics_Using Stata_2025.pdf
4. Multivariable statistics_Using Stata_2025.pdf
axonneurologycenter1
 
Process Mining and Official Statistics - CBS
Process Mining and Official Statistics - CBSProcess Mining and Official Statistics - CBS
Process Mining and Official Statistics - CBS
Process mining Evangelist
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
Microsoft Excel: A Comprehensive Overview
Microsoft Excel: A Comprehensive OverviewMicrosoft Excel: A Comprehensive Overview
Microsoft Excel: A Comprehensive Overview
GinaTomarongRegencia
 
2024-Media-Literacy-Index-Of-Ukrainians-ENG-SHORT.pdf
2024-Media-Literacy-Index-Of-Ukrainians-ENG-SHORT.pdf2024-Media-Literacy-Index-Of-Ukrainians-ENG-SHORT.pdf
2024-Media-Literacy-Index-Of-Ukrainians-ENG-SHORT.pdf
OlhaTatokhina1
 
Process Mining at AE - Key success factors
Process Mining at AE - Key success factorsProcess Mining at AE - Key success factors
Process Mining at AE - Key success factors
Process mining Evangelist
 
How to regulate and control your it-outsourcing provider with process mining
How to regulate and control your it-outsourcing provider with process miningHow to regulate and control your it-outsourcing provider with process mining
How to regulate and control your it-outsourcing provider with process mining
Process mining Evangelist
 
Collibra DQ Installation setup and debug
Collibra DQ Installation setup and debugCollibra DQ Installation setup and debug
Collibra DQ Installation setup and debug
karthikprince20
 
AWS Certified Machine Learning Slides.pdf
AWS Certified Machine Learning Slides.pdfAWS Certified Machine Learning Slides.pdf
AWS Certified Machine Learning Slides.pdf
philsparkshome
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
hersh's midterm project.pdf music retail and distribution
hersh's midterm project.pdf music retail and distributionhersh's midterm project.pdf music retail and distribution
hersh's midterm project.pdf music retail and distribution
hershtara1
 
定制学历(美国Purdue毕业证)普渡大学电子版毕业证
定制学历(美国Purdue毕业证)普渡大学电子版毕业证定制学历(美国Purdue毕业证)普渡大学电子版毕业证
定制学历(美国Purdue毕业证)普渡大学电子版毕业证
Taqyea
 
Deloitte - A Framework for Process Mining Projects
Deloitte - A Framework for Process Mining ProjectsDeloitte - A Framework for Process Mining Projects
Deloitte - A Framework for Process Mining Projects
Process mining Evangelist
 
50_questions_full.pptxdddddddddddddddddd
50_questions_full.pptxdddddddddddddddddd50_questions_full.pptxdddddddddddddddddd
50_questions_full.pptxdddddddddddddddddd
emir73065
 
AWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATA
AWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATAAWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATA
AWS-AIML-PRESENTATION RELATED TO DATA SCIENCE TO DATA
SnehaBoja
 
新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办
新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办
新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办
Taqyea
 
RAG Chatbot using AWS Bedrock and Streamlit Framework
RAG Chatbot using AWS Bedrock and Streamlit FrameworkRAG Chatbot using AWS Bedrock and Streamlit Framework
RAG Chatbot using AWS Bedrock and Streamlit Framework
apanneer
 
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...

Lecture_Hypothesis_Testing statistics .pptx

  • 1. Statistical Inference and Hypothesis Testing by Dr. Priyanka Dixit TISS, Mumbai
  • 2. Descriptive and Inferential Statistics • Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a meaningful way so that patterns might emerge from the data. It does not, however, allow us to draw conclusions beyond the data we have analysed or reach conclusions regarding any hypotheses we might have made. • It is used to describe data properly through statistics and graphs.
  • 3. Inferential Statistics • Inferential statistics are techniques that allow us to use samples to make generalizations about the populations from which the samples were drawn.
  • 4. Statistical Inference The process of generalization in a prescribed manner from a sample to its universe is known as Statistical Inference. [Figure: a sample drawn from the universe/population.] Population Parameters: µ: Population mean; σ: Population standard deviation. Sample Statistics: x̄: Sample mean; s: Sample standard deviation.
  • 5. Statistical Inference • Inductive Inference: Extension from the particular to the general is called inductive inference. • Inductive inference involves an element of uncertainty in the conclusions.
  • 6. Deductive Inference • Deductive inference can be described as a method of deriving information from accepted facts; it involves no uncertainty in the conclusions. The conclusions reached by deductive inference are conclusive.
  • 7. Population and Sample • The population is an abstract term that refers to the totality of all conceptually possible observations, measurements or outcomes of some specified kind. • The number of conceptually possible observations is called the size of the population. • The size varies according to the population being investigated.
  • 8. Contd… • For example, a study of monthly income may be conducted at a district, state or country level. • So, in the first case, the population will consist of the incomes of all residents of one district, in the second case of the incomes of all residents of the state, and in the third case of the incomes of all citizens of the country. • A population is finite when it consists of a given number of observations and infinite when it includes an infinite number of observations.
  • 9. Sample • A sample is a set of observations selected from the population. • The number of observations included in the sample is called the size of the sample. • In a finite population, a random sample is obtained by giving every individual in the population an equal chance of being chosen. • In the case of an infinite population, a sample is random if each observation is independent of every other observation.
  • 10. Parameter/Statistics • Populations and samples are studied through their characteristics. The most important of these characteristics are the Mean, the Variance and the Standard deviation. • The characteristics of a population are called parameters. • The characteristics of a sample are called statistics.
  • 11. Parameters and Statistics
      Parameters (Population) | Statistics (Sample)
      Population Mean | Sample Mean
      Population Variance | Sample Variance
      Population Standard Deviation | Sample Standard Deviation
  • 12. • The purpose of statistical inference is to make a judgment about particular parameters on the basis of sample statistics. • The judgments relating to population parameters are of two types: one concerns estimation of a parameter, the other concerns testing a hypothesis about the parameter.
  • 13. Hypothesis Testing Hypothesis testing in inferential statistics involves making inferences about the nature of the population on the basis of observations of a sample drawn from the population. The hypothesis is tested against the information provided by the sample in the form of a test statistic. What is a Statistical Hypothesis? A hypothesis is a statement about one or more population parameters.
  • 14. Null Hypothesis What is null hypothesis? A null hypothesis (H0) is a hypothesis of no relationship or no difference. Steps in hypothesis testing 1. State the Hypothesis 2. Set the criterion for rejecting H0 3. Compute the test statistic 4. Decide whether to reject H0
  • 15. 1. State the Hypothesis In inferential statistics, the term hypothesis has a very specific meaning: a conjecture about one or more population parameters. The hypothesis to be tested is called the null hypothesis and is given the symbol H0. Example: We use a null hypothesis that the mean quantitative GRE score of the population of MPH students is 455. Thus, our null hypothesis, written in symbols, is H0: µ = 455 or, equivalently, H0: µ - 455 = 0, where µ = population mean and 455 = hypothesized value to be tested.
  • 16. We test the null hypothesis (H0) against the alternative hypothesis (symbolized H1), which includes the possible outcomes not covered by the null hypothesis. For the above example we will use the alternative hypothesis H1 : µ ≠ 455. The alternative hypothesis, often considered the research hypothesis, can be supported only by rejecting the null hypothesis.
  • 17. 2. Set the Criterion for Rejecting H0 After stating the hypothesis, the next step in hypothesis testing is determining how different the sample statistic must be from the hypothesized population parameter (µ) before the null hypothesis can be rejected. For our example, suppose we randomly select 144 MPH students from the population and find the sample mean to be 535. Is this sample mean (x̄ = 535) sufficiently different from what we hypothesize for the population mean (µ = 455) to warrant rejecting the null hypothesis? Before answering this question, we need to consider three concepts: (i) errors in hypothesis testing, (ii) level of significance, and (iii) region of rejection.
  • 18. Properties of Normal Distribution The areas of a normal curve are measured in standard deviation units. The proportions of cases in specified areas of a normal curve, as marked by standard deviations, are constant as detailed below:
      Number of standard deviations from mean | Results lying outside this (%)
      1.00 | 31.74
      1.64 | 10.00
      1.96 | 5.00
      2.58 | 1.00
      3.29 | 0.10
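The percentages in the table above can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `percent_outside` is a hypothetical helper introduced for illustration, not something from the slides. The computed values agree with the table up to rounding.

```python
from statistics import NormalDist


def percent_outside(k: float) -> float:
    """Percentage of a normal distribution lying more than k standard
    deviations away from the mean (both tails combined)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    one_tail = 1 - std_normal.cdf(k)
    return 2 * one_tail * 100


# Reproduce the rows of the table (values match up to rounding).
for k in (1.00, 1.64, 1.96, 2.58, 3.29):
    print(f"{k:.2f} SD from the mean: {percent_outside(k):.2f}% outside")
```

Because the normal curve is symmetric, half of each percentage lies in each tail, which is why 1.64 (10% outside in total) leaves roughly 5% in a single tail.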
  • 19. i. Errors in hypothesis testing When we decide to reject or not reject the null hypothesis, there are four possible situations: a. A true hypothesis is rejected. b. A true hypothesis is not rejected. c. A false hypothesis is not rejected d. A false hypothesis is rejected
  • 20. In a specific situation, we may make one of two types of errors, as shown in the table below:
      Decision made | State of nature: null hypothesis is true | State of nature: null hypothesis is false
      Reject null hypothesis | Type I error | Correct decision
      Do not reject null hypothesis | Correct decision | Type II error
  • 21. Example: Verdict of Jury
      Verdict of jury | Defendant guilty | Defendant innocent
      Not guilty | Incorrect | Correct decision
      Guilty | Correct decision | Incorrect
  • 22. Contd… Errors Type I error is when we reject a true null hypothesis. Type II error is when we do not reject a false null hypothesis
  • 23. ii. Level of significance • To choose the criterion for rejecting H0, the researcher must first select what is called the level of significance. • The level of significance, or alpha (α) level, is defined as the probability of making a Type I error when testing a null hypothesis, that is, the probability of rejecting H0 when it is true.
  • 24. Power of the Test • A Type II error means failing to reject H0 when it is actually false, that is, not finding an effect when there actually is an effect. • β is the probability of a Type II error. • (1-β) is called the power of the test: the probability of finding an effect when there actually is an effect. • The power of a statistical test is analogous to the sensitivity of a diagnostic test. • α corresponds to the false-positive rate. • β corresponds to the false-negative rate.
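To make β and power concrete, here is a minimal sketch for a two-tailed z-test. It borrows µ = 455, σ = 100 and n = 144 from the GRE example used elsewhere in this deck; the "true" population mean of 465 is purely an assumption chosen for illustration, and `power_two_tailed_z` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def power_two_tailed_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-tailed z-test of H0: mu = mu0
    when the population mean is actually mu_true."""
    std_normal = NormalDist()
    standard_error = sigma / math.sqrt(n)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How far the true mean sits from mu0, in standard-error units.
    shift = (mu_true - mu0) / standard_error
    # Probability that the test statistic lands in either rejection tail.
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# Assumed scenario: the true mean is 465 rather than the hypothesized 455.
power = power_two_tailed_z(mu0=455, mu_true=465, sigma=100, n=144)
beta = 1 - power
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

When `mu_true` equals `mu0` the function returns α itself, since rejecting a true null hypothesis is exactly the Type I error; the further the true mean lies from the hypothesized value, the closer the power gets to 1.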
  • 25. iii. Region of Rejection • The region of rejection is the area of the sampling distribution that represents those values of the sample mean that are improbable if the null hypothesis is true. • The critical values of the test statistic are those values in the sampling distribution that mark the beginning of the region of rejection. • When the alternative hypothesis is non-directional, the region of rejection is located in both tails of the sampling distribution. The test of the null hypothesis against this non-directional alternative is called a two-tailed test. • The probability of obtaining a mean as extreme as or more extreme than the observed sample mean (x̄), given that the null hypothesis is true, is called the p-value of the test, or p.
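As a rough sketch of how critical values and p-values relate for a two-tailed z-test, the following uses only the Python standard library. The helper names are hypothetical, introduced here for illustration.

```python
from statistics import NormalDist


def two_tailed_critical_value(alpha: float) -> float:
    """z value beyond which alpha/2 of the distribution lies in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def two_tailed_p_value(z: float) -> float:
    """Probability of a test statistic at least as extreme as z,
    in either direction, when the null hypothesis is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


print(f"critical value at alpha = 0.05: {two_tailed_critical_value(0.05):.2f}")
print(f"critical value at alpha = 0.01: {two_tailed_critical_value(0.01):.2f}")
print(f"p-value for z = 1.20: {two_tailed_p_value(1.20):.4f}")
```

Rejecting H0 when |z| exceeds the critical value is equivalent to rejecting when the p-value is smaller than α.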
  • 27. [Figure: region of rejection for the sampling distribution of the mean for null hypothesis H0 : µ = 455, with standard error σx̄ = 8.33.]
  • 28. 3. Compute the Test Statistic In our example: µ = 455, the hypothesized value for the parameter; n = 144, the size of the sample; x̄ = 535, the observed value for the sample statistic; σ = 100, the value of the standard deviation in the population. First, using the concept of z scores, we determine how different x̄ is from µ, that is, the number of standard errors (standard deviation units) the observed sample value is from the hypothesized value. In symbols, z = (x̄ - µ) / σx̄, where σx̄ = σ/√n = 100/√144 = 8.33, so that z = (535 - 455) / 8.33 = 9.60.
  • 29. Calculating the z score using the above formula is called computing the test statistic.
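The computation of the test statistic for the GRE example can be sketched as follows. The numbers (µ = 455, n = 144, x̄ = 535, σ = 100) come from the slides; the function name `z_statistic` is a hypothetical helper.

```python
import math


def z_statistic(sample_mean, hypothesized_mean, sigma, n):
    """Number of standard errors between the sample mean and the
    hypothesized population mean."""
    standard_error = sigma / math.sqrt(n)  # standard error of the mean
    return (sample_mean - hypothesized_mean) / standard_error


standard_error = 100 / math.sqrt(144)
z = z_statistic(sample_mean=535, hypothesized_mean=455, sigma=100, n=144)
print(f"standard error = {standard_error:.2f}")
print(f"z = {z:.2f}")
```

A z of this size lies far beyond the two-tailed critical values of ±1.96, so a sample mean of 535 would lead to rejecting H0: µ = 455.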
  • 30. 4. Decide about H0 Suppose we had found that the sample mean for 144 students was not 535, but 465. Our hypotheses, sampling distribution, and critical values (+1.96 and -1.96) remain the same, but now the test statistic is z = (465 - 455) / 8.33 = 1.20.
  • 31. In other words, the observed sample mean (x̄ = 465) is 1.20 standard errors above the hypothesized value of the population mean.
  • 32. [Figure: theoretical sampling distribution for the hypothesis H0: µ = 455, with critical values -1.96 and +1.96, showing the test statistic for x̄ = 465 (z = 1.20) and for x̄ = 535 (z = 9.60).] Note that the test statistic (1.20) does not exceed the critical value; it does not fall into the region of rejection; and we should not reject the null hypothesis.
  • 33. • This test statistic (1.20) is then compared to the critical value (1.96). • If the test statistic exceeds the critical value in absolute value, then the null hypothesis is rejected. • If the test statistic does not exceed the critical value in absolute value, then the null hypothesis is not rejected.
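The four steps (state the hypothesis, set the criterion, compute the test statistic, decide) can be put together in one small function. This is a sketch using the GRE numbers from the slides; `two_tailed_z_test` is a hypothetical name, and the p-value is an extra the slides do not compute.

```python
import math
from statistics import NormalDist


def two_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test of H0: mu = mu0 with known population sigma.
    Returns (z, critical value, p-value, reject H0?)."""
    std_normal = NormalDist()
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    critical = std_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, critical, p_value, abs(z) > critical


# The two sample means discussed in the slides: 465 and 535.
for observed_mean in (465, 535):
    z, critical, p_value, reject = two_tailed_z_test(observed_mean, 455, 100, 144)
    decision = "reject H0" if reject else "do not reject H0"
    print(f"mean = {observed_mean}: z = {z:.2f}, critical = ±{critical:.2f}, "
          f"p = {p_value:.4f} -> {decision}")
```

With x̄ = 465 the statistic stays inside the critical values, so H0 is not rejected; with x̄ = 535 it falls deep in the region of rejection.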
  • 34. Region of rejection : Directional Alternative Hypothesis In the GRE example, we tested the null hypothesis against a non-directional alternative: H0 : µ = 455 H1 : µ ≠ 455 This test is called two-tailed or non-directional because the region of rejection was located in both tails of the sampling distribution of the mean. Suppose a direction of the results is anticipated. A directional hypothesis states that a parameter is either greater or less than the hypothesis value. For instance, in the GRE example we might use the alternative hypothesis that the mean GRE level of our population is greater than 455, in symbols, H0 : µ = 455 H1 : µ > 455
  • 35. An alternative hypothesis can be either non-directional or directional. A directional alternative hypothesis states that the parameter is greater than or less than the hypothesized value. A non-directional alternative hypothesis merely states that the parameter is different from (not equal to) the hypothesized value.
  • 36. The test of the null hypothesis against a directional alternative is called a one-tailed test; the region of rejection is located in one of the two tails of the sampling distribution. The specific tail of the distribution is determined by the direction of the alternative hypothesis. Now suppose the alternative hypothesis states that the mean GRE was less than 455. In symbols, the hypotheses are H0 : µ = 455 H1 : µ < 455 Here the critical region lies in the left tail of the distribution.
  • 37. Type-I and Type-II Errors in Decision Making In a specific situation, we may make one of two types of errors, as shown in the table below:
      Decision taken by the investigator | Existing reality: Group A = Group B | Existing reality: Group A ≠ Group B
      Group A ≠ Group B | P[Type-I Error] (Level of significance) | Correct Decision (Power of the study)
      Group A = Group B | Correct Decision (Level of confidence) | Type-II Error
  • 38. Testing of Hypothesis Q=1 A random sample of 100 observations from a population with standard deviation 60 yielded a sample mean of 100. (a) Test the null hypothesis that µ=100 against the alternative hypothesis (µ≠100) using α=0.05. (b) Test the null hypothesis that µ=100 against the alternative hypothesis (µ>100) using α=0.05
  • 39. Testing of Hypothesis Ex=1 A random sample of 200 observations from a population with standard deviation 80 yielded a sample mean of 150. (a) Test the null hypothesis that µ=100 against the alternative hypothesis (µ≠100) using α=0.05. (b) Test the null hypothesis that µ=100 against the alternative hypothesis (µ>100) using α=0.05
  • 40. • Ex=2 A random sample of 100 observations from a population with standard deviation 60 yielded a sample mean of 100. • (a)Test the null hypothesis that µ=111 against the alternative hypothesis (µ≠111) using α=0.05. • (b) Test the null hypothesis that µ<=111 against the alternative hypothesis (µ>111) using α=0.05 • Explain why the results differ.
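The three exercises above all use the same z-test with a known population standard deviation, so one helper covers them. The sketch below is an illustration only: `z_test` and its `tail` argument are hypothetical names, and the critical values are computed from the standard normal distribution rather than read from a table.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """z-test of H0: mu = mu0. `tail` is "two", "upper" (H1: mu > mu0)
    or "lower" (H1: mu < mu0). Returns (z, critical value, reject H0?)."""
    std_normal = NormalDist()
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif tail == "upper":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "lower":
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    return z, critical, reject


# (label, sample mean, hypothesized mean, sigma, n) taken from the exercises.
exercises = [
    ("Q=1", 100, 100, 60, 100),
    ("Ex=1", 150, 100, 80, 200),
    ("Ex=2", 100, 111, 60, 100),
]
for label, mean, mu0, sigma, n in exercises:
    for tail in ("two", "upper"):
        z, critical, reject = z_test(mean, mu0, sigma, n, tail=tail)
        print(f"{label} ({tail}-tailed): z = {z:.3f}, "
              f"critical = {critical:.3f}, reject H0 = {reject}")
```

Note how the one-tailed critical value (about 1.645) is smaller in magnitude than the two-tailed one (about 1.96), because the whole 5% sits in a single tail.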
  • 42. Q=2 The heights of 10 males of a given locality are found to be as follows: 70, 67, 62, 68, 61, 68, 70, 64, 64, 66 inches. Is it reasonable to believe that the average height is greater than 64 inches? What will the finding be if the alternative hypothesis is two-tailed?
  • 43. Contd.. Answer Mean=66; S.D.=3.16 and Variance=10.00, t=2.00 • The tabulated value of t-statistic at 9 d.f. and α=0.05 (one-tailed) is 1.833 • Since calculated value is greater than the tabulated value, we will reject the null hypothesis. We can believe that mean height is greater than 64 inches. • What will be the finding if alternative hypothesis was two-tailed (answer it).
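The answer above can be reproduced with a one-sample t statistic. This sketch uses only the standard library, so the critical values are entered as constants: 1.833 is the one-tailed value quoted in the slide, while 2.262 (two-tailed, 9 d.f., α = 0.05) is a standard t-table value that does not appear in the slides.

```python
import math
import statistics

heights = [70, 67, 62, 68, 61, 68, 70, 64, 64, 66]


def one_sample_t(data, mu0):
    """t statistic for H0: mu = mu0, using the sample standard deviation."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # divides by n - 1
    t = (mean - mu0) / (sd / math.sqrt(n))
    return mean, sd, t, n - 1


mean, sd, t, df = one_sample_t(heights, mu0=64)

T_CRIT_ONE_TAILED = 1.833  # from the slide: 9 d.f., alpha = 0.05
T_CRIT_TWO_TAILED = 2.262  # standard table value, not given in the slides

print(f"mean = {mean:.2f}, s = {sd:.2f}, variance = {sd ** 2:.2f}")
print(f"t = {t:.2f} with {df} d.f.")
print("one-tailed: reject H0" if t > T_CRIT_ONE_TAILED
      else "one-tailed: do not reject H0")
print("two-tailed: reject H0" if abs(t) > T_CRIT_TWO_TAILED
      else "two-tailed: do not reject H0")
```

The same t statistic leads to different decisions here because the two-tailed test splits the 5% between both tails, which pushes the critical value further out.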
  • 44. Student’s t Distributions Does the adjustment of using s to estimate σ have an effect on the statistical test? Actually, it does, especially for small samples. The effect is that the normal distribution is inappropriate as the sampling distribution of the mean. In the beginning of the 20th century William S. Gosset found that, for small samples, sampling distribution departed substantially from the normal distribution and that, as sample sizes changed, the distributions changed. This gave rise to not one distribution but a family of distributions. The t distributions are a family of symmetrical, bell-shaped distributions that change as the sample size changes.
  • 45. Degrees of Freedom Degrees of Freedom : The number of degrees of freedom is a mathematical concept defined as the number of observations less the number of restrictions placed on them.
  • 46. Student’s t distribution for 1, 2, 5, 10, and ∞ degrees of freedom
  • 48. Point Estimates and Interval Estimates A point estimate is a single value that represents the best estimate of the population value. If we are estimating the mean of a population (µ), then the sample mean is the best point estimate. Interval estimation builds on point estimation to arrive at a range of values that are tenable for the parameter and that define an interval we are confident contains the parameter.
  • 49. Confidence Interval CI = x̄ ± (ZCV)(σx̄), where x̄ = sample mean, ZCV = critical value using the normal distribution, and σx̄ = standard error of the mean.
  • 50. Confidence Interval CI = x̄ ± (tCV)(sx̄), where x̄ = sample mean, tCV = critical value using the appropriate t distribution, and sx̄ = estimated standard error of the mean from the sample.
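Both confidence-interval formulas have the same shape, so a single helper can illustrate them. The sketch below applies the z-based interval to the GRE example (x̄ = 535, σ = 100, n = 144) and the t-based interval to the heights example (x̄ = 66, s = 3.16, n = 10). The 95% critical values 1.96 and 2.262 (9 d.f.) are standard table values; pairing them with these examples is an assumption made for illustration.

```python
import math


def confidence_interval(sample_mean, critical_value, standard_error):
    """Return (lower, upper) = sample mean ± critical value × standard error."""
    margin = critical_value * standard_error
    return sample_mean - margin, sample_mean + margin


# z-based interval: population sigma known (GRE example).
gre_ci = confidence_interval(535, 1.96, 100 / math.sqrt(144))

# t-based interval: sigma estimated by s (heights example, 9 d.f.).
height_ci = confidence_interval(66, 2.262, 3.16 / math.sqrt(10))

print(f"95% CI for mean GRE score: ({gre_ci[0]:.2f}, {gre_ci[1]:.2f})")
print(f"95% CI for mean height:    ({height_ci[0]:.2f}, {height_ci[1]:.2f})")
```

The hypothesized value 455 falls outside the GRE interval, which is consistent with the two-tailed test rejecting H0: µ = 455 for x̄ = 535.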
  • 51. Comparison of Two Means • Q: As part of an investigation of the development of infant sleep patterns, the sleep of 20 infants (10 male and 10 female) was monitored on several occasions between 1 week and 6 months of age. The quiet sleep results (in minutes) at 1 week of age for the 20 study infants follow. • Is there evidence of a difference in quiet sleep behavior between the two genders? • Is there evidence that mean quiet sleep is higher for males than for females?
      Quiet sleep (male): 85, 129, 215, 143, 44, 173, 230, 198, 105, 127; Mean = 144.90
      Quiet sleep (female): 140, 155, 33, 209, 166, 72, 116, 131, 97, 124; Mean = 124.30
  • 52. t = (x̄m - x̄f) / (Sp √(1/nm + 1/nf)), with Sp² = ((nm - 1)Sm² + (nf - 1)Sf²) / (nm + nf - 2), where Sp² is the pooled variance, Sm² and Sf² are the variances of the two samples, and nm and nf are the sample sizes.
  • 53. Contd… Answer For males: S1 = 59.35; S1² = 3522.54; Mean = 144.90. For females: S2 = 49.48; S2² = 2448.011; Mean = 124.30. • t = 0.843 at 18 d.f.
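The answer above is consistent with an independent two-sample t-test using a pooled variance. The sketch below recomputes it from the raw data; the critical values 2.101 (two-tailed) and 1.734 (one-tailed) for 18 d.f. at α = 0.05 are standard t-table values that are not given in the slides, and `pooled_t` is a hypothetical helper name.

```python
import math
import statistics

male = [85, 129, 215, 143, 44, 173, 230, 198, 105, 127]
female = [140, 155, 33, 209, 166, 72, 116, 131, 97, 124]


def pooled_t(sample_1, sample_2):
    """Two-sample t statistic assuming equal population variances."""
    n1, n2 = len(sample_1), len(sample_2)
    var1 = statistics.variance(sample_1)  # sample variance, n - 1 divisor
    var2 = statistics.variance(sample_2)
    pooled_variance = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample_1) - statistics.mean(sample_2)) / standard_error
    return t, n1 + n2 - 2


t, df = pooled_t(male, female)

T_CRIT_TWO_TAILED = 2.101  # standard table value, 18 d.f., alpha = 0.05
T_CRIT_ONE_TAILED = 1.734  # standard table value, 18 d.f., alpha = 0.05

print(f"male variance   = {statistics.variance(male):.2f}")
print(f"female variance = {statistics.variance(female):.2f}")
print(f"t = {t:.3f} with {df} d.f.")
print(f"two-tailed: reject H0 = {abs(t) > T_CRIT_TWO_TAILED}")
print(f"one-tailed: reject H0 = {t > T_CRIT_ONE_TAILED}")
```

Since t is well below both critical values, these data give no evidence of a difference in mean quiet sleep between the two groups, in either the two-tailed or the one-tailed sense.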
  • 54. Paired t-test • As part of a study to determine the effects of a certain oral contraceptive on weight gain, nine healthy females were weighed at the beginning of a course of oral contraceptive use. They were reweighed after 3 months. Results are given below. Do the results suggest evidence of weight gain? • Longitudinal Study/Real-Cohort Study
      Subject | Initial weight (lbs) | 3-month weight (lbs)
      1 | 120 | 123
      2 | 141 | 143
      3 | 130 | 140
      4 | 150 | 145
      5 | 135 | 140
      6 | 140 | 143
      7 | 120 | 118
      8 | 140 | 141
      9 | 130 | 132
  • 55. • Contd… Answer • t=1.509 • One-tailed • Tabulated value of t at α=0.05 and d.f. =8 is 1.860 (one- tailed).
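The paired t statistic quoted above comes from the within-subject differences. A minimal sketch, using the weights from the study and the tabulated critical value 1.860 given in the slide; `paired_t` is a hypothetical helper name.

```python
import math
import statistics

initial = [120, 141, 130, 150, 135, 140, 120, 140, 130]
after_3_months = [123, 143, 140, 145, 140, 143, 118, 141, 132]


def paired_t(before, after):
    """Paired t statistic: a one-sample t-test on the differences
    (after - before) against a mean difference of zero."""
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return mean_diff, sd_diff, t, n - 1


mean_diff, sd_diff, t, df = paired_t(initial, after_3_months)

T_CRIT_ONE_TAILED = 1.860  # from the slide: 8 d.f., alpha = 0.05

print(f"mean difference = {mean_diff:.3f} lbs, s = {sd_diff:.3f}")
print(f"t = {t:.3f} with {df} d.f.")
print("reject H0" if t > T_CRIT_ONE_TAILED else "do not reject H0")
```

Because the computed t is smaller than 1.860, the one-tailed test at α = 0.05 does not reject H0; these data do not provide significant evidence of weight gain.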
  • 56. Male Female 42.1 41.3 42.4 43.2 41.8 42.7 43.8 42.5 43.1 44.0 41.0 41.8 42.8 42.3 42.7 43.6 43.3 43.5 41.7 44.1 Do the data provide sufficient evidence to conclude that, on the average, the male weight is greater than female weight? Perform the required hypothesis test at the 5% level of significance.
  • 57. Proportion Test • Q=1 In a sample of 1000 people in Maharashtra, 540 are rice eaters and the rest are wheat eaters. Can we assume that both rice and wheat are equally popular in this state at the 1% level of significance? Z tabulated at the 1% level of significance is 2.58 (two-tailed). • Q=2 Twenty people were attacked by a disease and only 18 survived. Will you reject the hypothesis that the survival rate, if attacked by this disease, is 85% in favour of the hypothesis that it is more, at the 5% level? Z tabulated at the 5% level of significance is 1.645 (one-tailed).
  • 58. Q=3 In a year there were 956 births in town A, of which 52.5% were males, while in towns A and B combined the proportion of male births in a total of 1406 births was 0.496. Is there any significant difference in the proportion of male births in the two towns? Z tabulated at the 5% level of significance is 1.96 (two-tailed).
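For Q=1 and Q=2 above, a one-sample z-test for a proportion can be sketched as below. The helper name `one_proportion_z` is hypothetical. The comparison values are 2.58 (two-tailed, 1%) as quoted in Q=1 and the standard one-tailed 5% critical value of about 1.645; with only 20 cases in Q=2, the normal approximation is rough, so treat that result as indicative only.

```python
import math


def one_proportion_z(successes, n, p0):
    """z statistic for H0: population proportion = p0."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # computed under H0
    return (p_hat - p0) / standard_error


# Q=1: 540 rice eaters out of 1000, H0: p = 0.5, two-tailed at the 1% level.
z_rice = one_proportion_z(540, 1000, 0.50)
print(f"Q=1: z = {z_rice:.3f}, reject H0 = {abs(z_rice) > 2.58}")

# Q=2: 18 survivors out of 20, H0: p = 0.85 vs H1: p > 0.85, at the 5% level.
z_survival = one_proportion_z(18, 20, 0.85)
print(f"Q=2: z = {z_survival:.3f}, reject H0 = {z_survival > 1.645}")
```

In both cases the statistic stays below the critical value, so neither null hypothesis is rejected at the stated level.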
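Q=3 gives the figures for town A and for both towns combined, so town B has to be derived first: 1406 - 956 = 450 births. The sketch below then applies a two-proportion z-test with a pooled proportion; this choice of test and the helper name `two_proportion_z` are assumptions made for illustration, since the slide does not show the working.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """z statistic for H0: the two population proportions are equal,
    using the pooled proportion for the standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


n_a, p_a = 956, 0.525          # town A, from the question
n_total, p_total = 1406, 0.496  # towns A and B combined

# Derive town B from the combined figures.
n_b = n_total - n_a
p_b = (p_total * n_total - p_a * n_a) / n_b

z = two_proportion_z(p_a, n_a, p_b, n_b)
print(f"town B: n = {n_b}, proportion of male births = {p_b:.4f}")
print(f"z = {z:.3f}, reject H0 = {abs(z) > 1.96}")
```

The pooled proportion recovered inside the function equals the combined proportion 0.496 given in the question, which is a useful sanity check on the derived town B figures.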
  • 59. References • Medical Statistics-Principles & Methods by K.R. Sundaram, S. N. Dwivedi and V Sreenivas.