Inferential Statistics
Why do you need inferential statistics?
• It makes an inference about the population from a sample.
• It allows us to use the information learned from descriptive
statistics.
• It extends beyond the immediate data.
• It is used to infer from sample data what the population might
think.
• It is used to make judgements about the probability that an observed
difference between groups is a dependable one or one that might
have happened by chance in the study.
Statistical inference is the procedure by which we reach a conclusion about a population
on the basis of the information contained in a sample drawn from that population.
Statistical Inference (Two General Areas)
• Estimation
• Hypothesis Testing
The estimation process uses sample data to calculate a statistic that serves as an
approximation of the corresponding parameter of the population from which the
sample was drawn.
Types of estimate
Point estimate: A point estimate is a single numerical value used to estimate the
corresponding population parameter.
Interval estimate: An interval estimate consists of two numerical values defining a range
of values that, with a specified degree of confidence, most likely includes the parameter
being estimated.
A single computed value is referred to as an estimate. The rule that tells us
how to compute this value, or estimate, is referred to as an estimator.
Estimators are usually presented as formulas. For example,
$\bar{x} = \frac{\sum x_i}{n}$
is an estimator of the population mean, µ. The single numerical value that
results from evaluating this formula is called an estimate of the parameter
µ.
In many cases, a parameter may be estimated by more than one estimator.
One desirable property of a good estimator is unbiasedness.
An estimator, say, T, of the parameter θ is said to be an unbiased estimator
of θ if E(T) = θ.
The sampled population is the population from which one actually
draws a sample.
The target population is the population about which one wishes to
make an inference.
Only when the target population and the sampled population are the
same is it possible for one to use statistical inference procedures to
reach conclusions about the target population. If the sampled
population and the target population are different, the researcher
can reach conclusions about the target population only on the basis
of nonstatistical considerations like extrapolating findings to the
target population.
CONFIDENCE INTERVAL FOR A POPULATION MEAN
Suppose a researcher wishes to estimate the mean of
some normally distributed population.
Draw a random sample of size n from the normally
distributed population and compute x̄, which is used as a
point estimate of µ. Because random sampling inherently
involves chance, x̄ cannot be expected to be equal to µ.
It would be much more meaningful, therefore, to
estimate µ by an interval that somehow communicates
information regarding the probable magnitude of µ.
If sampling is from a normally distributed
population, the sampling distribution of the sample
mean will be normally distributed with a mean
equal to the population mean µ and a variance
equal to σ²/n. From our knowledge of normal
distributions, approximately 95% of the possible
values of x̄ constituting the distribution are within
two standard deviations of the mean. The two
points that are two standard deviations from the
mean are $\mu - 2\sigma_{\bar{x}}$ and $\mu + 2\sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \sigma/\sqrt{n}$,
so the interval $\mu \pm 2\sigma_{\bar{x}}$ will contain approximately 95%
of the possible values of x̄.
Example:
Suppose a researcher, interested in obtaining an estimate of the average level of some
enzyme in a certain human population, takes a sample of 10 individuals, determines the
level of the enzyme in each, and computes the sample mean x̄. Suppose further it is
known that the variable of interest is approximately normally distributed with a variance of
45. We wish to estimate µ.
An approximate 95% confidence interval for µ is given by
$\bar{x} \pm 2\sigma_{\bar{x}} = \bar{x} \pm 2\sqrt{45/10} \approx \bar{x} \pm 2(2.1213)$
Interval Estimate Components:
The interval estimate contains in its centre the point estimate of µ. The 2 we recognize as a
value from the standard normal distribution that tells us within how many standard errors
lie approximately 95% of the possible values of x̄. This value of z is referred to as the
reliability coefficient. The last component, $\sigma_{\bar{x}}$, is the standard error, or standard deviation of
the sampling distribution of x̄. In general, then, an interval estimate may be expressed as
follows:
estimator ± (reliability coefficient) × (standard error)
Sampling from a normal distribution with known variance, an interval estimate for µ is
$\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$
where $z_{(1-\alpha/2)}$ is the value of z to the left of which lies 1 − α/2 and to the right of which lies α/2 of the area
under its curve.
Interpreting Confidence Intervals:
How do we interpret the interval given by the expression $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$?
In the present example the reliability coefficient is equal to 2. We say
that in repeated sampling approximately 95% of the intervals
constructed by the above expression will include the population
mean. This interpretation is based on the probability of occurrence of
different values of x̄. We may generalize this interpretation by
designating the total area under the curve of x̄ that is outside the
interval as α and the area within the interval as 1 − α.
Probabilistic Interpretation: In repeated sampling, from a normally distributed
population with a known standard deviation, 100(1 − α) percent of all intervals of
the form $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$ will in the long run include the population mean µ.
The quantity (1 − α), in this case 0.95, is called the confidence coefficient (or
confidence level), and the interval $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$ is called a confidence interval for µ. When
(1 − α) = 0.95, the interval is called the 95% confidence interval for µ.
Practical Interpretation: When sampling is from a normally distributed
population with known standard deviation, we are 100(1 − α) percent confident
that the single computed interval, $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$, contains the population mean.
Precision: The quantity obtained by multiplying the reliability factor by the
standard error of the mean is called the precision of the estimate. This quantity
is also called the margin of error.
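The probabilistic interpretation can be checked by simulation. The sketch below is only an illustration: the population mean of 50, the number of repetitions and the random seed are assumptions made for the demonstration (the variance of 45 and the sample size of 10 echo the enzyme example). It draws many samples from a normal population with known standard deviation, builds the z interval each time, and counts how often the interval contains µ.

```python
import random
from statistics import NormalDist

def coverage(mu, sigma, n, conf=0.95, reps=2000, seed=1):
    """Fraction of simulated z intervals that contain the true mean mu."""
    rng = random.Random(seed)                     # fixed seed: repeatable demo
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # reliability coefficient
    se = sigma / n ** 0.5                         # standard error of the mean
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - z * se <= mu <= xbar + z * se:
            hits += 1
    return hits / reps

# Hypothetical population: mu = 50 (assumed), variance 45, samples of size 10.
print(coverage(mu=50.0, sigma=45 ** 0.5, n=10))
```

The printed fraction should land close to 0.95, which is what "95% of all intervals will in the long run include µ" means in practice.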
Sampling from Nonnormal Populations:
As noted, it will not always be possible or prudent to assume that the
population of interest is normally distributed. Thanks to the central limit
theorem, this will not deter us if we are able to select a large enough sample.
We have learned that for large samples, the sampling distribution of x̄ is
approximately normally distributed regardless of how the parent population is
distributed.
Example: Punctuality of patients in keeping appointments is of interest to a
research team. In a study of patient flow through the offices of general
practitioners, it was found that a sample of 35 patients was 17.2 minutes late
for appointments, on average. Previous research had shown the standard
deviation to be about 8 minutes. The population distribution was felt to be
nonnormal. What is the 90% confidence interval for µ, the true mean amount
of time late for appointments?
Since the sample size is fairly large (greater than 30), and since the population
standard deviation is known, we draw on the central limit theorem and
assume the sampling distribution of x̄ to be approximately normally
distributed. From the normal distribution table, we find the reliability
coefficient corresponding to a confidence coefficient of 0.90 to be about
1.645. The standard error is
$\sigma_{\bar{x}} = 8/\sqrt{35} = 1.3522$
Therefore, the 90 percent confidence interval for µ is
$17.2 \pm 1.645(1.3522) = 17.2 \pm 2.2 = (15.0,\ 19.4)$
Frequently, when the sample is large enough for the application of the central
limit theorem, the population variance is unknown. In that case, we use the
sample variance as a replacement for the unknown population variance in the
formula for constructing a confidence interval for the population mean.
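The appointment example can be reproduced in a few lines. This is a minimal sketch: the function name `z_interval` is hypothetical, and the reliability coefficient is taken from the standard library's `statistics.NormalDist` rather than from a printed table, so it is 1.6449 rather than the rounded 1.645.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf):
    """Confidence interval for a mean when sigma is known (or n is large)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # reliability coefficient
    margin = z * sigma / sqrt(n)                  # precision (margin of error)
    return xbar - margin, xbar + margin

# 35 patients, mean lateness 17.2 minutes, sigma about 8 minutes, 90% level.
low, high = z_interval(xbar=17.2, sigma=8, n=35, conf=0.90)
print(round(low, 1), round(high, 1))
```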
t-distribution
In practice one usually does not know the population mean and
variance, and so cannot use the statistic z to construct confidence
intervals for the mean. Although the statistic
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
is normally distributed when the population is normally
distributed, and is at least approximately normally distributed
when n is large regardless of the functional form of the
population, we cannot make use of this fact because σ is
unknown.
The most logical solution is to use the sample standard deviation (SD),
$s = \sqrt{\sum (x_i - \bar{x})^2 / (n - 1)}$
as an approximation of σ. This is justifiable when n ≥ 30, as is the use of
normal distribution theory to construct a confidence interval for µ.
When we have a small sample, an alternative procedure for constructing a
confidence interval is the use of Student's t distribution, or simply the t
distribution.
William S. Gosset was a statistician employed by the Guinness brewing
company, which had stipulated that he could not publish under his own name.
He therefore wrote under the pseudonym "Student", hence the name Student's t distribution.
Properties of t distribution:
1. It has a mean of 0.
2. It is symmetrical about the mean.
3. In general, it has a variance greater than 1, but the
variance approaches 1 as the sample size becomes
large. For degrees of freedom (df) > 2, the variance of the t
distribution is df/(df − 2). Since here df = n − 1, for n > 3
we may write the variance of the t distribution as
(n − 1)/(n − 3).
4. The variable t ranges from −∞ to +∞.
5. Compared to the normal distribution, the t distribution
is less peaked in the centre and has higher tails.
6. The t distribution approaches the normal distribution
as n − 1 approaches infinity.
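Property 3 is easy to tabulate. The short sketch below (the function name `t_variance` and the chosen sample sizes are illustrative assumptions) evaluates (n − 1)/(n − 3) for increasing n and shows the variance shrinking toward 1.

```python
def t_variance(n):
    """Variance of the t distribution with df = n - 1 (defined for n > 3)."""
    df = n - 1
    if df <= 2:
        raise ValueError("variance is defined only for df > 2, i.e. n > 3")
    return df / (df - 2)

# The variance is always above 1 and approaches 1 as n grows.
for n in (5, 15, 30, 100, 1000):
    print(n, round(t_variance(n), 4))
```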
Confidence interval using t
Compared with the z interval, only the source of the reliability coefficient is different.
When sampling is from a normal distribution whose standard
deviation, σ, is unknown, the 100(1 − α) percent confidence interval for
the population mean µ is given by
$\bar{x} \pm t_{(1-\alpha/2)}\, s/\sqrt{n}$
Applicability: the procedure is strictly valid only when the sample is drawn from a normal distribution.
t-distribution: Applicability
When to use t distribution:
The t distribution can be used with any statistic having a bell-shaped distribution
(approximately normal).
The sampling distribution of a statistic can be treated as bell-shaped if any of the following
conditions apply:
• The population distribution is normal.
• The population distribution is symmetric, unimodal, and without outliers, and the
sample size is at least 30.
• The population distribution is moderately skewed, unimodal, and without outliers, and the
sample size is at least 40.
• The sample size is greater than 40, without outliers.
The t distribution should not be used with small samples from populations that are
not approximately normal.
t-distribution: Example
A researcher conducted a study to evaluate the effect of on-the-job body
mechanics instruction on the work performance of newly employed young
workers. He used two randomly selected groups of subjects, an experimental
group and a control group. The experimental group received one hour of back-
school training provided by an occupational therapist. The control group did
not receive this training. A criterion-referenced body mechanics evaluation
checklist was used to evaluate each worker’s lifting, lowering, pulling and
transferring of objects in the work environment. A correctly performed task
received a score of 1. The 15 control subjects made a mean score of 11.53 on
the evaluation with a standard deviation of 3.681. We assume that these 15
controls behave as a random sample from a population of similar subjects. We
wish to use these sample data to estimate the mean score for the population.
Standard normal and t-distribution Tables
Ans: We may use the sample mean 11.53 as a point estimate of the population mean, but since
the population standard deviation is unknown, we must assume the population of values
to be at least approximately normally distributed before constructing a confidence interval for
µ. Let us assume that such an assumption is reasonable and that a 95% confidence interval is
desired. We have our estimator x̄, and our standard error is $s/\sqrt{n} = 3.681/\sqrt{15} = 0.9504$. We need now to find the
reliability coefficient, the value of t associated with a confidence coefficient of 0.95 and n − 1
= 14 degrees of freedom. Since a 95% confidence interval leaves 0.05 of the area under the
curve of t to be equally divided between the two tails, we need the value of t to the right of
which lies 0.025 of the area.
From the t distribution table at 14 degrees of freedom: t0.975 = 2.1448
Hence the 95% confidence interval is
11.53 ± 2.1448 × (0.9504) = 11.53 ± 2.04 = [9.49, 13.57]
This interval may be interpreted from both the probabilistic and practical points of view. We
are 95% confident that the true population mean µ is somewhere between 9.49 and 13.57
because, in repeated sampling, 95% of intervals constructed in like manner will include µ.
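The arithmetic of this example can be sketched as follows. The helper name `t_interval` is hypothetical, and the reliability coefficient is simply the table value 2.1448 quoted in the example, passed in as a constant (the Python standard library has no t quantile function).

```python
from math import sqrt

T_0975_DF14 = 2.1448  # table value quoted in the example (14 degrees of freedom)

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for a mean when sigma is unknown: xbar ± t * s/sqrt(n)."""
    se = s / sqrt(n)          # estimated standard error
    margin = t_crit * se      # precision of the estimate
    return xbar - margin, xbar + margin, se

# 15 control subjects, mean score 11.53, standard deviation 3.681.
low, high, se = t_interval(xbar=11.53, s=3.681, n=15, t_crit=T_0975_DF14)
print(round(se, 4), round(low, 2), round(high, 2))
```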
Deciding between z and t
When we construct a confidence interval for a population mean, we must decide whether to use
a value of z or a value of t as the reliability factor. To make an appropriate choice we must
consider sample size, whether the sampled population is normally distributed, and whether the
population variance is known.
Confidence interval for the difference between two population means
A confidence interval for the difference between population means provides information
that is helpful in deciding whether or not it is likely that the two population means are
equal. When the constructed interval does not include zero, we say that the interval
provides evidence that the two population means are not equal. When the interval
includes zero, we say that the population means may be equal.
Sampling from Normal Populations with known variances
When the population variances are known, the 100(1 − α)% confidence interval for µ1 − µ2 is given by
$(\bar{x}_1 - \bar{x}_2) \pm z_{(1-\alpha/2)}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
Sampling from Non-normal Populations
The construction of a confidence interval for the difference between two population
means when sampling is from non-normal populations proceeds in the same manner as
sampling from normal populations if the sample sizes n1 and n2 are large. Again, this is a
result of the central limit theorem. If the population variances are unknown, we use the
sample variances to estimate them.
Example: Despite common knowledge of the adverse effects of
doing so, many women continue to smoke while pregnant. A
researcher examined the effectiveness of a smoking cessation
program for pregnant women. The mean number of cigarettes
smoked daily at the close of the program by the 328 women who
completed the program was 4.3 with a standard deviation of 5.22.
Among 64 women who did not complete the program, the mean
number of cigarettes smoked per day at the close of the program
was 13 with a standard deviation of 8.97. We wish to construct a 99
percent confidence interval for the difference between the means of
the populations from which the samples may be presumed to have
been selected.
No information is given regarding the shape of the distribution of cigarettes smoked per day.
Since our sample sizes are large, however, the central limit theorem assures us that the sampling
distribution of the difference between sample means will be approximately normally distributed
even if the distribution of the variable in the populations is not normally distributed. We may
use this fact as justification for using the z statistic as the reliability factor in the construction of
our confidence interval. Also, since the population standard deviations are not given, we will use
the sample standard deviations to estimate them. The point estimate for the difference between
population means is the difference between sample means, 4.3 − 13.0 = −8.7. From the normal
distribution table, we find the reliability factor to be 2.58. The estimated standard error is
$\sqrt{5.22^2/328 + 8.97^2/64} = 1.1577$
Our 99 percent confidence interval for the difference between population means is
−8.7 ± 2.58(1.1577) = [−11.7, −5.7]
We are 99 percent confident that the mean number of cigarettes smoked per day for women
who complete the program is between 5.7 and 11.7 lower than the mean for women who do
not complete the program.
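A minimal sketch of this large-sample calculation follows. The function name `two_sample_z_interval` is hypothetical, and the reliability factor 2.58 is the table value used in the example.

```python
from math import sqrt

def two_sample_z_interval(x1, s1, n1, x2, s2, n2, z):
    """Large-sample interval for mu1 - mu2, with s1 and s2 standing in for sigma1 and sigma2."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # estimated standard error of the difference
    diff = x1 - x2                          # point estimate of mu1 - mu2
    return diff - z * se, diff + z * se, se

# Completers: n = 328, mean 4.3, SD 5.22. Non-completers: n = 64, mean 13.0, SD 8.97.
low, high, se = two_sample_z_interval(4.3, 5.22, 328, 13.0, 8.97, 64, z=2.58)
print(round(se, 4), round(low, 1), round(high, 1))
```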
t distribution and the difference between means
When population variances are unknown and we wish to estimate
the difference between two population means with a confidence
interval, we can use the t distribution as a source of the reliability factor
if certain assumptions are met.
We must know, or be willing to assume, that the two sampled
populations are normally distributed.
Regarding unknown population variances, two situations may occur:
Situation–I: population variances are equal,
Situation–II: population variances are not equal.
Situation–I: population variances are equal
If the assumption of equal population variances is justified, the two sample variances may be
considered as estimates of the same quantity, the common variance. We therefore obtain a
pooled estimate of the common variance. The pooled variance is the weighted average of
the sample variances, each weighted by its degrees of freedom:
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
The 100(1 − α) percent confidence interval for µ1 − µ2 is
$(\bar{x}_1 - \bar{x}_2) \pm t_{(1-\alpha/2)}\sqrt{s_p^2/n_1 + s_p^2/n_2}$
The number of degrees of freedom used in determining the value of t is n1 + n2 − 2.
Example: population variances are equal
The purpose of a research study was to determine the effect of long-term
exercise intervention on corporate executives enrolled in a supervised
fitness program. Data were collected on 13 subjects (the exercise group)
who voluntarily entered a supervised exercise program and remained active
for an average of 13 years and 17 subjects (the sedentary group) who
elected not to join the fitness program. Among the data collected on the
subjects was the maximum number of sit-ups completed in 30 seconds. The
exercise group has a mean and standard deviation for this variable of 21.0
and 4.9 respectively. The mean and standard deviation for the sedentary
group were 12.1 and 5.6 respectively. We assume that the two populations
of overall muscle condition measures are approximately normally
distributed and that the two population variances are equal. We wish to
construct a 95% confidence interval for the difference between the means
of the populations represented by these two samples.
Pooled estimate of the common population variance:
$s_p^2 = \frac{(13 - 1)(4.9^2) + (17 - 1)(5.6^2)}{13 + 17 - 2} = 28.21$
From the t distribution table with 13 + 17 − 2 = 28 degrees of freedom and a desired confidence
level of 0.95, the reliability factor is 2.0484.
Confidence interval: $(21.0 - 12.1) \pm 2.0484\sqrt{28.21/13 + 28.21/17} = 8.9 \pm 4.0085 = (4.9,\ 12.9)$
We are 95% confident that the difference between the population means is somewhere
between 4.9 and 12.9. We can say this because we know that if we were to repeat the
study many, many times and compute confidence intervals in the same way, about 95% of
the intervals would include the difference between the population means.
Since the interval does not include zero, we conclude that the population means are not
equal.
We can interpret this interval as follows: the difference between the two population means is
estimated to be 8.9, and we are 95% confident that the true value lies between 4.9 and
12.9.
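The pooled-variance calculation for the sit-up data can be sketched as below. The function names are hypothetical, and the reliability factor 2.0484 is the table value for 28 degrees of freedom quoted in the example.

```python
from math import sqrt

def pooled_variance(s1, n1, s2, n2):
    """Weighted average of the sample variances, weights = degrees of freedom."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)

def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """Interval for mu1 - mu2 assuming equal population variances."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    se = sqrt(sp2 / n1 + sp2 / n2)   # standard error of the difference
    diff = x1 - x2
    return diff - t_crit * se, diff + t_crit * se

# Exercise group: n = 13, mean 21.0, SD 4.9. Sedentary group: n = 17, mean 12.1, SD 5.6.
sp2 = pooled_variance(4.9, 13, 5.6, 17)
low, high = pooled_t_interval(21.0, 4.9, 13, 12.1, 5.6, 17, t_crit=2.0484)
print(round(sp2, 2), round(low, 1), round(high, 1))
```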
Population variances are not equal
When one is unable to conclude that the variances of two populations of
interest are equal, even though the two populations may be assumed to
be normally distributed, it is not proper to use the t distribution as described for the equal-variance case.
Solutions have been proposed by many researchers. The problem
revolves around the fact that the quantity
$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2/n_1 + s_p^2/n_2}}$
where $s_p^2$ is the pooled variance, does not follow a t distribution with n1 + n2 − 2 degrees of freedom when
the population variances are not equal.
The solution proposed by Cochran consists of computing the reliability factor by the
following formula:
$t'_{(1-\alpha/2)} = \frac{w_1 t_1 + w_2 t_2}{w_1 + w_2}$
where $w_1 = s_1^2/n_1$, $w_2 = s_2^2/n_2$, $t_1 = t_{(1-\alpha/2)}$ for n1 − 1 degrees of freedom, and $t_2 = t_{(1-\alpha/2)}$ for n2 − 1 degrees of freedom.
An approximate 100(1 − α) percent confidence interval for µ1 − µ2 is given by
$(\bar{x}_1 - \bar{x}_2) \pm t'_{(1-\alpha/2)}\sqrt{s_1^2/n_1 + s_2^2/n_2}$
Example: population variances are not equal
The purpose of a research study was to determine the effect of long-term
exercise intervention on corporate executives enrolled in a supervised
fitness program. Data were collected on 13 subjects (the exercise group)
who voluntarily entered a supervised exercise program and remained
active for an average of 13 years and 17 subjects (the sedentary group)
who elected not to join the fitness program. Among the data collected on
the subjects was the maximum number of sit-ups completed in 30
seconds. The exercise group has a mean and standard deviation for this
variable of 21.0 and 4.9 respectively. The mean and standard deviation
for the sedentary group were 12.1 and 5.6 respectively. We assume that
the two populations of overall muscle condition measures are
approximately normally distributed and that the two population
variances are not equal. We wish to construct a 95% confidence interval
for the difference between the means of the populations represented by
these two samples.
We will use Cochran's reliability factor t'. From the t distribution table with 12 degrees of freedom
and 1 − 0.05/2 = 0.975, $t_1 = 2.1788$. Similarly, with 16 degrees of freedom and 0.975, $t_2 = 2.1199$. We now compute
$t' = \frac{(4.9^2/13)(2.1788) + (5.6^2/17)(2.1199)}{4.9^2/13 + 5.6^2/17} = 2.1494$
We now construct the 95 percent confidence interval for the difference between the two
population means:
$(21.0 - 12.1) \pm 2.1494\sqrt{4.9^2/13 + 5.6^2/17} = 8.9 \pm 2.1494(1.9214) = 8.9 \pm 4.13 = (4.77,\ 13.03)$
Since the interval does not include zero, we conclude that the population means are not equal.
We can interpret this interval as follows: the difference between the two population means is
estimated to be 8.9, and we are 95% confident that the true value lies between 4.77 and
13.03.
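Cochran's weighted reliability factor can be sketched in code. The function name `cochran_interval` is hypothetical. The two t values are assumptions taken from a standard t table (0.975 quantiles for 12 and 16 degrees of freedom), since the Python standard library has no t quantile function.

```python
from math import sqrt

T_0975_DF12 = 2.1788  # assumed standard table value, 12 degrees of freedom
T_0975_DF16 = 2.1199  # assumed standard table value, 16 degrees of freedom

def cochran_interval(x1, s1, n1, t1, x2, s2, n2, t2):
    """Approximate interval for mu1 - mu2 when population variances differ."""
    w1, w2 = s1 ** 2 / n1, s2 ** 2 / n2          # weights s_i^2 / n_i
    t_prime = (w1 * t1 + w2 * t2) / (w1 + w2)    # weighted reliability factor t'
    se = sqrt(w1 + w2)                           # unpooled standard error
    diff = x1 - x2
    return t_prime, diff - t_prime * se, diff + t_prime * se

t_prime, low, high = cochran_interval(21.0, 4.9, 13, T_0975_DF12,
                                      12.1, 5.6, 17, T_0975_DF16)
print(round(t_prime, 4), round(low, 2), round(high, 2))
```

Because the two weights are nearly equal here, t' lies almost midway between the two table values, and the resulting interval is only slightly wider than the pooled one.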
Determination of sample size for estimating mean
In planning any survey or experiment, a key question is: how large a sample should be taken?
• Larger than needed: wasteful of resources.
• Smaller than needed: leads to results of no practical use.
Objective: the interval estimate should have
1) a narrow interval
2) high reliability.
The total width of the interval is twice the magnitude of the quantity
(reliability coefficient) × (standard error)
Increasing reliability means a larger reliability coefficient, which widens the interval.
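With a fixed reliability coefficient, the only way to narrow the interval is to reduce the standard error.
Standard error = σ/√n. Since σ is fixed, the only way is to increase n, that is, to take a
larger sample.
How large? That depends on the desired degree of reliability and the
desired interval width.
Let d denote half the desired interval width, so that d = (reliability coefficient) × (standard error).
Sampling with replacement, or from an infinite or sufficiently large
population: $d = z\,\sigma/\sqrt{n}$, which gives
$n = \frac{z^2\sigma^2}{d^2}$
Sampling from a small finite population of size N without replacement: $d = z\,\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$, which gives
$n = \frac{N z^2 \sigma^2}{d^2(N-1) + z^2\sigma^2}$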
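The two sample-size formulas, n = z²σ²/d² for a large population and n = Nz²σ²/(d²(N − 1) + z²σ²) for a small finite population of size N, can be sketched as a single helper. Here d is half the desired interval width. The function name `sample_size` and the numbers in the usage lines (z = 1.96, σ = 20, d = 5, N = 1000) are illustrative assumptions, not values from the slides.

```python
from math import ceil

def sample_size(z, sigma, d, population_size=None):
    """Smallest n whose interval half-width is at most d.

    With population_size=None the population is treated as infinite
    (or sampling is with replacement); otherwise the finite population
    correction for sampling without replacement is applied.
    """
    base = (z * sigma) ** 2
    if population_size is None:
        n = base / d ** 2
    else:
        N = population_size
        n = N * base / (d ** 2 * (N - 1) + base)
    return ceil(n)  # round up: a fractional subject cannot be sampled

# Hypothetical planning values: 95% reliability, sigma guessed as 20, half-width 5.
print(sample_size(z=1.96, sigma=20, d=5))                        # large population
print(sample_size(z=1.96, sigma=20, d=5, population_size=1000))  # finite population
```

The finite-population answer is never larger than the infinite-population one, because sampling without replacement from a small population carries more information per observation.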
The sample size formulas require σ², but the population variance
is usually unknown.
The most frequently used sources of estimates of σ² are:
• A pilot or preliminary sample may be drawn from the population, and
the variance computed from this sample (s²) may be used to estimate σ². Observations
used in the pilot sample may be counted as part of the final
sample:
n (the computed sample size) − n1 (the pilot sample size)
= n2 (the number of observations needed to satisfy the total sample size
requirement)
• Estimates of σ² may be available from previous or similar
studies.
Inference about population variance
Confidence interval for the variance of a normally
distributed population
Is the distribution normal? If so, the sample variance is used as an
estimator of the population variance.
Wondering about its quality? Check whether the sample
variance is an unbiased estimator of the population variance.
To be unbiased, the average value of the sample variance over
all possible samples must be equal to the population variance:
E(s²) = σ²
Draw all possible samples of size two from the
population consisting of the values 6, 8, 10, 12 and
14.
If we compute the sample variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
for each of the possible samples, we obtain the
sample variances shown in the table.
Sampling with replacement
E(s²), the expected value of the sample
variance, is (0 + 2 + … + 2 + 0)/25 = 8.
The population variance is σ² = Σ(x_i − µ)²/N = 40/5 = 8. Hence E(s²) = σ².
Sample variances s² for all samples of size 2 (sampling with replacement)

                     Second draw
First draw      6     8    10    12    14
     6          0     2     8    18    32
     8          2     0     2     8    18
    10          8     2     0     2     8
    12         18     8     2     0     2
    14         32    18     8     2     0
Sampling without replacement: the expected value of s² over the 10 possible samples is
(2 + 8 + … + 2)/10 = 10
Thus E(s²) = σ² when sampling is with replacement. These results justify the use of n − 1 for computing
the sample variance.
E(s²) = S² when sampling is without replacement, where S² = Σ(x_i − µ)²/(N − 1) = 40/4 = 10 for this population.
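The enumeration can be reproduced directly. The sketch below (helper names are hypothetical) lists every sample of size two from the population 6, 8, 10, 12, 14, computes s² with the n − 1 divisor, and averages the results for both sampling schemes.

```python
from itertools import combinations, product

population = [6, 8, 10, 12, 14]

def sample_variance(sample):
    """Sample variance with the n - 1 divisor."""
    n = len(sample)
    xbar = sum(sample) / n
    return sum((x - xbar) ** 2 for x in sample) / (n - 1)

def mean_of_variances(samples):
    """Average of s^2 over an explicit list of equally likely samples."""
    variances = [sample_variance(s) for s in samples]
    return sum(variances) / len(variances)

# 25 ordered pairs (with replacement) and 10 unordered pairs (without replacement).
with_repl = mean_of_variances(list(product(population, repeat=2)))
without_repl = mean_of_variances(list(combinations(population, 2)))
print(with_repl, without_repl)
```

The first average equals the population variance computed with divisor N (8), and the second equals the population variance computed with divisor N − 1 (10).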
Interval estimation of a population variance
Success depends on our ability to find an appropriate sampling distribution.
Confidence intervals for σ² are usually based on the sampling distribution of (n − 1)s²/σ².
If samples of size n are drawn from a normally distributed population, this quantity has a
distribution known as the chi-square (χ²) distribution with n − 1 degrees of freedom.
To obtain a 100(1 − α)% confidence interval for σ², we select two values of χ² from the table in such a
way that α/2 is to the left of the smaller value and α/2 is to the right of the larger
value.
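The 100(1 − α)% confidence interval for σ² is
$\left( \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{\alpha/2}} \right)$
where $\chi^2_{\alpha/2}$ is the smaller and $\chi^2_{1-\alpha/2}$ the larger of the two chi-square table values with n − 1 degrees of freedom.
The confidence interval for σ, the population standard deviation, is
$\left( \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}},\ \sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}} \right)$
This method is widely used but has some drawbacks. Normality of the population is crucial.
The estimator is not in the centre of the confidence interval because the chi-square distribution is not
symmetric.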
A random sample of 20 steel ball bearings with a nominal diameter of 2 mm is
taken and the diameters are measured precisely. The measurements, in mm, are
as follows:
2.02 1.94 2.09 1.95 1.98 2.00 2.03 2.04 2.08 2.07
1.99 1.96 1.99 1.95 1.99 1.99 2.03 2.05 2.01 2.03
Assuming that the diameters are normally distributed with unknown mean µ and
unknown variance σ²,
a) Find a 2-sided 95% confidence interval for the variance σ².
b) Find a 2-sided confidence interval for the standard deviation σ.
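From the data we calculate Σx_i = 40.19 and Σx_i² = 80.7977, and n = 20.
Hence,
$(n-1)s^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n} = 80.7977 - \frac{40.19^2}{20} = 0.035895$
There are 19 degrees of freedom, and the critical values of the chi-square
distribution are $\chi^2_{0.025} = 8.907$ and $\chi^2_{0.975} = 32.852$.
The confidence interval for σ² is
$\left( \frac{0.035895}{32.852},\ \frac{0.035895}{8.907} \right) = (0.00109,\ 0.00403)$
The confidence interval for σ is
$(0.0331,\ 0.0635)$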
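The ball-bearing intervals can be checked with a short script. The two chi-square critical values are assumptions taken from a standard table for 19 degrees of freedom (lower and upper 2.5% points), because the Python standard library has no chi-square quantile function. Everything else is computed from the listed measurements.

```python
from math import sqrt

diameters = [2.02, 1.94, 2.09, 1.95, 1.98, 2.00, 2.03, 2.04, 2.08, 2.07,
             1.99, 1.96, 1.99, 1.95, 1.99, 1.99, 2.03, 2.05, 2.01, 2.03]

CHI2_0025_DF19 = 8.907   # assumed table value: area 0.025 to the left
CHI2_0975_DF19 = 32.852  # assumed table value: area 0.025 to the right

n = len(diameters)
xbar = sum(diameters) / n
ss = sum((x - xbar) ** 2 for x in diameters)  # equals (n - 1) * s^2

# Dividing by the larger chi-square value gives the lower limit, and vice versa.
var_low, var_high = ss / CHI2_0975_DF19, ss / CHI2_0025_DF19
sd_low, sd_high = sqrt(var_low), sqrt(var_high)

print(round(var_low, 5), round(var_high, 5))
print(round(sd_low, 4), round(sd_high, 4))
```

The sample variance s² = ss/19 (about 0.00189) sits closer to the lower limit than to the upper one, which illustrates the asymmetry of the chi-square interval.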
Inference about population variances
Confidence interval for the ratio of the variances of two normally distributed populations:
One way of comparing two variances is to compute their ratio, σ1²/σ2².
Using the t distribution to construct a confidence interval for the difference between population
means requires that the population variances be equal. If the two variances are equal, their ratio
will be one. If the confidence interval for the ratio of two population variances includes 1, we
conclude that the two population variances may, in fact, be equal.
This is a form of inference, and we must rely on some sampling distribution. This time the
distribution of $(s_1^2/\sigma_1^2)/(s_2^2/\sigma_2^2)$ is utilised, provided the following assumptions are met.
Assumptions:
1. $s_1^2$ and $s_2^2$ are computed from independent samples of size n1 and n2.
2. The samples have been drawn from two normally distributed populations.
3. let
If the assumptions are met, $(s_1^2/\sigma_1^2)/(s_2^2/\sigma_2^2)$ follows an F distribution.
F-distribution: If U and V are independent chi-square random variables with r1 and r2
degrees of freedom, respectively, then
$F = \frac{U/r_1}{V/r_2}$
follows an F distribution with r1 numerator degrees
of freedom and r2 denominator degrees of freedom.
This F distribution depends on two degrees-of-freedom values, one corresponding to the
value of n1 − 1 used in computing $s_1^2$, and the other corresponding to the value of n2 − 1 used in
computing $s_2^2$. They are usually referred to as the numerator degrees of freedom and the
denominator degrees of freedom.
A confidence interval for σ1²/σ2² at 100(1 − α)% confidence is constructed as
$\left( \frac{s_1^2/s_2^2}{F_{1-\alpha/2}},\ \frac{s_1^2/s_2^2}{F_{\alpha/2}} \right)$
$F_{\alpha/2}$: the value from the F table to the left of which lies α/2 of the area
$F_{1-\alpha/2}$: the value from the F table to the right of which lies α/2 of the area
The values of F are found at the intersection of the column headed df1 and the row labelled df2. If we
had a more extensive table of the F distribution, finding $F_{\alpha/2}$ would be no trouble. To include every
possible percentile of F, however, would make a very lengthy table. A relationship exists that lets us compute
the lower percentile from the upper one:
$F_{\alpha/2,\,df_1,\,df_2} = \frac{1}{F_{1-\alpha/2,\,df_2,\,df_1}}$
Problem#
The variability in the thickness of oxide layers in
semiconductor wafers is a critical characteristic, where
low variability is desirable. A company is investigating two
different ways to mix gases so as to reduce the variability
of the oxide thickness. We produce 16 wafers with each
gas mixture and our results indicate that the standard
deviation is s1 = 1.96Å and s2 = 2.13Å for the two
mixtures. What is the 95% confidence interval for the ratio
between the two variances?
Given information:
Sample size from population 1: n1 = 16;
Sample standard deviation for the sample from population 1: s1 = 1.96;
Sample size from population 2: n2 = 16;
Sample standard deviation for the sample from population 2: s2 = 2.13.
Since we are looking for a 95% confidence interval, we need two
F values, each with 15 numerator and 15 denominator degrees of freedom: $F_{0.025}$ and $F_{0.975}$.
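A sketch of how the interval would then be assembled is given below. The slides do not show the F values, so the upper 2.5% point of F(15, 15) used here (2.862) is an assumption taken from a standard F table. The function name `variance_ratio_interval` is likewise hypothetical. The lower percentile is obtained from the reciprocal relationship between lower and upper F percentiles, with the degrees of freedom swapped.

```python
F_0975_DF15_15 = 2.862  # assumed table value: upper 2.5% point of F(15, 15)

def variance_ratio_interval(s1, s2, f_upper_df1_df2, f_upper_df2_df1):
    """Interval for sigma1^2 / sigma2^2 from two independent normal samples.

    f_upper_df1_df2 is F_{1-alpha/2} with (df1, df2) degrees of freedom;
    f_upper_df2_df1 is F_{1-alpha/2} with the degrees of freedom swapped,
    whose reciprocal is the lower percentile F_{alpha/2, df1, df2}.
    """
    ratio = s1 ** 2 / s2 ** 2
    lower = ratio / f_upper_df1_df2
    upper = ratio * f_upper_df2_df1   # same as ratio / F_{alpha/2, df1, df2}
    return ratio, lower, upper

# Both samples have 16 wafers, so both F values use (15, 15) degrees of freedom.
ratio, lower, upper = variance_ratio_interval(1.96, 2.13,
                                              F_0975_DF15_15, F_0975_DF15_15)
print(round(ratio, 4), round(lower, 3), round(upper, 3))
```

Under the assumed table value the interval contains 1, so these data would give no evidence that the two gas mixtures differ in variability.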
M.Ed Tcs 2 seminar ppt npc to submit
BINCYKMATHEW
 
Statistical inference with Python
Statistical inference with PythonStatistical inference with Python
Statistical inference with Python
Johnson Ubah
 

Inferential Statistics-Part-I mtech.pptx

  • 1. Inferential Statistics Why do you need inferential statistics? It lets us make an inference about the population from a sample. It allows us to use the information learned from descriptive statistics. It extends beyond the immediate data. It is used to infer from sample data what the population might think. It is used to make judgements about the probability that an observed difference between groups is a dependable one or one that might have happened by chance in the study.
  • 2. Inferential Statistics Statistical inference is the procedure by which we reach a conclusion about a population on the basis of the information contained in a sample drawn from that population. Statistical inference has two general areas: estimation and hypothesis testing. The estimation process uses sample data to calculate a statistic that serves as an approximation of the corresponding parameter of the population from which the sample was drawn. Types of estimate: Point estimate: A point estimate is a single numerical value used to estimate the corresponding population parameter. Interval estimate: An interval estimate consists of two numerical values defining a range of values that, with a specified degree of confidence, most likely includes the parameter being estimated.
  • 3. Inferential Statistics A single computed value is referred to as an estimate. The rule that tells us how to compute this value, or estimate, is referred to as an estimator. Estimators are usually presented as formulas. For example, $\bar{x} = \sum x_i / n$ is an estimator of the population mean, µ. The single numerical value that results from evaluating this formula is called an estimate of the parameter µ. In many cases, a parameter may be estimated by more than one estimator. One desirable property of a good estimator is unbiasedness. An estimator, say T, of the parameter θ is said to be an unbiased estimator of θ if E(T) = θ.
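The definition E(T) = θ can be checked by brute force for the sample mean. The sketch below is only an illustration: the five-value population is hypothetical, and it enumerates every equally likely sample of size 2 drawn with replacement, then averages the resulting estimates.

```python
from itertools import product
from statistics import mean

# Hypothetical population of five values (illustrative only, not from the slides).
population = [2, 4, 6, 8, 10]
mu = mean(population)  # the parameter being estimated

# Every possible sample of size 2 drawn with replacement; all are equally likely.
samples = list(product(population, repeat=2))

# The estimator is the rule "take the sample mean"; each result is an estimate.
estimates = [mean(sample) for sample in samples]

# E(x-bar): the average of the estimator over all possible samples.
expected_value = mean(estimates)

print("population mean:", mu)
print("expected value of the sample mean:", expected_value)
```

Individual estimates range from 2 to 10, yet their average over all samples equals the population mean, which is what unbiasedness asserts.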
  • 4. Inferential Statistics The sampled population is the population from which one actually draws a sample. The target population is the population about which one wishes to make an inference. Only when the target population and the sampled population are the same is it possible to use statistical inference procedures to reach conclusions about the target population. If the sampled population and the target population are different, the researcher can reach conclusions about the target population only on the basis of nonstatistical considerations, such as extrapolating findings to the target population.
  • 5. Inferential Statistics CONFIDENCE INTERVAL FOR A POPULATION MEAN Suppose a researcher wishes to estimate the mean of some normally distributed population. Draw a random sample of size n from the normally distributed population and compute $\bar{x}$, which is used as a point estimate of µ. Because random sampling inherently involves chance, $\bar{x}$ cannot be expected to be equal to µ. It would be much more meaningful, therefore, to estimate µ by an interval that somehow communicates information regarding the probable magnitude of µ.
  • 6. Inferential Statistics CONFIDENCE INTERVAL FOR A POPULATION MEAN If sampling is from a normally distributed population, the sampling distribution of the sample mean will be normally distributed with a mean equal to the population mean µ and a variance equal to $\sigma^2/n$. From our knowledge of normal distributions, approximately 95% of the possible values of $\bar{x}$ constituting the distribution are within two standard deviations of the mean. That is, the interval between the two points that lie two standard deviations from the mean, $\mu \pm 2\sigma_{\bar{x}}$ with $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, contains approximately 95% of the possible values of $\bar{x}$.
  • 7. Inferential Statistics Example: Suppose a researcher, interested in obtaining an estimate of the average level of some enzyme in a certain human population, takes a sample of 10 individuals, determines the level of the enzyme in each, and computes a sample mean $\bar{x}$. Suppose further it is known that the variable of interest is approximately normally distributed with a variance of 45. We wish to estimate µ.
  • 8. Inferential Statistics An approximate 95% confidence interval for µ is given by $\bar{x} \pm 2\sigma_{\bar{x}} = \bar{x} \pm 2\sqrt{45/10} = \bar{x} \pm 2(2.1213)$. Interval Estimate Components: The interval estimate contains in its centre the point estimate of µ. The 2 we recognize as a value from the standard normal distribution that tells us within how many standard errors lie approximately 95% of the possible values of $\bar{x}$. This value of z is referred to as the reliability coefficient. The last component, $\sigma_{\bar{x}}$, is the standard error, or standard deviation of the sampling distribution of $\bar{x}$. In general, then, an interval estimate may be expressed as follows: estimator ± (reliability coefficient) × (standard error). Sampling from a normal distribution with known variance, an interval estimate for µ is $\bar{x} \pm z_{1-\alpha/2}\,\sigma_{\bar{x}}$, where $z_{1-\alpha/2}$ is the value of z to the left of which lies $1-\alpha/2$ and to the right of which lies $\alpha/2$ of the area under its curve.
  • 9. Inferential Statistics Interpreting Confidence Intervals: How do we interpret the interval given by the expression $\bar{x} \pm 2\sigma_{\bar{x}}$? In the present example the reliability coefficient is equal to 2. We say that in repeated sampling approximately 95% of the intervals constructed by the above expression will include the population mean. This interpretation is based on the probability of occurrence of different values of $\bar{x}$. We may generalize this interpretation by designating the total area under the curve of $\bar{x}$ that is outside the interval $\mu \pm 2\sigma_{\bar{x}}$ as α and the area within the interval as 1 - α.
  • 10. Inferential Statistics Probabilistic Interpretation: In repeated sampling from a normally distributed population with a known standard deviation, 100(1 - α) percent of all intervals of the form $\bar{x} \pm z_{1-\alpha/2}\,\sigma_{\bar{x}}$ will in the long run include the population mean µ. The quantity (1 - α), in this case 0.95, is called the confidence coefficient (or confidence level), and the interval $\bar{x} \pm z_{1-\alpha/2}\,\sigma_{\bar{x}}$ is called a confidence interval for µ. When (1 - α) = 0.95, the interval is called the 95% confidence interval for µ. Practical Interpretation: When sampling is from a normally distributed population with known standard deviation, we are 100(1 - α) percent confident that the single computed interval, $\bar{x} \pm z_{1-\alpha/2}\,\sigma_{\bar{x}}$, contains the population mean. Precision: The quantity obtained by multiplying the reliability factor by the standard error of the mean is called the precision of the estimate. This quantity is also called the margin of error.
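The probabilistic interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the original slides: the population mean, standard deviation, sample size, number of trials and random seed below are all hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist

# Hypothetical population and design (illustrative values only).
MU, SIGMA, N, TRIALS = 50.0, 10.0, 10, 5000
ALPHA = 0.05

z = NormalDist().inv_cdf(1 - ALPHA / 2)  # reliability coefficient, about 1.96
standard_error = SIGMA / sqrt(N)         # sigma of the sampling distribution of x-bar
margin = z * standard_error              # precision, also called margin of error

rng = random.Random(0)                   # fixed seed so the run is repeatable
covered = 0
for _ in range(TRIALS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    # Count the intervals x-bar +/- margin that contain the true mean.
    if xbar - margin <= MU <= xbar + margin:
        covered += 1

coverage = covered / TRIALS
print(f"share of intervals containing mu: {coverage:.3f}")
```

The printed share should sit close to 0.95. Any single interval either contains µ or does not; the 95% describes the long-run behaviour of the procedure.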
  • 11. Inferential Statistics Sampling from Nonnormal Populations: As noted, it will not always be possible or prudent to assume that the population of interest is normally distributed. Thanks to the central limit theorem, this will not deter us if we are able to select a large enough sample. We have learned that for large samples, the sampling distribution of $\bar{x}$ is approximately normally distributed regardless of how the parent population is distributed. Example: Punctuality of patients in keeping appointments is of interest to a research team. In a study of patient flow through the offices of general practitioners, it was found that a sample of 35 patients was 17.2 minutes late for appointments, on average. Previous research had shown the standard deviation to be about 8 minutes. The population distribution was felt to be nonnormal. What is the 90% confidence interval for µ, the true mean amount of time late for appointments?
  • 12. Inferential Statistics Since the sample size is fairly large (greater than 30), and since the population standard deviation is known, we draw on the central limit theorem and assume the sampling distribution of $\bar{x}$ to be approximately normally distributed. From the normal distribution table, we find the reliability coefficient corresponding to a confidence coefficient of 0.90 to be about 1.645. The standard error is $\sigma_{\bar{x}} = 8/\sqrt{35} = 1.3522$. Therefore, the 90 percent confidence interval for µ is $17.2 \pm 1.645(1.3522) = 17.2 \pm 2.2$, that is, from 15.0 to 19.4. Frequently, when the sample is large enough for the application of the central limit theorem, the population variance is unknown. In that case, we use the sample variance as a replacement for the unknown population variance in the formula for constructing a confidence interval for the population mean.
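The arithmetic of the appointment example can be reproduced in a few lines. This is a minimal sketch: the function name `z_interval` is made up for illustration, and the reliability coefficient comes from Python's `statistics.NormalDist` rather than a printed table, so it is 1.6449 instead of the rounded 1.645.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Return (lower, upper) for a mean when sigma is known or n is large."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # reliability coefficient
    margin = z * sigma / sqrt(n)             # reliability coefficient x standard error
    return xbar - margin, xbar + margin

# Values from the appointment example: n = 35, x-bar = 17.2, sigma = 8.
lower, upper = z_interval(17.2, 8, 35, 0.90)
print(f"90% CI: {lower:.1f} to {upper:.1f} minutes")
```

Raising `confidence` to 0.95 or 0.99 widens the interval, because the reliability coefficient grows while the standard error stays the same.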
  • 13. t-distribution When one does not have knowledge of the population mean and variance, one cannot use the statistic $z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$ to construct confidence intervals for the mean. Although this statistic is normally distributed when the population is normally distributed, and is at least approximately normally distributed when n is large regardless of the functional form of the population, we cannot make use of this fact because σ is unknown.
  • 14. t-distribution The most logical solution is to use the sample standard deviation, s, as an approximation of σ. When n ≥ 30 this is justifiable, as is the use of normal distribution theory to construct a confidence interval for µ. When we have a small sample, an alternative procedure for constructing a confidence interval is to use Student's t distribution, or simply the t distribution. William S. Gosset was a statistician employed by the Guinness brewing company, which had stipulated that he could not publish under his own name. He wrote under the pseudonym "Student", hence Student's t distribution.
  • 15. t-distribution Properties of the t distribution: 1. It has a mean of 0. 2. It is symmetrical about the mean. 3. In general, it has a variance greater than 1, but the variance approaches 1 as the sample size becomes large. For degrees of freedom (df) > 2, the variance of the t distribution is df/(df - 2); since here df = n - 1, for n > 3 we may write the variance of the t distribution as (n - 1)/(n - 3). 4. The variable t ranges from -∞ to +∞. 5. Compared to the normal distribution, the t distribution is less peaked in the centre and has higher tails. 6. The t distribution approaches the normal distribution as n - 1 approaches infinity.
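Property 3 is easy to tabulate. The short sketch below (the function name `t_variance` is a hypothetical helper) evaluates (n - 1)/(n - 3) for a few sample sizes to show how quickly the variance falls toward 1.

```python
def t_variance(n):
    """Variance of the t distribution with df = n - 1, defined only for n > 3."""
    if n <= 3:
        raise ValueError("df/(df - 2) requires df > 2, that is, n > 3")
    df = n - 1
    return df / (df - 2)

for n in (5, 15, 30, 100, 1000):
    print(f"n = {n:4d}  df = {n - 1:4d}  variance = {t_variance(n):.4f}")
```

For n = 5 the variance is 2, twice that of the standard normal; by n = 1000 it is within about 0.2% of 1, consistent with property 6.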
  • 16. t-distribution Confidence interval using t: The procedure is the same as before; only the source of the reliability coefficient is different. When sampling is from a normal distribution whose standard deviation, σ, is unknown, the 100(1 - α) percent confidence interval for the population mean µ is given by $\bar{x} \pm t_{1-\alpha/2}\, s/\sqrt{n}$, with n - 1 degrees of freedom. Applicability: the interval is strictly valid only when the sample is drawn from a normal distribution.
  • 17. t-distribution: Applicability When to use the t distribution: The t distribution can be used with any statistic whose sampling distribution is bell-shaped (approximately normal). This is the case if any of the following conditions applies: (1) The population distribution is normal. (2) The population distribution is symmetric, unimodal, and without outliers, and the sample size is at least 30. (3) The population distribution is moderately skewed, unimodal, and without outliers, and the sample size is at least 40. (4) The sample size is greater than 40 and there are no outliers. The t distribution should not be used with small samples from populations that are not approximately normal.
  • 18. t-distribution: Example A researcher conducted a study to evaluate the effect of on-the-job body mechanics instruction on the work performance of newly employed young workers. He used two randomly selected groups of subjects, an experimental group and a control group. The experimental group received one hour of back-school training provided by an occupational therapist. The control group did not receive this training. A criterion-referenced body mechanics evaluation checklist was used to evaluate each worker's lifting, lowering, pulling and transferring of objects in the work environment. A correctly performed task received a score of 1. The 15 control subjects made a mean score of 11.53 on the evaluation with a standard deviation of 3.681. We assume that these 15 controls behave as a random sample from a population of similar subjects. We wish to use these sample data to estimate the mean score for the population.
  • 19. Standard normal and t-distribution Tables
  • 20. t-distribution: Example Ans: We may use the sample mean, 11.53, as a point estimate of the population mean, but since the population standard deviation is unknown, we must assume the population of values to be at least approximately normally distributed before constructing a confidence interval for µ. Let us assume that such an assumption is reasonable and that a 95% confidence interval is desired. We have our estimator, $\bar{x}$, and our standard error is $s/\sqrt{n} = 3.681/\sqrt{15} = 0.9504$. We now need to find the reliability coefficient, the value of t associated with a confidence coefficient of 0.95 and n - 1 = 14 degrees of freedom. Since a 95% confidence interval leaves 0.05 of the area under the curve of t to be equally divided between the two tails, we need the value of t to the right of which lies 0.025 of the area. From the t distribution table at 14 degrees of freedom, $t_{0.975} = 2.1448$. Hence the 95% confidence interval is 11.53 ± 2.1448 × (0.9504) = 11.53 ± 2.04 = [9.49, 13.57]. This interval may be interpreted from both the probabilistic and practical points of view. We are 95% confident that the true population mean µ is somewhere between 9.49 and 13.57 because, in repeated sampling, 95% of intervals constructed in like manner will include µ.
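The same calculation as a short sketch. The function name `t_interval` is hypothetical, and the reliability coefficient 2.1448 is passed in as read from the t table, because the Python standard library has no t quantile function.

```python
from math import sqrt

def t_interval(xbar, s, n, t_value):
    """Return (lower, upper) using a t reliability coefficient taken from a table."""
    standard_error = s / sqrt(n)
    margin = t_value * standard_error
    return xbar - margin, xbar + margin

# Values from the example: n = 15, x-bar = 11.53, s = 3.681,
# and t(0.975, 14 df) = 2.1448 read from the t table.
lower, upper = t_interval(11.53, 3.681, 15, 2.1448)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Substituting the z value 1.96 for 2.1448 would give a narrower interval, which is why the t coefficient matters for a sample of only 15.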
  • 21. Deciding between z and t When we construct a confidence interval for a population mean, we must decide whether to use a value of z or a value of t as the reliability factor. To make an appropriate choice we must consider sample size, whether the sampled population is normally distributed, and whether the population variance is known.
  • 22. Confidence interval for the difference between two population means A confidence interval for the difference between population means provides information that is helpful in deciding whether or not it is likely that the two population means are equal. When the constructed interval does not include zero, we say that the interval provides evidence that the two population means are not equal. When the interval includes zero, we say that the population means may be equal. Sampling from Normal Populations with Known Variances: When the population variances are known, the 100(1 - α) percent confidence interval for µ1 - µ2 is given by $(\bar{x}_1 - \bar{x}_2) \pm z_{1-\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. Sampling from Non-normal Populations: The construction of a confidence interval for the difference between two population means when sampling is from non-normal populations proceeds in the same manner as sampling from normal populations if the sample sizes n1 and n2 are large. Again, this is a result of the central limit theorem. If the population variances are unknown, we use the sample variances to estimate them.
  • 23. Confidence interval for the difference between two population means Example: Despite common knowledge of the adverse effects of doing so, many women continue to smoke while pregnant. A researcher examined the effectiveness of a smoking cessation program for pregnant women. The mean number of cigarettes smoked daily at the close of the program by the 328 women who completed the program was 4.3 with a standard deviation of 5.22. Among 64 women who did not complete the program, the mean number of cigarettes smoked per day at the close of the program was 13 with a standard deviation of 8.97. We wish to construct a 99 percent confidence interval for the difference between the means of the populations from which the samples may be presumed to have been selected.
  • 24. Confidence interval for the difference between two population means No information is given regarding the shape of the distribution of cigarettes smoked per day. Since our sample sizes are large, however, the central limit theorem assures us that the sampling distribution of the difference between sample means will be approximately normally distributed even if the distribution of the variable in the populations is not normally distributed. We may use this fact as justification for using the z statistic as the reliability factor in the construction of our confidence interval. Also, since the population standard deviations are not given, we will use the sample standard deviations to estimate them. The point estimate for the difference between population means is the difference between sample means, 4.3 - 13.0 = -8.7. From the normal distribution table, we find the reliability factor to be 2.58. The estimated standard error is $\sqrt{5.22^2/328 + 8.97^2/64} = 1.1577$. Our 99 percent confidence interval for the difference between population means is -8.7 ± 2.58(1.1577) = [-11.7, -5.7]. We are 99 percent confident that the mean number of cigarettes smoked per day for women who complete the program is between 5.7 and 11.7 lower than the mean for women who do not complete the program.
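A minimal sketch of the smoking-cessation calculation follows. The function name is hypothetical, and the reliability factor is computed with `statistics.NormalDist` (about 2.5758) rather than the rounded table value 2.58, so the limits agree with the slide only after rounding to one decimal place.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_interval(x1, s1, n1, x2, s2, n2, confidence):
    """Large-sample CI for mu1 - mu2, with sample SDs standing in for sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(s1**2 / n1 + s2**2 / n2)  # estimated standard error of the difference
    diff = x1 - x2                      # point estimate of mu1 - mu2
    return diff - z * se, diff + z * se, se

# Completers: n = 328, mean 4.3, SD 5.22. Non-completers: n = 64, mean 13.0, SD 8.97.
lower, upper, se = two_sample_z_interval(4.3, 5.22, 328, 13.0, 8.97, 64, 0.99)
print(f"standard error = {se:.4f}")
print(f"99% CI for the difference: {lower:.1f} to {upper:.1f}")
```

Because the whole interval lies below zero, it supports the conclusion that the two population means differ.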
  • 25. Confidence interval for the difference between two population means The t distribution and the difference between means: When the population variances are unknown and we wish to estimate the difference between two population means with a confidence interval, we can use the t distribution as the source of the reliability factor if certain assumptions are met. We must know, or be willing to assume, that the two sampled populations are normally distributed. Regarding the unknown population variances, two situations may occur: Situation I: the population variances are equal. Situation II: the population variances are not equal.
  • 26. Situation–I: population variances are equal If the assumption of equal population variances is justified, the two sample variances may be considered estimates of the same quantity, the common variance. We therefore obtain a pooled estimate of the common variance. The pooled variance is the weighted average of the sample variances, each weighted by its degrees of freedom: $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$. The 100(1 - α) percent confidence interval for µ1 - µ2 is $(\bar{x}_1 - \bar{x}_2) \pm t_{1-\alpha/2}\sqrt{s_p^2/n_1 + s_p^2/n_2}$. The number of degrees of freedom used in determining the value of t is n1 + n2 - 2.
  • 27. Example: population variances are equal The purpose of a research study was to determine the effect of long-term exercise intervention on corporate executives enrolled in a supervised fitness program. Data were collected on 13 subjects (the exercise group) who voluntarily entered a supervised exercise program and remained active for an average of 13 years and 17 subjects (the sedentary group) who elected not to join the fitness program. Among the data collected on the subjects was the maximum number of sit-ups completed in 30 seconds. The exercise group had a mean and standard deviation for this variable of 21.0 and 4.9 respectively. The mean and standard deviation for the sedentary group were 12.1 and 5.6 respectively. We assume that the two populations of overall muscle condition measures are approximately normally distributed and that the two population variances are equal. We wish to construct a 95% confidence interval for the difference between the means of the populations represented by these two samples.
  • 28. Example: population variances are equal The pooled estimate of the common population variance is $s_p^2 = \frac{(13 - 1)(4.9)^2 + (17 - 1)(5.6)^2}{13 + 17 - 2} = 28.21$. From the t distribution table with 13 + 17 - 2 = 28 degrees of freedom and the desired 0.95 confidence level, the reliability factor is 2.0484. The confidence interval is $(21.0 - 12.1) \pm 2.0484\sqrt{28.21/13 + 28.21/17} = 8.9 \pm 4.0085 = (4.9, 12.9)$. We are 95% confident that the difference between the population means is somewhere between 4.9 and 12.9. We can say this because we know that if we were to repeat the study many, many times and compute confidence intervals in the same way, about 95% of the intervals would include the difference between the population means. Since the interval does not include zero, we conclude that the population means are not equal. In other words, the difference between the two population means is estimated to be 8.9, and we are 95% confident that the true value lies between 4.9 and 12.9.
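The pooled-variance interval for the sit-up data can be sketched as follows. The function name is hypothetical, and the reliability factor 2.0484 (28 degrees of freedom) is taken from the t table as in the example.

```python
from math import sqrt

def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_value):
    """CI for mu1 - mu2 assuming equal population variances."""
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = sqrt(sp2 / n1 + sp2 / n2)
    diff = x1 - x2
    margin = t_value * se
    return sp2, diff - margin, diff + margin

# Exercise group: n = 13, mean 21.0, SD 4.9. Sedentary group: n = 17, mean 12.1, SD 5.6.
sp2, lower, upper = pooled_t_interval(21.0, 4.9, 13, 12.1, 5.6, 17, 2.0484)
print(f"pooled variance = {sp2:.2f}")
print(f"95% CI for the difference: {lower:.1f} to {upper:.1f}")
```

The pooled variance, 28.21, falls between the two sample variances (24.01 and 31.36) and sits closer to the one from the larger sample.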
  • 29. Confidence interval for the difference between two population means Population variances are not equal: When one is unable to conclude that the variances of two populations of interest are equal, even though the two populations may be assumed to be normally distributed, it is not proper to use the t distribution as in Situation I. Solutions have been proposed by many researchers. The problem revolves around the fact that the quantity $\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ does not follow a t distribution with n1 + n2 - 2 degrees of freedom when the population variances are not equal.
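  • 30. Confidence interval for the difference between two population means Population variances are not equal: The solution proposed by Cochran consists of computing the reliability factor $t'_{1-\alpha/2}$ by the following formula: $t'_{1-\alpha/2} = \frac{w_1 t_1 + w_2 t_2}{w_1 + w_2}$, where $w_1 = s_1^2/n_1$, $w_2 = s_2^2/n_2$, $t_1 = t_{1-\alpha/2}$ for n1 - 1 degrees of freedom, and $t_2 = t_{1-\alpha/2}$ for n2 - 1 degrees of freedom. An approximate 100(1 - α) percent confidence interval for µ1 - µ2 is given by $(\bar{x}_1 - \bar{x}_2) \pm t'_{1-\alpha/2}\sqrt{s_1^2/n_1 + s_2^2/n_2}$.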
  • 31. Example: population variances are not equal The purpose of a research study was to determine the effect of long-term exercise intervention on corporate executives enrolled in a supervised fitness program. Data were collected on 13 subjects (the exercise group) who voluntarily entered a supervised exercise program and remained active for an average of 13 years and 17 subjects (the sedentary group) who elected not to join the fitness program. Among the data collected on the subjects was the maximum number of sit-ups completed in 30 seconds. The exercise group had a mean and standard deviation for this variable of 21.0 and 4.9 respectively. The mean and standard deviation for the sedentary group were 12.1 and 5.6 respectively. We assume that the two populations of overall muscle condition measures are approximately normally distributed and that the two population variances are not equal. We wish to construct a 95% confidence interval for the difference between the means of the populations represented by these two samples.
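  • 32. Example: population variances are not equal We will use Cochran's reliability factor t'. From the t distribution table with 12 degrees of freedom and $1 - \alpha/2 = 0.975$, $t_1 = 2.1788$. Similarly, with 16 degrees of freedom and $1 - \alpha/2 = 0.975$, $t_2 = 2.1199$. We now compute $t' = \frac{(4.9^2/13)(2.1788) + (5.6^2/17)(2.1199)}{(4.9^2/13) + (5.6^2/17)} = \frac{7.9347}{3.6916} = 2.1494$. We now construct the 95 percent confidence interval for the difference between the two population means: $(21.0 - 12.1) \pm 2.1494\sqrt{\frac{4.9^2}{13} + \frac{5.6^2}{17}} = 8.9 \pm 2.1494(1.9214) = 8.9 \pm 4.1297$, that is, from about 4.8 to 13.0. Since the interval does not include zero, we conclude that the population means are not equal. The difference between the two population means is estimated to be 8.9, and we are 95% confident that the true value lies between 4.8 and 13.0.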
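A minimal Python sketch of Cochran's approach for this example follows. The function name `cochran_ci` is hypothetical, and the two t values (2.1788 for 12 degrees of freedom, 2.1199 for 16 degrees of freedom) are standard table values supplied as inputs.

```python
import math

def cochran_ci(mean1, sd1, n1, mean2, sd2, n2, t1, t2):
    """Approximate CI for mu1 - mu2 when population variances are unequal."""
    # Weights are the estimated variances of each sample mean.
    w1 = sd1**2 / n1
    w2 = sd2**2 / n2
    # Cochran's reliability factor: weighted average of the two t values.
    t_prime = (w1 * t1 + w2 * t2) / (w1 + w2)
    se = math.sqrt(w1 + w2)
    diff = mean1 - mean2
    return t_prime, diff - t_prime * se, diff + t_prime * se

t_prime, lower, upper = cochran_ci(21.0, 4.9, 13, 12.1, 5.6, 17, 2.1788, 2.1199)
print(f"t' = {t_prime:.4f}, interval = ({lower:.2f}, {upper:.2f})")
# t' = 2.1494, interval = (4.77, 13.03)
```

The point estimate 8.9 sits at the centre of the resulting interval, which is a quick sanity check on any hand calculation of the limits.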
  • 33. Determination of sample size for estimating mean When planning any survey or experiment: how large a sample should be taken? A sample larger than needed wastes resources; a sample smaller than needed leads to results of no practical use. Objective: the interval estimate should have 1) a narrow interval and 2) high reliability. The total width of the interval is twice the magnitude of the quantity (reliability coefficient) × (standard error). Increasing reliability means a larger reliability coefficient, which widens the interval.
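  • 34. Determination of sample size for estimating mean With the reliability coefficient fixed, the interval can be narrowed only by reducing the standard error. Standard error = $\sigma/\sqrt{n}$; since $\sigma$ is fixed, the only way is to increase n, that is, to take a larger sample. How large? That depends on the desired degree of reliability and the desired interval width. With d denoting half the desired interval width and z the reliability coefficient: for sampling with replacement, or from an infinite or sufficiently large population, $n = \frac{z^2\sigma^2}{d^2}$; for sampling without replacement from a small finite population of size N, $n = \frac{N z^2 \sigma^2}{d^2(N-1) + z^2\sigma^2}$.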
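The two cases can be sketched in Python as follows. The formulas used are the standard ones, $n = z^2\sigma^2/d^2$ and its finite-population counterpart $n = N z^2\sigma^2 / (d^2(N-1) + z^2\sigma^2)$, taken here as an assumption about what the slide intended, with d the half-width of the desired interval. The function names and all input numbers are hypothetical and chosen only for illustration.

```python
import math

def sample_size_infinite(z, sigma, d):
    """Sample size for an infinite (or very large) population."""
    # Solve d = z * sigma / sqrt(n) for n and round up to a whole observation.
    return math.ceil((z * sigma / d) ** 2)

def sample_size_finite(z, sigma, d, N):
    """Sample size when sampling without replacement from N units."""
    # Same requirement, with the finite population correction applied.
    numerator = N * z**2 * sigma**2
    denominator = d**2 * (N - 1) + z**2 * sigma**2
    return math.ceil(numerator / denominator)

# Hypothetical inputs: 95% reliability (z = 1.96), sigma = 20, half-width d = 5.
print(sample_size_infinite(1.96, 20, 5))       # 62
print(sample_size_finite(1.96, 20, 5, 1000))   # 58
```

Rounding up rather than to the nearest integer keeps the achieved interval no wider than the one requested.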
  • 35. Determination of sample size for estimating mean The sample size formulas require $\sigma^2$, but the population variance is usually unknown. The most frequently used sources of estimates of $\sigma^2$ are: 1) A pilot or preliminary sample may be drawn from the population, and the computed sample variance ($s^2$) may be used to estimate $\sigma^2$. Observations used in the pilot sample may be counted as part of the final sample: n (the computed sample size) - n1 (the pilot sample size) = n2 (the number of observations needed to satisfy the total sample size requirement). 2) Estimates of $\sigma^2$ may be available from previous or similar studies.
  • 36. Inference about population variance Confidence interval for the variance of a normally distributed population For a normally distributed population, the sample variance $s^2$ is used as an estimator of the population variance $\sigma^2$. To judge the quality of this estimator, check whether the sample variance is an unbiased estimator of the population variance. To be unbiased, the average value of the sample variance over all possible samples must equal the population variance: $E(s^2) = \sigma^2$.
  • 37. Inference about population variance Draw all possible samples of size two from the population consisting of the values 6, 8, 10, 12 and 14, and compute the sample variance $s^2 = \sum (x_i - \bar{x})^2/(n-1)$ for each sample. Sampling with replacement gives the 25 sample variances shown in the table:
                        Second draw
      First draw      6     8    10    12    14
           6          0     2     8    18    32
           8          2     0     2     8    18
          10          8     2     0     2     8
          12         18     8     2     0     2
          14         32    18     8     2     0
    The expected value of the sample variance is $E(s^2)$ = (0 + 2 + ... + 2 + 0)/25 = 200/25 = 8. The population variance is $\sigma^2 = \sum (x_i - \mu)^2/N = 40/5 = 8$. Hence $E(s^2) = \sigma^2$.
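The enumeration can be reproduced with Python's standard library. This is a small sketch: it lists every ordered sample of size two drawn with replacement and, for comparison, every unordered sample drawn without replacement (the case taken up on the next slide), then averages the sample variances.

```python
from itertools import combinations, product
from statistics import mean, pvariance, variance

population = [6, 8, 10, 12, 14]

# All 25 ordered samples of size 2, sampling with replacement.
with_repl = [variance(sample) for sample in product(population, repeat=2)]
# All 10 unordered samples of size 2, sampling without replacement.
without_repl = [variance(sample) for sample in combinations(population, 2)]

sigma2 = pvariance(population)      # divides by N:     40 / 5 = 8
big_s2 = variance(population)       # divides by N - 1: 40 / 4 = 10

print(mean(with_repl), sigma2)      # both equal 8
print(mean(without_repl), big_s2)   # both equal 10
```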
  • 38. Inference about population variance Sampling without replacement: there are 10 possible samples, and the expected value of $s^2$ is (2 + 8 + ... + 8 + 2)/10 = 100/10 = 10, which equals $S^2 = \sum (x_i - \mu)^2/(N-1) = 40/4 = 10$. In general, $E(s^2) = \sigma^2$ when sampling is with replacement, and $E(s^2) = S^2$ when sampling is without replacement. These results justify the use of n - 1 for computing the sample variance. Interval estimation of a population variance Success depends on our ability to find an appropriate sampling distribution. Confidence intervals for $\sigma^2$ are usually based on the sampling distribution of $(n-1)s^2/\sigma^2$. If samples of size n are drawn from a normally distributed population, this quantity has a distribution known as the chi-square distribution with n - 1 degrees of freedom. To obtain a $100(1-\alpha)$% confidence interval for $\sigma^2$, we select two values of $\chi^2$ from the table in such a way that $\alpha/2$ of the area lies to the left of the smaller value, $\chi^2_{\alpha/2}$, and $\alpha/2$ lies to the right of the larger value, $\chi^2_{1-\alpha/2}$.
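  • 39. Inference about population variance The $100(1-\alpha)$% confidence interval for $\sigma^2$ is $\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{\alpha/2}}$. The confidence interval for $\sigma$, the population standard deviation, is $\sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}}$. The method is widely used but has some drawbacks. Normality of the population is crucial. The estimator is not at the centre of the confidence interval because the chi-square distribution is not symmetric.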
  • 40. Inference about population variance A random sample of 20 steel ball bearings with a nominal diameter of 2 mm is taken and the diameters are measured precisely. The measurements, in mm, are as follows: 2.02 1.94 2.09 1.95 1.98 2.00 2.03 2.04 2.08 2.07 1.99 1.96 1.99 1.95 1.99 1.99 2.03 2.05 2.01 2.03 Assuming that the diameters are normally distributed with unknown mean $\mu$ and unknown variance $\sigma^2$: a) Find a two-sided 95% confidence interval for the variance $\sigma^2$. b) Find a two-sided 95% confidence interval for the standard deviation $\sigma$.
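  • 41. Inference about population variance From the data we calculate $\sum x_i = 40.19$ and $\sum x_i^2 = 80.7977$, with n = 20. Hence, $(n-1)s^2 = 80.7977 - \frac{40.19^2}{20} = 0.035895$. There are 19 degrees of freedom, and the critical values of the chi-square distribution are $\chi^2_{0.025} = 8.907$ and $\chi^2_{0.975} = 32.852$. The confidence interval for $\sigma^2$ is $\frac{0.035895}{32.852} < \sigma^2 < \frac{0.035895}{8.907}$, that is, $0.00109 < \sigma^2 < 0.00403$. The confidence interval for $\sigma$ is $0.0331 < \sigma < 0.0635$.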
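The ball-bearing calculation can be sketched in Python with the standard library alone. The chi-square critical values for 19 degrees of freedom (8.907 and 32.852) are standard table values entered by hand, which is an assumption about the table the slides used.

```python
import math
from statistics import variance

diameters = [2.02, 1.94, 2.09, 1.95, 1.98, 2.00, 2.03, 2.04, 2.08, 2.07,
             1.99, 1.96, 1.99, 1.95, 1.99, 1.99, 2.03, 2.05, 2.01, 2.03]

n = len(diameters)
s2 = variance(diameters)          # sample variance with divisor n - 1

# Table values for 19 df: 2.5% of the area lies below 8.907 and above 32.852.
chi2_small, chi2_large = 8.907, 32.852

# Dividing by the larger critical value gives the lower limit, and vice versa.
var_low = (n - 1) * s2 / chi2_large
var_high = (n - 1) * s2 / chi2_small

print(f"{var_low:.5f} < sigma^2 < {var_high:.5f}")   # 0.00109 < sigma^2 < 0.00403
print(f"{math.sqrt(var_low):.4f} < sigma < {math.sqrt(var_high):.4f}")  # 0.0331 < sigma < 0.0635
```

The sample variance (about 0.00189) lies closer to the lower limit than to the upper one, which illustrates the asymmetry of the chi-square interval.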
  • 43. Inference about population variances Confidence interval for the ratio of the variances of two normally distributed populations: One way of comparing two variances is to compute their ratio, $\sigma_1^2/\sigma_2^2$. Using the t distribution to construct a confidence interval for the difference between population means requires that the population variances be equal. If the two variances are equal, the ratio will be one. If the confidence interval for the ratio of two population variances includes 1, we conclude that the two population variances may, in fact, be equal. This is a form of inference, and we must rely on some sampling distribution. This time the distribution of $\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$ is utilised, provided the following assumptions are met. Assumptions: 1. $s_1^2$ and $s_2^2$ are computed from independent samples of size n1 and n2. 2. The samples have been drawn from two normally distributed populations. 3. let If the assumptions are met, $\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$ follows an F distribution.
  • 44. Inference about population variances F-distribution: If U and V are independent chi-square random variables with r1 and r2 degrees of freedom, respectively, then $F = \frac{U/r_1}{V/r_2}$ follows an F distribution with r1 numerator degrees of freedom and r2 denominator degrees of freedom. Here the F distribution depends on two degrees-of-freedom values, one corresponding to the value of n1 - 1 used in computing $s_1^2$, and the other corresponding to the value of n2 - 1 used in computing $s_2^2$. They are usually referred to as the numerator degrees of freedom and the denominator degrees of freedom. A confidence interval for $\sigma_1^2/\sigma_2^2$ at $100(1-\alpha)$% confidence is $\frac{s_1^2/s_2^2}{F_{1-\alpha/2}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2/s_2^2}{F_{\alpha/2}}$, where $F_{\alpha/2}$ is the value from the F table to the left of which lies $\alpha/2$ of the area, and $F_{1-\alpha/2}$ is the value from the F table to the right of which lies $\alpha/2$ of the area.
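  • 45. Inference about population variances The values of $F_{1-\alpha/2}$ are found at the intersection of the column headed df1 and the row labelled df2. If we had a more extensive table of the F distribution, finding $F_{\alpha/2}$ would be no trouble, but including every possible percentile of F would make a very lengthy table. Instead, a relationship exists to compute the lower percentiles from the upper ones: $F_{\alpha, df_1, df_2} = \frac{1}{F_{1-\alpha, df_2, df_1}}$.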
  • 46. Inference about population variances Problem# The variability in the thickness of oxide layers in semiconductor wafers is a critical characteristic, where low variability is desirable. A company is investigating two different ways to mix gases so as to reduce the variability of the oxide thickness. We produce 16 wafers with each gas mixture and our results indicate that the standard deviation is s1 = 1.96Å and s2 = 2.13Å for the two mixtures. What is the 95% confidence interval for the ratio between the two variances?
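One way to set up this calculation is sketched below in Python. The function name `variance_ratio_ci` is hypothetical, and the upper critical value $F_{0.975}$ with (15, 15) degrees of freedom is taken as 2.86 from a standard F table, which is an assumption rather than a figure stated in the problem. The lower critical value is obtained from the reciprocal relationship between F percentiles with the degrees of freedom swapped.

```python
def variance_ratio_ci(s1, s2, f_upper_12, f_upper_21):
    """CI for sigma1^2 / sigma2^2 from two independent normal samples.

    f_upper_12: F value with (n1 - 1, n2 - 1) df leaving alpha/2 to its right.
    f_upper_21: same percentile with the degrees of freedom swapped.
    """
    ratio = s1**2 / s2**2
    # Reciprocal relationship: the lower percentile for (df1, df2) is the
    # inverse of the upper percentile for (df2, df1).
    f_lower_12 = 1.0 / f_upper_21
    return ratio / f_upper_12, ratio / f_lower_12

# Wafer data: s1 = 1.96, s2 = 2.13, n1 = n2 = 16, so both df are 15.
# 2.86 is an assumed table value for F(0.975; 15, 15).
low, high = variance_ratio_ci(1.96, 2.13, 2.86, 2.86)
print(f"{low:.2f} < ratio < {high:.2f}")  # 0.30 < ratio < 2.42
```

Under these assumptions the interval contains 1, so the data would not rule out equal variances for the two gas mixtures.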
  • 47. Inference about population variances Given information: sample size from population 1: n1 = 16; sample standard deviation for the sample from population 1: s1 = 1.96; sample size from population 2: n2 = 16; sample standard deviation for the sample from population 2: s2 = 2.13. Since we are looking for a 95% confidence interval, we need two F values: