Lecture Series on
  Biostatistics




      Inferential Statistics-
           Estimation
                           By
           Dr. Bijaya Bhusan Nanda,
       M. Sc (Gold Medalist) Ph. D. (Stat.)
    Topper Orissa Statistics & Economics Services, 1988
            bijayabnanda@yahoo.com
CONTENTS
 Introduction
 Confidence interval for a population mean
 The t distribution
 Confidence interval for the difference
  between two population means
 Confidence interval for a population
  proportion
 Confidence interval for the difference
  between two population proportions
Introduction
Statistical Inference
 It is the procedure by which we reach a conclusion
  about a population on the basis of information
  contained in the sample drawn from that population.
 Two broad areas of statistical inference
    Estimation
    Hypothesis testing.
 Estimation: the process of calculating a statistic
  from the sample data drawn from a certain
  population. The statistic is used as an
  approximation to the population parameter.
Types of Estimate
 For each parameter, we can have
   Point Estimate
   Interval estimate
 A point estimate is a single numerical value used to
 estimate the corresponding population parameter.
An interval estimate consists of two numerical values
 defining a range of values that, with a specified degree
 of confidence, includes the parameter being estimated.
Choosing an Appropriate Estimator
 A single computed value is an estimate.
 An estimator is usually presented as a formula. For
  example,
      $\bar{x} = \frac{\sum x_i}{n}$
  is the estimator of the population mean µ.
 The single numerical value that results from
  evaluating this formula is called an estimate of the
  parameter µ.
 E(T), the expected value of an estimator T, is obtained
  by taking the average value of T computed from all
  possible samples, i.e. E(T) = µ_T.
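The idea of E(T) as the average of T over all possible samples can be checked on a toy case. The sketch below is only an illustration: the five population values and the sample size of 2 are hypothetical, and samples are taken without replacement.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population and sample size, chosen only for illustration.
population = [2, 4, 6, 8, 10]
n = 2


def mean(values):
    """Exact arithmetic mean, returned as a Fraction."""
    values = list(values)
    return Fraction(sum(values), len(values))


mu = mean(population)

# T = sample mean, evaluated on every possible sample of size n
# (sampling without replacement).
sample_means = [mean(sample) for sample in combinations(population, n)]

# E(T): the average of T over all possible samples.
expected_value = mean(sample_means)

print("population mean:", mu)
print("number of samples:", len(sample_means))
print("E(sample mean):", expected_value)
```

Here the average of the sample means equals the population mean, which is the situation the expression E(T) describes.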
Criteria of a good estimator:
There can be more than one estimator for the parameter.
 For example population mean can be estimated by the
 sample median or sample mean.
But a good estimator should fulfill certain criteria. One such
 criterion of a good estimator is unbiasedness.
An estimator T of the parameter θ is said to be an unbiased
 estimator of θ if E(T)= θ
 The sample mean is an unbiased estimator of the population mean.
 The sample proportion is an unbiased estimator of the population
 proportion.
 The difference between two sample means is an unbiased
 estimator of the difference between the population means.
 The difference between two sample proportions is an
 unbiased estimator of the difference between the population
 proportions.
Sampled Population and Target Population
 Sampled population is the population from which one
  actually draws a sample.
 The target population is the population about which
  one wishes to make inference.
 Statistical inference procedure allows one to make
  inference about sampled population (provided proper
  sampling methods have been employed)
 Only when sampled population and the target
  population are the same, it is possible to reach
  statistical inference about the target population.
Random and Non random Samples
 The strict validity of the statistical procedures
  discussed depends on the assumptions of random
  sample.
 In real world applications it is impossible or
  impractical to use truly random samples.
 If the researchers had to depend on randomly selected
  materials, very little research of this type would be
  conducted.
 Therefore, nonstatistical considerations must play a
  part in the generalization process.
 Researchers may contend that the samples actually used
  are equivalent to simple random samples, since there is
  no reason to believe that the material actually used is not
  representative of the population about which inferences
  are desired.
 In many health research projects, samples of
  convenience, rather than random samples are
  employed.
 Generalizations must then be made on nonstatistical
  considerations.
 The consequences of such generalizations, however, may
  range from misleading to disastrous.
 In some situations it is possible to introduce
  randomization into an experiment even though the available
  subjects are not randomly selected from a well-defined
  population.
 Example: Allocating subjects to treatments in a random
  manner.
Confidence interval for a population mean
 Say $\bar{x}$ is the sample mean from a random sample of size n
  drawn from a normally distributed population.
 $\bar{x}$ is an unbiased estimator of the population mean:
      $E(\bar{x}) = \mu$
 But because of sampling fluctuation $\bar{x}$ can't be expected
  to be equal to µ.
 Therefore, we should have an interval estimate of µ.
 For the interval estimate we must have knowledge of the
  sampling distribution of $\bar{x}$.
 The sampling distribution of $\bar{x}$ for a normal population is as
  follows:
      $\bar{x} \sim N(\mu, \sigma_{\bar{x}})$, i.e. $N(\mu, \sigma/\sqrt{n})$
 Following the normal probability distribution, approximately 95%
  of the possible values of $\bar{x}$ lie in the interval
      $\mu \pm 2\sigma_{\bar{x}}$
 Since we don't know µ, we center the interval on the sample mean
  $\bar{x}$ and have the approximate 95% confidence interval for µ as
      $\bar{x} \pm 2\sigma_{\bar{x}} = \bar{x} \pm 2\sigma/\sqrt{n}$
 If we don't know σ, the approximate 95% confidence interval for µ is
      $\bar{x} \pm 2s/\sqrt{n}$
  where s is the standard deviation based on the sample.
Interval estimate Component
 In general an interval estimate may be expressed as
  follows
  estimator ± (reliability coefficient)× (standard error)
 In particular, when sampling is from a normal
  distribution with known variance, an interval
  estimator for µ may be expressed as
      $\bar{x} \pm z_{(1-\alpha/2)} \sigma_{\bar{x}}$
  where $z_{(1-\alpha/2)}$ is the value of z to the left of which lies
  1-α/2 and to the right of which lies α/2 of the area
  under its curve.
Interpreting Confidence Interval:
 Probabilistic Interpretation: In repeated sampling
  from a normally distributed population with a known
  standard deviation, 100(1-α) percent of all intervals of
  the form $\bar{x} \pm z_{(1-\alpha/2)} \sigma_{\bar{x}}$ will in the long run
  include the population mean µ.
 Practical Interpretation: When sampling is from a
  normally distributed population with known standard
  deviation, we are 100(1-α)% confident that the single
  computed interval, $\bar{x} \pm z_{(1-\alpha/2)} \sigma_{\bar{x}}$, contains the
  population mean µ.

 Precision: The quantity obtained by multiplying the
  reliability factor by the standard error of the mean is
  called the precision of the estimate. This quantity is also
  called the margin of error.
Example
       A physical therapist wished to estimate, with 99 percent
  confidence , the mean maximum strength of a particular
  muscle in a certain group of individuals. He is willing to
  assume that strength scores are approximately normally
  distributed with a variance of 144. A sample of 15 subjects
  who participated in the experiment yielded a mean of 84.3.
Solution:
  The z value corresponding to a confidence coefficient of 0.99
  is found in Table D to be 2.58. This is our reliability
  coefficient. The standard error is
      $\sigma_{\bar{x}} = 12/\sqrt{15} = 3.0984$
  Our 99% confidence interval for µ, then, is
      84.3 ± 2.58(3.0984)
      84.3 ± 8.0
      76.3, 92.3
  We say we are 99% confident that the
  population mean is between 76.3 and 92.3
  since, in repeated sampling, 99% of all
  intervals that could be constructed in the
  manner just described would include the
  population mean.
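The arithmetic of this example can be reproduced in a few lines. This is a minimal sketch, not part of the original lecture: the function name z_interval is hypothetical, and the z value comes from Python's statistics.NormalDist (about 2.576) rather than the rounded Table D value 2.58, so the limits agree with the slide only after rounding.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """Interval: estimator +/- (reliability coefficient) x (standard error)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # reliability coefficient
    standard_error = sigma / sqrt(n)
    margin = z * standard_error              # precision (margin of error)
    return xbar - margin, xbar + margin


# Muscle strength example: variance 144 (so sigma = 12), n = 15, mean 84.3.
low, high = z_interval(xbar=84.3, sigma=12.0, n=15, confidence=0.99)
print(f"99% CI: {low:.1f}, {high:.1f}")
```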
The t distribution
 Estimation of a confidence interval using the standard normal z
  distribution applies when the population variance is known.
  Usually, the population variance is not known.
 This condition presents a problem in constructing confidence
  intervals.
 Although the statistic
      $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
  is normally distributed when the population is normally distributed,
  or approximately normally distributed when n is large
  regardless of the functional form of the population, we
  can't make use of this fact because σ is not known.
 In this case we may use
      $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
  as an estimator of σ. When the sample size is large, say
  greater than 30, s is a good approximation of σ
  and we may use normal distribution theory
  to construct the confidence interval.
 When the sample size is small (≤ 30), we make use of
  Student's t distribution. The quantity
      $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
  follows this distribution.
Properties of the t distribution:
    It has a mean of 0.
    Symmetrical about the mean.
   In general, it has a variance greater than 1, but the
    variance approaches 1 as the sample size becomes
    large. For df >2, the variance for t distribution is
    df /(df-2).
    The variable t ranges from -∞ to + ∞ .
   The t distribution is really a family of distributions,
    since there is a different distribution for each
    value of n-1, the divisor used in computing s². We recall
    that n-1 is referred to as degrees of freedom.
   Compared to the normal distribution, the t distribution is
    less peaked at the center and has higher tails.
   The t distribution approaches the normal distribution
    as n-1 approaches infinity.
[Figure: the t distribution for different degrees of freedom (df = 2, 5, 30)]
[Figure: comparison of the normal distribution and the t distribution]
Confidence Intervals Using t:
 estimator ± (reliability coefficient) × (standard error)
 The source of the reliability coefficient is the t distribution
  rather than the standard normal z distribution.
 When sampling is from a normal distribution whose
  standard deviation, σ, is unknown, the 100(1-α)%
  confidence interval for the population mean µ is
  given by
      $\bar{x} \pm t_{(1-\alpha/2)} \frac{s}{\sqrt{n}}$
 A moderate departure of the sampled population
    from normality can be tolerated in using the t
    distribution. An assumption of a mound-shaped
    population distribution should be tenable.
Example:
  Maureen McCauley conducted a study to evaluate the effect
  of on-the-job body mechanics instruction on the work
  performance of newly employed young workers (A-1). She
  used two randomly selected groups of subjects, an
  experimental group and a control group. The experimental
  group received one hour of back school training provided by
  an occupational therapist. The control group did not receive this
  training. A criterion-referenced Body Mechanics Evaluation
  Checklist was used to evaluate each worker's lifting, lowering,
  pulling, and transferring of objects in the work environment.
  A correctly performed task received a score of 1. The 15
  control subjects made a mean score of 11.53 on the evaluation
  with a standard deviation of 3.681. We assume that these 15
  controls behave as a random sample from a population of
  similar subjects. We wish to use these sample data to estimate
  the mean score for the population.
Solution:
  We may use the sample mean, 11.53, as a point estimate
  of the population mean but, since the population standard
  deviation is unknown, we must assume the population of
  values to be at least approximately normally distributed
  before constructing a confidence interval for µ.
  Let us assume that such an assumption is reasonable and that a
  95% confidence interval is desired. Our estimator is $\bar{x}$,
  with value 11.53, and our standard error is
      $s/\sqrt{n} = 3.681/\sqrt{15} = 0.9504$
  We need to find the reliability coefficient, the value of t
  associated with a confidence coefficient of 0.95 and
  n-1 = 14 degrees of freedom.
  The remaining area of 0.05 is divided equally between the two
  tails, i.e. 0.025 in each. In Table E the tabulated value of t is
  2.1448, which is our reliability coefficient.
  Then our 95% confidence interval is as follows:
      11.53 ± 2.1448(0.9504)
      11.53 ± 2.04
      9.49, 13.57
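The same calculation can be sketched in code. The function name t_interval is hypothetical, and the t value is passed in by hand (2.1448 for 14 degrees of freedom, as quoted from Table E above) because the Python standard library has no t quantile function.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """Interval xbar +/- t_crit * s / sqrt(n); t_crit is read from a t table."""
    standard_error = s / sqrt(n)
    margin = t_crit * standard_error
    return xbar - margin, xbar + margin, standard_error


# Body mechanics example: n = 15, mean 11.53, s = 3.681, t(0.975, 14 df) = 2.1448.
low, high, se = t_interval(xbar=11.53, s=3.681, n=15, t_crit=2.1448)
print(f"standard error: {se:.4f}")
print(f"95% CI: {low:.2f}, {high:.2f}")
```

Passing the tabulated value explicitly keeps the sketch dependency-free; a statistics package with a t quantile function could supply it instead.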
Deciding Between z and t:
  To make an appropriate choice between z and t,
  we must consider whether the sampled population is normally
  distributed, and whether the population variance is known.
Flowchart for use in deciding between z and t when
making inferences about population means

  Normal population?   Large sample?   Population variance known?   Use
  Yes                  Yes             Yes                          z
  Yes                  Yes             No                           t or z
  Yes                  No              Yes                          z
  Yes                  No              No                           t
  No                   Yes             Yes                          z
  No                   Yes             No                           z
  No                   No              Yes                          *
  No                   No              No                           *

  For nonnormal populations with large samples, the central limit theorem applies.
  * = use a nonparametric procedure
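The decision logic of this flowchart can be written as a small function. This is only a sketch of the chart as read here; the function name and the returned labels are hypothetical.

```python
def reliability_source(normal_population, large_sample, variance_known):
    """Return the source of the reliability coefficient for a one-sample mean."""
    if normal_population:
        if variance_known:
            return "z"
        # Variance unknown: t, although z is also acceptable for large samples.
        return "t or z" if large_sample else "t"
    if large_sample:
        # Nonnormal population, large sample: central limit theorem applies.
        return "z"
    # Nonnormal population, small sample.
    return "nonparametric procedure"


print(reliability_source(normal_population=True, large_sample=False, variance_known=False))
print(reliability_source(normal_population=False, large_sample=True, variance_known=False))
```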
Confidence interval for the difference between two
                  population means
Normal Population
 Say $\mu_1 - \mu_2$ = the difference between two normal population
  means.
 $\bar{x}_1 - \bar{x}_2$ = the difference between two independent
  sample means drawn from the two populations respectively.
      $E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$
      $V(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
  where $\sigma_1^2$ and $\sigma_2^2$ are the variances of populations 1 and 2, and
  n1, n2 are the sizes of the simple random samples (SRS) drawn from
  populations 1 and 2 respectively.
 When the population variances are known, the 100(1-α)%
  confidence interval for $\mu_1 - \mu_2$ is given by
      $(\bar{x}_1 - \bar{x}_2) \pm z_{(1-\alpha/2)} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
 When the confidence interval includes zero, we say
  that the population means may be equal; when it
  doesn't include zero, we say that the interval
  provides evidence that the two population means are
  not equal.
Example:
  A research team is interested in the difference between
  serum uric acid levels in patients with and without Down’s
  syndrome. In a large hospital for the treatment of the
  mentally retarded, a sample of 12 individuals with Down’s
  syndrome yielded a mean of 4.5mg/100ml. In a general
  hospital a sample of 15 normal individuals of the same age
  and sex were found to have a mean value of 3.4. If it is
  reasonable to assume that the two populations of values are
  normally distributed with variances equal to 1 and 1.5, find
  the 95% confidence interval for µ1- µ2.
Solution:
  For a point estimate of µ1 - µ2, we use
      $\bar{x}_1 - \bar{x}_2 = 4.5 - 3.4 = 1.1$
  The reliability coefficient corresponding to 0.95 is found in
  Table D to be 1.96. The standard error is
      $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{1}{12} + \frac{1.5}{15}} = 0.4282$
  The 95 percent confidence interval, then, is
      1.1 ± 1.96(0.4282)
      1.1 ± 0.84
      0.26, 1.94
  We say the difference µ1 - µ2 is somewhere between 0.26 and 1.94
  because, in repeated sampling, 95% of the intervals
  constructed in this manner would include the difference
  between the true means. Since the interval does not include
  zero, we conclude that the two population means are not
  equal.
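A short sketch of this calculation follows. The function name is hypothetical, and the z value is taken from statistics.NormalDist (about 1.960), so the result matches the slide values after rounding.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_z_interval(xbar1, xbar2, var1, var2, n1, n2, confidence):
    """CI for mu1 - mu2 when both population variances are known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(var1 / n1 + var2 / n2)
    diff = xbar1 - xbar2
    return diff - z * standard_error, diff + z * standard_error, standard_error


# Serum uric acid example: means 4.5 and 3.4, variances 1 and 1.5, n = 12 and 15.
low, high, se = diff_means_z_interval(4.5, 3.4, 1.0, 1.5, 12, 15, confidence=0.95)
print(f"standard error: {se:.4f}")
print(f"95% CI: {low:.2f}, {high:.2f}")
print("interval excludes zero:", low > 0 or high < 0)
```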
Sampling from Nonnormal Population:
  If the sample sizes n1 and n2 are large, we may construct the
  confidence interval the same way as we do in the case of
  normal populations, in accordance with the Central Limit
  Theorem.
Example:
  Motivated by an awareness of the existence of a body of
  controversial literature suggesting that stress, anxiety, and
  depression are harmful to the immune system, Gorman et al.
  (A-5) conducted a study in which the subjects were homosexual
  men, some of whom were HIV positive and some of whom
  were HIV negative. Data were collected on a wide variety of
  medical, immunological, psychiatric, and neurological measures,
  one of which was the number of CD4+ cells in the blood. The
  mean number of CD4+ cells for the 112 men with HIV infection
  was 401.8 with a standard deviation of 226.4. For the 75 men
  without HIV infection the mean and standard deviation were
  828.2 and 274.9, respectively. We wish to construct a 99%
  confidence interval for the difference between population
  means.
Solution:
  Here we use the z statistic as the reliability factor in the
  construction of our confidence interval. Since the population
  standard deviations are not given, we will use the sample standard
  deviations to estimate them. The point estimate for the difference
  between population means is the difference between sample
  means, 828.2 - 401.8 = 426.4. In Table D, we find the reliability
  factor to be 2.58. The estimated standard error is
      $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{274.9^2}{75} + \frac{226.4^2}{112}} = 38.279$
  Our 99% confidence interval for the difference between
  population means is
      426.4 ± 2.58(38.279)
      327.6, 525.2
  We are 99% confident that the mean number of CD4+
  cells in HIV-positive males differs from the mean for
  HIV-negative males by somewhere between 327.6 and 525.2.
Confidence interval using ‘t’ distribution:
 Say the variances of the two sampled populations are not
   known.
 The two sampled populations are normally distributed.
 The difference between the two population means can be
   estimated through a confidence interval using the 't' distribution
   as the source of the reliability factor.
Situation-I-Equal population variances:
   The 100(1-α)% confidence interval for µ1 - µ2 is given by
      $(\bar{x}_1 - \bar{x}_2) \pm t_{(1-\alpha/2)} \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$
   where $s_p^2$ is the pooled estimate of the common variance, a
   weighted average of the two sample variances.
   If the two sample sizes are unequal, the weighted average
   takes advantage of the additional information provided by
   the larger sample. The pooled estimate is given by
      $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
   The standard error of the estimate, then, is given by
      $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$
   The reliability factor t comes from Student's t distribution
   with n1+n2-2 d.f.
Example:
  The purpose of a study by Stone et al. (A-6) was to
  determine the effects of long-term exercise intervention on
  corporate executives enrolled in a supervised fitness program.
  Data were collected on 13 subjects (the exercise group) who
  voluntarily entered a supervised exercise program and remained
  active for an average of 13 years and 17 subjects (the sedentary
  group) who elected not to join the fitness program. Among
  the data collected on the subjects was maximum number of
  sit-ups completed in 30 seconds. The exercise group had a
  mean and standard deviation for this variable of 21.0 and
  4.9 respectively. The mean and standard deviation for the
  sedentary group were 12.1 and 5.6 respectively. We assume
  that the two populations of overall muscle condition
  measures are approximately normally distributed and that
  the two population variances are equal. We wish to
  construct a 95% confidence interval for the difference
  between the means of the populations represented by these
  two samples.
Solution:
  We are 95% confident that the difference between
  population means is somewhere between 4.9 and 12.9.
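The steps behind this interval can be sketched as follows. The function name is hypothetical, and the reliability factor 2.0484 is an assumption not stated in the slides: it is the standard tabulated t value for a 95% interval with 13 + 17 - 2 = 28 degrees of freedom.

```python
from math import sqrt


def pooled_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """CI for mu1 - mu2 assuming equal population variances.

    t_crit is the tabulated t value for n1 + n2 - 2 degrees of freedom.
    """
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var / n1 + pooled_var / n2)
    diff = xbar1 - xbar2
    margin = t_crit * standard_error
    return diff - margin, diff + margin, pooled_var


# Sit-up example: exercise group (n=13, mean 21.0, s=4.9) versus the
# second group (n=17, mean 12.1, s=5.6); t(0.975, 28 df) = 2.0484 (assumed from a t table).
low, high, pooled_var = pooled_t_interval(21.0, 4.9, 13, 12.1, 5.6, 17, t_crit=2.0484)
print(f"pooled variance: {pooled_var:.2f}")
print(f"95% CI: {low:.1f}, {high:.1f}")
```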
Situation-II-Unequal population variances:
   An approximate 100(1-α)% confidence interval for µ1 - µ2 is
   given by
      $(\bar{x}_1 - \bar{x}_2) \pm t'_{(1-\alpha/2)} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
   The solution proposed by Cochran consists of computing the
   reliability factor
      $t'_{(1-\alpha/2)} = \frac{w_1 t_1 + w_2 t_2}{w_1 + w_2}$
   where $w_1 = s_1^2/n_1$, $w_2 = s_2^2/n_2$, $t_1 = t_{(1-\alpha/2)}$ for n1-1 degrees of freedom,
   and $t_2 = t_{(1-\alpha/2)}$ for n2-1 degrees of freedom.
Example:
  In the study by Stone et al. (A-6) described in the previous
  example, the investigators also reported the following
  information on a measure of overall muscle condition
  scores made by the subjects:

    Sample            n    Mean   S.D.
    Exercise group    13   4.5    0.3
    Sedentary group   17   3.7    1.0

We assume that the two populations of overall muscle
condition scores are approximately normally distributed. We
are unwilling to assume, however, that the two population
variances are equal. We wish to construct a 95% confidence
interval for the difference between the mean overall muscle
condition scores of the two populations represented by the
samples.
Solution:
The 95% confidence interval for the difference between the
two population means, when the population variances are not
assumed equal, is 0.25, 1.34.
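Cochran's weighted reliability factor can be computed step by step. In this sketch the function name is hypothetical, and the two t values are assumptions taken from a standard t table (2.1788 for 12 degrees of freedom and 2.1199 for 16 degrees of freedom at the 0.975 level); they are not given in the slides. The limits agree with the interval quoted above to within rounding.

```python
from math import sqrt


def cochran_interval(xbar1, s1, n1, t1, xbar2, s2, n2, t2):
    """Approximate CI for mu1 - mu2 with unequal variances (Cochran's t').

    t1 and t2 are tabulated t values for n1 - 1 and n2 - 1 degrees of freedom.
    """
    w1 = s1 ** 2 / n1
    w2 = s2 ** 2 / n2
    t_prime = (w1 * t1 + w2 * t2) / (w1 + w2)  # weighted reliability factor
    standard_error = sqrt(w1 + w2)
    diff = xbar1 - xbar2
    margin = t_prime * standard_error
    return diff - margin, diff + margin, t_prime


# Muscle condition scores: exercise group (n=13, mean 4.5, s=0.3),
# sedentary group (n=17, mean 3.7, s=1.0). t values assumed from a t table.
low, high, t_prime = cochran_interval(4.5, 0.3, 13, 2.1788, 3.7, 1.0, 17, 2.1199)
print(f"t' = {t_prime:.4f}")
print(f"approximate 95% CI: {low:.3f}, {high:.3f}")
```

Because w2 is much larger than w1 here, t' lies close to the t value for the sample with the larger variance per observation.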
Flowchart in deciding whether the reliability factor should be z, t, or t'
when making inferences about the difference between two population
means

  Normal         Large       Population    Population
  populations?   samples?    variances     variances     Use
                             known?        equal?
  Yes            Yes         Yes           Yes           z
  Yes            Yes         Yes           No            z
  Yes            Yes         No            Yes           t or z
  Yes            Yes         No            No            t' or z
  Yes            No          Yes           Yes           z
  Yes            No          Yes           No            z
  Yes            No          No            Yes           t
  Yes            No          No            No            t'
  No             Yes         Yes           Yes           z
  No             Yes         Yes           No            z
  No             Yes         No            Yes           z
  No             Yes         No            No            t'
  No             No          Yes           Yes           *
  No             No          Yes           No            *
  No             No          No            Yes           *
  No             No          No            No            *

  * = Nonparametric test to be applied
Confidence interval for a population proportion
 The population proportion is in many situations a matter of
    great interest to the health professional.
 What proportion of people suffer from different diseases,
    disabilities, etc.? What proportion of patients respond to a
    particular treatment? These are important questions
    for health purposes.
 We estimate the population proportion as in the case of the
    population mean.
 A random sample is drawn from the population of interest
    and the sample proportion $\hat{p}$ provides an unbiased estimate
    of the population proportion p.
 A confidence interval is obtained by the general formula
    estimator ± (reliability coefficient) × (standard error)
 When both np and n(1-p) are greater than 5, the sampling
    distribution of $\hat{p}$ is quite close to the normal distribution.
 When the above condition is met, the 100(1-α)% confidence
    interval for p is given by
      $\hat{p} \pm z_{(1-\alpha/2)} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Example:
  Mathers et al.(A-12) found that in a sample of 591 patients
  admitted to a psychiatric hospital, 204 admitted to using
  cannabis at least once in their lifetime. We wish to
  construct a 95% confidence interval for the proportion of
  lifetime cannabis users in the sampled population of
  psychiatric hospital admissions.
Solution:
  We are 95% confident that the population proportion p is
  between 0.3069 and 0.3835.
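The interval can be reproduced with the sketch below. The function name is hypothetical, the sample proportion stands in for p when checking the np and n(1-p) condition, and z comes from statistics.NormalDist (about 1.960), so the limits agree with the values above to about three decimal places.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence):
    """Normal-approximation CI for a population proportion."""
    p_hat = successes / n
    # Condition for the normal approximation, checked with p_hat in place of p.
    if n * p_hat <= 5 or n * (1 - p_hat) <= 5:
        raise ValueError("normal approximation is not appropriate for this sample")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Cannabis use example: 204 of 591 admitted patients.
low, high = proportion_interval(successes=204, n=591, confidence=0.95)
print(f"95% CI: {low:.3f}, {high:.3f}")
```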
Confidence interval for the difference between two
             population proportions
 An unbiased point estimator of the difference between two
   population proportions (p1-p2) is provided by the difference
   between sample proportions, $\hat{p}_1 - \hat{p}_2$.
 When n1 and n2 (the sample sizes) are large and the population
   proportions are not too close to 0 or 1, normal distribution
   theory may be applied to obtain the confidence interval.
 A 100(1-α)% confidence interval for p1-p2 is given by
      $(\hat{p}_1 - \hat{p}_2) \pm z_{(1-\alpha/2)} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
Example:
  Borst et al. (A-16) investigated the relation of ego
  development, age, gender and diagnosis to suicidality among
  adolescent psychiatric inpatients. Their sample consisted of 96
  boys and 123 girls between the ages of 12 and 16 selected from
  admissions to a child and adolescent unit of a private
  psychiatric hospital. Suicide attempts were reported by 18 of
  the boys and 60 of the girls. Let us assume the girls
  behave like a simple random sample from a population
  of similar girls and that the boys likewise may be considered a
  SRS from a population of similar boys. For these two
  populations, we wish to construct a 99% confidence interval for
  the difference between the proportions of suicide
  attempters.
Solution:
  We are 99% confident that, for the sampled populations, the
  proportion of suicide attempts among girls exceeds the
  proportion of suicide attempts among boys by somewhere
  between 0.1450 and 0.4556.
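A minimal sketch of this calculation, taking girls as sample 1 and boys as sample 2. The function name is hypothetical, and z comes from statistics.NormalDist (about 2.576) rather than the rounded table value 2.58, so the limits differ from those quoted above only in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_interval(x1, n1, x2, n2, confidence):
    """Normal-approximation CI for p1 - p2 from two independent samples."""
    p1 = x1 / n1
    p2 = x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * standard_error, diff + z * standard_error, diff


# Suicide attempt example: 60 of 123 girls, 18 of 96 boys.
low, high, diff = diff_proportions_interval(x1=60, n1=123, x2=18, n2=96, confidence=0.99)
print(f"point estimate: {diff:.4f}")
print(f"99% CI: {low:.3f}, {high:.3f}")
```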
Example:
  Goldberg et al. (A-24) conducted a study to determine if an
  acute dose of dextroamphetamine might have positive
  effects on affect and cognition in schizophrenic patients
  maintained on a regimen of haloperidol. Among the
  variables measured was the change in patients’ tension-
  anxiety states. For n2=4 patients who responded to
  amphetamine, the standard deviation was 5.8. Let us assume
  that these patients constitute independent SRS from
  populations of similar patients. Let us also assume that
  change scores in tension-anxiety state is a normally
  distributed variable in both populations. We wish to
  construct a 95% confidence interval for the ratio of the
  variances of these two populations.
Solution:
      $0.2081 < \frac{\sigma_1^2}{\sigma_2^2} < 14.0554$

More Related Content

What's hot (20)

PPTX
Statistical inference
Jags Jagdish
 
PPT
Hypothesis Testing
Southern Range, Berhampur, Odisha
 
PPTX
Statistical inference concept, procedure of hypothesis testing
AmitaChaudhary19
 
PPTX
Statistical distributions
TanveerRehman4
 
PDF
Kolmogorov Smirnov good-of-fit test
Rizwan S A
 
DOCX
Binary Logistic Regression
Seth Anandaram Jaipuria College
 
PPTX
Discriminant analysis
Amritashish Bagchi
 
PPTX
INFERENTIAL STATISTICS: AN INTRODUCTION
John Labrador
 
PPT
Estimation
Mmedsc Hahm
 
PPTX
Hypothesis testing ppt final
piyushdhaker
 
PPTX
Sampling Distribution
Cumberland County Schools
 
PPTX
Testing of hypotheses
RajThakuri
 
PPT
Hypothesis
Nilanjan Bhaumik
 
PPT
HYPOTHESIS TESTING.ppt
sadiakhan783184
 
PPTX
Point and Interval Estimation
Shubham Mehta
 
DOCX
Estimation in statistics
Rabea Jamal
 
PDF
Hypothesis testing an introduction
Geetika Gulyani
 
PPT
The sampling distribution
Harve Abella
 
PPTX
Sampling Distributions
DataminingTools Inc
 
PDF
Ordinal logistic regression
Dr Athar Khan
 
Statistical inference
Jags Jagdish
 
Statistical inference concept, procedure of hypothesis testing
AmitaChaudhary19
 
Statistical distributions
TanveerRehman4
 
Kolmogorov Smirnov good-of-fit test
Rizwan S A
 
Binary Logistic Regression
Seth Anandaram Jaipuria College
 
Discriminant analysis
Amritashish Bagchi
 
INFERENTIAL STATISTICS: AN INTRODUCTION
John Labrador
 
Estimation
Mmedsc Hahm
 
Hypothesis testing ppt final
piyushdhaker
 
Sampling Distribution
Cumberland County Schools
 
Testing of hypotheses
RajThakuri
 
Hypothesis
Nilanjan Bhaumik
 
HYPOTHESIS TESTING.ppt
sadiakhan783184
 
Point and Interval Estimation
Shubham Mehta
 
Estimation in statistics
Rabea Jamal
 
Hypothesis testing an introduction
Geetika Gulyani
 
The sampling distribution
Harve Abella
 
Sampling Distributions
DataminingTools Inc
 
Ordinal logistic regression
Dr Athar Khan
 


Inferential statistics-estimation

  • 1. Lecture Series on Biostatistics Inferential Statistics- Estimation By Dr. Bijaya Bhusan Nanda, M. Sc (Gold Medalist) Ph. D. (Stat.) Topper Orissa Statistics & Economics Services, 1988 bijayabnanda@yahoo.com
  • 2. CONTENTS: Introduction; Confidence interval for a population mean; The t distribution; Confidence interval for the difference between two population means; Confidence interval for a population proportion; Confidence interval for the difference between two population proportions
  • 3. Introduction. Statistical inference is the procedure by which we reach a conclusion about a population on the basis of information contained in a sample drawn from that population. The two broad areas of statistical inference are estimation and hypothesis testing. Estimation is the process of calculating a statistic from sample data drawn from a certain population; the statistic is used as an approximation to the corresponding population parameter.
  • 4. Types of Estimate. For each parameter we can have a point estimate or an interval estimate. A point estimate is a single numerical value used to estimate the corresponding population parameter. An interval estimate consists of two numerical values defining a range of values that, with a specified degree of confidence, includes the parameter being estimated.
  • 5. Choosing an Appropriate Estimator. A single computed value is an estimate. An estimator is usually presented as a formula; for example, $\bar{x} = \sum x_i / n$ is an estimator of the population mean $\mu$. The single numerical value that results from evaluating this formula is called an estimate of the parameter $\mu$. For an estimator $T$, the expected value $E(T)$ is obtained by taking the average value of $T$ computed from all possible samples, i.e. $E(T) = \mu_T$.
  • 6. Criteria of a good estimator: There can be more than one estimator for a parameter. For example, the population mean can be estimated by the sample median or by the sample mean. A good estimator should fulfill certain criteria, one of which is unbiasedness: an estimator $T$ of the parameter $\theta$ is said to be an unbiased estimator of $\theta$ if $E(T) = \theta$. The sample mean is an unbiased estimator of the population mean. The sample proportion is an unbiased estimator of the population proportion. The difference between two sample means is an unbiased estimator of the difference between the population means. The difference between two sample proportions is an unbiased estimator of the difference between the population proportions.
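The idea that $E(T)$ is the average of $T$ over all possible samples can be checked by brute force on a very small population. The sketch below is only an illustration: the four-value population and the sample size of 2 are hypothetical choices, not data from the lecture, and sampling is assumed to be with replacement.

```python
from itertools import product
from statistics import fmean

# Hypothetical toy population (not from the slides), small enough that
# every possible sample can be enumerated.
population = [2, 4, 6, 8]
n = 2  # sample size

# All possible ordered samples of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# T = sample mean, evaluated on every possible sample.
sample_means = [fmean(s) for s in samples]

expected_T = fmean(sample_means)  # E(T): average of T over all samples
mu = fmean(population)            # population mean

print(len(samples), expected_T, mu)  # 16 5.0 5.0
```

Because the average of the 16 sample means equals the population mean, the sample mean behaves as an unbiased estimator in this toy setting.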
  • 7. Sampled Population and Target Population. The sampled population is the population from which one actually draws a sample. The target population is the population about which one wishes to make an inference. Statistical inference procedures allow one to make inferences about the sampled population (provided proper sampling methods have been employed). Only when the sampled population and the target population are the same can statistical inference reach conclusions about the target population. Random and Nonrandom Samples. The strict validity of the statistical procedures discussed depends on the assumption of random samples.
  • 8. In real-world applications it is often impossible or impractical to use truly random samples. If researchers had to depend on randomly selected material, very little research of this type would be conducted. Therefore, nonstatistical considerations must play a part in the generalization process. Researchers may contend that the samples actually used are equivalent to simple random samples, since there is no reason to believe that the material actually used is not representative of the population about which inferences are desired. In many health research projects, samples of convenience, rather than random samples, are employed.
  • 9. Generalizations must then be made on nonstatistical considerations. The consequences of such generalizations, however, may range from misleading to disastrous. In some situations it is possible to introduce randomization into an experiment even though the available subjects are not randomly selected from a well-defined population. Example: allocating subjects to treatments in a random manner.
  • 10. Confidence interval for a population mean. Let $\bar{x}$ be the mean of a random sample of size $n$ drawn from a normally distributed population. $\bar{x}$ is an unbiased estimator of the population mean: $E(\bar{x}) = \mu$. But because of sampling fluctuation, $\bar{x}$ cannot be expected to equal $\mu$. Therefore we should have an interval estimate of $\mu$. For the interval estimate we must know the sampling distribution of $\bar{x}$. For a normal population the sampling distribution of $\bar{x}$ is $\bar{x} \sim N(\mu, \sigma_{\bar{x}})$, i.e. $N(\mu, \sigma/\sqrt{n})$, where the second argument is the standard error $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
  • 11. Following the normal probability distribution, approximately 95% of the possible values of $\bar{x}$ lie in the interval $\mu \pm 2\sigma_{\bar{x}}$. Since we do not know $\mu$, we center the interval on the sample mean $\bar{x}$, which gives the approximate 95% confidence interval for $\mu$ as $\bar{x} \pm 2\sigma_{\bar{x}} = \bar{x} \pm 2\sigma/\sqrt{n}$. If we do not know $\sigma$, the approximate 95% confidence interval for $\mu$ is $\bar{x} \pm 2s/\sqrt{n}$, where $s$ is the standard deviation computed from the sample.
  • 12. Interval Estimate Components. In general an interval estimate may be expressed as: estimator ± (reliability coefficient) × (standard error). In particular, when sampling is from a normal distribution with known variance, an interval estimate for $\mu$ may be expressed as $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$, where $z_{(1-\alpha/2)}$ is the value of $z$ to the left of which lies $1-\alpha/2$ and to the right of which lies $\alpha/2$ of the area under the standard normal curve.
  • 13. Interpreting Confidence Intervals. Probabilistic interpretation: in repeated sampling from a normally distributed population with a known standard deviation, $100(1-\alpha)$ percent of all intervals of the form $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$ will in the long run include the population mean $\mu$. Practical interpretation: when sampling is from a normally distributed population with known standard deviation, we are $100(1-\alpha)$% confident that the single computed interval, $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$, contains the population mean $\mu$. Precision: the quantity obtained by multiplying the reliability coefficient by the standard error of the mean is called the precision of the estimate. This quantity is also called the margin of error.
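The probabilistic interpretation can be illustrated with a small simulation: draw many samples from a normal population whose mean is known to the simulation, build the z interval each time, and count how often the interval covers the true mean. This is a sketch under assumed values; the population mean, standard deviation, sample size, number of trials, and seed below are arbitrary illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, alpha, trials, seed=0):
    """Fraction of intervals x_bar ± z * sigma / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # reliability coefficient
    margin = z * sigma / sqrt(n)             # precision (margin of error)
    hits = 0
    for _ in range(trials):
        # One simulated sample of size n from N(mu, sigma).
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

# Illustrative (assumed) population values; alpha = 0.05 gives 95% intervals.
rate = coverage(mu=50.0, sigma=12.0, n=15, alpha=0.05, trials=20000)
print(round(rate, 3))
```

With 95% intervals the printed fraction should land close to 0.95, which is what "in the long run" means in the probabilistic interpretation.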
  • 14. Example: A physical therapist wished to estimate, with 99 percent confidence, the mean maximum strength of a particular muscle in a certain group of individuals. He is willing to assume that strength scores are approximately normally distributed with a variance of 144. A sample of 15 subjects who participated in the experiment yielded a mean of 84.3. Solution: The z value corresponding to a confidence coefficient of 0.99 is found in Table D to be 2.58. This is our reliability coefficient. The standard error is $\sigma_{\bar{x}} = 12/\sqrt{15} = 3.0984$. Our 99% confidence interval for $\mu$, then, is $84.3 \pm 2.58(3.0984) = 84.3 \pm 8.0$, i.e. (76.3, 92.3).
  • 15. We say we are 99% confident that the population mean is between 76.3 and 92.3 since, in repeated sampling, 99% of all intervals that could be constructed in the manner just described would include the population mean.
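The arithmetic of the muscle strength example can be reproduced in a few lines. This is a minimal sketch; the function name `z_interval` is a hypothetical helper, and the reliability coefficient 2.58 is passed in as the table value quoted in the example rather than computed.

```python
from math import sqrt

def z_interval(x_bar, sigma, n, z):
    """Return (lower, upper) for x_bar ± z * sigma / sqrt(n)."""
    standard_error = sigma / sqrt(n)
    margin = z * standard_error
    return x_bar - margin, x_bar + margin

# Values from the example: mean 84.3, variance 144 (so sigma = 12), n = 15,
# and the table value z = 2.58 for 99% confidence.
lower, upper = z_interval(x_bar=84.3, sigma=sqrt(144), n=15, z=2.58)
print(round(lower, 1), round(upper, 1))  # 76.3 92.3
```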
  • 16. The t distribution. Estimating a confidence interval with the standard normal z distribution applies when the population variance is known. Usually, however, the population variance is not known, which presents a problem in constructing confidence intervals. Although the statistic $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ is normally distributed when the population is normal, or approximately normally distributed when $n$ is large regardless of the functional form of the population, we cannot make use of this fact because $\sigma$ is not known.
  • 17. In this case we may use $s = \sqrt{\sum (x_i - \bar{x})^2 / (n-1)}$ as an estimator of $\sigma$. When the sample size is large, say greater than 30, $s$ is usually a good approximation of $\sigma$ and we may use normal distribution theory to construct the confidence interval. When the sample size is small ($n \le 30$), we make use of Student's t distribution: the quantity $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ follows this distribution.
  • 18. Properties of the t distribution: It has a mean of 0. It is symmetrical about the mean. In general, it has a variance greater than 1, but the variance approaches 1 as the sample size becomes large; for df > 2, the variance of the t distribution is df/(df − 2). The variable t ranges from −∞ to +∞. The t distribution is really a family of distributions, since there is a different distribution for each sample value of n − 1, the divisor used in computing $s^2$; we recall that n − 1 is referred to as the degrees of freedom. Compared to the normal distribution, the t distribution is less peaked in the center and has higher tails. The t distribution approaches the normal distribution as n − 1 approaches infinity.
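The variance property can be made concrete by evaluating df/(df − 2) for a few degrees of freedom. The sketch below is illustrative only; `t_variance` is a hypothetical helper name and the chosen df values are arbitrary.

```python
def t_variance(df):
    """Variance of Student's t via df / (df - 2); the formula needs df > 2."""
    if df <= 2:
        raise ValueError("the formula df / (df - 2) applies only for df > 2")
    return df / (df - 2)

# The variance is always above 1 and shrinks toward 1 as df grows.
for df in (3, 5, 30, 1000):
    print(df, round(t_variance(df), 4))
```

The values fall steadily toward 1, the variance of the standard normal distribution, which is consistent with the t distribution approaching the normal as the degrees of freedom increase.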
  • 19. [Figure: the t distribution for 2, 5, and 30 degrees of freedom. Figure: comparison of the normal distribution and the t distribution.]
  • 20. Confidence Intervals Using t: estimator ± (reliability coefficient) × (standard error). The source of the reliability coefficient is the t distribution rather than the standard normal z distribution. When sampling is from a normal distribution whose standard deviation, $\sigma$, is unknown, the $100(1-\alpha)$% confidence interval for the population mean $\mu$ is given by $\bar{x} \pm t_{(1-\alpha/2)}\, s/\sqrt{n}$. Moderate departures of the sampled population from normality can be tolerated when using the t distribution; an assumption of a mound-shaped population distribution should be tenable.
  • 21. Example: Maureen McCauley conducted a study to evaluate the effect of on-the-job body mechanics instruction on the work performance of newly employed young workers (A-1). She used two randomly selected groups of subjects, an experimental group and a control group. The experimental group received one hour of back school training provided by an occupational therapist. The control group did not receive this training. A criterion-referenced Body Mechanics Evaluation Checklist was used to evaluate each worker's lifting, lowering, pulling, and transferring of objects in the work environment. A correctly performed task received a score of 1. The 15 control subjects made a mean score of 11.53 on the evaluation with a standard deviation of 3.681. We assume that these 15 controls behave as a random sample from a population of similar subjects. We wish to use these sample data to estimate the mean score for the population.
  • 22. Solution: We may use the sample mean, 11.53, as a point estimate of the population mean, but since the population standard deviation is unknown, we must assume the population of values to be at least approximately normally distributed before constructing a confidence interval for $\mu$. Let us assume that such an assumption is reasonable and that a 95% confidence interval is desired. Our estimator is $\bar{x}$ and our standard error is $s/\sqrt{n} = 3.681/\sqrt{15} = 0.9504$. We need to find the reliability coefficient, the value of t associated with a confidence coefficient of 0.95 and n − 1 = 14 degrees of freedom.
  • 23. A 95% confidence interval leaves 0.05 of the area to be divided equally between the two tails, 0.025 in each, so we need the value of t to the left of which lies 0.975 of the area. In Table E the tabulated value of t for 14 degrees of freedom is 2.1448, which is our reliability coefficient. Then our 95% confidence interval is $11.53 \pm 2.1448(0.9504) = 11.53 \pm 2.04$, i.e. (9.49, 13.57). Deciding Between z and t: To make an appropriate choice between z and t we must consider whether the sampled population is normally distributed, whether the sample is large, and whether the population variance is known.
  • 24. Flowchart for use in deciding between z and t when making inferences about population means (shown here as a table; * = use a nonparametric procedure):
      Population normal? | Sample large? | Population variance known? | Use
      Yes                | Yes           | Yes                        | z
      Yes                | Yes           | No                         | t (or z)
      Yes                | No            | Yes                        | z
      Yes                | No            | No                         | t
      No                 | Yes           | Yes                        | z (central limit theorem applies)
      No                 | Yes           | No                         | z (central limit theorem applies)
      No                 | No            | Yes                        | *
      No                 | No            | No                         | *
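The decision rule in the flowchart, applied to the body mechanics example, can be sketched as follows. The function names are hypothetical, the rule is one reading of the flowchart (for a normal population with unknown variance it always returns t), and the coefficient 2.1448 is the Table E value quoted in the slides rather than a computed quantile.

```python
from math import sqrt

def choose_reliability(normal_population, large_sample, variance_known):
    """Return 'z', 't', or 'nonparametric' following the flowchart."""
    if normal_population:
        # Normal population: z if the variance is known, otherwise t.
        return "z" if variance_known else "t"
    if large_sample:
        return "z"  # central limit theorem applies
    return "nonparametric"

def mean_interval(x_bar, sd, n, coefficient):
    """Return (lower, upper) for x_bar ± coefficient * sd / sqrt(n)."""
    margin = coefficient * sd / sqrt(n)
    return x_bar - margin, x_bar + margin

# Body mechanics example: normality assumed, n = 15, sigma unknown.
kind = choose_reliability(normal_population=True, large_sample=False,
                          variance_known=False)
# 2.1448 is the tabulated t for 14 degrees of freedom quoted in the slides.
lower, upper = mean_interval(x_bar=11.53, sd=3.681, n=15, coefficient=2.1448)
print(kind, round(lower, 2), round(upper, 2))  # t 9.49 13.57
```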
  • 25. Confidence interval for the difference between two population means. Normal populations: Let $\mu_1 - \mu_2$ be the difference between two normal population means and $\bar{x}_1 - \bar{x}_2$ the difference between two independent sample means drawn from the two populations respectively. Then $E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$ and $V(\bar{x}_1 - \bar{x}_2) = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$, where $\sigma_1^2$ and $\sigma_2^2$ are the variances of populations 1 and 2, and $n_1$ and $n_2$ are the sizes of the simple random samples (SRS) drawn from populations 1 and 2 respectively.
  • 26. When the population variances are known, the $100(1-\alpha)$% confidence interval for $\mu_1 - \mu_2$ is given by $(\bar{x}_1 - \bar{x}_2) \pm z_{(1-\alpha/2)} \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$. When the confidence interval includes zero, we say that the population means may be equal; when it does not include zero, we say that the interval provides evidence that the two population means are not equal.
  • 27. Example: A research team is interested in the difference between serum uric acid levels in patients with and without Down's syndrome. In a large hospital for the treatment of the mentally retarded, a sample of 12 individuals with Down's syndrome yielded a mean of 4.5 mg/100 ml. In a general hospital a sample of 15 normal individuals of the same age and sex were found to have a mean value of 3.4. If it is reasonable to assume that the two populations of values are normally distributed with variances equal to 1 and 1.5, find the 95% confidence interval for $\mu_1 - \mu_2$. Solution: For a point estimate of $\mu_1 - \mu_2$, we use $\bar{x}_1 - \bar{x}_2 = 4.5 - 3.4 = 1.1$.
  • 28. The reliability coefficient corresponding to 0.95 is found in Table D to be 1.96. The standard error is $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} = \sqrt{\dfrac{1}{12} + \dfrac{1.5}{15}} = 0.4282$. The 95 percent confidence interval, then, is $1.1 \pm 1.96(0.4282) = 1.1 \pm 0.84$, i.e. (0.26, 1.94). We say that we are 95% confident that the true difference $\mu_1 - \mu_2$ is somewhere between 0.26 and 1.94, because, in repeated sampling, 95% of the intervals constructed in this manner would include the difference between the true means. Since the interval does not include zero, we conclude that the two population means are not equal.
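The uric acid calculation can be checked with a short function. This is a sketch; `diff_means_z_interval` is a hypothetical helper name, and 1.96 is the table value from the example.

```python
from math import sqrt

def diff_means_z_interval(x1, x2, var1, var2, n1, n2, z):
    """Interval for mu1 - mu2 when both population variances are known."""
    se = sqrt(var1 / n1 + var2 / n2)  # standard error of x1_bar - x2_bar
    diff = x1 - x2                    # point estimate
    return diff - z * se, diff + z * se, se

# Example values: means 4.5 and 3.4, variances 1 and 1.5, n = 12 and 15.
lower, upper, se = diff_means_z_interval(4.5, 3.4, 1.0, 1.5, 12, 15, z=1.96)
print(round(se, 4), round(lower, 2), round(upper, 2))  # 0.4282 0.26 1.94

# The interval excludes zero, matching the conclusion in the example.
print(lower > 0)  # True
```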
  • 29. Sampling from Nonnormal Populations: If the sample sizes $n_1$ and $n_2$ are large, then in accordance with the central limit theorem we may construct the confidence interval in the same way as in the case of normal populations. Example: Motivated by an awareness of the existence of a body of controversial literature suggesting that stress, anxiety, and depression are harmful to the immune system, Gorman et al. (A-5) conducted a study in which the subjects were homosexual men, some of whom were HIV positive and some of whom were HIV negative. Data were collected on a wide variety of medical, immunological, psychiatric, and neurological measures, one of which was the number of CD4+ cells in the blood. The mean number of CD4+ cells for the 112 men with HIV infection was 401.8 with a standard deviation of 226.4. For the 75 men without HIV infection the mean and standard deviation were 828.2 and 274.9, respectively. We wish to construct a 99% confidence interval for the difference between population means.
  • 30. Solution: Here we use the z statistic as the reliability factor in constructing our confidence interval. Since the population standard deviations are not given, we use the sample standard deviations to estimate them. The point estimate for the difference between population means is the difference between sample means, 828.2 − 401.8 = 426.4. In Table D, we find the reliability factor to be 2.58. The estimated standard error is $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{274.9^2}{75} + \dfrac{226.4^2}{112}} = 38.279$.
  • 31. Our 99% confidence interval for the difference between population means is $426.4 \pm 2.58(38.279)$, i.e. (327.6, 525.2). We are 99% confident that the mean number of CD4+ cells in HIV-positive males differs from the mean for HIV-negative males by somewhere between 327.6 and 525.2.
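The CD4+ example follows the same pattern, with sample standard deviations standing in for the unknown population values. A minimal sketch, assuming the hypothetical helper name `large_sample_diff_interval` and the table value 2.58 from the example:

```python
from math import sqrt

def large_sample_diff_interval(mean1, sd1, n1, mean2, sd2, n2, z):
    """Large-sample interval for mu1 - mu2 using sample standard deviations."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # estimated standard error
    diff = mean1 - mean2                      # point estimate
    return diff, se, (diff - z * se, diff + z * se)

# Group 1: 75 HIV-negative men; group 2: 112 HIV-positive men.
diff, se, (lower, upper) = large_sample_diff_interval(
    828.2, 274.9, 75, 401.8, 226.4, 112, z=2.58)
print(round(diff, 1), round(se, 3), round(lower, 1), round(upper, 1))
# 426.4 38.279 327.6 525.2
```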
  • 32. Confidence interval using the t distribution: Suppose the variances of the two sampled populations are not known and the two sampled populations are normally distributed. The difference between the two population means can then be estimated through a confidence interval that uses the t distribution as the source of the reliability factor. Situation I, equal population variances: The $100(1-\alpha)$% confidence interval for $\mu_1 - \mu_2$ is given by $(\bar{x}_1 - \bar{x}_2) \pm t_{(1-\alpha/2)} \sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}$, where $s_p^2$ is the pooled estimate of the common variance, a weighted average of the two sample variances. If the two sample sizes are unequal, the weighted average takes advantage of the additional information provided by the larger sample. The pooled estimate is given by
  • 33. $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$. The standard error of the estimate, then, is given by $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}$. The reliability factor t comes from Student's t distribution with $n_1 + n_2 - 2$ degrees of freedom. Example: The purpose of a study by Stone et al. (A-6) was to determine the effects of long-term exercise intervention on corporate executives enrolled in a supervised fitness program.
  • 34. Data were collected on 13 subjects (the exercise group) who voluntarily entered a supervised exercise program and remained active for an average of 13 years and 17 subjects (the sedentary group) who elected not to join the fitness program. Among the data collected on the subjects was the maximum number of sit-ups completed in 30 seconds. The exercise group had a mean and standard deviation for this variable of 21.0 and 4.9 respectively. The mean and standard deviation for the sedentary group were 12.1 and 5.6 respectively. We assume that the two populations of overall muscle condition measures are approximately normally distributed and that the two population variances are equal. We wish to construct a 95% confidence interval for the difference between the means of the populations represented by these two samples. Solution: We are 95% confident that the difference between population means is somewhere between 4.9 and 12.9.
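The slide states only the final interval for the sit-ups example, so the intermediate steps are sketched below using the pooled-variance formulas. The helper name `pooled_t_interval` is hypothetical, and the reliability factor 2.0484 is an assumption in the sense that it is the standard table value of t for 0.975 and 13 + 17 − 2 = 28 degrees of freedom, which the slides do not quote.

```python
from math import sqrt

def pooled_t_interval(mean1, sd1, n1, mean2, sd2, n2, t):
    """Interval for mu1 - mu2 assuming equal population variances."""
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = sqrt(sp2 / n1 + sp2 / n2)
    diff = mean1 - mean2
    return sp2, se, (diff - t * se, diff + t * se)

# Exercise group: n = 13, mean 21.0, SD 4.9; other group: n = 17, mean 12.1, SD 5.6.
# t = 2.0484 is the table value for 28 degrees of freedom (not quoted in the slides).
sp2, se, (lower, upper) = pooled_t_interval(21.0, 4.9, 13, 12.1, 5.6, 17, t=2.0484)
print(round(sp2, 2), round(lower, 1), round(upper, 1))  # 28.21 4.9 12.9
```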
  • 35. Situation II, unequal population variances: An approximate $100(1-\alpha)$% confidence interval for $\mu_1 - \mu_2$ is given by $(\bar{x}_1 - \bar{x}_2) \pm t'_{(1-\alpha/2)} \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$. The solution proposed by Cochran consists of computing the reliability factor $t'_{(1-\alpha/2)} = \dfrac{w_1 t_1 + w_2 t_2}{w_1 + w_2}$, where $w_1 = s_1^2/n_1$, $w_2 = s_2^2/n_2$, $t_1 = t_{(1-\alpha/2)}$ for $n_1 - 1$ degrees of freedom, and $t_2 = t_{(1-\alpha/2)}$ for $n_2 - 1$ degrees of freedom. Example: In the study by Stone et al. (A-6) described in the previous example, the investigators also reported the following information on a measure of overall muscle condition scores made by the subjects:
  • 36.
      Sample            n    Mean   S.D.
      Exercise group    13   4.5    0.3
      Sedentary group   17   3.7    1.0
    We assume that the two populations of overall muscle condition scores are approximately normally distributed. We are unwilling to assume, however, that the two population variances are equal. We wish to construct a 95% confidence interval for the difference between the mean overall muscle condition scores of the two populations represented by the samples. Solution: The 95% confidence interval for the difference between the two population means, when the population variances are not assumed equal, is approximately (0.25, 1.35).
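Cochran's reliability factor for this example can be worked through step by step. The sketch below is hedged in two ways: `cochran_interval` is a hypothetical helper name, and the values t1 = 2.1788 (12 degrees of freedom) and t2 = 2.1199 (16 degrees of freedom) are standard table values for 0.975 that the slides do not quote.

```python
from math import sqrt

def cochran_interval(mean1, sd1, n1, mean2, sd2, n2, t1, t2):
    """Approximate interval for mu1 - mu2 with unequal variances (Cochran's t')."""
    w1 = sd1 ** 2 / n1
    w2 = sd2 ** 2 / n2
    t_prime = (w1 * t1 + w2 * t2) / (w1 + w2)  # weighted reliability factor
    se = sqrt(w1 + w2)                         # sqrt(s1^2/n1 + s2^2/n2)
    diff = mean1 - mean2
    return t_prime, se, (diff - t_prime * se, diff + t_prime * se)

# Exercise group: n = 13, mean 4.5, SD 0.3; sedentary group: n = 17, mean 3.7, SD 1.0.
# t1 and t2 are table values for 12 and 16 degrees of freedom (assumed, not in the slides).
t_prime, se, (lower, upper) = cochran_interval(
    4.5, 0.3, 13, 3.7, 1.0, 17, t1=2.1788, t2=2.1199)
print(round(t_prime, 4), round(se, 4), round(lower, 3), round(upper, 3))
```

Under these assumptions the reliability factor is about 2.126 and the limits come out near 0.255 and 1.345.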
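  • 37. Flowchart for deciding whether the reliability factor should be z, t, or t′ when making inferences about the difference between two population means (shown here as a table; * = a nonparametric procedure is to be applied):
      Populations normal? | Samples large? | Population variances known? | Variances equal? | Use
      Yes                 | Yes            | Yes                         | Yes              | z
      Yes                 | Yes            | Yes                         | No               | z
      Yes                 | Yes            | No                          | Yes              | t (or z)
      Yes                 | Yes            | No                          | No               | t′ (or z)
      Yes                 | No             | Yes                         | Yes              | z
      Yes                 | No             | Yes                         | No               | z
      Yes                 | No             | No                          | Yes              | t
      Yes                 | No             | No                          | No               | t′
      No                  | Yes            | Yes                         | Yes              | z
      No                  | Yes            | Yes                         | No               | z
      No                  | Yes            | No                          | Yes              | z
      No                  | Yes            | No                          | No               | t′
      No                  | No             | Yes or No                   | Yes or No        | *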
  • 38. Confidence interval for a population proportion. The population proportion is in many situations a matter of great interest to the health professional. What proportion of people suffer from a given disease or disability? What proportion of patients respond to a particular treatment? These are important questions in health work. We estimate a population proportion in the same way as a population mean. A random sample is drawn from the population of interest, and the sample proportion $\hat{p}$ provides an unbiased estimate of the population proportion $p$. A confidence interval is obtained by the general formula estimator ± (reliability coefficient) × (standard error). When both $np$ and $n(1-p)$ are greater than 5, the sampling distribution of $\hat{p}$ is quite close to the normal distribution.
  • 39. When the above condition is met, the $100(1-\alpha)$% confidence interval for $p$ is given by $\hat{p} \pm z_{(1-\alpha/2)} \sqrt{\hat{p}(1-\hat{p})/n}$. Example: Mathers et al. (A-12) found that in a sample of 591 patients admitted to a psychiatric hospital, 204 admitted to using cannabis at least once in their lifetime. We wish to construct a 95% confidence interval for the proportion of lifetime cannabis users in the sampled population of psychiatric hospital admissions. Solution: We are 95% confident that the population proportion $p$ is between 0.3069 and 0.3835.
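The cannabis example gives only the final limits, so the calculation is sketched here. `proportion_interval` is a hypothetical helper name, the large-sample check uses the sample proportion in place of the unknown $p$ (an assumption), and 1.96 is the usual table value for 95% confidence.

```python
from math import sqrt

def proportion_interval(successes, n, z):
    """Large-sample interval p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    # Normal approximation check, using p_hat in place of the unknown p.
    if n * p_hat <= 5 or n * (1 - p_hat) <= 5:
        raise ValueError("sample too small for the normal approximation")
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

# 204 lifetime cannabis users among 591 admissions.
p_hat, se, (lower, upper) = proportion_interval(204, 591, z=1.96)
print(round(p_hat, 4), round(lower, 3), round(upper, 3))
```

Computed without intermediate rounding, the limits agree with the stated 0.3069 and 0.3835 to about three decimal places; the small difference in the fourth decimal presumably comes from rounding $\hat{p}$ and the standard error before combining them.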
  • 40. Confidence interval for the difference between two population proportions. An unbiased point estimator of the difference between two population proportions, $p_1 - p_2$, is provided by the difference between the sample proportions, $\hat{p}_1 - \hat{p}_2$. When the sample sizes $n_1$ and $n_2$ are large and the population proportions are not too close to 0 or 1, normal distribution theory may be applied to obtain confidence intervals. A $100(1-\alpha)$% confidence interval for $p_1 - p_2$ is given by $(\hat{p}_1 - \hat{p}_2) \pm z_{(1-\alpha/2)} \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$.
  • 41. Example: Borst et al. (A-16) investigated the relation of ego development, age, gender, and diagnosis to suicidality among adolescent psychiatric inpatients. Their sample consisted of 96 boys and 123 girls between the ages of 12 and 16 selected from admissions to a child and adolescent unit of a private psychiatric hospital. Suicide attempts were reported by 18 of the boys and 60 of the girls. Let us assume that the girls behave like a simple random sample from a population of similar girls and that the boys likewise may be considered an SRS from a population of similar boys. For these two populations, we wish to construct a 99% confidence interval for the difference between the proportions of suicide attempters. Solution: We are 99% confident that, for the sampled populations, the proportion of suicide attempts among girls exceeds the proportion of suicide attempts among boys by somewhere between 0.1450 and 0.4556.
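The two-proportion interval for this example can be sketched as follows. `diff_proportions_interval` is a hypothetical helper name, and 2.58 is the table value for 99% confidence used elsewhere in the lecture.

```python
from math import sqrt

def diff_proportions_interval(x1, n1, x2, n2, z):
    """Large-sample interval for p1 - p2 from counts x1 of n1 and x2 of n2."""
    p1 = x1 / n1
    p2 = x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, se, (diff - z * se, diff + z * se)

# Group 1: girls (60 of 123); group 2: boys (18 of 96).
diff, se, (lower, upper) = diff_proportions_interval(60, 123, 18, 96, z=2.58)
print(round(diff, 4), round(se, 4), round(lower, 4), round(upper, 4))
```

Without intermediate rounding the limits are within about 0.0002 of the stated 0.1450 and 0.4556, a difference that is consistent with rounding the proportions and standard error before combining them.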
  • 42. Example: Goldberg et al. (A-24) conducted a study to determine if an acute dose of dextroamphetamine might have positive effects on affect and cognition in schizophrenic patients maintained on a regimen of haloperidol. Among the variables measured was the change in patients' tension-anxiety states. For $n_2 = 4$ patients who responded to amphetamine, the standard deviation was 5.8. Let us assume that these patients constitute independent simple random samples from populations of similar patients. Let us also assume that the change score in tension-anxiety state is a normally distributed variable in both populations. We wish to construct a 95% confidence interval for the ratio of the variances of these two populations. Solution: $0.2081 < \sigma_1^2/\sigma_2^2 < 14.0554$.