By
Dr Muhammad Safdar Baig
Associate Professor
Oral & Dental Surgery
BVH/QMC, Bahawalpur
For
Post Graduate Trainees
Bahawal Victoria Hospital &
Quaid-e-Azam Medical College, Bahawalpur 1
2
safdarbeg@gmail.com
safdar_b@yahoo.com
safdar_b@hotmail.com
03006821103
Kuzma & Rosner – Biostat
Leon Gordis – Fundamentals of
Epidemiology
3
4
RESEARCH is
A Process of Systematic, Scientific Data
 Collection
 Analysis &
 Interpretation
So as to find Solutions to a problem.
5
6
7
A Variable is a characteristic of a person, object or
phenomenon that can take on different values.
A simple example of a variable is a person’s age.
The variable age can take on different values
because a person can be 20 years old, 35 years
old, and so on.
8
Dependent and independent variables
Because in health system research you often look for
causal explanations, it is important to make a distinction
between dependent and independent variables.
The variable that is used to describe or measure the
problem under study (outcome) is called the
DEPENDENT variable.
The variables that are used to describe or measure the
factors that are assumed to cause or at least to
influence the problem are called the INDEPENDENT
(exposure) variables.
9
Data
Data are values of the observation recorded
for variables (e.g. age, weight, gender).
10
TYPES of DATA
Qualitative or categorical data:
Characteristics that cannot be expressed numerically, such as sex,
ethnicity, healing, etc.
Quantitative or numerical data:
Characteristics that can be expressed numerically, such as age,
temperature, or number of children in a family.
Categorical Data
There are two types of categorical data:
• Nominal
• Ordinal data.
11
NOMINAL DATA
 In NOMINAL DATA, the variables are divided into
named categories. These categories however, cannot be
ordered one above another (as they are not greater or less
than each other).
 Example:
NOMINAL DATA CATEGORIES
Sex/ Gender: male, female
Marital status: single, married, widowed,
separated, divorced
12
ORDINAL DATA
 In ORDINAL DATA, the variables are also divided into a
number of categories, but they can be ordered one above
another, from lowest to highest or vice versa.
 Example:
ORDINAL DATA CATEGORIES
Level of knowledge: good, average, poor
Level of blood pressure: high, moderate, low
13
Presentation of Data
 Data once collected should be presented in such a way
as to be easily understood. The style of presentation
depends, of course, on the type of data.
 Data can be presented as frequency tables, charts,
graphs, etc. Here we discuss some of the
important means of presentation.
14
FREQUENCY TABLES
 In a FREQUENCY TABLE data is
presented in a tabular form. It gives the
frequency with which (or the number of
times) a particular value appears in the
data.
15
Systolic Blood Pressure of patients coming to a
tertiary care hospital OPD
Systolic BP (mmHg)   Frequency   Relative frequency   Cumulative relative frequency
Below 100                 6            0.10                    0.10
100 – 120                 9            0.15                    0.25
121 – 140                24            0.40                    0.65
141 – 160                15            0.25                    0.90
Above 160                 6            0.10                    1.00
n = 60
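The relative and cumulative relative columns of the table can be derived from the raw frequencies alone. The following is a minimal Python sketch of that arithmetic; the function name `frequency_table` is an illustrative choice, not part of the original slides.

```python
# Frequencies from the systolic blood pressure table (n = 60).
classes = ["Below 100", "100-120", "121-140", "141-160", "Above 160"]
frequencies = [6, 9, 24, 15, 6]


def frequency_table(counts):
    """Return (relative, cumulative relative) frequencies for a list of counts."""
    total = sum(counts)
    relative = [c / total for c in counts]
    cumulative = []
    running = 0
    for c in counts:
        running += c                      # running total of observations so far
        cumulative.append(running / total)
    return relative, cumulative


relative, cumulative = frequency_table(frequencies)
for label, f, r, c in zip(classes, frequencies, relative, cumulative):
    print(f"{label:<10} {f:>3} {r:>6.2f} {c:>6.2f}")
```

The last cumulative value is always 1.00, which is a quick check that no class was missed.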
16
Graphs
 Another way to summarize and display data is
through the use of graphs, i.e. pictorial
representations of numerical data. Graphs should
be designed so that they convey at a single glance
the general patterns in a set of data.
17
Bar charts
 Bar charts are used for nominal or ordinal data.
[Figure: bar chart. Cigarette consumption of persons 18 years
of age or older, United States, 1900 - 1990. X-axis: Years (1900 to 1990);
Y-axis: No. of cigarettes (0 to 4500)]
18
Pie chart
 Pie charts can also be used to display nominal or ordinal data.
[Figure: pie chart. Gender distribution: Male 70%, Female 30%]
19
Histogram
A histogram depicts a frequency distribution for quantitative data.
[Figure: histogram showing distribution of Age (years)]
20
SUMMARIZATION
OF DATA
21
MEASURES OF CENTRAL TENDENCY
•Mean
•Median
•Mode
22
MEAN
 The MEAN (or arithmetic mean) is also
known as the AVERAGE. It is calculated by
totaling the results of all the observations
and dividing by the total number of
observations. Note that the mean can only
be calculated for numerical data.
23
MEDIAN
The MEDIAN is the value that divides a distribution
into two equal halves.
 The median is useful when some measurements are much
bigger or much smaller than the rest. The mean of such
data will be biased toward these extreme values.
 The median is not influenced by extreme values.
24
MODE
 The MODE is the most frequently occurring
value in a set of observations.
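The three measures of central tendency can be compared on a small data set. The sketch below uses Python's standard `statistics` module; the ages are hypothetical values chosen so that one extreme observation pulls the mean upward while leaving the median unchanged.

```python
import statistics

# Hypothetical ages (years); the last value is deliberately extreme.
ages = [20, 22, 22, 25, 27, 30, 95]

mean_age = statistics.mean(ages)      # sum of observations / number of observations
median_age = statistics.median(ages)  # middle value of the sorted data
mode_age = statistics.mode(ages)      # most frequently occurring value

print(f"mean = {mean_age:.1f}, median = {median_age}, mode = {mode_age}")
```

Here the mean is well above the median because of the single value of 95, which illustrates why the median is preferred when a few measurements are much bigger or smaller than the rest.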
25
MEASURES OF VARIATION
Range: the difference in value between the
highest (maximum) and the lowest (minimum) observation.
Variance: quantifies the amount of variability or spread
about the mean of the sample.
Standard deviation: the square root of the variance.
26
Standard Deviation
 The STANDARD DEVIATION is a measure, which
describes how much individual measurements differ,
on the average, from the mean.
 A large standard deviation shows that there is a wide
scatter of measured values around the mean, while a
small standard deviation shows that the individual
values are concentrated around the mean with little
variation among them.
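The measures of variation can be computed in the same way. This is a small sketch with hypothetical measurements; it assumes the sample versions of the variance and standard deviation (dividing by n - 1), which is what `statistics.variance` and `statistics.stdev` compute.

```python
import statistics

values = [4, 6, 8, 10, 12]  # hypothetical measurements

value_range = max(values) - min(values)   # highest minus lowest observation
variance = statistics.variance(values)    # sample variance, divides by n - 1
std_dev = statistics.stdev(values)        # square root of the sample variance

print(f"range = {value_range}, variance = {variance}, SD = {std_dev:.2f}")
```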
27
Standard error of the mean
When we draw a sample from a study population and compute its
sample mean, it is not likely to be identical to the population
mean. If we draw another sample from the same population and
compute its sample mean, this may also not be identical to the
first sample mean. It probably also differs from the true mean of
the total population from which the sample was drawn. This
phenomenon is called sampling variation.
Standard error: The standard error gives an estimate of the
degree to which the sample mean varies from the population
mean. This measure is used to calculate the confidence interval (CI).
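The slides do not give a formula for the standard error. A common textbook estimate, assumed here, is the sample standard deviation divided by the square root of the sample size. The readings below are hypothetical.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Hypothetical haemoglobin readings (g/dl) from one sample.
sample = [10.1, 11.4, 9.8, 12.0, 10.7, 11.1, 10.3, 11.6]
se = standard_error(sample)
print(f"mean = {statistics.mean(sample):.2f}, SE = {se:.2f}")
```

Because n is in the denominator, a larger sample gives a smaller standard error for the same spread of values.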
28
THE NORMAL DISTRIBUTION
 Many variables have a normal distribution. This is a
bell shaped curve with most of the values clustered
near the mean and a few values out near the tails.
29
 The normal distribution is symmetrical around
the mean. The mean, the median and the mode
of a normal distribution have the same value.
 An important characteristic of a normally
distributed variable is that 95% of the
measurements have value which are
approximately within 2 standard deviations
(SD) of the mean.
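The "95% within about 2 SD" property can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, SD 1


def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return z.cdf(k) - z.cdf(-k)


print(f"within 1.96 SD: {share_within(1.96):.4f}")
print(f"within 2 SD:    {share_within(2):.4f}")
```

The exact multiplier for 95% is about 1.96, which is why "2 standard deviations" is described as approximate.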
30
THE NORMAL DISTRIBUTION
31
Estimation
The process of using sample information to
draw conclusion about the value of a
population parameter is known as
estimation.
32
Point Estimate
 A point estimate is a specific numerical value
estimate of a parameter.
 The best point estimate of the population mean µ is
the sample mean X̄
 But how good is a point estimate?
 There is no way of knowing how close the point
estimate is to the population mean
 Statisticians prefer another type of estimate called
an interval estimate
33
Interval Estimate
 An interval estimate of a parameter is an interval or a range of
values used to estimate the parameter
Confidence Level
 The confidence level of an interval estimate of a parameter is
the probability that the interval estimate will contain the
parameter
 Three commonly used confidence levels are 90%, 95% and 99%
 For a given sample, a higher confidence level gives a wider interval;
to be more confident without losing precision, the sample size must be
larger
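An interval estimate for a mean can be sketched as follows. This assumes the normal-approximation interval (mean plus or minus z times SD / sqrt(n)), which is one common approach and is not spelled out in the slides. The sample figures are hypothetical.

```python
import math
from statistics import NormalDist


def confidence_interval(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a population mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin


# Hypothetical sample: mean 120 mmHg, SD 15 mmHg, n = 60.
for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(120, 15, 60, level)
    print(f"{level:.0%} CI: {low:.1f} to {high:.1f}")
```

With the sample held fixed, the interval widens as the confidence level rises; increasing n narrows it again.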
34
A Presentation
35
RATIO
 The most basic measure of distribution.
 Obtained by simply dividing one quantity by
another without implying any specific
relationship between the numerator and
denominator, such as the number of
stillbirths per thousand live births.
 In ratio, the numerator & denominator are
mutually exclusive.
36
PROPORTION
 A proportion is a type of ratio in which those who
are included in the numerator must also be
included in the denominator.
 For example: the proportion of women over the age
of 50 who have had a hysterectomy, or the number
of fetal deaths out of the total number of births (live
births plus fetal deaths).
37
RATE
 A rate is a proportion with specifications of time.
There is a distinct relationship between the
numerator and denominator with a measure of
time being an intrinsic part of the denominator.
 For example, the number of newly diagnosed
cases of breast cancer per 100,000 women during
a given year.
38
IMPORTANT POINT
 It is necessary to be very specific about what constitutes
both the numerator and the denominator. In some
circumstances, it is important to make clear whether the
measure represents the number of events or the number
of individuals.
 For example, the frequency of myopia among a
population of school children could represent the
number of affected eyes in relation to total eyes, or the
number of children affected in one or both eyes relative
to all students.
39
PREVALENCE
 Prevalence quantifies the proportion of individuals
in a population who have the disease at a specific
instant and provides an estimate of the probability
(risk) that an individual will be ill at a point in time
 The formula for calculating the prevalence P =
number of existing cases of a disease/ total
population (at a given point in time)
40
POINT PREVALENCE
 Prevalence can be thought of as the status of the
disease in a population at a point in time and as such
is also referred to as point prevalence.
 This "point" can refer to a specific point in calendar
time or to a fixed point in the course of events that
varies in real time from person to person, such as the
onset of menopause or puberty or the third
postoperative day.
41
PERIOD PREVALENCE
 It represents the proportion of cases that exist within a
population at any point during a specified period of time.
 The numerator thus includes cases that were present at
the start of the period plus new cases that developed
during this time
E.g. Frequency of patients receiving Psychiatric Rx
between May 31 – Dec 01 2008
42
INCIDENCE:
 Incidence quantifies the number of new
events or cases of disease that develop in a
population of individuals at risk during a
specified time interval.
43
Cumulative incidence (CI)
 Is the proportion of people who become
diseased during a specified period of time. It
provides an estimate of the probability, or
risk, that an individual will develop a disease
during a specified period of time
CI = (No. of new cases of a disease) / (Total population at risk)
44
Issues in the Calculation of Measures of Incidence
 For any measure of disease frequency,
precise definition of the denominator is
essential for both accuracy and clarity. This
is a particular concern in the calculation of
incidence. The denominator of a measure of
incidence should include only those who are
considered "at risk" of developing the
disease.
45
Contd.
 That is, the total population from which
the new cases could arise. Consequently,
those who currently have or have already
had the disease under study or persons
who cannot develop the disease for reasons
such as age, immunization, or prior
removal of the involved organ should be
excluded from the denominator.
46
Special Types of Incidence Rates
MORBIDITY RATE
Is the incidence rate of non fatal cases in the total population at risk
during a specified period of time.
For example, the morbidity rate of tuberculosis (TB) in the U.S. in 1982
can be calculated by dividing the number of nonfatal cases newly
reported during that year by the total U.S. midyear population.
Morbidity rate = (Total no. of nonfatal cases of TB in population at risk) / (Mid-year population)
= 25,520 / 231,534,000
= 11.0 per 100,000 population
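The scaling to "per 100,000" is a simple multiplication. The short sketch below reproduces the TB figure from the slide; the helper name `rate_per` is illustrative.

```python
def rate_per(events, population, per=100_000):
    """Events divided by population, scaled to a standard population size."""
    return events / population * per


tb_rate = rate_per(25_520, 231_534_000)
print(f"{tb_rate:.1f} per 100,000 population")
```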
47
MORTALITY RATE
 It expresses the incidence of deaths in a particular
population during a period of time.
 It is calculated by dividing the number of fatalities
during that period by the total population.
 This can be further divided into cause-specific or
all-cause mortality.
48
Measures of
Association
49
Measures of Association
 Relative risk (cohort study)
 Odds ratio (case control)
50
Cohort Studies
               Diseased   Non Diseased   Total
Exposed           a            b          a+b
Non Exposed       c            d          c+d
51
Relative Risk
 Incidence in exposed individuals = a / (a+b)
Or proportion of exposed people who develop the disease
 Incidence in non-exposed individuals = c / (c+d)
Or proportion of non-exposed people who develop the disease
Relative Risk = (Incidence in exposed) / (Incidence in non-exposed)
RR = [a / (a+b)] / [c / (c+d)]
52
Calculating the Relative Risk
Disease Status
              CHD +   CHD –   Total
Smoker         112     176     288
Non smoker      88     224     312
Incidence in exposed = a / (a+b) = 112 / 288 = 0.39
Incidence in non exposed = c / (c+d) = 88 / 312 = 0.28
RR = (112 / 288) / (88 / 312) = 1.38
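The relative risk calculation for the smoking and CHD table can be written as a small function. This is a minimal sketch using the a, b, c, d cell labels of the cohort table; the function name is illustrative.

```python
def relative_risk(a, b, c, d):
    """Relative risk from a cohort 2x2 table.

    a, b = exposed with / without disease
    c, d = non-exposed with / without disease
    """
    incidence_exposed = a / (a + b)
    incidence_unexposed = c / (c + d)
    return incidence_exposed / incidence_unexposed


rr = relative_risk(112, 176, 88, 224)  # smokers vs non-smokers, CHD
print(f"RR = {rr:.2f}")
```

Working from the unrounded incidences (112/288 and 88/312) avoids the small error introduced by dividing two already-rounded figures.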
53
Interpretation of RR
 Compared to non-smokers, smokers
have 1.38 times the risk of
developing CHD (a 38% higher risk)
54
Odds Ratio
 Incidence cannot be measured in case control
studies because we start with the diseased
people (cases) and non diseased people
(controls), hence we calculate OR
55
Case Control
               Cases   Controls   Total
Exposed          a        b        a+b
Non Exposed      c        d        c+d
Total           a+c      b+d
OR = (a/c) / (b/d) = ad/bc
56
Passive Smoking & Breast Cancer
                            Breast cancer   No breast cancer   Total
Exposed (Passive Smokers)      140 (a)          370 (b)         510
Not exposed                     40 (c)          234 (d)         274
Odds of exposure among cases = 140 / 40 = 3.5
Odds of exposure among controls = 370 / 234 = 1.6
OR = 3.5 / 1.6 = 2.2
Compared to the controls, the odds of being a passive smoker are 2.2 times
higher among breast cancer cases
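The odds ratio for the passive smoking example can be computed directly from the cross-product ad/bc. A minimal sketch, with an illustrative function name:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a case-control 2x2 table (cross-product ratio ad/bc).

    a, b = exposed cases / exposed controls
    c, d = non-exposed cases / non-exposed controls
    """
    return (a * d) / (b * c)


or_passive = odds_ratio(140, 370, 40, 234)
print(f"OR = {or_passive:.1f}")
```

The cross-product form gives the same result as dividing the two odds (140/40 by 370/234) but does not round the intermediate values.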
57
58
59
Bias is:
Any systematic error that results in an incorrect
estimate of the association between the exposure and
outcome. Usually introduced by the experimenter or
the researcher himself due to non-standardized
measuring techniques.
60
Types of Bias:
Selection Bias
Observation (Information/Misclassification) Bias
Recall Bias
Interviewer Bias
Loss to follow-up
61
Bias can be controlled:
In study design, through
Choice of study population
In data collection, through
Uniform source of information
Efficient questionnaire development
Standardization of measurement technique
Blinding
62
63
The concept of confounding is a central one in the
interpretation of any epidemiological study.
Confounding can be thought of as mixing of the effect
of the exposure under study on the disease with that
of an extraneous factor.
This external factor or variable must be associated
with the exposure and, independently of the exposure,
must be a risk factor for the disease.
64
Example of confounding
[Diagram: Smoking and MI, with age as the confounding factor]
65
Table 1. Relation of Myocardial Infarction (MI) to
Recent Oral Contraceptive (OC) Use.
Recent OC use   MI +ve   MI -ve   Estimated relative risk
Yes                29      135    1.68
No                205     1607
Total             234     1742
66
Table 2. Age-specific Relation of Myocardial Infarction (MI) to
Recent Oral Contraceptive (OC) Use.
Age (yrs)   Recent OC use   MI +ve   MI -ve   Estimated age-specific relative risk
25 – 29     Yes                 4       62    7.2
            No                  2      224
30 – 34     Yes                 9       33    8.9
            No                 12      390
35 – 39     Yes                 4       26    1.5
            No                 33      330
40 – 44     Yes                 6        9    3.7
            No                 65      362
45 – 49     Yes                 6        5    3.9
            No                 93      301
Total                         234     1742
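The crude and age-specific estimates in the two tables can be reproduced with the cross-product ratio ad/bc. That this is how the slide's "estimated relative risk" values were obtained is an assumption, but the results match the printed figures. The dictionary layout and names below are illustrative.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc for a 2x2 table."""
    return (a * d) / (b * c)


# Per age stratum: (OC yes & MI+, OC yes & MI-, OC no & MI+, OC no & MI-)
strata = {
    "25-29": (4, 62, 2, 224),
    "30-34": (9, 33, 12, 390),
    "35-39": (4, 26, 33, 330),
    "40-44": (6, 9, 65, 362),
    "45-49": (6, 5, 93, 301),
}

# Crude table: add the corresponding cells across all age groups.
crude_cells = [sum(cells[i] for cells in strata.values()) for i in range(4)]
crude = odds_ratio(*crude_cells)
stratum_specific = {age: odds_ratio(*cells) for age, cells in strata.items()}

print(f"crude estimate = {crude:.2f}")
for age, value in stratum_specific.items():
    print(f"{age}: {value:.1f}")
```

Most stratum-specific estimates are well above the crude value of 1.68, which is the pattern expected when age confounds the association.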
67
Confounding can be controlled in study design
through:
 Restriction
Matching exposure
 Randomization
Confounding can be controlled in analysis
through:
 Stratification
 Multivariate analysis
68
The roles of confounding, chance and bias have to be
evaluated in every study. This requires appropriate selection of the
population to be studied and a proper study design, so
that the results are valid and can be applied to other populations,
i.e. they are generalizable.
69
Evaluation of the role of chance consists
of two components:-
1. Hypothesis testing
2. Estimation of the confidence interval
70
71
WHAT IS A HYPOTHESIS?
Hypothesis: A testable theory, or
statement of belief, used in the evaluation
of a population parameter of interest,
e.g. a mean or proportion
72
 Suppose a study is being conducted to answer
questions about differences between two regimens for the
management of diarrhea in children:
the sugar based modern ORS and the time-tested indigenous
herbal solution made from locally available herbs.
 One question that could be asked is:
"In the population is there a difference in overall
improvement (after three days of treatment) between the ORS
and the herbal solution?"
73
There could be only two
answers to this question:
Yes
No
74
Null Hypothesis
"There is no difference between the 2 regimens in
terms of improvement" (null hypothesis).
A null hypothesis is usually a statement that there
is no difference between groups or that one factor is
not dependent on another. It corresponds to the
"No" answer.
75
Alternative Hypothesis
 "There is a difference in terms of improvement achieved by a
three-day treatment with the ORS and that of the herbal
solution" (alternative hypothesis).
 Associated with the null hypothesis there is always another
hypothesis or implied statement concerning the true relationship
among the variables or conditions under study if "No" is an
implausible answer. This statement is called the alternative
hypothesis and corresponds to the "Yes" answer.
76
TYPES OF ALTERNATE HYPOTHESIS
o Directional
o Non Directional
77
THE NORMAL DISTRIBUTION
78
WHY TEST HYPOTHESIS
Hypothesis testing permits generalization
of an association or a difference obtained
from a sample to the population from which
it came.
Hypothesis testing involves conducting a
test of statistical significance and
quantifying the degree to which sampling
variability may account for the result
observed in a particular study. It entails the
following steps.
79
STEPS IN HYPOTHESIS TESTING
1. Statement of research question in terms of
statistical hypothesis (Null and alternate
hypothesis)
2. Selection of an appropriate level of
significance. The significance level is the
risk we are willing to take that a sample
which showed a difference was misleading.
A 5% significance level means that we accept
a 5% chance of rejecting the null hypothesis
when it is actually true.
80
STEPS IN HYPOTHESIS TESTING
3. Choosing an appropriate test statistic:
t test or z test for continuous data, chi square for
proportions, etc.
The test statistic is computed from the sample
data and is used to determine whether the
null hypothesis should be rejected or
retained.
The test statistic yields the p value.
81
P value: Indicates the probability or likelihood of
obtaining a result at least as extreme as that observed in
a study by chance alone, assuming that there is truly no
association between exposure and outcome under
consideration.
By convention the significance level is set at 0.05. Thus any
value of p less than or equal to 0.05 indicates that there is
at most a 5% probability of observing an association as
large as or larger than that found in the study due to chance
alone, given that there is no association between
exposure and outcome. If the p value > 0.05, do not reject the
null hypothesis.
82
4. Performing calculations and obtaining p value
5. Drawing conclusions, rejecting null
hypothesis if the p value is less than the set
significance level
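The five steps can be walked through on the smoking and CHD table from the relative risk section. The sketch below assumes a Pearson chi-square test without continuity correction, which is one reasonable choice for comparing two proportions; the slides do not specify which test was intended for that table. The p value uses the identity that, with 1 degree of freedom, the upper tail of the chi-square distribution equals erfc(sqrt(chi2 / 2)).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and p value for a 2x2 table."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom: P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Steps 1-2: H0 = no association between smoking and CHD; significance level 5%.
alpha = 0.05
# Steps 3-4: compute the test statistic and its p value.
chi2, p = chi_square_2x2(112, 176, 88, 224)
# Step 5: draw the conclusion.
decision = "reject H0" if p < alpha else "do not reject H0"
print(f"chi-square = {chi2:.2f}, p = {p:.4f}: {decision}")
```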
83
SAMPLE SIZE
ESTIMATION
84
Sample size calculations depend on:
1. Type of study.
2. Magnitude of the outcome of interest derived
from previous studies.
3. Type of statistical analysis
required (comparing means or proportions).
4. Level of significance / Power.
85
Sample size for single proportion
depends on:
1. The prevalence of the
condition/attribute of interest.
2. Level of confidence.
3. Margin of error.
86
Example of Sample size calculation for single
proportion
 A local health department wishes to estimate the
prevalence of tuberculosis among children under 5
years of age in a locality. How many children should
be included in the sample so that the prevalence
may be estimated within 5 percentage points of the true value
with 95% confidence, if it is known that the true
rate is unlikely to exceed 20%?
87
Sample size calculation and formula for single
proportion
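The formula slide itself did not survive in this transcript. A widely used formula for this situation, assumed here, is n = z² p (1 − p) / d², where p is the anticipated proportion, d the margin of error and z the standard normal value for the chosen confidence level. Applied to the tuberculosis example:

```python
import math
from statistics import NormalDist


def n_single_proportion(p, margin, confidence=0.95):
    """Sample size to estimate one proportion within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


n_tb = n_single_proportion(p=0.20, margin=0.05)
print(f"children required: {n_tb}")
```

The result is rounded up because a fraction of a subject cannot be sampled.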
88
Sample size for single group mean
depends on:
1. The standard deviation of the
variable of interest (from previous studies).
2. Level of confidence.
3. Margin of error.
89
Example of Sample size calculation for single group
mean
 A district medical officer seeks to estimate the mean
hemoglobin level among pregnant women in his
district. A previous study of pregnant women showed
an average hemoglobin level of 8.2 g/dl and a standard
deviation of 4.2 g/dl. Assuming a sample of pregnant
women is to be selected, how many pregnant women
must be studied if he wants the estimate to fall
within 1 g/dl of the true mean with 95% confidence?
90
Sample size calculation and formula for single
group mean
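As with the single proportion, the formula slide is missing from the transcript. The sketch below assumes the standard formula n = z² σ² / d², with σ the standard deviation from the previous study and d the margin of error, and applies it to the hemoglobin example.

```python
import math
from statistics import NormalDist


def n_single_mean(sd, margin, confidence=0.95):
    """Sample size to estimate one mean within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sd / margin) ** 2)


n_hb = n_single_mean(sd=4.2, margin=1.0)
print(f"pregnant women required: {n_hb}")
```

Only the standard deviation and the margin of error enter the calculation; the previous mean of 8.2 g/dl does not affect the result.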
91
Sample size for two proportions
depends on:
1. The prevalence of the condition /
attribute of interest for both groups.
2. Level of confidence.
3. Power of the test.
92
Example of Sample size calculation for two
proportions
 It is believed that the proportion of patients who
develop complications after undergoing one type
of surgery is 5%, while the proportion of
patients who develop complications after a
second type of surgery is 15%. How large should
the sample size be in each of the two groups of
patients if an investigator wishes to detect, with a
power of 90%, whether the second procedure has
a complication rate significantly higher than the
first at the 5% level of significance?
93
Sample size calculation and formula for two
proportions
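The formula slide is again missing. One commonly used formula for comparing two proportions, assumed here, combines the pooled proportion under the null hypothesis with the separate proportions under the alternative. Because the question asks whether the second rate is higher, a one-sided test is assumed.

```python
import math
from statistics import NormalDist


def n_two_proportions(p1, p2, alpha=0.05, power=0.90, one_sided=True):
    """Sample size per group for detecting a difference between two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha if one_sided else 1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2  # pooled proportion under the null hypothesis
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


n_surgery = n_two_proportions(0.05, 0.15)
print(f"patients required per group: {n_surgery}")
```

Switching to a two-sided test (`one_sided=False`) raises the required number, since the same alpha is split across both tails.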
94
Sample size for two group means
depends on:
1. The means/variance for both groups.
2. Level of confidence.
3. Power of the test.
95
Example of Sample size calculation for two
group means
Suppose the true mean systolic blood pressure
(SBP) of 35 to 39 year old OC users is 132.86
mmHg with a standard deviation of 15.34 mmHg.
Similarly, for non-OC users, the mean SBP is
127.44 mmHg with a standard deviation of 18.23
mmHg. If we wish to detect the difference
between 2 groups of equal size, what would be the
minimal sample size required with a power of 80%
at the 95% confidence level?
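A sketch of this calculation follows. It assumes the usual formula for two independent groups of equal size, n = (z_alpha + z_beta)² (σ1² + σ2²) / (μ1 − μ2)², with a two-sided 5% significance level corresponding to the 95% confidence level in the question.

```python
import math
from statistics import NormalDist


def n_two_means(mean1, sd1, mean2, sd2, alpha=0.05, power=0.80):
    """Sample size per group for detecting a difference between two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = sd1 ** 2 + sd2 ** 2
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / (mean1 - mean2) ** 2)


n_sbp = n_two_means(132.86, 15.34, 127.44, 18.23)
print(f"women required per group: {n_sbp}")
```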
96
Calculator
97
Sample size - Calculation
98
Sample size for sensitivity and specificity
depends on:
1. The prevalence of the
condition/attribute of interest.
2. Estimated sensitivity.
3. Estimated specificity.
4. Level of significance.
5. Margin of error.
99
Example of Sample size calculation for sensitivity
and specificity
 We want to determine the sensitivity and
specificity of graded compression
ultrasonography (US) in the diagnosis of acute
appendicitis (AA), using histopathology as the
gold standard. How many patients should be
included in the sample if the prevalence of
AA is 77%, the estimated sensitivity of US is
96.5% and the estimated specificity is 94.1%, and
we want 95% confidence with a margin of
error of 10%?
100
Sample size calculation and formula for sensitivity
and specificity studies
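The formula on this slide is not preserved in the transcript. One published approach, assumed here, is Buderer's formula, which first estimates the number of diseased (or non-diseased) subjects needed and then divides by the prevalence (or its complement) to obtain the total number to recruit.

```python
import math
from statistics import NormalDist


def n_sensitivity_specificity(sens, spec, prevalence, margin, confidence=0.95):
    """Total subjects needed for sensitivity and for specificity (Buderer's formula)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_sens = z ** 2 * sens * (1 - sens) / (margin ** 2 * prevalence)
    n_spec = z ** 2 * spec * (1 - spec) / (margin ** 2 * (1 - prevalence))
    return math.ceil(n_sens), math.ceil(n_spec)


n_for_sens, n_for_spec = n_sensitivity_specificity(
    sens=0.965, spec=0.941, prevalence=0.77, margin=0.10
)
print(f"for sensitivity: {n_for_sens}, for specificity: {n_for_spec}")
```

A study that aims to estimate both measures would use the larger of the two numbers.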
101
Suggested websites for sample size calculators
1. http://www.raosoft.com/samplesize.html
2. http://www.quantitativeskills.com/sisa/calculations/samsize.htm
3. http://www.openepi.com/Menu/OpenEpiMenu.htm
102
103
Screening
 Screening for disease control can be defined as
the examination of asymptomatic people in
order to classify them as likely or unlikely to
have the disease that is the object of screening.
 If done in large groups, it is called mass screening or
population screening.
104
Characteristics of Disease to be Screened
 The disease must pass through a preclinical phase
during which it is undiagnosed but detectable
 Early treatment must offer some advantage
105
Validity
 The ability of a test to distinguish between those who
have the disease and those who do not
106
Sensitivity
 The sensitivity of a test is its ability to detect people who do
have the disease.
 If a test is positive for all diseased
persons, then the sensitivity of the test is
100%.
107
Specificity
 It is the ability of a test to identify people who
do not have the disease.
 Thus a test which is always negative in non-diseased
individuals is said to have 100%
specificity.
108
Validity
             Diseased   Non Diseased   Total
Positive      a (TP)       b (FP)       a+b
Negative      c (FN)       d (TN)       c+d
109
Test (FNA)   CA Breast Positive   CA Breast Negative   Total
Positive     60 (a, + +)          50 (b, - +)          110 (a + b)
Negative     20 (c, + -)          70 (d, - -)           90 (c + d)
Total        80 (a + c)           120 (b + d)          200 (a + b + c + d)
110
 Sensitivity = a / (a + c) x 100 = 60 / 80 x 100 = 75%
i.e. the test (FNA) is 75% sensitive in detecting disease
 Specificity = d / (d + b) x 100 = 70 / 120 x 100 = 58%
i.e. the specificity of FNA is 58% in identifying non-diseased
persons
111
Positive Predictive Value i.e. PPV
 PPV = a / (a + b) x 100 = 60 / 110 x 100 = 55%
i.e. 55% of persons with a positive test are actually suffering from
the disease.
PPV depends on prevalence: it rises as prevalence rises.
Negative Predictive Value i.e. NPV
 NPV = d / (c + d) x 100 = 70 / 90 x 100 = 78%
i.e. 78% of persons with a negative test are actually free from disease.
112
Test       Disease Present            Disease Not Present        Total
Positive   True Positive (TP) + +     False Positive (FP) - +    TP + FP
Negative   False Negative (FN) + -    True Negative (TN) - -     FN + TN
Total      TP + FN                    TN + FP                    TP + FP + TN + FN
•Sensitivity = TP / (TP + FN) x 100
•Specificity = TN / (TN + FP) x 100
•PPV = TP / (TP + FP) x 100
•NPV = TN / (TN + FN) x 100
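The four formulas can be collected into one helper and checked against the FNA example (60 true positives, 50 false positives, 20 false negatives, 70 true negatives). The function and key names are illustrative.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV (as percentages) from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn) * 100,  # diseased correctly detected
        "specificity": tn / (tn + fp) * 100,  # non-diseased correctly ruled out
        "ppv": tp / (tp + fp) * 100,          # positives that truly have disease
        "npv": tn / (tn + fn) * 100,          # negatives that are truly disease-free
    }


fna = diagnostic_metrics(tp=60, fp=50, fn=20, tn=70)
for name, value in fna.items():
    print(f"{name}: {value:.0f}%")
```

Sensitivity and specificity use the column totals of the table, while PPV and NPV use the row totals.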
113
Relationship of Disease Prevalence to PPV
Example: Sensitivity = 99%; Specificity = 95%, in a population of 10,000
with a disease prevalence of 1%
Dis. Prev   Test Results   Disease   Not Disease   Total
1%          Positive            99           495     594
            Negative             1          9405    9406
            Total              100          9900   10,000
PPV = 99 / 594 = 17%
114
Relationship of Disease Prevalence to PPV
Example: Sensitivity = 99%; Specificity = 95%, in a population of 10,000
with a disease prevalence of 5%
Dis. Prev   Test Results   Disease   Not Disease   Total
5%          Positive           495           475     970
            Negative             5          9025    9030
            Total              500          9500   10,000
PPV = 495 / 970 = 51%
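The two tables can be generalised: for a fixed sensitivity and specificity, PPV is a function of prevalence alone. The sketch below works with proportions rather than a population of 10,000, which gives the same ratio; the function name is illustrative.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value for a given prevalence (all inputs as proportions)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


for prevalence in (0.01, 0.05, 0.20):
    value = ppv(sensitivity=0.99, specificity=0.95, prevalence=prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {value:.0%}")
```

The 20% row is an extra illustrative value that is not in the slides; it shows the PPV continuing to rise as the screened population becomes higher risk.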
115
Relationship between PPV & Prevalence
 A screening program is most effective and
beneficial if it is directed to a high-risk target
population
 Screening a total population for a relatively
infrequent disease can be very wasteful of
resources and may yield very few previously
undetected cases
116
117
POINTS OF IMPORT IN DESIGNING A QUESTIONNAIRE
 It should be ensured that the format of the questionnaire be
attractive and easy for the respondents to fill, overcrowding or
clutter should be avoided and all questions and pages clearly
numbered
 The questionnaire should not be too long
 To maintain flow of the instrument, questions concerning
major areas should be grouped together
 Simple questions about age, birth date etc should be put at the
beginning to warm up the respondent
118
POINTS OF IMPORT IN DESIGNING A
QUESTIONNAIRE
 Questions should be close ended, possible answers to close
ended questions should be lined vertically, preceded by boxes,
brackets or numbers
Example
How many different medicines do you take daily (check one)
[ ] None
[ ] 1-2
[ ] 3-4
[ ] 5-6
[ ] 7 or more
119
POINTS OF IMPORT IN DESIGNING
A QUESTIONNAIRE
 If more details are required pertaining to a question, then the
filter/skip technique should be used to save time and allow
respondents to avoid irrelevant questions.
Example: Have you ever been told that you have hypertension?
Yes
No
If yes, proceed to the next question:
How long ago were you told that you have hypertension?
120
POINTS OF IMPORT IN DESIGNING A
QUESTIONNAIRE
 The wording of questions should be simple, free
from ambiguity and non-judgmental, and each question
should solicit only one response.
 For behaviors that may change over time, a specific
time span should be stated in the question
Example: During the past 12 months, how many
doctor visits did you make?
 Always choose an appropriate means of measurement,
e.g. scores/scales.
121
POINTS OF IMPORT IN DESIGNING
A QUESTIONNAIRE
 Sensitive topic questions should be left for the end
 If similar research instruments are available, it may
be a good idea to review them and, if required, borrow
questions.
 If questions are to be asked in any language besides
English, always try to ensure that they are also
written in that language
THANK YOU

Biostatistics in Research Methodoloyg Presentation.pptx

  • 1. By Dr Muhammad Safdar Baig Associate Professor Oral & Dental Surgery BVH/QMC, Bahawalpur For Post Graduate Trainees Bahawal Victoria Hospital & Quaid-e-Azam Medical College, Bahawalpur 1
  • 2. 2 safdarbeg@gmail.com safdar_b@yahoo.com safdar_b@hotmail.com 03006821103 Kuzma & Rosner – Biostat Leon Gordis – Fundamentals of Epidemiology
  • 3. 3
  • 4. RESEARCH is a process of systematic, scientific data collection, analysis and interpretation, so as to find solutions to a problem.
  • 5. 5
  • 6. 6
  • 7. 7 A Variable is a characteristic of a person, object or phenomenon that can take on different values. A simple example of a variable is a person’s age. The variable age can take on different values because a person can be 20 years old, 35 years old, and so on.
  • 8. Dependent and independent variables: Because in health system research you often look for causal explanations, it is important to make a distinction between dependent and independent variables. The variable that is used to describe or measure the problem under study (outcome) is called the DEPENDENT variable. The variables that are used to describe or measure the factors that are assumed to cause or at least to influence the problem are called the INDEPENDENT (exposure) variables.
  • 9. 9 Data Data are values of the observation recorded for variables (e.g. age, weight, gender).
  • 10. TYPES of DATA. Qualitative or categorical data: characteristics which can't be expressed numerically, such as sex, ethnicity, healing, etc. Quantitative or numerical data: characteristics which can be expressed numerically, such as age, temperature, or number of children in a family. Categorical data: there are two types of categorical data, nominal and ordinal.
  • 11. NOMINAL DATA: In NOMINAL DATA, the variables are divided into named categories. These categories, however, cannot be ordered one above another (as they are not greater or less than each other). Examples: sex/gender (male, female); marital status (single, married, widowed, separated, divorced).
  • 12. ORDINAL DATA: In ORDINAL DATA, the variables are also divided into a number of categories, but they can be ordered one above another, from lowest to highest or vice versa. Examples: level of knowledge (good, average, poor); level of blood pressure (high, moderate, low).
  • 13. Presentation of Data: Data once collected should be presented in such a way as to be easily understood. The style of presentation depends, of course, on the type of data. Data can be presented as frequency tables, charts, graphs, etc. Here we discuss some of the important means of presentation.
  • 14. 14 FREQUENCY TABLES  In a FREQUENCY TABLE data is presented in a tabular form. It gives the frequency with which (or the number of times) a particular value appears in the data.
  • 15. Systolic blood pressure of patients coming to a tertiary care hospital OPD (n = 60)

        SBP (mmHg)    Frequency    Relative frequency    Cumulative relative frequency
        Below 100         6              0.10                      0.10
        100 – 120         9              0.15                      0.25
        121 – 140        24              0.40                      0.65
        141 – 160        15              0.25                      0.90
        Above 160         6              0.10                      1.00
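
As a rough illustration of how the relative and cumulative relative columns follow from the raw counts, here is a minimal Python sketch. The class counts are the ones in the table; the function name `relative_and_cumulative` is only an illustrative choice, not something from the slides.

```python
# Class frequencies taken from the systolic blood pressure table (n = 60).
frequencies = {
    "Below 100": 6,
    "100-120": 9,
    "121-140": 24,
    "141-160": 15,
    "Above 160": 6,
}


def relative_and_cumulative(freqs):
    """Return rows of (label, frequency, relative, cumulative relative)."""
    n = sum(freqs.values())
    rows = []
    running = 0
    for label, count in freqs.items():
        running += count
        rows.append((label, count, count / n, running / n))
    return rows


for label, count, rel, cum in relative_and_cumulative(frequencies):
    print(f"{label:<10} {count:>3}  {rel:.2f}  {cum:.2f}")
```

Each relative frequency is the class count divided by n, and each cumulative value is the running total divided by n, so the last row always reaches 1.00.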
  • 16. 16 Graphs  Another way to summarize and display data is through the use of graph or pictorial representations of numerical data. Graphs should be designed so that they convey at a single glance the general patterns in a set of data.
  • 17. Bar charts: Bar charts are used for nominal or ordinal data. [Figure: bar chart of cigarette consumption of persons 18 years of age or older, United States, 1900 - 1990; x-axis: Years, y-axis: No. of cigarettes]
  • 18. Pie chart: Pie charts can also be used to display nominal or ordinal data. [Figure: pie chart of gender distribution; Male 70%, Female 30%]
  • 19. Histogram: A histogram depicts a frequency distribution for quantitative data. [Figure: histogram showing distribution of age (years)]
  • 21. 21 MEASURES OF CENTRAL TENDENCY •Mean •Median •Mode
  • 22. MEAN: The MEAN (or arithmetic mean) is also known as the AVERAGE. It is calculated by totaling the results of all the observations and dividing by the total number of observations. Note that the mean can only be calculated for numerical data.
  • 23. 23 MEDIAN The MEDIAN is the value that divides a distribution into two equal halves.  The median is useful when some measurements are much bigger or much smaller than the rest. The mean of such data will be biased toward these extreme values.  The median is not influenced by extreme values.
  • 24. 24 MODE  The MODE is the most frequently occurring value in a set of observations.
  • 25. MEASURES OF VARIATION. Range: the difference in value between the highest (maximum) and the lowest (minimum) observation. Variance: quantifies the amount of variability or spread about the mean of the sample. Standard deviation: the square root of the variance.
  • 26. 26 Standard Deviation  The STANDARD DEVIATION is a measure, which describes how much individual measurements differ, on the average, from the mean.  A large standard deviation shows that there is a wide scatter of measured values around the mean, while a small standard deviation shows that the individual values are concentrated around the mean with little variation among them.
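
The measures of central tendency and variation described above can be tried out with Python's standard `statistics` module. This is only a sketch: the ages below are made-up values chosen to include one extreme observation, and the sample (n - 1) form of the variance is assumed.

```python
import statistics

# Hypothetical ages (years); the value 90 is a deliberate outlier.
ages = [20, 22, 22, 25, 30, 35, 90]

mean = statistics.mean(ages)            # sum of observations / number of observations
median = statistics.median(ages)        # middle value of the sorted data
mode = statistics.mode(ages)            # most frequently occurring value
value_range = max(ages) - min(ages)     # highest minus lowest observation
variance = statistics.variance(ages)    # sample variance (divides by n - 1)
std_dev = statistics.stdev(ages)        # square root of the sample variance

print(f"mean={mean:.1f} median={median} mode={mode}")
print(f"range={value_range} variance={variance:.1f} sd={std_dev:.1f}")
```

With these values the mean is pulled toward the outlier while the median is not, which is the behaviour the MEDIAN slide describes.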
  • 27. Standard error of the mean: When we draw a sample from a study population and compute its sample mean, it is not likely to be identical to the population mean. If we draw another sample from the same population and compute its sample mean, this may also not be identical to the first sample mean. It probably also differs from the true mean of the total population from which the sample was drawn. This phenomenon is called sampling variation. Standard error: the standard error gives an estimate of the degree to which the sample mean varies from the population mean, and this measure is used to calculate the confidence interval (CI).
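
The slide does not give a formula, so the following sketch assumes the usual textbook definition, standard error = sample SD / √n, and a 95% confidence interval of mean ± 1.96 × SE (a normal approximation; small samples would normally use a t multiplier instead). The data values and function names are hypothetical.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def confidence_interval_95(values, z=1.96):
    """Approximate 95% CI for the mean, assuming a normal approximation."""
    mean = statistics.mean(values)
    margin = z * standard_error(values)
    return mean - margin, mean + margin


# Hypothetical measurements, used only to exercise the functions.
sample = [4, 8, 6, 5, 7]
low, high = confidence_interval_95(sample)
print(f"SE = {standard_error(sample):.3f}, 95% CI = ({low:.2f}, {high:.2f})")
```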
  • 28. 28 THE NORMAL DISTRIBUTION  Many variables have a normal distribution. This is a bell shaped curve with most of the values clustered near the mean and a few values out near the tails.
  • 29. The normal distribution is symmetrical around the mean. The mean, the median and the mode of a normal distribution have the same value. An important characteristic of a normally distributed variable is that 95% of the measurements have values which are approximately within 2 standard deviations (SD) of the mean.
  • 31. 31 Estimation The process of using sample information to draw conclusion about the value of a population parameter is known as estimation.
  • 32. Point Estimate: A point estimate is a specific numerical value estimate of a parameter. The best point estimate of the population mean µ is the sample mean X̄. But how good is a point estimate? There is no way of knowing how close the point estimate is to the population mean. Statisticians prefer another type of estimate called an interval estimate.
  • 33. 33 Interval Estimate  An interval estimate of a parameter is an interval or a range of values used to estimate the parameter Confidence Level  The confidence level of an interval estimate of a parameter is the probability that the interval estimate will contain the parameter  Three commonly used confidence levels are 90%, 95% and 99%  If one desires to be more confident then the sample size must be larger
  • 35. 35 RATIO  The most basic measure of distribution.  Obtained by simply dividing one quantity by another without implying any specific relationship between the numerator and denominator, such as the number of stillbirths per thousand live births.  In ratio, the numerator & denominator are mutually exclusive.
  • 36. 36 PROPORTION  A proportion is a type of ratio in which those who are included in the numerator must also be included in the denominator.  For example: the proportion of women over the age of 50 who have had a hysterectomy, or the number of fetal deaths out of the total number of births (live births plus fetal deaths).
  • 37. 37 RATE  A rate is a proportion with specifications of time. There is a distinct relationship between the numerator and denominator with a measure of time being an intrinsic part of the denominator.  For example, the number of newly diagnosed cases of breast cancer per 100,000 women during a given year.
  • 38. 38 IMPORTANT POINT  It is necessary to be very specific about what constitutes both the numerator and the denominator. In some circumstances, it is important to make clear whether the measure represents the number of events or the number of individuals.  For example, the frequency of myopia among a population of school children could represent the number of affected eyes in relation to total eyes, or the number of children affected in one or both eyes relative to all students.
  • 39. 39 PREVALENCE  Prevalence quantifies the proportion of individuals in a population who have the disease at a specific instant and provides an estimate of the probability (risk) that an individual will be ill at a point in time  The formula for calculating the prevalence P = number of existing cases of a disease/ total population (at a given point in time)
  • 40. 40 POINT PREVALENCE  Prevalence can be thought of as the status of the disease in a population at a point in time and as such is also referred to as point prevalence.  This "point" can refer to a specific point in calendar time or to a fixed point in the course of events that varies in real time from person to person, such as the onset of menopause or puberty or the third postoperative day.
  • 41. 41 PERIOD PREVALENCE  It represents the proportion of cases that exist within a population at any point during a specified period of time.  The numerator thus includes cases that were present at the start of the period plus new cases that developed during this time E.g. Frequency of patients receiving Psychiatric Rx between May 31 – Dec 01 2008
  • 42. 42 INCIDENCE:  Incidence quantifies the number of new events or cases of disease that develop in a population of individuals at risk during a specified time interval.
  • 43. Cumulative incidence (CI): the proportion of people who become diseased during a specified period of time. It provides an estimate of the probability, or risk, that an individual will develop a disease during a specified period of time. CI = (No. of new cases of a disease) / (Total population at risk)
  • 44. 44 Issues in the Calculation of Measures of Incidence  For any measure of disease frequency, precise definition of the denominator is essential for both accuracy and clarity. This is a particular concern in the calculation of incidence. The denominator of a measure of incidence should include only those who are considered "at risk" of developing the disease.
  • 45. 45 Contd.  That is, the total population from which the new cases could arise. Consequently, those who currently have or have already had the disease under study or persons who cannot develop the disease for reasons such as age, immunization, or prior removal of the involved organ should be excluded from the denominator.
  • 46. Special Types of Incidence Rates. MORBIDITY RATE: the incidence rate of nonfatal cases in the total population at risk during a specified period of time. For example, the morbidity rate of tuberculosis (TB) in the U.S. in 1982 can be calculated by dividing the number of nonfatal cases newly reported during that year by the total U.S. midyear population. Morbidity rate = (total no. of nonfatal cases of TB in population at risk) / (midyear population) = 25,520 / 231,534,000 = 11.0 per 100,000 population
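
The arithmetic behind the TB example can be checked with a short sketch. The figures are the ones on the slide; the helper name `rate_per` and its `per` parameter are illustrative assumptions.

```python
def rate_per(cases, population, per=100_000):
    """Express cases / population as a rate per `per` people."""
    return cases / population * per


# Figures from the 1982 U.S. tuberculosis example.
tb_rate = rate_per(cases=25_520, population=231_534_000)
print(f"{tb_rate:.1f} per 100,000 population")
```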
  • 47. MORTALITY RATE: It expresses the incidence of deaths in a particular population during a period of time. It is calculated by dividing the number of fatalities during that period by the total population. This can be further divided into cause-specific or all-cause mortality.
  • 49. 49 Measures of Association  Relative risk (cohort study)  Odds ratio (case control)
  • 50. Cohort Studies

                         Diseased    Non-diseased    Total
        Exposed             a             b           a+b
        Non-exposed         c             d           c+d

  • 51. Relative Risk: Incidence in exposed individuals = a/(a+b), or the proportion of exposed people who developed the disease. Incidence in non-exposed individuals = c/(c+d), or the proportion of non-exposed people who developed the disease. Relative Risk (RR) = (incidence in exposed) / (incidence in non-exposed) = [a/(a+b)] / [c/(c+d)]
  • 52. Calculating the Relative Risk

        Disease status    CHD +    CHD –    Total
        Smoker             112      176      288
        Non-smoker          88      224      312

    Incidence in exposed = a/(a+b) = 112 / 288 = 0.39. Incidence in non-exposed = c/(c+d) = 88 / 312 = 0.28. RR = 0.389 / 0.282 = 1.38
  • 53. Interpretation of RR: Compared to non-smokers, smokers have 1.38 times the risk of developing CHD.
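
A minimal sketch of the relative risk calculation, using the a, b, c, d cell labels of the cohort table and the smoking/CHD counts from the example. The function name is an illustrative choice.

```python
def relative_risk(a, b, c, d):
    """RR = [a / (a + b)] / [c / (c + d)] for a cohort 2x2 table.

    a: exposed with disease,     b: exposed without disease
    c: non-exposed with disease, d: non-exposed without disease
    """
    incidence_exposed = a / (a + b)
    incidence_unexposed = c / (c + d)
    return incidence_exposed / incidence_unexposed


# Smokers: 112 CHD+, 176 CHD-; non-smokers: 88 CHD+, 224 CHD-.
rr = relative_risk(a=112, b=176, c=88, d=224)
print(f"RR = {rr:.2f}")
```

Working from the unrounded incidences (0.389 and 0.282) gives the 1.38 quoted on the slide.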
  • 54. 54 Odds Ratio  Incidence cannot be measured in case control studies because we start with the diseased people (cases) and non diseased people (controls), hence we calculate OR
  • 55. Case Control

                         Cases    Controls    Total
        Exposed            a         b         a+b
        Non-exposed        c         d         c+d
        Total             a+c       b+d

    OR = (a/c) / (b/d) = ad/bc
  • 56. Passive Smoking & Breast Cancer

                                       Breast cancer    No breast cancer    Total
        Exposed (passive smokers)         140 (a)           370 (b)          510
        Not exposed                        40 (c)           234 (d)          274

    Odds of exposure in cases = 140 / 40 = 3.5. Odds of exposure in controls = 370 / 234 = 1.6. OR = 3.5 / 1.6 = 2.2. The odds of being a passive smoker are 2.2 times higher in breast cancer cases than in controls.
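
The odds ratio can be sketched in the same way, using the cross-product form ad/bc and the counts from the passive smoking example. The function name is an illustrative choice.

```python
def odds_ratio(a, b, c, d):
    """OR = (a / c) / (b / d) = (a * d) / (b * c) for a case-control 2x2 table.

    a: exposed cases,     b: exposed controls
    c: non-exposed cases, d: non-exposed controls
    """
    return (a * d) / (b * c)


# Passive smoking and breast cancer: a=140, b=370, c=40, d=234.
or_value = odds_ratio(a=140, b=370, c=40, d=234)
print(f"OR = {or_value:.1f}")
```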
  • 57. 57
  • 58. 58
  • 59. 59 Bias is: Any systematic error that results in an incorrect estimate of the association between the exposure and outcome. Usually introduced by the experimenter or the researcher himself due to non-standardized measuring techniques.
  • 60. Types of bias: selection bias; observation (information/misclassification) bias; recall bias; interviewer bias; loss to follow-up.
  • 61. Bias can be controlled in study design, through choice of study population, and in data collection, through a uniform source of information, efficient questionnaire development, standardization of measurement technique, and blinding.
  • 62. 62
  • 63. The concept of confounding is a central one in the interpretation of any epidemiological study. Confounding can be thought of as a mixing of the effect of the exposure under study on the disease with that of an extraneous factor. This external factor or variable must be associated with the exposure and, independent of the exposure, must be a risk factor for the disease.
  • 65. Table 1. Relation of myocardial infarction (MI) to recent oral contraceptive (OC) use.

        Recent OC use    MI +ve    MI -ve
        Yes                 29       135
        No                 205      1607
        Total              234      1742

    Estimated relative risk = 1.68
  • 66. Table 2. Age-specific relation of myocardial infarction (MI) to recent oral contraceptive (OC) use.

        Age (yrs)    Recent OC use    MI +ve    MI -ve    Estimated age-specific relative risk
        25 – 29      Yes                 4        62       7.2
                     No                  2       224
        30 – 34      Yes                 9        33       8.9
                     No                 12       390
        35 – 39      Yes                 4        26       1.5
                     No                 33       330
        40 – 44      Yes                 6         9       3.7
                     No                 65       362
        45 – 49      Yes                 6         5       3.9
                     No                 93       301
        Total                          234      1742
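
The crude and age-specific estimates in Tables 1 and 2 can be reproduced with the cross-product ratio ad/bc for each stratum, which appears to be how the slide values were obtained. This is a sketch under that assumption; the variable names are illustrative.

```python
# Each stratum: (OC users with MI, OC users without MI,
#                non-users with MI, non-users without MI), from Table 2.
strata = {
    "25-29": (4, 62, 2, 224),
    "30-34": (9, 33, 12, 390),
    "35-39": (4, 26, 33, 330),
    "40-44": (6, 9, 65, 362),
    "45-49": (6, 5, 93, 301),
}


def cross_product_ratio(a, b, c, d):
    """Estimated relative risk from a 2x2 table as (a * d) / (b * c)."""
    return (a * d) / (b * c)


# Crude table: add the strata cell by cell (this reproduces Table 1).
crude_cells = tuple(sum(cells[i] for cells in strata.values()) for i in range(4))
crude = cross_product_ratio(*crude_cells)
age_specific = {age: cross_product_ratio(*cells) for age, cells in strata.items()}

print(f"crude: {crude:.2f}")
for age, estimate in age_specific.items():
    print(f"{age}: {estimate:.1f}")
```

Every age-specific estimate except the 35 – 39 stratum is above the crude 1.68, which is the pattern the slides use to illustrate confounding by age.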
  • 67. 67 Confounding can be controlled in study design through:  Restriction Matching exposure  Randomization Confounding can be controlled in analysis through:  Stratification  Multivariate analysis
  • 68. The roles of confounding, chance and bias have to be evaluated in studies, through appropriate selection of the population to be studied and proper study design, so that the results can be applied to other populations, i.e., they are valid and generalizable.
  • 69. 69 Evaluation of the role of chance consists of two components:- 1. Hypothesis testing 2. Estimation of the confidence interval
  • 70. 70
  • 71. 71 WHAT IS HYPOTHESIS? Hypothesis: A testable theory, or statement of belief used in evaluation of a population parameter of interest e.g. Mean or proportion
  • 72. 72  Suppose a study is being conducted to answer questions about differences between two regimens for the management of diarrhea in children: the sugar based modern ORS and the time-tested indigenous herbal solution made from locally available herbs.  One question that could be asked is: "In the population is there a difference in overall improvement (after three days of treatment) between the ORS and the herbal solution?"
  • 73. 73 There could be only two answers to this question: Yes No
  • 74. Null Hypothesis: "There is no difference between the 2 regimens in terms of improvement" (null hypothesis). A null hypothesis is usually a statement that there is no difference between groups or that one factor is not dependent on another, and corresponds to the "No" answer.
  • 75. 75 Alternative Hypothesis  "There is a difference in terms of improvement achieved by a three days treatment with the ORS and that of the herbal solution" (alternative hypothesis).  Associated with the null hypothesis there is always another hypothesis or implied statement concerning the true relationship among the variables or conditions under study if no is an implausible answer. This statement is called the alternative hypothesis and corresponds to the “Yes” answer.
  • 76. 76 TYPES OF ALTERNATE HYPOTHESIS o Directional o Non Directional
  • 78. 78 WHY TEST HYPOTHESIS Hypothesis testing permits generalization of an association or a difference obtained from a sample to the population from which it came. Hypothesis testing involves conducting a test of statistical significance and quantifying the degree to which sampling variability may account for the result observed in a particular study. It entails the following steps.
  • 79. 79 STEPS IN HYPOTHESIS TESTING 1. Statement of research question in terms of statistical hypothesis (Null and alternate hypothesis) 2. Selection of an appropriate level of significance. The significance level is the risk we are willing to take that a sample which showed a difference was misleading. 5% significance level means that we are ready to take a 5% chance of wrong results.
  • 80. STEPS IN HYPOTHESIS TESTING 3. Choosing an appropriate test statistic: t test or z test for continuous data, chi square for proportions, etc. The test statistic is computed from the sample data and is used to determine whether the null hypothesis should be rejected or retained. The test statistic generates the p value.
  • 81. P value: Indicates the probability or likelihood of obtaining a result at least as extreme as that observed in a study by chance alone, assuming that there is truly no association between exposure and outcome under consideration. By convention the significance level is set at 0.05. Thus any value of p less than or equal to 0.05 indicates that there is at most a 5% probability of observing an association as large as or larger than that found in the study due to chance alone, given that there is no association between exposure and outcome. If the p value is greater than 0.05, do not reject the null hypothesis.
  • 82. 82 4. Performing calculations and obtaining p value 5. Drawing conclusions, rejecting null hypothesis if the p value is less than the set significance level
  • 84. 84 Sample size calculations depend on: 1. Type of study. 2. Magnitude of the outcome of interest derived from previous studies. 3. Type of statistical analysis required (comparing means or proportions). 4. Level of significance / Power.
  • 85. 85 Sample size for single proportion depends on: 1. The prevalence of the condition/attribute of interest. 2. Level of confidence. 3. Margin of error.
  • 86. Example of sample size calculation for single proportion: A local health department wishes to estimate the prevalence of tuberculosis among children under 5 years of age in a locality. How many children should be included in the sample so that the prevalence may be estimated within 5 percentage points of the true value with 95% confidence, if it is known that the true rate is unlikely to exceed 20%?
  • 87. 87 Sample size calculation and formula for single proportion
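
The formula on this slide did not survive in the transcript. A commonly used formula for estimating a single proportion is n = z² p(1 − p) / d², and the sketch below applies it to the tuberculosis example as an assumption about what the slide showed. The function and parameter names are illustrative.

```python
import math
from statistics import NormalDist


def sample_size_single_proportion(p, margin, confidence=0.95):
    """n = z^2 * p * (1 - p) / d^2, rounded up to the next whole subject.

    p: anticipated prevalence, margin: absolute margin of error (d).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z value
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# TB example: prevalence unlikely to exceed 20%, within 5 percentage points, 95% confidence.
n_children = sample_size_single_proportion(p=0.20, margin=0.05)
print(n_children)
```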
  • 88. 88 Sample size for single group mean depends on: 1. The Mean of the condition of interest. 2. Level of confidence. 3. Margin of error.
  • 89. Example of sample size calculation for single group mean: A district medical officer seeks to estimate the mean hemoglobin level among pregnant women in his district. A previous study of pregnant women showed an average hemoglobin level of 8.2 g/dl and a standard deviation of 4.2 g/dl. Assuming a sample of pregnant women is to be selected, how many pregnant women must be studied if he wants the estimate to fall within 1 g/dl of the true mean with 95% confidence?
  • 90. 90 Sample size calculation and formula for single group mean
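
The formula itself is missing from the transcript here as well. Assuming the usual expression for estimating a single mean, n = z² σ² / d², the hemoglobin example can be worked as follows; the names are illustrative.

```python
import math
from statistics import NormalDist


def sample_size_single_mean(sd, margin, confidence=0.95):
    """n = z^2 * sd^2 / d^2, rounded up to the next whole subject."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z value
    return math.ceil((z * sd / margin) ** 2)


# Hemoglobin example: SD 4.2 g/dl, estimate within 1 g/dl, 95% confidence.
n_women = sample_size_single_mean(sd=4.2, margin=1.0)
print(n_women)
```

Note that the previous study's mean (8.2 g/dl) does not enter this formula; only the standard deviation and the margin of error do.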
  • 91. 91 Sample size for two proportions depends on: 1. The prevalence of the condition / attribute of interest for both groups. 2. Level of confidence. 3. Power of the test.
  • 92. Example of sample size calculation for two proportions: It is believed that the proportion of patients who develop complications after undergoing one type of surgery is 5%, while the proportion of patients who develop complications after a second type of surgery is 15%. How large should the sample size be in each of the two groups of patients if an investigator wishes to detect, with a power of 90%, whether the second procedure has a complication rate significantly higher than the first at the 5% level of significance?
  • 93. 93 Sample size calculation and formula for two proportions
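
Several formulas are in use for comparing two proportions, and the one on the slide is not in the transcript. The sketch below assumes one common version for a one-sided test with equal group sizes, n = [z₁₋α √(2p̄(1 − p̄)) + z₁₋β √(p₁(1 − p₁) + p₂(1 − p₂))]² / (p₁ − p₂)², where p̄ is the average of the two proportions. Other formulas, or a two-sided test, will give different numbers.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.90):
    """Per-group sample size for a one-sided comparison of two proportions.

    Assumes equal group sizes and the pooled-variance formula described
    in the lead-in; this is one of several formulas in common use.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Surgery example: complication rates 5% vs 15%, power 90%, 5% significance.
n_per_group = sample_size_two_proportions(p1=0.05, p2=0.15)
print(n_per_group)
```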
  • 94. 94 Sample size for two group means depends on: 1. The means/variance for both groups. 2. Level of confidence. 3. Power of the test.
  • 95. 95 Example of Sample size calculation for two group means Suppose the true mean systolic blood pressure (SBP) of 35 to 39 year old OC users is (132.86 mmHg) and standard deviation (15.34 mmHg). Similarly, for non-OC users, the mean SBP is (127.44 mmHg) with standard deviation (18.23 mmHg). If we desire to estimate the difference between 2 groups of equal size, what would be the minimal sample size required with a power of 80% at 95% confidence level?
  • 97. 97 Sample size - Calculation
  • 98. 98 Sample size for sensitivity and specificity depends on: 1. The prevalence of the condition/attribute of interest. 2. Estimated sensitivity. 3. Estimated specificity. 4. Level of significance. 5. Margin of error.
  • 99. Example of sample size calculation for sensitivity and specificity: Suppose we want to determine the sensitivity and specificity of graded compression ultrasonography (US) in the diagnosis of acute appendicitis (AA), taking histopathology as the gold standard. How many patients should be included in the sample if the prevalence of AA is 77%, the estimated sensitivity of US is 96.5% and the estimated specificity is 94.1%, with 95% confidence and a margin of error of 10%?
  • 100. 100 Sample size calculation and formula for sensitivity and specificity studies
  • 101. Suggested websites for sample size calculators: 1. http://www.raosoft.com/samplesize.html 2. http://www.quantitativeskills.com/sisa/calculations/samsize.htm 3. http://www.openepi.com/Menu/OpenEpiMenu.htm
  • 102. 102
  • 103. 103 Screening  Screening for disease control can be defined as the examination of asymptomatic people in order to classify them as likely or unlikely to have the disease that is object of screening.  If done in large groups---mass screening or population screening.
  • 104. 104 Characteristics of Disease to be Screened  Disease must pass through preclinical phase during which it is undiagnosed but detectable  Early treatment must offer some advantage
  • 105. 105 Validity  The ability of a test to distinguish between who has disease and who does not
  • 106. 106 Sensitivity  Of a test is its ability to detect people who do have disease.  If a Test is always positive for all diseased persons then sensitivity of the Test will be 100%.
  • 107. Specificity: It is the ability of a test to detect people who don't have disease. Thus a test which is always negative in non-diseased individuals is said to have 100% specificity.
  • 108. Validity

                          Diseased    Non-diseased    Total
        Test positive      a (TP)        b (FP)        a+b
        Test negative      c (FN)        d (TN)        c+d

  • 109. FNA test results against breast cancer status

        Test (FNA)    CA breast positive    CA breast negative    Total
        Positive           60 (a)                50 (b)          110 (a+b)
        Negative           20 (c)                70 (d)           90 (c+d)
        Total              80 (a+c)             120 (b+d)        200 (a+b+c+d)

  • 110. Sensitivity = a / (a + c) x 100 = 60 / 80 x 100 = 75%, i.e. the test (FNA) is 75% sensitive in detecting disease. Specificity = d / (b + d) x 100 = 70 / 120 x 100 = 58%, i.e. the specificity of FNA is 58% for identifying non-diseased persons.
  • 111. Positive Predictive Value (PPV): PPV = a / (a + b) x 100 = 60 / 110 x 100 = 55%, i.e. 55% of persons who test positive actually have the disease. PPV varies with prevalence. Negative Predictive Value (NPV): NPV = d / (c + d) x 100 = 70 / 90 x 100 = 78%, i.e. 78% of persons who test negative are actually free from disease.
  • 112. General form of the validity table

        Test        Disease present         Disease not present     Total
        Positive    True Positive (TP)      False Positive (FP)     TP + FP
        Negative    False Negative (FN)     True Negative (TN)      FN + TN
        Total       TP + FN                 TN + FP                 TP + FP + TN + FN

    Sensitivity = TP / (TP + FN) x 100. Specificity = TN / (TN + FP) x 100. PPV = TP / (TP + FP) x 100. NPV = TN / (TN + FN) x 100.
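
The four validity measures can be computed directly from the TP, FP, FN and TN counts. The sketch below uses the FNA breast cancer example (60, 50, 20, 70); the function name and the dictionary keys are illustrative choices.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV as proportions (0 to 1)."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased people who test positive
        "specificity": tn / (tn + fp),  # non-diseased people who test negative
        "ppv": tp / (tp + fp),          # test positives who have the disease
        "npv": tn / (tn + fn),          # test negatives who are disease free
    }


# FNA example: TP = 60, FP = 50, FN = 20, TN = 70.
fna = diagnostic_metrics(tp=60, fp=50, fn=20, tn=70)
for name, value in fna.items():
    print(f"{name}: {value * 100:.0f}%")
```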
  • 113. Relationship of Disease Prevalence to PPV. Example: sensitivity = 99%; specificity = 95%, in a population of 10,000 with a disease prevalence of 1%.

        Test result    Disease    No disease    Total
        Positive          99         495          594
        Negative           1        9405         9406
        Total            100        9900       10,000

    PPV = 99 / 594 = 17%
  • 114. Relationship of Disease Prevalence to PPV. Example: sensitivity = 99%; specificity = 95%, in a population of 10,000 with a disease prevalence of 5%.

        Test result    Disease    No disease    Total
        Positive         495         475          970
        Negative           5        9025         9030
        Total            500        9500       10,000

    PPV = 495 / 970 = 51%
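
The two tables above can be generalized: for fixed sensitivity and specificity, PPV follows from prevalence as (sensitivity × prevalence) / [sensitivity × prevalence + (1 − specificity) × (1 − prevalence)]. A minimal sketch, with an illustrative function name:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV from test characteristics and disease prevalence (all as proportions)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Same test (99% sensitive, 95% specific) at the two prevalences on the slides.
for prevalence in (0.01, 0.05):
    ppv = positive_predictive_value(0.99, 0.95, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.0%}")
```

The same test gives a much higher PPV when the disease is more common, which is the argument for directing screening at high-risk populations.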
  • 115. 115 Relationship between PPV & Prevalence  A screening program is most effective and beneficial if it is directed to a high-risk target population  Screening a total population for a relatively infrequent disease can be very wasteful of resources and may yield very few previously undetected cases
  • 116. 116
  • 117. 117 POINTS OF IMPORT IN DESIGNING A QUESTIONNAIRE  It should be ensured that the format of the questionnaire be attractive and easy for the respondents to fill, overcrowding or clutter should be avoided and all questions and pages clearly numbered  The questionnaire should not be too long  To maintain flow of the instrument, questions concerning major areas should be grouped together  Simple questions about age, birth date etc should be put at the beginning to warm up the respondent
  • 118. POINTS OF IMPORT IN DESIGNING A QUESTIONNAIRE: Questions should be close ended. Possible answers to close ended questions should be lined up vertically, preceded by boxes, brackets or numbers. Example: How many different medicines do you take daily? (check one)

        [ ] None
        [ ] 1-2
        [ ] 3-4
        [ ] 5-6
        [ ] 7 or more

  • 119. POINTS OF IMPORT IN DESIGNING A QUESTIONNAIRE: If more details are required pertaining to a question, then the filter/skip technique should be used to save time and allow respondents to avoid irrelevant questions. Example: Have you ever been told that you have hypertension? Yes / No. If yes, proceed to the next question: How long ago were you told that you have hypertension?
  • 120. POINTS OF IMPORT IN DESIGNING A QUESTIONNAIRE: Wording of questions should be simple, free from ambiguity and non-judgmental, and each question should solicit only one response. For behaviors that may change over time, a specific time span should be asked for in the question. Example: During the past 12 months, how many doctor visits did you make? Always choose an appropriate means of measurement, e.g. scores/scales.
  • 121. POINTS OF IMPORT IN DESIGNING A QUESTIONNAIRE: Sensitive topic questions should be left for the end. If similar research instruments are available, it may be a good idea to review them and, if required, borrow questions. Always try to ensure that if questions are to be asked in any language besides English, they are written in that language too.