Biostatistics ppt

BIOSTATISTICS
K. SANTOSHI
1st Year PG

CONTENTS:-
• Introduction
• Common statistical terms
• Sources and collection of data
• Presentation of data
• Sampling & Sampling methods
• Sampling errors
• Analysis and interpretation
• Statistical averages
• Measures of dispersion
• Test of significance
• Correlation & regression
• Conclusion

• What is Research?
"A fundamental state of mind involving continual examination of doctrines and axioms upon which current thought and action are based."
- Theobald Smith
• Need for Research:
The main purpose of research is to inform action, to prove a theory, and to contribute to developing knowledge in a field of study.

STEPS IN RESEARCH
1. Defining a problem
2. Determining objectives
3. Formulating a hypothesis
4. Designing a study
5. Collecting data
6. Analysing the data
7. Interpreting the results and drawing conclusions

INTRODUCTION
• The word 'Statistics' is derived from the Latin status ("state"), reflecting the historical importance of governmental data gathering, principally of demographic information.
• "Statistics is the science which deals with the
• collection
• organisation
• presentation
• analysis and
• interpretation of numerical data" (according to Croxton and Cowden).
• John Graunt (1620-1674) is regarded as the father of health statistics.
• Biostatistics: Biostatistics is the application of statistics to the biological sciences, medicine and public health.

COMMON STATISTICAL TERMS
• Variable:- A characteristic that takes on different values in different persons, places or things.
• Constant:- A quantity that does not vary, such as π ≈ 3.1416; constants do not require statistical study.
• In biostatistics, the mean, standard deviation, standard error, correlation coefficient and proportion of a particular population are considered constants.
• Observation:- An event and its measurement, e.g. blood pressure and its measurement.
• Observational unit:- The "source" that gives an observation, e.g. an object or a person. In medical statistics, terms like individuals or subjects are used more often.
• Data:- A set of values recorded on one or more observational units.
• Population:- The entire group of people or study elements (persons, things or measurements) in which we are interested at a particular time.

• Sample:- A part of the population.
• Parameter:- A summary value or constant of a variable that describes the population, such as its mean, standard deviation, standard error or correlation coefficient. (The corresponding value computed from a sample is called a statistic.)
• Parametric tests:- Tests in which population constants such as those described above (mean, variance etc.) are used. The data are assumed to follow a known distribution such as the normal or binomial.
• Non-parametric tests:- Tests, such as the chi-square test, in which no population constant is used. No assumption is made that the data follow a specific distribution, so these tests also suit ranked (ordinal) data, e.g. good, better and best.

SOURCES AND COLLECTION OF DATA
• Data:- A measured or counted fact or piece of information, such as the height of a person.
Types of Data
• Qualitative data: nominal, ordinal
• Quantitative data: discrete, continuous (interval, ratio)

• Qualitative Data:-
• Also called enumeration data.
• Represents a particular quality or attribute.
• There is no notion of magnitude or size of the characteristic, as it cannot be measured.
• Expressed as numbers (counts) without units of measurement, e.g. religion, sex, blood group.
• Quantitative Data:-
• Also called measurement data.
• These data have a magnitude.
• Can be expressed as numbers with or without units of measurement.
• E.g. height in cm, Hb in gm%, weight in kg.

• Continuous data:-
• Can take any value within a measurable range, including fractions.
• E.g. Hb level, height, weight.
• Discrete data:-
• Always takes whole-number values.
• E.g. number of beds in a hospital, number of students in a school.
Quantitative data          Qualitative data
Hb level in gm%            Anemic or non-anemic
Height in cm               Tall or short
Blood pressure in mmHg     Hypotensive, normotensive or hypertensive
IQ score                   Idiot, genius or normal

Common data collection methods
• Survey
• Test
• Case study
• Photographs, video tapes, slides
• Interview
• Diaries, journals, logs
• Observation
• Document review and analysis
• Group assessment

• The main sources of data are:-
1) Surveys
2) Experiments
3) Records in the OPD (outpatient department)
• Data can be collected through either:-
1) Primary source
2) Secondary source

• Primary source:-
Here the data are obtained by the investigators themselves. This is first-hand information.
Advantages-
Precise and reliable information.
Disadvantages-
Time consuming and expensive.
Primary data can be obtained using:-
1) Direct personal interview
2) Oral health examination
3) Questionnaire method
• Secondary source:-
Data already recorded are utilized to serve the objective of the study.
Ex:- The OPD records of dental clinics.

Data presentation
The objective of classification of data is to make the data simple, concise, meaningful
and interesting and helpful in further analysis.

Tabulation
• Tabulation is the first step before the data are used for analysis or interpretation.
• In the process of tabulation, the following types of classification are encountered:
1) Geographical, i.e. area-wise
2) Chronological, i.e. on the basis of time
3) Qualitative, i.e. according to attribute
4) Quantitative, i.e. in terms of magnitude

• Classification by space:-
Data are classified by location of occurrence. The categories are usually arranged in alphabetical order of the terms defining them.

• Chronological:- on the basis of time
• In this case, data are classified by time of occurrence of the observation.
• The categories are almost always arranged in chronological order.

• Classification by attribute:- When the data represent observations made on a qualitative characteristic, the classification is made according to these qualities.
1) Alphabetical arrangement of categories may be suitable for a general-purpose table.
2) In a special-purpose table, the categories may be arranged in order of their importance.

• Classification by the size of observations:-
• When the data represent observations of some characteristic on a numerical scale, classification is made on the basis of the size of the individual observations.
• The range of observations is suitably divided into smaller divisions called class intervals.
• The numerical scale adopted may be either discrete or continuous.
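As a minimal sketch of classification by size, the Python snippet below groups a small set of hypothetical ages into equal class intervals and counts the frequency in each. The data, the interval width of 10 and the helper name frequency_table are illustrative assumptions, not values from the source; each interval includes its lower limit and excludes its upper limit.

```python
# Hypothetical observations (ages in years); not data from the source.
ages = [23, 35, 41, 28, 52, 47, 31, 38, 44, 29, 36, 55, 26, 33, 49]

def frequency_table(values, start, width, n_classes):
    """Count observations in equal class intervals [lower, lower + width)."""
    table = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        count = sum(1 for v in values if lower <= v < upper)
        table.append((lower, upper, count))
    return table

for lower, upper, count in frequency_table(ages, start=20, width=10, n_classes=4):
    # Label "20-29" style assumes whole-number (discrete) observations.
    print(f"{lower}-{upper - 1}: {count}")
```

The same counts are what a histogram draws as adjacent rectangles.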

Graphical presentation
For quantitative data:- Histogram
• It is a bar diagram without gaps between the bars.
• If we draw the frequencies of each group or class interval in the form of columns or rectangles, the diagram is called a histogram.
• It represents a frequency distribution.

Frequency polygon
• One of the most commonly used graphic devices to illustrate a statistical distribution.
• Used to represent the frequency distribution of quantitative data.
• Useful to compare two or more frequency distributions.
• A frequency polygon is a variation of a histogram in which the bars are replaced by lines connecting the midpoints of the tops of the bars.
Advantages:
• It is very easy to construct and interpret.
• Two or more distributions can be portrayed on the same graph paper in different colours, so it is very useful for comparing them.
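The polygon can be sketched by turning each class interval into a (midpoint, frequency) point. The class frequencies below are hypothetical, and closing the polygon with a zero-frequency point one class width beyond each end is a common drawing convention assumed here, not something the source specifies.

```python
# Hypothetical class intervals: (lower limit, upper limit, frequency).
classes = [(20, 30, 4), (30, 40, 5), (40, 50, 4), (50, 60, 2)]

def polygon_points(classes):
    """Return (midpoint, frequency) pairs for a frequency polygon."""
    width = classes[0][1] - classes[0][0]          # assumes equal class widths
    points = [((lo + hi) / 2, f) for lo, hi, f in classes]
    # Assumed convention: add a zero-frequency point one class width beyond
    # each end so the polygon meets the x-axis.
    first_mid, last_mid = points[0][0], points[-1][0]
    return [(first_mid - width, 0)] + points + [(last_mid + width, 0)]

print(polygon_points(classes))
```

Joining these points with straight lines on graph paper gives the polygon; a second distribution can be drawn on the same axes in another colour.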

Frequency curve
• When the number of observations is very large and the class intervals are small, the frequency polygon tends to lose its angulation and forms a smooth curve known as a frequency curve.

Scatter diagram or Dot diagram
• It is a graphic presentation of data
• It is used to show the nature of correlation between two variables.

• For Qualitative data:-
Bar chart
This diagram is used to represent qualitative data.
It represents only one variable.
The width of the bars remains the same; only the length varies according to the frequency in each category.
There are 3 types of bar diagram:
a) Simple bar
b) Multiple bar
c) Component bar diagram

Pictogram
• Display of data through pictograms was initiated by Dr Otto Neurath in 1923.
• Data are displayed by pictures of the items to which the data pertain.
• A single picture represents a fixed number.
• They are the least satisfactory type of diagram.
• They are also inaccurate.

Pie chart
• These are popularly used to show percentage breakdowns for qualitative data.
• It is so called because the entire graph looks like a pie and its components represent slices cut from the pie.
• A circle is divided into sectors corresponding to the frequencies of the distribution.
• The total angle at the centre of the circle is 360 degrees and it represents the total frequency, so the angle of each sector is (frequency of the category / total frequency) × 360°.
• After the angles are calculated, the segments are drawn in the circle, shaded with different shades or colours, and a key is provided for the shades.
• A single pie chart cannot be used to compare two or more data sets.
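The sector-angle calculation can be illustrated with a short sketch. Each sector's angle is its share of the total frequency multiplied by 360 degrees; the blood-group frequencies and the helper name pie_angles are made up for the example.

```python
# Hypothetical frequencies for a qualitative variable (blood group).
frequencies = {"A": 30, "B": 25, "AB": 5, "O": 40}

def pie_angles(freqs):
    """Angle of each sector = (category frequency / total frequency) x 360 degrees."""
    total = sum(freqs.values())
    return {category: f / total * 360 for category, f in freqs.items()}

angles = pie_angles(frequencies)
for category, angle in angles.items():
    print(f"{category}: {angle:.1f} degrees")
```

Because every frequency is divided by the same total, the angles always add up to 360 degrees.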

SAMPLING & SAMPLING METHODS
• Sample: A collection consisting of a part or subset of the objects or individuals of a population, selected for the purpose of representing the population.
• Sampling: The process of selecting a sample from the population. For this, the population is divided into a number of parts called sampling units.
• When a large number of individuals or units have to be studied, we take a sample, because:
• It is easier
• It is more economical
• It is important to ensure that the group of people or items included in the sample is representative of the whole population to be studied.

Probability sampling:-
• Simple Random sampling:
Here every member has the same chance (probability) of being selected. The random method provides an unbiased cross-section of the population.
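A minimal sketch of simple random sampling using Python's standard random module, assuming a hypothetical sampling frame of 120 numbered units. random.sample draws without replacement, so every unit has the same chance of selection; the fixed seed is only there to make the example repeatable.

```python
import random

# Hypothetical sampling frame: 120 numbered units (e.g. patient record numbers).
population = list(range(1, 121))

rng = random.Random(42)               # fixed seed only so the example is reproducible
sample = rng.sample(population, k=8)  # draws 8 distinct units, each equally likely
print(sorted(sample))
```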

Systematic sampling: Each member of the sample comes after an equal interval (k) from the previous member.
For example: Suppose you want to sample 8 houses from a street of 120 houses. 120/8 = 15, so every 15th house is chosen after a random starting point between 1 and 15. If the random starting point is 11, the houses selected are 11, 26, 41, 56, 71, 86, 101 and 116. As an aside, if every 15th house were a "corner house", this corner pattern could destroy the randomness of the sample.
Population = 120, sample = 8, k = 15.
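The street example can be reproduced with a small helper (systematic_sample is a hypothetical name). It assumes the population size is an exact multiple of the sample size, as in the 120/8 case; without an explicit start it picks a random house between 1 and k.

```python
import random

def systematic_sample(population_size, sample_size, start=None, rng=random):
    """Select every k-th unit after a random start between 1 and k (1-based numbering)."""
    k = population_size // sample_size   # sampling interval
    if start is None:
        start = rng.randint(1, k)        # random starting point in 1..k
    return [start + i * k for i in range(sample_size)]

# The street example: N = 120 houses, n = 8, k = 15, random start = 11.
print(systematic_sample(120, 8, start=11))
# [11, 26, 41, 56, 71, 86, 101, 116]
```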

• Stratified sampling: The population is divided into smaller homogeneous groups, or strata, by some characteristic, and members are selected randomly from each stratum.
• Finally, the simple random or systematic sampling method is used within each stratum to select the final sample.
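One way to sketch stratified sampling, assuming hypothetical age-band strata and proportional allocation (the same sampling fraction in every stratum, which the source does not prescribe), with simple random sampling inside each stratum. The unit IDs and the helper name stratified_sample are illustrative.

```python
import random

# Hypothetical strata: patients grouped by age band (unit IDs are made up).
strata = {
    "children": [f"C{i}" for i in range(1, 41)],   # 40 units
    "adults":   [f"A{i}" for i in range(1, 101)],  # 100 units
    "elderly":  [f"E{i}" for i in range(1, 61)],   # 60 units
}

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction from every stratum."""
    sample = {}
    for name, units in strata.items():
        k = round(len(units) * fraction)  # proportional allocation (assumed)
        sample[name] = rng.sample(units, k)
    return sample

result = stratified_sample(strata, fraction=0.1, rng=random.Random(1))
for name, units in result.items():
    print(name, len(units), units)
```

Because each stratum is sampled separately, every group is guaranteed to appear in the final sample.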

• Multistage sampling: Carried out in stages,
• using smaller and smaller sampling units at each stage.
• Ex. Tuberculosis survey (this phased design is also described as multiphase sampling):
• 1st: Mantoux test for all cases
• 2nd: chest X-ray in the Mantoux-positive group
• 3rd: sputum examination in the X-ray-positive group
• Advantages: less cost, less labour and purposeful.

Non probability sampling:
Quota sampling:-
• The sample is selected by the researcher, who decides the quotas for selecting the sample from specified subgroups of the population.
• For example, an interviewer might need data from 40 adults and 20 adolescents in order to study television viewing habits.
• The selection will be:
• 20 adult men and 20 adult women
• 10 adolescent girls and 10 adolescent boys

• Purposive sampling: In this sampling method, the researcher selects a typical group
of individuals who might represent the larger population and then collects data from
this group. Also known as Judgmental sampling.

• Snowball sampling:-
• In snowball sampling, the researcher identifies and selects available respondents who meet the criteria for inclusion.
• After the data have been collected from a subject, the researcher asks for a referral to other individuals who would also meet the criteria and represent the population of concern.

BLINDING
• Also called masking or concealment of treatment.
• It is intended to avoid bias caused by subjective judgement in reporting, evaluation, data processing and analysis due to knowledge of the treatment.
Blinding techniques:
Single-blind: subject
Double-blind: subject & investigator
Triple-blind: subject, investigator & statistician

• Sampling error:-
• Sampling error refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample.
• If repeated samples are drawn from the same population, the results obtained will differ from sample to sample.
• This variation from one sample to another is called sampling error.
• Statistical errors are sampling errors.
• Factors influencing the sampling error are:
• Size of the sample
• Natural variability of individual readings
• As the sample size increases, the sampling error decreases.
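The claim that sampling error shrinks as sample size grows can be checked by simulation. This sketch assumes a hypothetical, normally distributed population of diastolic BP readings (mean 80, SD 10 mmHg), draws many repeated samples of each size and reports the spread (SD) of the sample means.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 diastolic BP readings (mmHg), mean 80, SD 10.
population = [rng.gauss(80, 10) for _ in range(10_000)]

def spread_of_sample_means(sample_size, repeats=1000):
    """Draw repeated samples of one size and return the SD of their means."""
    means = [statistics.fmean(rng.sample(population, sample_size))
             for _ in range(repeats)]
    return statistics.stdev(means)

results = {n: spread_of_sample_means(n) for n in (10, 50, 200)}
for n, spread in results.items():
    print(f"n = {n:>3}: SD of sample means = {spread:.2f} mmHg")
```

The spread should fall roughly in proportion to 1/√n, which is why larger samples give more stable estimates.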

Non-sampling error:-
• Non-sampling error refers to biases and mistakes in the selection of the sample and in the collection and processing of data.
• Causes of non-sampling errors:
• Sampling operations
• Inadequate response
• Misunderstanding the concept
• Lack of knowledge
• Concealment of the truth
• Loaded questions
• Processing errors
• Sample size

• Response error:-
• A response or data error is any systematic bias that occurs during data collection, analysis or interpretation. Its types are:
• Respondent error
• Interviewer bias
• Measurement error
Non-response error:-
A non-response error occurs when units selected as part of the sampling procedure do not respond, in whole or in part.

• Respondent error:
• The respondent gives an incorrect answer,
• e.g. due to prestige or competence implications, or due to the sensitivity or social undesirability of questions.
• The respondent misunderstands the requirements.
• Lack of motivation to give an accurate answer.
• A lazy respondent gives an average answer.
• The question requires memory/recall.
• Proxy respondents are used, i.e. answers are taken from someone other than the respondent.
• Interviewer bias:
• Different interviewers administer a survey in different ways.
• Respondents react differently to different interviewers, e.g. to interviewers of their own sex or own ethnic group.
• Inadequate training of interviewers
• Inadequate attention to the selection of interviewers
• Too high a workload for the interviewer

• Measurement error:-
• The question is unclear, ambiguous or difficult to answer.
• The list of possible answers suggested in the recording instrument is incomplete.
• The requested information assumes a framework unfamiliar to the respondent.
• The definitions used by the survey are different from those used by the respondent.
• Methods of reducing sampling errors:-
• Specific problem selection
• Systematic documentation of related research
• Effective enumeration
• Effective pretesting
• Controlling methodological bias
• Selection of appropriate sampling techniques.

ANALYSIS AND INTERPRETATION
• Measures of central tendency/statistical averages:
• The word "average" implies a value in the distribution around which the other values are distributed.
• It gives a mental picture of the central value.
• Commonly used measures of central tendency:
a) The arithmetic mean
b) Median
c) Mode
Mean = $\bar{x} = \frac{\sum x}{n}$ (sum of all values / total number of values)
Median = middle value (when the data are arranged in order)
Mode = most common value

• For example, the daily incomes of 7 people, in rupees, are:
5, 5, 5, 7, 10, 20, 102 (total = 154)
Mean = 154/7 = 22
Median = 7
The median is therefore a better indicator of central tendency when a few of the lowest or highest observations lie far apart from the rest.
The mode is rarely used, as a series can have no mode, one mode or multiple modes.
Example: In 5, 6, 7, 5, 10 the mode is 5, since 5 occurs twice.
In 20, 18, 14, 20, 13, 14, 19 there are two modes, 14 and 20.
Another example: 300, 200, 120, 125, 270 has no mode.
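The worked example can be checked with Python's statistics module. The standard library's multimode returns every value when none repeats, so the small helper modes (a hypothetical name) is written to follow the convention used here, where a series with no repeated value has no mode.

```python
import statistics
from collections import Counter

income = [5, 5, 5, 7, 10, 20, 102]   # daily income in rupees, from the example
print(statistics.mean(income))       # 22
print(statistics.median(income))     # 7

def modes(values):
    """Return every value that occurs most often; an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                     # no value repeats, so the series has no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([5, 6, 7, 5, 10]))                # [5]
print(modes([20, 18, 14, 20, 13, 14, 19]))    # [14, 20]
print(modes([300, 200, 120, 125, 270]))       # []
```

The single outlier (102) pulls the mean up to 22 while the median stays at 7.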

MEASURES OF DISPERSION
• Widely known measures of dispersion are:-
• a) The range
• b) The mean or average deviation
• c) The standard deviation
Range: the simplest measure.
It is the difference between the highest and lowest figures, e.g. the diastolic BP readings 72, 83, 75, 81, 79, 90, 77, 94 give a range of 94 - 72 = 22 mmHg, also expressed as 72 to 94.
Mean deviation:
The average of the absolute deviations from the arithmetic mean:
$\text{MD} = \frac{\sum |x - \bar{x}|}{n}$

Standard deviation: the most frequently used measure.
$\text{SD} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
• If the sample size is less than 30, (n - 1) is used in the denominator instead of n.
• SD gives an idea of the spread or dispersion.
• The larger the standard deviation, the greater the dispersion of values about the mean.
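A sketch computing the three measures of dispersion for the diastolic BP readings in the range example. The sample flag switches the SD denominator from n to (n - 1), following the small-sample rule in the text; the function name dispersion is illustrative.

```python
import math

bp = [72, 83, 75, 81, 79, 90, 77, 94]   # diastolic BP (mmHg) from the range example

def dispersion(values, sample=False):
    """Return (range, mean deviation, standard deviation) for a list of values."""
    n = len(values)
    mean = sum(values) / n
    value_range = max(values) - min(values)
    mean_deviation = sum(abs(x - mean) for x in values) / n
    divisor = n - 1 if sample else n     # (n - 1) for small samples
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / divisor)
    return value_range, mean_deviation, sd

r, md, sd = dispersion(bp, sample=True)   # n = 8 < 30, so use (n - 1)
print(f"range = {r} mmHg, mean deviation = {md:.2f}, SD = {sd:.2f}")
```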

Normal distribution
• It is a special type of density curve that is bell-shaped, so it is also called the bell curve.
• The normal distribution describes the tendency of data to cluster around a central value.
• The central value is the population mean, which is always located in the middle of the curve. Some observations are below the mean and some are above it.
• If they are arranged in order, deviating towards the extremes from the mean on the plus or minus side, the maximum number of frequencies will be seen in the middle around the mean and fewer at the extremes, decreasing smoothly on both sides.
• Half the observations lie above and half below the mean, and all observations are symmetrically distributed on each side of the mean.
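The shape described above can be explored with statistics.NormalDist from the Python standard library. The mean of 80 mmHg and SD of 10 mmHg are assumed values for illustration; the snippet shows that half the area lies below the mean, that most observations cluster within one or two SDs of it, and that the two tails are equal.

```python
from statistics import NormalDist

# Hypothetical population: diastolic BP with mean 80 mmHg and SD 10 mmHg.
bp = NormalDist(mu=80, sigma=10)

print(bp.cdf(80))                  # 0.5: half the observations lie below the mean
print(bp.cdf(90) - bp.cdf(70))     # share within mean ± 1 SD (about 68%)
print(bp.cdf(100) - bp.cdf(60))    # share within mean ± 2 SD (about 95%)
print(bp.cdf(70), 1 - bp.cdf(90))  # equal tails: the curve is symmetric
```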

STATISTICAL INFERENCE:-
• The main objective of sampling is to draw conclusions about the unknown population from the information provided by a sample. This is called statistical inference.
• Statistical inference may be of two kinds: parameter estimation and hypothesis testing.

Estimation: The process by which one makes inferences about a population based on information obtained from a sample.
Hypothesis: An assumption made before an investigation regarding the outcome under study. A hypothesis is made because it can be tested scientifically using statistical procedures.
A test procedure used to decide whether a hypothesis should be rejected or not is called testing of hypothesis.
• Example: "For males over 40 suffering from chronic hypertension, a 100 mg daily dose of this new drug lowers diastolic blood pressure by an average of 10 mmHg."

Procedure for testing a hypothesis
Set up a hypothesis:
The first step in hypothesis testing is to set up a hypothesis about a population parameter and then use the sample information to decide how likely it is that the hypothesized population parameter is correct.
Null hypothesis:
This hypothesis asserts that there is no real difference between the sample and the population in the particular matter under consideration, and that any difference found is accidental and unimportant, arising out of fluctuations of sampling. It is denoted by H0.
Alternative hypothesis:
If the null hypothesis is found to be false, what alternative would be true? The alternative hypothesis, denoted by H1, is the opposite of H0 and must be true when H0 is false.

Types of errors
• In testing a hypothesis, we are likely to commit two types of error:
• Type I error:
Type I error is the mistake of rejecting the null hypothesis when it is true. The symbol α (alpha) is used to represent the probability of a type I error.
• Type II error:
Type II error is the mistake of failing to reject the null hypothesis when it is false. The symbol β (beta) is used to represent the probability of a type II error.

Level of significance
The level of significance is the maximum probability of making a type I error, and it is denoted by α (i.e., the probability of rejecting H0 when it is true). It is a concept in the context of hypothesis testing.
• The level of significance is usually specified before the test procedure so that the results do not influence the decision.
• In practice, 5%, 1% or 10% is taken as the level of significance.
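To make α concrete, this sketch simulates many samples from a population where H0 is actually true (mean 0, SD 1, an assumption for illustration) and applies a two-sided z test at the 5% level. The share of wrong rejections, i.e. type I errors, should come out close to α. The sample size of 25 and the 4000 trials are arbitrary choices.

```python
import random
from statistics import NormalDist

rng = random.Random(7)                        # fixed seed for a repeatable run
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a two-sided test

def type_one_error_rate(n=25, trials=4000):
    """Sample repeatedly from a population where H0 (mean = 0, SD = 1) is TRUE
    and return the share of samples in which a z test wrongly rejects H0."""
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n - 0) / (1 / n ** 0.5)  # z = (x̄ - μ0) / (σ / √n)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(f"observed type I error rate = {rate:.3f} (alpha = {alpha})")
```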

TEST OF SIGNIFICANCE
Parametric tests: Statistical tests in which assumptions are made about the underlying (normal) distribution of the observed data.
Unpaired t test
Paired t test
Z test
ANOVA
Non-parametric tests: These are the counterparts of parametric tests, used to analyse data that do not fit a normal distribution. Many of them are based on the rank order of measurements rather than their values.
1) Sign test
2) McNemar test
3) Wilcoxon matched-pairs test (or signed rank test)
4) Rank sum tests
a) Mann-Whitney test (U test)
b) Kruskal-Wallis test (H test)
• Spearman's rank correlation test
• Kendall's coefficient of concordance
• Chi-square test

Student's 't' test:
A very common test used in biomedical research.
Applied to test the significance of the difference between two means.
It has the advantage that it can be used for small samples.
Types: Paired 't' test
Unpaired 't' test
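A from-scratch sketch of both forms of the t statistic. The pooled-variance formula for the unpaired test assumes roughly equal variances in the two groups; the paired test works on the differences (after minus before). The data and function names are hypothetical, and the resulting |t| would still be compared with the tabulated t value for the given degrees of freedom and level of significance.

```python
import math

def unpaired_t(a, b):
    """Student's t for two independent samples, using the pooled variance."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((x - m1) ** 2 for x in a)
    ss2 = sum((x - m2) ** 2 for x in b)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2          # t statistic, degrees of freedom

def paired_t(before, after):
    """Paired t: a one-sample t test on the within-subject differences."""
    d = [y - x for x, y in zip(before, after)]  # after minus before
    n = len(d)
    mean_d = sum(d) / n
    sd_d = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Hypothetical data: plaque scores in two independent groups, and
# the same 5 patients before and after an intervention.
print(unpaired_t([2.1, 2.5, 2.3, 2.8, 2.4], [1.8, 1.9, 2.2, 1.7, 2.0]))
print(paired_t([3.0, 2.8, 3.4, 3.1, 2.9], [2.5, 2.6, 2.9, 2.7, 2.4]))
```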
'Z' test: Used when the sample size is large (n > 30).
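For large samples, a z test of the difference between two means can be sketched from summary statistics alone, using the standard normal distribution for the two-sided p-value. The Hb figures and the helper name two_sample_z are hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z test for the difference between two means with large samples (n > 30)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # standard error of the difference
    z = (mean1 - mean2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))         # two-sided p-value
    return z, p

# Hypothetical summary data: mean Hb (gm%) in two groups of 50 subjects each.
z, p = two_sample_z(12.4, 1.2, 50, 11.8, 1.4, 50)
print(f"z = {z:.2f}, p = {p:.4f}")
```

If p is below the chosen level of significance (e.g. 0.05), H0 of equal means is rejected.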
ANOVA (Analysis of variance):
When comparison of more than two independent groups on a continuous outcome is required, we make use of ANOVA.
Types: One-way ANOVA
Two-way ANOVA
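A minimal sketch of one-way ANOVA: the F ratio compares the variance between group means with the variance within groups. The toy numbers are chosen so the result is easy to verify by hand; in practice the F value would then be compared with the tabulated F for the given degrees of freedom.

```python
def one_way_anova(*groups):
    """Return the F ratio and its degrees of freedom for k independent groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))   # (27.0, 2, 6)
```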

Chi-square test
• Chi-square is an important continuous probability distribution, first formulated by Helmert and then developed by Karl Pearson.
• The chi-square test is a non-parametric test, not based on any summary values of the population.
APPLICATIONS OF CHI-SQUARE:
1) Testing of goodness of fit
2) Testing of independence
3) Testing of homogeneity
• Testing of goodness of fit
In this, an observed frequency distribution is compared with a hypothetical or theoretical distribution to find out whether its pattern fits that distribution. Usually, using the observed quantities, an appropriate theoretical distribution is fitted and the expected frequencies are obtained. The conclusion is drawn by referring to the table of χ².
• Testing of independence
Used to test the association (or independence) of two variables.
• Testing of homogeneity (similarity)
Used to test the homogeneity or similarity between frequency distributions or groups.
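For the test of independence, the statistic is χ² = Σ (O - E)² / E, where each expected frequency E = row total × column total / grand total. The sketch below computes it for a hypothetical 2 × 2 table (the rows, columns and the helper name chi_square_independence are illustrative).

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical 2 x 2 table: rows = smokers / non-smokers,
# columns = periodontitis present / absent.
print(chi_square_independence([[10, 20], [20, 10]]))
```

Here χ² is about 6.67 with 1 degree of freedom, which exceeds the tabulated value of 3.84 at the 5% level, so independence would be rejected.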

CORRELATION &REGRESSION
• To find whether there is a significant association between two variables, we calculate the coefficient of correlation, represented by the symbol "r".
• $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
• The correlation coefficient r always lies between -1.0 and +1.0.
• Types of correlation:
• Perfect positive correlation:
The correlation coefficient r = +1, i.e. both variables rise or fall in the same proportion.
• Perfect negative correlation:
The correlation coefficient r = -1, i.e. the variables move in opposite directions: when one rises, the other falls in the same proportion.
• Moderately positive correlation: r lies between 0 and +1 (0 < r < 1).
• Moderately negative correlation: r lies between -1 and 0 (-1 < r < 0).
• Absolutely no correlation:
r = 0, indicating that no linear relationship exists between the two variables.
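The definition of r translates directly into code. The sketch below is a plain implementation (pearson_r is an illustrative name) with toy data chosen to show the two extremes, perfect positive and perfect negative correlation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # perfect positive: 1.0
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # perfect negative: -1.0
```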

Conclusion
• Statistics is central to most medical research.
• Knowledge of the basic principles of statistical methods equips medical and dental students to appreciate the utility and usefulness of statistics in medicine and other biosciences.
• Certain essential methods of biostatistics must be learnt to understand their application in the diagnosis, prognosis, prescription and management of diseases in individuals and the community.

