2. Learning Objectives
05/11/2025 Introduction to Biostatistics 2
• At the end of the session, you should be able to:
– Define statistics/biostatistics
– Define some terms in statistics/biostatistics
– Identify types of data/variables
– Explain the role of statistics in health sciences
– List the main uses of statistical methods in the broader
field of health care
3. Why Statistics?
• Variability and Uncertainty
– Much medical and public health research involves
considerable uncertainty
– Many characteristics vary from individual to
individual. For example, a habitual smoker may live to
be 90, while someone who never smoked may die at
age 30.
4. What is statistics?
• Statistics is the science of understanding data and making
decisions in the face of “variability” and “uncertainty”.
• Statistics: The field of study concerned with the collection, organization,
analysis, summarization and interpretation of data, and with the
drawing of inferences about a body of data when only part of
the data is observed.
• Biostatistics: The application of statistical methods to the fields
of biological and health sciences.
5. The field of statistics can be divided into two branches:
1. Mathematical (pure) statistics
The study and development of statistical theory and methods in the
abstract, including probability theory.
It serves as the theoretical reference for applied statisticians.
2. Applied statistics
The application of statistical methods to solve real problems involving
randomly generated data, and the development of new statistical
methodology motivated by real problems.
Descriptive statistics and the application of inferential statistics (predictive
statistics) together make up applied statistics.
6. Biostatistics
• It is the science that deals with the development
and application of the most appropriate
methods for:
– Collection of data
– Presentation of the collected data
– Analysis and interpretation of the results
– Making decisions on the basis of such analysis
7. Uses of Biostatistics:
• Assessment of health status
• Evaluation
• Resource allocation
• Monitoring vaccination uptake
• Estimating the magnitude of a disease/condition
• Assessing risk factors
• Making a diagnosis and choosing an appropriate
treatment
8. Role of statisticians
• To guide the design of an experiment or survey prior to data collection
• To analyze data using proper statistical procedures and techniques
• To present and interpret the results to researchers and other decision makers
9. Characteristics of statistical data
For numerical descriptions to be called statistics, they
must possess the following characteristics:
• They must be aggregates of facts.
• They must be affected to a marked extent by a
multiplicity of causes.
• They must be enumerated or estimated
according to a reasonable standard of accuracy.
• They must have been collected in a systematic manner.
10. Limitations of Statistics
• It deals only with those subjects of inquiry that
are capable of being quantitatively measured
and numerically expressed.
• It deals with aggregates of facts; no importance
is attached to individual items, so it is suited
only to studying group characteristics.
• Statistical data are only approximately, not
mathematically, correct.
11. How to properly use Biostatistics
• Develop an underlying question of interest
• Generate a hypothesis
• Design a study
• Collect data
• Analyze data
– Descriptive statistics
– Statistical inference
• Interpret the data and report the results
11/21/2014 Biostatistics course 11
12. Types of Statistics
1. Descriptive statistics:
• Ways of organizing and summarizing data
• Methods for identifying the important
features of a set of data and extracting useful
information
• Examples: tables, graphs, and numerical summary
measures
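The numerical summary measures mentioned above can be illustrated with a short Python sketch using the standard-library statistics module. The readings are made-up values, and summarize is a hypothetical helper written for this illustration, not part of any course material.

```python
import statistics

def summarize(values):
    """Return common numerical summary measures for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "max": max(values),
    }

# Hypothetical systolic blood pressure readings (mmHg), made up for illustration
sbp = [118, 122, 130, 125, 141, 119, 127]
print(summarize(sbp))
```

In practice these numbers would usually be reported alongside a table or graph of the same data.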
13. Uses of Descriptive Methods
• Used to:
– detect and correct errors in the data (mistakes, outliers)
– communicate the data effectively
– describe the data sample(s) (sample characteristics:
representative? comparable?)
– check statistical assumptions (for example, those
needed to test a hypothesis or make other statistical
inferences)
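As a sketch of how descriptive methods help detect mistakes and outliers, the code below flags values outside Tukey's fences (Q1 - 1.5 x IQR and Q3 + 1.5 x IQR). This rule is one common convention, assumed here for illustration; the data, including the probable data-entry error 1250, are made up, and iqr_outliers is a hypothetical helper.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical readings; 1250 looks like 125 typed with an extra zero
sbp = [118, 122, 130, 125, 141, 119, 127, 1250]
print(iqr_outliers(sbp))  # [1250]
```

A flagged value is only a candidate for checking against the source records; it should not be deleted automatically.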
14. Types of Statistics (continued)
2. Inferential statistics:
• Methods used for drawing conclusions about a population
value (parameter) based on the information contained in a
sample of observations (statistic) drawn from that population
• Examples: principles of probability, estimation, confidence
intervals, hypothesis testing, etc.
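Estimation and confidence intervals can be sketched for a population proportion (π) using the sample proportion (p̂). The code below uses the simple normal-approximation (Wald) interval, which is assumed here for illustration and is reasonable only for fairly large samples. The counts (40 cases out of 200) are hypothetical, and wald_ci is a hypothetical helper.

```python
import math
from statistics import NormalDist

def wald_ci(cases, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = cases / n                          # sample proportion (the statistic)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error of p_hat
    return p_hat, p_hat - z * se, p_hat + z * se

# Hypothetical survey: 40 of 200 sampled people have the condition
p_hat, low, high = wald_ci(40, 200)
print(f"p_hat = {p_hat:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

For small samples or proportions near 0 or 1, other interval methods (e.g., Wilson or exact intervals) are usually preferred.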
15. Role of Biostatistics in Assessment
– decide which information to gather,
– find patterns in collected data, and
– make the best summary description of the
population and associated problems
It may be necessary to
– design general surveys of the population needs,
– plan experiments to supplement these surveys, and
– assist scientists in estimating the extent of
health problems and associated risk factors.
16. Role of Biostatistics in Policy Setting
Develop mathematical tools to:
• measure the problems,
• prioritize the problems,
• quantify associations of risk factors with disease,
• predict the effect of policy changes, and
• estimate costs, including monetary and undesirable
side effects of preventive and curative measures.
17. Role of Biostatistics in Assurance
– use sampling and estimation methods to study the factors
related to compliance and outcome.
– decide whether improvement is due to compliance or something else,
how best to measure compliance, and how to increase the
compliance level in the target population.
– take into account possible inaccuracy in responses and
measurements, both intentional and unintentional.
Survey instruments should be designed to make it
possible to check for inaccuracies and to correct for
nonresponse and missing values.
18. Population and Sample
• Target population:
– A collection of items that have something in common, about
which we wish to draw conclusions at a particular time
– The whole group of interest
• Study (sampled) population:
– The subset of the target population that has at least some
chance of being sampled
– The specific population from which data are collected
19. Sample
• A subset of the study population about which information
is actually obtained
• The individuals who are actually measured and
make up the actual data
20. Sample
E.g.: In a study of the prevalence of HIV among adolescents
in Ethiopia, a random sample of adolescents in Lideta Kifle
Ketema of Addis Ababa (AA) was included.
• Target population: All adolescents in Ethiopia
• Study population: All adolescents in Addis Ababa
• Sample: Adolescents in Lideta Kifle Ketema who were
included in the study
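A simple random sample can be drawn from a sampling frame, i.e., a list of everyone in the study population. The sketch below assumes such a list exists; the ID codes, the population size of 5,000, and the sample size of 200 are hypothetical, and the actual design of the HIV study may have differed from this simple scheme.

```python
import random

# Hypothetical sampling frame: ID codes for every adolescent in the study population
study_population = [f"ADOL-{i:05d}" for i in range(1, 5001)]

rng = random.Random(2025)                     # fixed seed so the draw can be reproduced
sample = rng.sample(study_population, k=200)  # simple random sample, without replacement

print(len(sample), sample[:3])
```

Every member of the frame has the same chance of selection, which is what lets the sample support inferences about the study population.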
21. Population and Sample
• Role of statistics: using information from a sample to make
inferences about the population
[Figure: diagram linking Population, Sample and Information]
22. Statistic versus Parameter
Statistic (sample) | Parameter (population)
Sample mean (X̅) | Population mean (μ)
Sample proportion (p̂) | Population proportion (π)
Sample odds ratio (OR̂) | Population odds ratio (OR)
Difference between two sample means (X̅1 - X̅2) | Difference between two population means (μ1 - μ2)
Difference between two sample proportions (p̂1 - p̂2) | Difference between two population proportions (π1 - π2)
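The difference between a parameter and a statistic can be shown by simulation: the population mean (μ) is one fixed number, while the sample mean (X̅) changes with each sample drawn. The population below is hypothetical, generated from a normal distribution with mean 120 and standard deviation 15 purely for illustration; in real studies μ is usually unknown.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 10,000 systolic blood pressure values (mmHg)
population = [rng.gauss(120, 15) for _ in range(10_000)]
mu = statistics.mean(population)  # parameter: fixed, usually unknown in practice

sample = rng.sample(population, k=100)
x_bar = statistics.mean(sample)   # statistic: varies from sample to sample

print(f"population mean (mu) = {mu:.2f}, sample mean (x_bar) = {x_bar:.2f}")
```

Rerunning with a different seed gives a different X̅ but essentially the same μ, which is exactly why inferential methods are needed.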
23. Variable
• What is a variable?
– Any characteristic of an individual that is measured
(like blood pressure) or recorded (like age or sex), and
that can take different values for different individuals
or cases, is called a variable.
• Types of variables:
– Quantitative (discrete / continuous)
– Qualitative (nominal / ordinal)
24. Variable types:
Continuous (quantitative)
• Numerical values with equal intervals and a natural ordering
– Examples:
• Cholesterol level
• Number of drinks
• Day supply of drug
• Waist size
• BMD (bone mineral density)
(Strictly, number of drinks and day supply of drug are counts, i.e.,
discrete; counts with many possible values are often analyzed as continuous.)
Categorical
– Dichotomous (yes/no) (e.g., death, fracture, DM)
– Nominal (no order) (e.g., marital status, occupation)
– Ordinal (ordered rank) (e.g., disease severity)
25. Types of variables:
Categorical
– Binary: 2 categories
– Nominal: more categories
– Ordinal: more categories + order matters
Quantitative
– Discrete: numerical
– Continuous: numerical + uninterrupted
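The decision logic of the variable-type diagram on slide 25 can be written as a small function. This is a hypothetical helper for illustration only; its parameter names are assumptions, and real classification still depends on how a variable is actually recorded.

```python
def classify_variable(numeric, n_categories=None, ordered=False, has_gaps=False):
    """Hypothetical helper mirroring the slide 25 variable-type diagram.

    numeric: True if the values are numbers with magnitude (quantitative)
    n_categories: number of categories, for categorical variables
    ordered: True if the order of the categories matters
    has_gaps: True if only separate values, such as counts, are possible
    """
    if numeric:
        return "discrete" if has_gaps else "continuous"
    if n_categories == 2:
        return "binary"
    return "ordinal" if ordered else "nominal"

print(classify_variable(False, n_categories=2))                # death: yes/no
print(classify_variable(False, n_categories=4))                # blood type
print(classify_variable(False, n_categories=4, ordered=True))  # pain level
print(classify_variable(True, has_gaps=True))                  # number of missing teeth
print(classify_variable(True))                                 # weight in kg
```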
26. Categorical Variables
⚫Categorical variable: A variable or characteristic that
cannot be measured in quantitative form but can only
be sorted by name or category.
⚫It cannot be measured the way we measure height or
weight.
⚫The notion of magnitude is absent or only implicit.
27. Quantitative Variables
⚫Quantitative variable: A variable that can be
measured (or counted) and expressed numerically.
⚫E.g., height, weight, number of children.
⚫Has the notion of magnitude.
⚫Numerical or quantitative data can be continuous or
discrete.
28. 1. Discrete variable: It can take only separate, distinct values
(usually whole numbers).
⚫E.g., the number of pregnancies a mother has had in her life; she
cannot have had 2.5 pregnancies.
⚫Characterized by gaps or interruptions in the values (integers).
⚫Both the order and magnitude of the values matter.
⚫The values aren't just labels, but are actual measurable quantities.
• Integers that correspond to a count
• Can assume only whole numbers
• Examples
⚫Number of bacterial colonies on a plate
⚫Number of missing teeth
⚫Number of accidents in a time period
⚫Number of illnesses in a time period
⚫Counts like these are often modeled with the binomial and Poisson distributions.
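The binomial and Poisson distributions give probabilities for discrete counts. The sketch below writes out their standard probability formulas, P(X = k) = C(n, k) p^k (1 - p)^(n - k) for the binomial and P(X = k) = e^(-λ) λ^k / k! for the Poisson. The scenarios (10 patients with a 0.2 risk each; plates averaging 3 colonies) are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k): k successes out of n independent trials, each with probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k): k events when the average count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical: 10 patients, each with a 0.2 chance of an adverse event;
# probability that exactly 2 have the event
print(round(binomial_pmf(2, 10, 0.2), 4))
# Hypothetical: colonies per plate average 3; probability a plate has exactly 2
print(round(poisson_pmf(2, 3), 4))
```

Both functions accept only whole-number k, reflecting the gaps between possible values of a discrete variable.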
29. 2. Continuous variable: It can take an
infinite number of possible values in any given
interval.
⚫Both the magnitude and the order of the values matter.
• Can take any value within a defined range
• In practice, precision is limited only by the measuring instrument
⚫Does not have gaps or interruptions
• Examples: blood pressure, height, weight, time. Weight
is continuous since it can take on any value within its range
(e.g., 34.575 kg).
30. Scales of Measurement
A logical place to begin the discussion of descriptive
methods is to consider the various forms in which
medical data occur. Data analysis techniques that
are useful for some data may not be appropriate for
others.
Measurement scales differ according to the
degree of precision involved.
There are four scales of measurement: nominal,
ordinal, interval and ratio.
31. Scales of Measurement
1. Nominal scale: qualitative, categorical data
o There is no implied order to the categories of
nominal data.
o Individuals are simply placed in mutually exclusive
and collectively exhaustive categories, and the number
in each category is counted.
⚫ Uses names, labels, or symbols to assign each
measurement.
⚫ Examples: blood type, sex, race, marital status,
etc.
⚫ The mode, or modal group (the most frequent category), is the only
appropriate measure of centre for nominal data.
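For nominal data, counting individuals per category and finding the mode is all that is meaningful. The sketch below uses made-up blood types; statistics.multimode is used so that ties between categories are reported rather than hidden.

```python
import statistics
from collections import Counter

blood_types = ["O", "A", "O", "B", "AB", "O", "A"]  # hypothetical patients

counts = Counter(blood_types)             # number of individuals in each category
print(counts.most_common())
print(statistics.multimode(blood_types))  # ['O']
```

Averaging the labels would make no sense, which is why a mean is not defined for nominal data.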
32. Scales of Measurement
2. Ordinal scale: rank-ordered data
o Data are grouped in order from low to high, but we
cannot say how much lower or higher one category is than another.
o Examples:
– "low anxiety", "moderate anxiety" and "high
anxiety"
– Pain level: none, mild, moderate and severe
– Patient status, cancer stages, social class,
Likert scales, etc.
33. Scales of Measurement
3. Interval data: quantitative data
o There are fixed, equal intervals between numbers.
E.g.
⚫ the difference between 10 km and 15 km is the same as
the difference between 30 km and 35 km
⚫ in the Fahrenheit temperature scale, the difference
between 70 degrees and 71 degrees is the same as the
difference between 32 and 33 degrees.
o But the scale is not a ratio scale, because its zero point is arbitrary:
forty degrees Fahrenheit is not twice as hot
as 20 degrees Fahrenheit.
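The Fahrenheit example can be checked numerically by converting to kelvin, a temperature scale with a true zero (absolute zero). Ratios of Fahrenheit values are not meaningful, but equal Fahrenheit intervals stay equal after conversion. By contrast, for ratio data such as age, 40 / 20 = 2 really does mean "twice as old". fahrenheit_to_kelvin is a hypothetical helper using the standard conversion formula.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvin, a scale with a true zero."""
    return (f - 32) * 5 / 9 + 273.15

# On the Fahrenheit scale, 40 / 20 looks like "twice as much"
print(40 / 20)
# On a scale with a true zero, the ratio is only about 1.04
print(round(fahrenheit_to_kelvin(40) / fahrenheit_to_kelvin(20), 3))
# Equal intervals are preserved: each 1-degree F step is the same size in kelvin
print(fahrenheit_to_kelvin(71) - fahrenheit_to_kelvin(70))
print(fahrenheit_to_kelvin(33) - fahrenheit_to_kelvin(32))
```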
34. Scales of Measurement
4. Ratio level data
The data values in ratio data have meaningful
ratios; for example, age is ratio data: someone
who is 40 is twice as old as someone who is 20.
Both interval and ratio data involve
measurement. Most data analysis techniques
that apply to ratio data also apply to interval
data. Therefore, for most practical purposes, these
two types of data (interval and ratio) are grouped
together as metric data.
For interval or ratio data, the mean and …
35. Scales of Measurement
Ratio data:
Numerical discrete
Numerical discrete data occur when the observations are integers that
correspond with a count of some sort.
Some common examples are: the number of bacteria colonies on a plate, the
number of cells within a prescribed area upon microscopic examination, the
number of heart beats within a specified time interval, and a mother's number of
births (parity) and pregnancies (gravidity), etc.
Numerical Continuous
The scale with the greatest degree of quantification is a numerical continuous
scale.
Each observation theoretically falls somewhere along a continuum. One is not
restricted, in principle, to particular values such as the integers of the discrete
scale.
The restricting factor is the degree of accuracy of the measuring instrument.
Most clinical measurements, such as blood pressure, serum cholesterol level,
height, weight, age etc. are on a numerical continuous scale.
37. Scales of Measurement
• Nominal = Naming
• Ordinal = Naming + Order
• Interval = Naming + Order + Equal Intervals
• Ratio = Naming + Order + Equal Intervals + True
Zero
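To show why the scale of measurement matters for analysis, the sketch below picks a measure of centre according to the scale. The mode for nominal data follows slide 31; using the median for ordinal data and the mean for interval/ratio data is a standard convention assumed here, not something stated on these slides. centre is a hypothetical helper, and the data are made up.

```python
import statistics

def centre(values, scale, order=None):
    """Return a measure of centre suited to the scale of measurement.

    order: the category labels from lowest to highest (ordinal data only).
    """
    if scale == "nominal":
        return statistics.multimode(values)  # only the mode is meaningful
    if scale == "ordinal":
        rank = {label: i for i, label in enumerate(order)}
        # median_low returns an actual middle rank, never an average of two ranks
        middle = statistics.median_low([rank[v] for v in values])
        return order[middle]
    if scale in ("interval", "ratio"):
        return statistics.mean(values)  # equal intervals make the mean meaningful
    raise ValueError(f"unknown scale: {scale!r}")

print(centre(["A", "O", "O", "B"], "nominal"))  # ['O']
pain = ["mild", "none", "severe", "moderate", "mild"]
print(centre(pain, "ordinal", order=["none", "mild", "moderate", "severe"]))  # mild
print(centre([20, 40, 60], "ratio"))  # 40
```

Each step up the scale (naming, order, equal intervals, true zero) permits more summary measures, not fewer.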
38. Data
• Data are figures/numbers which can be
obtained from measurements or by counting
• The raw material for statistics
• Can be obtained from:
– Routinely kept records
– Surveys
– Counting
– Experiments
– Reports
39. Typical data sources:
• Survey/questionnaire
• Interviews
• Diaries
• Direct observation
• Environmental measurements
• Databases/registries
• Medical records
• Physiologic measures
• Biomarkers (e.g., DNA, sera)
• Imaging tests
• Pathology
Goal: choose the source that gives data closest
to the “gold standard” while being feasible to
collect
41. Types of data:
1. Primary data: data collected directly from the items or individual
respondents by the researcher for the
purpose of a specific study.
42. Methods of Collecting Primary Data
1. Direct personal investigation (i.e., interview method)
2. Indirect oral investigation (i.e., through enumerators)
3. Investigation through local reporters
4. Investigation through mailed questionnaire
5. Investigation through observation
43. 2. Secondary data: data that have already been collected and
statistically treated by other people or agencies, and whose
information is used for another purpose.
44. Methods of Collecting Secondary Data
• 1. Published sources
a) International publications
b) Government publications
c) Publication
d) Commercial research, educational
institutes, unions, organizations, etc.
• 2. Unpublished sources
45. Difference between Primary and Secondary Data
Primary data | Secondary data
Real-time data | Past data
Sure about the sources of data | Not sure about the sources of data
Helps to give results/findings | Helps in refining the problem
Costly and time-consuming process | Cheap and less time-consuming process
Helps avoid bias in response data | Cannot know whether the data are biased
More flexible | Less flexible
46. Sources of Data:
• We search for suitable data to serve as the raw
material for our investigation.
• Such data are available from one or more of the
following sources:
– Routinely kept records
– External sources
– Survey
– Experiment
47. Practice problem 1: data types
• Smoker (current, former, no)
• CHD onset (yes or no)
• Family history of CHD (yes or no)
• Non-smoker, light smoker, moderate smoker, heavy smoker
• BMI (kg/m²)
• Age (years)
• Current weight
• Weight at age 18
Classify each variable as binary, nominal, ordinal, discrete or
continuous.