INTRODUCTION TO BIOSTATISTICS
Dr. Higenyi Emmanuel (PhD)
SCOPE
Part 1 Introduction
• Definitions
• Importance of statistics
• Application of biostatistics
• Statistical notations
• Types of data
• Variables
• Sources of data
• Data presentation
• Data summarization
• Sampling
• Probability
Part 2 Basic statistical data analysis
• t-test
• z-test
• Binomial test
• Chi-square test
• Fisher exact test
• Correlation
• Simple linear regression
PART 1: DEFINITIONS
Statistics
• The study and manipulation of data, including ways to gather, review,
analyze, and draw conclusions from data.
• The two major areas of statistics are descriptive and inferential statistics.
• Statistics can be communicated at different levels, ranging from non-numerical
descriptors (nominal level) to numerical values measured relative to a true zero
point (ratio level).
• Several sampling techniques can be used to compile statistical data,
including simple random, systematic, stratified, or cluster sampling.
• Statistics are present in almost every department of every company and are
an integral part of investing.
PART 1: DEFINITIONS
Biostatistics or biometry
• Branch of biological science concerned with the study and methods for
collecting, presenting, analysing and interpreting biological research
data.
• The primary aim of this branch of science is to allow researchers, health
care providers and public health administrators to make decisions
concerning a population using sample data.
• For example, the government wants to know the prevalence of a specific
health problem among residents in a given town. If there are 3 million
residents in the town it may not be realistic to test them individually and
determine whether they have the disease or are susceptible to it.
PART 1: DEFINITIONS
Biostatistics or biometry
• The realistic and cost-effective approach is to study a representative subset
of the population and apply their results to the entire group.
• Hence biostatistics makes research possible by providing tools and
techniques for collecting, analysing and interpreting biological and medical
data, allowing stakeholders to draw actionable insights about a population
from sample data.
• Biostatisticians usually get their data from a wide range of sources,
including medical records, peer-reviewed literature, claims records, vital
records, disease registries, surveillance, experiments and surveys.
• The professionals collaborate with scientists, health care providers, public
health administrators and other stakeholders.
PART 1: DEFINITIONS
Biostatistics or biometry sources of data
• Medical records: Medical records can provide researchers with data about
diagnoses, lab tests and procedures common amongst a specific population,
such as people above 50 years working in the police force.
• Claims data: Scientists can get data about doctor's appointments and medical
bills in claims data.
• Vital records: Vital records contain information about births, deaths, causes of
death and divorces.
• Peer-reviewed literature: Researchers can also pull data from the articles and
studies that experts in a particular field published in peer-reviewed journals.
• Surveys: The researchers can collect primary data using surveys designed
specifically for an experiment.
• Disease registries: These systems help to collect, store, analyse, retrieve and
disseminate information regarding people living with a specific disease.
Part 1 Types of statistics:
PART 1: DESCRIPTIVE STATISTICS
Descriptive statistics
• Mostly focus on the central tendency, variability, and distribution of sample data.
• Central tendency is an estimate of the typical value of a characteristic in a
sample or population. It includes descriptive statistics such as the mean, median,
and mode.
• Variability refers to a set of statistics that show how much difference there is
among the elements of a sample or population along the characteristics
measured. It includes metrics such as range, variance, and standard deviation.
• The distribution refers to the overall “shape” of the data, which can be depicted
on a chart such as a histogram or a dot plot, and includes properties such as the
probability distribution function, skewness, and kurtosis
CENTRAL TENDENCY, VARIABILITY
60, 65, 66, 68, 70, 70, 70, 70, 70, 71, 72, 75, 80, 81, 82, 83, 85, 86, 88, 90
Sum=1502
Mean =1502/20=75.1
Mode =70
Median =71.5
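The figures above can be reproduced with a short Python sketch. It uses only the standard library `statistics` module and the 20 values shown on the slide; the variable names are illustrative.

```python
import statistics

# Data set from the slide (20 observations, already sorted).
values = [60, 65, 66, 68, 70, 70, 70, 70, 70, 71,
          72, 75, 80, 81, 82, 83, 85, 86, 88, 90]

total = sum(values)                  # sum of all observations
mean = total / len(values)           # arithmetic mean
median = statistics.median(values)   # average of the 10th and 11th values (even n)
mode = statistics.mode(values)       # most frequent value

print(total, mean, median, mode)
```

Because the data set has an even number of elements, the median is the mean of the two middle values (71 and 72).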
CENTRAL TENDENCY, VARIABILITY
60, 65, 66, 68, 70, 70, 70, 70, 70, 71, 72, 75, 80, 81, 82, 83, 85, 86, 88, 90
Mean=75.1, SS=1394, SD = SQRT OF (SS/20) = 8.3
(The differences in the table are taken from the mean rounded to 75.)
Element              1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Value               60  65  66  68  70  70  70  70  70  71  72  75  80  81  82  83  85  86  88  90
Mean (rounded)      75  75  75  75  75  75  75  75  75  75  75  75  75  75  75  75  75  75  75  75
Difference         -15 -10  -9  -7  -5  -5  -5  -5  -5  -4  -3   0   5   6   7   8  10  11  13  15
Squared difference 225 100  81  49  25  25  25  25  25  16   9   0  25  36  49  64 100 121 169 225
FORMULA FOR SD
SD = SQRT OF [SUM OF (each number - the mean)^2 / the number of elements in the data set]
In symbols: $SD = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$
SD
1. Calculate the mean
2. Subtract the mean from each element individually
3. Square each of the differences
4. Get the sum of the squared differences
5. Divide the sum of the squared differences by the number of elements
6. Take the square root of the quotient; the result is the SD
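The six steps can be sketched in Python as follows. This is a minimal illustration using the slide's data set; it divides by the number of elements (the population formula shown above), whereas many packages divide by n - 1 for a sample SD.

```python
import math
import statistics

values = [60, 65, 66, 68, 70, 70, 70, 70, 70, 71,
          72, 75, 80, 81, 82, 83, 85, 86, 88, 90]

n = len(values)
mean = sum(values) / n                       # step 1: the mean
differences = [x - mean for x in values]     # step 2: subtract the mean
squared = [d ** 2 for d in differences]      # step 3: square the differences
ss = sum(squared)                            # step 4: sum of squares (SS)
variance = ss / n                            # step 5: divide by the number of elements
sd = math.sqrt(variance)                     # step 6: square root gives the SD

print(round(ss, 1), round(sd, 1))

# Cross-check against the standard library's population SD.
assert abs(sd - statistics.pstdev(values)) < 1e-9
```

With the unrounded mean of 75.1 the sum of squares is 1393.8 rather than the 1394 obtained from the mean rounded to 75; both give an SD of about 8.3.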
DESCRIPTIVE STATISTICS
Central tendency
• Mean
• Median
• Mode
Variability
• Range
• Variance
• SD
Shape or distribution
• Skewness and kurtosis
Relative frequencies/proportions
Graphs, charts and tables
PART 1: DESCRIPTIVE STATISTICS
Descriptive statistics
• Can also describe differences between observed
characteristics of the elements of a data set.
• Can help us understand the collective properties of the
elements of a data sample and form the basis for testing
hypotheses and making predictions using inferential statistics
• Useful in summarizing data
• Can be in form of numbers, tables or graphs
Part 1: Descriptive statistics
PART 1: INFERENTIAL STATISTICS
Inferential statistics
• Is a tool that statisticians use to draw conclusions about the characteristics of a
population, drawn from the characteristics of a sample, and to determine how
certain they can be of the reliability of those conclusions.
• Based on the sample size and distribution, statisticians can calculate the
probability that statistics, which measure the central tendency, variability,
distribution, and relationships between characteristics within a data sample,
provide an accurate picture of the corresponding parameters of the whole
population from which the sample is drawn.
• Are used to make generalizations about large groups, such as estimating
average demand for a product by surveying a sample of consumers’ buying
habits or attempting to predict future events.
Part 1: Inferential Statistics
Part 1:
FACTORS ASSOCIATED WITH MEN’S INVOLVEMENT IN
ANTENATAL CARE VISITS IN ASMARA, ERITREA: COMMUNITY-
BASED SURVEY
The necessity for a pregnant woman to attend ANC was recognized by almost all
(98.7%) of the male partners; however, only 26.6% identified the minimum frequency of
ANC visits.
The percentage of partners who visited an ANC service during the last pregnancy was
88.6%. The percentages of male partners who scored at or above the mean for
knowledge, attitude and involvement in ANC were 57.0%, 57.5% and 58.7%, respectively.
Religion (p = 0.006, AOR = 1.91, 95% CI 1.20–3.03), level of education (p =
0.027, AOR = 1.96, 95% CI 1.08–3.57), and level of knowledge (p<0.001, AOR =
3.80, 95% CI 2.46–5.87) were significantly associated with male involvement in
ANC.
METHODS USED
A list of households with pregnant women was prepared for each administration area
and was used as the sampling frame.
A community-based cross-sectional survey was conducted using a two-stage sampling
technique to select 605 eligible respondents in Asmara in 2019.
Data were collected using a pretested structured questionnaire.
The Chi-square test was used to identify factors associated with male
involvement in ANC.
Multivariable logistic regression was employed to determine the factors associated
with male participation in ANC.
A P-value less than 0.05 was considered statistically significant.
USE-CASE INFORMATION NEEDED
Define target population
State the type of statistics you expect (descriptive, inferential, or both)
State the possible sources of data
PART 1: INFERENTIAL TESTS
Inferential tests
• Inferential statistical tests use selected sample data to draw conclusions
about, or make comparisons with, population data.
• There are two main bodies of these tests.
• The first and most frequently used are called parametric statistical tests.
• The second are called nonparametric tests.
• For each parametric test, there may be a comparable nonparametric test,
sometimes even two or three.
• Parametric tests are tests of significance appropriate when the data
represent an interval or ratio scale of measurement
PART 1: INFERENTIAL TESTS
Parametric tests
• Tests of significance appropriate when the data represent an interval or ratio
scale of measurement and other specific assumptions have been met, specifically,
that the sample statistics relate to the population parameters, that the variance of
the sample relates to the variance of the population, that the population has
normality, and that the data are statistically independent.
Nonparametric tests
• Statistical tests used when the data represent a nominal or ordinal level scale or
when assumptions required for parametric tests cannot be met, specifically, small
sample sizes, biased samples, an inability to determine the relationship between
sample and population, and unequal variances between the sample and
population. These are a class of tests that do not rely on the assumption of normality.
PART 1: DATA TYPES
Data types
• Qualitative: dichotomous or multinomial
• Quantitative: discrete or continuous
Biostatistics notes for Masters in Public Health
ILLUSTRATION OF QUALITATIVE AND
QUANTITATIVE DATA
Objective: to assess the nutritional status and to determine potential risk factors of malnutrition
in children under 3 years of age in Nghean, Vietnam.
The study was carried out in November 2007. A total of 383 child/mother pairs were
selected using a 2-stage cluster sampling methodology. A structured questionnaire
was administered to mothers in their home settings.
Nutritional status was defined from anthropometric measurements as underweight (weight for age),
wasting (weight for height) and stunting (height for age) on the basis of reference
data from the National Center for Health Statistics (NCHS) / World Health
Organization (WHO).
ILLUSTRATION OF QUALITATIVE AND QUANTITATIVE
DATA
Logistic regression analysis was used to take into account the hierarchical relationships between potential
determinants of malnutrition.
The mean Z-score for weight-for-age was -1.51 (95% CI -1.64, -1.38), for height-for-age was
-1.51 (95% CI -1.65, -1.37) and for weight-for-height was -0.63 (95% CI -0.78, -0.48). Of the
children, 103 (27.7%) were underweight, 135 (36.3%) were stunted and 38 (10.2%) were wasted.
Region of residence, ethnicity, mother’s occupation, household size, mother’s BMI, number of children in
family, weight at birth, time of initiation of breast-feeding and duration of exclusive breast-feeding
were found to be significantly related to malnutrition.
The findings of this study indicate that malnutrition is still an important problem among children
under three years of age in Nghean, Vietnam. Socio-economic factors, environmental factors and feeding
practices are significant risk factors for malnutrition among children under three.
PART 1: COMMON STATISTICAL TERMS
Binomial test
• When a test has two alternative outcomes, either failure or success, and you
know the probability of success, you may apply a binomial test.
• Use a binomial test to determine whether an observed outcome differs
from the expected outcome.
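As an illustration, an exact two-sided binomial test can be sketched with the standard library alone. The figures used here (15 successes in 20 trials against an expected success probability of 0.5) are hypothetical, and the two-sided rule used (summing the probabilities of all outcomes no more likely than the observed one) is one common convention, not the only one.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_test_two_sided(successes, n, p):
    """Exact two-sided p-value: total probability of outcomes at most as likely as the observed one."""
    observed = binomial_pmf(successes, n, p)
    return sum(
        prob
        for prob in (binomial_pmf(k, n, p) for k in range(n + 1))
        if prob <= observed * (1 + 1e-9)   # small tolerance for floating-point ties
    )

# Hypothetical example: 15 successes in 20 trials, expected probability 0.5.
p_value = binomial_test_two_sided(15, 20, 0.5)
print(round(p_value, 4))
```

A p-value below the chosen significance level (commonly 0.05) suggests the observed proportion differs from the expected one.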
Causation
• Causation is a direct relationship between two variables.
• Two variables have a direct relationship if a change in one’s value causes a
change in the other variable.
• In that case, one becomes the cause, and the other is the effect.
PART 1: COMMON STATISTICAL TERMS
Confidence interval
• A confidence interval expresses the uncertainty around an estimate calculated
from sample data.
• It is a range that, at a specified level of confidence (for example 95%), is expected
to contain the true population value if the same experiment were repeated many times.
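A rough sketch of a 95% confidence interval for a mean is given below, using the 20-value data set from the descriptive statistics slides. It assumes a normal approximation (multiplier of about 1.96); for a sample this small a t-distribution multiplier would normally be preferred, so treat the result as illustrative only.

```python
import math
import statistics

values = [60, 65, 66, 68, 70, 70, 70, 70, 70, 71,
          72, 75, 80, 81, 82, 83, 85, 86, 88, 90]

n = len(values)
mean = statistics.mean(values)
# Standard error of the mean: sample SD (n - 1 denominator) divided by sqrt(n).
sem = statistics.stdev(values) / math.sqrt(n)
# Two-sided 95% multiplier from the standard normal distribution (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

lower, upper = mean - z * sem, mean + z * sem
print(round(lower, 2), round(upper, 2))
```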
Correlation coefficient
• The correlation coefficient describes the strength and direction of the
relationship between two variables.
• This value is a number between -1 and +1; if it falls outside this range,
there has been an error in calculating the coefficient.
PART 1: COMMON STATISTICAL TERMS
Z-score:
• A score expressed in units of standard deviations from
the mean. It is also known as a standard score.
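For example, a z-score can be computed as below. The mean (75.1) and SD (8.3) are the values from the descriptive statistics example; the function name is illustrative.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, expressed in standard-deviation units."""
    return (x - mean) / sd

# Highest and lowest observations from the earlier data set.
z_high = z_score(90, 75.1, 8.3)   # positive: above the mean
z_low = z_score(60, 75.1, 8.3)    # negative: below the mean

print(round(z_high, 2), round(z_low, 2))
```

Both observations lie a little under two standard deviations from the mean, one above and one below.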
Z-test:
• A test of any of a number of hypotheses in inferential
statistics that has validity if sample sizes are sufficiently
large and the underlying data are normally distributed.
PART 1: COMMON STATISTICAL TERMS
Hypothesis tests
• A hypothesis test is a method of testing results. Before conducting research, the researcher creates a
hypothesis or a theory for what they believe the results will prove.
• A study then tests that theory.
Kruskal-Wallis one-way analysis of variance:
• A nonparametric inferential statistic used to compare two or more independent groups for statistical
significance of differences.
Mann-Whitney U-test (U):
• A nonparametric inferential statistic used to determine whether two uncorrelated groups differ
significantly.
McNemar’s test:
• A nonparametric method used on nominal data to determine whether the row and column marginal
frequencies are equal.
PART 1: COMMON STATISTICAL TERMS
Dependent variable
• A dependent variable is a variable whose value changes in response to another variable.
• In statistical analysis, dependent variables are used to draw conclusions about the causes of
events and about changes observed in a study.
Independent variable
• In a statistical experiment, an independent variable is one that you modify, control or manipulate in order to
investigate its effects.
• It's called independent since no other factor in the research affects it.
Multivariate analysis of covariance (MANCOVA):
• An extension of ANOVA that incorporates two or more dependent variables in the same analysis. It is an
extension of MANOVA where artificial dependent variables (DVs) are initially adjusted for differences in one or
more covariates. It computes the multivariate F statistic.
Multivariate analysis of variance (MANOVA):
• It is an ANOVA with several dependent variables.
PART 1: COMMON STATISTICAL TERMS
One-way analysis of variance (ANOVA):
• An extension of the independent group t-test where you have more than two groups. It computes the
difference in means both between and within groups and compares the variability between groups with the
variability within groups. Its parametric test statistic is the F-test.
Pearson correlation coefficient (r):
• This is a measure of the correlation or linear relationship between two variables x and y, giving a value
between +1 and −1 inclusive.
• It is widely used in the sciences as a measure of the strength of linear dependence between two
variables.
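A minimal sketch of the calculation follows. The x and y values are invented for illustration; r is the sum of the products of the deviations divided by the square root of the product of the two sums of squared deviations, and squaring it gives the coefficient of determination.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sum_xy = sum(a * b for a, b in zip(dx, dy))   # co-variation of x and y
    sum_xx = sum(a * a for a in dx)               # variation of x
    sum_yy = sum(b * b for b in dy)               # variation of y
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2   # share of variance potentially explained

print(round(r, 3), round(r_squared, 2))
```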
Pooled point estimate:
• An approximation of a point, usually a mean or variance, that combines information from two or more
independent samples believed to have the same characteristics.
• It is used to assess the effects of treatment samples versus comparative samples
PART 1: COMMON STATISTICAL TERMS
Standard deviation
• The standard deviation is the square root of the variance. It indicates
how far, on average, individual results deviate from the mean.
Standard error of the mean
• The standard error of the mean estimates how far a sample's mean is likely to deviate from the
population mean. You can find the standard error of the mean if you divide the standard
deviation by the square root of the sample size.
Range
• The range is the difference between the lowest and highest values in a collection of data.
Quartile and quintile
• Quartiles are the values that divide ordered data into four equal parts, while quintiles divide it into
five equal parts.
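The range, quartiles and quintiles of the earlier 20-value data set can be obtained as sketched below. `statistics.quantiles` returns the cut points between the parts (three for quartiles, four for quintiles); note that software packages use slightly different interpolation rules, so the outer cut points may differ between tools.

```python
import statistics

values = [60, 65, 66, 68, 70, 70, 70, 70, 70, 71,
          72, 75, 80, 81, 82, 83, 85, 86, 88, 90]

data_range = max(values) - min(values)          # highest minus lowest value
quartiles = statistics.quantiles(values, n=4)   # cut points Q1, Q2 (median), Q3
quintiles = statistics.quantiles(values, n=5)   # four cut points, five parts

print(data_range, quartiles, quintiles)
```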
PART 1: COMMON STATISTICAL TERMS
Pearson correlation coefficient
• Pearson's correlation coefficient is a statistic that measures the strength of the linear relationship between
two continuous variables.
• Since it is based on covariance, it is widely regarded as the best approach to quantify the relationship among
variables of interest.
Median
• The median refers to the middle point of data.
• Typically, if you have a data set with an odd number of items, the median appears directly in the middle of
the numbers.
• When computing the median of a set of data with an even number of items, you can calculate the simple
mean between the two middle-most values to achieve the median.
Mode
• Mode refers to the value in a data set that occurs most often. If none of the values repeat,
there’s no mode in that data set.
PART 1: COMMON STATISTICAL TERMS
Statistical inference
• Statistical inference occurs when you use sample data to generate an inference or conclusion.
Statistical inference can include regression, confidence intervals or hypothesis tests.
Statistical power
• Statistical power is the probability that a study detects a statistically significant effect in a
sample, provided the effect is present in the entire population. A high-powered statistical test is likely
to reject the null hypothesis when it is false.
Runs test:
• Applies where measurements are made according to some well-defined ordering, in either time or space.
• A frequent question is whether or not the average value of the measurement is different at
different points in the sequence. This nonparametric test provides a means of answering it.
PART 1: COMMON STATISTICAL TERMS
T-score
• A t-score in a t-distribution refers to the number of estimated standard errors a sample
statistic is away from the hypothesised value.
Z-score
• A z-score, also known as a standard score, is a measurement of the distance between the
mean and data point of a variable. You can measure it in standard deviation units.
Z-test
• A z-test is a test that determines if two populations' means are different. To use a z-test, you
need to know the population variances and have a large sample size.
Sign test:
• A test that can be used whenever an experiment is conducted to compare a treatment with a
control on a number of matched pairs, provided the two treatments are assigned to the
members of each pair at random.
PART 1: COMMON STATISTICAL TERMS
Student t-test
• Student's t-test is a hypothesis test for the mean of a small sample drawn from a bell-curve
(normal) population whose standard deviation is unknown. Variants cover correlated means, correlation,
independent proportions or independent means.
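As a hedged illustration of the one-sample case, the t statistic is the difference between the sample mean and a hypothesised mean, divided by the estimated standard error. The data are the 20 values from the descriptive statistics example and the hypothesised mean of 70 is invented for illustration; converting the statistic to a p-value requires a t-distribution table or a statistics package, which is not shown here.

```python
import math
import statistics

values = [60, 65, 66, 68, 70, 70, 70, 70, 70, 71,
          72, 75, 80, 81, 82, 83, 85, 86, 88, 90]
hypothesised_mean = 70   # hypothetical null-hypothesis value

n = len(values)
sample_mean = statistics.mean(values)
sample_sd = statistics.stdev(values)       # sample SD, n - 1 denominator
standard_error = sample_sd / math.sqrt(n)

t_statistic = (sample_mean - hypothesised_mean) / standard_error
degrees_of_freedom = n - 1

print(round(t_statistic, 2), degrees_of_freedom)
```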
T-distribution
• The t-distribution describes the standardized deviation of the mean of a sample from
the mean of the population when the population standard deviation is unknown and the data originate
from a bell-curve population.
Standard error of the mean (SEM):
• An estimate of the amount by which an obtained mean may be expected to differ by chance
from the true mean. It is an indication of how well the mean of a sample estimates the mean of a
population
PART 1: COMMON STATISTICAL TERMS
Variance (SD²):
• A measure of the dispersion of a set of data points around their mean value.
• It is a mathematical expectation of the average squared deviations from the mean.
Analysis of covariance (ANCOVA):
• A statistical technique for equating groups on one or more variables when testing for statistical
significance using the F-test statistic.
• It adjusts scores on a dependent variable for initial differences on other variables, such as
pretest performance or IQ.
Analysis of variance (ANOVA):
• A statistical technique for determining the statistical significance of differences among
means; it can be used wit
PART 1: COMMON STATISTICAL TERMS
Effect size
• Effect size is a statistical term that quantifies the degree of a relationship between
two given variables. For example, we can learn about the effect of therapy on
anxiety patients.
• The effect size aims to determine whether the therapy is highly successful or mildly
successful.
Measures of variability
• Measures of variability, also referred to as measures of dispersion, denote how
scattered or dispersed a data set is.
• Four main measures of variability are the interquartile range, range, standard
deviation and variance.
PART 1: COMMON STATISTICAL TERMS
Median test
• A median test is a nonparametric test of whether two independent groups have
the same median.
• It follows the null hypothesis that each of the two groups maintains the same median.
Population
• Population refers to the entire group you’re studying, such as a certain
demographic. A sample is a subset of the population.
Parameter
• A parameter is a numerical value that describes a characteristic of a population.
• It’s the unknown value of a population on which you conduct research to learn more.
PART 1: COMMON STATISTICAL TERMS
Post hoc test
• Researchers perform a post hoc test only after they’ve discovered a statistically
significant finding and need to identify where the differences actually originated.
Probability density
• The probability density describes how likely a continuous variable is to take
values within a given range.
Random variable
• A random variable is a variable whose value is determined by the outcome of a random process.
• It can be discrete or continuous with any value given in a range.
PART 1: COMMON STATISTICAL TERMS
Chi-square (χ²):
• A nonparametric test of statistical significance appropriate when the data are in the form of
frequency counts; it compares frequencies actually observed in a study with expected
frequencies to see if they are significantly different.
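The comparison of observed and expected frequencies can be sketched for a 2x2 table as follows. The counts are hypothetical, no continuity correction is applied, and the p-value shortcut used here is valid only for one degree of freedom.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = exposed / not exposed, columns = outcome present / absent.
chi2, p_value = chi_square_2x2([[30, 10], [20, 40]])
print(round(chi2, 2), p_value < 0.05)
```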
Coefficient of determination (r²):
• The square of the correlation coefficient (r), it indicates the degree of relationship strength
by potentially explained variance between two variables.
Cohen’s d:
• A standardized way of measuring the effect size or difference by comparing two means by
a simple math formula. It can be used to accompany the reporting of a t-test or ANOVA
result and is often used in meta-analysis.
• The conventional benchmark scores for the magnitude of effect sizes are as follows: small, d
= 0.2; medium, d = 0.5; large, d = 0.8
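One common form of the calculation divides the difference between two group means by their pooled standard deviation. The sketch below assumes that form; the two groups are invented for illustration.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation of the two groups."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)   # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical scores for a treatment group and a control group.
treatment = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]

d = cohens_d(treatment, control)
print(round(d, 2))
```

Against the benchmarks above, a value of d greater than 0.8 would be read as a large effect.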
PART 1: COMMON STATISTICAL TERMS
Cronbach’s alpha coefficient (α):
• A coefficient of consistency that measures how well a set of variables or items measures a single,
unidimensional, latent construct in a scale or inventory.
• Alpha scores are conventionally interpreted as follows: high, 0.90; medium, 0.70 to 0.89; and
low, 0.55 to 0.69
F-test (F):
• A parametric statistical test of the equality of the means of two or more samples. It compares the
means and variances between and within groups over time. It is also called analysis of variance
(ANOVA)
Tukey’s test of significance:
• A single-step multiple comparison procedure and statistical test generally used in conjunction with
an ANOVA to find which means are significantly different from one another.
• Named after John Tukey, it compares all possible pairs of means and is based on a studentized
range distribution q (this distribution is similar to the distribution of t from the t-test).
PART 1: COMMON STATISTICAL TERMS
Fisher’s exact test:
• A nonparametric statistical significance test used in the analysis of contingency tables where
sample sizes are small.
• The test is useful for categorical data that result from classifying objects in two different
ways; it is used to examine the significance of the association (contingency) between two
kinds of classifications
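For a 2x2 table, the test can be sketched with the hypergeometric distribution: with the row and column totals fixed, the probability of each possible table is computed exactly, and the two-sided p-value sums the probabilities of all tables no more likely than the observed one. The counts below are hypothetical and the function name is illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lowest = max(0, col1 - row2)    # smallest feasible top-left count
    highest = min(row1, col1)       # largest feasible top-left count
    observed = prob(a)
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= observed * (1 + 1e-9)   # tolerance for floating-point ties
    )

# Hypothetical small-sample table.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```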
Wald-Wolfowitz test:
• A nonparametric statistical test used to test the hypothesis that a series of numbers is
random. It is also known as the runs test for randomness
Wilcoxon signed-rank test (W+):
• A nonparametric statistical hypothesis test for the case of two related samples or repeated
measurements on a single sample. It can be used as an alternative to the paired Student’s t-
test when the population cannot be assumed to be normally distributed.
PART 1: COMMON STATISTICAL TERMS
Independent t-test:
• A statistical procedure for comparing measurements of mean scores in two
different groups or samples.
• It is also called the independent samples t-test.
Kendall’s tau:
• A nonparametric statistic used to measure the degree of correspondence
between two rankings and to assess the significance of the correspondence.
Kolmogorov-Smirnov (K-S) test:
• A nonparametric goodness-of-fit test used to decide if a sample comes
from a population with a specific distribution.
• The test is based on the empirical cumulative distribution function (ECDF).
PART 1: APPLICATION OF BIOSTATISTICS
1. Clinical Trials
• One of the most impactful applications of biostatistics is in the design and analysis of clinical
trials.
• Biostatisticians ensure the validity and reliability of trial results, which helps researchers assess
the safety and efficacy of new drugs and treatments.
• Using various statistical methods, we analyze patient data to draw conclusions that will help in
making medical decisions.
2. Epidemiology
• In the field of epidemiology, biostatistics aids in studying the distribution and determinants of
diseases within populations.
• Biostatisticians use different statistical models to analyze patterns, identify risk factors, and
assess the impact of interventions.
• This information is crucial for public health planning and for developing disease prevention
strategies.
PART 1: APPLICATION OF BIOSTATISTICS
3. Genetics and Genomics
• Biostatistics is indispensable in the analysis of genetic and genomic data.
• Researchers use statistical methods to identify genes associated with specific diseases,
understand the heritability of these genes, and figure out complex genetic interactions.
• This application of biostatistics is instrumental in advancing our understanding of
the genetic basis of various medical conditions.
4. Public Health Policy
• Biostatistics contributes significantly to the formulation and evaluation of public health
policies.
• By analyzing health data, biostatisticians can assess the effectiveness of interventions,
evaluate health disparities, and guide policymakers in making informed decisions to
improve public health outcomes.
PART 1: APPLICATION OF BIOSTATISTICS
5. Environmental Health
• Biostatistics is applied in environmental health studies to analyze the impact of environmental factors on
human health. Whether it is assessing the effects of air quality on respiratory diseases or studying the
correlation between water contaminants and health outcomes, biostatistics helps decode the complex
relationships in environmental health research.
6. Bioinformatics
• In the era of big data, biostatistics plays a crucial role in bioinformatics, where vast amounts of
biological data are analyzed to extract meaningful patterns. Biostatisticians develop statistical methods
and algorithms to interpret data from genomics, proteomics, and other ‘omics’ technologies. The results
are visible in the form of advancements in personalized medicine and drug discovery.
7. Quality Control in Healthcare
• Biostatistics is also employed in quality control processes within healthcare systems. It ensures the accuracy
and reliability of medical tests, monitors healthcare processes, and helps identify areas for improvement.
This application is vital for maintaining high standards of patient care
PART 1: APPLICATION OF BIOSTATISTICS
8. Create population-based interventions
• Researchers can use biometric techniques to assess the impact of a
health programme on the target population.
• With biometric techniques, researchers can use insights from data to:
• Measure the performance of public health interventions
• Boost immunisation rates
• Increase the number of patients attending post-surgery appointments
• Improve training and supervision of health care professionals
PART 1: APPLICATION OF BIOSTATISTICS
9. Create population-based interventions
• Biometrics can also help researchers, health care providers and public health
administrators to create population-based health interventions based on the results
of biostatistical data analysis and interpretation.
These data insights can be used to:
• Identify populations that require interventions to reduce their exposure to specific
health problems
• Identify areas susceptible to high risk of certain diseases
• Identify the factors influencing the high cases of health disparities within a
population
• Identify members of a population that require the highest level of health care
PART 1: APPLICATION OF BIOSTATISTICS
10. Control epidemics
• Biostatistical techniques can also help public health officials, health care practitioners and
epidemiologists to control epidemics.
• Researchers not only use statistical analysis to understand how diseases spread, but they can also
use it to determine the mortality rate amongst specific populations.
• It can also help health care professionals determine the most at-risk members of the population and
create a framework for formulating strategies to stop the spread of such diseases.
11. Identify barriers to care
• Researchers and health care professionals can use biostatistical methods to learn about the barriers
preventing people from getting access to quality care.
• Researchers use surveys to identify the factors that limit access to health care. Medical records,
interviews and claims records can show patient perceptions about health care services, providing
insights to make such services more accessible and acceptable to target populations for higher
efficiency.
PART 1: APPLICATION OF BIOSTATISTICS
12. Study demography
• Demography is the statistical study of the human population.
• The field uses statistical techniques to describe births, deaths, income, disease disparity and other
structural changes in human populations.
• Using census data, surveys and statistical models, biostatisticians can analyse the structure, size and
movement of populations, providing insights for government agencies, health care administrators, town
planners and other stakeholders to create and adjust their plans based on the dynamics of the population
13. Derive conclusions about populations from samples
• One major importance of biostatistical methods is that they help researchers derive far-reaching
conclusions about a population from samples.
• Due to several factors, such as finances, size and time constraints, it's not always possible for researchers
to collect data about an entire population when testing assumptions about them.
• Biostatistical methods provide researchers and administrators with the tools they require to select a
sample that's representative of the population, choose the right independent and dependent variables
and derive logical conclusions from the data
PART 1: APPLICATION OF BIOSTATISTICS
14. Check drug efficacy
• In the medical and pharmaceutical fields, biostatistical research is used to check the
efficacy and effectiveness of treatments during clinical trials.
• Researchers can also use it to find possible side effects of drugs.
• These methods are ideal for conducting drug treatment trials and performing other
experiments to understand the impact of different medications and medical devices on
the human body
15. Perform genetics studies
• Biostatistics is an important discipline in the study of Mendelian genetics.
• Geneticists use it to study the inheritance patterns of genes.
• They also use it to study the genetic structure of a population.
• Researchers also use biometry to map chromosomes and understand the behaviour of
genes in a population.
PART 1: APPLICATION OF BIOSTATISTICS
16. Other applications
• Determining leading causes of death
and burden of disease
• Health status of the population
• Morbidity patterns
PART 1: APPLICATIONS OF BIOSTATISTICS
Predictive modelling
• In public health, predictive modeling is a pivotal aspect of biostatistics.
• This statistical process utilizes existing data to forecast future events, uncovering
patterns and trends.
• It is applied in epidemiology to screen for individuals prone to specific diseases.
• For instance, in breast cancer, factors such as age, race and family history
are analyzed to gauge the risk.
• Predictive modeling plays a crucial role in preventing breast cancer-related deaths
by identifying individuals who may need preventive or treatment measures.
• Beyond cancer and pandemics, this approach extends to various public health
concerns, showcasing its versatility in foreseeing and addressing health challenges.
PART 1: APPLICATIONS OF BIOSTATISTICS
Decision-making
• Healthcare leaders
• Researchers
• Policymakers.
Operational Viability:
• Biostatistics provides the necessary data to assess the operational feasibility of new ideas and initiatives.
• It helps in making informed decisions about acquisitions, tool prototypes, and hiring strategies, setting the parameters for
project scopes and methodologies.
Guarding Against Bias:
• Biostatistical studies undergo rigorous examination to detect and eliminate bias.
• Public health’s commitment to equitability ensures that data collection processes are designed to be fair and objective,
preventing unfair conclusions.
Protecting Data Subjects:
• Biostatistical researchers prioritize the protection of data subjects. Personal information collected for public health
research is anonymized and safeguarded, addressing privacy concerns and mitigating risks associated with unsecured
data.
PART 1: DATA CLASSIFICATION
The main objectives of Classification of Data are
as follows:
• Explain similarities and differences of data
• Simplify and condense the mass of data
• Facilitate comparisons
• Study the relationship
• Prepare data for tabular presentation
• Present a mental picture of the data
PART 1: DATA CLASSIFICATION
There are different types of data classification,
depending on the characteristics.
• Structured and Unstructured
• Primary or Secondary
• Qualitative and Quantitative.
• Number of variables: Univariate, Bivariate, Multivariate
Classifying data is an important step to ensure proper
analysis
PART 1: DATA CLASSIFICATION
Univariate data
• This type of data consists of only one variable.
• The analysis of univariate data is thus the simplest form of
analysis since the information deals with only one quantity that
changes.
• It does not deal with causes or relationships and the main
purpose of the analysis is to describe the data and find
patterns that exist within it.
• An example of univariate data is height.
PART 1: DATA CLASSIFICATION
Bivariate data
• This type of data involves two different variables.
• The analysis of this type of data deals with causes and relationships and is done to find
out the relationship between the two variables.
• An example of bivariate data is temperature and ice cream sales in the summer season.
• Bivariate data analysis involves comparisons, relationships, causes and explanations.
PART 1: DATA CLASSIFICATION
Multivariate data
• When the data involves three or more variables, it is categorized under
multivariate.
• For example, suppose an advertiser wants to compare
the popularity of four advertisements on a website. Their click rates
could be measured for both men and women, and the relationships between
variables can then be examined.
• It is similar to bivariate data but involves more than two variables, and often more than one dependent variable.
• The ways to perform analysis on this data depend on the goals to be
achieved.
• Some of the techniques are regression analysis, path analysis, factor
analysis and multivariate analysis of variance (MANOVA)
DATA ANALYSIS
Choice of method
• Size
• Complexity
• Number of variables
• Nature of variables
• Study objectives
• Research questions
• Hypothesis
PART 1: DATA ANALYSIS
Descriptive analysis
• Suitable for analyzing and presenting data, such as
mean and median.
Inferential
• To establish a functional relationship between variables,
more advanced analytical techniques, such as
correlation and regression, are used.
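A minimal Python sketch of the two kinds of analysis is given below. The height and weight values are made up for illustration, and the helper names `pearson_r` and `least_squares` are hypothetical; they implement the standard Pearson correlation and simple least-squares formulas.

```python
from statistics import mean, median

# Hypothetical paired measurements (made-up values for illustration only).
heights_cm = [150, 160, 165, 170, 180]
weights_kg = [50, 58, 63, 68, 76]

# Descriptive analysis: summarise one variable.
mean_height = mean(heights_cm)
median_height = median(heights_cm)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def least_squares(x, y):
    """Slope and intercept of the simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


# Relationship between two variables: correlation and regression.
r = pearson_r(heights_cm, weights_kg)
slope, intercept = least_squares(heights_cm, weights_kg)

print("mean:", mean_height, "median:", median_height)
print(f"r = {r:.3f}, weight = {slope:.2f} * height + ({intercept:.1f})")
```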
PART 1: DATA INTERPRETATION
Data interpretation
• Involves inferring conclusions from the results of
data analysis.
• This exercise allows researchers to categorise,
manipulate and summarise their findings to
answer important questions in public health,
biology and medicine.
PART 1: PRIMARY AND SECONDARY DATA
Definition
• Primary data are the original data derived from research endeavors
and collected through methods such as direct observation, indirect
observation, interviews and questionnaires.
• Secondary data are data derived from primary data. Sources
include published reports, journal articles and newspapers.
• Often, the distinction between primary and secondary data may be
less than clear.
• In conducting research, both types of data are collected and created.
• It is essential to have a plan for the management of all types of data
and primary materials.
PART 1: SOURCES OF EPIDEMIOLOGICAL DATA
Epidemiologists use primary and secondary data sources to calculate
rates and conduct studies.
• Primary data is the original data collected for a specific purpose by or for an
investigator. For example, an epidemiologist may collect primary data by interviewing
people who became ill after eating at a restaurant in order to identify which specific
foods were consumed.
• Collecting primary data is expensive and time-consuming, and it usually is
undertaken only when secondary data is not available.
• Secondary data is data collected for another purpose by other individuals or
organizations.
• Examples of sources of secondary data that are commonly used in epidemiological
studies include birth and death certificates, population census records, patient medical
records, disease registries, insurance claim forms and billing records, public
health department case reports, and surveys of individuals and households
PART 1: PRIMARY AND SECONDARY DATA
Primary Materials Primary Data Secondary Data
Interview schedules Interview audio recordings
Surveys
Experiments
Nvivo interview transcripts
Purchased laboratory
reagents
Investigational product Product analyses
Research animals Tissue samples Stained slides
Validated questionnaires Completed paper and
pencil questionnaires
SPSS data files containing
raw data and calculated
variable summary scores
PART 1: PRIMARY AND SECONDARY DATA
BASIS FOR COMPARISON | PRIMARY DATA | SECONDARY DATA
Meaning | First-hand data gathered by the researcher himself or herself. | Data collected by someone else earlier.
Data | Real-time data | Past data
Process | Very involved | Quick and easy
Source | Surveys, observations, experiments, questionnaires, personal interviews, etc. | Government publications, websites, books, journal articles, internal records, etc.
Cost effectiveness | Expensive | Economical
Collection time | Long | Short
Specific | Always specific to the researcher's needs. | May or may not be specific to the researcher's needs.
Available in | Crude form | Refined form
Accuracy and reliability | More | Relatively less
Part 1: RAW DATA
PART 1: ELEMENTS, OBSERVATIONS, VARIABLES, DATA
Element
• Entities or units on which data are collected such as person, place, or object
Observation
• Set of measurements or observations related to a particular element
Variable
• Character or attribute of interest on a particular element and which takes on different values
Total number of data values
• The number of elements times the number of variables
Data
• Is a specific measurement of a variable – it is the value you record in your data sheet.
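These terms can be made concrete with a small, hypothetical data sheet. In the Python sketch below each dictionary is one observation (the set of measurements for one element), each key is a variable, and the patient values are invented for illustration.

```python
# Hypothetical data sheet: one dictionary per element (here, a patient).
records = [
    {"age_years": 34, "weight_kg": 70.5, "smoker": "No"},
    {"age_years": 51, "weight_kg": 82.0, "smoker": "Yes"},
    {"age_years": 27, "weight_kg": 64.3, "smoker": "No"},
]

n_elements = len(records)        # number of elements (patients)
variables = list(records[0])     # names of the variables recorded
n_variables = len(variables)

# Total number of data values = elements x variables.
total_data_values = n_elements * n_variables

print(variables)
print(total_data_values)
```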
PART 1: QUANTITATIVE AND QUALITATIVE
VARIABLES
Data is generally divided into two categories:
• Quantitative data represents amounts
• Qualitative or Categorical data represents groupings
• A variable that contains quantitative data is
a quantitative variable;
• A variable that contains categorical data is
a categorical variable.
PART 1: QUANTITATIVE AND QUALITATIVE
VARIABLES
Quantitative variables
• The numbers recorded represent real amounts
that can be added, subtracted, divided, etc.
• There are two types of quantitative variables:
• Discrete and continuous.
DATA TAXONOMY
Structured
Qualitative
Nominal
Ordinal
Quantitative
Discrete
Continuous
Unstructured
Text
Digital
Analogue
Image
Digital
Analogue
Indicate whether each of the following variables is discrete or continuous:
• the time it takes for you to get to school
• the number of Canadian couples who were married last year
• the number of goals scored by a women’s hockey team
• the speed of a bicycle
• your age
• the number of subjects your school offered last year
• the length of time of a telephone call
• the annual income of an individual
• the distance between your house and school
• the number of pages in a dictionary
PART 1: QUANTITATIVE VARIABLES
Type of variable | What does the data represent? | Examples
Discrete variables (aka integer variables) | Counts of individual items or values. | Number of students in a class; number of different tree species in a forest
Continuous variables (aka ratio variables) | Measurements of continuous or non-finite values. | Distance; volume; age
PART 1: QUALITATIVE VARIABLES
Qualitative variables
• Categorical variables represent groupings of some kind.
• They are sometimes recorded as numbers, but the numbers represent
categories rather than actual amounts of things.
• There are three types of categorical variables:
• Binary, nominal, and ordinal variables.
• Sometimes a variable can work as more than one type
• An ordinal variable can also be used as a quantitative variable if the scale is
numeric and doesn’t need to be kept as discrete integers.
• For example, star ratings on product reviews are ordinal (1 to 5 stars), but
the average star rating is quantitative.
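The star-rating example can be illustrated as follows. The ratings are made up; the point of the sketch is that the median and mode treat the ratings as ordered categories, while the mean treats them as quantities.

```python
from statistics import mean, median, mode

# Hypothetical star ratings (1 to 5) from seven product reviews.
stars = [5, 4, 4, 3, 5, 1, 4]

median_rating = median(stars)   # uses only the order of the ratings
modal_rating = mode(stars)      # most frequent category
average_rating = mean(stars)    # treats the stars as quantities

print(median_rating, modal_rating, round(average_rating, 2))
```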
PART 1: QUALITATIVE VARIABLES
Type of variable | What does the data represent? | Examples
Binary variables (aka dichotomous variables) | Yes or no outcomes. | Heads/tails in a coin flip; win/lose in a football game
Nominal variables | Groups with no rank or order between them. | Species names; colors; brands
Ordinal variables | Groups that are ranked in a specific order. | Finishing place in a race; rating scale responses in a survey, such as Likert scales
PART 1: INDEPENDENT AND DEPENDENT
VARIABLES
Independent vs dependent variables
• Experiments are usually designed to find out what effect one
variable has on another for instance the effect of salt addition on
plant growth.
• The independent variable (the one you think might be the cause) is
manipulated and then the dependent variable (the one you think
might be the effect) is measured to find out what this effect might
be.
• There are variables that you hold constant (control variables) in
order to focus on your experimental treatment.
PART 1: INDEPENDENT AND DEPENDENT VARIABLES
Independent vs dependent vs control variables
Type of variable | Definition | Example (salt tolerance experiment)
Independent variables (aka treatment variables) | Variables you manipulate in order to affect the outcome of an experiment. | The amount of salt added to each plant’s water.
Dependent variables (aka response variables) | Variables that represent the outcome of the experiment. | Any measurement of plant health and growth: in this case, plant height and wilting.
Control variables | Variables that are held constant throughout the experiment. | The temperature and light in the room the plants are kept in, and the volume of water given to each plant.
OTHER COMMON TYPES OF VARIABLES
Other types
• Defining the independent and dependent variables and determining whether they are
categorical or quantitative enables choice of the correct statistical test.
Type of variable | Definition | Example (salt tolerance experiment)
Confounding variables | A variable that hides the true effect of another variable in an experiment. This can happen when another variable is closely related to a variable you are interested in, but you haven’t controlled it in your experiment. Be careful with these, because confounding variables run a high risk of introducing a variety of research biases to your work, particularly omitted variable bias. | Pot size and soil type might affect plant survival as much as or more than salt additions. In an experiment you would control these potential confounders by holding them constant.
Latent variables | A variable that can’t be directly measured, but that you represent via a proxy. | Salt tolerance in plants cannot be measured directly, but can be inferred from measurements of plant health in our salt-addition experiment.
Composite variables | A variable that is made by combining multiple variables in an experiment. These variables are created when you analyze data, not when you measure it. | The three plant health variables could be combined into a single plant-health score to make it easier to present your findings.
PART 1: VARIABLES IN RESEARCH
No Variable Type Measurement Scale Categories
1 Age (years) Independent Interval -
2 Weight (kg) Independent Interval -
3 Serum creatinine (μmol/L) Independent Interval -
4 Blood cholesterol (mmol/L) Independent Interval -
5 Serum triglyceride (mmol/L) Independent Interval -
6 Blood uric acid (μmol/L) Independent Interval -
7 Fasting blood glucose (mmol/L) Independent Interval -
8 Systolic blood pressure (mmHg) Independent Interval -
9 Diastolic blood pressure (mmHg) Independent Interval -
10 Hemoglobin (g/L) Independent Interval -
11 Hematocrit Independent Interval -
PART 1: VARIABLES IN RESEARCH
No Variable Type Measurement Scale Categories
12 BMI Independent Interval -
13 ≥High school education Independent Nominal Above/Under
14 Health insurance coverage Independent Nominal Yes/No
15 Smoking Independent Nominal Yes/No
16 History of CKD Independent Nominal Yes/No
17 Family history of diabetes Background Nominal Yes/No
18 Family history of hypertension Background Nominal Yes/No
19 Family history of CKD Background Nominal Yes/No
20 Repeated respiratory tract infection Background Nominal Yes/No
21 Nephrotoxic medications Independent Nominal Yes/No
22 Obesity Independent Nominal Yes/No
PART 1: VARIABLES IN RESEARCH
No Variable Type Measurement Scale Categories
23 Central obesity Independent Nominal Yes/No
24 Metabolic syndrome Independent Nominal Yes/No
25 Hypertension Independent Nominal Yes/No
26 Diabetes Independent Nominal Yes/No
27 Hyperlipidemia Independent Nominal Yes/No
28 Hyperuricemia Independent Nominal Yes/No
29 Cardiovascular disease Independent Nominal Yes/No
30 eGFR <60 mL/min/1.73 m² Independent Nominal Yes (<60)/No (≥60)
31 ACR >30 mg/g Independent Nominal Yes (>30)/No (≤30)
32 Hematuria Independent Nominal Yes/No
33 CKD Status Dependent Nominal Yes/No
PART 1: DISCUSS THE CATEGORIZATION OF THE
FOLLOWING VARIABLES
Number of all hospital discharges
Acute care hospital discharges per 100
Number of acute care hospital discharges
Inpatient surgical procedures per year per 100 000
Total number of inpatient surgical procedures per
year
Average length of hospital stay
Bed occupancy rate (%)
Outpatient contacts per person per year
Autopsy rate (%) for hospital deaths
Inpatient care discharges per 100
Turnover rate
Outpatient/inpatient ratio
Number of surgeries
Number of deliveries
Number of x-rays/scans
Number of lab tests
Number of beds per capita
Number of
PART 1: QUALITATIVE RESEARCH METHODS
Method | Overall Purpose | Advantages | Challenges
Surveys | Quickly and/or easily get lots of information from people in a non-threatening way | Can complete anonymously; inexpensive to administer; easy to compare and analyze; can administer to many people; can get lots of data; many sample questionnaires already exist | Might not get careful feedback; wording can bias client's responses; impersonal; may need sampling expert; doesn't get full story
Interviews | Understand someone's impressions or experiences; learn more about answers to questionnaires | Get full range and depth of information; develops relationship with client; can be flexible with client | Can take time; can be hard to analyze and compare; can be costly; interviewer can bias client's responses
Observation | Gather firsthand information about people, events, or programs | View operations of a program as they are actually occurring; can adapt to events as they occur | Can be difficult to interpret seen behaviors; can be complex to categorize observations; can influence behaviors of program participants; can be expensive
PART 1: QUALITATIVE RESEARCH METHODS
Method | Overall Purpose | Advantages | Challenges
Focus Groups | Explore a topic in depth through group discussion | Quickly and reliably get common impressions; can be an efficient way to get much range and depth of information in a short time; can convey key information about programs | Can be hard to analyze responses; need good facilitator for safety and closure; difficult to schedule 6-8 people together
Case Studies | Understand an experience or conduct comprehensive examination through cross comparison of cases | Depicts client's experience in program input, process and results; powerful means to portray program to outsiders | Usually time consuming to collect, organize and describe; represents depth of information, rather than breadth
FACTORIALS
These provide a compact way of writing large numbers.
This is similar to mathematics, where we can use number bases to write numbers in different forms.
For instance, 10 in base 2 = 1010
100 in base 2 = 1100100
10! = 10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1 = 3,628,800
PART 1: FACTORIALS
The factorial of a whole number 'n' is defined as the product of that number with
every whole number less than 'n' down to 1.
For example, the factorial of 4 is 4 × 3 × 2 × 1, which is equal to 24. It is
represented using the symbol '!', so the factorial of 4 is written 4!.
5 factorial, that is, 5! can be written as: 5! = 5 × 4 × 3 × 2 × 1 = 120.
The formulas for n factorial are:
n! = n × (n - 1) × (n - 2) × … × 3 × 2 × 1
n! = n × (n - 1)!
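The recursive formula n! = n × (n - 1)! translates directly into code. The sketch below is one possible implementation (the function name `factorial` is our own), checked against Python's built-in `math.factorial`.

```python
import math


def factorial(n):
    """Return n! using the recursive definition n! = n * (n - 1)!, with 0! = 1."""
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    if n == 0:
        return 1
    return n * factorial(n - 1)


for n in (0, 4, 5, 10):
    # Both columns should agree for every n.
    print(n, factorial(n), math.factorial(n))
```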
PART 1: FACTORIALS
5! = 5 × 4 × 3 × 2 × 1 = 120
6! = 6 × 5 × 4 × 3 × 2 × 1 = 720
0! = 1
PART 1: FACTORIALS
PERMUTATION AND COMBINATION
Both are about arranging numbers or objects and changing their position.
For instance, the digits 1, 2 and 3 can be arranged in 3! = 6 different ways:
123
132
321
312
213
231
The digits 1 and 2 can be arranged in 2! = 2 different ways:
12
21
A single digit, 1, can be arranged in only 1! = 1 way:
1
A set {1 2 3 4 5 6 7 8 9}
Permutation
• 123
• Where n = 3 and r = 3
• nPr = 3P3 = n!/(n - r)! = 3!/(3 - 3)! = 3!/0! = (3 × 2 × 1)/1 = 6/1 = 6
Combination
nCr
Where n = 3 and r = 3
nCr = n!/[(n - r)! × r!] = 3!/[(3 - 3)! × 3!] = (3 × 2 × 1)/[0! × (3 × 2 × 1)] = 6/6 = 1
PREMIER LEAGUE EXAMPLE
There is a set of 20 unique clubs
They play in pairs, meaning 2 at a time
That is, r = 2
and n = 20
For permutation, the order matters (home and away), so the number of
ordered pairs = nPr = 20P2 = 20!/(20 - 2)! = 20!/18! = 20 × 19 = 380
PART 1: USE OF FACTORIAL
Use of Factorial
• One area where factorials are widely used is in permutations &
combinations.
• Permutation is an ordered arrangement of outcomes and it can be
calculated with the formula: n Pr= n! / (n - r)!
• Combination is a grouping of outcomes in which order does not
matter. It can be calculated with the formula: nCr = n! / [ (n - r)! r!]
• In both of these formulas, 'n' is the total number of things available
and 'r' is the number of things that have to be chosen.
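Both formulas can be written directly in terms of factorials. The helper names `n_p_r` and `n_c_r` below are our own; the results are compared with `math.perm` and `math.comb`, which are available in Python 3.8 and later.

```python
from math import comb, factorial, perm


def n_p_r(n, r):
    """Permutations: nPr = n! / (n - r)!"""
    return factorial(n) // factorial(n - r)


def n_c_r(n, r):
    """Combinations: nCr = n! / [(n - r)! * r!]"""
    return factorial(n) // (factorial(n - r) * factorial(r))


print(n_p_r(3, 3), n_c_r(3, 3))    # all three elements of a 3-element set
print(n_p_r(20, 2), n_c_r(20, 2))  # pairs chosen from 20 elements
```

Integer division (`//`) is used because both results are always whole numbers.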
PART 1: FACTORIAL
Set {1, 2, 3} has three elements, meaning n = 3
• Permutation, i.e. the order of the elements in the subset matters
and makes a difference: 1,2; 1,3; 2,3; 2,1; 3,1; 3,2. The number of
ordered subsets of two is equal to 6
• nPr = n!/(n - r)!, where n is the total number of elements in the mother set
or population and r is the number of elements in the subset.
• For example, if we had 20 elements in the mother set and we are
picking two at a time, then by permutation we proceed as follows:
20P2
• Therefore 20P2 = 20!/(20 - 2)! = 20!/18! = 20 × 19 = 380
PART 1: FACTORIAL
Set {1, 2, 3} has three elements, meaning n = 3
• Combination, i.e. the order of the elements in the subset does not
matter and makes no difference: 1,2; 1,3; 2,3. The number of subsets
of two is equal to 3
• nCr = n!/[(n - r)! × r!], where n is the total number of elements in the mother
set or population and r is the number of elements in the subset.
• For example, if we had 20 elements in the mother set and we are
picking two at a time, then by combination we proceed as follows:
20C2
• Therefore 20C2 = 20!/[(20 - 2)! × 2!] = 20!/(18! × 2!) = (20 × 19)/2 = 190
PART 1: PERMUTATION-THE ORDER MATTERS
{1, 2, 3} arranged in pairs using permutation, i.e. the order is respected and
matters
1,2; 1,3; 2,3; 2,1; 3,1; 3,2 = 6 pairs
{1, 2, 3, 4}
1,2; 1,3; 1,4; 2,3; 2,4; 3,4; 2,1; 3,1; 4,1; 3,2; 4,2; 4,3 = 12 pairs
In the Premier League we have 20 clubs
nPr = n!/(n - r)! = 20P2 = 20!/(20 - 2)! = 20!/18! = 380
PART 1: COMBINATION –ORDER DOES NOT
MATTER
{1, 2, 3}
1,2; 2,3; 1,3 = 3 pairs
nCr = n!/[(n - r)! × r!]
3C2 = 3!/[(3 - 2)! × 2!]
= 3!/[1! × 2!] = 6/2 = 3
For the Premier League with 20 clubs, if the games were one way only, then the number
of games would be 20C2 = 20!/(18! × 2!) = 190
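The pair listings above can be checked by enumeration with Python's `itertools`. In this sketch the clubs are simply labelled 1 to 20 as placeholders; no real club names are assumed.

```python
from itertools import combinations, permutations

items = [1, 2, 3]

# Order matters: (1, 2) and (2, 1) are different pairs.
ordered_pairs = list(permutations(items, 2))
# Order does not matter: (1, 2) and (2, 1) are the same pair.
unordered_pairs = list(combinations(items, 2))

print(ordered_pairs)
print(unordered_pairs)

clubs = range(1, 21)  # 20 clubs with placeholder labels 1..20

# Home-and-away fixtures: each ordered pair is a separate game.
home_away_fixtures = sum(1 for _ in permutations(clubs, 2))
# One-way fixtures: each unordered pair plays once.
one_way_fixtures = sum(1 for _ in combinations(clubs, 2))

print(home_away_fixtures, one_way_fixtures)
```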
PART 1: COMBINATIONS
nCr = n!/[(n - r)! × r!], where r is the number of elements we pick for arrangement
at a time
30C5 = 30!/[(30 - 5)! × 5!]
= 30!/[25! × 5!] = 142,506
PART 1: USE OF FACTORIALS
Example 1: How many 5-digit numbers can be formed using the digits 1, 2, 5, 7, and
8 in each of which no digit is repeated?
Solution:
The given 5 digits (1, 2, 5, 7 and 8) should be arranged among themselves in order
to get all possible 5-digit numbers.
The number of ways for doing this can be done by calculating the 5 factorial.
5! = 5 × 4 × 3 × 2 × 1 = 120
Answer: Therefore, the required number of 5-digit numbers is 120.
PART 1: USE OF FACTORIAL
Example 2: In a group of 10 people, $200, $100, and $50 prizes are to be given. In how many
ways can the prizes be distributed?
Solution:
This is permutation because here the order of distribution of prizes matters. It can be calculated
as 10P3 ways.
10P3 = (10!) / (10 - 3)! = 10! / 7! = (10 × 9 × 8 × 7!) / 7! = 10 × 9 × 8 = 720 ways.
Example 3: Three $50 prizes are to be distributed to a group of 10 people. In how many ways
can the prizes be distributed?
Solution:
This is a combination because here the order of distribution of prizes does not matter (because all
prizes are of the same worth). It can be calculated using 10C3.
10C3 = (10!) / [ 3! (10 - 3)!] = 10! / (3! 7!) = (10 × 9 × 8 × 7!) / [(3 × 2 × 1) 7!] = 120 ways.
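The three worked examples can be verified numerically. The sketch below uses `math.perm` and `math.comb` (Python 3.8 and later) and, for Example 1, also counts the arrangements by brute force; the variable names are our own.

```python
from itertools import permutations
from math import comb, factorial, perm

# Example 1: 5-digit numbers from the digits 1, 2, 5, 7, 8 without repetition.
by_formula = factorial(5)
by_enumeration = len({"".join(p) for p in permutations("12578")})

# Example 2: three different prizes among 10 people (order matters).
different_prizes = perm(10, 3)

# Example 3: three identical prizes among 10 people (order does not matter).
identical_prizes = comb(10, 3)

print(by_formula, by_enumeration, different_prizes, identical_prizes)
```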
PERMUTATION AND COMBINATION
Difference between Permutation and Combination
Permutation | Combination
The different ways of arranging a set of objects into a sequential order are termed permutations. | One of the several ways of choosing items from a large set of objects, without considering order, is termed a combination.
The order is very relevant. | The order is irrelevant.
It denotes the arrangement of objects. | It does not denote the arrangement of objects.
Multiple permutations can be derived from a single combination. | From a single permutation, only a single combination can be derived.
They can simply be defined as ordered elements. | They can simply be defined as unordered sets.
PART 1: UNIVARIATE AND BIVARIATE DATA
Univariate data
• Data on one variable
• Examples include height, skin colour, ethnicity, service coverage
Bivariate data
• Data where two variables are being compared for correlation or
causation
• Correlation =height and body weight; age and body weight
• Causation such as obesity and heart disease
PART 1: UNIVARIATE AND BIVARIATE DATA
Univariate analysis
• Summary statistics
• Central tendency
• Dispersion
• Frequency distribution
• Bar charts
• Histogram
• Pie chart
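A univariate summary of this kind might look like the following in Python. The height values are invented for illustration, and the sample standard deviation is used as the measure of dispersion.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical heights (cm) of five people: a single variable.
heights_cm = [150, 160, 160, 170, 180]

# Central tendency
mean_height = mean(heights_cm)
median_height = median(heights_cm)

# Dispersion
height_range = max(heights_cm) - min(heights_cm)
height_sd = stdev(heights_cm)  # sample standard deviation

# Frequency distribution: how often each value occurs
frequency = Counter(heights_cm)

print(mean_height, median_height, height_range, round(height_sd, 2))
for value, count in sorted(frequency.items()):
    print(value, "#" * count)  # crude text bar chart
```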
PRACTICE QUESTIONS
1. Explain why a sample statistic (the estimate from the sample) may differ from the
population parameter (the true value) and how you would minimize the difference.
2. A local coffee shop is creating a spreadsheet of their drinks for customers to view
on their website. The spreadsheet includes the calories, sugar content, and
ingredients for each coffee drink. Which of the following would be considered a
variable in this data set?
Answers:
• The Calories
• The Customers
• The Coffee Shop
• The Coffee Drink
What are the other variables in the passage?
PRACTICE QUESTIONS
1. A political pollster is conducting a survey about voters' affiliation with a major
political party. He selects a random sample of voters who voted in the last
presidential election, and looks into how party affiliation differs based on age,
race, gender and location. How many variables can you identify in this data set?
Answers:
A. 5
B. 6
C. 4
D. 7
PART 1: SCALES OF MEASUREMENT
Rationale
• In order to analyze data, the variables have to be defined and categorized using
different scales of measurements.
• There are four scales of measurements- nominal scale, ordinal scale, interval scale,
and ratio scale.
• The scale of measurement of a variable determines the kind of statistical test to be
used.
• Psychologist Stanley Stevens developed the four common scales of measurement:
nominal, ordinal, interval and ratio.
• 1. Nominal scale
• 2. Ordinal scale
• 3. Interval scale
• 4. Ratio scale
PART 1: SCALES OF MEASUREMENT
Properties and scales of measurement
• Each scale of measurement has properties that determine how to properly analyse the data.
• The properties evaluated are identity, magnitude, equal intervals and a minimum value of
zero.
Properties of Measurement
• Identity: Identity refers to each value having a unique meaning.
• Magnitude: Magnitude means that the values have an ordered relationship to one another, so
there is a specific order to the variables.
• Equal intervals: Equal intervals mean that the distances between adjacent points along the scale are equal, so the
difference between data points one and two will be the same as the difference between data
points five and six.
• A minimum value of zero: A minimum value of zero means the scale has a true zero point.
Temperature in degrees, for example, can fall below zero and still have meaning, so it has no true zero.
Weight does: a weight of zero means there is nothing to weigh.
PART 1: STATISTICAL LEVELS OF
MEASUREMENT
Nominal-level Measurement
• There’s no numerical or quantitative value, and
qualities are not ranked.
• Nominal-level measurements are instead simply
labels or categories assigned to other variables.
• It’s easiest to think of nominal-level measurements
as non-numerical facts about a variable.
SCALES OF MEASUREMENT
Nominal scale
• Also known as categorical variable scale, can be defined as a scale used for
labelling variables into different categories.
• Numbers are used only to identify and classify people, objects or events, like
identity numbers, jersey numbers of sportspersons, and vehicle registration
numbers; thus, they have no specific numerical value or meaning.
• In research, the nominal scale is used for analysing categorical variables such
as gender, place of residence, marital status, political party, blood group
and so on.
• The interval between numbers and their order does not matter on the
nominal scale
SCALES OF MEASUREMENT
Nominal scale:
• A nominal scale preserves only the equality property; there is no
‘more or less than’ relation in this measurement.
• The nominal scale of measurement defines the identity property
of data.
• This scale has certain characteristics, but doesn’t have any form
of numerical meaning.
• The data can be placed into categories but can’t be multiplied,
divided, added or subtracted from one another.
• It’s also not possible to measure the difference between data
points
SCALES OF MEASUREMENT
Nominal scale:
• The statistical analysis that can be performed on a nominal scale is the
frequency distribution and percentage.
• It can be analyzed graphically using a bar chart or a pie chart. If there are two
categorical variables, quantitative analysis techniques such as joint frequency
distribution and cross-tabulation can be used.
• Mode is the only measure of central tendency which can be used in this scale.
• Since numbers do not have a quantitative value, addition, subtraction,
multiplication, division, and measures of dispersion cannot be applied.
• It is also possible to perform contingency correlation. Hypothesis tests can be
carried out on data collected in the nominal form using the Chi-square test. It can
tell whether there is an association between the variables.
• However, it cannot establish a cause and effect relationship or explain the form
of relationship.
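The nominal-scale analyses listed above can be sketched with the Python standard library alone. All counts below are invented, and `chi_square_statistic` is a hypothetical helper that computes the Pearson chi-square statistic for a contingency table without a continuity correction; it does not compute a p-value.

```python
from collections import Counter

# Hypothetical blood groups of eight patients (nominal data).
blood_groups = ["O", "A", "O", "B", "A", "O", "AB", "O"]

frequency = Counter(blood_groups)                      # frequency distribution
percentage = {g: 100 * c / len(blood_groups) for g, c in frequency.items()}
modal_group = frequency.most_common(1)[0][0]           # mode


def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical cross-tabulation: rows = smoker yes/no, columns = disease yes/no.
observed = [[30, 20],
            [10, 40]]
chi2 = chi_square_statistic(observed)

print(frequency, percentage, modal_group)
print(round(chi2, 3))
```

For a 2 × 2 table there is 1 degree of freedom, so a statistic above the critical value of 3.841 suggests an association at the 5% level. As noted above, it says nothing about cause and effect.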
PART 1: STATISTICAL LEVELS OF
MEASUREMENT
Ordinal-level Measurement
• Outcomes can be arranged in an order, but all data
values have the same value or weight.
• Although they’re numerical, ordinal-level measurements
can’t be subtracted against each other in statistics
because only the position of the data point matters.
• Ordinal levels are often incorporated into nonparametric
statistics and compared against the total variable group.
SCALES OF MEASUREMENT
Ordinal scale
• is a ranking scale in which numbers are assigned to variables to represent their
rank or relative position in the data set.
• The variables are arranged in a specific order rather than just naming them.
• So they can be named, grouped, and ranked.
• In research, the ordinal scale is used for ranking students in a class (1,2,3), rating
a product satisfaction (very unsatisfied-1, unsatisfied-2, neutral-3, satisfied-4,
very satisfied-5), evaluating the frequency of occurrences (very often-1, often-2,
not often-3, not at all-4), assessing the degree of agreement (totally agree-1,
agree-2, neutral-3, disagree-4, totally disagree-5).
• In this scale, the attributes are arranged in ascending or descending order. The
numbers indicate rank or the order of quality or quantity.
SCALES OF MEASUREMENT
Ordinal Scale:
• The origin of scale is absent because there is no fixed start or ‘true zero’ in the data.
• Hence, it is impossible to find the magnitude of difference or distance between the variables or their
degree of quality.
• For example, while ranking students in terms of potential for an award, a student labelled ‘1’ is better
than the student labelled ‘2’, ‘2’ is better than ‘3’ and so forth.
• However, this ordinal scaling cannot quantify how much better the first student is than the second
student, nor indicate whether the difference between the first and second students is the same as the difference
between the second and third.
• Similarly, very satisfied will always be better than satisfied and unsatisfied will be better than very
unsatisfied.
• The order of variables is of prime importance, and so is the labelling.
• The ordinal scale is the second level of measurement from a statistical point of view.
• These scales are unique up to a monotone transformation. A monotone transformation T is one that assigns
new values such that if f(X) > f(Y) in the ordinal scale, then T(f(X)) > T(f(Y)) in the newly transformed scale.
SCALES OF MEASUREMENT
Ordinal Scale:
• The ordinal data can be presented using tabular or graphical formats.
• The descriptive analysis such as percentile, quartile, median and mode
can be determined in ordinal scale data. Since the interval between
numbers is insignificant, addition, subtraction, multiplication, division, and
measures of dispersion cannot be applied.
• It is possible to test for order correlation using Spearman's rank
correlation coefficient.
• Non-parametric tests such as Mann-Whitney U test, Friedman’s ANOVA,
Kruskal–Wallis H test can also be used to analyze ordinal scale data
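As an illustration of ordinal analysis, the sketch below computes a median and Spearman's rank correlation coefficient using the formula rho = 1 - 6 × Σd² / [n(n² - 1)]. The data are invented, the helper names `ranks` and `spearman_rho` are our own, and this simple formula assumes there are no tied values.

```python
from statistics import median

# Hypothetical satisfaction codes (1 = very unsatisfied ... 5 = very satisfied).
satisfaction = [1, 2, 2, 3, 4, 4, 5]
median_satisfaction = median(satisfaction)


def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman's rank correlation for paired data without ties."""
    rank_x, rank_y = ranks(x), ranks(y)
    n = len(x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical rankings of five students by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
rho = spearman_rho(judge_a, judge_b)

print(median_satisfaction, rho)
```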
SCALES OF MEASUREMENT
Interval Scale
• can be defined as a quantitative scale in which both the order and the exact difference
between categories are known.
• Thus it measures variables that can be labelled, ordered, and have an equal interval.
• However, the point of beginning or zero point on an interval scale is arbitrarily
established and is not a ‘true zero’ or ‘absolute zero’.
• Thus the value of zero does not indicate the complete absence of the characteristic being
measured.
• In Fahrenheit/Celsius temperature scales, 0°F and 0°C do not indicate an absence of
temperature.
• In fact, negative values of temperature do exist.
• Temperature, calendar years, attitudes, opinions and so on fall under the interval scale.
The Likert scale, Net Promoter Score (NPS), bipolar matrix table and semantic differential scale
are widely used examples that are commonly treated as interval scales.
PART 1: STATISTICAL LEVELS OF
MEASUREMENT
Interval-level Measurement
• Outcomes can be arranged in order, but differences
between data values may now have meaning.
• Two data points are often used to compare the passing
of time or changing conditions within a data set.
• There is often no “starting point” for the range of data
values, and calendar dates or temperatures may not
have a meaningful intrinsic zero value.
SCALES OF MEASUREMENT
Interval Scale:
• The major difference between ordinal and interval scale is the existence of
meaningful and equal intervals between variables.
• For example, 40 degrees is higher than 30 degrees, and the difference between
them is a measurable 10 degrees, as is the difference between 90 and 100
degrees.
• However, when students are ranked on an ordinal scale, the difference between the first
and second students might be 5 marks while the difference between the second and third is 8
marks, so equal steps in rank do not imply equal differences.
• Thus, with an interval scale, it is possible to identify whether a given attribute is
higher or lower than another and the extent to which one is higher or lower than
another.
SCALES OF MEASUREMENT
Interval Scale:
• The interval scale is the third level of measurement. The arbitrary zero point has implications
for data manipulation and analysis.
• It is possible to add or subtract a constant to all of the interval scale values without affecting the form of
the scale, but it is not meaningful to multiply or divide one value by another.
• For instance, two persons with scale positions 4 and 5 are as far apart as persons with scale positions 9
and 10, but a person with a score of 10 does not necessarily feel twice as strongly as one with a score of 5.
• Similarly, 100°F cannot be defined as twice as hot as 50°F because the corresponding temperatures on
the centigrade scale, 37.78°C and 10°C, are not in the ratio 2:1.
• Unlike the ordinal and nominal scale, arithmetic operations such as addition and subtraction can be
performed on an interval scale.
• Any positive linear transformation of the form Y = a + bX (with b > 0) will preserve the properties of an interval scale.
• The arithmetic mean, median, and mode can be used to calculate the central tendency in this scale.
• The measures of dispersion, such as range and standard deviation, can also be calculated.
• Apart from those techniques, product-moment correlation, t-test, and regression analysis are extensively
used for analyzing interval data.
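The temperature example above can be checked numerically. The sketch below, using the familiar Fahrenheit-to-Celsius conversion as the positive linear transformation Y = a + bX, shows that the ratio of two values changes with the unit while the ratio of two differences does not. The variable names are assumptions for this example.

```python
def fahrenheit_to_celsius(f):
    """Positive linear transformation Y = a + bX with b = 5/9 and a = -160/9."""
    return (f - 32) * 5 / 9

# Ratios of values are NOT preserved on an interval scale.
f_low, f_high = 50.0, 100.0
c_low, c_high = fahrenheit_to_celsius(f_low), fahrenheit_to_celsius(f_high)
ratio_f = f_high / f_low  # 2.0
ratio_c = c_high / c_low  # about 3.78, so "twice as hot" has no meaning

# Ratios of differences ARE preserved: 30 -> 40 is the same step as 90 -> 100.
diff_ratio_f = (40.0 - 30.0) / (100.0 - 90.0)
diff_ratio_c = ((fahrenheit_to_celsius(40.0) - fahrenheit_to_celsius(30.0))
                / (fahrenheit_to_celsius(100.0) - fahrenheit_to_celsius(90.0)))

print(ratio_f)                 # 2.0
print(round(ratio_c, 2))       # 3.78
print(round(diff_ratio_f, 6))  # 1.0
print(round(diff_ratio_c, 6))  # 1.0
```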
SCALES OF MEASUREMENT
Ratio Scale
• Can be defined as a quantitative scale that bears all the characteristics of an interval scale and
a ‘true zero’ or ‘absolute zero’, which implies the complete absence of the attribute being
measured.
• Thus it measures variables that can be labelled and ordered, have equal intervals and possess the ‘absolute
zero’ property.
• Before deciding to use a ratio scale, the researcher must observe whether the variables possess
all these characteristics.
• The variables such as length, age, weight, income, years of schooling, price etc., are examples of
a ratio scale.
• They do not have negative numbers because of the existence of an absolute zero point of origin.
• For instance, a price of zero means the commodity does not have any price (it is free); and there
cannot be any negative price.
• Thus ratio scale has a meaningful zero.
• It allows unit conversions such as metres to feet or kilograms to pounds.
SCALES OF MEASUREMENT
Ratio Scale:
• The ratio scale is the highest level of measurement scale. It is unique up to
a congruence or proportionality transformation of the form Y = bX (with b > 0).
• The ‘absolute zero’ property allows performing a wide range of
descriptive and inferential statistics on ratio scale variables.
• It is possible to compare both differences in values and the relative
magnitude of values.
• For instance, the difference between 15cm and 20cm is the same as
between 30cm and 35cm, and 30 cm is twice as long as 15 cm.
• Arithmetic operations such as addition, subtraction, multiplication, and
division (ratio) can be performed on ratio scale data.
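A short sketch of the proportionality transformation Y = bX, using centimetres to inches (b = 1/2.54) as the assumed unit conversion. Because the zero point is shared, both equal differences and ratios of values stay meaningful after conversion.

```python
CM_PER_INCH = 2.54

def cm_to_inches(cm):
    """Proportionality transformation Y = bX with b = 1 / 2.54."""
    return cm / CM_PER_INCH

# Equal differences: 15 -> 20 cm is the same step as 30 -> 35 cm.
same_difference = (20 - 15) == (35 - 30)

# Ratios of values are unchanged by the change of unit.
ratio_cm = 30 / 15
ratio_in = cm_to_inches(30) / cm_to_inches(15)

# A true zero maps to zero: no length is no length in any unit.
zero_in = cm_to_inches(0)

print(same_difference)     # True
print(ratio_cm)            # 2.0
print(round(ratio_in, 6))  # 2.0
print(zero_in)             # 0.0
```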
SCALES OF MEASUREMENT
Ratio Scale:
• All statistical operations applicable to nominal, ordinal and
interval scale can be performed on ratio scale data as well.
• Besides, measures of central tendency such as geometric
mean and harmonic mean and all measures of dispersion,
including coefficient of variation, can be determined.
• Parametric tests such as independent sample t-test, paired
sample t-test, ANOVA etc., can also be performed.
• The ratio scale provides unique opportunities for statistical
analysis.
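The ratio-only summaries named above can be computed with Python's standard `statistics` module. This is a minimal sketch on three hypothetical weights; the coefficient of variation is taken here as the sample standard deviation divided by the mean, which is one common convention.

```python
from statistics import geometric_mean, harmonic_mean, mean, stdev

# Hypothetical weights in kilograms (illustrative only).
weights_kg = [2.0, 4.0, 8.0]

gm = geometric_mean(weights_kg)            # cube root of 2 * 4 * 8
hm = harmonic_mean(weights_kg)             # 3 / (1/2 + 1/4 + 1/8)
cv = stdev(weights_kg) / mean(weights_kg)  # sample SD relative to the mean

print(round(gm, 3))  # 4.0
print(round(hm, 3))  # 3.429
print(round(cv, 3))  # 0.655
```

These measures rely on ratios of values, so they are only interpretable when zero means a complete absence of the quantity.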
SCALES OF MEASUREMENT
Scale | Properties
Nominal | Categories
Ordinal | Categories, Rank
Interval | Categories, Rank, Intervals
Ratio | Categories, Rank, Intervals, True or absolute zero
SCALES OF MEASUREMENT
CROSS TABULATION
Gender | Body weight: Normal | Body weight: Overweight | Total
Male | 10 | 15 | 25
Female | 15 | 10 | 25
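A minimal sketch of how a cross tabulation like the one on this slide could be built from individual records. The records are reconstructed from the slide's counts, and the variable names are assumptions for this example.

```python
from collections import Counter

# Individual (gender, body weight) records rebuilt from the counts on the slide.
records = ([("Male", "Normal")] * 10 + [("Male", "Overweight")] * 15
           + [("Female", "Normal")] * 15 + [("Female", "Overweight")] * 10)

cell_counts = Counter(records)                            # one count per table cell
row_totals = Counter(gender for gender, _ in records)     # marginal totals by gender
col_totals = Counter(weight for _, weight in records)     # marginal totals by weight
grand_total = len(records)

# Print the table row by row, ending with the column totals.
columns = ["Normal", "Overweight"]
print("Gender", *columns, "Total", sep=" | ")
for gender in ["Male", "Female"]:
    cells = [cell_counts[(gender, c)] for c in columns]
    print(gender, *cells, row_totals[gender], sep=" | ")
print("Total", *[col_totals[c] for c in columns], grand_total, sep=" | ")
```

The row and column totals are the marginal frequencies, and the cell counts are the joint frequencies of the two categorical variables.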
SOURCES OF DATA
Three main sources for demographic and social statistics
• Censuses
• Surveys
• Administrative records.
A population census
• The total process of collecting, compiling, evaluating, analysing and publishing or otherwise
disseminating demographic, economic and social data pertaining, at a specified time, to all persons
in a country or in a well-delimited part of a country.
• The census collects data from each individual and each set of living quarters for the whole country
or area.
• It allows estimates to be produced for small geographic areas and for population subgroups.
• It also provides the base population figures needed to calculate vital rates from civil registration
data, and it supplies the sampling frame for sample surveys.
SOURCES OF DATA
Population census steps
• Securing the required legislation, political support and funding
• Mapping and listing all households
• Planning and printing questionnaires, instruction manuals and procedures
• Planning for shipping census materials
• Recruiting and training census personnel
• Organizing field operations
• Launching publicity campaigns
• Preparing for data processing
• Planning for tabulation
SOURCES OF DATA
Population census data
• Because of the expense and complexity of the census, only the most basic items
are included on the questionnaire for the whole population.
• Choosing these items requires considering the needs of data users; availability
of the information from other data sources; international comparability;
willingness of the respondents to give information; and available resources to
fund the census.
• Many countries carry out a sample enumeration in conjunction with the census.
• This can be a cost-effective way to collect more detailed information on
additional topics from a sample of the population.
• The sample enumeration uses the infrastructure and facilities that are already in
place for the census.
SOURCES OF DATA
Surveys
• A continuing program of intercensal household surveys is useful for
collecting detailed data on social, economic and housing characteristics
that are not appropriate for collection in a full-scale census.
• Household-based surveys are the most flexible type of data collection.
• They can examine most subjects in detail and provide timely information
about emerging issues.
• They increase the ability and add to the experience of in-house technical
and field staff and maintain resources that have already been
developed, such as maps, sampling frame, field operations, infrastructure
and data-processing capability.
SOURCES OF DATA
Surveys
• The many types of household surveys include multi-
subject surveys, specialized surveys, multi-phase surveys
and panel or longitudinal surveys.
• Each type of survey is appropriate for certain kinds of
data-collection needs.
• Household surveys can be costly to undertake, especially
if a country has no ongoing program
SOURCES OF DATA
Administrative records
• Administrative records are statistics compiled from various administrative
processes.
• They include not only the vital events recorded in a civil registration system but
also education statistics from school records; health statistics from hospital
records; employment statistics; and many others.
• The reliability and usefulness of these statistics depend on the completeness of
coverage and the compatibility of concepts, definitions and classifications with
those used in the census.
• Administrative records are often by-products of administrative processes, but
they can also be valuable complementary sources of data for censuses and
surveys.
SOURCES OF DATA
Administrative records
• Birth certificates
• Death certificates
• Patient medical records
• Disease registries
• Insurance claim forms
• Billing records
• Public health department case reports
Biostatistics notes for Masters in Public Health

  • 1. INTRODUCTION TO BIOSTATISTICS Dr. Higenyi Emmanuel (PhD)
  • 2. SCOPE Part 1 Introduction • Definitions • Importance of statistics • Application of biostatistics • Statistical notations • Types of data • Variables • Sources of data • Data presentation • Data summarization • Sampling • Probability Part 2 Basic Data statistical analysis • t-test • z-test • Binomial test • Chi-square test • Fischer exact test • Corelation • Simple linear regression
  • 3. PART 1: DEFINITIONS Statistics • The study and manipulation of data, including ways to gather, review, analyze, and draw conclusions from data. • The two major areas of statistics are descriptive and inferential statistics. • Statistics can be communicated at different levels ranging from non-numerical descriptor (nominal-level) to numerical in reference to a zero-point (ratio- level). • Several sampling techniques can be used to compile statistical data, including simple random, systematic, stratified, or cluster sampling. • Statistics are present in almost every department of every company and are an integral part of investing.
  • 4. PART 1: DEFINITIONS Biostatistics or biometry • Branch of biological science concerned with the study and methods for collecting, presenting, analysing and interpreting biological research data. • The primary aim of this branch of science is to allow researchers, health care providers and public health administrators to make decisions concerning a population using sample data. • For example, the government wants to know the prevalence of a specific health problem among residents in a given town. If there are 3 million residents in the town it may not be realistic to test them individually and determine whether they have the disease or are susceptible to it.
  • 5. PART 1: DEFINITIONS Biostatistics or biometry • The realistic and cost-effective approach is to study a representative subset of the population and apply their results to the entire group. • Hence biostatistics makes research possible by providing tools and techniques for collecting, analysing and interpreting biological and medical data, allowing stakeholders to draw actionable insights about a population from sample data. • Biostatisticians usually get their data from a wide range of sources, including medical records, peer-reviewed literature, claims records, vital records, disease registries, surveillance, experiments and surveys. • The professionals collaborate with scientists, health care providers, public health administrators and other stakeholders.
  • 6. PART 1: DEFINITIONS Biostatistics or biometry sources of data • Medical records: Medical records can provide researchers with data about diagnoses, lab tests and procedures common amongst a specific population, such as people above 50 years working in the police force. • Claims data: Scientists can get data about doctor's appointments and medical bills in claims data. • Vital records: Vital records contain information about births, deaths, causes of death and divorces. • Peer-reviewed literature: Researchers can also pull data from the articles and studies that experts in a particular field published in peer-reviewed journals. • Surveys: The researchers can collect primary data using surveys designed specifically for an experiment. • Disease registries: These systems help to collect, store, analyse, retrieve and disseminate information regarding people living with specific disease
  • 7. Part 1 Types of statistics:
  • 8. PART 1: DESCRIPTIVE STATISTICS Descriptive statistics • Mostly focus on the central tendency, variability, and distribution of sample data. • Central tendency means the estimate of the characteristics, a typical element of a sample or population-It includes descriptive statistics such as mean, median, and mode. • Variability refers to a set of statistics that show how much difference there is among the elements of a sample or population along the characteristics measured. It includes metrics such as range, variance, and standard deviation. • The distribution refers to the overall “shape” of the data, which can be depicted on a chart such as a histogram or a dot plot, and includes properties such as the probability distribution function, skewness, and kurtosis
  • 9. CENTRAL TENDENCY, VARIABILITY 60, 65, 66, 68, 70, 70, 70, 70, 70, 71, 72, 75, 80, 81, 82, 83, 85, 86, 88, 90 Sum=1502 Mean =1502/20=75.1 Mode =70 Median =71.5 90
  • 10. CENTRAL TENDENCY, VARIABILITY 60, 65, 66, 68, 70, 70, 70, 70, 70, 71, 72, 75, 80, 81, 82, 83, 85, 86, 88, 90 Mean=75.1, SS=1394, SQRT OF (SS/20=SD)=8.3 90 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 60 65 66 68 70 70 70 70 70 71 72 75 80 81 82 83 85 86 88 90 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 -15 -10 -9 -7 -5 -5 -5 -5 -5 -4 -3 0 5 6 7 8 10 11 13 15 225 100 81 49 25 25 25 25 25 16 9 0 25 36 49 64 100 121 169 225
  • 11. FORMULA FOR SD SD = SQRT OF [SUM OF (each number - the mean)^2 / the number of elements in the data set]
  • 12. SD 1. Calculate the mean 2. Subtract the mean from each element individually 3. Square the differences from subtraction 4. Get the sum of the squared differences 5. Divide the sum of the squared differences by the number of elements 6. Get the square root of the answer after division (quotient) = SD
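The six steps above can be sketched in Python as follows. This is an illustrative sketch, not the author's own code. It uses the unrounded mean of 75.1, so the sum of squares comes out at about 1393.8 rather than the 1394 obtained with the mean rounded to 75; the SD still rounds to 8.3. The last line shows the sample SD, which divides by n - 1 instead of n, for comparison.

```python
import math
import statistics

scores = [60, 65, 66, 68, 70, 70, 70, 70, 70, 71,
          72, 75, 80, 81, 82, 83, 85, 86, 88, 90]

n = len(scores)
mean = sum(scores) / n                     # 1. calculate the mean
differences = [x - mean for x in scores]   # 2. subtract the mean from each element
squared = [d ** 2 for d in differences]    # 3. square the differences
ss = sum(squared)                          # 4. sum the squared differences
variance = ss / n                          # 5. divide by the number of elements
sd = math.sqrt(variance)                   # 6. square root of the quotient = SD

print(f"SS = {ss:.1f}, variance = {variance:.2f}, SD = {sd:.2f}")
print(f"Sample SD (divides by n - 1) = {statistics.stdev(scores):.2f}")
```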
  • 13. DESCRIPTIVE STATISTICS Central tendency • Mean • Median • Mode Variability • Range • Variance • SD Shape or distribution • Skewness and kurtosis Relative frequencies/proportions Graphs, charts and tables
  • 14. PART 1: DESCRIPTIVE STATISTICS Descriptive statistics • Can also describe differences between observed characteristics of the elements of a data set. • Can help us understand the collective properties of the elements of a data sample and form the basis for testing hypotheses and making predictions using inferential statistics • Useful in summarizing data • Can be in the form of numbers, tables or graphs
  • 15. Part 1: Descriptive statistics
  • 16. PART 1: INFERENTIAL STATISTICS Inferential statistics • Is a tool that statisticians use to draw conclusions about the characteristics of a population, drawn from the characteristics of a sample, and to determine how certain they can be of the reliability of those conclusions. • Based on the sample size and distribution, statisticians can calculate the probability that statistics, which measure the central tendency, variability, distribution, and relationships between characteristics within a data sample, provide an accurate picture of the corresponding parameters of the whole population from which the sample is drawn. • Are used to make generalizations about large groups, such as estimating average demand for a product by surveying a sample of consumers’ buying habits or attempting to predict future events.
  • 17. Part 1: Inferential Statistics
  • 19. FACTORS ASSOCIATED WITH MEN’S INVOLVEMENT IN ANTENATAL CARE VISITS IN ASMARA, ERITREA: COMMUNITY- BASED SURVEY The necessity for a pregnant woman to attend ANC was recognized by almost all (98.7%) of the male partners; however, 26.6% identified a minimum frequency of ANC visits. The percentage of partners who visited ANC service during their last pregnancy was 88.6%. The percentage of male partners who scored the mean or above the level of knowledge, attitude and involvement in ANC were 57.0, 57.5, and 58.7, respectively. Religion (p = 0.006, AOR = 1.91, 95% CI 1.20–3.03), level of education (p = 0.027, AOR = 1.96, 95% CI 1.08–3.57), and level of knowledge (p<0.001, AOR = 3.80, 95% CI 2.46–5.87) were significantly associated factors of male involvement in ANC.
  • 20. METHODS USED List of households with pregnant women was prepared for each administration area and was used as sampling frame A community-based cross-sectional survey was applied using a two-stage sampling technique to select 605 eligible respondents in Asmara in 2019. Data was collected using a pretested structured questionnaire. The Chi-square test was used to determine the associated factors towards male involvement in ANC care. Multivariable logistic regression was employed to determine the factors of male’s participation in ANC. A P-value less than 0.05 was considered statistically significant.
  • 21. USE-CASE INFORMATION NEEDED Define target population State the type of statistics you expect (descriptive, inferential, or both) State the possible sources of data
  • 22. PART 1: INFERENTIAL TESTS Inferential tests • Tests concerned with using selected sample data compared with population data in a variety of ways are called inferential statistical tests. • There are two main bodies of these tests. • The first and most frequently used are called parametric statistical tests. • The second are called nonparametric tests. • For each parametric test, there may be a comparable nonparametric test, sometimes even two or three. • Parametric tests are tests of significance appropriate when the data represent an interval or ratio scale of measurement
  • 23. PART 1: INFERENTIAL TESTS Parametric tests • Tests of significance appropriate when the data represent an interval or ratio scale of measurement and other specific assumptions have been met, specifically, that the sample statistics relate to the population parameters, that the variance of the sample relates to the variance of the population, that the population has normality, and that the data are statistically independent. Nonparametric tests • Statistical tests used when the data represent a nominal or ordinal level scale or when assumptions required for parametric tests cannot be met, specifically, small sample sizes, biased samples, an inability to determine the relationship between sample and population, and unequal variances between the sample and population. These are a class of tests that do not hold the assumptions of normality.
  • 24. PART 1: DATA TYPES Data types • Qualitative: dichotomous, multinomial • Quantitative: discrete, continuous
  • 26. ILLUSTRATION OF QUALITATIVE AND QUANTITATIVE DATA To assess the nutritional status and to determine potential risk factors of malnutrition in children under 3 years of age in Nghean, Vietnam. The study carried out in November 2007, a total of 383 child/mother pairs were selected by using a 2-stage cluster sampling methodology. A structured questionnaire was administered to mothers in their home settings. Anthropometric measurement was defined as being underweight (weight for age), wasting (weight for height) and stunting (height for age) on the basis of reference data from the National Center for Health Statistics (NCHS) / World Health Organization (WHO).
  • 27. ILLUSTRATION OF QUALITATIVE AND QUANTITATIVE DATA Logistic regression analysis was used to take into account the hierarchical relationships between potential determinants of malnutrition. The mean Z-score for weight-for-age was -1.51 (95% CI -1.64, -1.38), for height-for-age was -1.51 (95% CI -1.65, -1.37) and for weight-for-height was -0.63 (95% CI -0.78, -0.48). Of the children, 103 (27.7%) were underweight, 135 (36.3%) were stunted and 38 (10.2%) were wasted. Region of residence, ethnicity, mother’s occupation, household size, mother’s BMI, number of children in family, weight at birth, time of initiation of breast-feeding and duration of exclusive breast-feeding were found to be significantly related to malnutrition. The findings of this study indicate that malnutrition is still an important problem among children under three years of age in Nghean, Vietnam. Socio-economic factors, environmental factors and feeding practices are significant risk factors for malnutrition among children under three.
  • 28. PART 1: COMMON STATISTICAL TERMS Binomial test • When a test has two alternative outcomes, either failure or success, and you know what the possibilities of success are, you may apply a binomial test. • Use a binomial test to determine if an observed test outcome is different from its predicted outcome. Causation • Causation is a direct relationship between two variables. • Two variables have a direct relationship if a change in one’s value causes a change in the other variable. • In that case, one becomes the cause, and the other is the effect.
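As a rough illustration of the binomial test described above, the sketch below computes an exact two-sided p-value using only the standard library. The function names and the example numbers (8 successes in 10 trials against an assumed success probability of 0.5) are hypothetical and do not come from the slides. The two-sided rule used here, summing the probabilities of all outcomes no more likely than the observed one, is one common convention.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_test_two_sided(successes, n, p0):
    """Exact two-sided p-value: total probability of outcomes that are
    no more likely than the observed one under success probability p0."""
    observed = binom_pmf(successes, n, p0)
    tolerance = observed * (1 + 1e-9)  # guards against floating-point ties
    return sum(binom_pmf(k, n, p0) for k in range(n + 1)
               if binom_pmf(k, n, p0) <= tolerance)

# Hypothetical example: 8 successes in 10 trials, predicted probability 0.5.
p_value = binomial_test_two_sided(8, 10, 0.5)
print(f"two-sided p-value = {p_value:.4f}")
```

A p-value below the chosen significance level (0.05 in the study described earlier) would suggest that the observed outcome differs from the predicted one.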
  • 29. PART 1: COMMON STATISTICAL TERMS Confidence interval • A confidence interval expresses the level of uncertainty in an estimate calculated from a collection of data. • It is a range of values that, at a specified degree of confidence (for example 95%), is expected to contain the true population value if you repeat the same experiment. Correlation coefficient • The correlation coefficient describes the level of correlation or dependence between two variables. • This value is a number between -1 and +1, and if it falls beyond this limit, there’s been a mistake in the calculation of the coefficient.
  • 30. PART 1: COMMON STATISTICAL TERMS Z-score: • A score expressed in units of standard deviations from the mean. It is also known as a standard score. Z-test: • A test of any of a number of hypotheses in inferential statistics that has validity if sample sizes are sufficiently large and the underlying data are normally distributed.
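A z-score can be computed directly from its definition. The sketch below reuses the 20 observations from the central tendency slides and the population SD (dividing by n) as in the SD steps; the helper name `z_score` is illustrative.

```python
import statistics

scores = [60, 65, 66, 68, 70, 70, 70, 70, 70, 71,
          72, 75, 80, 81, 82, 83, 85, 86, 88, 90]

mean = statistics.mean(scores)
sd = statistics.pstdev(scores)  # population SD (divides by n)

def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

z_top = z_score(90, mean, sd)     # highest observation, above the mean
z_bottom = z_score(60, mean, sd)  # lowest observation, below the mean
print(f"z(90) = {z_top:.2f}, z(60) = {z_bottom:.2f}")
```

A positive z-score means the value lies above the mean and a negative one means it lies below.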
  • 31. PART 1: COMMON STATISTICAL TERMS Hypothesis tests • A hypothesis test is a method of testing results. Before conducting research, the researcher creates a hypothesis or a theory for what they believe the results will prove. • A study then tests that theory. Kruskal-Wallis one-way analysis of variance: • A nonparametric inferential statistic used to compare two or more independent groups for statistical significance of differences. Mann-Whitney U-test (U): • A nonparametric inferential statistic used to determine whether two uncorrelated groups differ significantly. McNemar’s test: • A nonparametric method used on nominal data to determine whether the row and column marginal frequencies are equal. *NPT
  • 32. PART 1: COMMON STATISTICAL TERMS Dependent variable • A dependent variable is a value that depends on another variable to exhibit change. • When computing in statistical analysis, you can use dependent variables to make conclusions about causes of events, changes and other translations in statistical research. Independent variable • In a statistical experiment, an independent variable is one that you modify, control or manipulate in order to investigate its effects. • It's called independent since no other factor in the research affects it. Multivariate analysis of covariance (MANCOVA): • An extension of ANOVA that incorporates two or more dependent variables in the same analysis. It is an extension of MANOVA where artificial dependent variables (DVs) are initially adjusted for differences in one or more covariates. It computes the multivariate F statistic. Multivariate analysis of variance (MANOVA): • It is an ANOVA with several dependent variables.
  • 33. PART 1: COMMON STATISTICAL TERMS One-way analysis of variance (ANOVA): • An extension of the independent group t-test where you have more than two groups. It computes the difference in means both between and within groups and compares variability between groups with variability within groups. Its parametric test statistic is the F-test. Pearson correlation coefficient (r): • This is a measure of the correlation or linear relationship between two variables x and y, giving a value between +1 and −1 inclusive. • It is widely used in the sciences as a measure of the strength of linear dependence between two variables. Pooled point estimate: • An approximation of a point, usually a mean or variance, that combines information from two or more independent samples believed to have the same characteristics. • It is used to assess the effects of treatment samples versus comparative samples
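To make the Pearson correlation coefficient concrete, here is a minimal sketch that computes r from its definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name and the paired values are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

The result always lies between -1 and +1; a perfectly linear increasing relationship such as y = 2x gives r = 1.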
  • 34. PART 1: COMMON STATISTICAL TERMS Standard deviation • The standard deviation is a metric that calculates the square root of a variance. It informs you how far a single or group result deviates from the average. Standard error of the mean • A standard error of mean assesses the likelihood of a sample's mean deviating from the population mean. You can find the standard error of the mean if you divide the standard deviation by the square root of the sample size. Range • The range is the difference between the lowest and highest values in a collection of data. Quartile and quintile • Quartile refers to data divided into four equal parts, while quintile refers to data divided into five equal parts.
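The terms on this slide can be illustrated with the 20 observations used earlier. The sketch below is illustrative only; it assumes the sample SD (dividing by n - 1) when computing the standard error, and it uses `statistics.quantiles`, whose default interpolation method is one of several conventions for quartiles.

```python
import math
import statistics

scores = [60, 65, 66, 68, 70, 70, 70, 70, 70, 71,
          72, 75, 80, 81, 82, 83, 85, 86, 88, 90]

n = len(scores)
sample_sd = statistics.stdev(scores)          # assumption: sample SD (n - 1)
sem = sample_sd / math.sqrt(n)                # standard error of the mean
data_range = max(scores) - min(scores)        # highest minus lowest value
quartiles = statistics.quantiles(scores, n=4) # three cut points: Q1, Q2, Q3

print(f"SEM = {sem:.3f}, range = {data_range}, quartiles = {quartiles}")
```

The second quartile is the median, so it matches the value of 71.5 reported on the central tendency slide.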
  • 35. PART 1: COMMON STATISTICAL TERMS Pearson correlation coefficient • Pearson's correlation coefficient is a statistic that measures the linear relationship between two continuous variables. • Since it is based on covariance, it is widely regarded as the best approach to quantify the relationship among variables of interest. Median • The median refers to the middle point of data. • Typically, if you have a data set with an odd number of items, the median appears directly in the middle of the sorted numbers. • When computing the median of a set of data with an even number of items, you can calculate the simple mean of the two middle-most values to obtain the median. Mode • Mode refers to the value in a data set that occurs most often. If none of the values repeat, there’s no mode in that data set.
  • 36. PART 1: COMMON STATISTICAL TERMS Statistical inference • Statistical inference occurs when you use sample data to generate an inference or conclusion. Statistical inference can include regression, confidence intervals or hypothesis tests. Statistical power • Statistical power is a metric of a study's probability of discovering statistical relevance in a sample, provided the effect is present in the entire population. A powerful statistical test likely rejects the null hypothesis. Runs test: • Where measurements are made according to some well-defined ordering, in either time or space. • A frequent question is whether or not the average value of the measurement is different at different points in the sequence. This nonparametric test provides a means for this
  • 37. PART 1: COMMON STATISTICAL TERMS T-score • A t-score in a t-distribution refers to the number of standard deviations a sample is away from the average. Z-score • A z-score, also known as a standard score, is a measurement of the distance between the mean and data point of a variable. You can measure it in standard deviation units. Z-test • A z-test is a test that determines if two populations' means are different. To use a z-test, you need to know the population variances and have a large sample size. Sign test: • A test that can be used whenever an experiment is conducted to compare a treatment with a control on a number of matched pairs, provided the two treatments are assigned to the members of each pair at random.
  • 38. PART 1: COMMON STATISTICAL TERMS Student t-test • A Student t-test is a hypothesis test that examines the mean of a small sample drawn from a bell-curve population where you don’t know the population standard deviation. This can include correlated means, correlation, independent proportions or independent means. T-distribution • The t-distribution describes the standardized deviations of the mean of the sample from the mean of the population when the population standard deviation is unknown and the data originate from a bell-curve population. Standard error of the mean (SEM): • An estimate of the amount by which an obtained mean may be expected to differ by chance from the true mean. It is an indication of how well the mean of a sample estimates the mean of a population
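As a hedged illustration of a one-sample Student t-test, the sketch below computes the t statistic for the 20 observations used earlier against a hypothesized population mean. The value `mu0 = 70` is a hypothetical assumption chosen for the example, not a figure from the slides. Only the test statistic and degrees of freedom are computed; the p-value would then be read from a t-distribution table or a statistics package.

```python
import math
import statistics

scores = [60, 65, 66, 68, 70, 70, 70, 70, 70, 71,
          72, 75, 80, 81, 82, 83, 85, 86, 88, 90]

mu0 = 70  # hypothetical population mean under the null hypothesis

n = len(scores)
sample_mean = statistics.mean(scores)
sample_sd = statistics.stdev(scores)   # sample SD (divides by n - 1)
sem = sample_sd / math.sqrt(n)         # standard error of the mean
t_stat = (sample_mean - mu0) / sem     # distance from mu0 in SEM units
df = n - 1                             # degrees of freedom

print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```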
  • 39. PART 1: COMMON STATISTICAL TERMS Variance (SD²): • A measure of the dispersion of a set of data points around their mean value. • It is a mathematical expectation of the average squared deviations from the mean Analysis of covariance (ANCOVA): • A statistical technique for equating groups on one or more variables when testing for statistical significance using the F-test statistic. • It adjusts scores on a dependent variable for initial differences on other variables, such as pretest performance or IQ. *PT Analysis of variance (ANOVA): • A statistical technique for determining the statistical significance of differences among means; it can be used wit
  • 40. PART 1: COMMON STATISTICAL TERMS Effect size • Effect size is a statistical term that quantifies the degree of a relationship between two given variables. For example, we can learn about the effect of therapy on anxiety patients. • The effect size aims to determine whether the therapy is highly successful or mildly successful. Measures of variability • Measures of variability, also referred to as measures of dispersion, denote how scattered or dispersed a data set is. • Four main measures of variability are the interquartile range, range, standard deviation and variance.
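One widely used effect size for comparing two group means is Cohen's d, the difference in means divided by the pooled standard deviation. The sketch below is illustrative only: the function name and the two small groups of scores are hypothetical, and the pooled SD uses sample variances (dividing by n - 1).

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b)
                          / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical scores for a treated group and a control group.
treated = [2, 4, 6, 8]
control = [1, 3, 5, 7]
d = cohens_d(treated, control)
print(f"Cohen's d = {d:.2f}")
```

By the conventional benchmarks (small 0.2, medium 0.5, large 0.8), the value for these made-up groups falls between small and medium.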
  • 41. PART 1: COMMON STATISTICAL TERMS Median test • A median test is a nonparametric test of whether two independent groups have the same median. • It follows the null hypothesis that each of the two groups maintains the same median. Population • Population refers to the entire group you’re studying, such as a certain demographic. A sample is a subset of the population. Parameter • A parameter is a quantitative measurement that describes a population. • It’s the unknown value of a population on which you conduct research to learn more.
  • 42. PART 1: COMMON STATISTICAL TERMS Post hoc test • Researchers perform a post hoc test only after they’ve discovered a statistically relevant finding and need to identify where the differences actually originated. Probability density • The probability density is a statistical measurement that measures the likely outcome of a calculation over a given range. Random variable • A random variable is a variable in which the value is unknown. • It can be discrete or continuous with any value given in a range.
  • 43. PART 1: COMMON STATISTICAL TERMS Chi-square (χ²): • A nonparametric test of statistical significance appropriate when the data are in the form of frequency counts; it compares frequencies actually observed in a study with expected frequencies to see if they are significantly different. Coefficient of determination (r²): • The square of the correlation coefficient (r), it indicates the degree of relationship strength by potentially explained variance between two variables. Cohen’s d: • A standardized way of measuring the effect size or difference by comparing two means by a simple math formula. It can be used to accompany the reporting of a t-test or ANOVA result and is often used in meta-analysis. • The conventional benchmark scores for the magnitude of effect sizes are as follows: small, d = 0.2; medium, d = 0.5; large, d = 0.8
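The comparison of observed and expected frequencies in a chi-square test can be sketched as below. The contingency table is hypothetical. Expected counts are derived from the row and column totals, no continuity correction is applied, and the p-value shortcut via `math.erfc` is valid only for one degree of freedom (a 2 x 2 table).

```python
import math

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells, with
    expected counts computed from the row and column totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical 2 x 2 table of frequency counts.
observed = [[20, 30],
            [30, 20]]
chi2 = chi_square_statistic(observed)

# For 1 degree of freedom only: P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```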
  • 44. PART 1: COMMON STATISTICAL TERMS Cronbach’s alpha coefficient (α): • A coefficient of consistency that measures how well a set of variables or items measures a single, unidimensional, latent construct in a scale or inventory. • Alpha scores are conventionally interpreted as follows: high, 0.90; medium, 0.70 to 0.89; and low, 0.55 to 0.69 F-test (F): • A parametric statistical test of the equality of the means of two or more samples. It compares the means and variances between and within groups over time. It is also called analysis of variance (ANOVA) Tukey’s test of significance: • A single-step multiple comparison procedure and statistical test generally used in conjunction with an ANOVA to find which means are significantly different from one another. • Named after John Tukey, it compares all possible pairs of means and is based on a studentized range distribution q (this distribution is similar to the distribution of t from the t-test).
  • 45. PART 1: COMMON STATISTICAL TERMS Fisher’s exact test: • A nonparametric statistical significance test used in the analysis of contingency tables where sample sizes are small. • The test is useful for categorical data that result from classifying objects in two different ways; it is used to examine the significance of the association (contingency) between two kinds of classifications Wald-Wolfowitz test: • A nonparametric statistical test used to test the hypothesis that a series of numbers is random. It is also known as the runs test for randomness Wilcoxon signed-rank test (W+): • A nonparametric statistical hypothesis test for the case of two related samples or repeated measurements on a single sample. It can be used as an alternative to the paired Student’s t-test when the population cannot be assumed to be normally distributed.
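For small 2 x 2 contingency tables, Fisher's exact test can be sketched with the hypergeometric distribution. The function name and the example table are hypothetical. The two-sided p-value here sums the probabilities of all tables, with the same row and column totals, that are no more likely than the observed one, which is one common convention.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    tolerance = p_observed * (1 + 1e-9)  # guards against floating-point ties
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= tolerance)

# Hypothetical small-sample table.
table = [[3, 1],
         [1, 3]]
p_value = fisher_exact_two_sided(table)
print(f"two-sided p-value = {p_value:.4f}")
```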
  • 46. PART 1: COMMON STATISTICAL TERMS Independent t-test: • A statistical procedure for comparing measurements of mean scores in two different groups or samples. • It is also called the independent samples t-test. * Kendall’s tau: • A nonparametric statistic used to measure the degree of correspondence between two rankings and to assess the significance of the correspondence. Kolmogorov-Smirnov (K-S) test: • A nonparametric goodness-of-fit test used to decide if a sample comes from a population with a specific distribution. • The test is based on the empirical cumulative distribution function (ECDF)
  • 47. PART 1: APPLICATION OF BIOSTATISTICS 1. Clinical Trials • One of the most impactful applications of biostatistics is in the design and analysis of clinical trials. • Biostatisticians ensure the validity and reliability of trial results, which helps researchers assess the safety and efficacy of new drugs and treatments. • Using various statistical methods, we analyze patient data to draw conclusions that will help in making medical decisions. 2. Epidemiology • In the field of epidemiology, biostatistics aids in studying the distribution and determinants of diseases within populations. • Biostatisticians use different statistical models to analyze patterns, identify risk factors, and assess the impact of interventions. • This information is crucial for public health planning and for developing disease prevention strategies.
  • 48. PART 1: APPLICATION OF BIOSTATISTICS 3. Genetics and Genomics • Biostatistics is indispensable in the analysis of genetic and genomic data. • Researchers use statistical methods to identify genes associated with specific diseases, understand the heritability of these genes, and figure out complex genetic interactions. • This application of biostatistics is instrumental in advancing our understanding of the genetic basis of various medical conditions. 4. Public Health Policy • Biostatistics contributes significantly to the formulation and evaluation of public health policies. • By analyzing health data, biostatisticians can assess the effectiveness of interventions, evaluate health disparities, and guide policymakers in making informed decisions to improve public health outcomes.
  • 49. PART 1: APPLICATION OF BIOSTATISTICS 5. Environmental Health • Biostatistics is applied in environmental health studies to analyze the impact of environmental factors on human health. Whether it is assessing the effects of air quality on respiratory diseases or studying the correlation between water contaminants and health outcomes, biostatistics helps decode the complex relationships in environmental health research. 6. Bioinformatics • In the era of big data, biostatistics plays a crucial role in bioinformatics, where vast amounts of biological data are analyzed to extract meaningful patterns. Biostatisticians develop statistical methods and algorithms to interpret data from genomics, proteomics, and other ‘omics’ technologies. The result is visible in the form of advancements in personalized medicine and drug discovery. 7. Quality Control in Healthcare • Biostatistics is also employed in quality control processes within healthcare systems. It ensures the accuracy and reliability of medical tests, monitors healthcare processes, and helps identify areas for improvement. This application is vital for maintaining high standards of patient care
  • 50. PART 1: APPLICATION OF BIOSTATISTICS 8. Create population-based interventions • Researchers can use biometric techniques to assess the impact of a health programme on the target population. • With biometric techniques, researchers can use insights from data to: • Measure the performance of public health interventions • Boost immunisation rates • Increase the number of patients attending post-surgery appointments • Improve training and supervision of health care professionals
  • 51. PART 1: APPLICATION OF BIOSTATISTICS 9. Create population-based interventions • Biometrics can also help researchers, health care providers and public health administrators to create population-based health interventions based on the results of biostatistical data analysis and interpretation. These data insights can be used to: • Identify populations that require interventions to reduce their exposure to specific health problems • Identify areas susceptible to high risk of certain diseases • Identify the factors influencing the high cases of health disparities within a population • Identify members of a population that require the highest level of health care
  • 52. PART 1: APPLICATION OF BIOSTATISTICS 10. Control epidemics • Biostatistical techniques can also help public health officials, health care practitioners and epidemiologists to control epidemics. • Researchers not only use statistical analysis to understand how diseases spread, but they can also use it to determine the mortality rate amongst specific populations. • It can also help health care professionals determine the most at-risk members of the population and create a framework for formulating strategies to stop the spread of such diseases. 11. Identify barriers to care • Researchers and health care professionals can use biostatistical methods to learn about the barriers preventing people from getting access to quality care. • Researchers use surveys to identify the factors that limit access to health care. Medical records, interviews and claims records can show patient perceptions about health care services, providing insights to make such services more accessible and acceptable to target populations for higher efficiency.
  • 53. PART 1: APPLICATION OF BIOSTATISTICS 12. Study demography • Demography is the statistical study of the human population. • The field uses statistical techniques to describe births, deaths, income, disease disparity and other structural changes in human populations. • Using census data, surveys and statistical models, biostatisticians can analyse the structure, size and movement of populations, providing insights for government agencies, health care administrators, town planners and other stakeholders to create and adjust their plans based on the dynamics of the population 13. Derive conclusions about populations from samples • One major importance of biostatistical methods is that they help researchers derive far-reaching conclusions about a population from samples. • Due to several factors, such as finances, size and time constraints, it's not always possible for researchers to collect data about an entire population when testing assumptions about them. • Biostatistical methods provide researchers and administrators with the tools they require to select a sample that's representative of the population, choose the right independent and dependent variables and derive logical conclusions from the data
  • 54. PART 1: APPLICATION OF BIOSTATISTICS 14. Check drug efficacy • In the medical and pharmaceutical fields, biostatistical research is used to check the efficacy and effectiveness of treatments during clinical trials. • Researchers can also use it to find possible side effects of drugs. • These methods are ideal for conducting drug treatment trials and performing other experiments to understand the impact of different medications and medical devices on the human body 15. Perform genetics studies • It's an important discipline in the study of Mendelian genetics. • Geneticists use it to study the inheritance patterns of genes. • They also use it to study the genetic structure of a population. • Researchers also use biometry to map chromosomes and understand the behaviour of genes in a population.
  • 55. PART 1: APPLICATION OF BIOSTATISTICS 16. Other applications • Determining leading causes of death and burden of disease • Health status of the population • Morbidity patterns
  • 56. PART 1: APPLICATIONS OF BIOSTATISTICS Predictive modelling • In public health, predictive modeling is a pivotal aspect of biostatistics. • This statistical process utilizes existing data to forecast future events, uncovering patterns and trends. • It is applied in epidemiology to screen for individuals prone to specific diseases. • For instance, in breast cancer, factors like age, race, family history, and more are analyzed to gauge the risk. • Predictive modeling plays a crucial role in preventing breast cancer-related deaths by identifying individuals who may need preventive or treatment measures. • Beyond cancer and pandemics, this approach extends to various public health concerns, showcasing its versatility in foreseeing and addressing health challenges.
  • 57. PART 1: APPLICATIONS OF BIOSTATISTICS Decision-making • Healthcare leaders • Researchers • Policymakers. Operational Viability: • Biostatistics provides the necessary data to assess the operational feasibility of new ideas and initiatives. • It helps in making informed decisions about acquisitions, tool prototypes, and hiring strategies, setting the parameters for project scopes and methodologies. Guarding Against Bias: • Biostatistical studies undergo rigorous examination to detect and eliminate bias. • Public health’s commitment to equitability ensures that data collection processes are designed to be fair and objective, preventing unfair conclusions. Protecting Data Subjects: • Biostatistical researchers prioritize the protection of data subjects. Personal information collected for public health research is anonymized and safeguarded, addressing privacy concerns and mitigating risks associated with unsecured data.
  • 58. PART 1: DATA CLASSIFICATION The main objectives of Classification of Data are as follows: • Explain similarities and differences of data • Simplify and condense data’s mass • Facilitate comparisons • Study the relationship • Prepare data for tabular presentation • Present a mental picture of the data
  • 59. PART 1: DATA CLASSIFICATION There are different types of data classification, depending on the characteristics. • Structured and Unstructured • Primary or Secondary • Qualitative and Quantitative. • Number of variables: Univariate, Bivariate, Multivariate Classifying data is an important step to ensure proper analysis
  • 60. PART 1: DATA CLASSIFICATION Univariate data • This type of data consists of only one variable. • The analysis of univariate data is thus the simplest form of analysis since the information deals with only one quantity that changes. • It does not deal with causes or relationships and the main purpose of the analysis is to describe the data and find patterns that exist within it. • An example of univariate data is height.
  • 61. PART 1: DATA CLASSIFICATION Bivariate data • This type of data involves two different variables. • The analysis of this type of data deals with causes and relationships and the analysis is done to find out the relationship between the two variables. • An example of bivariate data is temperature and ice cream sales in the summer season. • Bivariate data analysis involves comparisons, relationships, causes and explanations
  • 62. PART 1: DATA CLASSIFICATION Multivariate data • When the data involves three or more variables, it is categorized under multivariate. • For example, suppose an advertiser wants to compare the popularity of four advertisements on a website; their click rates could be measured for both men and women, and the relationships between variables can then be examined. • It is similar to bivariate but contains more than one dependent variable • The ways to perform analysis on this data depend on the goals to be achieved. • Some of the techniques are regression analysis, path analysis, factor analysis and multivariate analysis of variance (MANOVA)
  • 63. DATA ANALYSIS Choice of method • Size • Complexity • Number of variables • Nature of variables • Study objectives • Research questions • Hypothesis
  • 64. PART 1: DATA ANALYSIS Descriptive analysis • Suitable for analyzing and presenting data using summaries such as the mean and median. Inferential analysis • Suitable for establishing functional relationships between variables using more advanced analytical techniques, such as correlation and regression
  • 65. PART 1: DATA INTERPRETATION Data interpretation • Involves inferring conclusions from the results of data analysis. • This exercise allows researchers to categorise, manipulate and summarise their findings to answer important questions in public health, biology and medicine.
  • 66. PART 1: PRIMARY AND SECONDARY DATA Definition • Primary data are the original data derived from research endeavors and collected through methods such as direct observation, indirect observation, interviews and questionnaires. • Secondary data are data derived from primary data; sources include published reports, journal articles and newspapers. • Often, the distinction between primary and secondary data may be less than clear. • In conducting research, both types of data are collected and created. • It is essential to have a plan for the management of all types of data and primary materials.
  • 67. PART 1: SOURCES OF EPIDEMIOLOGICAL DATA Epidemiologists use primary and secondary data sources to calculate rates and conduct studies. • Primary data is the original data collected for a specific purpose by or for an investigator. For example, an epidemiologist may collect primary data by interviewing people who became ill after eating at a restaurant in order to identify which specific foods were consumed. • Collecting primary data is expensive and time-consuming, and it usually is undertaken only when secondary data is not available. • Secondary data is data collected for another purpose by other individuals or organizations. • Examples of sources of secondary data that are commonly used in epidemiological studies include birth and death certificates, population census records, patient medical records, disease registries, insurance claim forms and billing records, public health department case reports, and surveys of individuals and households.
  • 68. PART 1: PRIMARY AND SECONDARY DATA Primary Materials Primary Data Secondary Data Interview schedules Interview audio recordings Surveys Experiments Nvivo interview transcripts Purchased laboratory reagents Investigational product Product analyses Research animals Tissue samples Stained slides Validated questionnaires Completed paper and pencil questionnaires SPSS data files containing raw data and calculated variable summary scores
  • 69. PART 1: PRIMARY AND SECONDARY DATA
    Basis for comparison | Primary data | Secondary data
    Meaning | Primary data refers to the first-hand data gathered by the researcher himself. | Secondary data means data collected by someone else earlier.
    Data | Real-time data | Past data
    Process | Very involved | Quick and easy
    Source | Surveys, observations, experiments, questionnaire, personal interview, etc. | Government publications, websites, books, journal articles, internal records, etc.
    Cost effectiveness | Expensive | Economical
    Collection time | Long | Short
    Specific | Always specific to the researcher's needs. | May or may not be specific to the researcher's needs.
    Available in | Crude form | Refined form
    Accuracy and reliability | More | Relatively less
  • 70. Part 1: RAW DATA
  • 71. PART 1: ELEMENTS, OBSERVATIONS, VARIABLES, DATA Element • Entities or units on which data are collected, such as a person, place, or object Observation • Set of measurements related to a particular element Variable • Characteristic or attribute of interest on a particular element, which takes on different values Total number of data values • The number of elements times the number of variables Data • A specific measurement of a variable; it is the value you record in your data sheet.
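The relationship between elements, variables and data values can be sketched with a tiny data sheet. The three patients and their values below are hypothetical, made up purely for illustration; each row is one element, each key is one variable, and a whole row is one observation.

```python
# Hypothetical data sheet (illustrative values only):
# each row is one element (a patient), each key is one variable.
data_sheet = [
    {"age_years": 34, "weight_kg": 70.5, "smoker": "No"},
    {"age_years": 51, "weight_kg": 82.0, "smoker": "Yes"},
    {"age_years": 28, "weight_kg": 64.3, "smoker": "No"},
]

n_elements = len(data_sheet)        # number of elements (rows)
n_variables = len(data_sheet[0])    # number of variables (columns)

# Total number of data values = elements x variables
n_data_values = n_elements * n_variables

# One observation is the full set of measurements for a single element.
first_observation = data_sheet[0]

print(n_elements, n_variables, n_data_values)  # 3 3 9
```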
  • 72. PART 1: QUANTITATIVE AND QUALITATIVE VARIABLES Data is generally divided into two categories: • Quantitative data represents amounts • Qualitative or Categorical data represents groupings • A variable that contains quantitative data is a quantitative variable; • A variable that contains categorical data is a categorical variable.
  • 73. PART 1: QUANTITATIVE AND QUALITATIVE VARIABLES Quantitative variables • The numbers recorded represent real amounts that can be added, subtracted, divided, etc. • There are two types of quantitative variables: • Discrete and continuous.
  • 75. Indicate whether each of the following variables is discrete or continuous: • the time it takes for you to get to school • the number of Canadian couples who were married last year • the number of goals scored by a women’s hockey team • the speed of a bicycle • your age • the number of subjects your school offered last year • the length of time of a telephone call • the annual income of an individual • the distance between your house and school • the number of pages in a dictionary
  • 76. PART 1: QUANTITATIVE VARIABLES Type of variable What does the data represent? Examples Discrete variables (aka integer variables) Counts of individual items or values. •Number of students in a class •Number of different tree species in a forest Continuous variables (aka ratio variables) Measurements of continuous or non-finite values. •Distance •Volume •Age
  • 77. PART 1: QUALITATIVE VARIABLES Qualitative variables • Categorical variables represent groupings of some kind. • They are sometimes recorded as numbers, but the numbers represent categories rather than actual amounts of things. • There are three types of categorical variables: • Binary, nominal, and ordinal variables. • Sometimes a variable can work as more than one type • An ordinal variable can also be used as a quantitative variable if the scale is numeric and doesn’t need to be kept as discrete integers. • For example, star ratings on product reviews are ordinal (1 to 5 stars), but the average star rating is quantitative.
  • 78. PART 1: QUALITATIVE VARIABLES Type of variable What does the data represent? Examples Binary variables (aka dichotomous variables) Yes or no outcomes. •Heads/tails in a coin flip •Win/lose in a football game Nominal variables Groups with no rank or order between them. •Species names •Colors •Brands Ordinal variables Groups that are ranked in a specific order. •Finishing place in a race •Rating scale responses in a survey, such as Likert scales
  • 79. PART 1: INDEPENDENT AND DEPENDENT VARIABLES Independent vs dependent variables • Experiments are usually designed to find out what effect one variable has on another for instance the effect of salt addition on plant growth. • The independent variable (the one you think might be the cause) is manipulated and then the dependent variable (the one you think might be the effect) is measured to find out what this effect might be. • There are variables that you hold constant (control variables) in order to focus on your experimental treatment.
  • 80. PART 1: INDEPENDENT AND DEPENDENT VARIABLES Independent vs dependent vs control variables Type of variable Definition Example (salt tolerance experiment) Independent variables (aka treatment variables) Variables you manipulate in order to affect the outcome of an experiment. The amount of salt added to each plant’s water. Dependent variables (aka response variables) Variables that represent the outcome of the experiment. Any measurement of plant health and growth: in this case, plant height and wilting. Control variables Variables that are held constant throughout the experiment. The temperature and light in the room the plants are kept in, and the volume of water given to each plant.
  • 81. OTHER COMMON TYPES OF VARIABLES Other types  Definition of the independent and dependent variables and determination of whether they are categorical or quantitative enables choice of the correct statistical test. Type of variable Definition Example (salt tolerance experiment) Confounding variables A variable that hides the true effect of another variable in an experiment. This can happen when another variable is closely related to a variable you are interested in, but you haven’t controlled it in your experiment. Be careful with these, because confounding variables run a high risk of introducing a variety of research biases to your work, particularly omitted variable bias. Pot size and soil type might affect plant survival as much or more than salt additions. In an experiment you would control these potential confounders by holding them constant. Latent variables A variable that can’t be directly measured, but that you represent via a proxy. Salt tolerance in plants cannot be measured directly, but can be inferred from measurements of plant health in our salt-addition experiment. Composite variables A variable that is made by combining multiple variables in an experiment. These variables are created when you analyze data, not when you measure it. The three plant health variables could be combined into a single plant-health score to make it easier to present your findings
  • 82. PART 1: VARIABLES IN RESEARCH No Variable Type Measurement Scale Categories 1 Age (years) Independent Interval - 2 Weight (kg) Independent Interval - 3 Serum creatinine (μmol/L) Independent Interval - 4 Blood cholesterol (mmol/L) Independent Interval - 5 Serum triglyceride (mmol/L) Independent Interval - 6 Blood uric acid (μmol/L) Independent Interval - 7 Fasting blood glucose (mmol/L) Independent Interval - 8 Systolic blood pressure (mmHg) Independent Interval - 9 Diastolic blood pressure (mmHg) Independent Interval - 10 Hemoglobin (g/L) Independent Interval - 11 Hematocrit Independent Interval -
  • 83. PART 1: VARIABLES IN RESEARCH No Variable Type Measurement Scale Categories 12 BMI Independent Interval - 13 ≥High school education Independent Nominal Above/Under 14 Health insurance coverage Independent Nominal Yes/No 15 Smoking Independent Nominal Yes/No 16 History of CKD Independent Nominal Yes/No 17 Family history of diabetes Background Nominal Yes/No 18 Family history of hypertension Background Nominal Yes/No 19 Family history of CKD Background Nominal Yes/No 20 Repeated respiratory tract infection Background Nominal Yes/No 21 Nephrotoxic medications Independent Nominal Yes/No 22 Obesity Independent Nominal Yes/No
  • 84. PART 1: VARIABLES IN RESEARCH No Variable Type Measurement Scale Categories 23 Central obesity Independent Nominal Yes/No 24 Metabolic syndrome Independent Nominal Yes/No 25 Hypertension Independent Nominal Yes/No 26 Diabetes Independent Nominal Yes/No 27 Hyperlipidemia Independent Nominal Yes/No 28 Hyperuricemia Independent Nominal Yes/No 29 Cardiovascular disease Independent Nominal Yes/No 30 eGFR <60 mL/min/1.73 m2 Independent Nominal Yes(<60)/No(>60) 31 ACR >30 mg/g Independent Nominal Yes(>30)/No(<30) 32 Hematuria Independent Nominal Yes/No 33 CKD Status Dependent Nominal Yes/No
  • 85. PART 1: DISCUSS THE CATEGORIZATION OF THE FOLLOWING VARIABLES • Number of all hospital discharges • Acute care hospital discharges per 100 • Number of acute care hospital discharges • Inpatient surgical procedures per year per 100 000 • Total number of inpatient surgical procedures per year • Average length of hospital stay • Bed occupancy rate (%) • Outpatient contacts per person per year • Autopsy rate (%) for hospital deaths • Inpatient care discharges per 100 • Turnover rate • Outpatient/In-patient ratio • Number of surgeries • Number of deliveries • Number of x-rays/scans • Number of lab tests • Number of beds per capita • Number of
  • 86. PART 1: QUALITATIVE RESEARCH METHODS Method Overall Purpose Advantages Challenges Surveys •Quickly and/or easily gets lots of information from people in a non threatening way •Can complete anonymously •Inexpensive to administer •Easy to compare and analyze •Administer to many people •Can get lots of data •Many sample questionnaires already exist •Might not get careful feedback •Wording can bias client's responses •Impersonal •May need sampling expert •Doesn't get full story Interviews •Understand someone's impressions or experiences •Learn more about answers to questionnaires •Get full range and depth of information •Develops relationship with client •Can be flexible with client •Can take time •Can be hard to analyze and compare •Can be costly •Interviewer can bias client's responses Observation •Gather firsthand information about people, events, or programs •View operations of a program as they are actually occurring •Can adapt to events as they occur •Can be difficult to interpret seen behaviors •Can be complex to categorize observations •Can influence behaviors of program participants •Can be expensive
  • 87. PART 1: QUALITATIVE RESEARCH METHODS Method Overall Purpose Advantages Challenges Focus Groups •Explore a topic in depth through group discussion •quickly and reliably get common impressions •can be efficient way to get much range and depth of information in short time •can convey key information about programs •can be hard to analyze responses •need good facilitator for safety and closure •difficult to schedule 6-8 people together Case Studies •Understand an experience or conduct comprehensive examination through cross comparison of cases •depicts client's experience in program input, process and results •powerful means to portray program to outsiders •usually time consuming to collect, organize and describe •represents depth of information, rather than breadth
  • 88. FACTORIALS These provide an easier way of writing large numbers in compact form, just as in mathematics we can use bases to write numbers in different forms. For instance, 10 in base 2 = 1010 and 100 in base 2 = 1100100. 10! = 10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1 = 3,628,800
  • 89. PART 1: FACTORIALS The factorial of a whole number 'n' is defined as the product of that number with every whole number less than or equal to 'n' down to 1. For example, the factorial of 4 is 4 × 3 × 2 × 1, which is equal to 24. It is represented using the symbol '!'. 5 factorial, that is, 5!, can be written as: 5! = 5 × 4 × 3 × 2 × 1 = 120. The formulas for n factorial are: n! = n(n-1)(n-2)...(3)(2)(1) and n! = n × (n - 1)!
  • 92. PERMUTATION AND COMBINATION Both are about rearranging numbers or objects to change their position. For instance, a set of numbers like 123 can be rearranged in different ways, e.g. 123 132 321 312 213 231. A set like 12 can be rearranged as 12 21. A set like 1 has only one arrangement: 1.
  • 93. A set {1 2 3 4 5 6 7 8 9} Permutation • Take 123, where n = 3 and r = 3 • nPr = 3P3 = n!/(n-r)! = 3!/(3-3)! = 3!/0! = (3 × 2 × 1)/1 = 6/1 = 6 Combination • nCr, where n = 3 and r = 3 • nCr = n!/[(n-r)! × r!] = 3!/[(3-3)! × 3!] = (3 × 2 × 1)/[0! × (3 × 2 × 1)] = 6/6 = 1
  • 94. PREMIER LEAGUE EXAMPLE The league is a set of 20 unique clubs, so n = 20. They play in pairs, meaning 2 at a time, so r = 2. For permutation, the order matters (home and away), so the number of unique ordered pairs is nPr = 20P2 = 20!/(20-2)! = 20!/18! = 20 × 19 = 380
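The two formulas for n! on this slide can be sketched directly in code. This is a minimal illustration, not part of the original slides; the function names are made up, and Python's built-in math.factorial is shown alongside as a cross-check.

```python
import math


def factorial_iterative(n: int) -> int:
    """n! = n(n-1)(n-2)...(3)(2)(1), with 0! defined as 1."""
    if n < 0:
        raise ValueError("factorial is only defined for non-negative integers")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


def factorial_recursive(n: int) -> int:
    """n! = n x (n - 1)!, stopping at 0! = 1."""
    if n < 0:
        raise ValueError("factorial is only defined for non-negative integers")
    return 1 if n == 0 else n * factorial_recursive(n - 1)


print(factorial_iterative(4))   # 24
print(factorial_recursive(5))   # 120
print(math.factorial(10))       # 3628800
```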
  • 95. PART 1: USE OF FACTORIAL Use of Factorial • One area where factorials are widely used is in permutations & combinations. • Permutation is an ordered arrangement of outcomes and it can be calculated with the formula: n Pr= n! / (n - r)! • Combination is a grouping of outcomes in which order does not matter. It can be calculated with the formula: nCr = n! / [ (n - r)! r!] • In both of these formulas, 'n' is the total number of things available and 'r' is the number of things that have to be chosen.
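The two formulas above translate almost word for word into code. The sketch below is illustrative only; the helper names n_p_r and n_c_r are made up for this example. Python 3.8 and later also provide math.perm and math.comb, which compute the same quantities.

```python
from math import factorial


def n_p_r(n: int, r: int) -> int:
    """Permutations: nPr = n! / (n - r)!  (order matters)."""
    if not 0 <= r <= n:
        raise ValueError("require 0 <= r <= n")
    return factorial(n) // factorial(n - r)


def n_c_r(n: int, r: int) -> int:
    """Combinations: nCr = n! / [(n - r)! r!]  (order does not matter)."""
    if not 0 <= r <= n:
        raise ValueError("require 0 <= r <= n")
    return factorial(n) // (factorial(n - r) * factorial(r))


print(n_p_r(20, 2))  # 380
print(n_c_r(20, 2))  # 190
print(n_p_r(10, 3))  # 720
print(n_c_r(10, 3))  # 120
```

Integer division (//) is used because both quotients are always whole numbers, which keeps the results exact even for large n.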
  • 96. PART 1: FACTORIAL Set {1,2,3} has three elements, meaning n=3 • Permutation, i.e. the order of the elements in the subset matters and makes a difference: 1,2; 1,3; 2,3; 2,1; 3,1; 3,2, so the number of ordered subsets of two is 6 • nPr=n!/(n-r)!, where n is the total number of elements in the mother set or population and r is the number of elements in the subset. • For example, if we had 20 elements in the mother set and we are picking two at a time, then by permutation we proceed as follows: 20P2 • Therefore 20P2=20!/(20-2)!=20!/18!=380
  • 97. PART 1: FACTORIAL Set {1,2,3} has three elements, meaning n=3 • Combination, i.e. the order of the elements in the subset does not matter and makes no difference: 1,2; 1,3; 2,3, so the number of subsets of two is 3 • nCr=n!/[(n-r)! r!], where n is the total number of elements in the mother set or population and r is the number of elements in the subset. • For example, if we had 20 elements in the mother set and we are picking two at a time, then by combination we proceed as follows: 20C2 • Therefore 20C2=20!/[(20-2)! 2!]=20!/[18! 2!]=190
  • 98. PART 1: PERMUTATION-THE ORDER MATTERS {1, 2, 3} arranged in pairs using permutation, i.e. the order should be respected and matters: 1,2; 1,3; 2,3; 2,1; 3,1; 3,2 = 6 pairs {1,2,3,4}: 1,2; 1,3; 1,4; 2,3; 2,4; 3,4; 2,1; 3,1; 4,1; 3,2; 4,2; 4,3 = 12 pairs In the Premier League we have 20 clubs: nPr = n! / (n - r)!, so 20P2 = 20! / (20 - 2)! = 20!/18! = 380
  • 99. PART 1: COMBINATION –ORDER DOES NOT MATTER {1,2,3}: 1,2; 2,3; 1,3 = 3 pairs nCr = n! / [ (n - r)! r!] 3C2 = 3! / [ (3 - 2)! 2!] = 3!/[1! × 2!] = 6/2 = 3 For the Premier League with 20 clubs, if the games were one way only, then the number of games would be 20C2 = 20!/(18! × 2!) = 190
  • 100. PART 1: COMBINATIONS nCr = n! / [ (n - r)! r!], where r is the number of elements we pick for arrangement at a time 30C5 = 30! / [ (30 - 5)! 5!] = 30! / [ 25! 5!] = 142,506
  • 101. PART 1: USE OF FACTORIALS Example 1: How many 5-digit numbers can be formed using the digits 1, 2, 5, 7, and 8 in each of which no digit is repeated? Solution: The given 5 digits (1, 2, 5, 7 and 8) should be arranged among themselves in order to get all possible 5-digit numbers. The number of ways of doing this is 5 factorial. 5! = 5 × 4 × 3 × 2 × 1 = 120 Answer: Therefore, the required number of 5-digit numbers is 120.
  • 102. PART 1: USE OF FACTORIAL Example 2: In a group of 10 people, $200, $100, and $50 prizes are to be given. In how many ways can the prizes be distributed? Solution: This is permutation because here the order of distribution of prizes matters. It can be calculated as 10P3 ways. 10P3 = (10!) / (10 - 3)! = 10! / 7! = (10 × 9 × 8 × 7!) / 7! = 10 × 9 × 8 = 720 ways. Example 3: Three $50 prizes are to be distributed to a group of 10 people. In how many ways can the prizes be distributed? Solution: This is a combination because here the order of distribution of prizes does not matter (because all prizes are of the same worth). It can be calculated using 10C3. 10C3 = (10!) / [ 3! (10 - 3)!] = 10! / (3! 7!) = (10 × 9 × 8 × 7!) / [(3 × 2 × 1) 7!] = 120 ways.
  • 103. PERMUTATION AND COMBINATION Difference between Permutation and Combination
    Permutation | Combination
    The different ways of arranging a set of objects into a sequential order are termed as Permutation. | One of the several ways of choosing items from a large set of objects, without considering an order, is termed as Combination.
    The order is very relevant. | The order is quite irrelevant.
    It denotes the arrangement of objects. | It does not denote the arrangement of objects.
    Multiple permutations can be derived from a single combination. | From a single permutation, only a single combination can be derived.
    They can simply be defined as ordered elements. | They can simply be defined as unordered sets.
  • 104. PART 1: UNIVARIATE AND BIVARIATE DATA Univariate data • Data on one variable • Examples include height, skin colour, ethnicity, service coverage Bivariate data • Data where two variables are being compared for correlation or causation • Correlation, such as height and body weight; age and body weight • Causation, such as obesity and heart disease
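The pair counts on the permutation and combination slides can be checked by listing the pairs explicitly. The sketch below uses Python's itertools as one possible way to enumerate them; numbering the 20 clubs 1 to 20 is an assumption made only for illustration.

```python
from itertools import combinations, permutations

# Ordered pairs from {1, 2, 3}: order matters, so (1, 2) and (2, 1) both count.
ordered_pairs = list(permutations([1, 2, 3], 2))
print(ordered_pairs)    # [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]

# Unordered pairs from {1, 2, 3}: (1, 2) and (2, 1) are the same pair.
unordered_pairs = list(combinations([1, 2, 3], 2))
print(unordered_pairs)  # [(1, 2), (1, 3), (2, 3)]

# Premier League: 20 clubs, numbered 1..20 here purely for illustration.
clubs = range(1, 21)
home_and_away_games = sum(1 for _ in permutations(clubs, 2))
one_way_games = sum(1 for _ in combinations(clubs, 2))
print(home_and_away_games, one_way_games)  # 380 190
```

Each unordered pair corresponds to r! = 2 ordered pairs, which is why the permutation count is exactly twice the combination count here.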
  • 105. PART 1: UNIVARIATE AND BIVARIATE DATA Univariate analysis • Summary statistics • Central tendency • Dispersion • Frequency distribution • Bar charts • Histogram • Pie chart
  • 106. PRACTICE QUESTIONS 1. Explain why a sample statistic (the estimate from the sample) may differ from the population parameter (the true value) and how you would minimize the difference. 2. A local coffee shop is creating a spreadsheet of their drinks for customers to view on their website. The spreadsheet includes the calories, sugar content, and ingredients for each coffee drink. Which of the following would be considered a variable in this data set? Answers: • The Calories • The Customers • The Coffee Shop • The Coffee Drink What are the other variables in the passage?
  • 107. PRACTICE QUESTIONS 1. A political pollster is conducting a survey about voters' affiliation to a major political party. He selects a random sample of voters who voted in the last presidential election, and looks into how party affiliation differs based on age, race, gender and location. How many variables can you identify in this data set? Answers: A. 5 B. 6 C. 4 D. 7
  • 108. PART 1: SCALES OF MEASUREMENT Rationale • In order to analyze data, the variables have to be defined and categorized using different scales of measurements. • There are four scales of measurements- nominal scale, ordinal scale, interval scale, and ratio scale. • The scale of measurement of a variable determines the kind of statistical test to be used. • Psychologist Stanley Stevens developed the four common scales of measurement: nominal, ordinal, interval and ratio. • 1. Nominal scale • 2. Ordinal scale • 3. Interval scale • 4. Ratio scale
  • 109. PART 1: SCALES OF MEASUREMENT Properties and scales of measurement • Each scale of measurement has properties that determine how to properly analyse the data. • The properties evaluated are identity, magnitude, equal intervals and a minimum value of zero. Properties of Measurement • Identity: Identity refers to each value having a unique meaning. • Magnitude: Magnitude means that the values have an ordered relationship to one another, so there is a specific order to the variables. • Equal intervals: Equal intervals mean that the distances between adjacent points along the scale are equal, so the difference between data points one and two will be the same as the difference between data points five and six. • A minimum value of zero: A minimum value of zero means the scale has a true zero point. Degrees, for example, can fall below zero and still have meaning. But if you weigh nothing, you don’t exist.
  • 110. PART 1: STATISTICAL LEVELS OF MEASUREMENT Nominal-level Measurement • There’s no numerical or quantitative value, and qualities are not ranked. • Nominal-level measurements are instead simply labels or categories assigned to other variables. • It’s easiest to think of nominal-level measurements as non-numerical facts about a variable.
  • 111. SCALES OF MEASUREMENT Nominal scale • Also known as categorical variable scale, can be defined as a scale used for labelling variables into different categories. • The numbers are used to identify and classify people, objects or events, like identity number, jersey number of sportspersons, and vehicle registration number; thus, they have no specific numerical value or meaning. • In research, the nominal scale is used for analysing categorical variables such as gender, place of residence, marital status, political party, blood group and so on. • The interval between numbers and their order does not matter on the nominal scale.
  • 112. SCALES OF MEASUREMENT Nominal scale: • A nominal scale preserves only the equality property; there is no ‘more or less than’ relation in this measurement. • The nominal scale of measurement defines the identity property of data. • This scale has certain characteristics, but doesn’t have any form of numerical meaning. • The data can be placed into categories but can’t be multiplied, divided, added or subtracted from one another. • It’s also not possible to measure the difference between data points
  • 113. SCALES OF MEASUREMENT Nominal scale: • The statistical analysis that can be performed on a nominal scale is the frequency distribution and percentage. • It can be analyzed graphically using a bar chart or a pie chart. If there are two categorical variables, quantitative analysis techniques such as joint frequency distribution and cross-tabulation can be used. • Mode is the only measure of central tendency which can be used in this scale. • Since numbers do not have a quantitative value, addition, subtraction, multiplication, division, and measures of dispersion cannot be applied. • It is also possible to perform contingency correlation. Hypothesis tests can be carried out on data collected in the nominal form using the Chi-square test. It can tell whether there is an association between the variables. • However, it cannot establish a cause and effect relationship or explain the form of relationship.
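The statistics allowed on a nominal scale (frequency distribution, percentage and mode) can be sketched with a simple tally. The blood-group sample below is hypothetical, invented for illustration only.

```python
from collections import Counter

# Hypothetical sample of a nominal variable (blood group), illustrative only.
blood_groups = ["O", "A", "B", "O", "AB", "O", "A", "O", "B", "A"]

# Frequency distribution: how many times each category occurs.
counts = Counter(blood_groups)
n = len(blood_groups)

# Percentage of the sample falling in each category.
percentages = {group: 100 * count / n for group, count in counts.items()}

# Mode: the most frequent category, the only valid measure of central tendency.
mode_group, mode_count = counts.most_common(1)[0]

print(dict(counts))           # {'O': 4, 'A': 3, 'B': 2, 'AB': 1}
print(percentages["O"])       # 40.0
print(mode_group, mode_count) # O 4
```

No mean or standard deviation is computed, since the category labels carry no numerical meaning.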
  • 114. PART 1: STATISTICAL LEVELS OF MEASUREMENT Ordinal-level Measurement • Outcomes can be arranged in an order, but all data values have the same value or weight. • Although they’re numerical, ordinal-level measurements can’t be subtracted against each other in statistics because only the position of the data point matters. • Ordinal levels are often incorporated into nonparametric statistics and compared against the total variable group.
  • 115. SCALES OF MEASUREMENT Ordinal scale • is a ranking scale in which numbers are assigned to variables to represent their rank or relative position in the data set. • The variables are arranged in a specific order rather than just naming them. • So they can be named, grouped, and ranked. • In research, the ordinal scale is used for ranking students in a class (1,2,3), rating a product satisfaction (very unsatisfied-1, unsatisfied-2, neutral-3, satisfied-4, very satisfied-5), evaluating the frequency of occurrences (very often-1, often-2, not often-3, not at all-4), assessing the degree of agreement (totally agree-1, agree-2, neutral-3, disagree-4, totally disagree-5). • In this scale, the attributes are arranged in ascending or descending order. The numbers indicate rank or the order of quality or quantity.
  • 116. SCALES OF MEASUREMENT Ordinal Scale: • The origin of scale is absent because there is no fixed start or ‘true zero’ in the data. • Hence, it is impossible to find the magnitude of difference or distance between the variables or their degree of quality. • For example, while ranking students in terms of potential for an award, a student labelled ‘1’ is better than the student labelled ‘2’, ‘2’ is better than ‘3’ and so forth. • However, this ordinal scaling cannot quantify or indicate how much better the first student is than the second, or whether the difference between the potential of the first and second students is the same as the difference between the second and third. • Similarly, very satisfied will always be better than satisfied and unsatisfied will be better than very unsatisfied. • The order of variables is of prime importance, and so is the labelling. • The ordinal scale is the second level of measurement from a statistical point of view. • These scales are unique up to a monotone transformation. A monotone transformation T is one that assigns new values such that if f(X) > f(Y) in the ordinal scale, then T(f(X)) > T(f(Y)) in the newly transformed scale.
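The idea that an ordinal scale is unique up to a monotone transformation can be illustrated with a small sketch. The responses and the recoding table below are hypothetical; any strictly increasing recoding would serve equally well.

```python
# Hypothetical satisfaction responses coded 1 (very unsatisfied) to 5 (very satisfied).
responses = [3, 5, 1, 4, 2, 4]

# An arbitrary monotone (strictly increasing) recoding T of the five codes.
recode = {1: 10, 2: 20, 3: 50, 4: 70, 5: 100}
transformed = [recode[r] for r in responses]


def sort_order(values):
    """Indices of the respondents, from lowest to highest value."""
    return sorted(range(len(values)), key=lambda i: values[i])


# The ranking of respondents is identical before and after the transformation.
print(sort_order(responses) == sort_order(transformed))  # True

# The gaps between codes are not preserved, so differences carry no meaning.
print(recode[2] - recode[1], recode[3] - recode[2])      # 10 30
```

Because only the order survives the recoding, statistics that depend on order (median, percentiles, rank correlation) are appropriate, while those that depend on differences are not.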
  • 117. SCALES OF MEASUREMENT Ordinal Scale: • The ordinal data can be presented using tabular or graphical formats. • The descriptive analysis such as percentile, quartile, median and mode can be determined in ordinal scale data. Since the interval between numbers is insignificant, addition, subtraction, multiplication, division, and measures of dispersion cannot be applied. • It is possible to test for order correlation using Spearman's rank correlation coefficient. • Non-parametric tests such as Mann-Whitney U test, Friedman’s ANOVA, Kruskal–Wallis H test can also be used to analyze ordinal scale data
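As a rough sketch of how the median and Spearman's rank correlation apply to ordinal data, the code below ranks two hypothetical ordinal variables (tied values share the average of their ranks) and then computes the product-moment correlation of the ranks. The variable names and values are invented for illustration, and the helper functions are written from scratch rather than taken from any statistics package.

```python
from statistics import median


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ordinal codes for seven patients (illustrative only).
pain_level = [1, 2, 2, 3, 4, 5, 5]     # 1 = none ... 5 = severe
satisfaction = [5, 4, 4, 3, 2, 1, 1]   # 1 = very unsatisfied ... 5 = very satisfied

print(median(pain_level))                            # 3
print(round(spearman(pain_level, satisfaction), 6))  # -1.0
```

In this made-up sample the two rankings are exact mirror images, so the coefficient reaches its minimum of -1; real data would normally give a value strictly between -1 and 1.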
  • 118. SCALES OF MEASUREMENT Interval Scale • can be defined as a quantitative scale in which both the order and the exact difference between categories are known. • Thus it measures variables that can be labelled, ordered, and have an equal interval. • However, the point of beginning or zero point on an interval scale is arbitrarily established and is not a ‘true zero’ or ‘absolute zero’. • Thus the value of zero does not indicate the complete absence of the characteristic being measured. • In Fahrenheit/Celsius temperature scales, 0°F and 0°C do not indicate an absence of temperature. • In fact, negative values of temperature do exist. • Temperature, calendar years, attitudes, opinions and so on fall under the interval scale. Likert scale, Net Promoter Score (NPS), Bipolar matrix table, Semantic differential scale are the widely used interval scale examples
  • 119. PART 1: STATISTICAL LEVELS OF MEASUREMENT Interval-level Measurement • Outcomes can be arranged in order, but differences between data values may now have meaning. • Two data points are often used to compare the passing of time or changing conditions within a data set. • There is often no “starting point” for the range of data values, and calendar dates or temperatures may not have a meaningful intrinsic zero value.
  • 120. SCALES OF MEASUREMENT Interval Scale: • The major difference between ordinal and interval scale is the existence of meaningful and equal intervals between variables. • For example, 40 degrees is higher than 30 degrees, and the difference between them is a measurable 10 degrees, as is the difference between 90 and 100 degrees. • However, while ranking students on an ordinal scale, the difference between first and second student might be 5 marks, and between second and third student is 8 marks. • Thus, with an interval scale, it is possible to identify whether a given attribute is higher or lower than another and the extent to which one is higher or lower than another.
  • 121. SCALES OF MEASUREMENT Interval Scale: • The interval scale is the third level of measurement scale. The arbitrary presence of zero has implications in data manipulation and analysis. • It is possible to add or subtract a constant to all of the interval scale values without affecting the form of the scale but not possible to multiply or divide the values. • For instance, two persons with scale positions 4 and 5 are as far apart as persons with scale positions 9 and 10, but it cannot be said that a person with a score of 10 feels twice as strongly as one with a score of 5. • Similarly, 100°F cannot be defined as twice as hot as 50°F because the corresponding temperatures on the centigrade scale, 37.78°C and 10°C, are not in the ratio 2:1. • Unlike the ordinal and nominal scale, arithmetic operations such as addition and subtraction can be performed on an interval scale. • Any positive linear transformation of form Y = a + bX will preserve the properties of an interval scale. • The arithmetic mean, median, and mode can be used to calculate the central tendency in this scale. • The measures of dispersion, such as range and standard deviation, can also be calculated. • Apart from those techniques, product-moment correlation, t-test, and regression analysis are extensively used for analyzing interval data.
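The temperature example on this slide can be checked numerically. The sketch below converts Fahrenheit to Celsius with the standard formula, a positive linear transformation of the form Y = a + bX; the extra pair of readings (90°F and 40°F) is added here only to compare differences.

```python
def fahrenheit_to_celsius(f: float) -> float:
    """Positive linear transformation of the form Y = a + bX."""
    return (f - 32) * 5 / 9


hot_f, cool_f = 100.0, 50.0
hot_c, cool_c = fahrenheit_to_celsius(hot_f), fahrenheit_to_celsius(cool_f)

print(round(hot_c, 2), round(cool_c, 2))  # 37.78 10.0

# Ratios are NOT preserved: "twice as hot" in Fahrenheit is not 2:1 in Celsius.
print(hot_f / cool_f)                     # 2.0
print(round(hot_c / cool_c, 2))           # 3.78

# Ratios of differences ARE preserved: a 50-degree gap compared with another
# 50-degree gap gives the same answer on either scale.
gap_ratio_f = (100.0 - 50.0) / (90.0 - 40.0)
gap_ratio_c = (hot_c - cool_c) / (
    fahrenheit_to_celsius(90.0) - fahrenheit_to_celsius(40.0)
)
print(gap_ratio_f, round(gap_ratio_c, 6)) # 1.0 1.0
```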
  • 123. SCALES OF MEASUREMENT Ratio Scale • Can be defined as a quantitative scale that bears all the characteristics of an interval scale and a ‘true zero’ or ‘absolute zero’, which implies the complete absence of the attribute being measured. • Thus it measures variables that can be labelled and ordered, and that have equal intervals and the ‘absolute zero’ property. • Before deciding to use a ratio scale, the researcher must observe whether the variables possess all these characteristics. • The variables such as length, age, weight, income, years of schooling, price etc., are examples of a ratio scale. • They do not have negative numbers because of the existence of an absolute zero point of origin. • For instance, a price of zero means the commodity does not have any price (it is free); and there cannot be any negative price. • Thus ratio scale has a meaningful zero. • It allows unit conversions like metres to feet, kilograms to pounds etc.
  • 124. SCALES OF MEASUREMENT Ratio Scale: • The ratio scale is the highest level of measurement scale. It is unique up to a congruence or proportionality transformation of the form Y = bX. • The ‘absolute zero’ property allows performing a wide range of descriptive and inferential statistics on ratio scale variables. • It is possible to compare both differences in values and the relative magnitude of values. • For instance, the difference between 15 cm and 20 cm is the same as between 30 cm and 35 cm, and 30 cm is twice as long as 15 cm. • Arithmetic operations such as addition, subtraction, multiplication, and division (ratio) can be performed on ratio scale data.
  • 125. SCALES OF MEASUREMENT Ratio Scale: • All statistical operations applicable to nominal, ordinal and interval scale can be performed on ratio scale data as well. • Besides, measures of central tendency such as geometric mean and harmonic mean and all measures of dispersion, including coefficient of variation, can be determined. • Parametric tests such as independent sample t-test, paired sample t-test, ANOVA etc., can also be performed. • The ratio scale provides unique opportunities for statistical analysis.
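As a minimal sketch of the summary measures named above, the following uses Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later). The body weights are made-up values for illustration, and `coefficient_of_variation` is a helper defined here, not a library function.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the arithmetic mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical body weights in kilograms (ratio-scale data, all positive).
weights_kg = [52.0, 61.5, 68.0, 74.5, 90.0]

arithmetic = statistics.mean(weights_kg)
geometric = statistics.geometric_mean(weights_kg)
harmonic = statistics.harmonic_mean(weights_kg)
cv_kg = coefficient_of_variation(weights_kg)

# The same weights in pounds: a proportional change of unit (Y = bX).
weights_lb = [w * 2.20462 for w in weights_kg]
cv_lb = coefficient_of_variation(weights_lb)

print("arithmetic mean:", round(arithmetic, 2))
print("geometric mean:", round(geometric, 2))
print("harmonic mean:", round(harmonic, 2))
print("coefficient of variation (kg):", round(cv_kg, 4))
print("coefficient of variation (lb):", round(cv_lb, 4))
```

For positive values that are not all equal, the harmonic mean is smaller than the geometric mean, which is smaller than the arithmetic mean. The coefficient of variation is unchanged by the change of unit, which is why it is only meaningful for ratio-scale data.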
  • 126. SCALES OF MEASUREMENT Scale: Properties • Nominal: Categories • Ordinal: Categories, Rank • Interval: Categories, Rank, Intervals • Ratio: Categories, Rank, Intervals, True or absolute zero
  • 128. CROSS TABULATION Gender by body weight (Normal, Overweight, Total) • Male: Normal 10, Overweight 15, Total 25 • Female: Normal 15, Overweight 10, Total 25
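A cross tabulation like the one on this slide can be built by counting pairs of categories. The sketch below is one possible approach using only the standard library. The individual records are reconstructed from the slide's cell counts purely for illustration, and the variable names are assumptions.

```python
from collections import Counter

# Records rebuilt from the slide's cell counts (illustrative, not raw data).
records = (
    [("Male", "Normal")] * 10
    + [("Male", "Overweight")] * 15
    + [("Female", "Normal")] * 15
    + [("Female", "Overweight")] * 10
)

# Cell counts: one entry per (gender, body weight) combination.
cell_counts = Counter(records)

# Marginal totals for rows (gender) and columns (body weight).
row_totals = Counter(gender for gender, _ in records)
column_totals = Counter(weight for _, weight in records)
grand_total = len(records)

genders = ["Male", "Female"]
weights = ["Normal", "Overweight"]

# Print the table with row totals and a final row of column totals.
print(f"{'Gender':<10}" + "".join(f"{w:>12}" for w in weights) + f"{'Total':>8}")
for g in genders:
    cells = "".join(f"{cell_counts[(g, w)]:>12}" for w in weights)
    print(f"{g:<10}{cells}{row_totals[g]:>8}")
totals = "".join(f"{column_totals[w]:>12}" for w in weights)
print(f"{'Total':<10}{totals}{grand_total:>8}")
```

The row totals match the figures on the slide (25 for each gender). The column totals and grand total are derived from the same counts.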
  • 129. SOURCES OF DATA Three main sources for demographic and social statistics • Censuses • Surveys • Administrative records. A population census • The total process of collecting, compiling, evaluating, analysing and publishing or otherwise disseminating demographic, economic and social data pertaining, at a specified time, to all persons in a country or in a well-delimited part of a country. • The census collects data from each individual and each set of living quarters for the whole country or area. • It allows estimates to be produced for small geographic areas and for population subgroups. • It also provides the base population figures needed to calculate vital rates from civil registration data, and it supplies the sampling frame for sample surveys.
  • 130. SOURCES OF DATA Population census steps • Securing the required legislation, political support and funding • Mapping and listing all households • Planning and printing questionnaires, instruction manuals and procedures • Planning for shipping census materials • Recruiting and training census personnel • Organizing field operations • Launching publicity campaigns • Preparing for data processing • Planning for tabulation
  • 131. SOURCES OF DATA Population census data • Because of the expense and complexity of the census, only the most basic items are included on the questionnaire for the whole population. • Choosing these items requires considering the needs of data users; availability of the information from other data sources; international comparability; willingness of the respondents to give information; and available resources to fund the census. • Many countries carry out a sample enumeration in conjunction with the census. • This can be a cost-effective way to collect more detailed information on additional topics from a sample of the population. • The sample enumeration uses the infrastructure and facilities that are already in place for the census.
  • 132. SOURCES OF DATA Surveys • A continuing program of intercensal household surveys is useful for collecting detailed data on social, economic and housing characteristics that are not appropriate for collection in a full-scale census. • Household-based surveys are the most flexible type of data collection. • They can examine most subjects in detail and provide timely information about emerging issues. • They increase the ability and add to the experience of in-house technical and field staff and maintain resources that have already been developed, such as maps, sampling frame, field operations, infrastructure and data-processing capability.
  • 133. SOURCES OF DATA Surveys • The many types of household surveys include multi-subject surveys, specialized surveys, multi-phase surveys and panel or longitudinal surveys. • Each type of survey is appropriate for certain kinds of data-collection needs. • Household surveys can be costly to undertake, especially if a country has no ongoing program.
  • 134. SOURCES OF DATA Administrative records • Administrative records are statistics compiled from various administrative processes. • They include not only the vital events recorded in a civil registration system but also education statistics from school records; health statistics from hospital records; employment statistics; and many others. • The reliability and usefulness of these statistics depend on the completeness of coverage and the compatibility of concepts, definitions and classifications with those used in the census. • Administrative records are often by-products of administrative processes, but they can also be valuable complementary sources of data for censuses and surveys.
  • 135. SOURCES OF DATA Administrative records • Birth certificates • Death certificates • Patient medical records • Disease registries • Insurance claim forms • Billing records • Public health department case reports