Health statistics
PRESENTED BY
Aman Siddiqui, Ayushi Jaiswal, Abdul Gayas, Hijab Chaudhary, Nauman Baig, KK Shravan, Divya Chadha, Vibhor Kumar Singh
Statistics is a branch of mathematics that deals with every aspect of data. It
encompasses the collection, analysis, interpretation, and presentation of data. Here
are some key points about statistics:
Purpose: Statistics helps us make informed decisions based on data. It guides us in
choosing the appropriate methods for data collection and analyzing samples
effectively.
Types of Statistics:
• Descriptive Statistics: Descriptive statistics use graphs, tables, or numerical
calculations to provide descriptions of a population.
• Inferential Statistics: Based on data samples from the population, inferential
statistics make predictions and draw inferences.
Characteristics of Statistics:
• Numerically expressed.
• Collected systematically.
• Comparable to each other.
• Used for planned purposes.
Importance of Statistics:
• Gathers information about quantitative data.
• Presents complex data in graphical or tabular forms.
• Provides accurate descriptions.
• Aids in effective planning and inquiry design.
• Offers valid inferences about population parameters from sample
data.
• Reveals variability patterns through quantitative
observations.
Biostatistics:
Biostatistics, also known as biometry, is a branch of statistics that applies statistical methods to various
topics in biology.
It encompasses several key aspects:
Design of Biological Experiments: Biostatisticians help plan and design experiments
in fields such as medicine, pharmacy, and agriculture.
Data Collection: They collect data from these experiments.
Data Summarization and Analysis: Biostatistics involves summarizing and analyzing
the collected data.
Interpretation of Results: Biostatisticians interpret the findings to draw meaningful conclusions.
Applications of Biostatistics:
• Biostatistics plays a crucial role in various health-related fields, including:
• Public Health: It provides the quantitative foundation for public health
practice and research.
• Medicine: Biostatistics aids in clinical trials, drug development, and medical
research.
• Biology: It supports biological research by analyzing data from experiments
and surveys.
Biostatistics helps researchers:
• Understand the nature of variability in biological data.
• Derive general laws from small samples.
• Make informed decisions based on statistical analyses.
Experiments:
Purpose: Experiments are designed to investigate cause-and-effect relationships by
manipulating one or more variables.
Data Collection:
• Controlled Environment: Researchers conduct experiments in a controlled
environment, ensuring consistency.
• Treatment Groups: Participants are assigned to different treatment groups (e.g.,
experimental and control groups).
Observations: Data is collected through observations, measurements, and
recordings.
Variables: Researchers collect data on independent and dependent variables.
Examples:
• Clinical drug trials.
• Laboratory experiments.
• Agricultural field trials.
Surveys:
Purpose: Surveys gather information from a sample of individuals to understand
opinions, behaviours, or characteristics.
Data Collection:
Questionnaires: Researchers design questionnaires or interviews.
Sampling: A representative sample is selected from the population.
Responses: Participants provide answers to survey questions.
Quantitative and Qualitative Data: Surveys yield both quantitative (numeric) and
qualitative (descriptive) data.
Examples:
• Consumer satisfaction surveys.
• Political polls.
• Health behaviour surveys.
• In both experiments and surveys, rigorous data collection ensures reliable and meaningful results for scientific inquiry.
AYUSHI JAISWAL
MPT ORTHO
A011141923019
HEALTH
STATISTICS
DATA
AND ITS TYPES
Introduction to Health statistics and biostats
Quantitative data
• Quantitative data seems to be the easiest to explain. It answers key questions such as "how many," "how much," and "how often."
• Quantitative data can be expressed as a number or can be quantified. Simply
put, it can be measured by numerical variables.
• Quantitative data are easily amenable to statistical manipulation and can be represented by a wide variety of statistical graphs and charts, such as line charts, bar graphs, and scatter plots.
• Examples of quantitative data:
• Scores on tests and exams, e.g., 85, 67, 90.
• The weight of a person or a subject.
• Your shoe size.
• The temperature in a room.
STRENGTHS
• Precision and Accuracy: Quantitative data provides precise and accurate numerical
measurements.
• Objectivity: It minimizes subjectivity, being based on facts rather than personal opinions.
• Data Visualization: Easily represented through charts and graphs for effective
communication.
• Comparability: Enables straightforward comparisons between different groups or variables.
• Objective Decision-Making: Provides a clear basis for making informed decisions.
• Ease of Communication: Numerical data simplifies communication and reporting.
LIMITATIONS
• Lack of Context: Quantitative data may lack the context or depth that qualitative
data can provide, offering a limited understanding of the underlying reasons or
meanings.
• Difficulty in Capturing Complex Phenomena: Some complex phenomena, emotions,
or experiences may not be adequately captured or measured through numerical
values alone.
• Potential for Oversimplification: Quantitative data might oversimplify reality,
reducing multifaceted situations to numerical representations and potentially
overlooking important nuances.
• Inability to Address Unanticipated Factors: Quantitative methods may struggle to
account for unexpected variables or factors that were not initially considered in the
research design.
Qualitative data
• Qualitative data can't be expressed as a number and can't be measured. It consists of words, pictures, and symbols, not numbers.
• Qualitative data is also called categorical data because the information can be sorted
by category, not by number.
• Qualitative data can answer questions such as "how did this happen" or "why did this happen."
• Examples of qualitative data:
• Colors, e.g., the color of the sea.
• Your favorite holiday destination, such as Hawaii or New Zealand.
• Names, such as ….
• Ethnicity, such as American Indian, Asian, etc.
STRENGTHS
• Data based on the participants' own categories of meaning
• Useful for studying a limited number of cases in depth
• Can conduct cross-case comparisons and analysis
• Provides understanding and description of people's personal
experiences of phenomena
• Qualitative researchers are especially responsive to changes that occur during the conduct of a study and may shift the focus of their studies as a result.
LIMITATIONS
• Knowledge produced might not generalize to other people or other settings.
• It is difficult to make quantitative predictions.
• It is more difficult to test hypotheses and theories with large participant pools.
• It might have lower credibility with some administrators and commissioners of programs.
• It generally takes more time to collect the data when compared to quantitative research.
• Data analysis is often time-consuming.
• The results are more easily influenced by the researcher's personal biases.
NOMINAL DATA
• Nominal data is used just for labeling variables, without any type of quantitative value.
The name ‘nominal’ comes from the Latin word “nomen” which means ‘name’.
• Nominal data just names things without implying any order; in effect, nominal values are simply "labels."
• Nominal data cannot be quantified.
• It also cannot be assigned to any type of order; its categories have no meaningful order.
• Examples of Nominal Data:
• Gender (Women, Men)
• Hair color (Blonde, Brown, Brunette, Red, etc.)
• Marital status (Married, Single, Widowed)
• Ethnicity (Hispanic, Asian)
ORDINAL DATA
• Ordinal data shows where a number is in order. This is the crucial difference from
nominal types of data.
• Ordinal data is data which is placed into some kind of order by their position on a
scale. Ordinal data may indicate superiority.
• However, you cannot do arithmetic with ordinal numbers because they only show
sequence.
• Ordinal variables are considered as “in between” qualitative and quantitative
variables.
• In other words, the ordinal data is qualitative data for which the values are ordered.
• By comparison, nominal data is qualitative data whose values cannot be placed in an order.
• We can also assign numbers to ordinal data to show their relative position. But
we cannot do math with those numbers. For example: “first, second, third…etc.”
• Examples of Ordinal Data:
• The first, second and third person in a competition.
• Letter grades: A, B, C, and etc.
• When a company asks a customer to rate the sales experience on a scale of 1-10.
• Economic status: low, medium and high.
DISCRETE DATA
• Discrete data is a count that involves only integers. The discrete values cannot be subdivided
into parts.
• For example, the number of children in a class is discrete data. You can count whole
individuals. You can’t count 1.5 kids.
• To put in other words, discrete data can take only certain values. The data variables cannot be
divided into smaller parts.
• It has a limited number of possible values e.g. days of the month.
• Examples of discrete data:
• The number of students in a class.
• The number of workers in a company.
• The number of home runs in a baseball game.
• The number of test questions you answered correctly
CONTINUOUS DATA
• Continuous data is information that could be meaningfully divided into
finer levels. It can be measured on a scale or continuum and can have
almost any numeric value.
• For example, you can measure your height at very precise scales: meters, centimeters, millimeters, and so on.
• You can record continuous data in many different measurements: width, temperature, time, and so on. This is where the key difference from discrete types of data lies.
• The continuous variables can take any value between two numbers.
For example, between 50 and 72 inches, there are literally millions of
possible heights: 52.04762 inches, 69.948376 inches and etc.
• A good rule for deciding whether data is continuous or discrete: if the unit of measurement can be halved and still make sense, the data is continuous.
• Examples of continuous data:
• The amount of time required to complete a project.
• The height of children.
• The square footage of a two-bedroom house.
• The speed of cars.
HYPOTHESIS TESTING
ABDUL GAYAS
A011141923008
MPT SEM 2 (SPORTS)
Hypothesis
A hypothesis is considered an intelligent guess or prediction that gives direction to the researcher in answering the research question.
• Hypothesis or Hypotheses are defined as the formal statement of the tentative or
expected prediction or explanation of the relationship between two or more variables in a
specified population
A hypothesis is a formal tentative statement of the expected relationship between two or
more variables under study.
• A hypothesis helps to translate the research problem and objective into a clear
explanation or prediction of the expected results or outcomes of the study
Why is a hypothesis formulated?
• It provides clarity to the research problem and research objectives.
• It describes, explains or predicts the expected results or outcome of
the research.
• It indicates the type of research design.
• It directs the research study process. It identifies the population of
the research study that is to be investigated or examined. It facilitates
data collection, data analysis and data interpretation
Types of hypothesis
• SIMPLE HYPOTHESIS: one in which there exists a relationship between two variables, one the independent variable (cause) and the other the dependent variable (effect). Ex.: smoking leads to cancer; a higher rate of unemployment leads to crime.
• COMPLEX HYPOTHESIS: one in which a relationship exists among more than two variables, i.e., there are more than two dependent and independent variables. Ex.: smoking and other drugs lead to cancer, tension, chest infections, etc.; higher rates of unemployment, poverty, and illiteracy lead to crimes such as dacoity.
• EMPIRICAL HYPOTHESIS: one based on evidence. In the scientific method, the word "empirical" refers to a working hypothesis that can be tested using observation and experiment; empirical data are produced by experiment and observation.
• QUESTION FORM OF HYPOTHESIS: the simplest form of empirical hypothesis. In simple cases, an investigation can be adequately framed by posing a question. Ex.: How well do 9th-class students learn moral values?
• NULL HYPOTHESIS: states that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error. It is denoted by H0.
• ALTERNATE HYPOTHESIS: denoted by H1 or Ha, it is the hypothesis that the sample observations are influenced by some non-random cause.
• STATISTICAL HYPOTHESIS: a hypothesis that can be verified statistically. The statement may be logical or illogical, but if statistics can verify it, it is a statistical hypothesis.
Steps for testing hypothesis
Hypothesis testing refers to
1. Making an assumption, called hypothesis, about a population
parameter.
2. Collecting sample data.
3. Calculating a sample statistic.
4. Using the sample statistic to evaluate the hypothesis (how likely is it
that our hypothesized parameter is correct? To test the validity of our
assumption, we determine the difference between the hypothesized
parameter value and the sample value).
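The four steps can be sketched with a simple large-sample z-test. The figures below are made up for illustration (a hypothetical blood-pressure sample); for small samples with unknown variance a t-test would normally be preferred:

```python
import math
from statistics import NormalDist, mean, stdev

# Step 1: an assumption about a population parameter (hypothetical figure)
mu0 = 120   # H0: population mean systolic BP equals 120 mmHg

# Step 2: collect sample data (made-up readings for illustration)
sample = [118, 125, 130, 121, 119, 128, 124, 122, 127, 131,
          117, 126, 123, 129, 120, 125, 128, 122, 124, 126]

# Step 3: calculate a sample statistic
n = len(sample)
z = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))

# Step 4: evaluate the hypothesis: a two-sided p-value from the standard normal
p = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p, 4))   # reject H0 at the 5% level if p < 0.05
```

Here the sample mean exceeds 120 by several standard errors, so H0 would be rejected at the 5% level.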
Null hypothesis
• The null hypothesis H0 represents a theory that has been put forward
either because it is believed to be true or because it is used as the basis
for an argument, but has not been proven. For example, in a clinical trial
of a new drug, the null hypothesis might be that the new drug is no
better, on average, than the current drug. We would write H0: there is no
difference between the two drugs on average.
Alternate hypothesis
• The alternative hypothesis, HA, is a statement of what a statistical
hypothesis test is set up to establish.
• For example, in the clinical trial of a new drug, the alternative
hypothesis might be that the new drug has a different effect, on average,
compared to that of the current drug. We would write HA: the two drugs
have different effects, on average, or HA: the new drug is better than the
current drug, on average. The result of a hypothesis test is either
'Reject H0 in favour of HA' or 'Do not reject H0'.
Test of significance
The methods of inference used to support or reject claims based on
sample data is called the test of significance.
TYPES OF TEST OF SIGNIFICANCE:
Parametric Test
These tests concern parameters of the population and hence are conducted on
quantitative data, i.e., numeric data, which may be discrete or continuous.
E.g., Z-test, t-test, ANOVA.
Non-Parametric Test
These distribution-free tests are applied when the assumptions of parametric tests (such as normality) are not met. They are
conducted on qualitative data, i.e., nominal or ordinal data.
E.g., Wilcoxon test, McNemar test.
Selecting & interpreting significance level
• 1. Decide on a criterion for accepting or rejecting the null hypothesis.
• 2. The significance level refers to the percentage of sample means that
lies outside certain prescribed limits.
• E.g., testing a hypothesis at a 5% level of significance means that we
reject the null hypothesis if the test statistic falls in either of the two
tail regions, each of area 0.025, and do not reject the null hypothesis if
it falls within the central region of area 0.95.
• 3. The higher the level of significance, the higher the probability of
rejecting the null hypothesis when it is true (the acceptance region narrows).
Type I error
• Type I error refers to the situation when we reject the null hypothesis
when it is true (H0 is wrongly rejected).
• Eg H0: there is no difference between the two drugs on average.
• Type I error will occur if we conclude that the two drugs produce
different effects when actually there isn't a difference.
• Prob (Type I error) = significance level = α
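The relationship Prob(Type I error) = α can be checked by simulation under stated assumptions (a two-sided z-test with known σ on normal data; all figures here are hypothetical): generate many experiments in which H0 is actually true and count how often it is rejected. The rejection rate should hover near α = 0.05:

```python
import math
import random

random.seed(42)

ALPHA_Z = 1.96   # two-sided critical value for alpha = 0.05

def rejects_h0(n=30):
    """One simulated experiment in which H0 is actually true (mean 0, sd 1)."""
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(sample) / n) / (1 / math.sqrt(n))  # z-test with known sigma = 1
    return abs(z) > ALPHA_Z

trials = 5000
rate = sum(rejects_h0() for _ in range(trials)) / trials
print(rate)  # the Type I error rate stays close to alpha = 0.05
```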
Type II error
• Type II error refers to the situation when we accept the null
hypothesis when it is false.
• H0: there is no difference between the two drugs on average .
• A Type II error will occur if we conclude that the two drugs produce
the same effect when there is actually a difference.
• Prob(Type II error) = β
Graphical Representation
of Median, Mode,
Partition Values
By: Mirza Nauman Baig
Understanding Central Tendencies
• Central tendencies are statistical measures
that represent the center of a data set.
• The median is the middle value, the mode
is the most common value, and the mean is
the average value.
• These values can provide insights about
the distribution of the data.
Median:
The median is the value that separates a dataset into two equal halves
Mode:
The value that appears most frequently in a dataset
Partition Values:
A way to divide data for analysis. Quartiles divide a dataset into four equal parts.
DEFINITIONS
Graphical Determination of Median
• STEPS:
1. Arrange data in ascending order. Depending on the nature of the data, you can either draw a number line
or create a histogram with the data intervals.
2. For a number line, mark each data point at its corresponding value on the line.
3. For a histogram, plot the frequency of each interval. For each data point or interval in the histogram,
calculate the cumulative frequency.
4. Determine the total number of data points, denoted as 'n'. If 'n' is odd, the median will be the value at
position (n+1)/2. If 'n' is even, the median will be the average of the values at positions n/2 and (n/2)+1.
5. On the number line or histogram, find the point where the cumulative frequency equals or exceeds the
value calculated in step 4. This point corresponds to the median.
6. If the median falls exactly on a data point, that data point is the median. If it falls between two data
points, interpolate to find the exact value of the median.
Example Of Graphical Representation
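The cumulative-frequency procedure above can be checked numerically. A minimal sketch with a made-up dataset, comparing the value read off where the cumulative frequency reaches n/2 against the exact median:

```python
from itertools import accumulate
from statistics import median

# Hypothetical dataset, arranged in ascending order (step 1)
data = sorted([70, 85, 80, 75, 90, 80, 85, 70, 80, 85])
n = len(data)

values = sorted(set(data))
freq = [data.count(v) for v in values]
cum = list(accumulate(freq))   # cumulative frequency per value (step 3)

# Steps 4-5: the median lies where the cumulative frequency first reaches n/2
target = n / 2
graphical_median = next(v for v, c in zip(values, cum) if c >= target)
print(graphical_median, median(data))   # both give 80 for this dataset
```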
Graphical Determination of Mode
• STEPS:
1. Create a frequency distribution table that lists each unique value in the dataset
along with its frequency. Determine appropriate intervals for the data.
2. For a histogram, draw a set of bars representing each interval with the height of
each bar corresponding to the frequency of data points within that interval.
3. For a frequency polygon, plot points for each interval at the midpoint of the
interval on the x-axis and the frequency on the y-axis. Then, connect the points
with straight lines.
4. Look for the bar with the highest frequency in the histogram or the peak point
in the frequency polygon. This value represents the mode of the dataset.
Example Of Graphical Representation
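Steps 1 and 4 above (build the frequency table, then take the tallest bar) amount to the following sketch, using a hypothetical dataset:

```python
from collections import Counter

# Hypothetical dataset
data = [70, 85, 80, 75, 90, 80, 85, 70, 80, 85, 85]

counts = Counter(data)                             # step 1: frequency distribution table
mode_value, mode_freq = counts.most_common(1)[0]   # step 4: the tallest bar
print(mode_value, mode_freq)                       # → 85 4
```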
Graphical Determination of Partition Values
It helps divide the data into equal parts. Quartiles, deciles, and percentiles are some
of the most often used partition values.
For example, quartiles divide data into 4 equal parts.
STEPS:
1. Arrange the data in ascending order, then find the Q1, Q2, and Q3 values:
Quartile 1 (Q1) lies between the starting term and the middle term: the ((n+1)/4)th term.
Quartile 2 (Q2) lies between the starting term and the last term, i.e., it is the middle term: the ((n+1)/2)th term. The second quartile is also equal to the median.
Quartile 3 (Q3) lies between quartile 2 and the last term: the (3(n+1)/4)th term.
• Where n is the total count of numbers in the given data.
2. Locate these points on the graph.
Example Of Graphical Representation
CASE STUDY
• Lets take an example of a dataset to understand how to graphically
represent median, mode and partition values.
• Dataset: 80, 75, 90, 85, 70, 80, 85, 95, 85, 75, 80, 70, 85, 90, 85, 80, 75, 90,
85, 70
• STEP 1: Arrange the data and create a frequency table.
CASE STUDY
STEP 2: Plot the graph: we'll create a histogram to visualize the
distribution of the scores.
CASE STUDY
• STEP 3: Determine the median of the dataset, which is also the Q2
value. Here, the median is 82.5. Now, plot this value on the graph.
CASE STUDY
• STEP 4: Determine the mode of the dataset. Here, the mode is 85.
Now, plot this value on the graph.
CASE STUDY
• STEP 5: Determine the Q1 & Q3 values of the dataset. Here, Q1 is
75 and Q3 is 85. Now, plot these values on the graph.
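The case-study values can be double-checked with Python's `statistics` module; under the ((n+1)/4)th-term rule introduced earlier, Q1 is 75 and Q3 works out to 85 for this dataset:

```python
from statistics import median, mode, quantiles

scores = [80, 75, 90, 85, 70, 80, 85, 95, 85, 75, 80, 70,
          85, 90, 85, 80, 75, 90, 85, 70]

# quantiles() defaults to the "exclusive" method, which uses (n+1) positions
q1, q2, q3 = quantiles(scores, n=4)
print(median(scores), mode(scores), q1, q2, q3)  # → 82.5 85 75.0 82.5 85.0
```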
Quartile, Decile, Percentile
DIVYA CHADHA
A011141903042
Definition
• In statistics, Quartiles are key values that divide a dataset into four equal parts. When dealing with
extensive numerical data in statistical analysis, various concepts and formulas come into play, making
them highly applicable in research and surveys. One of the most practical applications of quartiles is
in creating box and whisker plots.
The three quartile values play a crucial role in dividing a dataset into distinct sections.
• The middle value (Q2) represents the central point of the data distribution, with data points close to
this central value around it. The lower section comprises roughly half of the dataset and encompasses
the values that fall below the median, while the upper section represents the remaining half that falls
above the median. In essence, quartiles provide valuable insights into the distribution and dispersion
of a dataset, making them a fundamental tool in statistical analysis.
• Quartiles: As previously described, quartiles divide the data into four equal parts, focusing on the
median (Q2) and the values below (Q1) and above (Q3) the median.
• Deciles: Deciles divide the data into ten equal parts. Decile values, such as D1, D2, D3, and so on,
help assess the distribution and spread of data with more granularity.
• Percentiles: Percentiles divide the data into one hundred equal parts. They are often used to compare
data points to a broader population and are valuable for assessing how specific data points rank
relative to the entire dataset
Quartiles
Definition
• Quartiles are values that divide an entire dataset into four equal parts, resulting in three quartiles:
Q1, Q2, and Q3. Q2 is equivalent to the median, as it indicates the position of an item in the ordered
list and serves as a positional average. To calculate quartiles for a dataset, it’s necessary to first
arrange the data in ascending order.
• While the median provides insight into the central tendency of the data, we can further assess the
distribution by considering the lower and upper quartiles. Beyond quartiles, statistics offers other
measures that divide data into specific equal parts, notably deciles and percentiles.
• These different measures, including quartiles, deciles, and percentiles, allow statisticians and data
analysts to gain a deeper understanding of data distributions and make more informed decisions in
various fields such as finance, healthcare, and the social sciences.
Quartiles in Statistics
• Quartiles divide a dataset into four equal parts, each representing a specific percentage of the data. This breakdown is
particularly valuable for understanding the distribution of data and identifying key points within that distribution. Here's a
summary of the quartiles and their associated percentages:
• First Quartile (Q1): The first quartile represents the 25% mark of the data when it is sorted from smallest to largest. This
means that 25% of the data points are less than or equal to Q1, while 75% of the data points are greater.
• Second Quartile (Q2 – Median): The second quartile corresponds to the median, which divides the data into two equal halves.
50% of the data points are less than or equal to Q2, and the remaining 50% are greater.
• Third Quartile (Q3): The third quartile represents the 75% mark of the data. This indicates that 75% of the data points are less
than or equal to Q3, while 25% are greater.
• Fourth Quartile: The fourth quartile encompasses the remaining 25% of the largest data points, meaning 25% of the data
falls within this quartile.
• Understanding these quartiles is crucial in statistics for summarizing data distributions and identifying specific data points that
correspond to given percentages within the dataset.
Quartiles
Formula
• Q3, the upper quartile, is the median of the upper half of the data set, while Q1, the lower quartile, is
the median of the lower half; Q2 is the overall median. If there are n items in a data set, the quartiles
are given by:
• Q1 = [(n+1)/4]th item
• Q2 = [(n+1)/2]th item
• Q3 = [3(n+1)/4]th item
• Hence, for grouped data, the rth quartile is computed within its quartile class by the standard grouped-data formula:
• Qr = l1 + ((rN/4 − c)/f) × (l2 − l1)
• Where Qr is the rth quartile,
• l1 is the lower limit of the quartile class,
• l2 is the upper limit of the quartile class,
• f is the frequency of the quartile class,
• c is the cumulative frequency of the class preceding the quartile class, and
• N is the total frequency.
Quartile
Deviation
• Quartile deviation is a measure of the spread or dispersion of data within a dataset, and it is calculated as half of the distance
between the third quartile (Q3) and the first quartile (Q1). The formula for quartile deviation is as follows:
• Quartile deviation = (Q3 − Q1)/2
• This measure provides a sense of the variability of data within the interquartile range, which contains the middle 50% of the data
points. Quartile deviation is a useful statistic for understanding the spread of data and identifying the middle range of values in a
dataset.
• Interquartile Range
• The interquartile range (IQR) is a valuable statistical measure that helps us understand the spread or
dispersion of data within a dataset, particularly focusing on the middle 50% of the data points. It is
defined as the difference between the third quartile (Q3) and the first quartile (Q1):
• IQR = Q3 – Q1
Quartiles
Formula
Examples
• Example 1: Quartiles Calculation
• Suppose you have the following dataset of exam scores (in ascending order):
• Dataset: 55, 60, 65, 70, 75, 80, 85, 90, 95, 100
• To find the quartiles (using the median-of-halves method):
• Find the Median (Q2 – Second Quartile):
• The dataset has an even number of values, so the median is the average of the two middle values.
• Q2 = (75 + 80) / 2 = 77.5
• Find the Lower Quartile (Q1 – First Quartile):
• To calculate Q1, take the median of the lower half of the data (the five values below the overall median).
• Q1 = 65
• Find the Upper Quartile (Q3 – Third Quartile):
• To calculate Q3, take the median of the upper half of the data (the five values above the overall median).
• Q3 = 90
• So, for this dataset, the quartiles are:
• First Quartile (Q1): 65
• Second Quartile (Median – Q2): 77.5
• Third Quartile (Q3): 90.
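Quartile conventions differ between textbooks and software (interpolated-position rules such as `statistics.quantiles` give slightly different Q1 and Q3 for small samples). A sketch of the median-of-halves rule, which also computes the interquartile range and quartile deviation defined earlier:

```python
from statistics import median

data = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
n = len(data)

# Median-of-halves rule: split the sorted data around the overall median
lower, upper = data[:n // 2], data[(n + 1) // 2:]

q1, q2, q3 = median(lower), median(data), median(upper)
iqr = q3 - q1                   # interquartile range
quartile_deviation = iqr / 2
print(q1, q2, q3, iqr, quartile_deviation)   # → 65 77.5 90 25 12.5
```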
ESTIMATION OF CONFIDENCE
INTERVAL & LIMIT
KK SHRAVAN
A011141923009
NORMAL DISTRIBUTION
• The commonest and the most useful continuous distribution.
• A symmetrical probability distribution where most results are located in the
middle and few are spread on both sides.
• Can entirely be described by its mean and standard deviation.
• Its tails extend infinitely in both directions.
• The wider the curve, the larger the standard deviation and the more variation
exists in the process.
PROPERTIES OF NORMAL DISTRIBUTION
• The mean, mode and median are all equal.
• The curve is symmetric at the center (i.e. around the mean, μ).
• Exactly half of the values are to the left of center and exactly half
the values are to the right.
• The total area under the curve is 1.
CHARACTERISTICS OF THE NORMAL DISTRIBUTION
• Symmetric, bell shaped
• All normal curves are symmetric about the mean μ.
• Continuous for all values of X between -∞ and ∞ so that each conceivable
interval of real numbers has a probability other than zero.
• -∞ ≤ X ≤ ∞
• Two parameters, µ and σ. Note that the normal distribution is actually a family
of distributions, since µ and σ determine the shape of the distribution.
• The notation N(µ, σ²) means normally distributed with mean µ and variance
σ². If we say X ∼ N(µ, σ²), we mean that X is distributed as N(µ, σ²).
• About 2/3 of all cases fall within one standard deviation of the mean, that is
• P(µ - σ ≤ X ≤ µ + σ) = .6826.
• About 95% of cases lie within 2 standard deviations of the mean, that is
• P(µ - 2σ ≤ X ≤ µ + 2σ) = .9544
Empirical Rule:
For any normally distributed data:
68% of the data fall within 1 standard deviation of the mean.
95% of the data fall within 2 standard deviations of the mean.
99.7% of the data fall within 3 standard deviations of the mean.
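The empirical rule can be demonstrated by drawing a large sample from a normal distribution and counting the fraction of observations within k standard deviations of the mean; a quick Monte Carlo sketch:

```python
import random

random.seed(1)
mu, sigma, n = 0.0, 1.0, 100_000
sample = [random.gauss(mu, sigma) for _ in range(n)]

def within(k):
    """Fraction of observations within k standard deviations of the mean."""
    return sum(abs(x - mu) <= k * sigma for x in sample) / n

print(within(1), within(2), within(3))  # close to 0.68, 0.95, 0.997
```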
USAGE OF NORMAL DISTRIBUTION
• It is used to illustrate the shape and variability of the data.
• It is used to estimate future process performance.
• Normality is an important assumption when conducting statistical analysis.
• Can be found practically everywhere:
• In nature.
• In engineering and industrial processes.
• In social and human sciences.
• Many everyday data sets follow approximately the normal distribution.
• It helps in calculating probabilities for normally distributed populations.
• The probabilities are represented by the area under the normal curve.
• The total area under the curve is equal to 100% (or 1.00).
• This represents the population of the observations.
• Since the normal curve is symmetrical, 50 percent of the data lie on each side
of the curve.
THE STANDARDIZED NORMAL
• Any normal distribution (with any mean and standard deviation
combination) can be transformed into the standardized normal distribution
(Z)
• To compute normal probabilities need to transform X units into Z units
• The standardized normal distribution (Z) has a mean of 0 and a standard
deviation of 1
TRANSLATION TO THE STANDARDIZED NORMAL DISTRIBUTION
• Translate from X to the standardized normal (the “Z” distribution) by subtracting the mean of X
and dividing by its standard deviation
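The translation is Z = (X − µ)/σ. A short sketch using Python's `statistics.NormalDist` (the population figures below are hypothetical):

```python
from statistics import NormalDist

mu, sigma = 100, 15   # hypothetical population mean and standard deviation
x = 130

z = (x - mu) / sigma              # translate an X value into Z units
p_below = NormalDist().cdf(z)     # standard normal: mean 0, sd 1
print(z, round(p_below, 4))       # → 2.0 0.9772
```

So an X of 130 sits 2 standard deviations above the mean, with about 97.7% of the population below it.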
Introduction to Health statistics and biostats

  • 7. Biostatistics helps researchers: • Understand the nature of variability in biological data. • Derive general laws from small samples. • Make informed decisions based on statistical analyses.
  • 8. Experiments: Purpose: Experiments are designed to investigate cause-and-effect relationships by manipulating one or more variables. Data Collection: • Controlled Environment: Researchers conduct experiments in a controlled environment, ensuring consistency. • Treatment Groups: Participants are assigned to different treatment groups (e.g., experimental and control groups). Observations: Data is collected through observations, measurements, and recordings. Variables: Researchers collect data on independent and dependent variables.
  • 9. Examples: • Clinical drug trials. • Laboratory experiments. • Agricultural field trials.
  • 10. Surveys: Purpose: Surveys gather information from a sample of individuals to understand opinions, behaviours, or characteristics. Data Collection: Questionnaires: Researchers design questionnaires or interviews. Sampling: A representative sample is selected from the population. Responses: Participants provide answers to survey questions. Quantitative and Qualitative Data: Surveys yield both quantitative (numeric) and qualitative (descriptive) data.
  • 11. Examples: • Consumer satisfaction surveys. • Political polls. • Health behaviour surveys. • In both experiments and surveys, rigorous data collection ensures reliable and meaningful results for scientific inquiry
  • 14. Quantitative data • Quantitative data is the easiest to explain. It answers key questions such as “how many”, “how much” and “how often”. • Quantitative data can be expressed as a number or can be quantified. Simply put, it can be measured by numerical variables. • Quantitative data is easily amenable to statistical manipulation and can be represented by a wide variety of statistical charts and graphs, such as line graphs, bar graphs, and scatter plots. • Examples of quantitative data: • Scores on tests and exams, e.g. 85, 67, 90. • The weight of a person or a subject. • Your shoe size. • The temperature in a room.
  • 15. STRENGTHS • Precision and Accuracy: Quantitative data provides precise and accurate numerical measurements. • Objectivity: It minimizes subjectivity, being based on facts rather than personal opinions. • Data Visualization: Easily represented through charts and graphs for effective communication. • Comparability: Enables straightforward comparisons between different groups or variables. • Objective Decision-Making: Provides a clear basis for making informed decisions. • Ease of Communication: Numerical data simplifies communication and reporting.
  • 16. LIMITATIONS • Lack of Context: Quantitative data may lack the context or depth that qualitative data can provide, offering a limited understanding of the underlying reasons or meanings. • Difficulty in Capturing Complex Phenomena: Some complex phenomena, emotions, or experiences may not be adequately captured or measured through numerical values alone. • Potential for Oversimplification: Quantitative data might oversimplify reality, reducing multifaceted situations to numerical representations and potentially overlooking important nuances. • Inability to Address Unanticipated Factors: Quantitative methods may struggle to account for unexpected variables or factors that were not initially considered in the research design.
  • 17. Qualitative data • Qualitative data can’t be expressed as a number and can’t be measured. It consists of words, pictures, and symbols, not numbers. • Qualitative data is also called categorical data because the information can be sorted by category, not by number. • Qualitative data can answer questions such as “how did this happen” or “why did this happen”. • Examples of qualitative data: • Colors, e.g. the color of the sea. • Your favorite holiday destination, such as Hawaii or New Zealand. • Names. • Ethnicity, such as American Indian, Asian, etc.
  • 18. STRENGTHS • Data based on the participants’ own categories of meaning. • Useful for studying a limited number of cases in depth. • Can conduct cross-case comparisons and analysis. • Provides understanding and description of people’s personal experiences of phenomena. • Qualitative researchers are especially responsive to changes that occur during the conduct of a study and may shift the focus of their studies as a result.
  • 19. LIMITATIONS • Knowledge produced might not generalize to other people or other settings. • It is difficult to make quantitative predictions. • It is more difficult to test hypotheses and theories with large participant pools. • It might have lower credibility with some administrators and commissioners of programs. • It generally takes more time to collect the data when compared to quantitative research. • Data analysis is often time consuming. • The results are more easily influenced by the researcher’s personal biases.
  • 20. NOMINAL DATA • Nominal data is used just for labeling variables, without any type of quantitative value. The name ‘nominal’ comes from the Latin word “nomen” which means ‘name’. • The nominal data just name a thing without applying it to order. Actually, the nominal data could just be called “labels.” • Nominal data cannot be quantified. • It also cannot be assigned to any type of order. • Those categories have no meaningful order. • Examples of Nominal Data: • Gender (Women, Men) • Hair color (Blonde, Brown, Brunette, Red, etc.) • Marital status (Married, Single, Widowed) • Ethnicity (Hispanic, Asian)
  • 21. ORDINAL DATA • Ordinal data shows where a value stands in an order. This is the crucial difference from nominal types of data. • Ordinal data is data which is placed into some kind of order by its position on a scale. Ordinal data may indicate superiority. • However, you cannot do arithmetic with ordinal numbers, because they only show sequence. • Ordinal variables are considered “in between” qualitative and quantitative variables. • In other words, ordinal data is qualitative data for which the values are ordered. • By comparison, nominal data is qualitative data for which the values cannot be placed in an order.
  • 22. • We can also assign numbers to ordinal data to show their relative position, but we cannot do math with those numbers. For example: “first, second, third…” • Examples of Ordinal Data: • The first, second and third person in a competition. • Letter grades: A, B, C, etc. • When a company asks a customer to rate the sales experience on a scale of 1-10. • Economic status: low, medium and high.
  • 23. DISCRETE DATA • Discrete data is a count that involves only integers. The discrete values cannot be subdivided into parts. • For example, the number of children in a class is discrete data. You can count whole individuals. You can’t count 1.5 kids. • To put in other words, discrete data can take only certain values. The data variables cannot be divided into smaller parts. • It has a limited number of possible values e.g. days of the month. • Examples of discrete data: • The number of students in a class. • The number of workers in a company. • The number of home runs in a baseball game. • The number of test questions you answered correctly
  • 24. CONTINUOUS DATA • Continuous data is information that can be meaningfully divided into finer levels. It can be measured on a scale or continuum and can have almost any numeric value. • For example, you can measure your height at very precise scales — meters, centimeters, millimeters, etc. • You can record continuous data at many different measurements — width, temperature, time, etc. This is where the key difference from discrete types of data lies. • Continuous variables can take any value between two numbers. For example, between 50 and 72 inches, there are literally millions of possible heights: 52.04762 inches, 69.948376 inches, etc.
  • 25. • A good rule of thumb for deciding whether data is continuous or discrete is that if the point of measurement can be reduced in half and still make sense, the data is continuous. • Examples of continuous data: • The amount of time required to complete a project. • The height of children. • The square footage of a two-bedroom house. • The speed of cars.
  • 27. Hypothesis A hypothesis is an intelligent guess or prediction that gives direction to the researcher for answering the research question. • A hypothesis (plural: hypotheses) is defined as a formal, tentative statement of the expected prediction or explanation of the relationship between two or more variables in a specified population. • A hypothesis helps to translate the research problem and objectives into a clear explanation or prediction of the expected results or outcomes of the study.
  • 28. Why is a hypothesis formulated? • It provides clarity to the research problem and research objectives. • It describes, explains or predicts the expected results or outcome of the research. • It indicates the type of research design. • It directs the research study process. • It identifies the population of the research study that is to be investigated or examined. • It facilitates data collection, data analysis and data interpretation.
  • 29. Types of hypothesis • SIMPLE HYPOTHESIS is one in which there exists a relationship between two variables: one is called the independent variable, or cause, and the other the dependent variable, or effect. • Ex. Smoking leads to cancer. • A higher rate of unemployment leads to crime. • COMPLEX HYPOTHESIS is one in which a relationship exists among more than two variables, i.e. the dependent and independent variables together number more than two. • Ex. Smoking and other drugs lead to cancer, tension, chest infections, etc. • Higher rates of unemployment, poverty and illiteracy lead to crimes like dacoity, etc. • EMPIRICAL HYPOTHESIS means it is based on evidence. • In the scientific method, the word “empirical” refers to the use of a working hypothesis that can be tested using observation and experiment. • Empirical data is produced by experiment and observation. • QUESTION FORM OF HYPOTHESIS is the simplest form of empirical hypothesis. • In simple cases, investigation and research are adequately implemented by posing a question. • Ex. How is the ability of 9th class students in learning moral values? • NULL HYPOTHESIS states that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error. • It is denoted by H0. • ALTERNATE HYPOTHESIS, denoted by H1 or Ha, is the hypothesis that sample observations are influenced by some non-random cause. • STATISTICAL HYPOTHESIS is a hypothesis which can be verified statistically. • The statement may be logical or illogical, but if statistics can verify it, it is a statistical hypothesis.
  • 30. Steps for testing hypothesis Hypothesis testing refers to 1. Making an assumption, called hypothesis, about a population parameter. 2. Collecting sample data. 3. Calculating a sample statistic. 4. Using the sample statistic to evaluate the hypothesis (how likely is it that our hypothesized parameter is correct. To test the validity of our assumption we determine the difference between the hypothesized parameter value and the sample value.)
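The four steps above can be sketched as a one-sample z-test. This is a minimal illustration, assuming a known population SD; the hypothesized mean, SD, and sample values are all invented for the example:

```python
import math
import statistics

# Step 1: assumption about a population parameter (the null hypothesis).
mu_0 = 100      # H0: the population mean is 100
sigma = 15      # population SD, assumed known (a z-test, not a t-test)

# Step 2: collect sample data (a hypothetical sample).
sample = [112, 108, 96, 104, 110, 103, 99, 107, 111, 105]

# Step 3: calculate a sample statistic.
n = len(sample)
x_bar = statistics.mean(sample)
z = (x_bar - mu_0) / (sigma / math.sqrt(n))

# Step 4: evaluate the hypothesis -- two-sided p-value from the normal CDF.
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
reject = p_value < 0.05     # testing at the 5% significance level

print(round(z, 3), round(p_value, 3), reject)
```

Here the sample mean of 105.5 differs from the hypothesized 100, but the difference is not large relative to the standard error, so H0 is not rejected.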
  • 31. Null hypothesis • The null hypothesis H0 represents a theory that has been put forward either because it is believed to be true or because it is used as a basis for an argument, but has not been proven. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug. We would write H0: there is no difference between the two drugs on average.
  • 32. Alternate hypothesis • The alternative hypothesis, HA, is a statement of what a statistical hypothesis test is set up to establish. • For example, in the clinical trial of a new drug, the alternative hypothesis might be that the new drug has a different effect, on average, compared to that of the current drug. We would write HA: the two drugs have different effects, on average, or HA: the new drug is better than the current drug, on average. The result of a hypothesis test: ‘Reject H0 in favour of HA’ OR ‘Do not reject H0’.
  • 33. Test of significance The methods of inference used to support or reject claims based on sample data are called tests of significance. TYPES OF TESTS OF SIGNIFICANCE: Parametric Tests These tests concern parameters of a population and hence are conducted on quantitative data, i.e. numeric data which may be discrete or continuous. E.g. Z-test, t-test, ANOVA. Non-Parametric Tests These distribution-free tests are applied when the above-mentioned assumptions are not met. They are conducted on qualitative data, i.e. nominal or ordinal data. E.g. Wilcoxon test, McNemar test.
  • 34. Selecting & interpreting significance level • 1. Deciding on a criterion for accepting or rejecting the null hypothesis. • 2. Significance level refers to the percentage of sample means that is outside certain prescribed limits. E.g. testing a hypothesis at the 5% level of significance means that we reject the null hypothesis if it falls in either of the two tail regions of area 0.025, and do not reject it if it falls within the region of area 0.95. • 3. The higher the level of significance, the higher the probability of rejecting the null hypothesis when it is true (the acceptance region narrows).
  • 35. Type I error • Type I error refers to the situation when we reject the null hypothesis when it is true (H0 is wrongly rejected). • E.g. H0: there is no difference between the two drugs on average. • A Type I error will occur if we conclude that the two drugs produce different effects when actually there isn’t a difference. • Prob(Type I error) = significance level = α
  • 36. Type II error • Type II error refers to the situation when we accept the null hypothesis when it is false. • H0: there is no difference between the two drugs on average. • A Type II error will occur if we conclude that the two drugs produce the same effect when there is a difference. • Prob(Type II error) = β
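Both error probabilities can be made concrete with a short calculation: α is fixed by the chosen critical value, and β then follows from the true effect size. A sketch for a one-sided z-test; the true mean, σ, and n below are invented purely for illustration:

```python
import math

def normal_cdf(z):
    """P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# One-sided test of H0: mu = 0 against H1: mu > 0 at alpha = 0.05.
z_crit = 1.645                    # P(Z > 1.645) is about 0.05 under H0

# Suppose the truth is mu = 1.0, with sigma = 5 and n = 25, so SE = 1.0.
true_mu = 1.0
se = 5 / math.sqrt(25)

# Type II error: the test statistic fails to clear z_crit although H1 holds.
beta = normal_cdf(z_crit - true_mu / se)
power = 1 - beta
print(round(beta, 3), round(power, 3))
```

With this small effect size β is large (about 0.74): accepting H0 when it is false is quite likely, which is why power analysis matters when planning a study.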
  • 37. Graphical Representation of Median, Mode, Partition Values By: Mirza Nauman Baig
  • 38. Understanding Central Tendencies • Central tendencies are statistical measures that represent the center of a data set. • The median is the middle value, the mode is the most common value, and the mean is the average value. • These values can provide insights about the distribution of the data.
  • 39. Median: The median is the value that separates a dataset into two equal halves. Mode: The value that appears most frequently in a dataset. Partition Values: A way to divide data for analysis; quartiles divide a dataset into four equal parts. DEFINITIONS
  • 40. Graphical Determination of Median • STEPS: 1. Arrange data in ascending order. Depending on the nature of the data, you can either draw a number line or create a histogram with the data intervals. 2. For a number line, mark each data point at its corresponding value on the line. 3. For a histogram, plot the frequency of each interval. For each data point or interval in the histogram, calculate the cumulative frequency. 4. Determine the total number of data points, denoted as 'n'. If 'n' is odd, the median will be the value at position (n+1)/2. If 'n' is even, the median will be the average of the values at positions n/2 and n/2 + 1. 5. On the number line or histogram, find the point where the cumulative frequency equals or exceeds the position calculated in step 4. This point corresponds to the median. 6. If the median falls exactly on a data point, that data point is the median. If it falls between two data points, interpolate to find the exact value of the median.
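The steps above can be sketched in code with hypothetical scores: the cumulative-frequency table corresponds to steps 1-3, and the positional rule to steps 4-5:

```python
from collections import Counter

scores = [70, 75, 75, 80, 80, 80, 85, 85, 90, 95]   # hypothetical scores

# Steps 1-3: sort the data, tabulate frequencies, accumulate them.
freq = Counter(scores)
cumulative, running = {}, 0
for value in sorted(freq):
    running += freq[value]
    cumulative[value] = running

# Step 4: n is even, so the median averages positions n/2 and n/2 + 1.
s = sorted(scores)
n = len(s)
median = (s[n // 2 - 1] + s[n // 2]) / 2 if n % 2 == 0 else s[n // 2]

# Step 5: the median lies where the cumulative frequency first
# reaches the middle positions (here at the value 80).
print(cumulative)   # {70: 1, 75: 3, 80: 6, 85: 8, 90: 9, 95: 10}
print(median)       # 80.0
```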
  • 41. Example Of Graphical Representation
  • 42. Graphical Determination of Mode • STEPS: 1. Create a frequency distribution table that lists each unique value in the dataset along with its frequency. Determine appropriate intervals for the data. 2. For a histogram, draw a set of bars representing each interval with the height of each bar corresponding to the frequency of data points within that interval. 3. For a frequency polygon, plot points for each interval at the midpoint of the interval on the x-axis and the frequency on the y-axis. Then, connect the points with straight lines. 4. Look for the bar with the highest frequency in the histogram or the peak point in the frequency polygon. This value represents the mode of the dataset.
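A minimal sketch of these steps with hypothetical scores: the frequency table is step 1, and the mode is the value at the histogram's tallest bar (step 4):

```python
from collections import Counter

scores = [70, 75, 75, 80, 80, 80, 85, 85, 90]   # hypothetical scores

# Step 1: frequency distribution table.
freq = Counter(scores)

# Step 4: the mode is the value with the highest frequency,
# i.e. the tallest histogram bar / the peak of the frequency polygon.
mode, peak = max(freq.items(), key=lambda kv: kv[1])
print(mode, peak)   # 80 3
```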
  • 43. Example Of Graphical Representation
  • 44. Graphical Determination of Partition Values Partition values divide the data into equal parts. Quartiles, deciles, and percentiles are some of the most often used partition values. For example, quartiles divide data into 4 equal parts. STEPS: 1. Arrange the data in ascending order. Find the Q1, Q2 and Q3 values. Quartile 1 (Q1) lies between the starting term and the middle term: the ((n + 1)/4)th term. Quartile 2 (Q2) lies between the starting term and the last term, i.e. it is the middle term; the second quartile is also equal to the median: the ((n + 1)/2)th term. Quartile 3 (Q3) lies between quartile 2 and the last term: the (3(n + 1)/4)th term. • Where n is the total count of numbers in the given data. 2. Locate these points on the graph.
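The (n + 1)/4 positional rule above can be sketched as a small function; the data are hypothetical, and fractional positions are resolved by linear interpolation between neighbouring values:

```python
def quartile(sorted_data, r):
    """r-th quartile by the (n + 1)/4 positional rule, interpolating
    linearly when the position falls between two data points."""
    n = len(sorted_data)
    pos = r * (n + 1) / 4            # 1-based position of Q_r
    lo = int(pos)                    # whole part of the position
    frac = pos - lo                  # fractional part
    if lo >= n:                      # position beyond the last item
        return sorted_data[-1]
    below, above = sorted_data[lo - 1], sorted_data[lo]
    return below + frac * (above - below)

data = sorted([3, 7, 8, 5, 12, 14, 21, 13, 18])   # hypothetical, n = 9
q1, q2, q3 = (quartile(data, r) for r in (1, 2, 3))
print(q1, q2, q3)   # 6.0 12.0 16.0
```

With n = 9 the positions are 2.5, 5 and 7.5, so Q1 and Q3 fall halfway between two observed values while Q2 is the 5th value exactly.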
  • 45. Example Of Graphical Representation
  • 46. CASE STUDY • Let's take an example of a dataset to understand how to graphically represent median, mode and partition values. • Dataset: 80, 75, 90, 85, 70, 80, 85, 95, 85, 75, 80, 70, 85, 90, 85, 80, 75, 90, 85, 70 • STEP 1: Arrange the data and create a frequency table.
  • 47. CASE STUDY STEP 2: Plot the graph: we'll create a histogram to visualize the distribution of the scores.
  • 48. CASE STUDY • STEP 3: Determine the median in the dataset, which is also the Q2 value. Here, the median is 82.5. Now, plot this value on the graph.
  • 49. CASE STUDY • STEP 4: Determine the mode in the dataset. Here, the mode is 85. Now, plot this value on the graph.
  • 50. CASE STUDY • STEP 5: Determine the Q1 & Q3 values in the dataset. Here, Q1 is 75 and Q3 is 85 (the 15th and 16th of the 20 ordered values are both 85). Now, plot these values on the graph.
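The case-study values can be cross-checked with Python's statistics module. Note that statistics.quantiles uses the (n + 1) positional convention by default; under it, Q3 for this dataset works out to 85, since the 15th and 16th ordered values are both 85:

```python
import statistics

scores = [80, 75, 90, 85, 70, 80, 85, 95, 85, 75,
          80, 70, 85, 90, 85, 80, 75, 90, 85, 70]

median = statistics.median(scores)              # the Q2 value
mode = statistics.mode(scores)                  # most frequent score
q1, q2, q3 = statistics.quantiles(scores, n=4)  # default "exclusive" method
print(median, mode, q1, q2, q3)                 # 82.5 85 75.0 82.5 85.0
```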
  • 52. Definition • In statistics, quartiles are key values that divide a dataset into four equal parts. When dealing with extensive numerical data in statistical analysis, various concepts and formulas come into play, making quartiles highly applicable in research and surveys. One of their most practical applications is in creating box and whisker plots. The three quartile points split a dataset into distinct sections. • The middle quartile (the median, Q2) represents the central point of the data distribution and includes data points close to this central value. The lower section comprises roughly half of the dataset and encompasses the values that fall below the median, while the upper section represents the remaining half that falls above the median. In essence, quartiles provide valuable insights into the distribution and dispersion of a dataset, making them a fundamental tool in statistical analysis. • Quartiles: As previously described, quartiles divide the data into four equal parts, focusing on the median (Q2) and the values below (Q1) and above (Q3) the median. • Deciles: Deciles divide the data into ten equal parts. Decile values, such as D1, D2, D3, and so on, help assess the distribution and spread of data with more granularity. • Percentiles: Percentiles divide the data into one hundred equal parts. They are often used to compare data points to a broader population and are valuable for assessing how specific data points rank relative to the entire dataset.
  • 53. Quartiles Definition • Quartiles are values that divide an entire dataset into four equal parts, resulting in three quartile points: Q1, Q2, and Q3. Q2 is equivalent to the median, as it indicates the position of an item in the ordered list and serves as a positional average. To calculate quartiles for a dataset, it’s necessary to first arrange the data in ascending order. • While the median provides insight into the central tendency of the data, we can further assess the distribution by considering the lower and upper quartiles. Beyond quartiles, statistics offers other measures that divide data into specific equal parts, such as deciles and percentiles. • These different measures allow statisticians and data analysts to gain a deeper understanding of data distributions and make more informed decisions in various fields such as finance, healthcare, and social sciences.
  • 54. Quartiles in Statistics • Quartiles divide a dataset into four equal parts, each representing a specific percentage of the data. This breakdown is particularly valuable for understanding the distribution of data and identifying key points within that distribution. Here’s a summary of the quartiles and their associated percentages: • First Quartile (Q1): The first quartile represents the 25% mark of the data when it is sorted from smallest to largest. This means that 25% of the data points are less than or equal to Q1, while 75% of the data points are greater. • Second Quartile (Q2 – Median): The second quartile corresponds to the median, which divides the data into two equal halves. 50% of the data points are less than or equal to Q2, and the remaining 50% are greater. • Third Quartile (Q3): The third quartile represents the 75% mark of the data. This indicates that 75% of the data points are less than or equal to Q3, while 25% are greater. • Fourth Quartile: The fourth quartile encompasses the remaining 25% of the largest data points, i.e. those above Q3. • Understanding these quartiles is crucial in statistics for summarizing data distributions and identifying specific data points that correspond to given percentages within the dataset.
  • 55. Quartiles Formula • Suppose Q3, the upper quartile, is the median of the upper half of the data set, Q1, the lower quartile, is the median of the lower half, and Q2 is the median. Consider that we have n items in a data set. Then the quartiles are given by: • Q1 = [(n+1)/4]th item • Q2 = [(n+1)/2]th item • Q3 = [3(n+1)/4]th item • For grouped (class-interval) data, the rth quartile can be given by: Qr = l1 + ((rN/4 − c)/f) × (l2 − l1) • Where Qr is the rth quartile • l1 is the lower limit of the quartile class • l2 is the upper limit • f is the frequency of the quartile class • c is the cumulative frequency of the class preceding the quartile class • N is the total frequency.
  • 56. Quartile Deviation • Quartile deviation is a measure of the spread or dispersion of data within a dataset, and it is calculated as half of the distance between the third quartile (Q3) and the first quartile (Q1). The formula for quartile deviation is: • Quartile deviation = (Q3 − Q1)/2 • This measure provides a sense of the variability of data within the interquartile range, which contains the middle 50% of the data points. Quartile deviation is a useful statistic for understanding the spread of data and identifying the middle range of values in a dataset. • Interquartile Range • The interquartile range (IQR) is a valuable statistical measure that helps us understand the spread or dispersion of data within a dataset, particularly focusing on the middle 50% of the data points. It is defined as the difference between the third quartile (Q3) and the first quartile (Q1): • IQR = Q3 – Q1
  • 57. Quartiles Formula Examples • Example 1: Quartiles Calculation • Suppose you have the following dataset of exam scores (in ascending order): • Dataset: 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 • To find the quartiles: • Find the Median (Q2 – Second Quartile): • The dataset has an even number of values, so the median is the average of the two middle values. • Q2 = (75 + 80) / 2 = 77.5 • Find the Lower Quartile (Q1 – First Quartile): • To calculate Q1, focus on the lower half of the data (values below the median).
  • 58. • The lower half is 55, 60, 65, 70, 75, so Q1 is its middle value: • Q1 = 65 • Find the Upper Quartile (Q3 – Third Quartile): • To calculate Q3, focus on the upper half of the data (values above the median). • The upper half is 80, 85, 90, 95, 100, so Q3 is its middle value: • Q3 = 90 • So, for this dataset, the quartiles are: • First Quartile (Q1): 65 • Second Quartile (Median – Q2): 77.5 • Third Quartile (Q3): 90.
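The example can be verified in code under the median-of-halves convention (other quartile conventions, such as the (n + 1)/4 positional rule, give slightly different Q1 and Q3 for the same data):

```python
import statistics

data = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

# Median-of-halves convention: split the ordered data at the median,
# then take the median of each half (n = 10, so two halves of 5 values).
n = len(data)
lower, upper = data[: n // 2], data[n // 2:]
q1 = statistics.median(lower)   # middle of 55, 60, 65, 70, 75
q2 = statistics.median(data)    # (75 + 80) / 2
q3 = statistics.median(upper)   # middle of 80, 85, 90, 95, 100
print(q1, q2, q3)               # 65 77.5 90
```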
  • 69. ESTIMATION OF CONFIDENCE INTERVAL & LIMIT KK SHRAVAN A011141923009
  • 91. NORMAL DISTRIBUTION • The commonest and the most useful continuous distribution. • A symmetrical probability distribution where most results are located in the middle and few are spread on both sides. • Can entirely be described by its mean and standard deviation. • Its tails extending infinitely in both directions. • The wider the curve, the larger the standard deviation and the more variation exists in the process.
• 92. PROPERTIES OF NORMAL DISTRIBUTION • The mean, mode and median are all equal. • The curve is symmetric about the center (i.e. around the mean, μ). • Exactly half of the values lie to the left of center and exactly half to the right. • The total area under the curve is 1.
• 94. CHARACTERISTICS OF THE NORMAL DISTRIBUTION • Symmetric, bell shaped. • All normal curves are symmetric about the mean μ. • Continuous for all values of X between -∞ and ∞, so that each conceivable interval of real numbers has a probability other than zero: • -∞ < X < ∞ • Two parameters, µ and σ. Note that the normal distribution is actually a family of distributions, since µ and σ determine the shape of the distribution. • The notation N(µ, σ²) means normally distributed with mean µ and variance σ². If we say X ∼ N(µ, σ²), we mean that X is distributed N(µ, σ²). • About 2/3 of all cases fall within one standard deviation of the mean, that is • P(µ - σ ≤ X ≤ µ + σ) = 0.6826 • About 95% of cases lie within 2 standard deviations of the mean, that is • P(µ - 2σ ≤ X ≤ µ + 2σ) = 0.9544
• 95. Empirical Rule: For any normally distributed data: 68% of the data fall within 1 standard deviation of the mean. 95% of the data fall within 2 standard deviations of the mean. 99.7% of the data fall within 3 standard deviations of the mean.
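The empirical-rule percentages can be verified numerically. For any normal distribution, P(|X − µ| ≤ kσ) = erf(k/√2), and Python's standard library exposes the error function as `math.erf` (a sketch using only the standard library):

```python
import math

def within_k_sigma(k):
    """P(|X - mu| <= k*sigma) for any normal distribution,
    computed via the error function: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sigma(k), 4))
# → 1 0.6827
#   2 0.9545
#   3 0.9973
```

The results match the 68% / 95% / 99.7% figures of the empirical rule (and, to two decimals, the 0.6826 and 0.9544 probabilities quoted on the previous slide).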
• 96. USAGE OF NORMAL DISTRIBUTION • It is used to illustrate the shape and variability of the data. • It is used to estimate future process performance. • Normality is an important assumption when conducting statistical analysis. • Can be found practically everywhere: • In nature. • In engineering and industrial processes. • In social and human sciences. • Many everyday data sets follow approximately the normal distribution. • Helps in calculating probabilities for normally distributed populations. • The probabilities are represented by the area under the normal curve. • The total area under the curve is equal to 100% (or 1.00). • This represents the population of the observations. • Since the normal curve is symmetrical, 50 percent of the data lie on each side of the curve.
• 97. THE STANDARDIZED NORMAL • Any normal distribution (with any mean and standard deviation combination) can be transformed into the standardized normal distribution (Z). • To compute normal probabilities, we need to transform X units into Z units. • The standardized normal distribution (Z) has a mean of 0 and a standard deviation of 1.
• 98. TRANSLATION TO THE STANDARDIZED NORMAL DISTRIBUTION • Translate from X to the standardized normal (the “Z” distribution) by subtracting the mean of X and dividing by its standard deviation: • Z = (X − µ) / σ
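The translation can be sketched in a few lines of Python. The exam-score numbers below (mean 70, standard deviation 10, score 85) are hypothetical, chosen only to illustrate the transformation:

```python
import math

def z_score(x, mu, sigma):
    """Translate an X value to the standardized normal (Z) scale:
    Z = (X - mu) / sigma."""
    return (x - mu) / sigma

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2): standardize first, then
    evaluate the standard normal CDF via the error function."""
    z = z_score(x, mu, sigma)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical example: exam scores with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))               # → 1.5
print(round(normal_cdf(85, 70, 10), 4))  # → 0.9332
```

That is, a score of 85 lies 1.5 standard deviations above the mean, and about 93.3% of scores fall below it.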