Descriptive Statistics

Approach of describing numerical data
• Central Tendency: Arithmetic Mean, Median, Mode
• Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Central Tendency
• A numerical central value of a set of observations is called a measure of central tendency.
• It is a central or typical value for a probability distribution.
• It may also be called a center or location of the distribution.
• Measures of central tendency: Mean, Median, Mode
Measures of Central Tendency
• Mean: arithmetic average, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• Median: midpoint of ranked values
• Mode: most frequently observed value
Mean
The mean is a single, typical value used to represent a set of data. It is also
referred to as the average.
Objective:
• To get a single value that represents the entire data
• To facilitate the comparison between groups of data of similar nature
Classification of mean
• Arithmetic Mean (AM)
• Geometric Mean (GM)
• Harmonic Mean (HM)
Arithmetic Mean
The arithmetic mean (mean) is the most common measure of central tendency
• For a population of N values:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$
  where the x_i are the population values and N is the population size
• For a sample of size n:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
  where the x_i are the observed values and n is the sample size
Arithmetic Mean
• The most common measure of central tendency
• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)
Example (values plotted on a 0 to 10 scale):
• 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5) / 5 = 15 / 5 = 3
• 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10) / 5 = 20 / 5 = 4
Properties of mean
• It takes every observation into account, so it reflects all the values
• It is used in other statistical tools
• It is the most reliable for drawing inferences
• It is the easiest to use in advanced statistical techniques
Limitations of mean
• Highly affected by extreme values, even just one extreme value
• Sometimes negative and zero values cannot be counted
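The sensitivity of the mean to a single extreme value can be sketched in a few lines of Python. This is a minimal illustration using the two five-value sets from the Arithmetic Mean slide; the function name `arithmetic_mean` is chosen here for illustration and is not from the slides.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]   # one extreme value replaces the 5

print(arithmetic_mean(without_outlier))  # 3.0
print(arithmetic_mean(with_outlier))     # 4.0
```

Changing one value out of five moves the mean from 3 to 4, which is the effect the limitation above describes.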
Median
In an ordered list, the median is the “middle” number (50% above, 50%
below)
• It is not affected by extreme values
Figure: two dot plots on a 0 to 10 scale, each with Median = 3.
Median
• It is the middle value of a set of numbers which have been ordered by
magnitude
• The median is also the number that is halfway into the set.
• The location of the median:
• If the number of values is odd, the median is the middle number
• If the number of values is even, the median is the average of the two middle
numbers
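The odd/even rule above can be written out directly. The following is a small sketch, not a library routine; the name `median_value` and the sample lists are assumptions made for illustration.

```python
def median_value(values):
    """Middle value of the ranked data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle values


print(median_value([1, 2, 3, 4, 5]))   # 3
print(median_value([1, 2, 3, 4, 10]))  # 3
print(median_value([4, 1, 3, 2]))      # 2.5
```

The first two calls differ only in their largest value and return the same median, in contrast to the mean.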
Median
For a grouped frequency distribution:
$$\text{Median} = L + \frac{\frac{N}{2} - F}{f_m} \times C$$
where
L = lower limit of the median class
N = total number of observations
F = cumulative frequency of the class preceding the median class
fm = frequency of the median class
C = class interval (width) of the median class
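SBP Range (mmHg)   Frequency   Cumulative Frequency
101-105            2           2
106-110            3           5
111-115            5           10
116-120            8           18
121-125            6           24
126-130            4           28
131-135            2           30
136-140            1           31

N = 31, so N/2 = 15.5. The cumulative frequency first reaches 15.5 in the class 116-120, which is therefore the median class.
• Correct: L = 116, N = 31, F = 10, fm = 8, C = 5, so Median = 116 + ((15.5 - 10) / 8) × 5 = 119.4375 ≈ 119.44
• Incorrect: taking 121-125 as the median class (L = 121, N = 31, F = 18, fm = 6, C = 5) gives 118.92, which lies outside that class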
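The grouped-median calculation for the SBP table can be sketched as follows. The function name `grouped_median` and its argument layout are assumptions for illustration; like the slide, it uses the stated lower class limits (116, not the class boundary 115.5) and a class interval of 5.

```python
def grouped_median(lower_limits, frequencies, width):
    """Median = L + ((N/2 - F) / fm) * C for an ordered frequency table."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("frequency table has no observations")
    half = n / 2
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, freq in zip(lower_limits, frequencies):
        if cumulative + freq >= half:
            # First class whose cumulative frequency reaches N/2: the median class
            return lower + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("median class not found")


sbp_lower_limits = [101, 106, 111, 116, 121, 126, 131, 136]
sbp_frequencies = [2, 3, 5, 8, 6, 4, 2, 1]

print(grouped_median(sbp_lower_limits, sbp_frequencies, 5))  # 119.4375
```

The loop selects 116-120 as the median class automatically, which avoids the mistake of picking the class by eye.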
Properties of median
• Not affected by extreme values
• Well suited to skewed distributions
• Can be calculated from a frequency distribution
• It is not influenced by the position of items
Limitations of median
• It is not based on all observations
• Compared with the mean, it is less reliable
• Not suitable for further analysis
Mode
• The mode is the value of a data set that occurs most frequently.
• It is the most commonly observed value, i.e. the one that occurs the
maximum number of times
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes
Figure: two dot plots, one on a 0 to 14 scale with Mode = 9 and one on a 0 to 6 scale with no mode.
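The cases listed above (one mode, several modes, no mode, categorical data) can be sketched with a frequency count. The function name `modes` and all sample data below are hypothetical, since the plotted values did not survive in the text; treating "every value equally frequent" as "no mode" is the convention assumed here.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all occur equally often, nothing stands out
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == highest]


print(modes([0, 1, 2, 3, 4, 5, 6]))    # [] (no mode)
print(modes([1, 2, 2, 3, 9, 9]))       # [2, 9] (several modes)
print(modes(["A", "B", "A", "O"]))     # ['A'] (categorical data)
```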
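Mode
For a grouped frequency distribution:
$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$
where l = lower limit of the modal class, f1 = frequency of the modal class, f0 = frequency of the preceding class, f2 = frequency of the following class, h = class interval

Example 1
SBP Range (mmHg)   Frequency
101-105            2
106-110            3
111-115            5
116-120            8
121-125            6
126-130            4
131-135            2
136-140            1
l = 116, f1 = 8, f0 = 5, f2 = 6, h = 5
Mode = 116 + ((8 - 5) / (2 × 8 - 5 - 6)) × 5 = 119

Example 2
SBP Range (mmHg)   Frequency
101-105            2
106-110            3
111-115            5
116-120            8
121-125            6
126-130            4
131-135            8
136-140            1
Here 116-120 and 131-135 both have the highest frequency (8); the calculation below is for the 131-135 class.
l = 131, f1 = 8, f0 = 4, f2 = 1, h = 5
Mode = 131 + ((8 - 4) / (2 × 8 - 4 - 1)) × 5 ≈ 132.82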
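Both grouped-mode results can be checked with a short sketch. It assumes the usual grouped-data mode formula, l + (f1 - f0) / (2·f1 - f0 - f2) × h, which reproduces the slide's values of 119 and 132.82; the function name `grouped_mode` is illustrative only.

```python
def grouped_mode(lower_limit, f1, f0, f2, width):
    """Mode of grouped data from the modal class and its two neighbours.

    f1: frequency of the modal class
    f0: frequency of the class before it
    f2: frequency of the class after it
    """
    return lower_limit + (f1 - f0) * width / (2 * f1 - f0 - f2)


print(grouped_mode(116, 8, 5, 6, 5))            # 119.0
print(round(grouped_mode(131, 8, 4, 1, 5), 2))  # 132.82
```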
Properties of mode
• Not affected by extreme values
• For a large number of data, the mode is meaningful as an average
• Can be calculated from a frequency distribution
• Not affected by very small or very large numbers
Limitations of mode
• It is not based on all observations
• Compared with the mean, it is less reliable
• Not suitable for further advanced analysis
Which one is the “best” measurement?
• The mean is generally used, unless extreme values (outliers) exist
• Then the median is often used, since the median is not sensitive to
extreme values.
Dispersion/variability
• Measures of dispersion describe the extent to which individual values
deviate from the central value (average).
• They indicate how representative the central value is.
• Dispersion is small if the values are closely bunched about their mean and
large when the values are scattered widely about their mean.
Objectives of Dispersion Measurement
• To determine the reliability of an average
• For controlling the variability
• For comparing two or more series of data regarding their variability
• For facilitating the use of other statistical measures
Characteristics of a good measure of dispersion
• It should be rigidly defined
• It should be easy to calculate and easy to understand
• It should be based on all observations
• It should be suitable for further mathematical treatment
• It should be affected as little as possible by sampling fluctuation
Shape of a Distribution
• Describes how data are distributed
• Measures of shape: symmetric or skewed
• Left-skewed: Mean < Median
• Symmetric: Mean = Median
• Right-skewed: Median < Mean
Measures of Variability
• Measures of variation give information on the spread or variability of the data values
• Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Figure: two distributions with the same center but different variation.
Range
• Simplest measure of variation
• Difference between the largest and the smallest
observations:
Range = Xlargest – Xsmallest
Example (values plotted on a 0 to 14 scale): Range = 14 - 1 = 13
Characteristics of the Range
• Ignores the way in which data are distributed
  Figure: two differently distributed data sets on a 7 to 12 scale, each with Range = 12 - 7 = 5
• Sensitive to outliers
  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5: Range = 5 - 1 = 4
  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120: Range = 120 - 1 = 119
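The outlier sensitivity of the range can be reproduced with the two 25-value data sets above. This is a minimal sketch; `value_range` is an illustrative name.

```python
def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


# Eleven 1s, eight 2s, four 3s, then the last two values
base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

print(value_range(base))          # 4
print(value_range(with_outlier))  # 119
```

Only one of the 25 values differs between the two lists, yet the range changes from 4 to 119.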
Quartiles
Quartiles split the ranked data into 4 segments with an equal number of values
per segment (25% | Q1 | 25% | Q2 | 25% | Q3 | 25%)
• The first quartile, Q1, is the value for which 25% of the observations
are smaller and 75% are larger
• Q2 is the same as the median (50% are smaller, 50% are larger)
• Only 25% of the observations are greater than the third quartile, Q3
Quartile Formulas
Find a quartile by determining the value in the appropriate
position in the ranked data, where
First quartile position: Q1 = 0.25(n+1)
Second quartile position: Q2 = 0.50(n+1)
(median)
Third quartile position: Q3 = 0.75(n+1)
where n is the number of observed values
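The position formulas above can be sketched as a small function. The nine ranked values are the sample data from this deck's quartile example. Linear interpolation between neighbouring ranks is an assumption: the slides only show the half-way case. The name `quartile` is illustrative.

```python
def quartile(ranked, q):
    """Value at position q * (n + 1), counted from 1, in already ranked data."""
    n = len(ranked)
    position = min(max(q * (n + 1), 1), n)  # keep the position inside 1..n
    lower_rank = int(position)              # rank at or just below the position
    fraction = position - lower_rank        # how far towards the next rank
    upper_rank = min(lower_rank + 1, n)
    lower = ranked[lower_rank - 1]
    upper = ranked[upper_rank - 1]
    return lower + fraction * (upper - lower)


ranked_data = [11, 12, 13, 16, 16, 17, 18, 21, 22]

print(quartile(ranked_data, 0.25))  # 12.5 (position 2.5)
print(quartile(ranked_data, 0.50))  # 16.0 (position 5, the median)
print(quartile(ranked_data, 0.75))  # 19.5 (position 7.5)
```

Other textbooks and software packages use different position rules, so quartiles computed elsewhere may differ slightly from these.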
Quartiles
Sample Ranked Data: 11 12 13 16 16 17 18 21 22 (n = 9)
• Example: Find the first, second and third quartiles
Q1 is in the 0.25(9+1) = 2.5 position of the ranked data,
so use the value half-way between the 2nd and 3rd values:
Q1 = 12.5
• Second and third quartiles ??
Interquartile Range
Interquartile range = 3rd quartile – 1st quartile
IQR = Q3 – Q1
Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70 (25% of the values lie in each of the four segments)
Interquartile range = 57 – 30 = 27
Variance
Variance measures how far each number in the set is from the mean.
It is calculated by
• taking the differences between each number in the set and the mean
• squaring the differences
• dividing the sum of the squares by the number of values in the set.
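The three steps above translate directly into code. This sketch divides by the number of values, as the steps state (the population form). The two four-value sets are the ones quoted in the standard deviation slide, both with a mean of 10; the function name `population_variance` is illustrative.

```python
def population_variance(values):
    """Mean of the squared differences from the mean (divides by n)."""
    mean = sum(values) / len(values)
    squared_diffs = [(x - mean) ** 2 for x in values]  # steps 1 and 2
    return sum(squared_diffs) / len(values)            # step 3


print(population_variance([8, 9, 11, 12]))  # 2.5
print(population_variance([18, 19, 2, 1]))  # 72.5
```

Both sets have the same mean, but the second is far more spread out, and the variance reflects that.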
Standard deviation
• It is a measure of how spread out numbers are. Its symbol is σ. It is the
square root of the mean of the squared deviations of individual items from
their arithmetic mean.
• 8, 9, 11, 12: Avg. = 10
• 18, 19, 2, 1: Avg. = 10
• Calculate the standard deviation of a sample of IQ scores given by
96, 104, 126, 134 and 140.
Examples
Calculate the standard deviation of a sample of IQ scores given by 96,
104, 126, 134 and 140.
The mean of this data is (96+104+126+134+140)/5 = 120.
σ = √[ ∑(x-120)^2 / 5 ]
The deviations from the mean are
96-120 = -24,
104-120 = -16,
126-120 = 6,
134-120 = 14,
140-120 = 20.
σ = √[ ((-24)^2+(-16)^2+(6)^2+(14)^2+(20)^2) / 5 ]
σ = √[ 1464 / 5 ] = √292.8 ≈ 17.11
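The worked example can be checked numerically. The sketch below follows the slide in dividing by n = 5. For comparison it also computes the n - 1 version, which is the usual choice when the five scores are treated as a sample; that comparison is an addition here, not part of the slide.

```python
import math

iq_scores = [96, 104, 126, 134, 140]

mean = sum(iq_scores) / len(iq_scores)                      # 120.0
sum_of_squares = sum((x - mean) ** 2 for x in iq_scores)    # 1464.0

population_sd = math.sqrt(sum_of_squares / len(iq_scores))     # divides by n
sample_sd = math.sqrt(sum_of_squares / (len(iq_scores) - 1))   # divides by n - 1

print(round(population_sd, 2))  # 17.11
print(round(sample_sd, 2))      # 19.13
```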
Comparing Standard Deviations
Three data sets plotted on the same 11 to 21 scale, with the same mean but different spread:
• Data A: Mean = 15.5, s = 3.338
• Data B: Mean = 15.5, s = 0.926
• Data C: Mean = 15.5, s = 4.570
Measuring variation
Figure: a narrow distribution (small standard deviation) and a wide distribution (large standard deviation).
Standard Deviation
• Most commonly used measure of variation
• Each value in the data set is used in the calculation
• Shows variation from the mean
• Has the same units as the original data
• It cannot be negative.
• A standard deviation close to 0 indicates that the data points tend to be close
to the mean.
• The further the data points are from the mean, the greater the standard
deviation
Important Note
• Standard deviation of sample data of a population
• Variance of sample data of a population
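The formulas for these two items did not survive in the text. Assuming the slide showed the usual sample definitions, which divide by n - 1 rather than n, they would read as follows, with x̄ the sample mean and n the sample size:

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1},
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```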
Coefficient of Variation
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Can be used to compare two or more sets of data
measured in different units
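As a minimal sketch, the coefficient of variation is the standard deviation expressed as a percentage of the mean. The figures used are the two stock prices quoted in this deck's comparison ($5 standard deviation on average prices of $50 and $100); the function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (s / mean) * 100, returned as a percentage."""
    return std_dev * 100 / mean


print(coefficient_of_variation(5, 50))   # 10.0 (stock A)
print(coefficient_of_variation(5, 100))  # 5.0  (stock B)
```

Because the result is a unit-free percentage, it can compare data sets measured on different scales.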
$$CV = \left(\frac{s}{\bar{x}}\right) \times 100\%$$
Comparing Coefficient of Variation
• Stock A:
• Average price last year = $50
• Standard deviation = $5
$$CV_A = \left(\frac{s}{\bar{x}}\right) \times 100\% = \frac{\$5}{\$50} \times 100\% = 10\%$$
• Stock B:
• Average price last year = $100
• Standard deviation = $5
$$CV_B = \left(\frac{s}{\bar{x}}\right) \times 100\% = \frac{\$5}{\$100} \times 100\% = 5\%$$
Both stocks have the same standard deviation, but stock B is less variable relative to its price
Measure of Locations of Data
Percentiles
• A percentile is a measure of position in a set of observations.
• It is the value below which a given percentage of the observations in a
group falls.
For example, the 29th percentile is the value of a variable such that
29% of the observations are less than the value and 71% of the
observations are greater.
• Suppose you scored in the 80th percentile on the GRE analytical section:
80% of GRE test takers have lower marks than you and 20% have higher marks.
Standard Error of the sample mean
• Standard error (SE) is the standard deviation of the sampling
distribution of a statistic.
• If the statistic is the sample mean, it is called the standard error of the
mean (SEM)
• The standard error of the mean is a measure of the dispersion of sample
means around the population mean
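The slide's formula was not preserved in the text. The sketch below assumes the standard estimate, SEM = s / √n, where s is the sample standard deviation (n - 1 denominator) and n is the sample size. It reuses the IQ scores from the standard deviation example rather than the exercise data; `standard_error_of_mean` is an illustrative name.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """SEM = sample standard deviation / sqrt(sample size)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


iq_scores = [96, 104, 126, 134, 140]

print(round(standard_error_of_mean(iq_scores), 2))  # 8.56
```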
Exercise
Find the standard error for the following data

Drug Concentration (μg/ml)   Absorption             Mean ± SE
25                           0.286, 0.214, 0.255    ??
50                           0.482, 0.510, 0.524    ??
100                          1.119, 1.225, 1.316    ??
Percentile rank
A percentile rank is the percentage of scores that fall at or below a given
score.
• 10 marks: 2, 5, 6, 3, 6, 8, 10, 1, 4, 6, 7, 2
  Ranked: 1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 8, 10
• 10 marks: 7, 10, 10, 10, 9, 8, 9, 9, 10, 9, 10, 7
  Ranked: 7, 7, 8, 9, 9, 9, 9, 10, 10, 10, 10, 10
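The definition above ("at or below a given score") can be sketched as a one-line count. The two lists are the mark sets shown above; the labels `group_1` and `group_2`, the function name `percentile_rank`, and the choice of 7 as the score to look up are assumptions made for illustration.

```python
def percentile_rank(scores, score):
    """Percentage of scores that fall at or below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return at_or_below * 100 / len(scores)


group_1 = [2, 5, 6, 3, 6, 8, 10, 1, 4, 6, 7, 2]
group_2 = [7, 10, 10, 10, 9, 8, 9, 9, 10, 9, 10, 7]

print(round(percentile_rank(group_1, 7), 2))  # 83.33
print(round(percentile_rank(group_2, 7), 2))  # 16.67
```

The same mark of 7 ranks high in the first group and low in the second, which shows that a percentile rank depends on the group being compared against.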
Practical problems
1. Find the mean, median and mode of the following set of blood
pressure readings:
145, 146, 148, 146, 145, 147, 144, 144, 138, 142, 140, 152, 160, 158,
148, 148.
2. Calculate the appropriate average for prolactin levels (ng/L) obtained
during a clinical trial involving 10 subjects: 9.4, 7.0, 7.6, 6.7, 6.3, 8.6,
6.8, 10.6, 8.9, 9.4
Descriptive statistics: Mean, Mode, Median

  • 2. Arithmetic Mean Median Mode Approach of describing numerical data Variance Standard Deviation Coefficient of Variation Range Interquartile Range Central Tendency Variation
  • 3. Central Tendency • Numerical central value of a set observation is called measures of central tendency. • It is a central or typical value for a probability distribution. • It may also be called a center or location of the distribution. • Measures of central tendency: Mean Median Mode
  • 4. Measures of Central Tendency Central Tendency Mean Median Mode n x x n 1 i i    Midpoint of ranked values Most frequently observed value Arithmetic average
  • 5. Mean Mean is a single and typical value used to represent a set of data. It also referred as the average. Objective: • To get a single value that represents the entire data • To facilitate the comparison between groups of data of similar nature Classification of mean • Arithmetic Mean (AM) • Geometric Mean (GM) • Harmonic Mean (HM)
  • 6. Arithmetic Mean The arithmetic mean (mean) is the most common measure of central tendency • For a population of N values: • For a sample of size n: Sample size n x x x n x x n 2 1 n 1 i i         Observed values N x x x N x μ N 2 1 N 1 i i         Population size Population values
  • 7. Arithmetic Mean • The most common measure of central tendency • Mean = sum of values divided by the number of values • Affected by extreme values (outliers) 0 1 2 3 4 5 6 7 8 9 10 Mean = 3 0 1 2 3 4 5 6 7 8 9 10 Mean = 4 3 5 15 5 5 4 3 2 1       4 5 20 5 10 4 3 2 1      
  • 8. Properties of mean • It takes all observations into account reflecting the value • It is used in other statistical tools • It is most reliable for drawing inferences • It is the easiest to use in advanced statistics technique
  • 9. Limitations of mean • Highly affected by extreme values, even just one extreme value • Sometimes negative and zero values can not be counted
  • 10. Median In an ordered list, the median is the “middle” number (50% above, 50% below) • It is not affected by extreme values 0 1 2 3 4 5 6 7 8 9 10 Median = 3 0 1 2 3 4 5 6 7 8 9 10 Median = 3
  • 11. Median • It is the middle value of a set of numbers which have been ordered by magnitude • The median is also the number that is halfway into the set. • The location of the median: • If the number of values is odd, the median is the middle number • If the number of values is even, the median is the average of the two middle numbers
  • 12. Median For grouped frequency Where, L = lower limit of the median class N = total number of observations F = cumulative frequency of preceding median class fm = frequency of the median class C = class interval of the median class
  • 13. SBP Range (mmHg) Frequency Cumulative Frequency 101-105 2 2 106-110 3 5 111-115 5 10 116-120 8 18 121-125 6 24 126-130 4 28 131-135 2 30 136-140 1 31 L = 121 N= 31 F= 18 fm=6 C=5 118.92 L = 116 N= 31 F= 10 fm=8 C=5 119.43
  • 14. Properties of median • Not affected by extreme value • Perfect statistical example for skewed distribution • Can be calculated from frequency distribution • It is not influenced by the position of items Limitations of median • It is not based on all observations • Compared to mean it is less reliable • Not suitable for further analysis
  • 15. Mode • The mode is the value of a data set that occurs most frequently. • It is the commonly observed value which occurs maximum number times
  • 16. Mode • A measure of central tendency • Value that occurs most often • Not affected by extreme values • Used for either numerical or categorical data • There may be no mode • There may be several modes 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Mode = 9 0 1 2 3 4 5 6 No Mode
  • 17. Mode
  • 18. SBP Range (mmHg) Frequency 101-105 2 106-110 3 111-115 5 116-120 8 121-125 6 126-130 4 131-135 2 136-140 1 l = 116 f1 = 8 f0 = 5 f2 = 6 h = 5 119
  • 19. SBP Range (mmHg) Frequency 101-105 2 106-110 3 111-115 5 116-120 8 121-125 6 126-130 4 131-135 8 136-140 1 L = 131 f1 = 8 f0 = 4 f2 = 1 h = 5 132.82
  • 20. Properties of mode • Not affected by extreme value • For large number of data, mode happens to be meaningful as an average • Can be calculated from frequency distribution • Do not affected by small and large numbers • It is not based on all observations • Compared to mean it is less reliable • Not suitable for further advanced analysis Limitations of mode
  • 21. • Mean is generally used, unless extreme values (outliers) exist • Then median is often used, since the median is not sensitive to extreme values. Which one is the “best” measurement?
  • 22. Dispersions/variability • Dispersions are the measures of extent of deviation of individual from the central value (average). • It determines how much representative the central value is. • It may be small if the values are closely bunched about their mean and it is large when the values are scattered widely about their mean.
  • 23. • To determine the reliability of an average • For controlling the variability • For comparing two or more series of data regarding their variability • For facilitating the use of other statistical measures Objectives of Dispersions Measurement
  • 24. • It should be rigidly defined • It should be easy to calculate and easy to understand • It should be based on all observations • It should be suitable for further mathematical treatment • It should be affected as little as possible to the sampling fluctuation Characteristics of a good measure of Dispersions
  • 25. Shape of a Distribution • Describes how data are distributed • Measures of shape: Symmetric or skewed Mean = Median Mean < Median Median < Mean Right-Skewed Left-Skewed Symmetric
  • 26. Same center, different variation Measures of Variability Variation Variance Standard Deviation Coefficient of Variation Range Interquartile Range Measures of variation give information on the spread or variability of the data values
  • 27. Range • Simplest measure of variation • Difference between the largest and the smallest observations: Range = Xlargest – Xsmallest • Example (data plotted on an axis from 0 to 14; smallest value 1, largest value 14): Range = 14 - 1 = 13
  • 28. Characteristics of the Range • Ignores the way in which data are distributed: two differently distributed data sets on the axis 7 to 12 both have Range = 12 - 7 = 5 • Sensitive to outliers: 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5 has Range = 5 - 1 = 4, while 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120 has Range = 120 - 1 = 119
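The sensitivity of the range to a single outlier can be reproduced with a small sketch built from the two data sets on this slide. The helper name `data_range` is illustrative.

```python
def data_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


# Eleven 1s, eight 2s, four 3s, then 4 and a final value of 5 or 120
base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
typical = base + [5]
with_outlier = base + [120]

print(data_range(typical))       # range without the outlier
print(data_range(with_outlier))  # range with the outlier
```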
  • 29. Quartiles • Quartiles split the ranked data into 4 segments with an equal number of values per segment (25% each, separated by Q1, Q2 and Q3) • The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger • Q2 is the same as the median (50% are smaller, 50% are larger) • Only 25% of the observations are greater than the third quartile, Q3
  • 30. Quartile Formulas Find a quartile by determining the value in the appropriate position in the ranked data, where n is the number of observed values: • First quartile position: Q1 = 0.25(n+1) • Second quartile position: Q2 = 0.50(n+1) (the median) • Third quartile position: Q3 = 0.75(n+1)
  • 31. Quartiles Example: Find the first, second and third quartiles • Sample ranked data (n = 9): 11 12 13 16 16 17 18 21 22 • Q1 is in the 0.25(9+1) = 2.5 position of the ranked data, so use the value half-way between the 2nd and 3rd values: Q1 = 12.5 • Second and third quartiles: ??
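The position rule from the quartile slides can be sketched as follows. This is one possible reading: it assumes linear interpolation between neighbouring ranked values when the position is fractional, which agrees with the slide's "half-way between" rule for a .5 position. The function name `quartile` is illustrative; Python's `statistics.quantiles(data, n=4, method="exclusive")` uses the same (n+1) positions.

```python
def quartile(ranked, fraction):
    """Value at position fraction * (n + 1) in ranked data (1-based position).

    Fractional positions are resolved by linear interpolation
    between the two neighbouring ranked values (an assumption).
    """
    position = fraction * (len(ranked) + 1)
    lower = int(position)        # rank of the value just below the position
    weight = position - lower    # how far to move towards the next value
    if lower < 1:
        return ranked[0]
    if lower >= len(ranked):
        return ranked[-1]
    return ranked[lower - 1] + weight * (ranked[lower] - ranked[lower - 1])


ranked = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1 = quartile(ranked, 0.25)
q2 = quartile(ranked, 0.50)
q3 = quartile(ranked, 0.75)
iqr = q3 - q1  # interquartile range = Q3 - Q1
print(q1, q2, q3, iqr)
```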
  • 32. Interquartile Range • Interquartile range = 3rd quartile – 1st quartile: IQR = Q3 – Q1 • Example: X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70 (25% of the data in each segment) • Interquartile range = 57 – 30 = 27
  • 33. Variance Variance measures how far each number in the set is from the mean. It is calculated by • taking the differences between each number in the set and the mean • squaring the differences • dividing the sum of the squares by the number of values in the set.
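The three steps listed on the variance slide can be sketched directly. The sketch divides by the number of values, as the slide states (the population form), and uses the two data sets from the next slide, which share an average of 10. The function name `population_variance` is illustrative.

```python
from statistics import pvariance


def population_variance(values):
    """Variance following the slide: differences, squares, mean of squares."""
    mean_value = sum(values) / len(values)
    differences = [x - mean_value for x in values]  # step 1: differences from the mean
    squares = [d ** 2 for d in differences]         # step 2: square the differences
    return sum(squares) / len(values)               # step 3: divide the sum by n


tight = [8, 9, 11, 12]   # values close to the mean of 10
spread = [18, 19, 2, 1]  # same mean of 10, values far from it

print(population_variance(tight), population_variance(spread))
```

Both sets have the same mean, but the second has a much larger variance.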
  • 34. Standard deviation • It is a measure of how spread out numbers are. Its symbol is σ. It is the square root of the variance, i.e. the square root of the mean of the squared deviations of individual items from their arithmetic mean.
  • 35. • 8, 9, 11, 12: Avg. = 10 • 18, 19, 2, 1: Avg. = 10 • Calculate the standard deviation of a sample of IQ scores given by 96, 104, 126, 134 and 140.
  • 36. Examples Calculate the standard deviation of a sample of IQ scores given by 96, 104, 126, 134 and 140. • The mean of this data is (96+104+126+134+140)/5 = 120. • The deviations from the mean are 96-120 = -24, 104-120 = -16, 126-120 = 6, 134-120 = 14, 140-120 = 20. • σ = √[ ∑(x-120)^2 / 5 ] = √[ ((-24)^2+(-16)^2+(6)^2+(14)^2+(20)^2) / 5 ] = √[ 1464 / 5 ] = 17.11
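The IQ example can be reproduced in a few lines. The slide divides the sum of squared deviations by n = 5, which is the population form; the sketch also shows, as a labelled assumption about what a sample calculation would use, the n - 1 form. Variable names are illustrative.

```python
from math import sqrt
from statistics import pstdev, stdev

iq_scores = [96, 104, 126, 134, 140]

mean_iq = sum(iq_scores) / len(iq_scores)
sum_of_squares = sum((x - mean_iq) ** 2 for x in iq_scores)

# Division by n, as on the slide (population standard deviation)
sigma = sqrt(sum_of_squares / len(iq_scores))

# Division by n - 1 (sample standard deviation); not computed on the slide
s = sqrt(sum_of_squares / (len(iq_scores) - 1))

print(mean_iq, sum_of_squares, round(sigma, 2), round(s, 2))
```

The standard library functions `pstdev` and `stdev` give the same two values, and both results are non-negative by construction.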
  • 37. Comparing Standard Deviations (three data sets plotted on an axis from 11 to 21) • Data A: Mean = 15.5, s = 3.338 • Data B: Mean = 15.5, s = 0.926 • Data C: Mean = 15.5, s = 4.570
  • 38. Measuring variation • Figure: one distribution with a small standard deviation and one with a large standard deviation
  • 39. Standard Deviation • Most commonly used measure of variation • Each value in the data set is used in the calculation • Shows variation from the mean • Has the same units as the original data • It cannot be negative. • A standard deviation close to 0 indicates that the data points tend to be close to the mean. • The further the data points are from the mean, the greater the standard deviation
  • 40. Important Note • Standard deviation of sample data of a population: s = √[ ∑(x - x̄)^2 / (n - 1) ] • Variance of sample data of a population: s^2 = ∑(x - x̄)^2 / (n - 1)
  • 41. Coefficient of Variation • Measures relative variation • Always expressed as a percentage (%) • Shows variation relative to the mean • Can be used to compare two or more sets of data measured in different units • CV = (s / x̄) × 100%
  • 42. Comparing Coefficient of Variation • Stock A: average price last year = $50, standard deviation = $5, CVA = (s / x̄) × 100% = ($5 / $50) × 100% = 10% • Stock B: average price last year = $100, standard deviation = $5, CVB = (s / x̄) × 100% = ($5 / $100) × 100% = 5% • Both stocks have the same standard deviation, but stock B is less variable relative to its price
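The stock comparison can be sketched with a small helper that applies CV = (s / mean) × 100%. The function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Coefficient of variation as a percentage: (s / mean) * 100."""
    return std_dev / mean_value * 100


cv_a = coefficient_of_variation(std_dev=5, mean_value=50)   # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean_value=100)  # Stock B

print(f"CV_A = {cv_a:.0f}%, CV_B = {cv_b:.0f}%")
```

Because the result is unit-free, the same helper can compare data sets measured in different units.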
  • 43. Measures of Location of Data: Percentiles • A percentile is a measure of position in a set of observations. • It is the value below which a given percentage of the observations in a group falls. • For example, the 29th percentile is the value of a variable such that 29% of the observations are less than the value and 71% of the observations are greater.
  • 44. • Suppose you scored in the 80th percentile on the GRE analytical section: that means 80% of GRE test takers scored lower than you and 20% scored higher.
  • 45. Standard Error of the sample mean • Standard error (SE) is the standard deviation of the sampling distribution of a statistic. • If the statistic is the sample mean, it is called the standard error of the mean (SEM). • The standard error of the mean is a measure of the dispersion of sample means around the population mean.
  • 46. Exercise: Find the standard error for the following data (Drug Concentration (μg/ml): Absorption readings; Mean ± SE) • 25: 0.286, 0.214, 0.255; Mean ± SE = ?? • 50: 0.482, 0.510, 0.524; Mean ± SE = ?? • 100: 1.119, 1.225, 1.316; Mean ± SE = ??
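One way to approach the exercise is sketched below. It assumes the usual estimate of the standard error of the mean, SEM = s / √n, with s the sample standard deviation (n - 1 in the denominator); the slides do not state the formula, so treat this as an assumption. The names `mean_and_sem` and `absorption` are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return (mean, standard error of the mean), with SEM = s / sqrt(n)."""
    return mean(values), stdev(values) / sqrt(len(values))


# Readings from the exercise, keyed by drug concentration in μg/ml
absorption = {
    25: [0.286, 0.214, 0.255],
    50: [0.482, 0.510, 0.524],
    100: [1.119, 1.225, 1.316],
}

for concentration, readings in absorption.items():
    m, se = mean_and_sem(readings)
    print(f"{concentration} μg/ml: {m:.3f} ± {se:.3f}")
```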
  • 47. Percentile rank A percentile rank is the percentage of scores that fall at or below a given score.
  • 48. • Set 1 (out of 10 marks): 2, 5, 6, 3, 6, 8, 10, 1, 4, 6, 7, 2; ranked: 1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 8, 10 • Set 2 (out of 10 marks): 7, 10, 10, 10, 9, 8, 9, 9, 10, 9, 10, 7; ranked: 7, 7, 8, 9, 9, 9, 9, 10, 10, 10, 10, 10
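The percentile rank definition (the percentage of scores at or below a given score) can be applied to the two sets of marks on this slide. The sketch below compares the same score of 7 in both sets; the function name `percentile_rank` and the choice of 7 as the score to compare are illustrative.

```python
def percentile_rank(scores, score):
    """Percentage of scores that fall at or below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)


set_1 = [2, 5, 6, 3, 6, 8, 10, 1, 4, 6, 7, 2]
set_2 = [7, 10, 10, 10, 9, 8, 9, 9, 10, 9, 10, 7]

rank_1 = percentile_rank(set_1, 7)
rank_2 = percentile_rank(set_2, 7)
print(round(rank_1, 1), round(rank_2, 1))
```

The same raw score sits near the top of the first set and at the bottom of the second, which is why a percentile rank says more about relative standing than the raw mark.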
  • 49. Practical problems 1. Find the mean, median and mode of the following set of blood pressure readings: 145, 146, 148, 146, 145, 147, 144, 144, 138, 142, 140, 152, 160, 158, 148, 148. 2. Calculate the appropriate average for prolactin levels (ng/L) obtained during a clinical trial involving 10 subjects: 9.4, 7.0, 7.6, 6.7, 6.3, 8.6, 6.8, 10.6, 8.9, 9.4.
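A hand calculation for the first problem can be checked with Python's standard `statistics` module. This is only a checking aid, not the intended solution method; `multimode` (Python 3.8+) is used so that ties for the most frequent value would all be reported.

```python
from statistics import mean, median, multimode

blood_pressure = [145, 146, 148, 146, 145, 147, 144, 144,
                  138, 142, 140, 152, 160, 158, 148, 148]

bp_mean = mean(blood_pressure)         # sum of values / number of values
bp_median = median(blood_pressure)     # middle of the ranked values
bp_modes = multimode(blood_pressure)   # most frequently observed value(s)

print(bp_mean, bp_median, bp_modes)
```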