Descriptive Statistics
Numerically Summarizing Data
Descriptive Statistics: Overview
Numerical data properties:
Central tendency: mean, median, mode
Variation: range, interquartile range, variance, standard deviation, coefficient of variation
Shape: skewness, kurtosis
Introduction:
Given a set of data, one invariably wishes to find a value about which the observations tend to cluster. The three most common such values are the mean, the median, and the mode. They are known as measures of central tendency: the tendency of a set of data to center around certain numerical values.
Central tendency
The Arithmetic Mean ($\bar{x}$)
This is what people usually have in mind when they say “average”.
The arithmetic mean may be considered the balance point in a distribution of observations.
It is computed by summing all the observations in the sample and dividing the sum by the number of observations.
The sample arithmetic mean is computed using sample data.
The sample mean is a statistic.
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Here $\bar{x}$ (pronounced "x bar") represents the sample mean; $x_1$ is the first and $x_i$ the $i$th in a series of observations.
The symbol $\sum$ is the Greek letter sigma and denotes "the sum of."
Thus $\sum_{i=1}^{n}$ indicates that the sum is to begin with $i = 1$ and increment by one up to and including the last observation $n$.
Example
Consider 7 observations: 4.2, 4.3, 4.7, 4.8, 5.0, 5.1, 9.0.
By definition,
$$\bar{x} = (4.2 + 4.3 + 4.7 + 4.8 + 5.0 + 5.1 + 9.0)/7 = 5.3$$
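The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative and not part of the slides, and `statistics.mean` from the standard library is used purely as a cross-check.

```python
from statistics import mean

# The seven observations from the example.
observations = [4.2, 4.3, 4.7, 4.8, 5.0, 5.1, 9.0]

# Direct translation of the definition: sum of the observations divided by n.
x_bar = sum(observations) / len(observations)

# The standard-library helper should agree with the manual computation.
x_bar_lib = mean(observations)

print(round(x_bar, 1))  # 5.3
```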
The Population Arithmetic Mean
The symbol for the mean of a population is the Greek letter mu, or µ.
The population mean is a parameter.
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{\text{Sum of the values of all observations in the population}}{\text{Total number of observations in the population}}$$
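Weighted Mean
The weighted mean of a set of numbers $x_1, x_2, \ldots, x_n$, with corresponding weights $w_1, w_2, \ldots, w_n$, is computed from the following formula:
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$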
Example: Al-Quds Hospital at Gaza pays its employees $16.50, $19.00, or $25.00 per day. There are 26 daily employees, 14 of whom are paid at the $16.50 rate, 10 at the $19.00 rate, and 2 at the $25.00 rate. What is the mean rate paid to the 26 employees?
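One way to work this example is sketched below, assuming the usual definition of the weighted mean (the sum of each value times its weight, divided by the sum of the weights). The function name `weighted_mean` is a hypothetical helper, not something taken from the slides; the number of employees at each rate plays the role of the weight.

```python
rates = [16.50, 19.00, 25.00]   # pay rates from the example
counts = [14, 10, 2]            # number of employees at each rate (the weights)

def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

mean_rate = weighted_mean(rates, counts)
print(round(mean_rate, 2))  # 18.12
```

Under these assumptions the mean rate is 471/26, about $18.12, which sits closer to $16.50 than a plain average of the three rates would because most employees are paid the lowest rate.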
Mean of Grouped Data
In a grouped distribution, we use the midpoint of each interval as the x value.
Example: find the mean age for the following data.
Interval (age)   Midpoint ($x_i$)   Frequency ($f_i$)
1-3              2                  18
4-6              5                  27
7-9              8                  34
10-12            11                 22
13-15            14                 13
Total                               114
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{(2 \times 18) + (5 \times 27) + (8 \times 34) + (11 \times 22) + (14 \times 13)}{18 + 27 + 34 + 22 + 13} = \frac{867}{114} = 7.61 \text{ years}$$
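The grouped-mean calculation can be sketched in Python as follows. The midpoints are derived from the interval bounds rather than typed in, and all names are illustrative assumptions rather than anything defined in the slides.

```python
# Age intervals (inclusive bounds) and their frequencies from the example.
intervals = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15)]
frequencies = [18, 27, 34, 22, 13]

# Midpoint of each interval stands in for every observation in that interval.
midpoints = [(low + high) / 2 for low, high in intervals]

n = sum(frequencies)                                           # total frequency
weighted_sum = sum(f * x for f, x in zip(frequencies, midpoints))
grouped_mean = weighted_sum / n

print(n, weighted_sum, round(grouped_mean, 2))  # 114 867.0 7.61
```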
Advantages of the mean:
It is a measure that can be calculated and is unique.
It is useful for performing statistical procedures such as comparing the means from several data sets.
Disadvantages of the mean:
It is affected by extreme values.
With the extreme value 9.0: $\bar{x} = (4.2 + 4.3 + 4.7 + 4.8 + 5.0 + 5.1 + 9.0)/7 = 5.3$
Without it: $\bar{x} = (4.2 + 4.3 + 4.7 + 4.8 + 5.0 + 5.1)/6 = 4.7$
It would be more representative to calculate the mean without including such an extreme value.
The Median
The median of a variable is the numerical value that lies in the middle of the data when arranged in ascending order. That is, half the data are below the median and half the data are above the median.
Steps in computing the median of a data set
1. Arrange the data in ascending order.
2. Determine the number of observations n.
3. Determine the observation in the middle of the data set.
If the number of observations is odd, then the median is the data value that is exactly in the middle of the data set. That is, it is the observation that lies in the (n + 1)/2 position.
Example
Find the median of the data set consisting of the observations 7, 4, 3, 5, 6, 8, 10.
Solution: First, we arrange the data set in ascending order:
3 4 5 6 7 8 10.
Since the number of observations is odd, the median is the (7 + 1)/2 = 4th number in the ordered list, namely 6.
If the number of observations is even, then the median is the arithmetic mean of the two middle observations in the data set. That is, it is the arithmetic mean of the data values that lie in the n/2 and (n/2) + 1 positions.
Example
Suppose we have the observations 7, 4, 3, 5, 6, 8, 10, 1. Find the median of this data set.
Solution: First, we arrange the data set in ascending order:
1 3 4 5 6 7 8 10.
Since the number of observations is n = 8, by definition the median is the average of the 4th (n/2 = 8/2 = 4) and the 5th observations, i.e.
Median = (5 + 6)/2 = 5.5
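The odd and even cases can be combined in one short function. The sketch below follows the steps described above (sort, count, pick the middle value or average the two middle values); the function name `median_of` is a hypothetical choice, and Python's 0-based indexing is noted in the comments.

```python
def median_of(data):
    """Median following the slide's steps: sort, count, take the middle."""
    ordered = sorted(data)          # Step 1: ascending order
    n = len(ordered)                # Step 2: number of observations
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1)/2-th value, which is index n // 2 when counting from 0.
        return ordered[mid]
    # Even n: mean of the n/2-th and (n/2 + 1)-th values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([7, 4, 3, 5, 6, 8, 10]))     # 6
print(median_of([7, 4, 3, 5, 6, 8, 10, 1]))  # 5.5
```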
Advantages of the median over the mean:
It may be determined even if the values of all observations are not known. For example, in the ordered data set
3 4 5 6 $x_1$ $x_2$ $x_3$
the median is 6 whatever the values of $x_1$, $x_2$, $x_3$.
Extreme values in the data set do not affect the median as strongly as they do the mean.
Example
Suppose 5 physicians who practice in the Gaza Strip are sampled and asked how much an office visit costs. Suppose we get the answers: 7.5, 7.5, 8.0, 8.0, and 28.0 JD. The mean charge for the sample of five doctors is
$$\bar{x} = \frac{7.5 + 7.5 + 8.0 + 8.0 + 28.0}{5} = \frac{59.0}{5} = \text{JD } 11.8$$
while the median is 8.0. This value is easily seen to be more representative of the values than the sample mean, JD 11.8, which was affected by the extreme value of 28.0.
Median of grouped data
In a grouped distribution, the following steps are followed:
Step 1: Form the cumulative frequency (F).
Step 2: Find the value of $n/2$, where $n$ is the total number of observations.
Step 3: Find the first F value that exceeds $n/2$, which identifies the median class M.
Step 4: Calculate the median using the following formula:
$$\text{Median} = L_M + \frac{n/2 - F_{M-1}}{f_M} \times c$$
where:
• $L_M$ = lower bound of the median class
• $F_{M-1}$ = cumulative frequency of the class immediately prior to the median class
• $f_M$ = actual frequency of the median class
• $c$ = median class width.
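Example: Estimate the median age in the following data set.
Age         20-25   25-30   30-35   35-40   40-45   45-50
Frequency   2       14      29      43      33      9
Solution:
Step 1:
Age     (f)   (F)
20-25   2     2
25-30   14    16
30-35   29    45
35-40   43    88
40-45   33    121
45-50   9     130
Step 2: n/2 = 130/2 = 65
Step 3: The first F value that exceeds 65 is 88, so the median class is 35-40.
Step 4: lower bound of the median class = 35; cumulative frequency prior to the median class = 45; frequency of the median class = 43; class width = 5.
Median = 35 + ((65 − 45)/43) × 5 ≈ 37.33 years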
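The four steps can be sketched as a small Python function. It assumes the standard interpolation rule for grouped data: the median is the lower bound of the median class, plus the fraction of that class needed to reach n/2, times the class width. The name `grouped_median` and its parameters are illustrative assumptions, and equal class widths are assumed.

```python
def grouped_median(lower_bounds, width, freqs):
    """Estimate the median of grouped data by linear interpolation."""
    n = sum(freqs)
    half = n / 2                      # Step 2: n/2
    cumulative = 0                    # cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f > half:     # Step 3: first cumulative frequency exceeding n/2
            # Step 4: interpolate inside the median class.
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one observation")

lower_bounds = [20, 25, 30, 35, 40, 45]
freqs = [2, 14, 29, 43, 33, 9]
median_age = grouped_median(lower_bounds, 5, freqs)
print(round(median_age, 2))  # 37.33
```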
The Mode
The mode is the observation that occurs most frequently, i.e., is repeated most often in the data set.
For a given sample, N = 16:
33 35 36 37 38 38 38 39 39 39 39 40 40 41 41 45
The mode = 39
It corresponds to the highest point on the frequency distribution.
[Figure: bar chart of a frequency distribution; the mode is at the tallest bar.]
Example
Find the mode of the data set in the table.
Quantity of glucose (mg%) in blood of 25 students
70   88   95   101   106
79   93   96   101   107
83   93   97   103   108
86   93   97   103   112
87   95   98   106   115
Solution:
First we arrange this data set in ascending order (as in the table above, reading down the columns).
This data set contains 25 numbers. We see that the value 93 is repeated most often. Therefore, the mode of the data set is 93.
Multimodal distribution: A data set may have several modes. In this case it is called a multimodal distribution.
Example: The following data set has two modes, 1 and 4. This distribution is called a bimodal distribution.
0   2   6   9
0   4   6   10
1   4   7   11
1   4   8   11
1   5   9   12
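Finding the mode, or several modes, amounts to counting how often each value occurs. A minimal sketch using `collections.Counter` from the Python standard library is shown below; the helper name `modes` is an assumption for illustration, and it returns every value tied for the highest count so that bimodal data are handled too.

```python
from collections import Counter

def modes(data):
    """Return all values that occur most often, in ascending order."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

# Glucose data (mg%) for the 25 students, read down the columns of the table.
glucose = [70, 79, 83, 86, 87, 88, 93, 93, 93, 95, 95, 96, 97,
           97, 98, 101, 101, 103, 103, 106, 106, 107, 108, 112, 115]

# The bimodal example, read down the columns.
bimodal = [0, 0, 1, 1, 1, 2, 4, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 11, 11, 12]

print(modes(glucose))  # [93]
print(modes(bimodal))  # [1, 4]
```

Note that when every value occurs equally often, this helper returns all of them, which matches the remark that the mode is uninformative in that situation.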
Advantages of the mode
Like the median, the mode is not affected by extreme values.
It is easily determined for categorical data.
For a given sample, N = 16:
33 35 36 37 38 38 38 39 39 39 39 40 40 41 41 45
The mode = 39, and it remains 39 if the largest value, 45, is replaced by 50 or 60.
Mode of grouped data
In a grouped distribution, the following steps are followed:
Step 1: Determine the modal class (the class with the largest frequency).
Step 2: Calculate $\Delta_1$ = difference between the largest frequency and the frequency immediately preceding it.
Step 3: Calculate $\Delta_2$ = difference between the largest frequency and the frequency immediately following it.
Step 4: Obtain the mode using the following formula:
$$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times c$$
where:
• $L$ = lower bound of the modal class
• $c$ = modal class width
• $\Delta_1$ and $\Delta_2$ are described in Step 2 and Step 3.
Example: Estimate the modal age in the following data set.
Age         20-25   25-30   30-35   35-40   40-45   45-50
Frequency   2       14      29      43      33      9
Solution:
Step 1: The largest frequency is 43, so the modal class is 35-40.
Step 2: difference from the preceding frequency = 43 − 29 = 14
Step 3: difference from the following frequency = 43 − 33 = 10
Step 4: lower bound of the modal class = 35; class width = 40 − 35 = 5
Mode = 35 + (14/(14 + 10)) × 5 ≈ 37.92 years
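The same steps can be sketched in Python. The code assumes the common interpolation rule for the grouped mode: lower bound of the modal class plus the share d1/(d1 + d2) of the class width, where d1 and d2 are the differences from Steps 2 and 3. The name `grouped_mode` is hypothetical, and treating a missing neighbouring class as frequency 0 is an assumption made here, not something stated in the slides.

```python
def grouped_mode(lower_bounds, width, freqs):
    """Estimate the mode of grouped data from the modal class and its neighbours."""
    # Step 1: index of the modal class (the first one if several tie).
    k = max(range(len(freqs)), key=freqs.__getitem__)
    # Assumption: a missing neighbour (first or last class) counts as frequency 0.
    prev_f = freqs[k - 1] if k > 0 else 0
    next_f = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1 = freqs[k] - prev_f            # Step 2
    d2 = freqs[k] - next_f            # Step 3
    if d1 + d2 == 0:
        raise ValueError("mode is not defined when neighbouring classes tie")
    return lower_bounds[k] + d1 / (d1 + d2) * width   # Step 4

lower_bounds = [20, 25, 30, 35, 40, 45]
freqs = [2, 14, 29, 43, 33, 9]
modal_age = grouped_mode(lower_bounds, 5, freqs)
print(round(modal_age, 2))  # 37.92
```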
Disadvantages of the mode:
Too often, there is no modal value because the data set contains no values that occur more than once. Other times, every value is the mode because every value occurs the same number of times. Clearly, the mode is a useless measure in these cases.
For a given sample, N = 16:
33 33 34 34 35 35 36 36 37 37 38 38 39 39 40 40
No unique mode
When data sets contain two, three, or many modes, they are difficult to interpret and compare.
For a given sample, N = 16:
34 34 35 35 35 35 36 37 38 38 39 39 39 39 40 40
The modes = 35 and 39
The Shape of Distributions
Distributions can be either symmetrical or skewed, depending on whether there are more frequencies at one end of the distribution than the other.
Relative Positions of the Mean, Median, and Mode
Selecting an Appropriate Measure of Central Tendency
There are two general criteria for choosing between the measures of central tendency:
1. Scale of measurement
– Nominal scale data: you can only use the mode.
– Ordinal scale data: you can only use the median or the mode; the median is more informative.
– Interval or ratio scale data: you can use any one of the three.
2. Shape of the distribution
– The mean is more informative if the distribution is not skewed.
– If the distribution is skewed, use the median in place of the mean.
Measures of Variation (dispersion)
Just as measures of central tendency locate the “center” of a relative frequency distribution, measures of variation measure its “spread”. When the variation is small, the values are close together (but not the same).
To understand measures of variation, consider the following two examples:
Example 1
Night and day temperatures (°C)
          Country A   Country B
          22          17
          36          40
          23          16
          35          42
          20          20
          34          35
Average   28.3        28.3
Example 2
Think of the difference between an exam with an average mark of 65 in which scores ranged from 62 to 66 and an exam with an average score of 65 in which scores ranged from 30 to 90.
[Figure: Two frequency distributions (Population 1 and Population 2) with equal means but different amounts of variation.]
Measures of variability
Three statistics to measure variability:
– Range
– Variance
– Interquartile range
Range
The range is defined as the difference in value between the highest (maximum) and lowest (minimum) observation:
Range = $x_{max} - x_{min}$
The range can be computed quickly, but it is not very useful since it considers only the extremes and does not take into consideration the bulk of the observations.
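As an illustration, the range can be computed for the night and day temperatures of Countries A and B from the earlier example. The sketch below is minimal and the helper name `value_range` is an assumption; it also confirms that the two countries share the same mean while their ranges differ.

```python
# Night and day temperatures (°C) from the earlier two-country example.
country_a = [22, 36, 23, 35, 20, 34]
country_b = [17, 40, 16, 42, 20, 35]

def value_range(data):
    """Range = largest observation minus smallest observation."""
    return max(data) - min(data)

mean_a = sum(country_a) / len(country_a)
mean_b = sum(country_b) / len(country_b)

print(round(mean_a, 1), round(mean_b, 1))  # 28.3 28.3
print(value_range(country_a))              # 16
print(value_range(country_b))              # 26
```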
Variance
The variance is a measure which uses the mean as a point of reference.
The variance is smaller when all values are close to the mean and larger when the values are spread out from the mean.
Population variance
The population variance of a population of observations x is defined by the formula
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where:
$\sigma^2$ (sigma squared) = population variance
$x_i$ = the $i$th item or observation
µ = population mean
N = total number of observations in the population.
The population variance of a variable is the sum of squared deviations about the population mean divided by the number of observations in the population, N.
That is, it is the arithmetic mean of the squared deviations about the population mean.
The standard deviation of a population
The standard deviation of a population is equal to the square root of the variance:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
Since most populations are large, the computation of $\sigma^2$ and $\sigma$ is rarely performed. In practice, the population variance (or standard deviation) is usually estimated by taking a sample from the population and using $s^2$ and $s$ as estimates of $\sigma^2$ and $\sigma$, respectively.
The sample variance
The sample variance of a sample of observations is defined by the formula
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
or, equivalently,
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$
where:
$s^2$ = sample variance
$\bar{x}$ = sample mean
n = total number of observations in the sample
Standard deviation of the sample
The standard deviation of the sample is
$$s = \sqrt{s^2}$$
It can also be determined from the equations:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
or
$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}}$$
Remark: In the denominator of the formula for $s^2$ we use n − 1 instead of n because statisticians proved that if $s^2$ is defined as above, then $s^2$ is an unbiased estimate of the variance of the population from which the sample was selected (i.e., the expected value of $s^2$ is equal to the population variance).
Note: Whenever a statistic consistently overestimates or underestimates a parameter, it is called biased. To obtain an unbiased estimate of the population variance, we divide the sum of the squared deviations about the mean by n − 1.
Example
A pediatric registrar in a district general hospital is investigating the amount of lead in the urine of children from a nearby housing estate. In a particular street there are 15 children whose ages range from 1 year to under 16, and in a preliminary study the registrar has found the amounts of urinary lead (µmol/24hr) given in the table below.
What are the variance and standard deviation?
Urinary concentration of lead in 15 children from housing estate (µmol/24hr)
0.6, 2.6, 0.1, 1.1, 0.4, 2.0, 0.8, 1.3, 1.2, 1.5, 3.2, 1.7, 1.9, 1.9, 2.2
Note: When using the variance formula, do not round until the last computation. Use as many decimals as allowed by your calculator in order to avoid round-off errors.
Calculation of standard deviation
(1) Lead            (2) Differences   (3) Differences   (4) Observations in
concentration x     from mean         squared           col. (1) squared
0.1                 -1.4              1.96              0.01
0.4                 -1.1              1.21              0.16
0.6                 -0.9              0.81              0.36
0.8                 -0.7              0.49              0.64
1.1                 -0.4              0.16              1.21
1.2                 -0.3              0.09              1.44
1.3                 -0.2              0.04              1.69
1.5                 0                 0                 2.25
1.7                 0.2               0.04              2.89
1.9                 0.4               0.16              3.61
1.9                 0.4               0.16              3.61
2.0                 0.5               0.25              4.00
2.2                 0.7               0.49              4.84
2.6                 1.1               1.21              6.76
3.2                 1.7               2.89              10.24
Total = 22.5        = 0               = 9.96            = 43.71
n = 15, $\bar{x}$ = 1.5
Solution
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{9.96}{14} = 0.7114\ (\mu\text{mol}/24\text{hr})^2$$
$$s = \sqrt{s^2} = \sqrt{0.7114} = 0.843\ \mu\text{mol}/24\text{hr}$$
One can apply the following equation as an alternative:
$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}} = \sqrt{\frac{43.71 - \frac{(22.5)^2}{15}}{14}} = 0.843\ \mu\text{mol}/24\text{hr}$$
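The worked example can be reproduced in Python, which also shows that the definitional formula and the shortcut formula give the same result. This is a sketch with illustrative variable names; `statistics.variance` and `statistics.stdev` (which both divide by n − 1) are imported only so the manual results can be cross-checked.

```python
import math
from statistics import variance, stdev

# Urinary lead concentrations (µmol/24hr) for the 15 children.
lead = [0.6, 2.6, 0.1, 1.1, 0.4, 2.0, 0.8, 1.3, 1.2, 1.5, 3.2, 1.7, 1.9, 1.9, 2.2]
n = len(lead)
x_bar = sum(lead) / n

# Definitional formula: squared deviations about the mean, divided by n - 1.
s2_def = sum((x - x_bar) ** 2 for x in lead) / (n - 1)

# Shortcut formula: uses only the sum of x and the sum of x squared.
s2_short = (sum(x * x for x in lead) - sum(lead) ** 2 / n) / (n - 1)

s = math.sqrt(s2_def)

print(round(s2_def, 4))  # 0.7114
print(round(s, 3))       # 0.843
```

Rounding is applied only when printing, in line with the note about not rounding until the last computation.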
Coefficient of Variation
One important application of the mean and the standard deviation is the coefficient of variation. It is defined as the ratio of the standard deviation to the value of the mean, expressed as a percentage:
$$cv = \frac{\text{Standard deviation}}{\bar{x}} \times 100\%$$
Since both the standard deviation and the mean are expressed in the same units, cv is unitless, or dimensionless.
Therefore, it is possible to use it to compare the relative variation of even unrelated quantities. It is also useful in comparing the variability among different variables that vary in the magnitude of their values (elephant weight versus mouse weight).
Example
Suppose that each day laboratory technician A completes 40 analyses with a standard deviation of 5. Technician B completes 160 analyses per day with a standard deviation of 15. Which employee shows less variability?
At first glance, it appears that technician B has three times as much variation in the output rate as technician A. But B completes analyses at a rate 4 times that of A. Taking all this information into account, we compute the coefficient of variation for both technicians:
For technician A: cv = 5/40 × 100% = 12.5%
For technician B: cv = 15/160 × 100% = 9.4%.
So we find that technician B, who has more absolute variation in output than technician A, has less relative variation.
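The comparison of the two technicians can be written as a short Python sketch. The function name `coefficient_of_variation` is a hypothetical helper; it simply divides the standard deviation by the mean and expresses the result as a percentage.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return sd / mean * 100

cv_a = coefficient_of_variation(5, 40)     # technician A
cv_b = coefficient_of_variation(15, 160)   # technician B

print(cv_a)            # 12.5
print(round(cv_b, 1))  # 9.4
```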
Means and standard deviations from grouped data
More often than not, data are presented in grouped form. That is, the data are in part summarized and grouped in a frequency table.
Formulas for calculating the mean and the standard deviation for grouped data:
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
$$s = \sqrt{\frac{\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}}{n - 1}}$$
where $\bar{x}$ = mean of the data set,
$s$ = standard deviation of the data set,
$x_i$ = midpoint of the $i$th class,
$f_i$ = frequency of the $i$th class,
$k$ = number of classes,
$n$ = total number of observations in the data set.
Example
Given below is the frequency distribution for the heights (in centimeters) of a sample of 100 students in the Islamic University. Find the approximate value of the standard deviation of the students' heights.
Frequency of heights of a sample of 100 students in the Islamic University
Class interval   $x_i$   $x_i^2$   $f_i$   $f_i x_i$   $f_i x_i^2$
150-154          152     23,104    9       1,368       207,936
155-159          157     24,649    22      3,454       542,278
160-164          162     26,244    31      5,022       813,564
165-169          167     27,889    24      4,008       669,336
170-174          172     29,584    13      2,236       384,592
175-179          177     31,329    1       177         31,329
Total                              100     16,265      2,649,035
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{16{,}265}{100} = 162.65 \text{ cm}$$
$$s = \sqrt{\frac{\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}}{n - 1}} = \sqrt{\frac{2{,}649{,}035 - 2{,}645{,}502.25}{99}} = \sqrt{35.68} = 5.97 \text{ cm}$$
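The table totals and the final result can be verified with a short Python sketch. It applies the grouped-data formulas to the class midpoints and frequencies of the height table; the variable names are illustrative assumptions.

```python
import math

# Class midpoints (cm) and frequencies from the height table.
midpoints = [152, 157, 162, 167, 172, 177]
freqs = [9, 22, 31, 24, 13, 1]

n = sum(freqs)                                              # 100
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))       # sum of f_i * x_i
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))  # sum of f_i * x_i^2

grouped_mean = sum_fx / n
grouped_var = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
grouped_sd = math.sqrt(grouped_var)

print(sum_fx, sum_fx2)         # 16265 2649035
print(round(grouped_mean, 2))  # 162.65
print(round(grouped_sd, 2))    # 5.97
```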
Note that there is some difference between the results of computations from ungrouped and grouped data. The size of the discrepancy depends on the width of the class interval and on the number of observations within an interval. With short class intervals and large samples, the discrepancy is negligible.
MEASURES OF POSITION: Percentiles, Deciles, and Quartiles
In cases where our data distributions are heavily skewed or even bimodal, we often get a better summary of the distribution by utilizing the relative position of data rather than exact values.
Measures of position are used to describe the location of a particular observation in relation to the rest of the data set.
Recall that the median is an average computed by using the relative position of the data. If we are told that 71 is the median score on a biology test, we know that after the data have been ordered, 50% of the data fall at or below the median value of 71. The median is an example of a percentile; in fact, it is the 50th percentile. The general definition of the Pth percentile follows.
Percentiles
Percentiles are values that divide the ranked data set into 100 equal parts. These values, denoted by P1, P2, …, P99, are such that 1% of the data falls below P1, 2% falls below P2, …, and 99% falls below P99.
[Figure: ranked data from lowest to highest divided by the 1st, 2nd, 3rd, …, 98th, and 99th percentiles, with 1% of the data in each part.]
Deciles
Deciles are values that divide the ranked data set into 10 equal parts. These values, denoted D1, D2, …, D9, are such that 10% of the data falls below D1, 20% falls below D2, …, and 90% falls below D9.
[Figure: ranked data from lowest to highest divided by the 1st to 9th deciles, with 10% of the data in each part.]
Quartiles
Quartiles are values that divide the ranked data set into 4 equal parts. These values, denoted by Q1, Q2, and Q3, are such that 25% of the data falls below Q1, 50% falls below Q2, and 75% falls below Q3.
[Figure: ranked data from lowest to highest divided by Q1, Q2 (the median, 50th percentile), and Q3, with 25% of the data in each part.]
Percentiles, deciles and quartiles
All the quartiles and deciles are percentiles. For example, the 7th decile is the 70th percentile and the 1st quartile is the 25th percentile. Consequently, deciles and quartiles are often stated as percentiles.
The 50th percentile, 5th decile, and 2nd quartile of a distribution are all the same and correspond to the median.
Finding the Percentile that Corresponds to a Data Value
Step 1: Arrange the data in ascending order.
Step 2: Use the following formula to determine the percentile of the score, x:
$$\text{Percentile of } x = \frac{\text{Number of data values less than } x}{\text{Total number of values}} \times 100$$
This percent is then rounded to the nearest whole number (integer) to give the percentile for observation x.
3.0 5.0 6.2 7.6 9.4
3.3 5.2 6.3 7.6 9.5
3.5 5.5 6.4 7.7 9.5
3.5 5.5 6.6 7.8 10.0
3.6 5.5 6.6 7.8 10.5
4.0 5.8 6.8 8.5 10.8
4.0 5.8 6.8 8.5 10.9
4.2 5.9 6.8 8.8 11.0
4.6 6.0 7.0 8.8 11.0
The table contains the ranked aortic diameters measured in centimeters
for 45 patients. Notice that the data in the Table are already ranked. Raw
data need to be ranked prior to finding measures of position.
Finding the Percentile that Corresponds to a
Data Value (5.5 and 10)
Example 1 The number of observations
less than 5.5 is 11. Thus
\[ \frac{11}{45} \times 100 = 24.4\% \]
This percent rounds to 24. The diameter
5.5 is the 24th percentile, and we express
this as P24 = 5.5.
Example 2 The number of observations
less than 10.0 is 39. Thus
\[ \frac{39}{45} \times 100 = 86.7\% \approx 87 \]
and we write P87 = 10.0.
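The two-step rule for the percentile of a data value can be sketched in a few lines. This is a minimal illustration, not a library routine: the name `percentile_of_value` is hypothetical, and rounding half up is an assumption about what "nearest whole number" means at exact .5 values.

```python
import math

# Ranked aortic diameters (cm) for the 45 patients, read column by column
# from the Table.
DIAMETERS = [
    3.0, 3.3, 3.5, 3.5, 3.6, 4.0, 4.0, 4.2, 4.6,
    5.0, 5.2, 5.5, 5.5, 5.5, 5.8, 5.8, 5.9, 6.0,
    6.2, 6.3, 6.4, 6.6, 6.6, 6.8, 6.8, 6.8, 7.0,
    7.6, 7.6, 7.7, 7.8, 7.8, 8.5, 8.5, 8.8, 8.8,
    9.4, 9.5, 9.5, 10.0, 10.5, 10.8, 10.9, 11.0, 11.0,
]


def percentile_of_value(data, x):
    """Return (exact percent, rounded percentile) for the value x."""
    below = sum(1 for v in data if v < x)   # values strictly less than x
    percent = below / len(data) * 100
    # Round half up (assumption); Python's round() would round half to even.
    return percent, math.floor(percent + 0.5)


for x in (5.5, 10.0):
    percent, rank = percentile_of_value(DIAMETERS, x)
    print(f"x = {x}: {percent:.1f}% -> P{rank}")
```

The loop should reproduce the two examples: 5.5 at the 24th percentile and 10.0 at the 87th.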
Computing the pth Percentile
The pth percentile of a data set is a value such that p percent of the
observations are less than this value and (100 - p) percent of the
observations are more than this value.
The pth percentile for a ranked data set consisting of n observations
is found by a two-step procedure.
The first step is to compute the index
\[ i = \frac{(p)(n)}{100} \]
If i is not an integer, round up to the next highest integer. Locate
the ith value of the data set written in ascending order. This
number represents the pth percentile.
If i is an integer, the pth percentile is the average of the
observations in positions i and i + 1 in the ranked data set.
EXAMPLE
To find the tenth percentile for the data of the
Table, compute
\[ i = \frac{(10)(45)}{100} = 4.5 \]
The next integer greater than 4.5 is 5. The
observation in the fifth position in the Table
is 3.6.
Therefore, P10 = 3.6.
Note that at least 10% of the data in the Table
are 3.6 or less (the actual amount is 11.1%)
and at least 90% of the data are 3.6 or more
(the actual amount is 91.1%).
For very large data sets, the percentage of
observations equal to or less than P10 will
be very close to 10% and the percentage of
observations equal to or greater than P10
will be very close to 90%.
EXAMPLE
To find the fortieth percentile for the data in
the Table, compute
\[ i = \frac{(40)(45)}{100} = 18 \]
Because i is an integer, the fortieth percentile is the average of the
observations in the 18th and 19th
positions in the ranked data set.
The observation in the 18th position is 6.0
and the observation in the 19th position
is 6.2.
Therefore
\[ P_{40} = \frac{6.0 + 6.2}{2} = 6.1 \]
Note that 40% of the data in the Table are
6.1 or less and that 60% of the
observations are 6.1 or more.
Procedure to compute quartiles
– Order the data from smallest to largest.
– Find the median. This is the second quartile, Q2.
– The first quartile Q1 is then the median of the lower
half of the data; that is, it is the median of the data
falling below the Q2 position (and not including Q2).
– The third quartile Q3 is the median of the upper half
of the data; that is, it is the median of the data falling
above the Q2 position (and not including Q2).
6, 8, 2, 7, 4, 5, 3
Ordered: 2, 3, 4, 5, 6, 7, 8
Median (Q2) = 5
Lower half: 2, 3, 4, so the Lower Quartile (Q1) = 3
Upper half: 6, 7, 8, so the Upper Quartile (Q3) = 7
Example 2 … Even number
Find the median, and upper and lower quartiles of this
set: 22, 19, 27, 32, 38, 25, 32, 26
First step, order the data:
19, 22, 25, 26, 27, 32, 32, 38
There are eight numbers, so the median is the average
of the fourth and fifth numbers.
Median = (26+27)/2 = 26.5
Lower Quartile = (22+25)/2 = 23.5
Upper Quartile = (32+32)/2 = 32
The lower quartile is the median of the first four numbers,
and the upper quartile is the median of the last four numbers.
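The median-of-halves procedure can be sketched as below. The helper name `quartiles_by_halves` is hypothetical, and the sketch assumes at least two observations. It uses the standard library's `statistics.median`, which averages the two middle values when the count is even. Other quartile conventions exist and can give different answers on the same data.

```python
from statistics import median


def quartiles_by_halves(data):
    """Return (Q1, Q2, Q3) using the median-of-halves procedure."""
    ranked = sorted(data)               # order from smallest to largest
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]               # values below the Q2 position
    upper = ranked[half + (n % 2):]     # skip the median itself when n is odd
    return median(lower), median(ranked), median(upper)


# Odd number of observations.
print(quartiles_by_halves([6, 8, 2, 7, 4, 5, 3]))
# Even number of observations.
print(quartiles_by_halves([22, 19, 27, 32, 38, 25, 32, 26]))
```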
Interquartile Range (IQR)
The interquartile range tells us the spread of the
middle half of the data.
Interquartile range = Upper Quartile - Lower Quartile
Or,
IQR = Q3 – Q1
25% 25% 25% 25%
Q1 Q2 Q3
Outliers
An outlier is a number that is so far above or below most of
the data set as to be considered abnormal and therefore of questionable
accuracy.
Outliers may be from
data collection errors,
data entry errors,
or simply valid but unusual data values.
Regardless of the reason, it is important to identify the outliers in the data
set and examine them carefully to determine whether they are errors.
An outlier is defined to be any data point that is more than 1.5 IQRs below the
lower quartile or more than 1.5 IQRs above the upper quartile.
Outliers
Example
28, 55, 57, 58, 61, 61, 63, 65, 83
UQ = (65+63)/2 = 64
LQ = (55+57)/2 = 56
IQR = 64 – 56 = 8
So any number below LQ – 1.5(IQR) = 56 – 1.5(8) = 44
or any number above UQ + 1.5(IQR) = 64 + 1.5(8) = 76
is an outlier.
Therefore the outliers of this data set are 28 & 83.
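The 1.5 × IQR rule can be checked with a short sketch. The names `quartiles_by_halves` and `find_outliers` are hypothetical, and the quartiles are computed with the median-of-halves procedure described earlier, which is an assumption about the convention in use.

```python
from statistics import median


def quartiles_by_halves(data):
    """Return (Q1, Q2, Q3) using the median-of-halves procedure."""
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    # The median itself is excluded from both halves when n is odd.
    return median(ranked[:half]), median(ranked), median(ranked[half + (n % 2):])


def find_outliers(data, k=1.5):
    """Return (lower fence, upper fence, outliers) for the k * IQR rule."""
    q1, _, q3 = quartiles_by_halves(data)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [v for v in sorted(data) if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers


scores = [28, 55, 57, 58, 61, 61, 63, 65, 83]
low, high, outliers = find_outliers(scores)
print(f"fences: {low} and {high}; outliers: {outliers}")
```

Values strictly outside the two fences are reported, so for this data set the sketch should flag 28 and 83.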
Box-and-Whisker Plots
The quartiles together with the low and high data values
give us a very useful five-number summary of the data
and their spread. The five-number summary includes:
Lowest value, Q1, median, Q3, and highest value.
These five numbers can be used to create a sketch of the
data called a box-and-whisker plot. Box-and-whisker
plots provide another useful technique for describing
data.
[Figure: box-and-whisker plot labeled with the lowest value, Q1, median (Q2), Q3, and highest value.]
To make a Box-and-Whisker plot
1. Draw a vertical scale to include the lowest
and highest data values.
2. To the right of the scale draw a box from
Q1 to Q3.
3. Include a solid line through the box at the
median level.
4. Draw solid lines, called whiskers, from Q1
to the lowest value and from Q3 to the
highest value.
5. Any outliers are marked with an asterisk
(*).
Construct a Box-and-Whisker Plot:
12 15 16 16 17 18 22 22
23 24 25 30 32 33 33 34
41 45 51
minimum value = 12
lower quartile = 17
median = 24
upper quartile = 33
maximum value = 51
[Figure: box-and-whisker plot of the data on a vertical scale from 10 to 60.]
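The five numbers needed for this plot can be computed with a short sketch. The name `five_number_summary` is hypothetical, and the quartiles follow the median-of-halves procedure given earlier, which is an assumption about the intended convention. Drawing the plot itself is left out.

```python
from statistics import median


def five_number_summary(data):
    """Return (lowest, Q1, median, Q3, highest) via median-of-halves."""
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    q1 = median(ranked[:half])              # median of the lower half
    q3 = median(ranked[half + (n % 2):])    # median of the upper half
    return ranked[0], q1, median(ranked), q3, ranked[-1]


values = [12, 15, 16, 16, 17, 18, 22, 22,
          23, 24, 25, 30, 32, 33, 33, 34,
          41, 45, 51]
labels = ("minimum", "lower quartile", "median", "upper quartile", "maximum")
for label, number in zip(labels, five_number_summary(values)):
    print(f"{label} = {number}")
```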
Distribution Shape Based Upon Boxplot
1. Symmetric
If the median is near the center of the box and the two
horizontal lines are approximately equal in length,
then the distribution is roughly symmetric.
2. Skewed Right
If the median is left of the center of the box and/or
the right line is substantially longer than the left
line, the distribution is right skewed.
3. Skewed Left
If the median is right of the center of the box and/or
the left line is substantially longer than the right line,
the distribution is left skewed.
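The three rules can be turned into a rough check on a five-number summary. This is only a sketch: the name `boxplot_shape` is hypothetical, and the tolerance `tol` is an arbitrary assumption standing in for "near the center" and "substantially longer", which the rules do not quantify. When the whisker rule and the median rule point in opposite directions, this sketch simply tests for right skew first.

```python
def boxplot_shape(lowest, q1, med, q3, highest, tol=0.25):
    """Classify the shape suggested by a horizontal boxplot.

    tol is an assumed threshold, not a standard value.
    """
    left_line = q1 - lowest       # length of the left whisker
    right_line = highest - q3     # length of the right whisker
    box = q3 - q1

    # Median position relative to the box center, as a fraction of box width.
    offset = (med - (q1 + q3) / 2) / box if box else 0.0
    # Whisker imbalance as a fraction of total whisker length (-1 to 1).
    total = left_line + right_line
    imbalance = (right_line - left_line) / total if total else 0.0

    if imbalance > tol or offset < -tol:
        return "skewed right"
    if imbalance < -tol or offset > tol:
        return "skewed left"
    return "roughly symmetric"


# Five-number summary with a long right whisker.
print(boxplot_shape(12, 17, 24, 33, 51))
# Perfectly balanced summary.
print(boxplot_shape(1, 3, 5, 7, 9))
```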
E-Book-TOEFL-Masuk-PTN.pdf hahahahaahahahah
RyanRahardjo2
 
L1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptxL1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptx
38NoopurPatel
 
4. Multivariable statistics_Using Stata_2025.pdf
4. Multivariable statistics_Using Stata_2025.pdf4. Multivariable statistics_Using Stata_2025.pdf
4. Multivariable statistics_Using Stata_2025.pdf
axonneurologycenter1
 
定制(意大利Rimini毕业证)布鲁诺马代尔纳嘉雷迪米音乐学院学历认证
定制(意大利Rimini毕业证)布鲁诺马代尔纳嘉雷迪米音乐学院学历认证定制(意大利Rimini毕业证)布鲁诺马代尔纳嘉雷迪米音乐学院学历认证
定制(意大利Rimini毕业证)布鲁诺马代尔纳嘉雷迪米音乐学院学历认证
Taqyea
 
Adopting Process Mining at the Rabobank - use case
Adopting Process Mining at the Rabobank - use caseAdopting Process Mining at the Rabobank - use case
Adopting Process Mining at the Rabobank - use case
Process mining Evangelist
 
Chapter-3-PROBLEM-SOLVING.pdf hhhhhhhhhh
Chapter-3-PROBLEM-SOLVING.pdf hhhhhhhhhhChapter-3-PROBLEM-SOLVING.pdf hhhhhhhhhh
Chapter-3-PROBLEM-SOLVING.pdf hhhhhhhhhh
ChrisjohnAlfiler
 
problem solving.presentation slideshow bsc nursing
problem solving.presentation slideshow bsc nursingproblem solving.presentation slideshow bsc nursing
problem solving.presentation slideshow bsc nursing
vishnudathas123
 
RAG Chatbot using AWS Bedrock and Streamlit Framework
RAG Chatbot using AWS Bedrock and Streamlit FrameworkRAG Chatbot using AWS Bedrock and Streamlit Framework
RAG Chatbot using AWS Bedrock and Streamlit Framework
apanneer
 
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfj
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfjOral Malodor.pptx jsjshdhushehsidjjeiejdhfj
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfj
maitripatel5301
 
2024-Media-Literacy-Index-Of-Ukrainians-ENG-SHORT.pdf
2024-Media-Literacy-Index-Of-Ukrainians-ENG-SHORT.pdf2024-Media-Literacy-Index-Of-Ukrainians-ENG-SHORT.pdf
2024-Media-Literacy-Index-Of-Ukrainians-ENG-SHORT.pdf
OlhaTatokhina1
 

3 descritive statistics measure of central tendency variatio

  • 1. Descriptive Statistics: Numerically Summarizing Data
  • 2. Descriptive Statistics Overview. Properties of numerical data: Central Tendency (Mean, Median, Mode); Variation (Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation); Shape (Skewness, Kurtosis).
  • 3. Introduction: Central tendency. Given a set of data, one invariably wishes to find a value about which the observations tend to cluster. The three most common values are the mean, the median, and the mode. They are known as measures of central tendency: the tendency of a set of data to center around certain numerical values.
  • 4. The Arithmetic Mean. This is what people usually have in mind when they say “average”.
  • 5. The Arithmetic Mean. May be considered the balance point in a distribution of observations. Computed by summing all the observations in the sample and dividing the sum by the number of observations. The sample arithmetic mean is computed using sample data. The sample mean is a statistic:
      $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$
  • 6. In the formula $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$, the symbol $\bar{x}$ (pronounced "x bar") represents the sample mean; $x_1$ is the first and $x_i$ the $i$th in a series of observations. The symbol $\sum$ is the Greek letter sigma and denotes "the sum of." Thus $\sum_{i=1}^{n}$ indicates that the sum is to begin with $i = 1$ and increment by one up to and including the last observation $n$.
  • 7. Example. Consider 7 observations: 4.2, 4.3, 4.7, 4.8, 5.0, 5.1, 9.0. By definition, $\bar{x}$ = (4.2 + 4.3 + 4.7 + 4.8 + 5.0 + 5.1 + 9.0)/7 = 5.3
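The calculation in the example above can be reproduced with a few lines of Python. This is only a minimal sketch; the function name `sample_mean` is an illustrative choice, not something taken from the slides.

```python
# Sample mean of the seven observations from the example.
observations = [4.2, 4.3, 4.7, 4.8, 5.0, 5.1, 9.0]


def sample_mean(values):
    """Sum all observations and divide by the number of observations."""
    return sum(values) / len(values)


x_bar = sample_mean(observations)
print(round(x_bar, 1))  # 5.3
```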
  • 8. The Population Arithmetic Mean. The symbol for the mean of a population is the Greek letter mu, or µ. The population mean is a parameter:
      $\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{\text{sum of the values of all observations in the population}}{\text{total number of observations in the population}}$
  • 9. Weighted Mean. The weighted mean of a set of numbers $x_1, x_2, \ldots, x_n$, with corresponding weights $w_1, w_2, \ldots, w_n$, is computed from the following formula:
      $\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
      Example: Al-Quds Hospital in Gaza pays its hourly employees $16.50, $19.00, or $25.00 per hour. There are 26 hourly employees, 14 of whom are paid at the $16.50 rate, 10 at the $19.00 rate, and 2 at the $25.00 rate. What is the mean hourly rate paid to the 26 employees?
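One way to work the hospital example is to treat the number of employees at each rate as the weight. The sketch below assumes the standard weighted-mean formula (sum of weight times value, divided by the sum of the weights); the names `rates`, `employees`, and `weighted_mean` are illustrative.

```python
# Pay rates and the number of employees paid at each rate (the weights).
rates = [16.50, 19.00, 25.00]
employees = [14, 10, 2]


def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    total = sum(w * x for x, w in zip(values, weights))
    return total / sum(weights)


mean_rate = weighted_mean(rates, employees)
print(round(mean_rate, 2))  # 18.12
```

Under this reading, the mean rate is 471/26, or about $18.12, which sits closer to $16.50 than a simple average of the three rates would because most employees are paid the lowest rate.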
  • 10. Mean of Grouped Data. In a grouped distribution, we use the midpoint of each interval as the x value. Example: find the mean age for the following data.
      Interval (age)   Midpoint ($x_i$)   Frequency ($f_i$)
      1-3              2                  18
      4-6              5                  27
      7-9              8                  34
      10-12            11                 22
      13-15            14                 13
      Total                               114
      $\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{(2 \times 18) + (5 \times 27) + (8 \times 34) + (11 \times 22) + (14 \times 13)}{18 + 27 + 34 + 22 + 13} = \frac{867}{114} = 7.61$ years
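The grouped-mean calculation above can be sketched in Python as follows. The tuple layout `(lower, upper, frequency)` and the function name `grouped_mean` are assumptions made for illustration.

```python
# Each class is (lower bound, upper bound, frequency), taken from the age table.
age_classes = [(1, 3, 18), (4, 6, 27), (7, 9, 34), (10, 12, 22), (13, 15, 13)]


def grouped_mean(classes):
    """Approximate the mean by using each class midpoint as its x value."""
    total_fx = sum(((lo + hi) / 2) * f for lo, hi, f in classes)
    total_f = sum(f for _, _, f in classes)
    return total_fx / total_f


mean_age = grouped_mean(age_classes)
print(round(mean_age, 2))  # 7.61
```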
  • 11. Advantages of the mean: It is a measure that can be calculated and is unique. It is useful for performing statistical procedures such as comparing the means from several data sets.
  • 12. Disadvantages of the mean: It is affected by extreme values. With the extreme value 9.0 included, $\bar{x}$ = (4.2 + 4.3 + 4.7 + 4.8 + 5.0 + 5.1 + 9.0)/7 = 5.3; without it, $\bar{x}$ = (4.2 + 4.3 + 4.7 + 4.8 + 5.0 + 5.1)/6 = 4.7. It would be more representative to calculate the mean without including such an extreme value.
  • 13. The Median. The median of a variable is the numerical value that lies in the middle of the data when arranged in ascending order. That is, half the data is below the median and half the data is above the median.
  • 14. Steps in computing the median of a data set: 1. Arrange the data in ascending order. 2. Determine the number of observations n. 3. Determine the observation in the middle of the data set. If the number of observations is odd, then the median is the data value that is exactly in the middle of the data set. That is, it is the observation that lies in the (n + 1)/2 position. Example: Find the median of the data set consisting of the observations 7, 4, 3, 5, 6, 8, 10. Solution: First, we arrange the data set in ascending order: 3, 4, 5, 6, 7, 8, 10. Since the number of observations is odd, the median is the (7 + 1)/2 = 4th number in the ordered list, namely 6.
  • 15. Steps in computing the median of a data set (continued): If the number of observations is even, then the median is the arithmetic mean of the two middle observations in the data set. That is, it is the arithmetic mean of the data values that lie in the n/2 and (n/2) + 1 positions. Example: Suppose we have the observations 7, 4, 3, 5, 6, 8, 10, 1. Find the median of this data set. Solution: First, we arrange the data set in ascending order: 1, 3, 4, 5, 6, 7, 8, 10. Since the number of observations is n = 8, by definition the median is the average of the 4th (n/2 = 8/2 = 4) and the 5th observations, i.e. Median = (5 + 6)/2 = 5.5
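The odd and even cases described in the two slides above can be written as a single function. This is a minimal sketch under the definitions given there; the function name `median` is illustrative (Python's standard library also provides `statistics.median`).

```python
def median(values):
    """Median following the slide's steps: sort, count, pick the middle."""
    ordered = sorted(values)          # Step 1: ascending order
    n = len(ordered)                  # Step 2: number of observations
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the value in position (n + 1)/2 (1-based), index mid (0-based).
        return ordered[mid]
    # Even n: mean of the values in positions n/2 and n/2 + 1 (1-based).
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 4, 3, 5, 6, 8, 10]))     # 6
print(median([7, 4, 3, 5, 6, 8, 10, 1]))  # 5.5
```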
  • 16. Advantages of the median over the mean: It may be determined even if the values of all observations are not known. For example, in the ordered data 3, 4, 5, 6, $x_1$, $x_2$, $x_3$ the median is 6 whatever the values of $x_1$, $x_2$, $x_3$. Extreme values in a data set do not affect the median as strongly as they do the mean.
  • 17. Example. Suppose 5 physicians who practice in the Gaza Strip are sampled and asked how much an office visit costs, and we get the answers 7.5, 7.5, 8.0, 8.0, and 28.0 JD. The mean charge for the sample of five doctors is
      $\bar{x} = \frac{7.5 + 7.5 + 8.0 + 8.0 + 28.0}{5} = \frac{59.0}{5} = 11.8$ JD
      while the median is 8.0 JD. The median is easily seen to be more representative of the values than the sample mean of 11.8 JD, which was affected by the extreme value of 28.0.
  • 18. Median of grouped data. In a grouped distribution, the following steps are followed: Step 1: Form the cumulative frequency (F). Step 2: Find the value of $n/2$, where $n = \sum f$. Step 3: Find the first F value that exceeds $n/2$, which identifies the median class M. Step 4: Calculate the median using the following formula:
      $\text{Median} = L_M + \frac{n/2 - F_{M-1}}{f_M} \times w$
      where: • $L_M$ = lower bound of the median class • $F_{M-1}$ = cumulative frequency of the class immediately prior to the median class • $f_M$ = actual frequency of the median class • $w$ = median class width.
  • 19. Median of grouped data. Example: Estimate the median age for the following data set.
      Step 1:
      Age      Frequency (f)   Cumulative frequency (F)
      20-25    2               2
      25-30    14              16
      30-35    29              45
      35-40    43              88
      40-45    33              121
      45-50    9               130
      Step 2: $n/2$ = 130/2 = 65. Step 3: The median class is 35-40. Step 4: $L_M$ = 35; $F_{M-1}$ = 45; $f_M$ = 43; $w$ = 5, so Median = 35 + ((65 - 45)/43) × 5 ≈ 37.33 years.
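The four steps for the grouped median can be sketched as a short function. The formula used here is the usual interpolation formula (lower bound plus the fraction of the median class needed to reach n/2, times the class width); the function name and the `(lower, upper, frequency)` layout are assumptions for illustration.

```python
# Each class is (lower bound, upper bound, frequency), from the age example.
age_classes = [(20, 25, 2), (25, 30, 14), (30, 35, 29),
               (35, 40, 43), (40, 45, 33), (45, 50, 9)]


def grouped_median(classes):
    """Estimate the median by interpolating inside the median class."""
    n = sum(f for _, _, f in classes)
    half = n / 2                      # Step 2: n/2
    cumulative = 0                    # Step 1: running cumulative frequency F
    for lower, upper, f in classes:
        if cumulative + f > half:     # Step 3: first F that exceeds n/2
            width = upper - lower
            # Step 4: L + ((n/2 - F_prev) / f) * w
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("classes must contain at least one observation")


print(round(grouped_median(age_classes), 2))  # 37.33
```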
  • 20. The Mode. The mode is the observation that occurs most frequently, i.e., is repeated most often in the data set. For a given sample, N = 16: 33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, 45. The mode = 39. It corresponds to the highest point on the frequency distribution. [Figure: frequency distribution chart]
  • 21. Example. Find the mode of the data set in the table.
      Quantity of glucose (mg%) in blood of 25 students
      70    88    95    101   106
      79    93    96    101   107
      83    93    97    103   108
      86    93    97    103   112
      87    95    98    106   115
  • 22. Solution: First we arrange this data set in ascending order: 70, 79, 83, 86, 87, 88, 93, 93, 93, 95, 95, 96, 97, 97, 98, 101, 101, 103, 103, 106, 106, 107, 108, 112, 115. This data set contains 25 numbers. We see that the value 93 is repeated most often. Therefore, the mode of the data set is 93.
  • 23. Multimodal distribution: A data set may have several modes. In this case it is called a multimodal distribution. Example:
      0    2    6    9
      0    4    6    10
      1    4    7    11
      1    4    8    11
      1    5    9    12
      The data set has two modes: 1 and 4. This distribution is called a bimodal distribution.
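Both mode examples above can be checked with Python's standard library. The sketch below uses `statistics.multimode` (available from Python 3.8), which returns every value tied for the highest count; the variable names are illustrative.

```python
from statistics import multimode

# Glucose data (mg%) for the 25 students.
glucose = [70, 79, 83, 86, 87, 88, 93, 93, 93, 95, 95, 96, 97,
           97, 98, 101, 101, 103, 103, 106, 106, 107, 108, 112, 115]

# Data from the bimodal example.
bimodal = [0, 0, 1, 1, 1, 2, 4, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 11, 11, 12]

# multimode lists modes in order of first appearance; sort for a stable display.
print(sorted(multimode(glucose)))  # [93]
print(sorted(multimode(bimodal)))  # [1, 4]
```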
  • 24. Advantages of the mode: Like the median, the mode is not affected by extreme values. It is easily determined for categorical data. For a given sample, N = 16: 33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, and a largest value of 45, 50, or 60. The mode = 39 in every case.
  • 25. Mode of grouped data. In a grouped distribution, the following steps are followed: Step 1: Determine the modal class (the class with the largest frequency). Step 2: Calculate $\Delta_1$ = difference between the largest frequency and the frequency immediately preceding it. Step 3: Calculate $\Delta_2$ = difference between the largest frequency and the frequency immediately following it. Step 4: Obtain the mode using the following formula:
      $\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times w$
      where: • $L$ = lower bound of the modal class • $w$ = modal class width • $\Delta_1$ and $\Delta_2$ are described in Step 2 and Step 3.
  • 26. Mode of grouped data. Example: Estimate the mode of the age for the following data set.
      Step 1:
      Age      Number (f)
      20-25    2
      25-30    14
      30-35    29
      35-40    43
      40-45    33
      45-50    9
      The modal class is 35-40. Step 2: $\Delta_1$ = 43 - 29 = 14. Step 3: $\Delta_2$ = 43 - 33 = 10. Step 4: $L$ = 35; $w$ = 40 - 35 = 5, so Mode = 35 + (14/(14 + 10)) × 5 ≈ 37.92 years.
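The grouped-mode steps can be sketched the same way. The formula assumed here is the common one, mode = L + Δ1/(Δ1 + Δ2) × w; treating a missing neighbouring class as frequency 0 is an additional assumption of this sketch, as are the names used.

```python
# Each class is (lower bound, upper bound, frequency), from the age example.
age_classes = [(20, 25, 2), (25, 30, 14), (30, 35, 29),
               (35, 40, 43), (40, 45, 33), (45, 50, 9)]


def grouped_mode(classes):
    """Estimate the mode from the modal class and its two neighbours."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                 # Step 1: modal class
    lower, upper, f_modal = classes[m]
    # Assumption: a missing neighbour counts as frequency 0.
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    delta1 = f_modal - f_prev                   # Step 2
    delta2 = f_modal - f_next                   # Step 3
    # Step 4 (undefined if both neighbours equal the modal frequency).
    return lower + delta1 / (delta1 + delta2) * (upper - lower)


print(round(grouped_mode(age_classes), 2))  # 37.92
```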
  • 27. Disadvantages of the mode: Too often, there is no modal value because the data set contains no values that occur more than once. Other times, every value is the mode because every value occurs the same number of times. Clearly, the mode is a useless measure in these cases. For a given sample, N = 16: 33, 33, 34, 34, 35, 35, 36, 36, 37, 37, 38, 38, 39, 39, 40, 40. No unique mode.
  • 28. Disadvantages of the mode: When data sets contain two, three, or many modes, they are difficult to interpret and compare. For a given sample, N = 16: 34, 34, 35, 35, 35, 35, 36, 37, 38, 38, 39, 39, 39, 39, 40, 40. The modes = 35 and 39.
  • 29. The Shape of Distributions. Distributions can be either symmetrical or skewed, depending on whether there are more frequencies at one end of the distribution than the other.
  • 30. Relative Positions of the Mean, Median and the Mode
  • 31. Selecting an Appropriate Measure of Central Tendency. There are two general criteria for choosing between the measures of central tendency:
      1. Scale of measurement
         – Nominal scale data: you can only use the mode.
         – Ordinal scale data: you can only use the median or mode; the median is more informative.
         – Interval or ratio scale data: you can use any one of the three.
      2. Shape of the distribution
         – The mean is more informative if you don’t have a skewed distribution.
         – If you have a skewed distribution, use the median in place of the mean.
  • 33. Measures of Variation (dispersion)
  • 34. Measures of Variation (dispersion). Just as measures of central tendency locate the “center” of a relative frequency distribution, measures of variation measure its “spread”. When the variation is small, this means that the values are close together (but not the same).
  • 35. To understand measures of variation, consider the following two examples.
      Example 1: Night and day temperatures (°C)
                 Country A   Country B
                 22          17
                 36          40
                 23          16
                 35          42
                 20          20
                 34          35
      Average    28.3        28.3
      Example 2: Think of the difference between an exam with an average mark of 65 in which scores ranged from 62 to 66 and an exam with an average score of 65 in which scores ranged from 30 to 90.
  • 36. [Figure: Two frequency distributions (Population 1 and Population 2) with equal means but different amounts of variation.]
  • 37. Measures of variability. Three statistics to measure variability: – Range – Variance – Interquartile range
  • 38. Range. The range is defined as the difference in value between the highest (maximum) and lowest (minimum) observation: Range = $x_{max} - x_{min}$. The range can be computed quickly, but it is not very useful since it considers only the extremes and does not take into consideration the bulk of the observations.
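As a small illustration of the range, the sketch below applies it to the night and day temperatures from the earlier two-country example. It reads the flattened table as Country A = 22, 36, 23, 35, 20, 34 and Country B = 17, 40, 16, 42, 20, 35, which is an assumption about how the columns pair up, though it does reproduce the stated averages of 28.3.

```python
# Assumed reading of the temperature table (degrees Celsius).
country_a = [22, 36, 23, 35, 20, 34]
country_b = [17, 40, 16, 42, 20, 35]


def value_range(values):
    """Range = highest observation minus lowest observation."""
    return max(values) - min(values)


for name, temps in (("A", country_a), ("B", country_b)):
    mean = sum(temps) / len(temps)
    print(name, round(mean, 1), value_range(temps))
# A 28.3 16
# B 28.3 26
```

The two countries share the same mean, yet Country B's temperatures span a wider interval, which is the kind of difference a measure of variation is meant to expose.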
  • 39. Variance. The variance is a measure which uses the mean as a point of reference. The variance is smaller when all values are close to the mean and larger when the values are spread out from the mean.
  • 40. Population variance. The population variance of the observations x is defined by the formula
      $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
      where $\sigma^2$ (sigma squared) = population variance, $x_i$ = the item or observation, $\mu$ = population mean, $N$ = total number of observations in the population.
  • 41. Population variance. The population variance of a variable is the sum of squared deviations about the population mean divided by the number of observations in the population, N. That is, it is the arithmetic mean of the squared deviations about the population mean: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
  • 42. The standard deviation of a population. The standard deviation of a population is equal to the square root of the variance:
      $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
  • 43. Since most populations are large, the computation of $\sigma^2$ and $\sigma$ is rarely performed. In practice, the population variance (or standard deviation) is usually estimated by taking a sample from the population and using $s^2$ and $s$ as estimates of $\sigma^2$ and $\sigma$, respectively.
  • 44. The sample variance. The sample variance of the observations in a sample is defined by the formula
      $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$   OR   $s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$
      where: $s^2$ = sample variance, $\bar{x}$ = sample mean, $n$ = total number of observations in the sample.
  • 45. Standard deviation of the sample. The standard deviation of the sample is $s = \sqrt{s^2}$. It could also be determined from the equations:
      $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$   OR   $s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}}$
  • 46. Remark: In the denominator of the formula for $s^2$ we use $n - 1$ instead of $n$ because statisticians proved that if $s^2$ is defined as above, then $s^2$ is an unbiased estimate of the variance of the population from which the sample was selected (i.e. the expected value of $s^2$ is equal to the population variance). Note: Whenever a statistic consistently overestimates or underestimates a parameter, it is called biased. To obtain an unbiased estimate of the population variance, we divide the sum of the squared deviations about the mean by $n - 1$.
  • 47. Example. A pediatric registrar in a district general hospital is investigating the amount of lead in the urine of children from a nearby housing estate. In a particular street there are 15 children whose ages range from 1 year to under 16, and in a preliminary study the registrar has found the amounts of urinary lead (µmol/24hr) given in the table below. What are the variance and standard deviation?
      Urinary concentration of lead in 15 children from housing estate (µmol/24hr): 0.6, 2.6, 0.1, 1.1, 0.4, 2.0, 0.8, 1.3, 1.2, 1.5, 3.2, 1.7, 1.9, 1.9, 2.2
  • 48. Note: When using the variance formula, do not round until the last computation. Use as many decimals as allowed by your calculator in order to avoid round off errors.
  • 49. Solution: Calculation of standard deviation
      (1) Lead concentration x   (2) Differences from mean   (3) Differences squared   (4) Observations in col. (1) squared
      0.1                        -1.4                        1.96                      0.01
      0.4                        -1.1                        1.21                      0.16
      0.6                        -0.9                        0.81                      0.36
      0.8                        -0.7                        0.49                      0.64
      1.1                        -0.4                        0.16                      1.21
      1.2                        -0.3                        0.09                      1.44
      1.3                        -0.2                        0.04                      1.69
      1.5                        0                           0                         2.25
      1.7                        0.2                         0.04                      2.89
      1.9                        0.4                         0.16                      3.61
      1.9                        0.4                         0.16                      3.61
      2.0                        0.5                         0.25                      4.00
      2.2                        0.7                         0.49                      4.84
      2.6                        1.1                         1.21                      6.76
      3.2                        1.7                         2.89                      10.24
      Total = 22.5               = 0                         = 9.96                    = 43.71
      $n$ = 15, $\bar{x}$ = 1.5
      $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{9.96}{14} = 0.7114$ (µmol/24hr)², and $s = \sqrt{s^2} = 0.843$ µmol/24hr.
      One can apply the following equation as an alternative: $s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$
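Both variance formulas can be checked numerically on the urinary lead data. This is a minimal sketch; the variable names are illustrative, and the computation keeps full precision until the final rounding.

```python
from math import sqrt

# Urinary lead concentrations (µmol/24hr) for the 15 children.
lead = [0.6, 2.6, 0.1, 1.1, 0.4, 2.0, 0.8, 1.3,
        1.2, 1.5, 3.2, 1.7, 1.9, 1.9, 2.2]

n = len(lead)
mean = sum(lead) / n

# Definition: sum of squared deviations about the mean, divided by n - 1.
var_definition = sum((x - mean) ** 2 for x in lead) / (n - 1)

# Alternative: (sum of x^2 - (sum of x)^2 / n) / (n - 1).
var_shortcut = (sum(x * x for x in lead) - sum(lead) ** 2 / n) / (n - 1)

sd = sqrt(var_definition)

print(round(mean, 1))            # 1.5
print(round(var_definition, 4))  # 0.7114
print(round(sd, 3))              # 0.843
```

The two variance expressions agree up to floating-point rounding, which is why the alternative equation is often preferred for hand calculation.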
  • 50. Coefficient of Variation. One important application of the mean and the standard deviation is the coefficient of variation. It is defined as the ratio of the standard deviation to the value of the mean, expressed as a percentage:
      $cv = \frac{\text{standard deviation}}{\bar{x}} \times 100\%$
      Since both the standard deviation and the mean are expressed in the same units, $cv$ is unitless, or dimensionless. Therefore, it is possible to use it to compare the relative variation of even unrelated quantities. It is also useful in comparing the variability among different variables that vary in the magnitude of their values (elephant weight versus mouse weight).
  • 51. Example. Suppose that each day laboratory technician A completes 40 analyses with a standard deviation of 5. Technician B completes 160 analyses per day with a standard deviation of 15. Which employee shows less variability? At first glance, it appears that technician B has three times more variation in the output rate than technician A. But B completes analyses at a rate 4 times faster than A. Taking all this information into account, we compute the coefficient of variation for both technicians. For technician A: $cv$ = 5/40 × 100% = 12.5%. For technician B: $cv$ = 15/160 × 100% = 9.4%. So we find that technician B, who has more absolute variation in output than technician A, has less relative variation.
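The technician comparison reduces to two divisions, sketched below. The function name `coefficient_of_variation` is an illustrative choice.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


cv_a = coefficient_of_variation(5, 40)     # technician A
cv_b = coefficient_of_variation(15, 160)   # technician B

print(cv_a)             # 12.5
print(round(cv_b, 1))   # 9.4
print(cv_b < cv_a)      # True: B has less relative variation
```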
  • 52. Means and standard deviations from grouped data. More often than not, data are presented in grouped form. That is, the data are in part summarized and grouped in a frequency table.
  • 53. Formulas for calculating the mean and the standard deviation for grouped data:
      $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$,   $s = \sqrt{\frac{\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}}{n - 1}}$
      where $\bar{x}$ = mean of the data set, $s$ = standard deviation of the data set, $x_i$ = midpoint of the ith class, $f_i$ = frequency of the ith class, $k$ = number of classes, $n$ = total number of observations in the data set.
  • 54. Example. Given below is the frequency distribution for the heights (in centimeters) of a sample of 100 students in the Islamic University. Find the approximate value of the standard deviation for the students.
      Frequency of heights of a sample of 100 students in the Islamic University
      Class interval   $x_i$   $x_i^2$   $f_i$   $f_i x_i$   $f_i x_i^2$
      150-154          152     23,104    9       1,368       207,936
      155-159          157     24,649    22      3,454       542,278
      160-164          162     26,244    31      5,022       813,564
      165-169          167     27,889    24      4,008       669,336
      170-174          172     29,584    13      2,236       384,592
      175-179          177     31,329    1       177         31,329
      Total                              100     16,265      2,649,035
  • 55. From the frequency table, $n = \sum f_i = 100$, $\sum f_i x_i = 16{,}265$ and $\sum f_i x_i^2 = 2{,}649{,}035$, so
      $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{16{,}265}{100} = 162.65$ cm
      $s = \sqrt{\frac{\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}}{n - 1}} = \sqrt{\frac{2{,}649{,}035 - 2{,}645{,}502.25}{99}} = \sqrt{35.68} = 5.97$ cm
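The grouped mean and standard deviation for the height data can be reproduced as follows. This sketch applies the grouped-data formulas with class midpoints standing in for the individual observations; the variable names are illustrative.

```python
from math import sqrt

# (class midpoint in cm, frequency) for the 100 students.
height_classes = [(152, 9), (157, 22), (162, 31),
                  (167, 24), (172, 13), (177, 1)]

n = sum(f for _, f in height_classes)                # total observations
sum_fx = sum(f * x for x, f in height_classes)       # sum of f_i * x_i
sum_fx2 = sum(f * x * x for x, f in height_classes)  # sum of f_i * x_i^2

mean = sum_fx / n
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
sd = sqrt(variance)

print(n, sum_fx, sum_fx2)   # 100 16265 2649035
print(round(mean, 2))       # 162.65
print(round(variance, 2))   # 35.68
print(round(sd, 2))         # 5.97
```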
  • 56. Note thatNote that there isthere is some difference betweensome difference between results fromresults from computations ungrouped and grouped data. The size of thecomputations ungrouped and grouped data. The size of the discrepancy depends ondiscrepancy depends on width of the class intervalwidth of the class interval and onand on thethe number of observations within an intervalnumber of observations within an interval.. With shortWith short class intervals and large samples, the discrepancy isclass intervals and large samples, the discrepancy is negligible.negligible.
  • 57. MEASURES OF POSITION: Percentiles, Deciles, and Quartiles
      In cases where the data distribution is heavily skewed or even bimodal, we often get a better summary of the distribution by using the relative position of the data rather than exact values. Measures of position are used to describe the location of a particular observation in relation to the rest of the data set.
      Recall that the median is an average computed by using the relative position of the data. If we are told that 71 is the median score on a biology test, we know that after the data have been ordered, 50% of the data fall at or below the median value of 71. The median is an example of a percentile; in fact, it is the 50th percentile. The general definition of the Pth percentile follows.
  • 58. Percentiles
      Percentiles are values that divide the ranked data set into 100 equal parts. These values, denoted by P1, P2, ..., P99, are such that 1% of the data falls below P1, 2% falls below P2, ..., and 99% falls below P99.
      [Figure: ranked data from lowest to highest, split by the 1st through 99th percentiles into parts of 1% each.]
  • 59. Deciles
      Deciles are values that divide the ranked data set into 10 equal parts. These values, denoted by D1, D2, ..., D9, are such that 10% of the data falls below D1, 20% falls below D2, ..., and 90% falls below D9.
      [Figure: ranked data from lowest to highest, split by the 1st through 9th deciles into parts of 10% each.]
  • 60. Quartiles
      Quartiles are values that divide the ranked data set into 4 equal parts. These values, denoted by Q1, Q2, and Q3, are such that 25% of the data falls below Q1, 50% falls below Q2, and 75% falls below Q3.
      [Figure: ranked data from lowest to highest, split by Q1, Q2, and Q3 into parts of 25% each; Q2 is the median (50th percentile).]
  • 61. Percentiles, deciles and quartiles
      All the quartiles and deciles are percentiles. For example, the 7th decile is the 70th percentile and the 1st quartile is the 25th percentile. Consequently, deciles and quartiles are often stated as percentiles.
      The 50th percentile, 5th decile, and 2nd quartile of a distribution are all the same and correspond to the median.
  • 62. Finding the Percentile that Corresponds to a Data Value
      Step 1: Arrange the data in ascending order.
      Step 2: Use the following formula to determine the percentile of the score, x:
      $\text{Percentile of } x = \dfrac{\text{Number of data values less than } x}{\text{Total number of values}} \cdot 100$
      This percent is then rounded to the nearest whole number (integer) to give the percentile for observation x.
  • 63. Finding the Percentile that Corresponds to a Data Value (5.5 and 10.0)
      The table contains the ranked aortic diameters, measured in centimeters, for 45 patients. Notice that the data in the table are already ranked (read down each column). Raw data need to be ranked prior to finding measures of position.
       3.0    5.0    6.2    7.6     9.4
       3.3    5.2    6.3    7.6     9.5
       3.5    5.5    6.4    7.7     9.5
       3.5    5.5    6.6    7.8    10.0
       3.6    5.5    6.6    7.8    10.5
       4.0    5.8    6.8    8.5    10.8
       4.0    5.8    6.8    8.5    10.9
       4.2    5.9    6.8    8.8    11.0
       4.6    6.0    7.0    8.8    11.0
      Example 1: The number of observations less than 5.5 is 11, so $\dfrac{11}{45} \cdot 100 = 24.4\%$. This percent rounds to 24. The diameter 5.5 is the 24th percentile, and we express this as P24 = 5.5.
      Example 2: The number of observations less than 10.0 is 39. Thus $\dfrac{39}{45} \cdot 100 = 86.7\% \approx 87$, and we write P87 = 10.0.
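The two-step rule for the percentile of a data value can be sketched in a few lines. The function name `percentile_of_value` is a hypothetical helper, not something defined in the slides; the list `diameters` holds the 45 aortic diameters from the table. Half-up rounding is assumed for "nearest whole number", since Python's built-in `round` rounds exact halves to the nearest even integer.

```python
import math

# The 45 ranked aortic diameters (cm) from the table.
diameters = [
    3.0, 3.3, 3.5, 3.5, 3.6, 4.0, 4.0, 4.2, 4.6,
    5.0, 5.2, 5.5, 5.5, 5.5, 5.8, 5.8, 5.9, 6.0,
    6.2, 6.3, 6.4, 6.6, 6.6, 6.8, 6.8, 6.8, 7.0,
    7.6, 7.6, 7.7, 7.8, 7.8, 8.5, 8.5, 8.8, 8.8,
    9.4, 9.5, 9.5, 10.0, 10.5, 10.8, 10.9, 11.0, 11.0,
]

def percentile_of_value(data, x):
    """Percentile of x: percent of values strictly less than x, rounded."""
    below = sum(1 for v in data if v < x)   # counting does not require sorting
    percent = 100 * below / len(data)
    return math.floor(percent + 0.5)        # round half up (assumed convention)

print(percentile_of_value(diameters, 5.5))   # 24, as in Example 1
print(percentile_of_value(diameters, 10.0))  # 87, as in Example 2
```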
  • 64. Computing the pth Percentile
      The pth percentile of a data set is a value such that p percent of the observations are less than this value and (100 - p) percent of the observations are more than this value.
      The pth percentile for a ranked data set consisting of n observations is found by a two-step procedure. The first step is to compute the index $i = \dfrac{(p)(n)}{100}$.
      If i is not an integer, round up to the next highest integer and locate the ith value of the data set written in ascending order. This number represents the pth percentile.
      If i is an integer, the pth percentile is the average of the observations in positions i and i + 1 in the ranked data set.
  • 65. EXAMPLE
      To find the tenth percentile for the 45 ranked aortic diameters in the Table, compute $i = \dfrac{(10)(45)}{100} = 4.5$. The next integer greater than 4.5 is 5. The observation in the fifth position in the Table is 3.6. Therefore, P10 = 3.6.
      Note that at least 10% of the data in the Table are 3.6 or less (the actual amount is 11.1%) and at least 90% of the data are 3.6 or more (the actual amount is 91.1%). For very large data sets, the percentage of observations equal to or less than P10 will be very close to 10%, and the percentage of observations equal to or greater than P10 will be very close to 90%.
  • 66. EXAMPLE
      To find the fortieth percentile for the 45 ranked aortic diameters in the Table, compute $i = \dfrac{(40)(45)}{100} = 18$. Because i is an integer, the fortieth percentile is the average of the observations in the 18th and 19th positions in the ranked data set. The observation in the 18th position is 6.0 and the observation in the 19th position is 6.2. Therefore $P_{40} = \dfrac{6.0 + 6.2}{2} = 6.1$.
      Note that 40% of the data in the Table are 6.1 or less and that 60% of the observations are 6.1 or more.
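The index rule for the pth percentile can be sketched as follows. The function name `pth_percentile` is an illustrative assumption; the sketch takes p as a whole number between 1 and 99 so that the "is i an integer?" test can be done with exact integer arithmetic. Statistical packages often use other interpolation rules, so their results may differ slightly from this rule.

```python
# The 45 ranked aortic diameters (cm) from the table.
diameters = [
    3.0, 3.3, 3.5, 3.5, 3.6, 4.0, 4.0, 4.2, 4.6,
    5.0, 5.2, 5.5, 5.5, 5.5, 5.8, 5.8, 5.9, 6.0,
    6.2, 6.3, 6.4, 6.6, 6.6, 6.8, 6.8, 6.8, 7.0,
    7.6, 7.6, 7.7, 7.8, 7.8, 8.5, 8.5, 8.8, 8.8,
    9.4, 9.5, 9.5, 10.0, 10.5, 10.8, 10.9, 11.0, 11.0,
]

def pth_percentile(data, p):
    """pth percentile using the index i = p * n / 100 (p an integer, 1..99)."""
    if not (isinstance(p, int) and 1 <= p <= 99):
        raise ValueError("p must be an integer between 1 and 99")
    ranked = sorted(data)
    n = len(ranked)
    whole, remainder = divmod(p * n, 100)   # i = whole + remainder / 100
    if remainder:
        # i is not an integer: round up to position whole + 1 (list index `whole`).
        return ranked[whole]
    # i is an integer: average the values in positions i and i + 1.
    return (ranked[whole - 1] + ranked[whole]) / 2

print(pth_percentile(diameters, 10))            # 3.6 (5th position)
print(round(pth_percentile(diameters, 40), 2))  # 6.1 (average of 6.0 and 6.2)
```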
  • 67. Procedure to compute quartiles
      – Order the data from smallest to largest.
      – Find the median. This is the second quartile.
      – The first quartile Q1 is then the median of the lower half of the data; that is, it is the median of the data falling below the Q2 position (and not including Q2).
      – The third quartile Q3 is the median of the upper half of the data; that is, it is the median of the data falling above the Q2 position (and not including Q2).
      Example: the data 6, 8, 2, 7, 4, 5, 3 are ordered as 2, 3, 4, 5, 6, 7, 8. The median (Q2) is 5; the lower half is 2, 3, 4, so the lower quartile (Q1) is 3; the upper half is 6, 7, 8, so the upper quartile (Q3) is 7.
  • 68. Example 2: Even number of observations
      Find the median and the upper and lower quartiles of this set: 22, 19, 27, 32, 38, 25, 32, 26
      First step, order the data: 19, 22, 25, 26, 27, 32, 32, 38
      There are eight numbers, so the median is the average of the fourth and fifth numbers. The lower quartile is the median of the first four numbers, and the upper quartile is the median of the last four numbers.
      Median = (26 + 27)/2 = 26.5
      Lower Quartile = (22 + 25)/2 = 23.5
      Upper Quartile = (32 + 32)/2 = 32
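The median-of-halves procedure can be sketched as below. The helper names `median_of_ranked` and `quartiles` are illustrative assumptions, and the sketch assumes at least two observations. When the number of observations is odd, the median itself is left out of both halves, as the procedure states.

```python
def median_of_ranked(ranked):
    """Median of a list that is already sorted in ascending order."""
    n = len(ranked)
    mid = n // 2
    if n % 2:                                  # odd count: middle value
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2  # even count: average of middle two

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves rule (needs len >= 2)."""
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]                      # values below the Q2 position
    upper = ranked[half + 1:] if n % 2 else ranked[half:]  # values above it
    return median_of_ranked(lower), median_of_ranked(ranked), median_of_ranked(upper)

print(quartiles([6, 8, 2, 7, 4, 5, 3]))             # (3, 5, 7)
print(quartiles([22, 19, 27, 32, 38, 25, 32, 26]))  # (23.5, 26.5, 32.0)
```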
  • 69. Interquartile Range (IQR)
      The interquartile range tells us the spread of the middle half of the data.
      Interquartile range = Upper Quartile - Lower Quartile, or IQR = Q3 - Q1
      [Figure: ranked data split by Q1, Q2, and Q3 into four parts of 25% each; the IQR spans Q1 to Q3.]
  • 70. Outliers
      An outlier is a number that is so far above or below most of the data set as to be considered abnormal and therefore of questionable accuracy.
      Outliers may come from data collection errors or data entry errors, or may simply be valid but unusual data values. Regardless of the reason, it is important to identify the outliers in the data set and examine them carefully to determine if they are errors.
      An outlier is defined to be any data point that is more than 1.5 IQRs below the lower quartile or above the upper quartile.
  • 71. Outliers: Example
      28, 55, 57, 58, 61, 61, 63, 65, 83
      UQ = (65 + 63)/2 = 64
      LQ = (55 + 57)/2 = 56
      IQR = 64 - 56 = 8
      So any number below LQ - 1.5(IQR) = 56 - 1.5(8) = 44 or any number above UQ + 1.5(IQR) = 64 + 1.5(8) = 76 is an outlier.
      Therefore the outliers of this data set are 28 and 83.
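The 1.5 IQR rule can be sketched as below. The function name `find_outliers` and its parameter `k` are illustrative assumptions; quartiles are computed with the same median-of-halves rule used in the quartile slides, and `statistics.median` comes from the Python standard library.

```python
from statistics import median

def find_outliers(data, k=1.5):
    """Return (lower fence, upper fence, outliers) using the k * IQR rule."""
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    q1 = median(ranked[:half])             # median of the lower half
    q3 = median(ranked[half + (n % 2):])   # median of the upper half (skips Q2 if n is odd)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return low, high, [v for v in ranked if v < low or v > high]

low, high, outliers = find_outliers([28, 55, 57, 58, 61, 61, 63, 65, 83])
print(low, high, outliers)   # 44.0 76.0 [28, 83]
```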
  • 72. Box-and-Whisker Plots
      The quartiles, together with the low and high data values, give us a very useful five-number summary of the data and their spread. The five-number summary consists of the lowest value, Q1, the median, Q3, and the highest value.
      These five numbers can be used to create a sketch of the data called a box-and-whisker plot. Box-and-whisker plots provide another useful technique for describing data.
      [Figure: box-and-whisker plot marking the lowest value, Q1, Q2 (median), Q3, and the highest value.]
  • 73. To make a Box-and-Whisker plot
      1. Draw a vertical scale to include the lowest and highest data values.
      2. To the right of the scale, draw a box from Q1 to Q3.
      3. Include a solid line through the box at the median level.
      4. Draw solid lines, called whiskers, from Q1 to the lowest value and from Q3 to the highest value.
      5. Any outliers are marked with an asterisk (*).
      [Figure: vertical box-and-whisker plot labeled with the lowest value, Q1, median, Q3, and highest value; outliers are shown as asterisks.]
  • 74. Construct a Box-and-Whisker Plot:
      Data: 12 15 16 16 17 18 22 22 23 24 25 30 32 33 33 34 41 45 51
      minimum value = 12
      lower quartile = 17
      median = 24
      upper quartile = 33
      maximum value = 51
      [Figure: box-and-whisker plot of the data on a vertical scale from 10 to 60.]
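The five-number summary behind this plot can be reproduced with a short sketch. The function name `five_number_summary` is a hypothetical helper, not from the slides; it uses the median-of-halves quartile rule from the earlier slides and `statistics.median` from the Python standard library.

```python
from statistics import median

def five_number_summary(data):
    """Return (lowest value, Q1, median, Q3, highest value)."""
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    q1 = median(ranked[:half])             # median of the lower half
    q3 = median(ranked[half + (n % 2):])   # median of the upper half (skips Q2 if n is odd)
    return ranked[0], q1, median(ranked), q3, ranked[-1]

values = [12, 15, 16, 16, 17, 18, 22, 22, 23, 24,
          25, 30, 32, 33, 33, 34, 41, 45, 51]
print(five_number_summary(values))   # (12, 17, 24, 33, 51)
```

These five numbers are what the box (Q1 to Q3), the median line, and the whisker ends of the plot represent.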
  • 75. Distribution Shape Based Upon Boxplot
      1. Symmetric: If the median is near the center of the box and the two horizontal lines (whiskers) are of approximately equal length, then the distribution is roughly symmetric.
  • 76. 2. Skewed Right: If the median is left of the center of the box and/or the right line is substantially longer than the left line, the distribution is right skewed.
  • 77. 3. Skewed Left: If the median is right of the center of the box and/or the left line is substantially longer than the right line, the distribution is left skewed.

Editor's Notes

  • #32: First, consider the scale of measurement. Nominal: only the mode can be used. This is fairly self-explanatory; e.g., there is no center point between psychology and business majors. Ordinal: median or mode. The median is typically more informative because it gives the middle of the data. Consider a variable with 4 levels: A, B, C, or D. If the median is located in A, then at least half of the people did really well on the quiz, because as many people made an A as made a B, C, or D combined. Interval or ratio: all three can be used, but the mean is most informative if the distribution is normal or not skewed.