Basic Biostatistics
Dr. Ei Ei Zar Nyi
Assistant Director
Central Epidemiology Unit
Statistics
• Statistics is a field of study concerned with
(1) collection, organization, summarization and analysis of data
and
(2) the drawing of inferences about a body of data when only a
part of the data is observed.
Biostatistics (Biomedical statistics)
• When the data analyzed are derived from the biological
sciences and medicine, we use the term biostatistics.
Synonym = Medical statistics
Uses
Biostatistics is necessary
• To measure the status of health and disease in a community.
• To provide the basis not only for monitoring the health status of the
community but also for the scientific advancement of
medicine.
• For the collection, analysis, and interpretation of scientific data
gathered from clinical, laboratory or field investigation.
• Because clear thinking and a sound understanding of statistical methods
are fundamental to any research project.
Descriptive Statistics
• Descriptive statistics are used by researchers to report on
populations and samples.
• Descriptive Statistics are a means of organizing and
summarizing observations, which provide us with an overview
of the general features of the set of data.
• Raw data:
Measurements which have not been organized, summarized, or
otherwise manipulated
• Descriptive measures:
Single numbers calculated from organized and summarized data to
describe these data, e.g. a percentage or an average
Sample vs. Population
                      Population    Sample
Descriptive measure   Parameter     Statistic
Size                  N             n
Mean                  μ             x̄
Variance              σ²            s²
SD                    σ             s
Descriptive Statistics
Class A--IQs of 13 Students
102 115
128 109
131 89
98 106
140 119
93 97
110
Class B--IQs of 13 Students
127 162
131 103
96 111
80 109
93 87
120 105
109
An Illustration:
Which Group is Smarter?
Each individual may be different. If you try to understand a group by remembering the qualities
of each member, you become overwhelmed and fail to understand the group.
Descriptive Statistics
Which group is smarter now?
Class A--Average IQ Class B--Average IQ
110.54 110.23
They’re roughly the same!
With a summary descriptive statistic, it is much easier to answer
our question.
Descriptive Statistics
Types of descriptive statistics:
• Organize Data
– Tables
– Graphs
• Summarize Data
– Central Tendency
– Variation
Descriptive Statistics
Types of descriptive statistics: (Data Presentation)
• Organizing Data
– Tables
• Simple table
• Frequency Distribution table
• Contingency table
• Correlation table
– Graphs
• Bar Chart
• Pie chart
• Histogram
• Frequency Polygon
• Line diagram
• Stem and Leaf Plot
• Box Plots
Simple Table
State Population
State A 5,000
State B 70,000
State C 30,000
State D 150,000
Table (1) Population of some states in country X
Source: Census of country X, 2000
Frequency Distribution table
Table (2) Age distribution of study population
Age group Frequency Percentage
0-4 years 15 30
5-9 years 20 40
10-14 years 5 10
≥15 years 10 20
Total 50 100
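The percentage column of Table (2) can be reproduced from the frequency column. The following is a minimal sketch, assuming the four age groups and counts shown in the table; the variable names are illustrative only.

```python
# Frequencies copied from Table (2); percentages are recomputed from them.
age_groups = {"0-4 years": 15, "5-9 years": 20, "10-14 years": 5, ">=15 years": 10}

total = sum(age_groups.values())
percentages = {group: 100 * count / total for group, count in age_groups.items()}

for group, count in age_groups.items():
    print(f"{group:<12} {count:>3} {percentages[group]:>5.0f}%")
print(f"{'Total':<12} {total:>3} {sum(percentages.values()):>5.0f}%")
```

Each percentage is simply the group frequency divided by the total of 50, multiplied by 100.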
Contingency table
Table (3) Association between Sex and smoking status among study population
Sex Smoking + Smoking - Total
Male 80 6 86
Female 70 4 74
Total 150 10 160
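A contingency table is usually read through its row totals and row percentages. The sketch below recomputes them from the counts in Table (3); the dictionary layout and names are assumptions made for illustration.

```python
# Counts copied from Table (3): (smokers, non-smokers) for each sex.
table = {"Male": (80, 6), "Female": (70, 4)}

smoking_pct = {}
for sex, (smokers, non_smokers) in table.items():
    row_total = smokers + non_smokers
    smoking_pct[sex] = 100 * smokers / row_total
    print(f"{sex:<7} total={row_total:>3}  smoking={smoking_pct[sex]:.1f}%")

grand_total = sum(s + n for s, n in table.values())
print("Grand total:", grand_total)
```

Comparing the two row percentages is the first informal step in judging whether smoking status differs by sex.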
Correlation table
Age        Weight
1 month    6 lbs
2 months   10 lbs
3 months   14 lbs
Bar chart
• Consists of a set of vertical or horizontal bars
• All bars have the same width
• The height (or length) of each bar represents the frequency of each specific category
• Equal space between bars
• The purpose of a bar chart is to compare the categories of the same
variable
[Figure: simple vertical bar chart and simple horizontal bar chart of frequency by education level (illiterate, primary, middle, high, graduate)]
Bar chart
[Figure: multiple bar chart and component bar chart of frequency by education level (illiterate, primary, middle, high, graduate), each split into urban and rural]
Pie chart
[Figure: pie chart of education level (illiterate, primary, middle, high, graduate)]
• A circle containing 360 degrees
• A pie chart is best adapted for illustrating how the whole is
sub-divided into segments
• Segments can be colored or shaded differently for greater clarity
Histogram
• A special type of bar graph showing frequency distribution
• It consists of a set of columns with no space between each of them
• The area under each column represents the frequency of each class
• If the data have been grouped into unevenly spaced intervals, a histogram is
the most suitable kind of diagram
Frequency Polygon
• A special kind of line graph connecting midpoints at the tops of bars or
cells of histogram
• Total area under the frequency polygon is equal to that of histogram
Line Diagram
[Figure: line diagram of DHF incidence during 2007 in X hospital, by month (Jan to Dec)]
• Most commonly used for showing changes of values with the passage of
time
Stem and leaf plot
• It resembles a histogram and serves the same purpose (showing the range of
the data set, the location of the highest concentration of measurements, and
the presence or absence of symmetry)
Box plot
Descriptive Statistics
• Summarizing Data
– Central Tendency (or Groups’ “Middle Values”)
• Mean
• Median
• Mode
– Variation (or Summary of Differences Within Groups)
• Range
• Standard Deviation
• Variance
• Coefficient of variation
Measures of Central Tendency
• Statistic : A descriptive measure computed from the data of a
sample
• Parameter : A descriptive measure computed from the data of
a population
• Most commonly used measures of central tendency:
Mean, Median, Mode
Mean
• Also called the 'average'
• Obtained by adding all the values in a population or a sample and
dividing by the number of values that are added
Formula of the mean : For a finite population: μ = ∑ xi / N
: For a sample : x̄ = ∑ xi / n
E.g. Mean age (years) of the following 9 subjects
56, 54, 61, 60, 54, 44, 49, 50, 63
x̄ = ∑ xi / n
= (56+54+61+60+54+44+49+50+63) / 9
= 491 / 9
= 54.56 years
Properties of the mean
• Uniqueness
• Simplicity
• Being influenced by extreme values
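The worked example for the mean age can be checked in a few lines. This is a small sketch using only the nine ages listed above; the quotient 491 / 9 is 54.555..., which rounds to 54.56 at two decimals.

```python
from statistics import mean

ages = [56, 54, 61, 60, 54, 44, 49, 50, 63]

total = sum(ages)              # numerator of the formula: sum of all values
n = len(ages)                  # denominator: number of values
mean_age = total / n

print("sum =", total, "n =", n)
print(f"mean = {mean_age:.2f} years")
print(f"statistics.mean gives {mean(ages):.2f}")
```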
Exercises Mean
Class A--IQs of 13 Students
102 115
128 109
131 89
98 106
140 119
93 97
110
Class B--IQs of 13 Students
127 162
131 103
96 111
80 109
93 87
120 105
109
Class A: Σx = 1437, x̄ = Σx / n = 1437 / 13 = 110.54
Class B: Σx = 1433, x̄ = Σx / n = 1433 / 13 = 110.23
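The exercise can be verified with the IQ scores of both classes. The sketch below assumes the 13 scores per class exactly as listed on the slide.

```python
from statistics import mean

class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

for name, scores in (("Class A", class_a), ("Class B", class_b)):
    # The mean is the sum of the scores divided by the number of scores.
    print(f"{name}: sum = {sum(scores)}, n = {len(scores)}, mean = {mean(scores):.2f}")
```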
Mean
1. Means can be badly affected by outliers (data points with
extreme values unlike the rest)
2. Outliers can make the mean a bad measure of central
tendency or common experience
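How strongly a single extreme value pulls the mean can be seen with the Class B IQ scores from the earlier slides. In this sketch the highest score (162) is treated as the extreme value purely for illustration; whether it counts as a true outlier is an assumption.

```python
from statistics import mean

class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

highest = max(class_b)                                  # the most extreme score
without_highest = [x for x in class_b if x != highest]  # remaining 12 scores

mean_all = mean(class_b)
mean_trimmed = mean(without_highest)

print(f"mean with {highest}:    {mean_all:.2f}")
print(f"mean without {highest}: {mean_trimmed:.2f}")
```

Removing one of thirteen values moves the mean by more than four IQ points, which is the sensitivity the two points above describe.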
[Figure: Income in the U.S., showing "All of Us", the mean, and the outlier (Bill Gates)]
Median
• The middle value of a data set arranged from the
lowest to the highest.
• 50% of the values lie below the median and 50% lie above it.
• For an odd number of observations, the median is the middle value.
• For an even number of observations, the median is the average of the
two middle values.
Formula: the median is the ( n + 1) / 2 th value of the ordered data
Properties of Median
• Uniqueness
• Simplicity
• Not drastically affected by extreme values, so it suits skewed distributions
eg. Median age (year) of the following 9 subjects
56, 54, 61, 60, 54, 44, 49, 50, 63
Ordered array → 44, 49, 50, 54, 54, 56, 60, 61, 63
( n + 1) / 2th value → (9 + 1) /2 = 10/ 2 = 5th value
5th value is 54, so median is 54
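The same steps (order the data, find the (n + 1) / 2 th position, read off the value) can be sketched as follows, using the nine ages from the example.

```python
from statistics import median

ages = [56, 54, 61, 60, 54, 44, 49, 50, 63]

ordered = sorted(ages)
n = len(ordered)
position = (n + 1) // 2                 # 1-based position; exact because n is odd
manual_median = ordered[position - 1]   # lists are 0-based, so subtract 1

print("ordered array:", ordered)
print("position:", position, "median:", manual_median)
print("statistics.median:", median(ages))
```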
Median
Median = 109
(six cases above, six below)
Class A--IQs of 13 Students
89
93
97
98
102
106
109
110
115
119
128
131
140
Median
If the first student (IQ 89) were to drop out of Class A, there
would be a new median:
93
97
98
102
106
109
110
115
119
128
131
140
Median = (109 + 110) / 2 = 219 / 2 = 109.5
(six cases above, six below)
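With an even number of observations the median is the average of the two middle values. The sketch below reproduces the drop-out example, assuming the student who leaves is the one with the lowest score (89).

```python
from statistics import median

class_a = sorted([102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110])
remaining = class_a[1:]                 # drop the lowest score (89)

n = len(remaining)                      # 12, an even number
lower_middle = remaining[n // 2 - 1]    # 6th value
upper_middle = remaining[n // 2]        # 7th value
manual_median = (lower_middle + upper_middle) / 2

print("middle values:", lower_middle, upper_middle)
print("median:", manual_median)
print("statistics.median:", median(remaining))
```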
Median
1. The median is largely unaffected by outliers, making it a better
measure of central tendency, better describing the “typical
person” than the mean when data are skewed.
[Figure: income distribution showing “All of Us” and the outlier, Bill Gates]
Median
2. If the recorded values for a variable form a symmetric
distribution, the median and mean are identical.
3. In skewed data, the mean lies further toward the skew than
the median.
[Figure: positions of the mean and median in a symmetric distribution and in a skewed distribution]
Mode
• Value most frequently occurring in a set of data
• A data set may have more than one mode
• Can be used for the categorical data.
eg. Modal age (year) of the following 9 subjects
56, 54, 61, 60, 54, 44, 49, 50, 63
54 is modal age
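The mode can be found by counting how often each value occurs. The sketch below uses the nine ages from the example; the `blood_groups` list is hypothetical data added only to show that the mode also works for categorical values and that more than one mode can occur.

```python
from statistics import multimode

ages = [56, 54, 61, 60, 54, 44, 49, 50, 63]
print("modal age(s):", multimode(ages))            # 54 occurs twice

# Hypothetical categorical data with two equally frequent categories.
blood_groups = ["A", "O", "B", "O", "AB", "A"]
print("modal blood group(s):", multimode(blood_groups))
```

`multimode` returns a list, so a data set with several modes is reported in full rather than reduced to a single value.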
Mode
1. It may give you the most likely experience rather than the
“typical” or “central” experience.
2. In symmetric distributions, the mean, median, and mode are
the same.
3. In skewed data, the mean and median lie further toward the
skew than the mode.
[Figure: positions of the mean, median and mode in a symmetric distribution and in a right-skewed distribution]
Measures of dispersion
• Dispersion: synonyms → variation, spread, scatter
• Range: The difference between the largest and smallest values in a
set of data. It is a poor measure of dispersion because it uses only the two extreme values.
R = xL - xS
eg. The range of ages (year) of the following 9 subjects
56, 54, 61, 60, 54, 44, 49, 50, 63
R = xL - xS = 63 – 44 = 19
Range
• The spread, or the distance, between the lowest and highest
values of a variable.
• To get the range for a variable, you subtract its lowest value
from its highest value.
Class A--IQs of 13 Students
102 115
128 109
131 89
98 106
140 119
93 97
110
Class A Range = 140 - 89 = 51
Class B--IQs of 13 Students
127 162
131 103
96 111
80 109
93 87
120 105
109
Class B Range = 162 - 80 = 82
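The three range calculations shown so far can be reproduced with one small helper. `value_range` is a hypothetical name chosen for this sketch.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

ages = [56, 54, 61, 60, 54, 44, 49, 50, 63]
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

print("ages:   ", value_range(ages))
print("Class A:", value_range(class_a))
print("Class B:", value_range(class_b))
```

Although the two classes have almost the same mean, Class B has the wider range, so its scores are more spread out.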
Standard Deviation
• Standard deviation: s for sample
: σ for population
• It measures how far the observations in the data set typically lie from
the mean
• The square root of the variance gives a typical (roughly average) deviation
of the observations from the mean, in the original units of the data.
s.d. = √ variance
Standard Deviation
1. The larger the s.d., the greater the amount of variation around the
mean.
2. s.d. = 0 only when all values are the same (only when you
have a constant and not a “variable”)
3. Like the mean, the s.d. will be inflated by an outlier case
value.
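These points can be illustrated with the two IQ classes from the earlier slides, which have nearly equal means but different spread. This is a sketch using the sample standard deviation (divisor n − 1); the constant list is hypothetical data added to show the s.d. = 0 case.

```python
from statistics import stdev

class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

sd_a = stdev(class_a)                 # sample s.d. of Class A
sd_b = stdev(class_b)                 # sample s.d. of Class B (includes the 162)
sd_constant = stdev([100, 100, 100])  # hypothetical constant: no variation

print(f"Class A s.d. = {sd_a:.2f}")
print(f"Class B s.d. = {sd_b:.2f}")
print(f"constant s.d. = {sd_constant}")
```

Class B's larger s.d. reflects its wider scatter around the mean, partly driven by the single high score of 162.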
Variance
• An average measure of squared deviation of observations from the mean.
• The larger the variance, the further the individual cases are from the
mean.
• The smaller the variance, the closer the individual scores are to the mean.
Variance
• Variance is a number that at first seems complex to calculate.
• Calculating variance starts with a “deviation.”
• A deviation is the distance of a case’s score from the
mean.
variance (s²) = ∑ (xi − x̄)² / (n − 1)
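The sample variance can be built up step by step from the deviations. The sketch below applies the formula to the nine ages used in the earlier examples and compares the result with `statistics.variance`, which uses the same n − 1 divisor.

```python
from statistics import variance

ages = [56, 54, 61, 60, 54, 44, 49, 50, 63]

n = len(ages)
mean_age = sum(ages) / n
deviations = [x - mean_age for x in ages]     # distance of each value from the mean
squared = [d ** 2 for d in deviations]        # squaring removes the signs
s2 = sum(squared) / (n - 1)                   # sample variance

print(f"sum of deviations (about zero): {sum(deviations):.10f}")
print(f"sample variance s^2 = {s2:.2f}")
print(f"statistics.variance = {variance(ages):.2f}")
```

The deviations themselves always sum to zero, which is why they are squared before averaging.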
The coefficient of variation
• Used to compare the dispersion in two sets of data.
• Expresses the standard deviation as a percentage of the mean.
• Useful in comparing the relative variability of different kinds of
characteristics or of measurements in different units.
CV = (s / x̄) × 100 %
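Because the CV is unit-free, it allows variables measured on different scales to be compared. The sketch below compares the ages (years) with the Class A IQ scores (points) from the earlier slides; `coefficient_of_variation` is a hypothetical helper name.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

ages = [56, 54, 61, 60, 54, 44, 49, 50, 63]
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

cv_ages = coefficient_of_variation(ages)
cv_iq = coefficient_of_variation(class_a)

print(f"CV of ages      = {cv_ages:.1f} %")
print(f"CV of Class A IQ = {cv_iq:.1f} %")
```

The variable with the larger CV is relatively more variable, regardless of the units in which it was measured.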
Descriptive Statistics
Summarizing Data:
– Central Tendency (or Groups’ “Middle Values”)
• Mean
• Median
• Mode
– Variation (or Summary of Differences Within Groups)
• Range
• Standard Deviation
• Variance
• Coefficient of variation
– …Wait! There’s more
Normal Distribution
• Symmetrical distribution of data
• Normal curve or Gaussian distribution
• The shape of curve depends on mean and SD
Properties of normal distribution
• Symmetrical, bell-shaped
• The tails approach the baseline but never touch it
• Mean, Median, Mode are the same
• Area under the curve, ± 1 SD = 68.27%
± 2 SD = 95.45%
± 3 SD = 99.73%
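The areas within 1, 2 and 3 SD of the mean follow from the normal distribution itself: the probability of lying within k SD equals erf(k / √2). The sketch below evaluates this with the standard library; `area_within` is a hypothetical helper name.

```python
import math

def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within +/- {k} SD: {100 * area:.2f} %")
```

The result does not depend on the particular mean or SD, because the calculation is done in units of SD.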
Skew Distribution
• If a graph (histogram or frequency polygon) of distribution is asymmetric, the
distribution is said to be skewed.
• Right or positively skewed : if the graph extends further to the right, long tail to the
right.
• Left or negatively skewed : if the graph extends further to the left, long tail to the
left.
[Figure: a right-skewed (Rt) curve and a left-skewed (Lt) curve]
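The direction of skew can also be expressed as a number. The sketch below uses the moment coefficient of skewness (third central moment divided by the 1.5th power of the second), which is one common definition and not necessarily the one intended on the slide. It is applied to the Class B IQ scores from the earlier slides and to their mirror image.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]
mirrored = [-x for x in class_b]     # flipping the data reverses the tail

skew_b = skewness(class_b)
skew_mirrored = skewness(mirrored)

print(f"Class B skewness:  {skew_b:.2f}  (positive: long tail to the right)")
print(f"mirrored skewness: {skew_mirrored:.2f}  (negative: long tail to the left)")
```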
Kurtosis
• A measure of the degree to which a distribution is peaked or flat in
comparison to the normal distribution, whose graph has a bell-shaped
appearance.
• The values below refer to excess kurtosis (kurtosis minus 3), for which
the normal distribution scores 0.
• Mesokurtic : Kurtosis measure = 0
• Leptokurtic : Kurtosis measure > 0
• Platykurtic : Kurtosis measure < 0
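A kurtosis measure that is 0 for the normal distribution is the excess kurtosis, i.e. the fourth standardized moment minus 3. The sketch below assumes that definition and uses two small hypothetical data sets, one flat and one with heavy tails, to show the sign of the measure.

```python
def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (population moments)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - m) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

flat = list(range(1, 11))                        # hypothetical: evenly spread values
heavy_tailed = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10] # hypothetical: sharp peak, two extremes

k_flat = excess_kurtosis(flat)
k_heavy = excess_kurtosis(heavy_tailed)

print(f"flat data:         {k_flat:.2f}  (platykurtic, < 0)")
print(f"heavy-tailed data: {k_heavy:.2f}  (leptokurtic, > 0)")
```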
[Figure: three curves named Mesokurtic (Normal), Leptokurtic and Platykurtic; slide dated 8/29/2018]
Descriptive Statistics
• Now you are qualified to use descriptive statistics!
• Questions?
Thank you!
