Statistics-Chapter-1.pptxheheheueuehehehehehe

STATISTICS
Chapter 1: Introduction (Lesson 1)

DATA are defined as factual
information used as a basis for
reasoning, discussion, or calculation,
so that meaningful conclusions can be
drawn.

Definition
The word statistics is derived
from the Latin word status meaning
“state”. Early uses of statistics involved
compilation of data and graphs
describing various aspects of the state
or country.

Definition
Statistics may refer to:
• actual numbers (data), or
• a method of analysis.

Statistics as actual numbers (data)
• The largest earthquake measured 9.2 on the
Richter scale.
• Men are at least 10 times more likely than
women to commit murder.
• One in every 8 South Africans is HIV positive.
• By the year 2020, there will be 15 people aged
65 and over for every new baby born.

Definition
Statistics is a science,
which deals with the collection,
organization, presentation,
analysis, and interpretation of
quantitative data.

STATISTICS
Statistics is the most important science in the whole
world; for upon it depends the practical applications of
every science and of every art; the one science essential
to all political and social administration, all education,
all organization based on experience, for it only gives
the results of our experience.
--Florence Nightingale

Uses of Statistics
Statistics is used in:
• Education
• Government
• Sports and entertainment
• Industries and other sectors
• Politics

The following are some specific
uses of statistics:
• Surveys are designed to collect early
returns on election day to forecast the
outcome of an election.
• Consumers are sampled to provide
information for predicting product
preference.

• The research physician conducts
experiments to determine the effect
of various drugs and controlled
environmental conditions on humans
in order to infer the appropriate
method of treatment of a particular
disease.

Branches of Statistics
Statistics has two branches:
• Descriptive Statistics
• Inferential Statistics

Descriptive Statistics
It deals with the methods of
organizing, summarizing and presenting
a mass of data so as to yield meaningful
information.

Example #1
The Philippine Atmospheric, Geophysical and
Astronomical Services Administration (PAGASA) measures
the daily amount of rainfall in millimeters. They can use
descriptive statistics to compute the average daily
amount of rainfall every month for the past year. They
can use the results to describe the amount of rainfall for
the past year.
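A minimal sketch of this descriptive computation, assuming the daily readings are stored as (month, rainfall in mm) pairs. The readings below are invented for illustration and are not PAGASA data; the function name `monthly_average_rainfall` is hypothetical.

```python
from collections import defaultdict

# Hypothetical daily readings: (month, rainfall in millimeters).
# These numbers are invented for illustration only.
daily_rainfall = [
    ("Jan", 0.0), ("Jan", 2.5), ("Jan", 1.5),
    ("Feb", 4.0), ("Feb", 0.0),
    ("Mar", 10.0), ("Mar", 6.0), ("Mar", 2.0),
]

def monthly_average_rainfall(readings):
    """Return {month: average daily rainfall in mm} for the given readings."""
    totals = defaultdict(float)  # sum of rainfall per month
    counts = defaultdict(int)    # number of days recorded per month
    for month, mm in readings:
        totals[month] += mm
        counts[month] += 1
    return {month: totals[month] / counts[month] for month in totals}

averages = monthly_average_rainfall(daily_rainfall)
for month, avg in averages.items():
    print(f"{month}: {avg:.2f} mm/day")
```

The resulting monthly averages summarize a year of daily readings in twelve numbers, which is exactly the kind of description the example has in mind.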

Example #2
Given the daily sales performance of a product for
the previous year, we can draw a line chart or a
column (bar) chart to emphasize the
upward/downward movement of the series.
Likewise, we can use descriptive statistics to
calculate a quantity index per quarter to compare
sales by quarter for the previous year.

Inferential Statistics
It deals with making
generalizations about a body of data
when only a part of it is examined.
It comprises the methods in which
the analysis of a subset of the data
leads to predictions or inferences
about the entire set of data.

Sample is part of the population under
consideration.
Population is the set of all individuals or entities
under consideration or study. It may be a finite
or infinite collection of objects, events, or
individuals, with specified class or characteristics
under consideration.

Example #1
To examine the performance of the
country’s financial system, we can use
inferential statistics to arrive at conclusions
that apply to the entire economy using the
data gathered from a sample of companies or
businesses in the country.

Example #2
To determine if reforestation is effective,
we can take a representative portion of
denuded forests and use inferential statistics to
draw conclusions about the effect of
reforestation on all denuded forests.

Example #3
The research division of a certain
pharmaceutical company is investigating
the effectiveness of a new diet pill in
reducing the weight of adult females.

“It is a capital mistake to
theorize before one has data”
Sir Arthur Conan Doyle

Determine whether the following statements
use the area of descriptive statistics or
inferential statistics:
1. A bowler wants to find his bowling average
for the past 12 games.
2. A manager would like to predict, based on
previous years' sales, the sales performance
of a company for the next five years.
3. A politician would like to estimate, based on
an opinion poll, his chance for winning in the
upcoming senatorial election.

4. A teacher wishes to determine the percentage
of students who passed the examination.
5. A basketball player wants to estimate his
chance of winning the most valuable player
award based on his current season averages and
the averages of his opponents.

Definition of Some Statistical Terms
1. Population is the set of all individuals or
entities under consideration or study. It may
be a finite or infinite collection of objects,
events, or individuals, with specified class or
characteristics under consideration.
2. Variable is a characteristic of interest
measurable on each and every individual in
the universe, denoted by any capital letter in
the English alphabet.

Example #1
The PSU Office of Admission is studying the relationship
between the score in the college entrance examination during
application and the general point average (GPA) upon graduation
among graduates of the university from 2010 – 2018.
Population: collection of all graduates of the university from
2010 - 2018.
Variable of interest: score in the college entrance examination
and GPA

Example #2
The Department of Health is interested in determining the
percentage of children below 12 years old infected by the
Hepatitis B/Polio virus in Dagupan City in 2017.
Population: set of all children below 12 years old in Dagupan
City in 2017.
Variable of interest: whether or not the child has ever been
infected by the Hepatitis B/Polio virus.

Example #3
The research division of a certain pharmaceutical
company is investigating the effectiveness of a new
diet pill in reducing the weight of adult females.
Population: set of all adult females who will use the
diet pill
Variables of interest: weight before taking the diet pill
and weight after taking the diet pill.

Types of Variable
Qualitative Variable consists of categories or
attributes, which have non-numerical
characteristics.
Examples: year level, sex, religion etc.
Quantitative Variable consists of numbers
representing counts or measurements
Examples: age, height, weight, grades etc.


Classifications of Quantitative Variable
a. Discrete variables result from either a finite or a
countably infinite number of possible values.
Examples: number of students, number of
books, etc.
b. Continuous variables result from infinitely many
possible values that can be associated with
points on a continuous scale.
Examples: height, weight, grade point
average, etc.

Exercise: Discrete or Continuous?
1. Number of children in a household
2. Number of languages a person speaks
3. Number of people sleeping in stats class
4. Height of children
5. Weight of cars
6. Time to wake up in the morning
7. Speed of the train
8. Age of a person
9. Daily rainfall
10. Time in a race

3. Sample is part of the population under
consideration.
4. Parameter is a numerical measurement
describing some characteristic of a population.
5. Statistic is a numerical measurement
describing some characteristic of a sample.
6. A survey is often conducted to gather opinions
or feedback about a certain topic.
Examples: census survey and sampling
survey
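To see the difference between a parameter and a statistic, the sketch below computes the same measurement, the mean, once on a whole population and once on a simple random sample drawn from it. The population values, the sample size of 4 and the seed are hypothetical choices for illustration.

```python
import random
import statistics

# Hypothetical population: ages of all 10 members of a small club.
population = [18, 19, 19, 20, 21, 22, 22, 23, 25, 31]

# Parameter: a numerical measurement describing the population.
mu = statistics.mean(population)

# Statistic: the same measurement computed on a sample of 4 members.
rng = random.Random(2024)           # fixed seed so the draw is repeatable
sample = rng.sample(population, 4)  # simple random sample without replacement
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu}")
print(f"sample {sample} mean (statistic): {x_bar}")
```

The statistic usually differs from the parameter and changes from one sample to the next; inferential statistics is about drawing conclusions on the parameter from such statistics.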

MEASUREMENT
Measurement is the process of
determining the value or label of the
variable based on what has been
observed.

LEVELS OF MEASUREMENT
Level 1
Nominal is characterized by data that consist of
names, labels, or categories only. Data cannot
be arranged in an ordering scheme.
Examples:
1. Name
2. Religion
3. Civil status
4. Address
5. Sex
6. Degree program

Level 2
Ordinal involves data that may be arranged in
some order, but differences between data
values either cannot be determined or are
meaningless.
Examples:
1. Military rank
2. Job position
3. Year level

Level 3
Interval is like the ordinal level, with the additional
property that meaningful differences between
data values can be determined. There is no
absolute zero.
Examples:
1. IQ score
2. Temperature

Level 4
Ratio is the interval level modified to include
absolute zero.
Examples:
1. Height
2. Width
3. Area
4. Weekly allowance
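One way to summarize the four levels is by the comparisons each one supports: every level keeps the operations of the level below it and adds one more. The sketch below encodes that idea; the `Level` enum, the function `meaningful_operations` and the operation labels are hypothetical names chosen for illustration.

```python
from enum import IntEnum

class Level(IntEnum):
    """Levels of measurement, ordered from lowest (1) to highest (4)."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

def meaningful_operations(level):
    """List the comparisons that make sense at the given level."""
    ops = ["equal / not equal"]            # nominal: categories only
    if level >= Level.ORDINAL:
        ops.append("greater / less than")  # ordinal: values can be ordered
    if level >= Level.INTERVAL:
        ops.append("difference")           # interval: differences are meaningful
    if level >= Level.RATIO:
        ops.append("ratio")                # ratio: absolute zero makes ratios meaningful
    return ops

# Examples taken from the lists above.
examples = {"Religion": Level.NOMINAL, "Military rank": Level.ORDINAL,
            "Temperature": Level.INTERVAL, "Height": Level.RATIO}
for variable, level in examples.items():
    print(f"{variable} ({level.name}): {meaningful_operations(level)}")
```

For instance, a height of 180 cm is twice a height of 90 cm, but without an absolute zero the same kind of "twice as much" statement is not meaningful for an interval variable.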

Exercise: At what level are the following
variables measured?
1. Student number
2. Zip codes
3. Final course grades (4.0, 3.00,…)
4. Lengths of TV commercials
5. Blood pressure
6. Gender
7. Family income
8. Academic rank
9. TIN
10. Distances (km) traveled by a bus

DATA COLLECTION AND PRESENTATION
Chapter 2: Data Collection and Presentation (Lesson 1)

Methods of Data Collection
In order to have accurate
data, the researcher must know the
right sources and the right way of
collecting them.

Characteristics of a Good Question
1. A good question is unbiased.
Biased question: Do you favor the enrolment procedure employed last semester which makes long lines shorter?
Unbiased question: Do you favor the enrolment procedure employed last semester?
Biased question: Do you listen to boring classical music?
Unbiased question: Do you like classical music?

2. A good question must be clear and simply
stated.
Not a good question: What was your academic performance last semester?
Good question: What was your average grade last semester?

3. Questions must be precise.
Vague question: Do you think males and females are equal?
Precise question: In terms of mathematical ability, do you think males and females are equal?
4. Good questionnaires lend themselves to easy
analyses.

Two Categories of Survey Questions
1. Open question- allows a free response.
Example: What do you think can be done to reduce
crime?
2. Closed question- allows only a fixed response.
Example: Which of the following approaches would be
the most effective in reducing crime? Choose one.
A. Get parents to discipline children more.
B. Correct social and economic conditions in slums
C. Improve rehabilitation efforts in jails.
D. Give convicted criminals tougher sentences
E. Reform courts.

Types of Data
1. Primary Data
- information collected from the original source of
the data.
2. Secondary Data
- information collected from published or
unpublished sources such as books, newspapers and
theses.

Methods of Data Collection
1. Direct or Interview Method (interviewer and
interviewee)
2. Indirect or Questionnaire Method (written
answers)
3. Registration Method (laws)
4. Observation Method (senses)
5. Experiment Method (cause and effect)

Sampling
Sampling is the process of
selecting units, like people,
organizations, or objects from a
population of interest.

Advantages of Sampling
1. Reduced cost
2. Greater speed
3. Greater scope
4. Greater accuracy

Some Definitions
• Target Population- an entire group a
researcher is interested in.
• Sampled Population-collection of elements
from which sample is actually taken.
• The Frame

Probability Sampling
Probability sampling method
is any method of sampling that utilizes
some form of random selection.

1. Simple Random Sampling
-simplest form of random sampling.
Examples: table of random numbers, computer
generated random numbers, use of calculators
2. Stratified Random Sampling
- dividing the population into homogeneous
subgroups (strata) and then taking a simple random
sample from each subgroup.

3. Systematic Random Sampling
Systematic sampling with a random start is a
method of selecting a sample by taking every
kth unit from an ordered population, the first
unit being selected at random.
4. Cluster Random Sampling
This sampling method involves dividing the
population into (usually geographical) clusters and
then randomly selecting clusters from which the
sample is taken.
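The four probability sampling methods can be sketched with Python's `random` module. The population of 20 numbered units, the odd/even strata, the clusters of 5 and the sample sizes are all hypothetical; in practice they come from the sampling frame. The cluster step below is one-stage cluster sampling, which keeps every unit of each randomly chosen cluster.

```python
import random

rng = random.Random(7)  # fixed seed so results are repeatable

# Hypothetical ordered population: units numbered 1 to 20.
population = list(range(1, 21))

# 1. Simple random sampling: every unit has the same chance of selection.
srs = rng.sample(population, 5)

# 2. Stratified random sampling: a simple random sample from each stratum.
strata = {"odd": [u for u in population if u % 2 == 1],
          "even": [u for u in population if u % 2 == 0]}
stratified = {name: rng.sample(units, 2) for name, units in strata.items()}

# 3. Systematic sampling with a random start: every kth unit.
k = 4
start = rng.randrange(k)             # random start among the first k units
systematic = population[start::k]

# 4. Cluster sampling: randomly select whole clusters (e.g., geographic areas).
clusters = [population[j:j + 5] for j in range(0, 20, 5)]  # 4 clusters of 5
chosen = rng.sample(clusters, 2)
cluster_sample = [u for cluster in chosen for u in cluster]

print(srs, stratified, systematic, cluster_sample, sep="\n")
```

Only the simple random sample treats all units alike; the other three methods use structure in the population (strata, order, clusters) to make selection cheaper or more representative.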

Non-probability Sampling
Non-probability sampling does
not involve random selection of samples.
Non-probability samples do not depend
upon the rationale of probability theory.

Non-probability Sampling
1. Accidental, Haphazard or Convenience
Sampling (based primarily on the
convenience of the researcher)
2. Purposive Sampling (samples are taken with a
purpose in mind)

Methods of Data Presentation
Methods of Presenting Data
1. Textual Method- a narrative description of
the data gathered.
2. Tabular Method- uses rows and columns in
describing data.
3. Graphical Method- an illustrative description
of data.

Terms
Raw Data- information obtained by observing
values of a variable.
Qualitative data (qualitative variable)
Quantitative data (quantitative variable)
1. Discrete data
2. Continuous data

Examples:
1) A study is conducted in which individuals are
classified into one of sixteen personality
types using Myers-Briggs type indicator.
2) The cardiac output in liters per minute is
measured for the participants in a medical
study.
3) The number of deaths caused by nCoV per
200,000 inhabitants is recorded for several
large cities in China.

Frequency Distribution Table(FDT)
An FDT is a statistical table
showing the frequency or number
of observations contained in each
of the defined classes or
categories.

Types of FDT
1. Qualitative or Categorical FDT- a frequency
distribution table where the data are
grouped according to some qualitative
characteristics.
Example:
Gender of Respondents | Number of Respondents
Male | 38
Female | 62
Total | 100

Example:
A sample of rural county arrests gave the
following set of offenses with which individuals
were charged:

Relative Frequency of a Category

BAR GRAPH
A bar graph is a graph composed of
bars whose heights are the frequencies of the
different categories. A bar graph displays
graphically the same information concerning
qualitative data that a frequency distribution
shows in tabular form.

PIE CHART
A pie chart is also used to
graphically display qualitative data. To
construct a pie chart, a circle is divided into
portions that represent the relative
frequencies or percentages belonging to
different categories.

Types of FDT
2. Quantitative FDT- a frequency distribution
table where the data are grouped according to
some numerical or quantitative characteristics.
Example:
Weight (kg) | Frequency
7-9 | 2
10-12 | 8
13-15 | 14
16-18 | 19
19-21 | 7
Total | 50

Terms
1) Classes
2) Class limits
3) Class boundaries
4) Class width

When forming a frequency distribution, the
following general guidelines should be followed:
1) The number of classes should be between 5
and 15.
2) Each data value must belong to one, and only
one, class.
3) When possible, all classes should be of equal
width.

Constructing a Quantitative FDT
Step 1: Determine the range (R).
$R = HS - LS$
where R is the range, HS the highest score and LS the lowest score.
Step 2: Determine the number of classes (k).
$k = \sqrt{N}$
where N is the number of observations.
Step 3: Determine the class size (c).
$c = \frac{R}{k}$

Constructing a Quantitative FDT
Step 4: Enumerate the classes or categories.
Step 5: Tally the observations.
Step 6: Compute the values in the other columns of
the FDT as needed: true class boundaries (TCB),
class mark (CM), relative frequency (RF),
cumulative frequency (CF) and relative cumulative
frequency (RCF).

Constructing a Qualitative FDT
Step 1: Collect the necessary data
car bus plane train plane plane
car bus train train plane train
train train plane train plane plane
car bus plane train bus bus
bus bus bus bus bus bus
Step 2: Tally and make the necessary FDT.
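Step 2 can be sketched with `collections.Counter`, using the 30 responses listed in Step 1. The relative frequency column (frequency divided by the total) and the pie-chart angle (relative frequency times 360 degrees) are added as an illustration of how the same table feeds a pie chart; the column layout is an arbitrary choice.

```python
from collections import Counter

# The 30 responses from Step 1, row by row.
responses = """
car bus plane train plane plane
car bus train train plane train
train train plane train plane plane
car bus plane train bus bus
bus bus bus bus bus bus
""".split()

freq = Counter(responses)   # tally: category -> frequency
total = sum(freq.values())

print(f"{'Transport':<10}{'f':>4}{'RF':>8}{'Angle':>8}")
for category, f in freq.most_common():
    rf = f / total          # relative frequency
    angle = rf * 360        # size of the pie-chart slice, in degrees
    print(f"{category:<10}{f:>4}{rf:>8.3f}{angle:>8.1f}")
print(f"{'Total':<10}{total:>4}")
```

The tally gives bus 11, plane 8, train 8 and car 3, for a total of 30.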

Exercise: Construct the FDT of the
given data set.
Ages (in years) of 40 patients confined at a certain
hospital:
5 15 23 27 33 38 44 52
5 15 24 30 33 40 45 53
7 20 25 31 34 42 45 55
10 20 25 31 35 42 50 57
13 21 26 32 36 43 51 57
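Steps 1 to 6 can be sketched for the patient ages above, as a check on a table built by hand. The rounding rules are assumptions, since conventions differ between textbooks: k = √N is rounded to the nearest whole number, the class size is rounded up so the classes cover the whole range, the first lower limit is the lowest score, and true class boundaries are the limits ±0.5 because the ages are whole numbers. The function name `build_fdt` is hypothetical.

```python
import math

ages = [5, 15, 23, 27, 33, 38, 44, 52,
        5, 15, 24, 30, 33, 40, 45, 53,
        7, 20, 25, 31, 34, 42, 45, 55,
        10, 20, 25, 31, 35, 42, 50, 57,
        13, 21, 26, 32, 36, 43, 51, 57]

def build_fdt(data):
    """Return FDT rows for whole-number data, following Steps 1-6."""
    n = len(data)
    r = max(data) - min(data)            # Step 1: range R = HS - LS
    k = round(math.sqrt(n))              # Step 2: k = sqrt(N), rounded (assumption)
    c = math.ceil(r / k)                 # Step 3: class size, rounded up (assumption)
    rows, lower, cf = [], min(data), 0
    while lower <= max(data):            # Step 4: enumerate the classes
        upper = lower + c - 1
        f = sum(lower <= x <= upper for x in data)  # Step 5: tally
        cf += f
        rows.append({                    # Step 6: other columns
            "limits": (lower, upper),
            "f": f,
            "tcb": (lower - 0.5, upper + 0.5),  # true class boundaries
            "cm": (lower + upper) / 2,          # class mark
            "rf": f / n,                        # relative frequency
            "cf": cf,                           # cumulative frequency
            "rcf": cf / n,                      # relative cumulative frequency
        })
        lower = upper + 1
    return rows

for row in build_fdt(ages):
    low, high = row["limits"]
    print(f"{low:>2}-{high:<2}  f={row['f']:>2}  CM={row['cm']:>4}  "
          f"TCB={row['tcb']}  RF={row['rf']:.3f}  CF={row['cf']:>2}  RCF={row['rcf']:.3f}")
```

With N = 40 this gives R = 52, k = 6 and c = 9, so the classes are 5-13, 14-22, 23-31, 32-40, 41-49 and 50-58.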

The price for 500 aspirin tablets is determined
for each of twenty randomly selected stores as
part of a larger consumer study. The prices are
as follows:

SINGLE-VALUED CLASSES
If only a few unique values occur
in a set of data, the classes are expressed
as a single value rather than an interval of
values.

Example
A quality technician selects 25 bars of soap from
the daily production.

HISTOGRAMS
A histogram is a graph that displays
the classes on the horizontal axis and the
frequencies of the classes on the vertical axis.
The frequency of each class is represented by a
vertical bar whose height is equal to the
frequency of the class.

CUMULATIVE FREQUENCY
DISTRIBUTIONS
A cumulative frequency distribution
gives the total number of values that fall below
various class boundaries of a frequency
distribution.

EXAMPLE
Table below shows the frequency distribution of
the contents in milliliters of a sample of 25 one-
liter bottles of soda.

OGIVES
An ogive is a graph in which a
point is plotted above each class boundary
at a height equal to the cumulative
frequency corresponding to that boundary.
Ogives can also be constructed for a
cumulative relative frequency distribution
as well as a cumulative percentage
distribution.

STEM-AND-LEAF DISPLAYS
In a stem-and-leaf display each value
is divided into a stem and a leaf. The leaves for
each stem are shown separately. The stem-and-
leaf diagram preserves the information on
individual observations.
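For two-digit whole numbers, a stem-and-leaf display can be sketched by using the tens digit as the stem and the units digit as the leaf. The scores below are made up for illustration, and the function name `stem_and_leaf` is hypothetical.

```python
def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    ordered = sorted(values)
    first, last = ordered[0] // 10, ordered[-1] // 10
    display = {stem: [] for stem in range(first, last + 1)}  # keep empty stems
    for v in ordered:
        stem, leaf = divmod(v, 10)   # e.g., 47 -> stem 4, leaf 7
        display[stem].append(leaf)
    return display

scores = [47, 52, 38, 61, 55, 49, 70, 58, 44, 63]  # hypothetical scores
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

Because every leaf is kept, each original score can be read back from the display (stem 4 with leaves 4 7 9 means 44, 47 and 49).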

The following are the Philippine Achievement
Percentile Scores (CAT scores) for 30 seventh-
grade students:

Exercises
Classify the following data as either qualitative data or
quantitative data. In addition, classify the quantitative
data as discrete or continuous.
(a) The number of times that a movement authority is
sent to a train from a relay station is recorded for
several trains over a two-week period. The movement
authority, which is an electronic transmission, is sent
repeatedly until a return signal is received from the
train.
(b) A physician records the follow-up condition of
patients with optic neuritis as improved, unchanged,
or worse.

The following data set gives the yearly food stamp
expenditure in thousands of dollars for 25
households in Alcorn County:
Construct a frequency distribution consisting of six
classes for this data set. Use 0.5 as the lower limit for
the first class and use a class width equal to 0.5.

Graphical Representations
of Data

Common Types of Graph
1. Scatter Graph- A graph used to present
measurements or values that are thought to be
related.

2. Line Chart- graphical representation of data
especially useful for showing trends over a
period of time.

3. Pie Chart- a circular graph that is useful in
showing how a total quantity is distributed
among a group of categories.

4. Column and Bar Graph

Graphical Presentation of the
Frequency Distribution Table
1. Frequency Histogram- a bar graph that
displays the classes on horizontal axis and the
frequencies of the classes on the vertical axis.

2. Relative Frequency Histogram- a graph
that displays the classes on the horizontal axis
and relative frequencies on the vertical axis.

3. Frequency Polygon- a line chart that is
constructed by plotting the frequencies at the
class marks and connecting the plotted points
by means of straight lines.

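4. Ogives - graphs of the cumulative frequency
distribution.
a. < ogive - the less-than cumulative frequency (<CF) is plotted against the upper true class boundary (UTCB).
b. > ogive - the greater-than cumulative frequency (>CF) is plotted against the lower true class boundary (LTCB).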
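The points plotted for a frequency polygon and for the two ogives can be computed from the weight FDT given earlier (7-9 kg to 19-21 kg, total frequency 50). True class boundaries are taken as the class limits ±0.5, which assumes weights recorded to the nearest kilogram.

```python
# Weight FDT from the Quantitative FDT example: (lower limit, upper limit, frequency).
classes = [(7, 9, 2), (10, 12, 8), (13, 15, 14), (16, 18, 19), (19, 21, 7)]
n = sum(f for _, _, f in classes)

# Frequency polygon: frequency plotted at each class mark.
polygon = [((lo + hi) / 2, f) for lo, hi, f in classes]

# < ogive: less-than cumulative frequency against the upper true class boundary.
less_than, running = [], 0
for lo, hi, f in classes:
    running += f
    less_than.append((hi + 0.5, running))

# > ogive: greater-than cumulative frequency against the lower true class boundary.
greater_than, remaining = [], n
for lo, hi, f in classes:
    greater_than.append((lo - 0.5, remaining))
    remaining -= f

print("polygon:", polygon)
print("< ogive:", less_than)
print("> ogive:", greater_than)
```

The < ogive rises from 2 at 9.5 kg to 50 at 21.5 kg, while the > ogive falls from 50 at 6.5 kg to 7 at 18.5 kg.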

MEASURES OF CENTRAL TENDENCY
Chapter 3

Chapter 2 gives several techniques for
organizing data. Bar graphs, pie charts,
frequency distributions, histograms, and
stem-and-leaf plots are techniques for describing data.
Oftentimes we are interested in a typical
numerical value to help us describe a data set.
This typical value is often called an average
value or a measure of central tendency. We
are looking for a single number that is in some
sense representative of the complete data set.

EXAMPLE 3.1
The following are examples of measures of central
tendency:
1) median priced home,
2) average cost of a new automobile,
3) the average household income in the United
States,
4) modal number of televisions per household.
Each of these examples is a single number,
which is intended to be typical of the
characteristics of interest.

Measures of Central Tendency
• A measure of central tendency is a descriptive
statistic that describes the average, or typical
value of a set of scores
• There are three common measures of central
tendency:
– the mode
– the median
– the mean

A data set consisting of the
observations for some variable is referred to as
raw data or ungrouped data. Data presented
in the form of a frequency distribution are called
grouped data. The measures of central
tendency discussed in this chapter will be
described for both grouped and ungrouped data,
since both forms of data occur frequently.

Mean (x̄)
The mean for a sample consisting of n
observations is
$\bar{x} = \frac{\sum x}{n}$
and the mean for a population consisting of N
observations is
$\mu = \frac{\sum x}{N}$

EXAMPLE:
The number of 911 emergency calls classified as
domestic disturbance calls in a large
metropolitan location were sampled for thirty
randomly selected 24 hour periods with the
following results. Find the mean number of calls
per 24-hour period.

EXAMPLE
The total number of 911 emergency calls
classified as domestic disturbance calls last year
in a large metropolitan location was 14,950. Find
the mean number of such calls per 24-hour
period if last year was not a leap year.
$\mu = \frac{\sum x}{N} = \frac{14{,}950}{365} \approx 41.0$
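The arithmetic above, and the sample-mean formula, can be checked in a few lines. The five daily counts in the sample are hypothetical, since the thirty daily counts from the previous example are not listed.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their number."""
    return sum(values) / len(values)

# Population mean from the totals in the example: 14,950 calls over 365 days.
mu = 14_950 / 365
print(f"mu = {mu:.1f}")  # prints mu = 41.0

# Sample mean for a hypothetical sample of n = 5 daily counts.
sample_counts = [38, 45, 41, 36, 50]
x_bar = mean(sample_counts)
print(f"x_bar = {x_bar}")  # prints x_bar = 42.0
```

The same formula serves both cases; only the symbol (μ or x̄) and the count in the denominator (N or n) change with whether the data form a population or a sample.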

When To Use the Mean
• You should use the mean when
– the data are interval or ratio scaled (many people
will use the mean with ordinally scaled data too), and
– the data are not skewed
• The mean is preferred because it is sensitive
to every score
– If you change one score in the data set, the mean
will change

The Median
• The median is simply another name for the
50th percentile
– It is the score in the middle; half of the scores are
larger than the median and half of the scores are
smaller than the median

Median (x̃)
The median of a set of data is a value
that divides the bottom 50% of the data from
the top 50% of the data. To find the median of a
data set, first arrange the data in increasing
order. If the number of observations is odd, the
median is the number in the middle of the
ordered list. If the number of observations is
even, the median is the mean of the two values
closest to the middle of the ordered list.
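The rule above translates directly into code. This is a minimal sketch; the function name `median` and the data values are illustrative.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)          # arrange the data in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: the middle observation
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle values

print(median([7, 3, 9, 1, 5]))      # odd n: prints 5
print(median([7, 3, 9, 1, 5, 11]))  # even n: prints 6.0
```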

EXAMPLE
To find the median number of domestic
disturbance calls per 24-hour period for the data in

When To Use the Median
• The median is often used when the
distribution of scores is either positively or
negatively skewed
– The few really large scores (positively skewed) or
really small scores (negatively skewed) will not
overly influence the median
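The contrast between the mean (sensitive to every score) and the median (not overly influenced by a few extreme scores) can be seen by changing one score in a small data set. The scores are hypothetical; `statistics.mean` and `statistics.median` come from Python's standard library.

```python
import statistics

# Hypothetical exam scores, and the same scores with one score replaced by a very large value.
scores = [70, 72, 75, 78, 80]
with_outlier = [70, 72, 75, 78, 200]

for label, data in [("original", scores), ("one large score", with_outlier)]:
    print(f"{label}: mean = {statistics.mean(data)}, median = {statistics.median(data)}")
```

The single large score pulls the mean from 75 to 99, while the median stays at 75; this is why the median is preferred for positively or negatively skewed data.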

Mode (x̂)
The mode is the value in a data set
that occurs the most often. If no such value
exists, we say that the data set has no mode. If
two such values exist, we say the data set is
bimodal. If three such values exist, we say the
data set is trimodal.
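A sketch of this definition, assuming the common convention that a data set has no mode when every value occurs only once. The function names `modes` and `describe`, and the "unimodal" label, are illustrative choices; the data must be non-empty.

```python
from collections import Counter

def modes(values):
    """Return all values occurring most often; [] means the data set has no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:                      # every value occurs once: no mode (assumed convention)
        return []
    return sorted(v for v, c in counts.items() if c == top)

def describe(values):
    """Label a data set by its number of modes."""
    labels = {0: "no mode", 1: "unimodal", 2: "bimodal", 3: "trimodal"}
    m = modes(values)
    return labels.get(len(m), "multimodal"), m

print(describe([1, 2, 3]))                 # no value repeats
print(describe([4, 4, 5, 6]))              # one mode
print(describe([1, 1, 2, 2, 3]))           # two modes
print(describe([1, 1, 2, 2, 3, 3, 4]))     # three modes
```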

The Mode
• The mode is the score
that occurs most
frequently in a set of
data
[Figure: histogram; x-axis "Score on Exam 1", y-axis "Frequency"]

Bimodal Distributions
• When a distribution
has two "modes," it is
called bimodal
[Figure: bimodal histogram; x-axis "Score on Exam 1", y-axis "Frequency"]

Multimodal Distributions
• If a distribution has
more than 2 "modes,"
it is called multimodal
[Figure: multimodal histogram; x-axis "Score on Exam 1", y-axis "Frequency"]

When To Use the Mode
• The mode is not a very useful measure of
central tendency
– It is insensitive to large changes in the data set
• That is, two data sets that are very different from each
other can have the same mode
[Figure: histograms of two very different data sets]

When To Use the Mode
• The mode is primarily used with nominally
scaled data
– It is the only measure of central tendency that
is appropriate for nominally scaled data

EXAMPLE
To find the mode number of domestic
disturbance calls per 24-hour period for the data in

Relations Between the Measures of
Central Tendency
• In symmetrical
distributions, the median
and mean are equal
– For normal distributions,
mean = median = mode
• In positively skewed
distributions, the mean is
greater than the median
• In negatively skewed
distributions, the mean is
smaller than the median

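Mean
(1) $\bar{x} = \frac{\sum fx}{N}$, where f is the frequency, x the class mark
and N the total frequency.
(2) $\bar{x} = AM + \frac{\sum fd}{N}\, i$, where AM is the assumed
mean, i the class size, d the deviation of each class mark from AM in
units of the class size, $d = \frac{x - AM}{i}$, and N the total frequency.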

Scores of 50 Students in a Statistics Exam
Scores | Frequency
9-11 | 1
12-14 | 2
15-17 | 0
18-20 | 3
21-23 | 3
24-26 | 8
27-29 | 13
30-32 | 3
33-35 | 2
36-38 | 4
39-41 | 4
42-44 | 4
45-47 | 3
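Both grouped-mean formulas can be checked on the exam-score table above. The choice AM = 28 (the class mark of 27-29) is an assumption; formula (2) gives the same result for any assumed mean that is a class mark.

```python
# Scores of 50 students: (lower limit, upper limit, frequency), from the table above.
table = [(9, 11, 1), (12, 14, 2), (15, 17, 0), (18, 20, 3), (21, 23, 3),
         (24, 26, 8), (27, 29, 13), (30, 32, 3), (33, 35, 2), (36, 38, 4),
         (39, 41, 4), (42, 44, 4), (45, 47, 3)]

N = sum(f for _, _, f in table)                 # total frequency
marks = [(lo + hi) / 2 for lo, hi, _ in table]  # class marks x
freqs = [f for _, _, f in table]

# Formula (1): mean = sum(f * x) / N
mean_direct = sum(f * x for f, x in zip(freqs, marks)) / N

# Formula (2): mean = AM + (sum(f * d) / N) * i, with d = (x - AM) / i
i = 3    # class size
AM = 28  # assumed mean: class mark of 27-29 (an arbitrary but convenient choice)
d = [(x - AM) / i for x in marks]
mean_coded = AM + sum(f * dev for f, dev in zip(freqs, d)) / N * i

print(N, round(mean_direct, 2), round(mean_coded, 2))
```

Here Σfx = 1502 and Σfd = 34, so both formulas give a mean of 30.04.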

Median
$\tilde{x} = L_b + \left( \frac{\frac{N}{2} - {<}cf}{f_m} \right) i$
Where:
L_b = lower class boundary of the interval where the median lies
N = total frequency
<cf = cumulative frequency preceding the median class
f_m = frequency of the median class
i = class size

Mode
$\hat{x} = L + \left( \frac{d_1}{d_1 + d_2} \right) i$
Where:
L = exact lower limit of the modal class
d_1 = difference between the frequency of the modal class and the class preceding it
d_2 = difference between the frequency of the modal class and the class following it
i = class size
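The median and mode formulas can be sketched on the same exam-score table. Assumptions: class limits are whole numbers, so the lower class boundary is the lower limit minus 0.5; the first class with the highest frequency is taken as the modal class; and d1 + d2 is not zero. The function names `grouped_median` and `grouped_mode` are hypothetical.

```python
# Scores of 50 students: (lower limit, upper limit, frequency).
table = [(9, 11, 1), (12, 14, 2), (15, 17, 0), (18, 20, 3), (21, 23, 3),
         (24, 26, 8), (27, 29, 13), (30, 32, 3), (33, 35, 2), (36, 38, 4),
         (39, 41, 4), (42, 44, 4), (45, 47, 3)]
i = 3  # class size

def grouped_median(table, i):
    """Median = Lb + ((N/2 - <cf) / fm) * i for the class containing the N/2-th score."""
    N = sum(f for _, _, f in table)
    cf = 0  # <cf: cumulative frequency of the classes before the current one
    for lo, hi, f in table:
        if cf + f >= N / 2:          # first class whose cumulative frequency reaches N/2
            lb = lo - 0.5            # lower class boundary of the median class
            return lb + (N / 2 - cf) / f * i
        cf += f

def grouped_mode(table, i):
    """Mode = L + (d1 / (d1 + d2)) * i for the class with the highest frequency."""
    freqs = [f for _, _, f in table]
    m = freqs.index(max(freqs))                      # modal class (first if tied)
    before = freqs[m - 1] if m > 0 else 0
    after = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1, d2 = freqs[m] - before, freqs[m] - after
    L = table[m][0] - 0.5                            # exact lower limit of the modal class
    return L + d1 / (d1 + d2) * i

median_value = grouped_median(table, i)
mode_value = grouped_mode(table, i)
print(f"median = {median_value:.2f}, mode = {mode_value:.2f}")
```

For this table N/2 = 25 falls in the 27-29 class (<cf = 17, fm = 13), so the median is 26.5 + (8/13)(3) ≈ 28.35; the same class is modal with d1 = 5 and d2 = 10, giving a mode of 26.5 + (5/15)(3) = 27.5.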
