Copyright © 2020 Pearson Education Ltd.
ALWAYS LEARNING Slide 1
Defining and Collecting Data
Chapter 1
Slide 2
Objectives
In this chapter you learn:
• To understand issues that arise when defining variables.
• How to define variables.
• To understand the different measurement scales.
• How to collect data.
• To identify different ways to collect a sample.
• To understand the issues involved in data preparation.
• To understand the types of survey errors.
Slide 3
Classifying Variables By Type
• Categorical (qualitative) variables take categories as their values, such as “yes”, “no”, or “blue”, “brown”, “green”.
• Numerical (quantitative) variables have values that represent a counted or measured quantity.
  ◦ Discrete variables arise from a counting process.
  ◦ Continuous variables arise from a measuring process.
DCOVA
Slide 4
Examples of Types of Variables
Question | Responses | Variable Type
Do you have a Facebook profile? | Yes or No | Categorical
How many text messages have you sent in the past three days? | --------------- | Numerical (discrete)
How long did the mobile app update take to download? | --------------- | Numerical (continuous)
Slide 5
Types of Variables
Variables
• Categorical
  ◦ Nominal (defined categories). Examples: marital status, political party, eye color.
  ◦ Ordinal (ordered categories). Examples: ratings such as Good, Better, Best or Low, Med, High.
• Numerical
  ◦ Discrete (counted items). Examples: number of children, defects per hour.
  ◦ Continuous (measured characteristics). Examples: weight, voltage.
Slide 6
Measurement Scales
A nominal scale classifies data into distinct categories in which no ranking is implied.
Categorical Variable | Categories
Do you have a Facebook profile? | Yes, No
Type of investment | Growth, Value, Other
Cellular provider | AT&T, Sprint, Verizon, Other, None
Slide 7
Measurement Scales (cont.)
An ordinal scale classifies data into distinct categories in which ranking is implied.
Categorical Variable | Ordered Categories
Student class designation | Freshman, Sophomore, Junior, Senior
Product satisfaction | Very unsatisfied, Fairly unsatisfied, Neutral, Fairly satisfied, Very satisfied
Faculty rank | Professor, Associate Professor, Assistant Professor, Instructor
Standard & Poor’s bond ratings | AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student grades | A, B, C, D, F
Slide 8
Measurement Scales (cont.)
• An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity but the measurements do not have a true zero point.
• A ratio scale is an ordered scale in which the difference between the measurements is a meaningful quantity and the measurements have a true zero point.
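A minimal sketch of why the true zero point matters. The temperatures and weights below are made-up illustrative values, and the helper names are hypothetical. On an interval scale (temperature in degrees Celsius) a ratio changes when the unit changes, so "twice as hot" is not meaningful; on a ratio scale (weight) the ratio survives a unit change.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

def kg_to_lb(kg):
    """Convert kilograms to pounds (1 kg is about 2.20462 lb)."""
    return kg * 2.20462

# Interval scale: Celsius has no true zero, so the ratio depends on the unit.
t1_c, t2_c = 10.0, 20.0
ratio_c = t2_c / t1_c                    # ratio in Celsius
ratio_f = c_to_f(t2_c) / c_to_f(t1_c)    # ratio of the same temperatures in Fahrenheit

# Ratio scale: weight has a true zero, so the ratio is the same in any unit.
w1_kg, w2_kg = 10.0, 20.0
ratio_kg = w2_kg / w1_kg
ratio_lb = kg_to_lb(w2_kg) / kg_to_lb(w1_kg)

print(ratio_c, ratio_f, ratio_kg, ratio_lb)
```

Differences remain meaningful on both scales: a 10-degree gap in Celsius is always an 18-degree gap in Fahrenheit.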
Slide 9
Interval and Ratio Scales
Slide 10
Data Is Collected From Either A Population or A Sample
POPULATION
A population contains all of the items or individuals of interest that you seek to study.
SAMPLE
A sample contains only a portion of a population of interest.
Slide 11
Population vs. Sample
Population: All the items or individuals about which you want to draw conclusion(s).
Sample: A portion of the population of items or individuals.
[Figure: A Population of Size 40; A Sample of Size 4]
Slide 12
Collecting Data Via Sampling Is Used When Doing So Is
• Less time consuming than selecting every item in the population.
• Less costly than selecting every item in the population.
• Less cumbersome and more practical than analyzing the entire population.
Slide 13
Parameter or Statistic?
• A population parameter summarizes the value of a specific variable for a population.
• A sample statistic summarizes the value of a specific variable for sample data.
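As a rough illustration of the distinction, the sketch below builds a hypothetical population of 40 values, draws a sample of 4, and computes the mean of each. The data are randomly generated for the example, not taken from the slides. The population mean is a parameter; the sample mean is a statistic that estimates it.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: one numerical value for each of 40 items.
population = [random.randint(20, 60) for _ in range(40)]

# A sample of 4 items drawn without replacement.
sample = random.sample(population, 4)

population_mean = mean(population)  # parameter: summarizes the whole population
sample_mean = mean(sample)          # statistic: summarizes only the sample

print(population_mean, sample_mean)
```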
Slide 14
Sources Of Data Arise From The Following Activities
• Capturing data generated by ongoing business activities.
• Distributing data compiled by an organization or individual.
• Compiling the responses from a survey.
• Conducting a designed experiment and recording the outcomes.
• Conducting an observational study and recording the results.
Slide 15
Examples of Data Collected From Ongoing Business Activities
• A bank studies years of financial transactions to help it identify patterns of fraud.
• Economists utilize data on searches done via Google to help forecast future economic conditions.
• Marketing companies use tracking data to evaluate the effectiveness of a website.
Slide 16
Examples Of Data Distributed By An Organization or Individual
• Financial data on a company provided by investment services.
• Industry or market data from market research firms and trade associations.
• Stock prices, weather conditions, and sports statistics in daily newspapers.
Slide 17
Examples of Survey Data
• A survey asking people which laundry detergent has the best stain-removing abilities.
• Political polls of registered voters during political campaigns.
• People being surveyed to determine their satisfaction with a recent product or service experience.
Slide 18
Examples of Data From A Designed Experiment
• Consumer testing of different versions of a product to help determine which product should be pursued further.
• Material testing to determine which supplier’s material should be used in a product.
• Market testing on alternative product promotions to determine which promotion to use more broadly.
Slide 19
Examples of Data Collected From Observational Studies
• Market researchers utilizing focus groups to elicit unstructured responses to open-ended questions.
• Measuring the time it takes for customers to be served in a fast-food establishment.
• Measuring the volume of traffic through an intersection to determine if some form of advertising at the intersection is justified.
Slide 20
Observational Studies & Designed Experiments Have A Common Objective
• Both are attempting to quantify the effect that a process change (called a treatment) has on a variable of interest.
• In an observational study, there is no direct control over which items receive the treatment.
• In a designed experiment, there is direct control over which items receive the treatment.
Slide 21
Sources of Data
• Primary Sources: The data collector is the one using the data for analysis:
  ◦ Data from a political survey.
  ◦ Data collected from an experiment.
  ◦ Observed data.
• Secondary Sources: The person performing data analysis is not the data collector:
  ◦ Analyzing census data.
  ◦ Examining data from print journals or data published on the internet.
Slide 22
A Sampling Process Begins With A Sampling Frame
• The sampling frame is a listing of items that make up the population.
• Frames are data sources such as population lists, directories, or maps.
• Inaccurate or biased results can occur if a frame excludes certain groups or portions of the population.
• Using different frames to generate data can lead to dissimilar conclusions.
Slide 23
Types of Samples
Samples
• Nonprobability Samples
  ◦ Judgment
  ◦ Convenience
• Probability Samples
  ◦ Simple Random
  ◦ Systematic
  ◦ Stratified
  ◦ Cluster
Slide 24
Types of Samples: Nonprobability Sample
• In a nonprobability sample, items included are chosen without regard to their probability of occurrence.
  ◦ In convenience sampling, items are selected based only on the fact that they are easy, inexpensive, or convenient to sample.
  ◦ In a judgment sample, you get the opinions of preselected experts in the subject matter.
Slide 25
Types of Samples: Probability Sample
• In a probability sample, items in the sample are chosen on the basis of known probabilities.
Probability samples: Simple Random, Systematic, Stratified, Cluster
Slide 26
Probability Sample: Simple Random Sample
• Every individual or item from the frame has an equal chance of being selected.
• Selection may be with replacement (the selected individual is returned to the frame for possible reselection) or without replacement (the selected individual isn’t returned to the frame).
• Samples are obtained from a table of random numbers or from computer random number generators.
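The two selection modes can be sketched with Python's standard `random` module. The frame of 850 item numbers and the seed are assumptions made for the illustration: `choices` draws with replacement, so an item may appear more than once, while `sample` draws without replacement, so every selected item is distinct.

```python
import random

frame = list(range(1, 851))   # hypothetical frame: item numbers 1 through 850
rng = random.Random(42)       # seeded generator so the sketch is repeatable

# With replacement: each draw is made from the full frame, so repeats are possible.
with_replacement = rng.choices(frame, k=5)

# Without replacement: a selected item cannot be selected again.
without_replacement = rng.sample(frame, k=5)

print(with_replacement)
print(without_replacement)
```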
Slide 27
Selecting a Simple Random Sample Using A Random Number Table
Sampling frame for a population with 850 items:
Item Name | Item #
Bev R. | 001
Ulan X. | 002
... | ...
Joann P. | 849
Paul F. | 850
Portion of a random number table:
49280 88924 35779 00283 81163 07275
11100 02340 12860 74697 96644 89439
09893 23997 20048 49420 88872 08401
The first 5 items in a simple random sample:
Item # 492
Item # 808
Item # 892 (does not exist, so ignore)
Item # 435
Item # 779
Item # 002
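The table-reading procedure on this slide can be sketched as follows. The function name and parameters are hypothetical, and skipping an already selected item assumes sampling without replacement. The digits of the first table row are read three at a time, and any code that is not a valid item number (above 850, or 000) is ignored.

```python
def sample_from_table(digits, frame_size, n, width=3):
    """Read `width`-digit codes left to right and keep the first n valid, unused item numbers."""
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        item = int(digits[i:i + width])
        # Keep the code only if it names an item in the frame and is not already chosen.
        if 1 <= item <= frame_size and item not in chosen:
            chosen.append(item)
        if len(chosen) == n:
            break
    return chosen

# First row of the random number table from the slide, with the spaces removed.
table_row = "49280 88924 35779 00283 81163 07275".replace(" ", "")
first_five = sample_from_table(table_row, frame_size=850, n=5)
print(first_five)  # [492, 808, 435, 779, 2]
```

The code 892 is read third but dropped because the frame has only 850 items, which matches the slide.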
Slide 28
Probability Sample: Systematic Sample
• Decide on sample size: n
• Divide frame of N individuals into groups of k individuals: k = N/n
• Randomly select one individual from the 1st group
• Select every k-th individual thereafter
Example (figure): N = 40, n = 4, k = 10, with the first group marked.
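A minimal sketch of the four steps, using the slide's N = 40 and n = 4. The function name, the seed, and the use of integer division for k are assumptions; the sketch expects N to be a multiple of n.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Pick a random start in the first group of k = N // n items, then take every k-th item."""
    k = len(frame) // n
    start = rng.randrange(k)   # position of the selected individual within the first group
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 41))     # N = 40 individuals, numbered 1 through 40
sample = systematic_sample(frame, n=4, rng=random.Random(0))
print(sample)
```

Only the starting point is random; once it is chosen, the remaining three selections are fixed at intervals of k = 10.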
Slide 29
Probability Sample: Stratified Sample
• Divide population into two or more subgroups (called strata) according to some common characteristic.
• A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes.
• Samples from subgroups are combined into one.
• This is a common technique when sampling a population of voters, stratifying across racial or socio-economic lines.
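Proportional allocation can be sketched as below. The strata names and sizes are invented for the example and the function name is hypothetical. Each stratum contributes a simple random sample whose size is its share of the population times n; with other sizes, rounding may make the total differ slightly from n.

```python
import random

def stratified_sample(strata, n, rng=random):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    total = sum(len(items) for items in strata.values())
    sample = []
    for items in strata.values():
        n_stratum = round(n * len(items) / total)    # proportional allocation
        sample.extend(rng.sample(items, n_stratum))  # simple random sample within the stratum
    return sample

# Hypothetical population of 100 split into three strata (60, 30, and 10 items).
strata = {
    "urban": [f"U{i}" for i in range(60)],
    "suburban": [f"S{i}" for i in range(30)],
    "rural": [f"R{i}" for i in range(10)],
}
sample = stratified_sample(strata, n=10, rng=random.Random(7))
print(sample)
```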
Slide 30
Probability Sample: Cluster Sample
• Population is divided into several “clusters,” each representative of the population.
• A simple random sample of clusters is selected.
• All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique.
• A common application of cluster sampling involves election exit polls, where certain election districts are selected and sampled.
[Figure: Population divided into 16 clusters; randomly selected clusters for sample]
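A rough sketch of the one-stage version, in which every item in a selected cluster is used. The 16 clusters echo the slide's figure; the cluster contents, the choice of 4 clusters, and the function name are assumptions.

```python
import random

def cluster_sample(clusters, n_clusters, rng=random):
    """Randomly select whole clusters and return their ids plus every item they contain."""
    chosen = rng.sample(sorted(clusters), n_clusters)   # simple random sample of cluster ids
    items = [item for c in chosen for item in clusters[c]]
    return chosen, items

# 16 hypothetical clusters of 5 items each.
clusters = {c: [f"cluster{c}-item{i}" for i in range(5)] for c in range(16)}
chosen, sample = cluster_sample(clusters, n_clusters=4, rng=random.Random(3))
print(chosen, len(sample))
```

To sample within the selected clusters instead, the list comprehension could be replaced by another probability sampling step applied to each chosen cluster.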
Slide 31
Probability Sample: Comparing Sampling Methods
• Simple random sample and Systematic sample:
  ◦ Simple to use.
  ◦ May not be a good representation of the population’s underlying characteristics.
• Stratified sample:
  ◦ Ensures representation of individuals across the entire population.
• Cluster sample:
  ◦ More cost effective.
  ◦ Less efficient (need larger sample to acquire the same level of precision).
Slide 32
Data Cleaning Is An Important Data Preprocessing Task Prior To Analysis
Data cleaning corrects irregularities in the data:
• Invalid variable values, including:
  ◦ Non-numerical data for a numerical variable.
  ◦ Invalid categorical values for a categorical variable.
  ◦ Numeric values outside a defined range.
• Coding errors, including:
  ◦ Inconsistent categorical values.
  ◦ Inconsistent case for categorical values.
  ◦ Extraneous characters.
• Data integration errors, including:
  ◦ Redundant columns.
  ◦ Duplicated rows.
  ◦ Differing column lengths.
  ◦ Different units of measure or scale for numerical variables.
Slide 33
Data Cleaning Cannot Be A Fully Automated Process
• Excel, JMP, and Minitab have functionality to lessen the burden of data cleaning.
• The software guides in the book explain this functionality.
• When performing data cleaning, always preserve a copy of the original data for later reference.
Slide 34
Cleaning Invalid Variable Values Can Be Semi-Automated
• Invalid variable values can be identified by simple scanning techniques, for example:
  ◦ Non-numeric entries for numerical variables.
  ◦ Values for categorical variables that don’t match a predefined category.
  ◦ Values for a numeric variable outside a predefined explicit range.
• Features exist in Excel, JMP, or Minitab to assist in this task.
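The three scanning checks can be sketched in plain Python, as an illustration rather than a description of how Excel, JMP, or Minitab work. The column names, the sample rows, and the allowed age range of 0 to 120 are all assumptions. The function only reports suspect entries; deciding how to correct them remains a manual step.

```python
def find_invalid(rows, categories, low, high):
    """Return (row index, column, value) for each entry that fails a simple check."""
    problems = []
    for i, row in enumerate(rows):
        # Categorical check: the value must match a predefined category.
        if row["gender"] not in categories:
            problems.append((i, "gender", row["gender"]))
        # Numerical checks: the entry must parse as a number and fall in the range.
        try:
            age = float(row["age"])
        except ValueError:
            problems.append((i, "age", row["age"]))   # non-numeric entry
            continue
        if not low <= age <= high:
            problems.append((i, "age", row["age"]))   # outside the allowed range
    return problems

rows = [
    {"gender": "F", "age": "34"},
    {"gender": "New York", "age": "41"},
    {"gender": "M", "age": "abc"},
    {"gender": "M", "age": "250"},
]
problems = find_invalid(rows, categories={"F", "M"}, low=0, high=120)
print(problems)
```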
Slide 35
Examples Of Coding Errors
Copy-and-paste or data import can result in poor recording or entry of data.
Categorical variable: Gender. Correct coding: F or M.
• Correctable error: Female.
• Invalid data: New York.
• Correctable or software tolerated: m.
• Extraneous and nonprintable characters:
  ◦ Leading or trailing space(s): _F or F_.
  ◦ Other nonprintable characters may also be leading or trailing.
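One way to handle the correctable cases is sketched below. The alias table and function name are assumptions made for the example. Leading and trailing whitespace is stripped, case is unified, and known spellings are mapped to the correct code, while an entry such as "New York" is returned as `None` so it can be reviewed by hand.

```python
def clean_gender(value):
    """Map a raw entry to 'F' or 'M', or return None if it cannot be corrected."""
    text = value.strip().upper()   # drop leading/trailing whitespace and unify case
    aliases = {"F": "F", "FEMALE": "F", "M": "M", "MALE": "M"}
    return aliases.get(text)       # None for anything not in the alias table

raw = ["F", "Female", "m", " F", "F ", "New York"]
cleaned = [clean_gender(v) for v in raw]
print(cleaned)  # ['F', 'F', 'M', 'F', 'F', None]
```

Note that `str.strip()` with no argument removes only whitespace; other nonprintable characters would need to be removed explicitly.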
Slide 36
Data Integration Errors From Combining Two Different Computerized Data Sources
• Correcting data integration errors often requires time-consuming manual effort.
• Some examples:
  ◦ Variable names or definitions may differ.
  ◦ Duplicated rows (observations) may also occur.
  ◦ Different units of measurement (or scale) may not be obvious without human interpretation.
Slide 37
Data Can Be Formatted and/or Encoded In More Than One Way
• Some electronic formats are more readily usable than others.
• Different encodings can impact the precision of numerical variables and can also impact data compatibility.
• As you identify and choose sources of data, you need to consider and deal with these issues.
Slide 38
Stacked vs Unstacked Data
• For unstacked data you create separate numerical variables for different groups (e.g., genders or locations).
• For stacked data you create a single column for the variable of interest and create additional columns for the potential grouping variables.
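The two layouts hold the same information, as the sketch below suggests. The store names and values are invented and the helper functions are hypothetical. Unstacked data is modeled as one list of values per group; stacked data is modeled as rows that pair a grouping value with the variable of interest.

```python
def stack(unstacked):
    """Turn {group: [values]} into a list of (group, value) rows."""
    return [(group, v) for group, values in unstacked.items() for v in values]

def unstack(stacked):
    """Turn (group, value) rows back into {group: [values]}."""
    result = {}
    for group, v in stacked:
        result.setdefault(group, []).append(v)
    return result

# Unstacked: a separate numerical variable (column) for each group.
unstacked = {"Store A": [12.5, 9.0, 14.2], "Store B": [8.1, 11.3]}

# Stacked: one column of values plus one column naming the group.
stacked = stack(unstacked)
print(stacked)
```

The unstacked columns may have different lengths, which is one reason some software expects the stacked form.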
Slide 39
After Collection It Is Often Helpful To Recode Some Variables
• Recoding a variable can either supplement or replace the original variable.
• Recoding a categorical variable involves redefining categories.
• Recoding a numerical variable involves changing this variable into a categorical variable.
• When recoding, be sure that the new categories are mutually exclusive (categories do not overlap) and collectively exhaustive (categories cover all possible values).
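A minimal sketch of recoding a numerical variable into categories. The age cut points and labels are assumptions chosen for the illustration. Because each test uses a strict upper bound and the last branch catches everything else, every non-negative age falls into exactly one category, so the categories are mutually exclusive and collectively exhaustive.

```python
def recode_age(age):
    """Recode a non-negative age into one of four non-overlapping groups."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 18:
        return "under 18"
    if age < 40:
        return "18 to 39"
    if age < 65:
        return "40 to 64"
    return "65 and over"   # catches every remaining value

ages = [5, 18, 39.5, 40, 64, 65, 90]
groups = [recode_age(a) for a in ages]
print(groups)
```

Keeping `ages` alongside `groups` follows the slide's first point: the recoded variable supplements the original rather than replacing it.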
Slide 40
Evaluating Survey Worthiness
• What is the purpose of the survey?
• Is the survey based on a probability sample?
• Coverage error: is the frame appropriate?
• Nonresponse error: follow up.
• Measurement error: good questions elicit good responses.
• Sampling error: always exists.
Slide 41
Types of Survey Errors
• Coverage error or selection bias:
  ◦ Exists if some groups are excluded from the frame and have no chance of being selected.
• Nonresponse error or bias:
  ◦ People who do not respond may be different from those who do respond.
• Sampling error:
  ◦ Variation from sample to sample will always exist.
• Measurement error:
  ◦ Due to weaknesses in question design and/or respondent error.
Slide 42
Types of Survey Errors (continued)
• Coverage error: excluded from frame.
• Nonresponse error: follow up on nonresponses.
• Sampling error: random differences from sample to sample.
• Measurement error: bad or leading question.
Slide 43
Ethical Issues About Surveys
• Coverage error and nonresponse error can be leveraged by survey designers to purposely bias survey results.
• Sampling error can be an ethical issue if the findings are purposely not reported with the associated margin of error.
• Measurement error can be an ethical issue:
  ◦ Survey sponsor chooses leading questions.
  ◦ Interviewer purposely leads respondents in a particular direction.
  ◦ Respondent(s) willfully provide false information.
Slide 44
Chapter Summary
In this chapter we have discussed:
• Understanding issues that arise when defining variables.
• How to define variables.
• Understanding the different measurement scales.
• How to collect data.
• Identifying different ways to collect a sample.
• Understanding the issues involved in data preparation.
• Understanding the types of survey errors.


basic statistics introduction to statistics .pptx

  • 1. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 1 Defining and Collecting Data Chapter 1
  • 2. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 2 Objectives In this chapter you learn:  To understand issues that arise when defining variables.  How to define variables.  To understand the different measurement scales.  How to collect data.  To identify different ways to collect a sample.  To understand the issues involved in data preparation.  To understand the types of survey errors.
  • 3. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 3 Classifying Variables By Type  Categorical (qualitative) variables take categories as their values such as “yes”, “no”, or “blue”, “brown”, “green”.  Numerical (quantitative) variables have values that represent a counted or measured quantity.  Discrete variables arise from a counting process.  Continuous variables arise from a measuring process. DCOVA
  • 4. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 4 Examples of Types of Variables DCOVA Question Responses Variable Type Do you have a Facebook profile? Yes or No Categorical How many text messages have you sent in the past three days? --------------- Numerical (discrete) How long did the mobile app update take to download? --------------- Numerical (continuous)
  • 5. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 5 Types of Variables DCOVA Variables Categorical Numerical Discrete Continuous Examples:  Marital Status  Political Party  Eye Color (Defined Categories) Examples:  Number of Children  Defects per hour (Counted items) Examples:  Weight  Voltage (Measured characteristics) Nominal Ordinal Examples: Ratings  Good, Better, Best  Low, Med, High (Ordered Categories)
  • 6. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 6 Measurement Scales A nominal scale classifies data into distinct categories in which no ranking is implied. Categorical Variables Categories Do you have a Facebook profile? Type of investment Cellular Provider Yes, No AT&T, Sprint, Verizon, Other, None Growth, Value, Other DCOVA
  • 7. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 7 Measurement Scales (con’t.) An ordinal scale classifies data into distinct categories in which ranking is implied. Categorical Variable Ordered Categories Student class designation Freshman, Sophomore, Junior, Senior Product satisfaction Very unsatisfied, Fairly unsatisfied, Neutral, Fairly satisfied, Very satisfied Faculty rank Professor, Associate Professor, Assistant Professor, Instructor Standard & Poor’s bond ratings AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D Student Grades A, B, C, D, F DCOVA
  • 8. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 8 Measurement Scales (con’t.)  An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity but the measurements do not have a true zero point.  A ratio scale is an ordered scale in which the difference between the measurements is a meaningful quantity and the measurements have a true zero point. DCOVA
  • 9. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 9 Interval and Ratio Scales DCOVA
  • 10. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 10 Data Is Collected From Either A Population or A Sample POPULATION A population contains all of the items or individuals of interest that you seek to study. SAMPLE A sample contains only a portion of a population of interest. DCOVA
  • 11. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 11 Population vs. Sample All the items or individuals about which you want to draw conclusion(s). A portion of the population of items or individuals. Population Sample DCOVA A Population of Size 40 A Sample of Size 4
  • 12. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 12 Collecting Data Via Sampling Is Used When Doing So Is  Less time consuming than selecting every item in the population.  Less costly than selecting every item in the population.  Less cumbersome and more practical than analyzing the entire population. DCOVA
  • 13. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 13 Parameter or Statistic?  A population parameter summarizes the value of a specific variable for a population.  A sample statistic summarizes the value of a specific variable for sample data. DCOVA
  • 14. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 14 Sources Of Data Arise From The Following Activities  Capturing data generated by ongoing business activities.  Distributing data compiled by an organization or individual.  Compiling the responses from a survey.  Conducting a designed experiment and recording the outcomes.  Conducting an observational study and recording the results. DCOVA
  • 15. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 15 Examples of Data Collected From Ongoing Business Activities  A bank studies years of financial transactions to help them identify patterns of fraud.  Economists utilize data on searches done via Google to help forecast future economic conditions.  Marketing companies use tracking data to evaluate the effectiveness of a web site. DCOVA
  • 16. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 16 Examples Of Data Distributed By An Organization or Individual  Financial data on a company provided by investment services.  Industry or market data from market research firms and trade associations.  Stock prices, weather conditions, and sports statistics in daily newspapers. DCOVA
  • 17. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 17 Examples of Survey Data  A survey asking people which laundry detergent has the best stain-removing abilities.  Political polls of registered voters during political campaigns.  People being surveyed to determine their satisfaction with a recent product or service experience. DCOVA
  • 18. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 18 Examples of Data From A Designed Experiment  Consumer testing of different versions of a product to help determine which product should be pursued further.  Material testing to determine which supplier’s material should be used in a product.  Market testing on alternative product promotions to determine which promotion to use more broadly. DCOVA
  • 19. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 19 Examples of Data Collected From Observational Studies  Market researchers utilizing focus groups to elicit unstructured responses to open-ended questions.  Measuring the time it takes for customers to be served in a fast food establishment.  Measuring the volume of traffic through an intersection to determine if some form of advertising at the intersection is justified. DCOVA
  • 20. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 20 Observational Studies & Designed Experiments Have A Common Objective  Both are attempting to quantify the effect that a process change (called a treatment) has on a variable of interest.  In an observational study, there is no direct control over which items receive the treatment.  In a designed experiment, there is direct control over which items receive the treatment. DCOVA
  • 21. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 21 Sources of Data  Primary Sources: The data collector is the one using the data for analysis:  Data from a political survey.  Data collected from an experiment.  Observed data.  Secondary Sources: The person performing data analysis is not the data collector:  Analyzing census data.  Examining data from print journals or data published on the internet. DCOVA
  • 22. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 22 A Sampling Process Begins With A Sampling Frame  The sampling frame is a listing of items that make up the population.  Frames are data sources such as population lists, directories, or maps.  Inaccurate or biased results can result if a frame excludes certain groups or portions of the population.  Using different frames to generate data can lead to dissimilar conclusions. DCOVA
  • 23. Copyright © 2020 Pearson Education Ltd. A LWAY S L E A R N I N G Slide 23 Types of Samples Samples Non Probability Samples Judgment Probability Samples Simple Random Systematic Stratified Cluster Convenience DCOVA
  • 24. Types of Samples: Nonprobability Sample
      • In a nonprobability sample, items included are chosen without regard to their probability of occurrence.
      • In convenience sampling, items are selected based only on the fact that they are easy, inexpensive, or convenient to sample.
      • In a judgment sample, you get the opinions of pre-selected experts in the subject matter.
  • 25. Types of Samples: Probability Sample
      • In a probability sample, items in the sample are chosen on the basis of known probabilities.
      • Probability samples: simple random, systematic, stratified, cluster.
  • 26. Probability Sample: Simple Random Sample
      • Every individual or item from the frame has an equal chance of being selected.
      • Selection may be with replacement (the selected individual is returned to the frame for possible reselection) or without replacement (the selected individual isn’t returned to the frame).
      • Samples are obtained from a table of random numbers or from computer random number generators.
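The difference between sampling with and without replacement can be sketched with Python's standard `random` module. The frame of 850 numbered items and the fixed seed are assumptions made only for illustration; they are not part of the slides.

```python
import random

# Hypothetical frame: item numbers 1..850 (assumed for illustration).
frame = list(range(1, 851))
rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Without replacement: a selected item is not returned to the frame,
# so it can appear at most once in the sample.
without_replacement = rng.sample(frame, k=5)

# With replacement: every draw is made from the full frame,
# so the same item may be selected more than once.
with_replacement = rng.choices(frame, k=5)

print(without_replacement)
print(with_replacement)
```

With a sample this small relative to the frame, repeats under replacement are unlikely but possible, which is the only practical difference between the two calls.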
  • 27. Selecting a Simple Random Sample Using A Random Number Table
      • Sampling frame for a population with 850 items (item name, item #): Bev R., 001; Ulan X., 002; . . . ; Joann P., 849; Paul F., 850.
      • Portion of a random number table: 49280 88924 35779 00283 81163 07275 11100 02340 12860 74697 96644 89439 09893 23997 20048 49420 88872 08401
      • The first 5 items in a simple random sample, reading the table three digits at a time: Item # 492; Item # 808; Item # 892 (does not exist, so ignore); Item # 435; Item # 779; Item # 002.
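The table-reading procedure on this slide can be reproduced in a few lines. This is a minimal sketch that assumes the table is read left to right as one continuous run of digits, three digits per item number, and that repeats are skipped (sampling without replacement). The function name `sample_from_table` is hypothetical.

```python
def sample_from_table(table_text, frame_size, n, width=3):
    """Read `width`-digit codes from a random number table until n valid items are found."""
    digits = "".join(table_text.split())  # drop the spacing between five-digit groups
    selected = []
    for start in range(0, len(digits) - width + 1, width):
        item = int(digits[start:start + width])
        if item < 1 or item > frame_size:
            continue  # no such item in the frame (e.g. 892 when the frame has 850 items)
        if item in selected:
            continue  # already chosen; skip because we sample without replacement
        selected.append(item)
        if len(selected) == n:
            break
    return selected

# Portion of the random number table shown on the slide.
table = ("49280 88924 35779 00283 81163 07275 11100 02340 12860 "
         "74697 96644 89439 09893 23997 20048 49420 88872 08401")

first_five = sample_from_table(table, frame_size=850, n=5)
print(first_five)  # [492, 808, 435, 779, 2]
```

The code 892 is read from the table but discarded because the frame stops at item 850, matching the slide.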
  • 28. Probability Sample: Systematic Sample
      • Decide on sample size: n
      • Divide frame of N individuals into groups of k individuals: k = N/n
      • Randomly select one individual from the 1st group
      • Select every kth individual thereafter
      • Example: N = 40, n = 4, k = 10 (the figure marks the first group).
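The systematic sampling steps can be sketched as follows, using the slide's values N = 40 and n = 4. The function name and the seed are assumptions for illustration, and the sketch uses integer division for k, so it assumes N is a multiple of n.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick a random start in the first group, then take every k-th item."""
    k = len(frame) // n       # group size k = N / n (assumes N is divisible by n)
    start = rng.randrange(k)  # random position within the first group of k items
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 41))  # N = 40 items, numbered 1..40
sample = systematic_sample(frame, n=4, rng=random.Random(7))
print(sample)  # four items spaced k = 10 apart
```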
  • 29. Probability Sample: Stratified Sample
      • Divide population into two or more subgroups (called strata) according to some common characteristic.
      • A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes.
      • Samples from subgroups are combined into one.
      • This is a common technique when sampling a population of voters, stratifying across racial or socio-economic lines.
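A proportional stratified sample might be drawn as in the sketch below. The strata names and sizes are invented for illustration. Rounding each stratum's share can make the total differ slightly from n when the proportions are not whole numbers; the example uses sizes where the allocation is exact.

```python
import random

def stratified_sample(strata, n, rng):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    total = sum(len(items) for items in strata.values())
    combined = []
    for name, items in strata.items():
        size = round(n * len(items) / total)    # proportional allocation
        combined.extend(rng.sample(items, size))  # simple random sample within the stratum
    return combined

# Hypothetical population of 1,000 split into three strata (600 / 300 / 100).
strata = {
    "A": [("A", i) for i in range(600)],
    "B": [("B", i) for i in range(300)],
    "C": [("C", i) for i in range(100)],
}
sample = stratified_sample(strata, n=50, rng=random.Random(3))
counts = {name: sum(1 for s, _ in sample if s == name) for name in strata}
print(counts)  # {'A': 30, 'B': 15, 'C': 5}
```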
  • 30. Probability Sample: Cluster Sample
      • Population is divided into several “clusters,” each representative of the population.
      • A simple random sample of clusters is selected.
      • All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique.
      • A common application of cluster sampling involves election exit polls, where certain election districts are selected and sampled.
      • Figure: population divided into 16 clusters, with randomly selected clusters forming the sample.
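Cluster sampling can be sketched in the same style. The 16 clusters echo the slide's figure; the cluster contents, the choice of 4 clusters, and the seed are assumptions. This version uses every item in each selected cluster, which is one of the two options the slide describes.

```python
import random

def cluster_sample(clusters, num_clusters, rng):
    """Select whole clusters at random and return every item in them."""
    chosen = rng.sample(sorted(clusters), num_clusters)  # simple random sample of cluster ids
    items = [item for c in chosen for item in clusters[c]]
    return chosen, items

# Hypothetical population: 16 clusters of 5 items each.
clusters = {c: [f"cluster{c}-item{i}" for i in range(5)] for c in range(1, 17)}
chosen, items = cluster_sample(clusters, num_clusters=4, rng=random.Random(11))
print(chosen)
print(len(items))  # 20
```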
  • 31. Probability Sample: Comparing Sampling Methods
      • Simple random sample and systematic sample:
          • Simple to use.
          • May not be a good representation of the population’s underlying characteristics.
      • Stratified sample:
          • Ensures representation of individuals across the entire population.
      • Cluster sample:
          • More cost effective.
          • Less efficient (need larger sample to acquire the same level of precision).
  • 32. Data Cleaning Is An Important Data Preprocessing Task Prior To Analysis
      • Data cleaning corrects irregularities in the data:
      • Invalid variable values, including:
          • Non-numerical data for a numerical variable.
          • Invalid categorical values for a categorical variable.
          • Numeric values outside a defined range.
      • Coding errors, including:
          • Inconsistent categorical values.
          • Inconsistent case for categorical values.
          • Extraneous characters.
      • Data integration errors, including:
          • Redundant columns.
          • Duplicated rows.
          • Differing column lengths.
          • Different units of measure or scale for numerical variables.
  • 33. Data Cleaning Cannot Be A Fully Automated Process
      • Excel, JMP, and Minitab have functionality to lessen the burden of data cleaning.
      • The software guides in the book explain this functionality.
      • When performing data cleaning, always preserve a copy of the original data for later reference.
  • 34. Cleaning Invalid Variable Values Can Be Semi-Automated
      • Invalid variable values can be identified by simple scanning techniques, for example:
          • Non-numeric entries for numerical variables.
          • Values for categorical variables that don’t match a pre-defined category.
          • Values for a numeric variable outside a pre-defined explicit range.
      • Features exist in Excel, JMP, or Minitab to assist in this task.
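The three scanning checks can be illustrated outside Excel, JMP, or Minitab with a short Python sketch. The column names, the allowed age range of 0 to 120, and the sample rows are assumptions made up for this example. The scan only reports problems; deciding how to fix them remains a manual step.

```python
def is_number(text):
    """Return True if the text can be parsed as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def find_invalid(rows, numeric_ranges, categories):
    """Scan rows and report (row index, column, problem) for each invalid value."""
    problems = []
    for i, row in enumerate(rows):
        for col, (low, high) in numeric_ranges.items():
            value = row[col].strip()
            if not is_number(value):
                problems.append((i, col, "non-numeric"))
            elif not low <= float(value) <= high:
                problems.append((i, col, "out of range"))
        for col, allowed in categories.items():
            if row[col] not in allowed:
                problems.append((i, col, "invalid category"))
    return problems

# Hypothetical raw data, kept as text the way it would arrive from an import.
rows = [
    {"age": "34", "gender": "F"},
    {"age": "abc", "gender": "M"},
    {"age": "240", "gender": "New York"},
]
problems = find_invalid(rows, {"age": (0, 120)}, {"gender": {"F", "M"}})
print(problems)
# [(1, 'age', 'non-numeric'), (2, 'age', 'out of range'), (2, 'gender', 'invalid category')]
```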
  • 35. Examples Of Coding Errors
      • Copy-and-paste or data import can result in poor recording or entry of data.
      • Categorical variable: Gender. Correct coding: F or M.
          • Correctable error: Female.
          • Invalid data: New York.
          • Correctable or software tolerated: m.
      • Extraneous and nonprintable characters:
          • Leading or trailing space(s): _F or F_ (where _ stands for a space).
          • Other nonprintable characters may also be leading or trailing.
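The coding errors listed on this slide could be handled by a small normalizing function like the sketch below. The function name is hypothetical, and the mapping of "Male" to M is an assumption added by symmetry with the slide's "Female" example. Values that cannot be corrected are returned as `None` so they can be reviewed by hand.

```python
def clean_gender(raw):
    """Normalize a gender entry to 'F' or 'M'; return None for values that cannot be corrected."""
    value = raw.strip()                                       # leading or trailing spaces
    value = "".join(ch for ch in value if ch.isprintable())   # other nonprintable characters
    value = value.strip().upper()                             # inconsistent case, e.g. "m"
    corrections = {"FEMALE": "F", "MALE": "M"}                # spelled-out values (assumed mapping)
    value = corrections.get(value, value)
    return value if value in {"F", "M"} else None

for raw in [" F", "F ", "Female", "m", "New York"]:
    print(repr(raw), "->", clean_gender(raw))
```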
  • 36. Data Integration Errors From Combining Two Different Computerized Data Sources
      • Data integration errors often require time-consuming manual effort.
      • Some examples:
          • Variable names or definitions may differ.
          • Duplicated rows (observations) may also occur.
          • Different units of measurement (or scale) may not be obvious without human interpretation.
  • 37. Data Can Be Formatted and/or Encoded In More Than One Way
      • Some electronic formats are more readily usable than others.
      • Different encodings can impact the precision of numerical variables and can also impact data compatibility.
      • As you identify and choose sources of data, you need to consider and deal with these issues.
  • 38. Stacked vs Unstacked Data
      • For unstacked data you create separate numerical variables for different groups (e.g., genders or locations).
      • For stacked data you create a single column for the variable of interest and create additional columns for the potential grouping variables.
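The two layouts can be converted into each other, as the following sketch shows with plain Python structures. The store names and values are invented for illustration, and the column labels `group` and `value` are hypothetical.

```python
def stack(unstacked, group_name="group", value_name="value"):
    """Turn one list per group into rows with a value column and a grouping column."""
    return [
        {group_name: group, value_name: value}
        for group, values in unstacked.items()
        for value in values
    ]

def unstack(stacked, group_name="group", value_name="value"):
    """Turn stacked rows back into one list of values per group."""
    result = {}
    for row in stacked:
        result.setdefault(row[group_name], []).append(row[value_name])
    return result

# Unstacked: a separate numerical variable (list) for each location.
unstacked = {"Store A": [12.5, 14.0, 13.2], "Store B": [9.8, 11.1]}

stacked = stack(unstacked)
print(stacked[0])  # {'group': 'Store A', 'value': 12.5}
print(unstack(stacked) == unstacked)  # True
```

The stacked form handles groups of different sizes without padding, which is one reason it is often the easier layout to feed to analysis software.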
  • 39. After Collection It Is Often Helpful To Recode Some Variables
      • Recoding a variable can either supplement or replace the original variable.
      • Recoding a categorical variable involves redefining categories.
      • Recoding a numerical variable involves changing this variable into a categorical variable.
      • When recoding, be sure that the new categories are mutually exclusive (categories do not overlap) and collectively exhaustive (categories cover all possible values).
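Recoding a numerical variable into categories might look like the sketch below. The age cut points and labels are assumptions chosen for illustration. Because each value falls through the checks in order, exactly one category applies to any non-negative age, so the categories are mutually exclusive and collectively exhaustive.

```python
def recode_age(age):
    """Recode a numerical age into one of three non-overlapping categories."""
    if age < 0:
        raise ValueError("age cannot be negative")  # invalid input, not a category
    if age < 18:
        return "under 18"
    if age < 65:
        return "18 to 64"
    return "65 or older"

ages = [4, 17.9, 18, 64, 65, 90]
# Supplement the original variable rather than replace it: keep both values.
recoded = [(age, recode_age(age)) for age in ages]
print(recoded)
```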
  • 40. Evaluating Survey Worthiness
      • What is the purpose of the survey?
      • Is the survey based on a probability sample?
      • Coverage error – appropriate frame?
      • Nonresponse error – follow up.
      • Measurement error – good questions elicit good responses.
      • Sampling error – always exists.
  • 41. Types of Survey Errors
      • Coverage error or selection bias:
          • Exists if some groups are excluded from the frame and have no chance of being selected.
      • Nonresponse error or bias:
          • People who do not respond may be different from those who do respond.
      • Sampling error:
          • Variation from sample to sample will always exist.
      • Measurement error:
          • Due to weaknesses in question design and/or respondent error.
  • 42. Types of Survey Errors (continued)
      • Coverage error: excluded from frame.
      • Nonresponse error: follow up on nonresponses.
      • Sampling error: random differences from sample to sample.
      • Measurement error: bad or leading question.
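Sampling error, the random difference from sample to sample, can be seen in a small simulation. The population (10,000 values with mean near 50), the sample size of 100, and the seed are all assumptions for illustration. Each sample is drawn the same way, yet the sample means differ from each other and from the population mean.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed only so the sketch is repeatable

# Hypothetical population of 10,000 measurements.
population = [rng.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw five simple random samples of 100 items and record each sample mean.
sample_means = [statistics.mean(rng.sample(population, 100)) for _ in range(5)]

print(round(population_mean, 2))
print([round(m, 2) for m in sample_means])
```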
  • 43. Ethical Issues About Surveys
      • Coverage error and nonresponse error can be leveraged by survey designers to purposely bias survey results.
      • Sampling error can be an ethical issue if the findings are purposely not reported with the associated margin of error.
      • Measurement error can be an ethical issue:
          • Survey sponsor chooses leading questions.
          • Interviewer purposely leads respondents in a particular direction.
          • Respondent(s) willfully provide false information.
  • 44. Chapter Summary
      • In this chapter we have discussed:
          • Understanding issues that arise when defining variables.
          • How to define variables.
          • Understanding the different measurement scales.
          • How to collect data.
          • Identifying different ways to collect a sample.
          • Understanding the issues involved in data preparation.
          • Understanding the types of survey errors.