1. Review Statistics and Probability.pdf

Introduction and
Descriptive Statistics
Review Statistics and Probability
Modified by:
Dr. Achmad Nizar Hidayanto
Nur Fitriah Ayuning Budi
Khumaisa Nuraini

Learning Outcomes
• Review key statistical and research terms
• Review the concept of central tendency
• Review the concept of variability

Introduction to Statistics
PowerPoint Lecture Slides
Essentials of Statistics for the
Behavioral Sciences
Eighth Edition
by Frederick J. Gravetter and Larry B. Wallnau

1.1 Statistics, Science and
Observations
• “Statistics” means “statistical procedures”
• Uses of Statistics
– Organize and summarize information
– Determine exactly what conclusions are
justified based on the results that were
obtained
• Goals of statistical procedures
– Accurate and meaningful interpretation
– Provide standardized evaluation procedures

1.2 Populations and Samples
• Population
– The set of all the individuals of interest in a
particular study
– Vary in size; often quite large
• Sample
– A set of individuals selected from a population
– Usually intended to represent the population
in a research study

Figure 1.1
Relationship between population and sample

Variables and Data
• Variable
– Characteristic or condition that changes or has
different values for different individuals
• Data (plural)
– Measurements or observations of a variable
• Data set
– A collection of measurements or observations
• A datum (singular)
– A single measurement or observation
– Commonly called a score or raw score

Parameters and Statistics
• Parameter
– A value, usually a
numerical value, that
describes a population
– Derived from
measurements of
the individuals in
the population
• Statistic
– A value, usually a
numerical value, that
describes a sample
– Derived from
measurements of
the individuals in
the sample

Descriptive & Inferential Statistics
• Descriptive statistics
– Summarize data
– Organize data
– Simplify data
• Familiar examples
– Tables
– Graphs
– Averages
• Inferential statistics
– Study samples to make
generalizations about
the population
– Interpret experimental
data
• Common terminology
– “Margin of error”
– “Statistically significant”

Sampling Error
• Sample is never identical to population
• Sampling Error
– The discrepancy, or amount of error, that
exists between a sample statistic and the
corresponding population parameter
• Example: Margin of Error in Polls
– “This poll was taken from a sample of registered
voters and has a margin of error of plus-or-minus 4
percentage points” (Box 1.1)
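The definition of sampling error above can be illustrated with a small simulation. This is only a sketch: the population values, the seed, and the sample size of 25 are assumptions chosen for illustration and do not come from the slides.

```python
import random
from statistics import mean

# Hypothetical population of 1,000 scores (illustrative data, not from the slides).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.randint(0, 100) for _ in range(1000)]

# Parameter: a value that describes the whole population.
mu = mean(population)

# Statistic: a value that describes one sample drawn from that population.
sample = rng.sample(population, 25)
M = mean(sample)

# Sampling error: the discrepancy between the statistic and the parameter.
sampling_error = M - mu
print(f"parameter mu = {mu:.2f}, statistic M = {M:.2f}, error = {sampling_error:+.2f}")
```

Drawing a different sample (for example, by changing the seed) would generally give a different statistic and therefore a different sampling error.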

Figure 1.2
A demonstration of sampling error

Figure 1.3
Role of statistics in experimental research

1.3 Data Structures, Research
Methods, and Statistics
• Individual Variables
– A variable is observed
– “Statistics” describe the observed variable
– Category and/or numerical variables
– Descriptive statistics
• Relationships between variables
– Two variables observed and measured
– One of two possible data structures used to
determine what type of relationship exists

Relationships Between Variables
• Data Structure I: The Correlational Method
– One group of participants
– Measurement of two variables for each
participant
– Goal is to describe type and magnitude of the
relationship
– Patterns in the data reveal relationships
– Non-experimental method of study

Figure 1.4
Data structures for studies evaluating the
relationship between variables

Correlational Method Limitations
• Can demonstrate the existence of a
relationship
• Does not provide an explanation for the
relationship
• Most importantly, does not demonstrate a
cause-and-effect relationship between the
two variables

Relationships Between Variables
• Data Structure II: Comparing two (or
more) groups of Scores
– One variable defines the groups
– Scores are measured on second variable
– Both experimental and non-experimental
studies use this structure

Figure 1.5
Data structure for studies comparing groups

Experimental Method
• Goal of Experimental Method
– To demonstrate a cause-and-effect
relationship
• Manipulation
– The level of one variable is determined by the
experimenter
• Control rules out influence of other
variables
– Participant variables
– Environmental variables

Figure 1.6
The structure of an experiment

Independent/Dependent Variables
• Independent Variable is the variable
manipulated by the researcher
– Independent because no other variable in the
study influences its value
• Dependent Variable is the one observed
to assess the effect of treatment
– Dependent because its value is thought to
depend on the value of the independent
variable

Experimental Method: Control
• Methods of control
– Random assignment of subjects
– Matching of subjects
– Holding level of some potentially influential variables
constant
• Control condition
– Individuals do not receive the experimental treatment.
– They either receive no treatment or they receive a neutral,
placebo treatment
– Purpose: to provide a baseline for comparison with the
experimental condition
• Experimental condition
– Individuals do receive the experimental treatment

Non-experimental Methods
• Non-equivalent Groups
– Researcher compares groups
– Researcher cannot control who goes into which
group
• Pre-test / Post-test
– Individuals measured at two points in time
– Researcher cannot control influence of the
passage of time
• The variable that defines the groups (or time points) is called a quasi-independent variable

Figure 1.7
Two examples of non-experimental studies

1.4 Variables and Measurement
• Scores are obtained by observing and
measuring variables that scientists use to
help define and explain external behaviors
• The process of measurement consists of
applying carefully defined measurement
procedures for each variable

Constructs & Operational Definitions
• Constructs
– Internal attributes
or characteristics
that cannot be
directly observed
– Useful for
describing and
explaining behavior
• Operational definition
– Identifies the set of
operations required to
measure an external
(observable) behavior
– Uses the resulting
measurements as both
a definition and a
measurement of a
hypothetical construct

Discrete and Continuous
Variables
• Discrete variable
– Has separate, indivisible categories
– No values can exist between two neighboring
categories
• Continuous variable
– Has an infinite number of possible values
between any two observed values
– Every interval is divisible into an infinite
number of equal parts

Figure 1.8
Example: Continuous Measurement

Real Limits of Continuous
Variables
• Real Limits are the boundaries of each
interval representing scores measured on
a continuous number line
– The real limit separating two adjacent scores
is exactly halfway between the two scores
– Each score has two real limits
• The upper real limit marks the top of the
interval
• The lower real limit marks the bottom of the
interval
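As a minimal sketch of the real-limits idea above, assuming scores are recorded to the nearest whole unit (the function name `real_limits` and the example value 150 are hypothetical):

```python
def real_limits(score, unit=1.0):
    """Return (lower, upper) real limits for a score measured to the nearest `unit`."""
    half = unit / 2
    return score - half, score + half

lower, upper = real_limits(150)   # e.g. a weight recorded to the nearest pound
print(lower, upper)               # 149.5 150.5

# The upper real limit of one score is the lower real limit of the next score.
print(real_limits(150)[1] == real_limits(151)[0])  # True
```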

Scales of Measurement
• Measurement assigns individuals or events to
categories
– The categories can simply be names such as
male/female or employed/unemployed
– They can be numerical values such as 68 inches
or 175 pounds
• The complete set of categories makes up a
scale of measurement
• Relationships between the categories determine
different types of scales

Scales of Measurement
Scale | Characteristics | Examples
Nominal | Label and categorize; no quantitative distinctions | Gender; diagnosis; experimental or control
Ordinal | Categorizes observations; categories organized by size or magnitude | Rank in class; clothing sizes (S, M, L, XL); Olympic medals
Interval | Ordered categories; intervals between categories of equal size; arbitrary or absent zero point | Temperature; IQ; golf scores (above/below par)
Ratio | Ordered categories; equal intervals between categories; absolute zero point | Number of correct answers; time to complete task; gain in height since last year

Central Tendency
Essentials of Statistics for the Behavioral
Sciences
Seventh Edition

1.5 Overview of central tendency
• Central tendency
– A single score to define the “center” of a
distribution
• Purpose: find the single score that is most
typical or best represents the entire group

Figure 1.9
What is the “center” of each distribution?

1.6 The Mean
• The mean is the sum of all the scores
divided by the number of scores in the
data.
Population mean: $\mu = \frac{\sum X}{N}$
Sample mean: $M = \frac{\sum X}{n}$



The Mean: Three definitions
• Sum of the scores divided by the number
of scores in the data
• Amount each individual receives when the
total is divided equally among all: M = ∑X / n
• The balance point for the distribution

Computing the Mean from a
Frequency Distribution Table
Quiz Score (X) f fX
10 1 10
9 2 18
8 4 32
7 0 0
6 1 6
Total n = Σf = 8 ΣfX = 66
M = ??
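One way to work the table above is to compute n = Σf and ΣfX and then divide. The sketch below uses the table's own values; the variable names are illustrative.

```python
# Frequency distribution from the table: quiz score X -> frequency f
freq = {10: 1, 9: 2, 8: 4, 7: 0, 6: 1}

n = sum(freq.values())                        # n = Σf
sum_fx = sum(x * f for x, f in freq.items())  # ΣfX
M = sum_fx / n                                # M = ΣfX / Σf

print(n, sum_fx, M)  # 8 66 8.25
```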

The Weighted Mean
• Combine two sets of scores
• Three steps:
– Determine the combined sum of all the scores
– Determine the combined number of scores
– Divide the sum of scores by the total number
of scores
Overall (weighted) mean: $M = \frac{\sum X_1 + \sum X_2}{n_1 + n_2}$
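The three steps for the weighted mean can be sketched as follows. The two groups (n1 = 12 with M1 = 6, n2 = 8 with M2 = 7) are hypothetical values chosen for illustration, and `weighted_mean` is an illustrative helper name.

```python
def weighted_mean(sum1, n1, sum2, n2):
    """Overall mean of two groups from their sums (ΣX) and sizes (n)."""
    return (sum1 + sum2) / (n1 + n2)

# Hypothetical groups: n1 = 12 with M1 = 6, and n2 = 8 with M2 = 7.
sum1, n1 = 12 * 6, 12   # ΣX1 = n1 * M1 = 72
sum2, n2 = 8 * 7, 8     # ΣX2 = n2 * M2 = 56

overall = weighted_mean(sum1, n1, sum2, n2)
print(overall)          # 6.4
```

The result (6.4) is not the simple average of the two means (6.5), because the larger group carries more weight.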




 

Characteristics of the Mean
• Changing the value of any score changes the
mean.
• Introducing a new score or removing a score
usually changes the mean.
• Adding or subtracting a constant from each
score changes the mean by the same constant.
• Multiplying or dividing each score by a constant
multiplies or divides the mean by
that constant.
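The last two characteristics can be checked numerically. The scores below are hypothetical and the constants 3 and 2 are arbitrary choices for the sketch.

```python
from statistics import mean

scores = [2, 4, 6, 8]                    # hypothetical scores
M = mean(scores)                         # 5

added = mean([x + 3 for x in scores])    # add 3 to every score
scaled = mean([x * 2 for x in scores])   # multiply every score by 2

print(M, added, scaled)                  # 5 8 10
```

Adding 3 to each score raises the mean by 3, and doubling each score doubles the mean.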

1.7 The Median
• The median is the midpoint of the scores
in a distribution when they are listed in
order from smallest to largest.
• The median divides the scores into two
groups of equal size.

The Precise Median for a
Continuous Variable
• A continuous variable can be infinitely divided
• The precise median is located in the interval
defined by the real limits of the value.
• It may be necessary to determine the fraction of
the interval needed to divide the distribution
exactly in half.
• $\text{fraction} = \frac{\text{number needed to reach 50\%}}{\text{number in the interval}}$
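A minimal sketch of the interpolation described above. It assumes whole-unit scores on a continuous variable with neighboring score values one unit apart; the function name `precise_median` and the example data are hypothetical.

```python
from collections import Counter

def precise_median(scores, unit=1.0):
    """Interpolated median for scores measured to the nearest `unit`.

    Assumes neighboring score values are one unit apart, so each score
    occupies the interval between its lower and upper real limits.
    """
    counts = Counter(scores)
    half = len(scores) / 2          # number of scores needed to reach 50%
    below = 0                       # scores located below the current interval
    for value in sorted(counts):
        f = counts[value]           # number of scores in this interval
        if below + f >= half:
            fraction = (half - below) / f        # needed / number in interval
            lower_real_limit = value - unit / 2
            return lower_real_limit + fraction * unit
        below += f

print(precise_median([1, 2, 3, 4, 4, 4, 4, 6]))  # 3.75
```

In this example three scores fall below the interval for X = 4, so one of the four scores in that interval is needed to reach 50%: the fraction is 1/4 and the median is 3.5 + 0.25 = 3.75.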

Median, Mean, and Middle
• Mean is the balance point of a distribution
– Defined by distances
– Often is not the midpoint of the scores
• Median is the midpoint of a distribution
– Defined by number of scores
– Often is not the balance point of the scores
• Both measure central tendency, using two
different concepts of middle or “central.”

1.8 The Mode
• The mode is the score or category that has
the greatest frequency of any in the
frequency distribution
– Can be used with any scale of measurement
– Corresponds to an actual score in the data
– The only one used with nominal data
• It is possible to have more than one mode
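As a brief sketch, Python's standard `statistics.multimode` (available from Python 3.8) returns every value tied for the greatest frequency, and it also works for nominal categories. The data are hypothetical.

```python
from statistics import multimode

numeric = [2, 3, 3, 4, 5, 5, 6]              # two scores tie for most frequent
colors = ["red", "blue", "red", "green"]     # nominal data

print(multimode(numeric))   # [3, 5]
print(multimode(colors))    # ['red']
```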

1.9 Selecting a Measure of Central
Tendency
Measure of Central Tendency | Appropriate to choose when… | Should not be used when…
Mean | No situation precludes it | Extreme scores; skewed distribution; undetermined values; open-ended distribution; ordinal scale; nominal scale
Median | Extreme scores; skewed distribution; undetermined values; open-ended distribution; ordinal scale | Nominal scale
Mode | Nominal scales; discrete variables; describing shape | Interval or ratio data, except to accompany mean or median

Figure 1.18
Means or Medians in a Line Graph

Figure 1.19
Means or Medians in a Bar Graph

1.10 Central Tendency and the
Shape of the Distribution
• Symmetrical distributions
– Mean and median have same value
– If exactly one mode, it has same value as the
mean and the median
– Distribution may have more than one mode,
or no mode at all

Central Tendency in Skewed
Distributions
• Mean is found far toward the long tail (positive or
negative)
• Median is found toward the long tail, but not as
far as the mean
• Mode is found near the piled-up scores.
• If positively skewed, order from left to right is
mode, median, mean;
• If negatively skewed, order from left to right is
mean, median, mode
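The ordering described above can be checked on a small data set. The scores below are hypothetical; the negatively skewed set is simply the mirror image (10 minus each score) of the positively skewed one.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed scores: piled up low, long tail to the right.
positive = [1, 2, 2, 2, 3, 3, 4, 5, 9]
print(mode(positive), median(positive), round(mean(positive), 2))   # 2 3 3.44

# Mirror image: piled up high, long tail to the left (negatively skewed).
negative = [10 - x for x in positive]
print(round(mean(negative), 2), median(negative), mode(negative))   # 6.56 7 8
```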

Variability
Essentials of Statistics for the Behavioral
Sciences
Seventh Edition

1.11 Overview
• Variability can be defined several ways
– A quantitative measure of the differences
between scores
– Describes the degree to which the scores are
spread out or clustered together
• Purposes of Measure of Variability
– Describe the distribution
– Measure how well an individual score
represents the distribution

Figure 1.22
Population Distributions: Height, Weight

Three Measures of Variability
• The Range
• The Standard Deviation
• The Variance

1.12 The Range
• The distance covered by the scores in a
distribution
– From smallest value to highest value
• For continuous data, real limits are used
• For discrete variables range is number of
categories
range = URL for Xmax - LRL for Xmin
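A minimal sketch of the range computed with real limits, assuming scores measured to the nearest whole unit. The function name `range_continuous` and the scores are hypothetical.

```python
def range_continuous(scores, unit=1.0):
    """Range using real limits: URL for Xmax minus LRL for Xmin."""
    url_max = max(scores) + unit / 2   # upper real limit of the largest score
    lrl_min = min(scores) - unit / 2   # lower real limit of the smallest score
    return url_max - lrl_min

scores = [3, 7, 12, 8, 5, 10]      # hypothetical whole-number scores
print(range_continuous(scores))    # 10.0
```

For whole-number scores this equals Xmax - Xmin + 1, here 12 - 3 + 1 = 10.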

1.13 Standard Deviation and
Variance for a Population
• Most common and most important measure
of variability
– A measure of the standard, or average, distance from
the mean
– Describes whether the scores are clustered closely
around the mean or are widely scattered
• Calculation differs for population and samples

Developing the Standard Deviation
• Step One: Determine the Deviation Score (distance
from the mean) for each score:
Deviation score = X - μ
• Step Two: Calculate Mean (Average) of Deviations
– Deviations sum to 0 because μ is the balance point of the
distribution
– The Mean (Average) Deviation will always equal 0;
another method must be found

Developing the Standard Deviation (2)
• Step Three: Get rid of negatives in
Deviations:
– Square each deviation score
– Using the squared values, compute the Mean
Squared Deviation, known as the Variance
• Variability is now measured in squared
units and is called the Variance.
• Population variance equals the mean squared
deviation: the average squared distance from
the mean

Developing the Standard Deviation (3)
• Step Four:
– Variance measures the average squared
distance from the mean, which is not quite the goal
• Correct for having squared all the
deviations by taking the square root of the
variance
$\text{Standard deviation} = \sqrt{\text{Variance}}$
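The four steps can be traced on a small population. This is a sketch with hypothetical scores (N = 5), not data from the slides.

```python
from math import sqrt

population = [1, 9, 5, 8, 7]                   # hypothetical population, N = 5
N = len(population)
mu = sum(population) / N                       # population mean, 6.0

deviations = [x - mu for x in population]      # Step 1: deviation scores X - μ
mean_deviation = sum(deviations) / N           # Step 2: always 0, so not useful
squared = [d ** 2 for d in deviations]         # Step 3: square to remove the signs
variance = sum(squared) / N                    #         mean squared deviation
std_dev = sqrt(variance)                       # Step 4: return to original units

print(mean_deviation, variance, round(std_dev, 2))  # 0.0 8.0 2.83
```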

Figure 1.23
Calculation of the Variance

Formulas for Population
Variance and Standard Deviation
• $\text{Variance} = \frac{\text{sum of squared deviations}}{\text{number of scores}}$
• SS (sum of squares) is the sum of the
squared deviations of scores from the
mean
• Two equations for computing SS

Two formulas for SS
Definitional Formula
$SS = \sum (X - \mu)^2$
• Find each deviation
score, $(X - \mu)$
• Square each deviation
score, $(X - \mu)^2$
• Sum up the squared
deviations
Computational Formula
$SS = \sum X^2 - \frac{(\sum X)^2}{N}$
• Square each score and
sum the squared scores
• Find the sum of scores,
square it, divide by N
• Subtract the second
part from the first
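As a quick check that the definitional and computational formulas for SS agree, here is a sketch on a hypothetical population of N = 4 scores.

```python
scores = [1, 0, 6, 1]            # hypothetical population, N = 4
N = len(scores)
mu = sum(scores) / N             # 2.0

# Definitional formula: SS = Σ(X - μ)²
ss_definitional = sum((x - mu) ** 2 for x in scores)

# Computational formula: SS = ΣX² - (ΣX)² / N
ss_computational = sum(x ** 2 for x in scores) - sum(scores) ** 2 / N

print(ss_definitional, ss_computational)  # 22.0 22.0
```

The computational formula avoids deviation scores entirely, which is convenient when the mean is not a whole number.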
 


Population Variance: Formula
and Notation
Formula
$\text{variance} = \frac{SS}{N}$
$\text{standard deviation} = \sqrt{\frac{SS}{N}}$


Notation
• Lowercase Greek letter
sigma is used to denote
the standard deviation of
a population:
σ
• Because the standard
deviation is the square
root of the variance, we
write the variance of a
population as σ²

Figure 1.24
Graphic Representation of Mean and Standard Deviation

1.14 Standard Deviation and
Variance for a Sample
• Goal of inferential statistics:
– Draw general conclusions about population
– Based on limited information from a sample
• Samples differ from the population
– Samples have less variability
– Computing the Variance and Standard
Deviation in the same way as for a population
would give a biased estimate of the
population values

Figure 1.25
Population of Adult Heights

Variance and Standard Deviation
for a Sample
• Sum of Squares (SS) is computed as
before
• Formula has n-1 rather than N in the
denominator
• Notation uses s instead of σ
variance of sample: $s^2 = \frac{SS}{n - 1}$
standard deviation of sample: $s = \sqrt{\frac{SS}{n - 1}}$
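A sketch of the sample formulas on hypothetical data (n = 7), cross-checked against Python's standard `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
from math import sqrt
from statistics import variance, stdev

sample = [1, 6, 4, 3, 8, 7, 6]           # hypothetical sample, n = 7
n = len(sample)
M = sum(sample) / n                      # sample mean, 5.0

SS = sum((x - M) ** 2 for x in sample)   # sum of squares, computed as before
s2 = SS / (n - 1)                        # sample variance uses n - 1
s = sqrt(s2)                             # sample standard deviation

print(SS, s2, round(s, 2))               # 36.0 6.0 2.45
```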

Degrees of Freedom
• Population variance
– Mean is known
– Deviations are computed from a known mean
• Sample variance as estimate of population
– Population mean is unknown
– Using sample mean restricts variability
• Degrees of freedom
– Number of scores in sample that are
independent and free to vary
– Degrees of freedom (df) = n – 1

1.15 More about Variance and
Standard Deviation
• Unbiased estimate of a population
parameter
– Average value of statistic is equal to parameter
– Average value uses all possible samples of a
particular size n
• Biased estimate of a population parameter
– Systematically overestimates or
underestimates (as with variance) the
population parameter

Table 4.1 Biased & Unbiased
Estimates
Sample | 1st Score | 2nd Score | Sample Mean | Biased Variance (used n) | Unbiased Variance (used n-1)
1 | 0 | 0 | 0.00 | 0.00 | 0.00
2 | 0 | 3 | 1.50 | 2.25 | 4.50
3 | 0 | 9 | 4.50 | 20.25 | 40.50
4 | 3 | 0 | 1.50 | 2.25 | 4.50
5 | 3 | 3 | 3.00 | 0.00 | 0.00
6 | 3 | 9 | 6.00 | 9.00 | 18.00
7 | 9 | 0 | 4.50 | 20.25 | 40.50
8 | 9 | 3 | 6.00 | 9.00 | 18.00
9 | 9 | 9 | 9.00 | 0.00 | 0.00
Totals | | | 36.00 | 63.00/9 = 7.00 | 126.00/9 = 14.00
Actual σ² = 14
This is an adaptation of Table 4.1
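The table can be reproduced by enumerating every sample of size n = 2 drawn from the population of scores 0, 3, and 9 that the table's entries imply. This is a sketch; the variable names are illustrative.

```python
from itertools import product
from statistics import mean

population = [0, 3, 9]          # scores that appear in the table
mu = mean(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

biased, unbiased = [], []
for sample in product(population, repeat=2):   # all 9 samples of size n = 2
    n = len(sample)
    M = mean(sample)
    SS = sum((x - M) ** 2 for x in sample)
    biased.append(SS / n)            # divides by n
    unbiased.append(SS / (n - 1))    # divides by n - 1

print(sigma2, mean(biased), mean(unbiased))  # 14.0 7.0 14.0
```

Averaged over all nine samples, dividing by n gives 7, which underestimates σ² = 14, while dividing by n - 1 gives exactly 14.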

Figure 1.26
Sample of n = 20, M = 36, and s = 4

Transformations of Scale
• Adding a constant to each score
– The Mean is changed
– The standard deviation is unchanged
• Multiplying each score by a constant
– The Mean is changed
– Standard Deviation is also changed
– The Standard Deviation is multiplied by
that constant
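A short numerical check of both transformations, using hypothetical scores and arbitrary constants (add 10, multiply by 3). `statistics.pstdev` is the population standard deviation.

```python
from statistics import mean, pstdev

scores = [1, 9, 5, 8, 7]                 # hypothetical scores, mean 6
shifted = [x + 10 for x in scores]       # add a constant to each score
scaled = [x * 3 for x in scores]         # multiply each score by a constant

print(mean(scores), round(pstdev(scores), 2))     # 6 2.83
print(mean(shifted), round(pstdev(shifted), 2))   # 16 2.83
print(mean(scaled), round(pstdev(scaled), 2))     # 18 8.49
```

Adding 10 shifts the mean by 10 and leaves the standard deviation alone; multiplying by 3 multiplies both the mean and the standard deviation by 3.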

Variance and Inferential
Statistics
• Goal of inferential statistics: To detect
meaningful and significant patterns in
research results
• Variability in the data influences how easy it
is to see patterns
– High variability obscures patterns that would
be visible in low variability samples
– Variability is sometimes called error variance

Figure 1.27
Experiments with high and low variability
