CORRELATION: THE MEASUREMENT OF RELATIONSHIPS
Correlation
1. Aims at discovering the linear relationship between/among two
or more variables, e.g. study habits, class attendance, academic
background and performance in examinations.
2. Knowledge of the strength of the relationship is used in
prediction.
3. Where there are two variables, simple linear correlation is used,
but for more than two variables, multiple linear correlation is
used.
4. Zero-order, partial and part correlations should be distinguished.
Cautions: (i) Correlation does not imply causation.
(ii) Check for linearity.
5. The correlation coefficient to use depends on
the scale of measurement.
Use in research
1. Class attendance and performance in class tests
2. Study habits and performance in class tests
3. BECE results and performance in Senior High School
4. WASSCE results and performance in tertiary institutions
5. Family background/status and performance in WASSCE
Correlation coefficients are used to interpret the nature of the linear
relationship. These coefficients range from −1.0 to +1.0.
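As a sketch, Pearson's r (the most common of these coefficients) can be computed directly from its definition; the attendance and score figures below are hypothetical:

```python
import math

def pearson_r(x, y):
    """Pearson r: covariance of x and y divided by the product of
    their standard deviations. Always falls in [-1.0, +1.0]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

attendance = [2, 4, 6, 8, 10]   # hypothetical class-attendance counts
scores = [50, 55, 65, 70, 80]   # hypothetical test scores
r = pearson_r(attendance, scores)
print(round(r, 3))  # a strong positive correlation close to +1
```

Perfectly aligned values give r = +1.0; perfectly opposed values give r = −1.0.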
Scatter plots
A scatter plot or scatter diagram shows the nature of the relationship
between any two variables. To obtain a scatter plot, a mark is made
on the graph at the intersection of each pair of values of the two
variables. Scatter plots could be either linear or curvilinear.
[Scatter plot: Mathematics (x-axis, 0–100) vs Chemistry (y-axis, 0–100) — linear relationship]
[Scatter plot: Accounts (x-axis, 0–100) vs English (y-axis, 0–100) — curvilinear relationship]
Assumptions
1. The variables are random. Neither the values of X nor Y are
predetermined.
2. The relationship between the variables is linear.
3. The probability distribution of X’s, given a fixed Y, is normal, i.e.
the sample is drawn from a joint normal distribution.
4. The standard deviation of X’s, given each value of Y is assumed to
be the same, just as the standard deviation of Y’s given each value
of X is the same.
Assume the following scores in two tests.

Student:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
X:       14 16 15 10  9 18 18 14 12 13 15 18 10 12 16 20 15 12 14 10
Y:       10 12 15 10 12 15 15 12 14 14 14 10 12 15 10 12 15 15 10 14
Y = 10, X = 14, 10, 18, 16, 14; Distribution of X is normal
Y = 12, X = 16, 9, 14, 10, 20; Distribution of X is normal
Y = 14, X = 12, 13, 15, 10; Distribution of X is normal
Y = 15, X = 15, 18, 18, 12, 15, 12; Distribution of X is normal
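The conditional distributions listed above can be recovered by grouping the X scores from the table by each fixed value of Y, as this short sketch shows:

```python
from collections import defaultdict

# Scores from the table above; grouping X by each fixed value of Y
# reproduces the conditional distributions listed.
X = [14, 16, 15, 10, 9, 18, 18, 14, 12, 13, 15, 18, 10, 12, 16, 20, 15, 12, 14, 10]
Y = [10, 12, 15, 10, 12, 15, 15, 12, 14, 14, 14, 10, 12, 15, 10, 12, 15, 15, 10, 14]

x_given_y = defaultdict(list)
for x, y in zip(X, Y):
    x_given_y[y].append(x)

for y in sorted(x_given_y):
    print(f"Y = {y}: X = {x_given_y[y]}")
```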
Nature of the linear relationship
The relationship is described by direction and degree.
(a) Direction:
Positive (+): High values go with high values and low values go with low values.
Negative (−): High values go with low values and low values go with high values.
(b) Degree:
High (strong): |r| > 0.60
Moderate (mild): 0.40 ≤ |r| ≤ 0.60
Low (weak): |r| < 0.40
Perfect: |r| = 1.0
Zero: r = 0.0
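These direction and degree labels can be sketched as a small helper function (the name `describe_r` is our own):

```python
# Translate a correlation coefficient into the direction and degree
# labels of the scheme above.
def describe_r(r):
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    a = abs(r)
    if a == 0:
        degree = "zero"
    elif a == 1.0:
        degree = "perfect"
    elif a > 0.60:
        degree = "high (strong)"
    elif a >= 0.40:
        degree = "moderate (mild)"
    else:
        degree = "low (weak)"
    return direction, degree

print(describe_r(0.75))   # ('positive', 'high (strong)')
print(describe_r(-0.45))  # ('negative', 'moderate (mild)')
```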
Commonly Used Types
1. Pearson Product Moment correlation coefficient (r). This is applicable
when both variables are continuous in nature. It uses interval and ratio scale
data. For example the relationship between test scores and age of students.
2. Spearman’s rank correlation coefficient (ρ). This is suitable for ranked
variables, or continuous variables converted to ranks. It uses ordinal scale
data. For example, ranks in terms of school attendance and age.
3. Kendall’s tau-b. This is applicable to ranked variables with sample
sizes less than 10, with an adjustment for ties.
4. Kendall’s tau-c. This is applicable to ranked variables with sample
sizes less than 10, with no adjustment for ties.
5. Phi coefficient (φ). This is used when both variables are natural
dichotomies. It is also applicable for nominal data. For example the
relationship between gender and political party affiliation.
6. Contingency Coefficient (C). It is also applicable for nominal data. This is used
when at least one variable has more than two categories. For example, the
relationship between region of birth in Ghana and political party affiliation.
7. Point biserial correlation coefficient (rpb). This is applicable when one
variable is continuous and the other is a natural dichotomy. It combines
nominal scale data with either interval or ratio scale data. For example the
relationship between gender and test scores.
8. Biserial (rb). This is applicable when one variable is continuous and the
other is an artificial dichotomy. It combines ‘artificial’ nominal scale data
with either interval or ratio scale data. For example, the relationship between
age, categorised into young and adult, and test scores.
9. Eta/correlation ratio (η). This is used to detect nonlinear
relationships. It is also used to find the relationship between
two variables where one is nominal and the other interval scale.
10. Kappa. Cohen’s kappa is a measure of inter-rater agreement.
It measures the agreement between two raters.
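As a sketch, Cohen's kappa can be computed in a few lines; the two raters' labels below are hypothetical:

```python
from collections import Counter

# Cohen's kappa: observed agreement corrected for the agreement
# expected by chance from each rater's marginal frequencies.
def cohen_kappa(r1, r2):
    n = len(r1)
    p_observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    categories = set(r1) | set(r2)
    p_expected = sum((c1[k] / n) * (c2[k] / n) for k in categories)
    return (p_observed - p_expected) / (1 - p_expected)

rater1 = ["A", "A", "B", "B", "A", "B"]  # hypothetical ratings
rater2 = ["A", "B", "B", "B", "A", "B"]
print(round(cohen_kappa(rater1, rater2), 3))  # 0.667
```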
Causation and correlation
The presence of a correlation between two variables does
not necessarily mean that there exists a causal relationship
between the two variables.
A very strong or high relationship between two variables
does not imply that one causes the other.
No cause and effect relationship is determined purely by
correlation coefficients.
Spearman’s Rank-Order
1. Click Analyze.
2. Click Correlate.
3. Click Bivariate.
4. Move variables from the box on the left to the
‘Variables’ box on the right.
5. Click the ‘Spearman’ box below.
6. Click the ‘Two-tailed’ circle in the Test of
Significance box.
7. Click ‘Flag significant correlations’ box.
8. Click OK
Let us work some SPSS examples
College A: English vs Education
College C: Maths vs English
Rank A vs Rank B
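Outside SPSS, Spearman's rho can be sketched with the difference-of-ranks formula (valid when there are no tied ranks); the two rank lists below are hypothetical:

```python
# Spearman's rho = 1 - 6*sum(d^2) / (n(n^2 - 1)), where d is the
# difference between the two ranks of each case (no-ties formula).
def spearman_rho(rank_a, rank_b):
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - (6 * d2) / (n * (n ** 2 - 1))

rank_a = [1, 2, 3, 4, 5]  # hypothetical Rank A
rank_b = [2, 1, 4, 3, 5]  # hypothetical Rank B
print(spearman_rho(rank_a, rank_b))  # 0.8
```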
The Phi Coefficient (φ)
For nominal scale variables (2 × 2 tables):

φ = √(χ² / n)

The Contingency Coefficient (C)
For nominal scale variables:

C = √(χ² / (χ² + n))
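As a sketch, χ² can be computed from a hypothetical 2 × 2 table and plugged into φ = √(χ²/n) and C = √(χ²/(χ² + n)):

```python
import math

# Pearson chi-square from an observed contingency table: sum of
# (observed - expected)^2 / expected over all cells.
def chi_square(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

table = [[10, 20],   # hypothetical counts, e.g. male: party A, party B
         [30, 40]]   # female: party A, party B
chi2, n = chi_square(table)
phi = math.sqrt(chi2 / n)
C = math.sqrt(chi2 / (chi2 + n))
print(round(chi2, 3), round(phi, 3), round(C, 3))
```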
Obtaining Chi-square values and correlations
1. Click Analyze.
2. Click Descriptive Statistics.
3. Click Crosstabs.
4. Click variable 1 in the crosstabs window and click ► to
move it to the box labelled Row(s).
5. Click variable 2 in the crosstabs window and click ► to
move it to the box labelled Column(s).
6. Click Statistics.
7. Click Chi-square, Phi, Cramer’s V, and Contingency
Coefficient in the Nominal Data box.
8. Click Continue.
9. Click OK.
Let us work some examples
Obtaining other correlations
1. Click Analyze.
2. Click Descriptive Statistics.
3. Click Crosstabs.
4. Click variable 1 in the crosstabs window and click ► to
move it to the box labelled Row(s).
5. Click variable 2 in the crosstabs window and click ► to
move it to the box labelled Column(s).
6. Click Statistics and Click Correlations.
7. Click Gamma, Kendall’s tau-b, Kendall’s tau-c in the Ordinal
box, Eta in the Nominal by Interval box, and Kappa.
8. Click Continue.
9. Click OK.
Uses of correlation in education
1. It is useful for selection and placement. For example, if
mathematics scores relate well with scores in chemistry,
then mathematics scores can be used for selection into a
chemistry class without conducting a chemistry selection
examination.
2. It is used to determine the reliability of standardized and
classroom tests. The Spearman-Brown split-half method
uses correlation coefficients.
3. It aids in the provision of evidence for the validity of
assessment instruments. Construct and criterion-related
validity evidence is obtained through the computation of
the correlation between two variables.
4. It puts the teacher in a position to predict the future
performance of a student. An established relationship
between two subjects is often used as the basis for
predicting performance, but not with 100% certainty. For
example, if those with aggregate 6 from WASSCE have been
found in the University of Cape Coast to be obtaining First
Class degrees, then it can be predicted that anyone with
WASSCE aggregate 6 would do well in the University.
5. It is useful for research purposes. A study of the
relationship between study habits and the academic
performance of students in the University of Cape Coast
would use correlations.
Interpreting the Correlation Coefficient (r)
A coefficient of correlation must always be judged with
regard to:
1. The nature of the variables with which we are dealing
2. The significance of the coefficient
Statistically significant or not significant
3. The variability of the group
Homogeneous or heterogeneous
4. Reliability coefficients of the instrument
High or Low
5. The purpose for which the coefficient was computed
Prediction or Exploratory
Amount of Variability in X or Y
Other things being equal, the value of r will be greater if there is
more variability among the observations than if there is less
variability. This characteristic of r is often termed
range restriction, restriction of range, or truncated range.
The Shapes of the Distributions of X and Y
The correlation can achieve its maximum value of 1.0 (positive
or negative) only if the shapes of the distributions of X and Y are
the same. The more dissimilar the shapes, the lower the
maximum value of the correlation.
Lack of Linearity
The correlation measures the extent and direction of the
linear relationship between X and Y. If the actual relationship
between X and Y is not linear—rather, if it is a curvilinear or
nonlinear relationship—the value of r will be very low and
might even be zero.
Presence of One or More Outliers
An outlier can be defined as a score or case that is so low or so
high that it stands apart from the rest of the data. Reasons for
outliers include data collection errors, data entry errors, the
fact that a valid but extreme value occurred, inadvertent inclusion
of an observation from a different population, or a subject not
understanding the instructions or wording of items on a
questionnaire.
Measurement Error
Measurement error, which decreases the reliability of the
measures of the variables, can be attributed to a variety of
sources: intra-individual factors (fatigue, anxiety, guessing,
etc.), administrative factors, scoring errors, environmental
factors, ambiguity of questions, and too few questions.
Standard error of estimate
This value is called the standard error of estimate, and is
the standard deviation of the errors of prediction about the
regression line. The smaller this value is, the better the
prediction.

s_est = s_Y √(1 − r²)
Index of forecasting efficiency
It is used to provide a quick estimate of the predictive
efficiency of an obtained r. E is also called the
coefficient of dependability. It indicates directly the
percentage of reduction in the errors of prediction for a
given correlation coefficient.

E = 100(1 − √(1 − r²))
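Both formulas can be sketched with hypothetical values (r = 0.8 and a criterion standard deviation of 10):

```python
import math

# Standard error of estimate: SD of prediction errors about the
# regression line, given the criterion SD s_y and the correlation r.
def standard_error_of_estimate(s_y, r):
    return s_y * math.sqrt(1 - r ** 2)

# Index of forecasting efficiency: percentage reduction in
# prediction errors for a given r.
def forecasting_efficiency(r):
    return 100 * (1 - math.sqrt(1 - r ** 2))

print(round(standard_error_of_estimate(10, 0.8), 2))  # 6.0
print(round(forecasting_efficiency(0.8), 2))          # 40.0
```

Note that even an r of 0.8 reduces prediction errors by only 40%.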
Coefficient of determination
It is the square of the correlation coefficient.
It is the proportion of the variance in Y accounted for by
X. An r of 0.71 gives an r² of about 0.50.
This means that 50% of the variance in Y is associated
with variability in X.
For example, if the correlation between class attendance
and performance in Statistics is 0.8, then class
attendance explains 64% of the variation in the scores in
performance in Statistics.
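The worked example above, in a couple of lines:

```python
# Class attendance vs Statistics performance from the example above:
# r = 0.8, so r-squared = 0.64 (64% of the variance explained).
r = 0.8
r_squared = r ** 2
print(f"{r_squared:.2f} -> {r_squared:.0%} of the variance explained")
```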
Correction for Restriction in Range
Here the sample is the result of curtailment in the
range of scores, as in admissions, resulting in a lower
correlation coefficient because the range of scores
narrows.
The formula for adjusting the r of the curtailed
distribution is

r_adj = r(S/s) / √(1 − r² + r²(S²/s²))

where s is the standard deviation of the curtailed group
and S is the standard deviation of the unrestricted group.
Example
Suppose that the correlation between an aptitude test (X)
and the first year results (Y) of a class of students is 0.40.
Suppose that the standard deviation of the class is 4.0
and the standard deviation of the larger group (all
students who took the test) is 10.0.
The adjusted r becomes

r_adj = 0.40(10/4) / √(1 − 0.16 + 0.16(100/16)) = 1.0 / √1.84 ≈ 0.74
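The same correction, sketched in code with the slide's numbers (r = 0.40, curtailed SD 4.0, unrestricted SD 10.0):

```python
import math

# Range-restriction correction: r is the correlation in the
# curtailed group, s its SD, and S the SD of the unrestricted group.
def correct_restriction(r, s, S):
    k = S / s
    return (r * k) / math.sqrt(1 - r ** 2 + r ** 2 * k ** 2)

print(round(correct_restriction(0.40, 4.0, 10.0), 2))  # 0.74
```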
Correction for Attenuation
(effect of unreliability – measurement errors)

r_corrected = r_xy / √(r_xx · r_yy)

where
r_xy is the correlation between x and y
r_xx is the reliability coefficient for variable x
r_yy is the reliability coefficient for variable y
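A sketch of the correction, with hypothetical reliability values:

```python
import math

# Attenuation correction: divide the observed correlation by the
# square root of the product of the two reliability coefficients.
def correct_attenuation(r_xy, r_xx, r_yy):
    return r_xy / math.sqrt(r_xx * r_yy)

# Observed r of 0.60 with reliabilities of 0.80 and 0.90:
print(round(correct_attenuation(0.60, 0.80, 0.90), 3))  # 0.707
```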
Partial Correlation
This provides a measure of the correlation between two variables
with the effect of a third variable (or any additional variables)
on both variables eliminated, removed or nullified.
For example, the correlation between height and weight of boys
in a group where age is permitted to vary would be higher than
the correlation between height and weight in a group at
constant age.
The reason is that because certain boys are older, they are both
heavier and taller. Age is a factor that enhances the strength of
the relationship between height and weight.
The partial correlation coefficient measures the degree of the
relationship between height and weight when the effect of age
on both variables is removed.
Let X1, X2 and X3 be three variables.
All or part of the correlation between X1 and X2 may result
because both are correlated with X3.
A score on X1 may be divided into two parts. One part is a score
predicted from X3 and the other part is the residual, or error of
estimate, in predicting X1 from X3.
Similarly a score on X2 may be divided into two parts. One part is
a score predicted from X3 and the other part is the residual, or
error of estimate, in predicting X2 from X3.
The correlation between the two sets of residuals, that is, the
errors of estimate in predicting X1 from X3 and in predicting X2
from X3, is the partial correlation coefficient.
It is the part of the correlation which remains when the effect
of the third variable is eliminated or removed.
r12.3 = (r12 − r13 r23) / √((1 − r13²)(1 − r23²))

The result is a first-order partial correlation coefficient
because only one variable is held constant. When two variables
are held constant, the result is a second-order partial
correlation coefficient.
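The first-order formula can be sketched as follows; the three zero-order correlations below are hypothetical:

```python
import math

# First-order partial correlation r12.3: the correlation between
# X1 and X2 with the effect of X3 on both removed.
def partial_r(r12, r13, r23):
    numerator = r12 - r13 * r23
    denominator = math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    return numerator / denominator

# e.g. height-weight r12 = 0.70, with age correlating 0.50 and 0.60
# with height and weight respectively:
print(round(partial_r(0.70, 0.50, 0.60), 3))  # 0.577
```

Removing age lowers the height-weight correlation here, consistent with age enhancing the relationship.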
SPSS procedure for partial correlation
1. Ensure that data is available in the SPSS Editor.
2. Click Analyze.
3. Click Correlate.
4. Click Partial
5. Click variable 1 in the left window and click ► to move it to
the box labelled Variables.
6. Click variable 2 in the left window and click ► to move it to
the box labelled Variables.
7. Click variable 3 (and 4) (the control variable(s)) in the left
window and click ► to move it to the box labelled Controlling for.
8. Click Two-tailed under Tests of Significance, if it has not
been selected.
9. Click OK.
1. Obtain the scatter plots for College A scores for
English and Maths, English and Education, and
Maths and Education.
2. Obtain the Pearson Product Moment
Correlation coefficient for College A: English and
Maths, English and Education, and Maths and
Education.
3. Obtain the Pearson Product Moment
Correlation coefficient for College A: Maths and
Education, with English partialled out.