Univariate Analysis

Univariate Analysis: Simple Tools for Description

Description of Variables
- Univariate analysis refers to the analysis of one variable
- Several statistical measures can be employed to describe data
  - They allow for comparison across variables measured in different units
  - They provide parsimony: one or two statistics can help us understand a large number of cases
POLI 399/691 - Fall 2008 Topic 6

Basic descriptive tools
- Proportion: the share of cases relative to the whole population; it ranges from 0 to 1
  - E.g. if there are 50 women in a sample of 125, the proportion of women is 50/125 = 0.40
- Percentage: the proportion multiplied by 100
  - E.g. if the proportion is 0.40, the percentage is 0.40 x 100 = 40%
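The arithmetic on this slide can be sketched in a few lines of Python. The function names `proportion` and `percentage` are illustrative choices, not part of the course material; the numbers are the slide's own example.

```python
def proportion(count, total):
    """Share of cases relative to the whole; ranges from 0 to 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


def percentage(count, total):
    """The proportion multiplied by 100."""
    return proportion(count, total) * 100


# Slide example: 50 women in a sample of 125.
women, sample_size = 50, 125
p = proportion(women, sample_size)
pct = percentage(women, sample_size)
print(f"proportion = {p:.2f}, percentage = {pct:.0f}%")
```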

- Percentage change allows us to calculate the relative change in a variable over some period of time
- Percentage change is: $\frac{\text{Time 2} - \text{Time 1}}{\text{Time 1}} \times 100$
- E.g. in 1993 women made up 48% of the population and in 2003 this percentage had risen to 51%. What is the percentage change from 1993 to 2003?
  - ((51 - 48)/48) x 100 = (3/48) x 100 = 6.25% (it is not 3%)
- Percentage point difference is the absolute change between the percentage at time 1 and the percentage at time 2
  - Using the same example, the percentage point difference in the share of women in the population between 1993 and 2003 is 3 percentage points ($X_2 - X_1$) (it is not 3%)
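To make the distinction concrete, here is a minimal sketch that computes both quantities for the 1993 and 2003 figures above. The function names are illustrative assumptions.

```python
def percentage_change(time1, time2):
    """Relative change from time 1 to time 2, expressed as a percentage."""
    if time1 == 0:
        raise ValueError("time1 must be non-zero")
    return (time2 - time1) / time1 * 100


def percentage_point_difference(pct1, pct2):
    """Absolute change between two percentages, in percentage points."""
    return pct2 - pct1


# Slide example: women were 48% of the population in 1993 and 51% in 2003.
change = percentage_change(48, 51)
points = percentage_point_difference(48, 51)
print(f"percentage change: {change:.2f}%")
print(f"percentage point difference: {points} points")
```

The two results answer different questions: the first is relative to the starting value, the second is a simple subtraction.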

Frequency Table
- The frequency table (or frequency distribution) is commonly used to provide a "snapshot" of a variable
- Made up of 4 columns:
  - Values (categories) of the variable
  - The number of cases
  - The percentage of cases
  - The cumulative percentage of cases
- Consider collapsing categories if the variable has a large number of values/categories

Table 1: Frequency Table of Grouped Data - Ages of Respondents

| Age Group | Frequency | Percentage | Cumulative Percentage |
|---|---|---|---|
| 18-24 | 36 | 15.0 | 15.0 |
| 25-34 | 44 | 18.3 | 33.3 |
| 35-44 | 43 | 17.9 | 51.2 |
| 45-54 | 46 | 19.2 | 70.4 |
| 55-64 | 34 | 14.2 | 84.6 |
| 65 and over | 37 | 15.4 | 100.0 |
| Total | 240 | 100.0% | 100.0% |

Source: Hypothetical Data, 2005.
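The percentage and cumulative percentage columns can be derived from the frequencies alone. The sketch below rebuilds them from the counts in Table 1; the helper name `frequency_table` is an assumption for illustration. Because it accumulates unrounded percentages, a displayed value may differ from the table in the last digit.

```python
from itertools import accumulate

# Frequencies copied from Table 1 (hypothetical data in the source).
frequencies = {
    "18-24": 36,
    "25-34": 44,
    "35-44": 43,
    "45-54": 46,
    "55-64": 34,
    "65 and over": 37,
}


def frequency_table(freqs):
    """Return rows of (category, count, percentage, cumulative percentage)."""
    total = sum(freqs.values())
    percentages = [count / total * 100 for count in freqs.values()]
    cumulative = accumulate(percentages)  # running total of the percentages
    return [
        (category, count, pct, cum)
        for (category, count), pct, cum in zip(freqs.items(), percentages, cumulative)
    ]


rows = frequency_table(frequencies)
for category, count, pct, cum in rows:
    print(f"{category:<12}{count:>5}{pct:>8.1f}{cum:>8.1f}")
```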

Bar charts, pie charts and line graphs
- Bar charts or pie charts are good for showing the variation in the percentage of cases for each value of a variable
  - Pie charts compare parts to the whole
  - Bar charts compare categories/values
- Line charts are good for longitudinal data
  - They reveal trends over time

Figure 1: Federal Expenditures by Sector
Source: Hypothetical Data, 2006

Figure 2: Federal Expenditures by Sector
Source: Hypothetical Data, 2006

Source: O’Neill and Stewart, “Gender and Political Party Leadership in Canada,” Party Politics, forthcoming.

Table 8: Political Participation by Volunteer Type

| Activity | Religious Volunteers | All Other Volunteers | Non-Volunteers |
|---|---|---|---|
| Voted in last federal election | 83.7 | 80.8 | 71.6 |
| Voted in last provincial election | 82.6 | 79.2 | 70.6 |
| Voted in last municipal election | 72.8 | 67.4 | 58.0 |
| Follow news or current affairs daily | 70.2 | 66.8 | 65.7 |

N (over 18 only): (509) 537 (1603) 1745 (5346)

Note: Entries are the percentage of respondents who reported engaging in the activity. All differences across the three groups are statistically significant (p < .01). Differences between religious and other volunteers in reported municipal voting are statistically significant (p < .05).

Source: Brenda O’Neill, “Canadian Women’s Religious Volunteerism: Compassion, Connections and Comparisons” in B. O’Neill and E. Gidengil, Gender and Social Capital, New York: Routledge, 2006.

Checklist for Charts and Tables
- Have you chosen the proper type of chart?
- Have you provided a clear, descriptive title? (Note the difference between "Table" and "Figure")
- Is the data source noted in a footnote?
- Are statistical tests reported in a footnote?
- For bivariate tables, is the dependent variable on the vertical axis? The independent on the horizontal?
- Are the axes properly labelled?
- Will colour choices matter if printed in black and white?
- Have you provided values in bar/pie charts?
- Does the length of the axes distort the result?
- Have you referred to and explained the table/chart in the text?

Measures of Central Tendency
- Measures of central tendency allow us to speak of some "standard" case for all the cases in the sample or population
  - What is the most common unit?
  - Is there some pattern in the data?
- Three different measures: mean, median and mode
  - Nominal data? Use mode
  - Ordinal data? Use mode and/or median
  - Interval data? Use mode, median and/or mean
- The mean provides the most information; the mode, the least
- Always use the statistic that provides the most information; the goal is parsimony

Mode
- For nominal data, the mode is the measure of the "standard" or "most common" case
- The mode is simply the category of the variable that occurs most often (i.e. has the most cases)
- The mode is the "best guess" for nominal data
- The utility of this statistic is limited
  - It can change dramatically with the addition of a few cases (not very stable)
  - It tells us about the most common value but little else
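A short sketch of finding the mode, and of how unstable it can be. The region data and the helper name `mode_of` are hypothetical, invented only for illustration.

```python
from collections import Counter

# Hypothetical nominal data: home region reported by ten respondents.
regions = ["West", "Ontario", "Quebec", "Ontario", "Atlantic",
           "Ontario", "West", "Quebec", "Ontario", "West"]


def mode_of(values):
    """Return (most common category, number of cases in that category)."""
    return Counter(values).most_common(1)[0]


mode_value, mode_count = mode_of(regions)
print(mode_value, mode_count)

# Adding just two cases changes the mode, which is why it is not very stable.
mode_after, _ = mode_of(regions + ["West", "West"])
print(mode_after)
```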

Figure 1: Federal Expenditures by Sector (annotated: the mode is Social Expenditures)
Source: Hypothetical Data, 2006

Median
- Use with ordinal data
- Indicates the middle case in an ordered set of cases: the midpoint
- To determine the median, order the data from lowest to highest; the median is the value of the middle case
- Even number of cases? Take the average of the two middle values (add them together and divide by 2)
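The procedure above can be sketched as follows. The response data are hypothetical (answers on a 1 to 5 agreement scale), and the function is written out by hand to mirror the slide rather than relying on a library call.

```python
def median(values):
    """Middle value of the ordered cases; average of the two middle values if even."""
    ordered = sorted(values)  # order the data from lowest to highest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical ordinal data: responses on a 1-5 agreement scale.
odd_responses = [2, 4, 3, 5, 1, 3, 4]      # 7 cases
even_responses = odd_responses + [5]        # 8 cases
print(median(odd_responses))
print(median(even_responses))
```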

Mean
- The mean describes the centre of gravity of interval data
  - Commonly called the average
- Easily allows one to locate a case relative to all others
  - Where is a case located in relation to all the others? Above average? Below average?
- To calculate: $\bar{X} = \frac{\sum X_i}{n} = \frac{X_1 + X_2 + \dots + X_n}{n}$, where $n$ is the number of cases
- Reliable but sensitive to outliers (cases that are much larger or much smaller than the rest)
  - The median provides a better sense of the most common case when there are outliers

Example: Income data
Income for 10 cases: $24,000; $25,000; $28,000; $30,000; $35,000; $38,000; $56,000; $75,000; $86,000; $10,000,000
- For these data, the mean is $1,039,700 and the median is $36,500
  - The median falls between the fifth and sixth cases ($35,000 and $38,000); the mean falls between the ninth and tenth cases ($86,000 and $10,000,000)
- We call a distribution with outliers a skewed distribution
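The figures in this example can be checked with Python's standard `statistics` module. The comparison with the outlier removed is an added illustration, not part of the original slide.

```python
import statistics

# The ten incomes from the slide.
incomes = [24_000, 25_000, 28_000, 30_000, 35_000,
           38_000, 56_000, 75_000, 86_000, 10_000_000]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
print(f"mean   = ${mean_income:,.0f}")
print(f"median = ${median_income:,.0f}")

# Added illustration: drop the single outlier and recompute.
without_outlier = [x for x in incomes if x < 1_000_000]
mean_without = statistics.mean(without_outlier)
median_without = statistics.median(without_outlier)
print(f"mean without outlier   = ${mean_without:,.0f}")
print(f"median without outlier = ${median_without:,.0f}")
```

Removing one case moves the mean by almost a million dollars but barely moves the median, which is the sensitivity to outliers described above.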

Measures of Dispersion
- Once you know the standard case, you should also know how standard the case is: how well does this one case represent all the cases?
- For nominal data, there is no measure of dispersion; one could simply indicate how many categories exist
- For ordinal data, the range provides some information about the spread of the data
  - The range is simply the highest value minus the lowest value
  - When we have outliers, the range gives a distorted picture of the data
  - E.g. for our income data, the range is $10,000,000 - $24,000 = $9,976,000
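A minimal sketch of the range calculation for the income data. The helper name `value_range` is an assumption, and the second calculation (dropping the largest case) is an added illustration of how much one outlier distorts the range.

```python
# The ten incomes from the income example.
incomes = [24_000, 25_000, 28_000, 30_000, 35_000,
           38_000, 56_000, 75_000, 86_000, 10_000_000]


def value_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)


full_range = value_range(incomes)
# Added illustration: the range once the single largest case is dropped.
trimmed_range = value_range(sorted(incomes)[:-1])
print(f"range with outlier    = ${full_range:,}")
print(f"range without outlier = ${trimmed_range:,}")
```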

- For interval data, we use the standard deviation
  - A measure of the average deviation of a case from the mean value
- A deviation is the distance and direction of any raw score from the mean
  - The larger the deviation, the further the score is from the mean
  - The deviation can be either positive or negative (larger or smaller than the mean value)
- The mean is the value where the sum of the negative deviations equals the sum of the positive deviations
- We want to calculate the average size of these deviations, but we need to 'fix' the problem of the deviations summing to 0
  - To fix the problem, we square each deviation before we sum them, divide the sum by the number of cases (N - 1 for a sample), and then take the square root of the result

Formula for standard deviation: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N - 1}}$
Note: N - 1 is employed for a sample

To calculate the standard deviation:
1. Calculate the mean
2. Subtract the mean from each value (these are the deviations)
3. Square each of the deviations
4. Sum them (add them together)
5. Divide this sum by the number of cases, or by N - 1 for a sample (to get the average squared deviation)
6. Compute the square root of the average squared deviation

Table 8.10: Computation of Standard Deviation, Beth’s Grades

| Subject | Grade | Grade - Mean | (Grade - Mean) squared |
|---|---|---|---|
| Sociology | 66 | 66 - 82 = -16 | 256 |
| Psychology | 72 | 72 - 82 = -10 | 100 |
| Political science | 88 | 88 - 82 = 6 | 36 |
| Anthropology | 90 | 90 - 82 = 8 | 64 |
| Philosophy | 94 | 94 - 82 = 12 | 144 |
| Mean | 82.0 | Total | 600 |

Note: The “N - 1” term is used when sampling procedures have been used. When population values are used the denominator is “N.” SPSS uses N - 1 in calculating the standard deviation in the DESCRIPTIVES procedure.
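The computation in Table 8.10 can be followed step by step in code. This is a sketch using the grades from the table; the variable names are illustrative, and both denominators (N - 1 and N) are shown because the note distinguishes them.

```python
import math

# Beth's grades from Table 8.10.
grades = {
    "Sociology": 66,
    "Psychology": 72,
    "Political science": 88,
    "Anthropology": 90,
    "Philosophy": 94,
}

values = list(grades.values())
n = len(values)

mean = sum(values) / n                       # step 1: the mean
deviations = [x - mean for x in values]      # step 2: value minus mean
squared = [d ** 2 for d in deviations]       # step 3: square each deviation
sum_of_squares = sum(squared)                # step 4: sum them

sample_sd = math.sqrt(sum_of_squares / (n - 1))   # steps 5-6 with N - 1 (sample)
population_sd = math.sqrt(sum_of_squares / n)     # steps 5-6 with N (population)

print(f"mean = {mean}, sum of squared deviations = {sum_of_squares}")
print(f"sample SD = {sample_sd:.2f}, population SD = {population_sd:.2f}")
```

With only five cases the choice of denominator matters noticeably; the gap between the two results shrinks as the number of cases grows.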

- The result is always a positive number, but you can think of the average deviation as occurring either positively or negatively
- The last measure to review is the variance
  - Variance is simply the square of the standard deviation
- Variance and standard deviation are easily calculated by software programs
  - It is good to calculate them on your own for small samples to get a "feel" for the statistics
- These are two statistics that will be used again for other calculations
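As one example of letting software do the work, Python's standard `statistics` module provides both measures. The sketch reuses the five grades from the standard deviation table; `stdev` and `variance` use the N - 1 (sample) denominator, while `pstdev` and `pvariance` use N.

```python
import statistics

# Beth's five grades from the standard deviation table.
grades = [66, 72, 88, 90, 94]

sample_sd = statistics.stdev(grades)            # N - 1 denominator
sample_variance = statistics.variance(grades)   # N - 1 denominator
population_sd = statistics.pstdev(grades)       # N denominator
population_variance = statistics.pvariance(grades)

print(f"sample:     SD = {sample_sd:.2f}, variance = {sample_variance}")
print(f"population: SD = {population_sd:.2f}, variance = {population_variance}")

# The variance is the square of the standard deviation (up to rounding error).
print(abs(sample_sd ** 2 - sample_variance) < 1e-9)
```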

- The smaller the standard deviation, the tighter the cases are around the mean
  - The mean is a "better" predictor of scores when the standard deviation is small
- Like the mean, the standard deviation is also sensitive to outliers
- Describing data effectively requires information on both the mean and the standard deviation

Statistics and SPSS

| Statistic | Nominal | Ordinal | Interval |
|---|---|---|---|
| Central Tendency | Mode | Mode, Median | Mode, Median, Mean |
| Dispersion | -- | Range | Range, Standard Deviation, Variance |
| SPSS Commands (options) | Frequencies (mode) | Frequencies (range, median) | Descriptives (all) |

Source: Jackson and Verberg, p.222.

Z Scores (or standardized scores)
- A Z score represents the distance from the mean, in standard deviation units, of any value in a distribution
- Z scores are comparable across different populations and different units because they are expressed in standard units
- The Z score formula is as follows: $Z = \frac{X - \bar{X}}{s}$, where $\bar{X}$ is the mean and $s$ is the standard deviation

- A negative Z score means the case falls below the mean; a positive one means it lies above the mean
  - A Z score of 0 means …?
  - The larger the score (in absolute value), the further the case is from the mean
- Useful when combining variables with very different ranges into indexes
  - Transform into Z scores and then create the index
- To obtain Z scores in SPSS:
  1. Select Analyze -> Descriptive Statistics -> Descriptives
  2. Select one or more variables
  3. Check “Save standardized values as variables” to save Z scores as new variables. They will be the last variables in the variable view screen
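Outside SPSS, the same transformation can be sketched in a few lines. The helper `z_scores` is an illustrative assumption; it uses the N - 1 standard deviation to match the SPSS convention noted earlier. The income and education values used for the index are hypothetical.

```python
import math


def z_scores(values):
    """Standardize values: distance from the mean in standard deviation units."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two cases")
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))  # N - 1, as in SPSS
    if sd == 0:
        raise ValueError("standard deviation is zero; Z scores are undefined")
    return [(x - mean) / sd for x in values]


# The five grades from the standard deviation table (mean 82).
grades = [66, 72, 88, 90, 94]
z = z_scores(grades)
for grade, score in zip(grades, z):
    print(f"{grade}: {score:+.2f}")

# Hypothetical index: two variables with very different ranges are
# standardized first, then averaged case by case.
income = [24_000, 30_000, 38_000, 56_000, 86_000]
education_years = [10, 12, 12, 16, 18]
index = [(a + b) / 2 for a, b in zip(z_scores(income), z_scores(education_years))]
```

Averaging the raw values instead would let income dominate the index simply because its numbers are larger; standardizing first puts both variables on the same footing.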

Key terms
Proportion, Percentage, Percentage change, Percentage point difference, Bar chart, Pie chart, Frequency table, Cumulative percentage, Mean, Median, Mode, Outlier, Skewed distribution, Measures of variation, Range, Standard deviation, Variance, Standardized (Z) scores
