Univariate Analysis: Simple Tools for Description
POLI 399/691 - Fall 2008, Topic 6

Description of Variables
  • Univariate analysis refers to the analysis of one variable
  • Several statistical measures can be employed to describe data
      • They allow for comparison across variables measured in different units
      • They provide parsimony: one or two statistics can help us understand a large number of cases

Basic descriptive tools
  • Proportion: the share of cases relative to the whole population; it ranges from 0 to 1
      • E.g. if there are 50 women in a sample of 125, then the proportion of women is 50/125 = 0.4
  • Percentage: the proportion multiplied by 100
      • E.g. if the proportion is 0.40, then the percentage is 0.40 × 100 = 40%

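
The two definitions above can be sketched in a few lines of Python. The function names `proportion` and `percentage` are illustrative choices, not part of the original slides; the numbers are the slide's own example.

```python
def proportion(count: int, total: int) -> float:
    """Share of cases in a category relative to all cases (between 0 and 1)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


def percentage(count: int, total: int) -> float:
    """The same share expressed out of 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * count / total


# Slide example: 50 women in a sample of 125.
women, sample_size = 50, 125
print(proportion(women, sample_size))  # 0.4
print(percentage(women, sample_size))  # 40.0
```
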
  • Percentage change allows us to calculate the relative change in a variable over some period of time
      • Percentage change = $\frac{\text{Time 2} - \text{Time 1}}{\text{Time 1}} \times 100$
      • E.g. in 1993 women made up 48% of the population and in 2003 this percentage had risen to 51%. What is the percentage change from 1993 to 2003?
      • ((51 - 48)/48) × 100 = (3/48) × 100 = 6.25% (it is not 3%)
  • Percentage point difference is the absolute change between the percentage at time 1 and the percentage at time 2
      • Using the same example, the percentage point difference in the share of women in the population between 1993 and 2003 is $X_2 - X_1$ = 3 percentage points (it is not 3%)

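
The difference between the two measures is easy to see side by side. The sketch below uses the slide's 48% and 51% figures; the function names are hypothetical.

```python
def percentage_change(time1: float, time2: float) -> float:
    """Relative change from time 1 to time 2, as a percentage of the time 1 value."""
    if time1 == 0:
        raise ValueError("time1 must be non-zero")
    return (time2 - time1) / time1 * 100


def percentage_point_difference(pct1: float, pct2: float) -> float:
    """Absolute change between two percentages, in percentage points."""
    return pct2 - pct1


# Women's share of the population: 48% in 1993, 51% in 2003.
print(percentage_change(48, 51))            # 6.25
print(percentage_point_difference(48, 51))  # 3
```
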
Frequency Table
  • The frequency table (or frequency distribution) is commonly used to provide a “snapshot” of a variable
  • Made up of 4 columns:
      • Values (categories) of the variable
      • The number of cases
      • The percentage of cases
      • The cumulative percentage of cases
  • Consider collapsing categories if the variable has a large number of values/categories

Table 1: Frequency Table of Grouped Data: Ages of Respondents

| Age Group | Frequency | Percentage | Cumulative Percentage |
|---|---|---|---|
| 18-24 | 36 | 15.0 | 15.0 |
| 25-34 | 44 | 18.3 | 33.3 |
| 35-44 | 43 | 17.9 | 51.2 |
| 45-54 | 46 | 19.2 | 70.4 |
| 55-64 | 34 | 14.2 | 84.6 |
| 65 and over | 37 | 15.4 | 100.0 |
| Total | 240 | 100.0% | 100.0% |

Source: Hypothetical Data, 2005.

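
As a rough illustration, the percentage and cumulative percentage columns of Table 1 can be derived from the frequencies alone. The helper `frequency_table` and the dictionary name `age_counts` are assumptions made for this sketch; the counts are those in the table.

```python
# Frequencies from Table 1 (hypothetical data in the original slides).
age_counts = {
    "18-24": 36,
    "25-34": 44,
    "35-44": 43,
    "45-54": 46,
    "55-64": 34,
    "65 and over": 37,
}


def frequency_table(counts):
    """Return (category, frequency, percentage, cumulative percentage) rows."""
    total = sum(counts.values())
    rows = []
    running = 0  # cumulative number of cases so far
    for category, n in counts.items():
        running += n
        rows.append((category, n, 100 * n / total, 100 * running / total))
    return rows


for category, n, pct, cum_pct in frequency_table(age_counts):
    print(f"{category:<12}{n:>5}{pct:>8.1f}{cum_pct:>8.1f}")
```

The cumulative column is computed from the running count rather than by adding rounded percentages, which avoids accumulating rounding error.
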
Bar charts, pie charts and line graphs
  • Bar charts or pie charts are good for showing the variation in the percentage of cases for each value of a variable
      • Pie chart: compare parts to the whole
      • Bar graph: compare categories/values
  • Line chart is good for longitudinal data
      • Reveals trends over time

Figure 1: Federal Expenditures by Sector [figure not reproduced]. Source: Hypothetical Data, 2006.

Figure 2: Federal Expenditures by Sector [figure not reproduced]. Source: Hypothetical Data, 2006.

[Figure not reproduced.] Source: O’Neill and Stewart, “Gender and Political Party Leadership in Canada,” Party Politics, forthcoming.

Table 8: Political Participation by Volunteer Type

| | Religious Volunteers | All Other Volunteers | Non-Volunteers |
|---|---|---|---|
| Voted in last federal election | 83.7 | 80.8 | 71.6 |
| Voted in last provincial election | 82.6 | 79.2 | 70.6 |
| Voted in last municipal election | 72.8 | 67.4 | 58.0 |
| Follow news or current affairs daily | 70.2 | 66.8 | 65.7 |

N (over 18 only): (509) 537 (1603) 1745 (5346)

Note: Entries are percentage of respondents who reported engaging in said activity. All differences across the three groups are statistically significant (p < .01). Differences between religious and other volunteers in reported municipal voting are statistically significant (p < .05).

Source: Brenda O’Neill, “Canadian Women’s Religious Volunteerism: Compassion, Connections and Comparisons” in B. O’Neill and E. Gidengil, Gender and Social Capital, New York: Routledge, 2006.

Checklist for Charts and Tables
  • Have you chosen the proper type of chart?
  • Have you provided a clear, descriptive title? (Note the difference between “Table” and “Figure”)
  • Is the data source noted in a footnote?
  • Are statistical tests reported in a footnote?
  • For bivariate tables, is the dependent variable on the vertical axis? The independent on the horizontal?
  • Are the axes properly labelled?
  • Will colour choices matter if printed in black and white?
  • Have you provided values in bar/pie charts?
  • Does the length of the axes distort the result?
  • Have you referred to and explained the table/chart in the text?

Measures of Central Tendency
  • Measures of central tendency allow us to speak of some “standard” case for all the cases in the sample or population
      • What is the most common unit?
      • Is there some pattern in the data?
  • Three different measures: mean, median and mode
      • Nominal data? Use mode
      • Ordinal data? Use mode and/or median
      • Interval data? Use mode, median and/or mean
  • The mean provides the most information; the mode, the least
  • Always use the statistic that provides the most information; goal is parsimony

Mode
  • For nominal data, the mode is the measure of the “standard” or “most common” case
  • The mode is simply that category of the variable that occurs the most often (i.e. has the most cases)
  • The mode is the “best guess” for nominal data
  • The utility of this statistic is limited
      • Can change dramatically with the addition of a few cases (not very stable)
      • Tells us about the most common value but little else

Figure 1: Federal Expenditures by Sector [figure not reproduced]. Source: Hypothetical Data, 2006. In this figure, the mode is Social Expenditures.

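
A small sketch of finding the mode, and of how unstable it can be. The party labels below are invented for illustration and do not come from the slides; `modes` is a hypothetical helper that returns every category tied for the highest count.

```python
from collections import Counter


def modes(values):
    """Return the category (or categories, if tied) with the most cases."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, n in counts.items() if n == top]


# Hypothetical nominal data: party preference of seven respondents.
party = ["Liberal", "Conservative", "NDP", "Liberal",
         "Green", "Conservative", "Liberal"]
print(modes(party))  # ['Liberal']

# Adding just two cases changes the mode, which is why it is "not very stable".
print(modes(party + ["Conservative", "Conservative"]))  # ['Conservative']
```
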
Median
  • Use with ordinal data
  • Indicates the middle case in an ordered set of cases (the midpoint)
  • To determine the median, order the data from lowest to highest; the median is the value of the middle case
  • Even number of cases? Take the average of the two middle values (add them together and divide by 2)

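
The procedure above can be sketched as follows. The function name and the sample values are assumptions for illustration; averaging the two middle values presumes the categories are coded as numbers.

```python
def median(values):
    """Middle value of the ordered cases; mean of the two middle values if N is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd N: the single middle case
    return (ordered[mid - 1] + ordered[mid]) / 2  # even N: average the two middle cases


# Hypothetical ordinal codes (e.g. 1 = strongly disagree ... 5 = strongly agree).
print(median([2, 5, 3, 1, 4, 3, 2]))  # 3
print(median([4, 1, 3, 2]))           # 2.5
```
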
Mean
  • The mean describes the centre of gravity of interval data
  • Commonly called the average
  • Easily allows one to locate a case relative to all others
      • Where is a case located in relation to all the others? Above average? Below average?
  • To calculate: $\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \dots + X_N}{N}$, where $N$ = number of cases
  • Reliable but sensitive to outliers (cases that are much larger or much smaller than the rest)
  • The median provides a better sense of the typical case when there are outliers

Example: Income data
  • Income for 10 cases: $24,000; $25,000; $28,000; $30,000; $35,000; $38,000; $56,000; $75,000; $86,000; $10,000,000
  • For these data, the mean is $1,039,700 and the median is $36,500
  • The median falls between the fifth and sixth cases ($35,000 and $38,000); the mean falls between the ninth and tenth cases ($86,000 and $10,000,000)
  • We call a distribution with outliers a skewed distribution

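
The income example can be checked with Python's standard `statistics` module. This is only a sketch; the variable name `incomes` is an assumption, and the last line (the mean without the outlier) is an added illustration rather than something shown on the slide.

```python
import statistics

# The ten incomes from the slide, in ascending order.
incomes = [24_000, 25_000, 28_000, 30_000, 35_000,
           38_000, 56_000, 75_000, 86_000, 10_000_000]

print(f"mean   = ${statistics.mean(incomes):,.0f}")    # mean   = $1,039,700
print(f"median = ${statistics.median(incomes):,.0f}")  # median = $36,500

# Dropping the single outlier shows how strongly it pulls the mean upward.
print(f"mean without outlier = ${statistics.mean(incomes[:-1]):,.0f}")
```
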
Measures of Dispersion
  • Once you know the standard case, you should also know how standard the case is; that is, how well does this one case represent all the cases?
  • For nominal data, there is no measure of dispersion; one could simply indicate how many categories exist
  • For ordinal data, the range provides some information about the spread of data
      • The range is simply the highest value minus the lowest value
      • When we have outliers the range gives a distorted picture of the data
      • E.g. for our income data, the range is $10,000,000 - $24,000 = $9,976,000

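
A short sketch of the range and its sensitivity to outliers, reusing the income data from the earlier slide. The function name `value_range` is hypothetical.

```python
def value_range(values):
    """Highest value minus lowest value."""
    if not values:
        raise ValueError("range of empty data")
    return max(values) - min(values)


incomes = [24_000, 25_000, 28_000, 30_000, 35_000,
           38_000, 56_000, 75_000, 86_000, 10_000_000]

print(value_range(incomes))       # 9976000
print(value_range(incomes[:-1]))  # 62000, once the $10,000,000 case is left out
```
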
  • For interval data, we use the standard deviation
      • A measure of the average deviation of a case from the mean value
  • A deviation is the distance and direction of any raw score from the mean
      • The larger the deviation, the further the score from the mean
      • The deviation can be either positive or negative (larger or smaller than the mean value)
  • The mean is that value where the sum of negative deviations equals the sum of positive deviations
  • We want to calculate the average size of these deviations, but we need to “fix” the problem of the deviations summing to 0
      • To fix the problem, we square each deviation before we sum them, divide the sum by the number of cases, and then take the square root of the result

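Formula for standard deviation

$s = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}}$

where $X_i$ is the value of case $i$, $\bar{X}$ is the mean and $N$ is the number of cases.

Note: N - 1 is employed in the denominator for a sample.
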
To calculate the standard deviation:
  1. Calculate the mean
  2. Subtract the mean from each value (these are the deviations)
  3. Square each of the deviations
  4. Sum them (add them together)
  5. Divide this sum by the number of cases (to get the average squared deviation)
  6. Compute the square root of the average squared deviation

Table 8.10: Computation of Standard Deviation, Beth’s Grades

| Subject | Grade | Deviation from mean | Squared deviation |
|---|---|---|---|
| Sociology | 66 | 66 - 82 = -16 | 256 |
| Psychology | 72 | 72 - 82 = -10 | 100 |
| Political science | 88 | 88 - 82 = 6 | 36 |
| Anthropology | 90 | 90 - 82 = 8 | 64 |
| Philosophy | 94 | 94 - 82 = 12 | 144 |
| Mean | 82.0 | Total | 600 |

Note: The “N - 1” term is used when sampling procedures have been used. When population values are used the denominator is “N.” SPSS uses N - 1 in calculating the standard deviation in the DESCRIPTIVES procedure.

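
The six calculation steps can be traced on Beth's grades with a short script. This is a sketch rather than the course's own procedure; `standard_deviation` and its `sample` flag are hypothetical names, with `sample=True` dividing by N - 1 and `sample=False` dividing by N.

```python
import math

grades = {
    "Sociology": 66,
    "Psychology": 72,
    "Political science": 88,
    "Anthropology": 90,
    "Philosophy": 94,
}


def standard_deviation(values, sample=True):
    """Square root of the average squared deviation from the mean."""
    n = len(values)
    if n < 2 and sample:
        raise ValueError("a sample standard deviation needs at least two cases")
    mean = sum(values) / n                            # step 1
    deviations = [x - mean for x in values]           # step 2
    squared = [d ** 2 for d in deviations]            # step 3
    total = sum(squared)                              # step 4 (600 for these grades)
    average = total / (n - 1 if sample else n)        # step 5
    return math.sqrt(average)                         # step 6


values = list(grades.values())
print(round(standard_deviation(values, sample=True), 2))   # 12.25 (600 / 4 = 150)
print(round(standard_deviation(values, sample=False), 2))  # 10.95 (600 / 5 = 120)
```
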
  • The result is always a positive number, but you can think of the average deviation as occurring either positively or negatively
  • The last measure to review is the variance
      • Variance is simply the square of the standard deviation
  • Variance and standard deviation are easily calculated by software programs
      • Good to calculate them on your own for small samples to get a “feel” for the statistic
  • These are two statistics that will be used again for other calculations

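
The relationship between variance and standard deviation can be confirmed with the standard `statistics` module, here on Beth's grades from Table 8.10. `statistics.variance` and `statistics.stdev` use the N - 1 denominator; `statistics.pvariance` and `statistics.pstdev` use N.

```python
import statistics

grades = [66, 72, 88, 90, 94]

sample_var = statistics.variance(grades)   # 600 / (5 - 1) = 150
sample_sd = statistics.stdev(grades)       # square root of 150
population_var = statistics.pvariance(grades)  # 600 / 5 = 120

print(sample_var, round(sample_sd, 2))     # 150 12.25
print(population_var)                      # 120

# The variance is the square of the standard deviation (up to rounding error).
print(abs(sample_sd ** 2 - sample_var) < 1e-9)  # True
```
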
  • The smaller the standard deviation, the tighter the cases are around the mean
  • The mean is a “better” predictor of scores when the standard deviation is small
  • Like the mean, the standard deviation is also sensitive to outliers
  • Describing data effectively requires information on both the mean and the standard deviation

Statistics and SPSS

| Statistic | Nominal | Ordinal | Interval |
|---|---|---|---|
| Central Tendency | Mode | Mode, Median | Mode, Median, Mean |
| Dispersion | -- | Range | Range, Standard Deviation, Variance |
| SPSS Commands (options) | Frequencies (mode) | Frequencies (range, median) | Descriptives (all) |

Source: Jackson and Verberg, p. 222.

Z Scores (or standardized scores)
  • A Z score represents the distance from the mean, in standard deviation units, of any value in a distribution
  • Z scores are comparable across different populations and different units because they are offered in standard units
  • The Z score formula is as follows: $Z = \frac{X - \bar{X}}{s}$, where $X$ is the value of the case, $\bar{X}$ is the mean and $s$ is the standard deviation

  • A negative z-score means the case falls below the mean; a positive one means it lies above the mean
      • A z-score of 0 means ...?
      • The larger the score, the further from the mean
  • Useful when combining variables with very different ranges into indexes
      • Transform into Z scores and then create the index
  • To obtain Z scores in SPSS:
      • Select Analyze -> Descriptive Statistics -> Descriptives
      • Select one or more variables
      • Check “Save standardized values as variables” to save z scores as new variables. They will be the last variables in the variable view screen

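
Outside SPSS, the same transformation can be sketched in Python. The helper `z_scores` is a hypothetical name; it uses the N - 1 standard deviation, which the note to Table 8.10 says SPSS uses in Descriptives. The income and schooling figures in the index example are invented purely for illustration.

```python
import statistics


def z_scores(values):
    """Distance of each value from the mean, in standard deviation units."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # N - 1 denominator
    if sd == 0:
        raise ValueError("z scores are undefined when all values are equal")
    return [(x - mean) / sd for x in values]


# Beth's grades: values below the mean of 82 get negative z scores,
# and a value equal to the mean would get a z score of 0.
grades = [66, 72, 88, 90, 94]
print([round(z, 2) for z in z_scores(grades)])

# Building an index from two variables with very different ranges
# (hypothetical data): standardize each, then average the z scores per case.
income = [24_000, 30_000, 38_000, 56_000, 86_000]
schooling_years = [10, 12, 16, 14, 18]
index = [(zi + zs) / 2
         for zi, zs in zip(z_scores(income), z_scores(schooling_years))]
print([round(v, 2) for v in index])
```

Standardizing first keeps income, measured in tens of thousands of dollars, from swamping years of schooling in the combined index.
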
Key terms
  • Proportion; Percentage; Percentage change; Percentage point difference
  • Bar chart; Pie chart; Frequency table; Cumulative percentage
  • Mean; Median; Mode; Outlier; Skewed distribution
  • Measures of variation; Range; Standard deviation; Variance; Standardized (Z) scores



Editor's Notes

  • #19: If you gave only the mean income for these cases, you would give the impression that incomes are very high when they are not: only one person has a really high income, while everybody else has a relatively low income. When you have a skewed distribution, it is better to use the median. You can detect skew by looking at a frequency distribution. The greater the “skew”, the greater the difference between the median and the mean.