Statistics: First Steps Andrew Martin PS 372 University of Kentucky
Variance Variance is a measure of dispersion of data points about the mean for interval- and ratio-level data. Variance in the dependent variable is what social scientists fundamentally seek to explain.
 
Standard Deviation Standard deviation  is a measure of dispersion of data points about the mean for interval- and ratio-level data.  Like the mean, standard deviation is sensitive to extreme values.  Standard deviation is calculated as the square root of the variance.
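The two definitions above can be sketched in a few lines of Python. This is an illustration rather than part of the original slides: the function names and the `scores` data are made up, and the `sample` flag assumes the usual convention of dividing by n - 1 for a sample and by n for a full population.

```python
import math

def variance(values, sample=True):
    """Average squared deviation from the mean.

    Divides by n - 1 for a sample (the default) or by n for a population.
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)

def std_dev(values, sample=True):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(values, sample))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

print(variance(scores, sample=False))  # 4.0
print(std_dev(scores, sample=False))   # 2.0
```

Because every deviation is squared before averaging, a single extreme value can inflate both measures considerably.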
 
 
Normal Distribution The bulk of observations lie in the center, where there is a single peak. In a normal distribution half (50 percent) of the observations lie above the mean and half lie below it. The mean, median and mode all have the same value. Fewer and fewer observations fall in the tails. The spread of the distribution is symmetric.
Normal Distribution Mathematical theory allows us to know what percentage of observations lie within one (68%), two (95%) or three (99.7%) standard deviations of the mean. If data are not perfectly normally distributed, the percentages will only be approximations. Many naturally occurring variables do have nearly normal distributions. Some that do not can be brought closer to normal by transforming them with logarithms.
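These shares can be checked numerically. The sketch below is an illustration, not part of the original slides. It relies on the fact that, for a normal distribution, the share of observations within k standard deviations of the mean equals erf(k / sqrt(2)). The function name `coverage_within` is made up for this example.

```python
import math

def coverage_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * coverage_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```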
Frequency Distribution
What about categorical variables?
 
Example Calculate the ID and IQV for this PS 372 class's grades using the following frequencies and proportions:
Grade  Freq.  Prop.
A      4      .12
B      7      .21
C      4      .12
D      7      .21
E      12     .34
Index of Diversity
$ID = 1 - (p_a^2 + p_b^2 + p_c^2 + p_d^2 + p_e^2)$
$ID = 1 - (.12^2 + .21^2 + .12^2 + .21^2 + .34^2)$
$ID = 1 - (.0144 + .0441 + .0144 + .0441 + .1156)$
$ID = 1 - .2326$
$ID = .7674$
Index of Qualitative Variation
$IQV = \frac{1 - (p_a^2 + p_b^2 + p_c^2 + p_d^2 + p_e^2)}{1 - 1/K}$, where $K$ is the number of categories.
Index of Qualitative Variation
$IQV = \frac{.7674}{1 - 1/5} = \frac{.7674}{.8} \approx .959$
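The same arithmetic can be reproduced in Python as a check. This is a minimal sketch, not part of the original slides; the function names are made up, and the proportions are the rounded values from the grade example.

```python
def index_of_diversity(proportions):
    """ID: one minus the sum of squared category proportions."""
    return 1 - sum(p ** 2 for p in proportions)

def index_of_qualitative_variation(proportions):
    """IQV: the ID divided by its maximum possible value, 1 - 1/K."""
    k = len(proportions)  # K, the number of categories
    return index_of_diversity(proportions) / (1 - 1 / k)

grade_proportions = [.12, .21, .12, .21, .34]  # A, B, C, D, E

print(round(index_of_diversity(grade_proportions), 4))              # 0.7674
print(round(index_of_qualitative_variation(grade_proportions), 3))  # 0.959
```

Dividing by 1 - 1/K rescales the ID so that a value of 1 means the cases are spread evenly across all K categories.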
 
Data Matrix A  data matrix  is an array of rows and columns that stores the values of a set of variables for all the cases in a data set. This is frequently referred to as a dataset.
 
 
Data Matrix from JRM
Properties of Good Graphs A good graph should answer several of the following questions (JRM 384): 1. Where does the center of the distribution lie? 2. How spread out or bunched up are the observations? 3. Does it have a single peak or more than one? 4. Approximately what proportion of observations lie in the ends of the distribution?
Properties of Good Graphs 5. Do observations tend to pile up at one end of the measurement scale, with relatively few observations at the other end? 6. Are there values that, compared with most, seem very large or very small? 7. How does one distribution compare to another in terms of shape, spread, and central tendency? 8. Do values of one variable seem related to another variable?
 
 
 
 
 
Statistical Concepts Let's quickly review some concepts.
Population A  population  refers to any well-defined set of objects such as people, countries, states, organizations, and so on. The term doesn't simply mean the population of the United States or some other geographical area.
Population A sample is a subset of the population. Samples are drawn in some known manner and each case is chosen independently of the others. From here on out, when the book uses the term sample, random sample or simple random sample, it refers to the same concept: a sample chosen at random.
Populations Parameters are numerical features of a population. A sample statistic is an estimator that corresponds to a population parameter of interest and is used to estimate the population value. Ȳ (Y-bar) is the sample mean; μ is the population mean. A "hat" (^, also called a caret or circumflex) over a symbol marks it as an estimate.
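The distinction between a parameter and a statistic can be illustrated with a small simulation. Everything here is hypothetical and not from the original slides: the population is simulated, and the seed, the population size and the sample size are arbitrary choices.

```python
import random

random.seed(372)  # arbitrary seed so the run is repeatable

# Hypothetical population of 10,000 values.
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = sum(population) / len(population)  # population mean: a parameter

# Simple random sample of 100 cases drawn without replacement.
sample = random.sample(population, 100)
y_bar = sum(sample) / len(sample)  # sample mean: a statistic that estimates mu

print(f"population mean = {mu:.2f}")
print(f"sample mean     = {y_bar:.2f}")
```

The sample mean will usually land near the population mean without matching it exactly, which is why estimates are paired with a statement of their precision.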
Two Kinds of Inference Hypothesis Testing Point and interval estimation
Hypothesis Testing Many claims can be translated into specific statements about a population that can be confirmed or disconfirmed with the aid of probability theory. Ex: There is no ideological difference between the voting patterns of Republican and Democratic justices on the U.S. Supreme Court.
Point and Interval Estimation The goal here is to estimate unknown population parameters from samples and to surround those estimates with confidence intervals. Confidence intervals indicate the estimate's reliability or precision.
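As one concrete illustration, the sketch below builds a 95 percent confidence interval around a sample proportion. It is not from the original slides. It assumes the common normal-approximation formula (the estimate plus or minus 1.96 standard errors), which is only one of several ways to form such an interval, and the 60-heads-in-100-flips data are hypothetical.

```python
import math

def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation interval for a proportion (z = 1.96 gives 95%)."""
    p_hat = successes / n  # point estimate
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

# Hypothetical data: 60 heads in 100 coin flips.
low, high = proportion_confidence_interval(60, 100)
print(round(low, 3), round(high, 3))  # 0.504 0.696
```

A larger sample shrinks the standard error, so the interval narrows and the estimate becomes more precise.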
Hypothesis Testing Start with a specific verbal claim or proposition. Ex: The chances of getting heads or tails when flipping a coin are roughly the same. Ex: The chances of the United States electing a Republican or Democratic president are roughly the same.
Hypothesis Testing
Hypothesis Testing Next, the researcher constructs a null hypothesis. A  null hypothesis  is a statement that a population parameter equals a specific value.
Hypothesis Testing Following up on the coin example, the null hypothesis would be that the probability of heads equals .5. Stated more formally: $H_0: P = .5$, where $P$ stands for the probability that the coin will be heads when tossed. $H_0$ is typically used to denote a null hypothesis.
Hypothesis Testing Next, specify an alternative hypothesis. An alternative hypothesis is a statement about the value or values of a population parameter. It is proposed as an alternative to the null hypothesis. An alternative hypothesis can merely state that the population parameter does not equal the value given in the null hypothesis, or that it is greater than or less than that value.
Hypothesis Testing Suppose you believe the coin is unfair, but have no intuition about whether it is too prone to come up heads or tails. Stated formally, the alternative hypothesis is: $H_A: P \neq .5$
Hypothesis Testing Perhaps you believe the coin is more likely to come up heads than tails. You would formulate the following alternative hypothesis: $H_A: P > .5$. Conversely, if you believe the coin is less likely to come up heads than tails, you would formulate the alternative hypothesis in the opposite direction: $H_A: P < .5$
Hypothesis Testing After specifying the null and alternative hypotheses, identify the sample estimator that corresponds to the parameter in question (here, the sample proportion of heads, p). The estimate must come from the data, which in this case are generated by flipping a coin.
Hypothesis Testing Next, determine how the sample statistic is distributed in repeated random samples. That is, specify the sampling distribution of the estimator. For example, what are the chances of getting 10 heads in 10 flips (p = 1.0)? What about 9 heads in 10 flips (p = .9)? 8 heads in 10 flips (p = .8)?
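For independent coin flips, these chances follow the binomial distribution. The sketch below computes them under the null hypothesis of a fair coin. It is an illustration rather than part of the original slides, and the function name `binomial_pmf` is made up.

```python
from math import comb

def binomial_pmf(heads, flips, p=0.5):
    """Probability of exactly `heads` heads in `flips` independent flips."""
    return comb(flips, heads) * p ** heads * (1 - p) ** (flips - heads)

for heads in (10, 9, 8):
    print(heads, round(binomial_pmf(heads, 10), 4))
# 10 0.001
# 9 0.0098
# 8 0.0439
```

Listing the probability of every possible outcome, from 0 to 10 heads, gives the full sampling distribution of the estimator under the null hypothesis.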
 
Hypothesis Testing Make a decision rule based on some criterion of probability or likelihood. In the social sciences, a result that occurs with a probability of .05 or less (that is, 1 chance in 20) is considered unusual and consequently is grounds for rejecting a null hypothesis. Stricter thresholds (.01, .001) are also common. Make the decision rule before collecting data.
Hypothesis Testing In light of the decision rule, define a critical region. The critical region consists of those outcomes so unlikely to occur that one has cause to reject the null hypothesis should they occur. So there are areas of “rejection” (critical areas) and nonrejection.
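For the 10-flip coin example, a two-sided critical region at the .05 level can be worked out by adding outcomes from both tails until the next pair would push the total probability past .05. The sketch below is a hedged illustration, not the slides' own procedure. It assumes a fair-coin null hypothesis, which makes the two tails mirror images, and the function name is made up.

```python
from math import comb

def two_sided_critical_region(flips, alpha=0.05):
    """Tail outcomes whose combined probability under a fair coin is <= alpha."""
    # Probability of each possible number of heads when P = .5.
    pmf = [comb(flips, k) * 0.5 ** flips for k in range(flips + 1)]
    region, total = [], 0.0
    low, high = 0, flips
    while low < high:
        step = pmf[low] + pmf[high]  # add one outcome from each tail
        if total + step > alpha:
            break
        region += [low, high]
        total += step
        low += 1
        high -= 1
    return sorted(region), total

region, size = two_sided_critical_region(10)
print(region, round(size, 4))  # [0, 1, 9, 10] 0.0215
```

With 10 flips, only 0, 1, 9 or 10 heads would lead to rejecting the null hypothesis. Every other count falls in the nonrejection area.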
 
Hypothesis Testing Collect a random sample and calculate the sample estimator. Calculate the observed test statistic. A test statistic converts the sample result into a number that can be compared with the critical values specified by your decision rule. Examine the observed test statistic to see if it falls in the critical region. Make a practical or theoretical interpretation of the findings.
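These final steps can be sketched for the coin example. The code below is an illustration under stated assumptions, not part of the original slides. It uses a z statistic based on the normal approximation, with 1.96 as the critical value for a two-sided test at the .05 level. The data (60 heads in 100 flips) and the function names are hypothetical.

```python
import math

def z_statistic(heads, flips, null_p=0.5):
    """Distance of the sample proportion from the null value, in standard errors."""
    p_hat = heads / flips  # sample estimator
    standard_error = math.sqrt(null_p * (1 - null_p) / flips)
    return (p_hat - null_p) / standard_error

def falls_in_critical_region(z, critical_value=1.96):
    """Two-sided decision rule: reject when |z| exceeds the critical value."""
    return abs(z) > critical_value

z = z_statistic(60, 100)  # hypothetical sample
print(round(z, 2), falls_in_critical_region(z))  # 2.0 True
```

Here the observed statistic falls just inside the critical region, so the null hypothesis of a fair coin would be rejected at the .05 level. With 55 heads in 100 flips the statistic would be about 1.0 and the null hypothesis would not be rejected.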
 

