statistics chp 1&2.pptx statistics in veterinary
1. STATISTICS
• Statistics is a branch of science dealing with collecting,
organizing, summarizing, analyzing and making decisions from
data.
• Statistics is divided into two main areas: descriptive statistics
and inferential statistics.
2. • Descriptive statistics deals with methods for collecting,
organizing, and describing data by using tables, graphs,
and summary measures.
• Inferential statistics deals with methods that use sample
results to estimate or make decisions about the
population.
3. • A population is the set of all elements (observations), items, or
objects that share at least one common characteristic and whose
properties are studied for a particular goal.
• The components of the population are called individuals or
elements.
4. • A population can be a collection of any things, such as iPads,
books, animals, or inanimate objects; it does not necessarily
consist of people.
• An element (or member of a sample or population) is a specific
subject or object about which the information is collected
5. • A variable is a characteristic under study that takes different
values for different elements.
• The value of a variable for an element is called an observation or
measurement.
• In statistics, variables are of two types according to the kind of
values they take: the first type is the quantitative variable and the
second is the qualitative variable.
6. • A quantitative variable gives numbers representing counts or
measurements.
• A qualitative variable (or categorical variable) gives names or
labels, not numbers, to represent the observations.
• Quantitative variables are further divided into two main types:
discrete and continuous.
7. • Discrete variables assume values that can be counted.
• Continuous variables assume all values between any
two specific values, i.e. they take all values in an
interval. They often include fractions and decimals.
8. • There are four levels of measurement: nominal,
ordinal, interval, and ratio.
9. Scales of measurement
The levels are distinguished by three questions:
1. Order: Does a larger number indicate a greater value than
a smaller number?
2. Differences: Does subtracting two numbers represent
some meaningful value?
3. Ratio: Does dividing (or taking the ratio of) two numbers
represent some meaningful value?
10. The nominal level of measurement classifies data into mutually
exclusive (disjoint) categories in which no order or ranking can
be imposed on the data. Examples:
- Gender: Male, Female.
- Eye color: Black, Brown, Blue, Green, ...
- Religious affiliation: Muslim, Christian, Jewish, ...
- Nationality: Saudi, Syrian, Jordanian, Egyptian, Pakistani, ...
- Scientific major field: Statistics, Mathematics, Computers, Geography, ...
- Other nominal variables include a person’s race, sexual orientation,
hair color, season of birth, marital status, and other demographic
or personal information.
11. • The ordinal level of measurement classifies data into categories
that can be ordered; however, precise differences between the
ranks do not exist.
• The interval level of measurement orders data with precise
differences between units of measure, but there is no
meaningful zero.
• In addition, the resulting measurement values belong to
an interval of the real numbers.
12. • The ratio level of measurement is the interval level
with the additional property that there is a natural
zero starting point.
• In this type of measurement, zero means nothingness.
Another difference is that quantities can be expressed as
ratios of one another (one value can be twice another).
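The three questions from the "Scales of measurement" slide can be turned into a small lookup. This is only an illustrative sketch: the table name, the function name, and the temperature example are assumptions added here, not part of the slides.

```python
# Which of the three questions (order, differences, ratio) each level
# answers "yes" to. Names and layout are illustrative assumptions.
SCALE_PROPERTIES = {
    "nominal":  {"order": False, "differences": False, "ratio": False},
    "ordinal":  {"order": True,  "differences": False, "ratio": False},
    "interval": {"order": True,  "differences": True,  "ratio": False},
    "ratio":    {"order": True,  "differences": True,  "ratio": True},
}


def level_of_measurement(order, differences, ratio):
    """Return the level whose properties match the three yes/no answers."""
    answers = {"order": order, "differences": differences, "ratio": ratio}
    for level, properties in SCALE_PROPERTIES.items():
        if properties == answers:
            return level
    raise ValueError("no level of measurement matches these answers")


# Temperature in degrees Celsius: values can be ordered and subtracted,
# but 0 degrees does not mean "no temperature", so ratios are not meaningful.
print(level_of_measurement(order=True, differences=True, ratio=False))  # interval
```

Each level inherits the "yes" answers of the level before it, which is why the four levels form a hierarchy.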
14. Stages in Statistical Investigation
There are five stages or steps in any statistical investigation.
1. Collection of data: the process of measuring, gathering, and
assembling the raw data upon which the statistical
investigation is to be based.
Data can be collected in a variety of ways; one of the most
common methods is the survey. Surveys can be conducted in
different ways; three of the most common are:
– Telephone survey
– Mailed questionnaire
– Personal interview.
15. 2. Organization of data: summarization of data in some
meaningful way, e.g. in table form.
3. Presentation of the data: the process of re-organization,
classification, compilation, and summarization of data
to present it in a meaningful form.
4. Analysis of data: the process of extracting relevant
information from the summarized data, mainly through
the use of elementary mathematical operations.
16. 5. Inference of data: the interpretation and further
observation of the various statistical measures obtained from
the analysis of the data, using methods by which
conclusions are formed and inferences made.
17. • Statistical Population: It is the collection of all possible observations of a specified
characteristic of interest (possessing certain common property) and being under study.
An example is all of the students in AAU 3101 course in this term.
• Sample: It is a subset of the population, selected using some sampling technique in
such a way that it represents the population.
• Sampling: The process or method of sample selection from the population.
• Sample size: The number of elements or observations to be included in the sample.
• Census: Complete enumeration or observation of the elements of the population, that is,
the collection of data from every element in a population.
• Parameter: Characteristic or measure obtained from a population.
• Statistic: Characteristic or measure obtained from a sample.
• Variable: It is an item of interest that can take on many different values.
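The difference between a parameter and a statistic can be illustrated with a short sketch. The population values below are made up for illustration, and simple random sampling is only one possible sampling technique; both are assumptions, not part of the slides.

```python
import random
import statistics

# Hypothetical population: body weights (kg) of 10 animals (made-up values).
population = [42.0, 38.5, 45.2, 40.1, 39.8, 44.0, 41.3, 37.9, 43.6, 40.6]

# Census: every element is observed, so the mean is a parameter.
parameter_mean = statistics.mean(population)

# Sampling: simple random sampling without replacement.
rng = random.Random(0)  # fixed seed so the run is repeatable
sample_size = 4
sample = rng.sample(population, sample_size)

# The same measure computed from the sample is a statistic.
statistic_mean = statistics.mean(sample)

print(f"parameter (population mean): {parameter_mean:.2f}")
print(f"statistic (sample mean, n={sample_size}): {statistic_mean:.2f}")
```

The statistic changes from sample to sample, while the parameter is fixed for a given population.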
18. CHAPTER 2
METHODS OF DATA COLLECTION & PRESENTATION
• There are two sources of data:
1. Primary Data
2. Secondary Data
19. • Primary Data
• Data measured or collected by the investigator or the user directly from
the source.
• Two activities involved: planning and measuring.
• a) Planning:
– Identify source and elements of the data.
– Decide whether to consider sample or census.
– If sampling is preferred, decide on sample size, selection method,… etc
– Decide measurement procedure.
– Set up the necessary organizational structure.
• b) Measuring: there are different options for collecting primary data, including:
– Focus Group
– Telephone Interview
– Mail Questionnaires
– Door-to-Door Survey
– Mall Intercept
– New Product Registration
– Personal Interview
– Experiments
20. • Secondary Data is data gathered or compiled from published
and unpublished sources or files.
• When the source is secondary data, check that:
1. The type and objective of the situation in which the data were
collected are understood.
2. The purpose for which the data were collected is compatible with
the present problem.
3. The nature and classification of the data are appropriate to our problem.
4. There are no biases or misreporting in the published data.
• Note: Data that are primary for one user may be secondary for
another.
21. METHODS OF DATA PRESENTATION
• The presentation of data is broadly classified into the
following two categories:
– Tabular presentation
– Diagrammatic and Graphic presentation.
• The process of arranging data into classes or categories
according to similarities is technically called classification.
22. • Raw data: recorded information in its original
collected form, whether counts or
measurements.
• Frequency: is the number of values in a specific class
of the distribution.
• Frequency distribution: is the organization of raw
data in table form using classes and frequencies.
23. • There are three basic types of frequency
distributions:
–Categorical frequency distribution
–Ungrouped frequency distribution
–Grouped frequency distribution
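A categorical frequency distribution can be sketched in a few lines. The raw data below (blood-type labels) are hypothetical values chosen for illustration only.

```python
from collections import Counter

# Hypothetical raw data: one category label per observation.
raw_data = ["A", "B", "O", "A", "AB", "O", "O", "B", "A", "O", "A", "O"]

# Frequency: the number of values in each class.
frequency = Counter(raw_data)
total = sum(frequency.values())

# Frequency distribution: classes and frequencies arranged as a table.
print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for label, count in frequency.most_common():
    print(f"{label:<6}{count:>10}{100 * count / total:>8.1f}%")
```

An ungrouped distribution for discrete numeric data can be built the same way, with each distinct value acting as its own class.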
24. • Grouped Frequency Distribution: a frequency distribution in which several
numbers are grouped in one class.
• Class limits: separate one class in a grouped frequency distribution from
another. The limits can actually appear in the data, and there are gaps
between the upper limit of one class and the lower limit of the next.
• Units of measurement (U): the distance between two possible consecutive
measures. It is usually taken as 1, 0.1, 0.01, 0.001, ...
• Class boundaries: separate one class in a grouped frequency distribution
from another. The boundaries have one more decimal place than the raw
data and therefore do not appear in the data. There is no gap between the
upper boundary of one class and the lower boundary of the next class. The
lower class boundary is found by subtracting U/2 from the corresponding
lower class limit, and the upper class boundary is found by adding U/2 to
the corresponding upper class limit.
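The U/2 rule for class boundaries can be checked with a small sketch. The class limits below are hypothetical, and whole-number data are assumed so that U = 1.

```python
def class_boundaries(lower_limit, upper_limit, unit):
    """Shift the class limits outward by half the unit of measurement U."""
    return lower_limit - unit / 2, upper_limit + unit / 2


# Hypothetical class limits for whole-number data, so U = 1.
class_limits = [(10, 19), (20, 29), (30, 39)]
U = 1

boundaries = [class_boundaries(lo, hi, U) for lo, hi in class_limits]
print(boundaries)  # [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5)]

# No gap: each upper boundary equals the next class's lower boundary.
no_gaps = all(
    boundaries[i][1] == boundaries[i + 1][0] for i in range(len(boundaries) - 1)
)
print(no_gaps)  # True
```

The limits 19 and 20 leave a gap of one unit, while the boundaries 19.5 and 19.5 meet exactly.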
25. • Class width: the difference between the upper and lower class boundaries of any class. It
is also the difference between the lower limits of any two consecutive classes or the
difference between any two consecutive class marks.
• Class mark (midpoint): the average of the lower and upper class limits, or the
average of the lower and upper class boundaries.
• Cumulative frequency: the number of observations less than or equal to (or more than
or equal to) a specific value.
• Cumulative frequency above: the total frequency of all values greater than or equal to
the lower class boundary of a given class.
• Cumulative frequency below: the total frequency of all values less than or equal to the
upper class boundary of a given class.
• Cumulative Frequency Distribution (CFD): the tabular arrangement of class intervals
together with their corresponding cumulative frequencies. It can be of the more-than or
less-than type, depending on the type of cumulative frequency used.
• Relative frequency: the frequency divided by the total frequency.
• Relative cumulative frequency: the cumulative frequency divided by the total
frequency.
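The quantities defined above can be computed from a small grouped frequency distribution. The classes and frequencies below are hypothetical, and whole-number data (U = 1) are assumed.

```python
from itertools import accumulate

# Hypothetical grouped frequency distribution: (class limits, frequency).
classes = [((10, 19), 4), ((20, 29), 7), ((30, 39), 6), ((40, 49), 3)]
U = 1  # unit of measurement for whole-number data

frequencies = [freq for _, freq in classes]
total = sum(frequencies)  # 20

# Class width: upper class boundary minus lower class boundary.
first_lower, first_upper = classes[0][0]
width = (first_upper + U / 2) - (first_lower - U / 2)  # 10.0

# Class marks: average of the lower and upper class limits.
marks = [(lower + upper) / 2 for (lower, upper), _ in classes]

# Cumulative frequency below (less-than type) and above (more-than type).
less_than = list(accumulate(frequencies))                  # [4, 11, 17, 20]
more_than = list(accumulate(reversed(frequencies)))[::-1]  # [20, 16, 9, 3]

# Relative and relative cumulative frequencies.
relative = [freq / total for freq in frequencies]
relative_cumulative = [cum / total for cum in less_than]

print(f"{'Limits':<8}{'Mark':>6}{'f':>4}{'<= cf':>7}{'>= cf':>7}{'rel f':>7}")
for ((lower, upper), freq), mark, below, above, rel in zip(
    classes, marks, less_than, more_than, relative
):
    print(f"{lower}-{upper:<5}{mark:>6}{freq:>4}{below:>7}{above:>7}{rel:>7.2f}")
```

The width can also be read off as the difference between consecutive lower limits (20 - 10) or consecutive class marks (24.5 - 14.5).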
26. • The three most commonly used diagrammatic
presentations for discrete as well as qualitative data
are:
– Pie charts
– Pictogram
– Bar charts
27. A. A pie chart is a circle that is divided into sections or wedges
according to the percentage of frequencies in each category of
the distribution.
B. Pictogram: in this diagram, we represent data by means of
picture symbols. We decide on a suitable picture to
represent a definite number of units in which the variable is
measured.
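For the pie chart, each wedge takes the same share of the 360-degree circle as its category takes of the total frequency. A minimal sketch, using hypothetical frequencies:

```python
# Hypothetical categorical frequency distribution.
frequency = {"A": 4, "B": 2, "O": 5, "AB": 1}
total = sum(frequency.values())

# Wedge angle = (frequency / total) * 360 degrees.
angles = {label: 360 * count / total for label, count in frequency.items()}

for label, count in frequency.items():
    percent = 100 * count / total
    print(f"{label:<3} {percent:5.1f}%  {angles[label]:6.1f} degrees")
```

The angles add up to 360 degrees, just as the percentages add up to 100.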
28. • Bar Charts:
• A set of bars (thick lines or narrow rectangles) representing some
magnitude over time or space.
• They are useful for comparing aggregates over time or space.
• Bars can be drawn either vertically or horizontally.
• There are different types of bar charts. The most common are:
A. Simple bar chart
B. Deviation or two way bar chart
C. Broken bar chart
D. Component or sub divided bar chart.
E. Multiple bar charts
29. Graphical Presentation of data
• The histogram, frequency polygon, and cumulative frequency graph (ogive)
are the most commonly applied graphical representations for continuous data.
• Procedure for constructing statistical graphs:
1. Draw and label the X and Y axes.
2. Choose a suitable scale for the frequencies or cumulative frequencies and
label it on the Y axis.
3. Represent the class boundaries for the histogram or ogive, or the
midpoints for the frequency polygon, on the X axis.
4. Plot the points.
5. Draw the bars or lines to connect the points.
30. Histogram
• A graph which displays the data by using vertical bars whose heights
represent the frequencies. Class boundaries are placed along the
horizontal axis. Class marks and class limits are sometimes used
as the quantity on the X axis.
Frequency Polygon:
• A line graph. The frequency is placed along the vertical axis and
class midpoints are placed along the horizontal axis.
Note: It is customary to add the next higher and lower class intervals,
each with a corresponding frequency of zero, to make it a complete
polygon.
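The points of a frequency polygon, including the two zero-frequency classes that close it, can be computed as follows. The function name and the data are hypothetical, and equal class widths are assumed.

```python
def frequency_polygon_points(class_limits, frequencies):
    """Return (midpoint, frequency) pairs, closed with a zero-frequency
    class at each end. Assumes all classes have the same width."""
    # Width: difference between the lower limits of consecutive classes.
    width = class_limits[1][0] - class_limits[0][0]
    marks = [(lower + upper) / 2 for lower, upper in class_limits]
    points = list(zip(marks, frequencies))
    return [(marks[0] - width, 0)] + points + [(marks[-1] + width, 0)]


points = frequency_polygon_points(
    [(10, 19), (20, 29), (30, 39), (40, 49)], [4, 7, 6, 3]
)
print(points)
# [(4.5, 0), (14.5, 4), (24.5, 7), (34.5, 6), (44.5, 3), (54.5, 0)]
```

Joining these points in order with straight lines gives a polygon that starts and ends on the horizontal axis.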
32. Ogive (cumulative frequency polygon)
- A graph showing the cumulative frequency (less-than or more-than
type) plotted against upper or lower class boundaries, respectively.
- That is, class boundaries are plotted along the horizontal axis and
the corresponding cumulative frequencies are plotted along the
vertical axis. The points are joined by a freehand curve.
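The points of both ogive types can be sketched as below. The function names and data are hypothetical. Adding a zero point at the open end of each curve is a common convention assumed here, not something stated on the slide.

```python
from itertools import accumulate


def less_than_ogive_points(boundaries, frequencies):
    """Pair each upper class boundary with the cumulative frequency below it."""
    cumulative = list(accumulate(frequencies))
    points = [(boundaries[0][0], 0)]  # assumed start: first lower boundary, 0
    points += [(upper, cum) for (_, upper), cum in zip(boundaries, cumulative)]
    return points


def more_than_ogive_points(boundaries, frequencies):
    """Pair each lower class boundary with the cumulative frequency above it."""
    cumulative = list(accumulate(reversed(frequencies)))[::-1]
    points = [(lower, cum) for (lower, _), cum in zip(boundaries, cumulative)]
    points.append((boundaries[-1][1], 0))  # assumed end: last upper boundary, 0
    return points


boundaries = [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5), (39.5, 49.5)]
frequencies = [4, 7, 6, 3]

print(less_than_ogive_points(boundaries, frequencies))
# [(9.5, 0), (19.5, 4), (29.5, 11), (39.5, 17), (49.5, 20)]
print(more_than_ogive_points(boundaries, frequencies))
# [(9.5, 20), (19.5, 16), (29.5, 9), (39.5, 3), (49.5, 0)]
```

The less-than ogive rises from 0 to the total frequency, and the more-than ogive falls from the total frequency to 0.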