STATISTICS
1 Data:
Data is the basic unit in Statistics. Data is a collection of facts such as numbers, words,
measurements and observations. It must be organised for it to be useful and to
yield information. Data can be collected in many ways. Among them, direct
observation is one of the simplest ways to collect data.
For example, if you want to find the number of houses of each type in a village, you
can count them in person. Similarly,
(i) Collection of brand-wise motorcycles in your place.
Brand A – 25, Brand B – 40, Brand C – 14 and Brand D – 37
(ii) Collection of term marks in Mathematics of your classmates.
39, 20, 19, 47, 50, 26, 35, 40, 17, 25, 41.
(iii) Number of students playing different sports from your class.
Volleyball – 12, Kabaddi – 10, Hockey – 9, Cricket – 7, Badminton – 7
(iv) Staff's age in a company.
27, 51, 19, 21, 46, 35, 52, 25, 57, 29.
The above facts are some more examples of data.
Let us now see the kinds of data. There are two kinds of data, namely primary data and secondary data.
Primary data:
These are the data that are collected in person for the first time for a specific purpose. Here, Kamaraj has collected the data of math marks from the
students in person. It is called primary data.
Also, (i) a census in a village and
(ii) a collection of the colours which the students in a class like are some examples of
primary data.
Secondary data:
These are the data that are sourced from some place that originally collected them. This kind of
data has already been collected by some other persons. Statistical
operations may have been performed on them already. Here, Geetha
also collected the data, but she took it from a record in which it had already
been collected. This is called secondary data.
Also, (i) the details of the 'PATTA' for a land, which can be had from the
registration office, and
(ii) birth and death details, which can be got from the concerned office, are some examples of
secondary data.
2.Data in Tabular form
To make the given data easily understandable, we tabulate the data in the form of tables or charts. A
table has three columns that contain
(i) Variable / Class (ii) Tally Marks (iii) Frequency
Variable / Class:
Arrange the given data
from the lowest to the highest in the first column under the heading variable
or class.
Tally Marks:
A vertical line (|) marked against each item falling in the
variable / class is called a tally mark.
Frequency:
The number of times an
observation occurs in the given data is called the frequency of the
observation. This is easily counted from the tally marks column.
For example:
From the table, we understand that three
students got 10 marks, five students got 14 marks and so on.
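The three columns described above can be sketched in a few lines of Python. This is only an illustration: the function names `tally` and `frequency_table` are hypothetical, and the marks list is made up so that 10 occurs three times and 14 occurs five times, as in the sentence above.

```python
from collections import Counter

def tally(count):
    """Draw one vertical line per occurrence."""
    return "|" * count

def frequency_table(data):
    """Return (variable, tally marks, frequency) rows, lowest variable first."""
    counts = Counter(data)
    return [(value, tally(counts[value]), counts[value]) for value in sorted(counts)]

# Made-up marks for illustration only (not the table from the text).
marks = [10, 14, 12, 14, 10, 14, 12, 14, 10, 14]

print("Variable | Tally marks | Frequency")
for value, strokes, frequency in frequency_table(marks):
    print(f"{value:>8} | {strokes:<11} | {frequency}")
```

Sorting the distinct values first mirrors the instruction to arrange the data from the lowest to the highest in the first column.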
Ungrouped data or Discrete Data:
Ungrouped data can assume only whole numbers and exact measurements. These are data that cannot
take a range of values. A usual way to represent them is by using bar graphs.
Examples: 1. The number of teachers in a school.
2. The number of players in a game.
Grouped data or Continuous Data:
Grouped data can take any value within a certain interval. The data can take values within a certain range
between the lowest and the highest value. Continuous data can be tabulated in what
is called a frequency distribution. They can be represented graphically using histograms.
Examples: 1. The age of persons in a village.
2. The height and the weight of the students of your class.
3 Frequency distribution table
Frequency distribution:
A frequency distribution is the arrangement of the given data in the form of a table showing the frequency
with which each variable occurs.
If we have a large number of students in the class, it would be very difficult
to understand the data and to get information unless it is organised. For this reason,
we organise larger data into a table called the frequency distribution table.
Therefore, the tabular arrangement which shows the observations and their
frequencies of occurrence is called the frequency distribution table. There are
two types of frequency distribution table, namely
(i) frequency distribution table for ungrouped data and
(ii) frequency distribution table for grouped data.
3.1 Construction of frequency distribution table for ungrouped data
Example 1 Form an ungrouped frequency distribution table
for the weight (in kg) of 25 students in STD IV given below and answer the following
questions.
25, 24, 20, 25, 16, 15, 18, 20, 25, 16, 20, 16, 15, 18, 25, 16, 24, 18, 25, 15, 27, 20, 20, 27, 25.
(i) Find the range of the weights.
(ii) How many of the students have the highest weight in the class?
(iii) What is the weight that the largest number of students have?
(iv) How many of them have the least weight?
Solution:
To form the distribution table, arrange the given data in ascending order under the Weight column. Then put
a vertical mark against each variable under the Tally marks column, count the
number of tally marks against each variable and enter it in the Frequency column.
Hence, the distribution table is

Weight (kg)   Tally marks   Frequency
15            |||           3
16            ||||          4
18            |||           3
20            |||||         5
24            ||            2
25            ||||||        6
27            ||            2
Total                       25
(i) The range of the given data is the difference between the largest and the smallest value. Here, the range =
27 – 15 = 12.
(ii) From this table, two of the students have the highest weight of 27 kg.
(iii) The largest number of students, 6, weigh 25 kg.
(iv) 3 students have the least weight of 15 kg.
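The answers to Example 1 can be cross-checked with a short Python sketch. It uses the 25 weights exactly as listed in the example; the variable names are illustrative, and `collections.Counter` is just one convenient way to count the frequencies.

```python
from collections import Counter

weights = [25, 24, 20, 25, 16, 15, 18, 20, 25, 16, 20, 16, 15,
           18, 25, 16, 24, 18, 25, 15, 27, 20, 20, 27, 25]

# Frequency of each distinct weight.
freq = Counter(weights)

print("Weight | Tally marks | Frequency")
for w in sorted(freq):
    print(f"{w:>6} | {'|' * freq[w]:<11} | {freq[w]}")

data_range = max(weights) - min(weights)                        # (i)
count_heaviest = freq[max(weights)]                             # (ii)
most_common_weight, most_common_count = freq.most_common(1)[0]  # (iii)
count_lightest = freq[min(weights)]                             # (iv)

print("Range:", data_range)
print("Students with the highest weight:", count_heaviest)
print("Weight of the most students:", most_common_weight, "kg,", most_common_count, "students")
print("Students with the least weight:", count_lightest)
```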
3.2 Construction of frequency distribution table for grouped data:
Now, consider a situation in which we collect the marks of 50 students. It becomes very
difficult to put a tally mark for each and every mark of all 50 students, because
if we arrange the marks in a table, it will be very long and not
understandable at once. In this case, we use class intervals. In such a table, we
consider groups of data in the form of class intervals and tally the
frequency of the given data against them.
Class Interval:
The range of the variable is grouped into a number of classes, and each group is known as a class interval (C.I). The difference
between the upper limit (U) and the lower limit (L) of the class is known as the class size, i.e. class size = upper limit – lower limit.
For example, marks in the C.I 10 to 20 can be written as 10-20, whose class size is 20 – 10 = 10.
(a) While distributing the frequency, we follow the counting given below. Suppose the
classes are 10-20, 20-30, 30-40, 40-50, ..... This
represents a continuous series. Here, 20 is included in the class 20-30 and 30
is included in 30-40, and likewise for the other classes.
(b) In case the given series has a
gap between the limits of any two adjacent classes, this gap may be filled
by extending the two limits of each class by half of the value of the
gap. Half of the gap is called the adjustment factor.
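The counting rule in (a) can be sketched in Python as follows. The function name `class_of` is hypothetical; the classes 10-20, 20-30, 30-40 and 40-50 are the ones named in the text, and the rule coded here is that a class includes its lower limit and excludes its upper limit.

```python
def class_of(value, lower_limits, class_size):
    """Return the (lower, upper) class containing value in a continuous series.

    The lower limit is included and the upper limit is excluded,
    so 20 is counted in 20-30 and not in 10-20.
    """
    for lower in lower_limits:
        if lower <= value < lower + class_size:
            return (lower, lower + class_size)
    return None  # value lies outside every class

lower_limits = [10, 20, 30, 40]
class_size = 20 - 10  # upper limit - lower limit

print(class_of(20, lower_limits, class_size))  # (20, 30)
print(class_of(30, lower_limits, class_size))  # (30, 40)
```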
Conversion of a discontinuous series into continuous series:
In case the given series is discontinuous, we can
make it continuous as follows.
Illustration:
Consider the class intervals 11-20, 21-30, 31-40 and 41-50.
Gap between adjacent classes = 21 – 20 = 1, so half of the gap = 0.5
Lower boundary = lower limit – half of the gap
Upper boundary = upper limit + half of the gap
Therefore, the class intervals can be changed
into continuous ones as given in the following table.

Discontinuous series   Continuous series
11 - 20                10.5 - 20.5
21 - 30                20.5 - 30.5
31 - 40                30.5 - 40.5
41 - 50                40.5 - 50.5
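The conversion in this illustration can be written as a small Python sketch. The function name `to_continuous` is hypothetical, and the code assumes that there are at least two classes and that the gap is the same between every pair of adjacent classes.

```python
def to_continuous(classes):
    """Convert classes given as (lower limit, upper limit) into class boundaries."""
    # Gap between the upper limit of the first class and the lower limit of the second.
    gap = classes[1][0] - classes[0][1]
    adjustment = gap / 2  # the adjustment factor
    return [(lower - adjustment, upper + adjustment) for lower, upper in classes]

classes = [(11, 20), (21, 30), (31, 40), (41, 50)]

for (lower, upper), (lb, ub) in zip(classes, to_continuous(classes)):
    print(f"{lower}-{upper}  ->  {lb}-{ub}")
```

After the conversion, the upper boundary of each class equals the lower boundary of the next one, which is what makes the series continuous.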
(i).Construction of grouped frequency distribution table – Continuous series.
Example 2 The EB bill (in ₹) of each of the 26 houses in a village is given below. Construct the frequency
table.
Solution:
Maximum bill amount = ₹800
Minimum bill amount = ₹120
Range = maximum value – minimum value = 800 – 120 = ₹680
Suppose we want to take the class size as 100; then the
number of class intervals = range ÷ class size = 680 ÷ 100 = 6.8, that is, about 7.
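The range and a rough count of class intervals for this example can be checked with a short Python sketch. It uses only the maximum, the minimum and the class size stated in the solution. Rounding the quotient up with `math.ceil` is an assumption about how a fractional result is handled, and the actual number of classes in a table also depends on where the first class starts.

```python
import math

maximum = 800      # maximum bill amount in rupees
minimum = 120      # minimum bill amount in rupees
class_size = 100   # chosen class size

data_range = maximum - minimum
# Divide the range by the class size and round up to a whole number of classes.
number_of_classes = math.ceil(data_range / class_size)

print("Range =", data_range)
print("Approximate number of class intervals =", number_of_classes)
```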
(ii).Construction of grouped frequency distribution table - Discontinuous series.
Example 3 Construct a continuous series frequency
distribution table.
Solution:
As explained above, first we
should fill the gap by extending the two limits of each class by half of the
value of the gap. Here the gap is 1, so subtracting half of the gap, i.e. 0.5, from the lower limit and adding it to the upper limit of each
class makes it a continuous series.
4.Graphical representation of the frequency distribution for ungrouped data
A graphical
representation is the geometrical image of a set of data. It is a mathematical
picture. It enables us to think about a statistical problem in visual terms. A
picture is said to be more effective than words for describing a particular
thing. The graphical representation of data is more effective for
understanding. Now, we are going to represent the given ungrouped data in the
circular form namely the pie diagram or the pie chart.
5 Pie chart (or) Pie diagram
A pie chart is a
circular graph which shows the total value with its components. The area of the
circle represents the total value and the different sectors of the circle
represent the different components. The circle is divided into sectors and the
area of each sector is proportional to the value of the component it represents. In a pie
chart, the data are mostly expressed in percentage. Each component is expressed
as a percentage of the total value.
The pie diagram is so
called because the entire graph looks like the American food 'pie' and the
components resemble slices cut from a pie.
5.1 Method of constructing a pie chart:
In a pie chart, we know
that the various components are represented by the sectors of a circle and the
whole circle represents the sum of the value of all the components. Therefore,
the total angle of 360° at the centre of the circle is divided into different sectors
according to the value of the components.
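The central angle of a component = (Value of the component ÷ Total value) × 360°
Sometimes, the values of
the components are expressed in percentage. In such cases,
The central angle of a component = (Percentage value of the component ÷ 100) × 360°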
Steps for construction of the pie chart:
1) Calculate the central angle for each component using the above formula and tabulate it.
2) Draw a circle of convenient radius and mark one horizontal radius in it.
3) Draw a radius making the central angle of the first component with the horizontal radius. This sector represents the
first component. From this radius, draw the next radius with the central angle of the
second component, and so on, until all the components are completed.
4) To identify each sector, shade the sectors with different colours.
5) Label each sector.
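Step 3 above amounts to adding the central angles one after another, starting from the horizontal radius. A minimal Python sketch of that bookkeeping is given below; the function name `sector_boundaries` and the three sample angles are made up for illustration and do not come from the text.

```python
from itertools import accumulate

def sector_boundaries(central_angles):
    """Return the (start, end) angle in degrees of each sector,
    measured from the horizontal radius."""
    ends = list(accumulate(central_angles))  # running total of the angles
    starts = [0] + ends[:-1]                 # each sector starts where the previous ends
    return list(zip(starts, ends))

# Made-up central angles that add up to 360 degrees.
angles = [90, 120, 150]

for number, (start, end) in enumerate(sector_boundaries(angles), start=1):
    print(f"Sector {number}: from {start} to {end} degrees")
```

The last sector should end at 360°, which is a quick check that the central angles were calculated correctly.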
Here are some examples. Let us draw the pie chart for the given data.
Example 4 The number of hours spent by a school student
on various activities on a working day is given below. Construct a pie chart.
1. Find the percentage of sleeping hours.
2. By what angle is home work more than play?
3. By what angle are other activities less than sleep?
Solution:
The number of hours spent on
different activities in a day of 24 hours is converted into component parts
of 360°.
The time spent by a
school student during a day (24 hours)
1. The percentage of sleeping hours = (8 ÷ 24) × 100 = 33.33%
2. Home work is 45° – 30° = 15° more than play.
3. Other activities are 120° – 75° = 45° less than sleep.
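The arithmetic of this solution can be reproduced with a short Python sketch. The hours below are recovered from the angles quoted in the solution (hours = angle ÷ 360 × 24). The activity name "School" is an assumption: it stands for the remaining 6 hours of the day, which are not named anywhere in the text above. The function name `central_angles` is also illustrative.

```python
def central_angles(values):
    """Central angle of each component = value / total * 360 degrees."""
    total = sum(values.values())
    return {name: value * 360 / total for name, value in values.items()}

# Hours recovered from the angles in the solution; "School" is an assumed label
# for the 6 hours that the text does not name.
hours = {"Sleep": 8, "School": 6, "Play": 2, "Home work": 3, "Others": 5}

angles = central_angles(hours)
sleep_percentage = hours["Sleep"] / sum(hours.values()) * 100

for name, angle in angles.items():
    print(f"{name:<10} {hours[name]:>2} hours  {angle:.0f} degrees")

print(f"Sleeping hours: {sleep_percentage:.2f}%")
print("Home work minus play:", angles["Home work"] - angles["Play"], "degrees")
print("Sleep minus others:", angles["Sleep"] - angles["Others"], "degrees")
```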
6 Graphical representation of the frequency distribution for grouped data
The line graph, bar
graph, pictograph and pie chart are graphical representations of the
frequency distribution for ungrouped data. The histogram, frequency polygon,
frequency curve and cumulative frequency curves (ogives)
are some of the graphical representations of the frequency distribution for
grouped data.
In this class, we are
going to represent grouped data by the histogram and the frequency
polygon only. You will learn the other types of representation in the higher
classes.
7 Histogram
A histogram is a graph
of a continuous frequency distribution. A histogram contains a set of rectangles.
The base of each rectangle is the length of a class interval and its height is the frequency of that
class interval, i.e.
the class intervals are represented on the horizontal axis (x-axis) and the
frequencies are represented on the vertical axis (y-axis).
The area of each
rectangle is proportional to the frequency in the respective class interval and
the total area of the histogram is proportional to the total frequency. Because
the frequency distribution is continuous, the rectangles are placed
side by side with no gap between adjacent rectangles.
Steps to construct a Histogram:
1. Represent the data in the continuous form (exclusive form) if it is in the discontinuous form (inclusive
form) by converting it using the adjustment factor.
2. Select the appropriate units along the x-axis and y-axis.
3. Plot the lower limits of all the class intervals on the x-axis.
4. Plot the frequencies of the distribution on the y-axis.
5. Construct the rectangles with
class intervals as bases and corresponding frequencies as heights. Each class
has a lower and an upper value. This gives us two equal vertical lines representing
the frequency. The upper ends of the lines are joined together, and this
process gives us the rectangles.
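The steps above can be sketched in Python by turning each class into a rectangle described by its left edge, width and height. The names `histogram_rectangles` and `has_no_gaps` are hypothetical, and the small distribution is made up for illustration only.

```python
def histogram_rectangles(classes, frequencies):
    """Return one (left edge, width, height) triple per class interval."""
    rectangles = []
    for (lower, upper), frequency in zip(classes, frequencies):
        rectangles.append((lower, upper - lower, frequency))
    return rectangles

def has_no_gaps(classes):
    """True when every class starts exactly where the previous class ends."""
    return all(prev[1] == nxt[0] for prev, nxt in zip(classes, classes[1:]))

# Made-up continuous distribution, for illustration only.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [5, 12, 9, 4]

if has_no_gaps(classes):
    for left, width, height in histogram_rectangles(classes, frequencies):
        print(f"rectangle from x = {left} to x = {left + width}, height {height}")
else:
    print("Convert the classes to the continuous form first.")
```

Checking for gaps before building the rectangles corresponds to step 1: a discontinuous series has to be converted with the adjustment factor before the histogram is drawn.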
7.1 Construction of a histogram for continuous frequency distribution:
Example 5 Draw a histogram for the following table which
represents the age groups of 100 people in a village.
Solution:
The given data is a
continuous frequency distribution. The class intervals are marked on the x-axis and
their respective frequencies on the y-axis. The classes (ages) and their frequencies
(number of people) are taken together to form the rectangles.
The histogram is
constructed as given below
7.2 Construction of histogram for discontinuous frequency distribution:
Example 6 The following table gives the number of
literate females in the age group 10 to 45 years in a town.
Draw a histogram to represent the above data.
Solution:
The given distribution
is discontinuous. If we represent the given data as it is by a graph, we shall
get a bar graph, as there will be gaps between the classes. So, convert this
into a continuous distribution using the adjustment factor 0.5.
That is, lower boundary = lower limit – 0.5
Upper boundary = upper limit + 0.5
The first class interval
can be written as 9.5-15.5, and the remaining class intervals are changed in the
same way. There are no changes in the frequencies.
The new continuous
frequency table is
The histogram is
constructed as below
8 Frequency Polygon
A frequency polygon is a
line graph for the graphical representation of a frequency distribution. If
we mark the midpoints of the tops of the rectangles in a histogram and join them
by straight lines, the figure so formed is called a frequency polygon. It is called a polygon as it consists of a
number of line segments, like the sides of a polygon.
A frequency polygon is
useful in comparing two or more frequency distributions. A frequency polygon
for a grouped frequency distribution can be constructed in two ways.
i) Using a histogram
ii) Without using a histogram
8.1 To construct a frequency polygon using a histogram:
1. Draw a histogram from the given data.
2. Join the consecutive midpoints
of the upper sides of the adjacent rectangles of the histogram by line
segments.
3. It is assumed that a class
interval preceding the first rectangle and a class interval succeeding the
last rectangle exist in the histogram and that the frequency of each of these extreme class
intervals is zero. These class intervals are known as imagined class intervals.
4. To get the frequency polygon, join
the midpoints of these imagined classes with the corresponding midpoints of the
upper sides of the first and last rectangles of the histogram.
Example 7 The following is the distribution of pocket
money of 200 students in a school.
Draw a frequency polygon using a histogram.
Solution:
Represent the pocket
money along the x-axis and the number of students along the y-axis.
Draw a histogram for the
given data. Now, mark the midpoints of the upper sides of the consecutive rectangles.
Also mark on the x-axis the midpoints of the two imagined class intervals 0-10 and 90-100, whose
frequency is 0. Now, join all the midpoints with the help of a ruler.
We get a frequency polygon imposed on the histogram.
8.2 To draw a frequency polygon without using a histogram:
(1) Find the midpoints
of the class intervals and tabulate them.
(2) Mark the midpoints
of the class intervals on the x-axis and the frequencies on the y-axis.
(3) Plot the points
corresponding to the frequency at each midpoint.
(4) Join the points
using a ruler to get the frequency polygon.
Example 8 Draw a frequency polygon for the following data
without using a histogram.
Solution:
Find the midpoints of the
class intervals and tabulate them.
The points are (15,4) (25,6) (35,8) (45,12) (55,10) (65,14) (75,5) (85,7).
In the graph sheet, mark
the midpoints along the x-axis and the frequencies along the y-axis.
We take the imagined
classes 0 – 10 at the beginning and 90 – 100 at the end,
each with frequency zero.
From the table, plot the
points. We draw the line segments AB, BC, CD, DE, EF, FG, GH, HI, IJ to obtain the required frequency polygon ABCDEFGHIJ.
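The vertices A to J of this polygon can be generated with a short Python sketch. The class intervals 10-20, 20-30, ..., 80-90 are inferred from the midpoints and the imagined classes quoted in the solution, since the original table is not reproduced here. The function name `frequency_polygon_points` is hypothetical, and the code assumes that all classes have the same size.

```python
def frequency_polygon_points(classes, frequencies):
    """Return the polygon vertices (midpoint, frequency), including one imagined
    zero-frequency class before the first class and one after the last class."""
    size = classes[0][1] - classes[0][0]  # assumes equal class sizes
    first_lower = classes[0][0]
    last_upper = classes[-1][1]
    extended = ([(first_lower - size, first_lower)]
                + list(classes)
                + [(last_upper, last_upper + size)])
    heights = [0] + list(frequencies) + [0]
    return [((lower + upper) / 2, f) for (lower, upper), f in zip(extended, heights)]

# Classes inferred from the midpoints 15, 25, ..., 85 given in the solution.
classes = [(lower, lower + 10) for lower in range(10, 90, 10)]
frequencies = [4, 6, 8, 12, 10, 14, 5, 7]

points = frequency_polygon_points(classes, frequencies)
for label, point in zip("ABCDEFGHIJ", points):
    print(label, point)
```

Joining consecutive points in this list with line segments gives the nine sides AB, BC, ..., IJ named in the solution.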