PEP 6305 Measurement in
Health & Physical Education
Topic 2:
Organizing Data
Section
2.2
Click to go to
back to the previous section (Section 2.1)
Tables and
Spreadsheets
n
A table contains data in rows and columns. Typically, a
table summarized the data for groups of subjects and certain variables rather
than reporting the data for all subjects and variables.
n
A spreadsheet is a table contained in a computer program (such as
Excel) that allows you to perform computations using formulas to combine the data
found in the rows and columns.
n
Traditionally, in a spreadsheet each subject is one row, and each
variable is one column.
¨
This is the "standard" way to enter data to spreadsheets for
statistical analysis.
¨
Most statistical software programs require this
type of layout.
¨
If you have to enter data, use this
type of layout.
n
Table 2.5 from the text shows four variables for five subjects
(each subject has their own row).
What are the four variables? (Answer)
|
Subject
|
Height
|
Weight
|
BMI
|
|
1
|
60
|
150
|
25
|
|
2
|
70
|
165
|
21
|
|
3
|
62
|
160
|
25
|
|
4
|
65
|
130
|
19
|
|
5
|
67
|
200
|
27
|
n
Most spreadsheet programs allow you to sort (put the values
in alphabetical or numerical order) the rows according to
one of the columns/variables, which creates a simple frequency distribution for
that column/variable within the spreadsheet.
n
Spreadsheets are good for organizing and storing data, but not for
displaying it in a report.
n
Tables can be a good way to summarize data for a presentation or
written report.
¨
You will create tables in subsequent lecture topics that are
similar to what you see in published research papers.
¨
Most
tables do not have a row for each subject, but will summarize the
variables (in columns)
by
groups (in rows).
Graphs
n
A graph is an image or drawing that represents the way
that data are distributed. Most statistical programs and spreadsheet programs
can create data graphs.
n
There are several types of graphs; each one displays data in a
different way. We will dicuss bar graphs and histograms here, and discuss
scatterplots in a later topic.
Bar Graphs and
Histograms
n
A bar graph is a picture of a simple frequency
distribution. The values of the variable are shown along the x-axis, and the
frequencies (counts of each value) are shown on the y-axis. The height of the bar represents the
frequency of subjects who have the respective value in the variable range.
n
This is a bar graph of the pull-up data from our previous simple frequency
distribution example:
n
Create a bar graph using the simple frequency distribution for age in
R Commander.
¨
Load your data file into R Commander.
¨
Click Graphs>Histogram… and click to select age, and put 31 in
the Number of bins box (age ranges from 20 to 50, so there are 31 separate
values). Click OK to create the graph.
¨
Each bar represent the count of each age value.
n
A histogram is a picture of a
grouped frequency
distribution rather than the simple frequency distribution. The intervals are shown along the x-axis, and the frequencies
(counts) are shown on the y-axis. The height of the bar represents the frequency
of subjects in that interval of the variable range.
n
This is a histogram of the mile run time data from our previous grouped
frequency distribution example:
n
Create a histogram in R Commander using the grouped frequency
distribution from the example you worked.
¨
Load your data file.
¨
Click
Graphs>Histogram...click to select age and put 12 in the Number of bins box.
Click OK.
¨
While this is a grouped frequency distribution, you may notice there are more
than 12 bins. That's because R uses a certain rule for making histograms, so the
number of bins you request is only an estimate. To get the histogram for the
grouped frequency distribution you created in the last section, you have to
manually enter the "breaks" in the R Commander Script window. In that window
you'll see this:
¨
Hist(Dataset6305$age, scale="frequency", breaks=12, col="darkgray")
¨
You can change the breaks to the numbers from
the example by typing them in after break
as shown below (the c is an R function that means column, so R
will read what you type as a column of numbers), and clicking the Submit button:
¨
Hist(Dataset6305$age, scale="frequency",
breaks=(c(20,22.5,25,27.5,30,32.5,35,37.5,40,42.5,45,47.5,50)), col="darkgray")
n
You can save graphs that you create in R Commander in the Graphs>Save graph to
file... menu option. You will need to save graphs for a couple of problems on
the exams.
n
Histograms and bar graphs are good ways to show information
in a presentation or report. The relative frequencies in each interval or
category can be easily seen and interpreted by the reader, as opposed to trying
to read and interpret a long column of numbers.
¨
If you connect the midpoints of the bar graph or histogram bars
with a line, you will have a frequency polygon (see Figure 2.2 in the
book).
¨
If you sum the frequencies in each interval to the sum of the
frequencies in all of the intervals preceding it, you create a cumulative
frequency distribution (see Table 2.6).
¨
If you graph the cumulative frequency distribution values, you obtain a cumulative frequency
graph (see Figure 2.3 in the book).
n
When the variable is
continuous rather than
discrete, instead of a
frequency polygon you will have a continuous curve showing the distribution of
values; the curve represents the theoretically infinite number of very very thin
bars that could be drawn and connected.
¨
The shape of this curve can be used to determine if the data
approximate certain well-defined statistical distributions. The most well known
curve is what is known as a normal curve or bell-shaped curve,
which is a curve of the normal distribution.
Click to go to the
next section (Section 2.3)