PEP 6305 Measurement in Health & Physical Education

 

Topic 6: Correlation

Section 6.2

 


Calculating the Correlation Coefficient    

 

n   Similar to the mean and SD, there are several formulas for computing a correlation coefficient.

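n   The definitional formula reveals the concept of product moment correlation (which is symbolized by r):

$$ r = \frac{\sum Z_X Z_Y}{N} $$

where Z_X and Z_Y are each subject's Z scores on the two variables and N is the number of subjects.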

n   This formula shows that the correlation coefficient is a function of a “product moment,” or the average of the product of two Z scores.

n   A few things about the magnitude and direction of the correlation coefficient can be noted from this definitional formula:

¨  The product of two positive numbers is positive; the product of two negative numbers is also positive. The average of a set of positive numbers is a positive number. This occurs when subjects generally have Z scores of the same sign on BOTH variables (both positive or both negative), which means that the direction of the correlation will be positive (direct).

¨  The product of a positive number and a negative number is negative. The average of a set of negative numbers is a negative number. This occurs when subjects generally have a positive Z score on one variable and a negative Z score on the other variable, which means that the direction of the correlation will be negative (inverse).

¨  If about half of the subjects have same-sign paired Z scores (positive-positive or negative-negative) and about half have opposite-sign paired Z scores (positive-negative or negative-positive), then the positive and negative products tend to cancel, and the average of the products will tend to be close to 0, which has no direction (0 is neither positive nor negative). Such variables would be uncorrelated (r = 0).

¨  This Excel file (click here) shows these relations governing the correlation coefficient.
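The sign rules above can also be checked with a few lines of code. The following is a minimal Python sketch, not part of the course materials (the course itself uses R Commander). The function names and the tiny data set are hypothetical, made up for illustration. It assumes Z scores are computed with the SD that divides by N, so that the average of the Z-score products equals r.

```python
from statistics import fmean, pstdev


def z_scores(values):
    """Convert raw scores to Z scores using the SD that divides by N."""
    mean = fmean(values)
    sd = pstdev(values)
    return [(v - mean) / sd for v in values]


def product_moment_r(x, y):
    """Correlation as the average of the products of paired Z scores."""
    products = [zx * zy for zx, zy in zip(z_scores(x), z_scores(y))]
    return fmean(products)


# Hypothetical scores for five subjects (illustration only).
x = [1, 2, 3, 4, 5]
y_direct = [2, 4, 5, 4, 5]   # tends to rise with x: mostly same-sign Z pairs
y_inverse = [5, 4, 5, 4, 2]  # tends to fall as x rises: mostly opposite-sign Z pairs

print(round(product_moment_r(x, y_direct), 2))   # positive, about 0.77
print(round(product_moment_r(x, y_inverse), 2))  # negative, about -0.77
print(round(product_moment_r(x, x), 2))          # a variable with itself: 1.0
```

Same-sign Z pairs contribute positive products and pull the average up; opposite-sign pairs pull it down.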

n   The textbook gives a "machine" formula for computing a correlation coefficient (p. 113). If you plan to compute by hand, that equation may be of use to you. I always use a computer, and you will use a computer for the assignments and exams in this class (unless you actually want to do it by hand...!!).

n   Open R Commander and load Dataset6305. Go to Statistics>Summaries>Correlation matrix…

 

¨  Click to highlight age, height, weight, bmi, prcntfat, and aerobfit (hold the Control key to select multiple variables). Check the "Pairwise p-values..." box as shown. Click OK.

    

¨  The Output window first shows you a correlation table. The correlation coefficient for each pair of variables is provided, as well as the sample size (n = 1000). You'll notice that the values on the diagonal of the table are all 1.00, since the correlation of a variable with itself has to be 1.00.

   

¨  The next table in that window, labeled 'P', shows the error probability (p-value) for each correlation coefficient in the first table.

¨  Interpreting these results, the correlation coefficient for weight and age (r = 0.01) has a p-value of p = 0.6421: if weight and age were truly uncorrelated, sampling error alone would produce a correlation at least this large about 64.2% of the time (64.2/100). From these results we conclude that there is no evidence that weight and age are correlated, because the observed value could easily be a result of sampling error.

¨  By contrast, the correlation coefficient for height and weight (r = 0.69) has a p-value displayed as p = 0.0000, which means p < 0.0001 (less than 0.01%, or 1/10,000): a correlation this large would almost never arise from sampling error alone. We conclude that height and weight are significantly correlated because that value would be highly unlikely if the variables were actually uncorrelated.

¨  Examine the values in this table and identify which associations are statistically significant (p ≤ 0.05).
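For readers curious where such error probabilities come from, the sketch below converts a correlation into the usual t statistic, t = r·sqrt(df / (1 − r²)) with df = n − 2, and compares it with a critical value. This is an illustrative Python sketch, not what R Commander displays. The function names are hypothetical, and the critical value of 1.96 is an assumed large-sample approximation for a two-tailed test at α = 0.05. Because the r values quoted above are rounded to two decimals, the results will not reproduce the table's p-values exactly.

```python
import math

# Assumption: approximate two-tailed critical t at alpha = 0.05 for df near 1000.
T_CRITICAL = 1.96


def t_statistic(r, n):
    """t statistic for testing whether a correlation differs from 0 (df = n - 2)."""
    df = n - 2
    return r * math.sqrt(df / (1 - r ** 2))


def is_significant(r, n, t_critical=T_CRITICAL):
    """True when |t| reaches the critical value."""
    return abs(t_statistic(r, n)) >= t_critical


# Rounded correlations quoted in the notes, with n = 1000.
for label, r in [("weight and age", 0.01), ("height and weight", 0.69)]:
    t = t_statistic(r, 1000)
    verdict = "significant" if is_significant(r, 1000) else "not significant"
    print(f"{label}: r = {r:.2f}, t = {t:.2f}, {verdict}")
```

With n = 1000, the t for r = 0.69 is far beyond the critical value, while the t for r = 0.01 falls well short of it.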

 

Interpreting the Correlation Coefficient

 

n   Interpreting the magnitude of the correlation coefficient can be a little tricky. A correlation coefficient of the same magnitude may represent a large and important association in some circumstances but a relatively meaningless relation in other circumstances.

n   In general, there are two considerations when evaluating the magnitude of the correlation coefficient.

n   First, evaluate how much of the variation in one variable "overlaps" with variation in the other variable, which is measured by the square of the correlation coefficient (r² or R²), known as the “proportion of common variance.” This is the proportion of variation in one variable that can be accounted for by variation in the other variable.

¨  In general, as correlation coefficients get larger than 0.33 (r² ≥ 0.10) they are more likely to be important.

¨  But the importance of magnitude differs from field to field and study to study. It is best to determine what size correlation would be meaningful in a practical sense (i.e., provide a meaningful explanation) before collecting and analyzing the data.
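The squaring step is simple arithmetic, and a short sketch makes the scale easier to see. The Python below is only an illustration. The function name is hypothetical, and the r values are the 0.33 guideline and the height-weight correlation (r = 0.69) from these notes, plus a small value for contrast.

```python
def common_variance(r):
    """Proportion of common variance: the square of the correlation coefficient."""
    return r ** 2


for r in (0.10, 0.33, 0.69):
    share = common_variance(r)
    print(f"r = {r:.2f} -> r^2 = {share:.3f} ({share:.1%} of the variance in common)")
```

Note how quickly the overlap shrinks: a correlation of 0.33 corresponds to only about a tenth of the variance in common.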

n   Second, determine whether the correlation coefficient is significantly different from 0, which would mean that the observed correlation was unlikely to have occurred by chance (as we did in the correlation table above).

¨  To be able to interpret the statistical significance of the correlation coefficient, the study has to be designed properly, with an appropriate sample size and data collection strategy.

¨  Sample size is a key issue because larger sample sizes allow correlation coefficients with smaller magnitudes to be shown to be significantly different from 0. Appendix Table A.2 in the text shows this relationship; degrees of freedom = N – 2 (why?), so you can see how fewer subjects require a larger correlation coefficient for the significance test.

¨  For example, take an error probability of α = 0.05 (the third column) and n = 14 subjects, so the degrees of freedom are df = n - 2 = 14 - 2 = 12. A correlation coefficient of 0.532 or higher occurs fewer than 5 times out of 100 by chance alone. By contrast, if you have 32 subjects (df = 30), the correlation coefficient with the same error probability is 0.349.
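The two table values in this example can be recovered from the t distribution using r = t / sqrt(t² + df). The Python sketch below is an illustration only. The function name is hypothetical, and the critical t values (2.179 for df = 12, 2.042 for df = 30) are standard two-tailed α = 0.05 values. That the textbook table is two-tailed is an assumption, made because these values reproduce 0.532 and 0.349.

```python
import math

# Assumption: standard two-tailed critical t values at alpha = 0.05.
T_CRITICAL_05 = {12: 2.179, 30: 2.042}


def critical_r(t_critical, df):
    """Smallest |r| that reaches significance, given the critical t and df = n - 2."""
    return t_critical / math.sqrt(t_critical ** 2 + df)


for df, t_crit in T_CRITICAL_05.items():
    n = df + 2
    print(f"n = {n}, df = {df}: critical r = {critical_r(t_crit, df):.3f}")
```

The same formula shows why more subjects help: as df grows, the denominator grows, so a smaller r is enough to reach significance.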

n   So, if the correlation coefficient is large enough and is significantly different from 0, the correlation coefficient probably indicates an important relationship exists between the variables.

n   Recall that correlation does not imply causality. A very large magnitude correlation coefficient that is statistically significant may be describing an association with no cause-and-effect relation between the two variables.

¨  For example, age is highly correlated with risk of skin cancer, but one does not "cause" the other. Both are related to a third variable, total (lifetime) exposure to ultraviolet radiation—the older you are, the more UV exposure you've had, and the more UV exposure you've had, the higher the risk of skin cancer. Age by itself does not cause skin cancer, and skin cancer does not "cause" aging.
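A small simulation can make the third-variable idea concrete. The Python sketch below uses entirely made-up numbers: the seed, sample size, coefficients, and noise levels are all hypothetical, and none come from real data. Simulated risk is generated from UV exposure only, never directly from age, yet age and risk still end up strongly correlated because UV exposure accumulates with age.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(6305)  # fixed seed so the sketch is repeatable

# Hypothetical simulated subjects (not real data).
age = [rng.uniform(20, 80) for _ in range(500)]
uv = [10 * a + rng.gauss(0, 50) for a in age]    # lifetime UV exposure grows with age
risk = [0.01 * u + rng.gauss(0, 1) for u in uv]  # risk depends on UV exposure only

print(f"r(age, UV)   = {pearson_r(age, uv):.2f}")
print(f"r(UV, risk)  = {pearson_r(uv, risk):.2f}")
print(f"r(age, risk) = {pearson_r(age, risk):.2f}")
```

The age-risk correlation is large even though age never appears in the line that generates risk.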

 

Formative Evaluation

 

n   Work through the R Commander examples in the lecture notes above.

n   Work problems 1, 2, and 4 at the end of Chapter 7. (Use a computer program to do the statistics questions).

n   In your own field of interest in HHP (nutrition, exercise science, sports administration, or health), name two variables that are likely to be positively correlated.

n   In your own field of interest in HHP, name two variables that are likely to be negatively correlated.

 

You have reached the end of Topic 6.

Make sure to work through the Formative Evaluation above, including the textbook problems at the end of the chapter. (Remember how to enter data into R Commander?)

You must complete the review quiz (in the Quizzes folder on the Blackboard course home page) before you can advance to the next topic.