Pearson Product-Moment Correlation Coefficient


Research Scenario

Question

John is interested in whether people who have a positive view of themselves in one aspect of their lives also tend to have a positive view of themselves in other aspects. To address this question, John had 80 men complete a self-concept inventory that contains five scales. Four scales ask how competent respondents feel in specific areas related to their self-concept, and a fifth scale evaluates how competent a person feels in general. The five scales are outlined below:

  1. intimate relationships
  2. relationships with friends
  3. common sense reasoning and everyday knowledge
  4. academic reasoning and scholarly knowledge
  5. how competent a person feels in general

Instrument & Scoring

John’s data come from a survey in which 80 men completed items related to the five self-concept scales identified above.

Research Hypothesis

John’s research question concerns the relationships among five separate variables. Specifically, John is interested in understanding whether men who have a positive self-concept in one life domain tend to have a positive self-concept in other domains, and, conversely, whether men who have a negative self-concept in one life domain also tend to have a negative self-concept in other domains.

Covariance and Correlation Overview

For two quantitative variables, the basic statistics of interest are the covariance and/or correlation, which are used to provide estimates of the corresponding population parameters.

Before discussing correlation, one should understand covariance, at least at a cursory level. Covariance is a measure of how much two variables “co-vary,” i.e., how much (and in what direction) one might expect one variable to change when the other changes. Positive covariance values suggest that when one variable is above its mean, the other will probably also be above its mean (and likewise below). Negative covariances suggest that when one variable is above its mean, the other tends to be below its mean. Covariances near zero suggest that the two variables have no linear relationship. Unfortunately, covariances tend to be hard to interpret: a covariance is expressed in the product of the two variables’ units, so its size depends on the measurement scales used (much as a variance is hard to interpret because it is in squared units). This is where the Pearson correlation comes in.
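A minimal Python sketch of the sample covariance follows, using hypothetical scores rather than John’s data. It divides the sum of the cross-products of deviations by n - 1, and it illustrates why covariance is hard to interpret: rescaling one variable (here, multiplying x by 10) changes the covariance even though the relationship itself is unchanged. The function name sample_covariance is illustrative only.

```python
def sample_covariance(x, y):
    """Sum of the cross-products of deviations from each mean, divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return cross_products / (n - 1)


# Hypothetical paired scores (not from John's study)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(sample_covariance(x, y))                    # 1.5
print(sample_covariance([10 * v for v in x], y))  # 15.0: same relationship, different units
print(sample_covariance(x, [-v for v in y]))      # -1.5: direction reversed
```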

The Pearson Product-Moment Correlation Coefficient (Pearson correlation), usually denoted r, is a measure of the strength and direction of the linear association between two continuous variables. Conceptually, it describes how closely the plotted data of two variables cluster around a straight line of best fit: the tighter the cluster, the farther r is from 0. Because r is the covariance divided by the product of the two standard deviations, it has no units. r can take values between -1 and +1, with -1 indicating a ‘perfect’ negative linear correlation, +1 a ‘perfect’ positive linear correlation, and 0 indicating that X and Y are linearly uncorrelated.

Things to Remember

  • A value of 0 indicates that there is no linear association between the two variables.
  • A value greater than 0 indicates a positive association; that is, as the value of one variable increases, the value of the other variable tends to increase.
  • A value less than 0 indicates a negative association; that is, as the value of one variable increases, the value of the other variable tends to decrease.

The plots below, moving from left to right, illustrate what a negative r, r equal to zero, and a positive r look like.

Types of Correlational Relationships


Assumptions

  1. Variables are bivariately normally distributed in the population.

  2. The cases represent a random sample from the population, and the scores on variables for one case are independent of scores on these variables for other cases.

Check

Of the two major assumptions for Pearson correlation, only the first can be evaluated using the data alone. The second assumption is best addressed through proper research practices during data collection; if it is found to be violated during the data analysis phase, there is little the researcher can do to address the issue.

If the variables are bivariately normally distributed, each variable is normally distributed ignoring the other variable, and each variable is normally distributed at all levels of the other variable. If the bivariate normality assumption is met, the only type of relationship that can exist between the two variables is a linear relationship. However, if the assumption is violated, a nonlinear relationship may exist. It is important to determine whether a nonlinear relationship exists before describing the results with a Pearson correlation, since r would not accurately describe such a relationship.


Note

If a nonlinear relationship did exist in John’s data, he would need to transform one or both variables so that the relationship between them becomes linear. Possible transformations that could be applied to induce a linear relationship include:

  • Exponential
  • Quadratic
  • Reciprocal
  • Logarithmic
  • Power

However, the linear relationship could come at the cost of interpretability. John’s transformed data would be expressed in the units of the selected transformation; e.g., a logarithmic transformation would result in self-concept scores in log units.

Nonlinear Correlational Relationship


The plot above shows a nonlinear relationship. The black best-fit line was fitted assuming a linear relationship, while the red best-fit line was fitted assuming a nonlinear relationship. As the plot shows, the Pearson correlation would not adequately describe the nonlinear relationship.

To evaluate whether his variables are bivariately normally distributed, John can create a scatterplot for each pair of scales in his data set. As mentioned earlier, John’s data contain five scales, which means there are ten possible pairs (5 × 4 / 2 = 10). The plots for each pair of scales are presented below.
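The count of ten pairings follows from choosing 2 of the 5 scales. A short sketch that lists them, using the scale labels from the correlation table (the labels are for display only):

```python
from itertools import combinations
from math import comb

scales = ["Intimate", "Friend", "Common", "Academic", "General"]

pairs = list(combinations(scales, 2))  # every unordered pair; no scale is paired with itself
print(comb(len(scales), 2), len(pairs))  # 10 10
for first, second in pairs:
    print(f"{first} vs. {second}")
```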

Scatterplots for Self-Concept Variables


Conclusion

After reviewing his plots, John can conclude that each pair of variables appears to be bivariately normally distributed, with no evidence of nonlinear relationships. The first assumption appears to be met.

Pearson Correlations Output

The table below presents Pearson correlation coefficients for each pair of self-concept items in John’s study. However, to provide a more detailed overview of how the Pearson correlation is computed, the following sections focus on evaluating whether a correlation exists between Intimate Relationships and Relationships with Friends (Friendships) in John’s data.

Pearson Correlations for Self-Concept Items
                 Intimate   Friend     Common     Academic   General
Intimate    r    1          0.552      0.351      0.218      0.393
            p               0.000**    0.001**    0.052      0.000**
            n    80         80         80         80         80
Friend      r    0.552      1          0.462      0.244      0.546
            p    0.000**               0.000**    0.029*     0.000**
            n    80         80         80         80         80
Common      r    0.351      0.462      1          0.400      0.525
            p    0.001**    0.000**               0.000**    0.000**
            n    80         80         80         80         80
Academic    r    0.218      0.244      0.400      1          0.261
            p    0.052      0.029*     0.000**               0.019*
            n    80         80         80         80         80
General     r    0.393      0.546      0.525      0.261      1
            p    0.000**    0.000**    0.000**    0.019*
            n    80         80         80         80         80
** Correlation is significant at the 0.01 level
* Correlation is significant at the 0.05 level

Note

Don’t presume everything marked significant above is truly significant. As will be discussed later, a correction will be applied to the correlation outcomes to control for the increased likelihood of committing a Type I error due to the multiple comparisons John conducted. Be sure to review the steps of the Bonferroni correction discussed later.


Nine of John’s ten paired Pearson correlations appeared to be significant at either the p < .05 or p < .01 level. However, for the time being John should not treat them all as significant. Rather, he should consider implementing a Bonferroni correction to adjust his alpha (significance) level. The process for implementing a Bonferroni correction is outlined below.


Pearson Correlation Overview

Analysis Focus: Intimate Relationships and Friendships

Plot Intimate Relationships and Friendships

As mentioned earlier, each pair of variables in John’s data set appeared to be bivariately normally distributed. To better illustrate this for Intimate Relationships and Friendships, a larger, more readable plot of the two variables is presented below.


Conduct Pearson Correlation Analysis

The equation for a Pearson Product Moment Correlation Coefficient is presented below.

\[ r = \frac{ \Bigg(\frac{\sum\limits_{i = 1}^{N}(X_i - \overline X)(Y_i - \overline Y)}{N - 1}\Bigg)}{S_X S_Y} \]

Note

The numerator of the equation above is the sample covariance of Intimate Relationships and Friendships, while the denominator is the product of the sample standard deviations of Intimate Relationships and Friendships.


Below is the same equation written in terms of John’s variables, Intimate Relationships and Friendships.

\[ r = \frac{ \Bigg(\frac{\sum\limits_{i = 1}^{N}(\text{Intimate Relationships}_{i} - \text{Intimate Relationships}_{\text{mean}}) \cdot (\text{Friendships}_{i} - \text{Friendships}_{\text{mean}})}{N - 1}\Bigg)}{S_{\text{Intimate Relationships}}\cdot S_{\text{Friendships}}} \]

Data Needed to Compute Pearson Correlation

To compute a Pearson correlation coefficient for Intimate Relationships and Friendships, John needs the four pieces of information outlined below.

  • sum of the mean difference products of paired scores (\(\sum\limits_{i = 1}^{N}(X_i - \overline X)(Y_i - \overline Y)\)) = 1863.95
  • number of observations in John’s data set (\(N\)) = 80
  • standard deviation of intimate relationships scores (\(S_{\text{Intimate Relationships}}\)) = 6.18
  • standard deviation of friendships scores (\(S_{\text{Friendships}}\)) = 6.91


Participant One Mean Difference
Variable                  Score   Mean    Mean Difference
Intimate Relationships    48      50.48   -2.48
Friendships               56      53.98   2.02
Mean Difference Product                   -5.01
Note: The data used above are for participant one.


Calculation Example

John’s data have 80 observations. Rather than walking through the computation of each of the 80 mean difference products for intimate relationships and friendships, the computation is illustrated using data from participant one in John’s study. Participant one’s mean difference product is -5.01, as shown in the Participant One Mean Difference table.
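The participant-one calculation can be sketched as follows. The helper name mean_difference_product is hypothetical; the scores and means are the ones shown in the Participant One Mean Difference table. Summing this product over all 80 participants gives the 1863.95 listed earlier, but John’s full data set is not reproduced here.

```python
def mean_difference_product(x_score, x_mean, y_score, y_mean):
    """Return one participant's deviations from the two means and their product."""
    x_diff = x_score - x_mean
    y_diff = y_score - y_mean
    return x_diff, y_diff, x_diff * y_diff


# Participant one: Intimate Relationships 48 (mean 50.48), Friendships 56 (mean 53.98)
intimate_diff, friend_diff, product = mean_difference_product(48, 50.48, 56, 53.98)
print(round(intimate_diff, 2), round(friend_diff, 2), round(product, 2))  # -2.48 2.02 -5.01
```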


Step 1: Input All of the Data

\[ r = \frac{ \Bigg(\frac{1863.95}{79}\Bigg)}{6.18 \cdot 6.91} \]

Step 2: Simplify the Numerator and Denominator

\[ r = \frac{23.59}{42.70} \]

Step 3: Round the Resulting Pearson Value

\[ r = 0.552\]
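The three steps can be reproduced from the summary values listed earlier. This sketch rounds at the same points as the worked example (two decimals for the intermediate values, three for r); note that 6.18 × 6.91 = 42.70. Carrying full precision instead gives about 0.5525, so the third decimal is sensitive to how the standard deviations were rounded; the value reported in the correlation table is 0.552.

```python
# Summary values for Intimate Relationships and Friendships (from the list above)
sum_of_products = 1863.95  # sum of mean difference products
n = 80
sd_intimate = 6.18
sd_friendships = 6.91

covariance = round(sum_of_products / (n - 1), 2)    # Step 2 numerator
sd_product = round(sd_intimate * sd_friendships, 2)  # Step 2 denominator
r = round(covariance / sd_product, 3)                # Step 3
print(covariance, sd_product, r)  # 23.59 42.7 0.552

# Without intermediate rounding, the result sits near the .5525 boundary
r_unrounded = sum_of_products / (n - 1) / (sd_intimate * sd_friendships)
print(round(r_unrounded, 4))  # 0.5525
```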

Compute t Value for Pearson Correlation Analysis and Determine Significance

\[ t = \frac{\hat{\rho}}{\sqrt{\frac{1-\hat{\rho}^2}{N-2}}} \]

Data Needed to Compute t Value for Pearson Correlation

\({\hat\rho}\) is the Pearson correlation computed above = 0.552
\({N}\) is the sample size = 80


Step 1: Input All of the Data

\[ t = \frac{0.552}{\sqrt{\frac{1-0.552^2}{80-2}}} \]

Step 2: Simplify the Numerator and Denominator

\[ t = \frac{0.552}{0.0944} \]

Step 3: Round the Resulting t Value

\[ t = 5.85\]
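The t computation can be sketched in a few lines. The function name t_from_r is hypothetical; it implements the formula above with N - 2 degrees of freedom. Python’s standard library has no t distribution, so this sketch stops at the t value.

```python
import math


def t_from_r(r, n):
    """t = r / sqrt((1 - r**2) / (n - 2)); returns (t, degrees of freedom)."""
    if not -1 < r < 1 or n < 3:
        raise ValueError("need -1 < r < 1 and n >= 3")
    standard_error = math.sqrt((1 - r ** 2) / (n - 2))
    return r / standard_error, n - 2


t, df = t_from_r(0.552, 80)
print(round(t, 2), df)  # 5.85 78
```

If SciPy is installed, 2 * scipy.stats.t.sf(abs(t), df) would give the corresponding two-tailed p-value.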


Pearson Correlation Conclusion

Analysis Focus: Intimate Relationships and Friendships

The observed correlation between Intimate and Friend scores appears to be significant at both the p < .05 and p < .01 levels, since the computed t value of 5.85 (df = N - 2 = 78) is larger than the two-tailed critical t values of approximately 1.99 and 2.64, respectively.


Effect Size

Analysis Focus: Intimate Relationships and Friendships

The Pearson correlation index ranges in value from -1 to +1. This coefficient indicates the degree to which low or high scores on one variable tend to go with low or high scores on another variable. A score on a variable is a low (or high) score to the extent that it falls below (or above) the mean score on that variable.

As with all effect size indices, there is no good answer to the question, “What value indicates a strong relationship between two variables?” What is large or small depends on the discipline within which the research question is being asked. However, for the behavioral sciences, correlation coefficients of .10, .30, and .50, irrespective of sign, are, by convention, interpreted as small, medium, and large coefficients, respectively.

Pearson Coefficient (r)   Effect Size Class
< .10                     trivial
.10 - .29                 small
.30 - .49                 moderate
.50+                      large
Note: Coefficients are interpreted irrespective of sign.

The correlation coefficient for Intimate and Friend relationships was 0.552 and is considered a large effect.
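The effect size table can be applied mechanically, as in the sketch below. The function name effect_size_class is hypothetical, and treating every |r| below .30 as small (and so on) is an assumption about how to handle values that fall between the table’s two-decimal ranges. The example coefficients come from John’s correlation table.

```python
def effect_size_class(r):
    """Classify |r| with the cutoffs from the effect size table (sign ignored)."""
    size = abs(r)
    if size < 0.10:
        return "trivial"
    if size < 0.30:
        return "small"
    if size < 0.50:
        return "moderate"
    return "large"


print(effect_size_class(0.552))   # large (Intimate and Friend)
print(effect_size_class(0.351))   # moderate (Intimate and Common)
print(effect_size_class(-0.218))  # small (sign is ignored)
```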

Bonferroni Correction

When conducting multiple significance tests on the same set of data, the chance of committing at least one Type I error increases, which in turn increases the likelihood of obtaining a significant result by chance alone. To control for this and protect against Type I error, a Bonferroni correction can be applied.

To obtain the Bonferroni-corrected \(\alpha\), divide the original \(\alpha\) by the number of tests conducted. With ten correlations, John would calculate the corrected \(\alpha\) as .05/10 = .005, meaning a p-value would have to be less than .005 to be declared significant.
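The correction can be applied to the ten p-values from John’s correlation table, as sketched below. The sketch uses the p-values as printed (rounded to three decimals, so 0.000 stands for a very small p); the pair labels are the table’s abbreviations.

```python
# Two-tailed p-values from the correlation table, as printed (three decimals)
p_values = {
    ("Intimate", "Friend"): 0.000,
    ("Intimate", "Common"): 0.001,
    ("Intimate", "Academic"): 0.052,
    ("Intimate", "General"): 0.000,
    ("Friend", "Common"): 0.000,
    ("Friend", "Academic"): 0.029,
    ("Friend", "General"): 0.000,
    ("Common", "Academic"): 0.000,
    ("Common", "General"): 0.000,
    ("Academic", "General"): 0.019,
}

alpha = 0.05
bonferroni_alpha = alpha / len(p_values)  # .05 / 10 tests

uncorrected = [pair for pair, p in p_values.items() if p < alpha]
corrected = [pair for pair, p in p_values.items() if p < bonferroni_alpha]

print(f"{bonferroni_alpha:.3f}")         # 0.005
print(len(uncorrected), len(corrected))  # 9 7
for pair in uncorrected:
    if pair not in corrected:
        print("no longer significant:", " vs. ".join(pair))
```

The final loop reports the two pairs (Friend vs. Academic and Academic vs. General) that lose significance after the correction.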

Note

The Bonferroni correction is conservative: although it protects against Type I errors, it increases the risk of Type II errors (failing to reject the null hypothesis when it should be rejected).


Following the Bonferroni correction, John went from having nine significant Pearson correlation outcomes to seven. John’s updated significance flags for the paired correlations are presented below.


Pearson Correlations for Self-Concept Items
                 Intimate   Friend     Common     Academic   General
Intimate    r    1          0.552      0.351      0.218      0.393
            p               0.000*     0.001*     0.052      0.000*
            n    80         80         80         80         80
Friend      r    0.552      1          0.462      0.244      0.546
            p    0.000*                0.000*     0.029      0.000*
            n    80         80         80         80         80
Common      r    0.351      0.462      1          0.400      0.525
            p    0.001*     0.000*                0.000*     0.000*
            n    80         80         80         80         80
Academic    r    0.218      0.244      0.400      1          0.261
            p    0.052      0.029      0.000*                0.019
            n    80         80         80         80         80
General     r    0.393      0.546      0.525      0.261      1
            p    0.000*     0.000*     0.000*     0.019
            n    80         80         80         80         80
* Correlation is significant at the Bonferroni-corrected level of p < 0.005
 

A work by Alex Aguilar

aaguilar@thechicagoschool.edu