ENSH 304 Chapter Notes

Each chapter has a separate page with the same framework.

Chapter 4: Correlation and Regression

Syllabus hours: 6 | Exam weight: 10 marks | Marks breakdown: correlation 3, regression 4, variation 1, partial/multiple correlation 2

Difficulty type: Medium-high | Version / Last Updated: 2026-04-18 | Not in syllabus: advanced nonlinear and machine-learning regression

Outcome: measure association, fit a regression line, interpret variation, and read multiple/partial correlation results from PYQs.

1. Fundamental Concepts

  • Correlation measures strength and direction of linear association.
  • Regression predicts one variable from another and gives an equation, not just a relationship score.
  • Correlation is symmetric; regression is directional.
  • Positive correlation means both variables move in the same direction; negative correlation means they move in opposite directions.
  • Outliers can distort both correlation and regression heavily.
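
The last point can be illustrated with a minimal sketch. The data below are hypothetical (not from any PYQ), and `pearson_r` is a helper name introduced here only for the illustration: five points lie exactly on a line, then one made-up outlier is appended.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: five points lying exactly on y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_clean = pearson_r(xs, ys)

# Same data plus one made-up outlier far below the line.
r_outlier = pearson_r(xs + [6], ys + [0])

print(round(r_clean, 3), round(r_outlier, 3))  # 1.0 0.143
```

A single point moves r from a perfect 1 to roughly 0.14, which is why a scatter plot should be checked before a correlation value is trusted.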

2. Core Methods and Formulas

When to use: use correlation when the question asks how strongly two variables move together; use regression when the question asks to estimate one variable from another.

When not to use: do not treat correlation as causation; do not use regression predictions outside the observed range without caution.

$$r=\frac{\sum (x-\bar x)(y-\bar y)}{\sqrt{\sum (x-\bar x)^2\sum (y-\bar y)^2}}$$
$$b_{yx}=r\frac{\sigma_y}{\sigma_x},\quad b_{xy}=r\frac{\sigma_x}{\sigma_y}$$
$$y=a+bx$$
$$a=\bar y-b\bar x$$
$$r^2=\text{coefficient of determination}$$
$$R=\sqrt{1-\frac{SSE}{SST}}$$
$$SST=SSR+SSE$$
$$r_{12.3}=\text{partial correlation of 1 and 2 keeping 3 fixed}$$

3. Standard Models / Topics

Topic 1: Correlation Analysis and Tests

Basic notes: correlation gives a single number between -1 and 1. A value close to +1 means a strong positive linear relationship; close to -1 means a strong negative linear relationship; near 0 means a weak or absent linear relationship (a curved relationship may still exist).

Conditions / use: use Pearson correlation for linear numerical data; avoid it as a summary if the relation is curved or heavily affected by outliers.

Formula recap: $r=\frac{\sum (x-\bar x)(y-\bar y)}{\sqrt{\sum (x-\bar x)^2\sum (y-\bar y)^2}}$ and $-1\le r\le 1$.

Seen-Before Check: if the problem asks for relationship strength, linear association, or “degree of correlation,” this topic applies.

[Core] Problem 1: Find the sign of correlation when both variables increase together.

Answer: positive correlation.

[Core] Problem 2: If r = -0.82, what does it indicate?

Answer: a strong negative linear relationship.

[Advanced] Problem 3: For x = 1, 2, 3, 4 and y = 2, 5, 4, 6, compute the correlation coefficient.

$$\bar x=2.5,\quad \bar y=4.25,\quad \sum (x-\bar x)(y-\bar y)=5.5$$
$$\sum (x-\bar x)^2=5,\quad \sum (y-\bar y)^2=8.75,\quad r=\frac{5.5}{\sqrt{5\times8.75}}\approx0.83$$

Answer: r ≈ 0.83, so the variables have a fairly strong positive linear association.
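
As a way to check the hand computation in Problem 3, the deviation sums can be recomputed step by step. This is only a sketch using the problem's own data; the variable names (`s_xy`, `s_xx`, `s_yy`) are introduced here for illustration.

```python
from math import sqrt

# Data from Problem 3.
x = [1, 2, 3, 4]
y = [2, 5, 4, 6]
n = len(x)

# Means of each variable.
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of cross-products and squared deviations about the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Pearson correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

print(f"x_bar={x_bar}, y_bar={y_bar}")
print(f"S_xy={s_xy}, S_xx={s_xx}, S_yy={s_yy}")
print(f"r={r:.4f}")
```

Printing each intermediate sum makes it easy to spot which line of a hand calculation went wrong.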

Interpretation checklist: report sign, strength, and whether the association is linear.

Topic 2: Simple Regression and Interpretation

Basic notes: regression builds a prediction rule. In the regression line of y on x, x is the predictor and y is the response.

Conditions / use: use simple regression when there is one explanatory variable and one response variable.

Formula recap: $y=a+bx$, $b=r\frac{\sigma_y}{\sigma_x}$, $a=\bar y-b\bar x$.

Seen-Before Check: “estimate y for a given x,” “regression line,” or “best fit straight line” are the clues.

[Core] Problem 1: Why does the regression line of y on x pass through (x̄, ȳ)?

Answer: because a = ȳ - b x̄, so substituting x̄ gives ȳ.

[Core] Problem 2: If r is positive, what is the sign of the slope of y on x?

Answer: positive, assuming standard deviations are positive.

[Advanced] Problem 3: Given x̄=10, ȳ=20, r=0.8, σx=2, σy=5, find the regression line.

$$b=0.8\times\frac{5}{2}=2,\quad a=20-2(10)=0,\quad y=2x$$

[PYQ-Trap] Problem 4: Using the same data, find the regression line of x on y.

$$b_{xy}=0.8\times\frac{2}{5}=0.32,\quad a_x=10-0.32(20)=3.6,\quad x=3.6+0.32y$$

Interpretation checklist: state the variable being predicted and whether the slope means increase or decrease in the response.
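
Problems 3 and 4 use the same summary statistics but give two different lines. A minimal sketch of both calculations side by side is given here; `regression_lines` is a hypothetical helper name, not part of any library.

```python
def regression_lines(x_bar, y_bar, r, sd_x, sd_y):
    """Return ((a, b) for y on x, (a, b) for x on y) from summary statistics."""
    b_yx = r * sd_y / sd_x          # slope of y on x
    b_xy = r * sd_x / sd_y          # slope of x on y
    a_yx = y_bar - b_yx * x_bar     # intercept so the line passes through the means
    a_xy = x_bar - b_xy * y_bar
    return (a_yx, b_yx), (a_xy, b_xy)

# Summary statistics from Problems 3 and 4.
(a_yx, b_yx), (a_xy, b_xy) = regression_lines(10, 20, 0.8, 2, 5)

print(f"y on x: y = {a_yx:.2f} + {b_yx:.2f}x")
print(f"x on y: x = {a_xy:.2f} + {b_xy:.2f}y")
print(f"product of slopes = {b_yx * b_xy:.2f}, r^2 = {0.8 ** 2:.2f}")
```

This reproduces y = 2x and x = 3.6 + 0.32y. The last line is a quick self-check worth using in exams: the product of the two regression slopes equals r², so the two lines coincide only when r = ±1.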

Topic 3: Explained, Unexplained, and Total Variation

Basic notes: total variation is the spread in the response values. Regression explains part of it; the rest is unexplained error.

Conditions / use: use this to interpret model quality and whether the fitted line is useful.

Formula recap: $SST=SSR+SSE$, $r^2=SSR/SST$, and larger $r^2$ means a better fit.

Seen-Before Check: when a question asks for “variation explained by regression” or “coefficient of determination,” this topic is active.

[Core] Problem 1: What does r² = 0.81 mean?

Answer: 81% of the variation in y is explained by x in the fitted linear model.

[Core] Problem 2: If SST = 500 and SSE = 125, find SSR and r².

$$SSR=500-125=375,\quad r^2=375/500=0.75$$

Interpretation checklist: say how much the model explains and whether the residual variation is still substantial.
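
To see where SST, SSR, and SSE come from rather than taking them as given, the sketch below fits a least-squares line and computes all three from raw data. It reuses the x and y values of Topic 1, Problem 3; `variation_breakdown` is a hypothetical helper name.

```python
def variation_breakdown(xs, ys):
    """Fit y = a + b*x by least squares and return (SST, SSR, SSE)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx                  # least-squares slope
    a = y_bar - b * x_bar            # intercept
    fitted = [a + b * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)               # total variation
    ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained variation
    return sst, ssr, sse

sst, ssr, sse = variation_breakdown([1, 2, 3, 4], [2, 5, 4, 6])
r_squared = ssr / sst

print(f"SST={sst:.3f}  SSR={ssr:.3f}  SSE={sse:.3f}  r^2={r_squared:.3f}")
print("SST == SSR + SSE:", abs(sst - (ssr + sse)) < 1e-9)
```

The identity SST = SSR + SSE holds here because the line is the least-squares fit with an intercept; it is not guaranteed for an arbitrary line drawn through the data.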

Topic 4: Multiple Regression and Partial Correlation

Basic notes: multiple regression uses more than one predictor. Partial correlation measures the relation between two variables after removing the influence of a third.

Conditions / use: use multiple regression when one response depends on several predictors; use partial correlation when the third-variable effect must be controlled.

Formula recap: multiple regression has the form $y=a+b_1x_1+b_2x_2+\dots$, and partial correlation is written as $r_{12.3}$.

Seen-Before Check: “control for,” “adjust for,” “multiple predictors,” and “holding one variable constant” are the indicators.

[Core] Problem 1: Why is partial correlation useful?

Answer: it isolates the association between two variables after removing a confounding variable.

[Advanced] Problem 2: A model uses temperature and pressure to predict output. What regression type is this?

Answer: multiple regression.

Interpretation checklist: identify which predictor is being controlled and what change in the response is being explained.
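
These notes give only the notation for partial correlation. As a hedged sketch, the standard first-order formula, r12.3 = (r12 - r13·r23) / sqrt((1 - r13²)(1 - r23²)), can be applied to three pairwise correlations. The formula is a textbook result added here for illustration, the input values are hypothetical, and `partial_corr` is a helper name introduced for this example.

```python
from math import sqrt

def partial_corr(r12, r13, r23):
    """First-order partial correlation r_{12.3} from the three pairwise correlations."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Hypothetical pairwise correlations, chosen only for illustration.
r12, r13, r23 = 0.8, 0.6, 0.5

r12_3 = partial_corr(r12, r13, r23)
print(round(r12_3, 3))  # 0.722
```

Here the simple correlation 0.8 drops to about 0.72 once variable 3 is held fixed, meaning part of the apparent association between 1 and 2 ran through variable 3. If variable 3 is uncorrelated with both (r13 = r23 = 0), the formula returns r12 unchanged.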

4. Applied Problem Solving

  • [Core] Decide whether the data calls for correlation or regression.
  • [Core] Compute or interpret the regression slope and intercept.
  • [PYQ-Trap] Explain why a high correlation does not automatically imply a useful predictive model.

5. System-Level Understanding

  • Correlation summarizes association; regression turns association into a predictive equation.
  • Residual variation tells you how much uncertainty remains after fitting the line.
  • Partial and multiple correlation extend the same logic to more complex systems with hidden or confounding variables.

6. Quick Reference

$r$ for linear relationship strength.

$y=a+bx$ for simple prediction.

$SST=SSR+SSE$ for variation breakdown.

Seen-Before Check: relationship, prediction, variation, or control variable.

7. Exam Tips

  • Always mention whether the relation is positive or negative before giving the numerical result.
  • For regression questions, define the response and predictor clearly.
  • Use the words “explained variation” and “unexplained variation” correctly.
  • Seen-Before Check: if a question says “hold one variable constant,” think partial correlation, not simple correlation.

8. Common Pitfalls

  • Reversing the roles of x and y in the regression line.
  • Claiming causation from correlation alone.
  • Using a regression model outside the observed data range without caution.
  • Confusing r with r².

9. Tools and Guides

  • Correlation = association score; regression = prediction equation.
  • Sign of r tells direction; magnitude tells strength.
  • r² tells how much of the variation in the response the model explains; the residuals and 1 − r² tell how much is still unexplained.