Linear regression is a statistical tool for predicting the value of one variable from the value of another, related variable. The variable being predicted is called the criterion or dependent variable (typically represented as 'y'); the variable used to predict it is called the predictor or independent variable (typically represented as 'x').

The linear regression tool provides a formula built from two estimates: the value of the dependent variable ('y') when the independent variable ('x') is '0', and how much the value of the dependent variable ('y') changes when the value of the independent variable ('x') changes by one unit. The first estimate is called the intercept or the constant (typically represented as 'a'); the second is called the slope (typically represented as 'b'). The resulting formula is called a regression model and it typically looks as follows:

y = a + bx; or y = a - bx

The regression model thus allows us to estimate how much 'y' changes when 'x' changes (provided that we know both 'a' and 'b', and that those values remain the same, so that the only thing that changes is the value of 'x').

For example, a student may be interested in knowing how much study time is required to achieve certain grades in a particular course. If the lecturer were to calculate and provide a linear regression model with those two variables, then any student could "predict" future grades based on study time. In this case, 'grades' is the dependent variable (y) that the student wants to predict, 'hours of study' is the independent variable (x) that the student can control, and 'a' and 'b' are given by the lecturer based on the lecturer's calculation of a linear regression formula (let's say, a=3, b=0.5). The intercept (a) "predicts" the grade any student will get without studying at all (ie, it predicts luck), while the slope (b) "predicts" how much the grade changes for every additional hour of study.

The above model thus predicts that a student who doesn't study at all typically gets a grade of 3 (let's assume, 3 out of 10), as:

y [grade] = 3 [intercept] + (0.5 [grade for each hour of study] * 0 [hours actually studying]) = 3

Meanwhile, a student who studies 4 hours can expect a pass, as:

y [grade] = 3 [intercept] + (0.5 [grade for each hour of study] * 4 [hours actually studying]) = 5
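The same arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `predict_grade` and its parameter names are assumptions made for this example, and the default values a=3 and b=0.5 are the made-up ones from the text, not estimates from real data.

```python
def predict_grade(hours, intercept=3.0, slope=0.5):
    """Return the grade predicted by the simple model y = a + b*x.

    intercept (a) and slope (b) default to the illustrative values
    used in the text (a=3, b=0.5); they are not fitted to real data.
    """
    return intercept + slope * hours


# Reproduce the two cases worked out above.
for hours in (0, 4):
    print(f"{hours} hours of study: predicted grade {predict_grade(hours):.1f}")
```

Calling `predict_grade(0)` returns the intercept alone, and each additional hour passed to the function raises the prediction by the slope.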

## Multivariate linear regression

Linear regression is equally applicable to predicting the value of the dependent variable when two or more independent variables are known. A multivariate (or multiple) regression tool likewise provides a formula to predict the value of the dependent variable ('y') when the independent variables (x_{1}, x_{2}, …, x_{n}) are all '0' (ie, the intercept or constant, 'a'), and how much the value of the dependent variable ('y') changes when one or more of the independent variables (x_{1}, x_{2}, …, x_{n}) change (ie, the slopes, b_{1}, b_{2}, …, b_{n}). The resulting formula is called a multivariate or multiple regression model and it typically looks as follows:

y = a + b_{1}x_{1} + b_{2}x_{2} + … + b_{n}x_{n}

For example, a student may be interested in knowing how much study time is required to achieve certain grades in a particular course. Course performance is an output that may be affected by several variables. For example, 'hours of study' may be good at 'capturing' rote learning and, thus, be a good predictor of theoretical knowledge, but not of maximum grades. Other variables may also be important, such as 'comprehension' of the course material, and 'language problems'.

If the lecturer were to calculate and provide a multivariate linear regression model with all those variables, then any student could "predict" future grades more accurately. In this case, 'grades' is the dependent variable (y) that the student wants to predict, 'hours of study', 'comprehension' and 'language problems' are the independent variables (x_{1}, x_{2} and x_{3}, respectively) that the student can control, and 'a' and the slopes (b_{1}, b_{2} and b_{3}) are given by the lecturer based on the lecturer's calculation of a multiple linear regression formula (let's say, a=3, b_{1}=0.5, b_{2}=0.2, b_{3}=-0.3). The intercept (a) "predicts" the grade any student will get without studying at all (ie, it predicts luck), while the slopes (b_{1}, b_{2} and b_{3}) "predict" how much the grade rises for every additional hour of study, rises for each greater level of 'comprehension', and falls for each greater level of 'language problems'.

The above model thus predicts that a student who doesn't study at all (and thus cannot show comprehension either) and has no language problems at all (eg, a native and fluent speaker) typically gets a grade of 3 (let's assume, 3 out of 10), as:

y [grade] = 3 [intercept] + (0.5 [grade for each hour of study] * 0 [hours actually studying]) + (0.2 [grade for comprehension] * 0 [no comprehension]) - (0.3 [grade for language problems] * 0 [no language problems]) = 3

Meanwhile, a foreign student with some language problems, but who studies 4 hours and has good comprehension, can expect a grade of 5.5, a bit more than a pass:

y [grade] = 3 [intercept] + (0.5 [grade for each hour of study] * 4 [hours actually studying]) + (0.2 [grade for comprehension] * 4 [good comprehension]) - (0.3 [grade for language problems] * 1 [some language problems]) = 5.5

And a foreign student with some language problems, but who studies 8 hours and has medium levels of comprehension, can expect a grade of 7.1, around a B:

y [grade] = 3 [intercept] + (0.5 [grade for each hour of study] * 8 [hours actually studying]) + (0.2 [grade for comprehension] * 2 [medium comprehension]) - (0.3 [grade for language problems] * 1 [some language problems]) = 7.1
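The three cases above can be checked with a short Python sketch. The names `INTERCEPT`, `SLOPES` and `predict_grade` are assumptions made for this illustration, and the coefficients are the made-up values from the text (a=3, b_{1}=0.5, b_{2}=0.2, b_{3}=-0.3), not estimates from real data.

```python
# Illustrative coefficients taken from the text; not fitted to real data.
INTERCEPT = 3.0             # a
SLOPES = (0.5, 0.2, -0.3)   # b1 (hours), b2 (comprehension), b3 (language problems)


def predict_grade(hours, comprehension, language_problems):
    """Return the grade predicted by y = a + b1*x1 + b2*x2 + b3*x3."""
    predictors = (hours, comprehension, language_problems)
    # Multiply each predictor by its slope and add the intercept.
    return INTERCEPT + sum(b * x for b, x in zip(SLOPES, predictors))


# (hours, comprehension, language problems) for the three students above.
students = [(0, 0, 0), (4, 4, 1), (8, 2, 1)]
for x1, x2, x3 in students:
    print(f"x1={x1}, x2={x2}, x3={x3}: predicted grade {predict_grade(x1, x2, x3):.1f}")
```

Because b_{3} is stored as a negative number, the language-problems term is added like the others and still lowers the prediction, which matches the subtraction written out in the worked equations.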
