Linear Regression
- A supervised learning algorithm used for regression problems
- Models the output variable as a linear combination of the input features, finding the line (or hyperplane, in higher dimensions) that best fits the data
- Formula: y_hat = W^T·X (an intercept/bias term is usually included, e.g. by folding a constant feature x_0 = 1 into X)
- y_hat, dependent/response variable, target
- W^T, weights or coefficients
- X, independent/predictor variable(s), features
- Polynomial Regression: add polynomial features
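Polynomial regression can be sketched as an ordinary linear fit on expanded features; this minimal NumPy example (the helper name `polynomial_features` is my own) expands a single feature into powers up to a chosen degree and solves by least squares:

```python
import numpy as np

def polynomial_features(x, degree):
    """Expand a 1-D feature into columns [1, x, x^2, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

# Noiseless samples from y = 2 + 3x + 0.5x^2; the fit should recover the coefficients
x = np.linspace(-3, 3, 20)
y = 2 + 3 * x + 0.5 * x**2

X = polynomial_features(x, degree=2)          # design matrix with bias column
w, *_ = np.linalg.lstsq(X, y, rcond=None)     # least-squares weights
```

The model stays linear in the weights W; only the feature map is non-linear, which is why the ordinary least-squares machinery still applies.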
- Assumptions
- Linear Relationship - a linear relationship between each predictor variable and the response variable
- No Multicollinearity - none of the predictor variables are highly correlated with each other
- Independence - each observation in the dataset is independent
- Homoscedasticity - residuals have constant variance at every point in the linear model
- Multivariate Normality - residuals of the model are normally distributed
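A quick way to check the homoscedasticity and multivariate-normality assumptions is to inspect the residuals of a fitted model. This sketch (the synthetic data and thresholds are illustrative, not formal statistical tests) computes two residual diagnostics with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])   # bias column + one feature
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=200)

w, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ w
residuals = y - fitted

# Homoscedasticity: |residuals| should not trend with the fitted values
corr = np.corrcoef(fitted, np.abs(residuals))[0, 1]

# Normality: residual skewness should be near 0 for a symmetric error distribution
z = (residuals - residuals.mean()) / residuals.std()
skew = np.mean(z ** 3)
```

In practice these checks are usually done visually (residual-vs-fitted and Q-Q plots) or with dedicated tests such as Breusch-Pagan and Shapiro-Wilk.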
| Pros | Cons |
|---|---|
| Highly interpretable | Sensitive to outliers |
| Fast to train | Underfits non-linear relationships; can overfit small, high-dimensional datasets |
Approaches
Ordinary Least Squares / Normal Equation (closed-form approach): W = (X^T·X)^{-1}·X^T·y
Gradient Descent (iterative approach)
- Feature Scaling - put features on comparable scales so the iterations converge faster
- Squared Error Cost Function: J(W) = (1/2m)·Σ(y_hat^(i) − y^(i))^2
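The two approaches above can be compared on the same data; this minimal sketch (the learning rate and iteration count are arbitrary choices, and the feature is already on a unit scale, so no extra scaling step is shown) fits the weights both ways:

```python
import numpy as np

rng = np.random.default_rng(42)
m = 100
X = np.column_stack([np.ones(m), rng.normal(size=m)])   # bias column + one feature
y = X @ np.array([4.0, -1.5]) + rng.normal(scale=0.1, size=m)

# Normal equation: W = (X^T X)^-1 X^T y, via a linear solve for numerical stability
w_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the squared-error cost J(W) = (1/2m) * sum((XW - y)^2)
w_gd = np.zeros(2)
lr = 0.1
for _ in range(2000):
    grad = X.T @ (X @ w_gd - y) / m   # gradient of J with respect to W
    w_gd -= lr * grad
```

Both routes should land on (nearly) the same weights; the normal equation is exact but costs O(n^3) in the number of features, while gradient descent scales to large feature counts.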
Model Performance Evaluation
- R^2 (coefficient of determination) - proportion of variance in the target explained by the model
- MAE (mean absolute error), RMSE (root mean squared error), MAPE (mean absolute percentage error)
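Each listed metric is a few lines of NumPy using its standard definition (the small example arrays are illustrative):

```python
import numpy as np

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    # Percentage error; undefined when y contains zeros
    return np.mean(np.abs((y - y_hat) / y)) * 100

y = np.array([100.0, 200.0, 300.0])
y_hat = np.array([110.0, 190.0, 300.0])
```

MAPE is scale-free, which makes it easy to compare across datasets, but it breaks down near zero targets; RMSE penalizes large errors more heavily than MAE.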