Correlated Component Regression – A smarter way to deal with Multicollinearity

There are various methods available to deal with multicollinearity, but each has drawbacks. Multicollinearity reduces the precision of the estimated coefficients, which weakens the statistical power of the regression model. The coefficients become very sensitive to small changes in the data, and the p-values can no longer be trusted to identify which predictors are statistically significant. With high-dimensional data, where the number of predictor variables P approaches or exceeds the sample size N, multicollinearity can produce near-perfect predictions within the analysis sample. However, this apparently good predictive performance is usually overfitting, and model performance deteriorates when the model is applied to new cases outside the sample.
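
To see this instability concretely, the short sketch below (not from the original text; the simulated data and variable names are illustrative assumptions) fits ordinary least squares to two nearly collinear predictors on two overlapping subsamples and prints the resulting coefficients:

```python
# Minimal sketch of coefficient instability under multicollinearity.
# The data are simulated; nothing here comes from the article itself.
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Two nearly collinear predictors: x2 is x1 plus a little noise.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
y = 1.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])

# Fit OLS on two slightly different subsamples and compare coefficients.
for seed in (1, 2):
    idx = np.random.default_rng(seed).choice(n, size=40, replace=False)
    beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    print(f"subsample {seed}: intercept={beta[0]:.2f}, b1={beta[1]:.2f}, b2={beta[2]:.2f}")

# The individual slope estimates can differ substantially between subsamples
# even though the true coefficients are both 1.0, while their sum stays close
# to 2.0 -- the hallmark of multicollinearity.
```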

Generally, when multicollinearity occurs, stepwise regression is the most widely used technique for obtaining a sparse and stable solution. Alternative approaches include sparse component (dimension reduction) and sparse penalized regression methods. In addition, non-sparse regression approaches such as ridge regression are also available.

Component / dimension reduction approaches – exclude the higher (less informative) dimensions

  1. Principal Component Regression (PCR)
  2. Supervised PCR (SPCR)
  3. PLS regression
  4. Sparse PLS regression (SPLS)
  5. Naïve Bayes

Sparse Penalized Regression Approaches – impose an explicit penalty on the coefficients

  1. LARS/Lasso (L1 regularization)
  2. Elastic Net
  3. Non-convex penalty
    1. truncated L1 penalty
    2. clipped LASSO
    3. Sparse ridge
    4. SCAD, MCP

Non-Sparse Regression Approaches

  1. Ridge regression (L2 regularization)
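
To see how the penalized and non-sparse approaches listed above behave on correlated predictors, here is a minimal sketch assuming scikit-learn is available; the simulated data and the alpha values are illustrative assumptions, not tuning recommendations:

```python
# Minimal, illustrative comparison of OLS, ridge, lasso and elastic net
# on correlated predictors. Data and penalty strengths are assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
n, p = 60, 8

# Correlated predictors: every column shares a common latent factor.
latent = rng.normal(size=(n, 1))
X = latent + 0.1 * rng.normal(size=(n, p))
y = X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n)

models = {
    "OLS": LinearRegression(),
    "Ridge (L2 penalty)": Ridge(alpha=1.0),
    "Lasso (L1 penalty)": Lasso(alpha=0.05),
    "Elastic Net (L1+L2)": ElasticNet(alpha=0.05, l1_ratio=0.5),
}
for name, model in models.items():
    model.fit(X, y)
    print(f"{name:20s} coefficients: {np.round(model.coef_, 2)}")

# Ridge shrinks all coefficients but keeps every predictor (non-sparse);
# lasso and elastic net drive some coefficients exactly to zero (sparse).
```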

In scenarios with a small number of correlated predictors, or a large number of correlated predictors and a relatively small sample, most of the above methods do not give reliable results. Correlated Component Regression (CCR) is designed for such circumstances and can provide reliable predictions even with near-multicollinear data.

CCR is a smoothing algorithm designed to produce more stable model estimates that predict better than the unstabilized estimates obtained from conventional regression models under high multicollinearity. In addition to working with small base sizes, it applies an appropriate amount of regularization (K components) to reduce the confounding effects of high predictor correlation, and its step-down algorithm can be used to exclude irrelevant and weak predictors, resulting in a sparse model with better prediction (better classification) and coefficient estimates closer to the true values. Unlike PLS regression and penalized regression approaches, CCR is scale-invariant.

Correlated Component Regression (CCR) develops a sequential K-component predictive model, with each component estimated by applying the naïve Bayes rule, and deals with the effects of multicollinearity through three key features:

  1. Stabilising the model with a factor-analytic component structure, which strengthens its ability to detect the underlying pattern while reducing noise
  2. Selecting the best model via cross-validation
  3. Using a step-down procedure, based on the stabilised coefficients, to screen out irrelevant predictors
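
To illustrate the second point, the sketch below uses cross-validation to choose the number of components. CCR itself is not available in scikit-learn, so PLS regression is used purely as a stand-in component model; the data, fold count and candidate range are assumptions:

```python
# Minimal sketch of choosing the number of components by cross-validation.
# PLSRegression stands in for a component model; CCR is not in scikit-learn.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 40, 12
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, p)) + 0.1 * rng.normal(size=(n, p))
y = X[:, 0] - X[:, 3] + rng.normal(scale=0.5, size=n)

# Score each candidate K with 5-fold cross-validated R^2 and keep the best.
scores = {}
for k in range(1, 6):
    cv_r2 = cross_val_score(PLSRegression(n_components=k), X, y, cv=5, scoring="r2")
    scores[k] = cv_r2.mean()
best_k = max(scores, key=scores.get)

print("CV R^2 by number of components:", {k: round(v, 3) for k, v in scores.items()})
print("selected K =", best_k)
```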

How does CCR Work?

  1. CCR models are based on two tuning parameters: K, the number of components, and P, the number of predictors retained.
  2. Each component is an exact linear combination of the included predictors; the weights in CCR are chosen to maximize the component's ability to predict the outcome variable.
  3. The first component C1 captures the effects of prime predictors, which have direct effects on the outcome. It is a weighted average of all 1-predictor effects.
  4. The second component C2, correlated with C1, captures the effects of suppressor variables (proxy predictors) that improve prediction by removing extraneous variation from one or more prime predictors. Additional components are included only if they improve prediction significantly.
  5. Simultaneous variable reduction is achieved with a step-down algorithm: at each step the least important predictor is removed, where importance is defined by the absolute value of the standardized coefficient. Cross-validation is used to determine the number of components and predictors (a minimal sketch of the full procedure follows this list).
  6. Example K=2, P=10
    1. A standard regression (no components) would yield an intercept + 10 coefficients
    2. CCR with K=2 yields an intercept + 2 component weights.
    3. Since the components can be expressed in terms of the predictors, the reduced form of the model yields an intercept + 10 regularized coefficients for the predictors.
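
The following sketch pieces these steps together for the K=2, P=10 example. It is written from the description in this article, not from a reference implementation of CCR, so the data, names and stopping rule are assumptions:

```python
# A minimal, illustrative sketch of the 2-component CCR-Linear idea described
# above, written from the description in this article (not a reference
# implementation). Data, names and the stopping rule are assumptions.
import numpy as np


def ccr_linear_2comp(X, y):
    """Fit a 2-component CCR-style model; return intercept and reduced-form coefficients."""
    n, p = X.shape

    # Component C1: weighted average of the P one-predictor (simple regression) effects.
    b1 = np.array([np.polyfit(X[:, g], y, 1)[0] for g in range(p)])
    c1 = X @ b1 / p

    # Component C2: each predictor's partial effect given C1 (captures suppressor
    # variables that remove extraneous variation from the prime predictors).
    b2 = np.empty(p)
    for g in range(p):
        Z = np.column_stack([np.ones(n), c1, X[:, g]])
        b2[g] = np.linalg.lstsq(Z, y, rcond=None)[0][2]
    c2 = X @ b2 / p

    # Final regression of y on the two components: intercept + 2 component weights.
    C = np.column_stack([np.ones(n), c1, c2])
    a, d1, d2 = np.linalg.lstsq(C, y, rcond=None)[0]

    # Reduced form: intercept + P regularized coefficients on the original predictors.
    beta = (d1 * b1 + d2 * b2) / p
    return a, beta


rng = np.random.default_rng(0)
n, p = 50, 10
latent = rng.normal(size=(n, 1))
X = latent + 0.3 * rng.normal(size=(n, p))            # 10 correlated predictors
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

intercept, beta = ccr_linear_2comp(X, y)
print("intercept:", round(intercept, 2))
print("regularized coefficients:", np.round(beta, 2))

# Step-down: drop the predictor with the smallest absolute standardized
# coefficient and refit; in full CCR, cross-validation decides when to stop.
std_beta = np.abs(beta) * X.std(axis=0) / y.std()
keep = [g for g in range(p) if g != int(np.argmin(std_beta))]
intercept, beta = ccr_linear_2comp(X[:, keep], y)
print("after one step-down, predictors kept:", keep)
```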

Just as there are several variants of regression to handle different assumptions about the distributions and scale types of the dependent variable and predictors, there are several variants of CCR – one for each type of regression:

  • CCR-Linear – Continuous dependent variable
  • CCR-LDA – Dichotomous dependent variable and continuous predictors satisfying the assumptions of linear discriminant analysis (LDA)

  • CCR-Logistic – Dichotomous dependent variable
  • CCR-Ord – Ordinal dependent variable
  • CCR-Nom – Nominal dependent variable
  • CCR-Cox – Survival analysis
  • CCR-Latent – Dependent variable represented by latent classes

In practice, CCR has outperformed various penalty approaches as well as PLS regression algorithms. Many current variable selection algorithms should be avoided because they select only predictor variables that are correlated with the dependent variable, and therefore miss suppressor variables that improve prediction. Correlated Component Regression (CCR) is revolutionising predictive model development and should be explored by emerging Data Scientists.