
This function computes the generalized partial correlation coefficient \(r\) for each specified variable in a generalized linear model. For each variable, it fits the null model that excludes that variable and computes the cosine similarity between the residualized predictor and the standardized residuals.

Usage

gcor(
  full_glm,
  terms = NULL,
  normalize = FALSE,
  intercept_too = FALSE,
  algorithm = "auto",
  algorithm.control = list(n_exact = 15, thresholds = c(-0.1, 0, 0.1), n_random = 10,
    max_iter = 1000, topK = 10, tol = 1e-12, patience = 10)
)

Arguments

full_glm

A fitted GLM object of class `glm`.

terms

Character vector of variable names for which to compute generalized partial correlations. Names refer to the columns of the `model.matrix`, so factor terms must be given by their dummy-coded column names (e.g. `"ZC"` rather than `"Z"`). If `NULL` (default), computes for all non-intercept terms in the model.
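
A small illustration of how factor terms appear among the design-matrix columns (the data frame and formula below are made up for this purpose):

# Term names must match the columns of the design matrix: a factor Z with
# levels A, B, C exposes the dummy-coded columns "ZB" and "ZC", not "Z".
dt <- data.frame(X = rnorm(10), Z = factor(rep(c("A", "B", "C"), length.out = 10)))
colnames(model.matrix(~ Z + X, data = dt))
#> [1] "(Intercept)" "ZB"          "ZC"          "X"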

normalize

Logical indicating whether to also compute the normalized generalized partial correlation `r_n` (see Details). Default is `FALSE`.

intercept_too

Logical indicating whether to include the intercept as a variable. Default is FALSE.

algorithm

Only used if `normalize` is `TRUE`. `"auto"` by default, in which case the function chooses among `"intercept_only"`, `"brute_force"`, and `"multi_start"` (see Details).

algorithm.control

Only used if `normalize` is `TRUE`. `list` of control parameters for the bound search:

  • `n_exact` Integer sample size threshold below which the exact (brute force) method is used. Default is 15.

  • `thresholds` Numeric vector of threshold values for multi-start initialization.

  • `n_random` Integer number of random starts for multi-start optimization.

  • `max_iter` Integer maximum number of iterations per start.

  • `topK` Integer number of top candidates to consider at each iteration.

  • `tol` Numeric tolerance for convergence.

  • `patience` Integer number of iterations without improvement before stopping.
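
As an illustration, a call that forces the multi-start search and overrides a few of its controls (reusing a fitted flipscores model such as the `mod` object from the Examples below; the specific values are arbitrary, and unspecified control entries are assumed to keep their defaults):

# Force the multi-start search and tweak its controls; values are
# illustrative, not recommendations.
gcor(mod, normalize = TRUE,
     algorithm = "multi_start",
     algorithm.control = list(n_random = 50, max_iter = 500, patience = 20))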

Value

If `normalize` is `FALSE`, a data frame with three columns:

terms

The variable name

gcor

The generalized partial correlation coefficient

null_model

The null model used to compute the generalized (partial) correlation

If `normalize` is `TRUE`, a data frame with six columns:

terms

The variable name

r

The generalized partial correlation coefficient

r_n

The normalized generalized partial correlation coefficient

null_model

The null model used to compute the generalized (partial) correlation

algorithm

The algorithm used to compute the upper/lower bounds of the generalized partial correlation coefficient (to compute its normalized version)

is.exact

Logical: whether the bounds (and hence `r_n`) were computed exactly (intercept-only or brute-force case) rather than approximated by the multi-start search

Details

The generalized partial correlation \(r\) measures the association between a predictor and response after adjusting for all other terms in the model. It is defined as the cosine similarity between the residualized predictor \(X_r\) and standardized residuals \(Y_r\):

$$r = \frac{X_r^\top Y_r}{\|X_r\| \|Y_r\|}$$

where:

  • \(X_r = (I - H)W^{1/2}X\) is the residualized predictor

  • \(Y_r = V^{-1/2}(Y - \hat{\mu})\) is the standardized residual vector

  • \(H\) is the hat matrix for the nuisance covariates

  • \(W = DV^{-1}D\) is the weight matrix

  • \(V\) is the variance matrix and \(D\) is the derivative matrix

The function uses `flipscores:::get_par_expo_fam()` to compute \(V\) and \(D\) consistently with the flipscores package methodology.
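
As a sanity check, a minimal sketch of the formula in the Gaussian/identity-link case, where \(V\) and \(D\) are proportional to the identity, so \(W \propto I\) and \(r\) reduces to the ordinary partial correlation obtained by residualizing both the predictor and the response on the nuisance covariates (all names below are illustrative):

set.seed(2)
n <- 50
Z <- rnorm(n); X <- rnorm(n)
Y <- 0.5 * X + Z + rnorm(n)

X_r <- resid(lm(X ~ Z))   # residualized predictor, (I - H) X
Y_r <- resid(lm(Y ~ Z))   # null-model residuals, Y - mu_hat
r   <- sum(X_r * Y_r) / (sqrt(sum(X_r^2)) * sqrt(sum(Y_r^2)))  # cosine similarity
r
# Both residual vectors have mean zero, so the cosine equals the usual
# partial correlation: all.equal(r, cor(X_r, Y_r)) is TRUE.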

The normalized generalized partial correlation is computed as: $$ r_n = \begin{cases} +r / r_+ & \text{if } r > 0 \\ -r / r_- & \text{if } r < 0 \end{cases} $$ where \(r_+\) is the maximum possible correlation and \(r_-\) is the minimum.
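
In code, the normalization step amounts to the following sketch, where `r_plus` and `r_minus` stand for the attainable extrema found by the search:

# Rescale r by the attainable extremum on its own side of zero.
# (The r = 0 case is not covered by the formula; returning 0 is a convention.)
normalize_gcor <- function(r, r_plus, r_minus) {
  if (r > 0) r / r_plus else if (r < 0) -r / r_minus else 0
}
normalize_gcor( 0.45, r_plus = 0.90, r_minus = -0.80)  #  0.5
normalize_gcor(-0.40, r_plus = 0.90, r_minus = -0.80)  # -0.5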

When the full model contains only the intercept and a single predictor X, the generalized (non-partial) correlation is computed and the normalization factor for X is exact. In the more general case with several predictors, a brute-force search finds the exact extrema for sample sizes \(n \leq n_{\text{exact}}\). For larger sample sizes, a greedy multi-start algorithm is employed:

  • Multiple starting points are generated using thresholding and random sampling

  • From each start, coordinates are greedily flipped to improve the correlation

  • Early stopping is used when no improvements are found for several iterations

  • The best solution across all starts is returned

This approach provides a good trade-off between computational efficiency and solution quality for large problems where brute force is infeasible.
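
A schematic sketch of the greedy multi-start idea (not the package's implementation): the search space is taken to be sign vectors and `f` is an arbitrary objective to maximize; the actual bound computation additionally seeds starts by thresholding and keeps the top-K candidates per iteration.

# Greedy multi-start local search over sign vectors s in {-1, +1}^n:
# random restarts, coordinate-wise flips, early stopping, best solution kept.
greedy_multi_start <- function(f, n, n_random = 10, max_iter = 1000, patience = 10) {
  best_s <- NULL
  best_val <- -Inf
  for (start in seq_len(n_random)) {
    s <- sample(c(-1, 1), n, replace = TRUE)   # random starting point
    val <- f(s)
    no_improve <- 0
    for (iter in seq_len(max_iter)) {
      improved <- FALSE
      for (i in seq_len(n)) {                  # try flipping each coordinate
        s[i] <- -s[i]
        new_val <- f(s)
        if (new_val > val) {
          val <- new_val
          improved <- TRUE
        } else {
          s[i] <- -s[i]                        # revert the flip
        }
      }
      no_improve <- if (improved) 0 else no_improve + 1
      if (no_improve >= patience) break        # early stopping
    }
    if (val > best_val) {
      best_val <- val
      best_s <- s
    }
  }
  list(s = best_s, value = best_val)           # best solution across all starts
}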

Author

Livio Finos and Paolo Girardi

Examples

set.seed(1)
dt=data.frame(X=rnorm(20),
   Z=factor(rep(LETTERS[1:3],length.out=20)))
dt$Y=rpois(n=20,lambda=exp(dt$Z=="C"))
mod=flipscores(Y~Z+X,data=dt,family="poisson",n_flips=1000)
summary(mod)
#> 
#> Call:
#> flipscores(formula = Y ~ Z + X, family = "poisson", data = dt, 
#>     n_flips = 1000)
#> 
#> Coefficients:
#>             Estimate    Score Std. Error  z value Part. Cor Pr(>|z|)   
#> (Intercept) -0.14256 -0.91360    2.62144 -0.34851    -0.127    0.736   
#> ZB          -0.18558 -0.50868    1.65785 -0.30683    -0.108    0.667   
#> ZC           1.40981  8.55380    2.58950  3.30326     0.765    0.004 **
#> X           -0.06964 -1.56935    4.70999 -0.33320    -0.117    0.652   
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 28.649  on 19  degrees of freedom
#> Residual deviance: 11.218  on 16  degrees of freedom
#> AIC: 58.102
#> 
#> Number of Fisher Scoring iterations: 5
#> 

# Compute generalized partial correlations for all terms
(results <- gcor(mod))
#>   terms       gcor null_model
#> 1   ~ZB -0.1080979    ~1+ZC+X
#> 2   ~ZC  0.7651942    ~1+ZB+X
#> 3    ~X -0.1174534   ~1+ZB+ZC

# Compute for specific terms only
gcor(mod, terms = c("X", "ZC"))
#>   terms       gcor null_model
#> 1    ~X -0.1174534   ~1+ZB+ZC
#> 2   ~ZC  0.7651942    ~1+ZB+X

gcor(mod, terms = c("X", "ZC"),normalize=TRUE)
#> Error in compute_gcor_normalized_binom(model0 = model_i, X = mm[, i_id,     drop = FALSE], algorithm = algorithm, algorithm.control = algorithm.control): object 'temp' not found

set.seed(123)
dt=data.frame(X=rnorm(20),
   Z=factor(rep(LETTERS[1:3],length.out=20)))
dt$Y=rbinom(n=20,prob=plogis((dt$Z=="C")*2),size=1)
mod=flipscores(Y~Z+X,data=dt,family="binomial",n_flips=1000)
summary(mod)
#> 
#> Call:
#> flipscores(formula = Y ~ Z + X, family = "binomial", data = dt, 
#>     n_flips = 1000)
#> 
#> Coefficients:
#>             Estimate    Score Std. Error  z value Part. Cor Pr(>|z|)  
#> (Intercept)  -0.1486  -0.2102     1.1881  -0.1770    -0.067    0.894  
#> ZB          -20.4539  -1.4784     0.7466  -1.9802    -0.530    0.071 .
#> ZC           20.8561   1.8043     0.8180   2.2057     0.615    0.031 *
#> X            -0.4276  -0.3782     0.9574  -0.3951    -0.149    0.773  
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 27.5256  on 19  degrees of freedom
#> Residual deviance:  9.4015  on 16  degrees of freedom
#> AIC: 17.401
#> 
#> Number of Fisher Scoring iterations: 19
#> 

(results <- gcor(mod,normalize=TRUE))
#>    terms          r        r_n null_model   algorithm is.exact
#> ZB   ~ZB -0.5295718 -0.5304792    ~1+ZC+X multi_start    FALSE
#> ZC   ~ZC  0.6146304  0.6148542    ~1+ZB+X multi_start    FALSE
#> X     ~X -0.1493213 -0.1891066   ~1+ZB+ZC multi_start    FALSE
# Compute for specific terms only
gcor(mod, terms = c("X", "ZC"),normalize=TRUE)
#>    terms          r        r_n null_model   algorithm is.exact
#> X     ~X -0.1493213 -0.1891066   ~1+ZB+ZC multi_start    FALSE
#> ZC   ~ZC  0.6146304  0.6148542    ~1+ZB+X multi_start    FALSE