
Compute Generalized Partial Correlations for GLM terms
gcor.RdThis function computes the generalized partial correlation coefficient \(r\) for each specified variable in a generalized linear model. For each variable, it refits the null model excluding that variable and computes the cosine similarity between the residualized predictor and standardized residuals.
Arguments
- full_glm
A fitted GLM object of class `glm`.
- terms
Character vector of variable names (referred to the model.matrix, that is pay attention for factors) for which to compute generalized partial correlations. If `NULL` (default), computes for all non-intercept terms in the model.
- normalize
FALSE by default.
- intercept_too
Logical indicating whether to include the intercept as a variable. Default is FALSE.
- algorithm.control
Only used if
normalizeisTRUE. `list` of control parameters: `n_exact` Integer specifying the sample size threshold for using exact methods (brute force). Default is 15. `thresholds` Numeric vector of threshold values for multi-start initialization. `n_random` Integer number of random starts for multi-start optimization. `max_iter` Integer maximum number of iterations per start. `topK` Integer number of top candidates to consider at each iteration. `tol` Numeric tolerance for convergence. `patience` Integer number of iterations without improvement before stopping.- algorith
Only used if
normalizeisTRUE. `"auto"` by default. It choose between `"intercept_only"`, `"brute_force"` and `"multi_start"`
Value
if normalize is FALSE, a data frame with five columns:
- variable
The variable name
- r
The generalized partial correlation coefficient
while if normalize is TRUE
- terms
The variable name
- r
The generalized partial correlation coefficient
- r_n
The normalized generalized partial correlation coefficient
- null_model
The null model used to compute the generalized (partial) correlation
- algorithm
The algorithm used to compute the upper/lower bounds of the generalized partial correlation coefficient (to compute its normalized version)
- exact
logical
Details
The generalized partial correlation \(\r\) measures the association between a predictor and response after adjusting for all other terms in the model. It is defined as the cosine similarity between the residualized predictor \(X_r\) and standardized residuals \(Y_r\):
$$\r = \frac{X_r^\top Y_r}{\|X_r\| \|Y_r\|}$$
where:
\(X_r = (I - H)W^{1/2}X\) is the residualized predictor
\(Y_r = V^{-1/2}(Y - \hat{\mu})\) is the standardized residual vector
\(H\) is the hat matrix for the nuisance covariates
\(W = DV^{-1}D\) is the weight matrix
\(V\) is the variance matrix and \(D\) is the derivative matrix
The function uses `flipscores:::get_par_expo_fam()` to compute \(V\) and \(D\) consistently with the flipscores package methodology.
The normalized generalized partial correlation is computed as: $$ r_n = \begin{cases} +r / r_+ & \text{if } r > 0 \\ -r / r_- & \text{if } r < 0 \end{cases} $$ where \(r_+\) is the maximum possible correlation and \(r_-\) is the minimum.
When the (full) model has only intercept and only one predictor X, the generalized (non partial) correlation is computed and the normalization factor for X is exact. In the more general case with more predictors, for sample sizes \(n \leq n_{\text{exact}}\), brute force search is used to find the exact extrema. For larger sample sizes, a greedy multi-start algorithm is employed:
Multiple starting points are generated using thresholding and random sampling
From each start, coordinates are greedily flipped to improve the correlation
Early stopping is used when no improvements are found for several iterations
The best solution across all starts is returned
This approach provides a good trade-off between computational efficiency and solution quality for large problems where brute force is infeasible.
Examples
set.seed(1)
dt=data.frame(X=rnorm(20),
Z=factor(rep(LETTERS[1:3],length.out=20)))
dt$Y=rpois(n=20,lambda=exp(dt$Z=="C"))
mod=flipscores(Y~Z+X,data=dt,family="poisson",n_flips=1000)
summary(mod)
#>
#> Call:
#> flipscores(formula = Y ~ Z + X, family = "poisson", data = dt,
#> n_flips = 1000)
#>
#> Coefficients:
#> Estimate Score Std. Error z value Part. Cor Pr(>|z|)
#> (Intercept) -0.14256 -0.91360 2.62144 -0.34851 -0.127 0.736
#> ZB -0.18558 -0.50868 1.65785 -0.30683 -0.108 0.667
#> ZC 1.40981 8.55380 2.58950 3.30326 0.765 0.004 **
#> X -0.06964 -1.56935 4.70999 -0.33320 -0.117 0.652
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for poisson family taken to be 1)
#>
#> Null deviance: 28.649 on 19 degrees of freedom
#> Residual deviance: 11.218 on 16 degrees of freedom
#> AIC: 58.102
#>
#> Number of Fisher Scoring iterations: 5
#>
# Compute generalized partial correlations for all terms
(results <- gcor(mod))
#> terms gcor null_model
#> 1 ~ZB -0.1080979 ~1+ZC+X
#> 2 ~ZC 0.7651942 ~1+ZB+X
#> 3 ~X -0.1174534 ~1+ZB+ZC
# Compute for specific terms only
gcor(mod, terms = c("X", "ZC"))
#> terms gcor null_model
#> 1 ~X -0.1174534 ~1+ZB+ZC
#> 2 ~ZC 0.7651942 ~1+ZB+X
gcor(mod, terms = c("X", "ZC"),normalize=TRUE)
#> Error in compute_gcor_normalized_binom(model0 = model_i, X = mm[, i_id, drop = FALSE], algorithm = algorithm, algorithm.control = algorithm.control): object 'temp' not found
set.seed(123)
dt=data.frame(X=rnorm(20),
Z=factor(rep(LETTERS[1:3],length.out=20)))
dt$Y=rbinom(n=20,prob=plogis((dt$Z=="C")*2),size=1)
mod=flipscores(Y~Z+X,data=dt,family="binomial",n_flips=1000)
summary(mod)
#>
#> Call:
#> flipscores(formula = Y ~ Z + X, family = "binomial", data = dt,
#> n_flips = 1000)
#>
#> Coefficients:
#> Estimate Score Std. Error z value Part. Cor Pr(>|z|)
#> (Intercept) -0.1486 -0.2102 1.1881 -0.1770 -0.067 0.894
#> ZB -20.4539 -1.4784 0.7466 -1.9802 -0.530 0.071 .
#> ZC 20.8561 1.8043 0.8180 2.2057 0.615 0.031 *
#> X -0.4276 -0.3782 0.9574 -0.3951 -0.149 0.773
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 27.5256 on 19 degrees of freedom
#> Residual deviance: 9.4015 on 16 degrees of freedom
#> AIC: 17.401
#>
#> Number of Fisher Scoring iterations: 19
#>
(results <- gcor(mod,normalize=TRUE))
#> terms r r_n null_model algorithm is.exact
#> ZB ~ZB -0.5295718 -0.5304792 ~1+ZC+X multi_start FALSE
#> ZC ~ZC 0.6146304 0.6148542 ~1+ZB+X multi_start FALSE
#> X ~X -0.1493213 -0.1891066 ~1+ZB+ZC multi_start FALSE
# Compute for specific terms only
gcor(mod, terms = c("X", "ZC"),normalize=TRUE)
#> terms r r_n null_model algorithm is.exact
#> X ~X -0.1493213 -0.1891066 ~1+ZB+ZC multi_start FALSE
#> ZC ~ZC 0.6146304 0.6148542 ~1+ZB+X multi_start FALSE