Robust testing in GLMs, by sign-flipping score contributions

Provides robust tests for GLMs, based on sign-flipping score contributions. The tests are often robust against overdispersion, heteroscedasticity and, in some cases, ignored nuisance variables.

flipscores(formula, family, data, score_type = "standardized",
n_flips = 5000, alternative = "two.sided", id = NULL,
seed = NULL, to_be_tested = NULL, flips = NULL,
precompute_flips = TRUE, output_flips = FALSE, ...)

Arguments

formula: see glm function. It can also be a model (usually generated by a call to glm); in this case, any other glm-related parameters (e.g. family, data, etc.) are discarded and the function uses the ones that generated the model. It is NULL by default (i.e. not used).
family: see glm function.
data: see glm function.
score_type: The type of score that is computed. It can be "standardized", "orthogonalized", "effective" or "basic". Both "orthogonalized" and "effective" take into account the nuisance estimation and they provide the same test statistic. In small samples the "effective" score might show slightly anti-conservative behaviour; the "standardized" score (the standardized effective score) addresses this issue. "orthogonalized" has a similar intent; note, however, that it may be slow when the model matrix is large.
n_flips: The number of random flips of the score contributions. Overwritten with nrow(flips) when flips is not NULL (see parameter flips for more details). When n_flips is equal to or larger than the maximum number of possible flips (i.e. 2^n), all possible flips are performed.
alternative: It can be "greater", "less" or "two.sided" (default).
id: a vector identifying the clustered observations. If NULL (default), observations are assumed to be independent. If id is not NULL, currently only score_type = "effective" is allowed.
seed: NULL by default.
to_be_tested: vector of indices or names of the coefficients of the glm model to be tested (this is faster than computing all scores and p-values).
flips: matrix of +1 or -1; the matrix has n_flips rows and n (number of observations) columns.
precompute_flips: TRUE by default. Overwritten if flips is not NULL. If FALSE the matrix of flips is not computed and the flips are made 'on-the-fly' before computing the test statistics; it may be useful when the flips matrix would be very large (see parameter flips for more details).
output_flips: FALSE by default. If TRUE the flips matrix is returned. Useful when the same flips are needed for several glms, for example in the case of multivariate glms where the joint distribution of the test statistics is used for multivariate inference.
...: see glm function.
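
As an illustration of the flips and n_flips arguments described above, the following base-R sketch builds sign-flip matrices by hand. It does not call flipscores and is not the package's internal code; the names all_flips, n_obs and the choice of putting the all-ones (unflipped) row first are assumptions made for this example.

```r
# With n observations there are 2^n distinct sign-flip vectors.
n <- 4
max_flips <- 2^n

# Enumerate every possible flip for a tiny n: one row per flip, one column
# per observation. The first row is all +1, i.e. no flip at all.
all_flips <- unname(as.matrix(expand.grid(rep(list(c(1, -1)), n))))

# For a realistic n, full enumeration is infeasible, so draw random flips.
set.seed(1)
n_obs <- 20
n_flips <- 1000
flips <- matrix(sample(c(-1, 1), n_flips * n_obs, replace = TRUE),
                nrow = n_flips, ncol = n_obs)
flips[1, ] <- 1  # assumption: keep the identity (observed) flip in row 1

dim(all_flips)
dim(flips)
```

A matrix shaped like flips (n_flips rows, n columns, entries +1 or -1) is the kind of object the flips argument expects; whether the first row must be the identity flip is not stated here, so treat that detail as an assumption.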

Value

an object of class flipscores. See also its methods (summary.flipscores, anova.flipscores, print.flipscores).

Details

flipscores borrows its parameters from the function glm (and glm.nb). See their help pages for more details about parameters such as formula, data, family. Note: to use the Negative Binomial family, the family must be given as a quoted string (i.e. family="negbinom"). Furthermore, a flipscores object contains two extra elements: scores, a matrix of n score contributions with one column for each tested coefficient, and Tspace, a matrix of size n_flips times ncol(scores). Tspace contains column-wise the test statistics generated by randomly flipping the score contributions; each column refers to the same column of scores. The vector of observed test statistics (i.e. no flips) is in the first row of Tspace.
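
To make the roles of scores and Tspace concrete, here is a minimal base-R sketch of a sign-flipping test with the simplest ("basic") kind of score, for a single coefficient in a Poisson model. It is only an illustration under stated assumptions, not the package's implementation: the simulated data, the variable names (x, z, null_fit, p_two_sided) and the unstandardized statistic are all choices made for this example.

```r
set.seed(1)
n <- 20
x <- rnorm(n)                           # covariate whose coefficient is tested
z <- rnorm(n)                           # nuisance covariate
y <- rpois(n, lambda = exp(0.3 * z))    # data generated under the null for x

# Fit the null model (without x) and form one score contribution per
# observation for the coefficient of x (canonical log link).
null_fit <- glm(y ~ z, family = poisson)
scores <- x * (y - fitted(null_fit))

# Random sign flips: n_flips rows, n columns; row 1 is left unflipped so
# that the first statistic is the observed one.
n_flips <- 1000
flips <- matrix(sample(c(-1, 1), n_flips * n, replace = TRUE),
                nrow = n_flips, ncol = n)
flips[1, ] <- 1

# Each row of Tspace is the sum of the sign-flipped score contributions.
Tspace <- flips %*% scores

# Two-sided p-value: share of flipped statistics at least as extreme as
# the observed one (the observed statistic itself is included).
p_two_sided <- mean(abs(Tspace) >= abs(Tspace[1]))
p_two_sided
```

With several tested coefficients, scores would have one column per coefficient and Tspace one column per column of scores, matching the layout described above.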

References

"Robust testing in generalized linear models by sign-flipping score contributions" by J.Hemerik, J.Goeman and L.Finos.

Author

Livio Finos, Riccardo De Santis, Jesse Hemerik and Jelle Goeman

Examples

set.seed(1)
dt=data.frame(X=rnorm(20),
   Z=factor(rep(LETTERS[1:3],length.out=20)))
dt$Y=rpois(n=20,lambda=exp(dt$Z=="C"))
mod=flipscores(Y~Z+X,data=dt,family="poisson",n_flips=1000)
summary(mod)
#> 
#> Call:
#> flipscores(formula = Y ~ Z + X, family = "poisson", data = dt, 
#>     n_flips = 1000)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -1.4796  -0.4363   0.1579   0.3358   0.9899  
#> 
#> Coefficients:
#>             Estimate    Score Std. Error  z value Part. Cor Pr(>|z|)   
#> (Intercept) -0.14256 -0.91360    2.62144 -0.34851    -0.127    0.730   
#> ZB          -0.18558 -0.50868    1.65785 -0.30683    -0.108    0.652   
#> ZC           1.40981  8.55380    2.58950  3.30326     0.765    0.004 **
#> X           -0.06964 -1.56935    4.70999 -0.33320    -0.117    0.677   
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 28.649  on 19  degrees of freedom
#> Residual deviance: 11.218  on 16  degrees of freedom
#> AIC: 58.102
#> 
#> Number of Fisher Scoring iterations: 5
#> 

# Equivalent to:
model=glm(Y~Z+X,data=dt,family="poisson")
mod2=flipscores(model)
#> Error in model.frame.default(formula = Y ~ Z + X, data = dt, drop.unused.levels = TRUE): 'data' must be a data.frame, environment, or list
summary(mod2)
#> Error in summary(mod2): object 'mod2' not found