
Robust testing in GLMs, by sign-flipping score contributions
flipscores.Rd
Provides robust tests for GLMs, by sign-flipping score contributions. The tests are often robust against overdispersion, heteroscedasticity and, in some cases, ignored nuisance variables.
Usage
flipscores(formula, family, data, score_type = "standardized",
n_flips = 5000, alternative = "two.sided", id = NULL,
seed = NULL, to_be_tested = NULL, flips = NULL,
precompute_flips=TRUE, ...)
Arguments
- formula
See the glm function. It can also be a model (usually generated by a call to glm); in this case, any other glm-related parameters (e.g. family, data, etc.) are discarded and the function uses the ones that generated the model. It is NULL by default (i.e. not used).
- family
See the glm function.
- data
See the glm function.
- score_type
The type of score that is computed. It can be "standardized", "orthogonalized", "effective" or "basic". Both "orthogonalized" and "effective" take the nuisance estimation into account, and they provide the same test statistic. In small samples the "effective" score might show slightly anti-conservative behaviour; the "standardized" effective score addresses this issue. "orthogonalized" has a similar intent; note, however, that it may be slow when the model matrix is big.
- n_flips
The number of random flips of the score contributions. Overwritten with nrow(flips) when flips is not NULL (see parameter flips for more details). When n_flips is equal to or larger than the maximum number of possible flips (i.e. 2^n), all possible flips are performed.
- alternative
It can be "greater", "less" or "two.sided" (default).
- id
A vector identifying the clustered observations. If NULL (default), observations are assumed to be independent. If id is not NULL, only score_type == "effective" is currently allowed.
- seed
NULL by default.
- to_be_tested
Vector of indices or names of the coefficients of the glm model to be tested (this is faster than computing every score and p-value).
- flips
Matrix of +1 or -1 values; it has n_flips rows and n (number of observations) columns.
- precompute_flips
TRUE by default. Overwritten if flips is not NULL. If FALSE, the matrix of flips is not computed and the flips are made 'on-the-fly' before computing the test statistics; this may be useful when flips is very large (see parameter flips for more details).
- ...
See the glm function.
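The shape expected for flips can be illustrated with a small base-R sketch. The helper make_flips_sketch below is hypothetical and not part of the package; it assumes that the first row is kept at +1 (the unflipped case) and that all 2^n sign vectors are enumerated when n_flips is at least that large. The package's own construction may differ.

```r
# Hypothetical helper (NOT part of flipscores): build a sign-flip matrix
# with n_flips rows and n columns whose entries are +1 or -1.
make_flips_sketch <- function(n, n_flips, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  if (n_flips >= 2^n) {
    # Few observations: enumerate all 2^n sign vectors instead of sampling.
    flips <- as.matrix(expand.grid(rep(list(c(1, -1)), n)))
    dimnames(flips) <- NULL
    return(flips)
  }
  # Otherwise draw random signs. Keeping the first row at +1 (no flips)
  # is an assumption made for this sketch.
  flips <- matrix(sample(c(1, -1), n_flips * n, replace = TRUE),
                  nrow = n_flips, ncol = n)
  flips[1, ] <- 1
  flips
}

small <- make_flips_sketch(n = 3, n_flips = 5000)            # all 8 sign vectors
large <- make_flips_sketch(n = 20, n_flips = 1000, seed = 1) # random flips
dim(small)
dim(large)
```

A matrix built this way has the layout described for the flips argument: one row per flip and one column per observation.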
Value
An object of class flipscores.
See also its methods (summary.flipscores, anova.flipscores, print.flipscores).
Details
flipscores borrows its parameters from the function glm (and glm.nb). See their help pages for more details about parameters such as formula, data and family. Note: to use the Negative Binomial family, the family argument must be quoted (i.e. family="negbinom").
Furthermore, a flipscores object contains two extra elements: scores, a matrix of n score contributions with one column for each tested coefficient, and Tspace, a matrix of size n_flips times ncol(scores). Each column of Tspace contains the test statistics generated by randomly flipping the score contributions and refers to the same column of scores; the vector of observed test statistics (i.e. no flips) is in the first row of Tspace.
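The relation between scores, Tspace and a p-value can be sketched in base R. This is a simplified illustration in the spirit of the "basic" score for a Poisson model with a single tested coefficient. It is not the package's implementation: the variable names mirror the elements described above, while the scaling by sqrt(n) and the p-value formula are assumptions of this sketch.

```r
set.seed(1)
n <- 20
x <- rnorm(n)
y <- rpois(n, lambda = 1)   # simulated under the null: x has no effect

# Null model (intercept only). With the canonical log link, the score
# contribution of observation i for the coefficient of x is x_i * (y_i - mu_i).
null_fit <- glm(y ~ 1, family = poisson)
scores <- matrix(x * (y - fitted(null_fit)), ncol = 1,
                 dimnames = list(NULL, "x"))

# Sign-flip matrix; the first row is all +1, so it gives the observed statistic.
n_flips <- 1000
flips <- matrix(sample(c(1, -1), n_flips * n, replace = TRUE), n_flips, n)
flips[1, ] <- 1

# Tspace-like matrix: n_flips rows, one column per tested coefficient.
Tspace <- flips %*% scores / sqrt(n)

# Two-sided p-value: share of flipped statistics at least as extreme
# as the observed one in the first row.
p_value <- mean(abs(Tspace[, 1]) >= abs(Tspace[1, 1]))
dim(Tspace)
p_value
```

Because the observed statistic is itself one of the rows, the p-value computed this way can never be smaller than 1/n_flips.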
References
"Robust testing in generalized linear models by sign-flipping score contributions" by J. Hemerik, J. Goeman and L. Finos.
Examples
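set.seed(1)
# Simulate 20 observations: a continuous covariate X and a three-level factor Z
dt=data.frame(X=rnorm(20),
   Z=factor(rep(LETTERS[1:3],length.out=20)))
# Poisson response whose mean is exp(1) when Z is "C" and 1 otherwise
dt$Y=rpois(n=20,lambda=exp(dt$Z=="C"))
# Fit the model and test every coefficient with 1000 sign flips
mod=flipscores(Y~Z+X,data=dt,family="poisson",n_flips=1000)
summary(mod)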
#>
#> Call:
#> flipscores(formula = Y ~ Z + X, family = "poisson", data = dt,
#> n_flips = 1000)
#>
#> Coefficients:
#>             Estimate    Score Std. Error  z value Part. Cor Pr(>|z|)
#> (Intercept) -0.14256 -0.91360    2.62144 -0.34851    -0.127    0.726
#> ZB          -0.18558 -0.50868    1.65785 -0.30683    -0.108    0.644
#> ZC           1.40981  8.55380    2.58950  3.30326     0.765    0.006 **
#> X           -0.06964 -1.56935    4.70999 -0.33320    -0.117    0.682
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for poisson family taken to be 1)
#>
#> Null deviance: 28.649 on 19 degrees of freedom
#> Residual deviance: 11.218 on 16 degrees of freedom
#> AIC: 58.102
#>
#> Number of Fisher Scoring iterations: 5
#>
# Equivalent to:
model=glm(Y~Z+X,data=dt,family="poisson")
mod2=flipscores(model)
#> Error in model.frame.default(formula = Y ~ Z + X, data = dt, drop.unused.levels = TRUE): 'data' must be a data.frame, environment, or list
summary(mod2)
#> Error: object 'mod2' not found