flipscores.Rd
Provides robust tests for GLMs based on sign-flipping score contributions. The tests are often robust against overdispersion, heteroscedasticity and, in some cases, ignored nuisance variables.
flipscores(formula, family, data, score_type = "standardized",
n_flips = 5000, alternative = "two.sided", id = NULL,
seed = NULL, to_be_tested = NULL, flips = NULL,
precompute_flips = TRUE, output_flips = FALSE, ...)
see the glm function. It can also be a model (usually generated by a call to glm); in this case, any other glm-related parameters (i.e. formula, family, data, etc.) are discarded and the function uses the ones that generated the model.
see glm
function.
see glm
function.
The type of score that is computed. It can be "standardized", "orthogonalized", "effective" or "basic". Both "orthogonalized" and "effective" take into account the nuisance estimation and provide the same test statistic. In small samples the "effective" score might be slightly anti-conservative; the "standardized" score addresses this issue. "orthogonalized" has a similar intent; note, however, that it may be slow when the model matrix is large.
The number of random flips of the score contributions. Overwritten with nrow(flips) when flips is not NULL (see parameter flips for more details).
When n_flips is equal to or larger than the maximum number of possible flips (i.e. 2^n), all possible flips are performed.
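Each of the n score contributions can be multiplied by either +1 or -1, so the number of distinct sign vectors is 2^n. The following base-R sketch enumerates them for a small sample; the names n and all_flips are illustrative and are not part of the package.

```r
# Enumerate every possible sign vector for a small sample size.
n <- 4
all_flips <- as.matrix(expand.grid(rep(list(c(1, -1)), n)))

# One row per sign vector, one column per observation.
dim(all_flips)

# The count grows exponentially, which is why random flips are used
# once n is moderately large.
n_possible <- 2^n
```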
It can be "greater", "less" or "two.sided" (default)
a vector identifying the clustered observations. If NULL (default), observations are assumed to be independent. If id is not NULL, only score_type=="effective" is currently allowed.
NULL
by default.
vector of indices or names of the coefficients of the glm model to be tested (this is faster than computing scores and p-values for every coefficient).
matrix of +1 or -1 values; the matrix has n_flips rows and n (number of observations) columns
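A matrix of this shape could be built with base R as sketched below. The sizes are hypothetical, and the commented call only follows the usage signature shown on this page; it is not run here. Whether the first row needs to be the identity flip (all +1) is not stated on this page, so that should be checked before relying on a custom matrix.

```r
set.seed(1)
n <- 20          # number of observations (hypothetical)
n_flips <- 1000  # number of flips (hypothetical)

# Each entry is drawn independently as -1 or +1.
flips <- matrix(sample(c(-1, 1), n_flips * n, replace = TRUE),
                nrow = n_flips, ncol = n)
dim(flips)

# Assumed usage, based on the flips argument in the usage section (not run):
# mod <- flipscores(Y ~ Z + X, data = dt, family = "poisson", flips = flips)
```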
TRUE by default. Overwritten if flips is not NULL. If FALSE, the matrix of flips is not computed and the flips are made 'on-the-fly' before computing the test statistics; this may be useful when the flips matrix would be very large (see parameter flips for more details).
FALSE by default. If TRUE, the flips matrix is returned. Useful when the same flips are needed for several glms, for example in the case of multivariate glms where the joint distribution of the test statistics is used for multivariate inference.
see glm
function.
an object of class flipscores
.
See also its methods (summary.flipscores
, anova.flipscores
, print.flipscores
).
flipscores borrows its parameters from the function glm (and glm.nb). See their help pages for more details about parameters such as formula, data and family. Note: to use the Negative Binomial family, the family argument must be given as a quoted string (i.e. family="negbinom").
Furthermore, a flipscores object contains two extra elements: scores, i.e. a matrix of n score contributions with one column for each tested coefficient, and Tspace, i.e. a matrix of size n_flips times ncol(scores). Tspace contains, column-wise, the test statistics generated by randomly flipping the score contributions; each column refers to the same column of scores. The vector of observed test statistics (i.e. no flips) is in the first row of Tspace.
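To illustrate how a matrix like Tspace relates to a p-value, here is a minimal base-R sketch of a sign-flip test on simulated score contributions. It uses a plain sum of flipped contributions as the test statistic, which is an assumption made for illustration; it does not reproduce the package's standardized or effective scores, and all object names are hypothetical.

```r
set.seed(1)
n <- 30
n_flips <- 2000

# Hypothetical score contributions for one coefficient. The mean is nonzero,
# so the null hypothesis is false in this toy example.
scores <- matrix(rnorm(n, mean = 0.8), ncol = 1)

# Flip matrix whose first row is the identity flip (all +1).
flips <- matrix(sample(c(-1, 1), n_flips * n, replace = TRUE), nrow = n_flips)
flips[1, ] <- 1

# One test statistic per flip: the sum of the sign-flipped contributions.
# The first row therefore holds the observed statistic.
Tspace <- flips %*% scores
t_obs <- Tspace[1, 1]

# Two-sided p-value: proportion of statistics at least as extreme as the
# observed one (the observed statistic itself is included in the count).
p_two_sided <- mean(abs(Tspace[, 1]) >= abs(t_obs))
p_two_sided
```

Because the observed statistic is part of the reference set, the p-value can never fall below 1/n_flips.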
"Robust testing in generalized linear models by sign-flipping score contributions" by J.Hemerik, J.Goeman and L.Finos.
set.seed(1)
dt=data.frame(X=rnorm(20),
Z=factor(rep(LETTERS[1:3],length.out=20)))
dt$Y=rpois(n=20,lambda=exp(dt$Z=="C"))
mod=flipscores(Y~Z+X,data=dt,family="poisson",n_flips=1000)
summary(mod)
#>
#> Call:
#> flipscores(formula = Y ~ Z + X, family = "poisson", data = dt,
#> n_flips = 1000)
#>
#> Deviance Residuals:
#>     Min       1Q   Median       3Q      Max
#> -1.4796  -0.4363   0.1579   0.3358   0.9899
#>
#> Coefficients:
#>             Estimate    Score Std. Error  z value Part. Cor Pr(>|z|)
#> (Intercept) -0.14256 -0.91360    2.62144 -0.34851    -0.127    0.730
#> ZB          -0.18558 -0.50868    1.65785 -0.30683    -0.108    0.652
#> ZC           1.40981  8.55380    2.58950  3.30326     0.765    0.004 **
#> X           -0.06964 -1.56935    4.70999 -0.33320    -0.117    0.677
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for poisson family taken to be 1)
#>
#> Null deviance: 28.649 on 19 degrees of freedom
#> Residual deviance: 11.218 on 16 degrees of freedom
#> AIC: 58.102
#>
#> Number of Fisher Scoring iterations: 5
#>
# Equivalent to:
model=glm(Y~Z+X,data=dt,family="poisson")
mod2=flipscores(model)
#> Error in model.frame.default(formula = Y ~ Z + X, data = dt, drop.unused.levels = TRUE): 'data' must be a data.frame, environment, or list
summary(mod2)
#> Error in summary(mod2): object 'mod2' not found