Specify the data-generating mechanisms for the simulation using purrr-style lambda functions.
# S3 method for formula
specify(x = NULL, ..., .use_names = TRUE, .sep = "_")
x: Leave this argument blank (NULL); it is a placeholder and can be skipped.
...: Named purrr-style formula functions used for generating simulation variables. x is not recommended as a name, since it is a formal argument and will be automatically assumed to be the first variable (a message will be displayed if x is used).
.use_names: Whether to use names generated by the lambda function (TRUE, the default), or to overwrite them with supplied names.
.sep: The separator used when auto-generating names. See Column naming.
A simpr_specify object which contains the functions needed to generate the simulation; to be passed to define for defining metaparameters or, if there are no metaparameters, directly to generate for generating the simulation.
One can also refer to earlier variables in subsequent arguments. For example, a variable b that depends on a can be defined simply as specify(a = ~ 3 + runif(10), b = ~ 2 * a).
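A minimal sketch of that dependency, assuming the simpr package is installed. The object names spec, sims and dat are illustrative only, and plain function calls are used instead of the pipe so that no other package needs to be attached.

```r
library(simpr)

# `b` is computed from `a`, which is defined earlier in the same call
spec <- specify(a = ~ 3 + runif(10),
                b = ~ 2 * a)

# Generate a single repetition and extract its simulated data
sims <- generate(spec, 1)
dat <- sims$sim[[1]]

# Every value of `b` should be twice the matching value of `a`
all.equal(dat$b, 2 * dat$a)
```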
Finally, one can also refer to metaparameters that are to be systematically varied in the simulation study. See define and the examples for more details.
This is always the first command in the simulation process: it specifies the actual simulated variables. The result is then passed to define to define metaparameters and then to generate to generate the data.
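The three steps can be sketched as follows, assuming simpr is installed. The variable y, the metaparameter mu and its values are made up for illustration.

```r
library(simpr)

# Step 1: specify the simulated variable; `mu` is supplied later by define()
spec <- specify(y = ~ rnorm(5, mean = mu))

# Step 2: define the metaparameter `mu` with two values to vary
spec_meta <- define(spec, mu = c(0, 10))

# Step 3: generate 2 repetitions for each value of `mu`
sims <- generate(spec_meta, 2)

# One row per combination of metaparameter value and repetition
nrow(sims)
```

Each element of sims$sim holds the simulated data for one row of the result.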
The ... arguments use an efficient syntax, based on the purrr package, to specify the custom functions needed for generating a simulation. When producing one variable, one can provide an expression such as specify(a = ~ 3 + runif(10)); the expression is preceded by ~, the tilde operator, and can refer to previous arguments in specify or to metaparameters in define. This is called a lambda function.
Order matters: arguments are evaluated sequentially, so a later argument can refer to an earlier one, e.g. specify(a = ~ rnorm(2), b = ~ a + rnorm(2)).
generate combines results together into a single tibble for each simulation, so all lambda functions should produce the same number of rows. However, a lambda function can produce multiple columns.
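As a rough illustration, assuming simpr is installed, the sketch below combines a lambda function that returns a two-column matrix with one that returns a single vector. Both produce 5 rows, so they fit into one tibble. The variable names are illustrative only.

```r
library(simpr)

spec <- specify(
  a = ~ cbind(rnorm(5), rnorm(5)),  # two columns, 5 rows
  b = ~ runif(5)                    # one column, 5 rows
)

# Generate one repetition and extract its simulated data
dat <- generate(spec, 1)$sim[[1]]

# Rows and columns of the combined tibble
dim(dat)
```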
Because functions can produce different numbers of columns, there are several options for naming columns. If a provided lambda function produces a single column, the name given to the argument becomes the name of the column. If the lambda function already produces column names, then the output will use these names if .use_names = TRUE, the default. Otherwise, simpr uses the argument name as a base and auto-numbers the columns. For instance, if the argument a generates a two-column matrix and .sep = "_" (the default), the columns will be named a_1 and a_2.
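The effect of .use_names can be sketched as below, assuming simpr is installed. The column names u and v are made up for illustration; they are attached by cbind() itself, so the lambda function already produces named columns.

```r
library(simpr)

# Default (.use_names = TRUE): names produced by the lambda function are kept
keep <- generate(
  specify(a = ~ cbind(u = rnorm(5), v = rnorm(5))),
  1
)$sim[[1]]

# .use_names = FALSE: names are rebuilt from the argument name `a`
overwrite <- generate(
  specify(a = ~ cbind(u = rnorm(5), v = rnorm(5)), .use_names = FALSE),
  1
)$sim[[1]]

names(keep)
names(overwrite)
```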
Custom names can also be directly provided by a double-sided formula. The left-hand side must use c or cbind, e.g. specify(c(a, b) ~ MASS::mvrnorm(5, c(0, 0), Sigma = diag(2))).
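A small sketch of the double-sided form that avoids the MASS dependency, assuming simpr is installed. The names lo and hi are hypothetical. Because the formula is passed unnamed, it is matched to the formal argument x, so simpr may print its message about 'x'.

```r
library(simpr)

# The left-hand side supplies the column names `lo` and `hi`
spec <- specify(c(lo, hi) ~ cbind(rnorm(5), rnorm(5) + 10))

# Generate one repetition and extract its simulated data
dat <- generate(spec, 1)$sim[[1]]

names(dat)
```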
This function is an S3 method for specify from the generics package. Because x is a formal argument of specify, if you have a variable in your simulation named x it will be automatically moved to be the first variable (with a message). It is therefore safest to use any variable name other than x.
## specify a variable and generate it in the simulation
single_var = specify(a = ~ 1 + rnorm(5)) %>%
  generate(1) # generate a single repetition of the simulation
single_var
#> full tibble
#> --------------------------
#> # A tibble: 1 × 3
#> .sim_id rep sim
#> <int> <int> <list>
#> 1 1 1 <tibble [5 × 1]>
#>
#> sim[[1]]
#> --------------------------
#> # A tibble: 5 × 1
#> a
#> <dbl>
#> 1 2.72
#> 2 1.66
#> 3 -0.236
#> 4 -0.638
#> 5 1.73
#>
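## `b` refers to `x`, which is not defined in this specification, so
## generation fails (see the warning and the .sim_error column below).
## To make `b` depend on `a`, use b = ~ a + 2 instead.
two_var = specify(a = ~ 1 + rnorm(5),
                  b = ~ x + 2) %>%
  generate(1)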
#> Warning: Simulation produced errors. See column '.sim_error'.
two_var
#> tibble
#> --------------------------
#> # A tibble: 1 × 4
#> .sim_id rep sim .sim_error
#> <int> <int> <list> <chr>
#> 1 1 1 <NULL> "\u001b[1m\u001b[33mError\u001b[39m in `map()`:\u001b[22…
#>
## Generates a_01 through a_10
autonumber_var = specify(a = ~ MASS::mvrnorm(5, rep(0, 10), Sigma = diag(10))) %>%
  generate(1)
autonumber_var
#> full tibble
#> --------------------------
#> # A tibble: 1 × 3
#> .sim_id rep sim
#> <int> <int> <list>
#> 1 1 1 <tibble [5 × 10]>
#>
#> sim[[1]]
#> --------------------------
#> # A tibble: 5 × 10
#> a_01 a_02 a_03 a_04 a_05 a_06 a_07 a_08 a_09 a_10
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 -0.323 1.17 0.315 0.609 0.987 0.577 -0.0338 0.125 1.19 0.461
#> 2 0.485 -1.30 0.243 2.52 -1.31 0.205 0.400 -1.05 -0.458 -0.537
#> 3 0.419 0.542 0.0649 -0.239 -0.747 1.79 -1.84 -1.35 0.228 0.249
#> 4 1.00 0.282 0.939 0.611 0.160 0.192 0.518 -0.224 -1.10 0.904
#> 5 -1.39 2.03 -0.731 0.184 0.536 0.489 0.871 -2.44 -1.86 -1.41
#>
# alternatively, you could use a two-sided formula for names
multi_name = specify(cbind(a, b, c) ~ MASS::mvrnorm(5, rep(0, 3), Sigma = diag(3))) %>%
  generate(1)
#> Formula specification for 'x' detected. Assuming 'x' is the first formula.
#>
#> To hide this message, or to avoid moving this formula first, use a different variable name.
multi_name
#> full tibble
#> --------------------------
#> # A tibble: 1 × 3
#> .sim_id rep sim
#> <int> <int> <list>
#> 1 1 1 <tibble [5 × 3]>
#>
#> sim[[1]]
#> --------------------------
#> # A tibble: 5 × 3
#> a b c
#> <dbl> <dbl> <dbl>
#> 1 0.259 1.19 -0.696
#> 2 0.821 0.0367 -0.565
#> 3 0.883 0.131 0.108
#> 4 0.535 0.811 -0.272
#> 5 -1.49 1.77 -0.668
#>
# Simple example of setting a metaparameter
simple_meta = specify(a = ~ 1 + rnorm(n)) %>%
  define(n = c(5, 10)) %>% # without this line you would get an error!
  generate(1)
simple_meta # has two rows now, one for each value of n
#> full tibble
#> --------------------------
#> # A tibble: 2 × 4
#> .sim_id n rep sim
#> <int> <dbl> <int> <list>
#> 1 1 5 1 <tibble [5 × 1]>
#> 2 2 10 1 <tibble [10 × 1]>
#>
#> sim[[1]]
#> --------------------------
#> # A tibble: 5 × 1
#> a
#> <dbl>
#> 1 -0.122
#> 2 1.68
#> 3 1.53
#> 4 3.29
#> 5 1.58
#>
simple_meta$sim[[1]] # n = 5
#> # A tibble: 5 × 1
#> a
#> <dbl>
#> 1 -0.122
#> 2 1.68
#> 3 1.53
#> 4 3.29
#> 5 1.58
simple_meta$sim[[2]] # n = 10
#> # A tibble: 10 × 1
#> a
#> <dbl>
#> 1 1.63
#> 2 0.322
#> 3 -0.292
#> 4 1.70
#> 5 1.64
#> 6 0.986
#> 7 0.107
#> 8 1.27
#> 9 -0.824
#> 10 0.250