Generate simulated data from specification

Use a specification created by specify or define to produce simulated data.

# S3 method for simpr_spec
generate(
  x,
  .reps,
  ...,
  .sim_name = "sim",
  .quiet = TRUE,
  .warn_on_error = TRUE,
  .stop_on_error = FALSE,
  .debug = FALSE,
  .progress = FALSE,
  .options = furrr_options(seed = TRUE)
)

Arguments

x: a simpr_spec object generated by define or specify, containing the specifications of the simulation
.reps: number of replications to run (a whole number greater than 0)
...: filtering criteria for which rows to simulate, passed to filter. This is useful for reproducing a few selected rows of a simulation without redoing the entire simulation; see vignette("Reproducing simulations").
.sim_name: name of the list-column to be created, containing simulation results. Default is "sim".
.quiet: Should simulation errors be broadcast to the user as they occur?
.warn_on_error: Should there be a warning when simulation errors occur? See vignette("Managing simulation errors").
.stop_on_error: Should the simulation stop immediately when simulation errors occur?
.debug: Run the simulation in debug mode, which lets you explore objects and other state for each generated variable specification.
.progress: A logical, for whether or not to print a progress bar for multiprocess, multisession, and multicore plans.
.options: Future-specific options to use with the workers when using futures. This must be the result of a call to furrr_options(seed = TRUE).
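A minimal sketch of the ... argument, reusing the specification from the Examples section so that only the rows with n == 20 are simulated. It assumes that filter expressions can refer to metaparameter columns such as n (the output in the Examples section also shows rep and .sim_id columns); the seed value is arbitrary.

```r
library(simpr)
library(dplyr)  # provides %>%

set.seed(100)  # arbitrary seed, for illustration only

# Same specification as in the Examples section; `n == 20` is passed
# through `...` to filter, so only the n = 20 rows are simulated
subset_sims = specify(a = ~ MASS::mvrnorm(n, rep(0, 2), Sigma = S)) %>%
  define(n = c(10, 20, 30),
         S = list(independent = diag(2), correlated = diag(2) + 2)) %>%
  generate(2, n == 20)

# Check how many rows were generated and which values of n they carry
nrow(subset_sims)
unique(subset_sims$n)
```

With two values of S and .reps = 2, this should yield four rows instead of the twelve produced by the unfiltered call.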

Value

A simpr_sims object, which is a tibble with a row for each repetition (.reps repetitions in total) of each combination of metaparameters, plus some extra metadata used by fit. The columns are .sim_id, a unique identifier for each row; rep, the repetition number; the metaparameters (a list-valued metaparameter such as S also gets an index column, e.g. S_index, holding the element names); and a list-column (named by the .sim_name argument) containing the dataset for each repetition and metaparameter combination. simpr_sims objects can be manipulated elementwise by dplyr and tidyr verbs: the command is applied to each element of the simulation list-column.
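The sketch below inspects this structure, assuming the specification from the Examples section and a hypothetical list-column name, datasets, passed through .sim_name. Each element of the list-column is an ordinary tibble whose row count should match that row's n.

```r
library(simpr)
library(dplyr)  # provides %>%

set.seed(42)  # arbitrary seed, for illustration only

sims = specify(a = ~ MASS::mvrnorm(n, rep(0, 2), Sigma = S)) %>%
  define(n = c(10, 20, 30),
         S = list(independent = diag(2), correlated = diag(2) + 2)) %>%
  generate(2, .sim_name = "datasets")  # "datasets" is a hypothetical name

# One row per repetition and metaparameter combination: 2 reps x (3 x 2)
nrow(sims)

# Each element of the list-column is a tibble with columns a_1 and a_2
first_dataset = sims$datasets[[1]]

# Row counts of the simulated datasets line up with the n metaparameter
sizes = vapply(sims$datasets, nrow, integer(1))
all(sizes == sims$n)
```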

Details

This is the third step in the simulation process: after specifying the population model with specify and defining any metaparameters with define, generate is the workhorse function that actually creates the simulated datasets, one for each replication and combination of metaparameters. You will usually use the output of generate to fit one or more models with fit.
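For orientation, here is the conventional order of the steps as a sketch. It reuses the model from the Examples section with an arbitrary seed and number of replications (5), and ends with fit and tidy_fits, the follow-up functions used in the Examples section.

```r
library(simpr)
library(dplyr)  # provides %>%

set.seed(2023)  # arbitrary seed, for illustration only

results = specify(a = ~ MASS::mvrnorm(n, rep(0, 2), Sigma = S)) %>%  # 1. population model
  define(n = c(10, 20, 30),                                            # 2. metaparameters
         S = list(independent = diag(2), correlated = diag(2) + 2)) %>%
  generate(5) %>%                                                      # 3. simulated datasets
  fit(lm = ~ lm(a_2 ~ a_1, data = .)) %>%                              # 4. one model per dataset
  tidy_fits()                                                          #    one row per model term

results
```

Each of the 30 simulated datasets (6 combinations x 5 repetitions) contributes two rows, one for the intercept and one for a_1.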

Errors raised by this function usually come from how the simulation was specified in specify and define.
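To illustrate the error-handling arguments, the sketch below uses a deliberately broken, hypothetical specification whose formula calls stop() whenever n > 15. Under the defaults the error should be caught and reported as a warning (because .warn_on_error = TRUE), while .stop_on_error = TRUE should make generate fail immediately. How failed rows are represented in the returned object is not shown here; see vignette("Managing simulation errors").

```r
library(simpr)
library(dplyr)  # provides %>%

# Hypothetical specification that fails for n = 20, used only for illustration
failing_spec = specify(a = ~ if (n > 15) stop("n is too large") else rnorm(n)) %>%
  define(n = c(10, 20))

# Defaults: the error is caught and the call still returns a simpr_sims tibble;
# suppressWarnings() hides the warning requested by .warn_on_error = TRUE
caught = suppressWarnings(generate(failing_spec, 1))

# .stop_on_error = TRUE turns the first simulation error into an R error
stopped = tryCatch({
  generate(failing_spec, 1, .stop_on_error = TRUE)
  FALSE  # reached only if no error was raised
}, error = function(e) TRUE)
```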

Examples

meta_list_out = specify(a = ~ MASS::mvrnorm(n, rep(0, 2), Sigma = S)) %>%
  define(n = c(10, 20, 30),
         S = list(independent = diag(2), correlated = diag(2) + 2)) %>%
  generate(1)

## View the overall structure of the result and a single simulation output
meta_list_out
#> full tibble
#> --------------------------
#> # A tibble: 6 × 6
#>   .sim_id     n S_index       rep S             sim              
#>     <int> <dbl> <chr>       <int> <list>        <list>           
#> 1       1    10 independent     1 <dbl [2 × 2]> <tibble [10 × 2]>
#> 2       2    20 independent     1 <dbl [2 × 2]> <tibble [20 × 2]>
#> 3       3    30 independent     1 <dbl [2 × 2]> <tibble [30 × 2]>
#> 4       4    10 correlated      1 <dbl [2 × 2]> <tibble [10 × 2]>
#> 5       5    20 correlated      1 <dbl [2 × 2]> <tibble [20 × 2]>
#> 6       6    30 correlated      1 <dbl [2 × 2]> <tibble [30 × 2]>
#> 
#> sim[[1]]
#> --------------------------
#> # A tibble: 10 × 2
#>        a_1      a_2
#>      <dbl>    <dbl>
#>  1 -0.664   0.526  
#>  2  0.190  -1.07   
#>  3 -1.03   -1.29   
#>  4 -0.827  -0.653  
#>  5  0.144  -0.322  
#>  6  1.02    0.0764 
#>  7 -0.936  -2.38   
#>  8 -0.444   0.279  
#>  9 -1.54    0.00321
#> 10  0.0709 -0.289  
#> 

## Changing .reps changes the number of replications and thus the number
## of rows in the output
meta_list_2 = specify(a = ~ MASS::mvrnorm(n, rep(0, 2), Sigma = S)) %>%
  define(n = c(10, 20, 30),
         S = list(independent = diag(2), correlated = diag(2) + 2)) %>%
  generate(2)

meta_list_2
#> full tibble
#> --------------------------
#> # A tibble: 12 × 6
#>    .sim_id     n S_index       rep S             sim              
#>      <int> <dbl> <chr>       <int> <list>        <list>           
#>  1       1    10 independent     1 <dbl [2 × 2]> <tibble [10 × 2]>
#>  2       2    20 independent     1 <dbl [2 × 2]> <tibble [20 × 2]>
#>  3       3    30 independent     1 <dbl [2 × 2]> <tibble [30 × 2]>
#>  4       4    10 correlated      1 <dbl [2 × 2]> <tibble [10 × 2]>
#>  5       5    20 correlated      1 <dbl [2 × 2]> <tibble [20 × 2]>
#>  6       6    30 correlated      1 <dbl [2 × 2]> <tibble [30 × 2]>
#>  7       7    10 independent     2 <dbl [2 × 2]> <tibble [10 × 2]>
#>  8       8    20 independent     2 <dbl [2 × 2]> <tibble [20 × 2]>
#>  9       9    30 independent     2 <dbl [2 × 2]> <tibble [30 × 2]>
#> 10      10    10 correlated      2 <dbl [2 × 2]> <tibble [10 × 2]>
#> 11      11    20 correlated      2 <dbl [2 × 2]> <tibble [20 × 2]>
#> 12      12    30 correlated      2 <dbl [2 × 2]> <tibble [30 × 2]>
#> 
#> sim[[1]]
#> --------------------------
#> # A tibble: 10 × 2
#>        a_1     a_2
#>      <dbl>   <dbl>
#>  1  0.698  -1.78  
#>  2  0.609   0.406 
#>  3 -1.30    1.69  
#>  4  0.198  -0.712 
#>  5  1.05    1.48  
#>  6 -0.0732  0.909 
#>  7 -0.665  -0.0117
#>  8  1.48   -0.632 
#>  9  0.157  -0.0286
#> 10  0.701   1.94  
#> 

## Fitting and tidying functions can also be included in this step by
## calling them before generate; generate then runs them along with the
## simulation. This can save computation time for large simulations,
## especially with parallel processing.
meta_list_generate_after = specify(a = ~ MASS::mvrnorm(n, rep(0, 2), Sigma = S)) %>%
  define(n = c(10, 20, 30),
         S = list(independent = diag(2), correlated = diag(2) + 2)) %>%
  fit(lm = ~ lm(a_2 ~ a_1, data = .)) %>%
  tidy_fits %>%
  generate(1)

meta_list_generate_after
#> # A tibble: 12 × 10
#>    .sim_id     n S_index       rep Source term   estim…¹ std.e…² stati…³ p.value
#>      <int> <dbl> <chr>       <int> <chr>  <chr>    <dbl>   <dbl>   <dbl>   <dbl>
#>  1       1    10 independent     1 lm     (Inte…  0.989    0.274   3.61  6.86e-3
#>  2       1    10 independent     1 lm     a_1     0.358    0.240   1.49  1.74e-1
#>  3       2    20 independent     1 lm     (Inte…  0.173    0.163   1.06  3.04e-1
#>  4       2    20 independent     1 lm     a_1    -0.472    0.193  -2.44  2.51e-2
#>  5       3    30 independent     1 lm     (Inte… -0.138    0.155  -0.885 3.84e-1
#>  6       3    30 independent     1 lm     a_1    -0.0222   0.161  -0.138 8.91e-1
#>  7       4    10 correlated      1 lm     (Inte… -0.431    0.564  -0.765 4.67e-1
#>  8       4    10 correlated      1 lm     a_1     0.489    0.410   1.19  2.67e-1
#>  9       5    20 correlated      1 lm     (Inte… -0.129    0.395  -0.327 7.48e-1
#> 10       5    20 correlated      1 lm     a_1     0.879    0.228   3.87  1.13e-3
#> 11       6    30 correlated      1 lm     (Inte… -0.0868   0.274  -0.316 7.54e-1
#> 12       6    30 correlated      1 lm     a_1     0.718    0.154   4.68  6.72e-5
#> # … with abbreviated variable names ¹estimate, ²std.error, ³statistic
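The parallel-processing note above can be sketched with the future package, which furrr (and therefore the .options argument) builds on. The model, worker count, and number of replications are arbitrary illustrations; .progress = TRUE requests the progress bar described under Arguments, and furrr::furrr_options(seed = TRUE) simply spells out the default .options.

```r
library(simpr)
library(dplyr)  # provides %>%
library(future)

# Run the simulation in two background R sessions (worker count is arbitrary)
plan(multisession, workers = 2)

parallel_sims = specify(a = ~ rnorm(n)) %>%
  define(n = c(10, 20)) %>%
  generate(3,
           .progress = TRUE,                               # progress bar for multisession plans
           .options = furrr::furrr_options(seed = TRUE))   # parallel-safe random number streams

# Return to sequential processing when done
plan(sequential)

nrow(parallel_sims)  # 2 values of n x 3 repetitions
```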
