Fit models to the simulated data — fit.simpr

Takes simulated data from generate and applies functions to it, usually model-fitting functions.

# S3 method for simpr_tibble
fit(
  object,
  ...,
  .quiet = TRUE,
  .warn_on_error = TRUE,
  .stop_on_error = FALSE,
  .debug = FALSE,
  .progress = FALSE,
  .options = furrr_options()
)

# S3 method for simpr_spec
fit(
  object,
  ...,
  .quiet = TRUE,
  .warn_on_error = TRUE,
  .stop_on_error = FALSE,
  .debug = FALSE,
  .progress = FALSE,
  .options = furrr_options()
)

Arguments

object: a simpr_tibble object (the simulated data from generate) or a simpr_spec object not yet generated.
...: purrr-style lambda functions used for computing on the simulated data. See Details and Examples.
.quiet: Should simulation errors be hidden (TRUE, the default) rather than broadcast to the user as they occur?
.warn_on_error: Should there be a warning when simulation errors occur? See vignette("Managing simulation errors").
.stop_on_error: Should the simulation stop immediately when simulation errors occur?
.debug: Run simulation in debug mode, allowing objects, etc. to be explored for each attempt to fit objects.
.progress: A logical, for whether or not to print a progress bar for multiprocess, multisession, and multicore plans.
.options: The future-specific options to use with the workers when using futures. This must be the result of a call to furrr_options().
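
The .progress and .options arguments only matter once a future plan is active. The following is a minimal sketch, assuming the future and furrr packages are installed and that two local workers are available. The object names, the worker count, and the seed = TRUE setting are illustrative choices, not requirements of fit.

```r
library(simpr)
library(magrittr)  # for %>%; attached explicitly in case it is not already available

# Generate the data first (hypothetical example data, same shape as in Examples)
sim_data = specify(a = ~ 2 + rnorm(n),
                   b = ~ 5 + 3*a + rnorm(n, 0, sd = 0.5)) %>%
  define(n = 100:101) %>%
  generate(2)

# Run the fits on two background R sessions (worker count is an arbitrary choice)
future::plan(future::multisession, workers = 2)

parallel_fit = sim_data %>%
  fit(linear = ~lm(b ~ a, data = .),
      .progress = TRUE,                              # request a progress bar
      .options = furrr::furrr_options(seed = TRUE))  # parallel-safe RNG streams

# Return to sequential processing when done
future::plan(future::sequential)
```

Under the default sequential plan the same call should still run, with .progress having no visible effect.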

Value

A simpr_tibble object with additional list-columns for the output of the provided functions (e.g. model outputs). Just like the output of generate, there is one row per repetition per combination of metaparameters, and the columns are the repetition number rep, the metaparameter names, and the simulated data sim, with additional columns for the function outputs specified in .... If per_sim was called previously, fit returns the object to default simpr_tibble mode.

Details

This is the fourth step in the simulation process: after generating the simulation data, apply functions such as fitting a statistical model to the data. The output is often then passed to tidy_fits or glance_fits to extract relevant model estimates from the object.

Similar to specify, the model-fitting ... arguments can be arbitrary R expressions (purrr-style lambda functions, see as_mapper) that specify how to fit models to the data. The functions are computed within each simulation cell, so dataset names are generally unnecessary: e.g., to compute regressions on each cell, fit(linear_model = ~ lm(c ~ a + b)). If your modeling function requires a reference to the full dataset, use ., e.g. fit(linear_model = ~ lm(c ~ a + b, data = .)).
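
The two forms described above can be compared side by side. This is a small sketch under assumed inputs: the variables a, b, and c, the sample size, and the column names implicit and explicit are all made up for illustration.

```r
library(simpr)
library(magrittr)  # for %>%; attached explicitly in case it is not already available

cell_fits = specify(a = ~ rnorm(n),
                    b = ~ rnorm(n),
                    c = ~ a + b + rnorm(n)) %>%
  define(n = 50) %>%
  generate(1) %>%
  fit(implicit = ~ lm(c ~ a + b),            # variables looked up in the simulation cell
      explicit = ~ lm(c ~ a + b, data = .))  # `.` refers to the cell's full dataset

# Both fits use the same simulated data, so the coefficients should agree
coef(cell_fits$implicit[[1]])
coef(cell_fits$explicit[[1]])
```

The explicit form is mainly useful for modeling functions that insist on a data argument.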

Examples

## Generate data to fit models
simple_linear_data = specify(a = ~ 2 + rnorm(n),
                               b = ~ 5 + 3*a + rnorm(n, 0, sd = 0.5)) %>%
  define(n = 100:101) %>%
  generate(2)

## Fit with a single linear term
linear_fit = simple_linear_data %>%
  fit(linear = ~lm(b ~ a, data = .))

linear_fit # first fit element also prints
#> full tibble
#> --------------------------
#> # A tibble: 4 × 5
#>   .sim_id     n   rep sim                linear
#>     <int> <int> <int> <list>             <list>
#> 1       1   100     1 <tibble [100 × 2]> <lm>  
#> 2       2   101     1 <tibble [101 × 2]> <lm>  
#> 3       3   100     2 <tibble [100 × 2]> <lm>  
#> 4       4   101     2 <tibble [101 × 2]> <lm>  
#> 
#> sim[[1]]
#> --------------------------
#> # A tibble: 100 × 2
#>        a     b
#>    <dbl> <dbl>
#>  1 0.868  7.84
#>  2 2.03  11.1 
#>  3 3.14  14.0 
#>  4 0.985  9.11
#>  5 2.28  12.2 
#>  6 1.65  10.5 
#>  7 0.819  7.24
#>  8 2.83  13.9 
#>  9 1.98  10.6 
#> 10 0.940  7.16
#> # … with 90 more rows
#> 
#> linear[[1]]
#> --------------------------
#> 
#> Call:
#> lm(formula = b ~ a, data = .)
#> 
#> Coefficients:
#> (Intercept)            a  
#>       4.922        3.077  
#> 
#> 

## Each element of $linear is a model object
linear_fit$linear
#> [[1]]
#> 
#> Call:
#> lm(formula = b ~ a, data = .)
#> 
#> Coefficients:
#> (Intercept)            a  
#>       4.922        3.077  
#> 
#> 
#> [[2]]
#> 
#> Call:
#> lm(formula = b ~ a, data = .)
#> 
#> Coefficients:
#> (Intercept)            a  
#>       5.027        2.958  
#> 
#> 
#> [[3]]
#> 
#> Call:
#> lm(formula = b ~ a, data = .)
#> 
#> Coefficients:
#> (Intercept)            a  
#>       4.944        3.024  
#> 
#> 
#> [[4]]
#> 
#> Call:
#> lm(formula = b ~ a, data = .)
#> 
#> Coefficients:
#> (Intercept)            a  
#>       5.004        2.986  
#> 
#> 

## We can fit multiple models to the same data
multi_fit = simple_linear_data %>%
  fit(linear = ~lm(b ~ a, data = .),
      quadratic = ~lm(b ~ a + I(a^2), data = .))

## Two columns, one for each model
multi_fit
#> full tibble
#> --------------------------
#> # A tibble: 4 × 6
#>   .sim_id     n   rep sim                linear quadratic
#>     <int> <int> <int> <list>             <list> <list>   
#> 1       1   100     1 <tibble [100 × 2]> <lm>   <lm>     
#> 2       2   101     1 <tibble [101 × 2]> <lm>   <lm>     
#> 3       3   100     2 <tibble [100 × 2]> <lm>   <lm>     
#> 4       4   101     2 <tibble [101 × 2]> <lm>   <lm>     
#> 
#> sim[[1]]
#> --------------------------
#> # A tibble: 100 × 2
#>        a     b
#>    <dbl> <dbl>
#>  1 0.868  7.84
#>  2 2.03  11.1 
#>  3 3.14  14.0 
#>  4 0.985  9.11
#>  5 2.28  12.2 
#>  6 1.65  10.5 
#>  7 0.819  7.24
#>  8 2.83  13.9 
#>  9 1.98  10.6 
#> 10 0.940  7.16
#> # … with 90 more rows
#> 
#> linear[[1]]
#> --------------------------
#> 
#> Call:
#> lm(formula = b ~ a, data = .)
#> 
#> Coefficients:
#> (Intercept)            a  
#>       4.922        3.077  
#> 
#> 
#> quadratic[[1]]
#> --------------------------
#> 
#> Call:
#> lm(formula = b ~ a + I(a^2), data = .)
#> 
#> Coefficients:
#> (Intercept)            a       I(a^2)  
#>     4.77043      3.27509     -0.05044  
#> 
#> 

## Again, each element is a model object
multi_fit$quadratic
#> [[1]]
#> 
#> Call:
#> lm(formula = b ~ a + I(a^2), data = .)
#> 
#> Coefficients:
#> (Intercept)            a       I(a^2)  
#>     4.77043      3.27509     -0.05044  
#> 
#> 
#> [[2]]
#> 
#> Call:
#> lm(formula = b ~ a + I(a^2), data = .)
#> 
#> Coefficients:
#> (Intercept)            a       I(a^2)  
#>     4.94605      3.05883     -0.02442  
#> 
#> 
#> [[3]]
#> 
#> Call:
#> lm(formula = b ~ a + I(a^2), data = .)
#> 
#> Coefficients:
#> (Intercept)            a       I(a^2)  
#>     5.11366      2.79535      0.05548  
#> 
#> 
#> [[4]]
#> 
#> Call:
#> lm(formula = b ~ a + I(a^2), data = .)
#> 
#> Coefficients:
#> (Intercept)            a       I(a^2)  
#>     4.84816      3.25333     -0.07714  
#> 
#> 

## Can view terms more nicely with tidy_fits
multi_fit %>%
  tidy_fits
#> # A tibble: 20 × 9
#>    .sim_id     n   rep Source    term        estimate std.error stati…¹  p.value
#>      <int> <int> <int> <chr>     <chr>          <dbl>     <dbl>   <dbl>    <dbl>
#>  1       1   100     1 linear    (Intercept)   4.92      0.0901  54.6   3.33e-75
#>  2       1   100     1 linear    a             3.08      0.0417  73.8   1.07e-87
#>  3       1   100     1 quadratic (Intercept)   4.77      0.131   36.5   1.71e-58
#>  4       1   100     1 quadratic a             3.28      0.131   24.9   6.21e-44
#>  5       1   100     1 quadratic I(a^2)       -0.0504    0.0317  -1.59  1.15e- 1
#>  6       2   101     1 linear    (Intercept)   5.03      0.105   47.8   3.21e-70
#>  7       2   101     1 linear    a             2.96      0.0484  61.1   2.12e-80
#>  8       2   101     1 quadratic (Intercept)   4.95      0.165   30.0   3.14e-51
#>  9       2   101     1 quadratic a             3.06      0.165   18.5   9.70e-34
#> 10       2   101     1 quadratic I(a^2)       -0.0244    0.0381  -0.641 5.23e- 1
#> 11       3   100     2 linear    (Intercept)   4.94      0.111   44.4   1.07e-66
#> 12       3   100     2 linear    a             3.02      0.0470  64.3   5.82e-82
#> 13       3   100     2 quadratic (Intercept)   5.11      0.154   33.1   1.11e-54
#> 14       3   100     2 quadratic a             2.80      0.153   18.3   3.21e-33
#> 15       3   100     2 quadratic I(a^2)        0.0555    0.0353   1.57  1.19e- 1
#> 16       4   101     2 linear    (Intercept)   5.00      0.103   48.4   1.05e-70
#> 17       4   101     2 linear    a             2.99      0.0510  58.6   1.19e-78
#> 18       4   101     2 quadratic (Intercept)   4.85      0.132   36.7   4.88e-59
#> 19       4   101     2 quadratic a             3.25      0.153   21.3   1.36e-38
#> 20       4   101     2 quadratic I(a^2)       -0.0771    0.0416  -1.86  6.65e- 2
#> # … with abbreviated variable name ¹statistic

## Can view model summaries with glance_fits
multi_fit %>%
  glance_fits
#> # A tibble: 8 × 16
#>   .sim_id     n   rep Source r.squ…¹ adj.r…² sigma stati…³  p.value    df logLik
#>     <int> <int> <int> <chr>    <dbl>   <dbl> <dbl>   <dbl>    <dbl> <dbl>  <dbl>
#> 1       1   100     1 linear   0.982   0.982 0.388   5446. 1.07e-87     1  -46.1
#> 2       1   100     1 quadr…   0.983   0.982 0.385   2767. 2.86e-86     2  -44.8
#> 3       2   101     1 linear   0.974   0.974 0.477   3731. 2.12e-80     1  -67.5
#> 4       2   101     1 quadr…   0.974   0.974 0.478   1854. 1.33e-78     2  -67.3
#> 5       3   100     2 linear   0.977   0.977 0.515   4137. 5.82e-82     1  -74.5
#> 6       3   100     2 quadr…   0.977   0.977 0.511   2101. 1.39e-80     2  -73.3
#> 7       4   101     2 linear   0.972   0.972 0.507   3431. 1.19e-78     1  -73.8
#> 8       4   101     2 quadr…   0.973   0.972 0.501   1760. 1.62e-77     2  -72.0
#> # … with 5 more variables: AIC <dbl>, BIC <dbl>, deviance <dbl>,
#> #   df.residual <int>, nobs <int>, and abbreviated variable names ¹r.squared,
#> #   ²adj.r.squared, ³statistic

## Fit functions do not need to be any particular kind of model; they can be
## any arbitrary function. However, not all functions will lead to useful
## output with tidy_fits and glance_fits.
add_five_data = simple_linear_data %>%
  fit(add_five = ~ . + 5)  ## adds 5 to every value in dataset

add_five_data
#> full tibble
#> --------------------------
#> # A tibble: 4 × 5
#>   .sim_id     n   rep sim                add_five      
#>     <int> <int> <int> <list>             <list>        
#> 1       1   100     1 <tibble [100 × 2]> <df [100 × 2]>
#> 2       2   101     1 <tibble [101 × 2]> <df [101 × 2]>
#> 3       3   100     2 <tibble [100 × 2]> <df [100 × 2]>
#> 4       4   101     2 <tibble [101 × 2]> <df [101 × 2]>
#> 
#> sim[[1]]
#> --------------------------
#> # A tibble: 100 × 2
#>        a     b
#>    <dbl> <dbl>
#>  1 0.868  7.84
#>  2 2.03  11.1 
#>  3 3.14  14.0 
#>  4 0.985  9.11
#>  5 2.28  12.2 
#>  6 1.65  10.5 
#>  7 0.819  7.24
#>  8 2.83  13.9 
#>  9 1.98  10.6 
#> 10 0.940  7.16
#> # … with 90 more rows
#> 
#> add_five[[1]]
#> --------------------------
#> # A tibble: 100 × 2
#>        a     b
#>    <dbl> <dbl>
#>  1  5.87  12.8
#>  2  7.03  16.1
#>  3  8.14  19.0
#>  4  5.99  14.1
#>  5  7.28  17.2
#>  6  6.65  15.5
#>  7  5.82  12.2
#>  8  7.83  18.9
#>  9  6.98  15.6
#> 10  5.94  12.2
#> # … with 90 more rows
#>
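
To see what a lambda such as ~ . + 5 does on its own, it can be converted with purrr::as_mapper and applied to an ordinary data frame. This sketch illustrates the lambda notation only and makes no claim about how fit evaluates it internally; toy is a hypothetical stand-in for one simulated dataset.

```r
library(purrr)

# as_mapper() turns a one-sided formula into a function whose first argument is `.`
add_five = as_mapper(~ . + 5)

# Hypothetical stand-in for the data in a single simulation cell
toy = data.frame(a = c(1, 2), b = c(10, 20))

add_five(toy)
#>   a  b
#> 1 6 15
#> 2 7 25
```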