Takes simulated data from generate and applies functions to it, usually model-fitting functions.
# S3 method for simpr_tibble
fit(
  object,
  ...,
  .quiet = TRUE,
  .warn_on_error = TRUE,
  .stop_on_error = FALSE,
  .debug = FALSE,
  .progress = FALSE,
  .options = furrr_options()
)
# S3 method for simpr_spec
fit(
  object,
  ...,
  .quiet = TRUE,
  .warn_on_error = TRUE,
  .stop_on_error = FALSE,
  .debug = FALSE,
  .progress = FALSE,
  .options = furrr_options()
)
object
    a simpr_tibble object (the simulated data from generate) or a simpr_spec object not yet generated.
...
    purrr-style lambda functions used for computing on the simulated data. See Details and Examples.
.quiet
    Should simulation errors be broadcast to the user as they occur?
.warn_on_error
    Should there be a warning when simulation errors occur? See vignette("Managing simulation errors").
.stop_on_error
    Should the simulation stop immediately when simulation errors occur?
.debug
    Run simulation in debug mode, allowing objects, etc. to be explored for each attempt to fit objects.
.progress
    A logical, for whether or not to print a progress bar for multiprocess, multisession, and multicore plans.
.options
    The future-specific options to use with the workers when using futures. This must be the result of a call to furrr_options().
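The .progress and .options arguments only matter when a parallel plan is active. The following is a minimal sketch of one way they might be combined, assuming that the future and furrr packages are installed, that two background R sessions are acceptable on the machine, and that fit hands its work to furrr as the .options default suggests. The object names sims and par_fit are illustrative only.

```r
library(simpr)

# Simulated data, generated sequentially before any parallel plan is set
sims <- specify(a = ~ 2 + rnorm(n),
                b = ~ 5 + 3 * a + rnorm(n, 0, sd = 0.5)) %>%
  define(n = 100:101) %>%
  generate(2)

# Assumption: two local worker sessions; adjust or drop `workers` as needed
future::plan(future::multisession, workers = 2)

par_fit <- sims %>%
  fit(linear = ~ lm(b ~ a, data = .),
      .progress = TRUE,                              # show a progress bar
      .stop_on_error = TRUE,                         # stop at the first error
      .options = furrr::furrr_options(seed = TRUE))  # parallel-safe RNG streams

# Return to sequential processing afterwards
future::plan(future::sequential)
```

Setting .stop_on_error = TRUE here is a choice for interactive debugging; the default keeps going and reports errors according to .warn_on_error.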
a simpr_tibble object with additional list-columns for the output of the provided functions (e.g. model outputs). Just like the output of generate, there is one row per repetition per combination of metaparameters. The columns are the repetition number rep, the metaparameter names, and the simulated data sim, plus additional columns for the function outputs specified in the ... arguments.
If per_sim was called previously, fit returns the object to default simpr_tibble mode.
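The structure described above can be checked directly. This is a small sketch, assuming the simpr package is attached; the object name fitted and the column name linear are chosen only for illustration.

```r
library(simpr)

fitted <- specify(a = ~ 2 + rnorm(n),
                  b = ~ 5 + 3 * a + rnorm(n, 0, sd = 0.5)) %>%
  define(n = 100:101) %>%
  generate(2) %>%
  fit(linear = ~ lm(b ~ a, data = .))

# One row per repetition per combination of metaparameters:
# 2 values of n times 2 repetitions
nrow(fitted)

# The repetition number, the metaparameter, the simulated data,
# and the new column named in fit() should all be present
all(c("rep", "n", "sim", "linear") %in% names(fitted))

# The fit output is a list-column whose elements are model objects
is.list(fitted$linear)
inherits(fitted$linear[[1]], "lm")
```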
This is the fourth step in the simulation process: after generating the simulation data, apply functions such as fitting a statistical model to the data. The output is often then passed to tidy_fits or glance_fits to extract relevant model estimates from the object.
Similar to specify, the model-fitting ... arguments can be arbitrary R expressions (purrr-style lambda functions, see as_mapper) that specify the models to fit to the data. The functions are computed within each simulation cell, so dataset names are generally unnecessary: e.g., to compute regressions on each cell, fit(linear_model = ~ lm(c ~ a + b)). If your modeling function requires a reference to the full dataset, use ., e.g. fit(linear_model = ~lm(c ~ a + b, data = .)).
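To see what a purrr-style lambda does with ., the sketch below converts one to an ordinary function with purrr's as_mapper and calls it on a single data frame. The data frame one_sim is a hypothetical stand-in for one simulation cell, and this only illustrates the lambda semantics; it is not a description of how fit evaluates the expressions internally.

```r
library(purrr)

# Hypothetical stand-in for one simulated dataset (one element of sim)
set.seed(1)
one_sim <- data.frame(a = 2 + rnorm(50))
one_sim$b <- 5 + 3 * one_sim$a + rnorm(50, 0, sd = 0.5)

# The formula becomes a function; `.` refers to its first argument
fit_linear <- as_mapper(~ lm(b ~ a, data = .))

# Calling the function on the dataset returns a fitted model object
model <- fit_linear(one_sim)
class(model)
names(coef(model))
```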
library(simpr)
## Generate data to fit models
simple_linear_data = specify(a = ~ 2 + rnorm(n),
                             b = ~ 5 + 3*a + rnorm(n, 0, sd = 0.5)) %>%
  define(n = 100:101) %>%
  generate(2)
## Fit with a single linear term
linear_fit = simple_linear_data %>%
  fit(linear = ~lm(b ~ a, data = .))
linear_fit # first fit element also prints
#> full tibble
#> --------------------------
#> # A tibble: 4 × 5
#>   .sim_id     n   rep sim                linear
#>     <int> <int> <int> <list>             <list>
#> 1       1   100     1 <tibble [100 × 2]> <lm>
#> 2       2   101     1 <tibble [101 × 2]> <lm>
#> 3       3   100     2 <tibble [100 × 2]> <lm>
#> 4       4   101     2 <tibble [101 × 2]> <lm>
#>
#> sim[[1]]
#> --------------------------
#> # A tibble: 100 × 2
#>        a     b
#>    <dbl> <dbl>
#>  1 0.868  7.84
#>  2 2.03  11.1
#>  3 3.14  14.0
#>  4 0.985  9.11
#>  5 2.28  12.2
#>  6 1.65  10.5
#>  7 0.819  7.24
#>  8 2.83  13.9
#>  9 1.98  10.6
#> 10 0.940  7.16
#> # … with 90 more rows
#>
#> linear[[1]]
#> --------------------------
#>
#> Call:
#> lm(formula = b ~ a, data = .)
#>
#> Coefficients:
#> (Intercept)            a
#>       4.922        3.077
#>
#>
## Each element of $linear is a model object
linear_fit$linear
#> [[1]]
#>
#> Call:
#> lm(formula = b ~ a, data = .)
#>
#> Coefficients:
#> (Intercept) a
#> 4.922 3.077
#>
#>
#> [[2]]
#>
#> Call:
#> lm(formula = b ~ a, data = .)
#>
#> Coefficients:
#> (Intercept) a
#> 5.027 2.958
#>
#>
#> [[3]]
#>
#> Call:
#> lm(formula = b ~ a, data = .)
#>
#> Coefficients:
#> (Intercept) a
#> 4.944 3.024
#>
#>
#> [[4]]
#>
#> Call:
#> lm(formula = b ~ a, data = .)
#>
#> Coefficients:
#> (Intercept) a
#> 5.004 2.986
#>
#>
## We can fit multiple models to the same data
multi_fit = simple_linear_data %>%
  fit(linear = ~lm(b ~ a, data = .),
      quadratic = ~lm(b ~ a + I(a^2), data = .))
## Two columns, one for each model
multi_fit
#> full tibble
#> --------------------------
#> # A tibble: 4 × 6
#>   .sim_id     n   rep sim                linear quadratic
#>     <int> <int> <int> <list>             <list> <list>
#> 1       1   100     1 <tibble [100 × 2]> <lm>   <lm>
#> 2       2   101     1 <tibble [101 × 2]> <lm>   <lm>
#> 3       3   100     2 <tibble [100 × 2]> <lm>   <lm>
#> 4       4   101     2 <tibble [101 × 2]> <lm>   <lm>
#>
#> sim[[1]]
#> --------------------------
#> # A tibble: 100 × 2
#> a b
#> <dbl> <dbl>
#> 1 0.868 7.84
#> 2 2.03 11.1
#> 3 3.14 14.0
#> 4 0.985 9.11
#> 5 2.28 12.2
#> 6 1.65 10.5
#> 7 0.819 7.24
#> 8 2.83 13.9
#> 9 1.98 10.6
#> 10 0.940 7.16
#> # … with 90 more rows
#>
#> linear[[1]]
#> --------------------------
#>
#> Call:
#> lm(formula = b ~ a, data = .)
#>
#> Coefficients:
#> (Intercept) a
#> 4.922 3.077
#>
#>
#> quadratic[[1]]
#> --------------------------
#>
#> Call:
#> lm(formula = b ~ a + I(a^2), data = .)
#>
#> Coefficients:
#> (Intercept) a I(a^2)
#> 4.77043 3.27509 -0.05044
#>
#>
## Again, each element is a model object
multi_fit$quadratic
#> [[1]]
#>
#> Call:
#> lm(formula = b ~ a + I(a^2), data = .)
#>
#> Coefficients:
#> (Intercept) a I(a^2)
#> 4.77043 3.27509 -0.05044
#>
#>
#> [[2]]
#>
#> Call:
#> lm(formula = b ~ a + I(a^2), data = .)
#>
#> Coefficients:
#> (Intercept) a I(a^2)
#> 4.94605 3.05883 -0.02442
#>
#>
#> [[3]]
#>
#> Call:
#> lm(formula = b ~ a + I(a^2), data = .)
#>
#> Coefficients:
#> (Intercept) a I(a^2)
#> 5.11366 2.79535 0.05548
#>
#>
#> [[4]]
#>
#> Call:
#> lm(formula = b ~ a + I(a^2), data = .)
#>
#> Coefficients:
#> (Intercept) a I(a^2)
#> 4.84816 3.25333 -0.07714
#>
#>
## Can view terms more nicely with tidy_fits
multi_fit %>%
  tidy_fits()
#> # A tibble: 20 × 9
#>    .sim_id     n   rep Source    term        estimate std.error stati…¹  p.value
#>      <int> <int> <int> <chr>     <chr>          <dbl>     <dbl>   <dbl>    <dbl>
#>  1       1   100     1 linear    (Intercept)   4.92      0.0901  54.6   3.33e-75
#>  2       1   100     1 linear    a             3.08      0.0417  73.8   1.07e-87
#>  3       1   100     1 quadratic (Intercept)   4.77      0.131   36.5   1.71e-58
#>  4       1   100     1 quadratic a             3.28      0.131   24.9   6.21e-44
#>  5       1   100     1 quadratic I(a^2)       -0.0504    0.0317  -1.59  1.15e- 1
#>  6       2   101     1 linear    (Intercept)   5.03      0.105   47.8   3.21e-70
#>  7       2   101     1 linear    a             2.96      0.0484  61.1   2.12e-80
#>  8       2   101     1 quadratic (Intercept)   4.95      0.165   30.0   3.14e-51
#>  9       2   101     1 quadratic a             3.06      0.165   18.5   9.70e-34
#> 10       2   101     1 quadratic I(a^2)       -0.0244    0.0381  -0.641 5.23e- 1
#> 11       3   100     2 linear    (Intercept)   4.94      0.111   44.4   1.07e-66
#> 12       3   100     2 linear    a             3.02      0.0470  64.3   5.82e-82
#> 13       3   100     2 quadratic (Intercept)   5.11      0.154   33.1   1.11e-54
#> 14       3   100     2 quadratic a             2.80      0.153   18.3   3.21e-33
#> 15       3   100     2 quadratic I(a^2)        0.0555    0.0353   1.57  1.19e- 1
#> 16       4   101     2 linear    (Intercept)   5.00      0.103   48.4   1.05e-70
#> 17       4   101     2 linear    a             2.99      0.0510  58.6   1.19e-78
#> 18       4   101     2 quadratic (Intercept)   4.85      0.132   36.7   4.88e-59
#> 19       4   101     2 quadratic a             3.25      0.153   21.3   1.36e-38
#> 20       4   101     2 quadratic I(a^2)       -0.0771    0.0416  -1.86  6.65e- 2
#> # … with abbreviated variable name ¹statistic
## Can view model summaries with glance_fits
multi_fit %>%
  glance_fits()
#> # A tibble: 8 × 16
#>   .sim_id     n   rep Source r.squ…¹ adj.r…² sigma stati…³  p.value    df logLik
#>     <int> <int> <int> <chr>    <dbl>   <dbl> <dbl>   <dbl>    <dbl> <dbl>  <dbl>
#> 1       1   100     1 linear   0.982   0.982 0.388   5446. 1.07e-87     1  -46.1
#> 2       1   100     1 quadr…   0.983   0.982 0.385   2767. 2.86e-86     2  -44.8
#> 3       2   101     1 linear   0.974   0.974 0.477   3731. 2.12e-80     1  -67.5
#> 4       2   101     1 quadr…   0.974   0.974 0.478   1854. 1.33e-78     2  -67.3
#> 5       3   100     2 linear   0.977   0.977 0.515   4137. 5.82e-82     1  -74.5
#> 6       3   100     2 quadr…   0.977   0.977 0.511   2101. 1.39e-80     2  -73.3
#> 7       4   101     2 linear   0.972   0.972 0.507   3431. 1.19e-78     1  -73.8
#> 8       4   101     2 quadr…   0.973   0.972 0.501   1760. 1.62e-77     2  -72.0
#> # … with 5 more variables: AIC <dbl>, BIC <dbl>, deviance <dbl>,
#> #   df.residual <int>, nobs <int>, and abbreviated variable names ¹r.squared,
#> #   ²adj.r.squared, ³statistic
## Fit functions do not actually need to be any particular kind of model, they
## can be any arbitrary function. However, not all functions will lead to useful
## output with tidy_fits and glance_fits.
add_five_data = simple_linear_data %>%
  fit(add_five = ~ . + 5) ## adds 5 to every value in dataset
add_five_data
#> full tibble
#> --------------------------
#> # A tibble: 4 × 5
#>   .sim_id     n   rep sim                add_five
#>     <int> <int> <int> <list>             <list>
#> 1       1   100     1 <tibble [100 × 2]> <df [100 × 2]>
#> 2       2   101     1 <tibble [101 × 2]> <df [101 × 2]>
#> 3       3   100     2 <tibble [100 × 2]> <df [100 × 2]>
#> 4       4   101     2 <tibble [101 × 2]> <df [101 × 2]>
#>
#> sim[[1]]
#> --------------------------
#> # A tibble: 100 × 2
#> a b
#> <dbl> <dbl>
#> 1 0.868 7.84
#> 2 2.03 11.1
#> 3 3.14 14.0
#> 4 0.985 9.11
#> 5 2.28 12.2
#> 6 1.65 10.5
#> 7 0.819 7.24
#> 8 2.83 13.9
#> 9 1.98 10.6
#> 10 0.940 7.16
#> # … with 90 more rows
#>
#> add_five[[1]]
#> --------------------------
#> # A tibble: 100 × 2
#> a b
#> <dbl> <dbl>
#> 1 5.87 12.8
#> 2 7.03 16.1
#> 3 8.14 19.0
#> 4 5.99 14.1
#> 5 7.28 17.2
#> 6 6.65 15.5
#> 7 5.82 12.2
#> 8 7.83 18.9
#> 9 6.98 15.6
#> 10 5.94 12.2
#> # … with 90 more rows
#>
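When a fit function returns a simple value rather than a model, the resulting list-column can be flattened by hand instead of going through tidy_fits or glance_fits. The following is a sketch under the assumption that simpr and purrr are installed; the column name mean_b and the object names are hypothetical, and . is used to refer to the full simulated dataset as in the examples above.

```r
library(simpr)

sims <- specify(a = ~ 2 + rnorm(n),
                b = ~ 5 + 3 * a + rnorm(n, 0, sd = 0.5)) %>%
  define(n = 100:101) %>%
  generate(2)

# Compute one number per simulated dataset: the mean of column b
summarised <- sims %>%
  fit(mean_b = ~ mean(.$b))

# Pull the per-simulation values out into a plain numeric vector
mean_b <- purrr::map_dbl(summarised$mean_b, ~ .x)
mean_b
```

Because b is simulated as 5 + 3*a plus noise with a centered at 2, the values should fall near 11, although the exact numbers vary from run to run.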