Takes simulated data from generate and applies functions to it, usually model-fitting functions.
# S3 method for simpr_tibble
fit(
  object,
  ...,
  .quiet = TRUE,
  .warn_on_error = TRUE,
  .stop_on_error = FALSE,
  .debug = FALSE,
  .progress = FALSE,
  .options = furrr_options()
)
# S3 method for simpr_spec
fit(
  object,
  ...,
  .quiet = TRUE,
  .warn_on_error = TRUE,
  .stop_on_error = FALSE,
  .debug = FALSE,
  .progress = FALSE,
  .options = furrr_options()
)
object
a simpr_tibble object (the simulated data from generate) or a simpr_spec object not yet generated.
...
purrr-style lambda functions used for computing on the simulated data. See Details and Examples.
.quiet
Should simulation errors be broadcast to the user as they occur?
.warn_on_error
Should there be a warning when simulation errors occur? See vignette("Managing simulation errors").
.stop_on_error
Should the simulation stop immediately when simulation errors occur?
.debug
Run simulation in debug mode, allowing objects, etc. to be explored for each attempt to fit objects.
.progress
A logical, for whether or not to print a progress bar for multiprocess, multisession, and multicore plans.
.options
The future-specific options to use with the workers when using futures. This must be the result from a call to furrr_options().
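The error-handling and parallel arguments above can be combined in a single call. The following is a minimal sketch rather than a recommended configuration: it assumes the simpr, future, and furrr packages are installed, that two local background R sessions can be started, and that `seed = TRUE` is the desired furrr seeding option. The data-generating model and the name `par_fit` are illustrative only.

```r
library(simpr)
library(future)

# Generate a small simulation sequentially (illustrative model)
sim_data <- specify(a = ~ rnorm(n),
                    b = ~ a + rnorm(n)) |>
  define(n = c(20, 40)) |>
  generate(2)

# Assumption: two local background R sessions are available
plan(multisession, workers = 2)

par_fit <- sim_data |>
  fit(linear = ~ lm(b ~ a, data = .),
      .warn_on_error = TRUE,    # warn if any fit fails
      .stop_on_error = FALSE,   # keep going after a failed fit
      .progress = TRUE,         # progress bar for multisession plans
      .options = furrr::furrr_options(seed = TRUE))

# Return to sequential processing
plan(sequential)

par_fit
```

Setting the plan only for the fit step keeps data generation sequential; whether that is appropriate depends on where the simulation spends its time.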
a simpr_tibble object with additional list-columns for the output of the provided functions (e.g. model outputs). Just like the output of generate, there is one row per repetition per combination of metaparameters. The columns are the repetition number rep, the metaparameter names, and the simulated data sim, plus additional columns for the function outputs specified in ....
If per_sim was called previously, fit returns the object to default simpr_tibble mode.
This is the fourth step in the simulation process: after generating the simulation data, apply functions such as fitting a statistical model to the data. The output is often then passed to tidy_fits or glance_fits to extract relevant model estimates from the object.
Similar to specify, the model-fitting ... arguments can be arbitrary R expressions (purrr-style lambda functions, see as_mapper) that specify the models to fit to the data. The functions are computed within each simulation cell, so dataset names are generally unnecessary: e.g., to compute regressions on each cell, use fit(linear_model = ~ lm(c ~ a + b)). If your modeling function requires a reference to the full dataset, use ., e.g. fit(linear_model = ~lm(c ~ a + b, data = .)).
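To make the `.` convention concrete, here is a small sketch using purrr::as_mapper directly, outside of simpr. The data frame `cell` is a hypothetical stand-in for one simulation cell, and `fit_linear` is an illustrative name. The sketch shows only how a one-sided formula becomes a function whose `.` is the dataset; it does not reproduce how fit evaluates bare column names within each cell.

```r
library(purrr)

# Hypothetical stand-in for the data in one simulation cell
set.seed(1)
cell <- data.frame(a = 2 + rnorm(100))
cell$b <- 5 + 3 * cell$a + rnorm(100, 0, sd = 0.5)

# as_mapper() turns a one-sided formula into a function;
# inside the formula, `.` refers to the function's first argument
fit_linear <- as_mapper(~ lm(b ~ a, data = .))

# Calling the function on the data frame fits the model
model <- fit_linear(cell)
coef(model)
```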
## Generate data to fit models
simple_linear_data = specify(a = ~ 2 + rnorm(n),
                             b = ~ 5 + 3*a + rnorm(n, 0, sd = 0.5)) %>%
  define(n = 100:101) %>%
  generate(2)
## Fit with a single linear term
linear_fit = simple_linear_data %>%
  fit(linear = ~lm(b ~ a, data = .))
linear_fit # first fit element also prints
#> full tibble
#> --------------------------
#> # A tibble: 4 × 5
#> .sim_id n rep sim linear
#> <int> <int> <int> <list> <list>
#> 1 1 100 1 <tibble [100 × 2]> <lm>
#> 2 2 101 1 <tibble [101 × 2]> <lm>
#> 3 3 100 2 <tibble [100 × 2]> <lm>
#> 4 4 101 2 <tibble [101 × 2]> <lm>
#>
#> sim[[1]]
#> --------------------------
#> # A tibble: 100 × 2
#> a b
#> <dbl> <dbl>
#> 1 0.868 7.84
#> 2 2.03 11.1
#> 3 3.14 14.0
#> 4 0.985 9.11
#> 5 2.28 12.2
#> 6 1.65 10.5
#> 7 0.819 7.24
#> 8 2.83 13.9
#> 9 1.98 10.6
#> 10 0.940 7.16
#> # … with 90 more rows
#>
#> linear[[1]]
#> --------------------------
#>
#> Call:
#> lm(formula = b ~ a, data = .)
#>
#> Coefficients:
#> (Intercept) a
#> 4.922 3.077
#>
#>
## Each element of $linear is a model object
linear_fit$linear
#> [[1]]
#>
#> Call:
#> lm(formula = b ~ a, data = .)
#>
#> Coefficients:
#> (Intercept) a
#> 4.922 3.077
#>
#>
#> [[2]]
#>
#> Call:
#> lm(formula = b ~ a, data = .)
#>
#> Coefficients:
#> (Intercept) a
#> 5.027 2.958
#>
#>
#> [[3]]
#>
#> Call:
#> lm(formula = b ~ a, data = .)
#>
#> Coefficients:
#> (Intercept) a
#> 4.944 3.024
#>
#>
#> [[4]]
#>
#> Call:
#> lm(formula = b ~ a, data = .)
#>
#> Coefficients:
#> (Intercept) a
#> 5.004 2.986
#>
#>
## We can fit multiple models to the same data
multi_fit = simple_linear_data %>%
  fit(linear = ~lm(b ~ a, data = .),
      quadratic = ~lm(b ~ a + I(a^2), data = .))
## Two columns, one for each model
multi_fit
#> full tibble
#> --------------------------
#> # A tibble: 4 × 6
#> .sim_id n rep sim linear quadratic
#> <int> <int> <int> <list> <list> <list>
#> 1 1 100 1 <tibble [100 × 2]> <lm> <lm>
#> 2 2 101 1 <tibble [101 × 2]> <lm> <lm>
#> 3 3 100 2 <tibble [100 × 2]> <lm> <lm>
#> 4 4 101 2 <tibble [101 × 2]> <lm> <lm>
#>
#> sim[[1]]
#> --------------------------
#> # A tibble: 100 × 2
#> a b
#> <dbl> <dbl>
#> 1 0.868 7.84
#> 2 2.03 11.1
#> 3 3.14 14.0
#> 4 0.985 9.11
#> 5 2.28 12.2
#> 6 1.65 10.5
#> 7 0.819 7.24
#> 8 2.83 13.9
#> 9 1.98 10.6
#> 10 0.940 7.16
#> # … with 90 more rows
#>
#> linear[[1]]
#> --------------------------
#>
#> Call:
#> lm(formula = b ~ a, data = .)
#>
#> Coefficients:
#> (Intercept) a
#> 4.922 3.077
#>
#>
#> quadratic[[1]]
#> --------------------------
#>
#> Call:
#> lm(formula = b ~ a + I(a^2), data = .)
#>
#> Coefficients:
#> (Intercept) a I(a^2)
#> 4.77043 3.27509 -0.05044
#>
#>
## Again, each element is a model object
multi_fit$quadratic
#> [[1]]
#>
#> Call:
#> lm(formula = b ~ a + I(a^2), data = .)
#>
#> Coefficients:
#> (Intercept) a I(a^2)
#> 4.77043 3.27509 -0.05044
#>
#>
#> [[2]]
#>
#> Call:
#> lm(formula = b ~ a + I(a^2), data = .)
#>
#> Coefficients:
#> (Intercept) a I(a^2)
#> 4.94605 3.05883 -0.02442
#>
#>
#> [[3]]
#>
#> Call:
#> lm(formula = b ~ a + I(a^2), data = .)
#>
#> Coefficients:
#> (Intercept) a I(a^2)
#> 5.11366 2.79535 0.05548
#>
#>
#> [[4]]
#>
#> Call:
#> lm(formula = b ~ a + I(a^2), data = .)
#>
#> Coefficients:
#> (Intercept) a I(a^2)
#> 4.84816 3.25333 -0.07714
#>
#>
## Can view terms more nicely with tidy_fits
multi_fit %>%
  tidy_fits
#> # A tibble: 20 × 9
#> .sim_id n rep Source term estimate std.error stati…¹ p.value
#> <int> <int> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 100 1 linear (Intercept) 4.92 0.0901 54.6 3.33e-75
#> 2 1 100 1 linear a 3.08 0.0417 73.8 1.07e-87
#> 3 1 100 1 quadratic (Intercept) 4.77 0.131 36.5 1.71e-58
#> 4 1 100 1 quadratic a 3.28 0.131 24.9 6.21e-44
#> 5 1 100 1 quadratic I(a^2) -0.0504 0.0317 -1.59 1.15e- 1
#> 6 2 101 1 linear (Intercept) 5.03 0.105 47.8 3.21e-70
#> 7 2 101 1 linear a 2.96 0.0484 61.1 2.12e-80
#> 8 2 101 1 quadratic (Intercept) 4.95 0.165 30.0 3.14e-51
#> 9 2 101 1 quadratic a 3.06 0.165 18.5 9.70e-34
#> 10 2 101 1 quadratic I(a^2) -0.0244 0.0381 -0.641 5.23e- 1
#> 11 3 100 2 linear (Intercept) 4.94 0.111 44.4 1.07e-66
#> 12 3 100 2 linear a 3.02 0.0470 64.3 5.82e-82
#> 13 3 100 2 quadratic (Intercept) 5.11 0.154 33.1 1.11e-54
#> 14 3 100 2 quadratic a 2.80 0.153 18.3 3.21e-33
#> 15 3 100 2 quadratic I(a^2) 0.0555 0.0353 1.57 1.19e- 1
#> 16 4 101 2 linear (Intercept) 5.00 0.103 48.4 1.05e-70
#> 17 4 101 2 linear a 2.99 0.0510 58.6 1.19e-78
#> 18 4 101 2 quadratic (Intercept) 4.85 0.132 36.7 4.88e-59
#> 19 4 101 2 quadratic a 3.25 0.153 21.3 1.36e-38
#> 20 4 101 2 quadratic I(a^2) -0.0771 0.0416 -1.86 6.65e- 2
#> # … with abbreviated variable name ¹statistic
## Can view model summaries with glance_fits
multi_fit %>%
  glance_fits
#> # A tibble: 8 × 16
#> .sim_id n rep Source r.squ…¹ adj.r…² sigma stati…³ p.value df logLik
#> <int> <int> <int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 100 1 linear 0.982 0.982 0.388 5446. 1.07e-87 1 -46.1
#> 2 1 100 1 quadr… 0.983 0.982 0.385 2767. 2.86e-86 2 -44.8
#> 3 2 101 1 linear 0.974 0.974 0.477 3731. 2.12e-80 1 -67.5
#> 4 2 101 1 quadr… 0.974 0.974 0.478 1854. 1.33e-78 2 -67.3
#> 5 3 100 2 linear 0.977 0.977 0.515 4137. 5.82e-82 1 -74.5
#> 6 3 100 2 quadr… 0.977 0.977 0.511 2101. 1.39e-80 2 -73.3
#> 7 4 101 2 linear 0.972 0.972 0.507 3431. 1.19e-78 1 -73.8
#> 8 4 101 2 quadr… 0.973 0.972 0.501 1760. 1.62e-77 2 -72.0
#> # … with 5 more variables: AIC <dbl>, BIC <dbl>, deviance <dbl>,
#> # df.residual <int>, nobs <int>, and abbreviated variable names ¹r.squared,
#> # ²adj.r.squared, ³statistic
## Fit functions do not need to be any particular kind of model; they can be
## arbitrary functions. However, not all functions will lead to useful
## output with tidy_fits and glance_fits.
add_five_data = simple_linear_data %>%
  fit(add_five = ~ . + 5) ## adds 5 to every value in dataset
add_five_data
#> full tibble
#> --------------------------
#> # A tibble: 4 × 5
#> .sim_id n rep sim add_five
#> <int> <int> <int> <list> <list>
#> 1 1 100 1 <tibble [100 × 2]> <df [100 × 2]>
#> 2 2 101 1 <tibble [101 × 2]> <df [101 × 2]>
#> 3 3 100 2 <tibble [100 × 2]> <df [100 × 2]>
#> 4 4 101 2 <tibble [101 × 2]> <df [101 × 2]>
#>
#> sim[[1]]
#> --------------------------
#> # A tibble: 100 × 2
#> a b
#> <dbl> <dbl>
#> 1 0.868 7.84
#> 2 2.03 11.1
#> 3 3.14 14.0
#> 4 0.985 9.11
#> 5 2.28 12.2
#> 6 1.65 10.5
#> 7 0.819 7.24
#> 8 2.83 13.9
#> 9 1.98 10.6
#> 10 0.940 7.16
#> # … with 90 more rows
#>
#> add_five[[1]]
#> --------------------------
#> # A tibble: 100 × 2
#> a b
#> <dbl> <dbl>
#> 1 5.87 12.8
#> 2 7.03 16.1
#> 3 8.14 19.0
#> 4 5.99 14.1
#> 5 7.28 17.2
#> 6 6.65 15.5
#> 7 5.82 12.2
#> 8 7.83 18.9
#> 9 6.98 15.6
#> 10 5.94 12.2
#> # … with 90 more rows
#>
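Because fit stores each result in a list-column, non-model outputs can be collected with ordinary list tools. The sketch below is illustrative only: it assumes the simpr package is installed, re-creates data like the example above, and uses the hypothetical column name `mean_b`. It refers to the dataset through `.` as in the add_five example.

```r
library(simpr)

# Re-create simulated data like the example above
sim_data <- specify(a = ~ 2 + rnorm(n),
                    b = ~ 5 + 3 * a + rnorm(n, 0, sd = 0.5)) |>
  define(n = 100:101) |>
  generate(2)

# Compute a single summary number per simulated dataset
summary_fit <- sim_data |>
  fit(mean_b = ~ mean(.$b))

# Each element of the list-column is a length-one numeric,
# so the column can be flattened to a plain vector
mean_b_values <- unlist(summary_fit$mean_b)
mean_b_values
```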