- The percentile bootstrap works well when making inferences about trimmed means, quantiles, or correlation coefficients.
- "However, percentile-bootstrap confidence intervals tend to be inaccurate in some situations because the bootstrap sampling distribution is skewed (asymmetric) and biased (consistently shifted away from the population value in one direction)"
- "To address these problems, two major alternatives to the percentile bootstrap have been suggested: the bootstrap-t and the bias-corrected and accelerated (BCa) bootstrap"
- "While bootstrap-t can lead to more accurate confidence intervals for the mean and some trimmed means than the percentile bootstrap does, a percentile bootstrap is recommended for inferences about the 20% trimmed mean"
- The BCa approach can be unsatisfactory for relatively small sample sizes
- [{]{style="color: #990000"}[workboots](https://markjrieke.github.io/workboots/){style="color: #990000"}[}]{style="color: #990000"} - Bootstrap prediction intervals for arbitrary model types from a {tidymodels} workflow.
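As a concrete illustration of the percentile bootstrap for a 20% trimmed mean, here is a minimal base-R sketch; the sample, the number of resamples (B = 2000), and the 95% level are all illustrative assumptions, not values from any of the quoted sources.

``` r
# Percentile bootstrap CI for a 20% trimmed mean (illustrative sketch)
set.seed(123)
x <- rexp(50)  # a skewed sample, where trimmed means are attractive
B <- 2000      # number of bootstrap resamples

# resample with replacement and recompute the trimmed mean each time
boot_tm <- replicate(B, mean(sample(x, replace = TRUE), trim = 0.2))

# the percentile interval is just the 2.5% and 97.5% quantiles
ci <- quantile(boot_tm, probs = c(0.025, 0.975))
```

Inspecting the skewness of `boot_tm` hints at why percentile intervals can be inaccurate, which is what motivates the bootstrap-t and BCa alternatives above.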
## mixed-effects-general.qmd
- When you have **variable cluster sizes**, inverse cluster size weights can be specified so that all clusters contribute equally regardless of cluster size, which mitigates a loss of power.
- [{]{style="color: #990000"}[gpboost](https://github.com/fabsig/GPBoost){style="color: #990000"}[}]{style="color: #990000"} - Models the fixed effects with a **boosted tree** and combines this with **Gaussian Process** and grouped random effects. Multiple likelihoods available.
- [{]{style="color: #990000"}[skewlmm](https://github.com/fernandalschumacher/skewlmm){style="color: #990000"}[}]{style="color: #990000"} ([Paper](https://arxiv.org/abs/2002.01040)) - Fits **skew-robust** linear mixed models using a scale mixture of skew-normal linear mixed models, with a possible within-subject dependence structure, via an EM-type algorithm.
- [{]{style="color: #990000"}[jlme](https://github.com/yjunechoe/jlme/){style="color: #990000"}[}]{style="color: #990000"} - Fits mixed models in Julia from R using `lmer` and `glmer` syntax
    - Supports all kinds of diagnostics and CIs
:::
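The inverse-cluster-size weighting mentioned above amounts to giving every observation in cluster $j$ the weight $1/n_j$, so each cluster's weights sum to 1. A toy base-R sketch with made-up cluster sizes:

``` r
# inverse cluster size weights: each cluster contributes equally
n_j <- c(a = 2, b = 8)             # made-up cluster sizes
w <- rep(1 / n_j, times = n_j)     # weight 1/n_j for every obs in cluster j
g <- rep(names(n_j), times = n_j)  # cluster labels
tapply(w, g, sum)                  # each cluster's total weight is 1
```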
- Mixed Effects Model = Random Effects model = Multilevel model = Hierarchical model
- [Numerical validation as a critical aspect in bringing R to the Clinical Research](https://www.researchgate.net/publication/345778861_Numerical_validation_as_a_critical_aspect_in_bringing_R_to_the_Clinical_Research)
    - Slides that show various discrepancies between R output and programs like SAS and SPSS, along with solutions.
    - A procedure for adopting packages into core analysis procedures (e.g. popularity, documentation, author activity, etc.)
- [Explaining Fixed Effects: Random Effects Modeling of Time-Series Cross-Sectional and Panel Data](https://www.cambridge.org/core/journals/political-science-research-and-methods/article/explaining-fixed-effects-random-effects-modeling-of-timeseries-crosssectional-and-panel-data/0334A27557D15848549120FE8ECD8D63)
    - Describes Heterogeneity Bias in repeated measures / longitudinal models. This estimation bias occurs because the varying-slopes variable is also included in the fixed effects. In some cases (?), it creates correlation between the fixed effects and the error terms.
    - Solution: Demeaning, which is described in this [{parameters}]{style="color: #990000"} [vignette](https://easystats.github.io/parameters/articles/demean.html), splits the variable into between (group mean) and within (deviation from group mean) components.
    - Need to read the paper. The vignette doesn't describe what fitting a mixed effects model without demeaning looks like. I'd like to compare both models.
    - [{]{style="color: #990000"}[performance::check_heterogeneity_bias](https://easystats.github.io/performance/reference/check_heterogeneity_bias.html){style="color: #990000"}[}]{style="color: #990000"} - See the vignette for a better example of its usage.
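A minimal base-R sketch of the demeaning split described above; the data are made up, and the commented `lmer` call only indicates the model form, not a tested fit:

``` r
# split a time-varying predictor into between- and within-cluster parts
df <- data.frame(
  id = rep(1:3, each = 4),
  x  = c(1, 2, 3, 4, 11, 12, 13, 14, 21, 22, 23, 24)
)
df$x_between <- ave(df$x, df$id)     # cluster (group) means
df$x_within  <- df$x - df$x_between  # deviations from the cluster mean

# both components then enter the model separately, e.g.
# lmer(y ~ x_between + x_within + (1 | id), data = df)
```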
- Advantages of a mixed model ($y \sim x + (x \;|\; g)$) vs a linear model with an interaction ($y \sim x \ast g$)
    - From T.J. Mahr [tweet](https://twitter.com/tjmahr/status/1504124329319096323)
    - Conceptual: Assumes participant means are drawn from the same latent population
## Scrapsheet
- our approach averages not only different coherent forecasts, but also across hierarchies with completely different middle-level series. This is possible since only coherent bottom- and top-level forecasts are averaged and evaluated.
- Section 2 describes the trace minimization reconciliation method (MinT from {forecast})
## tidycensus 3
- rdeck
    - visualize large amounts of data
- migration flows
    - tidycensus::get_flows
    - only for \> 2020 5-yr ACS
    - this map type is also good for mapping commuting patterns
- Automated Mapping
    - Memory intensive
    - See https://walker-data.com/posts/iterative-mapping/ for a more advanced Metro example
        - Shows how to export too
- geographic patterns in remote work for the 100 largest counties by population in the US (2:17)
    - important for office-space real estate
- Maps per County
    - Generate a list of the 100 largest counties
``` r
library(tidycensus)
library(tidyverse)
library(mapview)

top100counties <- get_acs(
  geography = "county",
  variables = "B01003_001", # total population
  year = 2022,
  survey = "acs1"
) %>%
  slice_max(estimate, n = 100)
```
- MOE is NA, which means this is a true value, not an estimate
    - Plus, the 1-yr ACS is more recent
- pull remote work data at the county level for those counties
    - Need to get tract data for the remote work data
``` r
wfh_tract_list <- top100counties %>%
  split(~NAME) %>% # splits into a list with each element per county
  map(function(county) {
    state_fips <- str_sub(county$GEOID, 1, 2)  # extract first 2 chars (state)
    county_fips <- str_sub(county$GEOID, 3, 5) # extract next 3 chars (county)

    get_acs(
      geography = "tract",
      variables = "DP03_0024P",
      state = state_fips,
      county = county_fips,
      year = 2022,
      geometry = TRUE
    )
  })
```
- need a Census API key since this hits the API 100s of times
- Make 100 Maps
``` r
wfh_maps <-
  map(wfh_tract_list, function(county) {
    mapview(
      county,
      zcol = "estimate",
      layer.name = "% working from home"
    )
  })
```
- Small Area Time Series Analysis (2:40)
    - Where has remote work increased the most in Salt Lake City, Utah?
    - 5-yr ACS releases represent overlapping samples
        - For 2018-2022, compare 2008-2012 to 2018-2022 (non-overlapping samples)
    - Comparison Profile only at the county level
``` r
utah_wfh_compare <- get_acs(
  geography = "county",
  variables = c(
    work_from_home17 = "CP03_2017_024",
    work_from_home22 = "CP03_2022_024"
  ),
  state = "UT",
  year = 2022
)
```
- Census Tract (neighborhood-level)
    - Issue: geographies change
    - get more details
- Areal Interpolation (see the [book](https://walker-data.com/census-r/spatial-analysis-with-us-census-data.html?q=small#small-area-time-series-analysis) for more details)
    - Interpolating data between sets of boundaries involves the use of weights to redistribute data from one geography to another
    - Check for incongruent boundaries
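The weights-based redistribution just described is simple arithmetic. A toy base-R sketch with made-up overlap areas (the simplest, area-based case):

``` r
# area-weighted interpolation in miniature: a source tract's value is
# split across target tracts in proportion to the area of overlap
source_total <- 100                             # value in the source geography
overlap_area <- c(target_A = 30, target_B = 70) # made-up overlap areas
w <- overlap_area / sum(overlap_area)           # area-based weights
allocated <- source_total * w                   # redistributed values
```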
``` r
library(sf)

wfh_17 <-
  get_acs(geography = "tract",
          variables = "B08006_017",
          year = 2017,
          state = "UT",
          county = "Salt Lake",
          geometry = TRUE) |>
  st_transform(6620)

wfh_22 <-
  get_acs(geography = "tract",
          variables = "B08006_017",
          year = 2022,
          state = "UT",
          county = "Salt Lake",
          geometry = TRUE) |>
  st_transform(6620)
```
- The process is quicker on a projected coordinate system
    - [EPSG:6620](https://epsg.io/6620) is NAD83(2011) / Utah North
- **Area-Weighted Interpolation** allocates information from one geography to another geography by weights based on the area of overlap ([Walker, Ch. 7.3.1](https://walker-data.com/census-r/spatial-analysis-with-us-census-data.html?q=small#area-weighted-areal-interpolation))
    - Typically more accurate when going *backward*, as many new tracts will “roll up” within parent tracts from a previous Census (though not always)
    - The book has an example that rolls *forwards* from 2015 to 2020.
    - Beware: This method may be very inaccurate, as it assumes that population is evenly distributed over the area. It can incorrectly allocate large values to low-density or empty areas.
        - Better to use Population-Weighted Areal Interpolation
    - [extensive = TRUE]{.arg-text} says weighted sums will be computed. Alternatively, if [extensive = FALSE]{.arg-text}, the function returns weighted means.
- Population-Weighted Areal Interpolation
``` r
library(tigris)
options(tigris_use_cache = TRUE)

salt_lake_blocks <-
  tigris::blocks(
    "UT",
    "Salt Lake",
    year = 2020
  )

wfh_17_to_22 <-
  tidycensus::interpolate_pw(
    from = wfh_17,
    to = wfh_22,
    to_id = "GEOID",
    weights = salt_lake_blocks,
    weight_column = "POP20",
    crs = 6620,
    extensive = TRUE
  )

# check result
# m17b <-
#   mapview(wfh_17,
#           zcol = "estimate",
#           layer.name = "2017 geographies")
# m22b <-
#   mapview(wfh_17_to_22,
#           zcol = "estimate",
#           layer.name = "2022 geographies")
#
# sync(m17b, m22b)

# calculate change over time
wfh_shift <- wfh_17_to_22 %>%
  select(GEOID, estimate17 = estimate) %>%
  left_join(
    select(st_drop_geometry(wfh_22),
           GEOID,
           estimate22 = estimate),
    by = "GEOID"
  ) |>
  mutate(
    shift = estimate22 - estimate17,
    pct_shift = 100 * (shift / estimate17)
  )

mapview(wfh_shift, zcol = "shift")
```
- **Population-Weighted Interpolation** uses an underlying dataset that describes the population distribution as weights.
    - Recommended to use census block-level data to create the weights. The ACS only has geographies down to the Block Group level, so the Decennial Census values are used.
    - `blocks` gets the 2020 Decennial population values at the census block level to calculate the weights
    - `interpolate_pw` creates weights based on the 2020 census block populations. Then, it splits the 2017 weighted data into 2022 geographies.
    - The 2022 data is joined to the new 2017 data, and the percent change can now be calculated since both have 2022 geometries.
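By contrast with area weighting, a toy sketch of the population-weighting idea (block populations are made up): the weights come from the population living in each overlap, so empty land receives little of the value.

``` r
# population-weighted interpolation in miniature: redistribute by the
# population of the blocks in each overlap rather than by overlap area
source_total <- 100
block_pop <- c(target_A = 90, target_B = 10) # made-up block populations
w <- block_pop / sum(block_pop)              # population-based weights
allocated <- source_total * w                # target_A gets most of the value
```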