Commit e0d7268

committed
surv-cens >> tsa ex5 started
1 parent 3e60312 commit e0d7268

10 files changed

+1764
-1455
lines changed

_book/qmd/surveys-census-data.html

Lines changed: 1567 additions & 1405 deletions
Large diffs are not rendered by default.

_book/search.json

Lines changed: 15 additions & 4 deletions
Large diffs are not rendered by default.

qmd/surveys-census-data.qmd

Lines changed: 120 additions & 20 deletions
@@ -206,7 +206,7 @@ lightbox:
206206
```
207207

208208
- You still get decent granularity for more densely populated regions, so patterns will be visible while also getting complete coverage of a study area
209-
- **5-Year**: Useful is smaller geographies and lower MOEs are necessary.
209+
- **5-Year**: Useful for smaller geographies and when lower MOEs are necessary.
210210
- Detailed social, economic, housing, and demographic characteristics. Variables covering e.g. income, education, language, housing characteristics
211211
- Smallest geography is the Block Group
212212
- [census.gov/acs](http://census.gov/acs)
@@ -2131,7 +2131,7 @@ lightbox:
21312131
- Census Bureau recommends using *non-overlapping* ACS 5-Year surveys
21322132
- e.g. 2008-2012, 2013-2017, etc.
21332133
2134-
- [Example]{.ribbon-highlight}: Join 2010 and 2020 and Calculate Percent Change
2134+
- [Example 1]{.ribbon-highlight}: Join 2010 and 2020 and Calculate Percent Change
21352135
21362136
``` r
21372137
county_pop_10 <-
@@ -2169,14 +2169,14 @@ lightbox:
21692169
)
21702170
```
21712171
2172-
- [Example]{.ribbon-highlight}: Age distribution over time in Michigan\
2172+
- [Example 2]{.ribbon-highlight}: Age distribution over time in Michigan\
21732173
![](_resources/Surveys,_Census_Data.resources/michigan-age-chart-1.png){.lightbox width="432"}
21742174
21752175
- Code available in the github [repo](https://github.com/walkerke/umich-workshop-2024/blob/main/census-2020/bonus-chart.R) or R/Workshops/tidycensus-umich-workshop-2024-main/census-2020/bonus-chart.R
21762176
- Distribution shape remains pretty much the same, but decreasing for most age cohorts, i.e. people are leaving the state across most age groups.
21772177
- e.g. The large hump representing the group of people in their mid-40s in 2000 steadily decreases over time.
21782178
2179-
- [Example]{.ribbon-highlight}: Compare 2010 to 2020 Population Densities for Dallas-Ft. Worth\
2179+
- [Example 3]{.ribbon-highlight}: Compare 2010 to 2020 Population Densities for Dallas-Ft. Worth\
21802180
![](_resources/Surveys,_Census_Data.resources/dicen-ts-3d-popden-1.png){.lightbox width="532"}
21812181
21822182
<Details>
@@ -2288,28 +2288,33 @@ lightbox:
22882288
22892289
</Details>
22902290
2291-
- [Example]{.ribbon-highlight}: Compare 2022 5-Year ACS to the 2017 5-Year ACS
2291+
- [Example 4]{.ribbon-highlight}: Compare 2022 5-Year ACS to the 2017 5-Year ACS (*County Level*)
22922292
2293-
- County Level
2293+
``` r
22942294
2295-
``` r
2295+
utah_wfh_compare <- get_acs(
2296+
geography = "county",
2297+
variables = c(
2298+
work_from_home17 = "CP03_2017_024",
2299+
work_from_home22 = "CP03_2022_024"
2300+
),
2301+
state = "UT",
2302+
year = 2022
2303+
)
2304+
```
22962305
2297-
utah_wfh_compare <- get_acs(
2298-
geography = "county",
2299-
variables = c(
2300-
work_from_home17 = "CP03_2017_024",
2301-
work_from_home22 = "CP03_2022_024"
2302-
),
2303-
state = "UT",
2304-
year = 2022
2305-
)
2306-
```
2306+
- The Comparison Profile dataset has aggregated statistics to compare between ACS 5-Year surveys (See [tidycensus \>\> Variables](surveys-census-data.qmd#sec-surv-cens-tidyc-vars){style="color: green"} \>\> Search Variables)
2307+
- This dataset only goes down to the county level
2308+
2309+
- [Example 5]{.ribbon-highlight}: Compare 2022 5-Year ACS to the 2017 5-Year ACS (*Tract Level*)
23072310
2308-
- The Comparison Profile dataset has aggregated statistics to compare between ACS 5-Year surveys (See [tidycensus \>\> Variables](surveys-census-data.qmd#sec-surv-cens-tidyc-vars){style="color: green"} \>\> Search Variables)
2311+
- There are two methods to calculate change at the census tract level
23092312
2310-
- This dataset only goes down to the county level
2313+
- Interpolate data from 2022 boundaries to 2017 boundaries. Then calculate change.
2314+
- Interpolate data from 2017 boundaries to 2022 boundaries. Then calculate change.
23112315
2312-
- Census Tract Level
2316+
- Data\
2317+
The data is the number of remote workers by census tract in Salt Lake County (i.e. Salt Lake City) for the 2013-2017 and 2018-2022 periods
23132318
23142319
``` r
23152320
library(sf)
@@ -2332,3 +2337,98 @@ lightbox:
23322337
geometry = TRUE) |>
23332338
st_transform(6620)
23342339
```
2340+
2341+
- The process is quicker on a projected coordinate system
2342+
- [EPSG:6620](https://epsg.io/6620) is NAD83(2011) / Utah North
2343+
2344+
- 2022 to 2017 Boundaries\
2345+
![2022 Data to 2017 Boundaries Using Area-Weighted Interpolation](_resources/Surveys,_Census_Data.resources/tsa-ex5-1722-tract-back-1.png){.lightbox group="tsa-ex5-1" width="582"}
2346+
2347+
``` r
2348+
library(sf)
2349+
library(mapview)
2350+
library(leafsync)
2351+
2352+
wfh_22_to_17 <- wfh_22 |>
2353+
select(estimate) |>
2354+
st_interpolate_aw(to = wfh_17, extensive = TRUE)
2355+
2356+
m22a <- mapview(wfh_22, zcol = "estimate", layer.name = "2022 geographies")
2357+
m17a <- mapview(wfh_22_to_17, zcol = "estimate", layer.name = "2017 geographies")
2358+
2359+
sync(m22a, m17a)
2360+
```
2361+
2362+
- **Area-Weighted Interpolation** allocates information from one geography to another geography by weights based on the area of overlap ([Walker, Ch. 7.3.1](https://walker-data.com/census-r/spatial-analysis-with-us-census-data.html?q=small#area-weighted-areal-interpolation))
2363+
- Typically more accurate when going *backward* (aka rolling backwards), as many new tracts will “roll up” within parent tracts from a previous Census (though not always)
2364+
- The book has an example that rolls *forwards* from 2015 to 2020.
2365+
- Beware: This may be very inaccurate, as it assumes that population is evenly distributed over area. It can incorrectly allocate large values to low-density / empty areas.
2366+
- Better to use Population-Weighted Areal Interpolation
2367+
- The 2022 data is weighted and "rolled" into 2017 census tract boundaries.
2368+
- [extensive = TRUE]{.arg-text} says weighted sums will be computed. Alternatively, if [extensive = FALSE]{.arg-text}, the function returns weighted means.
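
A minimal base R sketch of the weighting arithmetic described above, assuming a hypothetical overlap matrix between two source tracts and two target tracts. All names and values are made up for illustration, and this is not how `st_interpolate_aw` is implemented internally; it only shows how weighted sums (counts) differ from weighted means (rates, densities).

``` r
# Hypothetical toy data: two source tracts (s1, s2), two target tracts (t1, t2)
src_estimate <- c(s1 = 100, s2 = 40)  # values reported on the source tracts
src_area     <- c(s1 = 4,   s2 = 4)   # total area of each source tract

# overlap[i, j] = area shared by source tract i and target tract j
overlap <- matrix(
  c(3, 1,
    0, 4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("s1", "s2"), c("t1", "t2"))
)

# Weighted sums (counts): each source value is split according to the
# share of the SOURCE tract's area that falls inside each target tract
w_sum  <- overlap / src_area              # row i is divided by src_area[i]
aw_sum <- colSums(w_sum * src_estimate)   # row i is multiplied by src_estimate[i]

# Weighted means (rates): source values are averaged using the share of
# the TARGET tract's overlapped area that each source tract contributes
w_mean  <- sweep(overlap, 2, colSums(overlap), "/")
aw_mean <- colSums(w_mean * src_estimate)

aw_sum
aw_mean
```

With weighted sums the target values add up to the source total, which is the behavior wanted for counts such as the number of remote workers.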
2369+
2370+
- 2017 to 2022 Boundaries\
2371+
![2017 Data to 2022 Boundaries Using Population-Weighted Interpolation](_resources/Surveys,_Census_Data.resources/tsa-ex5-1722-tract-for-1.png){.lightbox group="tsa-ex5-1" width="582"}
2372+
2373+
``` r
2374+
library(tigris)
2375+
options(tigris_use_cache = TRUE)
2376+
2377+
salt_lake_blocks <-
2378+
tigris::blocks(
2379+
"UT",
2380+
"Salt Lake",
2381+
year = 2020
2382+
)
2383+
2384+
wfh_17_to_22 <-
2385+
tidycensus::interpolate_pw(
2386+
from = wfh_17,
2387+
to = wfh_22,
2388+
to_id = "GEOID",
2389+
weights = salt_lake_blocks,
2390+
weight_column = "POP20",
2391+
crs = 6620,
2392+
extensive = TRUE
2393+
)
2394+
2395+
# check result
2396+
m17b <-
2397+
mapview(wfh_17,
2398+
zcol = "estimate",
2399+
layer.name = "2017 geographies")
2400+
m22b <-
2401+
mapview(wfh_17_to_22,
2402+
zcol = "estimate",
2403+
layer.name = "2022 geographies")
2404+
2405+
sync(m17b, m22b)
2406+
```
2407+
2408+
- **Population-Weighted Interpolation** uses an underlying dataset that explains the population distribution as weights.
2409+
- Recommended to use census block level data to create the weights. ACS only has geographies down to the Block Group level, so the Decennial Census values are used.
2410+
- `blocks` gets the 2020 Decennial population values at the census block level to calculate the weights
2411+
- `interpolate_pw` creates weights based on the 2020 census block populations. Then, it uses those weights to reallocate the 2017 data to 2022 geographies.
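
A rough base R sketch of the population-weighting idea, assuming a hypothetical lookup table that already says which 2017 tract and which 2022 tract each census block falls in. The block IDs, tract IDs, and values are invented for illustration; the real `interpolate_pw` works on geometries, and its internals are not reproduced here.

``` r
# Hypothetical census blocks, each inside one 2017 tract and one 2022 tract
blocks <- data.frame(
  block   = c("b1", "b2", "b3", "b4"),
  from_id = c("A",  "A",  "A",  "B"),   # 2017 tract containing the block
  to_id   = c("A1", "A1", "A2", "B"),   # 2022 tract containing the block
  pop     = c(50,   150,  0,    300),   # block population used as the weight
  stringsAsFactors = FALSE
)

# Hypothetical 2017 tract estimates (e.g. remote workers)
from_estimate <- c(A = 80, B = 120)

# Weight = the block's share of the population of its 2017 tract
blocks$weight <- blocks$pop / ave(blocks$pop, blocks$from_id, FUN = sum)

# Allocate each 2017 estimate to its blocks, then sum blocks by 2022 tract
blocks$allocated <- from_estimate[blocks$from_id] * blocks$weight
to_estimate <- tapply(blocks$allocated, blocks$to_id, sum)

to_estimate
```

In this toy case tract A was split into A1 and A2, and the unpopulated block in A2 receives nothing. Area weighting would have given A2 a share in proportion to its size.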
2412+
2413+
- Calculate Change\
2414+
![Percent Change From 2017 to 2022 in Remote Workers](_resources/Surveys,_Census_Data.resources/tsa-ex5-1722-tract-change-1.png){.lightbox group="tsa-ex5-1" width="582"}
2415+
2416+
``` r
2417+
wfh_shift <- wfh_17_to_22 %>%
2418+
select(GEOID, estimate17 = estimate) %>%
2419+
left_join(
2420+
select(st_drop_geometry(wfh_22),
2421+
GEOID,
2422+
estimate22 = estimate),
2423+
by = "GEOID"
2424+
) |>
2425+
mutate(
2426+
shift = estimate22 - estimate17,
2427+
pct_shift = 100 * (shift / estimate17)
2428+
)
2429+
2430+
mapview(wfh_shift, zcol = "shift")
2431+
```
2432+
2433+
- Uses the 2017 data that's been interpolated to 2022 census tract boundaries.
2434+
- The 2022 data is joined to the new 2017 data and percent-change can now be calculated since both have 2022 geometries.
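
One case the percent-change step may run into is a tract whose base-year value is zero, where the ratio is undefined. A small base R sketch with made-up values, assuming a hypothetical data frame shaped like `wfh_shift`, returns `NA` in that case:

``` r
# Hypothetical tract-level values after both years share 2022 boundaries
wfh <- data.frame(
  GEOID      = c("t1", "t2", "t3"),
  estimate17 = c(200, 0, 50),
  estimate22 = c(300, 40, 25)
)

wfh$shift <- wfh$estimate22 - wfh$estimate17

# Percent change is undefined when the 2017 value is zero, so use NA there
wfh$pct_shift <- ifelse(
  wfh$estimate17 > 0,
  100 * wfh$shift / wfh$estimate17,
  NA_real_
)

wfh
```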

scrapsheet.qmd

Lines changed: 62 additions & 26 deletions
@@ -640,7 +640,9 @@ title: "Scrapsheet"
640640

641641
- get more details
642642

643-
- Aerial Interpolation (see [book ](https://walker-data.com/census-r/spatial-analysis-with-us-census-data.html?q=small#small-area-time-series-analysis)for more details)
643+
- Areal Interpolation (see [book](https://walker-data.com/census-r/spatial-analysis-with-us-census-data.html?q=small#small-area-time-series-analysis) for more details)
644+
645+
- Interpolating data between sets of boundaries involves the use of weights to re-distribute data from one geography to another
644646

645647
- Check for incongruent boundaries
646648

@@ -666,55 +668,79 @@ title: "Scrapsheet"
666668
st_transform(6620)
667669
```
668670

669-
- Process is quicker on a projected coordinated system
671+
- The process is quicker on a projected coordinate system
670672

671673
- [EPSG:6620](https://epsg.io/6620) is NAD83(2011) / Utah North
672674

673-
- get details on how he found incongruent boundaries
674-
675-
- Use st_interpolate_aw
675+
- Area-Weighted Areal Interpolation
676676

677677
``` r
678678
library(sf)
679+
library(mapview)
680+
library(leafsync)
679681
680682
wfh_22_to_17 <- wfh_22 |>
681683
select(estimate) |>
682684
st_interpolate_aw(to = wfh_17, extensive = TRUE)
683-
```
684685
685-
- rolls backwards
686+
m22a <- mapview(wfh_22, zcol = "estimate", layer.name = "2022 geographies")
687+
m17a <- mapview(wfh_22_to_17, zcol = "estimate", layer.name = "2017 geographies")
686688
687-
- Uses area weighting (get details
689+
sync(m22a, m17a)
690+
```
688691

689-
- Population weighted roll forward method
692+
- **Area-Weighted Interpolation** allocates information from one geography to another geography by weights based on the area of overlap ([Walker, Ch. 7.3.1](https://walker-data.com/census-r/spatial-analysis-with-us-census-data.html?q=small#area-weighted-areal-interpolation))
693+
- Typically more accurate when going *backward* (aka rolling backwards), as many new tracts will “roll up” within parent tracts from a previous Census (though not always)
694+
- The book has an example that rolls *forwards* from 2015 to 2020.
695+
- Beware: This may be very inaccurate, as it assumes that population is evenly distributed over area. It can incorrectly allocate large values to low-density / empty areas.
696+
- Better to use Population-Weighted Areal Interpolation
697+
- [extensive = TRUE]{.arg-text} says weighted sums will be computed. Alternatively, if [extensive = FALSE]{.arg-text}, the function returns weighted means.
698+
699+
- Population-Weighted Areal Interpolation
690700

691701
``` r
692702
library(tigris)
693703
options(tigris_use_cache = TRUE)
694704
695-
salt_lake_blocks <- blocks(
696-
"UT",
697-
"Salt Lake",
698-
year = 2020
699-
)
700-
701-
wfh_17_to_22 <- interpolate_pw(
702-
from = wfh_17,
703-
to = wfh_22,
704-
to_id = "GEOID",
705-
weights = salt_lake_blocks,
706-
weight_column = "POP20",
707-
crs = 6620,
708-
extensive = TRUE
709-
)
705+
salt_lake_blocks <-
706+
tigris::blocks(
707+
"UT",
708+
"Salt Lake",
709+
year = 2020
710+
)
711+
712+
wfh_17_to_22 <-
713+
tidycensus::interpolate_pw(
714+
from = wfh_17,
715+
to = wfh_22,
716+
to_id = "GEOID",
717+
weights = salt_lake_blocks,
718+
weight_column = "POP20",
719+
crs = 6620,
720+
extensive = TRUE
721+
)
722+
723+
# check result
724+
# m17b <-
725+
# mapview(wfh_17,
726+
# zcol = "estimate",
727+
# layer.name = "2017 geographies")
728+
# m22b <-
729+
# mapview(wfh_17_to_22,
730+
# zcol = "estimate",
731+
# layer.name = "2022 geographies")
732+
#
733+
# sync(m17b, m22b)
710734
711735
# calculate change over time
712736
wfh_shift <- wfh_17_to_22 %>%
713737
select(GEOID, estimate17 = estimate) %>%
714738
left_join(
715739
select(st_drop_geometry(wfh_22),
716-
GEOID, estimate22 = estimate), by = "GEOID"
717-
) %>%
740+
GEOID,
741+
estimate22 = estimate),
742+
by = "GEOID"
743+
) |>
718744
mutate(
719745
shift = estimate22 - estimate17,
720746
pct_shift = 100 * (shift / estimate17)
@@ -723,6 +749,16 @@ title: "Scrapsheet"
723749
mapview(wfh_shift, zcol = "shift")
724750
```
725751

752+
- **Population-Weighted Interpolation** uses an underlying dataset that explains the population distribution as weights.
753+
754+
- Recommended to use census block level data to create the weights. ACS only has geographies down to the Block Group level, so the Decennial Census values are used.
755+
756+
- `blocks` gets the 2020 Dicennial population values at the census block level to calculate the weights
757+
758+
- `interpolate_pw` creates weights based on the 2020 census block populations. Then, it uses those weights to reallocate the 2017 data to 2022 geographies.
759+
760+
- The 2022 data is joined to the new 2017 data and percent-change can now be calculated since both have 2022 geometries.
761+
726762
## lab 91
727763

728764
- clvtools for prob type, h2o::automl for ML
