- If there's some part of the code that's hopelessly broken, please email me.
- Follow the instructions within the script itself to the letter (please)
This documentation covers only the file `get_data.R`, since it's the only file that could have problems when running the code.
- Make sure that you've pulled the code from GitHub onto your computer and you know which directory the folder is in
- Know which folder you have your data/Excel files in
The first time you run this program (and only then), uncomment the lines with the package installation information, run them, and then re-comment them once you've installed the packages. The code looks like this:
install.packages(c(
"tidyverse",
"tidycensus",
"readxl",
"openxlsx"
))
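If commenting and uncommenting the install lines gets tedious, one alternative is to install only the packages that are missing. This is a minimal sketch, not what `get_data.R` does; `missing_packages` is a hypothetical helper name, and the `interactive()` guard is an assumption that you are running from RStudio or an R console.

```r
# Packages the script needs (same list as above).
pkgs <- c("tidyverse", "tidycensus", "readxl", "openxlsx")

# Hypothetical helper: return the packages from `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(pkgs)

# Only install when run interactively, so a batch run never
# starts a download by surprise.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

With this in place the install step can stay uncommented, because it does nothing once every package is present.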
In the toolbar, go to Session -> Set Working Directory -> To Source File Location. Copy the output and paste it where you see the `setwd` line below `rm(list=ls())`. The code looks like this:
rm(list=ls())
setwd("your_directory")
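Filepath misspecification is what prevented the script from running on Windows to begin with. The key is this stupid feature of R: a backslash inside a string is treated as an escape character, so a Windows path pasted as-is will not work.
For example, the output that you get from Windows is: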
setwd("C:\Users\JCHUSL01\Downloads")
But what you should put in the R code is:
setwd("C:/Users/JCHUSL01/Downloads")
This is extremely dumb, but it's how R works. Once this is set, everything else should work just fine.
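If you would rather not retype the slashes by hand, the conversion can be sketched in a couple of lines. This is only an illustration and is not part of `get_data.R`; the variable names `win_path` and `fixed_path` are made up for the example. Note that the backslashes have to be doubled when the Windows path is typed as an R string.

```r
# A Windows-style path. Each backslash is doubled so that R reads it
# as a literal backslash instead of the start of an escape sequence.
win_path <- "C:\\Users\\JCHUSL01\\Downloads"

# Swap every backslash for a forward slash.
# fixed = TRUE makes gsub() treat the pattern as plain text, not a regex.
fixed_path <- gsub("\\", "/", win_path, fixed = TRUE)

print(fixed_path)
# setwd(fixed_path)  # uncomment once the path points at a real folder
```

The converted string is the form that should go inside `setwd()`.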
When customizing the `inputs` section, you can look at the documentation for the `tidycensus` package for more information.
- When inputting state (`state`), geography (`geo_level`), or county (`cnty`), the names must appear exactly as they do in the database.
- If the code stops working because R says that it "doesn't recognize the geography", try changing the case of each word in any of the inputs mentioned above. You can check the Census website to be sure of the name.
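For state names specifically, the case problem described above can be checked against R's built-in `state.name` vector. This is a rough sketch under the assumption that you enter full state names; `fix_state_case` is a hypothetical helper and is not part of `get_data.R`.

```r
# Hypothetical helper: return the standard spelling of a US state name,
# whatever case it was typed in.
fix_state_case <- function(x) {
  # state.name is a built-in vector of the 50 state names.
  idx <- match(tolower(x), tolower(state.name))
  if (is.na(idx)) {
    stop("Not a recognised US state name: ", x)
  }
  state.name[idx]
}

fix_state_case("new york")
fix_state_case("KENTUCKY")
```

`state.name` only covers the 50 states, so the District of Columbia and the territories would still need to be checked by hand.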
The `name` variable is used to name your file, but is also used to filter results based on the geography you choose. Directions for how to use the `name` variable are:
Geography | Instructions
---|---
State | Use `st`
County | Use `cnty_name`
City or lower | For anything lower than city level (including tract, block group, and block) input the name of the city/town/county subdivision that you're focusing on
When you pull data at the (usually) tract level or lower, `tidycensus` will pull data for all cities/towns. `name` works by keeping only the rows for the city that you're looking for.
The code for this part in `loop` is below:
input_df <- inner_join(vars,df,by = "variable")
If you find that the data has a strange or incorrect output, experiment with the value for `name` based on the values you get in the original data pull from the `get_acs` function. That, by the way, is the first thing to appear in the `loop`:
df <- get_acs(geography=geo_level,
table = tables[[t,1]],
state = st,
county = cnty_val,
cache_table = TRUE,
year = yr,
survey = survey_type)
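When experimenting with `name`, it can help to list the place names in the pulled data and see which rows a candidate value would keep. The sketch below uses a small made-up data frame in place of a real `get_acs` result (real results include a `NAME` column). It assumes the match is made against `NAME` as plain text, which may not be exactly how the script does it.

```r
# Made-up stand-in for a get_acs() result; these rows are not real data.
df <- data.frame(
  GEOID = c("0000001", "0000002", "0000003"),
  NAME = c("Example City city, Example State",
           "Other Town town, Example State",
           "Example City city, Example State"),
  variable = c("B01001_001", "B01001_001", "B01001_002"),
  estimate = c(100, 50, 48),
  stringsAsFactors = FALSE
)

# See exactly how the places are spelled in the pull.
print(unique(df$NAME))

# Hypothetical value for `name`; fixed = TRUE matches it as plain text.
name <- "Example City"
matches <- df[grepl(name, df$NAME, fixed = TRUE), ]
print(nrow(matches))
```

If the count is zero, compare the spelling and case of `name` with the values printed by `unique(df$NAME)`.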
Most of the (normal) troubleshooting will have to do with filenames, file paths, and writing the files. In the event that Windows cannot find a file or create a file,
- Make sure that file paths have forward slashes `/` in the R code
- When running the code to read the Excel file with the sheet of tables within it, make sure:
  - The filepath has no slashes at the end of it
  - The name of the Excel file, which is the `read_file` argument, has no slashes at the beginning of it
When you run the code, the script will automatically put the `data_path` and `read_file` together with this code:
labels <- read_excel(paste(data_path,read_file,sep="/"),sheet=var_sheet)
That `sep=` argument specifies the character placed between the `data_path` and the `read_file`, and this separator should always be a forward slash `/` for Windows.
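To see why stray slashes matter, here is a small sketch of what `paste` produces with and without them. The file name `tables.xlsx` is a hypothetical placeholder, and the `sub` clean-up is an optional safeguard, not something the script does.

```r
# Inputs with the two mistakes described above, on purpose.
data_path <- "C:/Users/JCHUSL01/Downloads/"  # trailing slash
read_file <- "/tables.xlsx"                  # leading slash (hypothetical file name)

# Pasting them as-is stacks up extra slashes.
bad_path <- paste(data_path, read_file, sep = "/")

# Strip trailing slashes from the folder and leading slashes from the file.
clean_path <- sub("/+$", "", data_path)
clean_file <- sub("^/+", "", read_file)
full_path <- paste(clean_path, clean_file, sep = "/")

print(bad_path)
print(full_path)
```

Running `file.exists(full_path)` before calling `read_excel` is a quick way to confirm that the combined path points at a real file.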