This repo is a minimalistic template that provides a bare-bones directory structure for empirical research projects in economics. The structure can easily be extended to accommodate more complicated cases.
├── code
├── data [not tracked]
│   ├── final
│   ├── inter
│   └── raw
│       ├── raw_data-1
│       │   └── doc
│       └── raw_data-2
│           └── doc
├── docs [not tracked]
├── out
├── readings [not tracked]
├── report
└── temp [not tracked]
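The layout above can be recreated with a short script. The following is a minimal sketch, assuming Python 3.9 or later; the helper name `create_skeleton` and the `SKELETON` list are hypothetical and not part of the template, and `raw_data-1` / `raw_data-2` stand in for whatever raw data sources a project actually has.

```python
from pathlib import Path

# Directories copied from the tree above. raw_data-1 and raw_data-2 are
# placeholders for the project's actual raw data sources.
SKELETON = [
    "code",
    "data/final",
    "data/inter",
    "data/raw/raw_data-1/doc",
    "data/raw/raw_data-2/doc",
    "docs",
    "out",
    "readings",
    "report",
    "temp",
]


def create_skeleton(root: str) -> list[Path]:
    """Create every directory in SKELETON under root and return the paths."""
    created = []
    for rel in SKELETON:
        path = Path(root) / rel
        # parents=True builds intermediate folders such as data/raw;
        # exist_ok=True makes the function safe to run again.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Usage (from the project root):
# create_skeleton(".")
```

Because existing folders are left untouched, the function can be rerun after adding new entries to the list.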
The template has been constructed with a specific workflow in mind, and has the following features/recommendations:
- All code goes into `code`, and modified/analysis-ready data goes into the `data/final` directory.
- Any intermediate data should go in `data/inter`.
- All raw (original) data goes into the `data/raw` directory. This folder is not tracked by `git`. After cloning the repo, the user should run the `00-data-download.R` or `00-data-download.py` file to download the raw data from a Google Drive folder.
- All output (tables, figures, etc.) must go into the `out` directory.
- The following is a more advanced workflow that beginners need not adopt:
  - Ideally, each `data/raw/data_x` directory should have a `makefile` that runs the code in that directory and its subdirectories. This `makefile` works as a master code file. Another option would be to use a `master` code file. When the code files are written in different programming languages, the `makefile` option is the better way to go.
  - A main `makefile` should be in the root directory. It works as the master code file for the whole project. As above, a `master` code file is an alternative, but the `makefile` option is the better way to go when there are scripts in different programming languages.
- The project directory should NEVER be in a cloud sync directory such as Dropbox, Box or Google Drive.
- If you need certain libraries or configuration files to be included in the project folder, create a `.lib` or `.config` folder in the project root.
- If you add new directories to the root project folder, make sure that you are not sharing anything that you would not want to share via GitHub. So check and, if necessary, modify the `.gitignore` file.
- The `.gitignore` file follows a particular logic: first ignore everything, then re-include the items you want to be tracked by git.
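The ignore-everything-then-re-include logic can be illustrated with a small generator. This is only a sketch under stated assumptions: it is not the template's actual `.gitignore`, the function name `build_gitignore` is hypothetical, and the tracked entries are inferred from the directory tree (with `README.md` added as an assumption).

```python
def build_gitignore(tracked: list[str]) -> str:
    """Return .gitignore text that ignores everything in the project root
    and then re-includes only the entries listed in tracked."""
    lines = [
        "# Ignore everything in the project root ...",
        "/*",
        "# ... then re-include what should be tracked.",
        "!/.gitignore",
    ]
    for name in tracked:
        # A leading "!" negates the earlier "/*" rule for this entry.
        # A trailing slash (e.g. "code/") restricts the rule to directories.
        lines.append(f"!/{name}")
    return "\n".join(lines) + "\n"


# Assumed set of tracked entries: the folders not marked [not tracked]
# in the tree, plus README.md.
TRACKED = ["README.md", "code/", "out/", "report/"]
print(build_gitignore(TRACKED))
```

With this pattern, any new top-level folder is ignored by default until it is added to the list, which guards against sharing something by accident. Note that git cannot re-include a file whose parent directory is still excluded, so the directory itself has to be re-included first.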
The following coding principles are necessary for reproducibility and easy collaboration:
- Each small task, such as creating a figure or table or estimating a specific model, should be done by a separate code file, such as `table-2-1.R`. If code pieces must be run in a particular order, numbering the files sequentially, e.g. `04-table-3-2.do` or `table-3-2.R`, is a useful practice.
- When collaborating with the research team, following the modified Gitflow workflow by DIME is highly recommended. Please also consult Development Research in Practice by DIME Analytics of the World Bank.
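Sequential numbering also makes it possible to run the code files in order from a single master file. The sketch below is one hypothetical way to do that in Python, not the template's own master file. The interpreter commands in `RUNNERS` are assumptions (in particular, the Stata executable name and batch-mode flags vary by installation and operating system), and the names `ordered_scripts` and `run_all` are made up for illustration.

```python
import re
import subprocess
import sys
from pathlib import Path

# Assumed interpreter commands per file extension; adjust to your system.
RUNNERS = {
    ".py": [sys.executable],
    ".R": ["Rscript"],
    ".do": ["stata", "-b", "do"],  # Unix batch-mode form, an assumption
}

# Matches a leading number prefix such as "04-" in "04-table-3-2.do".
NUMBERED = re.compile(r"^(\d+)-")


def ordered_scripts(code_dir: str) -> list[Path]:
    """Return scripts with a number prefix, sorted by that number."""
    scripts = [
        p for p in Path(code_dir).iterdir()
        if p.is_file() and p.suffix in RUNNERS and NUMBERED.match(p.name)
    ]
    # Sort numerically so that "2-..." runs before "10-...".
    return sorted(
        scripts,
        key=lambda p: (int(NUMBERED.match(p.name).group(1)), p.name),
    )


def run_all(code_dir: str = "code") -> None:
    """Run every numbered script in order, stopping at the first failure."""
    for script in ordered_scripts(code_dir):
        print(f"Running {script.name}")
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(RUNNERS[script.suffix] + [str(script)], check=True)


# Usage (from the project root):
# run_all("code")
```

Files without a number prefix are skipped, so standalone scripts can live in the same folder without being pulled into the ordered run.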