Update README.md #641

Merged
merged 1 commit into from
Aug 21, 2024
137 changes: 90 additions & 47 deletions Support/Z - GLD Ecosystem Tools/GLD Add Wage Info/README.md
@@ -2,97 +2,140 @@

## What is the issue?

GLD harmonizations do not currently calculate the hourly or monthly wage. This is done only in the quality checks, to evaluate the data. It is GLD policy to invite users to calculate certain compound variables (like wage, made up of wage, unit, and hours worked information) on their own to ensure they are aware of the limitations of the data. Nonetheless, users may use the `gld_add_wage` function created here to obtain a first estimate.

## How is it addressed?

The `gld_add_wage` function assumes GLD data are loaded. This can be data for a single country, a series from a country, or various years and countries. It will calculate the hourly and monthly wage based on the values of `wage_no_compen` (the gross earnings without employer contributions), `unitwage` (the unit the payment refers to), and `whours` (the hours worked in the week), or their 12-month recall period equivalents. For the precise details, please [see the code](gld_add_wage.ado).
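
To make the idea concrete, here is a minimal sketch of how a wage reported per pay period could be turned into hourly and monthly values. It is not the code in `gld_add_wage.ado`: the `demo_` variable names, the `unitwage` codes used (2 = weekly, 5 = monthly, 9 = hourly), and the conversion factor of 52/12 weeks per month are assumptions made for illustration only.

```stata
* Illustrative sketch only, NOT the logic of gld_add_wage.ado.
* Assumed unitwage codes: 2 = weekly, 5 = monthly, 9 = hourly.
* Assumed conversion factor: 52/12 weeks per month.
clear
input wage_no_compen unitwage whours
 400 2 40
2000 5 40
  12 9 40
end

local wpm = 52/12

* Hourly wage: divide the payment by the hours it covers
gen double demo_hour_wage = .
replace demo_hour_wage = wage_no_compen / whours           if unitwage == 2
replace demo_hour_wage = wage_no_compen / (whours * `wpm') if unitwage == 5
replace demo_hour_wage = wage_no_compen                    if unitwage == 9

* Monthly wage: scale the payment up (or leave as is) to one month
gen double demo_month_wage = .
replace demo_month_wage = wage_no_compen * `wpm'           if unitwage == 2
replace demo_month_wage = wage_no_compen                   if unitwage == 5
replace demo_month_wage = wage_no_compen * whours * `wpm'  if unitwage == 9

list
```

Any observation with a pay unit not covered by these three cases keeps a missing value in this sketch. The real function handles more pay frequencies, so consult the linked code before relying on specific factors.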

## How to install the function?

The function can be installed directly from the internet by typing the following into the console:

```
net install GLD-latest-file-check, replace from("https://raw.githubusercontent.com/worldbank/gld/main/Support/Z%20-%20GLD%20Ecosystem%20Tools/GLD%20files%20update%20check")
net install gld_add_wage, replace from("https://raw.githubusercontent.com/worldbank/gld/main/Support/Z%20-%20GLD%20Ecosystem%20Tools/GLD%20Add%20Wage%20Info")
```

## How to run the code?


## Overview
Kindly make sure to keep the `replace` option after the comma. This is not necessary the first time but will allow Stata to overwrite the code if we update the function.

This Stata script is designed to add wage information (both hourly and monthly) to Global Labor Database (GLD) surveys. The script supports two recall periods—weekly and yearly—and allows customization based on specific worker groups and desired timeframes. The script ensures that sufficient wage data is present before calculations are performed.

## Key Features

- Customizable Worker Groups: The script allows you to focus on either wage workers (waged) or all workers (all).
- Flexible Timeframes: Wage calculations can be done over a weekly or yearly recall period, or both.
- Threshold-Based Processing: The script only proceeds if a user-defined percentage of required data is present.
- Automatic Handling of Different Pay Periods: The script converts various pay frequencies (e.g., weekly, monthly, annual) into consistent hourly and monthly wages.
## How to run the code?

## Syntax
Once the desired GLD file or files are loaded, the user can call the function directly. The function uses [the wbopendata Stata function](https://github.com/jpazvd/wbopendata/tree/master), which it will install automatically if the user does not have it. The syntax is the following:

```
gld_add_wage [, WORKer(string) TIME(string) THREshold(real 75) PPP]
```

## Options
The function command is followed by four options:

- WORKer(string): Defines the worker group to be used. Accepted values:
- **"a"** or **"all"**: All workers
- **"w"** or **"waged"**: Only wage workers

- TIME(string): Defines the recall period. Accepted values:
- **"w"** or **"week"**: Weekly
- **"y"** or **"year"**: Yearly
- **"w"** or **"week"**: Weekly (7 days)
- **"y"** or **"year"**: Yearly (12 months)
- **"b"** or **"both"**: Both weekly and yearly

- THREshold(real): The percentage threshold (0-100) of cases required to have complete wage data before the script proceeds. Default is 75.

- PPP: Optional. This option will include Purchasing Power Parity (PPP) adjustments (feature not yet implemented).
- PPP: If requested, will also add the PPP estimate of the wage values created.

## How the Code Works
Since `threshold` has a default value and `ppp` is optional, the minimal working command is:

### Step 1: Initial Setup and Input Validation
```
gld_add_wage, work(w) time(w)
```

- The script begins by setting up the environment and parsing user inputs.
- It checks the validity of the options provided by the user, ensuring correct worker groups, timeframes, and threshold values are specified.
The above creates the hourly and monthly wages for wage employees over the 7-day recall period, as long as 75% of wage employees have wage information (per the default threshold).
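
As a rough illustration of what such a threshold check involves, the sketch below computes, per country and year, the share of wage employees with all three wage inputs present and compares it to 75. This is an assumption-laden toy example, not the function's own code: the `demo_` names are hypothetical, `empstat == 1` is assumed to identify paid employees, and the function may define completeness differently.

```stata
* Illustrative sketch only, NOT the logic of gld_add_wage.ado.
* Assumption: empstat == 1 marks paid employees.
clear
input str3 countrycode year empstat wage_no_compen unitwage whours
"AAA" 2021 1 400 2 40
"AAA" 2021 1 .   2 40
"AAA" 2021 4 .   . 35
"BBB" 2021 1 900 5 45
"BBB" 2021 1 950 5 40
end

* 1 if all wage inputs are present, 0 if any is missing; only defined for wage employees
gen byte demo_complete = !missing(wage_no_compen, unitwage, whours) if empstat == 1

* Share (0-100) of wage employees with complete information, by country and year
bysort countrycode year: egen demo_share = mean(100 * demo_complete)

* Flag the surveys that would meet a threshold of 75
gen byte demo_meets = demo_share >= 75

list countrycode year empstat demo_complete demo_share demo_meets
```

In this toy data, survey `AAA` has one of two wage employees with complete information (50%) and would fall below the default threshold, while `BBB` reaches 100%.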

### Step 2: Variable Availability Check
If the user wants the same but also wants to (a) ensure that 90% of employees have information and (b) add the PPP values, the command is:

- The script checks for the availability of essential variables (e.g., employment status, wage information, working hours) for the chosen recall period.
```
gld_add_wage, work(w) time(w) thre(90) ppp
```

### Step 3: Hourly Wage Calculation (Weekly Recall Period)
## Example run

- If the selected recall period includes the weekly option, the script calculates the hourly wage for the specified worker group.
- Different pay frequencies (daily, weekly, monthly, etc.) are converted into consistent hourly rates.
The below example is based on the case where data from the following four surveys has been appended:

### Step 4: Monthly Wage Calculation (Weekly Recall Period)
- THA_2021_LFS-Q4_V01_M_V03_A_GLD_ALL.dta
- BOL_2021_ECE_V01_M_V02_A_GLD_ALL.dta
- BOL_2019_ECE_V01_M_V01_A_GLD_ALL.dta
- IND_2019_PLFS_V02_M_V04_A_GLD_ALL.dta
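
For reference, one way to build such an appended dataset is sketched below. It assumes the four `.dta` files sit in the current working directory, which is an assumption for illustration; adjust the paths to wherever the files are stored.

```stata
* Sketch: load the first survey, then append the other three.
* Assumes all four files are in the current working directory.
use "THA_2021_LFS-Q4_V01_M_V03_A_GLD_ALL.dta", clear
append using "BOL_2021_ECE_V01_M_V02_A_GLD_ALL.dta" ///
             "BOL_2019_ECE_V01_M_V01_A_GLD_ALL.dta" ///
             "IND_2019_PLFS_V02_M_V04_A_GLD_ALL.dta"
```

If a variable is stored as a string in one file and as a number in another, `append` will stop with an error; the affected variable then needs to be harmonized first (or the `force` option used, with care).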

- The script calculates monthly wages based on the weekly recall period using consistent conversion factors for different pay frequencies.
Running the command

### Step 5: Hourly Wage Calculation (Yearly Recall Period)
```
gld_add_wage, work(w) time(w) thre(90) ppp
```

- If the selected recall period includes the yearly option, the script performs similar calculations as in Step 3, but for a yearly recall period.
will add the variables `threshold`, `hour_wage`, `month_wage`, `pa_nus_ppp`, `hour_wage_ppp`, and `month_wage_ppp`. Below we can see the output of the wage variables by country and year:

## Threshold Evaluation
<br></br>
![Output of the function at threshold 90](utilities/function_90.png)
<br></br>

- The script evaluates whether sufficient data (based on the specified threshold) is available for processing. If not, it exits with an error message.
There are values for all years and countries. However, if we look at the thresholds by country (screenshot below), we observe that in the case of Thailand just under 98% of the wage workers have the relevant variables to calculate wages.

## Error Handling
<br></br>
![Thresholds by countries](utilities/thresholds.png)
<br></br>
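
Summaries of this kind can be produced with standard Stata commands once the function has run. The lines below are a sketch, not necessarily how the screenshots were made; they assume the appended data with the new variables are in memory and that the GLD identifiers `countrycode` and `year` are present.

```stata
* Sketch: inspect the variables added by gld_add_wage
* (assumes the function has already been run on the data in memory)

* Wage variables by country and year
bysort countrycode year: summarize hour_wage month_wage

* Value of the threshold variable by country
tabstat threshold, by(countrycode) statistics(mean)
```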

The script includes comprehensive error checking:
Hence we can test that setting the threshold option to 99 should result in no numbers being calculated for Thailand. Indeed, as shown below, all observations for Thailand are missing, while the values for the other countries remain the same.

- If an invalid option is provided, the script notifies the user and exits.
- If key variables are missing, the script exits with a clear error message.
- If insufficient cases meet the threshold requirement, the script exits without performing calculations.
<br></br>
![Output of the function at threshold 99](utilities/function_99.png)
<br></br>

## Usage Example
## Error handling

```
gld_wage_info, WORKer("all") TIME("both") THREshold(80)
```
The function has a set of error handling choices designed to inform the user about issues with the data. Below we highlight three: incorrect entry of arguments, missing key variables, and unavailable PPP values.

### Incorrect entry of function arguments

The function automatically checks whether the arguments passed to the function (worker, time, and threshold) are correct and informs users of any errors, as shown in the images below.

<br></br>
![Error in passing work argument](utilities/error_wo.png)
<br></br>

<br></br>
![Error in passing time argument](utilities/error_ti.png)
<br></br>

<br></br>
![Error in passing threshold argument](utilities/error_th.png)
<br></br>

### Missing key variables

The function will also inform users if relevant variables are missing. In the below example, the variable `unitwage` is not in the data. Without it, the function cannot proceed and execution stops.

<br></br>
![Error as relevant variable is missing](utilities/error_var.png)
<br></br>

### Unavailable PPP values

If PPP data is not available for any country, the function will inform the user.

In the case shown below, we convert the country to Venezuela and the year to 2023 (as we know that at the time of writing, the PPP value is missing). As shown, the function informs the user but still calculates the local currency values.

<br></br>
![No PPP data for VEN in 2023](utilities/ppp_not_single.png)
<br></br>

If PPP values are available for at least one of the countries, the function will continue, yet the values for the countries without PPP values will be missing. In the below example, data from Bolivia and Thailand is changed to cases without PPP to show how this works.

<br></br>
![PPP one out of three](utilities/ppp_one_out_of_three.png)
<br></br>

In this example, the script calculates wage information for all workers across both weekly and yearly recall periods, but only if 80% of the necessary data is available.
If there is no PPP information for any of the countries, as in the constructed case shown below, the function will inform the user of the situation and not produce the PPP variables.

Notes:
<br></br>
![PPP none out of three](utilities/ppp_none_out_of_three.png)
<br></br>

- The script assumes that key variables (like empstat, wage_no_compen, unitwage, and whours) are present in the dataset.
- The script is optimized for GLD surveys but can be adapted for similar datasets with minimal changes.
- Future updates may include additional features like PPP adjustments.
Note that the message lists countries and years separately, so not every combination of a listed country and a listed year is necessarily affected. In the above case, the cases without PPP values are YEM 2023, VIR 2022, and VEN 2023. This does not mean that there is no information for YEM in 2022 or VIR in 2023.