Commit 57a246b (0 parents): 18 changed files with 3,917,155 additions and 0 deletions.
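```sql
-- Valid users: authored at least one commit, have a multi-word full name,
-- and have NamSor origin and gender results for that name.
select distinct u.id
from ghtorrent.users as u, ghtorrent.commits as c,
     namsor.ght_private as p, namsor.name_parse as np,
     namsor.origin as o,
     namsor.gender as g
where
     -- a NamSor gender result exists for the parsed first/last name
     g.firstName = np.firstName and g.lastName = np.lastName
     -- a NamSor origin result exists for the parsed first/last name
     and o.firstName = np.firstName
     and o.lastName = np.lastName
     -- link the user's full name (from ght_private) to its parsed form
     and p.name = np.fullName and p.login = u.login
     -- the full name contains at least one space
     and length(p.name) - length(replace(p.name, ' ', '')) > 0
     -- the user authored at least one commit
     and c.author_id = u.id
     -- exclude logins of exactly eight uppercase letters (case-sensitive match)
     and p.login NOT REGEXP BINARY '^[A-Z]{8}$'
     -- individual accounts only, not organizations
     and u.type = 'USR';
```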
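To make the row-level conditions of the query concrete, here is a small Python sketch that applies the same three filters (multi-word name, login pattern, account type) to in-memory tuples. It is an illustration only, not code from this repository: the function `is_valid_user` and the sample rows are hypothetical, and the joins (NamSor origin/gender matches and the commit-authorship check) are left out.

```python
import re

# Case-sensitive pattern, mirroring `NOT REGEXP BINARY '^[A-Z]{8}$'`:
# a login made of exactly eight uppercase ASCII letters is rejected.
EIGHT_UPPERCASE = re.compile(r"[A-Z]{8}")


def is_valid_user(login: str, name: str, user_type: str) -> bool:
    """Row-level filters from the query; the joins and commit check are omitted."""
    # Same test as length(name) - length(replace(name, ' ', '')) > 0:
    # the name must contain at least one space.
    has_space = len(name) - len(name.replace(" ", "")) > 0
    matches_pattern = EIGHT_UPPERCASE.fullmatch(login) is not None
    return has_space and not matches_pattern and user_type == "USR"


# Hypothetical sample rows: (login, name, type).
rows = [
    ("jdoe", "Jane Doe", "USR"),
    ("ABCDEFGH", "Jane Doe", "USR"),   # eight uppercase letters -> rejected
    ("acme", "Acme Corp", "ORG"),      # organization -> rejected
    ("solo", "Madonna", "USR"),        # single-token name -> rejected
    ("abcdefgh", "Ann Lee", "USR"),    # lowercase, so the pattern does not match
]
valid = [login for login, name, kind in rows if is_valid_user(login, name, kind)]
print(valid)  # ['jdoe', 'abcdefgh']
```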
```sql
-- One row per GitHub user: GHTorrent profile fields plus name-parsing,
-- country-of-origin, and gender-inference columns.
CREATE TABLE `ght_namsor_s` (
  `id` int(11) NOT NULL DEFAULT '0',
  `login` varchar(255) CHARACTER SET utf8 NOT NULL,
  `name` text,
  `firstName` text,
  `lastName` text,
  `email` text,
  `company` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `type` varchar(255) CHARACTER SET utf8 NOT NULL DEFAULT 'USR',
  `fake` tinyint(1) NOT NULL DEFAULT '0',
  `deleted` tinyint(1) NOT NULL DEFAULT '0',
  `location` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `nameParseScore` float DEFAULT NULL,
  `country` varchar(255) DEFAULT NULL,
  `countryAlt` varchar(255) DEFAULT NULL,
  `countryScore` float DEFAULT NULL,
  `script` varchar(255) DEFAULT NULL,
  `countryFirstName` text,
  `countryLastName` text,
  `countryScoreFirstName` float DEFAULT NULL,
  `countryScoreLastName` float DEFAULT NULL,
  `gender` varchar(255) DEFAULT NULL,
  `countryGender` varchar(255) DEFAULT NULL,
  `countryGenderAlt` varchar(255) DEFAULT NULL,
  `genderScale` float DEFAULT NULL,
  `gplus_gender` float DEFAULT NULL,
  `gplus_reliability` float DEFAULT NULL,
  `genderComputer` float DEFAULT NULL,
  `do_not_contact` tinyint(1) DEFAULT '0',
  `friendly` tinyint(1) DEFAULT '0',
  `first_commit` datetime DEFAULT NULL,
  KEY `index1` (`id`),
  KEY `index2` (`login`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```
# icse2019

These Python files were used to calculate social capital ... for the paper .....

Required dependencies:

- pickle (part of the Python standard library; no separate install needed)
- sqlalchemy
- pandas
- scipy

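Step 3 of the procedure below reads `.dict` files such as `dict/alias_map_b.dict`. This README does not state their format; given that pickle is listed as a dependency, a reasonable assumption is that they are pickled Python dictionaries. The sketch below writes and reads such a file under that assumption; `save_dict`, `load_dict`, and the example mapping are hypothetical.

```python
import os
import pickle
import tempfile


def save_dict(obj: dict, path: str) -> None:
    """Serialize a dictionary to `path` with pickle."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_dict(path: str) -> dict:
    """Load a pickled dictionary. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical contents; the real mapping in dict/alias_map_b.dict is not described here.
example = {"alias_a": "canonical_a", "alias_b": "canonical_a"}
path = os.path.join(tempfile.mkdtemp(), "example.dict")
save_dict(example, path)
restored = load_dict(path)
print(restored == example)  # True
```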
In the MySQL database, there are the following tables:

- users
- commits
- ght_namsor_s (created using the MySQL query provided in `MySQL_queries/ght_namsor_s`)

Procedure for running the code:

1. Use `MySQL_queries/filter_valid_users` to find valid users. For all valid users, run `determine_gender.py` to determine their genders.

2. Run `sample_user.py` to draw a balanced sample with equal numbers of male and female contributors. The result is saved in `data/uid.list`.

3. Run `setup.py`, which reads `dict/alias_map_b.dict`, `dict/reverse_alias_map_b.dict`, and `data/uid.list`, and generates `data/pid.list`, `data/all_contributors.list`, `dict/contr_projs.dict`, `data/all_projs.list`, and `dict/proj_contrs_count.dict`.

4. Run `get_user_info.py`, `get_proj_info.py`, and `get_user_proj_info.py`. They write to `data/results_users.csv`, `data/results_proj.csv`, and `data/results_user_proj.csv`, respectively.

5. Run `merge_result.py` to combine these tables. The result is saved in `data/proj_user_proj.csv`, which is used for data analysis.
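
Steps 2 and 5 can be illustrated with a short standard-library sketch, under stated assumptions: genders are given as a hypothetical `uid -> "male"/"female"` mapping, the balanced sample downsamples the larger group at random, and the three result tables share hypothetical key columns `uid` and `pid`. `sample_user.py` and `merge_result.py` may work differently; `balanced_sample`, `merge_results`, and the column names `commits`, `gender`, and `stars` are illustrative, not taken from this repository.

```python
import csv
import io
import random


def balanced_sample(genders: dict, seed: int = 0) -> list:
    """Keep an equal number of male and female user ids (step 2 sketch)."""
    males = sorted(uid for uid, g in genders.items() if g == "male")
    females = sorted(uid for uid, g in genders.items() if g == "female")
    n = min(len(males), len(females))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    # Downsample the larger group; other labels (e.g. "unknown") are dropped.
    return sorted(rng.sample(males, n) + rng.sample(females, n))


def merge_results(user_proj_rows, user_rows, proj_rows):
    """Attach per-user and per-project columns to each (user, project) row (step 5 sketch)."""
    users = {r["uid"]: r for r in user_rows}
    projs = {r["pid"]: r for r in proj_rows}
    merged = []
    for row in user_proj_rows:
        out = dict(row)
        # Left joins on uid and pid; a same-named column from a later table overwrites.
        out.update(users.get(row["uid"], {}))
        out.update(projs.get(row["pid"], {}))
        merged.append(out)
    return merged


genders = {1: "male", 2: "female", 3: "male", 4: "male", 5: "female", 6: "unknown"}
uids = balanced_sample(genders)
print(len(uids))  # 4: both females plus two of the three males

user_proj = [{"uid": "1", "pid": "10", "commits": "5"}]
users = [{"uid": "1", "gender": "female"}]
projs = [{"pid": "10", "stars": "42"}]
rows = merge_results(user_proj, users, projs)

# Write the merged rows as CSV, as a merged results file might be stored.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```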