transfer from anonymous repo
sophieball committed Jan 26, 2019
0 parents commit 57a246b
Showing 18 changed files with 3,917,155 additions and 0 deletions.
14 changes: 14 additions & 0 deletions MySQL_queries/filter_valid_users
@@ -0,0 +1,14 @@
select distinct u.id
from ghtorrent.users as u, ghtorrent.commits as c,
namsor.ght_private as p, namsor.name_parse as np,
namsor.origin as o,
namsor.gender as g
where
g.firstName = np.firstName and g.lastName = np.lastName
and o.firstName = np.firstName
and o.lastName = np.lastName
and p.name = np.fullName and p.login = u.login
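and length(p.name) - length(replace(p.name, ' ', '')) > 0 -- name contains at least one space
and c.author_id = u.id -- user has authored at least one commit
and p.login NOT REGEXP BINARY '^[A-Z]{8}$' -- drop logins of exactly eight uppercase letters (case-sensitive)
and u.type = 'USR'; -- accounts of type 'USR' only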
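The two string conditions in the query above (the space check on `p.name` and the login pattern) can be mirrored in Python for spot-checking individual rows outside the database. This is only a minimal sketch: the function names are hypothetical and are not part of the repository, and it assumes non-NULL string inputs.

```python
import re

# Mirrors `p.login NOT REGEXP BINARY '^[A-Z]{8}$'`: a case-sensitive match on
# logins that consist of exactly eight uppercase ASCII letters.
_EIGHT_UPPERCASE = re.compile(r"[A-Z]{8}")


def has_space(name: str) -> bool:
    """Mirror `length(name) - length(replace(name, ' ', '')) > 0`."""
    return " " in name


def login_is_allowed(login: str) -> bool:
    """Return False for logins made of exactly eight uppercase letters."""
    return _EIGHT_UPPERCASE.fullmatch(login) is None


def passes_string_filters(name: str, login: str) -> bool:
    """Combine both checks (hypothetical helper; the joins are not covered)."""
    return has_space(name) and login_is_allowed(login)
```

The join conditions and the `u.type = 'USR'` check are left to the database, since they depend on the table contents.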
35 changes: 35 additions & 0 deletions MySQL_queries/ght_namsor_s
@@ -0,0 +1,35 @@
CREATE TABLE `ght_namsor_s` (
`id` int(11) NOT NULL DEFAULT '0',
`login` varchar(255) CHARACTER SET utf8 NOT NULL,
`name` text,
`firstName` text,
`lastName` text,
`email` text,
`company` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`type` varchar(255) CHARACTER SET utf8 NOT NULL DEFAULT 'USR',
`fake` tinyint(1) NOT NULL DEFAULT '0',
`deleted` tinyint(1) NOT NULL DEFAULT '0',
`location` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`nameParseScore` float DEFAULT NULL,
`country` varchar(255) DEFAULT NULL,
`countryAlt` varchar(255) DEFAULT NULL,
`countryScore` float DEFAULT NULL,
`script` varchar(255) DEFAULT NULL,
`countryFirstName` text,
`countryLastName` text,
`countryScoreFirstName` float DEFAULT NULL,
`countryScoreLastName` float DEFAULT NULL,
`gender` varchar(255) DEFAULT NULL,
`countryGender` varchar(255) DEFAULT NULL,
`countryGenderAlt` varchar(255) DEFAULT NULL,
`genderScale` float DEFAULT NULL,
`gplus_gender` float DEFAULT NULL,
`gplus_reliability` float DEFAULT NULL,
`genderComputer` float DEFAULT NULL,
`do_not_contact` tinyint(1) DEFAULT '0',
`friendly` tinyint(1) DEFAULT '0',
`first_commit` datetime DEFAULT NULL,
KEY `index1` (`id`),
KEY `index2` (`login`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
28 changes: 28 additions & 0 deletions README.md
@@ -0,0 +1,28 @@
# icse2019

These Python files were used to calculate social capital ... for paper .....

Required dependencies:
- pickle (part of the Python standard library)
- sqlalchemy
- pandas
- scipy

In the MySQL database, there are the following tables:
- users
- commits
- ght_namsor_s (created using the MySQL query provided in `MySQL_queries/ght_namsor_s`)

Procedure for running the code:
1. Use `MySQL_queries/filter_valid_users` to find valid users. For all valid users, run `determine_gender.py` to determine their genders.

2. Run `sample_user.py` to get a balanced sample with equal numbers of male and female contributors. The result is saved in `data/uid.list`.

3. Run `setup.py`, which reads files `dict/alias_map_b.dict`,
`dict/reverse_alias_map_b.dict`, and `data/uid.list`, and generates files
`data/pid.list`, `data/all_contributors.list`, `dict/contr_projs.dict`,
`data/all_projs.list`, and `dict/proj_contrs_count.dict`.

4. Run `get_user_info.py`, `get_proj_info.py`, and `get_user_proj_info.py`. They write to `data/results_users.csv`, `data/results_proj.csv`, and `data/results_user_proj.csv` respectively.

5. Run `merge_result.py` to combine these tables. The result will be saved in `data/proj_user_proj.csv`, which will be used for data analysis.
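
The balanced sampling in step 2 and the table merge in step 5 could look roughly like the sketch below. It is not taken from `sample_user.py` or `merge_result.py`: the function names, the gender labels `"male"`/`"female"`, and the key columns `uid` and `pid` are all assumptions made for illustration, and it uses only the standard library so that it runs without the dependencies listed above.

```python
import random


def balanced_sample(users, per_gender, seed=0):
    """Draw the same number of male and female user ids.

    `users` is an iterable of (uid, gender) pairs; the labels "male" and
    "female" are assumed here, not taken from the repository.
    """
    users = list(users)
    rng = random.Random(seed)
    males = [uid for uid, gender in users if gender == "male"]
    females = [uid for uid, gender in users if gender == "female"]
    # Never ask for more ids than the smaller group can supply.
    n = min(per_gender, len(males), len(females))
    return rng.sample(males, n) + rng.sample(females, n)


def merge_results(user_rows, proj_rows, user_proj_rows):
    """Inner-join three lists of dict rows on assumed keys `uid` and `pid`."""
    users = {row["uid"]: row for row in user_rows}
    projs = {row["pid"]: row for row in proj_rows}
    merged = []
    for row in user_proj_rows:
        user = users.get(row["uid"])
        proj = projs.get(row["pid"])
        if user is None or proj is None:
            continue  # skip pairs without a matching user or project row
        merged.append({**user, **proj, **row})
    return merged
```

With pandas available, the same join would more naturally be written with `DataFrame.merge`; the dictionary version above only illustrates the shape of the result, one row per user and project pair.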