This repository was archived by the owner on Apr 11, 2023. It is now read-only.

Commit e792e1c

Initial commit
0 parents  commit e792e1c

File tree

94 files changed

+12830
-0
lines changed

Diff for: .dockerignore

+7
@@ -0,0 +1,7 @@
*.csv
*.pkl
*.hdf5
resources/
!resources/README.md
!tests/data/

Diff for: .github/workflows/test.yaml

+76
@@ -0,0 +1,76 @@
name: Smoke Test
on: push

# split into two jobs so it runs in parallel, even if a little redundant
jobs:
  docker_build:
    name: Build Test Container
    runs-on: ubuntu-latest
    steps:

    - name: Copy Repo Files
      uses: actions/checkout@master

    - name: docker build
      run: |
        echo ${INPUT_PASSWORD} | docker login -u ${INPUT_USERNAME} --password-stdin
        cd $GITHUB_WORKSPACE
        docker pull github/csnet-smoketest
        docker build --cache-from github/csnet-smoketest -t github/csnet-smoketest -f docker/docker-cpu.Dockerfile .
        docker push github/csnet-smoketest
      env:
        INPUT_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
        INPUT_USERNAME: ${{ secrets.DOCKER_USERNAME }}

  basic_tests:
    needs: docker_build
    name: Integration Test Default Parameters
    runs-on: ubuntu-latest

    steps:
    - name: mypy type checking
      run: |
        cd $GITHUB_WORKSPACE
        docker run github/csnet-smoketest mypy --ignore-missing-imports --follow-imports skip /src/train.py /src/model_test.py

    - name: neuralbow, all languages
      run: |
        cd $GITHUB_WORKSPACE
        docker run github/csnet-smoketest python train.py /src /tests/data/data_train.txt /tests/data/data_train.txt /tests/data/data_train.txt --dryrun --max-num-epochs 1 --model neuralbow

    - name: --max-files-per-dir 2
      run: |
        cd $GITHUB_WORKSPACE
        docker run github/csnet-smoketest python train.py /src /tests/data/data_train.txt /tests/data/data_train.txt /tests/data/data_train.txt --dryrun --max-num-epochs 1 --max-files-per-dir 2

  CNN:
    needs: docker_build
    name: 1DCNN
    runs-on: ubuntu-latest

    steps:
    - name: 1dcnn, all languages
      run: |
        cd $GITHUB_WORKSPACE
        docker run github/csnet-smoketest python train.py /src /tests/data/data_train.txt /tests/data/data_train.txt /tests/data/data_train.txt --dryrun --max-num-epochs 1 --model 1dcnn

  selfattn:
    needs: docker_build
    name: selfattn
    runs-on: ubuntu-latest
    steps:

    - name: selfattn, all languages
      run: |
        cd $GITHUB_WORKSPACE
        docker run github/csnet-smoketest python train.py /src /tests/data/data_train.txt /tests/data/data_train.txt /tests/data/data_train.txt --dryrun --max-num-epochs 1 --model selfatt --hypers-override "{\"batch_size\":64}"

  rnn:
    needs: docker_build
    name: rnn
    runs-on: ubuntu-latest
    steps:
    - name: rnn, all languages
      run: |
        cd $GITHUB_WORKSPACE
        docker run github/csnet-smoketest python train.py /src /tests/data/data_train.txt /tests/data/data_train.txt /tests/data/data_train.txt --dryrun --max-num-epochs 1 --model rnn

Diff for: .gitignore

+27
@@ -0,0 +1,27 @@
# ts
**/node_modules/
/webroot/scripts/*.js

# vim
**/*.swp

# python
**/*.pyc
**/__pycache__/

# jupyter
**/.ipynb_checkpoints/

# data
resources/
!resources/README.md
!tests/data/
*.csv

# environment
*.ftpconfig

.idea
/src/wandb/run-*
/src/wandb/debug.log
*.html

Diff for: BENCHMARK.md

+80
@@ -0,0 +1,80 @@
## Submitting runs to the benchmark

The Weights & Biases (W&B) benchmark tracks and compares models trained on the CodeSearchNet dataset by the global machine learning research community. Anyone is welcome to submit their results for review.

## Submission process

### Requirements

There are a few requirements for submitting a model to the benchmark.
- You must have a run logged to [W&B](https://app.wandb.ai).
- Your run must have attached inference results in a file named `model_predictions.csv`. You can view all the files attached to a given run in the browser by clicking the "Files" icon from that run's main page.
- The schema outlined in the submission format section below must be strictly followed.

### Submission format

A valid submission to the CodeSearchNet Challenge requires a file named **model_predictions.csv** with the following fields: `query`, `language`, `identifier`, and `url`:

* `query`: the textual representation of the query, e.g. "int to string".
* `language`: the programming language for the given query, e.g. "python". This information is available as a field in the data to be scored.
* `identifier`: this is an optional field that can help you track your data.
* `url`: the unique GitHub URL of the returned result, e.g. "https://github.com/JamesClonk/vultr/blob/fed59ad207c9bda0a5dfe4d18de53ccbb3d80c91/cmd/commands.go#L12-L190". This information is available as a field in the data to be scored.

For further background and instructions on the submission process, see the root README.

The row order corresponds to the result ranking in the search task. For example, if in row 5 there is an entry for the Python query "read properties file", and in row 60 another result for the Python query "read properties file", then the URL in row 5 is considered to be ranked higher than the URL in row 60 for that query and language.

The script we used to create the baseline submission is [src/predict.py](src/predict.py). You are not required to use this script to produce your submission file -- we only provide it for reference.

Here is an example:

| query | language | identifier | url |
| --- | --- | --- | --- |
| convert int to string | python | int_to_decimal_str | https://github.com/raphaelm/python-sepaxml/blob/187b699b1673c862002b2bae7e1bd62fe8623aec/sepaxml/utils.py#L64-L76 |
| convert int to string | python | str_to_int_array | https://github.com/UCSBarchlab/PyRTL/blob/0988e5c9c10ededd5e1f58d5306603f9edf4b3e2/pyrtl/rtllib/libutils.py#L23-L33 |
| convert int to string | python | Bcp47LanguageParser.IntStr26ToInt | https://github.com/google/transitfeed/blob/eb2991a3747ba541b2cb66502b305b6304a1f85f/extensions/googletransit/pybcp47/bcp47languageparser.py#L138-L139 |
| convert int to string | python | PrimaryEqualProof.to_str_dict | https://github.com/hyperledger-archives/indy-anoncreds/blob/9d9cda3d505c312257d99a13d74d8f05dac3091a/anoncreds/protocol/types.py#L604-L613 |
| convert int to string | python | to_int | https://github.com/mfussenegger/cr8/blob/a37d6049f1f9fee2d0556efae2b7b7f8761bffe8/cr8/cli.py#L8-L23 |
| how to read .csv file in an efficient way? | ruby | Icosmith.Font.generate_scss | https://github.com/tulios/icosmith-rails/blob/e73c11eaa593fcb6f9ba93d34fbdbfe131693af4/lib/icosmith-rails/font.rb#L80-L88 |
| how to read .csv file in an efficient way? | ruby | WebSocket.Extensions.valid_frame_rsv | https://github.com/faye/websocket-extensions-ruby/blob/1a441fac807e08597ec4b315d4022aea716f3efc/lib/websocket/extensions.rb#L120-L134 |
| how to read .csv file in an efficient way? | ruby | APNS.Pem.read_file_at_path | https://github.com/jrbeck/mercurius/blob/1580a4af841a6f30ac62f87739fdff87e9608682/lib/mercurius/apns/pem.rb#L12-L18 |
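
As a rough illustration of the format above, the following sketch writes ranked results to a CSV file using only the Python standard library. The `write_predictions` helper, its input structure, and the header row are assumptions made for this example (the header mirrors the example table); [src/predict.py](src/predict.py) remains the reference for how the baseline file was produced.

```python
import csv

# Column names taken from the submission format described above.
FIELDS = ["query", "language", "identifier", "url"]


def write_predictions(path, ranked_results):
    """Write one row per result, best-ranked first for each (query, language).

    ranked_results is a hypothetical structure for this sketch: a dict mapping
    (query, language) to a list of (identifier, url) tuples that are already
    sorted from most to least relevant.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()  # assumption: the file starts with a header row
        for (query, language), results in ranked_results.items():
            for identifier, url in results:
                writer.writerow(
                    {
                        "query": query,
                        "language": language,
                        "identifier": identifier,
                        "url": url,
                    }
                )


if __name__ == "__main__":
    example = {
        ("convert int to string", "python"): [
            (
                "int_to_decimal_str",
                "https://github.com/raphaelm/python-sepaxml/blob/187b699b1673c862002b2bae7e1bd62fe8623aec/sepaxml/utils.py#L64-L76",
            ),
            (
                "to_int",
                "https://github.com/mfussenegger/cr8/blob/a37d6049f1f9fee2d0556efae2b7b7f8761bffe8/cr8/cli.py#L8-L23",
            ),
        ],
    }
    write_predictions("model_predictions.csv", example)
```

Because rank is carried by row order, the helper writes each result list in the order it receives it and never sorts rows itself.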

### Submitting model predictions to W&B

You can submit your results to the benchmark as follows:

1. Run a training job with any script (your own or the baseline example provided, with or without W&B logging).
2. Generate your own file of model predictions following the format above and name it `model_predictions.csv`.
3. Upload a run to wandb with this `model_predictions.csv` file attached.

Our example script [src/predict.py](src/predict.py) takes care of steps 2 and 3 for a model whose training run has been logged to W&B, given the corresponding W&B run id, which you can find on the /overview page in the browser or by clicking the 'info' icon on a given run.

Here is a short example script that will create a run in W&B and perform the upload (step 3) for a local file of predictions:
```python
import wandb
wandb.init(project="codesearchnet", resume="must")
# Attach the local predictions file to the run so that it is uploaded.
wandb.save('model_predictions.csv')
```
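
Before uploading, it can be useful to check a predictions file locally. The sketch below is not part of the repository: `load_rankings` is a hypothetical helper that assumes the file has a header row, verifies that the `query`, `language`, and `url` columns are present and non-empty (`identifier` is optional), and recovers the per-query ranking implied by row order.

```python
import csv
from collections import defaultdict

# Columns that every row needs; `identifier` is optional per the format.
REQUIRED_FIELDS = ["query", "language", "url"]


def load_rankings(path):
    """Return {(query, language): [url, ...]} with URLs in row (rank) order."""
    rankings = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [name for name in REQUIRED_FIELDS if name not in header]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        # Data rows start on line 2 because line 1 is the header.
        for line_number, row in enumerate(reader, start=2):
            if any(not row[name] for name in REQUIRED_FIELDS):
                raise ValueError(
                    f"line {line_number}: query, language and url must be set"
                )
            # Earlier rows rank higher, so appending preserves the ranking.
            rankings[(row["query"], row["language"])].append(row["url"])
    return dict(rankings)


if __name__ == "__main__":
    for key, urls in load_rankings("model_predictions.csv").items():
        print(key, len(urls), "results")
```

Rows for the same query and language do not need to be adjacent; only their relative order matters for the ranking.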

### Publishing your submission

You've now generated all the content required to submit a run to the CodeSearchNet benchmark. Using the W&B GitHub integration you can now submit your model for review via the web app.

You can submit your runs by visiting the run page and clicking on the overview tab:
![](https://github.com/wandb/core/blob/master/frontends/app/src/assets/run-page-benchmark.png?raw=true)

or by selecting a run from the runs table:
![](https://app.wandb.ai/static/media/submit_benchmark_run.e286da0d.png)

### Result evaluation

Once you upload your `model_predictions.csv` file, W&B will compute the normalized cumulative gain (NCG) of your model's predictions against the human-annotated relevance scores. Further details on the evaluation process and metrics are in the root README. For transparency, we include the script used to evaluate submissions: [src/relevanceeval.py](src/relevanceeval.py).
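
The evaluation script itself is [src/relevanceeval.py](src/relevanceeval.py) and is not reproduced here. As a generic illustration of a normalized gain metric, the sketch below computes normalized discounted cumulative gain (NDCG) for a single query from relevance scores listed in ranked order. The discounted variant, the linear gain, and the log2 discount are all assumptions made for this example and may differ from what the benchmark actually computes.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: each score is divided by log2(rank + 1),
    where rank starts at 1 for the top result."""
    return sum(
        relevance / math.log2(position + 2)
        for position, relevance in enumerate(relevances)
    )


def ndcg(relevances):
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance scores for one query, in the order they were ranked.
    print(ndcg([3, 2, 1, 0]))  # an already ideal ordering scores 1.0
    print(ndcg([0, 3]))        # the relevant result ranked second scores lower
```

Normalizing by the ideal ordering keeps scores comparable across queries that have different numbers of relevant results.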

### Training the baseline model (optional)

Replicating our results for the CodeSearchNet baseline is optional, as we encourage the community to create their own models and methods for ranking search results. To replicate our baseline submission, you can start with the instructions in the [CodeSearchNet GitHub repository](https://github.com/ml-msr-github/CodeSearchNet). This baseline model uses [src/predict.py](src/predict.py) to generate the submission file.

Your run will be logged to W&B, within a project that will be automatically linked to this benchmark.

Diff for: CODE_OF_CONDUCT.md

+76
@@ -0,0 +1,76 @@
# Contributor Covenant Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at [email protected]. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq

Diff for: CONTRIBUTING.md

+47
@@ -0,0 +1,47 @@
## Contributing

[fork]: https://help.github.com/articles/fork-a-repo/
[pr]: https://help.github.com/articles/creating-a-pull-request/
[style]: https://www.python.org/dev/peps/pep-0008/
[code-of-conduct]: CODE_OF_CONDUCT.md
[azurepipelines]: azure-pipelines.yml
[benchmark]: BENCHMARK.md

Hi there! We're thrilled that you'd like to contribute to this project. Your help is essential for keeping it great.

Contributions to this project are [released](https://help.github.com/articles/github-terms-of-service/#6-contributions-under-repository-license) to the public under the [project's open source license](LICENSE).

Please note that this project is released with a [Contributor Code of Conduct][code-of-conduct]. By participating in this project you agree to abide by its terms.

## Scope

We anticipate that the community will design custom architectures and use frameworks other than TensorFlow. Furthermore, we anticipate that other datasets beyond the ones provided in this project might be useful. It is not our intention to integrate the best models and datasets into this repository as a superset of all available ideas. Rather, we intend to provide baseline approaches and a central place of reference with links to related repositories from the community. Therefore, we are accepting pull requests for the following items:

- Bug fixes
- Updates to documentation, including links to your project(s) where improvements to the baseline have been made
- Minor improvements to the code

Please open an issue if you are unsure regarding the best course of action.

## Submitting a pull request

0. [Fork][fork] and clone the repository
0. Configure and install the dependencies: `script/bootstrap`
0. Make sure the tests pass on your machine: see [azure-pipelines.yml][azurepipelines] for the tests we are currently running.
0. Create a new branch: `git checkout -b my-branch-name`
0. Make your change, add tests, and make sure the tests still pass.
0. Push to your fork and [submit a pull request][pr]
0. Pat yourself on the back and wait for your pull request to be reviewed and merged.

Here are a few things you can do that will increase the likelihood of your pull request being accepted:

- Follow the [style guide][style].
- Write tests.
- Keep your change as focused as possible. If there are multiple changes you would like to make that are not dependent upon each other, consider submitting them as separate pull requests.
- Write a [good commit message](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html).

## Resources

- [How to Contribute to Open Source](https://opensource.guide/how-to-contribute/)
- [Using Pull Requests](https://help.github.com/articles/about-pull-requests/)
- [GitHub Help](https://help.github.com)

Diff for: LICENSE

+21
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2019 GitHub

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
