First experiment #31

Open
wants to merge 10 commits into
base: master
3 changes: 3 additions & 0 deletions .dvc/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
/config.local
/tmp
/cache
5 changes: 5 additions & 0 deletions .dvc/config
@@ -0,0 +1,5 @@
[core]
    analytics = false
    remote = remote_storage
['remote "remote_storage"']
    url = /home/ajkumar/hackathonDVC/dvcDataStore
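The config above names a default remote under `[core]` and defines that remote's URL in a separate section. As a rough illustration of how the two sections relate, the sketch below resolves the default remote's URL with Python's standard `configparser`. This is only an assumption-laden illustration: DVC uses its own config loader, and the helper name `default_remote_url` is hypothetical.

```python
import configparser

# Config text copied from the .dvc/config added in this PR.
DVC_CONFIG = """\
[core]
    analytics = false
    remote = remote_storage
['remote "remote_storage"']
    url = /home/ajkumar/hackathonDVC/dvcDataStore
"""


def default_remote_url(config_text):
    """Return the URL of the remote named by core.remote (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    name = parser["core"]["remote"]
    # configparser keeps the single quotes as part of the section name.
    section = "'remote \"{}\"'".format(name)
    return parser[section]["url"]


print(default_remote_url(DVC_CONFIG))
```

The URL here is an absolute local path, so it resolves only on a machine that has that directory.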
3 changes: 3 additions & 0 deletions .dvcignore
@@ -0,0 +1,3 @@
# Add patterns of files dvc should ignore, which could improve
# the performance. Learn more at
# https://dvc.org/doc/user-guide/dvcignore
2 changes: 2 additions & 0 deletions data/prepared/.gitignore
@@ -0,0 +1,2 @@
/train.csv
/test.csv
4 changes: 4 additions & 0 deletions data/prepared/test.csv.dvc
@@ -0,0 +1,4 @@
outs:
- md5: 5151fcd60de0b43c2d18fde128ee5e09
  size: 83326
  path: test.csv
4 changes: 4 additions & 0 deletions data/prepared/train.csv.dvc
@@ -0,0 +1,4 @@
outs:
- md5: 0004b6e0e6d489f6902d6d6db47c24fe
  size: 206725
  path: train.csv
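Each `.dvc` pointer above records an `md5` and a `size` for the file it tracks. A minimal sketch of checking a local file against such a pointer follows, assuming the recorded `md5` is a plain digest of the file's bytes. Some DVC versions normalize line endings in text files before hashing, so a mismatch from this check is not conclusive. The helper names `file_md5` and `matches_pointer` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pointer(path, expected_md5, expected_size):
    """Return True if the file's size and MD5 both match the pointer values."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False  # cheap check first; skip hashing on a size mismatch
    return file_md5(path) == expected_md5
```

For the training split this would be called as `matches_pointer("data/prepared/train.csv", "0004b6e0e6d489f6902d6d6db47c24fe", 206725)`.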
2 changes: 2 additions & 0 deletions data/raw/.gitignore
@@ -0,0 +1,2 @@
/train
/val
5 changes: 5 additions & 0 deletions data/raw/train.dvc
@@ -0,0 +1,5 @@
outs:
- md5: 7adc7abb69056f4d7afb512c78f2fce9.dir
  size: 75309082
  nfiles: 9470
  path: train
5 changes: 5 additions & 0 deletions data/raw/val.dvc
@@ -0,0 +1,5 @@
outs:
- md5: 0ad4dcf197b452735726bf8d8777201d.dir
  size: 31248080
  nfiles: 3925
  path: val
1 change: 1 addition & 0 deletions metrics/accuracy.json
@@ -0,0 +1 @@
{"accuracy": 0.7351077313054499}
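The metrics file above is a single JSON object with one `accuracy` key. The evaluation script that produces it is not part of this diff, so the following is only a sketch of how such a file could be written and read back with the standard library. The function names `write_accuracy` and `read_accuracy` are hypothetical.

```python
import json
from pathlib import Path


def write_accuracy(metrics_path, accuracy):
    """Write {"accuracy": <value>} to metrics_path, creating parent dirs."""
    path = Path(metrics_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"accuracy": accuracy}))


def read_accuracy(metrics_path):
    """Read the accuracy value back from a metrics JSON file."""
    return json.loads(Path(metrics_path).read_text())["accuracy"]
```

Keeping the metric in a small JSON file lets each experiment's result be versioned alongside the code change that produced it.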
1 change: 1 addition & 0 deletions model/.gitignore
@@ -0,0 +1 @@
/model.joblib
4 changes: 4 additions & 0 deletions model/model.joblib.dvc
@@ -0,0 +1,4 @@
outs:
- md5: 13c7384218e443fdf941f153ce53d134
  size: 241222
  path: model.joblib
2 changes: 1 addition & 1 deletion src/train.py
@@ -37,7 +37,7 @@ def load_data(data_path):
 def main(repo_path):
     train_csv_path = repo_path / "data/prepared/train.csv"
     train_data, labels = load_data(train_csv_path)
-    sgd = SGDClassifier(max_iter=10)
+    sgd = SGDClassifier(max_iter=100)
     trained_model = sgd.fit(train_data, labels)
     dump(trained_model, repo_path / "model/model.joblib")
