Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP :update the annotation sql file #196

Closed
wants to merge 1 commit into from

Conversation

liniiiiii
Copy link
Collaborator

@i-be-snek , could you have a look of the db file, and if the tables are correct, we can ask Camila start working on this branch, thanks!

@liniiiiii liniiiiii linked an issue Nov 27, 2024 that may be closed by this pull request
2 tasks
Copy link
Collaborator

@i-be-snek i-be-snek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR needs significant changes to pass. Please implement the changes as requested and re-tag me to review it again.

I don't really think we agreed on how exactly the new annotations would look like, let alone the annotation table. I'm a bit surprised since I'm on this project for my knowledge in data engineering yet I don't really get consulted much about major design choices that are related to databases. My job and expertise have been reduced to reviewing code (which I'm happy to do, but it's not the only thing I should be doing in this labor division). In my professional opinion, this is NOT a good recipe for future annotations based on all the issues we faced in V1, and it has NOT been ironed out properly.

import sqlite3

# Connect to SQLite database (or create it if it doesn't exist)
conn = sqlite3.connect("/home/nl/Wikimpacts/Annotation_V2/impactDB_gold_V2.db")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not okay to have these paths hard-coded. You need to use the relative path where the repository is the root directory (so starting from Wikimpacts/).

conn = sqlite3.connect("/home/nl/Wikimpacts/Annotation_V2/impactDB_gold_V2.db")
cursor = conn.cursor()

# Create tables
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is redundant.

"""
)

# Commit changes and close the connection
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is redundant.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The choice of placing this database in a new folder called "Annotation_V2" is unhelpful and will clutter the repository. Gold annotations have always landed in Database/gold, so there is no reason for them to have a new directory. You need to reconsider where you want to place this data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When programming with a large group of people, it's absolutely vital to try to avoid repetitive code as much as possible. Here, you are basically making a new "copy" of the database but without any of the neat validation rules that the script above implements.

Also, all database related schemas have always been placed in Database/schema so there is no justification for a new directory.

These changes need to be implemented in Database/schema in a fashion similar to poetry run python3 Database/schema/create_db.py -db <DB-NAME> and with proper validation rules.

We now have two files called create_db.py. This kind of repetition is confusing and dangerous down the line.

Event_ID TEXT,
Country TEXT NOT NULL,
Country_GID TEXT,
Administrative_Areas TEXT,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be an array or TEXT?

@liniiiiii
Copy link
Collaborator Author

The PR needs significant changes to pass. Please implement the changes as requested and re-tag me to review it again.

I don't really think we agreed on how exactly the new annotations would look like, let alone the annotation table. I'm a bit surprised since I'm on this project for my knowledge in data engineering yet I don't really get consulted much about major design choices that are related to databases. My job and expertise have been reduced to reviewing code (which I'm happy to do, but it's not the only thing I should be doing in this labor division). In my professional opinion, this is NOT a good recipe for future annotations based on all the issues we faced in V1, and it has NOT been ironed out properly.

@i-be-snek , sure, thanks for your comments on this pr, I will share the tables with the group, and get a confirmed message for this, definitely, we can't decide the structure just by our two, I will double check the tables and email the group, thanks!

@liniiiiii
Copy link
Collaborator Author

@koffiworou, do you have further comments on this table, and I can update and share to the group for further check, thanks!

@liniiiiii liniiiiii closed this Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Annotations for V2 (to do branch)
2 participants