-
Notifications
You must be signed in to change notification settings - Fork 477
Sqlalchemy 2 + parent<>child relationship reworked #2646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
* sqlalchemy2 main database * Update database.py * Update lib/cuckoo/core/database.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Refactor SQLAlchemy models to use modern API Updated all ORM models to use SQLAlchemy 2.x style with DeclarativeBase, Mapped, and mapped_column. Replaced backref with explicit back_populates for relationships, added type hints, and modernized the session initialization function. Cleaned up legacy code and improved type safety and maintainability. * Update database.py * Update dist_db.py * Update lib/cuckoo/core/database.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update lib/cuckoo/core/database.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update database.py * Update database.py * Update database.py * fixes * Update dist_db.py * modernize db tests * Update test_analysis_manager.py * Update test_database.py * fix * Update database.py * Update test_database.py * fix tests * Update database.py * Update test_analysis_manager.py * dist port to sqlalchemy 2 * Update dist.py * Update analysisinfo.py --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @doomedraven, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly modernizes the application's database layer by upgrading to SQLAlchemy 2.0, which involved a comprehensive refactoring of all database models and query patterns. A key aspect of this refactoring is the complete overhaul of how parent-child relationships between samples are managed, moving to an explicit association model. Concurrently, the 'Shrike' integration has been entirely removed from the codebase, streamlining the application and removing unused functionality.
Highlights
- SQLAlchemy 2.0 Upgrade: All database models and query syntax have been migrated to leverage SQLAlchemy 2.0's modern declarative and ORM features, including
Mapped
,mapped_column
, andselect()
statements, enhancing type safety and query readability. - Reworked Parent-Child Relationships: The previous implicit parent-child relationship in
Sample
objects has been replaced with an explicitSampleAssociation
model. This new many-to-many association table links parent and childSample
objects via aTask
, providing a more robust and clear representation of derived samples. - Removal of 'Shrike' Feature: The 'Shrike' integration has been completely removed from the codebase. This includes its configuration (
conf/default/web.conf.default
), database columns (tasks
table), API endpoints (web/apiv2/urls.py
), web interface elements (web/templates/analysis/index.html
,web/templates/analysis/search.html
), and associated view functions and parameters. - Database Query Modernization: Numerous database interaction functions across
lib/cuckoo/core/database.py
andutils/dist.py
have been updated to use SQLAlchemy 2.0's idiomaticselect()
,delete()
, andupdate()
statements, replacing oldersession.query()
patterns for improved performance and maintainability. - Database Schema and Migration Cleanup: A new Alembic migration script (
2. Database cleanup.py
) has been added to perform schema changes, including alteringsamples.file_size
toBIGINT
and dropping deprecated columns related to the old parent-child model and the removed 'Shrike' feature. Older distributed database migration files have also been removed.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is a substantial and well-executed pull request that modernizes the database layer to SQLAlchemy 2.0 and refactors the parent-child relationship for samples. The move to Mapped
types, select()
statements, and back_populates
brings the code up to current best practices, improving both readability and likely performance. The new SampleAssociation
model for handling parent-child relationships is a robust solution.
My review has identified a few key areas for improvement:
- A critical issue in a database migration file where a table creation is commented out.
- A couple of high-severity issues related to test correctness, which could indicate underlying bugs.
- Several medium-severity issues concerning potential performance regressions (a missing index), code clarity, and incomplete refactoring work.
Overall, this is a high-quality contribution. Addressing the identified issues will help ensure the stability and maintainability of this significant refactoring.
# ToDo fix, idk why this one fails | ||
# assert task.machine_id == machine.id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -795,7 +837,7 @@ def get_ids(**kwargs): | |||
assert get_ids(options_like="minhook") == [t1] | |||
assert get_ids(options_not_like="minhook") == [t3, t2] | |||
assert get_ids(tags_tasks_like="1") == [t2] | |||
assert get_ids(task_ids=(t1, t2)) == [t2, t1] | |||
assert get_ids(task_ids=(t1, t2)) == [t1, t2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default sort order for list_tasks
is Task.added_on.desc()
. In the test setup, t2
is added after t1
, so t2.added_on
should be greater than t1.added_on
. With descending order, t2
should appear before t1
. The original assertion [t2, t1]
seems correct. The new assertion [t1, t2]
seems to imply an ascending order, which contradicts the default behavior. Please verify the sorting logic.
assert get_ids(task_ids=(t1, t2)) == [t1, t2] | |
assert get_ids(task_ids=(t1, t2)) == [t2, t1] |
@@ -2111,6 +2104,7 @@ def delete_task(self, task_id): | |||
if task is None: | |||
return False | |||
self.session.delete(task) | |||
# ToDo missed commits everywhere, check if autocommit is possible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ToDo
about missed commits is a valid point. Throughout the class, there are several places where changes are made to the session but self.session.commit()
is not called within the method. While the calling code might handle the transaction, it would be safer and more explicit to manage transactions within the methods that modify data, for example by using a with self.session.begin():
block. This would ensure atomicity and prevent leaving the session in a dirty state.
@@ -1347,7 +1385,9 @@ def submit_tasks(self, node_id, pend_tasks_num, options_like=False, force_push_p | |||
log.info("Pushed all tasks") | |||
return True | |||
|
|||
# ToDo not finished |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a ToDo not finished
here. It seems like the refactoring of this part of the code is incomplete. It would be good to either complete it or create a follow-up issue to track this. The commented-out """
block that follows should also be addressed, either by removing it or converting it to proper comments.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
This reverts commit b0e83f4.
No description provided.