Skip to content

Sqlalchemy 2 + parent<>child relationship reworked #2646

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 57 commits into
base: master
Choose a base branch
from

Conversation

doomedraven
Copy link
Collaborator

No description provided.

doomedraven and others added 19 commits March 1, 2025 08:14
* sqlalchemy2 main database

* Update database.py

* Update lib/cuckoo/core/database.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Refactor SQLAlchemy models to use modern API

Updated all ORM models to use SQLAlchemy 2.x style with DeclarativeBase, Mapped, and mapped_column. Replaced backref with explicit back_populates for relationships, added type hints, and modernized the session initialization function. Cleaned up legacy code and improved type safety and maintainability.

* Update database.py

* Update dist_db.py

* Update lib/cuckoo/core/database.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update lib/cuckoo/core/database.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update database.py

* Update database.py

* Update database.py

* fixes

* Update dist_db.py

* modernize db tests

* Update test_analysis_manager.py

* Update test_database.py

* fix

* Update database.py

* Update test_database.py

* fix tests

* Update database.py

* Update test_analysis_manager.py

* dist port to sqlalchemy 2

* Update dist.py

* Update analysisinfo.py

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @doomedraven, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly modernizes the application's database layer by upgrading to SQLAlchemy 2.0, which involved a comprehensive refactoring of all database models and query patterns. A key aspect of this refactoring is the complete overhaul of how parent-child relationships between samples are managed, moving to an explicit association model. Concurrently, the 'Shrike' integration has been entirely removed from the codebase, streamlining the application and removing unused functionality.

Highlights

  • SQLAlchemy 2.0 Upgrade: All database models and query syntax have been migrated to leverage SQLAlchemy 2.0's modern declarative and ORM features, including Mapped, mapped_column, and select() statements, enhancing type safety and query readability.
  • Reworked Parent-Child Relationships: The previous implicit parent-child relationship in Sample objects has been replaced with an explicit SampleAssociation model. This new many-to-many association table links parent and child Sample objects via a Task, providing a more robust and clear representation of derived samples.
  • Removal of 'Shrike' Feature: The 'Shrike' integration has been completely removed from the codebase. This includes its configuration (conf/default/web.conf.default), database columns (tasks table), API endpoints (web/apiv2/urls.py), web interface elements (web/templates/analysis/index.html, web/templates/analysis/search.html), and associated view functions and parameters.
  • Database Query Modernization: Numerous database interaction functions across lib/cuckoo/core/database.py and utils/dist.py have been updated to use SQLAlchemy 2.0's idiomatic select(), delete(), and update() statements, replacing older session.query() patterns for improved performance and maintainability.
  • Database Schema and Migration Cleanup: A new Alembic migration script (2. Database cleanup.py) has been added to perform schema changes, including altering samples.file_size to BIGINT and dropping deprecated columns related to the old parent-child model and the removed 'Shrike' feature. Older distributed database migration files have also been removed.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This is a substantial and well-executed pull request that modernizes the database layer to SQLAlchemy 2.0 and refactors the parent-child relationship for samples. The move to Mapped types, select() statements, and back_populates brings the code up to current best practices, improving both readability and likely performance. The new SampleAssociation model for handling parent-child relationships is a robust solution.

My review has identified a few key areas for improvement:

  • A critical issue in a database migration file where a table creation is commented out.
  • A couple of high-severity issues related to test correctness, which could indicate underlying bugs.
  • Several medium-severity issues concerning potential performance regressions (a missing index), code clarity, and incomplete refactoring work.

Overall, this is a high-quality contribution. Addressing the identified issues will help ensure the stability and maintainability of this significant refactoring.

Comment on lines +173 to +174
# ToDo fix, idk why this one fails
# assert task.machine_id == machine.id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This assertion is commented out with a ToDo indicating it fails. This suggests a potential bug in how task.machine_id is being set or persisted. It's important to fix this to ensure data integrity.

@@ -795,7 +837,7 @@ def get_ids(**kwargs):
assert get_ids(options_like="minhook") == [t1]
assert get_ids(options_not_like="minhook") == [t3, t2]
assert get_ids(tags_tasks_like="1") == [t2]
assert get_ids(task_ids=(t1, t2)) == [t2, t1]
assert get_ids(task_ids=(t1, t2)) == [t1, t2]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The default sort order for list_tasks is Task.added_on.desc(). In the test setup, t2 is added after t1, so t2.added_on should be greater than t1.added_on. With descending order, t2 should appear before t1. The original assertion [t2, t1] seems correct. The new assertion [t1, t2] seems to imply an ascending order, which contradicts the default behavior. Please verify the sorting logic.

Suggested change
assert get_ids(task_ids=(t1, t2)) == [t1, t2]
assert get_ids(task_ids=(t1, t2)) == [t2, t1]

@@ -2111,6 +2104,7 @@ def delete_task(self, task_id):
if task is None:
return False
self.session.delete(task)
# ToDo missed commits everywhere, check if autocommit is possible
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This ToDo about missed commits is a valid point. Throughout the class, there are several places where changes are made to the session but self.session.commit() is not called within the method. While the calling code might handle the transaction, it would be safer and more explicit to manage transactions within the methods that modify data, for example by using a with self.session.begin(): block. This would ensure atomicity and prevent leaving the session in a dirty state.

@@ -1347,7 +1385,9 @@ def submit_tasks(self, node_id, pend_tasks_num, options_like=False, force_push_p
log.info("Pushed all tasks")
return True

# ToDo not finished
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

I see a ToDo not finished here. It seems like the refactoring of this part of the code is incomplete. It would be good to either complete it or create a follow-up issue to track this. The commented-out """ block that follows should also be addressed, either by removing it or converting it to proper comments.

doomedraven and others added 7 commits July 15, 2025 12:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants