Sqlalchemy 2 + parent<>child relationship reworked #2646

doomedraven · 2025-07-15T12:44:51Z

No description provided.

* sqlalchemy2 main database
* Update database.py
* Update lib/cuckoo/core/database.py
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Refactor SQLAlchemy models to use modern API
  Updated all ORM models to use SQLAlchemy 2.x style with DeclarativeBase, Mapped, and mapped_column. Replaced backref with explicit back_populates for relationships, added type hints, and modernized the session initialization function. Cleaned up legacy code and improved type safety and maintainability.
* Update database.py
* Update dist_db.py
* Update lib/cuckoo/core/database.py
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update lib/cuckoo/core/database.py
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update database.py
* Update database.py
* Update database.py
* fixes
* Update dist_db.py
* modernize db tests
* Update test_analysis_manager.py
* Update test_database.py
* fix
* Update database.py
* Update test_database.py
* fix tests
* Update database.py
* Update test_analysis_manager.py
* dist port to sqlalchemy 2
* Update dist.py
* Update analysisinfo.py

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

gemini-code-assist

Summary of Changes

Hello @doomedraven, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly modernizes the application's database layer by upgrading to SQLAlchemy 2.0, which involved a comprehensive refactoring of all database models and query patterns. A key aspect of this refactoring is the complete overhaul of how parent-child relationships between samples are managed, moving to an explicit association model. Concurrently, the 'Shrike' integration has been entirely removed from the codebase, streamlining the application and removing unused functionality.

Highlights

SQLAlchemy 2.0 Upgrade: All database models and query syntax have been migrated to leverage SQLAlchemy 2.0's modern declarative and ORM features, including Mapped, mapped_column, and select() statements, enhancing type safety and query readability.
Reworked Parent-Child Relationships: The previous implicit parent-child relationship in Sample objects has been replaced with an explicit SampleAssociation model. This new many-to-many association table links parent and child Sample objects via a Task, providing a more robust and clear representation of derived samples.
Removal of 'Shrike' Feature: The 'Shrike' integration has been completely removed from the codebase. This includes its configuration (conf/default/web.conf.default), database columns (tasks table), API endpoints (web/apiv2/urls.py), web interface elements (web/templates/analysis/index.html, web/templates/analysis/search.html), and associated view functions and parameters.
Database Query Modernization: Numerous database interaction functions across lib/cuckoo/core/database.py and utils/dist.py have been updated to use SQLAlchemy 2.0's idiomatic select(), delete(), and update() statements, replacing older session.query() patterns for improved performance and maintainability.
Database Schema and Migration Cleanup: A new Alembic migration script (2. Database cleanup.py) has been added to perform schema changes, including altering samples.file_size to BIGINT and dropping deprecated columns related to the old parent-child model and the removed 'Shrike' feature. Older distributed database migration files have also been removed.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize your Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

gemini-code-assist

Code Review

This is a substantial and well-executed pull request that modernizes the database layer to SQLAlchemy 2.0 and refactors the parent-child relationship for samples. The move to Mapped types, select() statements, and back_populates brings the code up to current best practices, improving both readability and likely performance. The new SampleAssociation model for handling parent-child relationships is a robust solution.

My review has identified a few key areas for improvement:

A critical issue in a database migration file where a table creation is commented out.
A couple of high-severity issues related to test correctness, which could indicate underlying bugs.
Several medium-severity issues concerning potential performance regressions (a missing index), code clarity, and incomplete refactoring work.

Overall, this is a high-quality contribution. Addressing the identified issues will help ensure the stability and maintainability of this significant refactoring.

utils/db_migration/versions/2. Database cleanup.py

gemini-code-assist · 2025-07-15T12:48:50Z

tests/test_analysis_manager.py

+            # ToDo fix, idk why this one fails
+            # assert task.machine_id == machine.id


This assertion is commented out with a ToDo indicating it fails. This suggests a potential bug in how task.machine_id is being set or persisted. It's important to fix this to ensure data integrity.

gemini-code-assist · 2025-07-15T12:48:50Z

tests/test_database.py

@@ -795,7 +837,7 @@ def get_ids(**kwargs):
            assert get_ids(options_like="minhook") == [t1]
            assert get_ids(options_not_like="minhook") == [t3, t2]
            assert get_ids(tags_tasks_like="1") == [t2]
-            assert get_ids(task_ids=(t1, t2)) == [t2, t1]
+            assert get_ids(task_ids=(t1, t2)) == [t1, t2]


The default sort order for list_tasks is Task.added_on.desc(). In the test setup, t2 is added after t1, so t2.added_on should be greater than t1.added_on. With descending order, t2 should appear before t1. The original assertion [t2, t1] seems correct. The new assertion [t1, t2] seems to imply an ascending order, which contradicts the default behavior. Please verify the sorting logic.

Suggested change

assert get_ids(task_ids=(t1, t2)) == [t1, t2]

assert get_ids(task_ids=(t1, t2)) == [t2, t1]

lib/cuckoo/common/dist_db.py

gemini-code-assist · 2025-07-15T12:48:50Z

lib/cuckoo/core/database.py

@@ -2111,6 +2104,7 @@ def delete_task(self, task_id):
        if task is None:
            return False
        self.session.delete(task)
+        # ToDo missed commits everywhere, check if autocommit is possible


This ToDo about missed commits is a valid point. Throughout the class, there are several places where changes are made to the session but self.session.commit() is not called within the method. While the calling code might handle the transaction, it would be safer and more explicit to manage transactions within the methods that modify data, for example by using a with self.session.begin(): block. This would ensure atomicity and prevent leaving the session in a dirty state.

gemini-code-assist · 2025-07-15T12:48:51Z

utils/dist.py

@@ -1347,7 +1385,9 @@ def submit_tasks(self, node_id, pend_tasks_num, options_like=False, force_push_p
                                log.info("Pushed all tasks")
                                return True

+                    # ToDo not finished


I see a ToDo not finished here. It seems like the refactoring of this part of the code is incomplete. It would be good to either complete it or create a follow-up issue to track this. The commented-out """ block that follows should also be addressed, either by removing it or converting it to proper comments.


doomedraven and others added 30 commits February 2, 2025 13:04

Ruff (#2410) · e2bd101
Update zip_utils.py · ff09024
Update zip_utils.py · 7871a71
Merge branch 'master' into staging · 462fb0c
Merge branch 'master' into staging · 8db0425
Update rooter.py · 82be043
Merge branch 'master' into staging · 2299dc0
Update plugins.py · 8228c21
Update abstracts.py · 6f4df2b
Update plugins.py · e812fe1
Update plugins.py · bf15a9d
Merge branch 'master' into staging · 07b9f84
Merge branch 'master' into staging · 54de070
Merge branch 'master' into staging · b392b01
Update tls.py · bcec6d9
sync · 269a449
Merge branch 'master' into staging · e145063
sync · a8f768a
Merge branch 'master' into staging · 3cd15fd
Downloaders (#2493) · 50d7f54
ci: Update requirements.txt · 12999c2
improve docs · 9145bac
typo · 07afabf
fix typo · 61bbeae
Merge branch 'master' into staging · 47b0c34
Update load_extra_modules.py · 37e8eb8
sync · c12b2f0
fix static · 85777cc
fix rtf · d8082fd
Merge branch 'master' into staging · ae22386

doomedraven and others added 19 commits March 1, 2025 08:14

sync · 0b1c8e3
Create test_strings.py · 5c37b79
Update database.py · d51af8f
Update database.py · d4e6557
Update test_strings.py · 5e45d94
Update changelog.md · f2d604a
Merge branch 'master' into staging · 747a069
Merge branch 'master' into staging · 245bed8
ci: Update requirements.txt · 7c3e65b
Merge branch 'master' into staging · 7a27af3
ci: Update requirements.txt · 3c287cd
Merge branch 'master' into staging · 4f6e17c
ci: Update requirements.txt · be0e80a
Update dist.py · a637ad1
Merge branch 'master' into staging · 746be3f
db cleanup (#2644) · 28e4ddf
Proper relationship parent-child samples (#2645) · 13e1e1e
drop also old dist migrations · 100b7cf

gemini-code-assist bot reviewed Jul 15, 2025


Update web_utils.py · 32e3170

gemini-code-assist bot reviewed Jul 15, 2025


doomedraven and others added 7 commits July 15, 2025 12:53

Update lib/cuckoo/common/dist_db.py · cc2584a
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Update 2. Database cleanup.py · 150b933
Update web_utils.py · d4380a3
Update dist.py · e44ea81
Create iocs.py · 33a2a8c
Update views.py · b0e83f4
Revert "Update views.py" · c325da5
  This reverts commit b0e83f4.
