Add skipDuplicates() scope guard to createDocuments #855

Closed
premtsd-code wants to merge 1 commit into main from feat/skip-duplicates

Conversation


@premtsd-code premtsd-code commented Apr 15, 2026

Summary

Adds a RAII-style skipDuplicates() scope guard for batch inserts, silencing duplicate-key errors at the adapter layer via dialect-specific no-op inserts. Primary use cases: idempotent CSV re-imports and database migration re-runs.

$db->skipDuplicates(fn () => $db->createDocuments($collection, $docs));

Adapter implementation

  • MariaDB / MySQL: INSERT IGNORE INTO
  • Postgres: INSERT INTO ... ON CONFLICT (...) DO NOTHING
  • SQLite: INSERT OR IGNORE INTO
  • Mongo: upsert + $setOnInsert, bypassing the transaction wrap to avoid the txn-abort-on-duplicate behavior of insertMany
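The SQL dialect split above can be sketched as a single dispatch over the adapter family. This is illustrative only: the helper name, the single-column statement shape, and the `_uid` conflict target are assumptions for the sketch, not the library's real API (the PR centralizes this via per-adapter hook methods instead).

```php
<?php

// Illustrative sketch of the dialect-specific duplicate-tolerant INSERT forms
// listed above. Function name and statement shape are assumptions.
function duplicateTolerantInsert(string $dialect, string $table): string
{
    return match ($dialect) {
        'mariadb', 'mysql' => "INSERT IGNORE INTO {$table} (_uid) VALUES (:uid)",
        'postgres' => "INSERT INTO {$table} (_uid) VALUES (:uid) ON CONFLICT (_uid) DO NOTHING",
        'sqlite' => "INSERT OR IGNORE INTO {$table} (_uid) VALUES (:uid)",
        default => throw new InvalidArgumentException("Unknown dialect: {$dialect}"),
    };
}
```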

Hot-path cost

One property read + one boolean branch per chunk. Callers that don't use skipDuplicates pay zero closure allocation and zero scope-guard setup. Per-chunk overhead only applies when the feature is actually used.
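The flag-set/restore mechanics behind this can be sketched as below. The class and method names mirror the PR's description but the exact signatures are assumptions; the point is the try/finally restore that gives the RAII-style guarantee (including across nested scopes and thrown exceptions).

```php
<?php

// Minimal sketch of the scope guard described above: set the flag, run the
// callback, restore the previous value on any exit path.
class AdapterSketch
{
    protected bool $skipDuplicates = false;

    /** Run $callback with duplicate-skipping enabled; restore on exit. */
    public function skipDuplicates(callable $callback): mixed
    {
        $previous = $this->skipDuplicates;
        $this->skipDuplicates = true;
        try {
            return $callback();
        } finally {
            $this->skipDuplicates = $previous; // supports nested scopes
        }
    }

    public function isSkippingDuplicates(): bool
    {
        return $this->skipDuplicates;
    }
}

$adapter = new AdapterSketch();
$nested = $adapter->skipDuplicates(
    fn () => $adapter->skipDuplicates(fn () => $adapter->isSkippingDuplicates())
);
// $nested is true inside the guard; the flag is false again after it exits.
```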

Mirror dual-write correctness

Mirror::createDocuments in skipDuplicates mode now re-fetches source's authoritative state via find() after the adapter insert settles, then forwards that to destination instead of the caller's input. Race-free regardless of concurrent writers — destination always receives a faithful snapshot of whatever source holds.

This replaces a prior captureOnNext-based approach that forwarded would-be values for source-skipped duplicates, diverging destination from source. Addresses Greptile flag #3084293974.

Cost: one extra SELECT against source per batch when skipDuplicates is active and Mirror has an upgraded destination. Zero cost on the non-skip path.
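The re-fetch-then-forward flow can be demonstrated with a small in-memory model. The `Store` class below is a stand-in for the real source/destination adapters, not the library's API; it only exists to show why forwarding the authoritative snapshot (rather than the caller's input) keeps destination identical to source when a duplicate is skipped.

```php
<?php

// In-memory sketch of the Mirror dual-write described above.
class Store
{
    /** @var array<string, array> documents keyed by ID */
    public array $docs = [];
    private bool $skip = false;

    public function skipDuplicates(callable $fn): mixed
    {
        $prev = $this->skip;
        $this->skip = true;
        try {
            return $fn();
        } finally {
            $this->skip = $prev;
        }
    }

    public function createDocuments(array $docs): int
    {
        $inserted = 0;
        foreach ($docs as $doc) {
            if (isset($this->docs[$doc['id']])) {
                if (!$this->skip) {
                    throw new RuntimeException("Duplicate: {$doc['id']}");
                }
                continue; // skip-duplicates no-op: existing row wins
            }
            $this->docs[$doc['id']] = $doc;
            $inserted++;
        }
        return $inserted;
    }

    /** @param list<string> $ids */
    public function find(array $ids): array
    {
        return array_values(array_intersect_key($this->docs, array_flip($ids)));
    }
}

$source = new Store();
$destination = new Store();
$source->docs['a'] = ['id' => 'a', 'v' => 'original']; // pre-existing in source

$input = [['id' => 'a', 'v' => 'would-be'], ['id' => 'b', 'v' => 'new']];
$source->skipDuplicates(fn () => $source->createDocuments($input));

// Forward the authoritative snapshot, not $input: 'a' keeps v='original',
// so destination cannot diverge from source on the skipped duplicate.
$authoritative = $source->find(['a', 'b']);
$destination->skipDuplicates(fn () => $destination->createDocuments($authoritative));
```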

upsertDocumentsWithIncrease tenant grouping

Restored per-tenant grouping for the existing-doc lookup in sharedTables + tenantPerDocument mode. The previously inlined find() ran under the session tenant context (null for platform workers) and silently missed rows belonging to other tenants, breaking the appwrite StatsUsage worker that flushes stats across projects.

Per-tenant grouping uses K queries (K = unique tenants in the batch) instead of N (per-doc) or 1 (broken). The common single-tenant path still hits the fast batched code.
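The grouping step can be sketched as follows. The `['tenant' => ..., 'id' => ...]` document shape is an illustrative assumption standing in for the real `Document` objects; each resulting group then becomes one tenant-scoped lookup query.

```php
<?php

// Sketch of the per-tenant grouping described above: K lookup queries, one per
// unique tenant in the batch, instead of one per document.
function groupIdsByTenant(array $documents): array
{
    $idsByTenant = [];
    foreach ($documents as $doc) {
        if ($doc['id'] !== '') {
            $idsByTenant[$doc['tenant']][] = $doc['id'];
        }
    }
    // De-duplicate IDs within each tenant group.
    return array_map(
        fn (array $ids) => array_values(array_unique($ids)),
        $idsByTenant
    );
}

$batch = [
    ['tenant' => 1, 'id' => 'stats-a'],
    ['tenant' => 2, 'id' => 'stats-b'],
    ['tenant' => 1, 'id' => 'stats-a'], // intra-batch duplicate
    ['tenant' => 1, 'id' => 'stats-c'],
];
$groups = groupIdsByTenant($batch);
// 2 unique tenants => 2 lookup queries for a 4-document batch
```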

Test coverage

  • Every adapter: MariaDB, MySQL, Postgres, SQLite, Mongo
  • Intra-batch duplicates, cross-batch duplicates, nested scope, large batches, relationships
  • Mirror dual-write with authoritative-state forwarding (includes the Greptile divergence assertion)

Test plan

  • MirrorTest — 635 tests, 11035 assertions green
  • PostgresTest, SQLiteTest, MariaDBTest, MySQLTest, MongoDBTest skipDuplicates suites green
  • Full createDocuments test sweep on Postgres + SQLite

Replaces

Replaces #852, which accumulated 91 commits of mid-review churn including merges from other PRs already in main. This branch is a clean squash of just the feature work (10 files, +789 / -17) against current main.

Summary by CodeRabbit

Release Notes

  • New Features
    • Added skipDuplicates() functionality enabling bulk document creation to silently skip duplicate entries instead of throwing errors
    • Supports nested scopes and works consistently across all database adapters, including mirrored environments

Introduces a RAII-style scope guard that lets callers opt into silent
duplicate handling for batch inserts. Duplicate-key errors during the
wrapped call are silenced at the adapter layer via dialect-specific
no-op inserts; everything outside the guarded call behaves normally.

Usage:

    $db->skipDuplicates(fn () => $db->createDocuments($collection, $docs));

Adapter dialects:
- MariaDB/MySQL: INSERT IGNORE INTO
- Postgres: INSERT INTO ... ON CONFLICT (...) DO NOTHING
- SQLite: INSERT OR IGNORE INTO
- Mongo: upsert + \$setOnInsert, bypassing the transaction wrap to
  avoid txn-abort-on-duplicate

Hot-path cost: one property read + one boolean branch per chunk.
Callers that don't use skipDuplicates pay no closure allocation and
no scope setup.

Mirror::createDocuments: the skipDuplicates path re-fetches source's
authoritative state via find() after the adapter insert settles, then
forwards that to destination instead of the caller's input. This is
race-free because the query runs after source's write has resolved —
regardless of whether rows were inserted, skipped, or raced with
concurrent writers, destination always receives a faithful snapshot
of source. Fixes Greptile #3084293974: captureOnNext-based approach
forwarded would-be values for source-skipped duplicates, diverging
destination from source.

upsertDocumentsWithIncrease: restore per-tenant grouping for the
existing-doc lookup when running in sharedTables + tenantPerDocument
mode. The previously-inlined find() ran under the session tenant
context (null for platform workers) and silently missed rows from
other tenants, which broke the StatsUsage worker flushing stats
across projects. Per-tenant grouping uses K queries (K = unique
tenants in the batch) instead of N (per-doc) or 1 (broken). The
common single-tenant path still hits the fast batched code.

Tests cover every adapter: intra-batch duplicates, cross-batch
duplicates, nested scope, large batches, relationships, Mirror
dual-write with authoritative state forwarding.

coderabbitai bot commented Apr 15, 2026

📝 Walkthrough

Walkthrough

This PR introduces a skipDuplicates() feature that allows bulk document creation to gracefully skip duplicate key errors across all database adapters. The implementation adds a temporary state flag and callback wrapper to the base Adapter and Database classes, with adapter-specific behaviors: Mongo bypasses transactions and uses upserts; SQL-based adapters use dialect-specific duplicate-handling clauses (INSERT IGNORE, ON CONFLICT DO NOTHING, INSERT OR IGNORE); Pool delegates the flag through its routing layer; and Mirror ensures source-destination synchronization when skipping duplicates.

Changes

Cohort / File(s) Summary
Adapter Infrastructure
src/Database/Adapter.php, src/Database/Database.php
Added skipDuplicates state flag and public method to wrap callback execution with temporary flag enablement and restoration logic. Database class also adds tenant-aware key computation and refactors batch creation to conditionally route through skipDuplicates and optimizes prefetching via single batch queries.
Mongo Adapter
src/Database/Adapter/Mongo.php
Modified withTransaction() to bypass transaction wrapping when flag enabled; updated createDocuments() to build per-document upsert operations using _uid and optional _tenant filters, with $setOnInsert payload exclusions, instead of direct insertMany().
SQL Adapters
src/Database/Adapter/SQL.php, src/Database/Adapter/Postgres.php, src/Database/Adapter/SQLite.php
Added protected helper methods (getInsertKeyword(), getInsertSuffix(), getInsertPermissionsSuffix()) to centralize adapter-specific duplicate-handling syntax; SQL.php refactored bulk INSERT construction to use these hooks; Postgres/SQLite implement dialect-specific ON CONFLICT/INSERT OR IGNORE clauses.
Pool Adapter
src/Database/Adapter/Pool.php
Updated delegate() and withTransaction() to conditionally wrap adapter method invocations with skipDuplicates() routing when flag enabled, supporting both pinned and borrowed adapter paths.
Mirror Adapter
src/Database/Mirror.php
Enhanced createDocuments() to perform source write via skipDuplicates(), then re-fetch authoritative documents from source, apply write filters, and propagate to destination with skipDuplicates() wrapping and withPreserveDates() context; includes error logging for destination writes.
End-to-End Tests
tests/e2e/Adapter/MirrorTest.php, tests/e2e/Adapter/Scopes/DocumentTests.php
Added comprehensive e2e tests validating skipDuplicates behavior: Mirror test verifies source-destination consistency; DocumentTests suite covers duplicate skipping, batching, nesting, relationships, and interaction with onNext/onError callbacks.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Database
    participant Adapter as Adapter<br/>(Mongo/SQL/etc.)
    participant DB as Database<br/>Backend

    User->>Database: skipDuplicates(callback)
    Database->>Database: Set skipDuplicates flag = true
    Database->>Database: Execute callback<br/>(createDocuments)
    Database->>Adapter: skipDuplicates(insert_fn)
    Adapter->>Adapter: Set skipDuplicates flag = true
    Adapter->>Adapter: Execute insert_fn()
    
    alt Mongo Adapter
        Adapter->>DB: upsert with filter {_uid, _tenant?}
    else SQL Adapter
        Adapter->>DB: INSERT...ON CONFLICT DO NOTHING
    else SQLite Adapter
        Adapter->>DB: INSERT OR IGNORE INTO...
    end
    
    DB-->>Adapter: Success (skip duplicates)
    Adapter->>Adapter: Restore skipDuplicates flag
    Adapter-->>Database: Return result
    Database->>Database: Restore skipDuplicates flag
    Database-->>User: Return result
sequenceDiagram
    participant User
    participant Pool
    participant PinnedAdapter as Pinned<br/>Adapter
    participant BorrowedAdapter as Borrowed<br/>Adapter
    participant DB as Database<br/>Backend

    User->>Pool: delegate(method, args)
    Pool->>Pool: Check skipDuplicates flag
    
    alt Transaction active (pinned)
        Pool->>PinnedAdapter: skipDuplicates(fn => method(...args))
        PinnedAdapter->>PinnedAdapter: Set flag, execute method
        PinnedAdapter->>DB: Delegated operation
        DB-->>PinnedAdapter: Result
        PinnedAdapter-->>Pool: Return result
    else No transaction (borrow adapter)
        Pool->>BorrowedAdapter: Configure (tenant, auth, etc.)
        Pool->>BorrowedAdapter: skipDuplicates(fn => method(...args))
        BorrowedAdapter->>BorrowedAdapter: Set flag, execute method
        BorrowedAdapter->>DB: Delegated operation
        DB-->>BorrowedAdapter: Result
        BorrowedAdapter-->>Pool: Return result
    end
    
    Pool-->>User: Return result
sequenceDiagram
    participant User
    participant Mirror
    participant SourceDB as Source<br/>Adapter
    participant DestDB as Destination<br/>Adapter

    User->>Mirror: createDocuments([doc1_dup, doc2_new])
    Mirror->>SourceDB: skipDuplicates(fn => createDocuments(...))
    SourceDB->>SourceDB: Apply skipDuplicates logic
    SourceDB-->>Mirror: Modified count (1 inserted: doc2_new)
    
    Mirror->>Mirror: Filter non-empty IDs
    Mirror->>SourceDB: find(filtered_ids)
    SourceDB-->>Mirror: Authoritative docs [doc2_new]
    
    Mirror->>Mirror: Clone, apply writeFilters
    Mirror->>DestDB: skipDuplicates(fn => createDocuments([doc2_new]))
    DestDB->>DestDB: Apply skipDuplicates logic
    DestDB-->>Mirror: Success
    
    Mirror->>Mirror: Run afterCreateDocument filters
    Mirror-->>User: Return modified count

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Feat mongodb #721: Modifies Mongo adapter withTransaction and createDocuments pathways that are directly extended by this PR's skipDuplicates transaction-bypass logic.
  • Upsert fix for postgres #613: Adds upsert/duplicate-handling logic to SQL and Postgres adapters, shared pattern with this PR's ON CONFLICT and helper method centralization.
  • Feat mongo tmp #647: Modifies the same adapter-level classes (Adapter, Mongo, Pool, SQL, Database) including withTransaction and createDocuments methods affected by skipDuplicates routing.

Suggested reviewers

  • ArnabChatterjee20k
  • abnegate
  • fogelito

Poem

🐰 A skip through duplicates, so neat,
Where upserts and ignores compete,
Mongo, SQL, in harmony dance,
Pool routes the calls with a hop and a prance,
Mirror stays true with each duplicate chance! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 47.06%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title clearly and accurately summarizes the main change: introducing a skipDuplicates() scope guard mechanism for createDocuments batch inserts across the adapter layer.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 PHPStan (2.1.46)

Composer install failed: this project depends on private packages that require authentication (e.g. GitLab/GitHub, Laravel Nova, etc.).
CodeRabbit tooling environment cannot access private registries.
If your project requires private packages, disable the PHPStan tool in your coderabbit settings.

Instead, run PHPStan in a CI/CD pipeline where you can use custom packages — our pipeline remediation tool can use the PHPStan output from your CI/CD pipeline.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Database/Adapter/Mongo.php (1)

1477-1529: ⚠️ Potential issue | 🟠 Major

Don't reuse an active session in the skip-duplicates branch.

$options is populated before the branch, so an explicitly started Mongo transaction still passes session into upsert(). That reintroduces the same txn-abort/write-conflict path this code is trying to avoid.

Proposed fix
-        $options = $this->getTransactionOptions();
         $records = [];
         $hasSequence = null;
         $documents = \array_map(fn ($doc) => clone $doc, $documents);
@@
         // insertMany aborts the txn on any duplicate; upsert + $setOnInsert no-ops instead.
         if ($this->skipDuplicates) {
+            $options = [];
             if (empty($records)) {
                 return [];
             }
@@
             return $documents;
         }
 
+        $options = $this->getTransactionOptions();
         try {
             $documents = $this->client->insertMany($name, $records, $options);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/Mongo.php` around lines 1477 - 1529, The code passes
$options (from getTransactionOptions()) including an active session into
client->upsert() in the skip-duplicates branch; clone $options to a local
variable (e.g. $upsertOptions = $options) and remove the session before calling
upsert (unset $upsertOptions['session'] or remove the session entry if options
is an object) so upsert() is executed outside the active transaction; update the
call to $this->client->upsert($name, $operations, $upsertOptions) and keep the
rest of the skip-duplicates logic unchanged.
🧹 Nitpick comments (1)
src/Database/Adapter/SQL.php (1)

1036-1039: Base INSERT keyword should default to the portable form; MySQL-family adapters can override.

INSERT IGNORE is MySQL/MariaDB-specific syntax. While all adapters that use suffix-based duplicate handling (Postgres, SQLite) currently override getInsertKeyword() correctly, the base implementation should default to plain INSERT INTO for better portability and defensive design. This makes it safer for future adapters that may inherit from SQL without intending to use INSERT IGNORE.

Consider moving the MySQL-specific logic to MySQL.php and MariaDB.php overrides:

  • Base SQL::getInsertKeyword() → return 'INSERT INTO'
  • MySQL.php and MariaDB.php → override to return skipDuplicates ? 'INSERT IGNORE INTO' : 'INSERT INTO'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/SQL.php` around lines 1036 - 1039, Change the base SQL
class's getInsertKeyword() to return the portable default 'INSERT INTO' instead
of the MySQL-specific 'INSERT IGNORE INTO' by removing the skipDuplicates branch
from SQL::getInsertKeyword(); then add overrides in the MySQL and MariaDB
adapters (implement getInsertKeyword() in MySQL.php and MariaDB.php) that return
skipDuplicates ? 'INSERT IGNORE INTO' : 'INSERT INTO' using the existing
skipDuplicates property so only MySQL-family adapters emit IGNORE while other
adapters remain portable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Adapter/SQLite.php`:
- Around line 1940-1943: The current getInsertKeyword() returns 'INSERT OR
IGNORE INTO', which silences all constraint errors; change the logic so when
$this->skipDuplicates is true the generated SQL uses SQLite's targeted conflict
clause for the _uid column (e.g., use an INSERT ... ON CONFLICT(_uid) DO NOTHING
form) instead of INSERT OR IGNORE; update getInsertKeyword() (and any place that
composes the INSERT statement if it only expects a keyword) to emit the ON
CONFLICT(_uid) DO NOTHING variant so only _uid uniqueness conflicts are
suppressed while other constraint violations still raise errors.

In `@src/Database/Database.php`:
- Around line 7153-7188: The pre-read find() calls build Query::equal('$id',
...) over full ID lists which can exceed DocumentsValidator limits; modify both
branches (the tenant-batched branch that builds $idsByTenant and the else branch
that builds $docIds) to split each ID array into chunks using array_chunk($ids,
max(1, $this->maxQueryValues)) and call find() for each chunk (preserving the
same withTenant/authorization/silent wrappers), collecting/merging all returned
documents into $existingDocs (use the same keys: "$tenant:$id" in the tenant
branch and $this->tenantKey($doc) in the else branch). Apply the same chunking
approach to the related callers createDocuments() and
upsertDocumentsWithIncrease() when they prefetch existing document IDs so no
single Query::equal('$id', ...) exceeds $this->maxQueryValues.
- Around line 5728-5732: The adapter's skipDuplicates guard is applied only
around createDocuments, but createDocumentRelationships runs earlier and can
mutate related rows before the insert; change the flow so the skipDuplicates
context (adapter->skipDuplicates or equivalent flag) and the withTransaction are
established before calling createDocumentRelationships and createDocuments,
i.e., wrap both createDocumentRelationships(...) and the $insert closure inside
the adapter->skipDuplicates(...) (or set the adapter flag prior to relationship
handling) so relationship side effects are subject to the same dedupe guard and
transaction; update references around createDocumentRelationships, $insert,
withTransaction, adapter->skipDuplicates and adapter->createDocuments
accordingly.

In `@src/Database/Mirror.php`:
- Around line 632-645: The current re-fetch uses a single Query::equal('$id',
$ids) which can fail validation or miss rows for mixed-tenant/large batches;
change the logic in Mirror::(around the $authoritative fetch) to reuse the same
chunking and tenant-grouping approach used in Database::createDocuments() and
Database::upsertDocumentsWithIncrease(): split $ids into chunks with
array_chunk($ids, max(1, $this->maxQueryValues)), group ids by tenant (or run
per-current-tenant when non-multi-tenant), call $this->source->silent(fn() =>
$this->source->find($collection, [Query::equal('$id', $chunk),
Query::limit(count($chunk))])) for each chunk, aggregate the results into
$authoritative, and then continue so skipDuplicates() runs per-chunk/tenant
instead of on one large mixed batch.
- Around line 641-646: The authoritative re-fetch must bypass the caller's read
authorization so it returns all newly created docs; replace the current
caller-scoped fetch ($authoritative = $this->source->silent(fn () =>
$this->source->find(...))) with an elevated/system context call that disables
read authorization (e.g. use the data source's
"runAsSystem"/"withoutAuthorization"/auth-bypass helper to invoke
$this->source->find($collection, [Query::equal('$id', $ids),
Query::limit(count($ids))]) so the fetch is performed with full privileges and
cannot be limited by the caller's permissions.

In `@tests/e2e/Adapter/Scopes/DocumentTests.php`:
- Around line 8029-8059: Add a test variant that exercises exceptional exits of
the RAII-style scope guard: call $database->skipDuplicates(...) with an inner or
outer callback that throws (use a deliberate throw or trigger
DuplicateException) and ensure both the thrown exception is caught in the test
and that after the exception the skip flag is restored by calling
$database->createDocuments($collection, [$makeDoc('seed','ShouldThrow')]) and
asserting it throws a DuplicateException; reference the existing nested test
using skipDuplicates and createDocuments and add analogous cases where the inner
callback throws and where the outer callback throws, then assert a plain
createDocuments still raises DuplicateException afterward.
- Around line 7859-8262: Add a new test (e.g.,
testCreateDocumentsSkipDuplicatesCustomUniqueIndex) that creates a collection,
creates a unique attribute (like 'email' via Database::createAttribute with the
unique flag), seeds one document with an email, then calls
$database->skipDuplicates(...) to createDocuments with a different $id but the
same email and assert the intended behavior: either the call throws
DuplicateException or (if the adapter intentionally uses blanket INSERT IGNORE)
verify no overwrite occurred and only one row with that email exists; use
$database->createAttribute, $database->createDocuments,
$database->skipDuplicates, $database->getAdapter(), $database->getDocument and
$database->find to implement the assertions and branch expectations based on
adapter characteristics so the test fails if custom-unique violations are
silently suppressed incorrectly.

---

Outside diff comments:
In `@src/Database/Adapter/Mongo.php`:
- Around line 1477-1529: The code passes $options (from getTransactionOptions())
including an active session into client->upsert() in the skip-duplicates branch;
clone $options to a local variable (e.g. $upsertOptions = $options) and remove
the session before calling upsert (unset $upsertOptions['session'] or remove the
session entry if options is an object) so upsert() is executed outside the
active transaction; update the call to $this->client->upsert($name, $operations,
$upsertOptions) and keep the rest of the skip-duplicates logic unchanged.

---

Nitpick comments:
In `@src/Database/Adapter/SQL.php`:
- Around line 1036-1039: Change the base SQL class's getInsertKeyword() to
return the portable default 'INSERT INTO' instead of the MySQL-specific 'INSERT
IGNORE INTO' by removing the skipDuplicates branch from SQL::getInsertKeyword();
then add overrides in the MySQL and MariaDB adapters (implement
getInsertKeyword() in MySQL.php and MariaDB.php) that return skipDuplicates ?
'INSERT IGNORE INTO' : 'INSERT INTO' using the existing skipDuplicates property
so only MySQL-family adapters emit IGNORE while other adapters remain portable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e21e7e44-cb27-40b8-b581-18d92f3bb185

📥 Commits

Reviewing files that changed from the base of the PR and between ee2d7d4 and cac2679.

📒 Files selected for processing (10)
  • src/Database/Adapter.php
  • src/Database/Adapter/Mongo.php
  • src/Database/Adapter/Pool.php
  • src/Database/Adapter/Postgres.php
  • src/Database/Adapter/SQL.php
  • src/Database/Adapter/SQLite.php
  • src/Database/Database.php
  • src/Database/Mirror.php
  • tests/e2e/Adapter/MirrorTest.php
  • tests/e2e/Adapter/Scopes/DocumentTests.php

Comment on lines +1940 to +1943
protected function getInsertKeyword(): string
{
    return $this->skipDuplicates ? 'INSERT OR IGNORE INTO' : 'INSERT INTO';
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Does SQLite INSERT OR IGNORE suppress NOT NULL and CHECK violations in addition to UNIQUE/PRIMARY KEY conflicts, and since which SQLite version is INSERT ... ON CONFLICT(column-list) DO NOTHING supported?

💡 Result:

Yes, SQLite INSERT OR IGNORE suppresses NOT NULL violations (skips the row silently) in addition to UNIQUE/PRIMARY KEY conflicts, but it does NOT suppress CHECK violations (those cause ABORT like default). INSERT ... ON CONFLICT(column-list) DO NOTHING (the UPSERT syntax specifying columns) has been supported since SQLite version 3.24.0 (2018-06-04).


INSERT OR IGNORE suppresses more than just duplicate constraints—fix to target only _uid conflicts.

SQLite's INSERT OR IGNORE silently suppresses NOT NULL and UNIQUE/PRIMARY KEY violations alike. This means rows failing unrelated constraints (e.g., NOT NULL on a required field) are dropped without error, violating the design intent that only duplicate _uid conflicts should be silently skipped.

Use INSERT ... ON CONFLICT(_uid) DO NOTHING instead to suppress only the intended unique keys. This syntax is supported in SQLite 3.24.0+ and applies the constraint exception only to the specified columns.

Suggested direction
 protected function getInsertKeyword(): string
 {
-    return $this->skipDuplicates ? 'INSERT OR IGNORE INTO' : 'INSERT INTO';
+    return 'INSERT INTO';
+}
+
+protected function getInsertSuffix(string $table): string
+{
+    if (!$this->skipDuplicates) {
+        return '';
+    }
+
+    $conflictTarget = $this->sharedTables ? '(`_tenant`, `_uid`)' : '(`_uid`)';
+
+    return "ON CONFLICT {$conflictTarget} DO NOTHING";
+}
+
+protected function getInsertPermissionsSuffix(): string
+{
+    if (!$this->skipDuplicates) {
+        return '';
+    }
+
+    $conflictTarget = $this->sharedTables
+        ? '(`_tenant`, `_document`, `_type`, `_permission`)'
+        : '(`_document`, `_type`, `_permission`)';
+
+    return "ON CONFLICT {$conflictTarget} DO NOTHING";
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/SQLite.php` around lines 1940 - 1943, The current
getInsertKeyword() returns 'INSERT OR IGNORE INTO', which silences all
constraint errors; change the logic so when $this->skipDuplicates is true the
generated SQL uses SQLite's targeted conflict clause for the _uid column (e.g.,
use an INSERT ... ON CONFLICT(_uid) DO NOTHING form) instead of INSERT OR
IGNORE; update getInsertKeyword() (and any place that composes the INSERT
statement if it only expects a keyword) to emit the ON CONFLICT(_uid) DO NOTHING
variant so only _uid uniqueness conflicts are suppressed while other constraint
violations still raise errors.

Comment thread src/Database/Database.php
Comment on lines +5728 to +5732
$insert = fn () => $this->withTransaction(fn () => $this->adapter->createDocuments($collection, $chunk));
// Set adapter flag before withTransaction so Mongo can opt out of a real txn.
$batch = $this->skipDuplicates
    ? $this->adapter->skipDuplicates($insert)
    : $insert();

⚠️ Potential issue | 🟠 Major

skipDuplicates() does not cover relationship side effects.

Line 5728 only guards the adapter insert. By then, createDocumentRelationships() has already run for every input document, so a duplicate parent can still create/update related rows—or hit a duplicate on an existing junction record—before the guarded insert is reached. That makes skipDuplicates(createDocuments(...)) non-idempotent for documents carrying relationship payloads.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 5728 - 5732, The adapter's
skipDuplicates guard is applied only around createDocuments, but
createDocumentRelationships runs earlier and can mutate related rows before the
insert; change the flow so the skipDuplicates context (adapter->skipDuplicates
or equivalent flag) and the withTransaction are established before calling
createDocumentRelationships and createDocuments, i.e., wrap both
createDocumentRelationships(...) and the $insert closure inside the
adapter->skipDuplicates(...) (or set the adapter flag prior to relationship
handling) so relationship side effects are subject to the same dedupe guard and
transaction; update references around createDocumentRelationships, $insert,
withTransaction, adapter->skipDuplicates and adapter->createDocuments
accordingly.

Comment thread src/Database/Database.php
Comment on lines +7153 to 7188
if ($this->getSharedTables() && $this->getTenantPerDocument()) {
    $idsByTenant = [];
    foreach ($documents as $doc) {
        if ($doc->getId() !== '') {
            $idsByTenant[$doc->getTenant()][] = $doc->getId();
        }
    }
    foreach ($idsByTenant as $tenant => $tenantIds) {
        $tenantIds = \array_values(\array_unique($tenantIds));
        $found = $this->authorization->skip(fn () => $this->withTenant($tenant, fn () => $this->silent(
            fn () => $this->find($collection->getId(), [
                Query::equal('$id', $tenantIds),
                Query::limit(\count($tenantIds)),
            ])
        )));
        foreach ($found as $doc) {
            $existingDocs[$tenant . ':' . $doc->getId()] = $doc;
        }
    }
} else {
    $docIds = \array_values(\array_unique(\array_filter(
        \array_map(fn (Document $doc) => $doc->getId(), $documents),
        fn ($id) => $id !== ''
    )));

    if (!empty($docIds)) {
        $existing = $this->authorization->skip(fn () => $this->silent(
            fn () => $this->find($collection->getId(), [
                Query::equal('$id', $docIds),
                Query::limit(\count($docIds)),
            ])
        ));
        foreach ($existing as $doc) {
            $existingDocs[$this->tenantKey($doc)] = $doc;
        }
    }

⚠️ Potential issue | 🟠 Major

Chunk these pre-read find() calls by maxQueryValues.

Both branches build a single Query::equal('$id', ...) over the full ID set. Once a tenant batch or the global batch exceeds $this->maxQueryValues, find() can fail validation before the upsert loop even starts, which regresses large upserts. Please split each ID list with array_chunk(..., max(1, $this->maxQueryValues)) and merge the results into $existingDocs. Based on learnings, createDocuments() and upsertDocumentsWithIncrease() should prefetch existing document IDs in array_chunk(..., max(1, $this->maxQueryValues)) batches to satisfy DocumentsValidator limits on Query::equal('$id', ...).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 7153 - 7188, The pre-read find()
calls build Query::equal('$id', ...) over full ID lists which can exceed
DocumentsValidator limits; modify both branches (the tenant-batched branch that
builds $idsByTenant and the else branch that builds $docIds) to split each ID
array into chunks using array_chunk($ids, max(1, $this->maxQueryValues)) and
call find() for each chunk (preserving the same withTenant/authorization/silent
wrappers), collecting/merging all returned documents into $existingDocs (use the
same keys: "$tenant:$id" in the tenant branch and $this->tenantKey($doc) in the
else branch). Apply the same chunking approach to the related callers
createDocuments() and upsertDocumentsWithIncrease() when they prefetch existing
document IDs so no single Query::equal('$id', ...) exceeds
$this->maxQueryValues.
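A minimal sketch of the chunked prefetch for the non-tenant branch (helper names such as `tenantKey()` and `maxQueryValues` are taken from the snippets in this thread; treat this as illustrative, not the final patch):

```php
// Sketch: split the ID prefetch so no single Query::equal('$id', ...) exceeds
// the validator limit, then merge every chunk's results into $existingDocs.
$chunkSize = \max(1, $this->maxQueryValues);

foreach (\array_chunk($docIds, $chunkSize) as $chunk) {
    $existing = $this->authorization->skip(fn () => $this->silent(
        fn () => $this->find($collection->getId(), [
            Query::equal('$id', $chunk),
            Query::limit(\count($chunk)),
        ])
    ));
    foreach ($existing as $doc) {
        $existingDocs[$this->tenantKey($doc)] = $doc;
    }
}
```

The tenant branch would apply the same `array_chunk()` inside its per-tenant loop, keeping the `withTenant()` wrapper around each chunk's `find()`.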

Comment thread src/Database/Mirror.php
Comment on lines +632 to +645
$ids = \array_values(\array_filter(
\array_map(fn (Document $d) => $d->getId(), $documents),
fn ($id) => $id !== ''
));

if (empty($ids)) {
return $modified;
}

$authoritative = $this->source->silent(
fn () => $this->source->find($collection, [
Query::equal('$id', $ids),
Query::limit(\count($ids)),
])

⚠️ Potential issue | 🟠 Major

Reuse the existing chunking and tenant grouping for the ID re-fetch.

This single Query::equal('$id', $ids) reintroduces the exact large-batch and tenant-per-document edge cases the main create path already chunks around. skipDuplicates() on a big mixed-tenant batch can fail validation or miss rows under the current tenant context.

Based on learnings, In src/Database/Database.php, both createDocuments() and upsertDocumentsWithIncrease() prefetch existing document IDs in chunks using array_chunk(..., max(1, $this->maxQueryValues)) for tenant-per-document and non-tenant modes to satisfy DocumentsValidator limits on Query::equal('$id', ...).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Mirror.php` around lines 632 - 645, The current re-fetch uses a
single Query::equal('$id', $ids) which can fail validation or miss rows for
mixed-tenant/large batches; change the logic in Mirror::(around the
$authoritative fetch) to reuse the same chunking and tenant-grouping approach
used in Database::createDocuments() and Database::upsertDocumentsWithIncrease():
split $ids into chunks with array_chunk($ids, max(1, $this->maxQueryValues)),
group ids by tenant (or run per-current-tenant when non-multi-tenant), call
$this->source->silent(fn() => $this->source->find($collection,
[Query::equal('$id', $chunk), Query::limit(count($chunk))])) for each chunk,
aggregate the results into $authoritative, and then continue so skipDuplicates()
runs per-chunk/tenant instead of on one large mixed batch.

Comment thread src/Database/Mirror.php
Comment on lines +641 to +646
$authoritative = $this->source->silent(
fn () => $this->source->find($collection, [
Query::equal('$id', $ids),
Query::limit(\count($ids)),
])
);

⚠️ Potential issue | 🟠 Major

Bypass read authorization for the authoritative re-fetch.

This find() runs under the caller's authorization context. If a batch creates documents the caller cannot immediately read back, $authoritative becomes partial/empty and the destination silently diverges.

Proposed fix
-                $authoritative = $this->source->silent(
-                    fn () => $this->source->find($collection, [
-                        Query::equal('$id', $ids),
-                        Query::limit(\count($ids)),
-                    ])
-                );
+                $authoritative = $this->source->getAuthorization()->skip(
+                    fn () => $this->source->silent(
+                        fn () => $this->source->find($collection, [
+                            Query::equal('$id', $ids),
+                            Query::limit(\count($ids)),
+                        ])
+                    )
+                );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Mirror.php` around lines 641 - 646, The authoritative re-fetch
must bypass the caller's read authorization so it returns all newly created
docs; replace the current caller-scoped fetch ($authoritative =
$this->source->silent(fn () => $this->source->find(...))) with an
elevated/system context call that disables read authorization (e.g. use the data
source's "runAsSystem"/"withoutAuthorization"/auth-bypass helper to invoke
$this->source->find($collection, [Query::equal('$id', $ids),
Query::limit(count($ids))]) so the fetch is performed with full privileges and
cannot be limited by the caller's permissions.

Comment on lines +7859 to +8262
public function testCreateDocumentsIgnoreDuplicates(): void
{
/** @var Database $database */
$database = $this->getDatabase();

$database->createCollection(__FUNCTION__);
$database->createAttribute(__FUNCTION__, 'name', Database::VAR_STRING, 128, true);

// Insert initial documents
$database->createDocuments(__FUNCTION__, [
new Document([
'$id' => 'doc1',
'name' => 'Original A',
'$permissions' => [
Permission::read(Role::any()),
Permission::create(Role::any()),
],
]),
new Document([
'$id' => 'doc2',
'name' => 'Original B',
'$permissions' => [
Permission::read(Role::any()),
Permission::create(Role::any()),
],
]),
]);

// Without ignore, duplicates should throw
try {
$database->createDocuments(__FUNCTION__, [
new Document([
'$id' => 'doc1',
'name' => 'Duplicate A',
'$permissions' => [
Permission::read(Role::any()),
Permission::create(Role::any()),
],
]),
]);
$this->fail('Expected DuplicateException');
} catch (DuplicateException $e) {
$this->assertNotEmpty($e->getMessage());
}

// With skipDuplicates, duplicates should be silently skipped
$emittedIds = [];
$collection = __FUNCTION__;
$count = $database->skipDuplicates(function () use ($database, $collection, &$emittedIds) {
return $database->createDocuments($collection, [
new Document([
'$id' => 'doc1',
'name' => 'Duplicate A',
'$permissions' => [
Permission::read(Role::any()),
Permission::create(Role::any()),
],
]),
new Document([
'$id' => 'doc3',
'name' => 'New C',
'$permissions' => [
Permission::read(Role::any()),
Permission::create(Role::any()),
],
]),
], onNext: function (Document $doc) use (&$emittedIds) {
$emittedIds[] = $doc->getId();
});
});

$this->assertSame(2, $count);
$this->assertCount(2, $emittedIds);
\sort($emittedIds);
$this->assertSame(['doc1', 'doc3'], $emittedIds);

$doc1 = $database->getDocument(__FUNCTION__, 'doc1');
$this->assertSame('Original A', $doc1->getAttribute('name'));

$doc3 = $database->getDocument(__FUNCTION__, 'doc3');
$this->assertSame('New C', $doc3->getAttribute('name'));

// Total should be 3 (doc1, doc2, doc3)
$all = $database->find(__FUNCTION__);
$this->assertCount(3, $all);
}

public function testCreateDocumentsIgnoreAllDuplicates(): void
{
/** @var Database $database */
$database = $this->getDatabase();

$database->createCollection(__FUNCTION__);
$database->createAttribute(__FUNCTION__, 'name', Database::VAR_STRING, 128, true);

// Insert initial document
$database->createDocuments(__FUNCTION__, [
new Document([
'$id' => 'existing',
'name' => 'Original',
'$permissions' => [
Permission::read(Role::any()),
Permission::create(Role::any()),
],
]),
]);

// With skipDuplicates, inserting only duplicates should succeed with no new rows
$emittedIds = [];
$collection = __FUNCTION__;
$count = $database->skipDuplicates(function () use ($database, $collection, &$emittedIds) {
return $database->createDocuments($collection, [
new Document([
'$id' => 'existing',
'name' => 'Duplicate',
'$permissions' => [
Permission::read(Role::any()),
Permission::create(Role::any()),
],
]),
], onNext: function (Document $doc) use (&$emittedIds) {
$emittedIds[] = $doc->getId();
});
});

$this->assertSame(1, $count);
$this->assertSame(['existing'], $emittedIds);

$doc = $database->getDocument(__FUNCTION__, 'existing');
$this->assertSame('Original', $doc->getAttribute('name'));

// Still only 1 document
$all = $database->find(__FUNCTION__);
$this->assertCount(1, $all);
}

public function testCreateDocumentsSkipDuplicatesEmptyBatch(): void
{
$database = $this->getDatabase();

$collection = 'skipDupEmpty';
$database->createCollection($collection);
$database->createAttribute($collection, 'name', Database::VAR_STRING, 128, true);

$count = $database->skipDuplicates(fn () => $database->createDocuments($collection, []));

$this->assertSame(0, $count);
$this->assertCount(0, $database->find($collection));
}

public function testCreateDocumentsSkipDuplicatesNestedScope(): void
{
$database = $this->getDatabase();

$collection = 'skipDupNested';
$database->createCollection($collection);
$database->createAttribute($collection, 'name', Database::VAR_STRING, 128, true);

$makeDoc = fn (string $id, string $name) => new Document([
'$id' => $id,
'name' => $name,
'$permissions' => [
Permission::read(Role::any()),
Permission::create(Role::any()),
],
]);

// Seed an existing doc
$database->createDocuments($collection, [$makeDoc('seed', 'Seed')]);

// Nested scope — inner scope runs inside outer scope.
// After inner exits, outer state should still be "skip enabled".
// After outer exits, state should restore to "skip disabled".
$countOuter = $database->skipDuplicates(function () use ($database, $collection, $makeDoc) {
// Inner scope: add dup + new
$countInner = $database->skipDuplicates(function () use ($database, $collection, $makeDoc) {
return $database->createDocuments($collection, [
$makeDoc('seed', 'Dup'),
$makeDoc('innerNew', 'InnerNew'),
]);
});
$this->assertSame(2, $countInner);

// Still inside outer scope — skip flag should still be on
return $database->createDocuments($collection, [
$makeDoc('seed', 'Dup2'),
$makeDoc('outerNew', 'OuterNew'),
]);
});
$this->assertSame(2, $countOuter);

// After both scopes exit, skip flag is off again — a plain createDocuments
// call with a duplicate should throw.
$thrown = null;
try {
$database->createDocuments($collection, [$makeDoc('seed', 'ShouldThrow')]);
} catch (DuplicateException $e) {
$thrown = $e;
}
$this->assertNotNull($thrown, 'Plain createDocuments after nested scopes should throw on duplicate');

// Final state: seed + innerNew + outerNew
$all = $database->find($collection);
$ids = \array_map(fn (Document $d) => $d->getId(), $all);
\sort($ids);
$this->assertSame(['innerNew', 'outerNew', 'seed'], $ids);
}

public function testCreateDocumentsSkipDuplicatesLargeBatch(): void
{
$database = $this->getDatabase();

$collection = 'skipDupLarge';
$database->createCollection($collection);
$database->createAttribute($collection, 'idx', Database::VAR_INTEGER, 0, true);

// Seed 50 docs
$seed = [];
for ($i = 0; $i < 50; $i++) {
$seed[] = new Document([
'$id' => 'doc_' . $i,
'idx' => $i,
'$permissions' => [
Permission::read(Role::any()),
Permission::create(Role::any()),
],
]);
}
$database->createDocuments($collection, $seed);

// Now call skipDuplicates with 300 docs: 50 existing (0-49) + 250 new (50-299).
// 300 > default INSERT_BATCH_SIZE, so this exercises the chunk loop.
$batch = [];
for ($i = 0; $i < 300; $i++) {
$batch[] = new Document([
'$id' => 'doc_' . $i,
'idx' => $i + 1000, // different value so we can detect if existing got overwritten
'$permissions' => [
Permission::read(Role::any()),
Permission::create(Role::any()),
],
]);
}

$emittedIds = [];
$count = $database->skipDuplicates(function () use ($database, $collection, $batch, &$emittedIds) {
return $database->createDocuments($collection, $batch, onNext: function (Document $doc) use (&$emittedIds) {
$emittedIds[] = $doc->getId();
});
});

$this->assertSame(300, $count);
$this->assertCount(300, $emittedIds);

$seedDoc = $database->getDocument($collection, 'doc_25');
$this->assertSame(25, $seedDoc->getAttribute('idx'));

$newDoc = $database->getDocument($collection, 'doc_100');
$this->assertSame(1100, $newDoc->getAttribute('idx'));

$total = $database->count($collection);
$this->assertSame(300, $total);
}

public function testCreateDocumentsSkipDuplicatesSecondCallSkipsAll(): void
{
$database = $this->getDatabase();

$collection = 'skipDupSecond';
$database->createCollection($collection);
$database->createAttribute($collection, 'name', Database::VAR_STRING, 128, true);

$makeBatch = fn (string $name) => \array_map(
fn (string $id) => new Document([
'$id' => $id,
'name' => $name,
'$permissions' => [
Permission::read(Role::any()),
Permission::create(Role::any()),
],
]),
['a', 'b', 'c']
);

// First call — all new
$firstCount = $database->skipDuplicates(
fn () => $database->createDocuments($collection, $makeBatch('First'))
);
$this->assertSame(3, $firstCount);

$emittedIds = [];
$secondCount = $database->skipDuplicates(function () use ($database, $collection, $makeBatch, &$emittedIds) {
return $database->createDocuments($collection, $makeBatch('Second'), onNext: function (Document $doc) use (&$emittedIds) {
$emittedIds[] = $doc->getId();
});
});
$this->assertSame(3, $secondCount);
\sort($emittedIds);
$this->assertSame(['a', 'b', 'c'], $emittedIds);

// All three should retain the First values
foreach (['a', 'b', 'c'] as $id) {
$doc = $database->getDocument($collection, $id);
$this->assertSame('First', $doc->getAttribute('name'), "Doc {$id} should not have been overwritten");
}
}

public function testCreateDocumentsSkipDuplicatesRelationships(): void
{
$database = $this->getDatabase();

if (!$database->getAdapter()->getSupportForRelationships()) {
$this->expectNotToPerformAssertions();
return;
}

$parent = 'skipDupParent';
$child = 'skipDupChild';
$permissions = [
Permission::read(Role::any()),
Permission::create(Role::any()),
Permission::update(Role::any()),
Permission::delete(Role::any()),
];

$database->createCollection($parent);
$database->createCollection($child);
$database->createAttribute($parent, 'name', Database::VAR_STRING, 128, true);
$database->createAttribute($child, 'name', Database::VAR_STRING, 128, true);
$database->createRelationship(
collection: $parent,
relatedCollection: $child,
type: Database::RELATION_ONE_TO_MANY,
id: 'children',
);

$database->createDocument($parent, new Document([
'$id' => 'existingParent',
'name' => 'ExistingParent',
'$permissions' => $permissions,
'children' => [
new Document([
'$id' => 'existingChild',
'name' => 'ExistingChild',
'$permissions' => $permissions,
]),
],
]));

$batch = [
new Document([
'$id' => 'existingParent',
'name' => 'ShouldNotOverwrite',
'$permissions' => $permissions,
'children' => [
new Document([
'$id' => 'existingChild',
'name' => 'ExistingChild',
'$permissions' => $permissions,
]),
new Document([
'$id' => 'retryChild',
'name' => 'RetryChild',
'$permissions' => $permissions,
]),
],
]),
new Document([
'$id' => 'newParent',
'name' => 'NewParent',
'$permissions' => $permissions,
'children' => [
new Document([
'$id' => 'newChild',
'name' => 'NewChild',
'$permissions' => $permissions,
]),
],
]),
];

$database->skipDuplicates(fn () => $database->createDocuments($parent, $batch));

$existing = $database->getDocument($parent, 'existingParent');
$this->assertFalse($existing->isEmpty());
$this->assertSame('ExistingParent', $existing->getAttribute('name'));

$existingChildren = $existing->getAttribute('children', []);
$childIds = \array_map(fn (Document $d) => $d->getId(), $existingChildren);
\sort($childIds);
$this->assertSame(['existingChild', 'retryChild'], $childIds);

$new = $database->getDocument($parent, 'newParent');
$this->assertFalse($new->isEmpty());
$this->assertSame('NewParent', $new->getAttribute('name'));
$newChildren = $new->getAttribute('children', []);
$this->assertCount(1, $newChildren);
$this->assertSame('newChild', $newChildren[0]->getId());

$allChildren = $database->find($child);
$allChildIds = \array_map(fn (Document $d) => $d->getId(), $allChildren);
\sort($allChildIds);
$this->assertSame(['existingChild', 'newChild', 'retryChild'], $allChildIds);
}

⚠️ Potential issue | 🟠 Major


Add test for custom unique index violations with skipDuplicates.

Current tests only verify behavior with $id duplicates. SQL adapters (SQL, SQLite) use blanket INSERT IGNORE that silences all unique constraint violations. Without a test covering custom unique indexes (e.g., a unique email attribute), a regression that suppresses unrelated unique-index conflicts would pass this test suite undetected. Add a case with createDocuments() + skipDuplicates() against a collection with a custom unique index and verify the intended exception behavior—either that custom unique violations still throw or that they're handled correctly.
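One possible shape for that test (test, collection, and index names are hypothetical; the `createIndex` call assumes the signature used elsewhere in the suite):

```php
public function testCreateDocumentsSkipDuplicatesCustomUniqueIndex(): void
{
    $database = $this->getDatabase();

    $collection = 'skipDupUnique';
    $database->createCollection($collection);
    $database->createAttribute($collection, 'email', Database::VAR_STRING, 128, true);
    $database->createIndex($collection, 'uniqueEmail', Database::INDEX_UNIQUE, ['email']);

    $database->createDocuments($collection, [new Document([
        '$id' => 'u1',
        'email' => 'a@example.com',
        '$permissions' => [Permission::read(Role::any()), Permission::create(Role::any())],
    ])]);

    // Different $id, same unique email: pin down whether this throws or is
    // skipped, branching on adapter characteristics if the behavior differs.
    $database->skipDuplicates(fn () => $database->createDocuments($collection, [new Document([
        '$id' => 'u2',
        'email' => 'a@example.com',
        '$permissions' => [Permission::read(Role::any()), Permission::create(Role::any())],
    ])]));

    // Regardless of branch, the seeded row must be untouched and remain unique.
    $this->assertSame('a@example.com', $database->getDocument($collection, 'u1')->getAttribute('email'));
    $this->assertCount(1, $database->find($collection, [Query::equal('email', ['a@example.com'])]));
}
```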

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/Adapter/Scopes/DocumentTests.php` around lines 7859 - 8262, Add a
new test (e.g., testCreateDocumentsSkipDuplicatesCustomUniqueIndex) that creates
a collection, creates a unique attribute (like 'email' via
Database::createAttribute with the unique flag), seeds one document with an
email, then calls $database->skipDuplicates(...) to createDocuments with a
different $id but the same email and assert the intended behavior: either the
call throws DuplicateException or (if the adapter intentionally uses blanket
INSERT IGNORE) verify no overwrite occurred and only one row with that email
exists; use $database->createAttribute, $database->createDocuments,
$database->skipDuplicates, $database->getAdapter(), $database->getDocument and
$database->find to implement the assertions and branch expectations based on
adapter characteristics so the test fails if custom-unique violations are
silently suppressed incorrectly.

Comment on lines +8029 to +8059
// Nested scope — inner scope runs inside outer scope.
// After inner exits, outer state should still be "skip enabled".
// After outer exits, state should restore to "skip disabled".
$countOuter = $database->skipDuplicates(function () use ($database, $collection, $makeDoc) {
// Inner scope: add dup + new
$countInner = $database->skipDuplicates(function () use ($database, $collection, $makeDoc) {
return $database->createDocuments($collection, [
$makeDoc('seed', 'Dup'),
$makeDoc('innerNew', 'InnerNew'),
]);
});
$this->assertSame(2, $countInner);

// Still inside outer scope — skip flag should still be on
return $database->createDocuments($collection, [
$makeDoc('seed', 'Dup2'),
$makeDoc('outerNew', 'OuterNew'),
]);
});
$this->assertSame(2, $countOuter);

// After both scopes exit, skip flag is off again — a plain createDocuments
// call with a duplicate should throw.
$thrown = null;
try {
$database->createDocuments($collection, [$makeDoc('seed', 'ShouldThrow')]);
} catch (DuplicateException $e) {
$thrown = $e;
}
$this->assertNotNull($thrown, 'Plain createDocuments after nested scopes should throw on duplicate');

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Cover the exceptional-exit path for the scope guard.

This only proves that nested skipDuplicates() restores state after normal returns. Since the feature is explicitly RAII-style, the risky failure mode is a callback throwing and leaking the flag into the next write. Please add a throwing inner/outer callback case and then assert a plain createDocuments() duplicate still raises afterward.
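A sketch of the exceptional-exit case, reusing `$collection` and `$makeDoc` from the existing nested test (illustrative only):

```php
// Hypothetical sketch: the RAII guard must restore the flag even when the
// callback throws, not only on normal return.
$caught = null;
try {
    $database->skipDuplicates(function () {
        throw new \RuntimeException('boom');
    });
} catch (\RuntimeException $e) {
    $caught = $e;
}
$this->assertNotNull($caught, 'Exception should propagate out of the scope guard');

// If the flag leaked, this plain duplicate insert would silently succeed.
$this->expectException(DuplicateException::class);
$database->createDocuments($collection, [$makeDoc('seed', 'AfterThrow')]);
```

A second variant with the throw in an inner nested scope would cover the outer-scope restore path as well.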

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/Adapter/Scopes/DocumentTests.php` around lines 8029 - 8059, Add a
test variant that exercises exceptional exits of the RAII-style scope guard:
call $database->skipDuplicates(...) with an inner or outer callback that throws
(use a deliberate throw or trigger DuplicateException) and ensure both the
thrown exception is caught in the test and that after the exception the skip
flag is restored by calling $database->createDocuments($collection,
[$makeDoc('seed','ShouldThrow')]) and asserting it throws a DuplicateException;
reference the existing nested test using skipDuplicates and createDocuments and
add analogous cases where the inner callback throws and where the outer callback
throws, then assert a plain createDocuments still raises DuplicateException
afterward.

@greptile-apps

greptile-apps bot commented Apr 15, 2026

Greptile Summary

This PR adds a nestable skipDuplicates() RAII scope guard to createDocuments, implemented via dialect-specific no-op inserts across all five adapters, plus an authoritative-state re-fetch in Mirror to prevent destination divergence. It also restores per-tenant batching in upsertDocumentsWithIncrease to fix a silent cross-tenant miss for platform workers.

  • Mirror::createDocuments skipDuplicates re-fetch (lines 641–645) is missing the same per-tenant grouping that was just fixed in upsertDocumentsWithIncrease — in sharedTables + tenantPerDocument mode with cross-tenant batches, documents belonging to non-session tenants are silently absent from the destination.
  • The same re-fetch also omits the authorization->skip() wrapper used by upsertDocumentsWithIncrease, risking silent document loss when Mirror runs under a restricted auth context.

Confidence Score: 4/5

Safe to merge for non-Mirror paths; Mirror's skipDuplicates re-fetch has a P1 correctness gap in shared-tables/tenantPerDocument mode that should be addressed before production use.

All adapter-level implementations are clean and well-tested. The upsertDocumentsWithIncrease per-tenant fix is correct. The one outstanding P1 is confined to Mirror::createDocuments in skipDuplicates mode: the authoritative re-fetch lacks both per-tenant grouping and authorization skip, which are the exact two fixes introduced elsewhere in this same PR. Until that is patched, cross-tenant Mirror batches in tenantPerDocument mode will silently diverge.

src/Database/Mirror.php — authoritative re-fetch in skipDuplicates path needs per-tenant grouping and authorization->skip().

Important Files Changed

Filename Overview
src/Database/Adapter.php Adds $skipDuplicates flag and nestable skipDuplicates(callable) RAII scope guard — clean save/restore pattern, no issues.
src/Database/Adapter/Mongo.php Bypasses Mongo transactions when skipDuplicates is set (correct for WriteConflict E112 avoidance); upserts via $setOnInsert keyed on _uid/_tenant. Logic is sound; unsets filter-keys from $setOnInsert to avoid path-conflict errors.
src/Database/Adapter/Pool.php Propagates skipDuplicates to underlying adapters in all three dispatch paths (pinnedAdapter, pool delegate, withTransaction). Nested wrapping is redundant but safe due to save/restore semantics.
src/Database/Adapter/Postgres.php Overrides getInsertKeyword (keeps plain INSERT INTO) and adds ON CONFLICT ... DO NOTHING suffix for both data and permissions tables. Conflict targets are correctly scoped per shared-tables mode.
src/Database/Adapter/SQL.php Default SQL base uses INSERT IGNORE INTO for MySQL/MariaDB. Functional, but INSERT IGNORE downgrades all statement errors to warnings (not just duplicate-key), which could silently mask truncation or FK failures.
src/Database/Adapter/SQLite.php Adds INSERT OR IGNORE INTO override for SQLite — minimal, correct, scoped to duplicate handling.
src/Database/Database.php Adds skipDuplicates scope guard; routes adapter flag before transaction starts (critical for Mongo txn bypass). upsertDocumentsWithIncrease per-tenant batching fix is correct. tenantKey helper is clean.
src/Database/Mirror.php Authoritative re-fetch in skipDuplicates path is missing both authorization->skip() and per-tenant grouping — the same two gaps that were fixed in upsertDocumentsWithIncrease in this PR. Cross-tenant batches or restricted-auth callers will produce silent destination divergence.
tests/e2e/Adapter/MirrorTest.php New test covers the authoritative-state forwarding divergence scenario. All assertions validate source-wins semantics correctly.
tests/e2e/Adapter/Scopes/DocumentTests.php Comprehensive adapter-level tests cover intra-batch duplicates, all-duplicates, empty batch, nested scope, large batch (multi-chunk), cross-batch, and relationships. Good coverage.

Comments Outside Diff (1)

  1. src/Database/Adapter/SQL.php, line 1194-1196 (link)

    P2 INSERT IGNORE swallows all MySQL/MariaDB errors, not just duplicates

    INSERT IGNORE INTO in MySQL/MariaDB converts all statement errors to warnings — data-truncation overflows, FK violations, zero-date rejections, etc. would be silently swallowed. On Postgres, ON CONFLICT DO NOTHING is scoped precisely to the named conflict target, making it safer. Consider documenting this caveat or using INSERT INTO … ON DUPLICATE KEY UPDATE as a no-op (id = id) to preserve the error-surfacing behaviour for non-duplicate failures.
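A hedged sketch of that alternative as a MariaDB-side override (the `getInsertSuffix` hook name mirrors the Postgres override referenced above; assume it is appended after the VALUES clause):

```php
// Hypothetical MariaDB/MySQL override: a self-assignment upsert is a no-op on
// duplicate keys but, unlike INSERT IGNORE, still raises truncation, FK, and
// other non-duplicate statement errors.
protected function getInsertSuffix(string $idColumn): string
{
    if (!$this->skipDuplicates) {
        return '';
    }
    return "ON DUPLICATE KEY UPDATE `{$idColumn}` = `{$idColumn}`";
}
```

Note that affected-rows semantics differ under ON DUPLICATE KEY UPDATE (duplicates can report 0 or 2 rows depending on CLIENT_FOUND_ROWS), which any caller computing an insert count would need to account for.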

Reviews (1): Last reviewed commit: "Add skipDuplicates() scope guard to crea..."

Comment thread src/Database/Mirror.php
Comment on lines +641 to +646
$authoritative = $this->source->silent(
fn () => $this->source->find($collection, [
Query::equal('$id', $ids),
Query::limit(\count($ids)),
])
);

P1 Missing per-tenant grouping in authoritative re-fetch

In sharedTables + tenantPerDocument mode, this find() runs under the session tenant context (which may be null for platform workers) and silently misses documents belonging to other tenants in a cross-tenant batch — exactly the same gap that was fixed for upsertDocumentsWithIncrease in this same PR. Those documents are never forwarded to the destination, causing silent divergence.

The same per-tenant grouping pattern used in upsertDocumentsWithIncrease should be applied here:

if ($this->source->getSharedTables() && $this->source->getAdapter()->getTenantPerDocument()) {
    $idsByTenant = [];
    foreach ($documents as $d) {
        if ($d->getId() !== '') {
            $idsByTenant[$d->getTenant()][] = $d->getId();
        }
    }
    $authoritative = [];
    foreach ($idsByTenant as $tenant => $tenantIds) {
        $tenantIds = array_values(array_unique($tenantIds));
$found = $this->source->getAuthorization()->skip(fn () =>
            $this->source->withTenant($tenant, fn () =>
                $this->source->silent(fn () => $this->source->find($collection, [
                    Query::equal('$id', $tenantIds),
                    Query::limit(count($tenantIds)),
                ]))
            )
        );
        array_push($authoritative, ...$found);
    }
} else {
    $authoritative = $this->source->silent(fn () => $this->source->find($collection, [
        Query::equal('$id', $ids),
        Query::limit(count($ids)),
    ]));
}

Comment thread src/Database/Mirror.php
Comment on lines +640 to +646

$authoritative = $this->source->silent(
fn () => $this->source->find($collection, [
Query::equal('$id', $ids),
Query::limit(\count($ids)),
])
);

P2 Missing authorization->skip() on authoritative re-fetch

upsertDocumentsWithIncrease wraps its equivalent find() in $this->authorization->skip(...), so it reads back all documents regardless of the caller's auth context. This re-fetch omits that guard. If Mirror is invoked under a restricted authorization context (not globally skipped), documents that were just written but aren't readable under the current session's permissions will be silently excluded, and the destination will receive an incomplete set.

$authoritative = $this->source->getAuthorization()->skip(
    fn () => $this->source->silent(
        fn () => $this->source->find($collection, [
            Query::equal('$id', $ids),
            Query::limit(\count($ids)),
        ])
    )
);
