@ulemons ulemons commented Jul 23, 2025

✍️ Proposed Changes

What:

  • Prevent assigning the same repository to multiple insightsProjects and segments.
  • Fix a potential bug in SegmentRepository.mappedRepos, which was not accounting for the deletedAt field.

How:

  1. New table: segmentRepositories. It stores the association between each repository and its corresponding segment, with a UNIQUE INDEX on the combination to prevent multiple associations.
  2. When an InsightsProject is updated, a new function, upsertSegmentRepositories, is called. It bulk-upserts all newly associated repositories.
  3. Once the new repositories are added, softDeleteMissingSegmentRepositories is called, which sets deletedAt on the repositories that are no longer related to the project.
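
The update flow in steps 2 and 3 can be sketched roughly as follows. This is only an illustration, not the code in this PR: it uses Python with an in-memory SQLite database standing in for PostgreSQL, and the column names (repository, segmentId, deletedAt), the primary key, and the snake_case helper names are assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timezone


def create_schema(conn):
    # Assumed minimal shape of the segmentRepositories table.
    conn.execute(
        """
        CREATE TABLE "segmentRepositories" (
            repository  TEXT NOT NULL,
            "segmentId" TEXT NOT NULL,
            "deletedAt" TEXT,
            PRIMARY KEY (repository, "segmentId")
        )
        """
    )


def upsert_segment_repositories(conn, segment_id, repositories):
    # Bulk upsert: insert new associations and revive soft-deleted ones.
    conn.executemany(
        """
        INSERT INTO "segmentRepositories" (repository, "segmentId", "deletedAt")
        VALUES (?, ?, NULL)
        ON CONFLICT (repository, "segmentId") DO UPDATE SET "deletedAt" = NULL
        """,
        [(repo, segment_id) for repo in repositories],
    )


def soft_delete_missing_segment_repositories(conn, segment_id, repositories):
    # Set deletedAt on every active association absent from the new list.
    now = datetime.now(timezone.utc).isoformat()
    sql = (
        'UPDATE "segmentRepositories" SET "deletedAt" = ? '
        'WHERE "segmentId" = ? AND "deletedAt" IS NULL'
    )
    params = [now, segment_id]
    if repositories:
        placeholders = ", ".join("?" for _ in repositories)
        sql += f" AND repository NOT IN ({placeholders})"
        params.extend(repositories)
    conn.execute(sql, params)


def active_repositories(conn, segment_id):
    # Reads must filter on deletedAt, otherwise soft-deleted rows leak through.
    rows = conn.execute(
        'SELECT repository FROM "segmentRepositories" '
        'WHERE "segmentId" = ? AND "deletedAt" IS NULL ORDER BY repository',
        (segment_id,),
    )
    return [row[0] for row in rows]
```

Calling the upsert first and the soft delete second, with the same repository list, leaves exactly that list active for the segment while keeping the removed rows around with a deletedAt timestamp.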

Notes:

This approach may introduce some duplication of data already available in the githubRepos and gitlabRepos tables. However, since repository data is also derived from the integrations.settings field, enforcing database-level constraints directly on those tables is not feasible. Hence, this duplication is a necessary trade-off to maintain data integrity.

🔗 Closes: CM-2320

Checklist ✅

  • Label appropriately with Feature, Improvement, or Bug.

Contributor

@github-actions github-actions bot left a comment

PR titles must follow Conventional Commits. Love from, Your reviewers ❤️.

@ulemons ulemons changed the title Feature/prevent duplicates feat: prevent duplicates [CM-2320] Jul 23, 2025
@github-actions github-actions bot dismissed their stale review July 23, 2025 12:03

Conventional Commits FTW!

@ulemons ulemons self-assigned this Jul 23, 2025
@ulemons ulemons added the Feature Created by Linear-GitHub Sync label Jul 23, 2025
@ulemons ulemons marked this pull request as ready for review July 23, 2025 13:33
@ulemons ulemons requested a review from themarolt July 23, 2025 14:08
WHERE "deletedAt" IS NULL;

-- 3. Create or replace the sync function
CREATE OR REPLACE FUNCTION sync_repositories()
Contributor

Hm, so far we haven't used triggers, functions, or procedures in the DB itself, so I'm wondering if it would be better to have this logic in the code instead?

Contributor Author

I was thinking of doing it this way for performance reasons. We can also implement it in code and see how it performs.

Contributor

Yeah, I would do that instead of triggers, since they somewhat hide the logic. If we were already using them I would let it be, but if this is going to be the only function in the DB with such logic, it will be hidden in the sense that people won't expect to find it there :)

Contributor Author

That makes perfect sense, thanks!

@ulemons ulemons force-pushed the feature/prevent-duplicates branch from 9ec1c10 to 54f5bd6 Compare July 24, 2025 13:41
@ulemons ulemons requested a review from themarolt July 24, 2025 13:57
@ulemons ulemons force-pushed the feature/prevent-duplicates branch from 6a2679d to 0c14779 Compare July 25, 2025 07:59
@ulemons ulemons requested review from borfast and mbani01 July 28, 2025 15:38
-- 2. Enforce that a repository can be assigned to only one active segment
CREATE UNIQUE INDEX IF NOT EXISTS unique_active_segment_repos
ON "segmentRepositories" (repository)
WHERE "deletedAt" IS NULL;
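
The effect of this partial index can be tried out with a small sketch. It is an illustration under stated assumptions, not part of the PR: it runs the same index definition against an in-memory SQLite database through Python (SQLite accepts the same partial-index syntax as PostgreSQL here), and the table shape, the helper names try_assign and soft_delete, and the sample values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE "segmentRepositories" (
        repository  TEXT NOT NULL,
        "segmentId" TEXT NOT NULL,
        "deletedAt" TEXT
    );
    CREATE UNIQUE INDEX IF NOT EXISTS unique_active_segment_repos
    ON "segmentRepositories" (repository)
    WHERE "deletedAt" IS NULL;
    """
)


def try_assign(repository, segment_id):
    """Return True if the row is accepted, False if the index rejects it."""
    try:
        conn.execute(
            'INSERT INTO "segmentRepositories" (repository, "segmentId") '
            "VALUES (?, ?)",
            (repository, segment_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def soft_delete(repository, segment_id):
    # Soft-deleted rows drop out of the partial index.
    conn.execute(
        'UPDATE "segmentRepositories" SET "deletedAt" = CURRENT_TIMESTAMP '
        'WHERE repository = ? AND "segmentId" = ? AND "deletedAt" IS NULL',
        (repository, segment_id),
    )


first = try_assign("org/repo", "segment-a")   # accepted
second = try_assign("org/repo", "segment-b")  # rejected: already active elsewhere
soft_delete("org/repo", "segment-a")
third = try_assign("org/repo", "segment-b")   # accepted after the soft delete
```

Only rows with a NULL deletedAt take part in the uniqueness check, so history is kept while a repository can still have at most one active segment.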
Collaborator

Shouldn't we add the unique index for every row, even if it's deleted, to avoid issues in case we ever remove the soft deleted marker?

Contributor Author

I added this constraint to keep track of deletions. If we use a unique index for every row, I guess we must use hard deletion. If that is the case, I think we can go directly with hard deletion.
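
A small sketch of the trade-off mentioned in this reply, for illustration only: it assumes a unique index on repository that covers every row (the index name unique_segment_repos_all_rows and the helper can_assign are hypothetical), and uses in-memory SQLite through Python in place of PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE "segmentRepositories" (
        repository  TEXT NOT NULL,
        "segmentId" TEXT NOT NULL,
        "deletedAt" TEXT
    );
    -- Hypothetical alternative: uniqueness over all rows, deleted or not.
    CREATE UNIQUE INDEX unique_segment_repos_all_rows
    ON "segmentRepositories" (repository);
    """
)


def can_assign(repository, segment_id):
    """Return True if the insert succeeds, False on a uniqueness violation."""
    try:
        conn.execute(
            'INSERT INTO "segmentRepositories" (repository, "segmentId") '
            "VALUES (?, ?)",
            (repository, segment_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


can_assign("org/repo", "segment-a")

# Soft delete: the row stays in the table and still occupies the index.
conn.execute(
    'UPDATE "segmentRepositories" SET "deletedAt" = CURRENT_TIMESTAMP '
    "WHERE repository = 'org/repo'"
)
after_soft_delete = can_assign("org/repo", "segment-b")

# Hard delete: the row is gone, so the repository can be reassigned.
conn.execute("DELETE FROM \"segmentRepositories\" WHERE repository = 'org/repo'")
after_hard_delete = can_assign("org/repo", "segment-b")
```

Under these assumptions the soft-deleted row keeps blocking reassignment, which is why an all-rows unique index pairs naturally with hard deletion.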

Collaborator

We don't have to have another unique index for each row because we already have the primary key on the repository and segmentId, which already makes them unique. Or am I missing something?
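
For reference, a sketch of what a composite primary key on its own enforces. It assumes the key is (repository, segmentId) as described in this comment, with no additional index, and again uses in-memory SQLite through Python; the helper name insert_row and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE "segmentRepositories" (
        repository  TEXT NOT NULL,
        "segmentId" TEXT NOT NULL,
        "deletedAt" TEXT,
        PRIMARY KEY (repository, "segmentId")
    )
    """
)


def insert_row(repository, segment_id):
    """Return True if the insert succeeds, False on a key violation."""
    try:
        conn.execute(
            'INSERT INTO "segmentRepositories" (repository, "segmentId") '
            "VALUES (?, ?)",
            (repository, segment_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


same_pair_first = insert_row("org/repo", "segment-a")
same_pair_again = insert_row("org/repo", "segment-a")  # duplicate pair
other_segment = insert_row("org/repo", "segment-b")    # same repo, new segment
```

In this sketch the key rejects a repeated (repository, segmentId) pair but accepts the same repository under a second segment, so it is the index on repository alone that limits a repository to one active segment.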

Collaborator

@borfast borfast left a comment

Approved. 👍 Just one question that isn't critical.

@ulemons ulemons merged commit 84f0c3b into main Jul 29, 2025
13 checks passed
@ulemons ulemons deleted the feature/prevent-duplicates branch July 29, 2025 11:52