Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User re-create with get_or_create #150

Open
muhammadadeeltajamul opened this issue Jan 16, 2025 · 8 comments
Open

User re-create with get_or_create #150

muhammadadeeltajamul opened this issue Jan 16, 2025 · 8 comments
Assignees
Labels
release blocker Blocks the upcoming release (fix needed)

Comments

@muhammadadeeltajamul
Copy link

Management command is using get_or_create to fetch user which will create user when it does not exist. If a user does not exist, we don’t need to migrate their data. This could lead to issues i.e. deleted accounts are unintentionally recreated. Ideally, we should control this behavior using a switch in management command. This will ensure users are only created when explicitly needed. If we skip adding switch, there’s a risk of unnecessary account creation and data migration for non-existent users.

Code

@taimoor-ahmed-1
Copy link
Contributor

If we don't create users when migrating data from mongodb, it will result in data loss as thread/comment needs an author as of now to be created. Would we want that behaviour via a switch?
cc: @regisb @Faraz32123 @Ali-Salman29

@Faraz32123
Copy link
Contributor

Faraz32123 commented Jan 22, 2025

If we don't create users when migrating data from mongodb, it will result in data loss as thread/comment needs an author as of now to be created. Would we want that behaviour via a switch? cc: @regisb @Faraz32123 @Ali-Salman29

may be we can rely on ForumUser instead of creating User obj and it's relation with ForumUser.
like in cs_comment_service, thread/comment were relying on users in Users collection irrespective of the user deleted or existed in edx-platform.
what do you guys think?
switch can also be added, not sure!

@taimoor-ahmed-1
Copy link
Contributor

@Faraz32123 the ForumUser model also has a required OneToOne mapping with the user model so even if we use that, we would still require the corresponding User object to be linked with ForumUser first based on the current implementation.

@AhtishamShahid
Copy link

It might be an edge case but in a real-world scenario, there would be no data in cs_comment for the user that does not exist and vice versa, if there is such inconsistency we should not create a new user for just adding forum data, and that data should be skipped, but we should log that data for future reference.
If you think there is some use case in which we want to create the user that should be an optional feature.

@Faraz32123
Copy link
Contributor

@Faraz32123 the ForumUser model also has a required OneToOne mapping with the user model so even if we use that, we would still require the corresponding User object to be linked with ForumUser first based on the current implementation.

One way around would be that we can separate get and create, incase of get let the current code flow run, but while creating user, may be we can mark user as inactive or something or create a single anonymous user for all these kinds of posts/data, so that we don't loose older useful comments/data from the users that were deleted in the LMS and their data exists in mongodb.
or we can also mark the OneToOneField as null=True, blank=True in ForumUser model and skip creating users. Have not tested it.

@taimoor-ahmed-1
Copy link
Contributor

@AhtishamShahid I believe that the scenario is that some users have authored threads and comments and are present in mongodb. However, they have deleted their accounts in the edx platform but their data remains in the mongodb. That user data along with their activity is being pulled from mongo to mysql once we run the migration command. Hence, we have to make a strategic decision if we want to lose that data, recover that data, or as @Faraz32123 mentioned assign an anonymous user to those threads and comments.

@taimoor-ahmed-1
Copy link
Contributor

@regisb @jristau1984 what do you guys suggest?

@AhtishamShahid
Copy link

We don't have enough data about how many records exist that have no users for now
The following alternatives were discussed and have major issues in implementation

  • Objects that do not have existing users make them anonymous
    • It can have ripple effects for moderators or staff as anonymous posts user are visible to staff users
    • What users will be assigned to in that case?
    • It will cause data corruption i.e. No true mapping between posts and users.
  • Skip data without users
    • Data loss
    • No data to calculate data loss risk i.e No method exists to calculate how many posts and comments have no existing user
  • Create users for posts with no user
    • We don't want to introduce new users to the platform just for this data migration.

It would be great if we have some data before making any decision, can you create a simple script to get count of posts and comments without users?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release blocker Blocks the upcoming release (fix needed)
Projects
Status: Backlog
Development

No branches or pull requests

5 participants