performance: Avoid monster query for existing subs.

showell · timabbott · commit fe3295d395cc · 2021-12-28T12:15:02.000-08:00
Part of our codepath for subscribing users involves
fetching the users' existing subscriptions to make sure
we can do things like properly report to the clients
that the users were already subscribed.  This codepath
used to be coupled to code that helped users maintain
unique stream colors.

Suppose you are creating a new stream, and you are
importing users from an older stream with 15k
subscribers, and each of your users is subscribed to
about 20 streams.

The prior code, instead of filtering on recipient_id,
would literally look at every subscription for every
user, which was kind of crazy if you didn't understand
the pick-stream-color complications.

Before this commit, we would fetch 300k rows with 15
columns each (granted, all but one of the columns are
bool/int). That's a total of 4.5 million tiny objects
that we had to glom into Django ORM objects and slice
and dice.

After this commit, we would fetch exactly zero rows
for the are-they-already-subscribed logic.

Yes, ZERO.

If we were to mistakenly try to re-add the same 15k
subscribers to the new stream (under the new code), we
will now fetch 15k Sub rows instead of 300k.

It is worth looking at the prior commit. We go through
great pains to ensure that users get new stream colors
when we invite them to a stream, and we still fetch a
bunch of data for that. Instead of 4.5 million cells,
it's more like 600k cells (2 columns per row), and it's
less than that insofar as some users may only
have 24 distinct colors among their many streams.
It's a lot of work.
diff --git a/zerver/lib/actions.py b/zerver/lib/actions.py
@@ -135,7 +135,6 @@
     get_active_subscriptions_for_stream_id,
     get_bulk_stream_subscriber_info,
     get_stream_subscriptions_for_user,
-    get_stream_subscriptions_for_users,
     get_subscribed_stream_ids_for_user,
     get_subscriptions_for_send_message,
     get_used_colors_for_user_ids,
@@ -3856,13 +3855,19 @@ def bulk_add_subscriptions(
     for user in users:
         assert user.realm_id == realm.id
 
+    recipient_ids = [stream.recipient_id for stream in streams]
     recipient_id_to_stream = {stream.recipient_id: stream for stream in streams}
 
     used_colors_for_user_ids: Dict[int, Set[str]] = get_used_colors_for_user_ids(user_ids)
 
+    existing_subs = Subscription.objects.filter(
+        user_profile_id__in=user_ids,
+        recipient__type=Recipient.STREAM,
+        recipient_id__in=recipient_ids,
+    )
+
     subs_by_user: Dict[int, List[Subscription]] = defaultdict(list)
-    all_subs_query = get_stream_subscriptions_for_users(users)
-    for sub in all_subs_query:
+    for sub in existing_subs:
         subs_by_user[sub.user_profile_id].append(sub)
 
     already_subscribed: List[SubInfo] = []
diff --git a/zerver/lib/stream_subscription.py b/zerver/lib/stream_subscription.py
@@ -74,14 +74,6 @@ def get_stream_subscriptions_for_user(user_profile: UserProfile) -> QuerySet:
     )
 
 
-def get_stream_subscriptions_for_users(user_profiles: List[UserProfile]) -> QuerySet:
-    # TODO: Change return type to QuerySet[Subscription]
-    return Subscription.objects.filter(
-        user_profile__in=user_profiles,
-        recipient__type=Recipient.STREAM,
-    )
-
-
 def get_used_colors_for_user_ids(user_ids: List[int]) -> Dict[int, Set[str]]:
     """Fetch which stream colors have already been used for each user in
     user_ids. Uses an optimized query designed to support picking