Skip to content

Commit fe3295d

Browse files
showelltimabbott
authored andcommitted
performance: Avoid monster query for existing subs.
Part of our codepath for subscribing users involves fetching the users' existing subscriptions to make sure we can do things like properly report to the clients that the users were already subscribed. This codepath used to be coupled to code that helped users maintain unique stream colors. Suppose you are creating a new stream, and you are importing users from an older stream with 15k subscribers, and each of your users is subscribed to about 20 streams. The prior code, instead of filtering on recipient_id, would literally look at every subscription for every user, which was kind of crazy if you didn't understand the pick-stream-color complications. Before this commit, we would fetch 300k rows with 15 columns each (granted, all but one of the columns are bool/int). That's a total of 4.5 million tiny objects that we had to glom into Django ORM objects and slice and dice. After this commit, we would fetch exactly zero rows for the are-they-already-subscribed logic. Yes, ZERO. If we were to mistakenly try to re-add the same 15k subscribers to the new stream (under the new code), we will now fetch 15k Sub rows instead of 300k. It is worth looking at the prior commit. We go through great pains to ensure that users get new stream colors when we invite them to a stream, and we still fetch a bunch of data for that. Instead of 4.5 million cells, it's more like 600k cells (2 columns per row), and it's less than that insofar as some users may only have 24 distinct colors among their many streams. It's a lot of work.
1 parent f638fd6 commit fe3295d

File tree

2 files changed

+8
-11
lines changed

2 files changed

+8
-11
lines changed

zerver/lib/actions.py

Lines changed: 8 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -135,7 +135,6 @@
135135
get_active_subscriptions_for_stream_id,
136136
get_bulk_stream_subscriber_info,
137137
get_stream_subscriptions_for_user,
138-
get_stream_subscriptions_for_users,
139138
get_subscribed_stream_ids_for_user,
140139
get_subscriptions_for_send_message,
141140
get_used_colors_for_user_ids,
@@ -3856,13 +3855,19 @@ def bulk_add_subscriptions(
38563855
for user in users:
38573856
assert user.realm_id == realm.id
38583857

3858+
recipient_ids = [stream.recipient_id for stream in streams]
38593859
recipient_id_to_stream = {stream.recipient_id: stream for stream in streams}
38603860

38613861
used_colors_for_user_ids: Dict[int, Set[str]] = get_used_colors_for_user_ids(user_ids)
38623862

3863+
existing_subs = Subscription.objects.filter(
3864+
user_profile_id__in=user_ids,
3865+
recipient__type=Recipient.STREAM,
3866+
recipient_id__in=recipient_ids,
3867+
)
3868+
38633869
subs_by_user: Dict[int, List[Subscription]] = defaultdict(list)
3864-
all_subs_query = get_stream_subscriptions_for_users(users)
3865-
for sub in all_subs_query:
3870+
for sub in existing_subs:
38663871
subs_by_user[sub.user_profile_id].append(sub)
38673872

38683873
already_subscribed: List[SubInfo] = []

zerver/lib/stream_subscription.py

Lines changed: 0 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -74,14 +74,6 @@ def get_stream_subscriptions_for_user(user_profile: UserProfile) -> QuerySet:
7474
)
7575

7676

77-
def get_stream_subscriptions_for_users(user_profiles: List[UserProfile]) -> QuerySet:
78-
# TODO: Change return type to QuerySet[Subscription]
79-
return Subscription.objects.filter(
80-
user_profile__in=user_profiles,
81-
recipient__type=Recipient.STREAM,
82-
)
83-
84-
8577
def get_used_colors_for_user_ids(user_ids: List[int]) -> Dict[int, Set[str]]:
8678
"""Fetch which stream colors have already been used for each user in
8779
user_ids. Uses an optimized query designed to support picking

0 commit comments

Comments
 (0)