Skip to content

INTPYTHON-676: Adding security and optimization to cache collections #343

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

linted
Copy link

@linted linted commented Jul 17, 2025

Add HMAC signing of pickled cache data. This implementation uses Blake2b from hashlib to avoid introducing new 3rd party dependencies.

HMAC introduces some overhead to performance, but for cache entries less then 32kb the impact is less then 100ns. For cache entries larger then 1MB, signing can introduce up to 2ms of latency. For BSON serializable types (int, str, bytes), pickling and signing are skipped, and the values are stored in the cache collection directly.

The feature is easily disabled by setting "ENABLE_SIGNING" = False within the CACHE configuration.

Introduced three new cache config options:

  • ENABLE_SIGNING - boolean value to turn HMAC signing on or off. Defaults to True (on)
  • SALT - optional 16 character string to salt HMAC signatures with
  • KEY - optional string to use when performing HMAC operations. This can be up to 64 charcers, and if not provided, SECRET_KEY is used instead.

If the cache fails to validate a signature SuspiciousOperation will be thrown.

@aclark4life
Copy link
Collaborator

Thanks! OK to close #336 ?

Copy link
Collaborator

@timgraham timgraham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some consensus on the Django forum thread that it could be useful to have pluggable serializers (i.e. CACHES["alias"]["SERIALIZER"] = "path.to.Serializer"). If feasible, this seems like it would lead to a better separation of concerns. Currently the MongoSerializer andMongoDBCache classes seem rather intertwined. That would mean only one additional CACHES configuration value, with the other values like key and salt set on the custom serializer.

I guess I can understand that django.utils.crypto was insufficient for this task. Do you think we could propose some API modernization there? Even if we can't use some Django API right away for th is, I'd be more comfortable in the long run if we weren't "rolling our own crypto" in a way that doesn't get the eyes of Django's security team. (Basically, if Django adopts this sort of thing too, then this project doesn't look so strange.)

Comment on lines +35 to +41
Cache entries include a HMAC signature to ensure data integrity by default.
You can disable this by setting ``ENABLE_SIGNING`` to ``False``.
Signatures can also include an optional key and salt parameter by setting
``KEY`` and ``SALT`` repectively. Signatures are provided by the Blake2 hash
function, making Key sizes limited to 64 bytes, and salt sizes limited to 16
bytes. If a key is not provided, cache entries will be signed using the
``SECRET_KEY``.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important to be explicit about what "data integrity" means. (i.e. If your database is compromised, it can lead to RCE.)

  • It should also be made clear that this behavior differs from Django's built-in database backends.
  • It would be helpful to give some guidance about the performance implications.


def loads(self, data:Any, pickled:bool, signature=None) -> Any:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest renaming pickled -> is_pickled.

if type(obj) is int: # noqa: E721
return obj
return pickle.dumps(obj, self.protocol)
def _get_signature(self, data) -> Optional[bytes]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We currently don't have any type hints in this project (Django hasn't adopted them), so it's a bit out of place to add them in this PR.

@@ -32,6 +32,25 @@ In addition, the cache is culled based on ``CULL_FREQUENCY`` when ``add()``
or ``set()`` is called, if ``MAX_ENTRIES`` is exceeded. See
:ref:`django:cache_arguments` for an explanation of these two options.

Cache entries include a HMAC signature to ensure data integrity by default.
You can disable this by setting ``ENABLE_SIGNING`` to ``False``.
Signatures can also include an optional key and salt parameter by setting
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use case for custom key and salt? Probably it should be explained that SECRET_KEY is the default.

if pickled:
try:
if self.signer is not None:
# constant time compare is not required due to how data is retrieved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm familiar with the usage of django.utils.crypto.constant_time_compare() but I'm not sure what "how data is retrieved" means in this context.

Comment on lines +151 to +152
"pickled": pickled,
"signature": signature,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's not good to add "pickled" and "signature" keys to all cache data when signing is disabled.

Comment on lines +35 to +36
case int() | str() | bytes():
return (obj, False, None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is what you refer to as "optimization". I think any optimization should be done separate from security hardening to make that change more clear. However, I wonder if the benefit of not serializing str/bytes is mostly for the signing case. While it helps with CPU, it adds extra the "pickled" attribute for all keys in the cache. It might be better to limit this optimization a new MongoSignedSerializer class and leave MongoSerializer unchanged. (As our code strays more and more from tried and tested patterns in Django, I feel less confident in its robustness. In this case, the MongoSerializer is copied from django.core.cache.backends.redis.RedisSerializer. We have to ask why Django didn't make this decision and whether it is really better.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants