INTPYTHON-676 Add optional signing of cache data #336


Closed · wants to merge 4 commits
Changes from 2 commits
24 changes: 19 additions & 5 deletions django_mongodb_backend/cache.py
@@ -1,27 +1,33 @@
 import pickle
 from datetime import datetime, timezone
+from base64 import b64encode, b64decode

 from django.core.cache.backends.base import DEFAULT_TIMEOUT, BaseCache
 from django.core.cache.backends.db import Options
+from django.core.signing import Signer, BadSignature
 from django.db import connections, router
 from django.utils.functional import cached_property
 from pymongo import ASCENDING, DESCENDING, IndexModel, ReturnDocument
 from pymongo.errors import DuplicateKeyError, OperationFailure


 class MongoSerializer:
-    def __init__(self, protocol=None):
+    def __init__(self, protocol=None, signer=True, salt=None):
         self.protocol = pickle.HIGHEST_PROTOCOL if protocol is None else protocol
+        self.signer = Signer(salt=salt) if signer else None

     def dumps(self, obj):
         # For better incr() and decr() atomicity, don't pickle integers.
         # Using type() rather than isinstance() matches only integers and not
         # subclasses like bool.
         if type(obj) is int:  # noqa: E721
-            return obj
-        return pickle.dumps(obj, self.protocol)
+            return obj if self.signer is None else self.signer.sign(b64encode(obj))
Contributor commented:

This doesn't need to be b64encoded. On line 29/30, you can first unsign and then check whether the unsigned value comes back as an int. If it does, you can just return the int.

+        pickled_data = pickle.dumps(obj, protocol=self.protocol)  # noqa: S301
+        return self.signer.sign(b64encode(pickled_data).decode()) if self.signer else pickled_data

     def loads(self, data):
+        if self.signer is not None:
+            data = b64decode(self.signer.unsign(data))
Contributor suggested a change:

-        if self.signer is not None:
-            data = b64decode(self.signer.unsign(data))
+        if self.signer is not None:
+            if type(unsigned_data := self.signer.unsign(data)) is int:
+                data = unsigned_data
+            else:
+                data = b64decode(unsigned_data)

Author replied:

I considered this when originally writing the function, but 1234 is both a valid int and valid base64, so it would be ambiguous which was actually stored. I fixed the issue by simply not signing int values. I don't believe this introduces any real problems, and it solves the $inc test cases that were failing.
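The ambiguity the author describes is easy to reproduce with the standard library (illustrative sketch by the editor, not part of the PR):

```python
import base64

# "1234" parses cleanly as an int...
assert int("1234") == 1234

# ...and is also a syntactically valid base64 payload, so after unsigning
# there is no reliable way to tell which representation was stored.
decoded = base64.b64decode("1234")
assert len(decoded) == 3  # decodes without error, to 3 arbitrary bytes
```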

         try:
             return int(data)
         except (ValueError, TypeError):
@@ -39,6 +45,8 @@ class CacheEntry:
         _meta = Options(collection_name)

         self.cache_model_class = CacheEntry
+        self._sign_cache = params.get("ENABLE_SIGNING", True)
+        self._salt = params.get("SALT", None)

     def create_indexes(self):
         expires_index = IndexModel("expires_at", expireAfterSeconds=0)
@@ -47,7 +55,7 @@ def create_indexes(self):

     @cached_property
     def serializer(self):
-        return MongoSerializer(self.pickle_protocol)
+        return MongoSerializer(self.pickle_protocol, self._sign_cache, self._salt)

     @property
     def collection_for_read(self):
@@ -84,7 +92,13 @@ def get_many(self, keys, version=None):
         with self.collection_for_read.find(
             {"key": {"$in": tuple(keys_map)}, **self._filter_expired(expired=False)}
         ) as cursor:
-            return {keys_map[row["key"]]: self.serializer.loads(row["value"]) for row in cursor}
+            results = {}
+            for row in cursor:
+                try:
+                    results[keys_map[row["key"]]] = self.serializer.loads(row["value"])
+                except (BadSignature, TypeError):
+                    self.delete(row["key"])
Comment on lines +99 to +100
Contributor commented:

I'm having trouble understanding why we're catching BadSignature and TypeError here. What would trigger them, and why would it be okay to delete a row in these cases?

Author replied:

If the data in the cache collection is tampered with, or the HMAC secret key or salt is changed, a BadSignature error is thrown. I added error handling here to delete the row; otherwise the exception will DOS the page until the cache entry gets culled. TypeError can be thrown if you switch from ENABLE_SIGNING=True to ENABLE_SIGNING=False without clearing all cache entries. It's highly unlikely that will ever happen in prod, but I was able to trigger it while changing settings in debug mode.
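The key-rotation failure mode described above can be sketched with a stdlib HMAC analogue of `django.core.signing.Signer` (editor's illustration only: `sign`/`unsign` below are simplified stand-ins, and Django raises `BadSignature` where this sketch raises `ValueError`):

```python
import hashlib
import hmac

def sign(value: str, key: bytes) -> str:
    # Simplified analogue of Signer.sign(): append an HMAC to the value.
    mac = hmac.new(key, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}:{mac}"

def unsign(signed: str, key: bytes) -> str:
    # Simplified analogue of Signer.unsign(): verify the HMAC or fail.
    value, mac = signed.rsplit(":", 1)
    expected = hmac.new(key, value.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("signature mismatch")  # Django raises BadSignature
    return value

token = sign("cached-value", b"old-secret-key")
assert unsign(token, b"old-secret-key") == "cached-value"

# Rotating the secret key (or changing the salt) invalidates every stored
# entry; this is the situation get_many() handles by deleting the row.
try:
    unsign(token, b"new-secret-key")
except ValueError:
    pass
```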

Collaborator commented:

A better behavior than silently deleting bad data (which could happen in what circumstances besides an attacker putting malicious data in the cache?) would be to raise, or at least log, SuspiciousOperation. There is some precedent in Django for this.

Author replied:

I explained in a comment above the cases where I was able to trigger these two exceptions. If we don't delete the entry, won't we just create a DOS of the affected page until the cache entry is culled?

I agree this probably shouldn't be silent, but throwing an error here will stop the request from being handled, generate a 500, and require the request to be resent. That's probably acceptable if we delete the offending cache entry, so that only one request is affected by the issue. What do you think?

+            return results

     def set(self, key, value, timeout=DEFAULT_TIMEOUT, version=None):
         key = self.make_and_validate_key(key, version=version)
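Taken together, the serializer's round trip can be sketched with stdlib pieces (editor's illustration only: `MongoSerializer` in this PR uses `django.core.signing.Signer`; the HMAC construction, key, and `ValueError` below are simplified stand-ins, while the int fast path mirrors the PR's reason for not signing ints, so MongoDB's `$inc` can still operate on them):

```python
import base64
import hashlib
import hmac
import pickle

KEY = b"stand-in-for-SECRET_KEY"

def dumps(obj):
    # Ints are stored as-is so incr()/decr() can use $inc atomically.
    if type(obj) is int:
        return obj
    payload = base64.b64encode(pickle.dumps(obj)).decode()
    mac = hmac.new(KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{mac}"

def loads(data):
    if isinstance(data, int):
        return data
    payload, mac = data.rsplit(":", 1)
    expected = hmac.new(KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("bad signature")  # stands in for BadSignature
    return pickle.loads(base64.b64decode(payload))

assert loads(dumps({"user": "alice"})) == {"user": "alice"}
assert dumps(42) == 42  # never pickled, never signed
```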
5 changes: 5 additions & 0 deletions docs/source/topics/cache.rst
@@ -32,6 +32,11 @@
 In addition, the cache is culled based on ``CULL_FREQUENCY`` when ``add()``
 or ``set()`` is called, if ``MAX_ENTRIES`` is exceeded. See
 :ref:`django:cache_arguments` for an explanation of these two options.

+By default, cache entries include an HMAC signature to ensure data
+integrity. You can disable this by setting ``ENABLE_SIGNING`` to ``False``.
+Signatures can also incorporate an optional salt by setting ``SALT`` to a
+string value.
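For illustration, the new options would sit alongside the backend entry in `CACHES` (editor's sketch under assumptions: the `ENABLE_SIGNING` and `SALT` keys are from this PR, while the backend path and collection names are assumed examples):

```python
# settings.py (sketch): signing is on by default; shown enabled with a
# custom salt, and alternatively disabled for a second cache alias.
CACHES = {
    "default": {
        "BACKEND": "django_mongodb_backend.cache.MongoDBCache",  # assumed path
        "LOCATION": "my_cache_collection",
        "ENABLE_SIGNING": True,
        "SALT": "cache-entries",  # hypothetical salt string
    },
    "unsigned": {
        "BACKEND": "django_mongodb_backend.cache.MongoDBCache",  # assumed path
        "LOCATION": "unsigned_cache_collection",
        "ENABLE_SIGNING": False,
    },
}
```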

Creating the cache collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
