-
Notifications
You must be signed in to change notification settings - Fork 25
INTPYTHON-676: Adding security and optimization to cache collections #343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…nd loading with greater security.
Thanks! OK to close #336 ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some consensus on the Django forum thread that it could be useful to have pluggable serializers (i.e. CACHES["alias"]["SERIALIZER"] = "path.to.Serializer"
). If feasible, this seems like it would lead to a better separation of concerns. Currently the MongoSerializer
andMongoDBCache
classes seem rather intertwined. That would mean only one additional CACHES
configuration value, with the other values like key and salt set on the custom serializer.
I guess I can understand that django.utils.crypto
was insufficient for this task. Do you think we could propose some API modernization there? Even if we can't use some Django API right away for th is, I'd be more comfortable in the long run if we weren't "rolling our own crypto" in a way that doesn't get the eyes of Django's security team. (Basically, if Django adopts this sort of thing too, then this project doesn't look so strange.)
Cache entries include a HMAC signature to ensure data integrity by default. | ||
You can disable this by setting ``ENABLE_SIGNING`` to ``False``. | ||
Signatures can also include an optional key and salt parameter by setting | ||
``KEY`` and ``SALT`` repectively. Signatures are provided by the Blake2 hash | ||
function, making Key sizes limited to 64 bytes, and salt sizes limited to 16 | ||
bytes. If a key is not provided, cache entries will be signed using the | ||
``SECRET_KEY``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's important to be explicit about what "data integrity" means. (i.e. If your database is compromised, it can lead to RCE.)
- It should also be made clear that this behavior differs from Django's built-in database backends.
- It would be helpful to give some guidance about the performance implications.
|
||
def loads(self, data:Any, pickled:bool, signature=None) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest renaming pickled -> is_pickled.
if type(obj) is int: # noqa: E721 | ||
return obj | ||
return pickle.dumps(obj, self.protocol) | ||
def _get_signature(self, data) -> Optional[bytes]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently don't have any type hints in this project (Django hasn't adopted them), so it's a bit out of place to add them in this PR.
@@ -32,6 +32,25 @@ In addition, the cache is culled based on ``CULL_FREQUENCY`` when ``add()`` | |||
or ``set()`` is called, if ``MAX_ENTRIES`` is exceeded. See | |||
:ref:`django:cache_arguments` for an explanation of these two options. | |||
|
|||
Cache entries include a HMAC signature to ensure data integrity by default. | |||
You can disable this by setting ``ENABLE_SIGNING`` to ``False``. | |||
Signatures can also include an optional key and salt parameter by setting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the use case for custom key and salt? Probably it should be explained that SECRET_KEY is the default.
if pickled: | ||
try: | ||
if self.signer is not None: | ||
# constant time compare is not required due to how data is retrieved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm familiar with the usage of django.utils.crypto.constant_time_compare()
but I'm not sure what "how data is retrieved" means in this context.
"pickled": pickled, | ||
"signature": signature, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's not good to add "pickled" and "signature" keys to all cache data when signing is disabled.
case int() | str() | bytes(): | ||
return (obj, False, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is what you refer to as "optimization". I think any optimization should be done separate from security hardening to make that change more clear. However, I wonder if the benefit of not serializing str/bytes is mostly for the signing case. While it helps with CPU, it adds extra the "pickled" attribute for all keys in the cache. It might be better to limit this optimization a new MongoSignedSerializer
class and leave MongoSerializer
unchanged. (As our code strays more and more from tried and tested patterns in Django, I feel less confident in its robustness. In this case, the MongoSerializer
is copied from django.core.cache.backends.redis.RedisSerializer
. We have to ask why Django didn't make this decision and whether it is really better.)
Add HMAC signing of pickled cache data. This implementation uses Blake2b from hashlib to avoid introducing new 3rd party dependencies.
HMAC introduces some overhead to performance, but for cache entries less then 32kb the impact is less then 100ns. For cache entries larger then 1MB, signing can introduce up to 2ms of latency. For BSON serializable types (
int
,str
,bytes
), pickling and signing are skipped, and the values are stored in the cache collection directly.The feature is easily disabled by setting "ENABLE_SIGNING" = False within the CACHE configuration.
Introduced three new cache config options:
If the cache fails to validate a signature
SuspiciousOperation
will be thrown.