MongoDB documents with non-ASCII keys break MongoEngine document instantiation #1041

Open
@asiebert

We came across a DynamicEmbeddedDocument in our MongoDB instance that contained some accented keys, and it ended up breaking retrieval of the whole queryset through MongoEngine.

As far as MongoDB's documentation goes, there doesn't seem to be any restriction on field names beyond the . and $ characters, and the mongo shell has no issue dealing with such (non-ASCII or UTF-8) fields.
http://docs.mongodb.org/manual/reference/limits/#Restrictions-on-Field-Names

Here is a minimal example reproducing the issue (mongoengine 0.10.0), assuming a local DB is running:

$ virtualenv venv && venv/bin/pip install mongoengine ipython
$ venv/bin/ipython

# Document ready-to-paste in iPython
from mongoengine import connect, DynamicDocument

class MongoTest(DynamicDocument):
    '''Test main document'''
    meta = {
        'collection': 'mongo_test',
    }

# Connect to the local "test" database (module level, not inside the class body)
connect("test")

# '>' for commands issued in mongo-shell
# ':' for commands issued in iPython

> db.mongo_test.insert({"iamakey_/}+*#!": "value"})
WriteResult({ "nInserted" : 1 })

: MongoTest.objects().all()
[<MongoTest: MongoTest object>]

> db.mongo_test.insert({"iamakey_/}+*#!helloUTFé": "value"})
WriteResult({ "nInserted" : 1 })

: all = MongoTest.objects().all()
: all
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)

# --truncated stack: iPython internals--

...mongoengine_test/venv/local/lib/python2.7/site-packages/mongoengine/queryset/queryset.pyc in __repr__(self)
     56             return '.. queryset mid-iteration ..'
     57 
---> 58         self._populate_cache()
     59         data = self._result_cache[:REPR_OUTPUT_SIZE + 1]
     60         if len(data) > REPR_OUTPUT_SIZE:

...mongoengine_test/venv/local/lib/python2.7/site-packages/mongoengine/queryset/queryset.pyc in _populate_cache(self)
     90             try:
     91                 for i in xrange(ITER_CHUNK_SIZE):
---> 92                     self._result_cache.append(self.next())
     93             except StopIteration:
     94                 self._has_more = False

...mongoengine_test/venv/local/lib/python2.7/site-packages/mongoengine/queryset/base.pyc in next(self)
   1385             return self._get_as_pymongo(raw_doc)
   1386         doc = self._document._from_son(raw_doc,
-> 1387                                        _auto_dereference=self._auto_dereference, only_fields=self.only_fields)
   1388 
   1389         if self._scalar:

...mongoengine_test/venv/local/lib/python2.7/site-packages/mongoengine/base/document.pyc in _from_son(cls, son, _auto_dereference, only_fields, created)
    729             data = dict((k, v)
    730                         for k, v in data.iteritems() if k in cls._fields)
--> 731         obj = cls(__auto_convert=False, _created=created, __only_fields=only_fields, **data)
    732         obj._changed_fields = changed_fields
    733         if not _auto_dereference:

...mongoengine_test/venv/local/lib/python2.7/site-packages/mongoengine/base/document.pyc in __init__(self, *args, **values)
    128             self._dynamic_lock = False
    129             for key, value in dynamic_data.iteritems():
--> 130                 setattr(self, key, value)
    131 
    132         # Flag initialised

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 22: ordinal not in range(128)

: all
[<MongoTest: MongoTest object>]

: len(all)
1

: all.count()
2

> db.mongo_test.find()
{ "_id" : ObjectId("558b51e14a55a1ecdbf608aa"), "iamakey_/}+*#!" : "value" }
{ "_id" : ObjectId("558b52154a55a1ecdbf608ab"), "iamakey_/}+*#!helloUTFé" : "value" }

The last commands show that loading the queryset raises an encoding exception, yet the result cache is still populated, alas without the failing document: len(all) reports 1 while all.count() reports 2.
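
For anyone wanting to locate the offending documents before MongoEngine tries to instantiate them, here is a minimal sketch, not part of MongoEngine. The helper name `non_ascii_keys` is hypothetical. It assumes the raw documents are plain dicts, for example fetched through pymongo directly or via `as_pymongo()` on the queryset, which, judging from the traceback above, returns before the failing `_from_son` call. It also shows that explicitly encoding the key to ASCII fails at the same position (22) as the error in the traceback.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def non_ascii_keys(doc, prefix=""):
    """Return dotted paths of every key in a nested dict/list that is not pure ASCII.

    Hypothetical helper for illustration; `doc` is assumed to be a raw
    document (a plain dict), not a MongoEngine Document instance.
    """
    found = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            path = prefix + key
            try:
                # Explicit version of the ASCII encoding that fails in the traceback
                key.encode("ascii")
            except UnicodeEncodeError:
                found.append(path)
            # Embedded documents can carry non-ASCII keys too
            found.extend(non_ascii_keys(value, path + "."))
    elif isinstance(doc, list):
        for index, item in enumerate(doc):
            found.extend(non_ascii_keys(item, "%s%d." % (prefix, index)))
    return found


# The two documents inserted in the reproduction above (without their _id)
docs = [
    {"iamakey_/}+*#!": "value"},
    {"iamakey_/}+*#!helloUTFé": "value"},
]
flagged = [non_ascii_keys(d) for d in docs]

# Encoding the second key to ASCII fails on the 'é'
try:
    "iamakey_/}+*#!helloUTFé".encode("ascii")
    position = None
except UnicodeEncodeError as exc:
    position = exc.start
```

With the two documents from the reproduction, only the second one is flagged, which matches the single document missing from the populated cache.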