|  | 
| 1 |  | -# Query Objects | 
|  | 1 | +# Fetch | 
| 2 | 2 | 
 | 
| 3 |  | -**Data queries** retrieve data from the database. A data query is performed with the | 
| 4 |  | -  help of a **query object**, which is a symbolic representation of the query that does | 
| 5 |  | -  not in itself contain any actual data. The simplest query object is an instance of | 
| 6 |  | -  a **table class**, representing the contents of an entire table. | 
|  | 3 | +Data queries in DataJoint comprise two distinct steps: | 
| 7 | 4 | 
 | 
| 8 |  | -## Querying a database | 
|  | 5 | +1. Construct the `query` object to represent the required data using tables and  | 
|  | 6 | +[operators](operators.md). | 
|  | 7 | +2. Fetch the data from `query` into the workspace of the host language -- described in  | 
|  | 8 | +this section. | 
| 9 | 9 | 
 | 
| 10 |  | -For example, if given a `Session` table, you can | 
| 11 |  | -create a query object to retrieve its entire contents as follows: | 
|  | 10 | +Note that entities returned by `fetch` methods are not guaranteed to be sorted in any  | 
|  | 11 | +particular order unless an order is explicitly requested with `order_by`. | 
|  | 12 | +Furthermore, the order is not guaranteed to be the same in any two queries, and the  | 
|  | 13 | +contents of two identical queries may change between two sequential invocations unless  | 
|  | 14 | +they are wrapped in a transaction. | 
|  | 15 | +Therefore, if you wish to fetch matching pairs of attributes, do so in one `fetch` call. | 
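To see why matching attributes should come from a single call, consider a sketch that simulates unordered fetches with plain Python lists. The `ROWS` data and the shuffling are hypothetical stand-ins for a query result whose row order the database does not guarantee; this is not DataJoint code.

```python
import random

# Hypothetical stand-in for a query result; the database guarantees no row order.
ROWS = [("ana", 2019), ("ben", 2020), ("cy", 2021)]


def simulated_fetch(seed):
    """Return all rows in an arbitrary order, as an unordered query might."""
    result = list(ROWS)
    random.Random(seed).shuffle(result)
    return result


# Risky: each call may order rows differently, so names[i] and years[i]
# are not guaranteed to describe the same entity.
names = [name for name, _ in simulated_fetch(seed=1)]
years = [year for _, year in simulated_fetch(seed=2)]

# Safe: a single call yields both attributes from the same rows.
single = simulated_fetch(seed=3)
names_ok = [name for name, _ in single]
years_ok = [year for _, year in single]
pairs_ok = list(zip(names_ok, years_ok))
```

In DataJoint terms, the safe pattern corresponds to `name, year = query.fetch('name', 'year')` rather than two separate `fetch` calls.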
|  | 16 | + | 
|  | 17 | +The examples below are based on the [example schema](example-schema.md) for this part  | 
|  | 18 | +of the documentation. | 
|  | 19 | + | 
|  | 20 | +## Entire table | 
|  | 21 | + | 
|  | 22 | +The following statement retrieves the entire table as a NumPy  | 
|  | 23 | +[recarray](https://docs.scipy.org/doc/numpy/reference/generated/numpy.recarray.html). | 
| 12 | 24 | 
 | 
| 13 | 25 | ```python | 
| 14 |  | -query  = Session() | 
|  | 26 | +data = query.fetch() | 
| 15 | 27 | ``` | 
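The record array returned by `fetch()` supports access to fields by name. The sketch below builds a small recarray by hand with `numpy.rec.fromrecords` as a stand-in for a real fetch result; the field names `id`, `name`, and `year` are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for `data = query.fetch()`: a NumPy record array.
data = np.rec.fromrecords(
    [(1, "ana", 2019), (2, "ben", 2020)],
    names="id,name,year",
)

field_names = data.dtype.names  # attribute names of the result
first = data[0]                 # one entity as a record
names = data.name               # one attribute across all entities (attribute access)
years = data["year"]            # the same kind of access, using item syntax
```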
| 16 | 28 | 
 | 
| 17 |  | -More generally, a query object may be formed as a **query expression** | 
| 18 |  | -constructed by applying [operators](./operators.md) to other query objects. | 
| 19 |  | - | 
| 20 |  | -For example, the following query retrieves information about all | 
| 21 |  | -experiments and scans for mouse 001: | 
|  | 29 | +To retrieve the data as a list of `dict`: | 
| 22 | 30 | 
 | 
| 23 | 31 | ```python | 
| 24 |  | -query = Session * Scan & 'animal_id = 001' | 
|  | 32 | +data = query.fetch(as_dict=True) | 
| 25 | 33 | ``` | 
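Conceptually, the `as_dict=True` form carries the same information as the record array, arranged as one `dict` per entity keyed by attribute name. The sketch below shows that correspondence on a hand-built recarray; it is an illustration, not how DataJoint implements `as_dict`.

```python
import numpy as np

# Hypothetical stand-in for `query.fetch()`.
data = np.rec.fromrecords([(1, "ana"), (2, "ben")], names="id,name")

# tolist() turns each record into a tuple of plain Python values; zipping with
# the field names gives one dict per entity, similar in shape to the result of
# `query.fetch(as_dict=True)`.
as_dicts = [dict(zip(data.dtype.names, values)) for values in data.tolist()]
```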
| 26 | 34 | 
 | 
| 27 |  | -Note that for brevity, query operators can be applied directly to class, as | 
| 28 |  | -`Session` instead of `Session()`. | 
|  | 35 | +In some cases, the amount of data returned by fetch can be quite large; in these cases  | 
|  | 36 | +it can be useful to check the `size_on_disk` attribute to decide whether running a bare  | 
|  | 37 | +fetch would be wise. | 
|  | 38 | +Note that, at this time, it is only possible to query the size of entire tables  | 
|  | 39 | +stored directly in the database. | 
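One way to act on this advice is to check `size_on_disk` (a size in bytes) before fetching. The helper `fetch_if_small` and its threshold are hypothetical, and the sketch uses a stand-in class instead of a real DataJoint table so that it runs without a database.

```python
class FakeTable:
    """Hypothetical stand-in exposing the two members the helper relies on."""

    def __init__(self, size_on_disk, rows):
        self.size_on_disk = size_on_disk  # bytes, as assumed for a real table
        self._rows = rows

    def fetch(self):
        return list(self._rows)


def fetch_if_small(table, max_bytes=100 * 1024**2):
    """Fetch the whole table only if its on-disk size is at most max_bytes."""
    size = table.size_on_disk
    if size > max_bytes:
        raise ValueError(
            f"table occupies {size} bytes on disk; restrict the query before fetching"
        )
    return table.fetch()
```

With a real table, the same helper would be called as `fetch_if_small(tab)`; restricting the query first is the usual way to bring a large result down to size.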
|  | 40 | + | 
|  | 41 | +## As separate variables | 
| 29 | 42 | 
 | 
| 30 |  | -Alternatively, we could query all scans with a sample rate over 1000, and preview the | 
| 31 |  | -contents of the query simply displaying the object.  | 
|  | 43 | +```python | 
|  | 44 | +name, img = query.fetch1('name', 'image')  # when query has exactly one entity | 
|  | 45 | +name, img = query.fetch('name', 'image')  # [name, ...] [image, ...] | 
|  | 46 | +``` | 
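The difference between the two calls is that `fetch` returns one sequence per requested attribute, aligned entity by entity, while `fetch1` returns single values and is meant for a query with exactly one entity. The plain-Python sketch below mimics those semantics on a list of dicts, assuming that `fetch1` rejects queries with more or fewer than one entity; `columns` and `columns1` are hypothetical helpers, not DataJoint functions.

```python
def columns(rows, *attrs):
    """Like fetch('a', 'b'): one list per attribute, aligned by entity."""
    return tuple([row[a] for row in rows] for a in attrs)


def columns1(rows, *attrs):
    """Like fetch1('a', 'b'): single values, requiring exactly one entity."""
    if len(rows) != 1:
        raise ValueError(f"expected exactly one entity, found {len(rows)}")
    return tuple(rows[0][a] for a in attrs)


# Hypothetical entities with a `name` and a small binary `image`.
rows = [{"name": "ana", "image": b"\x00"}, {"name": "ben", "image": b"\x01"}]
names, images = columns(rows, "name", "image")
name, image = columns1(rows[:1], "name", "image")
```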
|  | 47 | + | 
|  | 48 | +## Primary key values | 
| 32 | 49 | 
 | 
| 33 | 50 | ```python | 
| 34 |  | -Scan & 'sample_rate > 1000' | 
|  | 51 | +keydict = tab.fetch1("KEY")  # single key dict when tab has exactly one entity | 
|  | 52 | +keylist = tab.fetch("KEY")  # list of key dictionaries [{}, ...] | 
| 35 | 53 | ``` | 
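Each key dictionary holds only the primary key attributes of an entity. Assuming a hypothetical table whose primary key is `(subject_id, session)`, the projection can be sketched in plain Python as follows; DataJoint derives the primary key from the table definition, so `PRIMARY_KEY` here is a stand-in.

```python
# Hypothetical primary key and entities; not taken from the example schema.
PRIMARY_KEY = ("subject_id", "session")
rows = [
    {"subject_id": 1, "session": 1, "duration": 30.5},
    {"subject_id": 1, "session": 2, "duration": 41.0},
]

# Analogous to tab.fetch("KEY"): one dict of primary key values per entity.
keylist = [{attr: row[attr] for attr in PRIMARY_KEY} for row in rows]

# Analogous to tab.fetch1("KEY") for a result containing only the first entity.
keydict = {attr: rows[0][attr] for attr in PRIMARY_KEY}
```

A dict such as `keydict` can then restrict another query (e.g. `tab & keydict`), which is a common reason to fetch keys in the first place.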
| 36 | 54 | 
 | 
| 37 |  | -The above command shows the following table: | 
| 38 |  | - | 
| 39 |  | -```text | 
| 40 |  | -| id* |    start_time*      | sample_rate | signal |  times | duration | | 
| 41 |  | -|-----|---------------------|-------------|--------|--------|----------|  | 
| 42 |  | -|  1  | 2020-01-02 22:15:00 |   1893.00   | =BLOB= | =BLOB= |  1981.29 | | 
| 43 |  | -|  2  | 2020-01-03 00:15:00 |   4800.00   | =BLOB= | =BLOB= |   548.0  | | 
| 44 |  | -|  3  | 2020-01-19 14:03:03 |   4800.00   | =BLOB= | =BLOB= |   336.0  | | 
| 45 |  | -|  4  | 2020-01-19 14:13:03 |   4800.00   | =BLOB= | =BLOB= |  2501.0  | | 
| 46 |  | -|  5  | 2020-01-23 11:05:23 |   4800.00   | =BLOB= | =BLOB= |  1800.0  | | 
| 47 |  | -|  6  | 2020-01-27 14:03:03 |   4800.00   | =BLOB= | =BLOB= |   600.0  | | 
| 48 |  | -|  7  | 2020-01-31 20:15:00 |   4800.00   | =BLOB= | =BLOB= |   600.0  | | 
| 49 |  | -... | 
| 50 |  | -11 tuples | 
|  | 55 | +`KEY` can also be used when returning attribute values as separate variables, in which  | 
|  | 56 | +case one of the returned variables contains the primary key of every returned entity. | 
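For instance, `keys, names = query.fetch('KEY', 'name')` (assuming the query has a `name` attribute) returns a list of key dictionaries alongside the names. Because dictionaries are not hashable, building a lookup from key to value needs a hashable form of each key; the sketch below uses sorted item tuples, with hypothetical stand-in data in place of a real fetch.

```python
# Hypothetical stand-ins for: keys, names = query.fetch("KEY", "name")
keys = [{"subject_id": 1, "session": 1}, {"subject_id": 1, "session": 2}]
names = ["baseline", "follow-up"]


def key_to_tuple(key):
    """Turn a key dict into a hashable tuple with a stable attribute order."""
    return tuple(sorted(key.items()))


# The two sequences come from one fetch call, so they are aligned entity by entity.
name_by_key = {key_to_tuple(k): n for k, n in zip(keys, names)}
```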
|  | 57 | + | 
|  | 58 | +## Sorting and limiting the results | 
|  | 59 | + | 
|  | 60 | +To sort the result, use the `order_by` keyword argument. | 
|  | 61 | + | 
|  | 62 | +```python | 
|  | 63 | +# ascending order: | 
|  | 64 | +data = query.fetch(order_by='name') | 
|  | 65 | +# descending order: | 
|  | 66 | +data = query.fetch(order_by='name desc')   | 
|  | 67 | +# by name first, year second: | 
|  | 68 | +data = query.fetch(order_by=('name desc', 'year')) | 
|  | 69 | +# sort by the primary key: | 
|  | 70 | +data = query.fetch(order_by='KEY') | 
|  | 71 | +# sort by name but for same names order by primary key: | 
|  | 72 | +data = query.fetch(order_by=('name', 'KEY desc')) | 
| 51 | 73 | ``` | 
| 52 | 74 | 
 | 
| 53 |  | -Note that this preview (a) only lists a few of the entities that will be returned and  | 
| 54 |  | -(b) does not contain any data for attributes of datatype `blob`. | 
|  | 75 | +The `order_by` argument can be a string specifying the attribute to sort by. By default  | 
|  | 76 | +the sort is in ascending order. Use `'attr desc'` to sort in descending order by  | 
|  | 77 | +attribute `attr`. The value can also be a sequence of strings, in which case the sort is  | 
|  | 78 | +performed on all the attributes jointly, in the order specified. | 
|  | 79 | + | 
|  | 80 | +The special attribute name `'KEY'` represents the primary key attributes in the order that  | 
|  | 81 | +they appear in the index. Apart from that, `'KEY'` can be used like any other attribute name in `order_by`, including with `desc`. | 
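The rules above (a string or a sequence of strings, an optional `desc`, and `'KEY'` standing for the primary key) can be sketched as a small function that assembles an `ORDER BY` clause. This is an illustration under assumptions: `order_by_clause` is hypothetical, the primary key is passed in explicitly, and applying `DESC` to every primary key attribute for `'KEY desc'` is a reading of the examples above rather than something this page states.

```python
import re


def order_by_clause(order_by, primary_key):
    """Build an ORDER BY clause from an order_by value (hypothetical helper)."""
    if isinstance(order_by, str):
        order_by = [order_by]  # a single string sorts by one item
    terms = []
    for item in order_by:
        match = re.fullmatch(r"KEY(\s+(?:asc|desc|ASC|DESC))?", item.strip())
        if match:
            # 'KEY' expands to the primary key attributes in index order.
            direction = match.group(1)
            suffix = " " + direction.strip().upper() if direction else ""
            terms.extend(attr + suffix for attr in primary_key)
        else:
            terms.append(item)  # passed through verbatim, e.g. 'name desc'
    return "ORDER BY " + ", ".join(terms)
```

For example, `order_by_clause(('name', 'KEY desc'), ('subject_id', 'session'))` gives `ORDER BY name, subject_id DESC, session DESC`.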
| 55 | 82 | 
 | 
| 56 |  | -Once the desired query object is formed, the query can be executed using its [fetch] | 
| 57 |  | -(./fetch) methods. To **fetch** means to transfer the data represented by the query | 
| 58 |  | -object from the database server into the workspace of the host language. | 
|  | 83 | +If an attribute name happens to be a SQL reserved word, it must be enclosed in  | 
|  | 84 | +backquotes. For example: | 
| 59 | 85 | 
 | 
| 60 | 86 | ```python | 
| 61 |  | -query = Scan & 'sample_rate > 1000' | 
| 62 |  | -s = query.fetch() | 
|  | 87 | +data = query.fetch(order_by='`select` desc') | 
| 63 | 88 | ``` | 
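In MySQL, backquotes mark an identifier, so a reserved word such as `select` can be used as a column name; a backquote inside the name itself is written twice. A hypothetical helper that applies this rule when building an `order_by` string might look like this:

```python
def quote_identifier(name):
    """Wrap a MySQL identifier in backquotes, doubling any embedded backquote."""
    return "`" + name.replace("`", "``") + "`"


# Builds the same string as the example above: '`select` desc'
order_by = quote_identifier("select") + " desc"
```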
| 64 | 89 | 
 | 
| 65 |  | -Here fetching from the `query` object produces the NumPy record array | 
| 66 |  | -`s` of the queried data. | 
|  | 90 | +The `order_by` value is eventually passed to the `ORDER BY`  | 
|  | 91 | +[clause](https://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html). | 
|  | 92 | + | 
|  | 93 | +Similarly, the `limit` and `offset` arguments can be used to limit the result to a  | 
|  | 94 | +subset of entities. | 
|  | 95 | + | 
|  | 96 | +For example, one could do the following: | 
| 67 | 97 | 
 | 
| 68 |  | -## Checking for entities | 
|  | 98 | +```python | 
|  | 99 | +data = query.fetch(order_by='name', limit=10, offset=5) | 
|  | 100 | +``` | 
| 69 | 101 | 
 | 
| 70 |  | -The preview of the query object shown above displayed only a few of the entities | 
| 71 |  | -returned by the query but also displayed the total number of entities that would be | 
| 72 |  | -returned. It can be useful to know the number of entities returned by a query, or even | 
| 73 |  | -whether a query will return any entities at all, without having to fetch all the data | 
| 74 |  | -themselves. | 
|  | 102 | +Note that an `offset` cannot be used without specifying a `limit` as well.  | 
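Because results come back in no guaranteed order unless one is requested, paging through a large query with `limit` and `offset` is only reliable together with `order_by`. The loop below sketches that pattern; `iter_pages` is hypothetical, and `fake_fetch` stands in for a `query.fetch` call so that the sketch runs without a database.

```python
def iter_pages(fetch, page_size, order_by="KEY"):
    """Yield successive pages of results using limit/offset (hypothetical helper).

    `fetch` is any callable accepting order_by, limit, and offset keywords,
    such as a query's fetch method. Ordering by the primary key gives every
    entity a unique position, so pages neither overlap nor skip entities,
    as long as the data do not change between calls.
    """
    offset = 0
    while True:
        page = fetch(order_by=order_by, limit=page_size, offset=offset)
        if len(page) == 0:
            return
        yield page
        offset += page_size


def fake_fetch(order_by, limit, offset):
    """Stand-in for query.fetch over 7 entities with primary key values 0..6."""
    entities = sorted(range(7))  # pretend this is the ORDER BY result
    return entities[offset:offset + limit]
```

Ordering by a non-unique attribute such as `name` alone would leave ties in arbitrary order, which is why the sketch defaults to `'KEY'`.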
| 75 | 103 | 
 | 
| 76 |  | -The `bool` function applied to a query object evaluates to `True` if the | 
| 77 |  | -query returns any entities and to `False` if the query result is empty. | 
|  | 104 | +## Usage with Pandas | 
| 78 | 105 | 
 | 
| 79 |  | -The `len` function applied to a query object determines the number of | 
| 80 |  | -entities returned by the query. | 
|  | 106 | +The [pandas library](http://pandas.pydata.org/) is a popular library for data analysis  | 
|  | 107 | +in Python which can easily be used with DataJoint query results. | 
|  | 108 | +Since the records returned by `fetch()` are contained within a `numpy.recarray`, they  | 
|  | 109 | +can be easily converted to `pandas.DataFrame` objects by passing them into the  | 
|  | 110 | +`pandas.DataFrame` constructor. | 
|  | 111 | +For example: | 
| 81 | 112 | 
 | 
| 82 | 113 | ```python | 
| 83 |  | -# number of sessions since the start of 2018. | 
| 84 |  | -n = len(Session & 'session_date >= "2018-01-01"') | 
|  | 114 | +import pandas as pd | 
|  | 115 | +frame = pd.DataFrame(tab.fetch()) | 
| 85 | 116 | ``` | 
| 86 | 117 | 
 | 
| 87 |  | -## Normalization in queries | 
|  | 118 | +Calling `fetch()` with the argument `format="frame"` returns results as  | 
|  | 119 | +`pandas.DataFrame` objects indexed by the table's primary key attributes. | 
|  | 120 | + | 
|  | 121 | +```python | 
|  | 122 | +frame = tab.fetch(format="frame") | 
|  | 123 | +``` | 
| 88 | 124 | 
 | 
| 89 |  | -Query objects adhere to entity [entity normalization](../design/normalization). The result of a | 
| 90 |  | -query will include the uniquely defining attributes jointly distinguish any two | 
| 91 |  | -entities from each other. The query [operators](./operators) are designed to keep the | 
| 92 |  | -result normalized even in complex query expressions. | 
|  | 125 | +Returning results as a `DataFrame` is not possible when fetching a particular subset of  | 
|  | 126 | +attributes or when `as_dict` is set to `True`. | 
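When only a subset of attributes is needed as a `DataFrame`, one workaround is to fetch the attributes as separate variables, build the frame yourself, and set the index to the primary key attributes so that it resembles the `format="frame"` layout. The sketch assumes pandas is installed, that `subject_id` is the primary key, and uses hypothetical stand-in data in place of a real fetch.

```python
import pandas as pd

# Hypothetical stand-ins for: subject_ids, names = query.fetch("subject_id", "name")
subject_ids = [1, 2, 3]
names = ["ana", "ben", "cy"]

# Build the frame from the separate columns, then index it by the primary key.
frame = pd.DataFrame({"subject_id": subject_ids, "name": names}).set_index("subject_id")
```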