|
1 |
| -# Query Objects |
| 1 | +# Fetch |
2 | 2 |
|
3 |
| -**Data queries** retrieve data from the database. A data query is performed with the |
4 |
| - help of a **query object**, which is a symbolic representation of the query that does |
5 |
| - not in itself contain any actual data. The simplest query object is an instance of |
6 |
| - a **table class**, representing the contents of an entire table. |
| 3 | +Data queries in DataJoint comprise two distinct steps: |
7 | 4 |
|
8 |
| -## Querying a database |
| 5 | +1. Construct the `query` object to represent the required data using tables and |
| 6 | +[operators](operators.md). |
| 7 | +2. Fetch the data from `query` into the workspace of the host language -- described in |
| 8 | +this section. |
9 | 9 |
|
10 |
| -For example, if given a `Session` table, you can |
11 |
| -create a query object to retrieve its entire contents as follows: |
| 10 | +Note that entities returned by `fetch` methods are not guaranteed to be sorted in any |
| 11 | +particular order unless specifically requested. |
| 12 | +Furthermore, the order is not guaranteed to be the same in any two queries, and the |
| 13 | +contents of two identical queries may change between two sequential invocations unless |
| 14 | +they are wrapped in a transaction. |
| 15 | +Therefore, if you wish to fetch matching pairs of attributes, do so in one `fetch` call. |
| 16 | + |
| 17 | +The examples below are based on the [example schema](example-schema.md) for this part |
| 18 | +of the documentation. |
| 19 | + |
| 20 | +## Entire table |
| 21 | + |
| 22 | +The following statement retrieves the entire table as a NumPy |
| 23 | +[recarray](https://docs.scipy.org/doc/numpy/reference/generated/numpy.recarray.html). |
12 | 24 |
|
13 | 25 | ```python
|
14 |
| -query = Session() |
| 26 | +data = query.fetch() |
15 | 27 | ```
|
16 | 28 |
|
17 |
| -More generally, a query object may be formed as a **query expression** |
18 |
| -constructed by applying [operators](./operators.md) to other query objects. |
19 |
| - |
20 |
| -For example, the following query retrieves information about all |
21 |
| -experiments and scans for mouse 001: |
| 29 | +To retrieve the data as a list of `dict`: |
22 | 30 |
|
23 | 31 | ```python
|
24 |
| -query = Session * Scan & 'animal_id = 001' |
| 32 | +data = query.fetch(as_dict=True) |
25 | 33 | ```
|
26 | 34 |
|
27 |
| -Note that for brevity, query operators can be applied directly to class, as |
28 |
| -`Session` instead of `Session()`. |
| 35 | +In some cases, the amount of data returned by fetch can be quite large; in these cases |
| 36 | +it can be useful to use the `size_on_disk` attribute to determine if running a bare |
| 37 | +fetch would be wise. |
| 38 | +Note that it is currently possible to query the size only of entire tables |
| 39 | +stored directly in the database. |
| 40 | + |
| 41 | +## As separate variables |
29 | 42 |
|
30 |
| -Alternatively, we could query all scans with a sample rate over 1000, and preview the |
31 |
| -contents of the query simply displaying the object. |
| 43 | +```python |
| 44 | +name, img = query.fetch1('name', 'image') # when query has exactly one entity |
| 45 | +name, img = query.fetch('name', 'image') # [name, ...] [image, ...] |
| 46 | +``` |
| 47 | + |
| 48 | +## Primary key values |
32 | 49 |
|
33 | 50 | ```python
|
34 |
| -Scan & 'sample_rate > 1000' |
| 51 | +keydict = tab.fetch1("KEY") # single key dict when tab has exactly one entity |
| 52 | +keylist = tab.fetch("KEY") # list of key dictionaries [{}, ...] |
35 | 53 | ```
|
36 | 54 |
|
37 |
| -The above command shows the following table: |
38 |
| - |
39 |
| -```text |
40 |
| -| id* | start_time* | sample_rate | signal | times | duration | |
41 |
| -|-----|---------------------|-------------|--------|--------|----------| |
42 |
| -| 1 | 2020-01-02 22:15:00 | 1893.00 | =BLOB= | =BLOB= | 1981.29 | |
43 |
| -| 2 | 2020-01-03 00:15:00 | 4800.00 | =BLOB= | =BLOB= | 548.0 | |
44 |
| -| 3 | 2020-01-19 14:03:03 | 4800.00 | =BLOB= | =BLOB= | 336.0 | |
45 |
| -| 4 | 2020-01-19 14:13:03 | 4800.00 | =BLOB= | =BLOB= | 2501.0 | |
46 |
| -| 5 | 2020-01-23 11:05:23 | 4800.00 | =BLOB= | =BLOB= | 1800.0 | |
47 |
| -| 6 | 2020-01-27 14:03:03 | 4800.00 | =BLOB= | =BLOB= | 600.0 | |
48 |
| -| 7 | 2020-01-31 20:15:00 | 4800.00 | =BLOB= | =BLOB= | 600.0 | |
49 |
| -... |
50 |
| -11 tuples |
| 55 | +`KEY` can also be used when returning attribute values as separate variables, such that |
| 56 | +one of the returned variables contains the entire primary keys. |
| 57 | + |
| 58 | +## Sorting and limiting the results |
| 59 | + |
| 60 | +To sort the result, use the `order_by` keyword argument. |
| 61 | + |
| 62 | +```python |
| 63 | +# ascending order: |
| 64 | +data = query.fetch(order_by='name') |
| 65 | +# descending order: |
| 66 | +data = query.fetch(order_by='name desc') |
| 67 | +# by name first, year second: |
| 68 | +data = query.fetch(order_by=('name desc', 'year')) |
| 69 | +# sort by the primary key: |
| 70 | +data = query.fetch(order_by='KEY') |
| 71 | +# sort by name but for same names order by primary key: |
| 72 | +data = query.fetch(order_by=('name', 'KEY desc')) |
51 | 73 | ```
|
52 | 74 |
|
53 |
| -Note that this preview (a) only lists a few of the entities that will be returned and |
54 |
| -(b) does not contain any data for attributes of datatype `blob`. |
| 75 | +The `order_by` argument can be a string specifying the attribute to sort by. By default |
| 76 | +the sort is in ascending order. Use `'attr desc'` to sort in descending order by |
| 77 | +attribute `attr`. The value can also be a sequence of strings, in which case the sort |
| 78 | +is performed on all the attributes jointly in the order specified. |
| 79 | + |
| 80 | +The special attribute name `'KEY'` represents the primary key attributes in the order |
| 81 | +in which they appear in the index. Otherwise, this name can be used like any other argument. |
55 | 82 |
|
56 |
| -Once the desired query object is formed, the query can be executed using its [fetch] |
57 |
| -(./fetch) methods. To **fetch** means to transfer the data represented by the query |
58 |
| -object from the database server into the workspace of the host language. |
| 83 | +If an attribute happens to be a SQL reserved word, it needs to be enclosed in |
| 84 | +backquotes. For example: |
59 | 85 |
|
60 | 86 | ```python
|
61 |
| -query = Scan & 'sample_rate > 1000' |
62 |
| -s = query.fetch() |
| 87 | +data = query.fetch(order_by='`select` desc') |
63 | 88 | ```
|
64 | 89 |
|
65 |
| -Here fetching from the `query` object produces the NumPy record array |
66 |
| -`s` of the queried data. |
| 90 | +The `order_by` value is eventually passed to the `ORDER BY` |
| 91 | +[clause](https://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html). |
| 92 | + |
| 93 | +Similarly, the `limit` and `offset` arguments can be used to limit the result to a |
| 94 | +subset of entities. |
| 95 | + |
| 96 | +For example, one could do the following: |
67 | 97 |
|
68 |
| -## Checking for entities |
| 98 | +```python |
| 99 | +data = query.fetch(order_by='name', limit=10, offset=5) |
| 100 | +``` |
69 | 101 |
|
70 |
| -The preview of the query object shown above displayed only a few of the entities |
71 |
| -returned by the query but also displayed the total number of entities that would be |
72 |
| -returned. It can be useful to know the number of entities returned by a query, or even |
73 |
| -whether a query will return any entities at all, without having to fetch all the data |
74 |
| -themselves. |
| 102 | +Note that an `offset` cannot be used without specifying a `limit` as well. |
75 | 103 |
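
To make the combined effect of `order_by`, `limit`, and `offset` concrete, here is a minimal sketch in plain Python that does not use DataJoint at all. It assumes the usual SQL semantics (sort first, skip `offset` rows, then return at most `limit` rows); the `rows` data and the `order_limit_offset` helper are hypothetical and exist only for illustration.

```python
# Hypothetical rows standing in for the entities that a query represents.
rows = [{"name": n} for n in ["eve", "bob", "dan", "amy", "cat"]]


def order_limit_offset(rows, key, limit, offset=0):
    """Mimic ORDER BY <key> LIMIT <limit> OFFSET <offset> on a list of dicts."""
    ordered = sorted(rows, key=lambda r: r[key])  # ascending sort, like order_by='name'
    return ordered[offset:offset + limit]         # skip `offset` rows, keep `limit` rows


page = order_limit_offset(rows, "name", limit=2, offset=1)
print([r["name"] for r in page])  # ['bob', 'cat']
```

Because the slice is taken after sorting, the same `limit` and `offset` values select different entities under a different ordering, which is why paging through results without an explicit `order_by` is unreliable.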
|
76 |
| -The `bool` function applied to a query object evaluates to `True` if the |
77 |
| -query returns any entities and to `False` if the query result is empty. |
| 104 | +## Usage with Pandas |
78 | 105 |
|
79 |
| -The `len` function applied to a query object determines the number of |
80 |
| -entities returned by the query. |
| 106 | +The [pandas library](http://pandas.pydata.org/) is a popular library for data analysis |
| 107 | +in Python which can easily be used with DataJoint query results. |
| 108 | +Since the records returned by `fetch()` are contained within a `numpy.recarray`, they |
| 109 | +can be easily converted to `pandas.DataFrame` objects by passing them into the |
| 110 | +`pandas.DataFrame` constructor. |
| 111 | +For example: |
81 | 112 |
|
82 | 113 | ```python
|
83 |
| -# number of sessions since the start of 2018. |
84 |
| -n = len(Session & 'session_date >= "2018-01-01"') |
| 114 | +import pandas as pd |
| 115 | +frame = pd.DataFrame(tab.fetch()) |
85 | 116 | ```
|
86 | 117 |
|
87 |
| -## Normalization in queries |
| 118 | +Calling `fetch()` with the argument `format="frame"` returns results as |
| 119 | +`pandas.DataFrame` objects indexed by the table's primary key attributes. |
| 120 | + |
| 121 | +```python |
| 122 | +frame = tab.fetch(format="frame") |
| 123 | +``` |
88 | 124 |
|
89 |
| -Query objects adhere to entity [entity normalization](../design/normalization). The result of a |
90 |
| -query will include the uniquely defining attributes jointly distinguish any two |
91 |
| -entities from each other. The query [operators](./operators) are designed to keep the |
92 |
| -result normalized even in complex query expressions. |
| 125 | +Returning results as a `DataFrame` is not possible when fetching a particular subset of |
| 126 | +attributes or when `as_dict` is set to `True`. |
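
The conversion described in this section can be tried without a database connection. The sketch below builds a small NumPy record array as a hypothetical stand-in for the result of `tab.fetch()`, with assumed attribute names `id`, `name`, and `year`, where `id` is assumed to be the primary key. It then passes the array to the `pandas.DataFrame` constructor and sets the index by hand, which only approximates the primary-key-indexed layout described for `format="frame"`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the record array that `tab.fetch()` would return.
records = np.array(
    [(1, "alice", 2019), (2, "bob", 2021)],
    dtype=[("id", "i8"), ("name", "U16"), ("year", "i8")],
).view(np.recarray)

# Field names of the record array become the DataFrame columns.
frame = pd.DataFrame(records)

# Index by the assumed primary key attribute.
indexed = frame.set_index("id")
print(indexed)
```

With the index in place, a single entity can be looked up by its key value, for example `indexed.loc[2, "name"]`.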