
Commit d3a855c

Replace with true contents of fetch page
1 parent 466a84d commit d3a855c


1 file changed (+94 -60 lines)


Diff for: docs/src/query/fetch.md

@@ -1,92 +1,126 @@
-# Query Objects
+# Fetch
 
-**Data queries** retrieve data from the database. A data query is performed with the
-help of a **query object**, which is a symbolic representation of the query that does
-not in itself contain any actual data. The simplest query object is an instance of
-a **table class**, representing the contents of an entire table.
+Data queries in DataJoint comprise two distinct steps:
 
-## Querying a database
+1. Construct the `query` object to represent the required data using tables and
+[operators](operators.md).
+2. Fetch the data from `query` into the workspace of the host language, as described in
+this section.
 
-For example, if given a `Session` table, you can
-create a query object to retrieve its entire contents as follows:
+Note that entities returned by `fetch` methods are not guaranteed to be sorted in any
+particular order unless specifically requested.
+Furthermore, the order is not guaranteed to be the same in any two queries, and the
+contents of two identical queries may change between two sequential invocations unless
+they are wrapped in a transaction.
+Therefore, if you wish to fetch matching pairs of attributes, do so in one `fetch` call.
+
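The ordering caveat above can be illustrated without a database. The sketch below uses a hypothetical in-memory `UnorderedTable` class (not part of DataJoint) whose `fetch` returns rows in a different order on each call, standing in for a server that promises no particular order. Pairing attributes taken from one call stays consistent; pairing them across two calls does not.

```python
from itertools import count


class UnorderedTable:
    """Hypothetical stand-in for a query whose row order is unspecified.

    Each fetch() call rotates the rows by one more position, mimicking a
    server that returns rows in no guaranteed order.
    """

    def __init__(self, rows):
        self._rows = list(rows)
        self._calls = count()

    def fetch(self, *attrs):
        shift = next(self._calls) % len(self._rows)
        rows = self._rows[shift:] + self._rows[:shift]
        # One list per requested attribute, all drawn from the same row order.
        return tuple([row[a] for row in rows] for a in attrs)


rows = [
    {"name": "a", "image": 1},
    {"name": "b", "image": 2},
    {"name": "c", "image": 3},
]
truth = {row["name"]: row["image"] for row in rows}

# Safe: both attributes come from one call, so they stay aligned.
names, images = UnorderedTable(rows).fetch("name", "image")
paired_ok = dict(zip(names, images))

# Unsafe: two calls may return the rows in different orders.
table = UnorderedTable(rows)
(names,) = table.fetch("name")
(images,) = table.fetch("image")
paired_bad = dict(zip(names, images))
```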
+The examples below are based on the [example schema](example-schema.md) for this part
+of the documentation.
+
+## Entire table
+
+The following statement retrieves the entire table as a NumPy
+[recarray](https://docs.scipy.org/doc/numpy/reference/generated/numpy.recarray.html).
 
 ```python
-query = Session()
+data = query.fetch()
 ```
 
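As a rough picture of such a result, here is a hand-built NumPy record array standing in for `query.fetch()` on a hypothetical table with attributes `subject_id`, `name` and `year`. No DataJoint call is made; the attribute names and values are assumptions for illustration only.

```python
import numpy as np

# Hand-built stand-in for `data = query.fetch()` on a hypothetical table with
# attributes subject_id, name and year; no database is involved.
data = np.rec.fromrecords(
    [(1, "alpha", 2019), (2, "beta", 2020), (3, "gamma", 2020)],
    names="subject_id,name,year",
)

columns = data.dtype.names        # attribute names, in order
names = data.name                 # a whole column via attribute access
years = data["year"]              # the same kind of access via indexing
first_name = data[0].name         # one field of one record
recent = data[data.year >= 2020]  # boolean masks select records
```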

-More generally, a query object may be formed as a **query expression**
-constructed by applying [operators](./operators.md) to other query objects.
-
-For example, the following query retrieves information about all
-experiments and scans for mouse 001:
+To retrieve the data as a list of `dict`:
 
 ```python
-query = Session * Scan & 'animal_id = 001'
+data = query.fetch(as_dict=True)
 ```
 
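The record-array and dictionary forms carry the same information. The hypothetical `records_to_dicts` helper below converts a record array into a list of dictionaries, one per entity with attribute names as keys, to illustrate the shape that `fetch(as_dict=True)` is described as returning. It is a sketch, not DataJoint's implementation.

```python
import numpy as np


def records_to_dicts(records):
    """Convert a NumPy record array into a list of dicts, one per record.

    Hypothetical helper illustrating the output shape of fetch(as_dict=True).
    """
    names = records.dtype.names
    # .item() turns NumPy scalars into plain Python values.
    return [{n: rec[n].item() for n in names} for rec in records]


data = np.rec.fromrecords([(1, "alpha"), (2, "beta")], names="subject_id,name")
rows = records_to_dicts(data)
```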

-Note that for brevity, query operators can be applied directly to class, as
-`Session` instead of `Session()`.
+In some cases, the amount of data returned by `fetch` can be quite large; in these cases
+it can be useful to check the `size_on_disk` attribute to decide whether running a bare
+fetch would be wise.
+Note that currently only the size of entire tables stored directly in the database
+can be queried.
+
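A minimal sketch of such a size check, assuming `size_on_disk` reports a size in bytes. `FakeTable` and `fetch_if_small` are hypothetical names. The stub exposes only a `size_on_disk` attribute and a `fetch()` method, so the guard can run without a database.

```python
class FakeTable:
    """Hypothetical stand-in exposing the two members the guard relies on."""

    def __init__(self, size_on_disk, rows):
        self.size_on_disk = size_on_disk  # assumed to be a size in bytes
        self._rows = rows

    def fetch(self):
        return list(self._rows)


def fetch_if_small(table, max_bytes=100 * 1024**2):
    """Fetch the whole table only if its reported size is at most max_bytes.

    Returns None instead of fetching when the table looks too large, so the
    caller can restrict the query or fetch selected attributes instead.
    """
    if table.size_on_disk > max_bytes:
        return None
    return table.fetch()


small = FakeTable(size_on_disk=4096, rows=[{"id": 1}])
large = FakeTable(size_on_disk=5 * 1024**3, rows=[{"id": 1}])
```

The 100 MiB default threshold is arbitrary; choose one that fits the memory available to the host process.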
+## As separate variables
 
-Alternatively, we could query all scans with a sample rate over 1000, and preview the
-contents of the query simply displaying the object.
+```python
+name, img = query.fetch1('name', 'image') # when query has exactly one entity
+name, img = query.fetch('name', 'image') # [name, ...] [image, ...]
+```
+
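The two calls above expect different things from the query. The hypothetical helpers below model that contract on plain lists of dicts: `fetch_columns` returns one list per requested attribute, all in the same row order, and `fetch_one` refuses to run unless there is exactly one entity, mirroring the condition stated for `fetch1`. They are illustrations, not DataJoint functions.

```python
def fetch_columns(rows, *attrs):
    """Return one list per attribute, all taken from the same row order."""
    return tuple([row[a] for row in rows] for a in attrs)


def fetch_one(rows, *attrs):
    """Return the attribute values of the single row; fail unless exactly one."""
    if len(rows) != 1:
        raise ValueError(f"expected exactly one entity, got {len(rows)}")
    (row,) = rows
    return tuple(row[a] for a in attrs)


rows = [{"name": "alpha", "image": [0, 1]}, {"name": "beta", "image": [2, 3]}]
names, images = fetch_columns(rows, "name", "image")
name, image = fetch_one(rows[:1], "name", "image")
```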
+## Primary key values
 
 ```python
-Scan & 'sample_rate > 1000'
+keydict = tab.fetch1("KEY") # single key dict when tab has exactly one entity
+keylist = tab.fetch("KEY") # list of key dictionaries [{}, ...]
 ```
 
-The above command shows the following table:
-
-```text
-| id* | start_time* | sample_rate | signal | times | duration |
-|-----|---------------------|-------------|--------|--------|----------|
-| 1 | 2020-01-02 22:15:00 | 1893.00 | =BLOB= | =BLOB= | 1981.29 |
-| 2 | 2020-01-03 00:15:00 | 4800.00 | =BLOB= | =BLOB= | 548.0 |
-| 3 | 2020-01-19 14:03:03 | 4800.00 | =BLOB= | =BLOB= | 336.0 |
-| 4 | 2020-01-19 14:13:03 | 4800.00 | =BLOB= | =BLOB= | 2501.0 |
-| 5 | 2020-01-23 11:05:23 | 4800.00 | =BLOB= | =BLOB= | 1800.0 |
-| 6 | 2020-01-27 14:03:03 | 4800.00 | =BLOB= | =BLOB= | 600.0 |
-| 7 | 2020-01-31 20:15:00 | 4800.00 | =BLOB= | =BLOB= | 600.0 |
-...
-11 tuples
+`KEY` can also be used when returning attribute values as separate variables, such that
+one of the returned variables contains the full primary key of each entity.
+
+## Sorting and limiting the results
+
+To sort the result, use the `order_by` keyword argument.
+
+```python
+# ascending order:
+data = query.fetch(order_by='name')
+# descending order:
+data = query.fetch(order_by='name desc')
+# by name (descending) first, year second:
+data = query.fetch(order_by=('name desc', 'year'))
+# sort by the primary key:
+data = query.fetch(order_by='KEY')
+# sort by name, then by descending primary key among equal names:
+data = query.fetch(order_by=('name', 'KEY desc'))
 ```
 
-Note that this preview (a) only lists a few of the entities that will be returned and
-(b) does not contain any data for attributes of datatype `blob`.
+The `order_by` argument can be a string specifying the attribute to sort by. By default
+the sort is in ascending order. Use `'attr desc'` to sort in descending order by
+attribute `attr`. The value can also be a sequence of strings, in which case the sort
+is performed on all the attributes jointly, in the order specified.
+
+The special attribute name `'KEY'` represents the primary key attributes in the order that
+they appear in the index. Beyond that, it can be used like any other attribute name, for example with `desc`.
 
-Once the desired query object is formed, the query can be executed using its [fetch]
-(./fetch) methods. To **fetch** means to transfer the data represented by the query
-object from the database server into the workspace of the host language.
+If an attribute happens to be a SQL reserved word, it needs to be enclosed in
+backquotes. For example:
 
 ```python
-query = Scan & 'sample_rate > 1000'
-s = query.fetch()
+data = query.fetch(order_by='`select` desc')
 ```
 
-Here fetching from the `query` object produces the NumPy record array
-`s` of the queried data.
+The `order_by` value is eventually passed to the `ORDER BY`
+[clause](https://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html).
+
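The `order_by` rules above can be made concrete with a sketch. The hypothetical helpers below turn an `order_by` value into an `ORDER BY` clause and sort in-memory rows the same way. Two assumptions are built in: `'KEY'` expands to the primary-key attributes in the order listed in `primary_key`, and a direction written after `'KEY'` applies to each of them. Backquoted names pass through to the clause unchanged and are stripped only for the in-memory lookup. This is not DataJoint's own code.

```python
def normalize_order_by(order_by, primary_key):
    """Expand an order_by spec into a list of (attribute, descending) pairs.

    Hypothetical helper: accepts a string or a sequence of strings such as
    'name', 'name desc' or 'KEY desc', and expands 'KEY' into the primary-key
    attributes (assumption: the direction applies to each of them).
    """
    if isinstance(order_by, str):
        order_by = [order_by]
    terms = []
    for spec in order_by:
        parts = spec.split()
        attr = parts[0]
        descending = len(parts) > 1 and parts[1].lower() == "desc"
        attrs = primary_key if attr == "KEY" else [attr]
        terms.extend((a, descending) for a in attrs)
    return terms


def order_by_clause(order_by, primary_key):
    """Render the spec as a MySQL ORDER BY clause; backquoted names pass through."""
    terms = normalize_order_by(order_by, primary_key)
    return "ORDER BY " + ", ".join(f"{a} DESC" if d else a for a, d in terms)


def sort_rows(rows, order_by, primary_key):
    """Sort dict rows in memory the way the ORDER BY clause would order them."""
    rows = list(rows)
    # Stable sorts applied from the last term to the first yield a joint sort.
    for attr, descending in reversed(normalize_order_by(order_by, primary_key)):
        rows.sort(key=lambda row: row[attr.strip("`")], reverse=descending)
    return rows


primary_key = ["subject_id"]
rows = [
    {"subject_id": 1, "name": "b", "year": 2020},
    {"subject_id": 2, "name": "a", "year": 2021},
    {"subject_id": 3, "name": "b", "year": 2019},
]
clause = order_by_clause(("name desc", "year"), primary_key)
by_name_desc_year = sort_rows(rows, ("name desc", "year"), primary_key)
```

Sorting from the last term to the first relies on Python's sort being stable (even with `reverse=True`), a standard way to combine ascending and descending keys.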
+Similarly, the `limit` and `offset` arguments can be used to limit the result to a
+subset of entities.
+
+For example, one could do the following:
 
-## Checking for entities
+```python
+data = query.fetch(order_by='name', limit=10, offset=5)
+```
 
-The preview of the query object shown above displayed only a few of the entities
-returned by the query but also displayed the total number of entities that would be
-returned. It can be useful to know the number of entities returned by a query, or even
-whether a query will return any entities at all, without having to fetch all the data
-themselves.
+Note that an `offset` cannot be used without specifying a `limit` as well.
 
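In SQL terms, `limit=10, offset=5` skips the first five rows of the ordered result and returns the next ten. The hypothetical `apply_limit` helper below emulates that on an already-sorted list and, like the rule stated above, rejects an offset given without a limit.

```python
def apply_limit(rows, limit=None, offset=None):
    """Emulate SQL LIMIT/OFFSET on an already-ordered list of rows.

    Hypothetical helper: like fetch(limit=..., offset=...), it refuses an
    offset without a limit.
    """
    if offset is not None and limit is None:
        raise ValueError("offset requires limit")
    if limit is None:
        return list(rows)
    start = offset or 0
    return list(rows)[start:start + limit]


ordered = list(range(1, 21))  # e.g. 20 rows already sorted by name
page = apply_limit(ordered, limit=10, offset=5)
```

Because results are not ordered unless requested, a `limit`/`offset` window is only reproducible when it is combined with `order_by`.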

-The `bool` function applied to a query object evaluates to `True` if the
-query returns any entities and to `False` if the query result is empty.
+## Usage with Pandas
 
-The `len` function applied to a query object determines the number of
-entities returned by the query.
+The [pandas library](http://pandas.pydata.org/) is a popular library for data analysis
+in Python which can easily be used with DataJoint query results.
+Since the records returned by `fetch()` are contained within a `numpy.recarray`, they
+can be easily converted to `pandas.DataFrame` objects by passing them into the
+`pandas.DataFrame` constructor.
+For example:
 
 ```python
-# number of sessions since the start of 2018.
-n = len(Session & 'session_date >= "2018-01-01"')
+import pandas as pd
+frame = pd.DataFrame(tab.fetch())
 ```
 
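A self-contained version of this conversion, using a hand-built record array in place of `tab.fetch()`. The attribute names and values are hypothetical. After conversion, ordinary pandas operations such as `groupby` apply.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for tab.fetch(): a record array with a primary-key
# attribute (subject_id) and two secondary attributes.
records = np.rec.fromrecords(
    [(1, "alpha", 2019), (2, "beta", 2020), (3, "gamma", 2020)],
    names="subject_id,name,year",
)

frame = pd.DataFrame(records)            # one column per attribute
per_year = frame.groupby("year").size()  # count of entities per year
```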

-## Normalization in queries
+Calling `fetch()` with the argument `format="frame"` returns results as
+`pandas.DataFrame` objects indexed by the table's primary key attributes.
+
+```python
+frame = tab.fetch(format="frame")
+```
 
-Query objects adhere to entity [entity normalization](../design/normalization). The result of a
-query will include the uniquely defining attributes jointly distinguish any two
-entities from each other. The query [operators](./operators) are designed to keep the
-result normalized even in complex query expressions.
+Returning results as a `DataFrame` is not possible when fetching a particular subset of
+attributes or when `as_dict` is set to `True`.
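For comparison, a similar frame can be built by hand from a record array by setting the primary-key columns as the index. This sketch assumes `subject_id` is the only primary-key attribute of a hypothetical table; it approximates the described output of `format="frame"` rather than reproducing DataJoint's code.

```python
import numpy as np
import pandas as pd

PRIMARY_KEY = ["subject_id"]  # assumed primary key of the hypothetical table

records = np.rec.fromrecords(
    [(1, "alpha", 2019), (2, "beta", 2020)],
    names="subject_id,name,year",
)

# One row per entity, indexed by the primary-key attributes.
frame = pd.DataFrame(records).set_index(PRIMARY_KEY)

# Lookups can now use primary-key values directly.
year_of_beta = frame.loc[2, "year"]
```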
