
Commit b38244d

committed Apr 27, 2016
Documentation update for spatial features and visualization
1 parent 8c923db commit b38244d

14 files changed, +124 -71 lines changed

‎bandicoot/helper/stops.py

+9 -2
@@ -45,9 +45,16 @@ def get_neighbors(distance_matrix, source, eps):
 
 def dbscan(points, eps, minpts):
     """
-    Implementation of DBSCAN (A density-based algorithm for discovering
-    clusters in large spatial databases with noise) It accepts a list of
+    Implementation of [DBSCAN]_ (*A density-based algorithm for discovering
+    clusters in large spatial databases with noise*). It accepts a list of
     points (lat, lon) and returns the labels associated with the points.
+
+    References
+    ----------
+    .. [DBSCAN] Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996, August).
+       A density-based algorithm for discovering clusters in large
+       spatial databases with noise. In Kdd (Vol. 96, No. 34, pp. 226-231).
+
     """
     next_label = 0
     n = len(points)
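
For readers unfamiliar with the algorithm the docstring cites, the following is a minimal sketch of DBSCAN over (lat, lon) points. It is not bandicoot's implementation: the names `haversine_km` and `dbscan_sketch`, the kilometre unit for `eps_km`, and the use of -1 as the noise label are assumptions made for illustration.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def dbscan_sketch(points, eps_km, minpts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    NOISE = -1
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Brute-force neighborhood query; includes the point itself.
        return [j for j in range(n)
                if haversine_km(points[i], points[j]) <= eps_km]

    next_label = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < minpts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = next_label
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = next_label  # border point: joins but does not expand
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = next_label
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= minpts:
                queue.extend(j_neighbors)  # core point: keep expanding
        next_label += 1
    return labels
```

The brute-force neighbor search is quadratic in the number of points; the committed module has a `compute_distance_matrix` helper, which suggests it precomputes distances instead.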

‎bandicoot/spatial.py

+1 -1
@@ -53,7 +53,7 @@ def percent_at_home(positions, user):
 def radius_of_gyration(positions, user):
     """
     Returns the radius of gyration, the *equivalent distance* of the mass from
-    the center of gravity, for all visited places [GON2008]_
+    the center of gravity, for all visited places. [GON2008]_
 
     References
     ----------
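
To illustrate the quantity this docstring describes, a radius of gyration can be sketched as the root mean square distance of the visits from their weighted center. This is a sketch under stated assumptions, not bandicoot's code: the function names, the use of the arithmetic mean of coordinates as the center, and the haversine distance in kilometres are all hypothetical choices.

```python
import math
from collections import Counter


def great_circle_km(p, q):
    """Haversine distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def radius_of_gyration_sketch(visits):
    """visits: list of (lat, lon) tuples, one per record (repeats act as weights)."""
    counts = Counter(visits)
    total = sum(counts.values())
    if total == 0:
        return None  # no positions, nothing to measure
    # Weighted center of all visited places (assumption: plain mean of coordinates).
    lat_c = sum(lat * w for (lat, lon), w in counts.items()) / total
    lon_c = sum(lon * w for (lat, lon), w in counts.items()) / total
    # Weighted mean of squared distances to the center, then the square root.
    mean_sq = sum(w * great_circle_km(place, (lat_c, lon_c)) ** 2
                  for place, w in counts.items()) / total
    return math.sqrt(mean_sq)
```

A user who only ever appears at one place gets a radius of 0; the value grows as visits spread away from the center.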

‎bandicoot/tests/samples/regressions/ego.json

+1 -1
@@ -5,7 +5,7 @@
     "reporting__attributes_path": "samples/attributes",
     "reporting__recharges_path": "samples/attributes",
     "reporting__version": "0.4.0",
-    "reporting__code_signature": "7b35bec9ffc41ee3013a66f98a7032bcf605734f",
+    "reporting__code_signature": "ae5c1172b89bef72195a6f0d6ef14d0d5cac508a",
     "reporting__groupby": "week",
     "reporting__split_week": true,
     "reporting__split_day": true,

‎bandicoot/tests/samples/regressions/empty_user.json

+1 -1
@@ -5,7 +5,7 @@
     "reporting__attributes_path": null,
     "reporting__recharges_path": null,
     "reporting__version": "0.4.0",
-    "reporting__code_signature": "7b35bec9ffc41ee3013a66f98a7032bcf605734f",
+    "reporting__code_signature": "ae5c1172b89bef72195a6f0d6ef14d0d5cac508a",
     "reporting__groupby": "week",
     "reporting__split_week": true,
     "reporting__split_day": true,

‎bandicoot/tests/samples/regressions/manual_a.json

+1 -1
@@ -5,7 +5,7 @@
     "reporting__attributes_path": null,
     "reporting__recharges_path": null,
     "reporting__version": "0.4.0",
-    "reporting__code_signature": "7b35bec9ffc41ee3013a66f98a7032bcf605734f",
+    "reporting__code_signature": "ae5c1172b89bef72195a6f0d6ef14d0d5cac508a",
     "reporting__groupby": "week",
     "reporting__split_week": true,
     "reporting__split_day": true,

‎bandicoot/tests/samples/regressions/manual_a_orange_network.json

+1 -1
@@ -5,7 +5,7 @@
     "reporting__attributes_path": "samples/attributes",
     "reporting__recharges_path": "samples/attributes",
     "reporting__version": "0.4.0",
-    "reporting__code_signature": "7b35bec9ffc41ee3013a66f98a7032bcf605734f",
+    "reporting__code_signature": "ae5c1172b89bef72195a6f0d6ef14d0d5cac508a",
     "reporting__groupby": "week",
     "reporting__split_week": true,
     "reporting__split_day": true,

‎bandicoot/tests/samples/regressions/sample_user.json

+1 -1
@@ -5,7 +5,7 @@
     "reporting__attributes_path": null,
     "reporting__recharges_path": null,
     "reporting__version": "0.4.0",
-    "reporting__code_signature": "7b35bec9ffc41ee3013a66f98a7032bcf605734f",
+    "reporting__code_signature": "ae5c1172b89bef72195a6f0d6ef14d0d5cac508a",
     "reporting__groupby": null,
     "reporting__split_week": true,
     "reporting__split_day": true,

‎docs/_static/style.css

+7
@@ -41,3 +41,10 @@ code, pre {
     margin: 0 0 10px 10px;
     padding: 10px;
 }
+
+
+.citation .label {
+    color: #000;
+    font-size: 100%;
+    font-weight: normal;
+}

‎docs/data_integrity.rst

+36 -24
@@ -2,31 +2,37 @@ Data integrity
 ==============
 
 
-Collecting mobile phone metadata can lead to corrupted data: wrong format, faulty files, empty periods of time or missing users, *etc.* bandicoot will not try to fix corrupted data, but however let you handle such situations by:
+Occasionally, records in CDR and collected mobile phone metadata can be
+corrupted: wrong format, faulty files, empty periods of time, missing users,
+*etc.* bandicoot will not attempt to correct errors as this might lead to
+incorrect analysis. It will instead:
 
-1. warning you when importing data,
-2. removing faulty records,
-3. adding more than 30 reporting variables when exporting indicators.
+1. warn you when you attempt to import corrupted data,
+2. remove faulty records,
+3. report more than 30 variables warning you of potential issues when exporting
+   indicators.
 
 
 
 Warnings at import
 ------------------
 
-By default, :meth:`~bandicoot.io.read_csv` logs six warnings to the standard output:
+By default, :meth:`~bandicoot.io.read_csv` reports six warnings to the standard output:
 
-1. when an attribute path is given, but no attributes are loaded (which can occur when the path is wrong, or the attribute file empty),
-2. a recharges path given, but no recharges loaded,
-3. the percentage of records missing a location when positive,
-4. the number of antennas missing a location (when an antenna file was provided)
-5. the percentage of duplicated records (which can happen when databases are mixed together)
-6. the percentage of calls with an overlap of more than 5 minutes
+1. when an *attribute path* is given but no attributes could be loaded, e.g.
+   because the path is wrong or because the attribute file is empty,
+2. when a *recharges_path* is given but no recharges could be loaded,
+3. the percentage of records that do not contain location informationwhen an
+   antenna file is provided, the number of antennas missing location information
+4. the percentage of duplicated records
+5. the percentage of calls with an overlap of more than 5 minutes
 
 
 Removal of faulty records
 -------------------------
 
-When loading a CSV file containing records, bandicoot filters out lines with wrong values, and keeps the count of ignored lines in the :class:`~bandicoot.core.User` object:
+bandicoot will automatically remove faulty records and will report the number
+of ignored records (also available in the :class:`~bandicoot.core.User` Object):
 
 .. code-block:: python
 
@@ -39,24 +45,30 @@ When loading a CSV file containing records, bandicoot filters out lines with wro
      'interaction': 0,
      'location': 0}
 
-The previous example means that six records were removed because:
+In this example, six records were removed:
 
-- three records had wrong call durations,
-- two records had wrong dates and times,
-- four records had wrong with directions.
+- three records had incorrect call durations,
+- two records had incorrect dates and times,
+- four records had incorrect incoming or outgoing directions.
 
-.. warning:: An ignored record with multiple faulty fields will be counted for all field, and not only for the first detected. The sum of all ignored fields in ``my_user.ignored_records`` is not equal to 5, the number of ignored records.
+.. warning:: An ignored record with multiple faulty fields will be double
+   counted and reported for each incorrect value. The total number of ignored
+   records is reported in all, here 5.
 
 
-bandicoot can also remove duplicated records, if the option ``drop_duplicates=True`` is provided to :meth:`bandicoot.core.read_csv`. This functionality is not activated by default, as one user can send multiple text messages in less than one minute (or less, depending on the granularity of the data set), yet they should not count as duplicated.
+bandicoot also offer the option to remove “duplicated records“ (same
+correspondants, direction, date and time). The option ``drop_duplicates=True``
+in :meth:`~bandicoot.io.read_csv` is not activated by defaul, as one user
+might send multiple text messages in less than one minute (or less, depending
+on the granularity of the data set).
 
 Reporting variables
 -------------------
 
-The function :meth:`~bandicoot.utils.all` returns a nested dictionnary containing all indicators, but also 31 reporting variables:
+The function :meth:`~bandicoot.utils.all` returns a nested dictionary containing all indicators, but also 39 reporting variables:
 
-1. concerning the data loading (``antennas_path``, ``attributes_path``, ``recharges_path``),
-2. about the user (``start_time``, ``end_time``, ``night_start``, ``night_end``, ``weekend`` with a list of days defining a weekend, ``number_of_records``, ``number_of_antennas``, ``number_of_recharges``, ``bins``, ``bins_with_data``, ``bins_without_data``, ``has_call``, ``has_home``, ``has_recharges``, ``has_attributes``, ``has_network``),
-3. on records missing information (``percent_records_missing_location``, ``antennas_missing_locations``, and ``ignored_records`` mentioned previously),
-4. on the user's ego network (``percent_outofnetwork_calls``, ``percent_outofnetwork_texts``, ``percent_outofnetwork_contacts``, ``percent_outofnetwork_call_durations``),
-5. on the computation (``groupby``, ``split_week``, ``split_day``).
+1. information on the files: ``antennas_path``, ``attributes_path``, ``recharges_path``,
+2. information about the data: ``start_time``, ``end_time``, ``night_start``, ``night_end``, ``weekend`` with a list of days defining a weekend, ``number_of_records``, ``number_of_antennas``, ``number_of_recharges``,
+3. information on records for which information is missing: ``percent_records_missing_location``, ``antennas_missing_locations``, and ``ignored_records`` mentioned previously,
+4. information on the user's ego network: ``percent_outofnetwork_calls``, ``percent_outofnetwork_texts``, ``percent_outofnetwork_contacts``, ``percent_outofnetwork_call_durations``,
+5. and finally, information on the grouping: ``groupby``, ``split_week``, ``split_day``.
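
The counting rule in the warning of this file (a record with several faulty fields is counted once per faulty field, while `all` counts each ignored record once) can be sketched as follows. This is an illustration only: `count_ignored`, the `validators` mapping, and the record layout are hypothetical and are not bandicoot's API.

```python
def count_ignored(records, validators):
    """Split records into kept and ignored, counting faults per field.

    records: list of dicts; validators: dict mapping a field name to a
    function that returns True when the field value is acceptable.
    """
    ignored = {'all': 0}
    for field in validators:
        ignored[field] = 0

    kept = []
    for record in records:
        faulty = [f for f, is_valid in validators.items()
                  if not is_valid(record.get(f))]
        if faulty:
            ignored['all'] += 1          # each ignored record counted once here
            for f in faulty:
                ignored[f] += 1          # and once per faulty field here
        else:
            kept.append(record)
    return kept, ignored


# Hypothetical validators for two of the fields named in the documentation.
validators = {
    'call_duration': lambda v: isinstance(v, int) and v >= 0,
    'direction': lambda v: v in ('in', 'out'),
}
records = [
    {'call_duration': 10, 'direction': 'in'},
    {'call_duration': -1, 'direction': 'sideways'},  # two faulty fields
    {'call_duration': None, 'direction': 'out'},     # one faulty field
]
kept, ignored = count_ignored(records, validators)
```

With these sample records the per-field counts add up to more than `ignored['all']`, which is the situation the warning describes.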

‎docs/index.rst

+2 -1
@@ -8,7 +8,8 @@ Rocher <https://rocher.lc>`_ and `Alex Pentland <http://web.media.mit.edu/~sandy
 
 The behavioral indicators computed by bandicoot have already been used to release
 data as part of Orange `D4D Challenge <http://www.d4d.orange.com/home>`_, to
-`predict personality <http://web.media.mit.edu/~yva/InfographicPersonality.png>`_,
+predict `gender, age <http://arxiv.org/abs/1511.06660>`_, and
+`personality <http://web.media.mit.edu/~yva/InfographicPersonality.png>`_;
 and for `customer segmentation <http://web.media.mit.edu/~yva/InfographicBigDataMarketing.png>`_.
 
 If you use bandicoot in your research please cite it as:

‎docs/reference/bandicoot.others.rst

+29 -5
@@ -1,5 +1,5 @@
-Other modules
-=============
+utils and helper
+================
 
 
 utils
@@ -76,13 +76,37 @@ helper.maths
 helper.stops
 ------------
 
+Building spatial indicators with both coarse (cell towers) and fine-grained
+(GPS) positions is not trivial. This modules helps cluster GPS locations and
+update the positions of given records to the closest cluster:
+
+1. :meth:`~bandicoot.helper.stops.cluster_and_update` clusters records and updates
+   their location,
+2. :meth:`~bandicoot.helper.stops.fix_location` updates the position of all records
+   based on closest cluster found, to avoid having both antennas from cell towers
+   and from clusters.
+
+Algorithms implemented in this module were designed by Andrea Cuttone [CUT2013]_.
+
 .. currentmodule:: bandicoot.helper.stops
 .. autosummary::
     :toctree: generated/
 
+    cluster_and_update
+    fix_location
     compute_distance_matrix
-    get_neighbors
+
+**Low-level functions**
+
+.. currentmodule:: bandicoot.helper.stops
+.. autosummary::
+    :toctree: generated/
+
     dbscan
+    get_neighbors
     get_stops
-    cluster_and_update
-    fix_location
+
+**References**
+
+.. [CUT2013] Cuttone, A. (2013). SensibleJournal: A Mobile Personal Informatics
+   System for Visualizing Mobility and Social Interactions. ISO 690
+15
@@ -0,0 +1,15 @@
+visualization
+=============
+
+This module generate an interactive dashboard to visualize user patterns.
+
+.. image:: ../_static/bandicoot-dashboard.png
+
+.. currentmodule:: bandicoot.visualization
+
+.. autosummary::
+    :toctree: generated/
+
+    run
+    export
+    user_data
@@ -1,39 +1,17 @@
-special
-========
-
-This module contains additional features:
-
-- functions to generate a dashboard for one user,
-- routines to export indicators for deep learning tools.
-
-
-dashboard
----------
-
-This module generate an interactive dashboard to visualize user patterns.
-
-.. image:: ../_static/bandicoot-dashboard.png
-
-.. currentmodule:: bandicoot.special.dashboard
-
-.. autosummary::
-    :toctree: generated/
-
-    dashboard_data
-    build
-    server
-
-
-
 weekmatrix
-----------
+==========
 
-`Recent research <https://github.com/yvesalexandre/convnet-metadata>`_ shows
+Recent research [MON2015]_ shows
 how deep learning methods (CNN) can achieve state-of-the-art classification
 performance on mobile phone metadata. These methods can exploit the temporal
 structure in mobile metadata by using specialized neural network architectures.
 
-This module contains functions for outputting the ‘week-matrix’ data
+.. note::
+    See the `convnet-metadata <https://github.com/yvesalexandre/convnet-metadata>`_
+    repository on Github to learn how to use bandicoot ``weekmatrix``
+    features with the Caffe deep learning framework.
+
+This module contains functions for outputting the *week-matrix* data
 representation, which can used with these deep learning methods. The mobile
 metadata is represented as 8 matrices summarizing mobile phone usage on a
 given week with hours of the day on the x-axis and the weekdays on the
@@ -44,13 +22,21 @@ for a given variable of interest in that hour interval (e.g. between 2 and
 3pm). In this way, any number of interactions during the week is binned.
 These 8 matrices are combined into a 3-dimensional matrix with a separate
 'channel' for each of the 8 variables of interest. Such a 3-dimensional
-matrix is named a 'week-matrix'.
+matrix is named a *week-matrix*.
+
 
-.. currentmodule:: bandicoot.special.weekmatrix
+.. currentmodule:: bandicoot.weekmatrix
 
 .. autosummary::
     :toctree: generated/
 
     create_weekmatrices
     read_csv
     to_csv
+
+
+References
+----------
+.. [MON2015] Felbo, B., Sundsøy, P., Pentland, A. S., Lehmann, S., & de
+   Montjoye, Y. A. (2015). Using Deep Learning to Predict Demographics
+   from Mobile Phone Metadata. arXiv preprint arXiv:1511.06660.
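
The week-matrix layout described in this file (one weekday-by-hour matrix per variable, stacked as channels) can be sketched with plain Python lists. This is a sketch under assumptions, not the behaviour of `create_weekmatrices`: the function name, the record fields, and the two example channels are hypothetical, and the documentation does not list the 8 variables, so only two are shown.

```python
from datetime import datetime


def week_matrix_sketch(records, channels):
    """Bin one week of records into a channels x 7 x 24 nested list.

    records: list of dicts with a 'datetime' key (datetime object).
    channels: list of (name, value_fn) pairs; value_fn maps a record to the
    amount it contributes to that channel.
    """
    matrix = [[[0.0] * 24 for _ in range(7)] for _ in channels]
    for record in records:
        day = record['datetime'].weekday()   # 0 = Monday ... 6 = Sunday
        hour = record['datetime'].hour       # 0 ... 23
        for c, (_, value_fn) in enumerate(channels):
            matrix[c][day][hour] += value_fn(record)
    return matrix


# Two hypothetical channels: number of calls and total call duration.
channels = [
    ('calls', lambda r: 1 if r['interaction'] == 'call' else 0),
    ('call_duration', lambda r: r.get('call_duration', 0)
     if r['interaction'] == 'call' else 0),
]
records = [
    {'datetime': datetime(2016, 4, 25, 14, 30), 'interaction': 'call',
     'call_duration': 120},
    {'datetime': datetime(2016, 4, 25, 14, 45), 'interaction': 'call',
     'call_duration': 30},
    {'datetime': datetime(2016, 4, 27, 9, 5), 'interaction': 'text'},
]
wm = week_matrix_sketch(records, channels)
```

Both calls fall in the Monday 14:00 to 15:00 cell, so any number of interactions within an hour collapses into a single value per channel.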

‎docs/reference/index.rst

+2 -1
@@ -13,6 +13,7 @@ This reference manual details functions, modules, and objects included in bandic
    bandicoot.recharge
    bandicoot.io
    bandicoot.core
-   bandicoot.special
+   bandicoot.visualization
    bandicoot.others
+   bandicoot.weekmatrix
 

0 commit comments
