diff --git a/docs/conf.py b/docs/conf.py
deleted file mode 100644
index a08dda6c9..000000000
--- a/docs/conf.py
+++ /dev/null
@@ -1,62 +0,0 @@
-# Configuration file for the Sphinx documentation builder.
-#
-# This file only contains a selection of the most common options. For a full
-# list see the documentation:
-# https://www.sphinx-doc.org/en/master/usage/configuration.html
-
-# -- Path setup --------------------------------------------------------------
-
-# If extensions (or modules to document with autodoc) are in another directory,
-# add these directories to sys.path here. If the directory is relative to the
-# documentation root, use os.path.abspath to make it absolute, like shown here.
-#
-import os
-import sys
-sys.path.insert(0, os.path.abspath('../..'))
-print(os.path.abspath('../..'))
-
-
-# -- Project information -----------------------------------------------------
-project = '4CAT Capture & Analysis Toolkit'
-copyright = '2021, OILab & Digital Methods Initiative'
-author = 'OILab & Digital Methods Initiative'
-
-
-# -- General configuration ---------------------------------------------------
-
-# Add any Sphinx extension module names here, as strings. They can be
-# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
-# ones.
-extensions = [
-    'sphinx.ext.autodoc',
-    'sphinx.ext.autosummary',
-    "sphinx.ext.napoleon",
-    'm2r2',
-    'sphinx.ext.intersphinx'
-]
-
-# Add any paths that contain templates here, relative to this directory.
-templates_path = ['_templates']
-
-# List of patterns, relative to source directory, that match files and
-# directories to ignore when looking for source files.
-# This pattern also affects html_static_path and html_extra_path.
-exclude_patterns = []
-
-source_suffix = [".rst", ".md"]
-
-autodoc_default_options = {
-    "member-order": "groupwise"
-}
-
-# -- Options for HTML output -------------------------------------------------
-
-# The theme to use for HTML and HTML Help pages. See the documentation for
-# a list of builtin themes.
-#
-html_theme = 'sphinx_rtd_theme'
-
-# Add any paths that contain custom static files (such as style sheets) here,
-# relative to this directory. They are copied after the builtin static files,
-# so a file named "default.css" will overwrite the builtin "default.css".
-html_static_path = ['_static']
\ No newline at end of file
diff --git a/docs/datasource.rst b/docs/datasource.rst
deleted file mode 100644
index c4731a3e1..000000000
--- a/docs/datasource.rst
+++ /dev/null
@@ -1,73 +0,0 @@
-=================
-4CAT Data sources
-=================
-
-4CAT is a modular tool. Its modules come in two varieties: data sources and processors. This article covers the former.
-
-Data sources are a collection of workers, processors and interface elements that extend 4CAT to allow scraping,
-processing and/or retrieving data for a given platform (such as Instagram, Reddit or Telegram). 4CAT has APIs that can
-do most of the scaffolding around this for you so data source can be quite lightweight and mostly focus on retrieving
-the actual data while 4CAT's back-end takes care of the scheduling, determining where the output should go, et cetera.
-
-Data sources are defined as an arbitrarily-named folder in the datasources folder in the 4CAT root. It is recommended to
-use the datasource ID (see below) as the data source folder name. However, since Python files included in the folder
-will be included as modules by 4CAT, folder names should be allowed as module names. Concretely this means (among other
-things) that data source folder names cannot start with a number (hence the fourchan data source).
-
-*WARNING:* Data sources in multiple ways can define arbitrary code that will be run by either the 4CAT server or
-client-side browsers. Be careful when running a data source supplied by someone else.
-
-A data source will at least contain the following:
-
-* An __init__.py containing data source metadata and initialisation code
-* A search worker, which can collect data according to provided parameters and format it as a CSV or NDJSON file that
-  4CAT can work with.
-
-It may contain additional components:
-
-* Any processors that are specific to datasets created by this data source
-* Views for the web app that allow more advanced behaviour of the web tool interface
-* Database or Sphinx index definitions
-
-The instructions below describe how to format and create these components (work in progress!)
-
--------------------
-Initialisation code
--------------------
-
-The data source root should contain a file `__init__.py` which in turn defines the following:
-
-.. code-block:: python
-
-    DATASOURCE = "datasource-identifier"
-
-This constant defines the data source ID. This is most importantly used in config.py to enable the data source.
-
-.. code-block:: python
-
-    def init_datasource(database, logger, queue, name):
-        pass
-
-This function is called when 4CAT starts, if the data source is enabled, and should set up anything the data source
-needs to function (e.g. queueing any recurring workers). A default implementation of this function can be used instead
-(and when defining your own, it is advised to still call it as part of your own implementation):
-
-.. code-block:: python
-
-    from backend.lib.helpers import init_datasource
-
-------------------
-The `Search` class
-------------------
-.. autoclass:: backend.lib.search.Search
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
----------------------------
-The `SearchWithScope` class
----------------------------
-.. autoclass:: backend.lib.search.SearchWithScope
-    :members:
-    :undoc-members:
-    :show-inheritance:
\ No newline at end of file
diff --git a/docs/index.rst b/docs/index.rst
deleted file mode 100644
index addb57b3f..000000000
--- a/docs/index.rst
+++ /dev/null
@@ -1,20 +0,0 @@
-.. 4CAT Capture & Analysis Toolkit documentation master file, created by
-   sphinx-quickstart on Tue Oct 19 11:38:20 2021.
-   You can adapt this file completely to your liking, but it should at least
-   contain the root `toctree` directive.
-
-Welcome to 4CAT Capture & Analysis Toolkit's documentation!
-===========================================================
-
-This documentation collects information about 4CAT's internals
-
-.. toctree::
-   :maxdepth: 2
-   :caption: Contents:
-
-   introduction
-   processor
-   datasource
-   worker
-
-* :ref:`search`
diff --git a/docs/introduction.rst b/docs/introduction.rst
deleted file mode 100644
index b33e21e8d..000000000
--- a/docs/introduction.rst
+++ /dev/null
@@ -1,5 +0,0 @@
-============
-Introduction
-============
-
-.. mdinclude:: ../../README.md
\ No newline at end of file
diff --git a/docs/processor.rst b/docs/processor.rst
deleted file mode 100644
index 3073b3a09..000000000
--- a/docs/processor.rst
+++ /dev/null
@@ -1,63 +0,0 @@
-===============
-4CAT Processors
-===============
-
-4CAT is a modular tool. Its modules come in two varieties: data sources and processors. This article covers the latter.
-
-Processors are bits of code that produce a dataset. Typically, their input is another dataset. As such they can be used
-to analyse data; for example, a processor can take a csv file containing posts as input, count how many posts occur per
-month, and produce another csv file with the amount of posts per month (one month per row) as output. Processors always
-produce the following things:
-
-* A set of metadata for the Dataset the processor will produce. This is stored in 4CAT's PostgreSQL database. The
-  record for the database is created when the processor's job is first queued, and updated by the processor.
-* A result file, which may have an arbitrary format. This file contains whatever the processor produces, e.g. a list
-  of frequencies, an image wall or a zip archive containing word embedding models.
-* A log file, with the same file name as the result file but with a '.log' extension. This documents any output from
-  the processor while it was producing the result file.
-
-4CAT has an API that can do most of the scaffolding around this for you so processors can be quite lightweight and
-mostly focus on the analysis while 4CAT's back-end takes care of the scheduling, determining where the output should
-go, et cetera.
-
-A minimal example of a processor could look like this:
-
-.. code-block:: python
-
-    """
-    A minimal example 4CAT processor
-    """
-    from backend.lib.processor import BasicProcessor
-
-    class ExampleProcessor(BasicProcessor):
-        """
-        Example Processor
-        """
-        type = "example-processor" # job type ID
-        category = "Examples" # category
-        title = "A simple example" # title displayed in UI
-        description = "This doesn't do much" # description displayed in UI
-        extension = "csv" # extension of result file, used internally and in UI
-
-        input = "csv:body"
-        output = "csv:value"
-
-        def process(self):
-            """
-            Saves a CSV file with one column ("value") and one row with a value ("Hello
-            world") and marks the dataset as finished.
-            """
-            data = {"value": "Hello world!"}
-            self.write_csv_items_and_finish(data)
-
-
-But there is more you can do. The full API looks like this:
-
---------------------------
-The `BasicProcessor` class
---------------------------
-
-.. autoclass:: backend.lib.processor.BasicProcessor
-    :members:
-    :undoc-members:
-    :show-inheritance:
\ No newline at end of file
diff --git a/docs/requirements.txt b/docs/requirements.txt
deleted file mode 100644
index ecd67a4ad..000000000
--- a/docs/requirements.txt
+++ /dev/null
@@ -1 +0,0 @@
-m2r2
\ No newline at end of file
diff --git a/docs/worker.rst b/docs/worker.rst
deleted file mode 100644
index 6eafd5f5a..000000000
--- a/docs/worker.rst
+++ /dev/null
@@ -1,14 +0,0 @@
-===============
-4CAT Workers
-===============
-
-TBD
-
------------------------
-The `BasicWorker` class
------------------------
-
-.. autoclass:: backend.lib.worker.BasicWorker
-    :members:
-    :undoc-members:
-    :show-inheritance:
\ No newline at end of file