Enh: refactoring repo organization #121

Merged: 16 commits, Mar 9, 2021
95 changes: 35 additions & 60 deletions README.md
@@ -1,90 +1,65 @@
# Nautilus Connectors Kit

**NCK is an E(T)L tool specialized in API data ingestion. It is accessible through a Command-Line Interface. The application allows you to easily extract, stream and load data (with minimal transformation) from the API source to the destination of your choice.**

As of now, the most common output format of data loaded by the application is .njson (i.e. a file of *n* lines, where each line is a JSON-like dictionary).
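For instance, a two-line .njson file can be parsed with nothing more than Python's standard `json` module (the file content below is invented for the example):

```python
import json

# Hypothetical .njson content: one JSON dictionary per line.
raw = '{"date": "2021-03-09", "sessions": 42}\n{"date": "2021-03-10", "sessions": 17}\n'

# Each line parses independently with the standard json module.
records = [json.loads(line) for line in raw.splitlines()]
print(records[0]["sessions"])  # -> 42
```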

Official documentation is available [here](https://artefactory.github.io/nautilus-connectors-kit/).

---

## Philosophy

The application is composed of **3 main components** (*implemented as Python classes*). When combined, these components act as an E(T)L pipeline, allowing you to stream data from a source to the destination of your choice:

- [Readers](nck/readers) read data from an API source and transform it into a stream object.
- [Streams](nck/streams) (*transparent to the end-user*) are local objects used by writers to process individual records collected from the source.
- [Writers](nck/writers) write the output stream object to the destination of your choice.
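As a rough sketch of how these three components fit together (class and method names below are invented for illustration, not NCK's actual API):

```python
# Illustrative reader -> stream -> writer flow; names are invented
# for the example and do not match NCK's real classes.
class DemoReader:
    def read(self):
        # A reader turns API responses into a stream of records.
        yield from [{"id": 1}, {"id": 2}]

class DemoWriter:
    def write(self, stream):
        # A writer consumes the stream and stores each record.
        return [record for record in stream]

records = DemoWriter().write(DemoReader().read())
print(records)  # -> [{'id': 1}, {'id': 2}]
```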

## Available connectors

As of now, the application offers the following Readers & Writers:

### Readers

- **Analytics**
  - Adobe Analytics 1.4
  - Adobe Analytics 2.0
  - Google Analytics
- **Advertising - Adserver**
  - Google Campaign Manager
- **Advertising - DSP**
  - Google Display & Video 360
  - The Trade Desk

- **Advertising - Search**
  - Google Ads
  - Google Search Ads 360
  - Google Search Console
  - Yandex Campaign
  - Yandex Statistics

- **Advertising - Social**
  - Facebook Marketing
  - MyTarget
  - Radarly
  - Twitter Ads

- **CRM**
  - SalesForce
- **Databases**
  - MySQL
- **DevTools**
  - Confluence
- **Files (.csv, .njson)**
  - Amazon S3
  - Google Cloud Storage
  - Google Sheets

### Writers

- **Data Warehouses**
  - Google BigQuery
- **Debugging**
  - Console
- **Files (.njson)**
  - Amazon S3
  - Google Cloud Storage
  - Local file

For more information on how to use NCK, check out the [official documentation](https://artefactory.github.io/nautilus-connectors-kit/).
76 changes: 58 additions & 18 deletions docs/source/getting_started.rst
@@ -181,21 +181,39 @@ How to develop a new reader

To create a new reader, you should:
Review comment (Contributor):

    This is not the purpose of this PR, but the "Getting Started" page is oriented towards contributions rather than end users. I'm wondering if that is a good thing. Maybe we should put this section in a "Contributing" section and create a real "Getting Started" guide, with info on installation and 1 or 2 simple commands.

Reply (Contributor, Author):

    Good point! I'll create an issue for this matter.

1. Create a ``nck/readers/<SOURCE_NAME>/`` directory, having the following structure:

.. code-block:: shell

- nck/
-- readers/
--- <SOURCE_NAME>/
---- cli.py
---- reader.py
---- helper.py # Optional
---- config.py # Optional

``cli.py``

This module should implement a click-decorated reader function:

- The reader function should be decorated with: a ``@click.command()`` decorator, several ``@click.option()`` decorators (*one for each input provided by end-users*) and a ``@processor()`` decorator (*preventing secrets to appear in logs*). For further information on how to implement these decorators, please refer to `click documentation <https://click.palletsprojects.com/en/7.x/>`__.
- The reader function should return a reader class (*more details below*). The source prefix of each option will be removed when passed to the reader class, using the ``extract_args()`` function.
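As an illustrative sketch (not NCK's actual code), a minimal ``cli.py`` might look as follows. The option names are hypothetical, and the NCK-specific ``@processor()`` decorator and ``extract_args()`` helper are only referenced in a comment:

```python
import click
from click.testing import CliRunner

@click.command(name="read_example")
@click.option("--example-access-token", required=True)
@click.option("--example-start-date", required=True)
def read_example(**kwargs):
    # In NCK, a @processor() decorator would also wrap this function to keep
    # secrets out of logs, and extract_args() would strip the "example_"
    # prefix before the options are handed to the reader class.
    click.echo(f"collected options: {sorted(kwargs)}")

# Simulate a CLI call without touching the real command line.
result = CliRunner().invoke(
    read_example,
    ["--example-access-token", "token", "--example-start-date", "2021-03-09"],
)
print(result.output)
```

Note that click converts ``--example-access-token`` into the ``example_access_token`` keyword argument, which is why a source prefix on every option name cleanly namespaces the reader's inputs.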

``reader.py``

This module should implement a reader class:

- Class attributes should be the previously defined click options.
- The class should have a ``read()`` method, yielding a stream object. This stream object can be chosen from `available stream classes <https://github.com/artefactory/nautilus-connectors-kit/tree/dev/nck/streams>`__, and has 2 attributes: a stream name and a source generator function named ``result_generator()``, yielding individual source records.
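A skeleton of such a ``reader.py``, using an invented stand-in for NCK's stream classes (all names below are illustrative only):

```python
class ExampleStream:
    # Stand-in for an NCK stream class: a name plus a generator of records.
    def __init__(self, name, source_generator):
        self.name = name
        self.result_generator = source_generator

class ExampleReader:
    def __init__(self, access_token, start_date):
        # Class attributes mirror the click options declared in cli.py.
        self.access_token = access_token
        self.start_date = start_date

    def read(self):
        def result_generator():
            # A real reader would page through the source API here;
            # the record below is faked for the example.
            yield {"date": self.start_date, "clicks": 3}
        yield ExampleStream("results_example", result_generator)

stream = next(ExampleReader("token", "2021-03-09").read())
print(next(stream.result_generator()))  # -> {'date': '2021-03-09', 'clicks': 3}
```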

``helper.py`` (Optional)

This module gathers all helper functions used in the ``reader.py`` module.

``config.py`` (Optional)

This module gathers all configuration variables.

2. In parallel, create unit tests for your methods under the ``tests/`` directory

@@ -204,8 +222,8 @@
4. Complete the documentation:

- Add your reader to the list of existing readers in the :ref:`overview:Available Connectors` section.
- Create dedicated documentation for your reader CLI command on the :ref:`readers:Readers` page. It should include the following sections: *Source API - How to obtain credentials - Quickstart - Command name - Command options*
- Add your reader to the reader list in the README, at the root of the GitHub project

---------------------------
How to develop a new stream
@@ -228,24 +246,46 @@ How to develop a new writer

To develop a new writer, you should:

1. Create a ``nck/writers/<DESTINATION_NAME>/`` directory, having the following structure:

.. code-block:: shell

- nck/
-- writers/
--- <DESTINATION_NAME>/
---- cli.py
---- writer.py
---- helper.py # Optional
---- config.py # Optional

``cli.py``

This module should implement a click-decorated writer function:

- The writer function should be decorated with: a ``@click.command()`` decorator, several ``@click.option()`` decorators (*one for each input provided by end-users*) and a ``@processor()`` decorator (*preventing secrets to appear in logs*). For further information on how to implement these decorators, please refer to `click documentation <https://click.palletsprojects.com/en/7.x/>`__.
- The writer function should return a writer class (*more details below*). The destination prefix of each option will be removed when passed to the writer class, using the ``extract_args()`` function.
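Mirroring the reader example, a hypothetical writer ``cli.py`` could be sketched as follows (the option name is invented, and the NCK-specific ``@processor()`` decorator is again only mentioned in a comment):

```python
import click
from click.testing import CliRunner

@click.command(name="write_example")
@click.option("--example-bucket", required=True)
def write_example(**kwargs):
    # In NCK, extract_args() would strip the "example_" prefix before
    # the options are handed to the writer class.
    click.echo(f"would build a writer for bucket: {kwargs['example_bucket']}")

# Simulate a CLI call without touching the real command line.
result = CliRunner().invoke(write_example, ["--example-bucket", "my-bucket"])
print(result.output)
```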

``writer.py``

This module should implement a writer class:

- Class attributes should be the previously defined click options.
- The class should have a ``write()`` method, writing the stream object to the destination.
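A minimal illustrative ``writer.py`` skeleton (purely an example: a real writer would serialize records, e.g. as .njson, and upload them to its destination rather than keep them in memory):

```python
class InMemoryWriter:
    def __init__(self, destination):
        # Class attributes mirror the click options declared in cli.py.
        self.destination = destination

    def write(self, stream):
        # A real writer would serialize and upload each record;
        # here records are simply collected in a list.
        return list(stream)

written = InMemoryWriter("local://demo").write(iter([{"id": 1}, {"id": 2}]))
print(written)  # -> [{'id': 1}, {'id': 2}]
```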

``helper.py`` (Optional)

This module gathers all helper functions used in the ``writer.py`` module.

``config.py`` (Optional)

This module gathers all configuration variables.

2. In parallel, create unit tests for your methods under the ``tests/`` directory

3. Add your click-decorated writer function to the ``nck/writers/__init__.py`` file

4. Complete the documentation:

- Add your writer to the list of existing writers in the :ref:`overview:Available Connectors` section.
- Create dedicated documentation for your writer CLI command on the :ref:`writers:Writers` page. It should include the following sections: *Quickstart - Command name - Command options*
- Add your writer to the writer list in the README, at the root of the GitHub project
92 changes: 33 additions & 59 deletions docs/source/overview.rst
@@ -2,13 +2,15 @@
Overview
########

**NCK is an E(T)L tool specialized in API data ingestion. It is accessible through a Command-Line Interface. The application allows you to easily extract, stream and load data (with minimal transformation) from the API source to the destination of your choice.**

As of now, the most common output format of data loaded by the application is .njson (i.e. a file of *n* lines, where each line is a JSON-like dictionary).

==========
Philosophy
==========

The application is composed of **3 main components** (*implemented as Python classes*). When combined, these components act as an E(T)L pipeline, allowing you to stream data from a source to the destination of your choice:

- :ref:`readers:Readers` read data from an API source and transform it into a stream object.
- :ref:`streams:Streams` (*transparent to the end-user*) are local objects used by writers to process individual records collected from the source.
@@ -18,80 +20,52 @@
Available connectors
====================

As of now, the application offers the following Readers & Writers:

*******
Readers
*******


- **Analytics**

  - Adobe Analytics 1.4
  - Adobe Analytics 2.0
  - Google Analytics

- **Advertising - Adserver**

  - Google Campaign Manager

- **Advertising - DSP**

  - Google Display & Video 360
  - The Trade Desk

- **Advertising - Search**

  - Google Ads
  - Google Search Ads 360
  - Google Search Console
  - Yandex Campaign
  - Yandex Statistics

- **Advertising - Social**

  - Facebook Marketing
  - MyTarget
  - Radarly
  - Twitter Ads

- **CRM**

  - SalesForce

- **Databases**

  - MySQL

- **DevTools**

  - Confluence

- **Files (.csv, .njson)**

  - Amazon S3
  - Google Cloud Storage
  - Google Sheets

*******
Writers
*******

- **Data Warehouses**

  - Google BigQuery

- **Debugging**

  - Console

- **Files (.njson)**

  - Amazon S3
  - Google Cloud Storage
  - Local file