6 changes: 4 additions & 2 deletions .github/workflows/ci.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,9 @@ jobs:
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
- run: pip install '.[devel]'
with:

python-version: '3.9.4'
- run: pip install -e '.[devel]'
- run: pre-commit install
- run: pre-commit run --all-files
run-tests:
Expand All @@ -39,6 +41,6 @@ jobs:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: '3.8'
python-version: '3.9.4'
- run: pip install '.[devel]'
- run: pytest tests
37 changes: 37 additions & 0 deletions docker-compose.dev.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
---
version: '3'

services:
  # Elasticsearch node required as a database for Apache Kibble
  elasticsearch:
    image: elasticsearch:7.13.1
    ports:
      - 9200:9200
      - 9300:9300
    environment:
      node.name: es01
      discovery.seed_hosts: es02
      cluster.initial_master_nodes: es01
      cluster.name: kibble
      ES_JAVA_OPTS: -Xms256m -Xmx256m
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - "kibble-es-data:/usr/share/elasticsearch/data"

  # Kibana to view and manage Elasticsearch
  kibana:
    image: kibana:7.13.1
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
    environment:
      ELASTICSEARCH_URL: http://elasticsearch:9200
      ELASTICSEARCH_HOSTS: http://elasticsearch:9200

volumes:
  # named volumes can be managed easier using docker-compose
  kibble-es-data:
126 changes: 126 additions & 0 deletions docs/architecture.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,126 @@
.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.

Apache Kibble Overview
======================

@sharanf @Humbedooh @kaxil @michalslowikowski00 I added some docs/notes about the current status and how things work. Let me know what you think.


Thanks Tomek! This is good work. I have added some minor text changes.


Kibble configuration
--------------------

Currently, Apache Kibble is configured using the ``kibble.yaml`` configuration file.

Database configuration
......................

.. code-block::

    elasticsearch:
      hosts:
        - http://localhost:9200

Data sources configuration
..........................

Multiple data sources can be configured. Each data source is defined by a Python class. In addition, users
have to pass ``name`` and ``config``, where ``config`` holds the configuration specific to the given data source.

.. code-block::

    data_sources:
      - name: name
        class: path.to.a.Class
        config:
          # Data source specific configuration
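
Because ``class`` is given as a dotted path, it has to be resolved to a Python class at runtime. The following is a minimal sketch of one way to do that with ``importlib``. The helper name ``load_class`` is hypothetical and not part of Kibble, and the usage line resolves a standard library class only so that the sketch runs on its own.

```python
import importlib


def load_class(dotted_path: str) -> type:
    """Resolve a dotted path such as 'package.module.ClassName' to a class."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"Expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Hypothetical usage with a standard library class instead of a data source.
ordered_dict_cls = load_class("collections.OrderedDict")
print(ordered_dict_cls.__name__)  # prints OrderedDict
```

With a helper like this, a misspelled module surfaces as ``ModuleNotFoundError`` and a misspelled class name as ``AttributeError``, so configuration mistakes are reported when the data source is loaded.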

Data source
-----------

A data source represents an external source of information (for example GitHub, JIRA, or a mailing list). Each data source
is a Python package, so users can easily build their own data sources and use them with Kibble.

A data source package has to have the following structure:

.. code-block::

    data_source_name/
    |   __init__.py
    |   ...
    |   data_types
    |   |   __init__.py
    |   |   type1.py
    |   |   type2.py
    |   |   ...

The ``data_source_name.__init__`` module should include the class defining the data source, but the class can also be placed in another
file in the top-level directory of the package.

Data types
..........

A data type represents a single type of data within a data source. For example, if GitHub is a data source, then issues and
comments are two different data types. A data type is a class that has to implement a ``fetch_data`` method, which is
used to fetch and persist data.

Data types are automatically determined using the data source class path.
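
One possible reading of this rule is that the ``data_types`` package is looked up next to the module that defines the data source class. The sketch below shows that reading as plain string handling. The function name and the exact lookup rule are assumptions made for illustration, not Kibble's confirmed behaviour.

```python
def data_type_module_path(class_path: str, data_type: str) -> str:
    """Build the dotted module path for one data type of a data source.

    Assumes the package layout shown earlier: a ``data_types`` package that
    lives in the same package as the module part of ``class_path``.
    """
    package_path, _, _class_name = class_path.rpartition(".")
    if not package_path:
        raise ValueError(f"Expected 'package.ClassName', got {class_path!r}")
    return f"{package_path}.data_types.{data_type}"


print(data_type_module_path("kibble.data_sources.github.GithubDataSource", "issues"))
# prints: kibble.data_sources.github.data_types.issues
```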

Each data type is an index in the Kibble Elasticsearch instance. The data should be stored "as is" so users can leverage the existing
documentation.

In addition to persisting data, a data type should also define metrics that can be calculated on the retrieved data.
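
To make the contract described above concrete, here is a minimal sketch of a data type. The names ``DataType``, ``IssuesDataType``, ``index_name`` and ``open_issue_count`` are hypothetical, the documents are static sample data, and a plain dict stands in for Elasticsearch so the example runs without a cluster.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

# A plain dict stands in for Elasticsearch: index name -> stored documents.
Store = Dict[str, List[Dict[str, Any]]]


class DataType(ABC):
    """Hypothetical base class for data types; not Kibble's actual class."""

    index_name: str  # each data type maps to one index

    @abstractmethod
    def fetch_data(self, store: Store) -> None:
        """Fetch documents and persist them unchanged in ``store``."""


class IssuesDataType(DataType):
    index_name = "github_issues"

    def fetch_data(self, store: Store) -> None:
        # Static sample documents; a real data type would call the GitHub API.
        documents = [
            {"number": 1, "state": "open"},
            {"number": 2, "state": "closed"},
        ]
        store.setdefault(self.index_name, []).extend(documents)

    def open_issue_count(self, store: Store) -> int:
        """Example metric calculated on the persisted documents."""
        return sum(1 for doc in store.get(self.index_name, []) if doc["state"] == "open")


store: Store = {}
issues = IssuesDataType()
issues.fetch_data(store)
print(issues.open_issue_count(store))  # prints 1
```

Keeping the metric on the data type means it is defined next to the code that knows the shape of the stored documents.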

Configuring a data source
.........................

As described previously, data sources can be configured in the ``kibble.yaml`` config file. For example:

.. code-block::

    data_sources:
      - name: kibble_github
        class: kibble.data_sources.github.GithubDataSource
        config:
          repo_owner: apache
          repo_name: kibble
          enabled_data_types:
            - issues
            - discussions

      - name: pulsar_github
        class: kibble.data_sources.github.GithubDataSource
        config:
          repo_owner: apache
          repo_name: pulsar
          enabled_data_types:
            - issues
            - comments

      - name: pulsar_dev_list
        class: kibble.data_sources.pony.PonyDataSource
        config:
          list_name: [email protected]
          enabled_data_types:
            - threads

In the above example we can see that:

* We configured two different data sources based on ``GithubDataSource``: the apache/kibble and apache/pulsar GitHub repositories.
  For each source we fetch different information: for Kibble we fetch issues and discussions data, while for Apache
  Pulsar we fetch issues and comments data.
* There's also a third data source using ``PonyDataSource``, configured for the Apache Pulsar dev list.

This design gives users more granular control over the data they want to fetch. It also makes it possible
to configure different authorization options for each data source in the future.
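
Since several entries can share one class, the ``name`` field is what identifies a data source. The sketch below works on an already parsed configuration, shortened from the example above to two entries. The helper names ``find_data_source`` and ``names_for_class`` are hypothetical.

```python
from typing import Any, Dict, List, Optional

# Parsed form of a shortened kibble.yaml; only keys from the example are used.
config: Dict[str, Any] = {
    "data_sources": [
        {
            "name": "kibble_github",
            "class": "kibble.data_sources.github.GithubDataSource",
            "config": {"repo_owner": "apache", "repo_name": "kibble"},
        },
        {
            "name": "pulsar_github",
            "class": "kibble.data_sources.github.GithubDataSource",
            "config": {"repo_owner": "apache", "repo_name": "pulsar"},
        },
    ]
}


def find_data_source(config: Dict[str, Any], name: str) -> Optional[Dict[str, Any]]:
    """Return the data source entry with the given name, or None."""
    for data_source in config.get("data_sources", []):
        if data_source["name"] == name:
            return data_source
    return None


def names_for_class(config: Dict[str, Any], class_path: str) -> List[str]:
    """List the names of all data sources that share one class."""
    return [
        ds["name"] for ds in config.get("data_sources", []) if ds["class"] == class_path
    ]


print(find_data_source(config, "pulsar_github")["config"]["repo_name"])  # prints pulsar
print(names_for_class(config, "kibble.data_sources.github.GithubDataSource"))
```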
16 changes: 16 additions & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
36 changes: 36 additions & 0 deletions docs/installation.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.

Installation steps
==================

To install Apache Kibble run:

.. code-block::

    pip install -e ".[devel]"

You will also need an Elasticsearch instance up and running. You can set one up using docker-compose:

.. code-block::

    docker-compose -f docker-compose.dev.yaml up

Once ES is running you can scan configured data sources:

.. code-block::

    kibble scanners run -s github_kibble
37 changes: 19 additions & 18 deletions kibble/cli/commands/scanners_command.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,28 +19,29 @@

import click

from kibble.configuration.yaml_config import kconfig
from kibble.data_sources.base.base_data_source import DataSourceConfig


@click.group(name="scanners")
def scanners_group():
    """Configure and trigger scanners"""


@scanners_group.command()
def add():
    """Add new scanner configuration"""
    click.echo("To be implemented!")


@scanners_group.command(name="list")
def list_scanners():
    """List all available scanners"""
    scanners_list = ["AbcScanner", "XyzeScanner"]
    for scanner in scanners_list:
        click.echo(f"- {scanner}")


@scanners_group.command()
@click.argument("scanner_name")
def run(scanner_name: str):
    """Trigger a scanning process for given scanner"""
    click.echo(f"Running {scanner_name}")
@click.option("-s", "--data-source", "data_source_name", required=True)
def run(data_source_name: str):
    """Trigger a scanning process for given data source"""
    data_source_config = None
    for ds_in_config in kconfig.get("data_sources", []):
        if ds_in_config["name"] == data_source_name:
            data_source_config = DataSourceConfig.from_dict(ds_in_config)
            break

    if not data_source_config:
        click.echo(f"Data source {data_source_name} not configured")
        return

    data_source = data_source_config.get_object()
    click.echo(f"Scanning {data_source_name}")
    data_source.scan()
File renamed without changes.
37 changes: 37 additions & 0 deletions kibble/configuration/yaml_config.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

from pathlib import Path
from typing import Dict

import yaml

KIBBLE_YAML = "kibble.yaml"


def parse_kibble_yaml() -> Dict:
    """Reads kibble.yaml config file"""
    config_path = Path(__file__).parent.parent.joinpath(KIBBLE_YAML)
    with open(config_path, "r") as stream:
        config = yaml.safe_load(stream)
    return config


kconfig = parse_kibble_yaml()

if __name__ == "__main__":
    parse_kibble_yaml()
16 changes: 16 additions & 0 deletions kibble/data_sources/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
16 changes: 16 additions & 0 deletions kibble/data_sources/base/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.