Component to extract CSV files from PostgreSQL database search (1st CLAIMED component) #19

romeokienzler · 2021-08-23T14:41:00Z

This is the first component created by C3 - the CLAIMED component compiler - entirely from a notebook. It is the prototype component and PR; once it is accepted, I will open PRs for the remaining 30+ components from CLAIMED. More on CLAIMED:

Here is a ~10 min summary video on how CLAIMED, Elyra, JupyterLab, KubeFlow, MLX and Kubernetes play together: https://youtu.be/H8WskMEUI74

Here is the same in written form: https://romeokienzler.medium.com/create-component-oriented-data-science-pipelines-using-claimed-elyra-kubeflow-pipelines-mlx-and-f17eab24b91c

mlx-bot · 2021-09-01T11:58:16Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: romeokienzler
To complete the pull request process, please assign yhwang after the PR has been reviewed.
You can assign the PR to them by writing /assign @yhwang in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

romeokienzler · 2021-09-01T11:58:43Z

@ckadner description and readme markdown added

Signed-off-by: Romeo Kienzler <[email protected]>

component-samples/claimed/input/input-postgresql.yaml

ckadner · 2021-09-08T21:52:46Z

component-samples/claimed/input/input-postgresql.yaml

+        - -ec
+        - |
+          mkdir -p `echo $0 |sed -e 's/\/[a-zA-Z0-9]*$//'`
+          wget https://raw.githubusercontent.com/IBM/claimed/master/component-library/input/input-postgresql.ipynb


I'm just wondering: since this component (and probably other CLAIMED components?) is running a notebook, would there be a benefit to adding this functionality as a Notebook asset type, instead of a Component in the MLX Katalog?

The benefit of running it as a notebook would be that additional graphics, tables and debug statements could be surfaced to the user more conveniently?

If the functionality is very narrowly defined, like it seems to be the case here, a simple component might make more sense though.
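For readers following the diff, the `mkdir -p` line above derives a directory from whatever path is passed as `$0` by stripping the last path segment with `sed`. Below is a minimal Python sketch of the same regular expression. The function name `strip_last_segment` and the sample paths are made up for illustration and are not part of the component.

```python
import re

def strip_last_segment(path: str) -> str:
    # Same pattern as the sed expression in the diff: remove a trailing
    # "/segment" only when the segment is purely alphanumeric (or empty).
    return re.sub(r"/[a-zA-Z0-9]*$", "", path)

# Hypothetical paths, for illustration only.
alnum_result = strip_last_segment("/tmp/outputs/output/data")
dotted_result = strip_last_segment("/tmp/outputs/file.csv")

print(alnum_result)
print(dotted_result)
```

Note that the character class excludes `.`, `_` and `-`, so a final segment such as `file.csv` is left in place, and `mkdir -p` would then create a directory with that full name. Whether that matters depends on the paths KFP actually passes in.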

ckadner · 2021-09-08T21:57:18Z

component-samples/claimed/input/input-postgresql.yaml

+# See the License for the specific language governing permissions and
+# limitations under the License.
+name: Input Postgresql
+description: This notebook pulls data from a postgresql database as CSV on a given SQL statement


Maybe we can add some filter_categories (manually) to this component (maybe all our components) to mark which step/stage in the ML Pipeline this (or any) component fits into, i.e. this one could be:

    filter_categories:
      ...
      pipeline_stage: "data collection"
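To illustrate how such a category could be used for filtering, here is a small Python sketch. It assumes component metadata has already been loaded into dictionaries; `filter_by_category` and the second catalog entry are hypothetical and only stand in for whatever the MLX UI or API would actually do.

```python
def filter_by_category(components, key, value):
    """Return the names of components whose filter_categories[key] equals value."""
    return [
        c["name"]
        for c in components
        if c.get("filter_categories", {}).get(key) == value
    ]

# Hypothetical catalog; only the first entry corresponds to this PR.
catalog = [
    {"name": "Input Postgresql",
     "filter_categories": {"pipeline_stage": "data collection"}},
    {"name": "Some Training Component",
     "filter_categories": {"pipeline_stage": "training"}},
]

matches = filter_by_category(catalog, "pipeline_stage", "data collection")
print(matches)
```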

ckadner · 2021-09-08T22:06:10Z

component-samples/claimed/input/input-postgresql.md

@@ -0,0 +1,3 @@
+# Input PostgreSQL
+This component pulls data from a postgresql database as CSV on a given SQL statement. Parameters like
+host, database, user, password and sql need to be set. Please note that data is processed in-memory (pandas) and can't spill on disk (spark) yet. Therefore, the queried data must fit onto main memory (of the POD in case running within KubeFlow context.


Another nit-pick: there is an opening parenthesis ( that is not needed:

... fit into main memory (of the pod ...
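On the in-memory limitation mentioned in the README text above: one possible way around it, sketched here under assumptions and not what the notebook currently does, is to stream query results to CSV in batches through a DB-API cursor instead of materializing everything in pandas. The sketch uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so that it is self-contained; `export_query_to_csv` and `batch_size` are hypothetical names.

```python
import csv
import io
import sqlite3

def export_query_to_csv(connection, sql, out_file, batch_size=1000):
    """Write the result of `sql` to `out_file` as CSV, batch_size rows at a time."""
    cursor = connection.cursor()
    cursor.execute(sql)
    writer = csv.writer(out_file)
    # Header row taken from the cursor's column metadata.
    writer.writerow([column[0] for column in cursor.description])
    total = 0
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        writer.writerows(rows)
        total += len(rows)
    return total

# Stand-in database with a few rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

buffer = io.StringIO()
row_count = export_query_to_csv(
    conn, "SELECT id, name FROM t ORDER BY id", buffer, batch_size=2
)
csv_text = buffer.getvalue()
print(row_count)
```

With a PostgreSQL driver the same loop should apply, although a server-side cursor would probably be needed for the batches to actually bound memory use on the client.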

ckadner · 2021-09-08T22:09:07Z

component-samples/claimed/input/input-postgresql.yaml

+- {name: password, type: String, description: 'db password'}
+- {name: port, type: Integer, description: 'db port'}
+- {name: sql, type: String, description: 'sql query statement to be executed'}
+- {name: data_dir, type: String, description: 'temporal data storage for local execution'}


You don't mention data_dir in the README so I assume it's optional?

I think in MLX, we have the convention of marking input parameters as Required in the description. Optional parameters should have a default value.

    inputs:
    - {name: token, description: 'Required. GitHub token for accessing private repository'}
    - {name: url, description: 'Required. GitHub raw path for accessing the credential file'}
    - {name: name, description: 'Required. Secret Name to be stored in Kubernetes'}

The KFP docs say this https://www.kubeflow.org/docs/components/pipelines/sdk/component-development/#define-your-components-interface:

optional: Specifies if this input is optional. The value of this attribute is of type Bool, and defaults to False.
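The convention described above could be checked mechanically. The following is a rough sketch, not an existing MLX or KFP tool: `check_inputs` is a hypothetical helper, and it assumes the component's `inputs` list has already been parsed into dictionaries with the `optional` and `default` keys from the KFP component spec. The sample entries reuse parameter names from this PR, with descriptions edited for the example.

```python
def check_inputs(inputs):
    """Return a list of messages for inputs that break the convention:
    required inputs say 'Required.' in the description, optional ones have a default."""
    problems = []
    for spec in inputs:
        name = spec.get("name", "<unnamed>")
        description = spec.get("description", "")
        if spec.get("optional", False):
            if "default" not in spec:
                problems.append(f"{name}: optional input has no default value")
        elif not description.startswith("Required."):
            problems.append(f"{name}: required input is not marked 'Required.'")
    return problems

# Illustrative entries only; not the actual content of input-postgresql.yaml.
sample_inputs = [
    {"name": "sql", "type": "String",
     "description": "Required. sql query statement to be executed"},
    {"name": "data_dir", "type": "String",
     "description": "temporal data storage for local execution", "optional": True},
    {"name": "port", "type": "Integer", "description": "db port"},
]

problems = check_inputs(sample_inputs)
for message in problems:
    print(message)
```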

ckadner · 2021-09-09T18:43:30Z

@romeokienzler -- any reason to restrict this new component to extracting CSV from a PostgreSQL database? Could it be generalized to work with any JDBC-type database connection?
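One way to picture such a generalization, sketched in Python rather than JDBC since the component runs a Python notebook: keep the existing `host`, `port`, `database`, `user` and `password` parameters, add a hypothetical `dialect` parameter, and assemble a single connection URL of the shape that libraries such as SQLAlchemy accept. `build_db_url` and `dialect` are assumptions for illustration; nothing like them exists in the PR.

```python
from urllib.parse import quote

def build_db_url(dialect, user, password, host, port, database):
    # Percent-encode the credentials so characters like '@' or '/'
    # cannot be mistaken for URL delimiters.
    return (
        f"{dialect}://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )

# Made-up values, for illustration only.
url = build_db_url("postgresql", "claimed", "p@ss/word", "localhost", 5432, "mydb")
print(url)
```

The trade-off is that each dialect still needs its own driver installed in the component image, so a generic component would be larger than the PostgreSQL-only one.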

romeokienzler added 2 commits September 1, 2021 14:00

Create input-postgresql.yaml

517f966

Signed-off-by: Romeo Kienzler <[email protected]>

Create input-postgresql.md

69052ed

Signed-off-by: Romeo Kienzler <[email protected]>

romeokienzler force-pushed the patch-1 branch from 77ef9d5 to 69052ed Compare September 1, 2021 12:00

ckadner reviewed Sep 8, 2021

component-samples/claimed/input/input-postgresql.yaml

Update input-postgresql.yaml

b2e94d3

ckadner reviewed Sep 8, 2021


ckadner changed the title ~~add input-postgresql.yaml - 1st CLAIMED component~~ Component to extract CSV files from PostgreSQL database search (1st CLAIMED component) Sep 9, 2021
