Guide to Implement a Python Native Operator (converting from a Python UDF)

In the wiki page for Python UDF, we introduced the basic concepts of the Python UDF and described each API. To let other users reuse a Python operator, it needs to be implemented as a native operator.

In this section, we discuss how to implement a Python native operator so that future users can drag and drop it onto the canvas. We start by implementing a sample UDF and then show how to convert it into a native operator.

Starting with a Sample Python UDF

Suppose we have a sample Python UDF named Treemap Visualizer, as presented below:

The UDF takes a CSV file as its input. For this example, we use a dataset of geo-location information of tweets. A sample of the dataset is shown below:

The Treemap Visualizer UDF takes the CSV file as a table (using the Table API) and outputs an HTML page that contains a treemap figure. The HTML page is consumed by the HTML visualizer operator, and the View Result operator eventually displays the figure in the browser. The visualization is presented below:

Now, let's take a closer look at the Treemap Visualizer UDF. As shown in the following code block, the UDF contains 3 steps:

```
from pytexera import *

import plotly.express as px
import plotly.io
import numpy as np


class ProcessTableOperator(UDFTableOperator):

    @overrides
    def process_table(self, table: Table, port: int) -> Iterator[Optional[TableLike]]:
        # Step 1: count the tweets for each (county, state) pair.
        table = table.groupby(['geo_tag.countyName', 'geo_tag.stateName']).size().reset_index(name='counts')
        # Step 2: build the treemap, with states as the outer level and counties nested inside.
        fig = px.treemap(table, path=['geo_tag.stateName', 'geo_tag.countyName'], values='counts',
                         color='counts', hover_data=['geo_tag.countyName', 'geo_tag.stateName'],
                         color_continuous_scale='RdBu',
                         color_continuous_midpoint=np.average(table['counts'], weights=table['counts']))
        fig.update_layout(margin=dict(t=50, l=25, r=25, b=25))
        # Step 3: serialize the figure to an HTML string and emit it as the output.
        html = plotly.io.to_html(fig, include_plotlyjs='cdn', auto_play=False)
        yield {'html': html}
```

It first performs an aggregation with a groupby to count the geo_tags for each county within each US state.
Then it invokes the Plotly library to create a treemap figure based on the aggregated dataset.
Lastly, it converts the treemap figure object into an HTML string by invoking the to_html function in the Plotly library, and yields it as the output.
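
To make the first step concrete, here is a minimal sketch of the aggregation on its own, outside Texera. The four-row DataFrame is a made-up stand-in for the tweet dataset (only the two column names come from the UDF above), and the sketch assumes the table behaves like a pandas DataFrame, as the groupby call in the UDF suggests.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the real input is the tweet CSV file.
tweets = pd.DataFrame({
    "geo_tag.stateName": ["California", "California", "California", "New York"],
    "geo_tag.countyName": ["Orange", "Orange", "Los Angeles", "Kings"],
})

# Count the rows for each (county, state) pair, as the UDF does.
counts = (
    tweets.groupby(["geo_tag.countyName", "geo_tag.stateName"])
    .size()
    .reset_index(name="counts")
)

# Midpoint of the color scale: the average count, weighted by the counts themselves.
midpoint = np.average(counts["counts"], weights=counts["counts"])

print(counts)
print(midpoint)
```

The result has one row per county with a counts column, which is the shape that px.treemap expects for its path and values arguments.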

Convert the UDF into a Python Native Operator

Next, we convert the Treemap Visualizer UDF into a native operator. As described in the wiki page for Java native operators, a native operator requires the definitions of a descriptor (Desc), an executor (Exec), and a configuration (OpConfig). A Python native operator requires the same set of definitions, with a few differences. We use the Treemap Visualizer operator as an example to explain them:

Operator Descriptor (Desc)

Operator information
The operator information is the same as for a Java native operator: it contains the name, description, group, input port, and output port information.
Extending interface
Instead of implementing the OperatorDescriptor interface, a Python native operator implements the PythonOperatorDescriptor interface and overrides the generatePythonCode method. Our example is also a VisualizationOperator, so it needs to extend that as well; other operators may not need to.
Python content
The generatePythonCode method returns the actual Python code as a string, as shown below:

Now, let's compare the code in the Python UDF with what we write in the descriptor. Both are responsible for generating the treemap figure and converting it into HTML. In the descriptor, we additionally handle null values and report errors to make the operator more robust.
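
As an illustration of the null handling and error alerts mentioned above, the following is a rough sketch of what the generated Python logic could look like, written as a standalone function so it can be tried outside Texera. The names build_treemap_html and error_html, and the wording of the messages, are hypothetical and not taken from the actual operator; only the Plotly calls mirror the UDF shown earlier.

```python
import html

import pandas as pd

# Columns the treemap needs: state is the outer level, county is nested inside.
REQUIRED = ["geo_tag.stateName", "geo_tag.countyName"]


def error_html(message: str) -> str:
    """Return a small HTML page that explains why no figure was produced."""
    return f"<h1>Treemap is not available.</h1><p>Reason: {html.escape(message)}</p>"


def build_treemap_html(table: pd.DataFrame) -> str:
    """Build the treemap page, or an error page if the input is unusable."""
    missing = [column for column in REQUIRED if column not in table.columns]
    if missing:
        return error_html("missing column(s): " + ", ".join(missing))

    # Null handling: drop rows where either required column is null.
    cleaned = table.dropna(subset=REQUIRED)
    if cleaned.empty:
        return error_html("no rows left after dropping null values")

    # Imported here so the error paths above can be exercised without Plotly.
    import plotly.express as px
    import plotly.io

    counts = cleaned.groupby(REQUIRED).size().reset_index(name="counts")
    fig = px.treemap(counts, path=REQUIRED, values="counts",
                     color="counts", color_continuous_scale="RdBu")
    fig.update_layout(margin=dict(t=50, l=25, r=25, b=25))
    return plotly.io.to_html(fig, include_plotlyjs="cdn", auto_play=False)
```

Inside the operator, the returned string would presumably be yielded under the output attribute name rather than returned, for example yield {'html-content': build_treemap_html(table)}. Returning an error page instead of raising keeps the downstream HTML visualizer working and shows the user what went wrong.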
Output schema
A Python UDF defines its output Schema in the property editor, while a native operator defines its output Schema by implementing getOutputSchema. To do so, use a Schema builder and add a STRING attribute named “html-content”.
```
override def getOutputSchema(schemas: Array[Schema]): Schema = {
  Schema.newBuilder.add(new Attribute("html-content", AttributeType.STRING)).build
}
```
Chart type
Since this operator is a visualization operator, we need to register its chart type as HTML_VIZ.
```
override def chartType(): String = VisualizationConstants.HTML_VIZ
```

Executor (Exec)

In all Python native operators, the executor is simply the PythonUDFExecutor.

Operator Configuration

A Python native operator shares the same configuration as a Java native operator.

Registration

Registration follows the same process as for a Java native operator.

Test

After following all the steps above, you should be able to drag and drop the operator onto the canvas. When you run the workflow, it should execute successfully and the operator should output the result.

Available Visualization Operator tasks

Plotly, a Python graphing library, has a list of basic charts, which can be found at Plotly Python Open Source Graphing Library Basic Charts. After following the 5 steps above, you should have enough information to implement a new Python native operator. Currently, we have implemented visualization operators for several of them: Scatter Plots, Line Charts, Bar Charts (including horizontal ones), Pie Charts, Bubble Charts, Dot Plots, Filled Area Plots, Gantt Charts, and Treemap Charts.

To double-check, please refer to this link, which lists all the visualization operators.
