Guide to Implement a Python Native Operator (converting from a Python UDF)
In the wiki page for PythonUDF, we introduced the basic concepts of PythonUDF and described each API. To let other users use a Python operator, it must be implemented as a native operator.
In this section, we discuss how to implement a Python native operator so that future users can drag and drop it onto the canvas. We start by implementing a sample UDF, then show how to convert it to a native operator.
Suppose we have a sample Python UDF named Treemap Visualizer, as presented below:

The UDF takes a CSV file as its input. For this example, we use a dataset of geo-location information of tweets. A sample of the dataset is shown below:

The Treemap Visualizer UDF takes the CSV file as a table (using the Table API) and outputs an HTML page that contains a treemap figure. The HTML page is consumed by the HTML visualizer operator, and the View Result operator eventually displays the figure in the browser. The visualization is presented below:

Now, let's take a closer look at the Treemap Visualizer UDF.
As shown in the following code block, the UDF contains 3 steps:
from pytexera import *
import plotly.express as px
import plotly.io
import plotly
import numpy as np


class ProcessTableOperator(UDFTableOperator):
    @overrides
    def process_table(self, table: Table, port: int) -> Iterator[Optional[TableLike]]:
        # Step 1: count the rows for each (county, state) pair.
        table = table.groupby(['geo_tag.countyName', 'geo_tag.stateName']).size().reset_index(name='counts')
        # Step 2: build the treemap, with states as parents and counties as leaves.
        fig = px.treemap(table, path=['geo_tag.stateName', 'geo_tag.countyName'], values='counts',
                         color='counts', hover_data=['geo_tag.countyName', 'geo_tag.stateName'],
                         color_continuous_scale='RdBu',
                         color_continuous_midpoint=np.average(table['counts'], weights=table['counts']))
        fig.update_layout(margin=dict(t=50, l=25, r=25, b=25))
        # Step 3: convert the figure into an HTML string and yield it as the output.
        html = plotly.io.to_html(fig, include_plotlyjs='cdn', auto_play=False)
        yield {'html': html}
- It first performs an aggregation with a groupby to count the geo_tags for each county in each US state.
- Then it invokes the Plotly library to create a treemap figure based on the aggregated dataset.
- Lastly, it converts the treemap figure object into an HTML string by invoking the to_html function of the Plotly library, and yields it as the output.
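To make the first step concrete, here is a minimal standard-library sketch of the same group-and-count aggregation. It uses csv and collections.Counter as a stand-in for the pandas groupby(...).size() call in the UDF, so it is not the UDF's actual implementation. The sample rows and the helper name count_by_county are made up for illustration; only the two column names come from the UDF code.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; only the column names come from the UDF above.
SAMPLE_CSV = """geo_tag.stateName,geo_tag.countyName
California,Orange
California,Orange
California,Los Angeles
Texas,Travis
"""


def count_by_county(csv_text):
    """Count rows per (state, county) pair, similar to groupby(...).size()."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        (row["geo_tag.stateName"], row["geo_tag.countyName"]) for row in reader
    )


counts = count_by_county(SAMPLE_CSV)
# Each (state, county) pair becomes one leaf of the treemap, sized by its count.
for (state, county), n in sorted(counts.items()):
    print(state, county, n)
```

Each resulting pair corresponds to one row of the aggregated table that the UDF passes to px.treemap, with the count playing the role of the counts column.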
Next, we convert the Treemap Visualizer UDF into a native operator.
As described in the wiki page for Java native operators, a native operator requires the definitions of a descriptor (Desc), an executor (Exec), and a configuration (OpConfig). A Python native operator requires the same set of definitions, with a few differences. We use the Treemap Visualization operator as an example to explain the differences:
- Operator information
  Operator information is the same as for a Java native operator: it contains the name, description, group, input port, and output port information.
- Extending interface
  Instead of implementing the OperatorDescriptor interface, Python native operators implement the PythonOperatorDescriptor interface and override the generatePythonCode method. Our example is also a VisualizationOperator, so we need to extend that as well; this may differ for other operators.
- Python content
  The generatePythonCode method returns the actual Python code as a string, as shown below:

  Now, let's compare the code in the Python UDF with what we write in the descriptor. Both are responsible for generating the treemap figure and converting it into HTML. Additionally, we've included null value handling and error alerts to make the operator more robust.
- Output schema
  The Python UDF needs to define its output Schema in the property editor, while a native operator defines its output Schema by implementing getOutputSchema. To do so, use a Schema builder and add an attribute named "html-content".
  override def getOutputSchema(schemas: Array[Schema]): Schema = {
    Schema.newBuilder.add(new Attribute("html-content", AttributeType.STRING)).build
  }
- Chart type
  Since this operator is a visualization operator, we need to register its chart type as HTML_VIZ.
  override def chartType(): String = VisualizationConstants.HTML_VIZ
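The idea behind generatePythonCode can be sketched in Python itself: a function builds the operator's source code as a string from the user-configured properties. This is only an analog under stated assumptions. The real descriptor is written in Scala, and the helper name generate_python_code, its parameters, and the exact null-handling and error-message logic are hypothetical. The generated code reuses the pytexera names from the UDF above, assumes the table behaves like a pandas DataFrame, and yields under the html-content attribute declared in the output schema.

```python
def generate_python_code(hierarchy_columns, output_attribute="html-content"):
    """Return the operator's Python source as a string (hypothetical helper)."""
    path = repr(list(hierarchy_columns))
    return f'''from pytexera import *
import plotly.express as px
import plotly.io


class ProcessTableOperator(UDFTableOperator):
    @overrides
    def process_table(self, table: Table, port: int) -> Iterator[Optional[TableLike]]:
        # Null handling (assumes a pandas-like table): drop incomplete rows.
        table = table.dropna(subset={path})
        if table.empty:
            # Error alert: emit a readable message instead of a figure.
            yield {{'{output_attribute}': '<h1>Treemap is not available.</h1>'}}
            return
        table = table.groupby({path}).size().reset_index(name='counts')
        fig = px.treemap(table, path={path}, values='counts')
        html = plotly.io.to_html(fig, include_plotlyjs='cdn', auto_play=False)
        yield {{'{output_attribute}': html}}
'''


source = generate_python_code(["geo_tag.stateName", "geo_tag.countyName"])
# Syntax-check the generated source without importing pytexera or plotly.
compile(source, "<generated>", "exec")
print(source)
```

Keeping the attribute name as a parameter makes it easier to keep the yielded key in sync with the name registered in getOutputSchema; whether the actual operator is structured this way is not confirmed by this guide.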
In all Python native operators, the executor is simply the PythonUDFExecutor.
A Python native operator shares the same configuration as a Java native operator.
It has the same process as a Java native operator.
After following all the steps above, you should be able to drag and drop the operator onto the canvas. When you run the workflow, the operator should execute and output its result.
Plotly, a Python graphing library, has a list of basic charts, which can be found at Plotly Python Open Source Graphing Library Basic Charts. By following all 5 steps, you should have enough information to implement a new Python native operator. Currently, we have implemented visualization operators for several of them: Scatter Plots, Line Charts, Bar Charts (horizontal as well), Pie Charts, Bubble Charts, Dot Plots, Filled Area Plots, Gantt Charts, and Treemap Charts.
To double-check, please refer to this link, which contains all the visualization operators.