|
| 1 | +# **RFC0011 for Presto** |
| 2 | + |
| 3 | +## Creating a Dynamically Linked Library in CPP |
| 4 | + |
| 5 | +Proposers |
| 6 | + |
| 7 | +* Soumya Duriseti |
| 8 | +* Tim Meehan |
| 9 | + |
| 10 | +## [Related Issues] |
| 11 | + |
| 12 | +https://github.com/facebookincubator/velox/pull/11439/ |
| 13 | +https://github.com/prestodb/presto/pull/24330 |
| 14 | + |
| 15 | +## Summary |
| 16 | +This proposal adds the ability to load user-defined functions (UDFs), types, and connectors into Prestissimo through shared libraries, without having to fork and rebuild Prestissimo. |
| 17 | +The Prestissimo worker accesses this code by loading the dynamic shared libraries when an instance of the Presto server starts. On startup, the server reads a user-provided JSON config that specifies the dynamic libraries in the plugin directory and loads them dynamically. |
| 18 | +## Background |
| 19 | +Currently, in Presto, Java UDFs, types, and connectors can be loaded dynamically through the Plugin SPI. These plugins let Presto users extend the Presto language with functions that are not known to the Presto engine. In the current flow, it is not possible to add custom elements to Prestissimo without: |
| 20 | +1. Adding a corresponding Java function through the use of the Plugin SPI. |
| 21 | +2. Forking Prestissimo to manually register the custom elements. |
| 22 | + |
| 23 | +RFC-0003 added the ability to validate C++ functions in the Presto coordinator without creating a corresponding function in the Java SPI. RFC-0005 allows users to register custom functions dynamically without having to fork Prestissimo. |
| 24 | + |
| 25 | +### [Optional] Goals |
| 26 | +Register custom functions, types, and connectors dynamically without requiring a fork of Prestissimo. |
| 27 | +### [Optional] Non-goals |
| 28 | +Security concerns: There are some security concerns associated with using the built-in plugins. By using dlopen, we run the risk of opening insecure, unknown shared objects, especially given the lack of any form of validation. The C++ implementation shares the same limitations as the Java functionality, with the notable exception that Java Presto runs in a VM environment while the C++ version runs locally. |
| 29 | + |
| 30 | +Performance concerns: Users may write poorly written custom elements that degrade performance. |
| 31 | + |
| 32 | +Reliability concerns: Users may write poorly written custom elements that destabilize the Presto cluster. |
| 33 | + |
| 34 | +These are well-known limitations of unfenced functions. Users are aware of these risks, which also apply to the existing Plugin SPI. The goal of this RFC is to bring feature parity with the Plugin SPI. |
| 35 | +## Proposed Implementation |
| 36 | +This library will be implemented in three stages, the first of which covers user-defined functions. Types and connectors will build on the dynamic library, following the same implementation and design format. |
| 37 | + |
| 38 | +### General Implementation |
| 39 | +Users register their custom elements dynamically by building .dylib or .so shared libraries and placing them in a plugin directory. This directory must be set with the plugin.dir property in the config.properties file of the Prestissimo worker. When the worker or the sidecar process starts, it scans this directory and attempts to dynamically load every shared library it finds. |
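The directory scan described above could look roughly like the following sketch. It is an illustration only, not the Prestissimo implementation: the helper names `isSharedLibraryName` and `findSharedLibraries` are hypothetical, and the sketch assumes that only files directly inside the plugin directory with a `.so` or `.dylib` extension are candidates.

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: true when the file name ends in .so or .dylib.
// Versioned names such as libfoo.so.1 are not matched by this sketch.
bool isSharedLibraryName(const fs::path& file) {
  const std::string ext = file.extension().string();
  return ext == ".so" || ext == ".dylib";
}

// Hypothetical helper: returns the shared libraries found directly inside
// pluginDir, sorted by path so that the load order is deterministic.
std::vector<fs::path> findSharedLibraries(const fs::path& pluginDir) {
  std::vector<fs::path> libraries;
  std::error_code ec;
  if (!fs::is_directory(pluginDir, ec)) {
    return libraries; // Missing or unreadable directory: nothing to load.
  }
  for (const auto& entry : fs::directory_iterator(pluginDir, ec)) {
    if (entry.is_regular_file(ec) && isSharedLibraryName(entry.path())) {
      libraries.push_back(entry.path());
    }
  }
  std::sort(libraries.begin(), libraries.end());
  return libraries;
}
```

A worker would call `findSharedLibraries` with the value of plugin.dir at startup and pass each returned path to its library loader.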
| 40 | + |
| 41 | +#### User Defined Function Implementation |
| 42 | +Dynamically loaded function registration mirrors built-in function registration, with some notable differences. For instance, the example function below uses extern "C" linkage to prevent name mangling. A registry() entry-point function is also required. |
| 43 | + |
| 44 | +``` |
| 45 | +#include "presto_cpp/main/dynamic_registry/DynamicFunctionRegistrar.h" |
| 46 | + |
| 47 | +namespace facebook::presto::common::dynamicRegistry { |
| 48 | + |
| 49 | +template <typename T> |
| 50 | +struct DynamicFunction { |
| 51 | + FOLLY_ALWAYS_INLINE bool call(int64_t& result) { |
| 52 | + result = 123; |
| 53 | + return true; |
| 54 | + } |
| 55 | +}; |
| 56 | + |
| 57 | +} // namespace facebook::presto::common::dynamicRegistry |
| 58 | + |
| 59 | +extern "C" { |
| 60 | +void registry() { |
| 61 | + facebook::presto::registerPrestoFunction< |
| 62 | + facebook::presto::common::dynamicRegistry::DynamicFunction, |
| 63 | + int64_t>("dynamic"); |
| 64 | +} |
| 65 | +} |
| 66 | +``` |
| 67 | +These source files are built into shared libraries through the CMakeLists.txt file. |
| 68 | + |
| 69 | +#### Error and Signal Handling |
| 70 | +The loader opens each shared library at runtime and invokes its registry entry point. Any failure during loading or symbol resolution causes the worker process to terminate immediately. This fail-fast design ensures consistency across the cluster: if a worker cannot load a required library, it crashes rather than continuing in a partially initialized state. Allowing workers to proceed in an inconsistent state could cause queries to fail when splits are assigned to one of those workers. |
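As a rough illustration of this load-and-invoke step, the sketch below uses the POSIX `dlopen`, `dlsym`, and `dlerror` calls. The function name `loadAndRegister` is hypothetical, and the entry point is assumed to be an `extern "C"` function named `registry` that takes no arguments, as described in this RFC. The sketch throws `std::runtime_error` instead of terminating directly; a worker that leaves the exception uncaught during startup would terminate, which matches the fail-fast behavior described here. The real loader may differ.

```cpp
#include <dlfcn.h>

#include <stdexcept>
#include <string>

// Hypothetical loader: opens the library at `path`, resolves `symbol`
// (assumed to be an extern "C" function with no arguments), and calls it.
// Throws std::runtime_error if loading or symbol resolution fails.
void loadAndRegister(
    const std::string& path,
    const std::string& symbol = "registry") {
  void* handle = dlopen(path.c_str(), RTLD_NOW);
  if (handle == nullptr) {
    const char* err = dlerror();
    throw std::runtime_error(
        "dlopen failed for " + path + ": " + (err ? err : "unknown error"));
  }

  dlerror(); // Clear any stale error before calling dlsym.
  void* address = dlsym(handle, symbol.c_str());
  if (address == nullptr) {
    const char* err = dlerror();
    const std::string message = "dlsym failed for " + symbol + " in " + path +
        ": " + (err ? err : "symbol not found");
    dlclose(handle);
    throw std::runtime_error(message);
  }

  // The handle is intentionally left open: the registered functions live in
  // the library's code and must stay mapped for the life of the process.
  using RegistryFn = void (*)();
  reinterpret_cast<RegistryFn>(address)();
}
```

On glibc versions older than 2.34, this sketch needs to be linked with `-ldl`.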
| 71 | + |
| 72 | +## [Optional] Metrics |
| 73 | + |
| 74 | +We intend to use Pbench for performance testing. The library's effectiveness can be measured by successfully registering all UDFs and validating their registration from the CLI: a call to SHOW FUNCTIONS and a SQL query invoking the registered function indicate successful completion of the process. |
| 75 | + |
| 76 | +## [Optional] Other Approaches Considered |
| 77 | + |
| 78 | +Based on the discussion, this may need to be updated with feedback from reviewers. |
| 79 | + |
| 80 | +## Adoption Plan |
| 81 | + |
| 82 | +- What impact (if any) will there be on existing users? Are there any new session parameters, configurations, SPI updates, client API updates, or SQL grammar? |
| 83 | +No impact as this is a new offering. |
| 84 | +- If we are changing behaviour how will we phase out the older behaviour? |
| 85 | +- If we need special migration tools, describe them here. |
| 86 | +- When will we remove the existing behaviour, if applicable. |
| 87 | +- How should this feature be taught to new and existing users? Basically mention if documentation changes/new blog are needed? |
| 88 | +A presto-docs entry with these changes will be included to explain to users how to properly use the dylib functionality. |
| 89 | +- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC? |
| 90 | +None. |
| 91 | + |
| 92 | +## Test Plan |
| 93 | + |
| 94 | +A containerized end-to-end (E2E) test will go through the entire process and validate function registration with a call to SHOW FUNCTIONS and a SQL query invoking the registered function. |