[native] Dynamically Linked Library in Presto CPP #24330

Open
wants to merge 1 commit into
base: master
from

Conversation

soumiiow

@soumiiow soumiiow commented Jan 7, 2025

Description

Depends on facebookincubator/velox#11439 in the Velox space
and based off of the following PR: https://github.com/facebookincubator/velox/pull/1005/files

Motivation and Context

Having these changes will enable users to register custom functions dynamically without requiring a fork of Prestissimo.

Impact

This extends Prestissimo functionality to include dynamic loading of functions, types, connectors, etc.

Test Plan

Unit tested, and manually tested the changes end to end.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ... :pr:`12345`
* ... :pr:`12345`

Hive Connector Changes
* ... :pr:`12345`
* ... :pr:`12345`

If release note is NOT required, use:

== NO RELEASE NOTE ==

@soumiiow soumiiow self-assigned this Jan 7, 2025

linux-foundation-easycla bot commented Jan 7, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: soumiiow / name: Soumya Duriseti (9ef0b63)

@soumiiow soumiiow marked this pull request as ready for review January 7, 2025 18:04
@soumiiow soumiiow requested a review from a team as a code owner January 7, 2025 18:04
@tdcmeehan tdcmeehan self-assigned this Jan 8, 2025
@tdcmeehan tdcmeehan added the from:IBM PR from IBM label Jan 30, 2025
@prestodb-ci prestodb-ci requested review from a team, pdabre12 and psnv03 and removed request for a team January 30, 2025 18:47
Contributor

@czentgr czentgr left a comment

Please also rebase.

@@ -76,7 +76,8 @@ target_link_libraries(
   ${FOLLY_WITH_DEPENDENCIES}
   ${GLOG}
   ${GFLAGS_LIBRARIES}
-  pthread)
+  pthread
+  velox_dynamic_function_loader)
Contributor

Move this before the velox_encode library.

@@ -89,6 +90,7 @@ set_property(TARGET presto_server_lib PROPERTY JOB_POOL_LINK
presto_link_job_pool)

add_executable(presto_server PrestoMain.cpp)
target_link_options(presto_server BEFORE PUBLIC "-Wl,-export-dynamic")
Contributor

Can we add a comment here why we need these flags?

const fs::path path(systemConfig->pluginDir());
PRESTO_STARTUP_LOG(INFO) << path;
std::error_code
ec; // For using the non-throwing overloads of functions below.
Contributor

We don't need this comment here, so we can fix up the odd formatting.

void PrestoServer::registerDynamicFunctions() {
auto systemConfig = SystemConfig::instance();
if (!systemConfig->pluginDir().empty()) {
// if it is a valid directory, traverse and call dynamic function loader
Contributor

Please make sure the comments are full sentences beginning with capitalization etc.

auto dirEntryPath = dirEntry.path();
if (!fs::is_directory(dirEntry, ec) &&
extensions.find(dirEntryPath.extension()) != extensions.end()) {
facebook::velox::loadDynamicLibrary(dirEntryPath.c_str());
Contributor

facebook is not needed here because we are already in the facebook namespace.

Contributor

@steveburnett steveburnett left a comment

Thanks for the doc! A few minor formatting and phrasing suggestions.

@@ -0,0 +1,17 @@
# Dynamic Loading of Presto Cpp Extensions
Contributor

Suggested change
# Dynamic Loading of Presto Cpp Extensions
# Dynamic Loading of Presto CPP Extensions

@@ -0,0 +1,17 @@
# Dynamic Loading of Presto Cpp Extensions
This library adds the ability to load User Defined Functions (UDFs), connectors, or types without having to fork and build Prestissimo, through the use of shared libraries that a Prestissimo worker can access. These are to be loaded on launch of the Presto server. The Presto server searches for any .so or .dylib files and loads them using this library.
Contributor

Suggested change
This library adds the ability to load User Defined Functions (UDFs), connectors, or types without having to fork and build Prestissimo, through the use of shared libraries that a Prestissimo worker can access. These are to be loaded on launch of the Presto server. The Presto server searches for any .so or .dylib files and loads them using this library.
This library adds the ability to load User Defined Functions (UDFs), connectors, or types without having to fork and build Prestissimo, through the use of shared libraries that a Prestissimo worker can access. These are loaded on launch of the Presto server. The Presto server searches for any .so or .dylib files and loads them using this library.

This library adds the ability to load User Defined Functions (UDFs), connectors, or types without having to fork and build Prestissimo, through the use of shared libraries that a Prestissimo worker can access. These are to be loaded on launch of the Presto server. The Presto server searches for any .so or .dylib files and loads them using this library.
## Getting started
1. Create a cpp file for your dynamic library
For dynamically loaded function registration, the format followed is mirrored of that of built-in function registration with some noted differences. Using [MyDynamicFunction.cpp](examples/MyDynamicFunction.cpp) as an example, the function uses the extern "C" keyword to protect against name mangling. A registry() function call is also necessary here.
Contributor

Suggested change
For dynamically loaded function registration, the format followed is mirrored of that of built-in function registration with some noted differences. Using [MyDynamicFunction.cpp](examples/MyDynamicFunction.cpp) as an example, the function uses the extern "C" keyword to protect against name mangling. A registry() function call is also necessary here.
For dynamically loaded function registration, the format is similar to that of built-in function registration, with some noted differences. Using [MyDynamicFunction.cpp](examples/MyDynamicFunction.cpp) as an example, the function uses the extern "C" keyword to protect against name mangling. A registry() function call is also necessary here.
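
To illustrate the `extern "C"` point in the README text above, here is a minimal sketch of a registration entry point. It is an assumption-laden stand-in: `functionTable` and `my_dynamic_add_one` are hypothetical, and the table replaces the real Presto/Velox registration call, which is not reproduced here. Only the shape of the unmangled `registry()` symbol is the point.

```cpp
#include <functional>
#include <map>
#include <string>

// Stand-in for the engine's function registry. This is NOT the Velox or
// Presto API; it exists only so the sketch is self-contained.
std::map<std::string, std::function<long(long)>>& functionTable() {
  static std::map<std::string, std::function<long(long)>> table;
  return table;
}

// extern "C" keeps the symbol name unmangled, so a loader can look up
// "registry" by name after opening the shared library.
extern "C" void registry() {
  // Hypothetical function name used purely for illustration.
  functionTable()["my_dynamic_add_one"] = [](long x) { return x + 1; };
}
```

Without `extern "C"`, the exported symbol would carry a compiler-specific mangled name, and a lookup by the plain name `registry` would fail.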

```
plugin.dir="User\Test\Path\plugin"
```
4. When the worker or the sidecar process starts, it will scan the plugin directory and attempt to dynamically load all shared libraries
Contributor

Suggested change
4. When the worker or the sidecar process starts, it will scan the plugin directory and attempt to dynamically load all shared libraries
When the worker or the sidecar process starts, it scans the plugin directory and attempts to dynamically load all shared libraries.

I don't think of this as part of the configuration steps; it's what happens when things are run using the config in steps 1-3.

Contributor

@steveburnett steveburnett left a comment

Thanks for the documentation! I really liked your including the setup steps on plugin.rst. Some minor suggestions for formatting and phrasing but looks good overall.

@soumiiow
Author

Thanks @steveburnett, please take another look; I've made the changes. I'm not sure about the tone of the intro to UDFs I have in function_plugin.rst and would appreciate another set of eyes there.

Contributor

@steveburnett steveburnett left a comment

Thanks for the revision! Nice work, your unordered list of UDF benefits was great and the format fixes in the README look good.

I made a couple of small suggestions about the intro to UDFs, let me know what you think.

Contributor

@steveburnett steveburnett left a comment

I should have noticed this spelling nit earlier! After I found this one, I did a complete review of the doc in this PR and found no other errors so I think this is the last one.

@aditi-pandit aditi-pandit changed the title from "Dynamically Linked Library in Presto CPP" to "[native] Dynamically Linked Library in Presto CPP" Feb 13, 2025
Contributor

@aditi-pandit aditi-pandit left a comment

@soumiiow : Thanks for this code. It would be great to have an e2e test with a SQL statement using the dynamic functions, along the lines of https://github.com/prestodb/presto/blob/master/presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeRemoteFunctions.java

auto systemConfig = SystemConfig::instance();
cpp_nameSpace = systemConfig->prestoDefaultNamespacePrefix();
}
std::string cpp_name(cpp_nameSpace);
Contributor

Is there an assumption that the namespace should end in "."? We should make this behavior more robust by adding the period separator if it's not present.
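
One way to read the suggestion above is a small normalizing helper, sketched below. `qualifyFunctionName` is a hypothetical name, and treating an empty namespace as "no prefix" is an assumption made for this sketch; the code under review instead falls back to `prestoDefaultNamespacePrefix()` in that case.

```cpp
#include <string>

// Hypothetical helper: builds "<namespace>.<name>", adding the period
// separator only when the namespace does not already end with one.
// Assumption for this sketch: an empty namespace means no prefix at all.
std::string qualifyFunctionName(
    const std::string& nameSpace,
    const std::string& name) {
  if (nameSpace.empty()) {
    return name;
  }
  if (nameSpace.back() == '.') {
    return nameSpace + name;
  }
  return nameSpace + "." + name;
}
```

With this shape, callers may pass the prefix with or without the trailing period and get the same qualified name either way.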

const char* nameSpace = "",
const std::vector<velox::exec::SignatureVariable>& constraints = {},
bool overwrite = true) {
std::string cpp_nameSpace(nameSpace);
Contributor

Use camelCase: cppNamespace.

@@ -1598,5 +1600,28 @@ protocol::NodeStatus PrestoServer::fetchNodeStatus() {

return nodeStatus;
}
void PrestoServer::registerDynamicFunctions() {
auto systemConfig = SystemConfig::instance();
Contributor

Might be good to have a config of library names to load rather than load all those found in the directory. wdyt ?

Author

@aditi-pandit I'm trying to understand what the additional benefit of doing so would be.

From the user's perspective, only shared library paths are recognized here, so files such as .txt or .cpp will not be read. Beyond that, is there a benefit to having a config here? I see it as another step the user has to maintain and worry about every time they want to add or remove a shared library.

Contributor

@soumiiow : My thinking is that having a config of library names could help us if someone accidentally put in a library that had errors and issues. Right now a bad library causes the entire loading to fail, and the platform owner would need to restage the complete directory. If on the other hand they have a white-list of libraries that are good, they can easily add or remove libraries by changing the config.
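
As a rough sketch of the allowlist idea in this comment, the check could be as small as a file-name lookup. Everything here is hypothetical: `isAllowedLibrary` is an illustrative name, and how the allowlist would be read from configuration is not defined at this point in the thread.

```cpp
#include <filesystem>
#include <set>
#include <string>

// Hypothetical check: a library is loaded only when its file name appears in
// the configured allowlist. Where the allowlist comes from (a config entry,
// a file) is deliberately left out of this sketch.
bool isAllowedLibrary(
    const std::filesystem::path& library,
    const std::set<std::string>& allowlist) {
  return allowlist.count(library.filename().string()) > 0;
}
```

Under this scheme, removing a misbehaving library is a config change rather than a restaging of the whole plugin directory, which is the benefit described above.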

I agree with @aditi-pandit; it is better to have a config file or entry that limits the libraries Prestissimo can load. If you load all libraries under a specific folder, what happens if someone writes a malicious library and puts it into this folder? Calling those functions might lead to:

  1. A process crash.
  2. The function sharing the server's privileges, so it can do anything it wants.

And if scanned by a security tool, this piece of code could be reported as a vulnerability.

BTW, if a customer implements a new UDF themselves and puts the library in this plugin directory, do they need to restart the Presto server for it to take effect?

Author

@PingLiuPing To answer the question: yes, they have to restart the coordinator and all workers for the new UDF to take effect.

Recursive globbing or a regex can remediate the issue, and that's a good idea.
But I think your current logic is good enough to filter the dynamic libraries from other file types, and you have defined plugin.dir in the configuration file.

You need to verify that each dynamic library is actually the one you want to load. The easiest way to achieve this is to add a verify function inside the dynamic library. When loading the library, you can check whether the verify function exists and then execute it. Inside the verify function you can check the passed-in value against a fixed value, for example a UUID or even the UDF name. Though this can still be compromised, it is a step further towards an industry-grade product.

I have just jumped into your new feature. I assume the UDF is automatically registered.
Another approach is to provide another layer of control, where functions loaded into Velox cannot be directly registered and called. Instead, some DDL needs to be executed to register those functions, for example
create function ABC (parameters...) options (library_path ......)
In this way you define the library path explicitly for each function, and only those functions explicitly created by the above DDL can be called.

Your work opens up a whole new world, and it deserves extra care. I welcome discussion of more alternatives.
Thanks.
@soumiiow @mohsaka
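
The verify-function idea from this comment might look roughly like the sketch below. It is not the PR's design: `verify`, `VerifyFn`, `passesVerification`, and the token value are all hypothetical, and the symbol lookup itself (for example via `dlsym`) is modeled as a possibly-null function pointer so the sketch stays self-contained.

```cpp
#include <cstring>

extern "C" {
// Signature a plugin would export under an agreed-upon name (assumption).
typedef bool (*VerifyFn)(const char* token);

// Example of what a plugin might implement: compare the value passed in by
// the loader against a fixed value known to both sides.
bool verify(const char* token) {
  return token != nullptr && std::strcmp(token, "example-plugin-token") == 0;
}
}

// Loader-side check. `fn` stands for the result of looking up the verify
// symbol in the opened library; a null pointer means the symbol was missing.
bool passesVerification(VerifyFn fn, const char* expectedToken) {
  if (fn == nullptr) {
    return false; // No verify function exported: refuse to register.
  }
  return fn(expectedToken);
}
```

As the comment itself notes, a fixed token can still be copied by a malicious library, so this only guards against accidental loads.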

Contributor

Please add an example of a function which is added to the non-default namespace.

@mohsaka
Contributor

mohsaka commented Feb 17, 2025

Changes needed for Mac:

Undefined symbols for architecture arm64:
  "facebook::presto::SystemConfig::instance()", referenced from:
      void facebook::presto::registerPrestoFunction<facebook::velox::common::dynamicRegistry::Dynamic123Function, facebook::velox::Varchar>(char const*, char const*, std::__1::vector<facebook::velox::exec::SignatureVariable, std::__1::allocator<facebook::velox::exec::SignatureVariable>> const&, bool) in MyDynamicVarcharFunction.cpp.o
  "facebook::presto::SystemConfig::prestoDefaultNamespacePrefix() const", referenced from:
      void facebook::presto::registerPrestoFunction<facebook::velox::common::dynamicRegistry::Dynamic123Function, facebook::velox::Varchar>(char const*, char const*, std::__1::vector<facebook::velox::exec::SignatureVariable, std::__1::allocator<facebook::velox::exec::SignatureVariable>> const&, bool) in MyDynamicVarcharFunction.cpp.o
ld: symbol(s) not found for architecture arm64
c++: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.
  • Add presto_common and remove all other linked libraries as they are included with it.

Remove braces from examples.

Update:
Put target_link_options to avoid symbol errors.

if(APPLE)
  set(COMMON_LIBRARY_LINK_OPTIONS "-Wl,-undefined,dynamic_lookup")
else()
  set(COMMON_LIBRARY_LINK_OPTIONS "-Wl,--exclude-libs,ALL")
endif()

target_link_options(velox_function_my_dynamic PRIVATE ${COMMON_LIBRARY_LINK_OPTIONS})
target_link_options(velox_varchar_function_my_dynamic PRIVATE ${COMMON_LIBRARY_LINK_OPTIONS})
target_link_options(velox_array_function_my_dynamic PRIVATE ${COMMON_LIBRARY_LINK_OPTIONS})

extern "C" {
void registry() {
facebook::presto::registerPrestoFunction<
nameOfStruct,

nit: nameOfStruct to NameOfStruct.

* Once defined, easily reusable and called multiple times just like built in functions.
* Shorter compile times.

1. To create the UDF, create a new C++ file in the following format:

I think it is better to give a file name here, as that will align better with the following content. For example, TestFunction.cpp is OK.

@mohsaka
Contributor

mohsaka commented Feb 19, 2025

@PingLiuPing @aditi-pandit

Ideally we would have a kind of recursive globbing but there doesn't seem to be any library for that. Then we would have a config that allows something like this "a/b/c.dylib a//.dylib etc". But this will need to be implemented with probably filesystem + glob libraries.

Alternatively we can use the regular globbing library and have the user have to enter the directories. So then it would be something like "a/b/* a/c/* etc"

The easiest would be requiring the user to have to provide the absolute path for all of the libraries. Something like "a/b/c.dylib a/b/d.dylib"

An alternative would be that you can provide a path regex. Something like "^/a/b/.*" and a directory. We would then recursively go through the directory and only pick libraries that match a certain regex expression.

Or even simpler, just have library file names that we accept as a second config.
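
Of the options listed above, the path-regex variant is easy to sketch with the standard library alone. `filterByPathRegex` is a hypothetical name, and the candidate paths are assumed to have been collected already (for instance by a recursive directory walk); this is an illustration, not the behavior the PR implements.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical filter: keeps only the candidate library paths that match the
// configured regular expression in full (ECMAScript syntax, the std::regex
// default).
std::vector<std::string> filterByPathRegex(
    const std::vector<std::string>& candidates,
    const std::string& pattern) {
  const std::regex re(pattern);
  std::vector<std::string> matched;
  for (const auto& path : candidates) {
    if (std::regex_match(path, re)) {
      matched.push_back(path);
    }
  }
  return matched;
}
```

With the pattern "^/a/b/.*" from the comment, a path such as /a/b/c.dylib is kept while /a/c/d.dylib is skipped. A malformed pattern makes the `std::regex` constructor throw `std::regex_error`, which a real loader would need to handle.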

@soumiiow
Author

soumiiow commented Feb 19, 2025

@mohsaka @aditi-pandit @PingLiuPing
Appreciate the perspective, and I see the value in adding the config.
That being said, other than the security vulnerabilities, another issue I can think of is a config with a typo in the filename/path/regex: even with error handling for that scenario, the user would have to fix the config and restart the coordinator and workers for the shared library to be picked up.

Also, for ease of use, I was thinking that in the absence of a config we could scan in all .dylib/.so files by default, so users who just want to plug in and go are not required to provide a config. Thoughts on that?

@mohsaka
Contributor

mohsaka commented Feb 19, 2025

> @mohsaka @aditi-pandit @PingLiuPing appreciate the perspective and i see the value in adding the config. that being said, other than the security vulnerabilities, I think another issue I can think of is having the config can still crash the worker which is what we would like to avoid in the first place if theres a typo or mismatch in how the user enters the filename/path/regex. Even with error handling for that scenario, the user would have to fix the config and restart coordinator and worker for the shared library to be kicked in.
>
> Also for ease of use, i was also thinking, in the absence of a config, we can by default scan in all .dylib/.so files so it's not a requirement for them to put a config in if users just want to plug in and go. thoughts on that?

I don't think it would crash the worker, we would just not load the library if it didn't match the filepath/regex/etc.

@soumiiow
Author

@mohsaka Good point. Looking through the Velox implementation using dylib, dlerror() would catch the error and give a VELOX_USER_FAIL.
We would still have to restart the coordinator and workers to get the file loaded, so it would be disruptive in that way.

@soumiiow
Author

Hey @pedroerp! I was looking through remote_function_server.json to design the config for the dylib changes to be simple to use and similar to the remote fn registrations so as to make for a seamless user experience.

I was curious as to the intended purpose of the schema object for the remote function registrations.

In prestissimo, when we use a prefix/namespace like presto.default, is the equivalent for remote function registrations that "default" would be the schema and that "presto" would be the prefix?

Also, I notice that in PrestoServer.cpp the prefix is set to systemConfig->remoteFunctionServerCatalogName(). For remote functions, does this imply that the prefix name would be the same for all remote functions and only the schema name would change?

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
add_subdirectory(examples)

I think there are potential issues with regard to upgrading.
You need to address this by changing some script to trigger a rebuild of the dynamic library when upgrading (from the customer's perspective). This guarantees ABI compatibility in case of a major compiler upgrade.
However, this requires the source code to be under some specific directory of the Presto installation path.

Author

We discussed the ABI compatibility issues offline and came up with documentation changes that alert users to manually rebuild shared libraries upon upgrades:
presto-docs/src/main/sphinx/presto_cpp/plugin.rst

@soumiiow
Author

Updated the RFC (prestodb/rfcs#24) with the discussions regarding adding a JSON config, function validation, a customizable entrypoint, and signal handling so as not to crash the worker. Please take a look!

@soumiiow soumiiow force-pushed the dylib_new branch 2 times, most recently from a8d2204 to 173ecc9 Compare March 27, 2025 01:16
Contributor

@steveburnett steveburnett left a comment

Thanks for the doc! This will be great to have in the doc. I have some suggestions for conciseness and readability, and a couple of questions for you to consider. Let me know what you think, please!

add_library(name_of_dynamic_fn SHARED TestFunction.cpp)
target_link_libraries(name_of_dynamic_fn PRIVATE fmt::fmt Folly::folly gflags::gflags)

3. Place your shared libraries in the plugin directory. The path to this directory needs to be the same as ``plugin.dir`` property set in :doc:`../plugin`.
Contributor

Suggested change
3. Place your shared libraries in the plugin directory. The path to this directory needs to be the same as ``plugin.dir`` property set in :doc:`../plugin`.
3. Place your shared libraries in the plugin directory. The path to this directory must be the same as the ``plugin.dir`` property set in :doc:`../plugin`.

@@ -0,0 +1 @@
Read [here](https://prestodb.io/docs/current/presto-cpp/plugin.html) on how to use the Dynamic Library Loader.
Contributor

Suggested change
Read [here](https://prestodb.io/docs/current/presto-cpp/plugin.html) on how to use the Dynamic Library Loader.
See [Function Plugin](https://prestodb.io/docs/current/presto-cpp/plugin.html) on how to use the Dynamic Library Loader.

Thanks for addressing the other reviewer's comment by using link to your doc instead of duplicating the content! (If the same text is in two different places then it has to be maintained and updated as twice the work, and errors are likelier to happen.)

When creating a link, "here" doesn't tell the reader what to expect to find when they open the link. A good practice is to use the title of the destination page in the link, so the reader knows where they are going before they start. (Also, sometimes the title can save the reader time because they might decide this isn't what they need right now.)

Presto C++ Plugins
*******************

This chapter outlines the plugins in Presto C++ that are available for various use cases such as to load User Defined Functions (UDFs), connectors, or types.
Contributor

Suggested change
This chapter outlines the plugins in Presto C++ that are available for various use cases such as to load User Defined Functions (UDFs), connectors, or types.
This page lists the plugins in Presto C++ that are available for various use cases such as to load User Defined Functions (UDFs), connectors, or types, and describes the setup needed to use these plugins.


2. Create a Json configuration file where you will capture information on the shared libraries you wish to load dynamically.

3. Set the ``plugin.dir`` property to the path of the ``plugins`` directory in the ``config.properties`` file of each of your workers.
Contributor

Suggested change
3. Set the ``plugin.dir`` property to the path of the ``plugins`` directory in the ``config.properties`` file of each of your workers.
3. Set the ``plugin.dir`` property to the path of the ``plugins`` directory in the ``config.properties`` file of each worker.


2. Create a Json configuration file where you will capture information on the shared libraries you wish to load dynamically.

3. Set the ``plugin.dir`` property to the path of the ``plugins`` directory in the ``config.properties`` file of each of your workers.
Contributor

Instead of my two suggestions for lines 20-22, you could consider combining these two as substeps of Step 3. Here's a screenshot of my trying this in a local build.

[Screenshot: local build, 2025-03-27 9:51 AM]


1. Place the plugin shared libraries in the ``plugins`` directory.

2. Create a Json configuration file where you will capture information on the shared libraries you wish to load dynamically.
Contributor

Suggested change
2. Create a Json configuration file where you will capture information on the shared libraries you wish to load dynamically.
2. Create a Json configuration file to capture information on the shared libraries you wish to load dynamically.

Where should this file be created? Can you give an example?

Author

@soumiiow soumiiow Apr 1, 2025

So this file can technically be created anywhere, and my current design does not ask for a particular path, to make it easier from a deployment perspective. There's a chance that the person creating the shared library is not the same as the person creating the config. As long as the user gives the full path to the config file, it should get read.

So I want to discuss here first whether you see value in putting that information into the docs.

Contributor

Those are excellent points! I agree now that there's no need for it here. Thanks!


5. Start or restart the coordinator and workers to pick up any placed libraries.

Note: to avoid issues with ABI compatibility, we strongly recommend recompiling all shared library plugins during OS and presto version upgrades.
Contributor

Suggested change
Note: to avoid issues with ABI compatibility, we strongly recommend recompiling all shared library plugins during OS and presto version upgrades.
Note: To avoid issues with ABI compatibility, it is strongly recommended to recompile all shared library plugins during OS and Presto version upgrades.

* ``nameSpace``: optional field. omitting this field gives the function the default namespace similarly to the function registeration. must match the namespace of the function.
* ``docString``, ``routineCharacteristics``, ``functionKind``: collected for checking metadata.

5. Set the ``plugin.config`` property to the path of Json config. Instructions to set the property in :doc:`../plugin`.
Contributor

Suggested change
5. Set the ``plugin.config`` property to the path of Json config. Instructions to set the property in :doc:`../plugin`.
5. Set the ``plugin.config`` property to the path of the Json configuration file created in Step 4. See :doc:`../plugin`.

Contributor

@aditi-pandit aditi-pandit left a comment

@soumiiow : Did a quick pass of your code. It's overall a good design for a first-cut implementation.

const fs::path path(systemConfig->pluginDir());
PRESTO_STARTUP_LOG(INFO) << "Dynamic library loading path: " << path;
std::error_code ec;
if (fs::is_directory(path, ec)) {

It might be simpler to check whether there is a validator config, and ensure it's non-empty, before checking the pluginDir path at all.
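
The ordering suggested above could look roughly like the following. This is only a sketch: `shouldLoadPlugins` and its two string parameters are hypothetical stand-ins for the values read from the system config, not names from the PR.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not from the PR): decides whether plugin loading
// should be attempted at all.
bool shouldLoadPlugins(
    const std::string& validatorConfigPath,
    const std::string& pluginDir) {
  // Cheap check first: without a validator config there is nothing to load,
  // so the filesystem is never touched.
  if (validatorConfigPath.empty()) {
    return false;
  }
  // Non-throwing overload: on error, ec is set and the call returns false.
  std::error_code ec;
  return fs::is_directory(fs::path(pluginDir), ec);
}
```

With this shape, an unset validator config short-circuits before any directory lookup happens.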

/// "my_function": [
/// {
/// "outputType": "integer",
/// "entrypoint": "nameOfRegistryFnCall",

Fix the formatting of this line.

int64_t compareConfigWithRegisteredFunctionSignatures(
facebook::velox::FunctionSignatureMap fnSignaturesBefore);

std::

Add a shorthand for
std::unordered_map<std::string, std::vector<velox::exec::FunctionSignaturePtr>> with a "using" statement.
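
A minimal sketch of such an alias follows. The alias name `SignatureMap` is hypothetical, and `FunctionSignature` here is a local stand-in for the Velox type so that the example compiles without Velox.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for velox::exec::FunctionSignaturePtr (assumption: a shared
// pointer to a signature object), used only to keep the sketch standalone.
struct FunctionSignature {
  std::string returnType;
};
using FunctionSignaturePtr = std::shared_ptr<FunctionSignature>;

// Hypothetical alias name: one short name instead of the nested template.
using SignatureMap =
    std::unordered_map<std::string, std::vector<FunctionSignaturePtr>>;

// Declarations that take or return the map now stay on one line.
std::size_t countSignatures(const SignatureMap& signatures) {
  std::size_t total = 0;
  for (const auto& entry : signatures) {
    total += entry.second.size();
  }
  return total;
}
```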

DynamicLibraryValidator dVal(configPath, systemConfig->pluginDir());
auto filenameAndEntrypointMap = dVal.getEntrypointMap();
auto registeredFnSignaturesBefore = velox::getFunctionSignatures();
for (const auto& entryPointItr : filenameAndEntrypointMap) {

All this logic can be moved into the DynamicLibraryValidator class (maybe call it DynamicLibraryLoader instead).

The code here would only read the config entries for plugin.dir and dynamiclibraryvalidator and pass them to DynamicLibraryLoader.
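
One possible shape for that refactoring is sketched below, under stated assumptions. Only the class name `DynamicLibraryLoader` comes from the comment above; the constructor arguments, `LoadFn`, and `loadAll` are hypothetical, and the actual library-opening call is injected as a callback so the sketch stays self-contained.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Sketch only: owns the iteration over config entries so the caller just
// passes in the plugin directory and the parsed entrypoint map.
class DynamicLibraryLoader {
 public:
  // Hypothetical callback that opens one library and invokes its entrypoint.
  using LoadFn = std::function<
      bool(const std::string& libraryPath, const std::string& entrypoint)>;

  DynamicLibraryLoader(
      std::string pluginDir,
      std::map<std::string, std::string> fileToEntrypoint,
      LoadFn loadFn)
      : pluginDir_(std::move(pluginDir)),
        fileToEntrypoint_(std::move(fileToEntrypoint)),
        loadFn_(std::move(loadFn)) {}

  // Tries every configured library; returns how many loaded successfully.
  int loadAll() const {
    int loaded = 0;
    for (const auto& [file, entrypoint] : fileToEntrypoint_) {
      if (loadFn_(pluginDir_ + "/" + file, entrypoint)) {
        ++loaded;
      }
    }
    return loaded;
  }

 private:
  std::string pluginDir_;
  std::map<std::string, std::string> fileToEntrypoint_;
  LoadFn loadFn_;
};
```

The server startup code would then only read the two config properties and hand them to the loader.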

@@ -197,5 +197,52 @@ TEST_F(JsonSignatureParserTest, multiple) {
EXPECT_EQ(signature1->argumentTypes()[1].baseName(), "varchar");
}

TEST_F(JsonSignatureParserTest, dynamic) {
auto input = R"(

Please add a function which uses a complex type for either paramType or outputType

}
})";

// Emulate user provided config file.

There is some repetition of code between this and the test below. Can you make common functions for reuse?
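
As an illustration of the kind of shared helper meant here, both tests could write their emulated config through one function. `writeTempConfig` is a hypothetical name, and writing to the system temp directory is an assumption about how the tests stage their files.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical test helper: writes `contents` to `fileName` under the system
// temp directory and returns the full path for the test to pass along.
fs::path writeTempConfig(
    const std::string& fileName,
    const std::string& contents) {
  const fs::path path = fs::temp_directory_path() / fileName;
  // trunc: start from an empty file even if a previous run left one behind.
  std::ofstream out(path, std::ios::trunc);
  out << contents;
  return path;
}
```

Each test then differs only in the JSON string it passes in.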

Is there a need to separate the directory for this file from the previous directory?

If yes, then it might be worth trying a recursive directory scenario as well.

Please remove the usage of "My" in the naming of these files.

@steveburnett steveburnett left a comment

Thanks for the updated doc! Just a couple of small nits.


Note: the ``int64_t`` return type, ``registryTest`` registry symbol name can be changed as needed. For more examples, see the `examples <https://github.com/prestodb/presto/tree/master/presto-native-execution/main/dynamic_registry/examples>`_.

2. Create a shared library which may be made using CMakeLists.txt like the following:

Suggested change
2. Create a shared library which may be made using CMakeLists.txt like the following:
2. Create a shared library which may be made using ``CMakeLists.txt`` like the following:


3. Place your shared libraries in the plugin directory. The path to this directory must be the same as the ``plugin.dir`` property set in :doc:`../plugin`.

4. Create a Json configuration file in the same format as below:

Suggested change
4. Create a Json configuration file in the same format as below:
4. Create a JSON configuration file in the same format as below:

Labels
from:IBM PR from IBM
7 participants