|
| 1 | +<!--- |
| 2 | + Licensed to the Apache Software Foundation (ASF) under one |
| 3 | + or more contributor license agreements. See the NOTICE file |
| 4 | + distributed with this work for additional information |
| 5 | + regarding copyright ownership. The ASF licenses this file |
| 6 | + to you under the Apache License, Version 2.0 (the |
| 7 | + "License"); you may not use this file except in compliance |
| 8 | + with the License. You may obtain a copy of the License at |
| 9 | +
|
| 10 | + http://www.apache.org/licenses/LICENSE-2.0 |
| 11 | +
|
| 12 | + Unless required by applicable law or agreed to in writing, |
| 13 | + software distributed under the License is distributed on an |
| 14 | + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| 15 | + KIND, either express or implied. See the License for the |
| 16 | + specific language governing permissions and limitations |
| 17 | + under the License. |
| 18 | +--> |
| 19 | + |
| 20 | +# `datafusion-ffi`: Apache DataFusion Foreign Function Interface |
| 21 | + |
| 22 | +This crate contains code to allow interoperability of Apache [DataFusion] |
| 23 | +with functions from other languages using a stable interface. |
| 24 | + |
| 25 | +See [API Docs] for details and examples. |
| 26 | + |
| 27 | +We expect this crate may be used by both sides of the FFI. This allows users |
| 28 | +to create modules that can interoperate without the necessity of using the same |
| 29 | +version of DataFusion. The driving use case has been the `datafusion-python` |
| 30 | +repository, but many other use cases may exist. We envision at least two |
| 31 | +use cases. |
| 32 | + |
| 33 | +1. `datafusion-python` which will use the FFI to provide external services such |
| 34 | + as a `TableProvider` without needing to re-export the entire `datafusion-python` |
| 35 | + code base. With `datafusion-ffi` these packages do not need `datafusion-python` |
| 36 | + as a dependency at all. |
| 37 | +2. Users may want to create a modular interface that allows runtime loading of |
| 38 | + libraries. |
| 39 | + |
| 40 | +## Struct Layout |
| 41 | + |
| 42 | +In this crate we have a variety of structs which closely mimic the behavior of |
| 43 | +their internal counterparts. In the following example, we will refer to the |
| 44 | +`TableProvider`, but the same pattern exists for other structs. |
| 45 | + |
| 46 | +Each of the exposed structs in this crate is provided with a variant prefixed |
| 47 | +with `Foreign`. This variant is designed to be used by the consumer of the |
| 48 | +foreign code. The `Foreign` structs should _never_ access the `private_data` |
| 49 | +fields. Instead they should only access the data returned through the function |
| 50 | +calls defined on the `FFI_` structs. The second purpose of the `Foreign` |
| 51 | +structs is to contain additional data that may be needed by the traits that |
| 52 | +are implemented on them. Some of these traits require borrowing data which |
| 53 | +can be far more convenient to store locally. |
| 54 | + |
| 55 | +For example, we have a struct `FFI_TableProvider` to give access to the |
| 56 | +`TableProvider` functions like `table_type()` and `scan()`. If we write a |
| 57 | +library that wishes to expose its `TableProvider`, then we can access the |
| 58 | +private data that contains the Arc reference to the `TableProvider` via |
| 59 | +`FFI_TableProvider`. This data is local to the library. |
| 60 | + |
| 61 | +If we have a program that accesses a `TableProvider` via FFI, then it |
| 62 | +will use `ForeignTableProvider`. When using `ForeignTableProvider` we **must** |
| 63 | +not attempt to access the `private_data` field in `FFI_TableProvider`. If you |
| 64 | +are testing locally, you may be able to successfully access this field, but |
| 65 | +it will only work if you are building against the exact same version of |
| 66 | +`DataFusion` for both libraries **and** the same compiler. It will not work |
| 67 | +in general. |
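
The split between the `FFI_` struct and its `Foreign` wrapper can be sketched in plain Rust. This is a simplified illustration under stated assumptions, not the crate's actual API: `SimpleProvider`, `MyProvider`, `FfiProvider`, `ForeignProvider`, and `round_trip` are hypothetical names, and the real `FFI_TableProvider` exposes more functions with different signatures. The point it shows is that only the producer's wrapper functions interpret `private_data`, while the consumer goes through the function pointers.

```rust
use std::ffi::c_void;
use std::sync::Arc;

// Hypothetical stand-in for a trait such as `TableProvider`.
trait SimpleProvider {
    fn table_type(&self) -> u32;
}

// Producer-side implementation, local to the library that exposes it.
struct MyProvider;

impl SimpleProvider for MyProvider {
    fn table_type(&self) -> u32 {
        7
    }
}

// Simplified FFI-style struct: function pointers plus opaque private data.
#[repr(C)]
struct FfiProvider {
    table_type: unsafe extern "C" fn(provider: *const FfiProvider) -> u32,
    release: unsafe extern "C" fn(provider: *mut FfiProvider),
    private_data: *mut c_void,
}

// Producer side: the only code that knows what `private_data` points to.
unsafe extern "C" fn table_type_wrapper(provider: *const FfiProvider) -> u32 {
    unsafe {
        let data = (*provider).private_data as *const Arc<dyn SimpleProvider>;
        let inner: &Arc<dyn SimpleProvider> = &*data;
        inner.table_type()
    }
}

// Producer side: frees the boxed Arc exactly once.
unsafe extern "C" fn release_wrapper(provider: *mut FfiProvider) {
    unsafe {
        let data = (*provider).private_data as *mut Arc<dyn SimpleProvider>;
        if !data.is_null() {
            drop(Box::from_raw(data));
            (*provider).private_data = std::ptr::null_mut();
        }
    }
}

impl FfiProvider {
    fn new(provider: Arc<dyn SimpleProvider>) -> Self {
        Self {
            table_type: table_type_wrapper,
            release: release_wrapper,
            private_data: Box::into_raw(Box::new(provider)) as *mut c_void,
        }
    }
}

impl Drop for FfiProvider {
    fn drop(&mut self) {
        let release = self.release;
        unsafe { release(self) }
    }
}

// Consumer side: wraps the FFI struct and never reads `private_data`.
struct ForeignProvider {
    inner: FfiProvider,
}

impl SimpleProvider for ForeignProvider {
    fn table_type(&self) -> u32 {
        // Only the function pointer is used; the producer interprets its own data.
        unsafe { (self.inner.table_type)(&self.inner) }
    }
}

// Builds the producer-side struct and consumes it through the wrapper.
fn round_trip() -> u32 {
    let ffi = FfiProvider::new(Arc::new(MyProvider));
    let foreign = ForeignProvider { inner: ffi };
    foreign.table_type()
}
```

In this sketch `round_trip()` returns the value reported by `MyProvider`, reached only through the `table_type` function pointer. Because the consumer never casts `private_data`, it does not need to agree with the producer on the memory layout of whatever sits behind that pointer.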
| 68 | + |
| 69 | +It is worth noting that which library is the `local` and which is `foreign` |
| 70 | +depends on which interface we are considering. For example, suppose we have a |
| 71 | +Python library called `my_provider` that exposes a `TableProvider` called |
| 72 | +`MyProvider` via `FFI_TableProvider`. Within the library `my_provider` we can |
| 73 | +access the `private_data` via `FFI_TableProvider`. We connect this to |
| 74 | +`datafusion-python`, where we access it as a `ForeignTableProvider`. Now when |
| 75 | +we call `scan()` on this interface, we have to pass it an `FFI_SessionConfig`. |
| 76 | +The `SessionConfig` is local to `datafusion-python` and **not** `my_provider`. |
| 77 | +It is important to be careful when expanding these functions to be certain which |
| 78 | +side of the interface each object refers to. |
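
The direction of ownership described above can be sketched as follows. Again, this is a hedged illustration with hypothetical names (`LocalConfig`, `FfiConfig`, `FfiTable`, `ProviderState`, `scan_batches`) and a made-up `scan` signature that returns a batch count. It assumes the consumer owns the config and the producer owns the table state, so each side casts only its own `private_data`.

```rust
use std::ffi::c_void;

// Owned by the consumer (the role `datafusion-python` plays in the text).
struct LocalConfig {
    batch_size: usize,
}

#[repr(C)]
struct FfiConfig {
    batch_size: unsafe extern "C" fn(config: *const FfiConfig) -> usize,
    private_data: *const c_void,
}

// Consumer side: the config's private data is local here.
unsafe extern "C" fn batch_size_wrapper(config: *const FfiConfig) -> usize {
    unsafe {
        let local: &LocalConfig = &*((*config).private_data as *const LocalConfig);
        local.batch_size
    }
}

// Owned by the producer (the role `my_provider` plays in the text).
struct ProviderState {
    total_rows: usize,
}

#[repr(C)]
struct FfiTable {
    scan: unsafe extern "C" fn(table: *const FfiTable, config: *const FfiConfig) -> usize,
    private_data: *const c_void,
}

// Producer side: the table state is local, but the config is foreign.
unsafe extern "C" fn scan_wrapper(table: *const FfiTable, config: *const FfiConfig) -> usize {
    unsafe {
        // Local object: casting our own private data is fine.
        let state: &ProviderState = &*((*table).private_data as *const ProviderState);
        // Foreign object: go through its function pointer, never its private data.
        let batch_size = ((*config).batch_size)(config);
        // Number of batches needed to cover all rows (rounded up).
        (state.total_rows + batch_size - 1) / batch_size
    }
}

// Wires both sides together in one process for demonstration purposes.
fn scan_batches(total_rows: usize, batch_size: usize) -> usize {
    let state = ProviderState { total_rows };
    let table = FfiTable {
        scan: scan_wrapper,
        private_data: &state as *const ProviderState as *const c_void,
    };
    let local = LocalConfig { batch_size };
    let config = FfiConfig {
        batch_size: batch_size_wrapper,
        private_data: &local as *const LocalConfig as *const c_void,
    };
    unsafe { (table.scan)(&table, &config) }
}
```

Within `scan_wrapper` the table is the local object and the config is the foreign one, which is the reverse of how the consumer sees them. The sketch assumes a non-zero batch size and keeps both owners alive for the duration of the call.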
| 79 | + |
| 80 | +[datafusion]: https://datafusion.apache.org |
| 81 | +[api docs]: http://docs.rs/datafusion-ffi/latest |