Commit 8392f23

Added support for sparse vectors.
1 parent d5c0b82 commit 8392f23

27 files changed: +1368 -81 lines changed

doc/src/api_manual/fetch_info.rst

Lines changed: 11 additions & 0 deletions
@@ -142,3 +142,14 @@ FetchInfo Attributes
     the value returned is *None*.

     .. versionadded:: 2.2.0
+
+.. attribute:: FetchInfo.vector_is_sparse
+
+    This read-only attribute returns a boolean that indicates whether the
+    vector is sparse or not.
+
+    If the column contains vectors that are SPARSE, the value returned is
+    True. If the column contains vectors that are DENSE, the value returned is
+    False. If the column is not a VECTOR column, the value returned is ``None``.
+
+    .. versionadded:: 3.0.0
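
The three possible values of ``FetchInfo.vector_is_sparse`` documented above (True, False, None) could be interpreted along the following lines. This is only a sketch: ``describe_vector_columns`` is a hypothetical helper, the ``SimpleNamespace`` objects are stand-ins for the FetchInfo objects that a cursor description would supply, and the ``name`` attribute is assumed to be available on them.

```python
from types import SimpleNamespace


def describe_vector_columns(description):
    """Map each column name to 'sparse', 'dense', or 'not a vector'.

    Hypothetical helper: expects objects exposing `name` and
    `vector_is_sparse`, as FetchInfo objects are assumed to do.
    """
    labels = {True: "sparse", False: "dense", None: "not a vector"}
    return {info.name: labels[info.vector_is_sparse] for info in description}


# Stand-ins for FetchInfo objects (made-up data, no database required).
fake_description = [
    SimpleNamespace(name="FLOAT32SPARSECOL", vector_is_sparse=True),
    SimpleNamespace(name="DENSECOL", vector_is_sparse=False),
    SimpleNamespace(name="ID", vector_is_sparse=None),
]

summary = describe_vector_columns(fake_description)
print(summary)
```

With a live connection, the same helper would presumably be called with ``cursor.description`` after executing a query.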

doc/src/api_manual/module.rst

Lines changed: 13 additions & 0 deletions
@@ -2509,6 +2509,19 @@ Oracledb Methods

         The ``connection_id_prefix`` parameter was added.

+.. function:: SparseVector(num_dimensions, indices, values)
+
+    Creates and returns a :ref:`SparseVector object <sparsevectorsobj>`.
+
+    The ``num_dimensions`` parameter is the number of dimensions contained in
+    the vector.
+
+    The ``indices`` parameter is the indices (zero-based) of non-zero values
+    in the vector.
+
+    The ``values`` parameter is the non-zero values stored in the vector.
+
+    .. versionadded:: 3.0.0

 .. function:: register_password_type(password_type, hook_function)

doc/src/api_manual/sparse_vector.rst

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+.. _sparsevectorsobj:
+
+*************************
+API: SparseVector Objects
+*************************
+
+A SparseVector Object stores information about a sparse vector. This object
+can be created with :meth:`oracledb.SparseVector()`.
+
+See :ref:`sparsevectors` for more information.
+
+.. versionadded:: 3.0.0
+
+SparseVector Attributes
+=======================
+
+.. attribute:: SparseVector.indices
+
+    This read-only attribute is an array that returns the indices (zero-based)
+    of non-zero values in the vector.
+
+.. attribute:: SparseVector.num_dimensions
+
+    This read-only attribute is an integer that returns the number of
+    dimensions of the vector.
+
+.. attribute:: SparseVector.values
+
+    This read-only attribute is an array that returns the non-zero values
+    stored in the vector.
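
To make the relationship between the three attributes concrete, the sketch below derives ``num_dimensions``, ``indices``, and ``values`` from an ordinary dense list. The helper name ``dense_to_sparse_parts`` and the default ``'d'`` type code are assumptions made for illustration; neither is part of this commit.

```python
import array


def dense_to_sparse_parts(dense, value_typecode="d"):
    """Return (num_dimensions, indices, values) for a dense sequence.

    Hypothetical helper: indices are zero-based positions of the non-zero
    entries, and values holds those entries in the same order.
    """
    indices = [i for i, v in enumerate(dense) if v != 0]
    values = array.array(value_typecode, (dense[i] for i in indices))
    return len(dense), indices, values


num_dimensions, indices, values = dense_to_sparse_parts(
    [0, 0, 1.5, 0, 0, 0, -2.25, 0]
)
print(num_dimensions, indices, list(values))  # 8 [2, 6] [1.5, -2.25]
```

The three results line up with the parameters of ``oracledb.SparseVector(num_dimensions, indices, values)`` described in the module documentation.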

doc/src/index.rst

Lines changed: 1 addition & 0 deletions
@@ -64,6 +64,7 @@ API Manual
     api_manual/subscription.rst
     api_manual/lob.rst
     api_manual/dbobject_type.rst
+    api_manual/sparse_vector.rst
     api_manual/aq.rst
     api_manual/soda.rst
     api_manual/async_connection.rst

doc/src/release_notes.rst

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ Thick Mode Changes
 Common Changes
 ++++++++++++++

+#) Added support for Oracle Database 23ai SPARSE vectors.
 #) Added support for :ref:`naming and caching connection pools
    <connpoolcache>` during creation, and retrieving them later from the
    python-oracledb pool cache with :meth:`oracledb.get_pool()`.

doc/src/user_guide/vector_data_type.rst

Lines changed: 110 additions & 0 deletions
@@ -212,6 +212,116 @@ If you are using python-oracledb Thick mode with older versions of Oracle
 Client libraries than 23ai, see this
 :ref:`section <vector_thick_mode_old_client>`.

+.. _sparsevectors:
+
+Using SPARSE Vectors
+====================
+
+A sparse vector is a vector in which most dimensions have a value of zero.
+Only the non-zero values are physically stored. Sparse vectors are supported
+when you are using Oracle Database 23.7 or later.
+
+Sparse vectors store the total number of dimensions, an array of indices,
+and an array of values. The storage formats that can be used with sparse
+vectors are float32, float64, and int8. Note that the binary storage format
+cannot be used with sparse vectors. You can define a column for a sparse
+vector using the following format::
+
+    VECTOR(number_of_dimensions, dimension_storage_format, sparse)
+
+For example, to create a table with three columns for sparse vectors:
+
+.. code-block:: sql
+
+    CREATE TABLE vector_sparse_table (
+        float32sparsecol vector(25, float32, sparse),
+        float64sparsecol vector(30, float64, sparse),
+        int8sparsecol vector(35, int8, sparse)
+    )
+
+In this example:
+
+- The float32sparsecol column can store sparse vector data of 25 dimensions
+  where each dimension value is a 32-bit floating-point number.
+
+- The float64sparsecol column can store sparse vector data of 30 dimensions
+  where each dimension value is a 64-bit floating-point number.
+
+- The int8sparsecol column can store sparse vector data of 35 dimensions where
+  each dimension value is an 8-bit signed integer.
+
+.. _insertsparsevectors:
+
+Inserting SPARSE Vectors
+------------------------
+
+With python-oracledb, sparse vector data can be inserted using
+:ref:`SparseVector objects <sparsevectorsobj>`. You can specify the number of
+dimensions, an array of indices, and an array of values as the data for a
+sparse vector. For example, the string representation is::
+
+    [25, [5,8,11], [25.25, 6.125, 8.25]]
+
+In this example, the sparse vector has 25 dimensions. Only indices 5, 8, and
+11 have non-zero values, which are 25.25, 6.125, and 8.25 respectively. All
+of the other values are zero.
+
+SparseVector objects are used as bind values when inserting into sparse
+vector columns. For example:
+
+.. code-block:: python
+
+    import array
+
+    # 32-bit float sparse vector
+    float32_val = oracledb.SparseVector(
+        25, [6, 10, 18], array.array('f', [26.25, 129.625, 579.875])
+    )
+
+    # 64-bit float sparse vector
+    float64_val = oracledb.SparseVector(
+        30, [9, 16, 24], array.array('d', [19.125, 78.5, 977.375])
+    )
+
+    # 8-bit signed integer sparse vector
+    int8_val = oracledb.SparseVector(
+        35, [10, 20, 30], array.array('b', [26, 125, -37])
+    )
+
+    cursor.execute(
+        "insert into vector_sparse_table values (:1, :2, :3)",
+        [float32_val, float64_val, int8_val]
+    )
+
+.. _fetchsparsevectors:
+
+Fetching SPARSE Vectors
+-----------------------
+
+With python-oracledb, fetched sparse vector values can be converted by the
+str() function to the same format accepted by Oracle Database. For example:
+
+.. code-block:: python
+
+    cursor.execute("select * from vector_sparse_table")
+    for float32_val, float64_val, int8_val in cursor:
+        print("float32:", str(float32_val))
+        print("float64:", str(float64_val))
+        print("int8:", str(int8_val))
+
+This prints the following output::
+
+    float32: [25, [6, 10, 18], [26.25, 129.625, 579.875]]
+    float64: [30, [9, 16, 24], [19.125, 78.5, 977.375]]
+    int8: [35, [10, 20, 30], [26, 125, -37]]
+
+The :ref:`FetchInfo <fetchinfoobj>` object that is returned as part of the
+fetched metadata contains attributes :attr:`FetchInfo.vector_dimensions`,
+:attr:`FetchInfo.vector_format`, and :attr:`FetchInfo.vector_is_sparse` which
+return the number of dimensions of the vector column, the format of each
+dimension value in the vector column, and a boolean which indicates whether
+the vector is sparse or not.
+
 .. _vector_thick_mode_old_client:

 Using python-oracledb Thick Mode with Older Versions of Oracle Client Libraries
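
The sparse representation used in the user guide text of this hunk, ``[25, [5,8,11], [25.25, 6.125, 8.25]]``, can be expanded back into a dense list to see which positions are zero. The function ``sparse_to_dense`` below is a hypothetical illustration written in plain Python; it is not part of python-oracledb.

```python
def sparse_to_dense(num_dimensions, indices, values):
    """Expand (num_dimensions, indices, values) into a dense list.

    Hypothetical helper: positions not listed in `indices` are zero.
    """
    dense = [0.0] * num_dimensions
    for index, value in zip(indices, values):
        dense[index] = value
    return dense


dense = sparse_to_dense(25, [5, 8, 11], [25.25, 6.125, 8.25])
non_zero_positions = [i for i, v in enumerate(dense) if v != 0]
print(len(dense), non_zero_positions)  # 25 [5, 8, 11]
```

Only three of the 25 positions carry data, which is why storing just the indices and values is more compact than the dense form.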

src/oracledb/__init__.py

Lines changed: 5 additions & 0 deletions
@@ -310,6 +310,10 @@
     future as __future__, # noqa: F401
 )

+from .sparse_vector import (
+    SparseVector as SparseVector,
+)
+
 from . import config_providers

 IntervalYM = collections.namedtuple("IntervalYM", ["years", "months"])
@@ -345,6 +349,7 @@ class JsonId(bytes):
     lob, # noqa
     pool, # noqa
     pool_params, # noqa
+    sparse_vector, # noqa
     soda, # noqa
     subscr, # noqa
     sys, # noqa

src/oracledb/base_impl.pxd

Lines changed: 15 additions & 1 deletion
@@ -195,12 +195,14 @@ cdef type PY_TYPE_MESSAGE
 cdef type PY_TYPE_MESSAGE_QUERY
 cdef type PY_TYPE_MESSAGE_ROW
 cdef type PY_TYPE_MESSAGE_TABLE
+cdef type PY_TYPE_SPARSE_VECTOR
 cdef type PY_TYPE_TIMEDELTA
 cdef type PY_TYPE_VAR

 cdef str DRIVER_NAME
 cdef str DRIVER_VERSION
 cdef str DRIVER_INSTALLATION_URL
+cdef str ARRAY_TYPE_CODE_UINT32

 cdef const char* ENCODING_UTF8
 cdef const char* ENCODING_UTF16
@@ -403,12 +405,17 @@ cdef class OsonEncoder(GrowableBuffer):

 cdef class VectorDecoder(Buffer):

+    cdef array.array _decode_values(self, uint32_t num_elements,
+                                    uint8_t vector_format)
     cdef object decode(self, bytes data)


 cdef class VectorEncoder(GrowableBuffer):

-    cdef int encode(self, array.array value) except -1
+    cdef int _encode_values(self, array.array value, uint32_t num_elements,
+                            uint8_t vector_format) except -1
+    cdef uint8_t _get_vector_format(self, array.array value)
+    cdef int encode(self, object value) except -1


 cdef class OracleMetadata:
@@ -870,6 +877,13 @@ cdef class PipelineOpResultImpl:
     cdef int _capture_err(self, Exception exc) except -1


+cdef class SparseVectorImpl:
+    cdef:
+        readonly uint32_t num_dimensions
+        readonly array.array indices
+        readonly array.array values
+
+
 cdef struct OracleDate:
     int16_t year
     uint8_t month

src/oracledb/base_impl.pyx

Lines changed: 1 addition & 0 deletions
@@ -77,6 +77,7 @@ cdef type PY_TYPE_MESSAGE
 cdef type PY_TYPE_MESSAGE_QUERY
 cdef type PY_TYPE_MESSAGE_ROW
 cdef type PY_TYPE_MESSAGE_TABLE
+cdef type PY_TYPE_SPARSE_VECTOR
 cdef type PY_TYPE_TIMEDELTA = datetime.timedelta
 cdef type PY_TYPE_VAR
 cdef type PY_TYPE_FETCHINFO

src/oracledb/constants.py

Lines changed: 1 addition & 0 deletions
@@ -135,3 +135,4 @@

 # vector metadata flags
 VECTOR_META_FLAG_FLEXIBLE_DIM = 0x01
+VECTOR_META_FLAG_SPARSE_VECTOR = 0x02

src/oracledb/fetch_info.py

Lines changed: 10 additions & 0 deletions
@@ -249,3 +249,13 @@ def vector_format(self) -> [oracledb.VectorFormat, None]:
             and self._impl.vector_format != 0
         ):
             return oracledb.VectorFormat(self._impl.vector_format)
+
+    @property
+    def vector_is_sparse(self) -> Union[bool, None]:
+        """
+        Returns a boolean indicating if the vector is sparse or not. If the
+        column is not a vector column, the value returned is None.
+        """
+        if self._impl.dbtype is DB_TYPE_VECTOR:
+            flags = self._impl.vector_flags
+            return bool(flags & constants.VECTOR_META_FLAG_SPARSE_VECTOR)
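
The property above boils down to a bit test against the metadata flags. The standalone sketch below reproduces that test outside the driver, using the two flag values from ``constants.py`` in this commit. The function ``is_sparse`` and its ``is_vector_column`` parameter are hypothetical stand-ins for the ``dbtype`` check.

```python
# Flag values copied from src/oracledb/constants.py in this commit.
VECTOR_META_FLAG_FLEXIBLE_DIM = 0x01
VECTOR_META_FLAG_SPARSE_VECTOR = 0x02


def is_sparse(flags, is_vector_column=True):
    """Return True/False for vector columns and None otherwise.

    Hypothetical stand-in for the property: `is_vector_column` replaces
    the comparison of the column type with DB_TYPE_VECTOR.
    """
    if not is_vector_column:
        return None
    return bool(flags & VECTOR_META_FLAG_SPARSE_VECTOR)


print(is_sparse(0x00))  # False: dense vector with fixed dimensions
print(is_sparse(0x03))  # True: sparse and flexible-dimension bits both set
print(is_sparse(0x02, is_vector_column=False))  # None: not a vector column
```

Because the check masks a single bit, the flexible-dimension flag can be set or cleared without affecting the result.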

src/oracledb/impl/base/connection.pyx

Lines changed: 2 additions & 0 deletions
@@ -151,6 +151,8 @@ cdef class BaseConnImpl:
                 if len(value) == 0:
                     errors._raise_err(errors.ERR_INVALID_VECTOR)
                 return value
+            elif isinstance(value, PY_TYPE_SPARSE_VECTOR):
+                return value
         elif db_type_num == DB_TYPE_NUM_INTERVAL_YM:
             if isinstance(value, PY_TYPE_INTERVAL_YM):
                 return value

src/oracledb/impl/base/constants.pxi

Lines changed: 2 additions & 0 deletions
@@ -78,11 +78,13 @@ cdef enum:
     TNS_VECTOR_MAGIC_BYTE = 0xDB
     TNS_VECTOR_VERSION_BASE = 0
     TNS_VECTOR_VERSION_WITH_BINARY = 1
+    TNS_VECTOR_VERSION_WITH_SPARSE = 2

 # VECTOR flags
 cdef enum:
     TNS_VECTOR_FLAG_NORM = 0x0002
     TNS_VECTOR_FLAG_NORM_RESERVED = 0x0010
+    TNS_VECTOR_FLAG_SPARSE = 0x0020

 # general constants
 cdef enum:

src/oracledb/impl/base/metadata.pyx

Lines changed: 1 addition & 1 deletion
@@ -158,7 +158,7 @@ cdef class OracleMetadata:
             metadata.dbtype = value.type
         elif isinstance(value, (PY_TYPE_CURSOR, PY_TYPE_ASYNC_CURSOR)):
             metadata.dbtype = DB_TYPE_CURSOR
-        elif isinstance(value, array.array):
+        elif isinstance(value, (array.array, PY_TYPE_SPARSE_VECTOR)):
             metadata.dbtype = DB_TYPE_VECTOR
         elif isinstance(value, PY_TYPE_INTERVAL_YM):
             metadata.dbtype = DB_TYPE_INTERVAL_YM

src/oracledb/impl/base/utils.pyx

Lines changed: 19 additions & 0 deletions
@@ -189,6 +189,23 @@ cdef int _set_str_param(dict args, str name, object target, bint check_network_c
         setattr(target, name, in_val)


+def get_array_type_code_uint32():
+    """
+    Returns the type code to use for array.array that will store uint32_t.
+    """
+    cdef:
+        array.array temp_array
+        str type_code
+    global ARRAY_TYPE_CODE_UINT32
+    if ARRAY_TYPE_CODE_UINT32 is None:
+        for type_code in ("I", "L"):
+            temp_array = array.array(type_code)
+            if temp_array.itemsize == 4:
+                ARRAY_TYPE_CODE_UINT32 = type_code
+                break
+    return ARRAY_TYPE_CODE_UINT32
+
+
 def init_base_impl(package):
     """
     Initializes globals after the package has been completely initialized. This
@@ -217,6 +234,7 @@ def init_base_impl(package):
            PY_TYPE_MESSAGE_ROW, \
            PY_TYPE_MESSAGE_TABLE, \
            PY_TYPE_POOL_PARAMS, \
+           PY_TYPE_SPARSE_VECTOR, \
            PY_TYPE_VAR

     errors = package.errors
@@ -241,6 +259,7 @@ def init_base_impl(package):
     PY_TYPE_MESSAGE_ROW = package.MessageRow
     PY_TYPE_MESSAGE_TABLE = package.MessageTable
     PY_TYPE_POOL_PARAMS = package.PoolParams
+    PY_TYPE_SPARSE_VECTOR = package.SparseVector
     PY_TYPE_VAR = package.Var


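
The ``get_array_type_code_uint32()`` function in this file probes ``array.array`` type codes because the byte width of ``"I"`` and ``"L"`` is platform dependent. A plain-Python sketch of the same probe, without the Cython typing or the module-level cache, might look like this. The name ``find_uint32_type_code`` is hypothetical.

```python
import array


def find_uint32_type_code():
    """Return the first unsigned type code whose items are 4 bytes wide.

    Hypothetical plain-Python variant of the probe; returns None if
    neither candidate is 4 bytes on the current platform.
    """
    for type_code in ("I", "L"):
        if array.array(type_code).itemsize == 4:
            return type_code
    return None


code = find_uint32_type_code()
if code is not None:
    # Example: hold sparse vector indices as unsigned 32-bit integers.
    indices = array.array(code, [6, 10, 18])
    print(code, indices.itemsize, list(indices))
```

On most common platforms ``"I"`` is expected to be the match, but the sketch deliberately avoids assuming that.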
