
Commit 80f64f5

Merge branch 'main' into shuowei-feat-persist-obj-ref
2 parents 63f65a7 + 2e5311e
File tree

4 files changed: +168 -40 lines changed


bigframes/bigquery/__init__.py

Lines changed: 32 additions & 3 deletions

@@ -12,9 +12,38 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-"""This module integrates BigQuery built-in functions for use with DataFrame objects,
-such as array functions:
-https://cloud.google.com/bigquery/docs/reference/standard-sql/array_functions. """
+"""
+Access BigQuery-specific operations and namespaces within BigQuery DataFrames.
+
+This module provides specialized functions and sub-modules that expose BigQuery's
+advanced capabilities to DataFrames and Series. It acts as a bridge between the
+pandas-compatible API and the full power of BigQuery SQL.
+
+Key sub-modules include:
+
+* :mod:`bigframes.bigquery.ai`: Generative and predictive AI functions (Gemini, BQML).
+* :mod:`bigframes.bigquery.ml`: Direct access to BigQuery ML model operations.
+* :mod:`bigframes.bigquery.obj`: Support for BigQuery object tables.
+
+This module also provides direct access to optimized BigQuery functions for:
+
+* **JSON Processing:** High-performance functions like ``json_extract``, ``json_value``,
+  and ``parse_json`` for handling semi-structured data.
+* **Geospatial Analysis:** Comprehensive geographic functions such as ``st_area``,
+  ``st_distance``, and ``st_centroid`` (``ST_``-prefixed functions).
+* **Array Operations:** Tools for working with BigQuery arrays, including ``array_agg``
+  and ``array_length``.
+* **Vector Search:** Integration with BigQuery's vector search and indexing
+  capabilities for high-dimensional data.
+* **Custom SQL:** The ``sql_scalar`` function allows embedding raw SQL snippets for
+  advanced operations not yet directly mapped in the API.
+
+By using these functions, you can leverage BigQuery's high-performance engine for
+domain-specific tasks while maintaining a Python-centric development experience.
+
+For the full list of BigQuery standard SQL functions, see:
+https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-reference
+"""
 
 import sys
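The JSON helpers listed in this docstring run server-side in BigQuery. As a rough, standalone illustration of the path-extraction semantics that ``json_value`` mirrors, here is a sketch using only the Python standard library; the ``json_path_value`` helper is hypothetical and handles only simple dotted paths, unlike BigQuery's full JSONPath support.

```python
import json

def json_path_value(doc: str, path: str):
    """Hypothetical helper: walk a dotted JSON path like '$.a.b' and
    return the scalar at that location as a string, or None if absent.
    BigQuery's JSON_VALUE behaves analogously on the server side."""
    obj = json.loads(doc)
    # Drop the leading '$.' root marker, then walk each key in turn.
    for key in path.lstrip("$.").split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    # JSON_VALUE returns scalars as strings and NULL for non-scalar results.
    if isinstance(obj, (dict, list)):
        return None
    return str(obj)

doc = '{"name": {"first": "Ada", "last": "Lovelace"}, "age": 36}'
print(json_path_value(doc, "$.name.first"))  # Ada
print(json_path_value(doc, "$.age"))         # 36
```

The scalar-only return convention above matches JSON_VALUE specifically; ``json_extract`` / ``json_query`` can return objects and arrays as JSON strings instead.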

bigframes/bigquery/ai.py

Lines changed: 43 additions & 3 deletions

@@ -12,9 +12,49 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-"""This module integrates BigQuery built-in AI functions for use with Series/DataFrame objects,
-such as AI.GENERATE_BOOL:
-https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-bool"""
+"""
+Integrate BigQuery built-in AI functions into your BigQuery DataFrames workflow.
+
+The ``bigframes.bigquery.ai`` module provides a Pythonic interface to leverage BigQuery ML's
+generative AI and predictive functions directly on BigQuery DataFrames and Series objects.
+These functions enable you to perform advanced AI tasks at scale without moving data
+out of BigQuery.
+
+Key capabilities include:
+
+* **Generative AI:** Use :func:`bigframes.bigquery.ai.generate` (Gemini) to
+  perform text analysis, translation, or content generation. Specialized
+  versions like :func:`~bigframes.bigquery.ai.generate_bool`,
+  :func:`~bigframes.bigquery.ai.generate_int`, and
+  :func:`~bigframes.bigquery.ai.generate_double` are available for structured
+  outputs.
+* **Embeddings:** Generate vector embeddings for text using
+  :func:`~bigframes.bigquery.ai.generate_embedding`, which are essential for
+  semantic search and retrieval-augmented generation (RAG) workflows.
+* **Classification and Scoring:** Apply machine learning models to your data for
+  predictive tasks with :func:`~bigframes.bigquery.ai.classify` and
+  :func:`~bigframes.bigquery.ai.score`.
+* **Forecasting:** Predict future values in time-series data using
+  :func:`~bigframes.bigquery.ai.forecast`.
+
+**Example usage:**
+
+>>> import bigframes.pandas as bpd
+>>> import bigframes.bigquery as bbq
+
+>>> df = bpd.DataFrame({
+...     "text_input": [
+...         "Is this a positive review? The food was terrible.",
+...     ],
+... })  # doctest: +SKIP
+
+>>> # Assuming a Gemini model has been created in BigQuery as 'my_gemini_model'
+>>> result = bbq.ai.generate_text("my_gemini_model", df["text_input"])  # doctest: +SKIP
+
+For more information on the underlying BigQuery ML syntax, see:
+https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-bool
+"""
 
 from bigframes.bigquery._operations.ai import (
     classify,
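The vectors returned by ``generate_embedding`` are typically consumed by similarity comparisons in semantic-search and RAG pipelines. As a hedged, standalone sketch of that downstream step, the following uses toy 3-dimensional vectors in place of real model output; real embeddings have hundreds of dimensions and come from BigQuery, not local code.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" standing in for real generate_embedding output.
query = [0.1, 0.9, 0.2]
docs = {
    "positive review": [0.12, 0.88, 0.18],
    "shipping update": [0.9, 0.05, 0.4],
}

# Retrieve the document whose vector points most nearly the same way.
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)
```

In practice BigQuery's vector search performs this comparison at scale with an index, rather than scanning every row as this sketch does.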

bigframes/pandas/__init__.py

Lines changed: 58 additions & 1 deletion

@@ -12,7 +12,64 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-"""BigQuery DataFrames provides a DataFrame API backed by the BigQuery engine."""
+"""
+The primary entry point for the BigQuery DataFrames (BigFrames) pandas-compatible API.
+
+**BigQuery DataFrames** provides a Pythonic DataFrame and machine learning (ML) API
+powered by the BigQuery engine. The ``bigframes.pandas`` module implements a large
+subset of the pandas API, allowing you to perform large-scale data analysis
+using familiar pandas syntax while the computations are executed in the cloud.
+
+**Key Features:**
+
+* **Petabyte-Scale Scalability:** Handle datasets that exceed local memory by
+  offloading computation to the BigQuery distributed engine.
+* **Pandas Compatibility:** Use common pandas methods like
+  :func:`~bigframes.pandas.DataFrame.groupby`,
+  :func:`~bigframes.pandas.DataFrame.merge`,
+  :func:`~bigframes.pandas.DataFrame.pivot_table`, and more on BigQuery-backed
+  :class:`~bigframes.pandas.DataFrame` objects.
+* **Direct BigQuery Integration:** Read from and write to BigQuery tables and
+  queries with :func:`bigframes.pandas.read_gbq` and
+  :func:`bigframes.pandas.DataFrame.to_gbq`.
+* **User-defined Functions (UDFs):** Effortlessly deploy Python functions
+  using the :func:`bigframes.pandas.remote_function` and
+  :func:`bigframes.pandas.udf` decorators.
+* **Data Ingestion:** Support for various formats including CSV, Parquet, JSON,
+  and Arrow via :func:`bigframes.pandas.read_csv`,
+  :func:`bigframes.pandas.read_parquet`, etc., which are automatically uploaded
+  to BigQuery for processing. Convert any pandas DataFrame into a BigQuery
+  DataFrame using :func:`bigframes.pandas.read_pandas`.
+
+**Example usage:**
+
+>>> import bigframes.pandas as bpd
+
+Initialize session and set options.
+
+>>> bpd.options.bigquery.project = "your-project-id"  # doctest: +SKIP
+
+Load data from a BigQuery public dataset.
+
+>>> df = bpd.read_gbq("bigquery-public-data.usa_names.usa_1910_2013")  # doctest: +SKIP
+
+Perform familiar pandas operations that execute in the cloud.
+
+>>> top_names = (
+...     df.groupby("name")
+...     .agg({"number": "sum"})
+...     .sort_values("number", ascending=False)
+...     .head(10)
+... )  # doctest: +SKIP
+
+Bring the final, aggregated results back to local memory if needed.
+
+>>> local_df = top_names.to_pandas()  # doctest: +SKIP
+
+BigQuery DataFrames is designed for data scientists and analysts who need the
+power of BigQuery with the ease of use of pandas. It eliminates the "data
+movement bottleneck" by keeping your data in BigQuery for processing.
+"""
 
 from __future__ import annotations
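The ``groupby``/``agg``/``sort_values``/``head`` pipeline in this docstring executes inside BigQuery, but its semantics match ordinary pandas. As a minimal, stdlib-only sketch of the same aggregation over toy rows (standing in for the ``usa_names`` public dataset, which requires a GCP session to query):

```python
from collections import defaultdict

# Toy stand-in for (name, number) rows of the usa_names dataset.
rows = [("James", 50), ("Mary", 40), ("James", 30), ("Linda", 25)]

# Equivalent of groupby("name").agg({"number": "sum"}).
totals: dict[str, int] = defaultdict(int)
for name, number in rows:
    totals[name] += number

# Equivalent of sort_values("number", ascending=False).head(2).
top_names = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:2]
print(top_names)  # [('James', 80), ('Mary', 40)]
```

The point of BigFrames is that the same logic, written against ``bpd.DataFrame``, runs distributed over billions of rows instead of a local list.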

docs/index.rst

Lines changed: 35 additions & 33 deletions

@@ -1,79 +1,81 @@
 .. BigQuery DataFrames documentation main file
 
-Welcome to BigQuery DataFrames
-==============================
+Scalable Python Data Analysis with BigQuery DataFrames (BigFrames)
+==================================================================
 
-**BigQuery DataFrames** (``bigframes``) provides a Pythonic interface for data analysis that scales to petabytes. It gives you the best of both worlds: the familiar API of **pandas** and **scikit-learn**, powered by the distributed computing engine of **BigQuery**.
+.. meta::
+   :description: BigQuery DataFrames (BigFrames) provides a scalable, pandas-compatible Python API for data analysis and machine learning on petabyte-scale datasets using the BigQuery engine.
 
-BigQuery DataFrames consists of three main components:
+**BigQuery DataFrames** (``bigframes``) is an open-source Python library that brings the power of **distributed computing** to your data science workflow. By providing a familiar **pandas** and **scikit-learn** compatible API, BigFrames allows you to analyze and model massive datasets where they live—directly in **BigQuery**.
 
-* **bigframes.pandas**: A pandas-compatible API for data exploration and transformation.
-* **bigframes.ml**: A scikit-learn-like interface for BigQuery ML, including integration with Gemini.
-* **bigframes.bigquery**: Specialized functions for managing BigQuery resources and deploying custom logic.
+Why Choose BigQuery DataFrames?
+-------------------------------
 
-Why BigQuery DataFrames?
-------------------------
+BigFrames eliminates the "data movement bottleneck." Instead of downloading large datasets to a local environment, BigFrames translates your Python code into optimized SQL, executing complex transformations across the BigQuery fleet.
 
-BigFrames allows you to process data where it lives. Instead of downloading massive datasets to your local machine, BigFrames translates your Python code into SQL and executes it across the BigQuery fleet.
+* **Petabyte-Scale Scalability:** Effortlessly process datasets that far exceed local memory limits.
+* **Familiar Python Ecosystem:** Use the same ``read_gbq``, ``groupby``, ``merge``, and ``pivot_table`` functions you already know from pandas.
+* **Integrated Machine Learning:** Access BigQuery ML's powerful algorithms via a scikit-learn-like interface (``bigframes.ml``), including seamless **Gemini AI** integration.
+* **Enterprise-Grade Security:** Maintain data governance and security by keeping your data within the BigQuery perimeter.
+* **Hybrid Flexibility:** Easily move between distributed BigQuery processing and local pandas analysis with ``to_pandas()``.
 
-* **Scalability:** Work with datasets that exceed local memory limits without complex refactoring.
-* **Collaboration & Extensibility:** Bridge the gap between Python and SQL. Deploy custom Python functions to BigQuery, making your logic accessible to SQL-based teammates and data analysts.
-* **Production-Ready Pipelines:** Move seamlessly from interactive notebooks to production. BigFrames simplifies data engineering by integrating with tools like **dbt** and **Airflow**, offering a simpler operational model than Spark.
-* **Security & Governance:** Keep your data within the BigQuery perimeter. Benefit from enterprise-grade security, auditing, and data governance throughout your entire Python workflow.
-* **Familiarity:** Use ``read_gbq``, ``merge``, ``groupby``, and ``pivot_table`` just like you do in pandas.
+Core Components of BigFrames
+----------------------------
 
-Quickstart
-----------
+BigQuery DataFrames is organized into specialized modules designed for the modern data stack:
 
-Install the library via pip:
+1. :mod:`bigframes.pandas`: A high-performance, pandas-compatible API for scalable data exploration, cleaning, and transformation.
+2. :mod:`bigframes.bigquery`: Specialized utilities for direct BigQuery resource management, including integrations with Gemini and other AI models in the :mod:`bigframes.bigquery.ai` submodule.
+
+Quickstart: Scalable Data Analysis in Seconds
+---------------------------------------------
+
+Install BigQuery DataFrames via pip:
 
 .. code-block:: bash
 
    pip install --upgrade bigframes
 
-Load and aggregate a public dataset in just a few lines:
+The following example demonstrates how to perform a distributed aggregation on a public dataset with millions of rows using just a few lines of Python:
 
 .. code-block:: python
 
    import bigframes.pandas as bpd
 
-   # Load data from BigQuery
+   # Initialize BigFrames and load a public dataset
   df = bpd.read_gbq("bigquery-public-data.usa_names.usa_1910_2013")
 
-   # Perform familiar pandas operations at scale
+   # Perform familiar pandas operations that execute in the cloud
   top_names = (
       df.groupby("name")
       .agg({"number": "sum"})
       .sort_values("number", ascending=False)
       .head(10)
   )
 
+   # Bring the final, aggregated results back to local memory if needed
   print(top_names.to_pandas())
 
-User Guide
-----------
+Explore the Documentation
+-------------------------
 
 .. toctree::
    :maxdepth: 2
+   :caption: User Documentation
 
    user_guide/index
 
-API reference
--------------
-
 .. toctree::
-   :maxdepth: 3
+   :maxdepth: 2
+   :caption: API Reference
 
    reference/index
   supported_pandas_apis
 
-Changelog
----------
-
-For a list of all BigQuery DataFrames releases:
-
 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 1
+   :caption: Community & Updates
 
   changelog
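The index page states that BigFrames "translates your Python code into optimized SQL." The real compiler is far more sophisticated, but as a toy sketch of the idea, here is a hypothetical helper (not the bigframes internals) that renders the quickstart's groupby-sum-sort-head pipeline as a single SQL statement:

```python
def compile_top_n(table: str, key: str, value: str, n: int) -> str:
    """Toy translation of groupby(key).agg(sum).sort_values(desc).head(n)
    into one SQL statement, loosely like what a DataFrame-to-SQL compiler
    emits. Hypothetical sketch, not the actual bigframes implementation."""
    return (
        f"SELECT {key}, SUM({value}) AS {value} "
        f"FROM `{table}` "
        f"GROUP BY {key} "
        f"ORDER BY {value} DESC "
        f"LIMIT {n}"
    )

sql = compile_top_n(
    "bigquery-public-data.usa_names.usa_1910_2013", "name", "number", 10
)
print(sql)
```

Executing the whole pipeline as one pushed-down query like this, instead of materializing intermediate results locally, is what lets the same pandas-style code scale to BigQuery-sized tables.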
