|
1 | 1 | .. BigQuery DataFrames documentation main file |
2 | 2 |
|
3 | | -Welcome to BigQuery DataFrames |
4 | | -============================== |
| 3 | +Scalable Python Data Analysis with BigQuery DataFrames (BigFrames) |
| 4 | +================================================================== |
5 | 5 |
|
6 | | -**BigQuery DataFrames** (``bigframes``) provides a Pythonic interface for data analysis that scales to petabytes. It gives you the best of both worlds: the familiar API of **pandas** and **scikit-learn**, powered by the distributed computing engine of **BigQuery**. |
| 6 | +.. meta:: |
| 7 | + :description: BigQuery DataFrames (BigFrames) provides a scalable, pandas-compatible Python API for data analysis and machine learning on petabyte-scale datasets using the BigQuery engine. |
7 | 8 |
|
8 | | -BigQuery DataFrames consists of three main components: |
| 9 | +**BigQuery DataFrames** (``bigframes``) is an open-source Python library that brings the power of **distributed computing** to your data science workflow. By providing a familiar **pandas** and **scikit-learn** compatible API, BigFrames allows you to analyze and model massive datasets where they live—directly in **BigQuery**. |
9 | 10 |
|
10 | | -* **bigframes.pandas**: A pandas-compatible API for data exploration and transformation. |
11 | | -* **bigframes.ml**: A scikit-learn-like interface for BigQuery ML, including integration with Gemini. |
12 | | -* **bigframes.bigquery**: Specialized functions for managing BigQuery resources and deploying custom logic. |
| 11 | +Why Choose BigQuery DataFrames? |
| 12 | +------------------------------- |
13 | 13 |
|
14 | | -Why BigQuery DataFrames? |
15 | | ------------------------- |
| 14 | +BigFrames eliminates the "data movement bottleneck." Instead of downloading large datasets to a local environment, BigFrames translates your Python code into optimized SQL, executing complex transformations across the BigQuery fleet. |
16 | 15 |
|
17 | | -BigFrames allows you to process data where it lives. Instead of downloading massive datasets to your local machine, BigFrames translates your Python code into SQL and executes it across the BigQuery fleet. |
| 16 | +* **Petabyte-Scale Scalability:** Effortlessly process datasets that far exceed local memory limits. |
| 17 | +* **Familiar Python Ecosystem:** Use the same ``read_gbq``, ``groupby``, ``merge``, and ``pivot_table`` functions you already know from pandas. |
| 18 | +* **Integrated Machine Learning:** Access BigQuery ML's powerful algorithms via a scikit-learn-like interface (``bigframes.ml``), including seamless **Gemini AI** integration. |
| 19 | +* **Enterprise-Grade Security:** Maintain data governance and security by keeping your data within the BigQuery perimeter. |
| 20 | +* **Hybrid Flexibility:** Easily move between distributed BigQuery processing and local pandas analysis with ``to_pandas()``. |
18 | 21 |
|
19 | | -* **Scalability:** Work with datasets that exceed local memory limits without complex refactoring. |
20 | | -* **Collaboration & Extensibility:** Bridge the gap between Python and SQL. Deploy custom Python functions to BigQuery, making your logic accessible to SQL-based teammates and data analysts. |
21 | | -* **Production-Ready Pipelines:** Move seamlessly from interactive notebooks to production. BigFrames simplifies data engineering by integrating with tools like **dbt** and **Airflow**, offering a simpler operational model than Spark. |
22 | | -* **Security & Governance:** Keep your data within the BigQuery perimeter. Benefit from enterprise-grade security, auditing, and data governance throughout your entire Python workflow. |
23 | | -* **Familiarity:** Use ``read_gbq``, ``merge``, ``groupby``, and ``pivot_table`` just like you do in pandas. |
| 22 | +Core Components of BigFrames |
| 23 | +---------------------------- |
24 | 24 |
|
25 | | -Quickstart |
26 | | ----------- |
| 25 | +BigQuery DataFrames is organized into specialized modules designed for the modern data stack: |
27 | 26 |
|
28 | | -Install the library via pip: |
| 27 | +1. :mod:`bigframes.pandas`: A high-performance, pandas-compatible API for scalable data exploration, cleaning, and transformation. |
| 28 | +2. :mod:`bigframes.bigquery`: Specialized utilities for direct BigQuery resource management, including integrations with Gemini and other AI models in the :mod:`bigframes.bigquery.ai` submodule. |
| 29 | + |
| 30 | + |
| 31 | +Quickstart: Scalable Data Analysis in Seconds |
| 32 | +--------------------------------------------- |
| 33 | + |
| 34 | +Install BigQuery DataFrames via pip: |
29 | 35 |
|
30 | 36 | .. code-block:: bash |
31 | 37 |
|
32 | 38 | pip install --upgrade bigframes |
33 | 39 |
|
34 | | -Load and aggregate a public dataset in just a few lines: |
| 40 | +The following example demonstrates how to perform a distributed aggregation on a public dataset with millions of rows using just a few lines of Python: |
35 | 41 |
|
36 | 42 | .. code-block:: python |
37 | 43 |
|
38 | 44 | import bigframes.pandas as bpd |
39 | 45 |
|
40 | | - # Load data from BigQuery |
| 46 | + # Initialize BigFrames and load a public dataset |
41 | 47 | df = bpd.read_gbq("bigquery-public-data.usa_names.usa_1910_2013") |
42 | 48 |
|
43 | | - # Perform familiar pandas operations at scale |
| 49 | + # Perform familiar pandas operations that execute in the cloud |
44 | 50 | top_names = ( |
45 | 51 | df.groupby("name") |
46 | 52 | .agg({"number": "sum"}) |
47 | 53 | .sort_values("number", ascending=False) |
48 | 54 | .head(10) |
49 | 55 | ) |
50 | 56 |
|
| 57 | + # Bring the final, aggregated results back to local memory if needed |
51 | 58 | print(top_names.to_pandas()) |
52 | 59 |
|
53 | 60 |
|
54 | | -User Guide |
55 | | ----------- |
| 61 | +Explore the Documentation |
| 62 | +------------------------- |
56 | 63 |
|
57 | 64 | .. toctree:: |
58 | 65 | :maxdepth: 2 |
| 66 | + :caption: User Documentation |
59 | 67 |
|
60 | 68 | user_guide/index |
61 | 69 |
|
62 | | -API reference |
63 | | -------------- |
64 | | - |
65 | 70 | .. toctree:: |
66 | | - :maxdepth: 3 |
| 71 | + :maxdepth: 2 |
| 72 | + :caption: API Reference |
67 | 73 |
|
68 | 74 | reference/index |
69 | 75 | supported_pandas_apis |
70 | 76 |
|
71 | | -Changelog |
72 | | ---------- |
73 | | - |
74 | | -For a list of all BigQuery DataFrames releases: |
75 | | - |
76 | 77 | .. toctree:: |
77 | | - :maxdepth: 2 |
| 78 | + :maxdepth: 1 |
| 79 | + :caption: Community & Updates |
78 | 80 |
|
79 | 81 | changelog |