Firestore Import Integration

Firestore Import is an Apify integration Actor that imports data from an Apify dataset into Firebase Firestore (a NoSQL cloud database built on Google Cloud infrastructure). It lets you configure the target collection, how conflicts with existing documents are handled, and how each dataset item is transformed before it is imported into Firestore.

Features

The Firestore Import Actor takes a dataset, applies transformations, and imports the data into a Firestore database. The Actor is highly customizable. You can control how the data is imported, for example by:

  • Selecting Firestore database and collection.
  • Automatically generating document IDs or using a field from the dataset for the document ID.
  • Handling document conflicts by overwriting, merging, or skipping documents whose ID already exists.
  • Transforming data before it gets imported using a customizable JavaScript function.
  • Producing multiple Firestore inserts/updates from a single dataset item.
  • Giving each document its own configuration, such as a custom collection or document ID.

Input

The Actor requires several input fields to work correctly. Each input field is described in detail below:

| Field Name | Type | Description |
| --- | --- | --- |
| serviceAccountKey | string (secret, required) | Service account key in JSON format. You can get it from Firebase Console -> Project Settings -> Service accounts -> Generate new private key. Paste the whole JSON string here. This is a secret input, so the value is stored in encrypted form. |
| datasetId | string (required) | ID of the Apify dataset to import data from. |
| collection | string (required) | Firestore collection to import data to. If it doesn't exist, it will be created. Note: you can customize the collection for each record by using the transformFunction input. This can be useful when you want to import data to sub-collections. |
| databaseName | string (optional) | Name of the Firestore database. If not provided, the default database ("(default)") will be used. |
| idField | string (optional) | Field in the dataset item that will be used as the Firestore document ID. Its value must be a string or a number. Setting it is useful when you want to update existing documents in Firestore. If not provided, all documents are created with a random ID generated by Firestore, and the value of documentConflictResolution is ignored. Note: you can customize the ID for each document independently using the transformFunction input field. |
| documentConflictResolution | enum: overwrite, merge, skip (required) | How to handle conflicts when importing data to Firestore. overwrite: replace existing Firestore documents with the same ID. merge: merge data from the dataset items with existing Firestore documents. skip: skip documents whose ID already exists. ⚠️ Note that skip performs poorly at large scale because it cannot use batch writes: it makes a separate request to Firestore for each document. |
| transformFunction | string (javascript, optional) | JavaScript function that transforms each item from the dataset before importing it to Firestore. The function must return an object (or an array of objects) with a data key that contains the transformed record, plus other optional fields. See the examples below. |
| batchSize | number (optional) | Number of items to import in a single batch. Lower values are safer but slower; see Firestore limits (10 MiB batch write). Note that the skip conflict resolution does not use batch writes and always imports one item at a time. Defaults to 500. |
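
For orientation, a complete input might look like the sketch below. Every value is a placeholder: the dataset ID, the collection name, and the sku field are hypothetical, and the service account key is shortened to a stub rather than a real key file. Note that transformFunction is passed as a string.

```javascript
// Hypothetical Actor input; every value here is a placeholder.
const input = {
    // The full JSON key file contents go here as a string (stub shown for brevity).
    serviceAccountKey: JSON.stringify({ type: "service_account", project_id: "my-project" }),
    datasetId: "YOUR_DATASET_ID",
    collection: "products",
    // databaseName is omitted, so the default database "(default)" is used.
    idField: "sku",                       // dataset field used as the document ID
    documentConflictResolution: "merge",  // one of: overwrite, merge, skip
    transformFunction: "(item) => ({ data: item })",
    batchSize: 500,                       // the documented default
};

console.log(JSON.stringify(input, null, 2));
```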

Transformation Function

The optional transformFunction input field allows you to transform each dataset item before importing it to Firestore. The field accepts a JavaScript function that takes one dataset item as a parameter and returns an object (or an array of objects) with the following keys:

  • data (required): transformed document that will be imported to Firestore.
  • id (optional): custom document ID. If not provided, the document ID is taken from the field named by the idField input. If idField is not set either, the document is created with a random ID generated by Firestore.
  • collection (optional): custom collection name. If not provided, the collection input field will be used.
  • documentConflictResolution (optional): custom conflict resolution for the document. If not provided, the documentConflictResolution input field will be used.
(item) => {
    return {
        data: item,                           // transformed document
        id: item.id,                          // custom document ID
        collection: "customCollection",       // custom collection name
        documentConflictResolution: "merge",  // custom conflict resolution
    };
}
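
The fallback rules above can be sketched as a small helper. This is an illustration only, not the Actor's source code: resolveDocuments is a hypothetical name, and two details are assumptions, namely that idField is looked up on the original dataset item and that numeric IDs are converted to strings.

```javascript
// Illustrative only: a hypothetical helper mirroring the documented fallbacks.
function resolveDocuments(transformResult, item, input) {
    // The transform may return a single object or an array of objects.
    const results = Array.isArray(transformResult) ? transformResult : [transformResult];

    return results.map((result) => {
        // Explicit id first, then the idField value (assumed to be read from the
        // original dataset item). If neither exists, id stays undefined, which
        // stands for "let Firestore generate a random ID".
        let id = result.id;
        if (id === undefined && input.idField !== undefined) {
            id = item[input.idField];
        }
        return {
            data: result.data,
            // Assumption: numeric IDs are stringified before use.
            id: id === undefined ? undefined : String(id),
            collection: result.collection ?? input.collection,
            documentConflictResolution:
                result.documentConflictResolution ?? input.documentConflictResolution,
        };
    });
}

const input = { collection: "products", idField: "sku", documentConflictResolution: "overwrite" };
const item = { sku: 42, name: "Widget" };

// Falls back to the input-level settings and the sku field.
console.log(resolveDocuments({ data: item }, item, input));
// Per-document values take precedence over the input-level settings.
console.log(resolveDocuments({ data: item, id: "custom", collection: "archive" }, item, input));
```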

Examples

  1. Simple transformation function:

    The function below sets newField to the value of oldField plus 1 and removes the unused field from the dataset item.

    (item) => {
        item.newField = item.oldField + 1;
        delete item.unused;
        return { data: item };
    }
  2. Nested objects:

    The function below transforms the dataset item into a Firestore document with nested objects. It updates the subdocument.field field and overwrites the whole author sub-document.

    (item) => {
        return {
            data: {
                title: item.title,
                "subdocument.field": item.name,  // update single field of subdocument
                author: item.author              // overwrite whole subdocument
            },
        };
    }
  3. Field value functions:

    The function below demonstrates how to use Firestore FieldValue functions. It adds new IDs to the existing ids array, removes values from the values array, increments the count field, and deletes the old field.

    (item) => {
        return {
            data: {
                ids: FieldValue.arrayUnion(item.ids),         // add new ids to existing ids array
                values: FieldValue.arrayRemove(item.values),  // remove new values from existing array
                count: FieldValue.increment(item.count),      // increment existing count field by provided value
                old: FieldValue.delete(),                     // removes field
            },
        };
    }
  4. Data types:

    The function below demonstrates how to create Firestore data types such as Timestamp, Vector, GeoPoint, and DocumentReference.

    (item) => {
        return {
            data: {
                updatedAt: Timestamp.fromDate(new Date(item.date)),            // create Timestamp data type from a Date object
                vector: FieldValue.VectorValue(item.values),                   // create vector data type
                position: GeoPoint(item.lat, item.lon),                        // create geopoint data type
                reference: DocumentReference("collection", "referenceDocId"),  // create reference type
            },
        };
    }
  5. Subcollection:

    The function below demonstrates how to import data to sub-collections. It returns an array where the first item is the main document and the remaining items are documents for its sub-collection.

    (item) => {
        const subDocuments = item.items.map((subItem) => ({
            id: subItem.id,
            collection: `records/${item.customId}/items`,
            documentConflictResolution: "skip",
            data: {
                weight: subItem.weight,
                length: subItem.length,
                name: subItem.name,
            },
        }));
    
        return [
            {
                id: item.customId,
                collection: "records",
                documentConflictResolution: "merge",
                data: {
                    title: item.title,
                    description: item.description,
                },
            },
            ...subDocuments,
        ];
    }
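
A transform function that uses only plain JavaScript can be tried locally on a sample item before it is pasted into the Actor input. The sketch below runs the sub-collection function from example 5 on a made-up item and prints the path each returned document would target. The sample item and the name transform are assumptions for illustration. Functions that rely on helpers such as FieldValue or Timestamp cannot be run this way without the environment the Actor provides.

```javascript
// The sub-collection transform from example 5, given a name so it can be called.
const transform = (item) => {
    const subDocuments = item.items.map((subItem) => ({
        id: subItem.id,
        collection: `records/${item.customId}/items`,
        documentConflictResolution: "skip",
        data: { weight: subItem.weight, length: subItem.length, name: subItem.name },
    }));

    return [
        {
            id: item.customId,
            collection: "records",
            documentConflictResolution: "merge",
            data: { title: item.title, description: item.description },
        },
        ...subDocuments,
    ];
};

// Made-up dataset item, for illustration only.
const sampleItem = {
    customId: "rec-1",
    title: "Sample record",
    description: "Two nested items",
    items: [
        { id: "a", weight: 1.5, length: 10, name: "first" },
        { id: "b", weight: 2.0, length: 20, name: "second" },
    ],
};

// One dataset item yields three documents: the parent and two sub-documents.
const targets = transform(sampleItem).map((doc) => `${doc.collection}/${doc.id}`);
console.log(targets);
// prints the three paths: records/rec-1, records/rec-1/items/a, records/rec-1/items/b
```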

Output

The Actor saves statistics about the import to the key-value store under the key Statistics, with the following structure:

  • imported: total number of processed Firestore documents (either created, updated or skipped).
  • skipped: number of skipped Firestore documents.
  • overwritten: number of overwritten Firestore documents.
  • merged: number of merged Firestore documents.
  • created: number of created Firestore documents (when documentConflictResolution is skip, documents that were actually written are counted here).
  • failed: number of failed writes to Firestore documents.
  • itemsProcessed: total number of processed dataset items (including failed items).
  • itemsFailed: number of failed dataset items.
  • executionTimeMs: time in milliseconds it took to import the data.
  • startTime: timestamp when the import started.
  • endTime: timestamp when the import ended.
{
  "imported": 59278,
  "skipped": 0,
  "overwritten": 0,
  "merged": 59278,
  "created": 0,
  "failed": 0,
  "itemsProcessed": 1136,
  "itemsFailed": 0,
  "executionTimeMs": 19725,
  "startTime": "2025-02-26T17:56:22.652Z",
  "endTime": "2025-02-26T17:56:42.377Z"
}
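
The statistics record lends itself to quick sanity checks. The sketch below, using the example record above, compares executionTimeMs with the difference between the two timestamps and derives throughput figures. That the two durations always match exactly is an assumption based on this one example, and the derived metric names are made up for illustration.

```javascript
// Statistics record copied from the example above.
const stats = {
    imported: 59278,
    skipped: 0,
    overwritten: 0,
    merged: 59278,
    created: 0,
    failed: 0,
    itemsProcessed: 1136,
    itemsFailed: 0,
    executionTimeMs: 19725,
    startTime: "2025-02-26T17:56:22.652Z",
    endTime: "2025-02-26T17:56:42.377Z",
};

// Wall-clock duration derived from the two ISO timestamps, in milliseconds.
const durationMs = Date.parse(stats.endTime) - Date.parse(stats.startTime);
console.log(durationMs === stats.executionTimeMs); // true for this record

// Documents written per second, rounded to the nearest integer.
const docsPerSecond = Math.round(stats.imported / (stats.executionTimeMs / 1000));
console.log(docsPerSecond); // 3005

// Average number of Firestore documents produced per dataset item.
const docsPerItem = (stats.imported / stats.itemsProcessed).toFixed(1);
console.log(docsPerItem); // "52.2"
```

In this record each dataset item produced about 52 documents on average, which is what a transform function returning an array of documents would lead to.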