Firestore Import

Firestore Import is an Apify integration Actor that imports data from an Apify dataset into Firebase Firestore (a NoSQL cloud database built on Google Cloud infrastructure). It lets you configure the target collection, how data conflicts are handled, and how each dataset item is transformed before it is imported into Firestore.

Features

The Firestore Import Actor takes a dataset, applies transformations, and imports the data into a Firestore database. The Actor is highly customizable. You can control how the data is imported by:

Selecting Firestore database and collection.
Automatically generating document IDs or using a field from the dataset for the document ID.
Handling document conflicts by overwriting, merging, or skipping documents whose ID already exists.
Transforming data before it gets imported using a customizable JavaScript function.
One dataset item can lead to multiple Firestore inserts/updates.
Each document can have its own configuration, such as a custom collection or document ID.
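
To make the three conflict strategies concrete, here is a minimal in-memory sketch. A plain `Map` stands in for a Firestore collection, and `applyWrite` is a hypothetical helper written for this illustration, not part of the Actor. The merge shown is shallow, and the returned labels are illustrative only; they are not meant to mirror the Actor's statistics counters.

```javascript
// In-memory illustration only: a Map stands in for a Firestore collection.
// applyWrite is a hypothetical helper, not part of the Actor's code.
function applyWrite(collection, id, data, resolution) {
    const existing = collection.get(id);
    if (existing === undefined) {
        // No conflict: the document is simply written.
        collection.set(id, { ...data });
        return "created";
    }
    switch (resolution) {
        case "overwrite":
            // Replace the whole existing document.
            collection.set(id, { ...data });
            return "overwritten";
        case "merge":
            // Keep existing fields, update those present in the new data (shallow).
            collection.set(id, { ...existing, ...data });
            return "merged";
        case "skip":
            // Leave the existing document untouched.
            return "skipped";
        default:
            throw new Error(`Unknown conflict resolution: ${resolution}`);
    }
}

const products = new Map([["a1", { title: "Old title", price: 10 }]]);
console.log(applyWrite(products, "a1", { title: "New title" }, "merge"), products.get("a1"));
```

With `merge`, the `price` field of the existing document survives; with `overwrite`, it would be dropped.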

Input

The Actor requires several input fields to work correctly. Each input field is described below:

| Field Name | Type | Description |
| --- | --- | --- |
| `serviceAccountKey` | `string` (secret, required) | Service account key in JSON format. You can get it from Firebase Console -> Project Settings -> Service accounts -> Generate new private key. Paste the whole JSON string here. This is a secret input, so the value is stored in encrypted form. |
| `datasetId` | `string` (required) | ID of the Apify dataset to import data from. |
| `collection` | `string` (required) | Firestore collection to import data to. If it doesn't exist, it will be created. Note: you can customize the collection for each record by using the `transformFunction` input. This is useful when you want to import data to sub-collections. |
| `databaseName` | `string` (optional) | Name of the Firestore database. If not provided, the default database (`"(default)"`) is used. |
| `idField` | `string` (optional) | Field in the dataset item that is used as the Firestore document ID. Its value must be a `string` or a `number`. If not provided, all documents are created with a random ID generated by Firestore (in which case the value of `documentConflictResolution` is ignored). Setting this field is useful when you want to update existing documents in Firestore. Note: you can customize the ID for each document independently using the `transformFunction` input field. |
| `documentConflictResolution` | `enum`: `overwrite`, `merge`, `skip` (required) | How to handle conflicts when importing data to Firestore. `overwrite`: replace existing Firestore documents with the same ID. `merge`: merge data from the dataset items into existing Firestore documents. `skip`: skip documents whose ID already exists. ⚠️ The `skip` resolution performs poorly at large scale because it can't use batch writes (it sends a separate request to Firestore for each document). |
| `transformFunction` | `string` (javascript, optional) | JavaScript function that transforms each item from the dataset before importing it to Firestore. The function must return an object (or an array of objects) with a `data` key that contains the transformed record, plus other optional fields. See the examples below. |
| `batchSize` | `number` (optional) | Number of items to import in a single batch. Lower values are safer but slower; see the Firestore limits (10 MiB per batch write). The `skip` conflict resolution does not use batch writes and always imports one item at a time. Defaults to `500`. |
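
The effect of `batchSize` can be pictured as splitting the documents into consecutive chunks, each of which would be written in one batch. The `chunk` helper below is hypothetical and only illustrates the idea; it is not the Actor's implementation.

```javascript
// Hypothetical helper: split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
    if (!Number.isInteger(size) || size < 1) {
        throw new RangeError("size must be a positive integer");
    }
    const chunks = [];
    for (let i = 0; i < items.length; i += size) {
        chunks.push(items.slice(i, i + size));
    }
    return chunks;
}

// 1,200 placeholder documents with the default batchSize of 500
const documents = Array.from({ length: 1200 }, (_, i) => ({ id: `doc-${i}` }));
const batches = chunk(documents, 500);
console.log(batches.map((batch) => batch.length)); // [ 500, 500, 200 ]
```

A smaller `batchSize` produces more, smaller batches, which keeps each batch further from the size limit at the cost of more round trips.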

Transformation Function

The optional `transformFunction` input field allows you to transform each dataset item before it is imported to Firestore. The field accepts a JavaScript function that takes one dataset item as a parameter and returns an object (or an array of objects) with the following keys:

data (required): transformed document that will be imported to Firestore.
id (optional): custom document ID. If not provided, the document ID is taken from the field named by the idField input. If idField is not set either, the document is created with a random ID generated by Firestore.
collection (optional): custom collection name. If not provided, the collection input field will be used.
documentConflictResolution (optional): custom conflict resolution for the document. If not provided, the documentConflictResolution input field will be used.

```
(item) => {
    return {
        data: item,                           // transformed document
        id: item.id,                          // custom document ID
        collection: "customCollection",       // custom collection name
        documentConflictResolution: "merge",  // custom conflict resolution
    };
}
```
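
The fallback rules described above can be sketched as a small resolver. `resolveTarget` is a hypothetical function written for this illustration, and the Actor's internal code may be structured differently. Two assumptions are made here: a `null` id stands for "let Firestore generate a random ID", and numeric IDs are converted to strings.

```javascript
// Hypothetical sketch of the documented fallback order; not the Actor's actual code.
function resolveTarget(result, item, input) {
    // id: value returned by transformFunction, then the idField of the item,
    // otherwise left for Firestore to generate.
    let id = result.id;
    if (id === undefined && input.idField !== undefined) {
        id = item[input.idField];
    }
    return {
        data: result.data,
        // Assumption: null means "auto-generated ID"; numbers are stringified.
        id: id === undefined ? null : String(id),
        collection: result.collection ?? input.collection,
        documentConflictResolution:
            result.documentConflictResolution ?? input.documentConflictResolution,
    };
}

const input = { collection: "products", idField: "sku", documentConflictResolution: "overwrite" };
const item = { sku: 42, title: "Lamp" };

// Only `data` is returned, so everything else falls back to the Actor input.
console.log(resolveTarget({ data: item }, item, input));
// Per-document values take precedence over the Actor input.
console.log(resolveTarget({ data: item, id: "custom", collection: "archive" }, item, input));
```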

Examples

Simple transformation function:

The function below stores the value of oldField incremented by 1 in newField and removes the unused field from the dataset item.
```
(item) => {
    item.newField = item.oldField + 1;
    delete item.unused;
    return { data: item };
}
```

Nested objects:

The function below transforms the dataset item into a Firestore document with nested objects. It updates the subdocument.field field and overwrites the whole author sub-document.

```
(item) => {
    return {
        data: {
            title: item.title,
            "subdocument.field": item.name,  // update single field of subdocument
            author: item.author              // overwrite whole subdocument
        },
    };
}
```

Field value functions:

The function below demonstrates how to use Firestore FieldValue functions. It adds new IDs to the existing ids array, removes values from the values array, increments the count field, and deletes the old field.

```
(item) => {
    return {
        data: {
            ids: FieldValue.arrayUnion(item.ids),         // add new ids to existing ids array
            values: FieldValue.arrayRemove(item.values),  // remove new values from existing array
            count: FieldValue.increment(item.count),      // increment existing count field by provided value
            old: FieldValue.delete(),                     // removes field
        },
    };
}
```

Data types:

The function below demonstrates how to create Firestore data types such as Timestamp, Vector, GeoPoint, and DocumentReference.

```
(item) => {
    return {
        data: {
            updatedAt: Timestamp.fromDate(new Date(item.date)),            // create Timestamp data type from a Date object
            vector: FieldValue.VectorValue(item.values),                   // create vector data type
            position: GeoPoint(item.lat, item.lon),                        // create geopoint data type
            reference: DocumentReference("collection", "referenceDocId"),  // create reference type
        },
    };
}
```

Subcollection:

The function below demonstrates how to import data to sub-collections. It returns an array in which the first item is the main document and the remaining items are documents for its sub-collection.

```
(item) => {
    const subDocuments = item.items.map((subItem) => ({
        id: subItem.id,
        collection: `records/${item.customId}/items`,
        documentConflictResolution: "skip",
        data: {
            weight: subItem.weight,
            length: subItem.length,
            name: subItem.name,
        },
    }));

    return [
        {
            id: item.customId,
            collection: "records",
            documentConflictResolution: "merge",
            data: {
                title: item.title,
                description: item.description,
            },
        },
        ...subDocuments,
    ];
}
```
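
Before running an import, it can help to try a transformation function on a sample item locally. The sketch below does this for a shortened version of the sub-collection function above. The `normalize` helper and the sample item are hypothetical and exist only for this illustration.

```javascript
// Shortened version of the sub-collection transformation shown above.
const transformFunction = (item) => [
    { id: item.customId, collection: "records", data: { title: item.title } },
    ...item.items.map((subItem) => ({
        id: subItem.id,
        collection: `records/${item.customId}/items`,
        data: { name: subItem.name },
    })),
];

// Hypothetical helper: the function may return one object or an array of objects,
// and every returned object needs a `data` key.
function normalize(result) {
    const documents = Array.isArray(result) ? result : [result];
    for (const doc of documents) {
        if (doc === null || typeof doc !== "object" || !("data" in doc)) {
            throw new TypeError("Each returned object must contain a `data` key");
        }
    }
    return documents;
}

// Made-up sample item with two nested items.
const sampleItem = {
    customId: "r1",
    title: "Record 1",
    items: [{ id: "i1", name: "First" }, { id: "i2", name: "Second" }],
};

// Print the target path and data of every document the item would produce.
for (const doc of normalize(transformFunction(sampleItem))) {
    console.log(`${doc.collection}/${doc.id}`, doc.data);
}
```

Here one dataset item yields three Firestore documents: the main record and two documents in its `items` sub-collection.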

Output

The Actor saves statistics about the import to the Key-Value store under the key Statistics, with the following structure:

imported: total number of processed Firestore documents (either created, updated or skipped).
skipped: number of skipped Firestore documents.
overwritten: number of overwritten Firestore documents.
merged: number of merged Firestore documents.
created: number of created Firestore documents (when documentConflictResolution is skip, this counts the documents that were written).
failed: number of failed writes to Firestore documents.
itemsProcessed: total number of processed dataset items (including failed items).
itemsFailed: number of failed dataset items.
executionTimeMs: time in milliseconds it took to import the data.
startTime: timestamp when the import started.
endTime: timestamp when the import ended.

```
{
  "imported": 59278,
  "skipped": 0,
  "overwritten": 0,
  "merged": 59278,
  "created": 0,
  "failed": 0,
  "itemsProcessed": 1136,
  "itemsFailed": 0,
  "executionTimeMs": 19725,
  "startTime": "2025-02-26T17:56:22.652Z",
  "endTime": "2025-02-26T17:56:42.377Z"
}
```
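
As a quick sanity check, a few derived figures can be computed from the statistics record. The sketch below uses the sample output above; `summarize` is a hypothetical helper, not something the Actor provides.

```javascript
// Hypothetical helper that derives a few figures from the Statistics record.
function summarize(stats) {
    return {
        // Duration recomputed from the two ISO timestamps.
        durationMs: Date.parse(stats.endTime) - Date.parse(stats.startTime),
        // Rough throughput in Firestore documents per second.
        documentsPerSecond: stats.imported / (stats.executionTimeMs / 1000),
        // Average number of Firestore documents produced per dataset item.
        documentsPerItem: stats.imported / stats.itemsProcessed,
    };
}

// Values copied from the sample output.
const stats = {
    imported: 59278,
    itemsProcessed: 1136,
    executionTimeMs: 19725,
    startTime: "2025-02-26T17:56:22.652Z",
    endTime: "2025-02-26T17:56:42.377Z",
};

const summary = summarize(stats);
console.log(summary.durationMs === stats.executionTimeMs); // true
console.log(summary.documentsPerSecond.toFixed(0), summary.documentsPerItem.toFixed(1));
```

In this sample, `imported` is much larger than `itemsProcessed` because a single dataset item can produce several Firestore documents.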
