[WIP] mongo: add documentation for private preview #4165

1 change: 1 addition & 0 deletions docs/integrations/data-ingestion/clickpipes/index.md
@@ -47,6 +47,7 @@ import Image from '@theme/IdealImage';
| [Amazon Kinesis](/integrations/clickpipes/kinesis) | <Amazonkinesis class="image" alt="Amazon Kinesis logo" style={{width: '3rem', height: 'auto'}}/> |Streaming| Stable | Configure ClickPipes and start ingesting streaming data from Amazon Kinesis into ClickHouse Cloud. |
| [Postgres](/integrations/clickpipes/postgres) | <Postgressvg class="image" alt="Postgres logo" style={{width: '3rem', height: 'auto'}}/> |DBMS| Stable | Configure ClickPipes and start ingesting data from Postgres into ClickHouse Cloud. |
| [MySQL](/integrations/clickpipes/mysql) | <Mysqlsvg class="image" alt="MySQL logo" style={{width: '3rem', height: 'auto'}}/> |DBMS| Private Beta | Configure ClickPipes and start ingesting data from MySQL into ClickHouse Cloud. |
| [MongoDB](/integrations/clickpipes/mongodb) | <Mongodbsvg class="image" alt="MongoDB logo" style={{width: '3rem', height: 'auto'}}/> |DBMS| Private Preview | Configure ClickPipes and start ingesting data from MongoDB into ClickHouse Cloud. |

More connectors will be added to ClickPipes. You can find out more by [contacting us](https://clickhouse.com/company/contact?loc=clickpipes).

@@ -0,0 +1,24 @@
---
title: 'ClickPipes for MongoDB: Supported data types'
slug: /integrations/clickpipes/mongodb/datatypes
description: 'Page describing MongoDB ClickPipe datatype mapping from MongoDB to ClickHouse'
---

MongoDB BSON documents are stored in ClickHouse as the `JSON` data type. Fields within each document are recursively mapped to ClickHouse data types according to the following table:

| MongoDB Type | ClickHouse JSON Field Type | Notes |
| ------------------------ | -------------------------------------- | ------------------------ |
| ObjectId | String | |
| String | String | |
| 32-bit integer | Int64 | |
| 64-bit integer | Int64 | |
| Double | Float64 | |
| Boolean | Bool | |
| Date | String | ISO 8601 format |
| Regular Expression | {Options: String, Pattern: String} | |
| Timestamp                | {T: Int64, I: Int64}                   | MongoDB internal timestamp format |
| Decimal128               | String                                 |                          |
| Array                    | Array(Nullable(String))                |                          |
| Binary data              | {Data: String, Subtype: Int64}         | See [MongoDB binary subtypes](https://www.mongodb.com/docs/manual/reference/bson-types/#binary-data) for reference |
| JavaScript               | String                                 |                          |
| Object                   | Dynamic type                           | The type of each field is determined by applying this mapping recursively |
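
The recursive mapping above can be sketched in a few lines of JavaScript. This is only an illustration that restates the table: `BSON_TO_CLICKHOUSE` and `describeMapping` are hypothetical names, not part of ClickPipes, and the input uses BSON type names as plain strings rather than real BSON values.

```javascript
// Hypothetical lookup table that restates the mapping documented above.
const BSON_TO_CLICKHOUSE = new Map([
  ["ObjectId", "String"],
  ["String", "String"],
  ["Int32", "Int64"],
  ["Int64", "Int64"],
  ["Double", "Float64"],
  ["Boolean", "Bool"],
  ["Date", "String"], // ISO 8601 format
  ["RegularExpression", "{Options: String, Pattern: String}"],
  ["Timestamp", "{T: Int64, I: Int64}"],
  ["Decimal128", "String"],
  ["Array", "Array(Nullable(String))"],
  ["BinaryData", "{Data: String, Subtype: Int64}"],
  ["JavaScript", "String"],
]);

// Takes a description of a document whose leaves are BSON type names and
// whose nested objects are plain objects; nested objects are mapped recursively.
function describeMapping(doc) {
  const result = {};
  for (const [field, bsonType] of Object.entries(doc)) {
    if (typeof bsonType === "object" && bsonType !== null) {
      result[field] = describeMapping(bsonType);
    } else if (BSON_TO_CLICKHOUSE.has(bsonType)) {
      result[field] = BSON_TO_CLICKHOUSE.get(bsonType);
    } else {
      throw new Error(`No mapping listed for BSON type: ${bsonType}`);
    }
  }
  return result;
}

const mapped = describeMapping({
  _id: "ObjectId",
  qty: "Int32",
  createdAt: "Date",
  address: { zip: "String" },
});
console.log(JSON.stringify(mapped));
```

Note that both 32-bit and 64-bit integers end up as `Int64`, and dates arrive as strings, so any date arithmetic on the ClickHouse side needs an explicit conversion.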
25 changes: 25 additions & 0 deletions docs/integrations/data-ingestion/clickpipes/mongodb/faq.md
@@ -0,0 +1,25 @@
---
sidebar_label: 'FAQ'
description: 'Frequently asked questions about ClickPipes for MongoDB.'
slug: /integrations/clickpipes/mongodb/faq
sidebar_position: 2
title: 'ClickPipes for MongoDB FAQ'
---

# ClickPipes for MongoDB FAQ

### How do I flatten the nested MongoDB documents in ClickHouse? {#how-do-i-flatten-mongodb-documents-in-clickhouse}

### What read preference should I select for my MongoDB CDC ClickPipe? {#what-read-preference-should-i-select-for-my-mongodb-cdc-clickpipe}

### What happens if I update a schema in my MongoDB table? {#what-happens-if-i-update-a-schema-in-my-mongodb-table}

### How does idling affect my MongoDB CDC ClickPipe? {#how-does-idling-affect-my-mongodb-cdc-clickpipe}

### Can I connect MongoDB databases that don't have a public IP or are in private networks? {#can-i-connect-mongodb-databases-that-dont-have-a-public-ip-or-are-in-private-networks}

### What happens if I delete a table from my MongoDB database? {#what-happens-if-i-delete-a-table-from-my-mongodb-database}

### Is DocumentDB supported? {#is-documentdb-supported}
110 changes: 110 additions & 0 deletions docs/integrations/data-ingestion/clickpipes/mongodb/index.md
@@ -0,0 +1,110 @@
---
sidebar_label: 'Ingesting Data from MongoDB to ClickHouse'
description: 'Describes how to seamlessly connect your MongoDB to ClickHouse Cloud.'
slug: /integrations/clickpipes/mongodb
title: 'Ingesting data from MongoDB to ClickHouse (using CDC)'
---

import BetaBadge from '@theme/badges/BetaBadge';
import cp_service from '@site/static/images/integrations/data-ingestion/clickpipes/cp_service.png';
import cp_step0 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step0.png';
import mongodb_tile from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/mongodb-tile.png'
import mongodb_connection_details from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/mongodb-connection-details.png'
import ssh_tunnel from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/ssh-tunnel.jpg'
import select_destination_db from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/select-destination-db.png'
import ch_permissions from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/ch-permissions.jpg'
import Image from '@theme/IdealImage';

# Ingesting data from MongoDB to ClickHouse (using CDC)

<BetaBadge/>

:::info
Currently, ingesting data from MongoDB to ClickHouse Cloud via ClickPipes is in Private Preview.
:::

You can use ClickPipes to ingest data from your MongoDB database into ClickHouse Cloud. The source MongoDB database can be hosted on-premises or in the cloud using services like MongoDB Atlas.

## Prerequisites {#prerequisites}

To get started, you first need to ensure that your MongoDB database is correctly configured for replication. The configuration steps depend on how you're deploying MongoDB, so please follow the relevant guide below:

1. [MongoDB Atlas](./mongodb/source/atlas)

2. [Generic MongoDB](./mongodb/source/generic)

Once your source MongoDB database is set up, you can continue creating your ClickPipe.

## Create your ClickPipe {#create-your-clickpipe}

Make sure you are logged in to your ClickHouse Cloud account. If you don't have an account yet, you can sign up [here](https://cloud.clickhouse.com/).

[//]: # ( TODO update image here)
1. In the ClickHouse Cloud console, navigate to your ClickHouse Cloud Service.

<Image img={cp_service} alt="ClickPipes service" size="lg" border/>

2. Select the `Data Sources` button on the left-side menu and click on "Set up a ClickPipe".

<Image img={cp_step0} alt="Select imports" size="lg" border/>

3. Select the `MongoDB CDC` tile.

<Image img={mongodb_tile} alt="Select MongoDB" size="lg" border/>

### Add your source MongoDB database connection {#add-your-source-mongodb-database-connection}

4. Fill in the connection details for the source MongoDB database that you configured in the prerequisites step.

:::info
Before you start adding your connection details, make sure that you have whitelisted the ClickPipes IP addresses in your firewall rules. See the [list of ClickPipes IP addresses](../index.md#list-of-static-ips).
For more information, refer to the source MongoDB setup guides linked at [the top of this page](#prerequisites).
:::

<Image img={mongodb_connection_details} alt="Fill in connection details" size="lg" border/>

#### (Optional) Set up SSH Tunneling {#optional-set-up-ssh-tunneling}

You can specify SSH tunneling details if your source MongoDB database is not publicly accessible.

1. Enable the "Use SSH Tunnelling" toggle.
2. Fill in the SSH connection details.

<Image img={ssh_tunnel} alt="SSH tunneling" size="lg" border/>

3. To use key-based authentication, click on "Revoke and generate key pair" to generate a new key pair, then copy the generated public key to `~/.ssh/authorized_keys` on your SSH server.
4. Click on "Verify Connection" to verify the connection.

:::note
Make sure to whitelist [ClickPipes IP addresses](../clickpipes#list-of-static-ips) in your firewall rules for the SSH bastion host so that ClickPipes can establish the SSH tunnel.
:::

Once the connection details are filled in, click `Next`.

#### Configure advanced settings {#advanced-settings}

You can configure the advanced settings if needed. A brief description of each setting is provided below:

- **Sync interval**: The interval at which ClickPipes polls the source database for changes. This setting affects the destination ClickHouse service: for cost-sensitive users, we recommend keeping it at a higher value (over `3600`).
- **Pull batch size**: The number of rows to fetch in a single batch. This is a best-effort setting and may not be respected in all cases.
- **Snapshot number of tables in parallel**: The number of tables fetched in parallel during the initial snapshot. This is useful when you have a large number of tables and want to limit how many are fetched at once.
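
To get a feel for the sync interval trade-off, the sketch below counts how many sync cycles a given interval produces per day. It assumes the interval is expressed in seconds, which the description above does not state explicitly, and `syncsPerDay` is a hypothetical helper rather than anything ClickPipes exposes.

```javascript
// Assumption: the sync interval is given in seconds.
const SECONDS_PER_DAY = 24 * 60 * 60;

// Returns the number of complete sync cycles that fit into one day.
function syncsPerDay(syncIntervalSeconds) {
  if (!Number.isFinite(syncIntervalSeconds) || syncIntervalSeconds <= 0) {
    throw new RangeError("sync interval must be a positive number");
  }
  return Math.floor(SECONDS_PER_DAY / syncIntervalSeconds);
}

console.log(syncsPerDay(60));   // 1440
console.log(syncsPerDay(3600)); // 24
```

Under that assumption, moving from a one-minute interval to the recommended value of `3600` reduces the number of daily syncs by a factor of 60, at the cost of higher replication lag.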

### Configure the tables {#configure-the-tables}

5. Here you can select the destination database for your ClickPipe. You can either select an existing database or create a new one.

<Image img={select_destination_db} alt="Select destination database" size="lg" border/>

6. You can select the tables you want to replicate from the source MongoDB database. While selecting the tables, you can also choose to rename the tables in the destination ClickHouse database.

### Review permissions and start the ClickPipe {#review-permissions-and-start-the-clickpipe}

7. Select the "Full access" role from the permissions dropdown and click "Complete Setup".

<Image img={ch_permissions} alt="Review permissions" size="lg" border/>

Finally, please refer to the ["ClickPipes for MongoDB FAQ"](/integrations/clickpipes/mongodb/faq) page for more information about common issues and how to resolve them.

## What's next? {#whats-next}

Once you've set up your ClickPipe to replicate data from MongoDB to ClickHouse Cloud, you can focus on how to query and model your data for optimal performance. For common questions around MongoDB CDC and troubleshooting, see the [MongoDB FAQs page](/integrations/data-ingestion/clickpipes/mongodb/faq.md).
@@ -0,0 +1,55 @@
---
sidebar_label: 'MongoDB Atlas'
description: 'Step-by-step guide on how to set up MongoDB Atlas as a source for ClickPipes'
slug: /integrations/clickpipes/mongodb/source/atlas
title: 'MongoDB Atlas source setup guide'
---

import mongo_atlas_add_user from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/mongo-atlas-add-new-database-user.png'
import mongo_atlas_add_roles from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/mongo-atlas-database-user-privilege.png'
import mongo_atlas_restrict_access from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/mongo-atlas-restrict-access.png'
import Image from '@theme/IdealImage';

# MongoDB Atlas source setup guide

## Enable oplog retention {#enable-oplog-retention}

A minimum oplog retention of 24 hours is required for replication. The oplog retention must also be longer than the time it takes to complete the initial snapshot.

You can check your current oplog retention by running the following command in the MongoDB shell:

```javascript
db.serverStatus().oplogTruncation.oplogMinRetentionHours
```

To set the oplog retention to 72 hours, run the following command as an admin user:

```javascript
db.adminCommand({
    "replSetResizeOplog": 1,
    "minRetentionHours": 72
})
```
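
The two retention requirements stated above (at least 24 hours, and longer than the initial snapshot) can be checked together. The following is a minimal sketch under assumptions: `checkOplogRetention` is a hypothetical helper, and the snapshot duration is an estimate you have to supply yourself.

```javascript
// Minimum retention required for replication, as stated above.
const MIN_RETENTION_HOURS = 24;

// Hypothetical helper: checks a retention value against both requirements.
// `estimatedSnapshotHours` is your own estimate of the initial snapshot time.
function checkOplogRetention(retentionHours, estimatedSnapshotHours) {
  const problems = [];
  if (retentionHours < MIN_RETENTION_HOURS) {
    problems.push(`retention is below the ${MIN_RETENTION_HOURS}-hour minimum`);
  }
  if (retentionHours <= estimatedSnapshotHours) {
    problems.push("retention is not longer than the estimated snapshot time");
  }
  return { ok: problems.length === 0, problems };
}

console.log(checkOplogRetention(72, 30)); // passes both checks
console.log(checkOplogRetention(24, 30)); // fails: snapshot would outlast the oplog
```

In the MongoDB shell, the first argument could be the value returned by `db.serverStatus().oplogTruncation.oplogMinRetentionHours`.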

## Configure a database user {#configure-database-user}

Once you are logged in to your Atlas console, click `Database Access` under the Security tab in the left navigation bar, then click "Add New Database User".

ClickPipes requires password authentication:

<Image img={mongo_atlas_add_user} alt="Add database user" size="lg" border/>

ClickPipes requires a user with the following roles:

- `readAnyDatabase`
- `clusterMonitor`

<Image img={mongo_atlas_add_roles} alt="Configure user roles" size="lg" border/>

You can specify which cluster(s)/instance(s) the ClickPipes user is allowed to access:

<Image img={mongo_atlas_restrict_access} alt="Restrict cluster/instance access" size="lg" border/>

## What's next? {#whats-next}

You can now [create your ClickPipe](../index.md) and start ingesting data from your MongoDB instance into ClickHouse Cloud.
Make sure to note down the connection details you used while setting up your MongoDB instance as you will need them during the ClickPipe creation process.
@@ -0,0 +1,57 @@
---
sidebar_label: 'Generic MongoDB'
description: 'Set up any MongoDB instance as a source for ClickPipes'
slug: /integrations/clickpipes/mongodb/source/generic
title: 'Generic MongoDB source setup guide'
---

# Generic MongoDB source setup guide

:::info

If you use one of the supported providers (in the sidebar), please refer to the specific guide for that provider.

:::

## Enable oplog retention {#enable-oplog-retention}

A minimum oplog retention of 24 hours is required for replication. The oplog retention must also be longer than the time it takes to complete the initial snapshot.

You can check your current oplog retention by running the following command in the MongoDB shell:

```javascript
db.serverStatus().oplogTruncation.oplogMinRetentionHours
```

To set the oplog retention to 72 hours, run the following command as an admin user:

```javascript
db.adminCommand({
    "replSetResizeOplog": 1,
    "minRetentionHours": 72
})
```

## Configure a database user {#configure-database-user}

Connect to your MongoDB instance as an admin user and execute the following command to create a user for MongoDB CDC ClickPipes:

```javascript
use admin;
db.createUser({
    user: "clickpipes_user",
    pwd: "some_secure_password",
    roles: ["readAnyDatabase", "clusterMonitor"],
})
```

:::note

Make sure to replace `clickpipes_user` and `some_secure_password` with your desired username and password.

:::
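
If you want to verify the new user before creating the ClickPipe, you can connect with a standard MongoDB connection string. The sketch below builds one and percent-encodes the credentials, which matters when the password contains characters such as `@`, `:`, or `/`. The function name `buildMongoUri`, the host `mongo.example.internal`, and the default port are illustrative assumptions; `authSource=admin` reflects that the user above was created in the `admin` database.

```javascript
// Hypothetical helper: builds a mongodb:// URI with percent-encoded credentials.
function buildMongoUri({ user, password, host, port = 27017, authSource = "admin" }) {
  const encodedUser = encodeURIComponent(user);
  const encodedPassword = encodeURIComponent(password);
  const encodedAuthSource = encodeURIComponent(authSource);
  return `mongodb://${encodedUser}:${encodedPassword}@${host}:${port}/?authSource=${encodedAuthSource}`;
}

// Example values only; replace them with your own host and credentials.
const uri = buildMongoUri({
  user: "clickpipes_user",
  password: "p@ss:word/1",
  host: "mongo.example.internal",
});
console.log(uri);
// mongodb://clickpipes_user:p%40ss%3Aword%2F1@mongo.example.internal:27017/?authSource=admin
```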

## What's next? {#whats-next}

You can now [create your ClickPipe](../index.md) and start ingesting data from your MongoDB instance into ClickHouse Cloud.
Make sure to note down the connection details you used while setting up your MongoDB instance as you will need them during the ClickPipe creation process.