README.md: 15 additions & 27 deletions
@@ -2,7 +2,7 @@
# terraform-google-bigquery-loader-pubsub-ce
-
A Terraform module which deploys the requisite micro-services for loading BigQuery on Google running on top of Compute Engine. If you want to use a custom image for this deployment you will need to ensure it is based on top of Ubuntu 20.04.
+
A Terraform module which deploys the BigQuery Loader application on Google running on top of Compute Engine. If you want to use a custom image for this deployment you will need to ensure it is based on top of Ubuntu 20.04.
## Telemetry
@@ -20,13 +20,7 @@ For details on what information is collected please see this module: https://git
## Usage
-
This module will deploy three seperate instance groups:
-
-
1. `mutator`: Attempts to create the events table if it doesn't exist and then listens for new `types` to update the table with as custom events and entities are tracked
-
2. `repeater`: Events that were sent with custom `events` and `entities` that have not yet been added to the events table will be re-tried later by the repeater
-
3. `streamloader`: Core application which pulls data from an Enriched events topic and loads into BigQuery
-
-
The mutator is deployed as a `singleton` instance but both the `repeater` and `streamloader` can be scaled horizontally if higher throughput is needed.
+
The BigQuery Loader reads data from a Snowplow Enriched output PubSub topic and writes it in real time to a BigQuery events table.
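As a rough illustration of the single-application layout, the module call might look something like the sketch below. It only sets optional inputs that appear in the Inputs table of this diff, using their documented defaults. The module label `bigquery_loader` and the registry address in `source` are assumptions inferred from the repository name, and the required inputs are not visible in this excerpt, so they are left out.

```hcl
# Hypothetical sketch: not a complete or authoritative configuration.
module "bigquery_loader" {
  # Assumed registry address, inferred from the repository name.
  source = "snowplow-devops/bigquery-loader-pubsub-ce/google"

  # Required inputs are not shown in this excerpt of the diff and
  # would still need to be supplied here.

  # Single sizing inputs; these replace the per-service
  # machine_type_* and target_size_* inputs removed in this change.
  machine_type = "e2-small"
  target_size  = 1

  bigquery_table_id = "events"

  # Iglu resolver cache settings added in this change.
  iglu_cache_size        = 500
  iglu_cache_ttl_seconds = 600

  # Other inputs added in this change.
  healthcheck_enabled = true
  skip_schemas        = []
}
```

Configurations that previously set `machine_type_streamloader` or `target_size_streamloader` would presumably move those values to `machine_type` and `target_size`; the diff shows the rename but not a migration path.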
```hcl
# NOTE: Needs to be fed by the enrich module with valid Snowplow Events
| <a name="input_bigquery_partition_column"></a> [bigquery\_partition\_column](#input\_bigquery\_partition\_column) | The partition column to use in the dataset | `string` | `"collector_tstamp"` | no |
| <a name="input_bigquery_require_partition_filter"></a> [bigquery\_require\_partition\_filter](#input\_bigquery\_require\_partition\_filter) | Whether to require a filter on the partition column in all queries | `bool` | `true` | no |
| <a name="input_bigquery_table_id"></a> [bigquery\_table\_id](#input\_bigquery\_table\_id) | The ID of the table within a dataset to load data into (will be created if it doesn't exist) | `string` | `"events"` | no |
+
| <a name="input_service_account_json_b64"></a> [bigquery\_service\_account\_json\_b64](#input\_bigquery\_service\_account\_json\_b64) | Custom credentials (as base64 encoded service account key) instead of default service account assigned to the loader's compute group | `string` | `""` | no |
| <a name="input_custom_iglu_resolvers"></a> [custom\_iglu\_resolvers](#input\_custom\_iglu\_resolvers) | The custom Iglu Resolvers that will be used by the loader to resolve and validate events | <pre>list(object({<br> name = string<br> priority = number<br> uri = string<br> api_key = string<br> vendor_prefixes = list(string)<br> }))</pre> | `[]` | no |
| <a name="input_default_iglu_resolvers"></a> [default\_iglu\_resolvers](#input\_default\_iglu\_resolvers) | The default Iglu Resolvers that will be used by the loader to resolve and validate events | <pre>list(object({<br> name = string<br> priority = number<br> uri = string<br> api_key = string<br> vendor_prefixes = list(string)<br> }))</pre> | <pre>[<br> {<br> "api_key": "",<br> "name": "Iglu Central",<br> "priority": 10,<br> "uri": "http://iglucentral.com",<br> "vendor_prefixes": []<br> },<br> {<br> "api_key": "",<br> "name": "Iglu Central - Mirror 01",<br> "priority": 20,<br> "uri": "http://mirror01.iglucentral.com",<br> "vendor_prefixes": []<br> }<br>]</pre> | no |
-
| <a name="input_gcp_logs_enabled"></a> [gcp\_logs\_enabled](#input\_gcp\_logs\_enabled) | Whether application logs should be reported to GCP Logging | `bool` | `true` | no |
+
| <a name="input_iglu_cache_size"></a> [iglu\_cache\_size](#input\_iglu\_cache\_size) | The size of cache used by Iglu Resolvers | `number` | `500` | no |
+
| <a name="input_iglu_cache_ttl_seconds"></a> [iglu\_cache\_ttl\_seconds](#input\_iglu\_cache\_ttl\_seconds) | Duration in seconds, how long should entries be kept in Iglu Resolvers cache before they expire | `number` | `600` | no |
| <a name="input_java_opts"></a> [java\_opts](#input\_java\_opts) | Custom JAVA Options | `string` | `"-XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75"` | no |
| <a name="input_labels"></a> [labels](#input\_labels) | The labels to append to this resource | `map(string)` | `{}` | no |
-
| <a name="input_machine_type_mutator"></a> [machine\_type\_mutator](#input\_machine\_type\_mutator) | The machine type to use | `string` | `"e2-small"` | no |
-
| <a name="input_machine_type_repeater"></a> [machine\_type\_repeater](#input\_machine\_type\_repeater) | The machine type to use | `string` | `"e2-small"` | no |
-
| <a name="input_machine_type_streamloader"></a> [machine\_type\_streamloader](#input\_machine\_type\_streamloader) | The machine type to use | `string` | `"e2-small"` | no |
+
| <a name="input_machine_type"></a> [machine\_type](#input\_machine\_type) | The machine type to use | `string` | `"e2-small"` | no |
| <a name="input_network_project_id"></a> [network\_project\_id](#input\_network\_project\_id) | The project ID of the shared VPC in which the stack is being deployed | `string` | `""` | no |
| <a name="input_ssh_block_project_keys"></a> [ssh\_block\_project\_keys](#input\_ssh\_block\_project\_keys) | Whether to block project wide SSH keys | `bool` | `true` | no |
| <a name="input_ssh_ip_allowlist"></a> [ssh\_ip\_allowlist](#input\_ssh\_ip\_allowlist) | The list of CIDR ranges to allow SSH traffic from | `list(any)` | <pre>[<br> "0.0.0.0/0"<br>]</pre> | no |
| <a name="input_ssh_key_pairs"></a> [ssh\_key\_pairs](#input\_ssh\_key\_pairs) | The list of SSH key-pairs to add to the servers | <pre>list(object({<br> user_name = string<br> public_key = string<br> }))</pre> | `[]` | no |
| <a name="input_subnetwork"></a> [subnetwork](#input\_subnetwork) | The name of the sub-network to deploy within; if populated will override the 'network' setting | `string` | `""` | no |
-
| <a name="input_target_size_repeater"></a> [target\_size\_repeater](#input\_target\_size\_repeater) | The number of servers to deploy | `number` | `1` | no |
-
| <a name="input_target_size_streamloader"></a> [target\_size\_streamloader](#input\_target\_size\_streamloader) | The number of servers to deploy | `number` | `1` | no |
+
| <a name="input_target_size"></a> [target\_size](#input\_target\_size) | The number of servers to deploy | `number` | `1` | no |
| <a name="input_telemetry_enabled"></a> [telemetry\_enabled](#input\_telemetry\_enabled) | Whether or not to send telemetry information back to Snowplow Analytics Ltd | `bool` | `true` | no |
| <a name="input_ubuntu_20_04_source_image"></a> [ubuntu\_20\_04\_source\_image](#input\_ubuntu\_20\_04\_source\_image) | The source image to use which must be based off of Ubuntu 20.04; by default the latest community version is used | `string` | `""` | no |
| <a name="input_user_provided_id"></a> [user\_provided\_id](#input\_user\_provided\_id) | An optional unique identifier to identify the telemetry events emitted by this stack | `string` | `""` | no |
+
| <a name="input_webhook_collector"></a> [webhook\_collector](#input\_webhook\_collector) | Collector address used to gather monitoring alerts | `string` | `""` | no |
+
| <a name="input_skip_schemas"></a> [skip\_schemas](#input\_skip\_schemas) | The list of schema keys which should be skipped (not loaded) to the warehouse | `list(string)` | `[]` | no |
+
| <a name="input_healthcheck_enabled"></a> [healthcheck\_enabled](#input\_healthcheck\_enabled) | Whether or not to enable health check probe for GCP instance group | `bool` | `true` | no |
## Outputs
| Name | Description |
|------|-------------|
+
| <a name="output_health_check_id"></a> [health\_check\_id](#output\_health\_check\_id) | Identifier for the health check on the instance group |
+
| <a name="output_health_check_self_link"></a> [health\_check\_self\_link](#output\_health\_check\_self\_link) | The URL for the health check on the instance group |
| <a name="output_instance_group_url"></a> [instance\_group\_url](#output\_instance\_group\_url) | The full URL of the instance group created by the manager |
| <a name="output_manager_id"></a> [manager\_id](#output\_manager\_id) | Identifier for the instance group manager |
| <a name="output_manager_self_link"></a> [manager\_self\_link](#output\_manager\_self\_link) | The URL for the instance group manager |
+
| <a name="output_named_port_http"></a> [named\_port\_http](#output\_named\_port\_http) | The name of the port exposed by the instance group |
+
| <a name="output_named_port_value"></a> [named\_port\_value](#output\_named\_port\_value) | The named port value (e.g. 8080) |