Commit 06a6e31

monorepo init
1 parent c3426fc commit 06a6e31

47 files changed, +10693 -0 lines changed

.gitignore

+30
@@ -0,0 +1,30 @@
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.

# dependencies
node_modules
.pnp
.pnp.js

# testing
coverage

# svelte
.svelte-kit

# misc
.DS_Store
*.pem

# debug
npm-debug.log*
yarn-debug.log*
yarn-error.log*

# local env files
.env.local
.env.development.local
.env.test.local
.env.production.local

# turbo
.turbo

.npmrc

+1
@@ -0,0 +1 @@
auto-install-peers = true

README.md

+209
@@ -0,0 +1,209 @@
# Tinybird Data Generator

## Description

The Tinybird Data Generator is a tool for generating fake data and sending it to Tinybird via the [Events API](https://www.tinybird.co/docs/guides/ingest-from-the-events-api.html).

Use the hosted version at [https://data-generator-v2-tinybird.vercel.app/](https://data-generator-v2-tinybird.vercel.app/)
or run it [locally](#local-install) on your machine.

![Generator UI!](/readme/img/ui.png "Generator UI")

## Local Install

To run the generator locally, first install Node.js (the project was developed with v18, lts/hydrogen).

Then:

1. Clone this monorepo: `git clone https://github.com/tinybirdco/data_generator_2.git`
2. Install dependencies with npm: `npm install`

The repository has the following structure:

```bash
├── apps
│   ├── cli
│   └── web
└── packages
    └── tinybird-generator
```

## Web

### How To Use

1. Run the development server: `npm run dev`. The web app will be available at [http://localhost:3000](http://localhost:3000).
2. Enter your Tinybird config by clicking the __Tinybird Settings__ button in the top right, then click __Save__.
3. Customise the schema in the text box on the left and click __Save__. The schema is validated in two steps: first that it is valid JSON, then that it only uses known data types. If it passes validation, you'll see a preview of your data on the right.
4. Click the __Generate!__ button to start sending data to Tinybird.
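
The two validation steps from step 3 can be sketched roughly as follows. This is a minimal illustration and not the generator's actual code: `checkSchemaText` and the `KNOWN_TYPES` subset are hypothetical names made up for this example. The CLI in this commit relies on `validateSchema` from `tinybird-generator` instead.

```javascript
// Hypothetical subset of the supported type names, for illustration only.
const KNOWN_TYPES = new Set(["int", "string", "values", "datetime"]);

// Step 1: the text must parse as JSON.
// Step 2: every column must declare a known type.
function checkSchemaText(text, knownTypes = KNOWN_TYPES) {
  let schema;
  try {
    schema = JSON.parse(text);
  } catch (err) {
    return { valid: false, errors: [`Invalid JSON: ${err.message}`] };
  }
  if (schema === null || typeof schema !== "object" || Array.isArray(schema)) {
    return { valid: false, errors: ["Schema must be a JSON object"] };
  }
  const errors = [];
  for (const [column, definition] of Object.entries(schema)) {
    const type = definition ? definition.type : undefined;
    if (!knownTypes.has(type)) {
      errors.push(`Column "${column}" uses unknown type "${type}"`);
    }
  }
  return { valid: errors.length === 0, errors };
}

const okResult = checkSchemaText('{"some_int": {"type": "int"}}');
const badJsonResult = checkSchemaText("{oops");
const badTypeResult = checkSchemaText('{"col": {"type": "nope"}}');
```

Collecting every unknown-type error before returning, rather than stopping at the first one, lets a UI show all problems at once.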

### Passing params in the URL

To allow faster demoing, you can pass the desired params in the url. E.g., [http://localhost:3000/?schema=z_sales&eps=1&host=eu_gcp&datasource=sales_dg&token=p.eyJ1IjogIjg4Nzk5NGUxLWZmNmMtNGUyMi1iZTg5LTNlYzBmNmRmMzlkZCIsICJpZCI6ICIwN2RlZThhMS0wNGMzLTQ4OTQtYmQxNi05ZTlkMmM3ZWRhMTgifQ.p_N4EETK7dbxOgHtugAUue3BUWwyGHT461Ha8P-d3Go](http://localhost:3000/?schema=z_sales&eps=1&host=eu_gcp&datasource=sales_dg&token=p.eyJ1IjogIjg4Nzk5NGUxLWZmNmMtNGUyMi1iZTg5LTNlYzBmNmRmMzlkZCIsICJpZCI6ICIwN2RlZThhMS0wNGMzLTQ4OTQtYmQxNi05ZTlkMmM3ZWRhMTgifQ.p_N4EETK7dbxOgHtugAUue3BUWwyGHT461Ha8P-d3Go)

***
__NOTE:__ The token shown above is no longer valid, so this exact URL won't work.
***
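
A demo URL like the one above can be assembled programmatically. The sketch below uses the standard `URL` and `URLSearchParams` APIs. The parameter names (`schema`, `eps`, `host`, `datasource`, `token`) are taken from the example URL, while `buildGeneratorUrl` and the placeholder token are hypothetical.

```javascript
// Build a generator URL from a base address and a plain object of params.
function buildGeneratorUrl(baseUrl, params) {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(params)) {
    // searchParams takes care of percent-encoding the values.
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const demoUrl = buildGeneratorUrl("http://localhost:3000/", {
  schema: "z_sales",
  eps: 1,
  host: "eu_gcp",
  datasource: "sales_dg",
  token: "p.example", // placeholder, not a real token
});
```

Since the token ends up in the URL, a link built this way should be treated as a secret.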

### Passing predefined Schemas

Apart from passing a schema template name (`schema=z_sales`), as shown [above](#passing-params-in-the-url), you can also pass a hash that contains the JSON Schema.
This way you can save it for later and use it for quick demos.

To generate new valid schemas without [adding default templates](#adding-new-default-templates) you can simply edit the JSON in the Schema Builder, click on Save, and, if it is a valid one, a new hash of the JSON will be saved in the url. E.g., `schema=XQAAAAJxAQAAAAAAAABtAElZc5EUiPjRVsymUw6kCv7vrUcchRGo2mUJw1XM3QbdEbjcCYxEyOlqpifLRoogkVAEtr4ISWOmu33DlLoNC_p6GQaSv1x6BIitOL-wxXI56XzsFLA0JJEAl4NmVrBsCzjgHzv0MIgS5NF7EEnU7qypT5jWdjF8Svh9vy1epdoW6QCBCYbevwnEVck6v-SvIsw5D_ggGs7AOZMlRRwbj4gl_57mYFDOcqi2AXzhPmmQNKpmf3EaZWtzauwCUNUmU7u57rgynqMaWgZTysoukECAVA1mIPGEI3cMA0C-l7kRc_J7qCpAObcGfciJ_XYA1AiiWylgDcU4BxXTY3D4DhFX_r1oIA`

## CLI

The tool also has a CLI mode. Run it from the repository root, passing a JSON schema file to `--schema`:

```sh
node apps/cli/index.js --schema schema.txt \
  --datasource $DS_NAME \
  --token $TB_TOKEN \
  --endpoint eu_gcp \
  --eps 50 \
  --limit 200
```

And for your local Tinybird development environment:

```sh
TB_ENDPOINT=http://localhost:8300 \
node apps/cli/index.js --schema schema.txt \
  --datasource $DS_NAME \
  --token $TB_TOKEN \
  --endpoint custom \
  --eps 50 \
  --limit 200
```

## DataTypes

See [/packages/tinybird-generator/src/dataTypes.ts](./packages/tinybird-generator/src/dataTypes.ts) for the DataTypes supported in schemas. We use [Faker](https://fakerjs.dev/) to generate random data, with some custom extensions. Not all Faker types are exposed; only types explicitly added in `dataTypes.ts` can be used in schemas.

Current types are:

```txt
int
intString
string
first_name
last_name
full_name
email
word
domain
values
values_weighted
datetime // values up to the second -> DateTime
timestamp // values up to the millisecond -> DateTime64(3)
function // returns a user-defined function. E.g., "udf": {"type": "function", "params": {"function": "function () { date = new Date(); return date.toISOString()}"}}
range // supports two params: 'params': ['start', 'end'] and returns ints between the params
timestamp_range
datetime_range
uuid
bool
browser_name
browser_engine_name
city_name
country_code_iso2
operating_system
search_engine
lat_or_lon_string
lat_or_lon_int
words
http_method
user_agent
semver
```
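
To illustrate the difference in precision between `datetime` and `timestamp`, here is a small sketch. It assumes values are rendered as UTC strings of the form `YYYY-MM-DD HH:MM:SS` (with `.mmm` added for milliseconds). That format is an assumption for this example, and the helper names are hypothetical, so check `dataTypes.ts` for what the generator really emits.

```javascript
// Second precision, suitable for a DateTime column (assumed string format).
function toDateTime(date) {
  return date.toISOString().slice(0, 19).replace("T", " ");
}

// Millisecond precision, suitable for a DateTime64(3) column (assumed string format).
function toDateTime64(date) {
  return date.toISOString().slice(0, 23).replace("T", " ");
}

const sample = new Date(Date.UTC(2023, 0, 2, 3, 4, 5, 678));
const asDateTime = toDateTime(sample);     // "2023-01-02 03:04:05"
const asDateTime64 = toDateTime64(sample); // "2023-01-02 03:04:05.678"
```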

### Adding new DataTypes

DataTypes are defined in [/packages/tinybird-generator/src/dataTypes.ts](./packages/tinybird-generator/src/dataTypes.ts).

To add a new DataType, add a new object to the collection of data types, keyed by the type name as in the examples below.

Required properties:

`tinybird_type` - The type to use for the column in Tinybird (currently not implemented)

`generator` - A function that generates and returns the actual value to use in the data

Optional properties:

`params` - An array of key names to use for input parameters

`param_validator` - A function that validates incoming parameters and returns true/false

For example, a minimal type takes no input params and therefore needs no param validator:

```javascript
'int': {
  'tinybird_type': 'int',
  generator() {
    return Math.floor(Math.random() * 100) + 1;
  }
},
```

A more complex type can take input parameters and must provide a validator:

```javascript
'values': {
  'tinybird_type': 'string',
  'params': ['values'],
  'param_validator': function (params) {
    const validators = [
      param_validators.length,
      param_validators.keys
    ]
    return RunValidators(validators, this.params, params);
  },
  'generator': function (params) {
    return params.values[Math.floor(Math.random() * params.values.length)];
  }
},
```
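
As a further illustration of the property layout described above, here is a self-contained sketch of a parameterised type. The `int_between` type, its `min`/`max` params, and the `customDataTypes` container are all hypothetical. To stay runnable on its own, it uses an inline validator instead of the `param_validators` / `RunValidators` helpers from `dataTypes.ts`, whose exact signatures are not shown here.

```javascript
// Hypothetical example type; not part of the generator.
const customDataTypes = {
  int_between: {
    tinybird_type: "int",
    params: ["min", "max"],
    // Every declared param must be an integer, and min must not exceed max.
    param_validator(params) {
      return (
        this.params.every((key) => Number.isInteger(params[key])) &&
        params.min <= params.max
      );
    },
    // Uniform integer in the inclusive range [min, max].
    generator(params) {
      return Math.floor(Math.random() * (params.max - params.min + 1)) + params.min;
    },
  },
};

const diceParams = { min: 1, max: 6 };
const diceIsValid = customDataTypes.int_between.param_validator(diceParams);
const diceRoll = customDataTypes.int_between.generator(diceParams);
```

Note that `param_validator` reads `this.params`, so it has to be called as a method of the type object, as in the last lines.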

### Adding new default templates

Templates are defined in [/packages/tinybird-generator/src/presetSchemas.ts](./packages/tinybird-generator/src/presetSchemas.ts).

To add a new template, add a new property containing the schema to the presetSchemas object.

```javascript
export const presetSchemas = {
  'default': {
    "some_int": {
      "type": "int"
    },
    "some_values": {
      "type": "values",
      "params": {
        "values": [123, 456]
      }
    },
    "values_weighted": {
      "type": "values_weighted",
      "params": {
        "values": [123, 456, 789],
        "weights": [90, 7, 3]
      }
    }
  },
  'acme store': {
    "datetime": {
      "type": "datetime"
    },
    "article_id": {
      "type": "values",
      "params": {
        "values": [709138001,517762001,675068002,712216001,507909003,762846008,469039019,631878001,697054003,682511001,618800001,710056003,507910001,470985003,697054014,762846001,762846007,721435001,734460001,762846006,581298005,682509001,502224001,850917001,622955001,695632001,349301001,507909001,859125001,623115001,622958003,716672001]
      }
    },
    "customer_id": {
      "type": "string"
    }
  }
}
```
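
The `values_weighted` entry in the `default` template pairs each value with a weight. One plausible reading is that the weights are relative probabilities, so `123` would be picked about 90% of the time. The sketch below shows that interpretation. `pickWeighted` is a hypothetical helper and not necessarily how the generator implements the type.

```javascript
// Pick one value, with probability proportional to its weight.
// `random` is injectable so the behaviour can be checked deterministically.
function pickWeighted(values, weights, random = Math.random) {
  const total = weights.reduce((sum, weight) => sum + weight, 0);
  let threshold = random() * total;
  for (let i = 0; i < values.length; i++) {
    threshold -= weights[i];
    if (threshold < 0) return values[i];
  }
  // Fallback for floating-point edge cases.
  return values[values.length - 1];
}

const weightedValues = [123, 456, 789];
const weightedWeights = [90, 7, 3];
const picked = pickWeighted(weightedValues, weightedWeights);
```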

apps/cli/index.js

+101
@@ -0,0 +1,101 @@
#!/usr/bin/env node

import fs from "fs";
import yargs from "yargs/yargs";
import { hideBin } from "yargs/helpers";
import {
  presetSchemas,
  validateSchema,
  setConfig,
  createRowGenerator,
  sendData,
} from "tinybird-generator";

const presetSchemaNames = Object.keys(presetSchemas);

const argv = yargs(hideBin(process.argv))
  .example("node index.js --datasource=XXX --token=XXX --endpoint=XXX")
  .options({
    template: {
      describe: "Template to use for populating",
      default: presetSchemaNames[0],
      choices: presetSchemaNames,
      conflicts: ["schema"],
    },
    schema: {
      describe: "Path to schema file",
      conflicts: ["template"],
    },
    datasource: {
      describe: "Tinybird datasource",
      demandOption: true,
    },
    token: {
      describe: "Tinybird API token",
      demandOption: true,
    },
    endpoint: {
      describe: "Tinybird API endpoint",
      demandOption: true,
      choices: ["eu_gcp", "us_gcp", "custom"],
    },
    eps: {
      describe: "Events per second",
      default: 1,
    },
    limit: {
      describe: "Max number of rows to send (-1 for unlimited)",
      default: -1,
    },
  }).argv;

const schema = argv.schema
  ? JSON.parse(fs.readFileSync(argv.schema, "utf8"))
  : presetSchemas[argv.template];

if (!validateSchema(schema).valid) throw new Error("Invalid schema");

const min_delay_per_batch = 200;
const max_batches_per_second = 1000 / min_delay_per_batch;

let batch_size, delay_per_batch;
if (argv.eps < 1000) {
  batch_size = argv.eps;
  delay_per_batch = 1000;
} else {
  batch_size = argv.eps / max_batches_per_second;
  delay_per_batch = min_delay_per_batch;
}

setConfig({
  endpoint: argv.endpoint,
  datasource: argv.datasource,
  token: argv.token,
});

const rowGenerator = createRowGenerator(schema),
  rows = [];

let limit = argv.limit,
  sent_rows = 0;

while (true) {
  rows.push(rowGenerator.generate());
  if (rows.length >= batch_size) {
    const data = rows.splice(0, batch_size);

    try {
      await sendData(data);
    } catch (ex) {
      console.log(`ERR > ${ex}.`);
      break;
    }

    sent_rows += data.length;
    console.log(`INFO> ${sent_rows} rows sent so far...`);

    if (limit != -1 && sent_rows >= limit) break;

    await new Promise((r) => setTimeout(r, delay_per_batch));
  }
}
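
The batching rule in the script above (one batch per second below 1000 events per second, otherwise one batch every 200 ms) can be isolated as a small pure function. This is a sketch for illustration: `planBatches` is a hypothetical name, and it rounds the batch size up to a whole number, which the script in this commit does not do.

```javascript
// Given a target events-per-second rate, decide how many rows go in each
// batch and how long to wait between batches.
function planBatches(eps, minDelayMs = 200) {
  const maxBatchesPerSecond = 1000 / minDelayMs;
  if (eps < 1000) {
    // Low rates: send one batch of `eps` rows every second.
    return { batchSize: eps, delayMs: 1000 };
  }
  // High rates: split each second into the maximum number of batches.
  return {
    batchSize: Math.ceil(eps / maxBatchesPerSecond),
    delayMs: minDelayMs,
  };
}

const lowRatePlan = planBatches(50);    // { batchSize: 50, delayMs: 1000 }
const highRatePlan = planBatches(5000); // { batchSize: 1000, delayMs: 200 }
```

The delay is applied after each send completes, so the achieved rate will be somewhat below the requested `eps` once request latency is included.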

apps/cli/package.json

+20
@@ -0,0 +1,20 @@
{
  "name": "tinybird-generator-cli",
  "description": "CLI for tinybird data generator",
  "version": "1.0.0",
  "main": "index.js",
  "bin": "index.js",
  "scripts": {
    "start": "node index.js"
  },
  "type": "module",
  "homepage": "https://github.com/tinybirdco/mockingbird#readme",
  "repository": {
    "type": "git",
    "url": "https://github.com/tinybirdco/mockingbird/apps/cli"
  },
  "dependencies": {
    "tinybird-generator": "*",
    "yargs": "^17.6.2"
  }
}

apps/web/.eslintignore

+13
@@ -0,0 +1,13 @@
.DS_Store
node_modules
/build
/.svelte-kit
/package
.env
.env.*
!.env.example

# Ignore files for PNPM, NPM and YARN
pnpm-lock.yaml
package-lock.json
yarn.lock
