## what are you talking about ?

Well, first you have to know that PostgreSQL has a not-so-well-known mechanism that helps when importing into PostgreSQL from a source (_copy-in_) or exporting to a sink from PostgreSQL (_copy-out_).

You should first go and get familiar with the [pg-copy-streams](https://github.com/brianc/node-pg-copy-streams) module that does the heavy lifting of handling the COPY part of the protocol flow.
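
As a quick reminder of what plain `pg-copy-streams` usage looks like on its own, here is a minimal sketch of a _copy-out_ in the default text format (the `users` table and the file name are made up for the example):

```js
// Minimal sketch: dump a hypothetical `users` table with plain pg-copy-streams.
var fs = require('fs')
var pg = require('pg')
var copyTo = require('pg-copy-streams').to

var client = new pg.Client() // connection settings taken from the environment
client.connect()

// copy-out : PostgreSQL pushes the table content as a readable stream
var stream = client.query(copyTo('COPY users TO STDOUT'))
stream.pipe(fs.createWriteStream('users.tsv'))
stream.on('end', function () {
  client.end()
})
```
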
## what does this module do ?

When dealing with the COPY mechanism, you can use different formats for _copy-out_ or _copy-in_ : text, csv or binary.
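
For reference, the format is selected in the COPY statement itself (the table name is only illustrative):

```sql
-- text format (the default)
COPY item TO STDOUT;

-- csv format, with a header line
COPY item TO STDOUT WITH (FORMAT csv, HEADER);

-- binary format, the one this module deals with
COPY item TO STDOUT WITH (FORMAT binary);
-- equivalent older spelling, as used in the example further down
COPY item TO STDOUT BINARY;
```
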
The text and csv formats are interesting but they have some limitations due to the fact that they are text based, need field separators, escaping, etc. Have you ever been in the CSV hell ?

The PostgreSQL documentation states : Many programs produce strange and occasionally perverse CSV files, so the file format is more a convention than a standard. Thus you might encounter some files that cannot be imported using this mechanism, and COPY might produce files that other programs cannot process.

Do you want to go there ? If you take the blue pill, then this module might be for you.

It can be used to parse and deparse the PostgreSQL binary streams that are made available by the `pg-copy-streams` module.

The main API is called `transform` and tries to hide many of those details. It can be used to easily do non trivial things like :

- transforming rows
- expanding on the number of rows
- forking rows into several databases at the same time, with the same or different structures

## Example

Table C has the simple structure

```sql
CREATE TABLE generated (body text);
```

And you want to fill it, for each source row, with a number `id` of rows (expanding the number of rows), with a body of "BODY: " + description.

After all this is done, you want to add a line in the `generated` table with a body of "COUNT: " + total number of rows inserted (not counting this one).

Here is the code that will do just this.

```js
var pg = require('pg')
var through2 = require('through2')
var copyOut = require('pg-copy-streams').to
var copyIn = require('pg-copy-streams').from
var pgCopyTransform = require('pg-copy-streams-binary').transform

var client = function (dsn) {
  var client = new pg.Client(dsn)
  client.connect()
  return client
}

var dsnA = null // configure database A connection parameters
var dsnB = null // configure database B connection parameters
var dsnC = null // configure database C connection parameters

var clientA = client(dsnA)
var clientB = client(dsnB)
var clientC = client(dsnC)

var AStream = clientA.query(copyOut('COPY item TO STDOUT BINARY'))
var BStream = clientB.query(copyIn('COPY product FROM STDIN BINARY'))
var CStream = clientC.query(copyIn('COPY generated FROM STDIN BINARY'))

// ... the streams are then wired together through pgCopyTransform (not shown in this excerpt)
```

The `test/transform.js` test does something along these lines to check that it works.
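
To give a feel for that wiring, here is a loose, hypothetical sketch of how the rows could be transformed and forked to the two target streams. The option names (`mapping`, `transform`, `targets`) and the row-pushing convention below are assumptions made for illustration, not the confirmed API; the real, working version lives in `test/transform.js`.

```js
// HYPOTHETICAL sketch: option names and row conventions are assumptions,
// not the documented API of pg-copy-streams-binary.
var count = 0
var pct = pgCopyTransform({
  mapping: [
    { key: 'id', type: 'int4' },
    { key: 'ref', type: 'text' },
    { key: 'description', type: 'text' },
  ],
  targets: [BStream, CStream], // 0 = product (database B), 1 = generated (database C)
  transform: through2.obj(
    function (row, _, cb) {
      // forward a transformed row to the product table
      this.push([0, row.id, 'item ref: ' + row.ref])
      // expand: `id` rows in the generated table
      for (var i = 0; i < row.id; i++) {
        this.push([1, 'BODY: ' + row.description])
        count++
      }
      cb()
    },
    function (cb) {
      // final summary row (not counted)
      this.push([1, 'COUNT: ' + count])
      cb()
    }
  ),
})

AStream.pipe(pct)
```
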
default: true

This option can be used to not send the header that PostgreSQL expects at the end of the COPY session. You could use this if you want to unpipe this stream and pipe another one that will send more data and maybe finish the COPY session.
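
The idea, expressed with plain Node streams (a self-contained toy, not this module's API):

```js
// Toy illustration with plain Node streams: the first source leaves the
// destination open, and a second source finishes it later.
var PassThrough = require('stream').PassThrough

var copySession = new PassThrough() // stands in for the copy-in stream
var first = new PassThrough() // stands in for a stream that skips the final bytes
var second = new PassThrough() // stands in for the stream that ends the session

first.pipe(copySession, { end: false }) // copySession stays open after `first` ends
first.on('end', function () {
  second.pipe(copySession) // default { end: true } will finish copySession
})

copySession.on('data', function (chunk) {
  process.stdout.write(chunk)
})
copySession.on('end', function () {
  console.log('COPY session finished')
})

first.end('rows from the first source\n')
second.end('more rows, then the end of the session\n')
```
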
## API for Parser
### options.mapping

When `mapping` is not given, the Parser will push rows as arrays of Buffers.

For all supported types, their corresponding array version is also supported.

- bool
- bytea
- int2, int4
- float4, float8
- text
- json
- timestamptz

Note that when types are mentioned in the `mapping` option, they should be strictly equal to one of these types. pgAdmin might sometimes mention aliases (like integer instead of int4) and you should not use these aliases.

The types for arrays (one or more dimensions) correspond to the base type prefixed with an underscore. So an array of int4, int4[], needs to be referenced as \_int4 without any mention of the dimensions. This is because the dimension information is embedded in the binary format.
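
As a purely illustrative sketch (the `{ key, type }` object shape used here is an assumption, so check the options documentation for the exact format), a mapping for a table with an int4 column and a text[] column could look like this:

```js
// Hypothetical example only: the { key: ..., type: ... } shape is an assumption.
// For a table like: CREATE TABLE example (id int4, labels text[])
var mapping = [
  { key: 'id', type: 'int4' },
  { key: 'labels', type: '_text' }, // array types use the underscore prefix, no dimensions
]
```
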

## changelog

### version 1.2.1 - published 2020-05-29

- Fix a compatibility bug introduced via `pg-copy-streams` 3.0. The parser can now handle rows that span across several stream chunks
- Migration of tests to mocha

## Warnings & Disclaimer

There are many details in the binary protocol, and as usual, the devil is in the details.

- Currently, operations are considered to happen on tables WITHOUT OIDS. Usage on tables WITH OIDS has not been tested.
- In arrays, null placeholders are not implemented (no spot in the array can be empty).
- In arrays, the first element of a dimension is always at index 1.
- Error handling has not yet been tuned so do not expect explicit error messages.

The PostgreSQL documentation states it clearly : "a binary-format file is less portable across machine architectures and PostgreSQL versions".
Tests try to discover issues that may appear between PostgreSQL versions, but it might not work in your specific environment.

Use it at your own risk !

## External references

- [COPY documentation, including binary format](https://www.postgresql.org/docs/current/static/sql-copy.html)
- [send/recv implementations for types in PostgreSQL](https://github.com/postgres/postgres/tree/master/src/backend/utils/adt)
- [default type OIDs in PostgreSQL catalog](https://github.com/postgres/postgres/blob/master/src/include/catalog/pg_type.h)

## Acknowledgments

AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.