Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decode JSONBOID as binary bytes, like TextOID and JSONOID #115

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sean-
Copy link

@sean- sean- commented Jul 11, 2021

This results in a >10x improvement in throughput when working with the JSONB data type.

@sean-
Copy link
Author

sean- commented Jul 11, 2021

I'm assuming this PR is correct (this PR was dry-coded). Registering the following data type worked around the performance problem I was experiencing (reading data from one database and writing it to another).

    conn.ConnInfo().RegisterDataType(pgtype.DataType{Value: &pgtype.Bytea{}, Name: "jsonb", OID: pgtype.JSONBOID})

Before:

     0.04s 0.035%  5.17%     36.30s 31.61%  github.com/jackc/pgx/v4.(*Conn).execParamsAndPreparedPrefix
     0.02s 0.017%  5.19%     36.21s 31.53%  github.com/jackc/pgx/v4.(*extendedQueryBuilder).AppendParam
    36.01s 31.36% 36.55%     36.13s 31.46%  runtime.heapBitsForAddr (inline)
     0.07s 0.061% 36.61%     36.01s 31.36%  github.com/jackc/pgx/v4.(*extendedQueryBuilder).encodeExtendedParamValue
     0.02s 0.017% 36.62%     35.22s 30.67%  github.com/jackc/pgtype.(*JSONB).Set
     0.01s 0.0087% 36.63%     35.20s 30.65%  github.com/jackc/pgtype.(*JSON).Set
     0.25s  0.22% 36.85%     30.09s 26.20%  reflect.copyVal
     0.11s 0.096% 36.95%     27.85s 24.25%  reflect.unsafe_New
     0.29s  0.25% 37.20%     25.13s 21.88%  reflect.Value.MapKeys
         0     0% 37.21%     22.33s 19.44%  runtime.systemstack
     0.06s 0.052% 37.26%     22.05s 19.20%  github.com/jackc/pgx/v4.(*connRows).Values
         0     0% 37.26%     21.16s 18.43%  encoding/json.Unmarshal
         0     0% 37.26%     21.16s 18.43%  github.com/jackc/pgtype.JSON.Get
         0     0% 37.26%     21.16s 18.43%  github.com/jackc/pgtype.JSONB.Get
     0.68s  0.59% 37.85%     19.55s 17.02%  reflect.Value.MapIndex
     0.05s 0.044% 38.01%     15.76s 13.72%  runtime.(*mcache).nextFree
     0.44s  0.38% 38.39%     15.62s 13.60%  runtime.(*mcache).refill
     0.21s  0.18% 38.58%     14.31s 12.46%  runtime.(*mcentral).cacheSpan
     0.12s   0.1% 38.68%     13.88s 12.09%  encoding/json.(*decodeState).valueInterface

@jackc
Copy link
Owner

jackc commented Jul 17, 2021

I don't think this will work in the general case as is. The binary format of jsonb includes a 1 byte prefix that must be stripped.

Beyond that, some care needs to be taken when choosing whether binary or text format is preferred. I've found that the text format is significantly faster in some cases. Not sure whether that is the case with jsonb though. See https://www.postgresql.org/message-id/CAMovtNoHFod2jMAKQjjxv209PCTJx5Kc66anwWvX0mEiaXwgmA%40mail.gmail.com for discussion on the PG hackers list.

Oh, one other thing. If your underlying use case is copying data from one database to another at high speed then you should check out the copy protocol support in pgconn. You can use CopyFrom and CopyTo to copy data with much less overhead. Could be far faster than anything that uses a query.

@sean-
Copy link
Author

sean- commented Jul 20, 2021

I don't think this will work in the general case as is. The binary format of jsonb includes a 1 byte prefix that must be stripped.

Oh! Interesting, I didn't notice that... but that's probably because I was splatting things into a different jsonb column, so everything checksumed correctly on the source and destination.

Beyond that, some care needs to be taken when choosing whether binary or text format is preferred. I've found that the text format is significantly faster in some cases. Not sure whether that is the case with jsonb though. See https://www.postgresql.org/message-id/CAMovtNoHFod2jMAKQjjxv209PCTJx5Kc66anwWvX0mEiaXwgmA%40mail.gmail.com for discussion on the PG hackers list.

This is actually querying data from CockroachDB, not PostgreSQL. In this case, the team will be querying the data as JSON. I suppose it's possible to cast TEXT to JSONB to extract meaningful data, but that seems pretty expensive, too. I thought, maybe incorrectly, that JSONB would still be sent over the wire as TEXT, and that the only difference was on-disk and the lack of ordering guarantees of the record after it was written, etc. But, if you queried a JSONB column, you'd get a text value back that was the TEXT encoding of the JSONB. ?

https://www.postgresql.org/docs/current/datatype-json.html

Oh, one other thing. If your underlying use case is copying data from one database to another at high speed then you should check out the copy protocol support in pgconn. You can use CopyFrom and CopyTo to copy data with much less overhead. Could be far faster than anything that uses a query.

With the Bytea{} data type, in our workload we were network-bound, not CPU bound. For the size of data and level of parallelism, CopyFrom() could work just as easily as the query. In nearly all cases, the writes were the limiting factor as long as JSONB was not decoded as JSON.

@rafiss
Copy link

rafiss commented Aug 3, 2021

But, if you queried a JSONB column, you'd get a text value back that was the TEXT encoding of the JSONB. ?

It does seem that way. This is a Postgres SQL shell:

rafiss@127:postgres> select '"\uD83D\uDE80"'::jsonb
+---------+
| jsonb   |
|---------|
| "🚀"    |
+---------+
SELECT 1

rafiss@127:postgres> select '"\uD83D\uDE80"'::json
+----------------+
| json           |
|----------------|
| "\uD83D\uDE80" |
+----------------+
SELECT 1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants