Concurrent checkpoint creation leads to corrupt delta table for delta-rs readers #3244
-
I have the following setup:
When both processes decide to create a checkpoint for the same version, neither write fails, since the notebook writes a multi-part checkpoint while the delta-rs process writes a single-part checkpoint. After this occurs, trying to open the table with the delta-rs library fails with the following error:
This is because of the way the library counts the number of checkpoint parts. Should the library ignore the multi-part files if the
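For context on why the part count matters, here is a sketch of the checkpoint file naming scheme from the Delta transaction log protocol (the version number and part count below are made up for illustration). A single-part checkpoint is one file, while a multi-part checkpoint is a set of files whose names encode the part index and total part count. A reader that globs every checkpoint file for the version named in `_last_checkpoint` and compares the count against the `parts` field can get confused when both forms exist for the same version:

```python
import json

# Hypothetical contents of _delta_log/_last_checkpoint (values are
# illustrative, not taken from the issue).
last_checkpoint = json.loads('{"version": 10, "size": 2000, "parts": 2}')

version = last_checkpoint["version"]
parts = last_checkpoint.get("parts")  # absent for single-part checkpoints

if parts is None:
    # Single-part checkpoint: one file, e.g. 00000000000000000010.checkpoint.parquet
    expected = [f"{version:020d}.checkpoint.parquet"]
else:
    # Multi-part checkpoint: <version>.checkpoint.<part>.<total>.parquet
    expected = [
        f"{version:020d}.checkpoint.{i:010d}.{parts:010d}.parquet"
        for i in range(1, parts + 1)
    ]

print(expected)
```

If a single-part file for the same version sits next to these multi-part files, a naive count of matching checkpoint files no longer equals `parts`, which matches the failure mode described above.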
Replies: 6 comments
-
I don't think using multiple delta writers on the same table is a good idea; the whole ecosystem is not mature enough yet. Just use one writer for everything.
-
@Werepyrex10 I suggest you disable checkpointing in either delta-spark or delta-rs for now.
-
@Werepyrex10 What storage backend is this? If it's S3, is the Databricks cluster using the same S3DynamoDbLogStore configuration as the delta-rs process?
-
Hey @rtyler, we are using Azure Blob Storage as the storage backend.
-
We had the same issue: either you configure a DynamoDB-based log store for the delta log, or you use only one writer. We ended up going with the single-writer solution.
-
Hello, I am trying to read a delta table that someone else created, and I am getting the same error.
Is there anything I can do from my side?
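If you only need to diagnose which version is affected, one option is to list the `_delta_log` directory and look for versions that carry both a single-part and a multi-part checkpoint. The sketch below runs on a simulated listing (the file names are illustrative); it does not modify the table, and deleting checkpoint files by hand is risky, so treat it purely as a diagnostic:

```python
import re
from collections import defaultdict

# Simulated _delta_log listing where both a single-part and a multi-part
# checkpoint exist for version 10 (the scenario described in this thread).
log_files = [
    "00000000000000000010.checkpoint.parquet",
    "00000000000000000010.checkpoint.0000000001.0000000002.parquet",
    "00000000000000000010.checkpoint.0000000002.0000000002.parquet",
    "00000000000000000010.json",
    "_last_checkpoint",
]

single = re.compile(r"^(\d{20})\.checkpoint\.parquet$")
multi = re.compile(r"^(\d{20})\.checkpoint\.\d{10}\.\d{10}\.parquet$")

versions = defaultdict(lambda: {"single": [], "multi": []})
for name in log_files:
    if m := single.match(name):
        versions[int(m.group(1))]["single"].append(name)
    elif m := multi.match(name):
        versions[int(m.group(1))]["multi"].append(name)

# A version with both forms is the kind of conflict a part-counting reader
# chokes on.
conflicts = {v: f for v, f in versions.items() if f["single"] and f["multi"]}
print(sorted(conflicts))
```

In a real table you would feed this the actual `_delta_log` listing from your object store client instead of the hard-coded list.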