@julienrffr julienrffr commented Oct 16, 2023

We should not simply replace existing XMP metadata and lose whatever has already been set.

NB: this feature is available in PDFsharp-extended nuget package

@ThomasHoevel
Member

I'm afraid it is not that simple.
We do not want to keep old PDFsharp XMP metadata either.

@julienrffr
Author

> I'm afraid it is not that simple. We do not want to keep old PDFsharp XMP metadata either.

XMP metadata may already have been filled in by the developer who is generating a new document from scratch.
It can also come from a document that was already generated by a third party.

Either way, it is bad to simply replace it, as is currently done.

Until there is a proper PdfMetadata implementation (meaning access to individual properties rather than setting a plain-text XMP XML string), we should not simply replace it.

Why don't you want to keep old metadata?
The only thing that could make sense to change silently is the ModifyDate.

At the very least, the developer should have control over how metadata is handled; it should not simply be lost.

@julienrffr
Author

> I'm afraid it is not that simple. We do not want to keep old PDFsharp XMP metadata either.

Another solution would be to add a document option to PdfDocumentOptions, XmpMetadata, with two values: auto and manual.
If set to manual, the XMP would not be overridden.

@julienrffr
Author

I updated my pull request.

The behavior now depends on PdfDocumentOptions.ManualXmpGeneration:

  • true (manual): XMP metadata has to be built and attached manually via the PdfMetadata class.
  • false (auto): XMP metadata will be built and attached automatically.

The value is false by default, which means XMP metadata will be generated by PdfSharp and silently attached to the document.

This allows devs who want to build their own XMP metadata to do so without it being overridden by PdfSharp.
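
To make the auto/manual behavior described above concrete, here is a minimal sketch of how such a flag could gate metadata generation at save time. The types `SketchDocument` and `SketchOptions`, the `PrepareForSave` method and the generated packet string are hypothetical stand-ins for illustration; they are not the PDFsharp classes and do not show how the pull request is actually implemented. Only the flag name `ManualXmpGeneration` and its default come from the comment above.

```csharp
using System;

// Stand-in types for illustration only; these are NOT the PDFsharp classes.
public sealed class SketchOptions
{
    // Mirrors the flag proposed in the PR; false (the default) means "auto".
    public bool ManualXmpGeneration { get; set; }
}

public sealed class SketchDocument
{
    public SketchOptions Options { get; } = new SketchOptions();

    // XMP packet attached by the caller or read from an existing file; empty if none.
    public string XmpPacket { get; set; } = "";

    // Hypothetical hook that runs once before the document is written.
    public void PrepareForSave()
    {
        if (Options.ManualXmpGeneration)
            return; // manual: keep whatever the caller attached, even if that is nothing

        XmpPacket = GenerateDefaultXmp(); // auto: build and attach silently
    }

    // Placeholder for the library-generated packet.
    static string GenerateDefaultXmp() =>
        "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\"><!-- generated --></x:xmpmeta>";
}
```

In this sketch, the manual branch never touches the packet, so responsibility for valid XMP rests entirely with the caller, which matches the trade-off discussed in this thread.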

@podprad

podprad commented Oct 16, 2023

As the developer who reported this issue, I would say thank you for fixing it. The flag will be enough for us to keep using PdfSharp. Maybe just rename it to "PreserveMetadata".

As an architect... well, it depends on what you want to achieve. If you just want to write properties like ProducerTool and ModDate, then overwriting the whole metadata is just not right. The metadata might contain information critical for the document. For example, it might contain PDF/A conformance info.
In my opinion you should first check whether metadata exists. If it does, it should be parsed and updated correctly. That can only be achieved with an XMP parser.
In our solution we use a combination of PdfSharp and XmpCore. Maybe XmpCore could somehow be incorporated into PdfSharp? In that case you would not need flags like "ManualXmpGeneration". Just parse the existing metadata, update ProducerTool, serialize it back, and write the changes to the PDF stream.

Also keep in mind that Metadata may appear in /Page objects. Some devices require it for batch printing. I modify it with a combination of PdfSharp and XmpCore, and it works pretty well.

@julienrffr
Author

> If you just want to write properties like ProducerTool and ModDate, then overwriting the whole metadata is just not right. The metadata might contain information critical for the document. For example, it might contain PDF/A conformance info. In my opinion you should first check whether metadata exists. If it does, it should be parsed and updated correctly. That can only be achieved with an XMP parser. In our solution we use a combination of PdfSharp and XmpCore. Maybe XmpCore could somehow be incorporated into PdfSharp? In that case you would not need flags like "ManualXmpGeneration". Just parse the existing metadata, update ProducerTool, serialize it back, and write the changes to the PDF stream.

Totally agree, PdfSharp should not brutally overwrite existing metadata set by the developer or already present in a third-party PDF.
Indeed, a partial update of metadata via XmpCore could be interesting, BUT:

  • it is a strong dependency
  • who decides, arbitrarily, which metadata is overwritten, and in which schemas (Dublin Core, XMP 1.0, etc.)?
  • IMO the developer should always have control over the output. Metadata should not be added at PreSave time without any chance to modify what has been silently changed.

To give an example, in my company we even want control over the ModifyDate in the XMP metadata.
So a partial update via XmpCore could be an option (it implies some work and arbitrary decisions), but we should still have a flag that gives complete control over the XMP metadata.

@podprad

podprad commented Oct 17, 2023

XmpCore has a BSD license, so maybe the source code could be copied into PdfSharp without introducing a hard DLL/NuGet dependency:
https://github.com/drewnoakes/xmp-core-dotnet

"who decides, arbitrarily, which metadata is overwritten, and in which schemas (Dublin Core, XMP 1.0, etc.)?" - Derive it from the original document, if possible. For new documents, it should be possible to specify the schema (via an enum, etc.).

"IMO the developer should always have control over the output. Metadata should not be added at PreSave time without any chance to modify what has been silently changed." - Yes. In my opinion this also applies to ProducerTool. From my perspective, I would like to avoid exposing information about our software externally. This is more of a security consideration.

@J3ro3nC

J3ro3nC commented Nov 2, 2023

Totally agree, overwriting existing metadata is not good.
Instead of using XmpCore, the metadata could maybe be updated using XmlDocument.
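
As a rough illustration of the XmlDocument idea, the sketch below parses an existing XMP packet, updates only two simple properties and leaves everything else (for example PDF/A identification) untouched. The class `XmpPatcher` and its method names are hypothetical, and the choice of `pdf:Producer` and `xmp:ModifyDate` as the properties to refresh is an assumption made for the example, not something PDFsharp does. It only handles simple properties stored as elements or as attributes on `rdf:Description`; arrays and structured values would need real XMP-aware handling.

```csharp
using System;
using System.Globalization;
using System.Xml;

// Hypothetical helper; not part of PDFsharp or XmpCore.
public static class XmpPatcher
{
    const string RdfNs = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    const string XmpNs = "http://ns.adobe.com/xap/1.0/";
    const string PdfNs = "http://ns.adobe.com/pdf/1.3/";

    // Returns the packet with pdf:Producer and xmp:ModifyDate set; other content is kept.
    public static string UpdateProducerAndModifyDate(
        string xmpPacket, string producer, DateTimeOffset modified)
    {
        var doc = new XmlDocument { PreserveWhitespace = true };
        doc.LoadXml(xmpPacket);

        SetSimpleProperty(doc, "pdf", "Producer", PdfNs, producer);
        SetSimpleProperty(doc, "xmp", "ModifyDate", XmpNs,
            modified.ToString("yyyy-MM-dd'T'HH:mm:sszzz", CultureInfo.InvariantCulture));

        return doc.OuterXml;
    }

    static void SetSimpleProperty(
        XmlDocument doc, string prefix, string localName, string ns, string value)
    {
        // Case 1: the property already exists as an element.
        XmlNodeList existing = doc.GetElementsByTagName(localName, ns);
        if (existing.Count > 0)
        {
            existing[0].InnerText = value;
            return;
        }

        // Case 2: the property exists as an attribute on an rdf:Description.
        XmlNodeList descriptions = doc.GetElementsByTagName("Description", RdfNs);
        foreach (XmlElement description in descriptions)
        {
            if (description.HasAttribute(localName, ns))
            {
                description.SetAttribute(localName, ns, value);
                return;
            }
        }

        // Case 3: the property is missing; append it to the first rdf:Description.
        if (descriptions.Count == 0)
            throw new InvalidOperationException("No rdf:Description element found.");

        XmlElement added = doc.CreateElement(prefix, localName, ns);
        added.InnerText = value;
        descriptions[0].AppendChild(added);
    }
}
```

A caller would pass in the packet text read from the document's existing metadata stream and write the returned string back. How that stream is read and written through PDFsharp is deliberately left out here.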

@julienrffr
Author

A partial update of XMP data is a fair bit of work...
And the developer should still have full control if they do not want anything they already set to be overridden.

NB: this feature is available in PDFsharp-extended nuget package

@stephanstapel

@ThomasHoevel: is there an update on this? For https://github.com/stephanstapel/ZUGFeRD-csharp, we also need to be able to write custom XMP data.

@chrislanzara

chrislanzara commented Dec 17, 2024

Hi, I've been looking into @stephanstapel's ZUGFeRD-csharp library (thanks @stephanstapel!) for the last few days, wondering why validation was failing after I added my own Metadata. I ultimately tracked it down to the same "feature" in the public master branch, which led me to this PR, so I wanted to add a +1 to this request.
I am not as familiar with this as much of the team working on this PR, but I would support adding a flag the developer can use to control whether PDFSharp outputs its own Metadata or lets the developer output it manually and take full control (and responsibility).
Is there any way forward with this, please? Alternatively, is there a way of deleting existing Metadata sections after the final document has been created, so that the developer can replace them with their own?
Thanks!

Update: To add a bit of clarity to my comments above, I took ZUGFeRD-csharp, re-pointed it to the master branch version of PDFSharp (also cloned locally), and stepped through the code in Visual Studio.
If I comment out line 438 of src/foundation/src/PDFsharp/src/PdfSharp/Pdf/PdfDocument.cs so that it doesn't execute, i.e.

// Catalog.Elements.SetReference(PdfCatalog.Keys.Metadata, new PdfMetadata(this));

and continue to add my own Metadata, then I can generate a PDF file that passes ZUGFeRD validation.
In my particular case, the metadata produced by this line is the problem, because validation picks up this version rather than the metadata I added, which contains the required fields. Simply removing this line resolves my issue. From an architectural point of view, however, I fully agree that letting the developer configure this (which is the solution proposed in this PR) is the correct way forward.

@jjtilly

jjtilly commented Jan 7, 2025

How come this is closed? We would also want this feature. It is present in other libraries, such as PdfSharp-Extended, but we don't want to mix libraries. We love this one :)

@iMoppel

iMoppel commented Jan 7, 2025

It would be very helpful if this were fixed!

@ThomasHoevel
Member

> How come this is closed?

This is not closed.

> We would also want this feature. It is present in other libraries, such as PdfSharp-Extended, but we don't want to mix libraries. We love this one :)

We have it on our list. We want a clean solution, not a quick hack that solves one issue, but leads to other issues.

@stephanstapel

> > How come this is closed?
>
> This is not closed.
>
> > We would also want this feature. It is present in other libraries, such as PdfSharp-Extended, but we don't want to mix libraries. We love this one :)
>
> We have it on our list. We want a clean solution, not a quick hack that solves one issue, but leads to other issues.

Hi @ThomasHoevel,

That is great to hear, thanks a lot for your answer. Is there anything I/we can do to support you?

@confix

confix commented Jul 28, 2025

Any updates on this one? It would be really helpful. Is there any way I can help?


9 participants