Deletion of a Part breaks reading other Parts through "using" method #1729

Open
Asbjoedt opened this issue May 23, 2024 · 11 comments
@Asbjoedt
Contributor

Asbjoedt commented May 23, 2024

Describe the bug
Hello
I am upgrading from v2.20 to v3.0.2.

The new SDK update breaks code which involves deleting/removing some Open XML Parts in my spreadsheet document.

I use "using" blocks to read and write the spreadsheet document. After deleting some Parts in one "using" scope, reading any Part in a new "using" scope fails with the error "Specified part does not exist in the package".

Opening the spreadsheet document with Excel or LibreOffice works perfectly fine.

Observed behavior
Open XML SDK perceives the spreadsheet document to be broken with error "Specified part does not exist in the package" whenever I try to read any Part, if some Parts have previously been deleted/removed with the "using" method. Excel renders the spreadsheet document without errors.

Expected behavior

Desktop (please complete the following information):

  • OS: Windows 11
  • Office version: Office Professional 2019
  • .NET Target:
  • DocumentFormat.OpenXml Version: 3.0.2
@mikeebowen
Collaborator

Hi @Asbjoedt, could you be more specific about which parts you mean:

I use "using" blocks to read and write the spreadsheet document. After deleting some Parts in one "using" scope, reading any Part in a new "using" scope fails with the error "Specified part does not exist in the package".

When you say "read any Part after deletion of some other Parts", do you mean any Part at all, even the WorkbookPart? And which parts do you mean by "deletion of some other Parts"?

@Asbjoedt
Contributor Author

When I delete these parts, I get "Specified part does not exist in the package", whenever I try to read any Part again:

  • ImagePart
  • ExternalWorkbookPart
  • CalculationChainPart
  • ConnectionsPart
  • TableDefinitionPart
  • SpreadsheetPrinterSettingsPart
  • VolatileDependenciesPart

@twsouthwick
Member

Please supply a repro so that we can try it out.

@Asbjoedt
Contributor Author

Asbjoedt commented Jun 14, 2024

Sample spreadsheet with a data connection
With data connection.xlsx

Use this code to remove a data connection

// Remove data connections
public int Remove_DataConnections(string filepath)
{
    int success = 0;

    using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(filepath, true))
    {
        ConnectionsPart conn = spreadsheet.WorkbookPart.ConnectionsPart;

        // Count connections
        success = conn.Connections.Count();

        // Delete all connections
        spreadsheet.WorkbookPart.DeletePart(conn);

        // Delete all QueryTableParts
        IEnumerable<WorksheetPart> worksheetParts = spreadsheet.WorkbookPart.WorksheetParts;
        foreach (WorksheetPart worksheetPart in worksheetParts)
        {
            // Delete all QueryTableParts in WorksheetParts
            List<QueryTablePart> queryTables = worksheetPart.QueryTableParts.ToList(); // Must be a list
            foreach (QueryTablePart queryTablePart in queryTables)
            {
                worksheetPart.DeletePart(queryTablePart);
            }

            // Delete all QueryTableParts, if they are not registered in a WorksheetPart
            List<TableDefinitionPart> tableDefinitionParts = worksheetPart.TableDefinitionParts.ToList();
            foreach (TableDefinitionPart tableDefinitionPart in tableDefinitionParts)
            {
                List<IdPartPair> idPartPairs = tableDefinitionPart.Parts.ToList();
                foreach (IdPartPair idPartPair in idPartPairs)
                {
                    if (idPartPair.OpenXmlPart.ToString() == "DocumentFormat.OpenXml.Packaging.QueryTablePart")
                    {
                        // Delete QueryTablePart
                        tableDefinitionPart.DeletePart(idPartPair.OpenXmlPart);
                        // The TableDefinitionPart must also be deleted
                        worksheetPart.DeletePart(tableDefinitionPart);
                        // And the reference to the TableDefinitionPart in the WorksheetPart must be deleted
                        List<TablePart> tableParts = worksheetPart.Worksheet.Descendants<TablePart>().ToList();
                        foreach (TablePart tablePart in tableParts)
                        {
                            if (idPartPair.RelationshipId == tablePart.Id)
                                tablePart.Remove();
                        }
                    }
                }
            }
        }

        // If spreadsheet contains a CustomXmlMappingsPart, delete databinding
        if (spreadsheet.WorkbookPart.CustomXmlMappingsPart != null)
        {
            CustomXmlMappingsPart xmlMap = spreadsheet.WorkbookPart.CustomXmlMappingsPart;
            List<Map> maps = xmlMap.MapInfo.Elements<Map>().ToList(); // Must be a list
            foreach (Map map in maps)
            {
                if (map.DataBinding != null)
                    map.DataBinding.Remove();
            }
        }
    }
    return success;
}

Then immediately use this code to read all hyperlink relationships of the same spreadsheet. It should fail with error System.InvalidOperationException: 'Specified part does not exist in the package.'

// Extract all cell hyperlinks to an external file
public int Extract_Hyperlinks(string filepath)
{
    int hyperlinks_count = 0;

    // Read spreadsheet
    using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(filepath, false))
    {
        // Find all hyperlinks
        List<HyperlinkRelationship> hyperlinks = spreadsheet.GetAllParts().SelectMany(p => p.HyperlinkRelationships).ToList();

        // Create metadata file
        string folder = System.IO.Path.GetDirectoryName(filepath);
        using (StreamWriter w = File.AppendText($"{folder}\\orgFile_Metadata.txt"))
        {
            w.WriteLine("---");
            w.WriteLine("EXTRACTED HYPERLINKS");
            w.WriteLine("---");

            foreach (HyperlinkRelationship hyperlink in hyperlinks)
            {
                // Write information to metadata file
                w.WriteLine(hyperlink.Uri);
                // Add to count
                hyperlinks_count++;
            }
        }
    }
    return hyperlinks_count;
}

The exception will be thrown at:

// Find all hyperlinks
List<HyperlinkRelationship> hyperlinks = spreadsheet.GetAllParts().SelectMany(p => p.HyperlinkRelationships).ToList();

@Asbjoedt
Contributor Author

Asbjoedt commented Jul 4, 2024

Hi @mikeebowen, @twsouthwick
Have you been able to look any further at the issue?
Can I help to progress issue resolution?

@mikeebowen
Collaborator

Hi @Asbjoedt, I tested your code with the file you linked, but that file does not contain any HyperlinkRelationships, so

 List<HyperlinkRelationship> hyperlinks = spreadsheet.GetAllParts().SelectMany(p => p.HyperlinkRelationships).ToList();

produces an empty list. It does create the .txt file without error. We have had a few releases since this issue was opened, so could you update to the latest version of the SDK and try your code again? If it still causes an error, please link the file in question and I'll investigate more.

@Asbjoedt
Contributor Author

Asbjoedt commented Apr 8, 2025

Hi @mikeebowen
You are right. There are no hyperlinks. Replace the hyperlink code with the example below; the error should appear on this line.

BookViews bookViews = spreadsheet.WorkbookPart.Workbook.GetFirstChild<BookViews>();

Here is the full code sample the excerpt is from.

// Make first sheet active sheet
public bool Activate_FirstSheet(string filepath)
{
    bool success = false;

    using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(filepath, true))
    {
        BookViews bookViews = spreadsheet.WorkbookPart.Workbook.GetFirstChild<BookViews>();
        WorkbookView workbookView = bookViews.GetFirstChild<WorkbookView>();
        if (workbookView.ActiveTab != null)
        {
            var activeSheetId = workbookView.ActiveTab.Value;
            if (activeSheetId > 0)
            {
                // Set value in workbook.xml to first sheet
                workbookView.ActiveTab.Value = 0;

                // Iterate all worksheets to detect if sheetview.Tabselected exists and change it
                IEnumerable<WorksheetPart> worksheets = spreadsheet.WorkbookPart.WorksheetParts;
                foreach (WorksheetPart worksheet in worksheets)
                {
                    SheetViews sheetviews = worksheet.Worksheet.SheetViews;
                    foreach (SheetView sheetview in sheetviews)
                    {
                        sheetview.TabSelected = null;
                    }
                }
                success = true;
            }
        }
    }
    return success;
}

The call stack is below. I notice it mentions malformed URI handling.

System.InvalidOperationException
  HResult=0x80131509
  Message=Specified part does not exist in the package.
  Source=System.IO.Packaging
  StackTrace:
   at System.IO.Packaging.Package.GetPart(Uri partUri)
   at DocumentFormat.OpenXml.Features.PackageFeatureBase.DocumentFormat.OpenXml.Packaging.IPackage.GetPart(Uri uriTarget)
   at DocumentFormat.OpenXml.Packaging.PackageUriHandlingExtensions.MalformedUriHandlingPackage.GetPart(Uri uriTarget)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPart.Load(OpenXmlPackage openXmlPackage, OpenXmlPart parent, Uri uriTarget, String id)
   at DocumentFormat.OpenXml.Packaging.PartRelationshipsFeature.LoadReferencedPartsAndRelationships()
   at DocumentFormat.OpenXml.Packaging.PartRelationshipsFeature.DocumentFormat.OpenXml.Features.IPartRelationshipsFeature.get_Count()
   at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.LoadAllParts()
   at DocumentFormat.OpenXml.Features.StrictNamespaceExtensions.StrictNamespaceFeature.DocumentFormat.OpenXml.Features.IStrictNamespaceFeature.get_Found()
   at DocumentFormat.OpenXml.Packaging.OpenXmlPart.LoadDomTree[T]()
   at DocumentFormat.OpenXml.Packaging.WorkbookPart.get_Workbook()
   at CLISC.Archive_Requirements.Activate_FirstSheet(String filepath) in C:\Users\asbjoedt\source\repos\Asbjoedt\CLISC\CLISC\Archive_Requirements_Change.cs:line 410
   at CLISC.Archive_Requirements.Change_XLSX_Requirements(List`1 arcReq, String filepath, Boolean fullcompliance) in C:\Users\asbjoedt\source\repos\Asbjoedt\CLISC\CLISC\Archive_Requirements_Change.cs:line 80
   at CLISC.Archive.Archive_Spreadsheets(String Results_Directory, List`1 File_List, Boolean fullcompliance) in C:\Users\asbjoedt\source\repos\Asbjoedt\CLISC\CLISC\Archive.cs:line 95
   at CLISC.Program_Real.Execute(String function, String inputdir, String outputdir, Boolean recurse, Boolean fullcompliance) in C:\Users\asbjoedt\source\repos\Asbjoedt\CLISC\CLISC\Program_Real.cs:line 57
   at CLISC.Program.Run(Program_Args Arg) in C:\Users\asbjoedt\source\repos\Asbjoedt\CLISC\CLISC\Program.cs:line 56
   at CommandLine.ParserResultExtensions.WithParsed[T](ParserResult`1 result, Action`1 action)
   at CLISC.Program.Main(String[] args) in C:\Users\asbjoedt\source\repos\Asbjoedt\CLISC\CLISC\Program.cs:line 21

I just made a branch of my code, upgraded to the latest v3, and still get the error in my codebase across multiple test files with different content. The common denominator seems to be that I have removed a "Part" using the SDK.

System.InvalidOperationException: 'Specified part does not exist in the package.'

@mikeebowen
Collaborator

mikeebowen commented Apr 8, 2025

Hi @Asbjoedt, Thanks for updating and trying it again. Unfortunately, I copied your code, and it runs without errors for me with the .xlsx you provided. The stack trace you provided looks like it is calling several other methods besides the one provided:

C:\Users\asbjoedt\source\repos\Asbjoedt\CLISC\CLISC\Archive_Requirements_Change.cs:line 410
   at CLISC.Archive_Requirements.Change_XLSX_Requirements(List`1 arcReq, String filepath, Boolean fullcompliance) in C:\Users\asbjoedt\source\repos\Asbjoedt\CLISC\CLISC\Archive_Requirements_Change.cs:line 80
   at CLISC.Archive.Archive_Spreadsheets(String Results_Directory, List`1 File_List, Boolean fullcompliance) in C:\Users\asbjoedt\source\repos\Asbjoedt\CLISC\CLISC\Archive.cs:line 95
   at CLISC.Program_Real.Execute(String function, String inputdir, String outputdir, Boolean recurse, Boolean fullcompliance) in C:\Users\asbjoedt\source\repos\Asbjoedt\CLISC\CLISC\Program_Real.cs:line 57
   at CLISC.Program.Run(Program_Args Arg) in C:\Users\asbjoedt\source\repos\Asbjoedt\CLISC\CLISC\Program.cs:line 56
   at CommandLine.ParserResultExtensions.WithParsed[T](ParserResult`1 result, Action`1 action)
   at CLISC.Program.Main(String[] args) in C:\Users\asbjoedt\source\repos\Asbjoedt\CLISC\CLISC\Program.cs:line 21

Is it possible to link to the whole project? I think the issue might be somewhere outside of the methods you already gave.

@Asbjoedt
Contributor Author

Asbjoedt commented Apr 8, 2025

Yes of course.

This is the main project running fine on v2: https://github.com/Asbjoedt/CLISC
This is a new branch running latest v3: https://github.com/Asbjoedt/CLISC/tree/oxmlsdk-v3-upgrade-new-try

I have this folder with 50 spreadsheets as sample data: https://github.com/Asbjoedt/CLISC/blob/oxmlsdk-v3-upgrade-new-try/Docs/SampleData.zip

If you dare to fork the branch, you can run it against the SampleData folder with these debug arguments and should encounter the error:

--function CountConvertCompareArchive --inputdir "C:\Users\%USERNAME%\Desktop\SampleData" --outputdir "C:\Users\%USERNAME%\Desktop" --recurse --fullcompliance

However, it has some dependencies. It should be able to run with only LibreOffice installed, which is used for some command-line file format conversion flows. https://github.com/Asbjoedt/CLISC?tab=readme-ov-file#dependencies

Please do not give me feedback on how shitty my code is. 😆

If you expected something different than the above or have any questions, let me know.

mikeebowen assigned mikeebowen and unassigned twsouthwick Apr 9, 2025
@mikeebowen
Collaborator

mikeebowen commented Apr 9, 2025

Hi @Asbjoedt,

Part of the problem is that you need to use the null-conditional operator ?. to avoid accessing properties on a null object. Change your condition to:

if (spreadsheet.WorkbookPart?.Workbook?.WorkbookProtection != null || spreadsheet.WorkbookPart?.Workbook?.FileSharing != null)

Looking at your project, you have the same issue in multiple places. I see that you are using .NET 6.0, which is out of support. If you update your project to .NET 8, these issues will show as warnings in Visual Studio. For more about these types of errors, please see Resolve nullable warnings.

There are a few other issues that can be resolved by using the ?? and ??= operators - the null-coalescing operators.
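As a rough illustration of the three operators mentioned above, here is a minimal sketch. The `Demo*` classes are hypothetical stand-ins, not Open XML SDK types; they only mimic a chain of properties where any link may be null.

```csharp
#nullable enable
using System;

// Hypothetical stand-in types (NOT the SDK's classes).
public class DemoWorkbook { public string? Protection { get; set; } }
public class DemoWorkbookPart { public DemoWorkbook? Workbook { get; set; } }
public class DemoDocument { public DemoWorkbookPart? WorkbookPart { get; set; } }

public static class NullOperatorsDemo
{
    // ?. short-circuits to null as soon as any link in the chain is null,
    // so no NullReferenceException is thrown.
    public static bool HasProtection(DemoDocument doc) =>
        doc.WorkbookPart?.Workbook?.Protection != null;

    // ?? supplies a fallback value when the left-hand side is null.
    public static string ProtectionOrDefault(DemoDocument doc) =>
        doc.WorkbookPart?.Workbook?.Protection ?? "none";

    // ??= assigns only when the target is currently null, then returns it.
    public static DemoWorkbookPart EnsureWorkbookPart(DemoDocument doc) =>
        doc.WorkbookPart ??= new DemoWorkbookPart();
}
```

With an empty `DemoDocument`, `HasProtection` returns false and `ProtectionOrDefault` returns the fallback instead of throwing.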

I resolved the nullable reference issues in my fork of your project.

The other issue, which causes the 'Specified part does not exist in the package.' error, is that the SDK reads the rels file and tries to open a part that is listed as a relationship but has been removed from the package. I'm still looking into that.
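One way to check a file for this condition without going through the SDK might be to walk the relationships with System.IO.Packaging directly and report internal targets that no longer exist. This is only a diagnostic sketch: the class and method names are made up for illustration, and it assumes the relationship targets are well-formed URIs (a malformed target could throw while being resolved).

```csharp
using System;
using System.Collections.Generic;
using System.IO.Packaging;

// Hypothetical helper, not part of the Open XML SDK.
public static class DanglingRelationshipCheck
{
    // Returns one entry per internal relationship whose target part is missing.
    public static List<string> FindDanglingRelationships(Package package)
    {
        var problems = new List<string>();
        var root = new Uri("/", UriKind.Relative);

        // Package-level relationships (stored in /_rels/.rels).
        foreach (PackageRelationship rel in package.GetRelationships())
            Check(package, root, rel, problems);

        // Part-level relationships (stored in each part's own .rels file).
        foreach (PackagePart part in package.GetParts())
        {
            // Relationship parts cannot own relationships; skip them.
            if (PackUriHelper.IsRelationshipPartUri(part.Uri))
                continue;

            foreach (PackageRelationship rel in part.GetRelationships())
                Check(package, part.Uri, rel, problems);
        }
        return problems;
    }

    private static void Check(Package package, Uri source,
                              PackageRelationship rel, List<string> problems)
    {
        // External targets (e.g. hyperlinks) are not parts of the package.
        if (rel.TargetMode != TargetMode.Internal)
            return;

        Uri target = PackUriHelper.ResolvePartUri(source, rel.TargetUri);
        if (!package.PartExists(target))
            problems.Add($"{source} -> {target} (id: {rel.Id})");
    }
}
```

To try it, open the workbook read-only with `Package.Open(filepath, FileMode.Open, FileAccess.Read)` and pass the result to `FindDanglingRelationships`; any entries returned would point at a .rels file that still references a deleted part.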

Could you explain more about what you're trying to do with this project?

@Asbjoedt
Contributor Author

Thanks for the feedback.

That specific line always gave NullReferenceException even though I put it in try-catch. I spent a lot of time trying to fix it. In the end I turned off the NullReferenceException in debug, because the compiled program would not throw the NullReferenceException anyway.
Thank you for providing me the fix for this and giving me the documentation to read up on the fix and understand it.

Yes, it should be bumped up to .NET 8 or 9.

Let me know if you need anything more for debugging the "specified part" error.

Hmm, a couple of years ago I worked for the Danish National Archives, and the standard was to convert any spreadsheet to TIFF/JPEG 2000 digital image representations to make sure the data could be read many years from now. The approach was to make the data static and remove as many technical dependencies as possible. The downside was that reusability for a future archival user would be severely reduced. There were also problems with data loss, so questions could be raised regarding authenticity.

I made the project as a demonstration/prototype to try to convince colleagues and management that the modern Office formats were mature enough to archive, provided the above considerations of locking data down and reducing technical dependencies were taken into account. To show the feasibility, I thought I had to implement the SDKs for OOXML and OpenDocument, because conversion and data manipulation would have to be done programmatically for 99.9% (or so) of the files, leaving some residual files for manual handling.

I got management, a few other national archives and the interest organisation Open Preservation Foundation on board to make a validator for the OpenDocument format. You can read more about that here. Working with the OOXML file formats was not out of scope, but it was not the focus of the project either, and I think with my departure the focus became very much to just finish the OpenDocument validator.

I have completed most of the stuff I set out to prototype, and I wrapped it up very recently by accepting a dependency on v2 of the Open XML SDK. However, if the error could be fixed, the project could bump to the latest v3. I would have some time left over to see if there was any stuff I made along the way that I could feed back to the SDK, for example by taking a look at this one: #1209. Be aware, this is just a hobby for me now and no one else has shown any interest in the project.

Let me know if any of the above makes sense to you or not. The story became longer, than I expected lol.
