Deletion of a Part breaks reading other Parts through "using" method #1729
Comments
Hi @Asbjoedt, could you be more specific about which parts you mean:
When you say "read any Part after deletion of some other Parts", do you mean reading any Part at all, even the WorkbookPart? And which parts do you mean by "deletion of some other Parts"? |
When I delete these parts, I get "Specified part does not exist in the package", whenever I try to read any Part again:
|
please supply a repro so that we can try it out |
Sample spreadsheet with a data connection
Use this code to remove a data connection:
// Remove data connections
public int Remove_DataConnections(string filepath)
{
int success = 0;
using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(filepath, true))
{
ConnectionsPart conn = spreadsheet.WorkbookPart.ConnectionsPart;
// Count connections
success = conn.Connections.Count();
// Delete all connections
spreadsheet.WorkbookPart.DeletePart(conn);
// Delete all QueryTableParts
IEnumerable<WorksheetPart> worksheetParts = spreadsheet.WorkbookPart.WorksheetParts;
foreach (WorksheetPart worksheetPart in worksheetParts)
{
// Delete all QueryTableParts in WorksheetParts
List<QueryTablePart> queryTables = worksheetPart.QueryTableParts.ToList(); // Must be a list
foreach (QueryTablePart queryTablePart in queryTables)
{
worksheetPart.DeletePart(queryTablePart);
}
// Delete all QueryTableParts, if they are not registered in a WorksheetPart
List<TableDefinitionPart> tableDefinitionParts = worksheetPart.TableDefinitionParts.ToList();
foreach (TableDefinitionPart tableDefinitionPart in tableDefinitionParts)
{
List<IdPartPair> idPartPairs = tableDefinitionPart.Parts.ToList();
foreach (IdPartPair idPartPair in idPartPairs)
{
if (idPartPair.OpenXmlPart.ToString() == "DocumentFormat.OpenXml.Packaging.QueryTablePart")
{
// Delete QueryTablePart
tableDefinitionPart.DeletePart(idPartPair.OpenXmlPart);
// The TableDefinitionPart must also be deleted
worksheetPart.DeletePart(tableDefinitionPart);
// And the reference to the TableDefinitionPart in the WorksheetPart must be deleted
List<TablePart> tableParts = worksheetPart.Worksheet.Descendants<TablePart>().ToList();
foreach (TablePart tablePart in tableParts)
{
if (idPartPair.RelationshipId == tablePart.Id)
tablePart.Remove();
}
}
}
}
}
// If spreadsheet contains a CustomXmlMappingsPart, delete databinding
if (spreadsheet.WorkbookPart.CustomXmlMappingsPart != null)
{
CustomXmlMappingsPart xmlMap = spreadsheet.WorkbookPart.CustomXmlMappingsPart;
List<Map> maps = xmlMap.MapInfo.Elements<Map>().ToList(); // Must be a list
foreach (Map map in maps)
{
if (map.DataBinding != null)
map.DataBinding.Remove();
}
}
}
return success;
}
Then immediately use this code to read all hyperlink relationships of the same spreadsheet. It should fail with the error System.InvalidOperationException: 'Specified part does not exist in the package.'
// Extract all cell hyperlinks to an external file
public int Extract_Hyperlinks(string filepath)
{
int hyperlinks_count = 0;
// Read spreadsheet
using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(filepath, false))
{
// Find all hyperlinks
List<HyperlinkRelationship> hyperlinks = spreadsheet.GetAllParts().SelectMany(p => p.HyperlinkRelationships).ToList();
// Create metadata file
string folder = System.IO.Path.GetDirectoryName(filepath);
using (StreamWriter w = File.AppendText($"{folder}\\orgFile_Metadata.txt"))
{
w.WriteLine("---");
w.WriteLine("EXTRACTED HYPERLINKS");
w.WriteLine("---");
foreach (HyperlinkRelationship hyperlink in hyperlinks)
{
// Write information to metadata file
w.WriteLine(hyperlink.Uri);
// Add to count
hyperlinks_count++;
}
}
}
return hyperlinks_count;
}
The exception will be thrown at:
// Find all hyperlinks
List<HyperlinkRelationship> hyperlinks = spreadsheet.GetAllParts().SelectMany(p => p.HyperlinkRelationships).ToList(); |
Hi @mikeebowen, @twsouthwick |
Hi @Asbjoedt, I tested your code with the file you linked, but that file does not contain any hyperlinks, so
List<HyperlinkRelationship> hyperlinks = spreadsheet.GetAllParts().SelectMany(p => p.HyperlinkRelationships).ToList();
produces an empty list. It does create the .txt file without error. We have had a few releases since this issue was opened, so can you update to the latest version of the SDK and try your code again? If it still causes an error, please link the file in question and I'll investigate more. |
Hi @mikeebowen
BookViews bookViews = spreadsheet.WorkbookPart.Workbook.GetFirstChild<BookViews>();
Here is the full code sample the excerpt is from.
// Make first sheet active sheet
public bool Activate_FirstSheet(string filepath)
{
bool success = false;
using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(filepath, true))
{
BookViews bookViews = spreadsheet.WorkbookPart.Workbook.GetFirstChild<BookViews>();
WorkbookView workbookView = bookViews.GetFirstChild<WorkbookView>();
if (workbookView.ActiveTab != null)
{
var activeSheetId = workbookView.ActiveTab.Value;
if (activeSheetId > 0)
{
// Set value in workbook.xml to first sheet
workbookView.ActiveTab.Value = 0;
// Iterate all worksheets to detect if sheetview.Tabselected exists and change it
IEnumerable<WorksheetPart> worksheets = spreadsheet.WorkbookPart.WorksheetParts;
foreach (WorksheetPart worksheet in worksheets)
{
SheetViews sheetviews = worksheet.Worksheet.SheetViews;
foreach (SheetView sheetview in sheetviews)
{
sheetview.TabSelected = null;
}
}
success = true;
}
}
}
return success;
}
The call stack log is this; see below. I notice there is something about a malformed URI.
I just made a branch of my code, upgraded to the latest v3, and still get the error in my codebase across multiple test files with different content. The common denominator seems to be that I have removed a "Part" using the SDK.
System.InvalidOperationException: 'Specified part does not exist in the package.' |
Hi @Asbjoedt, Thanks for updating and trying it again. Unfortunately, I copied your code, and it runs without errors for me with the .xlsx you provided. The stack trace you provided looks like it is calling several other methods besides the one provided:
Is it possible to link to the whole project? I think the issue might be somewhere outside of the methods you already gave. |
Yes, of course. This is the main project, running fine on v2: https://github.com/Asbjoedt/CLISC
I have this folder with 50 spreadsheets as sample data: https://github.com/Asbjoedt/CLISC/blob/oxmlsdk-v3-upgrade-new-try/Docs/SampleData.zip
If you dare to make a fork of the branch, you can run it against the SampleData folder and should encounter the error based on these debug properties:
However, it runs with some dependencies. It should be able to run with only LibreOffice installed, which is used for some command-line file format conversion flows. https://github.com/Asbjoedt/CLISC?tab=readme-ov-file#dependencies
Please do not give me feedback on how shitty my code is. 😆 If you expected something different than the above or have any questions, let me know. |
Hi @Asbjoedt, Part of the problem is that you need to use the null-conditional operators:
if (spreadsheet.WorkbookPart?.Workbook?.WorkbookProtection != null || spreadsheet.WorkbookPart?.Workbook?.FileSharing != null)
Looking at your project, you have the same issue in multiple places. I see that you are using .NET 6.0, which is out of support. If you update your project to .NET 8, these issues will show as warnings in Visual Studio. For more about these types of errors, please see Resolve nullable warnings. There are a few other issues that can be resolved by using the ?? and ??= operators (the null-coalescing operators). I resolved the nullable reference issues in my fork of your project.
The other issue, the one causing the 'Specified part does not exist in the package.' error, is that the SDK reads the rels file and tries to open a part that is listed as a relationship but has been removed from the package. I'm still looking into that.
Could you explain more about what you're trying to do with this project? |
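For readers unfamiliar with the operators mentioned above, here is a minimal sketch of `?.`, `??` and `??=` on a chain where any link may be null. The types `BookPart`, `Book` and `Protection` are hypothetical stand-ins invented for illustration; they are not Open XML SDK types, and this is not the code from the fork.

```csharp
#nullable enable

// Hypothetical stand-ins for an object graph where any link may be missing.
public sealed class Protection { }

public sealed class Book
{
    public Protection? Protection { get; set; }
    public string? Title { get; set; }
}

public sealed class BookPart
{
    public Book? Book { get; set; }
}

public static class NullSafeDemo
{
    // ?. evaluates to null as soon as any link in the chain is null,
    // so the comparison below cannot throw NullReferenceException.
    public static bool HasProtection(BookPart? part) =>
        part?.Book?.Protection != null;

    // ?? supplies a fallback value when the left-hand side is null.
    public static string TitleOrDefault(BookPart? part) =>
        part?.Book?.Title ?? "(untitled)";

    // ??= assigns only when the target is currently null.
    public static Book EnsureBook(BookPart part)
    {
        part.Book ??= new Book();
        return part.Book;
    }
}
```

With nullable reference types enabled, the compiler warns wherever a possibly-null link is dereferenced without one of these operators, which is how such spots show up as warnings in the IDE.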
Thanks for the feedback. That specific line always gave a NullReferenceException even though I put it in a try-catch. I spent a lot of time trying to fix it. In the end I turned off breaking on NullReferenceException in debug, because the compiled program would not throw it anyway. Yes, it should be bumped up to .NET 8 or 9. Let me know if you need anything more for debugging the "specified part" error.
Hmm, a couple of years ago I worked for the Danish National Archives, and the standard was to convert any spreadsheet to TIFF/JPEG 2000 digital image representations to make sure the data could be read many years from now. The approach was to make the data static and remove as many technical dependencies as possible. The downside was that reusability for a future archival user would be severely reduced. There were also problems with data loss, so questions could be raised regarding authenticity.
I made the project as a demonstration/prototype to try to convince colleagues and management that the modern Office formats were mature enough to archive, provided the same concerns were addressed: locking the data down and reducing technical dependencies. To show the feasibility, I thought I had to implement the SDKs for OOXML and OpenDocument, because any conversion and data manipulation would have to be done programmatically for 99.9% (or so) of the files, leaving some residual files for manual handling.
I got management, a few other national archives, and the interest organisation Open Preservation Foundation on board to make a validator for the OpenDocument format. You can read more about that here. Working with the OOXML file formats was not scoped out of the project, but it was not the focus either, and I think that with my departure the focus became very much to just finish the OpenDocument validator.
I have completed most of the stuff I set out to prototype, and I wrapped it up very recently by accepting a dependence on v2 of the Open XML SDK. However, if the error could be fixed, the project could bump to the latest v3. I would have some time left over to see if there was any stuff I made along the way that I could feed back to the SDK, for example by taking a look at this one: #1209. Be aware that this is just a hobby for me now and no one else has shown any interest in the project. Let me know if any of the above makes sense to you or not. The story became longer than I expected lol. |
Describe the bug
Hello
I am upgrading from v2.20 to v3.0.2.
The new SDK update breaks code that involves deleting/removing some Open XML Parts in my spreadsheet document.
I am applying "using" to read and write the spreadsheet document. After deleting some Parts, I receive an error when I try to read any Part in a new "using" scope. The error is "Specified part does not exist in the package".
Opening the spreadsheet document with Excel or LibreOffice works perfectly fine.
Observed behavior
Open XML SDK perceives the spreadsheet document to be broken with error "Specified part does not exist in the package" whenever I try to read any Part, if some Parts have previously been deleted/removed with the "using" method. Excel renders the spreadsheet document without errors.
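One way to investigate a file in this state is to check whether any relationship in the package still points at a part that is no longer there, which is the cause suggested elsewhere in this thread. The sketch below is an assumption-laden diagnostic, not part of the Open XML SDK: it uses only `System.IO.Packaging` (a NuGet package on modern .NET), the class name `DanglingRelationshipCheck` is made up for illustration, and it has not been confirmed to reproduce this particular error.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;

public static class DanglingRelationshipCheck
{
    // Returns one "source -> target" entry for every internal relationship
    // whose target part is missing from the package.
    public static List<string> Find(Package package)
    {
        var missing = new List<string>();

        // Package-level relationships (stored in /_rels/.rels).
        Collect(package, new Uri("/", UriKind.Relative), package.GetRelationships(), missing);

        // Part-level relationships (e.g. /xl/_rels/workbook.xml.rels).
        foreach (PackagePart part in package.GetParts())
        {
            // Relationship parts cannot own relationships themselves, so skip them.
            if (PackUriHelper.IsRelationshipPartUri(part.Uri)) continue;
            Collect(package, part.Uri, part.GetRelationships(), missing);
        }
        return missing;
    }

    // Convenience overload: opens the file read-only and runs the check.
    public static List<string> Find(string path)
    {
        using (Package package = Package.Open(path, FileMode.Open, FileAccess.Read))
        {
            return Find(package);
        }
    }

    private static void Collect(Package package, Uri source,
        IEnumerable<PackageRelationship> relationships, List<string> missing)
    {
        foreach (PackageRelationship rel in relationships)
        {
            // External targets (such as hyperlinks) are not parts of the package.
            if (rel.TargetMode != TargetMode.Internal) continue;

            Uri target = PackUriHelper.ResolvePartUri(source, rel.TargetUri);
            if (!package.PartExists(target))
                missing.Add($"{source} -> {target}");
        }
    }
}
```

Calling `DanglingRelationshipCheck.Find(filepath)` after the deletion step and printing the result would show whether a leftover relationship entry exists. An empty list would suggest the cause lies elsewhere, for example in the malformed URI mentioned in the stack trace.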
Expected behavior
Desktop (please complete the following information):