Skip to content

ChangeDocumentType (.xlsm) to (.xlsx) not removing /xl/vbaProject.bin #1209

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Asbjoedt opened this issue Sep 24, 2022 · 13 comments
Open

ChangeDocumentType (.xlsm) to (.xlsx) not removing /xl/vbaProject.bin #1209

Asbjoedt opened this issue Sep 24, 2022 · 13 comments

Comments

@Asbjoedt
Copy link
Contributor

Asbjoedt commented Sep 24, 2022

Describe the bug
Using ChangeDocumentType from MacroEnabledWorkbook (.xlsm) to Workbook (.xlsx) results in the error in Excel and spreadsheet cannot be opened (loosely translated from another language):
"The files does not have any macros, but it contains macro-based content"

When looking at the XML of the new .xlsx spreadsheet, the spreadsheet contains the file /xl/vbaProject.bin, which I think it should not. When removing this file, Excel will no longer give error preventing the user from opening the spreadsheet.

I use this code to change document type:

public void Convert_to_OOXML_Transitional(string input_filepath, string output_filepath)
{
    byte[] byteArray = File.ReadAllBytes(input_filepath);
    using (MemoryStream stream = new MemoryStream())
    {
        stream.Write(byteArray, 0, (int)byteArray.Length);
        using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(stream, true))
        {
            spreadsheet.ChangeDocumentType(SpreadsheetDocumentType.Workbook);
        }
        File.WriteAllBytes(output_filepath, stream.ToArray());
    }
}

I can fix the problem with the following code:

public void Remove_VBA(string filepath)
{
    using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(filepath, true))
    {
        VbaProjectPart vba = spreadsheet.WorkbookPart.VbaProjectPart;
        if (vba != null)
        {
            spreadsheet.WorkbookPart.DeletePart(vba);
        }
    }
}

The problem can also relate to MacroEnabledTemplate (.xltm) spreadsheets but I have not tested this.

Even though I can fix the problem in my code, I think this should be fixed in Open XML SDK.

Screenshots
If applicable, add screenshots to help explain your problem.

To Reproduce

  1. Run the ChangeDocumentType code on my linked data sample
  2. Run the fix to see if it works

Observed behavior
VBA project file /xl/vbaProject.bin is not removed from spreadsheet when using ChangeDocumentType from .xlsm to .xlsx. The problem could also relate to MacroEnabled Temaplte (.xltm), but I have not checked this.

Expected behavior
It should be removed, because Excel gives error when trying to open spreadsheet that has been changed from .xlsm to .xlsx. When removing /xl/vbaProject.bin Excel does no longer give error.

Desktop (please complete the following information):

  • OS: Windows 11
  • Office version 2208
  • .NET Target: .NET Core 6.0.9
  • DocumentFormat.OpenXml Version: 2.18.0

Additional context
I refer to data sample (.xlsm) that has this error, if you try to ChangeDocumentType.
https://github.com/Asbjoedt/CLISC/blob/master/Docs/Example.xlsm

@Asbjoedt
Copy link
Contributor Author

UPDATE
I have found another .xlsm file resulting in this error, but it does not contain /xl/vbaProject.bin.

This spreadsheet contains content, which Excel informs the user when using Excel to save as .xlsx is: "Excel 4.0 function is saved in defined names" and asks to click OK to proceed with removing them.

I have looked at the XML and workbook.xml contains two defined names containing "GET.CELL", which by doing an internet search confirms is an old Excel 4.0 function.

This can be removed by using this code

using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(output_filepath, true))
{
    if (spreadsheet.WorkbookPart.Workbook.DefinedNames != null)
    {
        var definednames = spreadsheet.WorkbookPart.Workbook.DefinedNames.ToList();
        foreach (DocumentFormat.OpenXml.Spreadsheet.DefinedName definedname in definednames)
        {
            if (definedname.InnerXml.Contains("GET.CELL"))
            {
                definedname.Remove();
            }
        }
    }
}

This should also be fixed in Open XML SDK and maybe there is more defined names from Excel 4.0 etc. out there which must be removed? Removing all defined names is not an approach to catch all these possibilities.

I refer to data sample here:
https://github.com/Asbjoedt/CLISC/blob/master/Docs/Example2.xlsm

@tomjebo
Copy link
Collaborator

tomjebo commented Sep 26, 2022

@Asbjoedt Thanks for reporting this. I'll take a look at your repro and see what's happening.

@tomjebo tomjebo self-assigned this Sep 26, 2022
@tomjebo tomjebo added the bug label Sep 26, 2022
@tomjebo
Copy link
Collaborator

tomjebo commented Sep 27, 2022

@Asbjoedt So I see where we might be able to fix the VBA project part issue. But the Excel 4.0 cell/formula content might be a little trickier as it means analyzing the content of cells but I'm guessing we have a list of what formulas are valid for macro-free worksheets.

@Asbjoedt
Copy link
Contributor Author

Thanks for looking into this.

@tomjebo
Copy link
Collaborator

tomjebo commented Sep 29, 2022

@Asbjoedt thinking about the second case (i.e. GET.CELL) I would prefer the SDK not do this kind of processing. There may be many other situations like this but we typically don't do much validation or scanning of content (in this case definedName) values. I'm not sure why Excel only balks at this after converting from macro to xlsx but this seems like something that is part of the application (consumer/producer) responsibility to ensure. Others may have opinions on this for the SDK so I'll leave this open for them to comment.

@Asbjoedt
Copy link
Contributor Author

Hi @tomjebo
I can see your point.

However, I think it would be great with a Repair() method to Open XML SDK to enable a program to fix these issues, much like Excel has a repair function when manually opening spreadsheets with invalid content.

Then we can apply these to a spreadsheet, you want to normalize to .xlsx:

  • ChangeDocumentType()
  • RepairDocument()
  • SDK Validator

Then put spreadsheet to longterm storage (archive).

To give another example of this, I can scan and remove any RTD (RealTimeData) formulas from a spreadsheet, but RTD formulas also create VolatileDependenciesPart (only RTD formulas create this part), which I then have to remove by coding DeletePart into my Remove_RTD() function, otherwise Excel will give messagebox saying this part exists and it should not and then Excel can remove it for you, if you allow. The repair function in Excel would be great to have for Open XML SDK, if certain invalid or unnecessary parts exist (or necessary parts don't exist). A method for this can be a work in progress thing, that grows better over time.

I don't have enough experience to give you a PR on a Repair() method, if you think this is a good idea.

@twsouthwick
Copy link
Member

@Asbjoedt kind of late to the party here, but I'm going to take a slightly different tack on how we can deal with this. A few years back, I added an extensibility point based on ASP.NET Core features that allow you to customize behavior.

What I propose is the following:

  1. We add an interface to handle changing document type such as the following:
public interface IChangeDocumentType
{
  void ChangeDocument(SpreadsheetDocumentType type);
}
  1. We provide a basic implementation that is low cost, but doesn't necessarily handle all cases
  2. In your code, you can implement whatever strategy you want to handle changes. You'll be able to replace the built-in feature by package.Features.Set<IChangeDocumentType>(new MyImplementation())
  3. After you've tested the behavior and have a list of things that need to be addressed, we may consider bringing that into the default implementation

@Asbjoedt
Copy link
Contributor Author

Asbjoedt commented Nov 16, 2022

@twsouthwick. Hi, I think that would be a good approach.

My only comment would be to add parameter for changing conformance. It would be somthing like:

public interface IChangeDocumentType
{
  void ChangeDocument(SpreadsheetDocumentType type, ConformanceClass enum);
}

Reference to ConformanceClass enum options

Excel can currently only save .xlsx as Strict conformance and not .xlsm, .xltm, .xlam and .xltx. Whether your MVP is to only support changing conformance for .xlsx to and from Strict/Transitional, that would be enough for our specific needs.

@twsouthwick
Copy link
Member

twsouthwick commented Nov 22, 2022

After thinking a bit more, I'd like this to be a more general purpose API and keep it separate (at least from an API perspective) from conformance level.

At a high level, here's the API I'm thinking we should expose:

public readonly struct DocumentType
{
  public string Extension { get; }
}

public interface IDocumentTypeFeature
{
  public DocumentType DocumentType { get; set; }
}

public class OpenXmlPackage
{
  public DocumentType DocumentType => Features.GetRequired<IDocumentTypeFeature>().DocumentType;
}

We would then need some convenienve methods to convert between the generic DocumentType and the excel/powerpoint/word specific types.

NOTE: This is just the design for what the public API should probably look like. Once we're comfortable with that, the actual implementation can be done.

@twsouthwick
Copy link
Member

@Asbjoedt I went with just the typed document type for now. I've enabled that to be overridden with #1263. Can you take a stab at seeing if you can implement it in a way that would solve your problem?

The conformance part is a separate (but related) issue. Let's start with making ChangeDocumentType more robust

@Asbjoedt
Copy link
Contributor Author

@twsouthwick yes I will take a stab.

Can you push this one into the next release, as soon as possible.

I will have follow up questions on how to do my own implementation.

@AlfredHellstern
Copy link
Collaborator

@Asbjoedt could you please confirm the fix from #1263? And above you mentioned you would work on implementation in a way that would solve your problem? Has that happened?

@Asbjoedt
Copy link
Contributor Author

Asbjoedt commented Apr 7, 2025

Hi @AlfredHellstern
Thanks for replying.
I cannot use any of the v3 versions of the Open XML SDK, due to this issue: #1729
Therefore I am no longer pursuing any developments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants