Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #57 - Recover responsiveness to stdin and upgrade strOrFile #140

Open
wants to merge 25 commits into
base: master
Choose a base branch
from

Conversation

wbkboyer
Copy link
Collaborator

@wbkboyer wbkboyer commented Feb 10, 2025

Resolves #57

Added

  • c/samples/nauty_example.LEDA - handwritten LEDA format translation of the nauty example graph from section 20 (p.82) of the manual, which was used to manually test _ReadLEDAGraph()

Updated

Changes to the strOrFile container and supporting sf_[a-zA-Z]+() functions

c/graphLib/io/strOrFile.h
  • Updated strOrFile struct to include strBufP for theStr, which eliminates the need for theStrPos. Also adds containerType, which allows us to swiftly check whether the container is an INPUT_CONTAINER or OUTPUT_CONTAINER when validating whether a particular operation is applicable to the container type. Finally, the ungetBuf, a stackP, was added to manage unget operations on INPUT_CONTAINERs
  • Updated the declaration of sf_New() to indicate it now accepts a string for the filename to open, if applicable, and a string representing the ioMode, in addition to the optional theStr
  • Added declarations for helper functions required by the _Read[a-zA-Z]+() and _Write[a-zA-Z]+() functions
c/graphLib/io/strOrFile.c
  • sf_New() - If the strOrFile container is meant to contain a FILE *, it now handles opening the file corresponding to fileName with the correct ioMode (either READTEXT or WRITETEXT) and assigns to pFile. This logic also handles when the user specifies some input/output stream (i.e. stdin, stdout, stderr). Otherwise, if the container is meant to hold a string, it initializes a strBufP and appends the initial theStr (as a follow-up to this question, if theStr is not NULL and the containerType is OUTPUT_CONTAINER, then we error out). Regardless of the container type, an ungetBuf stackP is allocated, although only in the case of INPUT_CONTAINERs do we interact with it: it is used to contain characters that are ungotten (either by sf_ungetc() or sf_ungets()) and is consumed first when we sf_getc() or sf_fgets() (in this question, I asked whether the ungetBuf should only be allocated for INPUT_CONTAINER (i.e. when ioMode is READTEXT). Upon further reflection, even if it's not being used, we should allocate the ungetBuf so that there's no illegal memory access issues).
  • sf_getc() - if the ungetBuf stackP is nonempty, sp_Pop()s a char; otherwise, getc() from the FILE * or get the character at the current ReadPos of theStr (a strBufP).
  • sf_ReadSkipChar(), sf_ReadSkipWhitespace(), sf_ReadSingleDigit(), sf_ReadInteger(), and sf_ReadSkipInteger() all use sf_getc() to advance through the strOrFile container
  • sf_ReadSkipLineRemainder() uses sf_fgets() to advance through the file container and consume up to and including the next \n
  • sf_ungetc() -ensures the container is a valid INPUT_CONTAINER before sp_Push()ing theChar onto the ungetBuf stackP
  • sf_ungets() - ensures the container is a valid INPUT_CONTAINER before sp_Push()ing each character of strToUnget onto the ungetBuf stackP in reverse order so that they can be fetched from the ungetBuf in the order of the original string.
  • sf_fgets() - ensures the container is a valid INPUT_CONTAINER before it obtains as many characters as possible from the ungetBuf before switching over to the FILE * or strBufP in the strOrFile container (if no \n encountered and there's still characters desired by the caller)
  • sf_fputs() - ensures the container is a valid OUTPUT_CONTAINER before it calls fputs() if the container has a FILE * or sb_ConcatStr() if it contains theStr.
  • sf_takeTheStr - now uses sb_TakeString(), since theStr is a strBufP
  • sf_closeFile() - if the strOrFile contains a FILE *, it's only appropriate to sp_Free() the ungetBuf, since it's no longer valid.
  • sf_Free() - Now calls sb_Free() on theStr (a strBufP) rather than free() (since theStr used to correspond to a char *), and frees the ungetBuf stackP using sp_Free() if it has not yet been freed.

Changes to strOrFile dependencies

  • c/graphLib/io/strbuf.h - added new macro to sb_GetUnreadCharCount(): we only want to see how many characters have not yet been read (e.g. used to ensure there's at least one character available to read in sf_getc() and sf_fgets())
  • c/graphLib/lowLevelUtils/apiutils.h - added MAXCHARSFOR32BITINT so that, for example, sf_ReadInteger() can allocate sufficient memory for the intCandidateStr. Other functions needing to represent 32-bit integers as strings should use this same constant.

Using the strOrFile container for input and output

The original intent of this refactor was to restore being able to read from stdin, which was lost when we changed how we interrogated the input files (reading an entire line from file vs. reading a single character). As an added bonus of this refactor, the logic of gp_Read(FromString)?() and gp_Write(ToString)?() functions has been simplified and unified in the respective _(Read|Write)Graph() functions and the logic of the _(Read|Write)[a-zA-Z]+() functions no longer require branching logic to handle whether or not the (input|output)Str or (in|out)fileName was NULL.

c/graphLib/io/graphIO.c
  • gp_Read() - Now creates a strOrFileP to handle opening the file corresponding to FileName for READTEXT and calls _ReadGraph()
  • gp_ReadFromString() - Now creates a strOrFileP to contain the inputStr and calls _ReadGraph()
  • _ReadGraph() - Applies the relevant sf_[a-zA-Z]+() functions to interrogate the contents of the inputContainer to determine the input filetype and to call the corresponding _Read[a-zA-Z]+(). If extra data is allowed for the input type (i.e. only after successfully reading Adjacency List or Adjacency Matrix), create a strBufP to contain the extraData, which is read from the inputContainer line-by-line using sf_fgets() and concatenated using sb_ConcatString().
  • gp_Write() - Now creates a strOrFileP to handle opening the file corresponding to FileName for WRITETEXT and calls _WriteGraph()
  • gp_WriteToString() - Now creates a strOrFileP with theStr initially empty to send to _WriteGraph() alongside the pOutputStr; if writing to the outputContainer was successful AND if the outputContainer has not yet been cleaned up (as is the case for writing Adjacency List and Adjacency Matrix), then sf_takeTheStr() and assign the pointer to (*pOutputStr)
  • _WriteGraph() - Based on the Mode, determines which _Write[a-zA-Z]+() function should be called and passes in the (dereferenced) outputContainer. If we're writing to .g6, then _WriteGraphToG6StrOrFile() will have already taken the string from the outputContainer and assigned it to pOutputStr, so we null out the reference to outputContainer to prevent gp_WriteToString() from trying to take the string again from the outputContainer, or to double-free the outputContainer. Then, if extra data is allowed to be written (i.e. assuming we have run gp_Embed() on theGraph with a graph algorithm extension attached for which fpWritePostprocess() has been overridden, e.g. _DrawPlanar_WritePostprocess()), generates the extra data as a char * and uses sf_fputs() to write it to the outputContainer.
  • Function signatures for all of _ReadAdjMatrix(), _ReadAdjList(), _ReadLEDAGraph(), _WriteAdjList(), _WriteAdjMatrix(), and _WriteDebugInfo() all updated so that they accept only the input/output strOrFile container and perform the relevant sf_[a-zA-Z]+() functions for read/write operations.
c/graphLib/io/g6-read-iterator.c/.h
  • Deprecated beginG6ReadIterationFromG6FilePath(), beginG6ReadIterationFromG6FilePointer(), and beginG6ReadIterationFromG6String() in favour of the new beginG6ReadIterationFromG6StrOrFile()
  • Deprecated _ReadGraphFromG6FilePointer() and updated _ReadGraphFromG6FilePath() and _ReadGraphFromG6String() helper functions to call the new _ReadGraphFromG6StrOrFile() helper function.
  • The new beginG6ReadIterationFromG6StrOrFile() function validates the g6InputContainer and assigns it to the pG6ReadIterator before calling _beginG6ReadIteration()
  • _beginG6ReadIteration() now initializes charConfirmation to EOF (since value of EOF is implementation-dependent, although it's commonly set to -1)
  • endG6ReadIteration() now no longer "knows" whether the g6Input strOrFileP holds a string or a file, so remove the explicit call to sf_closeFile() and allow sf_Free() to clean that up.
  • _ReadGraphFromG6FilePath() now creates an inputContainer to handle opening the file corresponding to the pathToG6File before calling _ReadGraphFromG6StrOrFile()
  • _ReadGraphFromG6String() now creates an inputContainer to handle reading the graph from the g6EncodedString before calling _ReadGraphFromG6StrOrFile()
  • The new _ReadGraphFromG6StrOrFile() helper function accepts the g6InputContainer, ensures it's valid, and then allocates the G6ReadIterator before calling beginG6ReadIterationFromG6StrOrFile() and proceeding with the remaining flow (i.e. readGraphUsingG6ReadIterator(), endG6ReadIteration(), and freeG6ReadIterator())
c/graphLib/io/g6-write-iterator.c/.h
  • Deprecated beginG6WriteIterationToG6FilePath(), beginG6WriteIterationToG6FilePointer(), and beginG6WriteIterationToG6String() in favour of the new beginG6WriteIterationToG6StrOrFile()
  • Deprecated _WriteGraphToG6FilePointer() and updated _WriteGraphToG6FilePath() and _WriteGraphToG6String() helper functions to call the new _WriteGraphToG6StrOrFile() helper function.
  • The new beginG6WriteIterationToG6StrOrFile() function validates the outputContainer and assigns it to the pG6WriteIterator before calling _beginG6WriteIteration()
  • endG6WriteIteration() now no longer "knows" whether the g6Output strOrFileP holds a string or a file, so remove the explicit call to sf_closeFile() and allow sf_Free() to clean that up.
    • N.B. If the g6Output contains a string, it's expected that you must have sf_TakeTheString() before ending write iteration, otherwise theStr gets cleaned up.
  • _WriteGraphToG6FilePath() now creates an outputContainer to handle opening the file corresponding to the g6OutputFilename before calling _WriteGraphToG6StrOrFile() (which now owns that memory)
  • _WriteGraphToG6String() now creates an outputContainer to pass into _WriteGraphToG6StrOrFile() (owns that memory) alongside the pointer-pointer g6OutputStr
  • The new _WriteGraphToG6StrOrFile() helper function accepts the outputContainer, ensures it's valid, and then allocates the G6WriteIterator before calling beginG6WriteIterationToG6StrOrFile() and proceeding with the remaining flow (i.e. writeGraphUsingG6WriteIterator(), taking the string from the outputContainer if outputStr is not NULL, endG6WriteIteration(), and freeG6WriteIterator())
c/planarityApp/planarityRandomGraphs.c

This refactor was primarily due to the fact that beginG6WriteIterationToG6FilePath() was deprecated. So then the two branches that allow us to write to .g6 had to be updated:

  • If RandomGraphs() is called with non-NULL outfileName, it must have been called by callRandomGraphs() where the user specified the optional O parameter via the command-line. The outputContainer is initialized with this outfileName to handle opening the file object for this path for the command-line-driven branch.
  • If outfileName is NULL, then we check to see if RandomGraph() was called via the menu-driven system after choosing to Reconfigure() with r ("Randomly generate graphs"), y ("Do you want original graphs in directory 'random'?"), and g ("Do you want to output generated graphs to ... G6 (all)?"). If so, then we construct theFileName based on the order of the graphs and the number to be generated and initialize the outputContainer with theFileName.

Afterwards, if the G6WriteIterator and outputContainer have been allocated, then we beginG6WriteIterationToG6StrOrFile() and will writeGraphUsingG6WriteIterator() during the random graph generation loop.

c/planarityApp/planarityTestAllGraphs.c
  • In TestAllGraphs(), we create the inputContainer from the infileName and allow the strOrFile machinery to handle opening the file and creating the file pointer. Then, this is passed to testAllGraphs(), which sets up g6 read iteration from this strOrFileP using beginG6ReadIterationFromG6StrOrFile(). This means that one may pipe .g6-encoded graphs in from stdin to test:
    % ./Release/planarity -t -p stdin n8.m15.g6.stdin_test.t.p.out.txt < ./TestSupport/results/graph_generation_orchestrator/8/n8.m15.g6 
    Start testing all graphs in "stdin".
    
    Done testing all graphs (0.006 seconds).
    
    % cat n8.m15.g6.stdin_test.t.p.out.txt
    FILENAME="stdin" DURATION="0.006"
    -p 1557 631 926 SUCCESS
    
  • In outputTestAllGraphsResults(), we create the testOutput strOrFileP to simplify handling whether we're outputting to a file with path outfileName or to outputStr. Once the headerStr and resultsStr are written to whatever output via sf_fputs(), only if outputStr was non-NULL do we try to sf_takeTheString() from the testOutput and assign that pointer to the memory to which outputStr points. This means that one may still output the results of TestAllGraphs() to file or to string.

Changing the type of extraData

@john-boyer-phd explains the original motivation behind extraData having void * type and the choice to move to using char * in this comment:

When first designed, the thought was to let people do any binary thing they needed to do in extensions, but it eventually became apparent that extensions would only work together if we treated the extra data as a string, so all the code ended up relying on string-ness, so best to clean this up and make it be a string instead of a binary blob.

  • c/graphLib/extensionSystem/graphFunctionTable.h - updated graphFunctionTable struct's function pointers' signatures for fpReadPostprocess() and fpWritePostprocess() to correspond to extraData being a null-terminated char * (i.e. extraDataLength is no longer required)
  • c/graphLib/graphUtils.c - updated default graph algorithm extension functions _ReadPostprocess() and _WritePostprocess() function signatures to correspond to extraData being a null-terminated char *
  • c/graphLib/planarityRelated/graphDrawPlanar_Extensions.c
    • updated declarations and definitions of function overrides for _ReadPostprocess() (i.e. _DrawPlanar_ReadPostprocess()) and _WritePostprocess() (i.e. _DrawPlanar_WritePostprocess()) to correspond to extraData being a null-terminated char *
    • In _DrawPlanar_ReadPostprocess(), we can now take strlen(extraData), since it must be a null-terminated char *, and no longer need to rely on supplementary variable extraDataSize. No longer need to cast extraData as a char * to advance the pointer beyond the textual <DrawPlanar> start tag and re-cast to void * (extraData's original type), since we are guaranteed that extraData is a char *
    • In _DrawPlanar_WritePostprocess(), the extraData is now calloc'ed instead of malloc'ed to guarantee the memory allocated is zeroed out (since \0 is a control character with value zero, this means any string we construct is guaranteed to be null-terminated)

Resolving MSVC compiler warnings

  • c/planarityApp/planarityMenu.c and c/planarityApp/planarityUtils.c - since tolower() returns an int rather than a char, explicitly cast the result to char
  • c/planarityApp/planarityRandomGraphs.c - in RandomGraphs(), explicitly assign outputContainer NULL when declared
  • c/graphLib/io/g6-read-iterator.c - Explicitly cast result of strlen() to int (from size_t) in comparisons, and cast firstChar and graphChar to char when we sf_ungetc()
  • c/graphLib/io/g6-write-iterator.c - explicitly cast conversion of graphOrder to an ASCII character

…etBuf, and consuming this content with existing functions
…, which is now called in gp_Read() with the inputContainer strOrFileP (instead of original call to _ReadGraphFromG6FilePointer())
…sful. Additionally, manual tests redirecting input file contents to planarity executable with stdin as "filename" works.
…graph transformation to .g6 of nauty_example.g6.0-based.AdjList.out.txt
…s already NULL when we tried to take the string after _WriteGraph() in gp_WriteToString() with Mode being WRITE_G6, which had already been taken and assigned in _WriteGraphToG6StrOrFile()
…pChar() so that they return OK/NOTOK, and so that they also check that theStrOrFile has containerType of INPUT_CONTAINER, as well as added checks to sf_getc(), sf_ReadSingleDigit(), sf_ReadInteger(), sf_ungetc(), sf_ungets(), and sf_fgets() that they are operating over INPUT_CONTAINER and that sf_fputs() is operating over an OUTPUT_CONTAINER.
…an void * requiring extraDataSize to ensure we don't overrun allocated memory). The only instance is DrawPlanar; writing extra data was tested using

planarity -s -d nauty_example.g6 nauty_example.g6.s.d.WriteExtraData.out.txt

and reading the extra data was tested using
planarity -s -d nauty_example.g6.s.d.WriteExtraData.out.txt nauty_example.g6.s.d.ReadExtraData.out.txt
@wbkboyer wbkboyer changed the title Issue #57 - Issue #57 - Recover responsiveness to stdin and upgrade strOrFile Feb 10, 2025
…dereference and assign the string taken from the output container
…t a non-NULL theStr was provided. This is because we don't want to have an incongruety between what would happen if the container had a file vs. a string. We *could* introduce some logic to write theStr to the file to sync-up with sb_ConcatString(strBufToAssign, theStr), but this seems messy and unnecessary.
Copy link
Collaborator

@john-boyer-phd john-boyer-phd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most comments are just going ga-ga over how great this change set is going to be. A few minor tweaky tweakies mixed in.

int intCandidateIndex = 0;
while ((currChar = sf_getc(theStrOrFile)) != EOF)
{
if (intCandidateIndex > (MAXCHARSFOR32BITINT - 2))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this means you don't read integers at or below -1 billion, and that integers up to 10 billion mine 1 are allowed. Need a little work on the bounds checking. Generally, MAXCHARSFOR32BITINT would best be set to 11 and change the comment to exclude counting for the null terminator. Then, in the declaration and memset, use MAXCHARSFOR32BITINT+1 so that the code self-documents your intent to allocate and clear space for the null terminator o a string (bcz a string is separate from the max chars needed to hold a 32 bit int). Then, at this if stmt, it can be a little clearer to instead use >= bcz intCandidateIndex >= MAXCHARSFOR32BITINT more clearly says what boundary condition you're hitting. Last but not least, I think you need to catch this on the byte before (or two before if the number has no negative) and see whether the string would blow the 32-bit value space if the last character is added.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I use %d in the sscanf() in sf_ReadInteger(), I believe it will try to still read the contents of intCandidateStr as a signed integer and will produce an incorrect result; I was thinking that the updated comment in apiutils.h for MAXCHARSFOR32BITINT should be:

// N.B. Every time you're trying to read a 32-bit int from a string,
// you should only need to read this many characters: an optional '-',
// followed by 10 digits (max signed int value is 2,147,483,647). One
// must always allocate an additional byte for the null-terminator!

Does that work? 🎉

…t strncmp() was being made to ioMode. Verified the fix to output to stdout when called from command-line works for SpecificGraph(), RandomGraph(), TransformGraph(), and TestAllGraphs()
…pecify stdin (which is not allowed for any menu-driven access).
Copy link
Collaborator

@john-boyer-phd john-boyer-phd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great bug fixes on stdout! Just a few tweaks on the tweaks.

* Fixed len arg of memset() in sf_ReadInteger() to fully clear out intCandidateStr.
* Updated exitCode logic in g6-read-iterator.c's _ReadGraphFromG6StrOrFile() to match g6-write-iterator's _WriteGraphToG6StrOrFile()
Copy link
Collaborator

@john-boyer-phd john-boyer-phd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small tweak in latest changes, plus we still have to discuss the ReadInteger() edge cases, which we can do on tomorrow's regular zoom.

…mG6StrOrFile(), and also made sure to endG6WriteIteration() regardless of whether the exitCode was OK in _WriteGraphToG6StrOrFile().
… in _WriteAdjList() and _WriteAdjMatrix() so that it refers to the MAXCHARSFOR32BITINT.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EPIC - Recover responsiveness to stdin and upgrade strOrFile
2 participants