Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/new json #92

Open
wants to merge 344 commits into
base: master
Choose a base branch
from
Open

Feature/new json #92

wants to merge 344 commits into from

Conversation

staleyLANL
Copy link
Contributor

Might as well put this in.

An earlier branch introduced the ability to write HDF5 files in different ways, according to two boolean flags: *reduced* and *typed*.

This branch enhances our JSON capabilities in the same manner, reflecting, with JSON, an analog of the options we have for HDF5.

The above was done in respect to GNDStk's general philosophy of making its support of different file types as consistent with one another as reasonably possible.

Similarly to HDF5, the JSON class now has two static booleans: "reduced" and "typed".

The "reduced" flag tells the JSON writer whether or not to reduce -- basically, to fold into a shorter and simpler representation -- certain special constructs that GNDStk's Node uses internally in order to handle what are called cdata, pcdata, and comment nodes in XML. Basically, if reduced == true, then we make this simplification, which shortens the JSON output. If reduced == false, then the JSON will closely reflect what Node uses internally.

The "typed" flag tells the JSON writer whether or not it should use our type-guessing code to guess whether "strings" in certain contexts (in particular, metadata values and pcdata) actually contain what look like numbers. Examples: "12" looks like an int, "321 476" looks like a vector of ints, "3.14" looks like a double, and "3.14 2.72 1.41" looks like a vector of doubles. If typed == false, we use the original strings, with no type guessing. If typed == true, we apply the type guesser, and, where appropriate, get JSON numbers in place of strings in the JSON output.

This is a work in progress, with a couple of things that still need doing...

Still to do: modify the JSON *reading* code so that it recognizes the various different ways that the JSON *writing* code writes things, and can reliably reverse the writing process and recover a GNDStk Node.

Also still to do: at present, the nlohmann JSON library, which we use under the hood, doesn't provide a way to write JSON numbers (unquoted, as opposed to quoted JSON strings) beginning from an existing string representation that we might provide for the number. Consider this discussion that I started:

     nlohmann/json#3460

This is relevant to us because GNDStk provides very fine control over exactly how numbers, in particular floating-point numbers, are formatted, in terms of the number of significant digits, fixed vs. scientific form, etc. At the time of this writing, if we use typed == true, so that certain strings that look like numbers are written as numbers, the numbers will, unfortunately, be formatted by the JSON library itself. We'd like to have the capability of writing an original string (one that we've already determined looks like a number!) - but writing it as a JSON number (so, without quotes) rather than a JSON string.

In this commit, we also Modified an old Tree/ test that depended on the previous default (but still available, via appropriate flags, even though no longer the default) JSON writing behavior. And, we changed a variable name in json2class.cpp.

We refactored detail-node2hdf5.hpp (for HDF5) considerably, and then reflected this in the modified detail-node2json.hpp (for JSON) that we're primarily working on.

And, we've enhanced the JSON test code so that it tries out the new flags. This is a work in progress as well. In order to finish it, we need to finish the ability to read JSON files that were written not just in our original manner, but with any variations of the above-described flags.
Also some small tweaks and comment changes here and there.
Some tweaks to HDF5-related code.
Extended and improved some comments, to clarify things.
Additional tests for those, too.
There are still some instances of the strings in question, typically appearing amongst other text inside of string literals. At some point I'll look into fixing up those cases too, but that's not a priority right now.
Updated the HDF5 test suite to ensure that we test the modifications.
We'll update the code generator later, and ensure that it does the right thing.
More feature-packed classes Field and FieldPart.
I'm not sure why I made them be converting constructors to begin with. Whatever the reason, it seems (according to the test suite!) that they don't need to be converting constructors. And, I don't think that they should be. Indeed, some new code modifications uncovered this issue, as converting constructors for Node triggered certain ambiguities.
Code generator itself still needs updating.
…in name isn't known.

This was important for the new generated-class Field system to work well, without carrying along a std::string (expensive!) in each class field. On the other hand, such a thing wouldn't necessarily be a memory hog in a well-designed library that mostly contained vectors and such - not a huge number of individual scalar fields.
Improved some SFINAE.
Added more comments, and did a bit of rearrangement.
It now generates classes that use the new system.
Note that it doesn't yet add shortcuts to generated classes; that'll come soon.
Changed the way a map was done. Faster, and leads to better printing of shortcut information.
Some refactoring related to the above.
Miscellaneous cosmetic and clarity-related changes.
Some small printing-related changes and fixes as well.
… offers.

There's lots more to be done in the above respect; this is just a start.

Also, provided some preliminary support for another approach to customization.

The original (and still in place) "customization" system involves inserting code from custom.hpp files directly into generated classes. Inserting additional bits and pieces of code into already-existing classes is actually a somewhat goofy concept. (Imagine someone customizing, say, std::complex by adding a new piece of information. They really should derive from it, or contain it in another class. Not try to mess with the existing class.)

I'm fleshing out *one* new way, at least (with possibly others to come) for doing customizations. There's more to be done, but I'll work with a particular user right now, to see how this works.

JSON specs now allow metadata to have converters. It turns out that this will help with the new customization business. Briefly: we want an ability to insert our own fields into classes (say, to contain data that we'll compute as functions of other fields), but without those new fields playing a role in I/O from/to files. A new (and simple) "Noop" (no-op[eration]) class fills the necessary role of representing the "no conversion is taking place" for I/O on the new fields. Some new if-constexprs, elsewhere in the code, allow convert() functions to return a bool (but they can still return void, as before). If a convert() returns bool, and is used in the context of Node add() metadatum or child node, then a return value of false means, "don't actually add the new metadatum or children." This is all in support of our new customization approach, because we don't necessarily want auxiliary computed fields to be written to files. (But still want them in the multi-query, so they work with print() and such.)

Tweaked some SFINAE. This detail:: stuff really ought to be made more internally consistent, even though it's in detail::. Note: the _v and _t business is modeled after what's in std::. (Examples: enable_if_t, is_convertible_v.)

Adjusted some comments to reflect recent changes.
…terface.

Some code cleanup and minor rearrangements.

Lots more to come in the Python interface, as time allows.

The next thing will probably be access to our extensive capabilities for controlling formatting, in particular as it relates to floating-point precision.
The code generator builds C++ code, a C-language interface, and Python bindings. Prior to this change, the code that generated the C interface handled only the old "BlockData" manner of handling data nodes. That was a construct that handled the "valueType"-based "run-time defined data type" business that exists in some GNDS nodes. But our format for format specs allows for a fixed data type as well, and it's handled in a different way. For a user who has been using GNDStk's code generator to generate an entirely different format, I needed to update the generation of the C interface in order to handle this properly.
These are largely in preparation for some upcoming, more-substantive changes; ones that should somewhat improve and clarify the interface.
Some work on Component::has() and Component::operator().
Small change: class to struct in a few places.
Additional work regarding Lookup objects, Field, Component, etc.
Cleaned up some SFINAE, and removed some that was no longer needed.
A bit of formatting work in some CMakeLists.txt files, just to make them arguably more readable.
 - Clarified "auto" in some places.
 - Ditto for constness.
Minor changes in some SFINAE to more-closely resemble certain std:: constructs.

Sfome SFINAE sfimplifications.

Removed unnecessary "inline"s. The functions in question were in fact function *templates*, so that "inline" was unnecessary in order to still achieve header-only-ness.

Used "struct" instead of "class" in a few places.

Used "size_t" instead of "int" in a few places. The difference wasn't really important in these places, but I decided that size_t was correct in principle.

C Interface: fixed some previously unnoticed discrepancies that arose after an earlier change. Some function names were changed a while back: color to colors, note to notes, and warning to warnings. However, the *declarations* (for C) hadn't been updated accordingly.

Function name change: detail::getter() to detail::compGetter(). (comp, for class Component.) Too many things were called "getter." This change should make code auditing/analysis a bit easier, going forward.

Miscellaneous other changes. There are still some relevant files to be uploaded, but I need to review them more carefully first.
Split up some longer files.
The new overload and SFINAE were added after I switched from clang 9 to clang 17, and the latter reported overload ambiguities in some old code that hadn't had problems with earlier compilers. It's not clear to me that ambiguities really exist, but in any event, the new overload and the SFINAE disappears the new errors.

Casted to (void) in certain places in some test files, after noticing that a new g++ emitted warnings about the results of some function calls being unused. (We did those calls to test that they're valid and to check out-parameters, not because we intended to use the return values of the functions.)
Under 1600 lines, in comparison with over 24000.
Gives faster test-suite compilation times.
Also, our JSON has more-flexible capabilities regarding the printing of floating-point numbers, in terms of format and number of significant digits. We're not taking advantage of those capabilities quite yet, but we will, if we choose to make JSON a first-class format - as important as XML.
@staleyLANL staleyLANL requested review from whaeck and BobbiRiedel June 29, 2023 21:41
@staleyLANL staleyLANL self-assigned this Jun 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants