CSHARP-3985: Support multiple SerializerRegistries #1592

Open: papafe wants to merge 53 commits into base: main

Contributor

@papafe papafe commented Jan 13, 2025

/// <summary>
/// //TODO
/// </summary>
public interface IBsonSerializationDomain

Member

@sanych-sun sanych-sun Jan 14, 2025

Why do we need such a super-interface? It has everything in it: configuration, resolving and even reading/writing. Is there any use case where one needs all of this together? I would prefer to use 3 separate interfaces: IBsonSerializationBuilder (or similar) to contain everything related to configuration and registration; once configuration is done, one can call a Build method and get an IBsonSerializerRegistry (we might also want to make IBsonSerializerRegistry read-only, for lookup only); and finally IBsonSerializer, to contain the Serialize/Deserialize methods. That separates all the use cases: for application bootstrap/configuration one has to use IBsonSerializationBuilder, at run time IBsonSerializerRegistry should be used to resolve the serializer, and finally to read/write we have IBsonSerializer itself.
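
To make the proposed split concrete, here is a minimal sketch of the builder, read-only registry and serializer roles. All type names below (ISketchSerializer, ISketchSerializerRegistry, SketchSerializationBuilder, UpperCaseStringSerializer) are hypothetical stand-ins invented for illustration; they are not the driver's types and not what this PR implements.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names throughout; none of these types exist in the driver.

// Read/write role: knows how to serialize one kind of value.
public interface ISketchSerializer
{
    string Serialize(object value);
}

// Run-time role: read-only lookup of serializers.
public interface ISketchSerializerRegistry
{
    ISketchSerializer GetSerializer(Type type);
}

// Bootstrap role: mutable configuration, used only until Build is called.
public sealed class SketchSerializationBuilder
{
    private readonly Dictionary<Type, ISketchSerializer> _serializers =
        new Dictionary<Type, ISketchSerializer>();

    public SketchSerializationBuilder Register(Type type, ISketchSerializer serializer)
    {
        _serializers[type] = serializer;
        return this;
    }

    // Build copies the configuration, so later builder changes cannot leak into the registry.
    public ISketchSerializerRegistry Build()
    {
        return new FrozenRegistry(new Dictionary<Type, ISketchSerializer>(_serializers));
    }

    private sealed class FrozenRegistry : ISketchSerializerRegistry
    {
        private readonly IReadOnlyDictionary<Type, ISketchSerializer> _serializers;

        public FrozenRegistry(IReadOnlyDictionary<Type, ISketchSerializer> serializers)
        {
            _serializers = serializers;
        }

        public ISketchSerializer GetSerializer(Type type)
        {
            if (_serializers.TryGetValue(type, out var serializer))
            {
                return serializer;
            }
            throw new InvalidOperationException($"No serializer registered for {type}.");
        }
    }
}

// Toy serializer used only to exercise the sketch.
public sealed class UpperCaseStringSerializer : ISketchSerializer
{
    public string Serialize(object value) => ((string)value).ToUpperInvariant();
}
```

Usage would look like `new SketchSerializationBuilder().Register(typeof(string), new UpperCaseStringSerializer()).Build()`, after which only the read-only registry is handed to run-time code.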

Member

BTW, having IBsonSerializationBuilder is very well aligned with the idea of having a MongoClientBuilder to create MongoClients. The serialization builder could be part of the bigger MongoClientBuilder.

Contributor Author

I think the main problem here is that we're trying to move away from the static super-class that is BsonSerializer without creating a breaking change. At the moment IBsonSerializationDomain is an interface with everything that BsonSerializer does, so that we can create an instance class that we can inject where we need it. At a later time (possibly we can break this ticket into multiple PRs) this would allow developers to define a custom serialization configuration for a certain mongo client/database/collection.
And I agree that this interface does a lot, but so did BsonSerializer, and unfortunately it's difficult to separate everything without breaking what's already there, given that almost everything is in the public API and we can't change the behaviour before a new major version. For example, one of the issues we have is that the code to work with discriminators is not part of IBsonSerializerRegistry, but is all over the place.
I feel that moving away from a static class would already be a positive change for the future, and that changing the way we do serialization is definitely something we should do, but that would require changing the public API and a new major version.
What do you think?
Also @rstam tagging you here so you can follow the conversation 😄

Member

I do understand that we are limited by not being allowed to make breaking changes, but if we decide to introduce new interface(s), let's make them useful for the initiatives that will follow soon. Let's discuss the pros and cons of each idea with the whole team.

@rstam rstam self-requested a review January 14, 2025 17:11
/// <summary>
/// //TODO
/// </summary>
IBsonCoreSerializerConfiguration SerializationConfiguration { get; }

Contributor

I think this property doesn't belong in this interface.

Contributor Author

My idea would be that the IBsonCoreSerializer interface would contain everything that can be used to do the serialization and this also includes the configuration necessary for that.

Contributor Author

papafe commented Jan 30, 2025

This is still a super quick and dirty proof of concept, just to verify we can actually create a custom domain and pass it down the call stack. Of course this broke several things, and it works only in certain cases, but it's a starting point.
The main idea here is:

  • The serialization domain (IBsonSerializationDomain) can be set up at the client/database/collection level.
  • The serialization domain is also added to the serialization context (BsonSerializationContext/BsonDeserializationContext). This way it is passed down wherever it is needed.
  • In order to pass the serialization domain between the collection level and the BsonSerializationContext, the serialization domain is passed down in the MessageEncoderSettings.
  • From MessageEncoderSettings, the domain is then added to the BsonBinaryWriterSettings in the MessageBinaryEncoderBase.
  • Finally, the writer settings are read in the serialization context constructor.

(This works similarly for `BsonSerializationContext`.)
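
The hops listed above can be sketched roughly as follows. Every type here is a hypothetical, stripped-down stand-in (the Sketch prefix marks names invented for illustration); the real settings and context classes have many more members, and the actual wiring in the PR may differ.

```csharp
using System;

// Hypothetical stand-ins; none of these are the driver's real types.
public interface ISketchDomain
{
    string Name { get; }
}

public sealed class SketchDomain : ISketchDomain
{
    public SketchDomain(string name) { Name = name; }
    public string Name { get; }
}

// Hop 1: the domain is configured at the collection level.
public sealed class SketchCollectionSettings
{
    public ISketchDomain SerializationDomain { get; set; }
}

// Hop 2: the domain is copied into the message encoder settings.
public sealed class SketchMessageEncoderSettings
{
    public ISketchDomain SerializationDomain { get; set; }
}

// Hop 3: the encoder copies the domain into the writer settings.
public sealed class SketchWriterSettings
{
    public ISketchDomain SerializationDomain { get; set; }
}

// Hop 4: the serialization context reads the domain from the writer settings.
public sealed class SketchSerializationContext
{
    public SketchSerializationContext(SketchWriterSettings writerSettings)
    {
        SerializationDomain = writerSettings.SerializationDomain;
    }

    public ISketchDomain SerializationDomain { get; }
}

public static class SketchPipeline
{
    // Walks the whole chain from collection settings to serialization context.
    public static SketchSerializationContext CreateContext(SketchCollectionSettings collectionSettings)
    {
        var encoderSettings = new SketchMessageEncoderSettings
        {
            SerializationDomain = collectionSettings.SerializationDomain
        };
        var writerSettings = new SketchWriterSettings
        {
            SerializationDomain = encoderSettings.SerializationDomain
        };
        return new SketchSerializationContext(writerSettings);
    }
}
```

The sketch only shows that the same domain instance configured on the collection is the one the context ends up holding; it says nothing about whether writer settings are the right carrier for it.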

I'm not trying to say that this is the perfect way, but I just tried to come up with the "most direct" way to go from the collection to the serialization contexts, so it's mostly a proof of concept.

Also, I still did not touch the conventions, the discriminators and the class maps, that are also statics and need to be part of the serialization domain.

@papafe papafe requested review from rstam and sanych-sun January 31, 2025 11:14
Contributor

@rstam rstam left a comment

I didn't review every single file. I tried to pick the most relevant ones.

Overall this looks like the right direction.

But... it's going to be a pain to finish and to review.

@papafe papafe marked this pull request as ready for review May 26, 2025 14:57
@papafe papafe requested a review from a team as a code owner May 26, 2025 14:57
@papafe papafe marked this pull request as draft May 26, 2025 14:58
@papafe papafe removed the request for review from a team May 26, 2025 15:04
Contributor Author

papafe commented Jul 3, 2025

Now the PR is open for reviews. There are still a couple of open questions, but given the size of this I thought it would be better to start getting feedback as early as possible. The next section is both a summary and partially a guide on how to read the PR itself.

The aim of this PR is to move away from the global serialization settings we use right now (BsonSerializer, BsonClassMap, ....) and move towards a more flexible system that uses instance serialization settings.
This should allow, among other things:

  • To have different serialization settings depending on the client/database/collection
  • To improve the testability of serialization/deserialization, as we'll move away from global state

As a first step in that direction, this PR tries to remove all uses of the global state internally, without changes for the developers using the API. We decided not to make the changes public as we would like to take this occasion for a revamp/simplification/reorganisation of the serialization API.

The main idea behind the PR is to create a serialization domain, represented by the IBsonSerializationDomain interface, and to find a way to pass this interface around so that in all the places in our code where we refer to the global state, we use this instantiated domain instead.
IBsonSerializationDomain is an interface that contains all the global state that was contained in static classes, namely: BsonSerializer, BsonClassMap, ConventionRegistry, BsonDefaults. The methods and properties of BsonSerializer are directly represented in the interface, while the other static classes are included as properties in IBsonSerializationDomain.
One thing to note is that I tried not to modify the current global serialization API, even though there are various improvements that could be made, including removing unnecessary methods, better grouping of methods by functionality (for example creating a DiscriminatorRegistry) and reducing the use of locks. I decided against doing any kind of improvement as this PR is already quite broad, and such changes would have further complicated it. Nevertheless, this is something that can be done in follow-up PRs, mostly after we have taken a decision on the shape of the public API.
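
As a rough illustration of why instance state helps, the sketch below shows two independent domains holding different serializers for the same type, which a single static registry cannot do. SketchSerializationDomain and SketchDomainDemo are hypothetical names invented for this example; the real IBsonSerializationDomain is far larger and is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical, heavily simplified domain: an instance-level serializer table
// instead of a static one.
public sealed class SketchSerializationDomain
{
    private readonly Dictionary<Type, Func<object, string>> _serializers =
        new Dictionary<Type, Func<object, string>>();

    public void RegisterSerializer(Type type, Func<object, string> serializer)
    {
        _serializers[type] = serializer;
    }

    public string Serialize(object value)
    {
        return _serializers[value.GetType()](value);
    }
}

public static class SketchDomainDemo
{
    // Serializes the same value through two domains configured differently.
    public static (string Decimal, string Hex) Run()
    {
        var decimalDomain = new SketchSerializationDomain();
        decimalDomain.RegisterSerializer(
            typeof(int), v => ((int)v).ToString(CultureInfo.InvariantCulture));

        var hexDomain = new SketchSerializationDomain();
        hexDomain.RegisterSerializer(
            typeof(int), v => ((int)v).ToString("X", CultureInfo.InvariantCulture));

        return (decimalDomain.Serialize(255), hexDomain.Serialize(255));
    }
}
```

Because each domain is a plain object, a test can build a fresh one per test case and never touch shared global state.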

In order for the domain to be accessible where it is needed, I've added a domain property in multiple classes, among which:

  • MongoClientSettings/MongoDatabaseSettings/MongoCollectionSettings
  • BsonSerializationContext/BsonDeserializationContext
  • BsonReaderSettings/BsonWriterSettings
  • TranslationContext
  • MongoQueryProvider
  • MessageEncoderSettings

These new properties allow the domain to be passed down to where it is needed. This also required enriching interfaces with methods or properties that use the domain. For public interfaces, this was done by creating an internal interface that derives from the public one and contains the new methods that take the domain as input, as you can see with IMemberMapConventionInternal for example. In order to keep those members hidden from the public API, the internal interfaces have been implemented explicitly.
Also, in order to keep compatibility with the current API, I have created the static InternalExtensions class (in both the Bson and Driver packages) with internal extension methods that select the appropriate method to call depending on whether a class implements the internal (enriched) interface or only the public one. For example, take a look at InternalExtensions.ApplyInternal.
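
The pattern described above can be sketched as follows. The names ISketchConvention, ISketchConventionInternal, DomainAwareConvention, LegacyConvention and SketchInternalExtensions are hypothetical and only mirror the shape of IMemberMapConventionInternal and InternalExtensions.ApplyInternal; the real signatures in the PR are not reproduced here.

```csharp
using System;

// Hypothetical stand-in for the serialization domain.
public interface ISketchDomain
{
    string Name { get; }
}

public sealed class SketchDomain : ISketchDomain
{
    public SketchDomain(string name) { Name = name; }
    public string Name { get; }
}

// Public interface: left unchanged, so there is no breaking change.
public interface ISketchConvention
{
    string Apply(string memberName);
}

// Internal interface derived from the public one, adding a domain-aware overload.
internal interface ISketchConventionInternal : ISketchConvention
{
    string Apply(string memberName, ISketchDomain domain);
}

public sealed class DomainAwareConvention : ISketchConventionInternal
{
    // Public overload: stands in for the existing behaviour based on global state.
    public string Apply(string memberName) => memberName + "@global";

    // Explicit implementation keeps this overload off the class's public surface.
    string ISketchConventionInternal.Apply(string memberName, ISketchDomain domain)
        => memberName + "@" + domain.Name;
}

// A convention written against the public interface only, e.g. by a user.
public sealed class LegacyConvention : ISketchConvention
{
    public string Apply(string memberName) => memberName + "@global";
}

internal static class SketchInternalExtensions
{
    // Picks the domain-aware overload when available, otherwise falls back to the public one.
    public static string ApplyInternal(
        this ISketchConvention convention, string memberName, ISketchDomain domain)
    {
        if (convention is ISketchConventionInternal internalConvention)
        {
            return internalConvention.Apply(memberName, domain);
        }
        return convention.Apply(memberName);
    }
}
```

With this shape, internal call sites always call ApplyInternal and never need to know which kind of convention they were given.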

Another thing to note is that I did not remove calls to the global state (mostly to BsonSerializer) everywhere I could, as I think in some cases it is the most appropriate choice. In those cases (for example inside the Authentication.AWS or Driver.Encryption packages) I think BsonSerializer is used as a container for "default serialization", for example when deserializing keys for encryption. I think in the future we need to create a "default" serialization domain that can't be modified and only contains default serializers/deserializers to be used in those cases. At the moment the use of BsonSerializer is possibly risky, as we do not know how developers are modifying the global settings, even though we don't expect them to modify the most common serializers.

In order to keep the changes private and let Drivers and other packages access the internals of Bson, I had to use the InternalsVisibleToAttribute.

As a final note, I've left lots of comments in the code, in order to remember why certain decisions were taken, and a couple of questions left. Those comments have specific tags so they can be found and removed.

@papafe papafe requested a review from rstam July 3, 2025 15:27
@papafe papafe marked this pull request as ready for review July 3, 2025 15:27
@papafe papafe requested a review from BorisDog July 3, 2025 15:27
/// <summary>
/// Gets the settings of the reader.
/// </summary>
BsonReaderSettings Settings { get; }

Contributor

I think it was just an omission.

In any case I don't think we need this.

@@ -62,7 +62,7 @@ public interface IBsonReader : IDisposable
/// <summary>
/// Pops the settings.
/// </summary>
- void PopSettings();
+ void PopSettings(); //TODO Why do we have push and pop methods? They are not used. We should remove them.

Contributor

Something for a different PR maybe?

@@ -25,7 +25,7 @@ namespace MongoDB.Bson.IO
/// <summary>
/// Represents a BSON reader for some external format (see subclasses).
/// </summary>
- public abstract class BsonReader : IBsonReader
+ public abstract class BsonReader : IBsonReaderInternal

Contributor

IBsonReaderInternal not needed.

@@ -77,6 +79,16 @@ public BsonReaderSettings FrozenCopy()
}
}

internal IBsonSerializationDomain SerializationDomain

Contributor

This should not be a reader setting.

Classes in this directory are low level I/O classes that just do I/O.

Serialization is one level up and is built on TOP of these I/O classes.

/// <summary>
/// //TODO
/// </summary>
internal IBsonSerializationDomain SerializationDomain

Contributor

This should not be a writer setting.

Classes in this directory are low level I/O classes that just do I/O.

Serialization is one level up and is built on TOP of these I/O classes.

@@ -0,0 +1,317 @@
using System;

Contributor

Add copyright.

{
return type.IsGenericType && type.GetGenericTypeDefinition() == typeof(Nullable<>);
}
// Commented out because there is an identical method in Bson assembly (and also in this assembly...).

Contributor

See CSHARP-5632.

The duplication between Bson and Driver was because they were internal.

Be careful if you remove any of the duplication here. I don't think all the duplicate methods are 100% identical and there might be subtle reasons why (but I don't know them if there are).

{
- var discriminator = discriminatorConvention.GetDiscriminator(nominalType, actualType);
+ var discriminator = discriminatorConvention.GetDiscriminatorInternal(nominalType, actualType, serializationDomain);

Contributor

It's going to be VERY difficult and fragile to remember NOT to call GetDiscriminator and to call GetDiscriminatorInternal (or whatever we decide to call it) instead...


namespace MongoDB.Driver.Support
{
internal static class InternalExtensions

Contributor

I would prefer to see this in a file called IPipelineStageDefinitionExtensions.cs.

I also would not mix extension methods for different types in the same file (that didn't happen here though, but it did in some other file).

<Compile Include="..\MongoDB.Shared\SequenceComparer.cs" Link="Shared\SequenceComparer.cs" />
<Compile Include="..\MongoDB.Shared\Hasher.cs" Link="Shared\Hasher.cs" />
</ItemGroup>
<!-- The followings have been removed because they are accessed directly through MongoDB.Bson, otherwise there will be a double definition -->

Contributor

Did you mean "INDIRECTLY through MongoDB.Bson"?

Contributor

InternalsVisibleTo sucks.

3 participants