Background
GossipSub messages have four fields, two of which are source and sequence_number.
In addition, each message is associated with a “message ID”. The ID is not a field of the message itself, but it is reported in the event emitted when a message is received.
To prevent broadcast storms, each GossipSub peer has to maintain a cache of the messages it has already seen and “processed”. This cache must identify seen messages using some key. The material I’ve read is not clear on whether that key is the source and sequence number, the message ID, or some combination of the three.
Potential problem
If only message ID is used as the de-duplication key, malicious GossipSub peers can “force” message ID collisions to prevent other peers’ messages from being propagated.
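To make the concern concrete, here is a minimal Python sketch of the scenario I have in mind. Everything in it is hypothetical and mine, not libp2p's API: the `Message` and `SeenCache` types, and in particular the ID function, which I deliberately chose to ignore the source so that a collision is trivial to produce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    source: str            # peer ID of the claimed publisher
    sequence_number: int
    data: bytes


class SeenCache:
    """Hypothetical de-duplication cache keyed only by message ID."""

    def __init__(self, message_id_fn):
        self._message_id_fn = message_id_fn
        self._seen = set()

    def accept(self, msg: Message) -> bool:
        """Return True if the message is new and should be forwarded."""
        msg_id = self._message_id_fn(msg)
        if msg_id in self._seen:
            return False
        self._seen.add(msg_id)
        return True


# Assumption: an ID function that ignores the source entirely.
def id_from_sequence_number(msg: Message) -> str:
    return str(msg.sequence_number)


cache = SeenCache(id_from_sequence_number)
attacker_msg = Message(source="attacker", sequence_number=7, data=b"junk")
honest_msg = Message(source="honest", sequence_number=7, data=b"real payload")

attacker_accepted = cache.accept(attacker_msg)  # arrives first, so it is cached
honest_accepted = cache.accept(honest_msg)      # same ID, so it is dropped
```

In this sketch the honest peer's message is treated as a duplicate purely because the attacker's message with the same ID arrived first.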
My guess
This seems like a pretty bad and obvious vulnerability, so I assume that, to prevent it, libp2p maintains a separate message ID cache for each message.source to de-duplicate messages.
However, if this is the case, it’s weird to me that message IDs are by default the concatenation of the source peer ID and the message’s sequence number. If a separate message ID cache is already maintained for each message.source, just having a message’s ID be the same as its sequence number should be sufficient.
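Here is a rough Python sketch of why the default looks redundant to me under my guess. It assumes the default ID really is the source peer ID followed by the sequence number; the class names and peer IDs are made up for illustration and are not libp2p's.

```python
from collections import defaultdict


def default_message_id(source: str, sequence_number: int) -> str:
    # Assumption: default ID = source peer ID followed by sequence number.
    return f"{source}{sequence_number}"


class GlobalCache:
    """One cache for all peers, keyed by the concatenated message ID."""

    def __init__(self):
        self._seen = set()

    def is_new(self, source: str, sequence_number: int) -> bool:
        key = default_message_id(source, sequence_number)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


class PerSourceCache:
    """One cache per source, keyed by the sequence number alone."""

    def __init__(self):
        self._seen = defaultdict(set)

    def is_new(self, source: str, sequence_number: int) -> bool:
        if sequence_number in self._seen[source]:
            return False
        self._seen[source].add(sequence_number)
        return True


# (source, sequence_number) pairs in arrival order, including repeats.
arrivals = [("peerA", 1), ("peerB", 1), ("peerA", 1), ("peerA", 2), ("peerB", 1)]

global_cache = GlobalCache()
per_source_cache = PerSourceCache()
global_decisions = [global_cache.is_new(s, n) for s, n in arrivals]
per_source_decisions = [per_source_cache.is_new(s, n) for s, n in arrivals]
```

Both caches make the same accept/drop decisions for these arrivals, which is why I would expect only one of the two mechanisms (a per-source cache, or a source-prefixed ID) to be needed.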
Is this just an idiosyncrasy? Or is there something I’m missing?