Fluffy: Portal subnetwork peer ban list #3007

Open: wants to merge 21 commits into master from fluffy-peer-ban-list

Commits (21)
1f7869b
Implement ban list data structure, banPeer and isBanned.
bhartnett Jan 16, 2025
000e2bb
Ban nodes when state network content lookups and offers fail validation.
bhartnett Jan 17, 2025
db11fb1
Filter out and ignore banned peers in portal protocol.
bhartnett Jan 17, 2025
4b8cf1c
Merge branch 'master' into fluffy-peer-ban-list
bhartnett Jan 20, 2025
5f91b1b
Remove helper templates.
bhartnett Jan 20, 2025
5e38c95
Merge branch 'master' into fluffy-peer-ban-list
bhartnett Jan 20, 2025
22d4627
Update portal protocol to use routing table node bans.
bhartnett Jan 27, 2025
aed4020
Fix copyright.
bhartnett Jan 29, 2025
0ca892a
Merge branch 'master' into fluffy-peer-ban-list
bhartnett Jan 29, 2025
628285f
Merge branch 'master' into fluffy-peer-ban-list
bhartnett Jan 30, 2025
89d7230
Use latest nim-eth feature branch.
bhartnett Jan 30, 2025
885a1c0
Ban on message response errors. Filter out banned nodes in findContent.
bhartnett Jan 30, 2025
4b855f1
Add config parameter to support disabling node bans.
bhartnett Jan 30, 2025
279e2eb
Fix tests.
bhartnett Jan 30, 2025
287262b
Improve node bans.
bhartnett Jan 30, 2025
915cab2
Enable discv5 ban nodes in tests.
bhartnett Jan 30, 2025
b2f2784
Enable node bans in portal cli and utp test.
bhartnett Jan 30, 2025
0d73e57
Remove wrong subprotocol ban and add tests.
bhartnett Jan 30, 2025
ac109f0
Run nph and update nim-eth.
bhartnett Jan 30, 2025
e335374
Fix copyright.
bhartnett Jan 30, 2025
120ef39
Remove bans that may cause issues.
bhartnett Feb 7, 2025
9 changes: 9 additions & 0 deletions fluffy/conf.nim

@@ -372,6 +372,15 @@ type
name: "disable-state-root-validation"
.}: bool

disableBanNodes* {.
hidden,
desc:
"Disable node banning functionality for both discv5 and portal sub-protocols",
defaultValue: defaultDisableBanNodes,
defaultValueDesc: $defaultDisableBanNodes,
name: "debug-disable-ban-nodes"
.}: bool

case cmd* {.command, defaultValue: noCommand.}: PortalCmd
of noCommand:
discard
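
With this option, node banning can presumably be disabled for debugging by starting fluffy with the hidden `--debug-disable-ban-nodes` flag (the flag name comes from the `name` pragma above); by default `defaultDisableBanNodes` is false, so bans stay enabled.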
2 changes: 2 additions & 0 deletions fluffy/fluffy.nim

@@ -148,6 +148,7 @@ proc run(
enrAutoUpdate = config.enrAutoUpdate,
config = discoveryConfig,
rng = rng,
banNodes = not config.disableBanNodes,
)

d.open()
@@ -184,6 +185,7 @@ proc run(
config.tableIpLimit, config.bucketIpLimit, config.bitsPerHop, config.alpha,
config.radiusConfig, config.disablePoke, config.maxGossipNodes,
config.contentCacheSize, config.disableContentCache, config.maxConcurrentOffers,
config.disableBanNodes,
)

portalNodeConfig = PortalNodeConfig(
13 changes: 13 additions & 0 deletions fluffy/network/history/history_network.nim

@@ -147,6 +147,9 @@ proc getVerifiedBlockHeader*(
return Opt.none(Header)

header = validateCanonicalHeaderBytes(headerContent.content, id, n.accumulator).valueOr:
n.portalProtocol.banNode(
headerContent.receivedFrom.id, NodeBanDurationContentLookupFailedValidation
)
warn "Validation of block header failed",
error = error, node = headerContent.receivedFrom.record.toURI()
continue

Contributor: While all these banNode calls are fine in theory, they might cause issues with an occasional Trin bug where only partial data is sent currently.

Contributor (Author): I think the longer-term improvement for this would be to add counts of violations and then only ban nodes after a certain limit is reached. For this PR though, I guess we could simply reduce the ban time if there are concerns about it not behaving as intended.

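A minimal sketch of the violation-count idea from the reply above, assuming it would live next to banNode in portal_protocol.nim; the ViolationTracker type, recordViolation proc, and banThreshold constant are illustrative names and not part of this PR:

```nim
import std/tables
import chronos

const banThreshold = 3 # hypothetical: ban only after this many failed validations

type ViolationTracker = object
  counts: Table[NodeId, int]

proc recordViolation(
    p: PortalProtocol,
    tracker: var ViolationTracker,
    nodeId: NodeId,
    period: chronos.Duration,
) =
  # Count validation failures per node and only ban once the threshold is
  # reached, instead of banning on the first failure.
  let count = tracker.counts.getOrDefault(nodeId) + 1
  tracker.counts[nodeId] = count
  if count >= banThreshold:
    p.banNode(nodeId, period)
    tracker.counts.del(nodeId)
```
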
@@ -192,6 +195,9 @@ proc getBlockBody*(
return Opt.none(BlockBody)

body = validateBlockBodyBytes(bodyContent.content, header).valueOr:
n.portalProtocol.banNode(
bodyContent.receivedFrom.id, NodeBanDurationContentLookupFailedValidation
)
warn "Validation of block body failed",
error, node = bodyContent.receivedFrom.record.toURI()
continue
@@ -266,7 +272,11 @@ proc getReceipts*(
receiptsContent = (await n.portalProtocol.contentLookup(contentKey, contentId)).valueOr:
debug "Failed fetching receipts from the network"
return Opt.none(seq[Receipt])

receipts = validateReceiptsBytes(receiptsContent.content, header.receiptsRoot).valueOr:
n.portalProtocol.banNode(
receiptsContent.receivedFrom.id, NodeBanDurationContentLookupFailedValidation
)
warn "Validation of receipts failed",
error, node = receiptsContent.receivedFrom.record.toURI()
continue
@@ -384,6 +394,9 @@ proc validateContent(

debug "Received offered content validated successfully", srcNodeId, contentKey
else:
if srcNodeId.isSome():
n.portalProtocol.banNode(srcNodeId.get(), NodeBanDurationOfferFailedValidation)

debug "Received offered content failed validation",
srcNodeId, contentKey, error = res.error
return false
7 changes: 7 additions & 0 deletions fluffy/network/state/state_network.nim

@@ -112,6 +112,9 @@ proc getContent(
continue

validateRetrieval(key, contentValue).isOkOr:
n.portalProtocol.banNode(
lookupRes.receivedFrom.id, NodeBanDurationContentLookupFailedValidation
)
error "Validation of retrieved state content failed"
continue

@@ -246,6 +249,10 @@ proc processContentLoop(n: StateNetwork) {.async: (raises: []).} =
debug "Received offered content validated successfully",
srcNodeId, contentKeyBytes
else:
if srcNodeId.isSome():
n.portalProtocol.banNode(
srcNodeId.get(), NodeBanDurationOfferFailedValidation
)
state_network_offers_failed.inc(labelValues = [$n.portalProtocol.protocolId])
error "Received offered content failed validation",
srcNodeId, contentKeyBytes, error = offerRes.error()
77 changes: 63 additions & 14 deletions fluffy/network/wire/portal_protocol.nim

@@ -126,6 +126,11 @@ const
## value in milliseconds
initialLookups = 1 ## Amount of lookups done when populating the routing table

## Ban durations for banned nodes in the routing table
NodeBanDurationInvalidResponse = 30.minutes
NodeBanDurationContentLookupFailedValidation* = 60.minutes
NodeBanDurationOfferFailedValidation* = 60.minutes

type
ToContentIdHandler* =
proc(contentKey: ContentKeyByteList): results.Opt[ContentId] {.raises: [], gcsafe.}
@@ -285,6 +290,13 @@ func getProtocolId*(
of PortalSubnetwork.transactionGossip:
[portalPrefix, 0x4F]

proc banNode*(p: PortalProtocol, nodeId: NodeId, period: chronos.Duration) =
if not p.config.disableBanNodes:
p.routingTable.banNode(nodeId, period)

proc isBanned*(p: PortalProtocol, nodeId: NodeId): bool =
p.config.disableBanNodes == false and p.routingTable.isBanned(nodeId)
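
The routingTable.banNode, isBanned and cleanupExpiredBans calls used here are provided by the routing table in nim-eth (the PR points at a nim-eth feature branch for this). For context, a rough sketch of how an expiry-based ban list of this kind can look, with illustrative names rather than the actual nim-eth code:

```nim
import std/tables
import chronos

type BanList = object
  banned: Table[NodeId, chronos.Moment] # nodeId -> ban expiry time

proc banNode(b: var BanList, nodeId: NodeId, period: chronos.Duration) =
  # (Re)ban the node until now + period.
  b.banned[nodeId] = Moment.now() + period

proc isBanned(b: BanList, nodeId: NodeId): bool =
  # A node counts as banned while an unexpired entry exists.
  nodeId in b.banned and b.banned[nodeId] > Moment.now()

proc cleanupExpiredBans(b: var BanList) =
  # Drop expired entries to bound memory usage; the refresh loop further
  # down calls the routing table equivalent of this periodically.
  var expired: seq[NodeId]
  for nodeId, expiry in b.banned:
    if expiry <= Moment.now():
      expired.add(nodeId)
  for nodeId in expired:
    b.banned.del(nodeId)
```

Storing only an expiry timestamp per NodeId keeps ban checks cheap and lets the periodic cleanup bound memory usage, which matches the intent of the cleanup call added in refreshLoop below.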

func `$`(id: PortalProtocolId): string =
id.toHex()

@@ -300,8 +312,10 @@ func getNode*(p: PortalProtocol, id: NodeId): Opt[Node] =
func localNode*(p: PortalProtocol): Node =
p.baseProtocol.localNode

func neighbours*(p: PortalProtocol, id: NodeId, seenOnly = false): seq[Node] =
p.routingTable.neighbours(id = id, seenOnly = seenOnly)
template neighbours*(
p: PortalProtocol, id: NodeId, k: int = BUCKET_SIZE, seenOnly = false
): seq[Node] =
p.routingTable.neighbours(id, k, seenOnly)

func distance(p: PortalProtocol, a, b: NodeId): UInt256 =
p.routingTable.distance(a, b)
@@ -480,7 +494,7 @@ proc handleFindContent(
# Node does not have the content, or content is not even in radius,
# send closest neighbours to the requested content id.
let
closestNodes = p.routingTable.neighbours(NodeId(contentId), seenOnly = true)
closestNodes = p.neighbours(NodeId(contentId), seenOnly = true)
enrs = truncateEnrs(closestNodes, maxPayloadSize, enrOverhead)
portal_content_enrs_packed.observe(enrs.len().int64, labelValues = [$p.protocolId])

@@ -557,6 +571,12 @@ proc messageHandler(

let p = PortalProtocol(protocol)

if p.isBanned(srcId):
# The sender of the message is in the temporary node ban list
# so we don't process the message
debug "Dropping message from banned node", srcId, srcUdpAddress
return @[] # Reply with an empty response message

let decoded = decodeMessage(request)
if decoded.isOk():
let message = decoded.get()
@@ -661,15 +681,15 @@ proc reqResponse[Request: SomeMessage, Response: SomeMessage](
labelValues = [$p.protocolId, $messageKind(Request)]
)

let talkresp =
let talkResp =
await talkReq(p.baseProtocol, dst, @(p.protocolId), encodeMessage(request))

# Note: Failure of `decodeMessage` might also simply mean that the peer does
# not support the specific talk protocol, as according to the specification
# an empty response needs to be sent in that case.
# See: https://github.com/ethereum/devp2p/blob/master/discv5/discv5-wire.md#talkreq-request-0x05

let messageResponse = talkresp
let messageResponse = talkResp
.mapErr(
proc(x: cstring): string =
$x
@@ -680,7 +700,11 @@
)
.flatMap(
proc(m: Message): Result[Response, string] =
getInnerMessage[Response](m)
let r = getInnerMessage[Response](m)
# Ban nodes that send the wrong type of response message
if r.isErr():
p.banNode(dst.id, NodeBanDurationInvalidResponse)
return r
)

if messageResponse.isOk():
@@ -758,6 +782,9 @@ proc ping*(
): Future[PortalResult[(uint64, CapabilitiesPayload)]] {.
async: (raises: [CancelledError])
.} =
if p.isBanned(dst.id):
return err("destination node is banned")

let pongResponse = await p.pingImpl(dst)

if pongResponse.isOk():
@@ -783,12 +810,16 @@ proc ping*(
proc findNodes*(
p: PortalProtocol, dst: Node, distances: seq[uint16]
): Future[PortalResult[seq[Node]]] {.async: (raises: [CancelledError]).} =
if p.isBanned(dst.id):
return err("destination node is banned")

let nodesMessage = await p.findNodesImpl(dst, List[uint16, 256](distances))
if nodesMessage.isOk():
let records = recordsFromBytes(nodesMessage.get().enrs)
if records.isOk():
# TODO: distance function is wrong here for state, fix + tests
return ok(verifyNodesRecords(records.get(), dst, enrsResultLimit, distances))
let res = verifyNodesRecords(records.get(), dst, enrsResultLimit, distances)
return ok(res.filterIt(not p.isBanned(it.id)))
else:
return err(records.error)
else:
@@ -801,6 +832,9 @@ proc findContent*(
node = dst
contentKey

if p.isBanned(dst.id):
return err("destination node is banned")

let contentMessageResponse = await p.findContentImpl(dst, contentKey)

if contentMessageResponse.isOk():
@@ -868,8 +902,11 @@ proc findContent*(
let records = recordsFromBytes(m.enrs)
if records.isOk():
let verifiedNodes = verifyNodesRecords(records.get(), dst, enrsResultLimit)

return ok(FoundContent(src: dst, kind: Nodes, nodes: verifiedNodes))
return ok(
FoundContent(
src: dst, kind: Nodes, nodes: verifiedNodes.filterIt(not p.isBanned(it.id))
)
)
else:
return err("Content message returned invalid ENRs")
else:
@@ -935,6 +972,9 @@ proc offer(
contentKeys.len().int64, labelValues = [$p.protocolId]
)

if p.isBanned(o.dst.id):
return err("destination node is banned")

let acceptMessageResponse = await p.offerImpl(o.dst, contentKeys)

if acceptMessageResponse.isOk():
@@ -1088,7 +1128,7 @@ proc lookup*(
## target. Maximum value for n is `BUCKET_SIZE`.
# `closestNodes` holds the k closest nodes to target found, sorted by distance
# Unvalidated nodes are used for requests as a form of validation.
var closestNodes = p.routingTable.neighbours(target, BUCKET_SIZE, seenOnly = false)
var closestNodes = p.neighbours(target, BUCKET_SIZE, seenOnly = false)

var asked, seen = HashSet[NodeId]()
asked.incl(p.localNode.id) # No need to ask our own node
@@ -1190,7 +1230,7 @@ proc contentLookup*(
## target.
# `closestNodes` holds the k closest nodes to target found, sorted by distance
# Unvalidated nodes are used for requests as a form of validation.
var closestNodes = p.routingTable.neighbours(targetId, BUCKET_SIZE, seenOnly = false)
var closestNodes = p.neighbours(targetId, BUCKET_SIZE, seenOnly = false)

# Shuffling the order of the nodes in order to not always hit the same node
# first for the same request.
@@ -1316,7 +1356,7 @@ proc traceContentLookup*(
# Need to use a system clock and not the mono clock for this.
let startedAtMs = int64(times.epochTime() * 1000)

var closestNodes = p.routingTable.neighbours(targetId, BUCKET_SIZE, seenOnly = false)
var closestNodes = p.neighbours(targetId, BUCKET_SIZE, seenOnly = false)
# Shuffling the order of the nodes in order to not always hit the same node
# first for the same request.
p.baseProtocol.rng[].shuffle(closestNodes)
@@ -1514,7 +1554,7 @@ proc query*(
## This will take k nodes from the routing table closest to target and
## query them for nodes closest to target. If there are less than k nodes in
## the routing table, nodes returned by the first queries will be used.
var queryBuffer = p.routingTable.neighbours(target, k, seenOnly = false)
var queryBuffer = p.neighbours(target, k, seenOnly = false)

var asked, seen = HashSet[NodeId]()
asked.incl(p.localNode.id) # No need to ask our own node
@@ -1605,7 +1645,7 @@ proc neighborhoodGossip*(
# It might still cause issues in data getting propagated in a wider id range.

var closestLocalNodes =
p.routingTable.neighbours(NodeId(contentId), k = 16, seenOnly = true)
p.routingTable.neighbours(NodeId(contentId), BUCKET_SIZE, seenOnly = true)

# Shuffling the order of the nodes in order to not always hit the same node
# first for the same request.
@@ -1813,6 +1853,9 @@ proc refreshLoop(p: PortalProtocol) {.async: (raises: []).} =
trace "Discovered nodes in random target query", nodes = randomQuery.len
debug "Total nodes in routing table", total = p.routingTable.len()

# Remove the expired bans from the routing table to limit memory usage
p.routingTable.cleanupExpiredBans()

await sleepAsync(refreshInterval)
except CancelledError:
trace "refreshLoop canceled"
@@ -1865,6 +1908,12 @@ proc resolve*(
if id == p.localNode.id:
return Opt.some(p.localNode)

# No point in trying to resolve a banned node because it won't exist in the
# routing table and it will be filtered out of any responses in the lookup call
if p.isBanned(id):
debug "Not resolving banned node", nodeId = id
return Opt.none(Node)

let node = p.getNode(id)
if node.isSome():
let nodesMessage = await p.findNodes(node.get(), @[0'u16])
7 changes: 6 additions & 1 deletion fluffy/network/wire/portal_protocol_config.nim

@@ -1,5 +1,5 @@
# Fluffy
# Copyright (c) 2021-2024 Status Research & Development GmbH
# Copyright (c) 2021-2025 Status Research & Development GmbH
# Licensed and distributed under either of
# * MIT license (license terms in the root directory or at https://opensource.org/licenses/MIT).
# * Apache v2 license (license terms in the root directory or at https://www.apache.org/licenses/LICENSE-2.0).
@@ -45,6 +45,7 @@ type
contentCacheSize*: int
disableContentCache*: bool
maxConcurrentOffers*: int
disableBanNodes*: bool

const
defaultRadiusConfig* = RadiusConfig(kind: Dynamic)
@@ -56,6 +57,7 @@ const
defaultMaxConcurrentOffers* = 50
defaultAlpha* = 3
revalidationTimeout* = chronos.seconds(30)
defaultDisableBanNodes* = false

defaultPortalProtocolConfig* = PortalProtocolConfig(
tableIpLimits: DefaultTableIpLimits,
@@ -67,6 +69,7 @@ const
contentCacheSize: defaultContentCacheSize,
disableContentCache: defaultDisableContentCache,
maxConcurrentOffers: defaultMaxConcurrentOffers,
disableBanNodes: defaultDisableBanNodes,
)

proc init*(
@@ -81,6 +84,7 @@ proc init*(
contentCacheSize: int,
disableContentCache: bool,
maxConcurrentOffers: int,
disableBanNodes: bool,
): T =
PortalProtocolConfig(
tableIpLimits:
@@ -93,6 +97,7 @@
contentCacheSize: contentCacheSize,
disableContentCache: disableContentCache,
maxConcurrentOffers: maxConcurrentOffers,
disableBanNodes: disableBanNodes,
)

func fromLogRadius*(T: type UInt256, logRadius: uint16): T =
9 changes: 9 additions & 0 deletions fluffy/rpc/rpc_portal_state_api.nim

@@ -46,6 +46,7 @@ proc installPortalStateApiHandlers*(rpcServer: RpcServer, p: PortalProtocol) =
of Content:
let valueBytes = foundContentResult.content
validateRetrieval(key, valueBytes).isOkOr:
p.banNode(node.id, NodeBanDurationContentLookupFailedValidation)
raise invalidValueErr()

Contributor: No huge deal, but having these banNode calls also in the RPC API is a bit much. It would be nice if we could limit them to the portal wire protocol and the specific networks.

Contributor: To be clear, the bans on this behaviour are correct and should be done.

Contributor (Author): Yeah, I guess at some point I could refactor this to import the state network into the RPC API and then reuse the existing functions there, which will already apply the bans as needed. Probably better to do that in a future PR though.


let res = ContentInfo(
@@ -97,6 +98,10 @@ proc installPortalStateApiHandlers*(rpcServer: RpcServer, p: PortalProtocol) =
valueBytes = contentLookupResult.content

validateRetrieval(key, valueBytes).isOkOr:
p.banNode(
contentLookupResult.receivedFrom.id,
NodeBanDurationContentLookupFailedValidation,
)
raise invalidValueErr()
p.storeContent(keyBytes, contentId, valueBytes, cacheContent = true)

Expand Down Expand Up @@ -132,6 +137,10 @@ proc installPortalStateApiHandlers*(rpcServer: RpcServer, p: PortalProtocol) =
raise contentNotFoundErrWithTrace(data)

validateRetrieval(key, valueBytes).isOkOr:
if res.trace.receivedFrom.isSome():
p.banNode(
res.trace.receivedFrom.get(), NodeBanDurationContentLookupFailedValidation
)
raise invalidValueErr()
p.storeContent(keyBytes, contentId, valueBytes, cacheContent = true)
