Access lists indexing #1361
Conversation
/run-security-scan
alexcos20 left a comment
AI automated code review (Gemini 3).
Overall risk: medium
Summary:
This PR successfully implements indexing and database querying for Access Lists, introducing event processors, new HTTP endpoints, and comprehensive DB schemas for both Typesense and Elasticsearch. The implementation integrates well with the existing architecture and is backed by robust tests. However, there are significant performance bottlenecks related to universally emitted events (OwnershipTransferred) and synchronous RPC calls per event. There is also a potential bug regarding precision loss for large tokenIds.
Comments:
• [ERROR][performance] The OwnershipTransferred event is emitted by almost every Ownable contract on the blockchain. Because this processor is hooked to the topic hash, it will trigger a database lookup for every ownership transfer across the entire network. On active networks, this will cause severe database spam and degrade indexer performance. Consider keeping an in-memory Set of known AccessList addresses to filter relevant events before querying the database.
+ // Suggestion: Filter using an in-memory cache first
+ if (!knownAccessLists.has(contractAddress)) return null;
const existing = await accessList.retrieve(chainId, contractAddress)
• [WARNING][performance] Fetching the block via provider.getBlock(event.blockNumber) inside the event processor means an asynchronous RPC call is made for every AddressAdded event. If a factory deploys a list with hundreds of initial users, this will result in hundreds of duplicate RPC calls for the same block, potentially hitting RPC rate limits. Consider a simple block cache mechanism, or pass the block timestamp directly from the indexer context.
- const block = await provider.getBlock(event.blockNumber)
- const timestamp = block?.timestamp ?? Math.floor(Date.now() / 1000)
+ // Suggestion: implement an LRU cache for block fetches
+ const timestamp = (await getCachedBlockTimestamp(provider, event.blockNumber)) ?? Math.floor(Date.now() / 1000)
• [WARNING][bug] tokenId in ERC721 contracts is typically a uint256. Converting it directly to a JavaScript Number will cause precision loss for any value exceeding Number.MAX_SAFE_INTEGER (2^53 - 1). This could corrupt index data and make elements un-removable. It is highly recommended to store large integer identifiers as string values (and update the DB schema accordingly) unless you enforce that token IDs will remain under the safety limit.
- const tokenId = Number(decoded.args[1])
+ // Safer approach if database schema is updated to handle strings
+ const tokenId = BigInt(decoded.args[1]).toString()
• [INFO][bug] Typesense lacks atomic operations for nested arrays (unlike Elasticsearch's Painless scripts). Fetching the document, modifying the users array in memory, and writing it back creates a potential race condition if multiple indexing events update the same AccessList concurrently. Since node indexers typically operate sequentially per chain, this may not be an issue in practice, but it is worth documenting for future horizontal scaling.
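The block-cache idea suggested above can be sketched as follows. This is a minimal illustration, not code from this PR: `getCachedBlockTimestamp`, `BlockProvider`, and the unbounded Map cache are all assumptions (a bounded LRU would be safer for a long-running indexer).

```typescript
// Illustrative provider shape; real code would use an ethers.js Provider.
type BlockProvider = {
  getBlock(blockNumber: number): Promise<{ timestamp: number } | null>
}

// Simple per-process cache keyed by block number (hypothetical helper).
const blockTimestampCache = new Map<number, number>()

async function getCachedBlockTimestamp(
  provider: BlockProvider,
  blockNumber: number
): Promise<number | undefined> {
  const cached = blockTimestampCache.get(blockNumber)
  if (cached !== undefined) return cached
  // Only one RPC call per distinct block, even if hundreds of
  // AddressAdded events land in the same block.
  const block = await provider.getBlock(blockNumber)
  if (block == null) return undefined
  blockTimestampCache.set(blockNumber, block.timestamp)
  return block.timestamp
}
```

With this in place, repeated AddressAdded events from one deployment hit the cache instead of the RPC endpoint.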
/run-security-scan
alexcos20 left a comment
AI automated code review (Gemini 3).
Overall risk: low
Summary:
Excellent implementation of Access List indexing. The overall design, HTTP/P2P handlers, and integration tests are well-structured and comprehensive. I've left a few suggestions regarding updating array elements (for token transfers), normalizing API responses between Elasticsearch and Typesense, and hardening some edge cases around error handling and validation.
Comments:
• [WARNING][bug] If an AddressAdded event fires for an existing tokenId (e.g., the NFT was transferred to a new wallet), the current painless script simply ignores it because exists is true. It should overwrite the existing user with the new wallet so the index stays up to date.
- boolean exists = false;
- for (int i = 0; i < ctx._source.users.length; i++) {
- if (ctx._source.users[i].tokenId == params.user.tokenId) { exists = true; break; }
- }
- if (!exists) { ctx._source.users.add(params.user); }
+ boolean exists = false;
+ for (int i = 0; i < ctx._source.users.length; i++) {
+ if (ctx._source.users[i].tokenId == params.user.tokenId) {
+ ctx._source.users[i] = params.user;
+ exists = true;
+ break;
+ }
+ }
+ if (!exists) { ctx._source.users.add(params.user); }
• [WARNING][bug] Similar to the Elasticsearch implementation, if the tokenId already exists in the users array, we should overwrite it to reflect the new wallet in case of an NFT transfer.
- const nextUsers = exists ? users : [...users, normalized]
+ const nextUsers = exists
+ ? users.map((u: AccessListUser) => u.tokenId === normalized.tokenId ? normalized : u)
+   : [...users, normalized]
• [INFO][style] In ElasticsearchAccessListDatabase, searchByWallet maps and returns an innerHits array for each document containing the explicitly matched users. Typesense does not do this natively here, leading to inconsistent API responses depending on the database backend. You can manually construct innerHits to match the Elasticsearch schema.
- return (result.hits ?? []).map((h: any) => h.document)
+ return (result.hits ?? []).map((h: any) => {
+ const doc = h.document
+ const innerHits = (doc.users ?? []).filter((u: AccessListUser) => u.wallet === lowerWallet)
+ return { ...doc, innerHits }
+ })
• [WARNING][bug] If the RPC call to accessListContract.transferable() fails due to a transient network error, the catch block logs a warning and proceeds with transferable = false. This permanently locks in potentially incorrect data. It's safer to return null and log an error so the indexer can retry the event later.
- } catch (err) {
- INDEXER_LOGGER.log(
- LOG_LEVELS_STR.LEVEL_WARN,
- `Failed to read transferable() on ${contractAddress}: ${err.message}`
- )
- }
+ } catch (err) {
+ INDEXER_LOGGER.log(
+ LOG_LEVELS_STR.LEVEL_ERROR,
+ `Failed to read transferable() on ${contractAddress}: ${err.message}`
+ )
+ return null
+ }
• [INFO][bug] If task.chainId is present but is an invalid string, it gets cast to NaN. Sending NaN to the database query could cause unexpected behavior. It's safer to validate it identically to the HTTP route.
- const chainId = task.chainId !== undefined ? Number(task.chainId) : undefined
+ let chainId: number | undefined
+ if (task.chainId !== undefined) {
+ chainId = Number(task.chainId)
+ if (Number.isNaN(chainId)) {
+ return {
+ stream: null,
+ status: { httpStatus: 400, error: 'Invalid chainId' }
+ }
+ }
+ }
• [INFO][other] Just a heads up: casting decoded.args[1] (which is a BigInt representing a uint256) using Number() can result in precision loss if the tokenId exceeds 2^53 - 1 (the max safe integer in JS). Since Ocean Protocol's AccessList tokens are likely strictly auto-incrementing counters this is perfectly safe, but if they are ever hashes or UUIDs, consider changing the database schema and this parser to handle tokenId as a string.
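Since the same NaN check recurs across several handlers in this review, one option is to factor it into a small shared helper. A minimal sketch; the name `parseOptionalChainId`, the result shape, and the positive-integer restriction are assumptions, not part of this PR:

```typescript
// Hypothetical helper consolidating the repeated chainId validation.
type ChainIdResult = { ok: true; chainId?: number } | { ok: false }

function parseOptionalChainId(raw: unknown): ChainIdResult {
  // Absent chainId is allowed: handlers treat it as "search all chains".
  if (raw === undefined) return { ok: true }
  const chainId = Number(raw)
  // Rejects NaN, fractional values, and non-positive values in one check.
  if (!Number.isInteger(chainId) || chainId <= 0) return { ok: false }
  return { ok: true, chainId }
}
```

A handler can then map `{ ok: false }` to the same 400 response suggested in the diffs, keeping the HTTP and P2P paths consistent.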
/run-security-scan
alexcos20 left a comment
AI automated code review (Gemini 3).
Overall risk: low
Summary:
This pull request implements comprehensive indexing and API handling for AccessList contracts. It correctly captures NewAccessList, AddressAdded, and AddressRemoved events, indexing them efficiently into Elasticsearch and Typesense. The architecture gracefully handles out-of-order log indexing using atomic scripts and upserts. However, a few improvements are required regarding P2P handler input validation (to prevent NoSQL filter injections and invalid DB queries) and stream manipulation in HTTP routes to optimize memory usage.
Comments:
• [WARNING][bug] When handlers are invoked via P2P commands, task.chainId is unchecked and might evaluate to NaN if a peer sends an invalid format. Passing NaN directly to the database layer can cause query errors. Validate chainId carefully.
- const doc = await db.accessList.retrieve(
- Number(task.chainId),
- String(task.contractAddress)
- )
+ const chainId = Number(task.chainId)
+ if (Number.isNaN(chainId)) {
+ return {
+ stream: null,
+ status: { httpStatus: 400, error: 'chainId must be a valid number' }
+ }
+ }
+ const doc = await db.accessList.retrieve(
+ chainId,
+ String(task.contractAddress)
+ )
• [WARNING][bug] Similar to GetAccessListHandler, validate chainId here to ensure a P2P caller hasn't provided a non-numeric string.
- const chainId = task.chainId !== undefined ? Number(task.chainId) : undefined
+ let chainId: number | undefined
+ if (task.chainId !== undefined) {
+ chainId = Number(task.chainId)
+ if (Number.isNaN(chainId)) {
+ return {
+ stream: null,
+ status: { httpStatus: 400, error: 'chainId must be a valid number' }
+ }
+ }
+ }
• [WARNING][security] Unlike the HTTP route, the P2P handler does not validate task.wallet. A peer could pass crafted strings containing Typesense logical operators (&&, ||) to manipulate the backend query (NoSQL filter injection). Validate the wallet address before processing.
- const docs = await db.accessList.searchByWallet(String(task.wallet), chainId)
+ const walletString = String(task.wallet)
+ if (!/^0x[a-fA-F0-9]{40}$/i.test(walletString)) {
+ return {
+ stream: null,
+ status: { httpStatus: 400, error: 'Invalid wallet address' }
+ }
+ }
+ const docs = await db.accessList.searchByWallet(walletString, chainId)
• [WARNING][performance] Reading the entire stream into a string, parsing it as JSON, and then serializing it again with res.json() adds unnecessary memory overhead and GC pressure. Piping the stream directly to the response object avoids these multiple transformations.
- if (result.stream) {
- const data = JSON.parse(await streamToString(result.stream as Readable))
- res.json(data)
- } else {
+ if (result.stream) {
+ res.set('Content-Type', 'application/json')
+ ;(result.stream as Readable).pipe(res)
+ } else {
• [WARNING][performance] Apply the same streaming optimization here to improve endpoint performance and reduce payload buffering.
- if (result.stream) {
- const data = JSON.parse(await streamToString(result.stream as Readable))
- res.json(data)
- } else {
+ if (result.stream) {
+ res.set('Content-Type', 'application/json')
+ ;(result.stream as Readable).pipe(res)
+ } else {
• [INFO][style] Excellent use of upsert configurations and Painless scripts! This design flawlessly handles the asynchronous and occasionally unpredictable nature of blockchain log indexing, gracefully processing AddressAdded events that might be indexed slightly before their parent NewAccessList events.
• [INFO][bug] Using Number() on uint256 arguments works fine for small, sequential AccessList identifiers. However, if a malicious contract emits AddressAdded with type(uint256).max, Number() will lose precision because it exceeds Number.MAX_SAFE_INTEGER. While the risk here is low and only impacts visual display, standard practice is to handle unbounded EVM uint256 properties as BigInt or string.
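The precision loss described in this comment is easy to reproduce in isolation. A self-contained demonstration; the specific values are illustrative and not taken from the PR:

```typescript
// 2^53 - 1 = 9007199254740991, the largest integer Number stores exactly.
const MAX_SAFE = BigInt(Number.MAX_SAFE_INTEGER)

// Two distinct token ids just above the safe range...
const tokenIdA = MAX_SAFE + BigInt(1) // 2^53
const tokenIdB = MAX_SAFE + BigInt(2) // 2^53 + 1

// ...collapse to the same Number after conversion, so one index entry
// could silently shadow or overwrite the other.
const lossyA = Number(tokenIdA)
const lossyB = Number(tokenIdB)

// Decimal strings round-trip exactly and stay distinct, which is why the
// review suggests storing uint256 identifiers as strings.
const exactA = tokenIdA.toString()
const exactB = tokenIdB.toString()
```

Here `lossyA === lossyB` even though the underlying uint256 values differ, while `exactA` and `exactB` remain distinct.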
Fixes #1353
Changes proposed in this PR: