timezone |
---|
Europe/Berlin |
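- Self-introduction
- Chloe, ETHPanda core team, previously EIP Fun project lead
- Last year I joined the first cohort of the EPF study group and got hooked on Ethereum core protocol R&D. My notes from the 10-week study group are available for reference: https://hackmd.io/@chloezhux/epfsg_notes . I'm currently most interested in protocol networking and light clients
- The core protocol covers a huge amount of material and the frontier keeps moving, so it takes round after round of study, so here I am~
- My Twitter and Telegram
- Do you think you will complete this intensive study program?
- Definitely!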
-
What's a P2P network
- Definition
- A decentralized communication model where nodes in the network can communicate directly with each other without a central server
- Unlike the traditional client/server model, where a centralized authority manages all connections & data transfer, a p2p network distributes workload and data among participants
- Key features of p2p network
- decentralized: no central server, nodes share data directly
- scalable: network grows as more nodes join
- fault tolerance: no single point of failure
- resource sharing: peers can share computing power, storage, or bandwidth
- Types of p2p network
- unstructured p2p
- nodes randomly connect (eg. Gnutella, Kazaa)
- structured p2p
- use an algorithm to route data (eg. DHT in BitTorrent, Kademlia)
- hybrid p2p
- mix of decentralized peers and some centralized components
-
What type of p2p is Ethereum and Bitcoin
- Bitcoin: mostly unstructured p2p with a gossip protocol for tx & block propagation
- Network structure
- bitcoin nodes randomly connect to other nodes
- tx and blocks are relayed to neighbours, which propagate them further
- nodes discover & maintain peer lists dynamically
- Data propagation
- Uses flooding (gossip protocol) where each node forwards data to its connected peers
- Peer discovery
- use DNS seed nodes, hardcoded bootstrap nodes, and peer exchanges
- Ethereum: structured p2p with Kademlia DHT
- Network structure
- uses a modified Kademlia DHT to structure peer discovery & routing
- nodes are identified by unique IDs and stored in a tree-like structure for efficient lookup
- allows for more structured peer discovery than Bitcoin's random peer exchange; the DHT is used to find peers, not to store chain data
- Data propagation
- also uses a gossip protocol
- runs two networking stacks, devp2p (execution layer) and libp2p (consensus layer), which carry different types of data, eg. state sync, block propagation, tx relaying
- Peer discovery
- use a Kademlia DHT for peer lookup
- nodes maintain a routing table that organizes peers based on proximity in the DHT

| | Bitcoin | Ethereum |
|---|---|---|
| network type | unstructured p2p | structured p2p (Kademlia DHT) |
| node discovery | random peer selection, DNS seed | Kademlia DHT for structured peer lookup |
| data propagation | gossip-based (flooding) | gossip-based + DHT routing |
| efficiency | redundant message forwarding | more efficient lookup |

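
The "redundant message forwarding" cost of flooding can be made concrete with a small simulation. This is only a sketch under simplifying assumptions: the 5-node topology and the `flood` helper are hypothetical, every node forwards to all of its peers (including the one it heard the message from), and real clients announce data before sending it in full.

```python
from collections import deque

def flood(adjacency, origin):
    """Flood a message from `origin`; return (first-seen hop per node, total sends)."""
    first_seen = {origin: 0}
    sends = 0
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for peer in adjacency[node]:
            sends += 1  # every forward costs bandwidth, even if the peer already has the data
            if peer not in first_seen:
                first_seen[peer] = first_seen[node] + 1
                queue.append(peer)
    return first_seen, sends

# Hypothetical topology: undirected links, listed in both directions.
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B", "E"],
    "E": ["C", "D"],
}

first_seen, sends = flood(adjacency, "A")
redundant = sends - (len(first_seen) - 1)  # sends beyond the minimum needed to reach everyone
print(first_seen, sends, redundant)
```

In this toy network 4 sends would be enough to reach every node, yet flooding performs 12, which is the redundancy the table refers to.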
- What's DHT and Kademlia DHT
- DHT (Distributed Hash Table)
- a decentralized system for storing & retrieving key-value pairs in a distributed network
- How it works
- Each node in the network stores a portion of the key-value pairs
- Keys are hashed to produce a unique identifier -> this determines which node is responsible for storing the corresponding value
- When a node wants to retrieve a value, it uses the DHT to locate the node responsible for that key
- Kademlia DHT
- a specific implementation of a DHT that is widely used in P2P networks, including Ethereum, BitTorrent, and IPFS. It was introduced in 2002 by Petar Maymounkov and David Mazières and is known for its efficiency, simplicity, and robustness
- Key features
- use a binary tree-based routing algo to locate nodes & data in O(logN) steps
- use XOR (excl. OR) to measure the distance btw nodes and keys
- send queries to multiple nodes simultaneously
- each node & key is assigned a unique 160-bit ID
- each node maintains a routing table (made of k-buckets) that stores info about other nodes in the network
- How it works
- Node ID assignment: each node is assigned a unique 160-bit ID, usually generated by hashing its IP address or public key
- Key-value storage: keys are also hashed to a 160-bit ID; each key-value pair is stored on the node whose ID is closest to the key ID (based on XOR)
- Lookup process: a node sends a lookup request to the nodes in its routing table that are closest to the key's ID; These nodes respond with information about even closer nodes, and the process repeats until the closest node (responsible for the key) is found
- Routing table maintenance: nodes periodically update their routing tables by querying other nodes and exchanging information about peers
- Application
- Used in Ethereum, BitTorrent, IPFS
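
The XOR metric and the k-bucket layout described above can be sketched in a few lines. This is an illustration, not any client's implementation: the helper names are hypothetical, SHA-1 is used only because it yields exactly 160 bits, and the example uses tiny 4-bit IDs so the arithmetic is easy to follow.

```python
import hashlib

def node_id(seed: bytes) -> int:
    """Derive a 160-bit ID by hashing (assumption: SHA-1, chosen for its 160-bit output)."""
    return int.from_bytes(hashlib.sha1(seed).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: the XOR of two IDs, interpreted as an integer."""
    return a ^ b

def bucket_index(own_id: int, other_id: int) -> int:
    """k-bucket index = position of the highest bit in which the two IDs differ."""
    distance = xor_distance(own_id, other_id)
    if distance == 0:
        raise ValueError("a node does not keep itself in its routing table")
    return distance.bit_length() - 1

def closest_nodes(known_ids, target: int, k: int = 2):
    """Return the k known IDs closest to `target` under the XOR metric."""
    return sorted(known_ids, key=lambda n: xor_distance(n, target))[:k]

# Toy example with 4-bit IDs.
own = 0b0101
peers = [0b0001, 0b0100, 0b0111, 0b1100]
print(bucket_index(own, 0b0111))         # IDs differ first at bit 1
print(closest_nodes(peers, target=own))  # the two peers nearest to 0b0101
```

A lookup repeats `closest_nodes` against the answers returned by the peers it queries, which is why each round roughly halves the remaining distance.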
- Other types of DHT
- Chord (consistent hashing-based)
- use ring structure: node ID and keys are arranged in a circular space
- each node maintains a finger table pointing to nodes at exponentially increasing distance in the ring
- require more maintenance when nodes join/ leave, whereas Kademlia’s XOR-based buckets provide better resilience
- Pastry (prefix-based routing)
- use prefix-matching for routing. Nodes and keys have numerical IDs, and nodes forward requests to peers whose ID shares the longest prefix with the target
- each node keeps a leaf set (close nodes) and a routing table for long-range hops
- need more state per node (bigger routing table) than Kademlia
- CAN (content addressable network)
- use a d-dimensional coordinate space, where each node owns a zone
- keys are mapped to coordinate points in this space, and nodes forward queries toward the target zone
- lookup complexity is O(d N^(1/d)); more dimensions mean fewer hops, at the cost of more neighbors per node
- less efficient than Kademlia for large networks because, for a fixed d, the hop count grows polynomially in N rather than logarithmically
- Why Kademlia is a better choice
- XOR-based distance, enables parallel lookups
- better fault tolerance: nodes cache more peer info, so the network is more resilient to churn
- efficient lookups: O(logN) hops with min maintenance overhead
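
To get a feel for the lookup bounds quoted above, the two growth rates can be compared numerically. This is rough arithmetic only: constant factors hidden by the O-notation are dropped, and the function names are hypothetical.

```python
import math

def kademlia_hops(n: int) -> float:
    """Order-of-magnitude hop count for Kademlia: log2(N)."""
    return math.log2(n)

def can_hops(n: int, d: int) -> float:
    """Order-of-magnitude hop count for CAN with d dimensions: d * N^(1/d)."""
    return d * n ** (1 / d)

n = 1_000_000
print(f"Kademlia: ~{kademlia_hops(n):.1f} hops")
for d in (2, 4, 10):
    print(f"CAN d={d}: ~{can_hops(n, d):.1f} hops")
```

With a million nodes the logarithmic bound stays around 20, while the CAN expression is in the thousands for d = 2 and only approaches the same order of magnitude as d grows.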
- What's Ethereum protocol design at a high level
- Design philosophy
- Simplicity, Universality, Modularity, Non-discrimination, Agility
- Main component
- EL: execution engine
- handle user tx and all state (addr, contract data)
- CL: implements the PoS mechanism
- ensure security and fault tolerance
- Implementation & development
- Client: an implementation of the EL or CL
- Node: a computer running this client & connecting to the network; a node is a pair of EL and CL clients actively participating in the network
- Client diversity strategy
- Testing & security
- Different testing tools for state transition testing, fuzzing, shadow forks, RPC tests, client unit tests and CI/CD, etc.
- Coordination
- Protocol architecture
- Graph: https://epf.wiki/#/wiki/protocol/architecture
- What are user APIs and beacon APIs
- User API (aka JSON-RPC API)
- primary interface for interacting with the EL
- used by wallet, dapps etc.
- Key features
- JSON-RPC protocol: a lightweight remote procedure call (RPC) protocol that lets clients send requests & receive responses in JSON format
- Common use: send tx, query blockchain data (eg. balance, contract state), deploy & interact with smart contracts, listen for events (logs emitted by smart contracts)
- Endpoints: expose endpoints eg. eth_sendTransaction, eth_getBalance, eth_call, eth_getLogs
- Beacon API
- interface to interact with the beacon chain, which coordinates validators and achieves consensus
- Key features
- RESTful interface
- Common use: query info about the beacon chain (block headers, validator status), submit attestations & block proposals from validators, monitor the status of the beacon network
- Endpoints: eg. /eth/v2/beacon/blocks/{block_id} (retrieve a beacon chain block), /eth/v1/beacon/pool/attestations (submit attestations from a validator)
- Staking pools & monitor tools use the api to track validator performance and network health
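
The difference in shape between the two APIs can be seen by building one request for each. This is a sketch that only constructs the payloads and makes no network calls; the helper names, the zero address, and the `localhost` base URL are placeholders.

```python
import json

def jsonrpc_request(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request object, as sent in an HTTP POST body to an EL client."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}

def beacon_url(base_url, path):
    """Build a Beacon API URL; the resource is addressed by the path, REST style."""
    return base_url.rstrip("/") + path

# EL: every call goes to one endpoint and the method name travels inside the JSON body.
balance_request = jsonrpc_request(
    "eth_getBalance",
    ["0x0000000000000000000000000000000000000000", "latest"],  # placeholder address, block tag
)
body = json.dumps(balance_request)

# CL: each resource has its own URL; here, the header of the current head block.
header_url = beacon_url("http://localhost:5052", "/eth/v1/beacon/headers/head")  # placeholder host/port

print(body)
print(header_url)
```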
- Issue with JSON RPC api
- Centralization
- reliance on centralized infra providers (eg. Infura, Alchemy, QuickNode) to access Ethereum nodes via JSON-RPC. These services act as intermediaries, so fewer developers run their own nodes
- barrier to running full nodes: a full node requires significant resources (storage, bandwidth, computation power), so many devs opt for centralized services instead
- Scalability
- high load on nodes: can lead to performance bottlenecks and increased cost for node operators
- inefficient data retrieval: not optimized for querying large amounts of data, can result in slow response time and high latency
- Security
- json-rpc endpoints can expose sensitive info if not properly secured (eg. account balance, tx history)
- public json-rpc endpoints are often targeted by DDoS attacks
- by default JSON-RPC doesn't require authentication, making it easy for unauthorized users to access node data
- Lack of modern features
- No RESTful design
- limited tool: lack support for features like filtering, sorting etc.
- verbose and complex
- Potential Alternative/ Solution
- Decentralized node infra: eg. the Graph, EPNS
- Light clients and stateless: reduce the resource required for running nodes
- RESTful api: eg. Besu
- Improved json-rpc: add support for batch request, better error handling etc.
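
On the batch-request point: the JSON-RPC 2.0 specification defines a batch as a JSON array of request objects, and responses may come back in any order, so they are matched by `id`. A minimal sketch of that shape follows; the helper names and the response values are made up for illustration.

```python
import json

def build_batch(calls):
    """Turn [(method, params), ...] into a JSON-RPC 2.0 batch (a list of request objects)."""
    return [
        {"jsonrpc": "2.0", "id": i, "method": method, "params": params}
        for i, (method, params) in enumerate(calls, start=1)
    ]

def match_responses(batch, responses):
    """Pair each request with its response by id, since batch responses are unordered."""
    by_id = {response["id"]: response for response in responses}
    return [(request["method"], by_id[request["id"]].get("result")) for request in batch]

batch = build_batch([("eth_blockNumber", []), ("eth_chainId", [])])
payload = json.dumps(batch)  # sent as a single HTTP POST body

# Hypothetical responses, deliberately out of order.
fake_responses = [
    {"jsonrpc": "2.0", "id": 2, "result": "0x1"},
    {"jsonrpc": "2.0", "id": 1, "result": "0x10"},
]
matched = match_responses(batch, fake_responses)
print(matched)
```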
- Blockchain level protocol
- reference link: https://epf.wiki/#/wiki/protocol/design-rationale
- Accounts over UTXOs
- UTXO (unspent tx output)
- Account
- What are the pros/cons of account vs UTXO? Why did Ethereum choose the account-based model?
- Merkle patricia trie
- a modified MPT
- deterministic and cryptographically verifiable
- Verkle tree
- vector commitments allow for much smaller proofs (aka witness)
- RLP (recursive length prefix)
- SSZ (simple serialize)
- Hunt for finality: Casper FFG + LMD GHOST
- Discv5: the discovery protocol
- a kademlia based DHT to store ENR records
- ENR (Ethereum Node Record) contains the routing info needed to establish connections between peers
-
Pros and cons of Account vs UTXO
- Account: how it works at a high level
- the blockchain maintains a global state composed of accounts
- each account has a balance (smart contract accounts also have storage and code)
- txs directly modify the state of these accounts
- UTXO: how it works at a high level
- the blockchain tracks unspent tx outputs
- a UTXO represents a chunk of crypto that has been created as an output of a tx, but has not yet been spent
- tx consume existing UTXO and create new ones
- Comparison

| | Account-based | UTXO-based |
|---|---|---|
| state representation | global state of accounts & balances | set of unspent tx outputs |
| tx logic | directly modify account balance | consume UTXOs and create new ones |
| complexity | easier for devs, esp. for smart contracts | more complex for devs, esp. for advanced logic |
| parallelizability | limited, as txs modifying the same account must be processed sequentially | high, as independent UTXOs can be processed in parallel |

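
The same 3-unit payment looks quite different in the two models. The sketch below is a toy illustration with hypothetical names; it ignores signatures, fees, nonces, and gas.

```python
from dataclasses import dataclass

def account_transfer(balances, sender, recipient, amount):
    """Account model: a transfer mutates two entries of the global state in place."""
    if balances.get(sender, 0) < amount:
        raise ValueError("insufficient balance")
    balances[sender] -= amount
    balances[recipient] = balances.get(recipient, 0) + amount

@dataclass(frozen=True)
class UTXO:
    tx_id: str
    index: int
    owner: str
    amount: int

def utxo_transfer(utxo_set, inputs, sender, recipient, amount, new_tx_id):
    """UTXO model: consume whole outputs, create a payment output plus a change output."""
    if any(u not in utxo_set or u.owner != sender for u in inputs):
        raise ValueError("input is already spent or not owned by the sender")
    total = sum(u.amount for u in inputs)
    if total < amount:
        raise ValueError("inputs do not cover the amount")
    utxo_set.difference_update(inputs)  # spent outputs leave the set for good
    utxo_set.add(UTXO(new_tx_id, 0, recipient, amount))
    if total > amount:
        utxo_set.add(UTXO(new_tx_id, 1, sender, total - amount))  # change back to sender

balances = {"alice": 10, "bob": 0}
account_transfer(balances, "alice", "bob", 3)

coin = UTXO("tx0", 0, "alice", 10)
utxos = {coin}
utxo_transfer(utxos, [coin], "alice", "bob", 3, "tx1")
print(balances)
print(sorted(utxos, key=lambda u: u.index))
```

Two UTXO transactions that spend disjoint inputs touch no shared state, which is where the parallelism in the table comes from; two account transfers from the same sender both write the same balance entry.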
-
Why MPT then Verkle tree
- MPT
- a data structure that combines a merkle tree (provides cryptographic proofs to verify data integrity) and a patricia trie (a compressed trie that stores key-value pairs)
- Verkle tree
- a more advanced data structure that addresses the issues of MPT
- vector commitment + merkle tree: it uses vector commitments (eg. polynomial commitments) to create smaller & more efficient proofs
- Comparison

| | MPT | Verkle tree |
|---|---|---|
| proof size | large (scales with tree depth) | small (constant or logarithmic) |
| efficiency | less efficient for deep trees | more efficient for deep trees |
| stateless clients | inefficient due to large proof | efficient due to compact proof |
| scalability | limited by proof size & depth | better scalability for large states |
| cryptographic basis | merkle proof (hash-based) | vector commitment (eg. polynomial) |

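
Why hash-based proofs scale with tree depth can be seen in a plain binary Merkle tree. This sketch is a simplification and not the MPT itself: the helper names are hypothetical, SHA-256 stands in for Keccak-256, and the leaf count must be a power of two. In the hexary MPT a branch node has up to 16 children, so each level of a proof carries more sibling data than the single hash shown here.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return all tree levels, from hashed leaves up to the root (power-of-two leaf count)."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Collect one sibling hash per level: the proof grows with the tree depth."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # the sibling sits at the index with the last bit flipped
        index //= 2
    return proof

def verify(root, leaf, index, proof):
    """Recompute the path to the root from the leaf and its sibling hashes."""
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

leaves = [f"account-{i}".encode() for i in range(8)]
levels = build_levels(leaves)
root = levels[-1][0]
proof = prove(levels, 5)
print(len(proof), verify(root, leaves[5], 5, proof))
```

A Verkle proof replaces these per-level sibling hashes with a vector-commitment opening, which is not sketched here because it needs elliptic-curve arithmetic.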
-
What's RLP? What's its purpose? Why want to convert to SSZ?
- Recursive length prefix: a serialization format, to encode & decode data structure into compact byte array
- Where is it used in Ethereum
- transaction: txs are serialized using RLP before being broadcast or stored in blocks
- block: serialized for storage and transmission
- state trie: stored in an MPT, where keys and values are RLP-encoded
- p2p networking protocol: RLP is used to encode data for p2p networking
- Issue
- doesn't natively support typed data (eg. int, float, boolean); everything is treated as byte arrays or nested lists of byte arrays
- not optimized for merkleization
- adds overhead for small data structures due to the length-prefixing scheme
- not human-readable
- variable-length encoding makes it harder for light clients to parse and verify data
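
The length-prefixing scheme is compact enough to sketch in full. This is a minimal encoder written for illustration, not a reference implementation: the function names are hypothetical, and integers are converted to big-endian bytes without leading zeros before encoding, since RLP itself only knows byte strings and lists.

```python
def encode_length(length: int, offset: int) -> bytes:
    """Prefix for a payload: one byte if length <= 55, else a length-of-length form."""
    if length <= 55:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes

def rlp_encode(item) -> bytes:
    """Encode bytes, non-negative ints, or (nested) lists of those."""
    if isinstance(item, int):
        if item < 0:
            raise ValueError("RLP has no encoding for negative integers")
        # The integer 0 becomes the empty byte string.
        item = item.to_bytes((item.bit_length() + 7) // 8, "big")
    if isinstance(item, (bytes, bytearray)):
        if len(item) == 1 and item[0] < 0x80:
            return bytes(item)  # a single small byte is its own encoding
        return encode_length(len(item), 0x80) + bytes(item)
    if isinstance(item, (list, tuple)):
        payload = b"".join(rlp_encode(element) for element in item)
        return encode_length(len(payload), 0xC0) + payload
    raise TypeError(f"cannot RLP-encode {type(item).__name__}")

print(rlp_encode(b"dog").hex())
print(rlp_encode([b"cat", b"dog"]).hex())
print(rlp_encode(1024).hex())
```

The output carries no type information: a decoder sees only byte strings and lists, so the caller has to know from context that a given field is an integer.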
-
What's SSZ? What's its purpose?
- Simple Serialize (SSZ) is a serialization format designed specifically for eth2 (the consensus layer)
- encodes & decodes data structures in a way that is more efficient, type-aware, and optimized for merkleization
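
What "type-aware and optimized for merkleization" means can be shown on the simplest case: a container whose fields are all `uint64`. This sketch covers only that case; the helper names and the two-field container are hypothetical, and variable-size types, offsets, and length mix-ins are left out.

```python
import hashlib

def serialize_uint64(value: int) -> bytes:
    """SSZ basic type: fixed 8 bytes, little-endian, no length prefix."""
    return value.to_bytes(8, "little")

def serialize_fixed_container(fields):
    """A container of fixed-size fields is just the concatenation of its fields."""
    return b"".join(serialize_uint64(v) for v in fields)

def chunk_of_uint64(value: int) -> bytes:
    """Pack a uint64 into one 32-byte chunk (right-padded with zeros)."""
    return serialize_uint64(value).ljust(32, b"\x00")

def merkleize(chunks):
    """Pad to a power of two with zero chunks, then hash pairwise up to a single root."""
    chunks = list(chunks) or [b"\x00" * 32]
    size = 1
    while size < len(chunks):
        size *= 2
    chunks += [b"\x00" * 32] * (size - len(chunks))
    while len(chunks) > 1:
        chunks = [
            hashlib.sha256(chunks[i] + chunks[i + 1]).digest()
            for i in range(0, len(chunks), 2)
        ]
    return chunks[0]

def hash_tree_root(fields):
    """Root of a container of uint64 fields: merkleize one chunk per field."""
    return merkleize([chunk_of_uint64(v) for v in fields])

fields = [1, 2]  # hypothetical container with two uint64 fields
encoded = serialize_fixed_container(fields)
root = hash_tree_root(fields)
print(encoded.hex())
print(root.hex())
```

Because every field sits at a known offset and has its own leaf in the tree, a light client can be given a proof for a single field without decoding the whole object.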
Update the notes on mindmap: https://ab9jvcjkej.feishu.cn/mindnotes/IfABbVTMfmg5IFnqinEcmcDqnFe#mindmap
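- A recently published paper on hash table algorithms looks like it could have a real impact on DHTs and Ethereum; below is what ds + GPT had to say about it
- Link to a summary of the research: https://mp.weixin.qq.com/s/3IvM0b9kHdO66KV18cv11w
- Impact on Ethereum
- State storage optimization
- More efficient MPT: could optimize the MPT structure, reducing storage space and query time
- Node discovery and communication
- discv5 protocol improvements: could optimize the Discv5 routing table structure, making node discovery faster and more reliable
- Lower network overhead: improved hash table algorithms could reduce communication overhead between nodes and so improve the overall efficiency of the Ethereum network
- Scalability
- Faster state sync: faster hash tables could improve storage-engine query efficiency, which in turn speeds up block sync (fast sync / snap sync) and light-client data indexing
- Smart contract execution and EVM performance
- Smart contract storage & computation involve a lot of hashing (eg. SSTORE writing contract variables); faster hash tables could optimize storage access and lower gas fees
- MEV transactions, decentralized exchange (DEX) order matching, and L2 rollup state commitments in particular could all benefit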
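- Impact on DHTs
- Faster node discovery in p2p networks
- Kademlia DHT relies on the XOR distance metric for routing queries, and each query involves several hash table operations (node storage, indexing, lookup)
- Faster hash tables could speed up finding the closest nodes, improving connection stability and lowering communication latency
- Better scalability of the DHT structure
- Current DHTs usually use tree (trie) or bucket storage for hash indexing
- Faster hash table algorithms might reduce storage overhead or make node maintenance more efficient, letting DHTs scale to larger P2P networks
- Improved distributed storage systems
- DHTs are a core component of decentralized storage (eg. IPFS, Arweave, Filecoin) and affect how fast data is located and retrieved
- Faster hash tables could reduce storage query latency and improve the performance of decentralized storage networks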