a16z: Why Are Encrypted Mempools Not a Universal Remedy for MEV?

> Technology, economics, efficiency: three towering mountains that cannot be avoided.

**Written by: Pranav Garimidi, Joseph Bonneau, Lioba Heimbach, a16z**

**Compiled by: Saoirse, Foresight News**

In blockchains, the maximum value that can be earned by deciding which transactions to include in a block, which to exclude, or how to order them is called "maximal extractable value," or MEV. MEV exists on most blockchains and has long been a widely discussed topic in the industry.

*Note: This article assumes that the reader has a basic understanding of MEV. Readers who do not may want to read our MEV explainer first.*

Many researchers observing MEV have asked an obvious question: can cryptography solve the problem? One proposed solution is an encrypted mempool: users broadcast encrypted transactions, which are decrypted and revealed only after ordering is complete. The consensus protocol must then choose the transaction order "blindly," which seems to prevent anyone from profiting from MEV opportunities during the ordering phase.

Unfortunately, from both a practical and a theoretical perspective, encrypted mempools cannot provide a universal solution to MEV. This article outlines the difficulties involved and explores feasible design directions for encrypted mempools.

## How Encrypted Mempools Work

There have been many proposals for encrypted mempools, but the general framework is as follows:

1. Users broadcast encrypted transactions.
2. The encrypted transactions are committed on-chain (in some proposals, they must first go through a verifiable random shuffle).
3. Once the block containing these transactions is finalized, the transactions are decrypted.
4. Finally, the transactions are executed.

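
The four steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `toy_encrypt`, `run_block`, and the single shared `key` are hypothetical names, the XOR keystream is a stand-in for real encryption and is not secure (reusing one key across transactions would be unsafe in practice), and the seeded shuffle merely stands in for whatever blind ordering rule a real proposal uses.

```python
import hashlib
import random


def keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from key (toy construction, NOT secure)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with the keystream (placeholder for real encryption)."""
    return bytes(a ^ b for a, b in zip(plaintext, keystream(key, len(plaintext))))


toy_decrypt = toy_encrypt  # XOR is its own inverse


def run_block(plaintexts: list[bytes], key: bytes, seed: int) -> list[bytes]:
    # Step 1: users broadcast encrypted transactions.
    mempool = [toy_encrypt(key, tx) for tx in plaintexts]
    # Step 2: ciphertexts are ordered without anyone reading their contents
    # (a seeded shuffle stands in for a verifiable random shuffle).
    ordered = mempool[:]
    random.Random(seed).shuffle(ordered)
    # Step 3: once the block is final, the ciphertexts are decrypted.
    decrypted = [toy_decrypt(key, ct) for ct in ordered]
    # Step 4: the transactions are executed in the committed order.
    return decrypted
```

The sketch deliberately leaves open who holds `key` and when it is released, which is exactly the question the rest of this section turns on.
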
Step 3 (transaction decryption) raises a key question: who is responsible for decryption, and what happens if decryption never takes place? One simple idea is to let users decrypt their own transactions (in which case encryption is not even necessary; a hiding commitment would suffice). However, this approach has a weakness: attackers can carry out speculative MEV.

In speculative MEV, an attacker guesses that a particular encrypted transaction contains an MEV opportunity, then encrypts a transaction of their own and tries to get it placed in a favorable position (for example, just before or just after the target transaction). If the transactions end up in the expected order, the attacker decrypts and extracts MEV through their transaction; if not, they refuse to decrypt, and their transaction is never included in the final chain.

Penalties could perhaps be imposed on users who fail to decrypt, but such a mechanism is extremely hard to implement. The penalty must be the same for every encrypted transaction (encrypted transactions are indistinguishable, after all), and it must be severe enough to deter speculative MEV even against high-value targets. This would lock up a large amount of capital, and those funds would have to remain anonymous (to avoid revealing the link between transactions and users). Worse, honest users who cannot decrypt because of a software bug or a network failure would also lose money.

For these reasons, most proposals require that transactions be encrypted in a way that guarantees they can be decrypted at some point in the future, even if the submitting user is offline or refuses to cooperate.
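
To see why a uniform penalty is hard to size, consider a deliberately simplified model (an assumption made here for illustration, not something the authors specify): the attacker's transaction lands in the desired position with probability `p_success` and then earns `mev_value`; otherwise the attacker withholds decryption and forfeits a bond. The attack stops paying only when `p * V - (1 - p) * B <= 0`, that is, when the bond `B` is at least `p * V / (1 - p)`. The function names below are hypothetical.

```python
def min_deterrent_bond(p_success: float, mev_value: float) -> float:
    """Smallest bond B such that p*V - (1-p)*B <= 0 in the toy model."""
    if not 0.0 <= p_success < 1.0:
        raise ValueError("p_success must be in [0, 1)")
    return p_success * mev_value / (1.0 - p_success)


def attacker_expected_profit(p_success: float, mev_value: float, bond: float) -> float:
    """Expected profit of one speculative attempt: win V or lose the bond."""
    return p_success * mev_value - (1.0 - p_success) * bond
```

Under this model, an opportunity worth 100,000 with a 50% chance of success calls for a bond of about 100,000, and a 90% chance calls for roughly 900,000. Because encrypted transactions are indistinguishable, every sender would have to post that amount, and the required bond grows without limit as the success probability approaches 1. That is one way to read the motivation for making decryption independent of the sender's cooperation.
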
Guaranteed decryption of this kind can be achieved in several ways:

**Trusted Execution Environments (TEEs)**: Users can encrypt transactions to a key held inside the secure enclave of a trusted execution environment (TEE). In basic versions, the TEE is used only to decrypt transactions after a specific point in time (which requires the TEE to have some awareness of time). More sophisticated designs make the TEE responsible for both decrypting transactions and building blocks, ordering transactions by criteria such as arrival time and fees. Compared with other encrypted mempool designs, the advantage of a TEE is that it can work directly on plaintext transactions, which reduces redundant on-chain data by filtering out transactions that would revert. The drawback is the reliance on trusted hardware.

**Secret sharing and threshold encryption**: In this approach, users encrypt transactions to a key that is jointly held by a committee (usually a subset of validators). Decryption requires a threshold to be met (for example, two-thirds of the committee members must agree).

With threshold decryption, trust shifts from hardware to a committee. Proponents argue that since most protocols already assume an honest majority of validators for consensus, we can make a similar assumption that a majority of validators will remain honest and will not decrypt transactions early.

However, there is a key distinction: these two trust assumptions are not the same. Consensus failures, such as a chain fork, are publicly visible (a "weak trust assumption"), whereas a malicious committee that decrypts transactions privately leaves no public evidence; such an attack can be neither detected nor punished (a "strong trust assumption").
Therefore, although the security assumptions of the consensus mechanism and the decryption committee look aligned at first glance, the assumption that "the committee will not collude" is far less credible in practice.

**Time-lock and delay encryption**: As an alternative to threshold encryption, delay encryption works as follows: users encrypt transactions to a public key whose corresponding private key is hidden inside a time-lock puzzle. A time-lock puzzle is a cryptographic puzzle that encapsulates a secret that can be revealed only after a preset amount of time has passed. More specifically, decryption requires repeatedly performing a sequence of non-parallelizable computations. Anyone can solve the puzzle to obtain the key and decrypt the transactions, but only after completing a computation that is deliberately slow (inherently serial) and long enough that the transactions cannot be decrypted before finalization. The strongest form of this primitive is delay encryption, which generates such puzzles publicly; the process can also be approximated by having a trusted committee use time-lock encryption, but at that point its advantage over threshold encryption is debatable.

Whether delay encryption is used or a trusted committee performs the computation, such schemes face a number of practical challenges. First, because the delay depends on the computation itself, it is hard to guarantee a precise decryption time. Second, these schemes rely on some entity running high-performance hardware to solve the puzzles efficiently; although anyone can take on this role, it remains unclear how to incentivize that entity to participate. Finally, in such designs all broadcast transactions will be decrypted, including those that are never finalized in a block.
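
The article does not say which puzzle construction is meant. As one classic illustration, the sketch below uses the repeated-squaring time-lock puzzle of Rivest, Shamir, and Wagner: the creator knows the factorization of the modulus and can take a shortcut, while everyone else must perform `t` squarings one after another. The function names, the dictionary layout, and the tiny parameters (two Mersenne primes that are trivially factorable) are all assumptions made for illustration; this is not a secure or production design.

```python
# Toy modulus built from two known Mersenne primes. A real puzzle would use
# a large RSA modulus whose factorization is kept secret or never known.
P, Q = 2**89 - 1, 2**107 - 1
N = P * Q
PHI = (P - 1) * (Q - 1)  # trapdoor: known only to the puzzle creator


def make_puzzle(secret_key: int, t: int, base: int = 2) -> dict:
    """Creator side: use phi(N) to skip the t sequential squarings."""
    if not 0 <= secret_key < N:
        raise ValueError("secret_key must be in [0, N)")
    e = pow(2, t, PHI)   # 2^t mod phi(N)
    b = pow(base, e, N)  # equals base^(2^t) mod N because gcd(base, N) == 1
    return {"n": N, "base": base, "t": t, "masked_key": (secret_key + b) % N}


def solve_puzzle(puzzle: dict) -> int:
    """Solver side: t modular squarings, each depending on the previous one."""
    x = puzzle["base"]
    for _ in range(puzzle["t"]):
        x = (x * x) % puzzle["n"]
    return (puzzle["masked_key"] - x) % puzzle["n"]
```

The delay is set by `t` and by how fast the solver's hardware can square, which is why the decryption time is hard to pin down. Anyone who runs `solve_puzzle` recovers the key, whether or not the corresponding transaction ever made it into a block.
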
In contrast, threshold-based (or witness-encryption-based) schemes can decrypt only those transactions that are successfully included.

**Witness encryption**: The most advanced cryptographic approach uses witness encryption. In theory, witness encryption works as follows: information is encrypted so that only someone who knows a witness for a specific NP relation can decrypt it. For example, information can be encrypted so that only someone who can solve a particular Sudoku puzzle, or who can provide the preimage of a particular hash value, can decrypt it. *(Note: an NP relation is a correspondence between a "problem" and an "answer that can be verified quickly.")*

For any NP relation, similar logic can be implemented through SNARKs. In effect, witness encryption encrypts data so that only an entity that can prove, via a SNARK, that specific conditions are satisfied can decrypt it. In the context of an encrypted mempool, a typical condition would be that transactions can be decrypted only after the block has been finalized.

This is a very promising theoretical primitive. In fact, it is a general scheme of which committee-based and delay-based methods are merely special cases. Unfortunately, we do not currently have any practical witness encryption scheme. Moreover, even if one existed, it is hard to say that it would have advantages over committee-based methods on proof-of-stake chains. Even if witness encryption is set up to "decrypt only once the transaction has been ordered in a finalized block," a malicious committee can still privately simulate the consensus protocol to forge the finalization of the transactions, and then use that private chain as the "witness" to decrypt them.
In that case, the same committee could achieve equivalent security with threshold decryption, which is also much simpler to operate.

Under proof-of-work consensus, however, the advantages of witness encryption are more pronounced, because even a completely malicious committee cannot privately mine multiple new blocks on top of the current chain head to forge finalization.

## Technical Challenges Facing Encrypted Mempools

Several practical challenges limit the ability of encrypted mempools to prevent MEV. In general, keeping information confidential is itself a challenge. Encryption is not widely used in Web3, but decades of experience deploying it on the web (for example, TLS/HTTPS) and in private communication (from PGP to modern encrypted messaging platforms such as Signal and WhatsApp) have thoroughly exposed the difficulties: encryption is a tool for protecting confidentiality, but it cannot provide absolute assurance.

First, certain parties may have direct access to the plaintext of a user's transaction. In typical scenarios, users do not encrypt transactions themselves but delegate the task to a wallet provider. The wallet provider can therefore see the plaintext and may even use or sell this information to extract MEV. Encryption is only as secure as the set of parties that can access the key; the scope of key control defines the security boundary.

Beyond that, the biggest problem is metadata, namely the unencrypted data surrounding the encrypted payload (the transaction). Searchers can use this metadata to infer a transaction's intent and then carry out speculative MEV. Note that searchers do not need to fully understand the contents of a transaction, nor do they need to guess correctly every time.
For example, it is enough for them to judge with reasonable probability that a particular transaction is a buy order on a specific decentralized exchange (DEX).

Metadata falls into several categories: some are classic problems inherent to encryption, while others are unique to encrypted mempools.

* **Transaction size**: Encryption by itself cannot hide the size of the plaintext (notably, the formal definition of semantic security explicitly excludes hiding plaintext size). This is a common attack vector in encrypted communications; a typical example is that, even with encryption, an eavesdropper can determine in real time what is being watched on Netflix from the size of each packet in the video stream. In an encrypted mempool, certain types of transactions may have a distinctive size and thereby leak information.
* **Broadcast time**: Encryption cannot hide timing information either (another classic attack vector). In Web3, certain senders (for example, those executing structured sell-offs) may submit transactions at fixed intervals. Transaction timing may also correlate with other information, such as activity on external exchanges or news events. A subtler way to exploit timing is arbitrage between centralized exchanges (CEXs) and decentralized exchanges (DEXs): a sequencer can insert a transaction created as late as possible to take advantage of the latest CEX price information, while excluding all other transactions (even encrypted ones) broadcast after a certain point in time, so that its own transaction alone benefits from the latest price.
* **Source IP address**: Searchers can infer the identity of a transaction's sender by monitoring the peer-to-peer network and tracing the source IP address. This problem was identified early in Bitcoin's history (more than a decade ago).
If a given sender has a consistent behavioral pattern, this is highly valuable to searchers. For example, knowing the sender's identity lets them link an encrypted transaction to that sender's already-decrypted historical transactions.
* **Transaction sender and fee/gas information**: Transaction fees are a type of metadata unique to encrypted mempools. In Ethereum, a legacy transaction includes an on-chain sender address (used to pay fees), a maximum gas budget, and the per-unit gas price the sender is willing to pay. Like the source network address, the sender address can be used to link multiple transactions to each other and to real-world entities; the gas budget can hint at the transaction's intent. For example, interacting with a specific DEX may require a recognizable, fixed amount of gas.

Sophisticated searchers may combine several of these types of metadata to predict a transaction's contents.

In theory, all of this information can be hidden, but at a cost in performance and complexity. For example, padding transactions to a standard length hides their size but wastes bandwidth and on-chain space; adding a delay before sending hides timing but increases latency; submitting transactions through an anonymity network such as Tor hides the IP address but brings challenges of its own.

The hardest metadata to hide is transaction fee information. Encrypting fee data creates a series of problems for block builders. The first is spam: if fee data is encrypted, anyone can broadcast malformed encrypted transactions. These transactions will be ordered but cannot pay fees, and once decrypted they cannot be executed, with nobody who can be held accountable. This may be solvable with SNARKs that prove the transaction is well-formed and sufficiently funded, but that would significantly increase overhead.

The second problem is the efficiency of block building and fee auctions.
Builders rely on fee information to construct profit-maximizing blocks and to determine the current market price of on-chain resources. Encrypting fee data disrupts this process. One solution is to set a fixed fee for each block, but this is economically inefficient and may give rise to a secondary market for transaction inclusion, which defeats the purpose of an encrypted mempool. Another is to run the fee auction inside secure multi-party computation or trusted hardware, but both approaches are extremely costly.

Finally, a secure encrypted mempool increases system overhead in several ways: encryption adds latency, computation, and bandwidth consumption to the chain; it is currently unclear how it would integrate with important future goals such as sharding or parallel execution; it may introduce new points of failure for liveness (such as the decryption committee in threshold schemes or the solvers of delay functions); and the complexity of design and implementation rises significantly.

Many of the problems of encrypted mempools resemble the challenges faced by blockchains that aim to provide transaction privacy, such as Zcash and Monero. If there is a silver lining, it is that solving the challenges of using encryption for MEV mitigation would also help clear the way for transaction privacy.

## Economic Challenges Facing Encrypted Mempools

Finally, encrypted mempools also face economic challenges. Unlike the technical challenges, which can be gradually mitigated with enough engineering effort, these economic challenges are fundamental limitations that are extremely hard to resolve.

The core problem of MEV stems from the information asymmetry between those who create transactions (users) and those who extract MEV opportunities (searchers and block builders).
Users often do not know how much extractable value their transactions contain, which means that even with a perfect encrypted mempool they may be induced to give up their decryption keys in exchange for a reward worth less than the actual MEV. This phenomenon can be called "incentivized decryption."

This scenario is not hard to imagine, because similar mechanisms, such as MEV-Share, already exist. MEV-Share is an order-flow auction mechanism that lets users selectively submit information about their transactions to a pool, where searchers compete for the right to exploit the MEV opportunity in each transaction. After extracting the MEV, the winning bidder returns part of the profit (that is, the bid amount or a percentage of it) to the user.

This model maps directly onto encrypted mempools: users must disclose their decryption key (or partial information) to participate. Most users, however, do not recognize the opportunity cost of taking part; they see only the immediate payout and are happy to disclose the information. Traditional finance offers similar cases: the zero-commission trading platform Robinhood, for example, makes money by selling its users' order flow to third parties through "payment for order flow."

Another possible scenario is that large builders force users to disclose transaction contents (or related information) on the grounds of compliance. Censorship resistance is an important and contentious topic in Web3, but if large validators or builders are legally required to enforce a sanctions list (for example, under the rules of the U.S. Office of Foreign Assets Control, OFAC), they may refuse to process any encrypted transaction. Technically, users may be able to prove with zero-knowledge proofs that their encrypted transactions comply with the sanctions requirements, but this adds cost and complexity.
Even if the blockchain is strongly censorship-resistant (guaranteeing that encrypted transactions are eventually included), builders may still place known plaintext transactions at the front of the block and encrypted transactions at the end. Transactions that need priority execution may therefore be forced to disclose their contents to builders anyway.

## Other Efficiency Challenges

Encrypted mempools increase system overhead in several obvious ways. Users must encrypt transactions and the system must somehow decrypt them, which adds computational cost and may also increase transaction size. As noted earlier, handling metadata adds further overhead. But there are also efficiency costs that are less obvious. In finance, a market is considered efficient if prices reflect all available information, whereas delays and information asymmetry make markets inefficient. That is an inevitable consequence of encrypted mempools.

One direct consequence of this inefficiency is greater price uncertainty, a direct product of the additional delay that an encrypted mempool introduces. As a result, more transactions may fail because they exceed their slippage tolerance, wasting on-chain space.

This price uncertainty may likewise give rise to speculative MEV transactions that try to profit from on-chain arbitrage. Notably, encrypted mempools may make such opportunities more common: because execution is delayed, the current state of decentralized exchanges (DEXs) becomes less certain, which is likely to reduce market efficiency and create price discrepancies across trading venues. Such speculative MEV transactions also waste block space, because they often abort once they discover that no arbitrage opportunity exists.

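
The slippage effect can be made concrete with a small sketch. It assumes a fee-free constant-product pool (`x * y = k`), which the article does not specify, and the function names and pool sizes are hypothetical. A user quotes a swap at submission time and sets a minimum acceptable output; if the pool moves while the transaction waits to be decrypted, the swap falls below that minimum and reverts.

```python
from typing import Optional


def swap_output(reserve_in: float, reserve_out: float, amount_in: float) -> float:
    """Output of a constant-product (x * y = k) swap, ignoring fees."""
    return reserve_out * amount_in / (reserve_in + amount_in)


def execute_with_slippage(
    reserve_in: float, reserve_out: float, amount_in: float, min_out: float
) -> Optional[float]:
    """Return the amount received, or None if the swap would revert."""
    out = swap_output(reserve_in, reserve_out, amount_in)
    if out < min_out:
        return None  # reverted, but the transaction still used block space
    return out


# At submission: pool holds 1,000 ETH and 2,000,000 DAI; the user sells 10 ETH
# and tolerates 0.5% slippage relative to the quote.
quoted = swap_output(1000.0, 2_000_000.0, 10.0)
min_out = quoted * (1 - 0.005)

# During the decryption delay another trader sells 20 ETH into the same pool.
earlier = swap_output(1000.0, 2_000_000.0, 20.0)
result = execute_with_slippage(1020.0, 2_000_000.0 - earlier, 10.0, min_out)
```

With these numbers the quote is roughly 19,802 DAI, while execution after the intervening trade would yield only about 19,037 DAI, so `result` is `None`: the transaction reverts and the space it occupied is wasted.
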
## Summary

This article set out to describe the challenges facing encrypted mempools so that people can redirect their efforts toward other solutions. Even so, encrypted mempools may still become part of the approach to MEV mitigation.

One feasible approach is a hybrid design: some transactions are ordered "blindly" through an encrypted mempool, while others use a different ordering scheme. For certain types of transactions (such as buy and sell orders from large market participants, who are able to carefully encrypt or pad their transactions and are willing to pay a higher cost to avoid MEV), a hybrid design may be a good fit. It also makes practical sense for highly sensitive transactions (such as fixes for vulnerable smart contracts).

However, because of technical limitations, high engineering complexity, and performance overhead, encrypted mempools are unlikely to become the "universal solution to MEV" that people hope for. The community needs to develop other solutions, including MEV auctions, application-layer defenses, and shorter finality times. MEV will remain a challenge for some time, and in-depth research is needed to find the right balance of solutions to address its negative effects.