New Paradigm of Blockchain Data Access: The Rise of Indexers and Comparison of Mainstream Projects

2025-07-27 06:03:59

The Development of Blockchain Data Access: Introduction to Indexers and Related Projects

Data is the core of blockchain technology and the foundation for developing decentralized applications (dApps). Current discussion focuses mainly on data availability (DA), which ensures that every network participant can access the latest transaction data for verification, but there is another equally important and often overlooked aspect: data accessibility.

In the era of modular blockchains, DA solutions have become indispensable. They ensure that all participants can access transaction data, enabling real-time verification and maintaining network integrity. However, a DA layer works more like a bulletin board than a database: data is not stored indefinitely but is deleted over time, just as posters on a bulletin board are eventually replaced by new ones.

In contrast, data accessibility concerns the ability to retrieve historical data, which is crucial for developing dApps and conducting blockchain analysis. It matters most for tasks that need past data in order to represent state and execute correctly. Although data accessibility is discussed less often, it is as important as data availability. The two play different but complementary roles in the blockchain ecosystem, and a comprehensive approach to data management must address both to support robust and efficient blockchain applications.

Since its inception, blockchain has fundamentally changed infrastructure, driving the creation of decentralized applications in areas such as gaming, finance, and social networking. However, building these dApps requires access to vast amounts of blockchain data, which is both difficult and costly.

For dApp developers, one option is to host and run their own archival RPC nodes. These nodes store all historical blockchain data from the beginning, allowing complete access to the data. However, maintaining archival nodes is expensive, and their query capabilities are limited, making it difficult to retrieve data in the format developers need. While running cheaper nodes is an option, these nodes have limited data retrieval capabilities, which may affect the operation of the dApp.

Another approach is to use commercial RPC node providers. These providers are responsible for the costs and management of the nodes and provide data through RPC endpoints. Public RPC endpoints are free but have rate limits that may negatively impact the user experience of dApps. Private RPC endpoints offer better performance by reducing congestion, but even simple data retrieval requires a significant amount of back-and-forth communication. This makes them request-heavy and inefficient for complex data queries. Additionally, private RPC endpoints are often difficult to scale and lack compatibility across different networks.
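To make the "request-heavy" point concrete, the following is a minimal sketch rather than a real client. `MockRpc` is a hypothetical in-memory stand-in for an RPC endpoint that simply counts calls, and the block contents are made up. It illustrates that finding all transfers to one address by scanning blocks costs one round trip per block, however few matches exist.

```python
from dataclasses import dataclass


@dataclass
class MockRpc:
    """Hypothetical stand-in for an RPC endpoint; counts round trips."""
    blocks: dict  # block number -> list of (tx_hash, recipient)
    calls: int = 0

    def get_block(self, number):
        # One round trip per block, as with a per-block JSON-RPC request.
        self.calls += 1
        return self.blocks.get(number, [])


def transfers_to(rpc, address, first_block, last_block):
    """Scan every block in the range and keep transactions sent to `address`."""
    hits = []
    for number in range(first_block, last_block + 1):
        for tx_hash, recipient in rpc.get_block(number):
            if recipient == address:
                hits.append(tx_hash)
    return hits


rpc = MockRpc(blocks={1: [("0xa1", "alice")], 2: [("0xb2", "bob")], 3: [("0xc3", "alice")]})
found = transfers_to(rpc, "alice", 1, 1000)
print(found, rpc.calls)  # two matches, but one request for each of the 1000 blocks
```

An index keyed by recipient would answer the same question with a single lookup, which is the gap indexers are meant to close.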

Blockchain indexers organize on-chain data and load it into databases for easier querying, which is why they are often called "the search engines of the blockchain." They index blockchain data and expose it through a structured query interface, typically a GraphQL API that plays a role similar to SQL. By providing a unified interface, indexers let developers retrieve the information they need quickly and accurately with a standardized query language, which greatly simplifies the process.
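The core idea of indexing can be sketched in a few lines. The example below is an illustration under assumptions, not how any particular protocol is implemented: the decoded events are hypothetical, and an in-memory SQLite table stands in for the indexer's database. Events are written once, indexed by contract and event name, and then retrieved with one query.

```python
import sqlite3

# Hypothetical decoded events, as an indexer might extract them from blocks.
EVENTS = [
    {"block": 100, "contract": "0xpool", "name": "Swap", "sender": "alice", "amount": 50},
    {"block": 101, "contract": "0xpool", "name": "Swap", "sender": "bob", "amount": 20},
    {"block": 102, "contract": "0xnft", "name": "Transfer", "sender": "alice", "amount": 1},
]


def build_index(events):
    """Load events into a table and index the columns used for filtering."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE events (block INTEGER, contract TEXT, name TEXT, sender TEXT, amount INTEGER)"
    )
    db.execute("CREATE INDEX idx_contract_name ON events (contract, name)")
    db.executemany(
        "INSERT INTO events VALUES (:block, :contract, :name, :sender, :amount)", events
    )
    return db


db = build_index(EVENTS)
rows = db.execute(
    "SELECT sender, amount FROM events WHERE contract = ? AND name = ? ORDER BY block",
    ("0xpool", "Swap"),
).fetchall()
print(rows)
```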

Different types of indexers optimize data retrieval in various ways:

Full-node indexers: These indexers run full blockchain nodes and extract data directly from them, ensuring data completeness and accuracy, but they require significant storage and processing capacity.
Lightweight indexers: These indexers rely on full nodes to retrieve specific data as needed, which reduces storage requirements but may increase query time.
Specialized indexers: These indexers are designed for certain types of data or specific blockchains, optimizing retrieval for specific use cases such as NFT data or DeFi transactions.
Aggregated indexers: These indexers extract data from multiple blockchains and sources, including off-chain information, and provide a unified query interface, which is particularly useful for multi-chain dApps.

Ethereum alone requires about 3 TB of storage, and as the blockchain continues to grow, the storage needs of archive nodes will grow with it. An indexer protocol deploys multiple indexers that can index large amounts of data efficiently and query it quickly, something plain RPC access cannot offer.

Indexers also support complex queries, easy filtering of data by different criteria, and extraction for subsequent analysis. Some indexers aggregate data from multiple sources, which avoids the need to integrate multiple APIs in multi-chain dApps. Because it is distributed across multiple nodes, an indexer network offers better security and performance, whereas RPC providers may suffer interruptions and downtime because of their centralized nature.

Overall, compared with RPC node providers, indexers improve the efficiency and reliability of data retrieval while sparing developers the cost of running their own node. This makes blockchain indexer protocols the preferred choice for dApp developers.

Building a dApp requires retrieving and reading blockchain data to operate its services. This applies to every type of dApp, including DeFi, NFT platforms, games, and even social networks, because these platforms must read data before they can execute other transactions.

DeFi protocols need various pieces of information to quote users specific prices, rates, fees, and so on. An automated market maker (AMM) needs price and liquidity information about its liquidity pools to calculate swap rates, while lending protocols need utilization rates to set lending rates and debt ratios to trigger liquidations. This information must be fed into the dApp before it can calculate the rates at which users' transactions execute.
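As a worked illustration of why these inputs matter, here is a minimal sketch. The article does not name a pricing formula, so the constant-product rule (x * y = k, as used by Uniswap v2-style pools) and the 0.3% fee are assumptions, and the function names are hypothetical. Both calculations depend entirely on up-to-date reserve and borrow figures, which is the data an indexer would supply.

```python
def swap_output(reserve_in, reserve_out, amount_in, fee=0.003):
    """Tokens received from a constant-product pool (assumed formula, assumed fee)."""
    amount_after_fee = amount_in * (1 - fee)
    return reserve_out * amount_after_fee / (reserve_in + amount_after_fee)


def utilization(total_borrowed, total_supplied):
    """Share of supplied funds currently lent out; 0.0 for an empty market."""
    return total_borrowed / total_supplied if total_supplied else 0.0


# Hypothetical pool holding 1,000 ETH and 2,000,000 USDC; a user sells 10 ETH.
print(swap_output(1_000, 2_000_000, 10))
# Hypothetical lending market with 80 of 100 supplied units borrowed.
print(utilization(80, 100))
```

If the reserves read by the dApp are stale, the quoted output differs from what the pool actually pays, which is why fast and accurate reads matter here.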

GameFi requires rapid indexing and data access so that users can play smoothly. Only with very fast data retrieval and execution can Web3 games match the performance of Web2 games and attract more users. These games need data such as land ownership, in-game token balances, and in-game actions. Using indexers helps them maintain a steady data flow and stable uptime, which supports a smooth gaming experience.

NFT marketplaces and lending platforms need indexed data to access information such as NFT metadata, ownership and transfer records, and royalty details. Indexing this data removes the need to look up each NFT individually to find its owner or attributes.
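A simple way to picture an ownership index is to replay transfer records in order. The sketch below assumes a hypothetical list of (token_id, sender, recipient) tuples already extracted from the chain. It derives the current owner of every token and then a reverse index from owner to tokens, so neither question requires visiting each NFT one by one.

```python
def current_owners(transfers):
    """Replay transfers in order; the last recipient of each token is its owner."""
    owners = {}
    for token_id, _sender, recipient in transfers:
        owners[token_id] = recipient
    return owners


def tokens_by_owner(owners):
    """Reverse index: owner -> list of token ids they hold."""
    index = {}
    for token_id, owner in owners.items():
        index.setdefault(owner, []).append(token_id)
    return index


# Hypothetical transfer history, oldest first ("0x0" marks a mint).
TRANSFERS = [
    (1, "0x0", "alice"),
    (2, "0x0", "bob"),
    (1, "alice", "carol"),
    (3, "0x0", "carol"),
]

owners = current_owners(TRANSFERS)
holdings = tokens_by_owner(owners)
print(owners)
print(holdings)
```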

Whether it is the DeFi automated market maker (AMM) that needs price and liquidity information, or the SocialFi application that requires updates on new user posts, the ability to quickly retrieve data is crucial for the normal operation of dApps. With the help of indexers, they can efficiently and accurately retrieve data, thereby providing a smooth user experience.

Indexers provide a way to extract specific data from raw blockchain data, including the smart contract events in each block. This enables more targeted data analysis and, in turn, more comprehensive insights.

For example, perpetual trading protocols can identify which tokens have high trading volumes and which tokens generate fees, thereby deciding whether to list these tokens as perpetual contracts on their platform. DEX developers can create dashboards for their products to gain insights into which liquidity pools have the highest returns or the strongest liquidity. They can also create public dashboards that allow developers to freely and flexibly query any type of data to be displayed on the charts.
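The kind of analysis described here reduces to aggregating indexed records. The following sketch uses made-up trade records and hypothetical field choices. It sums volume and fees per token and ranks tokens by volume, which is roughly the table a listing decision or dashboard would start from.

```python
from collections import defaultdict

# Hypothetical indexed trade records: (token, volume in USD, fee in USD).
TRADES = [
    ("ETH", 1_000, 3),
    ("ARB", 200, 1),
    ("ETH", 500, 2),
    ("SOL", 800, 2),
]


def rank_tokens(trades):
    """Sum volume and fees per token, then sort by volume, highest first."""
    totals = defaultdict(lambda: [0, 0])
    for token, volume, fee in trades:
        totals[token][0] += volume
        totals[token][1] += fee
    rows = [(token, volume, fee) for token, (volume, fee) in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranking = rank_tokens(TRADES)
for token, volume, fee in ranking:
    print(token, volume, fee)
```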

Because several blockchain indexers are available, developers need to understand the differences between indexing protocols in order to choose the one that best suits their needs.

The Graph was the first indexing protocol launched on Ethereum, and it made transaction data that was previously hard to access easy to query. It uses subgraph definitions and filters to collect subsets of data from the blockchain, such as all transactions related to a particular DEX's USDC/ETH pool.
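A subgraph is queried with GraphQL. The sketch below only builds a request body and sends nothing. The entity `swaps` and its fields `pool`, `amountUSD`, and `timestamp` are hypothetical, since every subgraph defines its own schema, and the pool address is a placeholder. The `first`, `where`, `orderBy`, and `orderDirection` arguments follow the filtering and sorting conventions of The Graph's query API.

```python
import json

# Hypothetical query against a DEX subgraph; entity and field names are assumed.
QUERY = """
query PoolSwaps($pool: String!, $first: Int!) {
  swaps(first: $first, where: { pool: $pool }, orderBy: timestamp, orderDirection: desc) {
    id
    amountUSD
    timestamp
  }
}
"""


def build_request(pool_id, first=10):
    """Return the JSON body a GraphQL endpoint would expect for this query."""
    return json.dumps({"query": QUERY, "variables": {"pool": pool_id, "first": first}})


body = build_request("0xpool-placeholder", first=5)
print(body)
```

In practice the body would be sent as an HTTP POST to the subgraph's query endpoint, which returns the matching entities in one response instead of many per-block RPC calls.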

Using proof of indexing, indexers stake the native token GRT to provide indexing and query services, and delegators can delegate their tokens to indexers. Curators signal on high-quality subgraphs, helping indexers decide which subgraphs to index in order to earn the best query fees. As it moves toward greater decentralization, The Graph will ultimately shut down its hosted service and require subgraphs to upgrade to its network, while providing upgrade indexers.

Its infrastructure brings the average cost to $40 per million queries, which is much lower than the cost of self-hosted nodes. Using file data sources, it also supports parallel indexing of on-chain and off-chain data for efficient data retrieval.
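For a rough sense of scale, the quoted average can be turned into a monthly estimate. This is back-of-the-envelope arithmetic only: the $40 figure is the average cited above, the traffic number is hypothetical, and actual fees depend on the indexers and queries involved.

```python
PRICE_PER_MILLION_USD = 40  # average figure quoted in the article


def monthly_query_cost(queries_per_day, days=30):
    """Estimated cost in USD for a month of queries at the quoted average price."""
    return queries_per_day * days * PRICE_PER_MILLION_USD / 1_000_000


# Hypothetical dApp issuing 100,000 queries per day.
estimate = monthly_query_cost(100_000)
print(estimate)
```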

The Graph's indexer rewards have risen steadily over the past few quarters. This is partly due to growth in query volume, and partly to a higher token price following its plans to integrate AI-assisted queries.

Subsquid is a peer-to-peer, horizontally scalable decentralized data lake that efficiently aggregates large amounts of on-chain and off-chain data and is secured by zero-knowledge proofs. It operates as a decentralized network of workers in which each node stores data for a specific subset of blocks, so a query can be served quickly by identifying the nodes that hold the required data.
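The routing idea can be sketched as a range-overlap lookup. This is an illustration under assumptions, not Subsquid's actual scheduling logic: the worker names and the contiguous block ranges assigned to them are hypothetical.

```python
# Hypothetical assignment of inclusive block ranges to workers.
ASSIGNMENTS = [
    ("worker-a", 0, 999),
    ("worker-b", 1000, 1999),
    ("worker-c", 2000, 2999),
]


def workers_for(first_block, last_block, assignments=ASSIGNMENTS):
    """Return the workers whose stored range overlaps the requested range."""
    return [
        name
        for name, low, high in assignments
        if low <= last_block and first_block <= high
    ]


# A query over blocks 900..1100 only needs the two workers holding those blocks.
needed = workers_for(900, 1100)
print(needed)
```

Because only the overlapping workers are contacted, adding workers spreads both storage and query load, which is what horizontal scaling means in this context.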

Subsquid also supports real-time indexing, so data can be indexed before a block is finalized. It lets developers store data in the format of their choice, such as Parquet or CSV files or a BigQuery dataset, which makes analysis easier. Additionally, subgraphs can be deployed on the Subsquid network without migrating to the Squid SDK, enabling no-code deployment.

During its testnet phase, Subsquid posted impressive statistics: over 80,000 testnet users, more than 60,000 deployed Squid indexers, and more than 20,000 verified developers on the network. More recently, on June 3rd, Subsquid launched the mainnet of its data lake.

In addition to indexing, the Subsquid Network data lake can also replace RPC in use cases such as analytics, ZK/TEE co-processors, AI agents, and Oracles.

SubQuery is a decentralized middleware infrastructure network that provides RPC and indexed data services. It initially supported the Polkadot and Substrate networks and has since expanded to more than 200 chains. Its mechanism is similar to that of The Graph, which uses proof of indexing: indexers index data and serve query requests, while delegators stake tokens with indexers. However, instead of curators, it introduces consumers who submit purchase orders, which guarantees the indexers' income.

SubQuery plans to introduce data nodes that support sharding, so that nodes no longer need to synchronize new data with one another constantly. This should improve query efficiency while moving the network toward greater decentralization. Users can pay approximately 1 SQT token in computing fees for every 1,000 requests, or set custom fees for indexers through the protocol.

Although SubQuery launched its token only earlier this year, the issuance rewards for nodes and delegators have increased in USD value, reflecting steady growth in the number of queries served on its platform. Since the TGE, the total amount of staked SQT has increased from 6 million to 125 million, highlighting the growth in network participation.

Covalent is a decentralized indexing network in which Block Specimen Producer (BSP) nodes create copies of blockchain data through batch exports and publish proofs on the Covalent L1 blockchain. Block Results Producer (BRP) nodes then refine this data according to established rules, filtering out the data that meets the requirements.
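The two-stage flow can be sketched as export, prove, then refine. This is a simplified illustration and not Covalent's actual data format or proof scheme: a SHA-256 digest stands in for the published proof, the block records are made up, and the function names are hypothetical.

```python
import hashlib
import json


def export_batch(blocks):
    """Producer stage: serialize a batch of blocks and derive a digest as its proof."""
    payload = json.dumps(blocks, sort_keys=True).encode()
    return payload, hashlib.sha256(payload).hexdigest()


def refine(payload, proof, rule):
    """Refiner stage: check the batch against the proof, then keep matching blocks."""
    if hashlib.sha256(payload).hexdigest() != proof:
        raise ValueError("batch does not match the published proof")
    return [block for block in json.loads(payload) if rule(block)]


# Hypothetical batch of three blocks with their transaction counts.
BLOCKS = [
    {"number": 1, "tx_count": 0},
    {"number": 2, "tx_count": 12},
    {"number": 3, "tx_count": 7},
]

payload, proof = export_batch(BLOCKS)
non_empty = refine(payload, proof, lambda block: block["tx_count"] > 0)
print(non_empty)
```

Verifying the digest before refining is what lets the second stage trust data it did not read from the chain itself.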

Through a unified API, developers can extract relevant blockchain data in a consistent request and response format without writing complex custom queries. The CQT token, settled on Moonbeam, can be used to pay network operators for these pre-configured datasets.

Covalent's rewards appear to show a general upward trend from the first quarter of 2023 to the first quarter of 2024, partly due to the increase in the price of its token, CQT.

Some indexers (such as Covalent) are general-purpose indexers that provide standard pre-configured datasets via API. While they may be fast, they do not offer flexibility for developers who need custom datasets. By using the indexing framework, it allows for progress
