The Blockchain Storage Problem: Is It Too Late?

By Max

Have you ever wondered where all blockchain data is actually stored? From the first pizzas bought with Bitcoin to the latest NFT scam, someone’s carrying all that data to this day. The more blocks, the more secure. But is there a limit?

We’re already seeing the effects. Running a full node takes more and more hardware, so fewer people can afford to run one. That does nothing for the balance between supply and demand on the network, and transactions get costlier or slower.

Is deleting blockchain data to free up space the answer, or is there a better solution to storage?

Quick Takes

  • The blockchain storage problem is a conflict between preserving data, making nodes available to everyone, and scaling.
  • Because blockchain applications aren’t entirely on-chain, Web 3.0 relies on blockchain oracles and centralized servers. Digital assets don’t last forever (yet). But changing that will worsen the storage problem unless it is solved first.
  • There are specialized blockchains and storage coins to make data management more efficient. The underlying blockchain will require its own solutions, such as sharding or pruning.

Why Is There A Blockchain Storage Problem?

The largest public blockchains have become something of a mixed bag of junk data. There are thousands of transaction blocks with little to no purpose other than allowing the secure validation of new ones. And it’s going to get worse once true mass adoption occurs.

One reason we can’t just delete the data is that transparency and preservation are important values for the crypto community. Even if you only keep the “important” blocks, it’s only a matter of time before you run into storage problems again.

This also undermines the idea of ownership behind non-fungible tokens (NFTs). We assume these collections will exist forever. But if someone deletes the block, or the blockchain halts (as BNB Chain did), they are no longer there.

Not to mention that the front end of DeFi platforms is hosted on centralized servers. At any time, a provider could take down websites or manipulate or delete NFTs (e.g., the NFT you own redirects to a 404 not-found page). Digital assets shouldn’t be this fragile.

One reason blockchains don’t decentralize the entire Internet stack is storage. Running a full node on Ethereum takes 4-8 GB of RAM and a 2 TB SSD, and those requirements will grow along with the data. Whether it’s because of cost or people being too lazy to buy equipment, Ethereum has only about 10,000 nodes, which isn’t enough for the volume it receives.

This tendency is called centralization, the opposite of what blockchain was invented for.

How Blockchain Storage Works

There are two parts of the blockchain storage problem:

  • There’s more data all the time and fewer nodes with enough hardware capacity.
  • Blockchain can’t preserve its decentralization because the infrastructure layer (AKA layer-0) isn’t always decentralized.

To understand why there’s no obvious solution, here’s how storage works.

Blockchain is essentially a shared database that records transaction information. When you store a copy of all blockchain data, you become a full node. Because there are many nodes with the same copy, no one can simply manipulate or delete this data.
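
To make the "no one can simply manipulate this data" part concrete, here is a minimal Python sketch of a hash-linked chain. It is a toy model, not any real blockchain's block format: the field names, the JSON encoding, and the helper functions (block_hash, build_chain, is_valid) are assumptions made up for illustration.

```python
import copy
import hashlib
import json

GENESIS_PARENT = "0" * 64  # placeholder parent hash for the first block (assumption)


def block_hash(index, prev_hash, transactions):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(batches):
    """Build a list of blocks, each one linked to the block before it."""
    chain = []
    prev_hash = GENESIS_PARENT
    for index, transactions in enumerate(batches):
        current = block_hash(index, prev_hash, transactions)
        chain.append({
            "index": index,
            "prev_hash": prev_hash,
            "transactions": transactions,
            "hash": current,
        })
        prev_hash = current
    return chain


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev_hash = GENESIS_PARENT
    for block in chain:
        expected = block_hash(block["index"], prev_hash, block["transactions"])
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


chain = build_chain([["alice->bob:1"], ["bob->carol:2"], ["carol->dave:3"]])

# A node that rewrites an old transaction ends up with a copy that fails the check.
tampered = copy.deepcopy(chain)
tampered[1]["transactions"][0] = "bob->mallory:2"

print(is_valid(chain))     # True
print(is_valid(tampered))  # False
```

Because every block's hash depends on the one before it, changing an old block invalidates everything after it, and any node holding an honest copy can spot the mismatch.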

Full nodes update the database following a consensus mechanism. Every new block adds to the storage needed, since the record goes all the way back to the first-ever transaction. One solution is to partition the chain (AKA sharding), but as of 2023 it hasn’t been fully deployed on the largest public blockchains.

Simply put, more storage means fewer full nodes, more centralization, and lower security.

The second part is that we need web infrastructure to access blockchain. Both blockchain and the Internet are decentralized, but the way the latter is managed isn’t. Entire crypto exchanges are hosted on servers, as are NFTs and custodial wallets.

The giant tech companies could disrupt the off-chain side of crypto platforms. DeFi dApps would still function but with lower volume and more token volatility. Internet infrastructure is a bottleneck for crypto adoption, although it too can be decentralized.

Examples Of Blockchain Storage Projects

Thankfully, the blockchain storage problem is nothing new for developers. That’s why many have already spent years building systems to distribute storage. Not only are they available, but many have ranked within the top 100 cryptocurrencies for years.

InterPlanetary File System (IPFS)

IPFS is a peer-to-peer storage system with broader applications than the typical public blockchain. Its vision is to “preserve and grow humanity’s knowledge” with a resilient, open web. It’s strictly a file network, although projects like Filecoin use this infrastructure to monetize storage.

Any peer can upload information (e.g., websites), and whenever someone wants to access it, IPFS serves that file from the nearest available peer. The more users access it, the faster and more reliably it is distributed. Files remain available as long as at least one node holds a copy and doesn’t delete it (removing unused data is known as garbage collection).
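
The idea behind this is content addressing: a file is looked up by a fingerprint of its bytes rather than by where it lives. The Python sketch below is a deliberately simplified model of that idea, not IPFS’s actual implementation. Real IPFS uses content identifiers (CIDs) and splits files into chunks, while this toy uses a plain SHA-256 hex digest as the address. The ContentStore class and its method names are hypothetical.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the address is derived from the bytes themselves."""

    def __init__(self):
        self._blobs = {}      # address -> data
        self._pinned = set()  # addresses this node has promised to keep

    def add(self, data, pin=False):
        """Store data and return its address (a SHA-256 hex digest in this sketch)."""
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        if pin:
            self._pinned.add(address)
        return address

    def get(self, address):
        """Fetch data and verify that it still matches its address."""
        data = self._blobs[address]  # raises KeyError if this node no longer has it
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data

    def collect_garbage(self):
        """Delete everything that is not pinned; return how many items were removed."""
        unpinned = [addr for addr in self._blobs if addr not in self._pinned]
        for addr in unpinned:
            del self._blobs[addr]
        return len(unpinned)


node = ContentStore()
kept = node.add(b"<html>my site</html>", pin=True)
cached = node.add(b"a file someone else asked for")

node.collect_garbage()
print(node.get(kept) == b"<html>my site</html>")  # True
```

Since the address is a hash of the content, any peer can serve the file and the receiver can check it wasn’t altered. The flip side is visible too: once the last node holding an unpinned copy cleans up, the address points to nothing.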

For example, if someone takes down Wikipedia, an IPFS link to a copy of it will still work as long as a peer keeps that copy. This preservation property partially contributed to the rise of NFTs. You can’t keep ownership of anything stored on centralized servers.

Filecoin (FIL)

Filecoin is a blockchain with an incentive layer for decentralized storage. Users pay FIL tokens to upload, store, or retrieve information from various storage providers. Anyone with an Internet connection and free disk space can earn fees by selling their space to the open market.

To participate, providers need to stake FIL and continuously submit proof of the stored information to avoid their tokens being slashed. This prevents providers from deleting data after getting paid.
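
To see why a provider can’t just delete the data and keep collecting fees, here is a toy challenge-response audit in Python. It is only a sketch under strong assumptions: Filecoin’s real proofs (Proof-of-Replication and Proof-of-Spacetime) are far more involved, and the Provider class, make_challenges, audit, and the penalty amounts are all invented for illustration.

```python
import hashlib
import os


def make_challenges(data, count):
    """Before handing the data over, the client precomputes (nonce, expected answer) pairs."""
    challenges = []
    for _ in range(count):
        nonce = os.urandom(16)
        expected = hashlib.sha256(nonce + data).hexdigest()
        challenges.append((nonce, expected))
    return challenges


class Provider:
    """Hypothetical storage provider with tokens at stake."""

    def __init__(self, stake):
        self.stake = stake
        self.data = None

    def store(self, data):
        self.data = data

    def respond(self, nonce):
        """Answering correctly requires still having every byte of the data."""
        if self.data is None:
            return None
        return hashlib.sha256(nonce + self.data).hexdigest()


def audit(provider, challenge, penalty):
    """Check one challenge; slash part of the stake if the answer is wrong or missing."""
    nonce, expected = challenge
    if provider.respond(nonce) == expected:
        return True
    provider.stake = max(0, provider.stake - penalty)
    return False


file_bytes = b"important archive" * 100
challenges = make_challenges(file_bytes, count=2)

provider = Provider(stake=100)
provider.store(file_bytes)
print(audit(provider, challenges[0], penalty=30), provider.stake)  # True 100

provider.data = None  # the provider deletes the file after getting paid
print(audit(provider, challenges[1], penalty=30), provider.stake)  # False 70
```

The nonce is random and unknown in advance, so the provider can’t precompute answers and throw the file away. Each challenge is used only once in this sketch.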

Although they’re independent projects, both Filecoin and IPFS were created by Protocol Labs. FIL launched in October 2020 at ~$25 with 16M tokens.

Storj (STORJ)

Like Filecoin, Storj uses a blockchain-based token to power the nodes of its storage system. Users pay with this token to buy storage for cloud-native apps, backups, video, streaming, and other use cases.

The biggest (or only) difference is that the STORJ token is built on Ethereum, while Filecoin runs on its own blockchain.

The STORJ ERC-20 token launched in 2017 with ~50M tokens at ~$0.90 each.

The Graph (GRT) 

The Graph is a data-indexing platform that allows developers to fetch data from the blockchain in far less time. GRT first launched in 2020 at $0.03 and now has a daily trading volume of $10M to $45M.

Over 9B GRT exist as an incentive to keep data accessible, relevant, and up-to-date. The Graph achieves this with a complex structure of delegators, indexers, and curators.
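
Why does indexing save so much time? The Python sketch below compares scanning every block on each query with building a lookup table once and reading from it. This is only an illustration of the general idea. It does not use The Graph’s actual tooling (subgraphs queried over GraphQL), and the sample blocks and function names are made up.

```python
from collections import defaultdict

# Made-up sample data: each block holds (sender, receiver, amount) transfers.
BLOCKS = [
    {"number": 1, "transfers": [("alice", "bob", 5)]},
    {"number": 2, "transfers": [("bob", "carol", 2), ("alice", "carol", 1)]},
    {"number": 3, "transfers": [("carol", "alice", 4)]},
]


def scan(blocks, address):
    """Slow path: walk every transfer in every block on every query."""
    return [
        (block["number"], transfer)
        for block in blocks
        for transfer in block["transfers"]
        if address in transfer[:2]
    ]


def build_index(blocks):
    """One pass over the chain that records which transfers touch which address."""
    index = defaultdict(list)
    for block in blocks:
        for transfer in block["transfers"]:
            sender, receiver, _amount = transfer
            for address in {sender, receiver}:
                index[address].append((block["number"], transfer))
    return index


index = build_index(BLOCKS)

# Both paths give the same answer; the index just avoids rereading the chain.
print(index["alice"] == scan(BLOCKS, "alice"))  # True
```

In this toy the indexer is a single function. In the article’s terms, keeping that index complete and current is the work the GRT incentive pays for.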

Internet Computer (ICP) 

Internet Computer is a blockchain that deploys smart contracts and files directly on-chain. This may sound similar to the decentralized computing of Ethereum. The difference is, ICP decentralizes the entire Internet stack.

It doesn’t require traditional servers or web providers. It doesn’t rely as much on oracles as Ethereum dApps. ICP nodes can host everything from the website to the dApp and its contracts. Its storage cost is several times lower than transactions on the most scalable blockchains.

ICP launched in May 2021 at $551 with 124M tokens and now has a daily trading volume of $10M to $25M.

Possible Blockchain Storage Solutions

The previous projects bring new storage features that blockchain developers can hopefully use as a model. They don’t directly solve the storage problem of the underlying blockchain, because they work at the application layer.

Storj decentralizes storage for its users, but it doesn’t directly reduce the validator requirements of Ethereum. Filecoin creates an efficient storage marketplace, but only for the Filecoin blockchain.

Now, here are two possibilities that don’t centralize the blockchain:

  • Block Pruning. Like IPFS’s garbage collection, Ethereum and other blockchains could delete non-essential data from old blocks and transactions over time. This would slow the growth in size but also weaken the network if someone manages to manipulate those blocks, putting the countless subsequent blocks at risk.
  • Sharding. The blockchain reduces storage per node by dividing blocks into groups, similar to the way you can split a large zip file into parts to upload or download. Validators validate only their own partition, which acts as a mini blockchain connected to the others. In the zip file analogy, that’s like having each computer extract one part rather than all of them, which is faster.

(Ethereum’s roadmap includes a variant called Danksharding.)
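
Both ideas can be sketched in a few lines of Python. This is a rough illustration under simplifying assumptions, not how Ethereum or any other client actually does it. Here pruning keeps each block’s number and hash but drops old transaction bodies, and sharding assigns each transfer to a partition by hashing the sender. The block layout and the function names are hypothetical.

```python
import hashlib


def prune(blocks, keep_recent):
    """Drop transaction bodies from all but the newest `keep_recent` blocks.

    Each pruned block keeps its number and hash, so the hash chain can still
    be followed even though the old details are gone.
    """
    cutoff = len(blocks) - keep_recent
    pruned = []
    for position, block in enumerate(blocks):
        if position < cutoff:
            pruned.append({
                "number": block["number"],
                "hash": block["hash"],
                "transactions": None,
            })
        else:
            pruned.append(dict(block))
    return pruned


def shard_for(account, shard_count):
    """Assign an account to a shard by hashing its identifier."""
    digest = hashlib.sha256(account.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def split_by_shard(transfers, shard_count):
    """Group transfers by the sender's shard, so each validator stores one group."""
    shards = {number: [] for number in range(shard_count)}
    for sender, receiver, amount in transfers:
        shards[shard_for(sender, shard_count)].append((sender, receiver, amount))
    return shards


blocks = [
    {"number": n, "hash": "hash-%d" % n, "transactions": ["tx"] * 3}
    for n in range(5)
]
light = prune(blocks, keep_recent=2)
print([block["transactions"] is None for block in light])
# [True, True, True, False, False]

shards = split_by_shard(
    [("alice", "bob", 1), ("bob", "carol", 2), ("alice", "dave", 3)],
    shard_count=4,
)
print(sum(len(group) for group in shards.values()))  # 3
```

The sketch skips the hard parts, such as transfers whose sender and receiver land on different shards, or deciding which old data is safe to drop.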

The Ethereum co-founder Vitalik Buterin sees storage as the main limitation for scalability. Not only does running a node need to be light, it also has to be easy enough that users don’t get lazy and pay companies or pools to validate on their behalf. Only when the everyday user can run a full node without buying extra hardware might the storage problem be solved.

FAQ

  • How big is an Ethereum full node?

Storing everything from the genesis block to December 2022 takes ~2 TB. That much is only required for validators or consensus clients. For execution clients or full nodes running Geth, it’s about 650 GB.

  • How large is a Bitcoin full node?

A Bitcoin full node stores about 483 GB, covering the genesis block through May 2023. A pruned node, which discards old block data after validating it, can run with only about 7 GB of free disk space.

Ethereum needs more space than Bitcoin because of its complex smart contracts and faster block time.

  • Is blockchain cloud storage?

Blockchain nodes store copies locally and may use cloud storage services if the requirements are too high. Cloud storage isn’t recommended as it typically centralizes the network, although there are decentralized alternatives in development.

Disclaimer: Please note that nothing on this website constitutes financial advice. Whilst every effort has been made to ensure that the information provided on this website is accurate, individuals must not rely on this information to make a financial or investment decision. Before making any decision, we strongly recommend you consult a qualified professional who should take into account your specific investment objectives, financial situation and individual needs.

Max

Max is a European based crypto specialist, marketer, and all-around writer. He brings an original and practical approach for timeless blockchain knowledge such as: in-depth guides on crypto 101, blockchain analysis, dApp reviews, and DeFi risk management. Max also wrote for news outlets, saas entrepreneurs, crypto exchanges, fintech B2B agencies, Metaverse game studios, trading coaches, and Web3 leaders like Enjin.
