2023 marked significant shifts in technology and adoption for the Filecoin network. From the launch of the Filecoin Virtual Machine, to other developments across Retrievals and Compute, 2023 lay the foundation for Filecoin’s continued expansion. This blogpost will provide a summary of the notable milestones the Filecoin ecosystem reached in 2023, and in the later portion, growth drivers to watch for 2024.

TL;DR

2023 Retrospective:

Storage: Active deals reached 1,800 PiB, and storage utilization grew to 20%
FVM: FVM launch in March 2023 enabled FIL Lending (Staking) which supplied 11% of total collateral locked by Storage Providers; TVL broke USD 200M
Retrievals: Retrievability of Filecoin data improved, alongside notable releases from Saturn (3,000+ nodes, sub 60ms TTFB) and Station
Compute: Bacalhau amongst others, demonstrated real-world utility

Themes for 2024:

Compute, AI and DePIN networks: Synergistic growth of Filecoin together with physical resource & compute networks
Web2 Enterprise Data Storage: Led by strengthened offerings by teams such as Banyan, Seal Storage, and Steeldome
Continued DeFi growth: DEXes, Oracles, CDPs, spurred by service revenue coming on-chain

2023 Retrospective

To recap, Filecoin enables open services for data, built on top of IPFS. While Filecoin initially focused on storage, its vision includes the infrastructure to store, distribute, and transform data. The State & Direction of Filecoin, Summarized blog post shared an initial framework for Filecoin’s key components. This framework will serve as an anchor for discussing 2023’s traction.

1) Storage Markets: Active storage deals reached 1,800 PiB with storage utilization of 20%

In 2023, Filecoin’s stored data volume grew dramatically to 1,800 PiB, marking a 3.8x increase from the start of the year. Storage utilization grew to 20% from 3%. Currently, Filecoin represents 99% market share of total data stored across decentralized storage protocols (Filecoin, Storj, Sia, and Arweave).

Growth in Active Storage Deals was driven by two factors:

1) Storing data was easier in 2023. Continued development across on-ramps such as Singularity.Storage, NFT.Storage, and Web3.Storage increased Web3 adoption. Singularity alone onboarded 180 plus clients and 270 PiB of data. This growth was enabled by advances in its S3 compatibility, data preparation, and deal making.

2) Large dataset clients grew exponentially in 2023. Over 1,800 large dataset clients onboarded datasets by the end of 2023, from an initial base of 500 plus clients. 37% of these clients onboarded datasets exceeding 100 TiB in storage size.

In 2023, key clients onboarded include UC Berkeley, Internet Archive, and The Victor Chang Research Institute. This growth supports a thriving storage market that serves as the foundation for retrieval and compute services currently being built.

Source: DeStor

2) Retrievals: Greater reliability for Filecoin Retrievals, alongside releases from Saturn & Station

Filecoin’s retrieval capabilities were bolstered by improvements both in its tooling and offerings. Several teams, such as Titan, Banyan, Saturn and Station, are laying the groundwork for new use cases to be anchored into the Filecoin economy, including decentralized CDNs and hot storage.

Saturn: A Decentralized CDN

Saturn is a decentralized CDN network built on Filecoin, that seeks to address the need for application-level retrievals. The Saturn network currently has over 3,000 nodes distributed across the globe, enabling low-latency content regardless of location.

Distribution of Nodes: 35% in North America, 34% in Europe, 24% in Asia, 7% RoW
Source: Saturn Explorer as of January 08, 2024

Across 2023, Saturn reduced its effective Time-to-First-Byte (median TTFB) to 60 milliseconds. This makes Saturn the fastest dCDN for content-addressable data, with TTFB remaining consistent across all geographies. Saturn was also capable of supporting 400 million retrieval requests on its busiest day of the year.

At the end of 2023, Saturn launched a private beta for customers (clients include Solana-based NFT platform Metaplex).

Station: A Deploy Target for Protocols (Enabling Spark Retrieval Checks)

Station, a desktop app for Filecoin, was launched in July 2023. Station is a deployment target for other protocols allowing DePIN networks, DA layers, and others to run on a distributed network of providers.

Station’s first module, Spark, is a protocol for performing retrieval checks on Storage Providers (SPs). Spark helps establish a reputational base for SP data retrievability, and supports teams looking to provide a hot storage cache for Filecoin. Since launch in Nov 2023, Spark has grown to 21 million daily jobs on 10,000 active nodes as of January 2024.

3) Filecoin Virtual Machine: The FVM launch introduced a new class of use cases for the Filecoin Network. Early DeFi adoption broke $200 million in TVL.

The Filecoin Virtual Machine (FVM) launched in March 2023 with the EVM being the first supported VM deployed on top. FVM brought Ethereum-style smart contracts to Filecoin, broadening the slate of services anchoring into Filecoin’s block space. Two areas of early adoption have been in liquid staking services (led by GLIF and other DeFi protocols) and micropayments via the FVM.

Liquid Staking

One of the core economic loops in the Filecoin economy is the process of pledging, where SPs put up collateral to secure capacity and data on the network. Prior to the FVM, borrowed Filecoin collateral was sourced through managed offerings from operators like Darma Capital, Anchorage, and CoinList. Post-FVM, roughly a dozen staking protocols have launched to grow Filecoin’s on-chain capital markets.

In aggregate, FVM-based protocols supply almost 11% of the total locked collateral (approx. 19 million FIL) on the network, giving yield access to token holders, and increasing the access to capital for hundreds of Filecoin SPs. From Filecoin’s collateral markets alone, the ecosystem has broken past 200 million in TVL.

Payments

Adjacent to the core storage offering on Filecoin, new services are being built that anchor into Filecoin’s block space. As mentioned in the Retrieval Markets section, two notable services (Station and Saturn) have actually started leveraging FVM for payments in 2023.

To date, Station users have completed 161 million jobs with more than 400 addresses receiving FIL rewards. Saturn averaged over 2,000 earning nodes in 2023 with 448,905 FIL disbursed to date.

4) Compute: Traction for Decentralized Compute Networks

Filecoin’s design enables compute networks to run synergistically on Filecoin’s Storage Providers. Sharing hardware with compute networks is also valuable to the Filecoin network: (1) sharing allows Filecoin to offer the cheapest storage by running side-by-side with compute operations, and (2) it brings additional revenue streams into the Filecoin economy.

Two key developments made running compute jobs on Filecoin nodes:

Sealing-as-a-service: Sealing-as-a-service is the process by which Storage Providers (SPs) can outsource production of sealed sectors to third-party marketplaces. This gives SPs greater flexibility in operations and reduces costs of sector production. One marketplace, Web3mine, has thousands of machines participating in its protocol offering cost savings to SPs of up to 70%. On top of the cost savings, the infrastructure built may also eventually benefit SPs by allowing them to leverage their GPUs for synergistic workloads (e.g. compute jobs)
Reduced Onboarding Costs: Supranational shipped proof optimizations reducing sealing server cost by 90% and overall cost of storage by 40%

On top of these developments, 2023 saw emerging compute protocols building in the Filecoin ecosystem. Two notable examples:

Distributed compute platform Bacalhau demonstrated real-world utility among Web2 and DeSci clients. Most recently, the U.S. Navy chose Bacalhau to assist them in deploying AI capabilities in undersea operations. Bacalhau is a platform agnostic compute platform intended to run on Web3 and Web2 infrastructure alike. Launched in November 2022, Bacalhau’s public network surpassed 1.5 million jobs and in some cases slashed compute costs by up to 99%

Source: Bacalhau

Up-and-coming compute networks like Io.net allow ML engineers to access a distributed network of GPUs at a fractional cost of individual cloud providers. Io.net recently incorporated 1,500 GPUs from Filecoin SPs — positioning Filecoin providers to offer their services to Io.net’s customer base. Io.net has over 7,400 users since its launch in November 2023, serving 15,000 hours of compute to users.

Source: io.net

2024 Outlook

2024 will be a critical growth year for Filecoin as groundwork laid in 2023 comes to fruition. Native improvements to storage markets, greater speed of retrievals, new levels of customizability & scalability brought by FVM and Interplanetary Consensus (IPC), all expand the universe of use cases that Filecoin can address.

In a Web3 climate where there is substantial attention on DePIN (and the tying of real world services with Web3 capabilities) these changes will be critical building blocks for even better services. Here are three themes to look for in 2024:

1) Synergies with Compute, AI and other DePIN networks

In 2024, foundational improvements to the network will substantially improve Filecoin’s ability to compose with other ecosystems.

Fast finality allows better cross-network interactions with app chains in other ecosystems (e.g. Cosmos, Ethereum, Solana).
Customizable subnets allow for novel types of networks to form on top of Filecoin such as general purpose compute subnets (e.g. Fluence) and storage pools (e.g. Seal Storage).
Hot storage allows for broader use case support including serving data assets for physical resource networks (e.g. WeatherXM/Tableland), caching data for compute networks (e.g. Bacalhau), and more.

This is all scratching the surface. As the Web3 space and DePIN category grows, Filecoin is well positioned to support new communities that form given its 9 EiB of network capacity and flexibility. There exists a sizable opportunity within physical resource networks producing high amounts of data, such as Hivemapper (over 100M km mapped), and Helium (1 million hotspots globally). Compute networks are also a likely growth area, given the backdrop of a GPU shortage (particularly for AI purposes) in traditional cloud markets.

Source: Messari

2) Focused Growth in Web2 Enterprise Data Storage

Web2 enterprise storage is a unique challenge for decentralized networks – requirements from these customers are not easily supported by most networks. Typical requirements from enterprise clients can include end-to-end encryption, certification for data centers, fast retrievals, access controls, S3 compatibility, and data provenance/compliance. Crucially, these requirements tend to differ across segments and verticals, which means that a level of adaptability is required. Filecoin’s architecture enables it to layer on support for the features these customers need.

A few teams worth keeping an eye on:

Banyan: Banyan simplifies how enterprise clients integrate with decentralized infrastructure by bundling hot storage, end-to-end encryption, and access controls, on top of a pool of high-performing storage providers. With the Filecoin network, Banyan provides content-addressable storage, which it plans to complement with hot storage proofs by utilizing FVM. This implementation makes Banyan compatible not only for enterprise, but DePIN and compute networks.

Seal: Seal has established itself as one of the best storage onramps in the Filecoin ecosystem, and is responsible for onboarding several key clients onto the network, such as UC Berkeley, Starling Labs, the Atlas Experiment, and the Casper Network. The team has been one of the driving forces in enterprise adoption to date, and most recently has achieved SOC 2 Compliance. In 2024, they plan on launching a subnet to enable a market for enterprise deals. On the back of their enterprise deal flow, they are positioned to bring petabytes of data into the network over the coming year via their new market.

Steeldome: Steeldome offers enterprise clients seeking data archival, backup and recovery with an alternative that is cost-competitive, efficiently deployed and scalable. It does so by combining Filecoin in its stack with Web2 technologies, allowing a fuller feature set to complement Filecoin’s cost-effective and secure archival storage. The Steeldome team has succeeded in onboarding clients across insurance, manufacturing, and media. In 2024, they plan to continue that trajectory, while offering a managed service for Storage Providers.

3) Greater On-chain DeFi activity

There is likely to be continued activity in the on-chain economy with an increase in the number and type of DeFi protocols on Filecoin.

The first protocols will increase service revenues (from Storage, to Retrievals, and Compute) coming on-chain. As previously described, more services are coming online in the Filecoin network, and are utilizing FVM for payments (e.g. Saturn, Station).
Key releases in 2023, including SushiSwap going live in Nov 2023, and the UniSwap community’s approval of integrating on FVM will lead to more diverse DeFi services coming on-chain. This will include CDPs (Collateralized Debt Positions), and Price Oracles (e.g. Pyth), among others.

Final Thoughts

In 2024, the Filecoin network will experience greater adoption, particularly by Compute, AI and DePIN networks, as well as targeted enterprise verticals. This adoption brings on-chain service revenue and supports the growth of DeFi activity beyond collateral markets. Continued improvements on storage markets, retrievability driven by hot storage proofs and CDN networks, as well as releases by FVM and IPC will enable the teams building on Filecoin to drive this next stage of growth.

To stay updated on the latest in DePIN and the Filecoin ecosystem, follow the @Filecointldr handle.

This blogpost is co-authored by Savan Chokkalingam and Nathaniel Kok on behalf of FilecoinTLDR. Many thanks to HQ Han and Jonathan Victor for reviewing and providing valuable insights and to all the ecosystem partners and teams for their timely input.

Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.

Filecoin’s larger roadmap aims to turn cloud services into permissionless markets on which any provider can offer their services. The network started with Storage markets, with the Mainnet launch in October 2022. More recently, the Filecoin Virtual Machine (FVM) was introduced to bring smart contract functionality onto the network. This allows for user programmability around key services on the Filecoin network: which includes Large-scale Storage, and soon, Retrievals.

In this post, we dive into the Retrieval markets that Filecoin is developing and one of its lighthouse projects. We will cover the following topics:

Filecoin’s Retrieval Markets and the Retrieval Markets Working Group (RMWG)
Content Delivery Networks (CDNs) and the role of Project Saturn
Saturn’s approach to a decentralized CDN and its traction to date
What’s next for Saturn

Filecoin’s Retrieval Markets and the RMWG

As covered earlier in our previous post, Filecoin seeks to build open services for data, which consists of three main pillars (Storage, Retrieval, and Compute-over-Data). Storage has been a key emphasis for Filecoin from 2020 to 2022 — it has emerged as the largest decentralized storage network to date, with over 1,170 PiB of data stored, and 200,000+ users ranging from Opensea to the Internet Archive. The remaining pillars of Retrieval and Compute-over-Data have been in development since 2022, with working groups (open for any individual or entity to join) organized around building these markets. The working groups encourage modularity and often consist of different teams that tackle different pieces of the puzzle.

Key components of Filecoin’s roadmap for open data services

The Retrieval Market Working Group (RMWG) is centered around building a decentralized CDN (Content Delivery Network) for the Filecoin ecosystem. Over 15 teams (such as Magmo, Ken Labs, Protocol Labs, and more) are contributing to tackling technical challenges in the space, from enabling ultra-fast payments to data transfer protocol enhancements to crypto-economic models for data retrieval. The following are building blocks the RMWG has been organizing itself around since H1 2022 and is built off an envisioned retrieval flow that can interact freely with Filecoin’s storage markets.

Building blocks of a functional retrieval markets identified by the RMWG

Even in its R&D stage, projects in the RMWG are already serving 160 million daily retrieval requests and more than 2 PB of data per month. Collectively, these projects will seek to enable a decentralized CDN that can serve not just the web3 space, but the web2 market as well.

CDNs and the role of Project Saturn

Content delivery networks are a key part of the Internet’s infrastructure. Groups of servers work together to provide fast delivery of internet content, from static web pages to YouTube videos. Incumbent CDN providers include players such as Cloudflare, Akamai, and Fastfly. Businesses pay for these services instead of the end-user, which means that service consistency, coverage, and pricing are critical.

Content Delivery Networks today are highly centralized and dominated by big players

The CDN market today is highly centralized and dominated by a few big players. Only 7 CDN providers serve over 80% of market needs. This brings about sizable concentration risk in the event of network failures (e.g. CloudFlare outage in 2022), and higher latencies in regions far remote from the closest data centers (e.g. Africa).

A distributed model of smaller CDNs in various regions could effectively solve these problems, but economies of scale have prevented smaller, distributed CDNs from challenging incumbent providers (capital outlay could reach up to billions per year). Delivering web content better than incumbent providers will open up a significant commercial opportunity. The global CDN market size accounts for US 20Bn in 2022 and is expected to reach around 100Bn by 2032 (not counting new web3-based use cases such as NFTs). This is where Project Saturn comes in.

A web3 CDN can potentially overcome this challenge by allowing anyone to contribute resources for content retrieval (provided they fulfill minimum criteria) in a network. This shifts the burden from a single company to thousands (or more) of companies supporting the network, reducing barriers to entry. This is where Project Saturn comes in. Project Saturn is a decentralized CDN network built on Filecoin, that seeks to enable reliable, performant, and economic retrieval of content on the Internet. It is one of the key projects in the RMWG, with a public launch in November 2022. Saturn seeks to achieve the following:

Democratize the CDN market, by allowing anyone to serve as a Saturn node operator in return for crypto-incentives. Nodes can join in a permissionless manner, allowing for multiple companies or individuals to contribute towards a retrieval network (think franchising), which leads to a wider and more distributed footprint
Performant retrievals, with under 100ms TTFB, high network bandwidth, and low latency across all geographies, owing to a high density of nodes being distributed across each continent. While this does not exist today, it can potentially be achieved given a wider geographic distribution of nodes
No single point of failure unlike traditional CDN networks
Data integrity and authenticity by leveraging content-addressability. Project Saturn is the only decentralized CDN that is natively compatible with content-addressing

Saturn approach and traction to date (Aug 2023)

The data below is accurate as of August 2023, unless otherwise stated. Data for number of Active Nodes are accurate as of November 2023, owing to a upgrade in October 2023 that removed multi-noding behavior in the network, while keeping TTFB performance stable.

While Saturn’s ambition is to serve as a credible alternative to traditional CDN networks, its near-term goal is to effectively fulfill the billions of requests received each week for content-addressed data on Filecoin and IPFS. This is currently being fulfilled by IPFS Gateway, which serves as a key benchmark for Saturn as it improves its network capacity and performance.

Flow chart for how network actors in Saturn enable retrievals from Filecoin and IPFS

Saturn’s approach involves four main network actors in enabling retrievals from Filecoin and IPFS:

Node operators offer their hardware and resources to the Saturn network by running Saturn nodes in different geo-locations around the world. They are rewarded based on how many bytes they serve to clients over each payment epoch. Saturn nodes join the network by registering with the Saturn Orchestrator. The network of Saturn L1s provides a huge geographically distributed cache of content-addressed data for Saturn clients
Saturn Orchestrator manages the membership of node operators in the Saturn Network and facilitates the payment process to these nodes. This is a key function in democratizing data retrievals while ensuring that qualified participants enter the market. Over time, the aim is for the orchestrator to run entirely on theFilecoin Virtual Machine (FVM)
Clients: Network users make requests for content from the Saturn network. The Client is the device used to make the request. Clients make HTTP requests to the Saturn network and get back CAR files, allowing clients to verify the file incrementally. When a Saturn L1 doesn’t have a file in its cache, it “cache-misses” to wherever the file is stored in either the IPFS Network or the Filecoin network, and returns it to the client
Customers use the Saturn Network as a CDN to accelerate their content to their users. Saturn customers can accelerate their content to a large number of Saturn nodes around the world to create a performant experience for end users

Significant developments have been made on Project Saturn to date. Following its public launch in November 2022, Saturn now has 80ms Time-to-first-byte (TTFB) at the 50th percentile, serves 30% of mirrored traffic from IPFS.io via the Bifrost Gateway, and has launched a verifiable node reward payout system on FVM.

It also has made significant headway in developing a network that is geographically diverse, capable of handling high-volume requests, and able to deliver content in a performant manner (low time-to-first byte). Since its public launch (just 8 months), Saturn has achieved:

Over 2,000 global points of presence (across 59 countries)
Capacity to serve 478 Million requests each day (in July 2023)
80 milliseconds time-to-first-byte (TTFB) for IPFS content

1) Over 2,200 retrieval providers worldwide (comparably distributed to traditional CDN providers)

Over 2,200 retrieval providers are currently on Saturn contributing to network bandwidth. This is a strong 11.8% MoM (month-on-month) growth, starting with only 662 nodes at the end of 2022. As a point of comparison, Filecoin’s storage markets grew by 21% MoM in its first 6 months, with approximately 3,500 storage providers on the network today (the largest in the web3 storage space)

This is comparably distributed to traditional CDN providers today. Akamai, the largest CDN globally, with a 35% market share, also has over 4,000 points of presence globally, while the next closest player (Alibaba) has only an estimated 2,800 points of presence (with the majority in China).

This speed of growth attests to the accessibility of serving as a retrieval provider on the Saturn network. It takes only 4 terabytes (TB) of storage and Saturn’s open-source software to run a Saturn CDN node (considerably less resource-intensive than being a storage provider on Filecoin). Saturn will allow more individuals to participate in Filecoin’s decentralized markets for data services.

Source: Saturn Explorer Nov 2023 (Explore Project Saturn’s live representation of nodes here)

Participation across geographies remains diverse: with the most nodes in Europe (800+), followed by North America (600+), and Asia (500+). Median TTFB remains consistently low across all continents, with Europe, Asia, Oceania, and South America experiencing sub-100 ms TTFB.

Distribution of these nodes is important, as it allows for the lowest possible distance between clients and nodes, translating to low latency for end-users (overcoming the speed of light problem experienced with traditional CDN providers). Saturn’s permissionless and crypto-incentivized attributes allow for a more ‘elastic’ supply to match with fast demand growth in developing regions like Asia and Africa, which are currently experiencing those latency issues today.

2) Saturn served an average of ~10.3 Billion requests monthly across 2023

Saturn has a network capacity of around 25+ terabits per second (approximately 10% of the network capacity of Cloudflare). An average of 10.3 Billion requests are served monthly across 2023, with 3.7 Million Gigabytes of monthly bandwidth served. Over 478 million daily requests were being handled as of the end of July 2023, which is close to 50% of IPFS Gateway’s daily requests in the same time frame. Despite its progress thus far, there remains room for stabilization in Saturn’s network capacity.

Source: Project Saturn data

3) Time-to-first-Byte is already under 80 milliseconds

Speed is an area where Saturn has shown significant results; the median TTFB (time-to-first-byte) is already under 80 milliseconds. Typically, a good TTFB lies below 100 ms for static content, and 200–500 ms for dynamic content. Saturn today is already the fastest content-addressable CDN globally with 80 ms TTFB and has further headroom for improvement as the network continues to become denser. Aside from Saturn, there also exist parallel developments to drive improved retrieval performance in the Filecoin network. This includes projects such as Rhea, which is seeking to optimize IPFS Gateway performance.

Source: Project Saturn data

What’s next for Saturn

Since its public launch, Saturn has achieved significant progress as an open-source, community-run CDN network. Moving forward, the team looks to continue pushing towards better TTFB speeds, while improving performance correctness and latency. Towards the end of 2023, Saturn looks to achieve further milestones. These include serving 100% of IPFS.io traffic, implementing metering and billing on the customer demand side, and launching a web app to enable customer self-onboarding who want to accelerate content with Saturn.

Upcoming milestones in Project Saturn’s roadmap

You can keep up to date with Project Saturn, and other projects within the RMWG here. Data in this post is accurate as of 31st August 2023 unless otherwise stated.

Many thanks to the amazing HQ Han, Jonathan Victor, Alexander Kintsler, and the Project Saturn team for their input in publishing this piece.

🇨🇳Filecoin检索市场发展更新：聚焦Filecoin Saturn

Editor’s Note: This blog is a repost of original content from IOSG Ventures. IOSG Ventures is a community-friendly and research-driven early-stage venture firm. This blog post represents the independent views of the author, who has given permission for re-publication.

Projects Mentioned: Filecoin, GLIF, STFIL, Collectif, Web3Mine, Stacks

TLDR:

The programmable layer on FIL, the FVM, allows for trustless marketplaces to be built
This calls for a need for a marketplace that currently exists off-chain, i.e. FIL borrowing to be brought on-chain, where FIL token holders lease their FIL to Storage Providers (which some call “miners”) who borrow FIL from the pool(s)
FIL borrowing is essentially taking cash forward on the future block rewards accrued by the Storage Providers, and this makes FIL block rewards from data storage more capital-efficient
There are obvious trade-offs to be made between centralization-capital efficiency- and security in protocol design
The market size for borrowing FIL is reducing over time but the introduction of stablecoins, etc. Can unlock unique projects to be built on top of these protocols

The launch of a programmability layer on a seasoned blockchain generally comes with a lot of excitement. The launch of Stacks (STX) on the Bitcoin blockchain brought a new paradigm of thinking amongst the community built around it.

A very similar narrative happened with the launch of the FVM on Filecoin. The robust Filecoin community now has to see its vision through a completely different lens. A lot of open problems that the ecosystem had could now be addressed. Creating trustless marketplaces via programmability was a key piece of the puzzle.

Liquid staking on Filecoin was the first “Request-for-build” from the Filecoin ecosystem during the launch of FVM and was given high importance. To understand why this is, let us first understand how the economics of Filecoin work.

How Filecoin Incentives Work

Unlike an Ethereum validator, there is no one-time staking in Filecoin. Every time a Storage Provider (SP) provides services, they need to put up a pledge amount in FIL. This pledge is required to seal the sectors and store the sealed sector in the SP. Such a structure ensures that the SP is going to store data for their clients for the period of the deal that they agree to, in exchange for rewards. Rewards are distributed via PoSt (Proof of Space-Time), where the SPs are rewarded for proving that they have the right client data stored.

SPs are selected via a leader selection mechanism called DRAND. DRAND chooses the leader with some initial requirements and also the % of raw byte power of the network controlled by the SPs.

SPs will have to keep ramping up raw byte power (RBP) to be chosen as the leader to “mine” a block and receive incentives. This helps the SP subsidize their storage costs.

Although there are many more factors that govern the supply rate of these incentives, the baseline is that for storage providers/miners, to maximize their bottom line will have to try to maximize RBP and onboard (and renew) more deals.

This creates a positive loop for the Filecoin network

Economics of a Storage Provider

When an SP receives block rewards, these rewards are not liquid. Only 25% of the rewards are liquid, and the remaining 75% of the block rewards vest linearly over 180 days (~ 6 months). This poses a problem for SPs. The rewards, which are supposed to be an SP’s operating income, are now delayed payments for as long as the SP onboards/renews deals.

Let us look at the SP balance of the top miner in the network (as of 6th August 2023)

Storage Provider Account Balance Change, Source: FilFox

When you look at the graph, one can see that only about 1% of the rewards (or operating income) of the SP is actually liquid. If this SP now wants to either:

Pay for operating income
Upgrade hardware
Pay for maintenance
Or onboard/ renew deals

The SP will have to either borrow fiat currency or borrow FIL from third parties just to make up for these “delayed” payments.

At the moment many storage providers (miners) in the network rely on CeFi lenders such as DARMA Capital, Coinlist, and a few others. As these are loan products, storage providers will have to go through KYC and a strict audit process to be able to borrow FIL.
When we look at the map below, we can see a very high concentration of Filecoin SPs in Asia, and with centralized providers being mostly in the West, it is very hard for them to underwrite FIL loans to Asian miners with favorable terms, and most Asian miners/ SPs don’t have access to such providers.

Source: Observable Dashboard by @jimpick

This becomes a hindrance for new SPs to come in and participate in the system, and existing SPs can scale their business only as much as the total FIL pool size of these CeFi lenders

So why not just borrow fiat currency from a bank? With FIL being a volatile asset, it will pose additional capital management challenges for SPs who borrow.

To solve this problem, there needs to be a marketplace for FIL lenders (who could be holders of FIL) and FIL borrowers (SPs)

Filecoin Staking

With the launch of the FVM, this marketplace idea can come to fruition. FIL lenders/stakers can now put their FIL to work and SPs can borrow from this pool (either in a permissioned or permissionless manner) all governed by smart contracts.

There are many players in the ecosystem who are already building this and waiting to launch in the coming months.

More than calling such marketplaces staking protocols, it is a lot closer to a lending protocol by the nature of this business.

Some base features of such a FIL lending product would be:

Lenders deposit idle FIL and receive a “liquid staking” token
Borrowers (SPs) can borrow from the pool against collateral that exists in the SP actor (Essentially Initial Pledge + Locked Rewards)
Borrowers will make interest payments every week, or any specified time period, by signing over the “OwnerID” of the SP to a smart contract
Lenders receive the interest (minus protocol fees) as APY either via a rebase token or a value accrual token

Existing players in the system include:

Source: Starboard Leaderboard

Different liquid staking protocols have different schools of thought when it comes to borrowing:

Over/ Fully collateralized vs. Undercollateralized

In Over-collateralized or fully collateralized models, the debt-to-equity ratio is always going to be less than or equal to 100%. This means that if my SP balance is say 1000 FIL, I can only borrow up to 1000 FIL (depending on the protocol rules as well). This can easily be coded into smart contracts and default risk is built in. This allows for greater transparency and also security to the stakers (lenders). Another advantage of such a model is that it allows for permissionless borrowing as well. This is where the product blocks more like Aave/ Compound rather than a Lido or RocketPool.

In an uncollateralized model, the lenders are bearing risk while the risk is being managed by the protocol. In such a model, risk modeling is complex math that cannot be baked into smart contracts, and needs to be off-chain which sacrifices transparency. But, since there is leverage involved, it makes the system a lot more capital-efficient for the borrower. The more permissionless a leveraged system will get, the more risk the lenders bear and this would call for a very robust and dynamic risk management model that is run by the protocol developers

The trade-offs being made are:

Capital efficiency vs. staker risk
Capital efficiency vs. transparency
Lender risk vs. borrower entry to the system

Single Pool vs Multi-Pool

Protocols can also opt to build a multi-pool model where lenders can choose to stake FIL in different pools with different risk parameters. This allows for risk to be managed on-chain, but liquidity will be fragmented. In a single-pool model, risk will have to be maintained off-chain. Overall the trade-offs will still remain the same as the ones mentioned above.

Trade-off: Liquidity fragmentation vs Risk management transparency

Risks

In an overcollateralized model, even if the miner gets slashed multiple times, as soon as the Debt-to-equity ratio hits 100% the miner will get liquidated and the stakers will be comparatively safe

In an undercollateralized model, the borrowers can be penalized for failing to prove sectors. There are many more faults in failing to prove data storage rather faults in the consensus itself. This is more common in Filecoin than in other general-purpose blockchains because there is an actual commodity that is being stored from an off-chain entity. This will affect the collateral value and lever the borrower more. Liquidation thresholds will have to be set very carefully in such a model.

Filecoin Miner Penalties (90 days). Source: Starboard

Comparison of Key Market Players

What about Ethereum Staking/Lending protocols entering the market?

In the Filecoin ecosystem, unlike the Ethereum ecosystem, the nodes (Miners/Validators/SPs) are responsible for much more than general uptime. They are supposed to market themselves to be chosen as SPs, and regularly upgrade their hardware to support more storage, seal, store, maintain, and retrieve data. Filecoin storage and reward mining for SPs is a full-time job.

Unlike an Ethereum validator, there is no one-time staking in Filecoin. Every time an SP provides storage to a client, they need to put up a pledge. This pledge is required to seal the sectors and store the sealed sector in the SP. Storage provision on Filecoin is a very capital-intensive process and this discourages many new SPs from participating in the network and existing SPs from staying and contributing to the network.

Since the participants on the borrow side are SPs only it is also going to be intensive for newcomers in the Filecoin ecosystem to bootstrap borrower trust.

The mechanics of Filecoin alone don’t allow Ethereum staking or even lending protocols to deploy easily on the FVM.

Economics of the Protocol

Is there enough FIL in the market to supply for lending?

As of August 6th, 2023, there are about 264.2 million FIL circulating that are not committed as sector pledges or rewards that are to be released. This can be counted as the total amount of FIL that can be staked by the lenders into the pool

Source: FilScan

Is there enough demand for borrowing?

While FIL borrowing is essential to SPs, what are they actually borrowing? They are taking a forward payment on their locked-up rewards in an overcollateralized model, and in the undercollateralized model, they are taking a forward payment on future rewards.

Looking at the graphs above, we can see that the total locked rewards are about 223M FIL, and the supply can match the demand. The demand-to-supply ratio is almost 84%. This shows even power dynamics on either side, and either side cannot squeeze the other on interest rates/ APY.

What does the future look like?

Estimating the market for future demand of FIL for borrowing is essentially the amount of FIL that will be released in the future as rewards.

The good folks at Messari ran a simulation of FIL circulating supply with a 3-year and a 50-year forecast using different cases.

Source: Messari

According to the top left graph, considering a conservative scenario where there is low onboarding of data and only 10% of the total deals are renewed, the new reward emissions over 3 years are close to around 100M FIL and in an aggressive scenario where there is a high amount of data onboarding and 70% of existing deals renewed, the extra rewards come to about 200M FIL

So one can expect a market size of somewhere between 100M — 200M FIL over the next 3 years. At the current price of FIL (Aug 6th), which is $4.16, there could be a borrowing TAM of about $400M — $800M. This could be counted as the TAM of the product’s borrow side.

On the supply side, in the conservative estimate, there can be about 300M FIL that will be emitted, and in a more aggressive scenario, the circulating supply is simulated to be around the same as it is today. Why? It is because if more deals are being onboarded and renewed, there will be a lot more FIL locked-in sector pledges.

In the more aggressive scenario, the demand is going to outweigh the supply and the interest charged can be higher in this competitive market.

Where I think this can go

Amongst the different designs, there need not be a winner-takes-all type of model. Intuitively, the long-term winner (by TVL) is generally the protocol that is built most safely. Very much like Lido in the Ethereum ecosystem. I for one am biased towards safer structures more than optimizing for 2–3% more yield, and I think FIL whales would also prioritize capital safety over a slightly higher yield.

This is after considering the amount of penalties miners pay for not being able to prove space-time.

From the borrower (SP) end, the SP could borrow from different protocols for different purposes. If the SP already has a lot of collateral and doesn’t need to lever up to pay for opex, then the safer, overcollateralized model will work better, since it is safer. Whereas if I am a newer SP with a lot of sectors to be pledged I would borrow with leverage from an undercollateralized pool.

After studying the above models, we can see:

Staking in Filecoin is important to bridge the supply and demand for FIL in the ecosystem. The FVM has recently been released allowing for a lending marketplace to exist. Although the problem is real, the FVM release was probably too late for most FIL staking/lending protocols as the pie (mining rewards) is decreasing over time making it a niche market.

However, a few fascinating use cases can emerge on top of these staking protocols. With the introduction of stablecoins, the rewards can be taken as cash forwards. Something similar to what Alkimiya is building on Ethereum. This can result in the injection of new capital into the Filecoin ecosystem and also increase the TVL in these protocols.

Ethereum’s and Filecoin’s tech is different, their miners are different, their developers are different, their apps are different, and hence their communities. And for staking in particular, with every miner being “non-fungible” bootstrapping the demand side becomes a BD exercise and the success of it is directly proportional to the protocol’s reputation in the community.

Filecoin staking is a critical solution that needs to be built to get more SPs in the system, for retail to put their capital to work, create greater economic incentives as an ecosystem to attract more developers, and build useful products to build a positive flywheel. To know more beyond staking in the Filecoin ecosystem and the criticality of the FVM you can read this previous piece we published.

There are many more open problems to be solved in the Filecoin ecosystem, but we are positive that the Filecoin Ecosystem is working in the right direction to achieve its vision of storing humanity’s data in an efficient system.

🇨🇳Filecoin质押经济学

Editor’s Note: This article draws heavily from David Aronchick’s presentation at the Filecoin Unleashed Paris 2023. David is the CEO of Expanso and former head of Compute-over-data at Protocol Labs which is responsible for the launch of the Bacalhau project. This blog post represents the independent view of the creator of the original content, who has given permission for this re-publication.

The world will store more than 175 zettabytes of data by 2025, according to IDC. That’s a lot of data, precisely 175 trillion 1GB USB sticks. Most of this data will be generated between 2020 and 2025, with an estimated compound annual growth of 61%.

The rapidly growing data sphere broadly poses two major challenges today:

Moving data is slow and expensive. If you attempted to download 175 zettabytes at current bandwidth, it would take you roughly 1.8 billion years.
Compliance is hard. There are hundreds of data-related governances worldwide which makes compliance across jurisdictions an impossible task.

The combined result of poor network growth and regulatory constraints is that nearly 68% of enterprise data is unused. That’s precisely why moving compute resources to where the data is stored (broadly referred to as compute-over-data) rather than moving data to the place of computation becomes all the more important, something which compute-over-data (CoD) platforms like Bacalhau are working on.

In the upcoming sections, we will briefly cover:

How organizations are currently handling data today
Propose alternative solutions based on compute-over-data
Lastly, postulate why decentralized computation matters

The Present Scenario

There are three main ways in which organizations are navigating the challenges of data processing today — none of which are ideal.

Using Centralized Systems

The most common approach is to lean on centralized systems for large-scale data processing. We often see enterprises use a combination of compute frameworks — Adobe Spark, Hadoop, Databricks, Kubernetes, Kafka, Ray, and more — forming a network of clustered systems that are attached to a centralized API server. However, such systems fall short of effectively addressing network irregularities and other regulatory concerns around data mobility.

This is partly responsible for companies coughing up billions of dollars in governance fines and penalties for data breaches.

Building It Themselves

An alternative approach is for developers to build custom orchestration systems that possess the awareness and robustness the organizations need. This is a novel approach but such systems are often exposed to risks of failure by an over-reliance on a few individuals to maintain and run the system.

Doing Nothing

Surprisingly, more often than not, organizations do nothing with their data. A single city, for example, may collect several petabytes of data from CCTV recordings a day and only view them on local machines. The city does not archive or process these recordings because of the enormous costs involved.

Building Truly Decentralized Compute

There are 2 main solutions to the data processing pain points.

Solution 1: Build on top of open-source compute-over-data platforms.

Solution 1: Open Source Compute Over Data Platforms

Instead of using a custom orchestration system as specified earlier, developers can use an open-source decentralized data platform for computation. Because it is open source and extensible, companies can build just the components they need. This setup caters to multi-cloud, multi-compute, non-data-center scenarios with the ability to navigate complex regulatory landscapes. Importantly, access to open-source communities makes the system less vulnerable to breakdowns as maintenance is no longer dependent on one or a few developers.

Solution 2: Build on top of decentralized data protocols.

With the help of advanced computational projects like Bacalhau and Lilypad, developers can go a step further and build systems not just on top of open-source data platforms as mentioned in Solution 1, but on truly decentralized data protocols like the Filecoin network.

Solution 2: Decentralized Compute Over Data Protocols

What this means is that organizations can leverage decentralized protocols that understand how to orchestrate and describe user problems in a much more granular way and thereby unlock a universe of compute right next to where data is generated and stored. This switchover from data centers to decentralized protocols can be carried out ideally with very few changes to the data scientists’ experience.

Decentralization is About Maximizing Choices

By deploying on decentralized protocols like the Filecoin network, the vision is that clients can access hundreds (or thousands) of machines spread across geographies on the same network, following the same protocol rules as the rest. This essentially unlocks a sea of options for data scientists as they can request the network to:

Select a dataset from anywhere in the world
Comply with any governance structures, be it HIPAA, GDPR, or FISMA.
Run at the cheapest rates possible

Juan’s Triangle | Decoding Acronyms: FHE (Fully Homomorphic Encryption), MPC (Multi-Party Compute), TEE (Trusted Execution Environment), ZKP (Zero-Knowledge Proofs)

The concept of maximizing choices brings us to what’s called “Juan’s triangle,” a term coined after Protocol Labs’ founder Juan Benet for his explanation of why different use cases will have (in the future) different decentralized compute networks backing them.

Juan’s triangle explains that compute networks often have to trade off between 3 things: privacy, verifiability, and performance. The traditional one-size-fits-all approach for every use case is hard to apply. Rather, the modular nature of decentralized protocols enables different decentralized networks (or sub-networks) that fulfill different user requirements — be it privacy, verifiability, or performance. Eventually, it is up to us to optimize for what we think is important. Many service providers across the spectrum (shown in boxes within the triangle) fill these gaps and make decentralized compute a reality.

In summary, data processing is a complex problem that begs out-of-the-box solutions. Utilizing open-source compute-over-data platforms as an alternative to traditional centralized systems is a good first step. Ultimately, deploying on decentralized protocols like the Filecoin network unlocks a universe of compute with the freedom to plug and play computational resources based on individual user requirements, something that is crucial in the age of Big Data and AI.

Follow the CoD working group for all the latest updates on decentralized compute platforms. To learn more about recent developments in the Filecoin ecosystem, tune into our blog and follow us on social media at TL;DR, Bacalhau, Lilypad, Expanso, and COD WG.

🇨🇳Filecoin洞察：分布式数据计算的重要性和商业潜力

This blog post is contributed to Filecoin TL;DR by a guest writer. Catrina is an Investment Partner at Portal Ventures.

Until recently, startups led the way in technological innovation due to their speed, agility, entrepreneurial culture, and freedom from organizational inertia. However, this is no longer the case in the rapidly growing era of AI. So far, big tech incumbents like Microsoft-owned OpenAI, Nvidia, Google, and even Meta have dominated breakthrough AI products.

What happened? Why are the “Goliaths” winning over the “Davids” this time around? Startups can write great code, but they are often too hindered to compete with big tech incumbents due to several challenges:

Compute costs remain prohibitively high
AI has a reverse salient problem: a lack of necessary guardrails impedes innovation due to fear and uncertainty around societal ramifications
AI is a black box
The data “moat” of scaled players (big tech) creates a barrier to entry for emerging competitors

So, what does this have to do with blockchain technology, and where does it intersect with AI? While not a silver bullet, DePIN (Decentralized Physical Infrastructure Networks) in Web3 unlocks new possibilities for solving the aforementioned challenges. In this blog post, I will explain how AI can be enhanced with the technologies behind DePIN across four dimensions:

Reduction of infrastructure costs
Verification of creatorship and humanity
Infusion of Democracy & Transparency in AI
Installation of incentives for data contribution

In the context of this article,

“web3” is defined as the next generation of the internet where blockchain technology is an integral part, along with other existing technologies
“blockchain” refers to the decentralized and distributed ledger technology
“crypto” refers to the use of tokens as a mechanism for incentivizing and decentralizing

Reduction of infra cost (compute and storage)

Every wave of technological innovation has been unleashed by something costly becoming cheap enough to waste

— Society’s Technical Debt and Software’s Gutenberg Momentby SK Ventures

The importance of infra affordability (in AI’s case, the hardware costs to compute, deliver, and store data) is highlighted by Carlota Perez’s Technological Revolution framework, which proposed that every technological breakthrough comes with two phases:

Source: Carlota Perez’s Technological Revolution

The Installation stage is characterized by heavy VC investments, infrastructure setup, and a “push” go-to-market (GTM) approach, as customers are unclear on the value proposition of the new technology.
The Deployment stage is characterized by a proliferation of infrastructure supply that lowers the barrier for new entrants and a “pull” GTM approach, implying a strong product-market fit from customers’ hunger for more yet-to-be-built products.

With definitive evidence of ChatGPT’s product-market fit and massive customer demand, one might think that AI has entered its deployment phase. However, there is still one piece missingstill: an excess supply of infrastructure that makes it cheap enough for price-sensitive startups to build on and experiment with.

Problem

The current market dynamic in the physical infrastructure space is largely a vertically integrated oligopoly, with companies such as AWS, GCP, Azure, Nvidia, Cloudflare, and Akamai enjoying high margins. For example, AWS has an estimated 61% gross margin on commoditized computing hardware.

Compute costs are prohibitively high for new entrants in AI, especially in LLM.

ChatGPT costs an est. of 4M/training, and ~$700,000 per day to operate in hardware inference costs
Version two of Bloom will likely cost $10M to train & retrain
If ChatGPT were deployed into Google Search, it would result in $36B reduction in operating income for Google, a massive transfer of profitability from the software platform (Google) to the hardware provider (Nvidia)

Source: Peeling The Onion’s Layers — Large Language Models Search Architecture And Cost

Solution

DePIN networks such as Filecoin (the pioneer of DePIN since 2014 focused on amassing internet-scale hardware for decentralized data storage), Bacalhau, Gensyn.ai, Render Network, and ExaBits (the coordination layers to match the demand for CPU/GPU with supply) can deliver 75% — 90%+ cost savings in infra costs via below three levers

1. Pushing up the supply curve to create a more competitive marketplace

DePIN democratizes access for hardware suppliers to become service providers. It introduces competition to these incumbents by creating a marketplace for anyone to join the network as a “miner,” contributing their CPU/GPU or storage power in exchange for financial rewards.

While companies like AWS undoubtedly enjoy a 17-year head start in UI, operational excellence, and vertical integration, DePIN unlocks a new customer segment that was previously out-priced by centralized providers. Similar to how eBay does not compete directly with Bloomingdale but rather introduces more affordable alternatives to meet similar demand, DePIN networks do not replace centralized providers but rather aim to serve a more price-sensitive segment of users.

2. Balancing the economy of these markets with crypto-economic design

DePIN creates a subsidizing mechanism to bootstrap hardware providers’ participation in the network, thus lowering the costs to end users. To understand how let’s first compare the costs & revenue of storage providers in web2 vs. web3 using AWS and Filecoin.

Lower fees for clients: DePIN networks create competitive marketplaces that introduce Bertrand-style competition resulting in lower fees for clients. In contrast, AWS EC2 needs a mid-50% margin and 31% overall margin to sustain operations, plus

Token incentives/block rewards are emitted from DePIN networks as a new revenue source. In the context of Filecoin, hosting more real data translates to earning more block rewards (tokens) for storage providers. Consequently, storage providers are motivated to attract more clients and win more deals to maximize revenue. The token structures of several emerging compute DePIN networks are still under wraps, but will likely follow a similar pattern. Examples of such networks include:

Bacalhau: a coordination layer to bring computing to where data is stored without moving massive amounts of data
exaBITS: a decentralized computing network for AI and computationally intensive applications
Gensyn.ai: a compute protocol for deep learning models

3. Reducing overhead costs

Benefits of DePIN networks like Bacalhau and exaBITS, and IPFS/content-addressed storage include:

Creating usability from latent data: there is a significant amount of untapped data due to the high bandwidth costs of transferring large datasets. For instance, sports stadiums generate vast amounts of event data that is currently unused. DePIN projects unlock the usability of such latent data by processing data on-site and only transmitting meaningful output.
Reducing OPEX costs such as data input, transport, and import/export by ingesting data locally.
Minimizing manual processes to share sensitive data: for example, if hospitals A and B need to combine respective sensitive patient data for analysis, they can use Bacalhau to coordinate GPU power to directly process sensitive data on-premise instead of going through the cumbersome administrative process to handle PII (Personal Identifiable Information) exchange with counterparties.
Removing the need to recompute foundational datasets: IPFS/content-addressed storage has built-in properties that deduplicate, trace lineage, and verify data. Here’s a further read on the functional and cost efficiencies brought about by IPFS.

Summary by AI: AI needs DePIN for affordable infrastructure, which is currently dominated by vertically integrated oligopolies. DePIN networks like Filecoin, Bacalhau, Render Network, and ExaBits can deliver cost savings of 75%-90%+ by democratizing access to hardware suppliers and introducing competition, balancing the economy of markets with cryptoeconomic design, and reducing overhead costs.

Verification of Creatorship & Humanity

Problem

According to a recent poll, 50% of A.I. scientists agree that there is at least a 10% chance of A.I. leading to the destruction of the human race.

This is a sobering thought. A.I. has already caused societal chaos for which we currently lack regulatory or technological guardrails — the so-called “reverse salient.”

To get a taste of what this means, check out this Twitter clip featuring podcaster Joe Rogan debating the movie Ratatouille with conservative commentator Ben Shapiro in an AI-generated video.

Source: Bloomberg Article on Deep Fake

Unfortunately, the societal ramifications of AI go much deeper than just fake podcast debates & images:

The 2024 presidential election cycle will be among the first where a deep fake AI-generated political campaign becomes indistinguishable from the real one
An altered video of Senator Elizabeth Warren made it appear like Warren was saying Republicans should not be allowed to vote (debunked)
The voice clone of Biden criticizing transgender women
A group of artists filed a class-action lawsuit against Midjourney and Stability AI for unauthorized use of artists’ work to train AI imagery that infringed on those artists’ trademarks & threatened their livelihood
A deepfake AI-generated soundtrack, “Heart on My Sleeve” featuring The Weeknd and Drake, went viral before being taken down by the streaming service. Such controversy around copyright violation is a harbinger of the complications that can arise when a new technology enters the mainstream consciousness before the necessary rules are in place. In other words, it is a Reverse Salient problem.

What if we can do better in web3 by putting some guardrails on AI?

Solution

Proof of Humanity and Creatorship with cryptographic proof of origination on-chain

This is where we can actually use Blockchain for its technology — as a distributed ledger of immutable records that contain tamper-proof history on-chain. This makes it possible to verify the authenticity of digital content by checking its cryptographic proof.

Proof of Creatorship & Humanity with Digital Signature

To prevent deep fakes, cryptographic proof can be generated using a digital signature that is unique to the original creator of the content. This signature can be created using a private key, which is only known to the creator, and can be verified using a public key that is available to everyone. By attaching this signature to the content, it becomes possible to prove that the content was created by the original creator — whether they are human or AI — and authorized/unauthorized changes to this content.

Proof of Authenticity with IPFS & Merkle Tree

IPFS is a decentralized protocol that uses content addressing and Merkle trees to reference large datasets. To prove changes to a file’s content, a Merkle proof is generated, which is a list of hashes that shows a specific data chunk in the Merkle tree. With every change, a new hash is generated and updated in the Merkle tree, providing proof of file modification.

A pushback against such a cryptographic solution may be incentive alignment: after all, catching a deep fake generator doesn’t generate as much financial gain as it reduces the negative societal externality. The responsibility will likely fall on major media distribution platforms like Twitter, Meta, Google, etc. to flag, which they are already doing. So why do we need Blockchain for this?

The answer is that these cryptographic signatures and proof of authenticity are much more effective, verifiable, and deterministic. Today, the process to detect deep fakes is largely through machine learning algorithms (such as the “Deepfake Detection Challenge”of Meta, “Asymmetric Numeral Systems” (ANS) of Google, and c2pa) to recognize patterns and anomalies in visual content, which is not only inaccurate at times but also falling behind the increasingly sophisticated deep fakes. Often, human reviewer intervention is required to assess authenticity, which is not only inefficient but also costly.

Imagine a world where each piece of content has its cryptographic signature so that everyone will be able to verifiably prove the origin of creation and flag manipulation or falsification — a brave new one.

Summary by AI: AI poses a significant threat to society, with deep fakes and unauthorized use of content being major concerns. Web3 technologies, such as Proof of Creatorship with Digital Signature and Proof of Authenticity with IPFS and Merkle Tree, can provide guardrails for AI by verifying the authenticity of digital content and preventing unauthorized changes.

Infusion of Democracy in AI

Problem

Today, AI is a black box comprised of proprietary data + proprietary algorithms. Such closed-door nature of Big Tech’s LLM precludes the possibility of what I call an “AI Democracy,” where every developer or even user should be able to contribute both algorithms and data to an LLM model, and in term receive a fraction of the future profits from the model (as discussed here).

AI Democracy = visibility(the ability to see the data & algorithm input into the model)

contribution(the ability to contribute data or algorithm to the model).

Solution

AI Democracy aims to make generative AI models accessible for, relevant to, and owned by everyone. The below table is a comparison illustrating what is possible today vs. What will be possible, enabled by blockchain technology in Web3.

Today:

For consumers:

One-way recipient of the LLM output
Little control over the use of their personal data

For developers:

Little composability possible
Little reproducibility because there’s no traceability of the ETL performed on the data
Single-source data contribution within the confine of the owner organization
Close sourced only accessible through API for a charge
80% of data scientists’ time is wasted on performing low-level data cleansing work because of the lack of verifiability to share data output

What blockchain will enable

For consumers:

Users can provide feedback (e.g. on bias, content moderation, granular feedback on output) as input into continuous fine-tuning
Users can opt to contribute their data for potential profits from model monetization

For developers:

Decentralized data curation layer: crowdsource tedious & time-consuming data preparation processes such as data labeling
Visibility & ability to compose & fine-tune algorithms with verifiable & built-on lineage (meaning they can see a tamper-proof history of all changes in the past)
The sovereignty of both data (enabled by content-addressing/IPFS) and algorithm(e.g. Urbitenables peer-to-peer composability and portability of data & algorithm)
Accelerated innovation in LLM from the outpour of variants from the base open-source models
Reproducibility of training data output via Blockchain’s immutable record of past ETL operations & queries (e.g. Kamu)

One might argue that there’s a middle ground of Web2 open source platforms, but it’s still far from optimal for reasons discussed in this blog post by exaBITS.

Summary by AI: The closed-door nature of Big Tech’s LLM precludes the possibility of an “AI Democracy,” where every developer or user should be able to contribute both algorithms and data to an LLM model, and in turn receive a fraction of the future profits from the model. AI should be accessible for, relevant to, and owned by everyone. Blockchain networks will enable users to provide feedback, contribute data for potential profits from model monetization, and enable developers to have visibility and the ability to compose and fine-tune algorithms with verifiability and built-on lineage. The sovereignty of both data and algorithm will be enabled by web3 innovations such as content-addressing/IPFS and Urbit. Reproducibility of training data output via Blockchain’s immutable record of past ETL operations and queries will also be possible.

Installation of Incentives for Data Contribution

Problem

Today, the most valuable consumer data is proprietary to big tech platforms as an integral business moat. The tech giants have little incentive to ever share that data with outside parties.

What about getting such data directly from data originators/users? Why can’t we make data a public good by contributing our data and open-source it for talented data scientists to use?

Simply put, there’s no incentive or coordination mechanism for that. The tasks of maintaining data and performing ETL (extract, transform & load) incur significant overhead costs. In fact, data storage alone will become a $777 billion industry by 2030, not even counting computing costs. Why would someone take on the data plumbing work and costs for nothing in return?

Case in point, OpenAI started off as open-source and non-profit but struggled with monetization to cover its costs. Eventually, in 2019, it had to take the capital injection from Microsoft and close off its algorithm from the public. In 2024, OpenAI is expected to generate $1 billion in revenue.

Solution

Web3 introduces a new mechanism called dataDAO that facilitates the redistribution of revenue from the AI model owners to data contributors, creating an incentive layer for crowd-sourced data contribution. Due to length constraints, I won’t elaborate further, but below are two related pieces.

How DataDAO works by HQ Han of Protocol Labs
How data contribution and monetization works in web3, where I dove into the mechanics, missing pieces, and emerging inevitable opportunities in dataDAOs

In conclusion, DePIN is an exciting new category that offers an alternative fuel in hardware to power today’s renaissance of innovations in web3 and AI. Although big tech companies have dominated the AI industry, there is potential for emerging players to compete by leveraging blockchain technologies: DePIN networks lower the barrier to entry in compute costs; blockchain’s verifiable & decentralized properties make true open IA possible; innovative mechanisms, such as dataDAOs, incentivize data contribution; and the immutable and tamper-proof property of Blockchain provides proof of creatorship to address concerns regarding the negative societal impact of AI.

Some good resources for DePIN deep dive

Shoutout to below SMEs and friends for providing input and/or reviewing my draft: Jonathan Victor& HQ Han of Protocol Labs, Zackary Bof exaBITS, Professor Daniel Rock of Wharton, Evan Fisherof Portal Ventures, David Aronchickof Bacalhau, Josh Lehmanof Urbit, and more

About Author: Partner at Portal Ventures | @CuriousCatWang on Twitter

🇨🇳AI为什么离不开区块链——来看DePIN如何助力人工智能

Editor’s Note: This blogpost is a repost of the original content published on 7 June 2023, by Luffistotle from Zee Prime Capital. Zee Prime Capital is a VC firm investing in programmable assets and early-stage founders globally. They call themselves a totally supercool and chilled VC (we tend to agree) investing in programmable assets, collaborative intelligence and other buzzwords. Luffistotle is an Investor at Zee Prime Capital. This blogpost represents the independent view of the author, who has given permission for this re-publication.

Table of Contents

- History of P2P

- Decentralized Storage Network Landscape

- FVM

- Permanent Storage

- Web 3’s First Commercial Market

- Consequences of Composability

Storage is a critical part of any computing stack. Without this fundamental element, nothing is possible. Through the continued advancement of computational resources, a great deal of excess and underutilized storage has been created. Distributed Storage Networks (DSNs) offer a way to coordinate and utilize these latent resources and turn them into productive assets. These networks have the potential to bring the first real commerce vertical into Web 3 ecosystems.

History of P2P

The history of real Peer-to-peer file sharing began to hit the mainstream with the advent of Napster. While there were early methods of sharing files on the internet before this, the mainstream finally joined with the sharing of MP3 files that Napster brought. From this initial starting point, the distributed systems world exploded with activity. The centralization within Napster’s model (for indexing) made it easy to shut down given its legal transgressions, however, it laid the foundation for more robust methods of file sharing.

The Gnutella Protocol followed this trailblazing and had many different effective front-ends leveraging the network in different ways. As a more decentralized version of the napstereqsue query network, it was much more robust to censorship. Even in its day, it experienced censorship. AOL had acquired the developing company Nullsoft, and quickly realized the potential, shutting distribution down almost immediately. However, it had already made it outside and was quickly reverse-engineered. Bearshare, Limewire, and Frostwire are likely the most notable of these front-end applications you may have encountered. Where it ultimately failed was the bandwidth requirements (a deeply limited resource at the time) combined with the lack of liveness and content guarantees.

Remember this? If not do not worry, it has been reborn as an NFT marketplace…

What came next was Bittorrent. This presented a level-up due to the two-sided nature of the protocol and its ability to maintain Distributed Hash Tables (DHTs). DHTs are important because they serve as a decentralized version of a ledger that stores the locations of files and is available for lookup by other participating nodes in the network.

After the advent of Bitcoin and blockchains, people started thinking big about how this novel coordination mechanism could be used to tie together networks of latent unused resources and commodities. What followed soon after was the development of DSNs.

Something that would perhaps surprise many people, is that the history of tokens and P2P networks goes back much farther than the existence of bitcoin and blockchains. What pioneers of these networks realized very quickly was a couple of the following points:

1. Monetizing a useful protocol you have built is difficult as a result of forking. Even if you monetize a front end and serve ads or utilize other forms of monetization, a fork will likely undercut you.

1. Not all usage is created equal. In the case of Gnutella, 70% of users did not share files and 50% of requests were for files hosted by the top 1% of hosts.

Power laws.

How does one remedy these problems? For BitTorrent it is seeding ratios (download/upload ratio), for others, it was the introduction of primitive token systems. Most often called credits or points they were allocated to incentivize good behavior (that promotes the health of the protocol) and stewardship of the network (like regulating content in the form of trust ratings). For a deeper dive into the broader history of all of this, I highly recommend these (now deleted, available via web archive) articles by John Backus:

- Fat Protocols Aren’t New

- What If BitTorrent had a Token?

Interestingly a DSN was part of the original vision for Ethereum. The “holy trinity” as it was called was meant to provide the necessary suite of tools for the world computer to flourish. Legend has it, that it was Gavin Wood’s idea for the concept of Swarm as the storage layer for Ethereum with Whisper as the messaging layer.

Mainstream DSNs followed and the rest is history.

Decentralized Storage Network Landscape

The decentralized storage landscape is most interesting because of the huge disparity between the size of the leader (Filecoin) and the other more nascent storage networks. While many people think of the storage landscape as two giants of Filecoin and Arweave, it would likely surprise most people that Arweave is the 4th largest by usage, behind Storj and Sia (although Sia seems to be declining in usage). And while we can readily question how legitimate the FIL data stored is, even if we handicapped it by say 90%, FIL usage is still ~400x Arweave.

What can we infer from this?

There is clear dominance in the market right now, but the continuity of this is dependent on the usefulness of these storage resources. The DSNs all roughly use the same architecture, node operators have a bunch of unused storage assets (hard drives), and they can pledge these to the network, mine blocks, and earn miner rewards for storing data. While the approaches to pricing and permanence may differ, the most important will be how easy and affordable retrieval and computation of the stored data is.

Fig 1. Storage Networks by Capacity and Usage

N.B:

1. Arweave Capacity is not directly measurable; instead, node operators are always incentivized to have sufficient buffer and to increase supply to meet demand. How big is the buffer? Given the immeasurability of it, we can not know.

1. Swarm’s actual network usage is impossible to tell, we can only look at how much storage has been paid for already. Whether it is used is unknown.

While this is the table of live projects, there are other DSNs in the works. These include ETH Storage, Maidsafe, and others.

FVM

Before going further it is probably worth noting that Filecoin has recently launched the Filecoin Ethereum Virtual Machine (FEVM). The FVM is a WASM VM that can support many different other runtimes on top via hypervisor. For instance, this recently launched FEVM is an Ethereum Virtual Machine runtime on top of the FVM/FIL network. The reason this is worth highlighting is that it facilitates the explosion of activity concerning smart contracts (i.e. stuff) on top of FIL. Before the March launch, there were 11 active smart contracts on FIL, following the FVM launch this has exploded. It benefits from composability in the form of leveraging all the work done in solidity to build out new businesses on top of FIL. This means innovations like quasi-liquid staking type primitives from teams like GLIF, and the various additional financialization of these markets you can build on top of such a platform. We believe this will accelerate storage providers because of the increases in capital efficiency (SPs need FIL to actively mine/seal storage deals). This differs from typical LSDs as there is an element of assessing the credit risk of the individual storage providers.

Permanent Storage

I believe Arweave gets the most airtime on this front, it has a flashy tagline that appeals to the deepest desires of Web 3 Participants:

Permanent Storage.

But what does this mean? It is an extremely desirable property, but in reality, execution is everything. Ultimately execution comes down to sustainability and cost for the end users. Arweave’s model is based on a pay-once, store forever (200 years upfront + deflation of storage value assumption) model. This kind of pricing model works well in a deflationary pricing environment of the underlying asset, as there is a constant goodwill accrual (i.e. old deals subsidize new deals) however the inverse is true in inflationary environments. History tells us this shouldn’t be an issue as the cost of computer storage has more or less been down only since inception but hard drive cost alone is not the whole picture.

Arweave creates permanent storage via the incentives of the Succinct Proof of Random Access (SPoRA) algorithm which incentivizes miners to store all the data and prove they can randomly produce a historical block. Doing so gives them a higher probability of being selected to create the next block (and earn the corresponding rewards).

While this model does a good job of getting node runners to want to store all of the data, it does not mean it is guaranteed to happen. Even if you set super high redundancy and use conservative heuristics to decide the parameters of the model, you can never get rid of this underlying risk of loss.

Source: Twitter

Fundamentally the only way to truly execute permanent storage would be to deterministically force somebody (everybody?) to and throw them in the gulag when they screw up. How do you properly incentivize personal responsibility such that you can achieve this? There is nothing wrong with the heuristic approach, but we need to identify the optimal way to achieve and price permanent storage.

All of this is a long-winded way of getting to the point of asking what level of security we deem acceptable for permanent storage, and then we can think about that pricing over a given time frame. In reality, consumer preferences will fall along the spectrum of replication (permanence), and thus they should be able to decide what this level is and receive the corresponding pricing.

In traditional investing literature and research, there is infamous knowledge about how the benefits of diversification work on the overall risk of a portfolio. While adding stocks initially brings risk reduction to your portfolio, very quickly the diversification benefits of adding stock become more or less not valuable.

I believe the pricing of storage over and above some default standard of replication on the DSN should follow a similar curve but for cost and security of the storage with an increasing amount of replication.

For the future, I am most excited about what more DSNs with easily accessible smart contracting can bring to the market for permanent storage. I think overall consumers will benefit the most from this as the market opens up this spectrum of permanence.

For instance, in the chart above we can think of the area in green as the area of experimentation. It may be possible to achieve exponential decreases in the cost of that storage with minimal changes to the number of replications and level of permanence.

Additional ways of constructing permanence could come from replication across different storage networks rather than just within a single network. These kinds of routes are more ambitious but naturally lead to more differentiated levels of permanence. The biggest question here would be is there some kind of “permanence-free lunch” we could achieve by spreading it across DSNs in the same way we diversify market risk across a portfolio of publicly traded equities?

The answer could be yes, but it depends on node provider overlap and other complex factors. It could also be constructed via forms of insurance, possibly by node runners subjecting themselves to higher levels of slashing conditions in exchange for these assurances. Maintaining such a system would also be extremely complex as multiple codebases and coordination between them are required. Nonetheless, we look forward to this design scape expanding significantly and forwarding the general idea of permanent storage for our industry.

Web 3’s First Commercial Market

Matti tweeted recently about the promise of storage as the use case to bring Web 3 some real commerce. I believe this is likely.

I was having a conversation recently with a team from a layer one where I told them it is their moral imperative to fill their blockspace as stewards of the L1, but even more than this, it is to do this with economic activity. The industry often forgets the second part of its name.

The whole currency part.

Any protocol that launches a token that would not like to be down only is asking for some kind of economic activity to be conducted in that currency. For layer 1s it’s their native token, processing payments (executing computation) and charging a gas fee for doing so. The more economic activity happening, the more gas is used, and the more demand for their token. This is the crypto-economic model. For other protocols, it is likely some kind of middleware SaaS service.

What makes this model most interesting is when it is paired with some kind of commercial good, in the case of classical L1s it is computation. The problem with this is that as it pertains to something like financial transactions, having variable pricing on the execution is a horrible UX. The cost of execution should be the least important part of a financial transaction such as a swap.

What becomes difficult is filling this blockspace with economic activity in the face of this bad UX. While scaling solutions are on the way that will help stabilize this (I highly recommend this whitepaper on Interplanetary Consensus warning PDF), the flooded market of layer 1s makes it difficult to find enough activity for a given one.

This problem is much more addressable when you pair this computational capacity with some kind of additional commercial good. In the case of DSNs, this is storage. The economic activity of data being stored and the related elements such as financing and securitization of these storage providers is an immediate filler.

But this storage also needs to be a functional solution for traditional businesses to use. Particularly those who deal with regulations around how their data is stored. This most commonly comes in the form of auditing standards, geographical restrictions, and making the UX simple enough to use.

We’ve discussed Banyan before in our Middleware Thesis part 2, but their product is a fantastic step in the right direction on this front. Working with node operators in a DSN to secure SOC certifications for the storage being provided while offering a simple UX to facilitate the upload of your files.

But this alone is not enough.

The content stored also needs to be easily accessible with efficient retrieval markets. One thing we are very excited about at Zee Prime is the promise of creating a Content Distribution Network (CDN) on top of a DSN. A CDN is a tool to cache content close to the users and deliver improved latency when retrieving the content.

We believe this is the next critical component to making DSNs widely adaptable as this allows videos to load quickly (think building decentralized Netflix, Youtube, TikTok, etc.). One proxy to this space is our portfolio company Glitter, which focuses on indexing DSNs. This is important because it is a critical piece of infrastructure to improve the efficiency of retrieval markets and facilitate these more exciting use cases.

The potential for these kinds of products excites us as they have demonstrated PMF with high demand in Web 2. Despite this adoption, many face frictions that could benefit from leveraging the permissionless nature of Web 3 solutions.

Consequences of Composability

Interestingly, we think some of the best alpha on DSNs is hiding in plain sight. In these two pieces by Jnthnvctr he shares some great ideas on how these markets will evolve and the products that will come (on the Filecoin side):

- State and Direction of Filecoin

- Business Models on the FVM

One of the most interesting takeaways is the potential for pairing off-chain computation in addition to storage and on-chain computation. This is because of the natural computational needs of providing storage resources in the first place. This natural pairing can add commercial activity to flow through the DSN while opening up new use cases.

The launch of the FEVM makes many of these level-ups possible and will make the storage space much more interesting and competitive. For founders searching for new products to build, there is even a resource of all the products Protocol Labs is requesting people to build with potential grants available.

In Web 2 we learned that data has a kind of gravitational pull, where companies that collected/created a lot of it can reap the rewards and were accordingly incentivized to close it off in such a way to protect that.

If our dreams of user-controlled data solutions become mainstream, we can ask ourselves how the point where this value accrual happens changes. While users become primary beneficiaries, receiving cash flows in exchange for their data, no doubt monetization tools that unlock this potential also benefit, but where and how this data is stored and accessed also changes dramatically. Naturally, this kind of data can sit on DSNs which benefit from the usage of this data via robust query markets. This is a shift from exploitation toward flow.

What comes after could be extremely exciting.

When we think about the future of decentralized storage, it is fun to consider how it might interact with operating systems of the future like Urbit. For those unfamiliar, Urbit is a kind of personal server built with open-source software that allows you to participate in a peer-to-peer network. A true decentralized OS to do self-hosting and interact with the internet is a P2P way.

If the future plays out the way Urbit maximalists might hope, decentralized storage solutions undoubtedly become a critical piece of the individual stack. One can easily imagine hosting all their user-related data encrypted on one of the DSNs and coordinating actions via their Urbit OS. Further to this, we could expect further integrations with the rest of Web 3 and Urbit, especially with projects such as Uqbar Network, which brings smart contracting to your Nook environment.

These are the consequences of composability, the slow burn continues to build up exponentially until it delivers something really exciting. What feels like thumb twaddling becomes a revolution, an alternative path towards existing in a hyper-connected world. While Urbit might not be the end solution on this front (it has its criticisms), what it does show us is how these pieces can come together to open up a new river of exploration.

🇨🇳Web3中的分布式存储和商业模式

Introduction and Overview

This blog post explains what Filecoin’s Virtual Machine (FVM) is, why it matters, and how one might evaluate and prioritize the opportunities it unlocks.

First, we will explain what the Filecoin Virtual Machine is and how Filecoin becoming programmable lays the foundation for composable products and services in the Filecoin economy.
Secondly, we introduce a framework to categorize the universe of potential FVM-powered use cases into distinct opportunity areas. This section discusses specific opportunities unlocked by the FVM and how venture investors, developers, storage providers and other stakeholders may evaluate and prioritize them.
In the third section, we explore how, by commoditizing cloud services and by allowing anyone to create and monetize products and services around data, the FVM stands to fully unleash and augment the full potential of a trillion dollar open data economy.
We conclude by discussing what has energized many hundreds of teams to start building on the FVM, and by providing information on how you can stay up to date on all things FVM.

Please note that this blog post builds on previous posts that explored the Filecoin Virtual machine in the context of the broader Filecoin roadmap, its potential impact on the Filecoin economy and its potential to unlock new use cases in an economy of open data services. You may find it useful to review these articles for further exploration of the topics discussed.

1. The FVM brings user programmability to Filecoin creating a watershed moment for innovation

Filecoin’s larger roadmap aims to turn the services of the cloud into permissionless markets on which any provider can offer their services. The launch of smart contracts (also known as ‘actors’) on FVM on March 14 2023 is a critical component of this larger vision as FVM allows for user programmability around the key services of the Filecoin network: Large-scale Storage, Retrieval, and Compute over data.

What’s exciting about FVM is the ability for any developer¹ to deploy smart contracts on the network. One may compare this to the moment that phones became programmable: The ability to write and install apps significantly augmented what people could do with phones and allowed the devices to go far beyond the capabilities of their pre-installed, hard-coded software. It was a watershed moment for innovation.

Similar dynamics stand to unfold when Filecoin becomes programmable by anyone through FVM. Since Filecoin’s storage market is currently the primary service anchored into the Filecoin blockchain today, the first big unlock will likely be programming around the state of Filecoin’s storage deals. Specifically, anyone will be able to (within certain limits) write software on what, how, when and by/for whom data is stored via Filecoin, the world’s largest open access storage network.

The FVM is a state machine that allows users to program around the state of the Filecoin blockchain

While the FVM does not directly interact with the data stored on the Filecoin network (just its metadata!), it will enable automation of storage and related services (e.g. off-chain data indexers, oracles) and settlement of value. This automation unlocks many new use cases. Illustratively, find a few select opportunities (Perpetual Storage, DataDAOs, Filecoin staking) that arise from assembling the different building blocks that the FVM provides access to below.

The FVM enables the settlement of value as well as building smart contracts that can interact with the metadata of services anchoring into the Filecoin blockchain (e.g., storage)

These illustrative recipes for use cases can be endlessly re-combined with each other and other components (e.g., developer tooling, end-user interfaces etc.). Additionally, they may eventually also leverage other building blocks of the Filecoin economy which anchor into the same blockspace (e.g., retrieval of and computation on content-addressed data) as well as those of other economies (e.g., via cross-chain bridges or oracles). These combinations will result in the emergence of ever more sophisticated services.

2. How we see the use cases unlocked by the FVM

The use cases unlocked by the FVM are loosely grouped into the following opportunity areas:

Data onboarding & management: e.g., tools automating storage deal-making to unlock use cases like perpetual storage
Data curation & monetization: e.g., tools facilitating the collective creation, curation and monetization of valuable datasets
Decentralized finance: e.g., to provide access to collateral for the thousands of storage providers offering services on the network and to create new opportunities for token-holders to participate more actively in the Filecoin economy
Network participant discovery and analytics: e.g., storage provider reputation services or data retrievability oracles that create differentiation opportunities and may enhance the reliability of the decentralized cloud for its users
Integration, interoperability and other services: e.g., cross-chain bridges to integrate with other economies or NFT-standards with built-in storage guarantees, developer tooling and more

It is important to note that due to the recombinant nature of FVM-powered use cases and web3 in more broadly, businesses may cut across several of these opportunity areas.

The use cases unlocked by the FVM can be loosely grouped into five opportunity areas, each unlocking sizable markets

One of the unique advantages of FVM-powered services starting up in the Filecoin economy is access to a large number of storage providers seeking to boost their profitability. In February 2023, storage providers on the Filecoin network had collectively locked up 130M+ FIL in collateral to secure their commitments to keeping the network and their clients’ data safe.

Additionally, storage providers have collectively invested millions of dollars to stand up the data centers that provide many exabytes of capacity. Partly due to the absence of DeFi solutions, access to collateral and other financing is relatively difficult today.² As a result, services that lower these costs and reduce complexity for storage providers and/or unlock new revenue streams represent sizable addressable markets. This advantage is also a distinguishing dynamic that makes Filecoin a differentiated L1.

To lower cost, DeFi lending services may provide cheaper access to capital (which is required for collateralizing storage deals) by using storage provider reputation scores to improve underwriting or by using smart contracts to enforce repayment (e.g., by using future block rewards as collateral) directly to those lenders that made them possible. Such services³ also stand to benefit from being able to tap into the large number of token-holders — many of which are eager to participate in the Filecoin economy more actively.⁴ Furthermore, data on-boarding and management services may lower costs for storage providers by lowering storage client acquisition costs (e.g., via deal aggregators) and/or by reducing the overhead of being a Filecoin storage provider (e.g., by modularizing the operations).

FVM-powered services may also increase SP revenue by enabling data access or encryption solutions (e.g., Medusa, Fission, Lit or Lighthouse) which make Filecoin solutions viable for new customer segments, by giving SPs more opportunities to differentiate (e.g., via reputation or compliance-certification tools), increasing the duration of storage deals (potentially in perpetuity) and allocating Fil+ datacap more efficiently.

Cross-chain bridges and messaging to other (web3) ecosystems which facilitate the importing and exporting of services to and from the Filecoin economy also represent large opportunities for growth — both in utility for the Filecoin network (and thus in demand for Filecoin blockspace) and for increasing utility for web3 overall. In the near term, developers may also capitalize on opportunities around the tooling that makes building the services outlined above more feasible.

Collectively, these opportunities represent markets with hundreds of millions of dollars in revenue potential. Different considerations will inform how the stakeholders may prioritize the development of each opportunity area. Additionally, stakeholders may strategically sequence the development of different opportunity areas to maximize impact. The below framework provides an overview of potential dimensions that one may use to assess opportunities unlocked by the FVM.

Different considerations, such as the ones provided in this non-exhaustive list, will inform how stakeholders may prioritize and sequence the development of use cases in different opportunity areas

3. Empowering more people to create and capture value around data to unleash the full potential of an open data economy

Importantly, Web3 and Filecoin, and the FVM specifically, will continue to commoditize the services of the cloud and allow for better value attribution in the data economy. This section explains why this commoditization is needed and how it opens up access to the data economy and augments the ways in which value can be created, captured, and more equitably shared.

Let the data economy be defined broadly as covering businesses around data analytics and transformation (e.g., software for encryption, transcoding and more), data access and monetization (e.g., on-demand streaming services), the financial services facilitating transactions between market participants as well as the sale of hardware required to power these businesses.

Today, the core tenets underpinning this data economy (i.e., cloud storage and compute, content-delivery networks and more) are dominated by a few large companies. These organizations are successfully leveraging economies of scale to moat their businesses. The resulting dominance of a few large players has not only created central points of failure, but also stifled innovation and kept prices artificially high.

Web3, Filecoin, and the FVM stand to challenge these dominant players: The composability and power of crypto-economic incentives allowed the Filecoin community to rapidly bootstrap into existence an ever-expanding and competitively priced network of storage providers. Filecoin’s open access storage network is already the world’s largest of its kind, its capacity eclipsing even some publicly traded centralized storage providers. As Filecoin moves closer to its vision, infrastructure anchoring into the Filecoin network will also commoditize the services around content-delivery (e.g., via Filecoin Saturn) and compute over data (e.g., via Bacalhau).

Secondly, in today’s data economy value creation often stems from collectively generated datasets (think: users creating content for social networks and other signaling interest for that content), the capture of that value is largely privatized in today’s data economy. This is due to the fact that well-functioning mechanisms that would let data originators participate directly in the capture of value around datasets that they collectively create did not successfully emerge in web2. Instead, the most effective in monetizing collectively created datasets turned out to be advertising.

Web3 tech is, however, structurally better positioned to trace and retroactively reward contributors (even for data points with small marginal value) by using its decentralized ledgers. With the FVM, specifically, any number of unaffiliated parties could use the Filecoin token (or a FVM-powered L2 abstraction thereof) to form an incentivized collective around the creation, preservation and monetization of datasets (that otherwise may never emerge):

Imagine individuals monetizing their contributions to the training data of an AI model or to a social graph
Or researchers using the proceeds of selling access to a database of bacterial pathogens (which may be used to inform antibiotic diagnostics or improve drug discovery) to finance the continued curation and maintenance of said database
Or an ever-expanding encyclopedia rewarding its most active contributors directly as opposed to letting the value flow through an intermediary foundation

As projects like these emerge, the ability to capture value will likely depend to an ever greater degree on having access to proprietary datasets. Additionally, the open nature and the composability of such services also allows users to preserve their agency to exit, remain loyal to, or voice their concerns about data-centric services. This agency preservation will force institutions to think about dataset privacy and sharing in more sophisticated ways than “just trust us”. It may also result in allowing users that own their data to keep a (larger) share.

In summary, the trends around commoditization and better value attribution fueled by FVM and Filecoin will not only lead to fiercer competition and more pressure to innovate across all sectors of the data economy, but also solidify Filecoin’s position as the Layer-1 blockchain uniquely poised to power this open data economy.

Conclusion: FVM creates fertile ground for accelerated growth of the Filecoin ecosystem

FVM stands to unlock the development of applications, markets and organizations that will eclipse the scale and breadth of services offered by centralized cloud providers.

This potential has energized the community tremendously: Weeks before the official launch of the FVM, there are hundreds of teams building with the FVM on the testnet. Additionally, FVM-specific hackathons (e.g., Spacewarp) leading up to the launch of the FVM saw the highest registration numbers for a Filecoin-exclusive hackathon ever.

The Filecoin community continues to foster a radically recombinant ecosystem that accelerates the best ideas. FVM is highly cross-compatible thus catalyzing the adoption of Web3 technologies alongside and in partnership with other ecosystems and L1 blockchains.

Similar to how a computer may support multiple operating systems, Filecoin’s Virtual Machine will support multiple runtimes. The first runtime to launch will be the Filecoin Ethereum Virtual Machine (FEVM). This eases the ramp-up for EVM developers who can leverage tried and tested developer tooling infrastructure and port over existing EVM-based smart contracts.

These all are compelling reasons to build with the FVM. If you are interested in following along on what teams are building with the FVM, check out this non-exhaustive list. Also, review the Space Warp initiative to learn more about the program leading up to the launch of Filecoin’s Virtual Machine, including FVM-specific hackathons, grants, community leaderboards and acceleration programs. To familiarize with the technical details around the FVM consider visiting fvm.filecoin.io or watching one of the many videos introducing the technology.

Disclaimer: Personal views and not reflective of my employer nor should be treated as “official”. This information is intended to be used for informational purposes only. It is not investment advice. Although we try to ensure all information is accurate and up to date, occasionally unintended errors and misprints may occur.

[1]: Before March 14, the Filecoin Virtual Machine powered only so-called ‘native actors’. These built-in actors powered Filecoin’s core logic (e.g., computing and posting the proofs necessary to on-board and continuously verify storage onto the network) thus enabling storage deals and enforcing good behavior by the storage providers. Amending the functionality of these native actors requires the involvement of many stakeholders in the Filecoin community via the Filecoin Improvement Proposal (FIP) process. The complexity of this process limited the freedom to experiment, develop and deploy new functionalities on the Filecoin network.

[2]: The absence of better opportunities to more optimally allocate FIL in the Filecoin economy, may explain why the locking rate of Filecoin (38%) is lower than that of many other Layer-1 blockchains (often >60%)

[3] For example, check out Glif’s work on Filecoin pools

🇨🇳Filecoin虚拟机：工作原理、解锁的用例及其重要性

As we’ve written about previously, Filecoin is building an economy of open data services. While today, Filecoin’s economy is primarily oriented around storage services, as other services (retrieval, compute) come online, the utility of the Filecoin network will compound as they all anchor in and can be triggered from the same block space.

The Filecoin Virtual Machine (FVM) allows us to compose these services together, along with other on-chain services (e.g. financial services), to create more sophisticated offerings. This is similar to how the composability in Defi enables the construction of key financial markets services in a permissionless manner (e.g. auto investment capabilities (Yearn) which builds on liquidity pools like Curve and lending protocols like Compound). The FVM is an important milestone for Filecoin, as it allows anyone to build protocols to improve the Filecoin network and build valuable services for other participants. Smart contracts on Filecoin are unique in that they pair web3 offerings with real world services like storage and compute, provided by an open market.

In this blogpost, we’ll unpack a sample use case and its supporting components for the FVM, how these services might compose together, and the potential business opportunities behind them. One of the neat artifacts of what the FVM enables is for modularity between solutions, meaning components built for one protocol can be reused for others. While designing these solutions, hopefully builders (potentially you!) keep this in mind to maximize the customer set.

This is only a subset of the opportunities that the broader Filecoin community has put forward here, but the aim is to show how these services might intertwine and how the Filecoin economy might evolve.

Note: Over time, it’s likely that a number of these services will migrate to subnets via Interplanetary Consensus — but for this blogpost we want to paint a more detailed picture of what the Filecoin economy might look like early on.

The rest of the blog is laid out as follows:

Motivating Use Case
– Perpetual Storage

Managing Flows of Data
– Aggregator / Caching Nodes
– Trustless Notaries
– Retrievability Oracles

Managing Flows of Funds
– Staking
– Cross Chain Messaging
– Automated Market Makers (AMMs)

Motivating Use Case

Perpetual Storage

Perpetual storage is a useful jumping point — as it motivates a number of the other services (both infrastructure and economic) in the network. Permanent storage (which we’ve argued previously is a subset of perpetual storage) is a market valued at ~$300 million.

The basic goal of perpetual storage is straightforward: enable users to specify terms for how their datasets should be stored (e.g. ensure there are always at least 10 copies of this data on the network) — without having to run additional infrastructure to manage repair or renewal of deals. As long as the funds exist to pay for storage services, the perpetual storage protocol should automatically incentivize the repair and restoration of any under-replicated dataset to meet the specified terms.

This tweet thread shares a mental model for how one might create and fund such a contract. In the simplest form, you can boil down a perpetual storage contract to a minimum set of requirements <cid, number of copies, USDC, FIL, rules for an auction>, and the primitives to verify proper execution. Filecoin’s proofs are critical — as they can tell us when data is under-replicated and allow us to trigger auctions to bring replication back to a minimum threshold.

In order to build the above, a number of services are required. While one protocol could try and solve for all the services required in a monolithic architecture, modular solutions would allow for re-use in other protocols. Below we’ll cover some of the middleware services that might exist to help enable the full end-to-end flow.

Managing the Flow of Data

A sample flow for how data may move across the Filecoin economy.

Aggregator / Caching Nodes

In our perpetual storage protocol, the client specifies some data that should be replicated. This leads to an interesting UX question — in many cases, users don’t want to have to wait for storage proofs to land on-chain to know the data will be stored and replicated. Instead, users might prefer to have their data persisted by an incentivized actor with guarantees that all other services will occur, similar to the role that data on-ramps play (like Estuary and NFT.Storage).

Note: One of the nice things about content addressing is that relying on incentivized actors is totally optional! Users wait for their data to land on-chain themselves if they’d like — or send their data to an incentivized network (as described here) that manages this onboarding process for them.

One solution to this UX question might be to design a protocol for an incentivized IPFS nodes operating with weaker storage guarantees to act as incentivized caches. These nodes might lock some collateral (to ensure good behavior, enact penalties if services are not rendered properly), and when data is submitted return a commitment to store the data Filecoin according to the specified requirements of the client. This commitment might include a merkle proof (showing the client’s data was included inside of a larger set of data that might be stored in aggregate), a max block height by which the deal would start, etc.

Revenue Model: One neat feature of this design for aggregator services is they can take small microtransactions on service on both sides — a small fee from clients (pricing the temporary storage costs, compute for aggregation, bandwidth costs, etc), and potentially an onboarding bounty from an auction protocol (an example described in the Trustless Notaries section below).

Trustless Notaries (Auction Protocols)

To actually make the deal on Filecoin, we might want to automate the process of using Filecoin Plus. Filecoin has two types of storage deals — verified deals and unverified deals. Verified deals refer to storage deals done via the Filecoin Plus program, and are advantageous for data clients as it leverages Filecoin’s network incentives to help reduce the cost of storage.

Today, Filecoin Plus uses DataCap (allocated by Notaries) to help imbue individual clients with the ability to store fixed amounts of data on the network. Notaries help add a layer of social trust to publicly verify that clients are authentic and prevent sybils from malicious actors. This works when clients are human — but it leaves an open question on how one can verify non-human (e.g. smart contract!) actors.

One solution would be to design a trustless notary. A trustless notary is a smart contract, where it would be economically irrational to attempt to sybil the deal-making process.

A basic flow of how a trustless notary interact with clients and storage providers.

What might this look like? A trustless notary might be an on-chain auction, where all participants (clients, storage providers) are required to lock some collateral (proportional to the onboarding rate) to participate. When the auction is run, storage providers can submit valid bids (even negative ones!) accommodating the requirements of the client. By running an auction via a smart contract — everyone can verify that the winning bidder(s) came from a transparent process. Economic collateral (both from the clients and storage providers) can be used to disincentivize malicious actors and ensure closed auctions result in on-chain deals. The auction process might also allow for more sophisticated negotiations between a prospective client and storage provider — not just on the terms of the deal, but on the structure of the payment as well. A client looking to control costs might offer a payment in fiat (to cover a storage provider’s opex) along with a loan in Filecoin (and in return expect a large share of the resulting block rewards).

Revenue Model: For running the auction, the notary maintainer might collect some portion of fees for the deal clearing, collect a fee on locked collateral (e.g. if staked FIL is used as the collateral some slice of the yield), or some combination of both. One nice artifact about running a transparent auction is it can also allow for negative prices for storage (which can be used to fund an insurance fund for datasets, bounties for teams that help onboard new clients, distributed to tokenholders who participate in governance of the trustless notary, etc).

Note: Trustless notaries (if designed correctly) have a distinct advantage of being permissionless — where they can support any number of use cases that might not want humans in the loop (e.g. ETL pipelines that want to automatically store derivative datasets). Today, 393 PiB of data have been stored via verified deals.

In our perpetual storage use case, we’d likely want to be able to leverage the trustless notary to trigger the deal renewals and auctions any time a dataset is under replicated. On the first iteration, this means that storage providers might grab the data out of the caching nodes and on subsequent iterations from other storage providers who have copies of the data.

Retrievability Oracles

For both the deals struck by the trustless notaries, as well as for the caching done by the aggregators — we need to ensure data is properly transferred and protect clients against price gouging. One solution to this problem are retrievability oracles.

Retrievability oracles are consortiums that allow a storage provider to commit to a maximum retrieval price for the data stored. The basic mechanism is as follows:

When striking a deal with a client, a storage provider additionally can commit to retrieval terms.

In doing so, the storage provider locks collateral with the retrievability oracle along with relevant commitment (e.g. max price to charge per GiB for some duration).

In normal operation, the client and the storage provider continue to store and retrieve data as normal.

In the event the storage provider refuses to serve data (against whatever terms previously agreed), the client can appeal to the retrievability oracle who can request the data from the storage provider.
→ If the storage provider serves the data to the oracle, the data is forwarded to the client.
→ If the storage provider doesn’t serve the data, the storage provider is slashed.

Revenue Model: For running the retrieval oracles, the consortium may collect fees (either from storage providers for using the service, fees for accepting different forms of collateral, yield from staked collateral, or perhaps upon retrieval of data on behalf of the client).

By including a retrievability oracle in this loop, we can ensure incentives exist to proper transfer of data at the relevant points of the lifecycle of our perpetual storage protocol.

Building the Economic Loop

With all of the above, we’ve effectively created incentivized versions of the relevant components for the dataflows. Now with this out of the way, it’s worthwhile to focus on the economic flows and how we can ensure that our perpetual storage protocol can fully fund the operations above.

A sample view on the economic flows for enabling perpetual storage. Note other DeFi primitives might help mitigate risk and volatility (e.g. options, perpetuals)

Aside from the initial onboarding costs, the remainder of the costs will come down to storage and repairs. While there are many approaches to calculating the upfront “price”, a conservative strategy will likely involve the perpetual storage protocol generating revenue in the same currencies (fiat, Filecoin) as the liabilities incurred due to storage (i.e the all-in costs of storage). This approach relies on the fact that storage on Filecoin has two types of (fairly predictable) expenses:

The Filecoin portion (the cost of a loan in FIL, the FIL required to send messages over the network) and

The fiat portion (the cost of the harddrive, running the proofs, electricity)

A perpetual storage protocol that builds an endowment in the same mix of currencies as its liabilities can ensure that its costs are fully covered despite the volatility of a single token (as might be the case if the endowment was backed by a single currency). In addition, by putting the capital to work and generating yield, the upfront cost for the client can be reduced.

Staking

To generate yield in Filecoin, the natural place to focus would be on the base protocol of Filecoin itself. Storage providers are required to lock FIL as collateral in order to onboard capacity, and by running Filecoin’s consensus, they earn a return in the form of block rewards and transaction fees. The collateral requirements for the now ~4,000 storage providers on the Filecoin network create a high demand to borrow the FIL token. FIL staking would allow a holder of Filecoin to lend FIL — locking their capital with a storage provider and receiving yield by sharing in the rewards of the storage provider.

Today, programs exist with companies like Anchorage, Darma, and Coinlist to deploy Filecoin with some storage providers, but these programs can service only a subset of the storage providers and don’t support protocols (such as our perpetual storage protocol) that might be looking to generate yield.

Staking protocols can uniquely solve this problem — allowing for permissionless aggregation (allowing smart contracts to generate yield), and deployment of Filecoin to all storage providers directly on-chain. Similar to Lido or Rocketpool in Ethereum, these protocols could also create tokenized representations of the yield bearing versions of Filecoin — further allowing these tokenized representations to be used as collateral in other services (e.g. the trustless notary, retrievability oracles listed above).

Revenue Model: Staking protocols can monetize in a number of ways — including taking a percentage of the yield generated from deployed assets.

Note: Today, roughly 34% of the circulating supply of Filecoin is locked as collateral, less than half of some proof of stake networks (e.g. 69% for Solana, 71% for Cardano).

Cross Chain Messaging

The other portion of the storage costs (the fiat denominated portion) will need to generate yield — and while it makes sense that some of these solutions might be deployed on the FVM, it’s worth discussing the areas where DeFi in other ecosystems might be used to fund operations on Filecoin.

Cross chain messaging could connect the Filecoin economy to other ecosystems allowing perpetual storage protocols to create pools for their non-Filecoin assets (e.g. USDC) on other networks. This would allow these protocols to generate yield on stablecoins in deeper markets (e.g. Ethereum) and bridge back portions as needed to Filecoin when renewing deals. Perpetual storage protocols can offer differentiated sources of recurring demand for these lending protocols, as they likely will have a much more stable profile in terms of their deployment capital given their cost structure — similar to pension funds in the traditional economy.

Given an early source of demand for many perpetual storage protocols include communal data assets (e.g. NFTs, DeSci datasets) which primarily involve on-chain entities, it’s likely that over time we’ll see steady demand for these cross chain services. For cross chain messaging protocols, this offers a unique opportunity to capture value between the “trade” of these different economies — as services are rendered on either side.

Automated Market Makers (AMMs)

One last component worth mentioning in the value flow for our perpetual storage protocol is a need for AMMs. The protocols listed above offer solutions for yield generation, but at the moment of payment conversion of assets will likely need to happen (e.g. converting from a staked representation of Filecoin to Filecoin itself). This is where AMMs can help!

Outside of helping convert staked representations of Filecoin to Filecoin, AMMs can also be useful for allowing perpetual storage protocols to accept multiple types of currencies for payment (e.g. allowing ETH or SOL to be swapped into the appropriate amounts of FIL and stablecoins to fund the perpetual storage protocol). These conversions might happen on other chains as well — but similar to the traditional economy, it’s likely that over time we’ll see trade balances emerge between these economies and swaps to happen on both sides.

Conclusion

These examples are a subset of the use cases and business models that the FVM enables. While I focused on tracing the flow of data and value through the Filecoin economy, it’s worth underscoring this is just a single use case — many of these components could be re-used for other data flows as well. Given all services require remuneration, these naturally will tie to economic flows as well.

As Filecoin launches its retrieval markets and compute infrastructure, the network will support more powerful interactions — as well as creating new business opportunities to connect services to support higher order use cases. Furthermore, as more primitives are built out on the base layer of Filecoin (e.g. verifiable credentials around the greenness of storage providers, certifications for HIPAA compliance), you can imagine permutations of each of the base primitives built above allowing for more control at the user level about the service offerings they wish to consume (e.g. a perpetual storage protocol that will only allow green storage providers to participate in storing their data).The FVM dramatically increases the design space for developers looking to build on the hardware and services in the Filecoin economy — and can hopefully also provide tangible benefits for all of web3, allowing for existing protocols and services to now be connected to a broader set of use cases.Disclaimer: Personal views and not reflective of my employer nor should be treated as “official”. This information is intended to be used for informational purposes only. It is not investment advice. Although we try to ensure all information is accurate and up to date, occasionally unintended errors and misprints may occur.Special thanks to @duckie_han for helping shape this.

🇨🇳Filecoin虚拟机(FVM)上的主要商业模式

Introduction

The Filecoin network launched in October 2020, introducing an incentive layer to the IPFS protocol and enabling open services for data. Today, the network primarily focuses on storage as an open service — however, an earlier blog post summarizing the direction of the network presents the larger vision for Filecoin to also “include the infrastructure to store, distribute, and transform data.”

The Filecoin “master plan” to execute this vision begins with accumulating a critical mass of hardware resources (storage capacity and compute power). This is important, as the only way web3 infrastructure will credibly subsume (not just compete with) the traditional cloud, is if it can operate at orders of magnitude and scale that exceed current offerings. While no web3 protocol has achieved this yet, Filecoin has made the most meaningful progress.

Furthermore, to drive long-term demand for network resources (capacity, retrieval, and compute power), it’s critical to bootstrap the network with useful data and to develop the software and tooling to enable compute and composable services on top of data. Ultimately, the demand for these services will be the basis for a robust economy on top of the Filecoin blockchain.

Filecoin’s tokenomics were designed with this long-term vision in mind and have helped incentivize the network’s rapid growth and development; to date, the network contains >16 EiB of committed storage capacity, with an explosion of ecosystem developments driving demand for network resources. This growth and continued success is due to the efforts of the Filecoin community, an interconnected network of Storage Clients, Developers, Storage Providers (SP’s), Ecosystem Partners, and Token Holders.

Filecoin the Token

This global digital economy also requires a singular valid currency for transactions. Since Filecoin is a permissionless market with cryptographically verifiable goods, design constraints could only be satisfied with a native utility token, the filecoin (see Engineering the Filecoin Economy for further color). The FIL token serves a number of functions:

Pays for messages on-chain.
Used as collateral creating economic incentives for reliable data storage over time and to secure the blockchain (and subnets or “shards” once Interplanetary Consensus goes live!).
Burned to regulate shared resources (blockspace).

Unlike other storage networks, the Filecoin token is primarily concerned with incentivizing reliable services and facilitating the on-chain economy. The storage market exists off-chain, but is anchored on-chain by messages containing cryptographic proofs of storage. Crucially, this means that the token can accrue value without creating undue pressure on users of the network’s various services (storage, retrieval, compute, etc), while Token Holders can still benefit from FIL consumption due to increased utilization (i.e. demand for blockspace).

Cryptoeconomics for Token Holders

Filecoin’s Token Allocation

Filecoin’s cryptoeconomic constructions help ensure that value accrual for participants aligns with the long-term utility of the protocol. As such, the initial allocation of Filecoin at network launch was designed to support a protocol that incentivizes sustainable value creation (see below).

Source: Filecoin Spec. For illustrative purposes only.

As of current protocol specifications, a maximum of 2 Billion FIL will ever be created; of this, 70% was allocated for storage and related services (i.e. minting tokens to reward Storage Providers), and 20% (vesting over 6 years, starting October 2020) was allocated to Protocol Labs and the Filecoin Foundation to support network development, adoption, and ecosystem growth. The remainder was FIL allocated to SAFT investors, which began vesting in October 2020 over time periods ranging between 6 months and 3 years.

2 Billion FIL is the maximum amount that could theoretically be minted and vested. However, it may not be the amount of FIL that enters the network’s circulating supply:

The 300 Million FIL held as a Mining Reserve would require a protocol upgrade to be tapped — meaning the community determines how much (if any) of this should be released.
The growth of the network requires utilization and consumption of the token, reducing the available token supply. As of September 2022 end, ~520 Million FIL has been minted or vested; of this, approximately 70% is in circulation since significant amounts of FIL is burned (permanently removed from circulation) due to network transaction fees, or locked as collateral to secure the network and incentivize reliable storage. As the network matures, the rate of token emissions (vesting and minting) is expected (and was designed) to taper, since Filecoin has finite vesting schedules and a minting model in which emissions are indexed to network growth.

Token Supply: Sources and Sinks

Token Holders can benefit from an understanding of the sources and sinks that determine token supply over time because it potentially informs their relative purchasing power.

For informational purposes only and may be based on assumptions/third party info.

Minting, Vesting, Locking, and Burning contribute to the net inflow or outflow of tokens from the circulating supply. See below for the evolution of Filecoin’s token supply since mainnet launch.

Source: Starboard Ventures. Data as of September 30, 2022. This chart represents purely historical data. Filecoin circulating supply cannot be accurately predicted because it is subject to market inputs.

Currently, the network is expanding; tokens vesting for The Filecoin Foundation and Protocol Labs support ecosystem development and growth, while block rewards for Storage Providers subsidize the onboarding of deals and storage capacity. This presents a tremendous opportunity for Token Holders to participate in the growth of the ecosystem since they can lend to SP’s, facilitating SP’s much-needed access to FIL, and share in resulting network rewards. In this way, Token Holders can index themselves to token emissions whilst also supporting network growth and advancing Filecoin’s vision.

Ecosystem Roadmap: Implications for Token Holders

We are at an incredibly exciting point in the Filecoin Ecosystem, with continued improvements in User Programmability, Data Onboarding, Data Retrievability, Scalability, and Computation slated for implementation over the next 4 quarters. These initiatives should have positive effects for Token Holders through supporting client demand, unlocking various network use cases, and ultimately driving usage of the token. Some of these important improvements as well as their potential economic implications are discussed below.

Filecoin Virtual Machine (FVM)

The Filecoin Virtual Machine unlocks boundless possibilities, ranging from programmable storage primitives, to cross-chain interoperability bridges, to data-centric Decentralized Autonomous Organizations (DAOs), and Layer 2 solutions. In short, this means smart contracts and user programmability are coming to Filecoin! Slated for release in H1 2023, this upgrade to the network should positively impact Token Holders since it may increase FIL’s use cases, and will likely impact the “sinks” component of the circulating supply equation (FIL Locked and Burnt).

FVM and “Sinks” to Token Supply

As long as there is action or utility on the network, FIL tokens will be consumed (“burned”) to compensate for the computation and storage resources on-chain messages consume. Introducing smart contract capabilities may increase demand for block space, resulting in increased burnt FIL. This rate of token consumption is in the hands of the community as network participants compete for on-chain resources.

Furthermore, the FVM protocol upgrade may increase the amount of FIL put to use and/or locked. Currently, the majority of FIL locked on the network comes from Storage Provider collateral. FVM introduces the possibility of substantial funds locked to support various smart contract applications. Decentralized Finance (DeFi) protocols are just a subset of applications that may leverage Filecoin’s Proof of Useful Work chain, not only increasing token burn, but also generating new use cases for locking/staking FIL.

The FVM is unique in that it brings expressivity to the Filecoin network — while today you can contract a storage provider to store your data, with the FVM you can add additional rules, automation, and composability with other services (e.g. DeFi). As other components of the roadmap land (retrieval markets, compute over data, etc) — we expect the FVM to facilitate more sophisticated offerings leveraging Filecoin’s base primitives, promoting broader network adoption.

Note: For a flavor of what might be possible, this tweet thread might help elucidate how one might use smart contracts and the base primitives of Filecoin to build more sophisticated offerings.

Filecoin Plus and Data Onboarding

Filecoin Plus (FIL+) emerged as a pragmatic solution to incentivize productive use of the Filecoin Network. Since it is difficult to algorithmically distinguish real useful data from generated randomness, FIL+ introduced a layer of social trust to the network. In the FIL+ program, clients that undergo a verification process are granted a novel resource, DataCap, with which they can spend and make storage deals with SP’s. SP’s are incentivized to enter into storage deals with Clients who spend DataCap, because FIL+ deals boost the amount of block rewards they receive as compared to regular (non-FIL+) deals, or through committing storage capacity to the network. In order to receive these boosted rewards from FIL+, Storage Providers put up more collateral (roughly 10x the collateral for non-FIL+ storage sectors), and lock more tokens on the network.

Year to date, we have seen robust growth in data from FIL+ clients, indicating increasing amounts of useful data onboarded.

Source: Starboard Ventures. Data as of September 30, 2022. For illustrative purposes only.

From a tokenomics perspective, this trend persisting may increase tokens locked on the network and decrease the circulating token supply. More importantly, as the Filecoin network grows towards its mission of storing useful data, so too may the consumption and utilization of the token.

Interplanetary Consensus

Interplanetary Consensus (IPC) is an upcoming network upgrade designed to increase the use cases that Filecoin can support while also improving the network’s scalability, throughput, and finality. An exciting blockchain innovation that horizontally scales the chain with subnets (shards), IPC also has some potential cryptoeconomic implications, and may increase the:

amount of FIL burned while also decreasing gas fees for users due to scalability improvements coupled with FVM utilization.
amount of FIL locked (or staked) as collateral in order to secure subnets.
demand for FIL consumption with shards supporting customized on-chain use cases.

Conclusion

As a novel data storage and application network, Filecoin’s mission is to create a decentralized, efficient, and robust foundation for humanityʼs information. Developments in the Filecoin ecosystem to support this mission are numerous, ongoing, and can positively impact Token Holders. By design, the protocol’s cryptoeconomics intend to reward long-term participation and contribution to the network. Community participants can benefit from understanding the tokenomics, and, in particular the sources of token inflow and outflow that undergird the token economy. Looking forward, continuing improvements to the protocol may present new and exciting use cases for the network and the token, potentially enabling further Token Holder engagement and positively driving the value proposition of this fast-growing ecosystem.

Many thanks to Jon Victor (@jnthnvctr), as well as to the TLDR and CryptoEconLab teams for their help shaping this.

🇨🇳Filecoin 加密经济学：进化中的经济体

Overview

There is an overwhelming amount of work going on in the Filecoin ecosystem, and it can be difficult to see how all the pieces fit together. In this blog post, I’m going to explain the structure of Filecoin and various components of the roadmap to hopefully simplify navigating the ecosystem. This blog is organized into the following sections:

What is Filecoin?
Diving into the Major Components
Final Thoughts

This post is intended to be a primer on the major goings-on in Filecoin land; it is by no means exhaustive of everything happening! Hopefully, this post serves as a useful anchor and the embedded links are jumping-off points for the intrepid reader.

What is Filecoin?

My short answer: Filecoin is enabling open services for data, built on top of the IPFS protocol.

IPFS allows data to be uncoupled from specific servers —reducing the siloing of data to specific machines. In IPFS land, the goal is to allow permanent references to data — and do things like compute, storage, and transfer — without relying on specific devices, cloud providers, or storage networks. Why content addressing is super powerful and what CIDs unlock is a separate topic — worthy of its own blog post — that I won’t get into here.

Filecoin is an incentivized network on top of IPFS — in that it allows you to contract out services around data on an open market.

Today, Filecoin focuses primarily on storage as an open service— but the vision includes the infrastructure to store, distribute, and transform data. Looking at Filecoin through this lens, the path the project is pursuing and the bets/tradeoffs that are being taken become clearer.

It’s easier to bucket Filecoin into a few major components:

There are 3 core pillars of Filecoin, enabled by 2 critical protocol upgrades

Storage Market(s): Exists today (cold storage), improvements in progress.
Retrieval Market(s): In progress
Compute over Data (Off-chain Compute): In progress
FVM (Programmable Applications): In progress
Interplanetary Consensus (Scaling): In progress

Diving into the Major Components

Storage Market(s)

Storage is the bread and butter of the Filecoin economy. Filecoin’s storage network is an open market of storage providers — all offering capacity on which storage clients can bid. To date, there are 4000+ storage providers around the world offering 17EiB (and growing) of storage capacity.

Filecoin is unique in that it uses two types of proofs (both related to storage space and data) for its consensus: Proof-of-replication (PoRep) and Proof-of-Spacetime (PoST).

PoRep allows a miner to prove both that they’ve allocated some amount of storage space AND that there is a unique encoding of some data (could be empty space, could be a user’s data) into that storage space. This proves that a specific replica of data is being stored on the network.
PoST allows a miner to prove to the network that data from sets of storage space are indeed still intact (the entire network is checked every 24 hrs). This proves that said data is being stored (space) over time.

These proofs are tied to economic incentives to reward miners who reliably store data (block rewards) and severely penalize those who lose data (slashing). One can think of these incentives like a cryptographically enforced service-level agreement, except rather than relying on the reputation of a service provider — we use cryptography and protocols to ensure proper operation.

In summary, the Filecoin blockchain is a verifiable ledger of attestations about what is happening to data and storage space on the network.

A few features of the architecture that make this unique:

The Filecoin Storage Network (total storage capacity) is 17EiB of data — yet the Filecoin blockchain is still verifiable on commodity hardware at home. This gives the Filecoin blockchain properties similar to that of an Ethereum or a Bitcoin, but with the ability to manage internet-scale capacity for the services anchoring into the blockchain.
This ability is uniquely enabled by the fact that Filecoin uses SNARKs for its proofs — rather than storing data on-chain. In the same way zk-rollups can use proofs to assert the validity of some batched transactions, Filecoin’s proofs can be used to verify the integrity of data off-chain.
Filecoin is able to repurpose the “work” that storage providers would normally do to secure our chain via consensus to also store data. As a result, storage users on the network are subsidized by block rewards and other fees (e.g. transaction fees for sending messages) on the network. The net result is Filecoin’s storage price is super cheap (best represented in scientific notation per TiB/year).
Filecoin gets regular “checks” via our proofs about data integrity on the network (the entire network is checked 24 hrs!). These verifiable statements are important primitives that can lead to unique applications and programs being built on Filecoin itself.

While this architecture has many advantages (scalability! verifiability!), it comes at the cost of additional complexity — the storage providing process is more involved and writing data into the network can take time. This complexity makes Filecoin (as it is today) best suited for cold storage. Many folks using Filecoin today are likely doing so through a developer on-ramp (Estuary.tech, NFT.Storage, Web3.Storage, Chainsafe’s SDKs, Textile’s Bidbot, etc) which couples hot caching in IPFS with cold archival in Filecoin. For those using just Filecoin alone, they’re typically storing large scale archives.

However, as improvements land both to the storage providing process and the proofs, expect more hot storage use cases to be enabled. Some major advancements to keep an eye on:

✅ SnapDeals — coupled with the below, storage providers can turn the mining process into a pipeline, injecting data into existing capacity on the network to dramatically lessen time to data landing on-chain.
🔄 Sealing-as-a-service / SNARKs-as-a-service — allowing storage providers to focus on data storage and outsource expensive computations to a market of specialized providers.
🔄 Proofs optimizations — tuning hardware to optimize for the generation of Filecoin proofs.
🔄 More efficient cryptographic primitives — reducing the footprint or complexity of proof generation.

Note: All of this is separate from the “read” flow — which techniques for faster reads exist today via unsealed copies. However, for Filecoin to get to web2 latency we will need Retrieval Market(s), discussed in the next section.

Retrieval Market(s)

The thesis with retrieval markets is straightforward: at scale, caching data at the edge via an open market can solve for the speed of light problem and result in performant delivery at lower costs than traditional infrastructure.

Why might this be the case? The argument is as follows:

The magic of content addressing (using fingerprints of content as the canonical reference) means data is verifiable.
This maps neatly to building a permissionless CDN — meaning anyone can supply infrastructure and serve content — as end users can always verify that the content they receive back is the content they requested (even from an untrusted computer).
If anyone can supply infrastructure into this permissionless network, a CDN can be created from a market of edge-caching nodes (rather than centrally planning where to put these nodes) and use incentive mechanisms to bootstrap hardware — leading to the optimal tradeoff on performance and cost.

The way retrieval markets are being designed on Filecoin, the aim is not to mandate a specific network to be used — rather to let an ecosystem evolve (e.g. Magmo, Ken Labs, Myel, Filecoin Saturn, and more) to solve the components involved with building a retrieval market.

Source: https://www.youtube.com/watch?v=acqTSORhdoE&ab_channel=Filecoin (From April ‘22)

This video is a good primer on the structure and approach of the working group and one can follow progress here.

Note: Given latency requirements, retrievals happen off-chain, but the settlement for payment for the services can happen on-chain.

Compute over Data (Off-chain Compute)

Compute over data is the third piece of the open services puzzle. When one thinks of what needs to be done with data, it’s typically not just storage and retrieval — users also want to be able to transform the data. The goal with these compute over data protocols are generally to perform computation over IPLD.

For the unfamiliar, IPLD aims to be the data layer for content-addressed systems. It can be used to describe a filesystem (like UnixFS which IPFS uses), Ethereum data, Git data, — really anything that is hash linked. This video might be a helpful primer.

The neat thing about IPLD being generic is that it can be an interface for all sorts of data — and by building computation tools that interact with IPLD, we reduce the complexity for teams building these tools to have their networks be compatible with a wide range of underlying types of data.

Note: This should be exciting for any network building on top of IPFS / IPLD (e.g. Celestia, Gala Games, Audius, Ceramic, etc)

Of course, not all compute is created equal — and for different use cases, different types of compute will be needed. For some use cases, there might be stricter requirements for verifiability — and one may want a zk proof along with the result to know the output was correctly calculated. For others, one might want to keep the data entirely private — and so instead might require fully homomorphic encryption. For others, one may want to just run batch processing like on a traditional cloud (and rely on economic collateral or reputational guarantees for correctness).

Source: https://www.youtube.com/watch?v=-d4iJm-RbyA&t=537s&ab_channel=ProtocolLabs

There are a bunch of teams working on different types of compute — from large scale parallel compute (e.g. Bacalhau), to cryptographically verifiable compute (e.g. Lurk), to everything in between.

One interesting feature of Filecoin is that the storage providers have compute resources (GPUs, CPUs — as a function of needing to run the proofs) colocated with their data. Critically, this feature sets up the network well to allow compute jobs to be moved to data — rather than moving the data to external compute nodes. Given that data has gravity, this is a necessary step to set the network up to support use cases for compute over large datasets.

Filecoin is set up well to have compute layers be deployed on top as L2s.

One can follow the compute over data working group here.

FVM (Programmable Applications)

Up until this point, I’ve talked about three services (storage, retrieval, and compute) that are related to the data stored on the Filecoin network. These services and their composability can lead to compounding demand for the services of the network — all of which ultimately anchor into the Filecoin blockchain and generate demand for block space.

But how can these services be enhanced?

Enter the FVM — Filecoin’s Virtual Machine.

The FVM will enable computation over Filecoin’s state. This service is critical — as it gives the network all the powers of smart contracts from other networks — but with the unique ability to interact with and trigger the open services mentioned above.

With the FVM, one can build bespoke incentive systems to make more sophisticated offerings on the network:

DataDAOs
Retrievability Oracles
Perpetual storage contracts / storage endowments
Repair bounties
Undercollateralized lending markets for storage providers
ETL pipelines
… and so much more

Filecoin’s virtual machine is a WebAssembly (WASM) VM designed like a hypervisor. The vision with the FVM is to support many foreign runtimes, starting with the Ethereum Virtual Machine (EVM). This interoperability means Filecoin will support multiple VMs — on the same network contracts designed for the EVM, MoveVM, and more can be deployed.

By allowing for many VMs, Filecoin developers can deploy hardened contracts from other ecosystems to build up the on-chain infrastructure in the Filecoin economy, while also making it easier for other ecosystems to natively bridge into the services on the Filecoin network. Multiple VM support also allows for more native interactions between the Filecoin economy and other L1 economies.

Note the ipld-wasm module — the generalized version of this will be the IPVM work (which could be backported here). Source: https://fvm.filecoin.io

The FVM is critical as it provides the expressiveness for people to deploy and trigger custom data services from the Filecoin network (storage, retrieval, and compute). This feature allows for more sophisticated offerings to be built on Filecoin’s base primitives, and expand the surface area for broader adoption.

Note: For a flavor of what might be possible, this tweet thread might help elucidate how one might use smart contracts and the base primitives of Filecoin to build more sophisticated offerings.

Most importantly, the FVM also sets the stage for the last major pillar to be covered in this post: interplanetary consensus.

One can follow progress on the FVM here, and find more details on the FVM here.

Interplanetary Consensus (Scaling)

Before diving into what interplanetary consensus is, it’s worth restating what Filecoin is aiming to build: open services for data (storage, retrieval, compute) as credible alternatives to the centralized cloud.

To do this, the Filecoin network needs to operate at a scale orders of magnitude above what blockchains are currently delivering:

Product requirements for the Filecoin network.

Looking at the above requirements, it may seem contradictory for one chain to target all of these properties. And it is! Rather than trying to force all these properties at the base layer, Filecoin is aiming to deliver these properties across the network.

With interplanetary consensus, the network allows for recursive subnets to be created on the fly. This framework allows each subnet to tune its own trade off between security and scalability (and recursively spin up subnets of its own) — while still checkpointing information to their respective parent subnets.

This setup means that while Filecoin’s base layer can be highly secure (allowing many folks to verify at home on commodity hardware) — Filecoin can have subnets that are natively connected that can make different trade offs, allowing for more use cases to be unlocked.

In this diagram, the root would be the Filecoin base layer. Source: https://research.protocol.ai/blog/2022/scaling-blockchains-with-hierarchical-consensus/

A few interesting properties based on how interplanetary consensus is being designed:

Each subnet can spin up their own subnets (enabling recursive subnets)
Native messaging up, down, and across the tree — meaning any of these subnets can communicate with each other
Tunable tradeoffs between security and scalability (each subnet can choose their own consensus model and can choose to maintain their own state tree).
Firewall-esque security guarantees from children to parents (think of each subnet as being like a limited liability chain up to the tokens injected from the perspective of the parent chain).

To double click on some of the things interplanetary consensus sets Filecoin up for:

Because subnets can have different consensus mechanisms, interplanetary consensus opens the door for subnets that allow for native communication with other ecosystems (e.g. a Tendermint subnet for Cosmos).
Enabling subnets to tune between scalability and security (and enabling communications to subnets that make different trade offs) means Filecoin can have different regions of the network with different properties. Performant subnets can get hyper fast local consensus (to enable things like chat apps) — while allowing for results to checkpoint into the highly secure (and verifiable and slow) Filecoin base layer.
In a very high throughput subnet (a single data center, running a few nodes) — the FVM/IPVM work could be used to simply task schedule and execute computation directly “on-chain” — with native messaging and payment bubbling back up to more secure base layers.

Learn more by reading this blogpost and following the progress of ConsensusLab. This Github discussion may also be useful to contextualize IPC vs L2s.

Final Thoughts

So, after reading all the above, hopefully clearer what Filecoin is — and how it’s not exactly like any other protocol out there. Filecoin’s ambition is not just to be a storage network (as Tesla’s ambition was not to just ship the Roadster) — the goal is to facilitate a fully decentralized web powered by open services.

Compared to most other web3 infra plays, Filecoin is aiming to be substantially more than a single service. Compared to most L1s, Filecoin is targeting a set of use cases that are uniquely enabled through the architecture of the network. Excitingly, this means rather than competing for the same use cases, Filecoin can uniquely expand the pie for what can actually be done on crypto rails.

Disclaimer: Personal views, not reflective of my employer nor should be treated as “official”. This is my distillation of what Filecoin is and what makes it different based on my time in the ecosystem. Thanks to @duckie_han and @yoitsyoung for helping shape this.

🇨🇳Filecoin当前状态和发展方向总结

Follow us on

TL;DR

2023 Retrospective:

2023 Retrospective

2024 Outlook

Final Thoughts

Filecoin’s Retrieval Markets and the RMWG

CDNs and the role of Project Saturn

Saturn approach and traction to date (Aug 2023)

What’s next for Saturn

How Filecoin Incentives Work

Economics of a Storage Provider

Filecoin Staking

Over/ Fully collateralized vs. Undercollateralized

Single Pool vs Multi-Pool

Risks

What about Ethereum Staking/Lending protocols entering the market?

Economics of the Protocol

Is there enough FIL in the market to supply for lending?

Is there enough demand for borrowing?

What does the future look like?

Where I think this can go

The Present Scenario

Using Centralized Systems

Building It Themselves

Doing Nothing

Building Truly Decentralized Compute

Solution 1: Build on top of open-source compute-over-data platforms.

Solution 2: Build on top of decentralized data protocols.

Decentralization is About Maximizing Choices

Reduction of infra cost (compute and storage)

Problem

Solution

1. Pushing up the supply curve to create a more competitive marketplace

2. Balancing the economy of these markets with crypto-economic design

3. Reducing overhead costs

Verification of Creatorship & Humanity

Problem

Solution

Proof of Humanity and Creatorship with cryptographic proof of origination on-chain

Proof of Creatorship & Humanity with Digital Signature

Proof of Authenticity with IPFS & Merkle Tree

Infusion of Democracy in AI

Problem

Solution

Today:

What blockchain will enable

Installation of Incentives for Data Contribution

Problem

Solution

Some good resources for DePIN deep dive

History of P2P

Decentralized Storage Network Landscape

FVM

Permanent Storage

Web 3’s First Commercial Market

Consequences of Composability

Introduction and Overview

1. The FVM brings user programmability to Filecoin creating a watershed moment for innovation

2. How we see the use cases unlocked by the FVM

3. Empowering more people to create and capture value around data to unleash the full potential of an open data economy

Conclusion: FVM creates fertile ground for accelerated growth of the Filecoin ecosystem

Motivating Use Case

Managing the Flow of Data

Building the Economic Loop

Introduction

Filecoin the Token

Cryptoeconomics for Token Holders

Filecoin’s Token Allocation

Token Supply: Sources and Sinks

Ecosystem Roadmap: Implications for Token Holders

Filecoin Virtual Machine (FVM)

FVM and “Sinks” to Token Supply

Filecoin Plus and Data Onboarding

Interplanetary Consensus

Conclusion

Overview

What is Filecoin?

Diving into the Major Components

Storage Market(s)