Editor’s Note: This blogpost is a summarised repost of the original content published on 25 April 2025, by Molly Mackinlay from FilOz. Founded in 2024, FilOz is a team of 14 protocol researchers, engineers, TPMs, and community engineers focused on securing, upgrading, and expanding the Filecoin network.


Introduction

Nearly five years after its launch, Filecoin – now the world’s largest decentralized storage network – serves a diverse range of customer segments where adoption and traction are already taking hold. A strong signal of product-market fit is paying demand: users who see enough value to pay for the service.

This blogpost offers a comprehensive look at Filecoin’s Ideal Customer Profiles (ICPs), analyzes current adoption trends, identifies high-opportunity areas based on payment signals and data volume, and outlines key strategies to drive future client success.


Identifying and Targeting Ideal Customer Profiles (ICPs)

At the heart of this demand push is a clear focus on Ideal Customer Profiles (ICPs) – specific categories of users the network is targeting for adoption. These ICPs represent use cases where Filecoin’s decentralized storage offers immediate value. Paying demand is one of the strongest indicators of product-market fit, and Filecoin is seeing traction in several key segments:

Four Core Vertical Markets:

  1. Large-Scale Data Clients (Primarily Web2):
    Traditionally focused on archival storage, now evolving toward faster reads/writes due to AI workloads. Think multi-exabyte archives with 24-hour retrieval tolerances.
  2. Web2 Object Storage (e.g., AWS S3 alternative):
    Demands fast access, pricing competitiveness, Snowflake integrations, and strict controls over data locality—especially amid rising geopolitical tensions.
  3. Web3 Object Storage:
    Adoption from decentralized websites, NFTs, social apps, and AI agents. Nascent potential includes data DAOs and decentralized AI use cases.
  4. Web3 Chain Storage:
    Early traction from chains like Solana and Cardano, with data from Ethereum L2s scaling up to terabytes and petabytes. Strong potential, but more robust on-ramps are needed.
Source: State of Client Adoption and ICPs

It’s also worth noting that beyond these four verticals, DePINs – particularly those collecting large volumes of consumer data – are a key focus for 2025. In addition, edge computing and AI are emerging as high-potential sectors, driven by their rapidly growing, data-intensive workloads.


Mapping Filecoin’s Strengths to Its Most Promising Use Cases

To effectively serve the right customer segments, it’s essential to ask: what core strengths does the Filecoin network offer – and how can those be applied to the ideal customer profiles that stand to benefit most?

The following slide summarizes Filecoin’s strengths and the lessons learned along the way:

Source: State of Client Adoption and ICPs

Building on Filecoin’s core strengths, the next slide highlights the key Ideal Customer Profiles (ICPs) driving real-world adoption today. It identifies which client segments are actively paying, which have rapidly growing data volumes, and where early adoption is gaining momentum:

Source: State of Client Adoption and ICPs

This alignment between strengths and needs is essential for driving meaningful adoption and long-term growth.


Scaling Client Success & Ideal Customer Profiles (ICPs)

While Filecoin has attracted paying users across various sectors, scaling from a few customers to a broad, thriving user base remains a key challenge. A major bottleneck is the lack of strong on-ramps – specialized Layer 2 solutions – that serve high-volume, fast-growing ICPs like Web3 chain storage. This gap presents a clear opportunity for builders to develop targeted solutions.

Scaling client success ties directly to Filecoin’s 2025 core KPIs:

  • Revenue from on-chain paid storage deals: The main indicator of product-market fit. While payments are happening off-chain, on-chain revenue is near zero. Bridging this gap with Proof of Data Possession (PDP) and Filecoin Web Services (FWS) is critical.
  • Growing a satisfied client base: Paying clients exist, but better transparency and dashboards are needed to track success.
  • Increasing service activity and value accrual: Tools like the USDFC stablecoin and the FIP-100 fee mechanism are in place, but accelerating on-chain payments and fee flows is essential.

To achieve this growth, several strategic steps are required, as outlined below:

Source: State of Client Adoption and ICPs

Conclusion

Filecoin is actively transitioning from a capacity-focused infrastructure to a user-driven ecosystem. With a sharpened focus on ICPs, a maturing network of on-ramps, and powerful protocol innovations, it’s well-positioned to scale demand and adoption. Key challenges – especially around on-chain revenue and retrievability – remain, but the foundation for sustained growth is rapidly taking shape.

To learn more about the State of Client Adoption and ICPs for Filecoin, explore the talk from FDS-6:

Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.

Editor’s Note: This blogpost is a summarised repost of the original content published on 25 April 2025, by HQ Han from Ansa Research. Ansa Research is a research firm focused on distributed infrastructure. The firm covers digital networks aiming to rebuild how internet infrastructure operates.


Executive Summary

Filecoin has now entered the next phase of development, focusing on users, demand generation and adoption. 

In this regard, Filecoin is not alone – while DePIN networks have proven very successful at bootstrapping supply, the sector’s focus has likewise turned towards demand.

There has been some exciting progress on Filecoin’s demand story: 

  1. Growth in the underlying demand for Filecoin’s services 
    • We are seeing data clients in both Web2 and Web3 paying for storage on Filecoin. In some cases, Filecoin has been chosen alongside or over Web2 providers.
    • The storage provider landscape reflects this – whilst capacity and the number of SPs have dropped, quality and data utilization on the network have increased, showing a shift in network resources towards demand.
  2. There is a pathway to scale
    • Filecoin is still expanding its core services to cater to demand with important protocol level upgrades – F3, PDP, and Filecoin Web Services (FWS) 
    • New Filecoin-powered storage solutions have come to market (Akave, Storacha, Recall, etc.) – all of which are already charging for deals within their target markets
    • DeFi continues to scale on FVM – notably, FIL-backed stablecoins coming online 
    • Storing more data on the network and creating multi-service APIs creates the building blocks to bring compute needs to the data 

Filecoin is at an inflection point as its services mature to meet the demands of AI, enterprise and nation-state focus on data locality, and global cost-cutting driving organizations away from costly Web2 cloud providers.

This blog post examines Filecoin’s adoption, including milestones and use cases, its scaling path via infrastructure, tooling, and coordination, and cryptoeconomics, covering incentives, token dynamics, and network sustainability.


Adoption & Client Demand

Over the past year, Filecoin has seen a rise in paying customers, with projections suggesting over 1 EiB of paid storage deals could close in 2025. This would raise the network’s utilization from its current 29% to nearly 100% with fully paid usage. 

The number and quality of leads are growing, driven by efforts from teams like Ansa Research and the Filecoin Foundation, who are actively sourcing paid deals for on-ramps and Storage Providers. Business development has also shifted focus toward high-potential DePIN categories, particularly those collecting large volumes of consumer data as well as Web2 clients with archival and hot/cold storage needs.

Recent notable paid deals include:

  • Heurist / Akave (AI)
  • 375ai / Akave (DePIN)
  • Humanode / Storacha (Identity)
  • Gaianet / Storacha (AI)
  • FanTV / Lighthouse (Video)
  • Intuizi / Akave (Web2 SaaS)
  • The Defiant / Akave (Media)
  • Cornell University astrophysics simulation data / Ramo (Web2 R&D Data)
Source: State of Filecoin to Investors

Utilization and Go-to-Market Momentum

Filecoin’s network utilization has risen to around 29%, signaling increased demand. A growing number of large-scale clients (storing over 1,000 TiB), likely in the enterprise and/or long-term archival space, demonstrates that efforts by on-ramps and Storage Providers (SPs) are on the right path.

After earlier declines as the network shifted its focus from generating supply to generating demand, average daily new deals have recently increased by over 10% from Q3’24 to Q4’24.

Source: State of Filecoin to Investors

Storage Providers Evolving Beyond Capacity

The Storage Provider (SP) landscape is shifting from a focus on raw capacity to delivering useful, client-driven storage. In its early phase, the network prioritized onboarding as much storage as possible, often without regard for actual usage or retrievability. Now, as Filecoin emphasizes demand generation and adoption, SPs are adapting by pursuing paid deals and ensuring robust data retrieval. This marks a clear move from quantity to quality, with incentives increasingly aligned to real-world client needs – reflected in a 388% year-over-year surge in the number of SPs achieving successful retrievals.

Source: State of Filecoin to Investors

Path To Scale

The network’s path to scale is envisioned along three main lines: 

  • Expanding core protocol capabilities with the introduction of Proof of Data Possession (PDP), Fast Finality (F3), and Filecoin Web Services (FWS)
  • Launching new on-ramps and Layer 2 solutions to support vertical-specific adoption
  • Improving economic efficiency through decentralized finance (DeFi)

1. Expanding Core Protocol Services

Filecoin is rolling out key protocol updates, primarily:

  1. Proof of Data Possession (PDP) – shipped as of 8 May 2025. Storage Providers can now participate in PDP SPX, a short-term initiative to onboard select providers to test, validate, and demonstrate PDP.
  2. Fast Finality (F3) – arrived ahead of schedule, going live on Filecoin mainnet in April 2025 and bringing a roughly 100x improvement in transaction finality times
  3. Filecoin Web Services (FWS) – a composable service marketplace for offering multi-service deals
Source: State of Filecoin to Investors

Vision for FWS Architecture

Introduced last year as part of Filecoin’s broader vision, Filecoin Web Services (FWS) marks a major step toward expanding the network’s capabilities beyond storage. At its core, FWS is a composable service marketplace that enables users to bundle multiple services—such as cold and hot storage, retrieval, compute, and encryption—into a single deal.

FWS aims to offer a more flexible and integrated alternative to traditional Web2 cloud platforms. It also opens the door for other DePIN networks to offer and resell their services within the Filecoin ecosystem. With integrated payments and escrow, FWS supports the creation of customizable service combinations, enhancing utility for both consumers and enterprises.

Source: State of Filecoin to Investors

2. New On-Ramps and L2s

A key part of Filecoin’s scaling strategy is the emergence of new storage on-ramps – functioning like Layer 2s – that are launching mainnets and targeting specific verticals. These startups tailor Filecoin’s storage stack to meet the needs of niche markets, helping to establish beachhead use cases. Notable examples include:

  • Akave: Focused on Web2 enterprises, Akave offers a hot storage layer on top of Filecoin, supporting archival, hot/cold storage, and S3 integrations. They’ve also integrated with Snowflake.
  • Storacha: Targeting Web3 applications in social, AI, and identity, they specialize in storage solutions for decentralized platforms.
  • Recall: Aimed at the AI sector, specifically towards AI agent storage and data processing.
Source: State of Filecoin to Investors

3. Scaling DeFi

Filecoin’s DeFi ecosystem is growing, playing a key role in improving economic efficiency across the network. A major focus is on stablecoins, which help retain economic activity within the ecosystem. Secured Finance has introduced USDFC, a FIL-backed stablecoin that allows FIL holders and Storage Providers to use their tokens as collateral instead of selling them—similar to MakerDAO on Ethereum. 

USDFC is now live, as recently announced at FDS-6 in Toronto.

Source: State of Filecoin to Investors

Cryptoecon Update

The central forecast: FIL’s circulating supply growth is expected to slow and may turn negative – i.e., deflationary – by late 2026. This shift stems from a combination of reduced token issuance and increased demand sinks that lock or remove FIL from circulation.

Supply-Side Pressures Easing

Several key developments are reducing new FIL issuance:

  • Vesting Completion: Token vesting from early stakeholders, including Protocol Labs and the Filecoin Foundation, ends in October 2026 – removing a major source of new tokens.
  • Decreasing Block Rewards: FIL block rewards follow a declining emission schedule by design.
  • FIL-Backed Stablecoins: Stablecoins like USDFC by Secured Finance allow FIL holders to use tokens as collateral instead of selling, keeping more value within the network and reducing sell pressure.

Demand-Side Sinks Growing

At the same time, FIL demand is increasing through new utility and locking mechanisms:

  • Rising Collateral Requirements: Storage Provider collateral is set to increase in 2025, partly due to a fix under FIP-81 that enhances locking behavior.
  • Increased Protocol Revenue: FIP-100 is projected to boost FIL-denominated revenue, much of which is burned or otherwise removed from circulation.

Together, these trends suggest a pivotal moment in Filecoin’s economic evolution: a potential transition to a deflationary supply model, signaling a tighter and potentially more valuable FIL economy.
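To make the direction of these forces concrete, below is a deliberately simplified toy model of circulating supply – a minimal sketch, not the mechafil simulation referenced in the disclaimer that follows. Every number (issuance, burn, locking, and the year at which they cross) is an illustrative placeholder, not a forecast; the point is only that supply growth flips sign once burn plus net new locking outpaces issuance.

```typescript
// Toy circulating-supply model: liquid supply changes by issuance minus
// burn minus net new locking. All figures are illustrative placeholders.
interface PeriodFlows {
  year: number;
  issuance: number;     // new FIL entering circulation (rewards + vesting)
  burn: number;         // FIL destroyed (gas burns, penalties, fee burns)
  netNewLocked: number; // additional FIL locked as SP collateral
}

// Net change in circulating supply for one period.
const netChange = (f: PeriodFlows): number => f.issuance - f.burn - f.netNewLocked;

// Placeholder trajectory: issuance tapers (emission decay, vesting ending)
// while burn and locking grow; the sign flip is the "deflationary" turn.
const scenario: PeriodFlows[] = [
  { year: 2025, issuance: 24e6, burn: 8e6, netNewLocked: 6e6 },
  { year: 2026, issuance: 16e6, burn: 12e6, netNewLocked: 8e6 },
  { year: 2027, issuance: 10e6, burn: 16e6, netNewLocked: 10e6 },
];

for (const f of scenario) {
  const delta = netChange(f);
  console.log(f.year, delta >= 0 ? "inflationary" : "deflationary", delta);
  // 2025 inflationary +10M; 2026 deflationary -4M; 2027 deflationary -16M
}
```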

[Disclaimer: Circulating supply analysis is based on a 3rd party model: https://mechafil-jax-web-levers.streamlit.app/ 
These models are based on many assumptions, and should not be relied upon as the source of truth. There are many factors that can and will affect the actual numbers. Simulations should not be relied upon and are for illustrative purposes only. DYOR and adjust the model yourself, or build your own models in Dune.]

Source: State of Filecoin to Investors

Filecoin Data (where to find it and DYOR)

For those looking to dive deeper into the Filecoin ecosystem, Ansa Research has compiled a curated directory of key metrics and data sources. These resources provide essential insights into network health, development trends, and adoption signals – making them useful for both regular monitoring and deeper research.

Whether you’re tracking protocol upgrades, storage deals, or adoption patterns, this data directory is a valuable starting point for your own analysis.

👉 Access the Filecoin Data Directory here


In Conclusion

Filecoin is entering a new phase centered on demand, adoption, and long-term sustainability. With supply successfully bootstrapped, focus has shifted to real usage – evidenced by rising paid storage deals, potentially exceeding 1 EiB in 2025, and a shift toward higher-quality, retrievable data.

Key upgrades like Proof of Data Possession (PDP), faster finality (F3), and new Layer 2 solutions are unlocking capabilities across data and DeFi, including FIL-backed stablecoins that help retain value within the network.

At the same time, token issuance is set to slow, with vesting ending in late 2026 and block rewards declining – while demand sinks like collateral locking (FIP-81) and protocol revenue (FIP-100) increase. Together, these trends suggest a potential shift to a deflationary FIL supply and a more mature, sustainable network economy.

To listen to the entire talk by HQ Han (Ansa Research) at FDS-6, watch here on YouTube:

Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.

Editor’s Note: This blogpost is a summarised repost of the original content published on 26 March 2025, by Luca from FilOz. Founded in 2024, FilOz is a team of 14 protocol researchers, engineers, TPMs, and community engineers focused on securing, upgrading, and expanding the Filecoin network.


Introduction

As Web3 applications mature, fast and reliable access to data becomes just as important as storing it. Whether it’s video streaming, serving assets for dApps, or powering AI agents with on-demand data, retrievability is a foundational piece of user experience.

Unlike traditional cloud providers, where data retrieval is instant and guaranteed by a centralized service (often aided by CDNs), retrieval of data in Web3 introduces new challenges – spanning network performance, data redundancy, storage provider reliability, and incentive alignment. Filecoin has built the world’s largest decentralized storage network, and how to tackle retrievals on that foundation deserves exploration. How should Filecoin evolve its retrieval capabilities in a decentralized internet? What is the best approach for its users?

This guide aims to unpack the state of retrievability on Filecoin, exploring where we are today and where improvements are needed. We aim to cover:

  • An overview of retrievability on Filecoin, including key strategies, challenges, and improvements
  • Retrieval strategies and protocols, their guarantees, and their limitations
  • Payment models and how SPs and clients can select each other
  • Potential protocol-level improvements to enhance retrievability

Retrievability on Filecoin

Filecoin enables decentralized storage, allowing clients to store data with Storage Providers (SPs) and retrieve it on demand. Unlike storage, which is provable through Proof of Replication and Proof of Spacetime, retrieval is a separate process that isn’t inherently provable; different protocols and strategies address this challenge in different ways.
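For intuition on why storage is provable while retrieval is harder, consider the challenge-response shape that storage proofs share. The sketch below is a minimal hash-based toy, assuming a client that precomputes a few challenge answers before uploading – it is not Filecoin’s actual PoRep/PoSt (or PDP) construction, which uses cryptographic proofs over sealed sectors – but it shows why possession can be spot-checked while serving the data back remains a separate question.

```typescript
import { createHash } from "node:crypto";

// Toy possession check: the client precomputes answers to random challenges;
// later, only an SP that still holds the data can answer them. Illustrative
// only - real Filecoin proofs are far more sophisticated.
const sha256 = (bytes: Buffer): string =>
  createHash("sha256").update(bytes).digest("hex");

// Binding a nonce forces the responder to re-hash the full data per challenge.
const makeChallenge = (data: Buffer, nonce: string): string =>
  sha256(Buffer.concat([Buffer.from(nonce), data]));

// Client side: remember (nonce, expected answer) pairs before uploading.
const data = Buffer.from("dataset contents...");
const challenges = ["n1", "n2", "n3"].map((nonce) => ({
  nonce,
  expected: makeChallenge(data, nonce),
}));

// SP side: can only answer correctly while it still holds the data.
const spRespond = (stored: Buffer, nonce: string): string =>
  makeChallenge(stored, nonce);

// Possession is spot-checkable; whether the SP will actually *serve* the
// data quickly (retrieval) is a separate, unproven question.
const { nonce, expected } = challenges[0];
console.log(spRespond(data, nonce) === expected); // true while the data is held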

Retrievability on Filecoin is influenced by factors such as:

  • Network and SP performance
  • Data availability
  • Retrieval optimization protocols

There is no one-size-fits-all solution for retrievability. The best strategy depends on a client’s needs, including retrieval speed, reliability, and cost constraints. Clients can choose simple retrieval from a single SP for non-critical data or more complex solutions like redundancy, SLAs, or off-chain backups for mission-critical files. Advanced protocols, like Spark or CDN integration, offer higher performance but come with added costs or complexity.
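One hypothetical way to encode that decision is a small helper that maps a client’s requirements to one of the strategies above. The names and thresholds below are illustrative assumptions, not a real Filecoin API:

```typescript
// Hypothetical decision helper mapping client requirements to a retrieval
// strategy from the options discussed above. Thresholds are illustrative.
interface RetrievalRequirements {
  maxLatencyMs: number;     // how fast reads must be
  missionCritical: boolean; // can the client tolerate a failed retrieval?
  budget: "low" | "medium" | "high";
}

type Strategy =
  | "single-sp"          // simple retrieval from one SP (non-critical data)
  | "multi-sp-redundant" // replicas across several SPs
  | "sla-backed"         // contractual guarantees with penalties
  | "cdn-cached";        // CDN-style hot layer in front of Filecoin

function chooseStrategy(req: RetrievalRequirements): Strategy {
  if (req.maxLatencyMs < 500 && req.budget !== "low") return "cdn-cached";
  if (req.missionCritical) {
    return req.budget === "high" ? "sla-backed" : "multi-sp-redundant";
  }
  return "single-sp";
}

console.log(chooseStrategy({ maxLatencyMs: 200, missionCritical: true, budget: "high" }));
// -> "cdn-cached": fast reads dominate here; redundancy can be layered on top
```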

Trust plays a crucial role in decentralized storage networks like Filecoin. Without mechanisms that establish trust, retrieval failures or delays can occur, especially under payment strategies like upfront payment. A CDN-like solution (e.g., retriev.org) could address these trust challenges by:

  • Providing monitoring services to ensure SPs meet obligations
  • Offering arbitration to resolve disputes and penalize non-performing SPs
  • Ensuring retrieval promises are backed by financial incentives and penalties

Filecoin offers various retrievability options, from redundancy models to retrieval networks and off-chain solutions.

For the full breakdown on Retrievability Options on Filecoin, read here.

Key Metrics

When evaluating retrievability, clients need to consider several performance metrics that measure different aspects of the data retrieval process. These metrics ensure that data is not only stored but also accessible and retrievable efficiently when required. Key metrics include:

  • Availability Metrics: Measure the likelihood of data being accessible and quickly recoverable when issues arise. High availability and redundancy ensure retrieval even if some systems or replicas fail.
  • Performance Metrics: Assess retrieval speed and responsiveness, impacting user experience. Factors like throughput and latency are influenced by network bandwidth and data availability.
  • Reliability Metrics: Reflect the consistency and stability of data retrieval, including success rate, error rate, and data integrity. High uptime ensures high availability, while a low error rate guarantees data accuracy and successful retrieval attempts.
  • Cost-related Metrics: Help balance performance with cost efficiency, particularly in managing retrieval speed, bandwidth usage, and associated costs.
  • Quality Metrics: These metrics measure the overall quality of the retrieval process, ensuring a satisfactory user experience.

By grouping the metrics into these categories, clients can evaluate retrievability from multiple dimensions, ensuring efficient, reliable, and cost-effective data access.
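One hypothetical way to operationalize these categories is a per-provider scorecard. The fields and weights below simply mirror the groupings above and are illustrative assumptions, not a standardized Filecoin schema:

```typescript
// Illustrative retrievability scorecard mirroring the metric categories
// above. Field names and weights are hypothetical, not a standard schema.
interface RetrievabilityScorecard {
  availability: { uptimePct: number; replicaCount: number };
  performance: { p50LatencyMs: number; throughputMBps: number };
  reliability: { successRatePct: number; integrityFailures: number };
  cost: { pricePerGiBRetrieved: number; egressFees: number };
  quality: { clientSatisfaction: number }; // e.g., a 0-5 survey score
}

// A naive weighted score for comparing providers; real weights would depend
// on the client's use case (archival vs. hot serving).
function score(s: RetrievabilityScorecard): number {
  return (
    0.3 * s.availability.uptimePct +
    0.25 * s.reliability.successRatePct +
    0.25 * (100 - Math.min(s.performance.p50LatencyMs / 10, 100)) +
    0.2 * (100 - Math.min(s.cost.pricePerGiBRetrieved * 100, 100))
  );
}

const exampleSP: RetrievabilityScorecard = {
  availability: { uptimePct: 99.9, replicaCount: 3 },
  performance: { p50LatencyMs: 250, throughputMBps: 80 },
  reliability: { successRatePct: 98, integrityFailures: 0 },
  cost: { pricePerGiBRetrieved: 0.01, egressFees: 0 },
  quality: { clientSatisfaction: 4.5 },
};
console.log(score(exampleSP).toFixed(1)); // ~93 on this rough 0-100 scale
```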

For the full breakdown on key metrics, read here.

Payment Models

Retrievability on Filecoin combines payment options (how value is transferred) with payment strategies (when and under what terms payments occur). This modular structure allows clients and Storage Providers (SPs) to tailor agreements based on performance, trust, and cost considerations. 

Payment Options – methods used to pay for data retrieval:

  • Off-Chain Payments: Peer-to-peer arrangements that are typically fast and gas-free but rely on trust between parties.
  • On-Chain Payments (FIL): The native token used for transparent, trustless retrieval settlements.
  • On-Chain Payments (Stablecoins): e.g., USDFC, offering price stability and compatibility with broader ecosystems.
  • On-Chain Payments (ERC-20 Tokens): Enables use of other tokens for interoperability with DeFi and cross-chain networks.

Payment Strategies – structures for when and how payments are made (a small illustrative model follows this list):

  • Upfront Payment: Clients pay the full cost of retrieval in advance.
  • Pay-to-Retrieve: Payments are made on a per-retrieval basis, similar to a metered model.
  • Periodical Payment: Recurring payments that grant ongoing or unlimited retrieval access.
  • Retrievability Tickets: Prepaid, redeemable claims for future data retrievals.
  • Hybrid Models: Flexible approaches that combine aspects of multiple strategies to optimize for specific needs.
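To show how the two layers compose – an option is the rail, a strategy is the schedule – here is a minimal sketch. The type and function names are hypothetical, not Filecoin protocol definitions:

```typescript
// An option is *how* value moves; a strategy is *when* it moves. A retrieval
// agreement pairs one of each. Names are illustrative, not protocol types.
type PaymentOption =
  | { kind: "off-chain" }                                  // trust-based, gas-free
  | { kind: "on-chain"; token: "FIL" | "USDFC" | string }; // or any ERC-20 symbol

type PaymentStrategy =
  | { kind: "upfront"; total: number }
  | { kind: "pay-to-retrieve"; pricePerRetrieval: number }
  | { kind: "periodical"; pricePerPeriod: number }
  | { kind: "tickets"; ticketPrice: number; ticketsBought: number };
// (Hybrid models would combine several of these.)

interface RetrievalAgreement {
  option: PaymentOption;
  strategy: PaymentStrategy;
}

// What the client owes after `retrievals` retrievals in one billing period.
function amountDue(a: RetrievalAgreement, retrievals: number): number {
  switch (a.strategy.kind) {
    case "upfront":
      return 0; // already settled in advance
    case "pay-to-retrieve":
      return retrievals * a.strategy.pricePerRetrieval;
    case "periodical":
      return a.strategy.pricePerPeriod;
    case "tickets": {
      const uncovered = Math.max(0, retrievals - a.strategy.ticketsBought);
      return uncovered * a.strategy.ticketPrice; // top up beyond prepaid tickets
    }
  }
}

const deal: RetrievalAgreement = {
  option: { kind: "on-chain", token: "USDFC" },
  strategy: { kind: "pay-to-retrieve", pricePerRetrieval: 0.002 },
};
console.log(amountDue(deal, 1_000)); // 2 (USDFC) for 1,000 metered retrievals
```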

Retrieval Services: SP Selection & Client Selection Strategies

Selecting Storage Providers (SPs) for retrievability in Filecoin requires balancing control, cost, reliability, and trust. This process involves two key aspects: the deal-making process and the selection mechanism. Just as clients must carefully choose SPs, SPs also evaluate which client retrieval requests to fulfill, using the same two components – deal-making and selection mechanisms.

Deal-Making Process

The deal-making process determines how clients and Storage Providers (SPs) establish retrieval agreements, balancing control, efficiency, and risk. It involves two key approaches:

  • Direct Negotiation: Clients and SPs engage directly to define retrieval terms, including cost, performance guarantees, and service conditions. This method offers full control but requires manual effort and carries risks such as extended negotiations, misunderstandings, and potential SP unreliability.
  • Automated or Delegated Deal-Making: Intermediaries or automated systems—such as content delivery networks (CDNs), smart contracts, or auction systems—facilitate the process. This reduces manual effort, optimizes terms based on real-time data, and enables market-driven pricing. However, it can introduce additional costs, reduced control, and reliance on third-party mechanisms for enforcement and dispute resolution.

SP Selection Mechanisms (POV of a Client)

Once the deal-making process is determined, the actual selection mechanism defines how the SPs are chosen. These can range from reputation-based systems to auction-based or automated selections. 


For the full breakdown on SP Selection Mechanism, read here.
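As one concrete example of the reputation-based end of that spectrum, the sketch below ranks candidate SPs by observed success rate, latency, and price. The utility weights are arbitrary assumptions for illustration – this is not how Spark or any particular system actually scores providers:

```typescript
// Illustrative reputation-based SP selection: rank candidates by observed
// retrieval success rate, latency, and price. Weights are arbitrary.
interface SPCandidate {
  id: string;
  successRate: number; // 0..1, e.g., from past retrieval spot-checks
  avgLatencyMs: number;
  pricePerGiB: number;
}

function rankSPs(candidates: SPCandidate[]): SPCandidate[] {
  const utility = (sp: SPCandidate) =>
    0.6 * sp.successRate -
    0.2 * (sp.avgLatencyMs / 1000) - // penalize slow responders
    0.2 * sp.pricePerGiB;            // penalize expensive ones
  return [...candidates].sort((a, b) => utility(b) - utility(a));
}

const best = rankSPs([
  { id: "f01234", successRate: 0.99, avgLatencyMs: 300, pricePerGiB: 0.02 },
  { id: "f05678", successRate: 0.80, avgLatencyMs: 120, pricePerGiB: 0.01 },
])[0];
console.log(best.id); // "f01234": reliability outweighs the price difference
```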

Client Selection Mechanisms (POV of a Storage Provider)

Once the deal-making process is determined, the client selection mechanism helps the SP narrow down which clients’ retrieval requests it wishes to fulfill.

For the full breakdown on Client Selection Mechanism, read here.

Next Steps

Retrievability guarantees for data stored on the Filecoin Network are essential for its long-term success and sustainability. However, we also recognize that each user may have distinct needs, preferences, and requirements when it comes to data accessibility and security guarantees.

To address this, we foresee a modular approach that allows users to select from a diverse range of services and combine them in a way that meets their specific retrievability and reliability goals. This flexibility will enable users to tailor their storage solutions to their unique use cases, ensuring both customization and scalability.

A promising path forward for enhancing retrievability guarantees on the Filecoin Network involves integrating advanced protocols and tools. By leveraging technologies and protocols like CDN Gateways, reputation systems, smart contract-powered storage solutions and incentives, we can create a more robust and reliable infrastructure.

These combined innovations will not only improve data accessibility and security but will also foster the overall growth and resilience of the Filecoin ecosystem.

For more pieces from FilOz, check out their Medium page here.

To stay updated on the latest in the Filecoin ecosystem, follow the @Filecointldr handle or join us on Discord.

Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.

2024 has been a pivotal year for Filecoin, with significant progress in the Filecoin Virtual Machine (FVM), Storage, Retrievals and Compute. In this blogpost, we’ll recap the key milestones of 2024 and take a look at the major growth drivers shaping Filecoin’s path into 2025.


Source: Charting Success for Filecoin, 2024

2024 Retrospective

In our earlier blogpost ‘Charting Success for Filecoin 2024’, we mapped out three key priorities for the ecosystem in 2024:

  1. Accelerating Paid Deals: Boosting paid services (storage, retrieval, compute) on Filecoin to generate cashflow for service providers. This helps to support more sustainable hardware funding beyond token incentives.
  2. Growing On-Chain Activity: Increasing activity through programmable services, DeFi, and new use cases.
  3. Becoming Indispensable: Establishing Filecoin as an integral component of other projects and businesses. 

These priorities are not mutually exclusive – they layer onto each other and are all signs that the Filecoin ecosystem is growing increasingly valuable. 

So how did we fare across these priorities in 2024?

1. Accelerating Paid Deals

Paid Deals is an ecosystem-level metric that reflects the volume of paid services within the Filecoin network. FilecoinTLDR is currently tracking this metric here.

In 2024, Filecoin made significant strides in accelerating paid deals by reducing friction for businesses entering the ecosystem, with key advancements like the development of Proof of Data Possession (PDP) and the emergence of Layer 2 solutions.

  • Enabling Efficient Hot Storage with PDP
    • Projected for Q1 2025, Proof of Data Possession (PDP) introduces a new proof primitive to the Filecoin network, marking the first major proof development since Proof of Replication (PoRep) and Proof of Spacetime (PoSt). Unlike PoRep, which excels at cold storage through sealed sectors, PDP is designed for “hot data” – data that needs fast and frequent retrieval.
    • This new proof type enables cost-effective “cache” storage on Filecoin without sealing and unsealing, allowing rapid data onboarding and retrieval. PDP opens the door for a new class of storage providers focused on hot storage and fast retrievals, benefiting on-ramps like Basin, Akave, and Storacha.
  • Scaling Filecoin with L2s
    • In 2024, we saw a rise in Layer 2 solutions built on top of Filecoin (we also covered this in our earlier blogpost “State of L2s on Filecoin”). L2s like Basin, Akave and Storacha enable both horizontal and vertical scaling with secure, customizable subnets. These L2s enhance Filecoin by unlocking new use cases – including managing data-intensive workloads, supporting AI and unstructured data, and powering gaming and privacy-focused applications – all of which create more opportunities for paid deals.

2. Growing On-Chain Activity

Filecoin has made notable progress in accelerating on-chain activity through the FVM, which spurred growth in its DeFi economy. The proposed Filecoin Web Services (FWS) and launch of FIL-collateralized stablecoins are set to further boost this momentum.

Source: Defillama (as of December 16, 2024)
  • DeFi Milestones
    • As of December 16, 2024, more than 4,700 unique contracts have been deployed on FVM, enabling over 3 million transactions. DeFi activity on FVM saw average net deposits exceeding 30M FIL ($200M), driven by staking, liquid staking, and DEXs, with GLIF leading at 62%, followed by FilFi (10%) and SFT Protocol (9%). Net borrows averaged 26M FIL ($173M), highlighting strong growth in Filecoin’s DeFi ecosystem.
  • FIL-Collateralized Stablecoin for the Filecoin Ecosystem
    • USDFC is a FIL-backed stablecoin launched by Secured Finance in Q4 2024 to address key challenges in the Filecoin ecosystem. It introduces stability to a network previously lacking stablecoin options, reducing volatility and enhancing value storage, much like DAI did for Ethereum. 
    • By allowing FIL holders and SPs to collateralize their assets for USD, USDFC helps cover operational costs without selling FIL, preserving asset value and network support. It also boosts liquidity in lending markets by providing FIL-backed stablecoin liquidity, driving more efficient capital flows within the Filecoin ecosystem.

3. Becoming Indispensable

DePIN gained prominence, with Filecoin strengthening its position through key partnerships with AI and compute projects. Meanwhile, on-chain archival received significant recognition through major on-ramp partnerships.

“…thanks to Filecoin for building an awesome decentralized archive layer.” – Anatoly (Solana Co-Founder)

  • Notable On-Ramps of 2024
    • At Solana Breakpoint this year, Filecoin founder Juan Benet highlighted how Filecoin’s zero-knowledge (ZK) storage is securing the entire Solana ledger.
    • Similarly, Cardano apps now have the opportunity to boost data redundancy and decentralization through the Blockfrost integration with Filecoin.
    • SingularityNET’s integration with Filecoin (via Lighthouse) emphasizes the growing need for scalable and cost-effective storage in the AI-driven era, where managing vast amounts of data efficiently is critical.
    • These meaningful partnerships help signal Filecoin as a key player in both the Chain Archival and AI narratives.
Source: Filecoin (X)
  • Compute & AI Partnerships
    • This year, Filecoin has positioned itself as a key player in the growing field of Decentralized AI. The emergence of projects within the ecosystem like Ramo (network participation), Bagel (AI & cryptography research), Swan Chain (AI training and development), and Lilypad (distributed compute for AI) highlights Filecoin’s expanding role in powering AI innovation.

2024 Filecoin Challenges

Despite the immense progress, we noted some challenges that the community faced – though it’s worth bearing in mind that Web3 products are still very early, and forming a credible alternative to the centralized cloud is a huge problem statement.

Product Market Fit:

  • Roadblocks like limited retrievability and high costs (driven by data replication) challenge the efficiency of the Filecoin network.
  • There is a need to make payments easier by allowing transactions directly on the Filecoin network, using methods like stablecoins or flexible payment options.
  • Improving visibility into the onboarding process and using customer data can help refine strategies and boost performance in key areas.

Building a Sustainable Economic Model + Stronger Economic Loops:

Viewing Filecoin as an island economy highlights its focus on accruing value by exporting goods and services while also keeping as much value as possible within the network by minimizing outflows.

Source: Realizing Filecoin’s Vision (Part 2) – Juan Benet
  • A key challenge lies in reducing external outflows while finding ways to boost exports and capture more demand within the ecosystem.
  • Ensuring that transactions remain on-chain is equally crucial to strengthening this economic model and creating stronger economic loops.

Filecoin’s 2025 Outlook

Looking ahead to 2025, Filecoin’s evolution continues. Here are three key themes that could drive transformative growth for the network while addressing the 2024 challenges outlined above.

“Filecoin is at an inflection point.” – Blockworks Research

1. Accelerating Filecoin by 450x with Fast Finality (F3)

Fast Finality (F3) is one of the most impactful upgrades to Filecoin’s consensus layer since the launch of its mainnet. By drastically reducing transaction finality times, F3 overcomes a key limitation of the network’s original consensus mechanism. This enhancement is scheduled to go live on the mainnet in Q1 2025.

Old vs. New Finality:

  • Before F3, Filecoin’s consensus mechanism ensured secure block validation but required 900 epochs – at 30 seconds per epoch, roughly 7.5 hours – to finalize transactions, which was too slow for applications like smart contracts or cross-chain bridges.
  • With F3, transactions can now optimistically finalize in minutes. Since 7.5 hours is 450 minutes, finalizing in about one minute is a 450x improvement.

What this means for Filecoin:

  • Enhanced Speed & UX: Transactions finalize within minutes, enabling low-latency applications and eliminating the long waits previously experienced.
  • Expanded Use Cases & Accessibility: L2 subnets like InterPlanetary Consensus (IPC), efficient smart contracts and decentralized applications, and blockchain bridges for interoperability with other chains.

Ultimately, this allows Filecoin to improve its usability across a wider variety of applications.

2. Moving Beyond Storage with FWS

Filecoin Web Services (FWS) emerged this year as a pivotal concept. It represents a strategic shift for Filecoin, expanding its scope from primarily a decentralized storage network to a broader marketplace for blockchain-based cloud services. This diversification can attract a wider range of users and use cases, potentially creating more positive economic loops within the network. Here are some pointers on why FWS should be on your radar:

  1. Strengthening Filecoin’s Competitive Edge: FWS will introduce features like Programmatic SLAs (which automate and enforce service agreements through smart contracts, ensuring clear performance expectations and penalties) and Verifiable Proofs (which provide cryptographic evidence of service delivery, allowing clients to independently verify service execution).
  2. Expands Filecoin’s Capabilities: Goes beyond Proof of Replication (PoRep) by adding Proof of Data Possession (PDP), enabling robust hot storage use cases. PDP will help improve data retrievability, a crucial factor in achieving product-market fit that has been widely discussed within the Filecoin community this year.
  3. Positions Filecoin as a leading platform in the decentralized web: FWS will facilitate the integration of multiple networks and protocols, creating a cohesive marketplace for storage, compute, bandwidth, and other services. This could make Filecoin a key player in the growth of the decentralized web.

FWS is currently a concept in development, with a new storage service featuring PDP (v0) underway. Following this milestone, development of the FWS marketplace will begin, with an expected launch in Q1 2025.

3. Unlocking new value streams in Filecoin

As a Layer 1 blockchain, Filecoin primarily generates revenue through gas fee burns (which happen when chain resources are used or when faults arise). However, relying on gas fee burns as a main source of revenue is not scalable and, more importantly, increases operational expenses as well as service costs.

A sustainable approach involves value returning to the Filecoin economy through the use of services in the FWS marketplace, fostering a more scalable and balanced revenue model. Proposed value accrual mechanisms include (see the sketch after this list):

  • FWS Fees: Commission (%) charged based on the transaction volume in the marketplace.
  • Service Fees: Applied when a user accesses a service or a vendor provides one.
  • SLA Penalties: Imposed on service providers who fail to meet agreed-upon performance standards.
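As promised above, here is a back-of-the-envelope sketch of how these three streams might add up. The rates and volumes are entirely hypothetical placeholders, not proposed protocol parameters:

```typescript
// Hypothetical value-accrual arithmetic for an FWS-style marketplace.
// All rates and volumes are placeholders, not proposed protocol parameters.
interface PeriodActivity {
  marketplaceVolumeFIL: number; // gross deal volume settled via FWS
  serviceCallsFIL: number;      // value of direct service usage
  slaBreachesFIL: number;       // collateral slashed from failed SLAs
}

const FWS_COMMISSION = 0.01; // 1% of marketplace volume (placeholder)
const SERVICE_FEE = 0.005;   // 0.5% on service usage (placeholder)

function protocolRevenue(p: PeriodActivity): number {
  return (
    p.marketplaceVolumeFIL * FWS_COMMISSION + // FWS fees
    p.serviceCallsFIL * SERVICE_FEE +         // service fees
    p.slaBreachesFIL                          // SLA penalties
  );
}

console.log(
  protocolRevenue({
    marketplaceVolumeFIL: 1_000_000,
    serviceCallsFIL: 400_000,
    slaBreachesFIL: 5_000,
  })
); // 10,000 + 2,000 + 5,000 = 17,000 FIL accrued this period
```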

This shift promises a more robust and diversified revenue stream, ensuring Filecoin’s continued relevance and profitability in the evolving market.


Final Thoughts

As data grows in value, we expect advancements in privacy-preserving machine learning, data-driven business models, and the increasing role of AI agents in unlocking decentralized storage’s potential.

Looking towards 2025, with the upcoming Fast Finality (F3) launch on the mainnet and the continued development of Filecoin Web Services, Filecoin is set to play a central role in shaping the future of data and AI within decentralized ecosystems. We expect to see these advancements positioning Filecoin beyond storage and unlocking a sustainable economic model through new revenue streams generated by FWS.

To stay updated on the latest in the Filecoin ecosystem, follow the @Filecointldr handle or join us on Discord.

Many thanks to HQ Han and Jonathan Victor for reviewing and providing valuable insights to this piece.

Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.

At first, the convergence of artificial intelligence (AI) and blockchain seemed like an awkward pairing of buzzwords—a notion often met with skepticism among early adopters. But in merely a year’s time, decentralized AI has evolved from being an obscure idea to one that is central to conversations around the Web3 environment. Such swift transformation owes its momentum to a few crucial elements:

  • Influence of AI: AI is set to significantly impact how we interact with the world. As AI agents grow more sophisticated, they will manage tasks like financial transactions and personal coaching. This evolution raises important questions about control and governance in AI development.
  • The Risks of Centralized Power: Centralized AI models controlled by a few tech giants pose serious risks, including bias, censorship, and data privacy concerns. This concentration of power stifles innovation and creates vulnerabilities, as highlighted by the recent security breach at Hugging Face.
  • The Demand for an Inclusive AI Ecosystem: Decentralized AI offers a pathway to a more equitable and accessible AI landscape by distributing computational processes across various systems. Key benefits include:
    • Reduced Costs: Lower barriers enable smaller developers and startups to innovate in AI.
    • Enhanced Data Integrity: Verifiable data provenance increases transparency and trust in AI models.
    • Combating Censorship: Aligning AI development with market needs fosters a more democratic technological environment.

These points highlight the value of an alternative approach to centralized AI.


The Pillars of Decentralized AI

Decentralized AI rests on three pillars: it leverages idle computing power from users, utilizes secure decentralized storage, and implements transparent data labeling.

  • Decentralized Storage: Utilizing decentralized storage networks like Filecoin ensures secure and verifiable storage for large datasets.
  • Decentralized Compute: By leveraging idle computing power from individual users and distributing tasks across a network, Decentralized AI makes AI development more accessible and cost-effective.
  • Decentralized Data Labeling and Verification: Transparent and verifiable data labeling processes help ensure data quality and reduce bias, fostering trust in AI systems.

A closer look: Decentralized AI Projects in the Filecoin Ecosystem 

To take a closer look at how the Web3 stack can benefit the AI space, we’ll explore the approaches four decentralized AI projects are taking. These projects utilize some or all of the pillars of decentralized AI outlined above.

Source: Unlocking Decentralized Storage for AI Workloads and Beyond – Vukasin Vukoje

Ramo – Simplifying Decentralized Network Participation (Funding Stage: Seed)

Ramo plays a crucial role in powering AI workloads by coordinating capital and hardware. By merging resources from various providers, Ramo facilitates the execution of complex tasks such as storage, SNARK generation, and computation, while allowing hardware resources to be jointly funded across multiple networks.

  • Multi-Network Jobs: Ramo supports jobs across multiple networks (e.g., read from Filecoin, process on Fluence, write back to Filecoin), which helps maximize hardware providers’ revenue and reduces coordination complexity.
Source: Decentralized Business Intelligence with Swan Chain’s AI Agent – Charles Cao

Swan Chain – Decentralized AI Training and Deployment (Funding Stage: Seed)

Swan Chain is a decentralized compute network, connecting users with idle computing resources for AI tasks like model training. Filecoin serves as its primary storage layer, ensuring secure, transparent, and accessible storage of AI data, aligning with the principles of decentralized AI.

  • Decentralized Compute Marketplace: Swan Chain aggregates global computing resources, offering a cost-effective alternative to centralized cloud services. Users can bid for computing jobs, and Swan Chain matches them with suitable providers based on requirements.
  • Filecoin Integration for Secure Data Storage: Swan Chain utilizes Filecoin and IPFS to securely store AI models and outputs, ensuring transparency and accountability in the AI development process.
  • Support for Diverse AI Workloads: Swan Chain supports various AI tasks, including model training, inference, and rendering, with examples like large language models and image/music generation.
Source: The role of open, verifiable systems in AI with Filecoin and Lilypad – Ally Haire

Lilypad – Distributed Compute for AI (Funding Stage: Seed)

Lilypad aims to build a trustless, distributed compute network that unleashes idle processing power and creates a new marketplace for AI, machine learning, and other large-scale computations. By integrating Filecoin and utilizing IPFS for hot storage, Lilypad ensures secure, transparent, and verifiable data handling throughout the AI workflow, supporting an open and accountable AI development landscape.

  • Job-Based Compute Matching: Lilypad’s job-based model matches user-defined compute needs (e.g., GPU type, resources) with providers, creating a marketplace for developers to share and monetize AI models within the decentralized AI ecosystem.
Source: bagel.net

Bagel – AI & Cryptography Research Lab (Funding Stage: Pre-Seed)

Bagel is an AI and cryptography research lab creating a decentralized machine learning ecosystem that enables AI developers to train and store models using the computing and storage power of decentralized networks like Filecoin. Its innovative GPU Restaking technology enhances Filecoin’s utility for AI applications by allowing storage providers (SPs) to contribute to both storage and compute networks simultaneously, thereby expanding support for AI developers and generating new revenue opportunities for SPs.

  • Increased Revenue for Filecoin SPs: Bagel helps storage providers monetize both storage and compute resources, boosting their income and incentivizing greater network participation.
  • Optimized Compute Utilization: With dynamic routing, Bagel directs GPUs to profitable networks, maximizing efficiency and returns for providers and users.

In Conclusion 

The intersection of Filecoin and AI marks a significant step forward in the evolution of technology. By combining verifiable storage with computing networks, we are not only addressing current challenges but also paving the way for future innovations. As these technologies continue to develop, their impact on AI and beyond will be profound, offering new possibilities for businesses and developers alike.

To understand more about Ramo, Swan Chain, Lilypad or Bagel dive into the respective keynotes and links here:

To stay updated on the latest in the Filecoin ecosystem, follow the @Filecointldr handle or join us on Discord.

Many thanks to HQ Han and Jonathan Victor for reviewing and providing valuable insights to this piece.

Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.

Layer 2 solutions (L2s) are essential innovations in blockchain technology that enhance the scalability, efficiency, and functionality of their respective networks. For Filecoin, an L1 focused on decentralized storage, L2 solutions play a crucial role in bringing new capabilities to the base infrastructure of the network.

As Filecoin continues to grow, L2 solutions are helping bring Filecoin to market, and creating tailored offerings to builders focused on specific verticals. This post explores the current state of L2 solutions on Filecoin, highlighting the pioneering advancements and future directions.


Underlying Architecture

Before diving into the L2s, it’s useful to understand the shared framework that many of Filecoin’s L2s are built on: Interplanetary Consensus (IPC).

InterPlanetary Consensus (IPC) is a framework designed to solve the problem of scalability in decentralized applications (dApps). IPC achieves this by allowing the creation of subnets – independent blockchains that can be customized with specific consensus algorithms tailored to the needs of the application. These subnets can communicate seamlessly with each other, minimizing the need for cross-chain bridges, while remaining anchored to the root Filecoin network.

Builders are drawn to IPC for several reasons. First, IPC subnets inherit the security features of their parent network, ensuring a high level of security for the dApps they host. Second, IPC leverages the Filecoin Virtual Machine (FVM), a versatile execution environment that supports various programming languages, allowing for greater interoperability with other blockchains. Finally, IPC’s tight integration with Filecoin, a large decentralized storage network, offers dApps easy access to robust data storage and retrieval capabilities. This combination of scalability, security, interoperability, and storage integration makes IPC an attractive choice for developers building the next generation of dApps.
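A conceptual picture of this hierarchy: each subnet runs its own consensus but periodically checkpoints its state to its parent, terminating at the Filecoin root chain – which is what lets children inherit the parent’s security. The sketch below is a data-structure illustration of that anchoring relationship under those assumptions, with invented names; it is not the actual IPC implementation or API:

```typescript
// Conceptual model of IPC's subnet hierarchy: subnets pick their own
// consensus and periodically checkpoint state to their parent, terminating
// at the Filecoin root network. Illustrative only; not the real IPC API.
interface Subnet {
  id: string;
  consensus: "tendermint-style" | "proof-of-stake" | "custom";
  parent?: Subnet; // undefined => this is the Filecoin root network
  lastCheckpointHash?: string;
}

// Anchoring a subnet's state summary into its parent is what lets a child
// inherit the parent's (ultimately Filecoin's) security.
function checkpoint(subnet: Subnet, stateHash: string): void {
  subnet.lastCheckpointHash = stateHash;
  if (subnet.parent) {
    console.log(`anchoring ${subnet.id} state ${stateHash} into ${subnet.parent.id}`);
  }
}

const root: Subnet = { id: "filecoin-root", consensus: "custom" };
const appSubnet: Subnet = {
  id: "filecoin-root/data-app",
  consensus: "tendermint-style",
  parent: root,
};
checkpoint(appSubnet, "0xabc123"); // child state committed toward the root
```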


Advancements in Data Management with Filecoin’s L2 Solutions

As the demand for efficient data management increases, Filecoin’s Layer 2 solutions are rising to meet these needs. These advancements focus on optimizing data storage and retrieval, offering enhanced scalability and cost-effectiveness across various applications. Basin is one such startup leading the charge.

Source: An introduction to Basin: The first data L2 on Filecoin

Basin, the first data Layer 2 on Filecoin, represents an advancement in decentralized data infrastructure, bringing a swath of new services into the Filecoin ecosystem targeted at data-heavy applications:

  • Key Features and Innovations
    • Hot and Cold Data Layers: Basin’s dual-layer approach incorporates a hot cache layer for real-time data access and a cold storage component for long-term archiving. This setup ensures both immediate accessibility and cost-effective storage, catering to diverse data needs.
    • Scalable Infrastructure: Basin’s architecture combines Filecoin’s secure storage capabilities with a flexible, scalable design ideal for handling high-volume data from IoT and AI applications.
    • Familiar Interfaces: Basin supports compatibility with S3, allowing developers to use familiar tools for managing data, which facilitates a smoother transition to decentralized solutions.
Source: A New Era of Web3 Scaling: The World’s First Data L2 on Filecoin – Marla Natoli

Basin is being actively used in real-world applications, such as handling weather data for decentralized stations with WeatherXM, and generating synthetic data for smart contracts. These use cases highlight Basin’s ability to efficiently store, manage, and monetize diverse data types, advancing practices in AI and machine learning.


Simplifying Decentralized Storage: Innovations and Challenges

Efficiently managing decentralized storage involves overcoming challenges related to user accessibility, cost, and integration. Providing more intuitive and cost-effective tools for data management will help address these challenges – and this is where Akave comes in.

Source: Akave X (X)

Akave is the first L2 storage chain powering on-chain data lakes, offering a novel approach to managing large volumes of data within a decentralized network. Data lakes are used in traditional enterprises to manage all types of data – typically feeding into large scale compute flows (e.g. for big data analytics). By leveraging Filecoin’s infrastructure, Akave aims to become a leading solution in decentralized data management, with a focus on enhancing data handling capabilities and integrating advanced security measures.

  • Key Features and Innovations
    • On-Chain Data Management: Akave focuses on creating on-chain data lakes, which provide a highly scalable and secure solution for managing large volumes of data directly on Filecoin.
    • Advanced Data Handling: The platform supports customizable data handling options such as replication policies and erasure coding, enhancing data security and availability.
    • Integration with Filecoin: Akave leverages Filecoin’s blockchain for improved data management, security, and decentralization.
Source: Akave: AI Trust Unlocked

Akave’s Decentralized Data Lakes revolutionize data storage with faster local access by placing data close to compute stacks, cutting egress costs compared to centralized clouds, and ensuring immutability and integrity through ZK Proofs. Users benefit from competitive pricing and diverse options via an open marketplace. Continuous visibility into data status and history is provided through Akave’s integration with Filecoin’s InterPlanetary Consensus (IPC), enhancing transparency and trust.


Enhancing AI and Unstructured Data Storage with Filecoin’s L2 Solutions

In the realm of AI and unstructured data, specialized storage solutions are crucial for managing and processing large datasets efficiently. Filecoin’s Layer 2 solutions like Storacha Network are stepping up to provide high-performance storage tailored for these needs. 

Source: Storacha Network

Storacha Network is a cutting-edge storage solution designed to enhance the management of AI and unstructured data. Leveraging Filecoin’s robust infrastructure, Storacha Network offers high-performance decentralized storage tailored for advanced applications. Looking ahead, Storacha envisions evolving into a federated network with increased public participation, aiming to enhance global data access through a decentralized CDN and fostering broad community involvement.

  • Key Features and Innovations
    • High-Performance Storage: Storacha offers decentralized hot object storage, ensuring rapid access and retrieval of data, essential for AI applications that require quick processing and scaling.
    • Provenance and Ownership: Users maintain control over their data through UCANs (User-Controlled Authorization Networks), providing secure, cryptographic proofs of data ownership and access rights without frequent blockchain interactions.
    • Efficient Data Handling: Storacha handles large datasets by sharding files, facilitating quick retrieval and efficient management, crucial for large-scale AI operations.
Source: Storacha Network Revealed: Bringing the Heat to AI Data Storage – Alexander Kinstler

Storacha Network supports a range of AI use cases by providing fast, scalable object storage optimized for both structured and unstructured data. It addresses key needs such as verifiability and provenance for decentralized GPU networks, ensuring that training processes are executed as expected and checkpoints are maintained. 

Additionally, Storacha allows users to bring their own storage to training jobs, facilitating the sharing of hyperparameters and weights while ensuring ownership of training results.


In Conclusion

To wrap up, Filecoin’s Layer 2 solutions are paving the way for a new era in decentralized data management. Innovations like Basin, Akave, and Storacha Network are not only addressing the challenges of scalability and cost but also enhancing the efficiency and performance of data handling. As these technologies evolve, they promise to transform how data is stored, managed, and utilized, marking significant progress in the Web3 ecosystem.

To understand more about Basin, Akave or Storacha Network, watch their keynotes at the recent FIL Dev Summit.

Many thanks to Jonathan Victor for reviewing and providing valuable insights to this piece.

Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.

At the latest Filecoin Developer Summit (FDS), Nicola Greco (of FilOz) introduced a vision to evolve Filecoin’s decentralized cloud services: Filecoin Web Services (FWS). FWS aims to provide a framework for deploying composable cloud services – allowing new protocols to bootstrap into a shared marketplace of offerings, all composable with each other.


Source: Filecoin Web Services: Ecosystem of verifiable cloud services – Nicola G

Expanding Beyond Proof of Replication: Extending the Functionality of Filecoin

To understand FWS, it’s useful to first recap the existing service offerings inside the Filecoin network.

The core storage offering, Proof of Replication (PoRep), allows storage providers to use proofs over uniquely encoded data to show that they are still in possession of specific pieces of data. Filecoin uses PoRep both for storage and for consensus – this requires higher security parameters, and therefore makes Filecoin’s base storage offering akin to cold storage: ideal for datasets that need strong guarantees around uniqueness and existence but can accept slower access times.

Furthermore, because Filecoin launched prior to the FVM, much of the onchain tooling (e.g. to set up and maintain storage deals, or to enable payments) exists as “system actors” – non-programmable functions on the network. These functions were built to support the original storage flows, and any evolution would require a full network upgrade to modify them or support new functionality.

However, as more storage on-ramps pushed into building storage solutions for customers, it became clear that there was a need for more storage offerings above and beyond Filecoin’s base offering. As a result, new types of proofs (such as Proof of Data Possession) have been proposed to run on the Filecoin network – allowing more use cases to be natively supported.

In designing these new offerings to sit on top of Filecoin, it became clear that many of these new proof offerings would need generalized versions of the onchain tooling that exists as “system actors” – such as payment rails. Rather than having many systems independently evolve their own architecture (and risk losing composability), a new proposal was put forward to focus on building modular, composable systems via FWS.


Source: Filecoin Web Services: Ecosystem of verifiable cloud services – Nicola G

Introducing Filecoin Web Services: A Modular Approach to Cloud Services

At the core of Greco’s vision are the concepts of modularity and reuse. If each new service had to build its entire protocol from scratch – developing everything from deal management to escrow and SLA enforcement – the barrier to entry for new services would be high. FWS proposes a unified protocol that standardizes these components, allowing developers to focus on building specific services rather than recreating the entire stack.

FWS would serve as a thin, opinionated layer that manages payments, collateral, deal structuring, and SLA enforcement across various services. This standardization would enable seamless integration of new services, whether they are storage-related like PDP and retrieval services or entirely new offerings like markets for zk-SNARK proofs or AI-based computations. By providing a common framework, FWS would reduce complexity, lower development costs, and increase the rate of development for building within the Filecoin ecosystem.


Source: Filecoin Web Services: Ecosystem of verifiable cloud services – Nicola G

The Power of a Unified Marketplace: Enhancing Efficiency and Accessibility

One of the key benefits of FWS is its potential to streamline the user experience. Without FWS, users would need to lock tokens in multiple smart contracts to access different services, leading to inefficiencies in collateral management and prepayment and increasing users’ costs. FWS envisions a single entry point where users can determine how they’d like to pay – prepaid or pay-as-you-go – with the same rails usable by multiple services. This model mirrors the convenience of traditional cloud services, where users simply provide a payment method and are periodically billed.

Moreover, by consolidating financial management into a single contract, FWS would improve collateral efficiency and reduce the overhead associated with managing multiple service contracts. This would also allow utilization of one service to enable credit in other services – allowing a credit history to be built up across disparate protocols. This approach not only simplifies the user experience but also enhances the overall liquidity and flexibility of the Filecoin ecosystem.
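
To make this concrete, below is a minimal sketch of the shared-payment-rail idea, where a single escrow balance funds usage across many services and a per-service usage history accumulates. The names (PaymentRail, deposit, charge) and the in-memory ledger are illustrative assumptions, not FWS APIs:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: one escrow ledger shared by many services,
# instead of per-service token lockups. Not an FWS implementation.
@dataclass
class PaymentRail:
    balances: dict = field(default_factory=dict)  # user -> escrowed funds
    usage: dict = field(default_factory=dict)     # (user, service) -> total spend

    def deposit(self, user: str, amount: float) -> None:
        """Single entry point: the user funds one escrow for all services."""
        self.balances[user] = self.balances.get(user, 0.0) + amount

    def charge(self, user: str, service: str, amount: float) -> None:
        """Any registered service bills against the same escrow balance."""
        if self.balances.get(user, 0.0) < amount:
            raise ValueError("insufficient escrow")
        self.balances[user] -= amount
        key = (user, service)
        # The cross-service usage history is what could back credit later.
        self.usage[key] = self.usage.get(key, 0.0) + amount

rail = PaymentRail()
rail.deposit("alice", 10.0)
rail.charge("alice", "pdp-storage", 1.5)  # a storage service bills
rail.charge("alice", "retrieval", 0.25)   # a retrieval service bills the same escrow
```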


Source: Filecoin Web Services: Ecosystem of verifiable cloud services – Nicola G

A Vision for the Future: FWS as a Distribution Layer for Decentralized Services

Looking ahead, Greco envisions FWS not just as a tool for enhancing Filecoin’s storage capabilities but as a broader distribution layer for decentralized services. As the ecosystem grows, FWS could facilitate the integration of multiple networks and protocols, creating a cohesive marketplace for storage, compute, bandwidth, and other services. By hosting diverse offerings such as zero-knowledge proof generation and decentralized compute, FWS could position Filecoin at the center of a vibrant, interconnected ecosystem – supporting a wide array of applications beyond storage and driving innovation and adoption across the decentralized web.

To understand more about FWS, watch the full keynote by Nicola Greco on YouTube.

Many thanks to Jonathan Victor for reviewing and providing valuable insights to this piece.

Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.

Editor’s Note: This blogpost is a repost of the original content published on 5 March 2024, by Bidhan Roy and Marcos Villagra from Bagel. Founded in 2023 by CEO Bidhan Roy, Bagel is a machine learning and cryptography research lab building a permissionless, privacy-preserving machine learning ecosystem. This blogpost represents the independent view of these authors, who have given their permission for this re-publication.


Trillion-dollar industries are unable to leverage their immensely valuable data for AI training and inference due to privacy concerns. The potential for AI-driven breakthroughs—genomic secrets that could cure diseases, predictive insights to eliminate supply chain waste, and chevrons of untapped energy sources—remains locked away. Privacy regulations also closely guard this valuable and sensitive information.

To propel human civilization forward in energy, healthcare, and collaboration, it is crucial to enable AI systems that train and generate inference on data while maintaining full end-to-end privacy. At Bagel, pioneering this capability is our mission. We believe accessing a fundamental resource like knowledge, for both human-driven and autonomous AI, should not entail a compromise on privacy.

We have applied and experimented with almost all the major privacy-preserving machine learning (PPML) mechanisms. Below, we share our insights, our approach, and some research breakthroughs.

And if you’re in a rush, we have a TLDR at the end.

Privacy-preserving Machine Learning (PPML)

Recent advances in academia and industry have focused on incorporating privacy mechanisms into machine learning models, highlighting a significant move towards privacy-preserving machine learning (PPML). At Bagel, we have experimented with all the major PPML techniques, particularly those post differential privacy. Our work, positioned at the intersection of AI and cryptography, draws from the cutting edge in both domains.

Our research covered a wide range of PPML techniques suitable for our platform. Among those, Differential Privacy (DP), Federated Learning (FL), Zero-knowledge Machine Learning (ZKML), and Fully Homomorphic Encryption Machine Learning (FHEML) stood out for their potential in PPML.

First, we will delve into each of these, examining their advantages and drawbacks. In subsequent posts, we will describe Bagel’s approach to data privacy, which addresses and resolves the challenges associated with the existing solutions.


Differential Privacy (DP)

One of the first and most important techniques with a mathematical guarantee for incorporating privacy into data is differential privacy or DP (Dwork et al. 2006), addressing the challenges faced by earlier methods with a quantifiable privacy definition.

DP ensures that a randomized algorithm, A, maintains privacy across datasets D1 and D2—which differ by a single record—by keeping the probability of A(D1) and A(D2) generating identical outcomes relatively unchanged. This principle implies that minor dataset modifications do not significantly alter outcome probabilities, marking a pivotal advancement in data privacy.
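
Formally, the standard guarantee can be written as follows: a randomized algorithm A is \(\varepsilon\)-differentially private if, for all datasets D1 and D2 differing by a single record and for every set of outcomes S,

\(\Pr[A(D_1) \in S] \le e^{\varepsilon} \cdot \Pr[A(D_2) \in S],\)

where a smaller \(\varepsilon\) means the two output distributions are closer and privacy is stronger.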

The application of DP in machine learning, particularly in neural network training and inference, demonstrates its versatility and effectiveness. Notable implementations include adapting DP for supervised learning algorithms by integrating random noise at various phases: directly onto the data, within the training process, or during inference, as highlighted by Ponomareva et al. (2023) and further references.

The balance between privacy and accuracy in DP is influenced by the noise level: greater noise enhances privacy at the cost of accuracy, affecting both inference and training stages. This relationship was explored by Abadi et al. (2016) through the introduction of Gaussian noise to the stochastic gradient descent (DP-SGD) algorithm, observing the noise’s impact on accuracy across the MNIST and CIFAR-10 datasets.
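
As a rough sketch of the DP-SGD idea (clip each example’s gradient, average, then add Gaussian noise), assuming illustrative values for the learning rate, clipping norm, and noise multiplier:

```python
import numpy as np

def dp_sgd_step(weights, per_example_grads, lr=0.1, clip_norm=1.0, noise_mult=1.1):
    """One DP-SGD update: clip per-example gradients, average, add noise."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Clipping bounds any single example's influence on the update.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # Gaussian noise scaled to the clipping norm is what provides the DP guarantee.
    noise = np.random.normal(0.0, noise_mult * clip_norm / len(clipped),
                             size=mean_grad.shape)
    return weights - lr * (mean_grad + noise)

# Toy usage with random gradients for a 3-parameter model:
w = np.zeros(3)
grads = [np.random.randn(3) for _ in range(8)]
w = dp_sgd_step(w, grads)
```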

An innovative DP application, Private Aggregation of Teacher Ensembles (PATE) by Papernot et al. (2016), divides a dataset into disjoint subsets and trains a network on each without privacy; these networks are termed teachers. The teachers’ aggregated inferences, subjected to added noise for privacy, inform the training of a student model that emulates the teacher ensemble. This method also underscores the trade-off between privacy enhancement through noise addition and the resultant accuracy reduction.
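
A minimal sketch of PATE’s noisy aggregation step; the teacher count, number of classes, and Laplace noise scale below are illustrative:

```python
import numpy as np

# Each of 10 teachers (trained on disjoint data subsets) votes a label
# for one query record; 3 possible classes.
teacher_votes = np.array([0, 2, 1, 0, 0, 2, 0, 1, 0, 0])
counts = np.bincount(teacher_votes, minlength=3)              # votes per class
noisy_counts = counts + np.random.laplace(scale=2.0, size=3)  # privacy noise
student_label = int(np.argmax(noisy_counts))  # label used to train the student
print(student_label)
```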

Further studies affirm that while privacy can be secured with little impact on execution times (Li et al. 2015), stringent privacy measures can obscure discernible patterns essential for learning (Abadi et al. 2016). Consequently, a certain level of privacy must be relinquished in DP to facilitate effective machine learning model training, illustrating the nuanced balance between privacy preservation and learning efficiency.

Pros of Differential Privacy

The advantages of using DP are:

Effortless. Easy to implement into algorithms and code.

Algorithm independence. Schemes can be made independent of the training or inference algorithm.

Fast. Some DP mechanisms have been shown to have little impact on the execution times of algorithms.

Tunable privacy. The degree of desired privacy can be chosen by the algorithm designer.

Cons of Differential Privacy

Access to private data is still necessary. Teachers in the PATE scheme must have full access to the private data (Papernot et al. 2016) in order to train a neural network. Also, the stochastic gradient descent algorithm based on DP only adds noise to the weight updates and needs access to private data for training (Abadi et al. 2016).

Privacy-Accuracy-Speed trade-off on data. All implementations must sacrifice some privacy in order to get good results: if there is no discernible pattern in the input, then there is nothing to train (Feyisetan et al. 2020). The implementation of some noise mechanisms can also impact execution times, necessitating a balance between speed and the goals of privacy and accuracy.


Zero-Knowledge Machine Learning (ZKML)

A zero-knowledge proof system (ZKP) is a method allowing a prover P to convince a verifier V about the truth of a statement without disclosing any information apart from the statement’s veracity. To affirm the statement’s truth, P produces a proof π for V to review, enabling V to be convinced of the statement’s truthfulness.

Zero-Knowledge Machine Learning (ZKML) is an approach that combines the principles of zero-knowledge proofs (ZKPs) with machine learning. This integration allows machine learning models to be trained and to infer with verifiability.

For an in-depth examination of ZKML, refer to the work by Xing et al. (2023). Below we provide a brief explanation that focuses on the utilization of ZKPs for neural network training and inference.

ZKML Inference

Consider an unlabeled dataset A and a pretrained neural network N tasked with labeling each record in A. To generate a ZK proof of N‘s computation during labeling, an arithmetic circuit C representing N is required, including circuits for each neuron’s activation function. Assuming such a circuit C exists and is publicly accessible, the network’s weights and a dataset record become the private and public inputs, respectively. For any record a of A, N‘s output is denoted by a pair (l,π), where l is the label and π is a zero-knowledge argument asserting the existence of specific weights that facilitated the labeling.

This model illustrates how ZK proves the accurate execution of a neural network on data, concealing the network’s weights within a ZK proof. Consequently, any verifier can be assured that the executing agent possesses the necessary weights.
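
The flow can be sketched as below. This is purely illustrative: the `prove` function is a stand-in (a hash commitment, which is not zero-knowledge), and a real system would instead produce a ZK argument over the circuit C:

```python
import hashlib, json

def evaluate(weights, x):
    # Stand-in for the arithmetic circuit C: a one-layer "network".
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score > 0 else 0

def prove(weights, x, label):
    # Placeholder "proof": a hash commitment. A real ZK argument would
    # prove knowledge of weights consistent with (x, label) without
    # revealing them.
    return hashlib.sha256(json.dumps([weights, x, label]).encode()).hexdigest()

weights = [0.5, -0.2, 0.1]   # private input: the model's weights
record = [1.0, 2.0, 3.0]     # public input: one record a of A
label = evaluate(weights, record)
pi = prove(weights, record, label)
print(label, pi[:16])        # the pair (l, π) described above
```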

ZKML Training

ZKPs are applicable during training to validate N‘s correct execution on a labeled dataset A. Here, A serves as the public input, with an arithmetic circuit C depicting the neural network N. The training process requires an additional arithmetic circuit to implement the optimization function minimizing the loss function. For each training epoch i, a proof π_i is generated, confirming the algorithm’s accurate execution through epoch i, including the validity of the preceding epoch’s proof. The training culminates with a compressed proof π, proving the correct training over dataset A.

The explanation above illustrates that during training, the network’s weights are concealed to ensure that the training is correctly executed on the given dataset A. Additionally, all internal states of the network remain undisclosed throughout the training process.

Pros of ZKML

The advantages of using ZKPs with neural networks are:

Privacy of model weights. The weights of the neural network are never revealed during training or inference in any way. The weights and the internal states of the network algorithm are private inputs for the ZKP.

Verifiability. The proof certifies the proper execution of training or inference processes and guarantees the accurate computation of weights.

Trustlessness. The proof and its verification properties ensure that the data owner is not required to place trust in the agent operating the neural network. Instead, the data owner can rely on the proof to confirm the accuracy of both the computation and the existence of correct weights.

Cons of ZKML

The disadvantages of using ZKPs with neural networks are:

No data privacy. The agent running the neural network needs access to the data in order to train or do inference. Data is considered a parameter that is publicly known to the data owner and the prover running the neural network (Xing et al. 2023).

No privacy for the model’s algorithm. In order to create a ZK proof, the algorithm of the entire neural network should be publicly known. This includes the activation functions, the loss function, optimization algorithm used, etc (Xing et al. 2023).

Proof generation of an expensive computation. Presently, the process of generating a ZK proof is computationally demanding – see, for example, this report on the computation times of ZK provers. Creating a proof for each epoch within a training algorithm can exacerbate the computational burden of an already resource-intensive task.


Federated Learning (FL)

In Federated Learning (FL), we look to train a global model on a dataset distributed across multiple servers, each holding local data samples, without any server sharing its local data.

In FL there is a global objective function that is being optimized which is defined as

\(f(x_1,\dots,x_n)=\frac{1}{n}\sum_{i=1}^n f_i(x_i),\)

where n is the number of servers, each variable \(x_i\) is the set of parameters as viewed by server i, and each \(f_i\) is the local objective function of server i. FL tries to find the set of values that optimizes f.

The figure below shows the general process in FL.

  1. Initialization. An initial global model is created and distributed by a central server to all other servers.
  2. Local training. Each server trains the model using their local data. This ensures data privacy and security.
  3. Model update. After training, each server shares with the central server their local updates like gradients and parameters.
  4. Aggregation. The central server receives all local updates and aggregates them into the global model, for example, using averaging.
  5. Model distribution. The updated model is distributed again to the local servers, and the previous steps are repeated until the global model achieves the desired level of performance.
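
A compact sketch of one round of these steps using federated averaging on a linear model; the number of servers, model size, and synthetic data below are illustrative:

```python
import numpy as np

def local_update(global_w, X, y, lr=0.1):
    """Step 2: one local gradient step; the data (X, y) never leaves the server."""
    grad = X.T @ (X @ global_w - y) / len(y)
    return global_w - lr * grad

rng = np.random.default_rng(0)
global_w = np.zeros(4)  # step 1: initial global model
servers = [(rng.normal(size=(20, 4)), rng.normal(size=20)) for _ in range(5)]

for _ in range(10):  # step 5: repeat until performance is acceptable
    local_ws = [local_update(global_w, X, y) for X, y in servers]  # steps 2-3
    global_w = np.mean(local_ws, axis=0)                           # step 4: averaging
```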

Since local servers never share their local data, FL guarantees privacy over that data. However, the model being constructed is shared among all parties, and hence, its structure and set of parameters are not hidden.

Pros of FL

The advantages of using FL are:

Data privacy. Local data never leaves the local servers. All computations are done locally, and local servers never need to communicate with one another.

Distributed computing. The creation of the global model is distributed among local servers, thereby parallelizing a resource-intensive computation. Thus, FL is considered a distributed machine learning framework (Xu et al. 2021).

Cons of FL

The disadvantages of using FL are:

Model is not private. The global model is shared among each local server in order to do their computations locally. This includes the aggregated weights and gradients at each step of the FL process. Thus, each local server is aware of the entire architecture of the global model (Konečný et al. 2016).

Data leakage. Recent research indicates that data leakage remains a persistent issue, notably through mechanisms such as gradient sharing—see for example Jin et al. (2022). Consequently, FL cannot provide complete assurances of data privacy.

Trust. Since no proofs are generated in FL, every party involved in the process must be trusted to have computed their updates and parameters as expected (Gao et al. 2023).


Fully Homomorphic Encryption (FHE)

At its core, homomorphic encryption permits computations on encrypted data. By “homomorphic,” we refer to the capacity of an encryption scheme to allow specific operations on ciphertexts that, when decrypted, yield the same result as operations performed directly on the plaintexts.

Consider a scenario with a secret key k and a plaintext m. In an encryption scheme (E,D), where E and D represent the encryption and decryption algorithms respectively, the condition D(k,E(k,m))=m must hold. A scheme (E,D) is deemed fully homomorphic if for any key k and messages m and m’, the properties E(k,m+m’)=E(k,m)+E(k,m’) and E(k,m*m’)=E(k,m)*E(k,m’) are satisfied, with addition and multiplication defined over a finite field. If only one operation is supported, the scheme is partially homomorphic. This definition implies that operations on encrypted data mirror those on plaintext, which is crucial for maintaining data privacy during processing.

In plain words, if we have a fully homomorphic encryption scheme, then operating over the encrypted data is equivalent to operating over the plaintext. We will write FHE to refer to a fully homomorphic encryption scheme. The figure below shows how an arbitrary homomorphic operation works over a plaintext and ciphertext.

The homomorphic property of FHE makes it invaluable in situations where data must remain secure while still being used for computations. For instance, if we possess sensitive data and require a third party to perform data analysis on it, we can rely on FHE to encrypt the data. This allows the third party to conduct analysis on the encrypted data without the need for decryption. The mathematical properties of FHE guarantee the accuracy of the analysis results.
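
As a toy illustration of the homomorphic property, consider textbook RSA, which is only multiplicatively (i.e., partially) homomorphic and entirely insecure at this key size; it nonetheless shows how operating on ciphertexts corresponds to operating on plaintexts:

```python
# Textbook RSA with tiny demo parameters: n = 61 * 53, e*d ≡ 1 mod φ(n).
n, e, d = 3233, 17, 2753

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

m1, m2 = 7, 12
c1, c2 = encrypt(m1), encrypt(m2)

# Multiply the ciphertexts without ever decrypting them...
c_prod = (c1 * c2) % n

# ...and decryption yields the product of the plaintexts: E(m1)*E(m2) = E(m1*m2).
assert decrypt(c_prod) == (m1 * m2) % n
print(decrypt(c_prod))  # 84
```

A fully homomorphic scheme extends this idea so that both addition and multiplication (and hence arbitrary circuits) can be evaluated under encryption.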

FHE Inference

Fully Homomorphic Encryption (FHE) can be used to perform inference in neural networks while preserving data privacy. Let’s consider a scenario where N is a pretrained neural network, A is a dataset, and (E,D) is an asymmetric FHE scheme. The goal is to perform inference on a record a of A without revealing the sensitive information contained in a to the neural network.

The inference process using FHE begins with encryption. The data owner encrypts the record a using the encryption algorithm E with the public key public_key, obtaining the encrypted record a’ = E(public_key, a).

Next, the data owner sends the encrypted record a’ along with public_key to the neural network N. The neural network N must have knowledge of the encryption scheme (E,D) and its parameters to correctly apply homomorphic operations over the encrypted data a’. Any arithmetic operation performed by N can be safely applied to a’ due to the homomorphic properties of the encryption scheme.

One challenge in using FHE for neural network inference is handling non-linear activation functions, such as sigmoid and ReLU, which involve non-arithmetic computations. To compute these functions homomorphically, they need to be approximated by low-degree polynomials. The approximations allow the activation functions to be computed using homomorphic operations on the encrypted data a’.
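
A quick sketch of such an approximation, fitting a degree-3 polynomial to the sigmoid over an illustrative input range:

```python
import numpy as np

# Least-squares fit of a degree-3 polynomial to sigmoid on [-5, 5].
# FHE schemes evaluate additions and multiplications, so this polynomial
# stands in for the non-arithmetic sigmoid during encrypted inference.
x = np.linspace(-5, 5, 200)
sigmoid = 1.0 / (1.0 + np.exp(-x))
coeffs = np.polyfit(x, sigmoid, 3)
approx = np.polyval(coeffs, x)
print("max abs error on [-5, 5]:", np.abs(approx - sigmoid).max())
```

The residual error of such approximations is one source of the accuracy trade-off noted below.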

After applying the necessary homomorphic operations and approximated activation functions, the neural network N obtains the inference result. It’s important to note that the inference result is still in encrypted form, as all computations were performed on encrypted data.

Finally, the encrypted inference result is sent back to the data owner, who uses the private key associated with the FHE scheme to decrypt the result using the decryption algorithm D. The decrypted inference result is obtained, which can be interpreted and utilized by the data owner.

By following this inference process, the neural network N can perform computations on the encrypted data a’ without having access to the original sensitive information. The FHE scheme ensures that the data remains encrypted throughout the inference process, and only the data owner with the private key can decrypt the final result.

It’s important to note that the neural network N must be designed and trained to work with the specific FHE scheme and its parameters. Additionally, the approximation of non-linear activation functions by low-degree polynomials may introduce some level of approximation error, which should be considered and evaluated based on the specific application and accuracy requirements.

FHE Training

The process of training a neural network using Fully Homomorphic Encryption (FHE) is conceptually similar to performing inference, but with a few key differences. Let’s dive into the details.

Imagine we have an untrained neural network N and an encrypted dataset A’ = E(public_key, A), where E is the encryption function and public_key is the public key of an asymmetric FHE scheme. Our goal is to train N on the encrypted data A’ while preserving the privacy of the original dataset A.

The training process unfolds as follows. Each operation performed by the network and the training algorithm is executed on each encrypted record a’ of A'. This includes both the forward and backward passes of the network. As with inference, any non-arithmetic operations like activation functions need to be approximated using low-degree polynomials to be compatible with the homomorphic properties of FHE.

A fascinating aspect of this approach is that the weights obtained during training are themselves encrypted. They can only be decrypted using the private key of the FHE scheme, which is held exclusively by the data owner. This means that even the agent executing the neural network training never has access to the actual weight values, only their encrypted counterparts.

Think about the implications of this. The data owner can outsource the computational heavy lifting of training to a third party, like a cloud provider with powerful GPUs, without ever revealing their sensitive data. The training process operates on encrypted data and produces encrypted weights, ensuring end-to-end privacy.

Once training is complete, the neural network sends the collection of encrypted weights w’ back to the data owner. The data owner can then decrypt the weights using their private key, obtaining the final trained model. They are the sole party capable of accessing the unencrypted weights and using the model for inference on plaintext data.

A key caveat to keep in mind: FHE operations are computationally expensive, so training a neural network with FHE will generally be slower than training on unencrypted data.

Pros of FHE

The advantages of using FHE are:

Data privacy. Third parties never gain access to the underlying private data, a security guarantee upheld by the assurances of FHE and lattice-based cryptography (Gentry 2009).

Model privacy. Training and inference processes are carried out on encrypted data, eliminating the need to share or publicize the neural network’s parameters for accurate data analysis.

Effectiveness. Previous studies have demonstrated that neural networks operating on encrypted data using FHE maintain their accuracy—see for example Nandakumar et al. (2019) and Xu et al. (2019). Therefore, we can be assured that employing FHE for training and inference processes will achieve the anticipated outcomes.

Quantum resistance. The security of FHE, unlike other encryption schemes, is grounded in difficult problems derived from Lattice theory. These problems are considered to be hard even for quantum computers (Regev 2005), thus offering enhanced protection against potential quantum threats in the future.

Cons of FHE

The disadvantages of using FHE are:

Verifiability. FHE offers proofs of neither correct encryption nor correct computation. Hence, we must trust that the data intended for encryption is indeed the correct data (Viand et al. 2023).

Speed. Relative to conventional encryption schemes, FHE is still considered slow in parameter setup, encryption, and decryption (Gorantala et al. 2023).

Memory requirements. The number of weights that need to be encrypted is proportional to the size of the network. Even for small networks, the RAM requirements are on the order of gigabytes (Chen et al. 2018)(Nandakumar et al. 2019).

Usability. FHE schemes use many parameters that need to be carefully tuned, which requires extensive experience from users (Al Badawi et al. 2022)(Halevi & Shoup 2020).


TLDR

We examined the four most widely used privacy-preserving techniques in machine learning, focusing on neural network training and inference. We evaluated these techniques across four dimensions: data privacy, model algorithm privacy, model weights privacy, and verifiability.

Data privacy considers the model owner’s access to private data. Differential privacy (DP) and zero-knowledge machine learning (ZKML) require access to private data for training and proof generation, respectively. Federated learning (FL) enables training and inference without revealing data, while fully homomorphic encryption (FHE) allows computations on encrypted data.

Model algorithm privacy refers to the data owner’s access to the model’s algorithms. DP does not require algorithm disclosure, while ZKML necessitates it for proof generation. FL distributes algorithms among local servers, and FHE operates without accessing the model’s algorithms.

Model weights privacy concerns the data owner’s access to the model’s weights. DP and ZKML keep weights undisclosed or provide proofs of existence without revealing values. FL involves exchanging weights among servers for decentralized learning, contrasting with DP and ZKML’s privacy-preserving mechanisms. FHE enables training and inference on encrypted data, eliminating the need for model owners to know the weights.

Verifiability refers to the inherent capabilities for verifiable computation. ZKML inherently provides this capability; DP, FL, and FHE do not provide similar levels of integrity assurance.

The table below summarizes our findings:

| Technique | Data privacy | Model algorithm privacy | Model weights privacy | Verifiability |
|---|---|---|---|---|
| DP | No | Yes | Yes | No |
| ZKML | No | No | Yes | Yes |
| FL | Yes | No | No | No |
| FHE | Yes | Yes | Yes | No |

What’s Next 🥯

At Bagel, we recognize that existing privacy-preserving machine learning solutions fall short in providing end-to-end privacy, scalability, and strong trust assumptions. To address these limitations, our team has developed a novel approach based on a modified version of fully homomorphic encryption (FHE).

Our pilot results are extremely promising, indicating that our solution has the potential to revolutionize the field of privacy-preserving machine learning. By leveraging the strengths of homomorphic encryption and optimizing its performance, we aim to deliver a scalable, trustworthy, and truly private machine learning framework.

We believe that our work represents a paradigm shift in the way machine learning is conducted, ensuring that the benefits of AI can be harnessed without compromising user privacy or data security. As we continue to share more about our approach, we invite you to follow our progress by subscribing to the Bagel blog.


For more thought pieces from Bagel, check out their blog here.

To stay updated on the latest Filecoin happenings, follow the @Filecointldr handle.

Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.

The Decentralized Storage space is rapidly evolving. Filecoin is at an important moment – and in this blog we propose both areas for the ecosystem to double down on and ways we can track that progress. It is by no means exhaustive, but written from the vantage point of having been embedded in the Filecoin ecosystem for many years, gathering feedback from users, builders and the community, and having thought deeply about what is needed as the network moves forward. 


The blog is organized in the following sections: 

  • What matters for Filecoin in 2024
  • Why these matter and how to measure progress

It is our hope that with the right north star, teams will be able to better coordinate and identify convergences between project-level interests & ecosystem interests. The proposed framework and metrics should make it easier for capital and resource allocators in the ecosystem to evaluate the level of impact each team is creating, and distribute capital and resources accordingly. For startups, this can help frame where broader ecosystem efforts may dovetail into your roadmap and releases.


WHAT MATTERS IN 2024

  1. Accelerating conversions to paid deals: Helping Filecoin providers increase their paid services (storage, retrieval, compute) is critical for driving cashflows into Filecoin and to support sustainable funding of its hardware outside of token incentives.

  2. Growing on-chain activity: Filecoin is not aiming to be just another L1 fighting over the same use cases. But it does have a unique value proposition as a base layer with “real world” services anchored into it. This enables new use cases (programmable services, DeFi around cash flows, etc.) that are unique to Filecoin. Building out and growing adoption of these services can help prove that Filecoin is not just “a storage layer”, but an economy with a stable set of cash flows.

  3. Making Filecoin indispensable to others: Bull cycles mean velocity is critical – as is making Filecoin an indispensable part of the stack for more teams. There are many emerging themes to capitalize on (Chain Archival, Compute, AI) – and Filecoin positioning itself matters. The ecosystem collectively wins when more participants leverage Filecoin as a core part of their story. For individual teams, this means that shipping to your users matters. At the ecosystem level, it means orienting efforts to unblock the teams closest to driving integrations and building services on Filecoin.

The verticals in our framework remain relatively high-level – and many of these objectives will have their own set of tasks. But first, it is critical for the ecosystem to align on this being the right set of verticals to progress against. Below, we dive into each vertical and some tangible metrics the ecosystem should start tracking.



WHY THESE MATTER AND HOW TO MEASURE PROGRESS

1) Accelerating conversion to paid deals

As a storage network, Filecoin should maximize the cashflows it can bring into its economy. Having incentives as an accelerant is fine – but without a steady (and growing) ramp of paid deals, Filecoin can’t achieve its maximum potential.

Paid deals (when settled on-chain) are a net capital inflow into the Filecoin economy that can be the substrate for use cases uniquely possible in our ecosystem. DeFi as an example has a real opportunity to provide actual services to businesses (e.g. converting currencies to pay for storage).

There are two main paths by which we can drive growth of paid services:

  • Drive growth in existing services (data archival)
  • Expand to new markets with additional services (hot storage, compute, indexing, etc.)


In both cases, there’s work to be done to reduce friction for paid on-ramps or ship new features that raise the floor (as informed by on-ramps and projects trying to bring Filecoin services to market). It is critical that the Filecoin ecosystem collectively prioritizes the right efforts to make Filecoin services sellable, and allocates resources accordingly.

There are already a number of teams making substantial progress on this front (CID.Gravity, Seal Storage, Holon, Banyan, Lighthouse.storage, Web3Mine, Basin, among others) – and we can best measure progress by helping reduce their friction and helping drive their success.


We propose measuring success for this vertical in two forms:

  1. Dollars and Total Data Stored for Paid Deals (self reported)
  2. Dollars and Total Data Stored for Paid Deals (on-chain)

There are a number of initiatives from public goods teams along these efforts for the quarter (Q2 2024) which include: 

  • FilOz: is working on a FIP for new proofs to reduce storage costs and dramatically improve retrieval speeds
  • DeStor: is helping drive enterprise adoption for business ready on-ramps
  • Ansa Research, Filecoin Foundation, etc.: Web3 BD support for ecosystem builders
  • Targeted grant funding for efforts that directly support growth of sustainable on-chain paid deal activity


2) Growing on-chain activity

Filecoin, as an L1, has more than just its storage service. Building a robust on-chain economy is critical for accelerating the services and tooling with which others can compose. In the Filecoin ecosystem, we have a unique opportunity in that there are real economic flows to enable via paid on-chain deals.

Centering our on-chain economy around supporting those flows – be it from automating renewals, designing incentives for retrievals, creating endowments for perpetual storage, or building economic efficiency for the operators of the network – can lead to compounding growth as it creates a flywheel.

As Filecoin owns more of its own economic activity on-chain, value will accrue for the token – enabling ecosystem users to use Filecoin in more productive ways, generating real demand for services inside the ecosystem. 

We propose the following metrics for us to collectively measure success: 

  1. Contract calls
  2. Active Filecoin addresses
  3. Volume of on-chain payments

There are notable builders already seeding the on-chain infrastructure to leverage some of these primitives (teams like GLIF working on liquid staking, Lighthouse on storage endowments, and teams like Fluence enabling compute).

There’s a set of improvements that can dramatically reduce friction for driving on-chain activity, and several efforts are prioritized against this for Q2 2024:

  • FilOz: F3 to bring fast finality to Filecoin can both improve the bridging experience, and enable more “trade” between Filecoin and other economies (e.g. native payments from other ecosystems for services in Filecoin). 
  • FilOz: Refactoring how deals work on Filecoin to enable more flexible payment (e.g. with stablecoins)
  • FilPonto, FilOz: Reducing EVM tech debt to substantially reduce friction for builders porting Solidity contracts onto Filecoin (and hardening the surrounding infrastructure for more stable services)



3) Making Filecoin indispensable to others

This vertical is broad, but we would argue that there are two key ways to consider the impact the Filecoin ecosystem is driving:

  1. The first is along high-profile integrations, where Filecoin is critical to the success of the customer and its proposition. It is especially critical for the ecosystem to provide the necessary support for these cross-chain integrations.
  2. The second is along specific verticals, where there is a large and growing trend in activity; Filecoin is uniquely positioned to provide value here, both in terms of the primitives it has, as well as in its cost profile and scale.
    • Opportunities are brimming in Web3 at the moment, and the ecosystem should rally workstreams around on-ramps that are making Filecoin integral to narratives such as Compute, DePIN (sensors), Social, Gaming, AI, and Chain Archival.


We propose the following metric to evaluate Filecoin’s indispensability:

  1. Number of partnerships and integrations

There are a number of efforts from ecosystem teams aimed at helping onramps succeed on this front in the quarter (Q2 2024): 

  • Ansa Research, Filecoin Foundation, DeStor and others: Forming a new working group to accelerate shared ecosystem BD and marketing resources
    • Shared BD resources for builders in the Filecoin ecosystem
    • Shared Marketing resources and amplification (#ecosystem-amplification-requests in the Filecoin slack) to help signal boost ecosystem wins
    • Community Discord to help expand accessibility, visibility, and drive community engagement



FINAL THOUGHTS

After reading the above, we hope that the direction of Filecoin in the coming year is clearer. Filecoin is at a pivotal moment where many of its pieces are coming together. Protocols and ecosystems naturally evolve and each stage calls for different priorities and strategies for the next leg of growth. By focusing efforts in the ecosystem, we believe that the Filecoin ecosystem can make its resources and support go that much further. 

We are excited for what is to come and how Filecoin can continue to expand the pie for what can be done on Web3 rails. Moving forward, Ansa Research will post periodic updates on the key metrics for Filecoin’s ecosystem progress.

To stay updated on the latest Filecoin happenings, follow the @Filecointldr handle.

Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.

Editor’s Note: This blogpost is a repost of the original content published on 5 April 2024, by Turan Vural and Yuki Yuminaga from Fenbushi Capital. Established in 2015, Fenbushi Capital holds the distinction of being Asia’s pioneering blockchain-focused asset management firm, with an AUM of $1.6 billion. Through research and investment, the firm aims to play a vital role in shaping the future of blockchain tech across diverse sectors. This blogpost is an example of these efforts, and represents the independent view of these authors, who have given permission for this re-publication.

Data availability (DA) is a core technology in the scaling of Ethereum, allowing a node to efficiently verify that data is available to the network without having to host the data in question. This is essential for the efficient building of rollups and other forms of vertical scaling, allowing execution nodes to ensure that transaction data is available during the settlement period. This is also crucial for sharding and other forms of horizontal scaling, a planned future update to the Ethereum network, as nodes will need to prove that transaction data (or blobs) stored in network shards are indeed available to the network.

Several DA solutions have been discussed and released recently (e.g., Celestia, EigenDA, Avail), all with the intent of providing performant and secure infrastructure for applications to post DA.

The advantage of an external DA solution over an L1 such as Ethereum is that it provides an inexpensive and performant vehicle for on-chain data. DA solutions often consist of their own public chains built to enable cheap and permissionless storage. Even with modifications, the fact remains that hosting data natively from a blockchain is extremely inefficient.

Thus, we find that it is intuitive to explore a storage-optimized solution such as Filecoin for the basis of a DA layer. Filecoin uses its blockchain to coordinate storage deals between clients and storage providers but allows data to be stored off-chain.

In this post, we investigate the viability of a DA solution built on top of a Distributed Storage Network (DSN). We consider Filecoin specifically, as it is the most adopted DSN to date. We outline the opportunities that such a solution would offer, and the challenges that need to be overcome to build it.

A DA layer provides the following to services relying on it:

  1. Client Safety: No node can be convinced that unavailable data is available.
  2. Global Safety: The un/availability of data is agreed upon by all except at most a small minority of nodes.
  3. Efficient data retrievability.

All of this needs to be done efficiently to enable scaling. A DA layer provides higher performance at a lower cost across the three points above. For example, any node can request a full copy of the data to prove custody, but this is inefficient. By having a system that provides all three of these, we achieve a DA layer that provides the security required for L2s to coordinate with an L1, along with stronger lower bounds in the presence of a malicious majority.

Custody of Data

Data posted to a DA solution has a useful lifetime: long enough to settle disputes or verify a state transition. Transaction data needs to be available only long enough to verify a correct state transition or to give validators enough opportunity to construct fraud proofs. As of writing, Ethereum calldata is the most common solution used by projects (rollups) requiring data availability.

Efficient Verification of Data

Data Availability Sampling (DAS) is the standard method of answering the question of DA. It comes with additional security benefits, strengthening network actors’ ability to verify state information from their peers. However, it relies on nodes to perform sampling: DAS requests must be answered to ensure mined transactions won’t be rejected, yet nodes that request samples face no reward for doing so and no penalty for skipping DAS altogether. As an example, Celestia provides the first and only light client implementation to perform DAS, delivering stronger security assumptions to users and reducing the cost of data verification.
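
A back-of-the-envelope illustration of why sampling is effective when nodes do perform it, treating samples as independent and using illustrative numbers:

```python
# Chance that a single light client's random samples all land on available
# shares when an adversary withholds a fraction of them.
withheld = 0.5   # fraction of shares withheld by the adversary
samples = 20     # samples requested by one light client
p_fooled = (1 - withheld) ** samples
print(f"P(one client sees no missing share): {p_fooled:.2e}")  # ~9.5e-07
```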

Efficient Access

A DA needs to provide efficient access to data to the projects using it. A slow DA may become the bottleneck for the services relying on it, causing inefficiencies at best and system failures at worst.

Decentralized Storage Network

A Decentralized Storage Network (DSN, as formalized in the Filecoin whitepaper) is a permissionless network of storage providers that offer storage services to users of the network. Informally, it allows independent storage providers to coordinate storage deals with clients, providing cheap and resilient data storage. This is coordinated through a blockchain that records storage deals and enables the execution of smart contracts.

A DSN scheme is a tuple of three protocols: Put, Get, and Manage. This tuple comes with properties such as fault tolerance guarantees and participation incentives.

Put(data) → key
Clients execute Put to store data under a unique key. This is achieved by specifying the duration for which data will be stored on the network, the number of replicas of the data that are to be stored for redundancy, and a negotiated price with storage providers.

Get(key) → data
Clients execute Get to retrieve data that is being stored under a key.

Manage()
The Manage protocol is called by network participants to coordinate the storage space and services made available by providers and repair faults. In the case of Filecoin, this is managed via a blockchain. This blockchain records data deals being made between clients and data providers and proofs of correctly stored data to ensure that data deals are being upheld. Correctly stored data is proved via the posting of proofs generated by data providers in response to challenges from the network. A storage fault occurs when a storage provider fails to generate a Proof-of-Replication or Proof-of-Spacetime promptly when requested by the Manage protocol, which results in the slashing of the storage provider’s stake. Deals can self-heal in the case of a storage fault if more than one provider is hosting a copy of the data on the network by finding a new storage provider to honor the storage deal.
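
A minimal sketch of the Put/Get/Manage interface described above; the in-memory store, deal fields, and content-hash keys are illustrative assumptions, not Filecoin’s actual protocol:

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Deal:
    key: str
    duration_epochs: int   # how long the data must be stored
    replicas: int          # redundancy: number of copies on the network
    price_fil: float       # negotiated price

class ToyDSN:
    def __init__(self):
        self.store = {}

    def put(self, data: bytes, duration_epochs: int, replicas: int, price_fil: float) -> str:
        """Put(data) -> key: store data under the negotiated deal terms."""
        key = hashlib.sha256(data).hexdigest()
        self.store[key] = (data, Deal(key, duration_epochs, replicas, price_fil))
        return key

    def get(self, key: str) -> bytes:
        """Get(key) -> data: retrieve data stored under a key."""
        return self.store[key][0]

    def manage(self) -> None:
        """Placeholder: verify storage proofs, slash faulty providers, repair deals."""
        pass
```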

DSN Opportunities

The work done thus far in DA projects has been to transform a blockchain into a platform for hot storage. Since a DSN is storage-optimized, rather than transforming a blockchain into a storage platform, we can simply transform a storage platform into one that provides data availability. The collateral of storage providers in the form of native FIL token can provide crypto-economic security that guarantees data is stored. Finally, the programmability of storage deals can provide flexibility around the terms of data availability.

The most compelling motivation to transform the capabilities of a DSN to solve DA is the cost reduction in data storage under the DA solution. As we discuss below, the cost of storing data on Filecoin is significantly lower than on Ethereum. Given current Ether/USD prices, it costs over 3 million USD to write 1 GB of calldata to Ethereum, only to be pruned after 21 days. This calldata expense can contribute over half of the transaction cost of an Ethereum-based rollup. However, 1 GB of storage on Filecoin costs less than 0.0002 USD per month. Securing DA at this or any similar price would bring transaction costs down for users and contribute to the performance and scalability of Web3.

Economic Security

In Filecoin, collateral is required to make storage space available. This collateral is slashed when a provider fails to honor its deals or uphold network guarantees. A storage provider that fails to provide services faces losing both its posted collateral and any profit that would have been earned from providing storage.

Incentive Alignment

Many of Filecoin’s protocol incentives align with the goals of DA. Filecoin provides disincentives for malicious or lazy behavior: storage providers must actively provide proofs of storage during consensus in the form of Proof-of-Replication and Proof-of-Spacetime, continuously proving that the storage exists without honest majority assumptions. Failure of a storage provider to provide proof results in stake slashing and removal from consensus, among other penalties. Current DA solutions lack incentives for nodes to perform DAS, relying on ad-hoc altruistic behavior for proof of DA.

Programmability

The ability to customize data deals also makes a DSN an attractive platform for DA. Data deals can have varying durations, allowing users of a DSN-based DA to pay for only the DA that they need. Fault tolerance can also be tuned by setting the number of copies that are to be stored throughout the network. Further customization is supported via smart contracts on Filecoin (called Actors), which are executed on the FEVM. This has led to Filecoin’s growing ecosystem of DApps, from compute-over-storage solutions such as Bacalhau to DeFi and liquid staking solutions such as Glif. Retriev makes use of Filecoin Actors to provide incentive-aligned retrieval with permissioned referees. Filecoin’s programmability can be used to tailor the DA requirements needed for different solutions, so that platforms relying on DA are not paying for more DA than they need.

Challenges to a DSN-Based DA Architecture

In our investigation, we have identified significant challenges that need to be overcome before a DA service can be built on a DSN. As we now talk about the feasibility of implementation, we will use Filecoin as our main focus of the discussion.

Proof Latency

The cryptographic proofs that ensure the integrity of deals and stored data on Filecoin take time to generate. When data is committed to the network, it is partitioned into 32 gigabyte sectors and “sealed.” The sealing of data is the foundation of both the Proof-of-Replication (PoRep), which proves that a storage provider is storing one or more unique copies of the data, and the Proof-of-Spacetime (PoST), which proves that a storage provider stored a unique copy continuously throughout the duration of the storage deal. Sealing has to be computationally expensive to ensure that storage providers aren’t sealing data on demand to undermine the required PoRep. When the protocol presents the periodic challenge to a storage provider to provide proof of unique and continuous storage, sealing has to safely take longer than the response window so that a storage provider can’t falsify proofs or replicas on the fly. For this reason, it can take providers approximately three hours to seal a sector of data.

Storage Threshold

Because of the computational expense of the sealing operation, the sector size of the data being sealed has to be economically worthwhile. The price of storage has to justify the cost of sealing to the storage provider, and likewise, the resulting cost of data being stored has to be low enough at scale (in this case, for an approximately 32GB chunk) for a client to want to store data on Filecoin. Although smaller sectors could be sealed, this would drive up the price of storage to compensate storage providers. To get around this, data aggregators collect smaller pieces of data from users to be committed to Filecoin as a chunk close to 32 GB. Data aggregators commit to users’ data via a Proof-of-Data-Segment-Inclusion (PoDSI), which guarantees the inclusion of a user’s data in a sector, and a sub-piece CID (pCID), which the user will be able to use to retrieve the data from the network.

Consensus Constraints

Filecoin’s consensus mechanism, Expected Consensus, has a block time of 30 seconds and finality within hours, which may improve in the near future (see FIP-0086 for fast finality on Filecoin). This is generally too slow to support the transaction throughput needed for a Layer 2 relying on DA for transaction data. Filecoin’s block time is lower-bounded by storage provider hardware; the lower the block time, the more difficult it is for storage providers to generate and provide proofs of storage, and the more storage providers will be falsely penalized for missing the proving window for the proper storage of data. To overcome this, InterPlanetary Consensus (IPC) subnets can be leveraged to take advantage of faster consensus times. IPC uses Tendermint-like consensus and DRAND for randomness: in the case that DRAND is the bottleneck, we would be able to achieve a 3-second block-time with an IPC subnet. In the case of a Tendermint bottleneck, PoCs such as Narwhal have achieved blocktimes in the hundreds of milliseconds.

Retrieval Speed

The final barrier to build is retrieval. From the constraints above, we can deduce that Filecoin is suitable for cold or lukewarm storage. However, DA data is hot and needs to support performant applications. Incentive-aligned retrieval is difficult in Filecoin: data needs to be unsealed before it is served to clients, which adds latency. Currently, rapid retrieval is done via SLAs or the storage of unsealed data alongside sealed sectors, neither of which can be relied on in the architecture of a secure and permissionless application on Filecoin. Even with Retriev demonstrating that retrieval guarantees can be enforced via the FVM, incentive-aligned rapid retrieval on Filecoin remains an area to be further explored.

Cost Analysis

In this section, we consider the cost that comes from these design considerations. We show the cost of storing 32GB as Ethereum calldata, Celestia blobdata, EigenDA blobdata, and as a sector on Filecoin using near-current market prices.

The analysis highlights the price of Ethereum calldata: 100 million USD for 32 GB of data. This price showcases the cost of the security behind Ethereum’s consensus, and is subject to the volatility of Ether and gas prices. The Dencun upgrade, which introduced Proto-Danksharding (EIP-4844), introduced blob transactions with a target of 3 blobs per block of approximately 125 KB each, and variable blob gas pricing to maintain the target number of blobs per block. This upgrade cut the cost of Ethereum DA to roughly one-fifth: 20 million USD for 32 GB of blob data.
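
As a rough reproduction of the calldata figure, using illustrative assumptions (16 gas per calldata byte applied to every byte, a 60 gwei gas price, and 3,300 USD per ETH, none of which come from the original analysis):

```python
GAS_PER_BYTE = 16        # calldata gas cost per (nonzero) byte
GAS_PRICE_GWEI = 60      # assumed gas price
ETH_USD = 3_300          # assumed ETH price

bytes_32gb = 32 * 10**9
gas = bytes_32gb * GAS_PER_BYTE
eth = gas * GAS_PRICE_GWEI * 1e-9   # convert gwei-denominated gas spend to ETH
print(f"{eth * ETH_USD / 1e6:.0f}M USD")  # ~101M USD for 32 GB of calldata
```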

Celestia and EigenDA provide significant improvements: 8,000 and 26,000 USD for 32 GB of data, respectively. Both are subject to the volatility of market prices and reflect to some extent the cost of consensus securing their data: Celestia with its native TIA token, and EigenDA with Ether.

In all of the above cases, the data stored is not permanent. Ethereum calldata is stored for 3 weeks, with blobs stored for 18 days. EigenDA stores blobs for a default of 14 days. As of the current Celestia implementation, blob data is stored indefinitely by archival nodes but only sampled by light nodes for a maximum of 30 days.

The final two tables are direct comparisons between Filecoin and current DA solutions. Cost equivalence first lists the cost of a single byte of data on the given platform. The amount of Filecoin bytes that can be stored for the same amount of time for the same cost is then shown.

This shows that Filecoin is orders of magnitude cheaper than current DA solutions, costing fractions of a cent to store the same amount of data for the same amount of time. Unlike the nodes of Ethereum and other DA solutions, Filecoin’s nodes are optimized to provide storage services, and its proof system allows nodes to prove storage rather than replicate it across every node in the network. Even without accounting for the economics of storage providers (such as the energy cost to seal data), this shows that the basic overhead of the storage process on Filecoin is negligible – and a market opportunity in the millions of USD per gigabyte compared to Ethereum for a system that can provide secure and performant DA services on Filecoin.

Throughput

Below, we consider the capacity of DA solutions and the demand that is generated by major layer 2 rollups.

Because Filecoin’s blockchain is organized in tipsets with multiple blocks at every block-height, the number of deals that can be done is not restricted by consensus or block size. The strict data constraint of Filecoin is that of its network-wide storage capacity, not what is allowed via consensus.

For daily DA demand, we pull data from Rollups DA and Execution from Terry Chung and Wei Dai, which includes a daily average across 30 days and a singular sampled day. This allows us to consider average demand while not overlooking aberrations from the average (for example, Optimism’s demand on 8/15/2023 of approximately 261,000,000 bytes was over 4x its 30 day average of 64,000,000 bytes).

From this selection, we see that despite the opportunity of lower DA cost, we would need a dramatic increase in DA demand to make efficient use of the 32 GB sector size of Filecoin. Although sealing 32 GB sectors with less than 32 GB of data would be a waste of resources, we could do so while still reaping a cost advantage.

Architecture

In this section, we consider the technical architecture that could be achieved if we were to build this today. We will consider this architecture in the context of arbitrary L2 applications and an L1 chain that the L2 is serving. Since this solution is an external DA solution, like that of Celestia and EigenDA, we do not consider Filecoin as the example L1.

Components

Even at a high-level, a DA on Filecoin will make use of many different features of the Filecoin ecosystem.

Transactions: Downstream users make transactions on a platform that requires DA. This could be an L2.

Platforms Using DA: These are the platforms that use DA as a service. This could be an L2 which posts transaction data to the Filecoin DA and commitments to an L1, such as Ethereum.

Layer 1: This is any L1 that contains commitments pointing to data on the DA solution. This could be Ethereum, supporting an L2 that leverages the Filecoin DA solution.

Aggregator: The frontend of a Filecoin-based DA solution is an aggregator, a centralized component which receives transaction data from L2s and other DA clients and aggregates it into 32 GB sectors suitable for sealing. Although a simple proof-of-concept would include a centralized aggregator, platforms using the DA solution could also run their own aggregator, for example as a sidecar to an L2 sequencer. The centralization of the aggregator can be seen as similar to that of an L2 sequencer or EigenDA’s disperser. Clients are given a guarantee that their data will be included in the sector in the form of a PoDSI (Proof of Data Segment Inclusion), and a pCID to identify their data once it is on the network. This pCID is what would be included in the state commitments on the L1 to reference supporting transaction data.

Verifiers: Verifiers request the data from the storage providers to ensure the integrity of state commitments and build fraud proofs, which are committed to the L1 in the case of provable fraud.

Storage Deal: Once the aggregator has compiled a payload near 32GB, the aggregator makes a storage deal with storage providers to store the data.

Posting blobs (Put): To initiate a put, a DA client submits their blob containing transaction data to the aggregator. This can be done off-chain, or on-chain via an on-chain aggregator oracle. To confirm receipt of the blob, the aggregator returns a PoDSI to the client, proving that their blob is included in the aggregated sector that will be committed to the network. A pCID (sub-piece Content IDentifier) is also returned; this is what the client and any other interested party will use to reference the blob once it is being served on Filecoin.
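The put flow can be sketched as a simple aggregator interface. Everything here (class and method names, the placeholder PoDSI and pCID) is a hypothetical illustration of the flow described above, not a real API:

```python
# Hypothetical sketch of the put flow: clients submit blobs, the aggregator
# returns a PoDSI and pCID, and a storage deal is made once ~32 GiB accumulates.
import hashlib
from dataclasses import dataclass, field

SECTOR_BYTES = 32 * 2**30  # target payload size before sealing

@dataclass
class PutReceipt:
    p_cid: str     # identifies the blob once it is served on Filecoin
    podsi: bytes   # proof that the blob is included in the aggregated sector

@dataclass
class Aggregator:
    pending: list = field(default_factory=list)
    pending_bytes: int = 0

    def put(self, blob: bytes) -> PutReceipt:
        """Accept a DA client's blob; confirm receipt with a PoDSI and pCID."""
        self.pending.append(blob)
        self.pending_bytes += len(blob)
        p_cid = hashlib.sha256(blob).hexdigest()  # stand-in for a real pCID
        podsi = b"podsi-placeholder"              # stand-in for a real PoDSI
        if self.pending_bytes >= SECTOR_BYTES:
            self._make_storage_deal()
        return PutReceipt(p_cid=p_cid, podsi=podsi)

    def _make_storage_deal(self) -> None:
        # In a real system: pad/aggregate blobs into a 32 GiB piece and
        # negotiate a storage deal with storage providers.
        self.pending.clear()
        self.pending_bytes = 0
```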

Data deals would appear on-chain within minutes of being made. The largest contributor to latency is sealing time, which can take 3 hours. This means that although the deal has been made, and the client can be confident that the data will appear on the network, the data cannot be guaranteed to be queryable until the sealing process is complete. The Lotus client has a fast-retrieval feature that stores an unsealed copy of the data alongside the sealed copy; this copy may be servable as soon as it has been transferred to the storage provider, provided the retrieval deal does not depend on the proof of sealed data appearing on the network. However, this functionality is at the discretion of the storage provider and is not cryptographically guaranteed as part of the protocol. Providing a fast-retrieval guarantee would require changes to consensus, with incentive and disincentive mechanisms in place to enforce it.

Retrieving blobs (Get): Retrieval is similar to a put operation. A retrieval deal needs to be made, which will appear on-chain within minutes. Retrieval latency will depend on the terms of the deal and on whether an unsealed copy of the data is stored for fast retrieval. In the fast-retrieval case, latency will depend on network conditions. Without fast retrieval, the data must be unsealed before being served to the client, which takes about as long as sealing, on the order of 3 hours. Thus, without optimizations, we have a maximum round trip of 6 hours; major improvements in data serving would be needed before this becomes a viable system for DA or fraud proofs.

Proof of DA: Proof of DA can be considered in two steps: first, the PoDSI given when the data is committed to the aggregator while the deal is being made; then, the continued PoRep and PoST commitments that storage providers post via Filecoin's consensus mechanism. As discussed above, PoRep and PoST give scheduled and provable guarantees of data custody and persistence.

This solution will make heavy use of bridging, as any client that relies on DA (regardless of the construction of proofs) will need to be able to interact with Filecoin. In the case of the pCID included in the state transition posted to the L1, a verifier can make an initial check to ensure that a bogus pCID wasn't committed. There are several ways this could be done: for example, via an oracle that posts Filecoin data on the L1, or via verifiers that check for the existence of a data deal or sector corresponding to the pCID. Likewise, the verification of validity or fraud proofs posted to the L1 may need to make use of a bridge to be convinced of a proof. Currently available bridges include Axelar and Celer.
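For example, the initial pCID sanity check might look like the following hypothetical sketch, where `filecoin_view` stands in for whichever mechanism gives the verifier visibility into Filecoin state (an L1 oracle, a bridge such as Axelar or Celer, or a self-hosted node):

```python
# Hypothetical verifier-side check that a pCID posted with an L2 state
# commitment actually corresponds to a live data deal on Filecoin.

def check_commitment(p_cid: str, filecoin_view) -> bool:
    """Reject state commitments that reference a bogus pCID."""
    deal = filecoin_view.find_deal_for_piece(p_cid)  # hypothetical lookup
    return deal is not None and deal.is_active()     # hypothetical deal API
```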

Security Analysis

Filecoin's integrity is enforced through the slashing of collateral. Collateral can be slashed in two cases: storage faults and consensus faults. A storage fault corresponds to a storage provider being unable to provide proof of stored data (either PoRep or PoST), which corresponds to a lack of data availability in our model. A consensus fault corresponds to malicious action in consensus, the protocol that manages the transaction ledger from which the FEVM is abstracted.

  • Sector Fault refers to the penalty incurred for failing to post proof of continuous storage. Storage providers are allowed a one-day grace period during which no penalty is incurred for faulty storage. Once a sector has been faulty for 42 days, the sector is terminated. Incurred fees are burnt. The penalty is denominated in the sector's estimated daily block reward BR(t):

BR(t) = ProjectedRewardFraction(t) * SectorQualityAdjustedPower

  • Sector Termination occurs after a sector has been faulty for 42 days or a storage provider purposefully terminates a deal. Termination fees are equivalent to the maximum amount that a sector has earned up to termination, with an upper bound of 90 days' worth of earnings (a code sketch of the fee formula follows this list). Unpaid deal fees are returned to the client. Incurred fees are burnt.

max(SP(t), BR(StartEpoch, 20d) + BR(StartEpoch, 1d) * terminationRewardFactor * min(SectorAgeInDays, 140))

  • Storage Market Actor Slashing occurs in the event of a terminated deal. This is the slashing of the collateral that the storage provider puts up behind the deal.
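To make the termination fee concrete, here is a minimal sketch of the formula above. BR values are treated as caller-supplied estimates of block rewards earned over the given window, and terminationRewardFactor is assumed to be 0.5 purely for illustration; the exact parameters live in the Filecoin actors code.

```python
# Sketch of the termination fee formula quoted above. br_20d and br_1d stand
# for BR(StartEpoch, 20d) and BR(StartEpoch, 1d); termination_reward_factor
# is assumed to be 0.5 here for illustration only.

def termination_fee(sp_t: float, br_20d: float, br_1d: float,
                    sector_age_days: float,
                    termination_reward_factor: float = 0.5) -> float:
    """max(SP(t), BR(StartEpoch, 20d)
              + BR(StartEpoch, 1d) * factor * min(SectorAgeInDays, 140))"""
    capped_age = min(sector_age_days, 140)
    return max(sp_t, br_20d + br_1d * termination_reward_factor * capped_age)

# Example: a 10-day-old sector with SP(t) = 5 FIL and assumed BR estimates.
print(termination_fee(sp_t=5.0, br_20d=4.0, br_1d=0.2, sector_age_days=10))
```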

The security provided by Filecoin is very different from that of other blockchains. Whereas blockchain data is typically secured via consensus, Filecoin's consensus secures only the transaction ledger, not the data referred to by the transactions. Data stored on Filecoin has only enough security to incentive-align storage providers to provide storage; it is secured via fault penalties and business incentives such as reputation with clients. In other words, a data fault on a typical blockchain is equivalent to a breach of consensus, breaking the safety of the chain or its notion of transaction validity. Filecoin, by contrast, is designed to be fault tolerant when it comes to data storage, and therefore uses its consensus only to secure its dealbook and deal-related activities. The cost to a storage miner of not fulfilling its data deal is capped at 90 days' worth of storage rewards in penalties, plus the loss of the collateral the miner put up to secure the deal.

Therefore, the cost of a data withholding attack launched by Filecoin providers is simply the opportunity cost of a retrieval deal. Data retrieval on Filecoin relies on the storage miner being incentivized by a fee paid by the client. However, there is no negative consequence for a miner that does not respond to a data retrieval request. To mitigate the risk of a single storage miner ignoring or refusing retrieval deals, data on Filecoin can be stored by multiple miners.

Since the economic security behind data stored on Filecoin is considerably less than that of blockchain-based solutions, the prevention of data manipulation must also be considered. Data manipulation is guarded against by Filecoin's proof system. Data is referred to via CIDs, through which corruption is immediately detectable: a provider cannot serve corrupt data, as it is easy to verify whether the fetched data matches the requested CID. Nor can providers store corrupted data in place of uncorrupted data: upon receipt of client data, providers must present proof of a correctly sealed data sector to initiate the data deal, so a storage deal cannot be started with corrupt data. During the lifetime of the storage deal, PoSTs are provided to prove custody (recall that this proves both custody of the sealed data sector and custody since the last PoST). Since the PoST relies on the sealed sector at the time of proof generation, a corrupt sector would produce a bogus PoST, resulting in a sector fault. A storage provider can therefore neither store nor serve corrupted data, cannot claim rewards without custody of the uncorrupted data, and cannot avoid being penalized for tampering with a client's data.
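The retrieval-side check is simple enough to sketch directly. A real pCID is a multihash over the (padded) piece rather than a bare SHA-256 digest, so the following is a simplification of the idea, not Filecoin's actual CID format:

```python
# Why corrupt data is immediately detectable: with content addressing, the
# identifier is a hash of the data itself. (SHA-256 hex is a simplification
# of a real CID.)
import hashlib

def cid_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_retrieval(requested_cid: str, fetched: bytes) -> bool:
    """Accept fetched bytes only if they hash back to the requested CID."""
    return cid_of(fetched) == requested_cid

blob = b"rollup transaction batch"
cid = cid_of(blob)
assert verify_retrieval(cid, blob)                  # intact data passes
assert not verify_retrieval(cid, blob + b"tamper")  # corrupted data fails
```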

Security can be strengthened by increasing the collateral committed by the storage provider to the Storage Market Actor, which is currently negotiated between the storage provider and the client. If we assume this collateral is sufficiently high (for example, the same stake as an Ethereum validator) to incentivize a provider not to default, we can consider what remains to be secured (even though this would be extremely capital-inefficient, as this stake would be needed to secure each transaction blob or sector of aggregated blobs). Even then, a data provider could choose to make data unavailable for stretches of up to 41 days at a time before the storage deal is terminated by the Storage Market Actor; for a shorter data deal, the data could be made unavailable until the last day of the deal. In the absence of coordinated malicious actors, this can be mitigated via replication across multiple storage providers so that the data can continue being served.
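As a toy model (an assumption of this post, not a protocol guarantee): if each of k storage providers independently withholds data with probability p, all replicas are simultaneously unavailable with probability p^k, so availability improves exponentially with the number of replicas.

```python
# Toy replication model: independent withholding with probability p per
# provider. Real providers may fail in correlated ways, which this ignores.

def prob_all_withheld(p: float, k: int) -> float:
    return p ** k

for k in (1, 3, 5):
    print(k, prob_all_withheld(0.1, k))  # 0.1, then 0.001, then 1e-05
```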

We can also consider the cost of an attacker overriding consensus to either accept a bogus proof or rewrite ledger history to remove a deal from the orderbook without penalizing the responsible storage provider. It is worth noting that in the case of such a safety violation, an attacker would be able to manipulate Filecoin's ledger however they want. To commit such an attack, the attacker would need at least a majority of stake in the Filecoin chain. Stake is tied to storage provided to the network; with roughly 25 EiB (1 EiB = 2⁶⁰ ≈ 1.15 × 10¹⁸ bytes) of data currently securing the Filecoin chain, at least 12.5 EiB would be needed for a malicious actor to offer its own chain that would win the fork-choice rule. This is further mitigated by slashing for consensus faults, for which the penalty is the loss of all pledged collateral and block rewards, plus suspension from participation in consensus.

Aside: Withholding attacks on other DA solutions
Although the above shows that Filecoin's protection of data against withholding attacks is limited, it is not alone.

  • Ethereum: In general, the only way to guarantee that a request to the Ethereum network is answered is to run a full node. Full nodes have no obligation to fulfill data retrieval requests outside of consensus, and can therefore ignore them without penalty. Constructs such as PeerDAS introduce a peer-scoring system for a node's responses to data retrieval, in which a node with a low enough score (essentially a DA reputation) can be isolated from the network.
  • Celestia: Even though Celestia has much stronger per-byte security against withholding attacks than our Filecoin construction, the only way to take advantage of this security is to host your own full node. Requests to Celestia infrastructure that is not owned and operated in-house can be censored without penalty.
  • EigenDA: Similar to Celestia, any service can run an EigenDA Operator node to ensure retrieval of its own data. As such, any out-of-protocol data retrieval request can be censored. Also note that EigenDA has a centralized and trusted disperser in charge of data encoding, KZG commitments, and data dispersal, similar to our aggregator.

Retrieval Security

Retrievability is necessary for DA. Ideally, market forces motivate economically rational miners to accept retrieval deals and to compete with other miners to keep prices low for clients. It is assumed that this is enough for data providers to offer retrieval services; however, given the importance of DA, it is reasonable to require more security.

Retrieval is currently not guaranteed by the economic security described above. This is because it is cryptographically difficult to prove, in a trust-minimized manner, that data wasn't received by a client (in the case where a client needs to refute a storage miner's claim of having sent the data). A protocol-native retrieval guarantee would be required for retrieval to be secured through Filecoin's economic security. With minimal changes to the protocol, this means that retrieval failures would need to be associated with a sector fault or deal termination. Retriev is a proof-of-concept that was able to provide data retrieval guarantees by using trusted "referees" to mediate data retrieval disputes.

Aside: Retrieval on other DA solutions
As can be seen above, Filecoin lacks the protocol-native retrieval guarantees necessary to keep storage (or retrieval) providers from acting selfishly. In the case of Ethereum and Celestia, the only way to guarantee that data can be read from the protocol is to self-host a full node or trust an SLA from an infrastructure provider. The analogous setting in Filecoin is not trivial: one would have to become a storage provider (at significant infrastructure cost) and successfully win the very storage deal one posted as a user, at which point one would be paying oneself to provide storage to oneself.

Latency Analysis

Latency on Filecoin is determined by several factors, such as network topology, storage client configuration, and hardware capabilities. We provide a theoretical analysis that discusses these factors and the performance that can be expected of our construction.

Due to the design of Filecoin's proof system and its lack of retrieval incentives, Filecoin is not optimized for high-performance round-trip latency from the initial posting of data to its first retrieval. High-performance retrieval on Filecoin is an active area of research that is constantly changing as storage providers increase their capabilities and as Filecoin introduces new features. We define a "round trip" as the time from the submission of a data deal to the earliest moment the submitted data can be downloaded from Filecoin.

Block Time
In Filecoin's Expected Consensus, data deals can be included within the 30-second block time. One hour is the typical confirmation time for sensitive on-chain data (such as coin transfers).

Data Processing
Data processing time varies widely between storage providers and configurations. The sealing process is designed to take 3 hours with standard storage mining hardware. Miners often beat this 3-hour threshold via special client configurations, parallelization, and investment in more capable hardware. This variation also affects the duration of sector unsealing, which can be circumvented altogether by the fast-retrieval options in Filecoin client implementations such as Lotus. The fast-retrieval setting stores an unsealed copy of data alongside the sealed data, significantly speeding up retrieval. Based on this, we can assume a worst-case delay of three hours from the acceptance of a data deal to when the data is available on-chain.
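Putting the nominal figures in this section together gives the round-trip bounds quoted earlier:

```python
# Worst-case round-trip estimate from the nominal durations above
# (30 s block time, ~3 h sealing, ~3 h unsealing); these are not measurements.

BLOCK_TIME_H = 30 / 3600  # deal inclusion within one block
SEAL_H = 3                # nominal sealing time
UNSEAL_H = 3              # unsealing takes roughly as long as sealing

def round_trip_hours(fast_retrieval: bool) -> float:
    if fast_retrieval:
        # Unsealed copy kept alongside the sealed sector; retrieval is then
        # dominated by network transfer, treated as ~0 here.
        return BLOCK_TIME_H + SEAL_H
    return BLOCK_TIME_H + SEAL_H + UNSEAL_H

print(round_trip_hours(True))   # ~3 hours
print(round_trip_hours(False))  # ~6 hours
```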

Conclusion and Future Directions

This article explores building a DA solution by leveraging an existing DSN, Filecoin. We considered the requirements of DA with respect to its role as a critical element of Ethereum's scaling infrastructure, assessed the viability of DA on a DSN by building on top of Filecoin, and used this to weigh the opportunities such a solution would offer the Ethereum ecosystem, or any ecosystem that would benefit from a cost-effective DA layer.

Filecoin proves that a DSN can dramatically improve the efficiency of data storage in a distributed, blockchain-based system, with savings on the order of 100 million USD per 32 GB written at current market prices. Even though demand for DA is not yet high enough to fill 32 GB sectors, the cost advantage holds even if partially empty sectors are sealed. Although the current latency of storage and retrieval on Filecoin is not appropriate for hot-storage needs, storage-miner-specific implementations can provide reasonable performance, with data available in under 3 hours.

Trust in Filecoin storage providers can be tuned via variable collateral, as in EigenDA. Filecoin extends this tunable security by allowing a number of replicas to be stored across the network, adding tunable Byzantine tolerance. Guaranteed and performant data retrieval would need to be solved to robustly deter data withholding attacks; however, as with any other solution, the only way to truly guarantee retrievability is to self-host a node or trust infrastructure providers.

We see opportunities for DA in the further development of PoDSI, which could be used (alongside Filecoin's current proofs) in place of DAS to guarantee data inclusion in a larger sealed sector. Depending on how this develops, it may make a slow turnaround of data tolerable, as fraud proofs could be posted within a window of 1 day to 1 week, while DA could be guaranteed on demand. PoDSIs are still new and under heavy development, so we make no claims yet about what an efficient PoDSI could look like, or about the machinery needed to build a system around it. Given that solutions already exist for compute on top of Filecoin data, a system that computes a PoDSI over sealed or unsealed data may not be out of the realm of near-future possibilities.

As both the field of DA and Filecoin grow, new combinations of solutions and enabling technologies may give rise to new proofs of concept. As Solana's integration with the Filecoin network shows, DSNs hold potential as a scaling technology. The cost of data storage on Filecoin presents an open opportunity with a wide margin for optimization. Although the challenges discussed in this article are presented in the context of enabling DA, their eventual solutions will open up a plethora of new tools and systems to be built beyond DA.

¹ Although this isn’t the construction of Filecoin, it is useful for those who are unfamiliar with programmable decentralized storage.

Graph data from the Filecoin spec, EIP-4844, EigenDA, the Celestia implementation, Celenium, Starboard, file.app, Rollups DA and Execution, and current approximate market prices.

For more research pieces from Fenbushi Capital, check out their Medium page here.

To stay updated on the latest Filecoin happenings, follow the @Filecointldr handle.

Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.