2024 has been a pivotal year for Filecoin, with significant progress in the Filecoin Virtual Machine (FVM), Storage, Retrievals and Compute. In this blogpost, we’ll recap the key milestones of 2024 and take a look at the major growth drivers shaping Filecoin’s path into 2025.
Accelerating Paid Deals: Boosting paid services (storage, retrieval, compute) on Filecoin to generate cashflow for service providers. This helps to support more sustainable hardware funding beyond token incentives.
Growing On-Chain Activity: Increasing activity through programmable services, DeFi, and new use cases.
Becoming Indispensable: Establishing Filecoin as an integral component of other projects and businesses.
These priorities are not mutually exclusive – they layer onto each other and are all signs that the Filecoin ecosystem is growing increasingly valuable.
So how did we fare across these priorities in 2024?
1. Accelerating Paid Deals
Paid Deals is an ecosystem-level metric that reflects the volume of paid services within the Filecoin network. FilecoinTLDR is currently tracking this metric here.
In 2024, Filecoin made significant strides in accelerating paid deals by reducing friction for businesses entering the ecosystem, with key advancements like the development of Proof of Data Possession (PDP) and the emergence of Layer 2 solutions.
Enabling Efficient Hot Storage with PDP
Projected for Q1 2025, Proof of Data Possession (PDP) introduces a new proof primitive to the Filecoin network, marking the first major proof development since Proof of Replication (PoRep) and Proof of Spacetime (PoSt). Unlike PoRep, which excels at cold storage through sealed sectors, PDP is designed for “hot data”, which is data that needs fast and frequent retrieval.
This new proof type enables cost-effective “cache” storage on Filecoin without sealing and unsealing, enabling rapid data onboarding and retrieval. PDP opens the door for a new class of storage providers focused on hot storage and fast retrievals, benefiting onramps like Basin, Akave, and Storacha.
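To build intuition for how a possession proof can work in principle, here is a toy challenge-response sketch in Python. This is purely illustrative and is not Filecoin’s actual PDP construction – the block size, hash choice, and protocol flow are all simplifying assumptions, and a real scheme lets the verifier check proofs without holding a full copy of the data.

```python
import hashlib
import os
import secrets

BLOCK_SIZE = 1024  # assumed block size for this toy example

def split_blocks(data: bytes) -> list[bytes]:
    """Split the stored data into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

def respond(blocks: list[bytes], index: int, nonce: bytes) -> bytes:
    """Provider: hash the challenged block together with a fresh nonce."""
    return hashlib.sha256(nonce + blocks[index]).digest()

data = os.urandom(10 * BLOCK_SIZE)   # the client's file
blocks = split_blocks(data)

# Verifier issues a random challenge; the fresh nonce prevents replayed answers.
index = secrets.randbelow(len(blocks))
nonce = secrets.token_bytes(16)

proof = respond(blocks, index, nonce)                      # storage provider
expected = hashlib.sha256(nonce + blocks[index]).digest()  # verifier's check
assert proof == expected  # provider demonstrably still holds that block
```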
Scaling Filecoin with L2s
In 2024, we saw a rise in Layer 2 solutions built on top of Filecoin (we also covered this in our earlier blogpost “State of L2s on Filecoin”). L2s like Basin, Akave and Storacha enable both horizontal and vertical scaling with secure, customizable subnets. These L2s enhance Filecoin by unlocking new use cases – managing data-intensive workloads, supporting AI and unstructured data, and powering gaming and privacy-focused applications – all of which create more opportunities for paid deals.
2. Growing On-Chain Activity
Filecoin has made notable progress in accelerating on-chain activity through the FVM, which spurred growth in its DeFi economy. The proposed Filecoin Web Services (FWS) and launch of FIL-collateralized stablecoins are set to further boost this momentum.
DeFi Milestones
As of December 16, 2024, more than 4,700 unique contracts have been deployed on FVM, enabling over 3 million transactions. DeFi activity on FVM saw average net deposits exceeding 30M FIL ($200M), driven by staking, liquid staking, and DEXs, with GLIF leading at 62%, followed by FilFi (10%) and SFT Protocol (9%). Net borrows averaged 26M FIL ($173M), highlighting strong growth in Filecoin’s DeFi ecosystem.
FIL-Collateralized Stablecoin for the Filecoin Ecosystem
USDFC is a FIL-backed stablecoin launched by Secured Finance in Q4 2024 to address key challenges in the Filecoin ecosystem. It introduces stability to a network previously lacking stablecoin options, reducing volatility and enhancing value storage, much like DAI did for Ethereum.
By allowing FIL holders and SPs to collateralize their assets for USD, USDFC helps cover operational costs without selling FIL, preserving asset value and network support. It also boosts liquidity in lending markets by providing FIL-backed stablecoin liquidity, driving more efficient capital flows within the Filecoin ecosystem.
3. Becoming Indispensable
DePIN gained prominence, with Filecoin strengthening its position through key partnerships with AI and compute projects. Meanwhile, on-chain archival received significant recognition through major on-ramp partnerships.
“…thanks to Filecoin for building an awesome decentralized archive layer.” – Anatoly (Solana Co-Founder)
Notable On-Ramps of 2024
At Solana Breakpoint this year, Filecoin founder Juan Benet highlighted how Filecoin’s zero-knowledge (ZK) storage is securing the entire Solana ledger.
Similarly, Cardano apps now have the opportunity to boost data redundancy and decentralization through the Blockfrost integration with Filecoin.
SingularityNET’s integration with Filecoin (via Lighthouse) emphasizes the growing need for scalable and cost-effective storage in the AI-driven era, where managing vast amounts of data efficiently is critical.
These meaningful partnerships help position Filecoin as a key player in both the Chain Archival and AI narratives.
Compute & AI Partnerships
This year, Filecoin has positioned itself as a key player in the growing field of Decentralized AI. The emergence of projects within the ecosystem like Ramo (network participation), Bagel (AI & cryptography research), Swan Chain (AI training and development), and Lilypad (distributed compute for AI) highlights Filecoin’s expanding role in powering AI innovation.
2024 Filecoin Challenges
Despite the immense progress, we noted some challenges that the community faced – though it is worth bearing in mind that Web3 products are still very early, and forming a credible alternative to the centralized cloud is a huge problem statement.
Product Market Fit:
Roadblocks like limited retrievability and high costs (driven by data replication) challenge the efficiency of the Filecoin network.
There is a need to make payments easier by allowing transactions directly on the Filecoin network, using methods like stablecoins or flexible payment options.
Improving visibility into the onboarding process and using customer data can help refine strategies and boost performance in key areas.
Building a Sustainable Economic Model + Stronger Economic Loops:
Viewing Filecoin as an island economy highlights its focus on accruing value by exporting goods and services while also keeping as much value as possible within the network by minimizing outflows.
A key challenge lies in reducing external outflows while finding ways to boost exports and capture more demand within the ecosystem.
Ensuring that transactions remain on-chain is equally crucial to strengthening this economic model and creating stronger economic loops.
Filecoin’s 2025 Outlook
Looking ahead to 2025, Filecoin’s evolution continues. Here are three key themes that could drive transformative growth for the network while addressing the 2024 challenges outlined above.
1. Accelerating Filecoin by 450x with Fast Finality (F3)
Fast Finality (F3) is one of the most impactful upgrades to Filecoin’s consensus layer since the launch of its mainnet. By drastically reducing transaction finality times, F3 overcomes a key limitation of the network’s original consensus mechanism. This enhancement is scheduled to go live on the mainnet in Q1 2025.
Old vs. New Finality:
Before F3, Filecoin’s consensus mechanism ensured secure block validation but required 7.5 hours (900 epochs) to finalize transactions, which was too slow for applications like smart contracts or cross-chain bridges.
With F3, transactions can now optimistically finalize in minutes – a 450x improvement.
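The arithmetic behind the 450x figure is straightforward: at Filecoin’s 30-second epochs, 900 epochs × 30 seconds = 27,000 seconds, or 450 minutes. Finalizing in roughly one minute instead of 450 yields the ~450x improvement.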
What this means for Filecoin:
Enhanced Speed & UX: Transactions finalize within minutes, enabling low-latency applications and eliminating the long waits previously experienced.
Expanded Use Cases & Accessibility: L2 subnets like InterPlanetary Consensus (IPC), efficient smart contracts and decentralized applications, and blockchain bridges for interoperability with other chains.
Ultimately, this allows Filecoin to improve its usability across a wider variety of applications.
2. Moving Beyond Storage with FWS
Filecoin Web Services (FWS) emerged this year as a pivotal concept. It represents a strategic shift for Filecoin, expanding its scope from primarily a decentralized storage network to a broader marketplace for blockchain-based cloud services. This diversification can attract a wider range of users and use cases, potentially creating more positive economic loops within the network. Here are some pointers on why FWS should be on your radar:
Strengthens Filecoin’s Competitive Edge: FWS will introduce features like Programmatic SLAs (which automate and enforce service agreements through smart contracts, ensuring clear performance expectations and penalties) and Verifiable Proofs (which provide cryptographic evidence of service delivery, allowing clients to independently verify service execution).
Expands Filecoin’s Capabilities: Goes beyond Proof of Replication (PoRep) by adding Proof of Data Possession (PDP), enabling robust hot storage use cases. PDP will help improve data retrievability, a crucial factor in achieving product-market fit that has been widely discussed within the Filecoin community this year.
Positions Filecoin as a leading platform in the decentralized web: FWS will facilitate the integration of multiple networks and protocols, creating a cohesive marketplace for storage, compute, bandwidth, and other services. This could make Filecoin a key player in the growth of the decentralized web.
FWS is currently a concept in development, with a new storage service featuring PDP (v0) underway. Following this milestone, the development of the FWS marketplace will begin with its expected launch in Q1 2025.
3. Unlocking new value streams in Filecoin
As a Layer 1 blockchain, Filecoin primarily generates revenue through gas fee burns (which happen when chain resources are used or when faults arise). However, relying on gas fee burns as a main source of revenue is not scalable and, more importantly, increases operational expenses as well as service costs.
A sustainable approach involves value returning to the Filecoin economy through the use of services in the FWS marketplace, fostering a more scalable and balanced revenue model. Proposed value accrual mechanisms include:
FWS Fees: Commission (%) charged based on the transaction volume in the marketplace.
Service Fees: Applied when a user accesses a service or a vendor provides one.
SLA Penalties: Imposed on service providers who fail to meet agreed-upon performance standards.
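As a rough illustration of how these three mechanisms could compose, here is a minimal Python sketch; the rates and deal structure are hypothetical placeholders, not proposed network parameters.

```python
from dataclasses import dataclass

@dataclass
class FwsDeal:
    volume_fil: float        # transaction volume settled through the marketplace
    service_fee_fil: float   # fee paid when the service is accessed or provided
    sla_met: bool            # did the provider meet the agreed-upon SLA?

# Hypothetical rates, for illustration only.
MARKETPLACE_COMMISSION = 0.01  # 1% commission on marketplace volume
SLA_PENALTY_RATE = 0.05        # 5% of volume forfeited on an SLA breach

def value_accrued(deal: FwsDeal) -> float:
    """Total value flowing back to the Filecoin economy from one deal."""
    total = deal.volume_fil * MARKETPLACE_COMMISSION + deal.service_fee_fil
    if not deal.sla_met:
        total += deal.volume_fil * SLA_PENALTY_RATE  # penalty also accrues
    return total

# Example: a 1,000 FIL deal with a 2 FIL service fee and a missed SLA.
print(value_accrued(FwsDeal(1000.0, 2.0, sla_met=False)))  # 62.0 FIL
```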
This shift promises a more robust and diversified revenue stream, ensuring Filecoin’s continued relevance and profitability in the evolving market.
Final Thoughts
As data grows in value, we expect advancements in privacy-preserving machine learning, data-driven business models, and the increasing role of AI agents in unlocking decentralized storage’s potential.
Looking towards 2025, with the upcoming Fast Finality (F3) launch on the mainnet and the continued development of Filecoin Web Services, Filecoin is set to play a central role in shaping the future of data and AI within decentralized ecosystems. We expect to see these advancements positioning Filecoin beyond storage and unlocking a sustainable economic model through new revenue streams generated by FWS.
To stay updated on the latest in the Filecoin ecosystem, follow the @Filecointldr handle or join us on Discord.
Many thanks to HQ Han and Jonathan Victor for reviewing and providing valuable insights to this piece.
Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.
At first, the convergence of artificial intelligence (AI) and blockchain seemed like an awkward pairing of buzzwords—a notion often met with skepticism among early adopters. But in merely a year’s time, decentralized AI has evolved from being an obscure idea to one that is central to conversations around the Web3 environment. Such swift transformation owes its momentum to a few crucial elements:
Influence of AI: AI is set to significantly impact how we interact with the world. As AI agents grow more sophisticated, they will manage tasks like financial transactions and personal coaching. This evolution raises important questions about control and governance in AI development.
The Risks of Centralized Power: Centralized AI models controlled by a few tech giants pose serious risks, including bias, censorship, and data privacy concerns. This concentration of power stifles innovation and creates vulnerabilities, as highlighted by the recent security breach at Hugging Face.
The Demand for an Inclusive AI Ecosystem: Decentralized AI offers a pathway to a more equitable and accessible AI landscape by distributing computational processes across various systems. Key benefits include:
Reduced Costs: Lower barriers enable smaller developers and startups to innovate in AI.
Enhanced Data Integrity: Verifiable data provenance increases transparency and trust in AI models.
Combating Censorship: Aligning AI development with market needs fosters a more democratic technological environment.
These points highlight the value of an alternative approach to centralized AI.
The Pillars of Decentralized AI
Decentralized AI comprises three pillars: secure decentralized storage, decentralized compute drawing on idle computing power from users, and transparent data labeling.
Decentralized Storage: Utilizing decentralized storage networks like Filecoin ensures secure and verifiable storage for large datasets.
Decentralized Compute: By leveraging idle computing power from individual users and distributing tasks across a network, Decentralized AI makes AI development more accessible and cost-effective.
Decentralized Data Labeling and Verification: Transparent and verifiable data labeling processes help ensure data quality and reduce bias, fostering trust in AI systems.
A closer look: Decentralized AI Projects in the Filecoin Ecosystem
To take a closer look into how the Web3 stack can offer benefits to the AI space, we’ll explore the various approaches four decentralized AI projects are taking. These projects are utilizing some or all of the pillars of decentralized AI as outlined above.
Ramo – Coordinating Capital and Hardware for AI Workloads
Ramo plays a crucial role in powering AI workloads by coordinating capital and hardware. By merging resources from various providers, Ramo facilitates the execution of complex tasks such as storage, SNARK generation, and computation, while allowing hardware resources to be jointly funded across multiple networks.
Multi-Network Jobs: Ramo supports jobs across multiple networks (e.g., read from Filecoin, process on Fluence, write back to Filecoin), helping maximize hardware providers’ revenue and reduce coordination complexity.
Swan Chain – Decentralized AI Training and Deployment (Funding Stage: Seed)
Swan Chain is a decentralized compute network, connecting users with idle computing resources for AI tasks like model training. Filecoin serves as its primary storage layer, ensuring secure, transparent, and accessible storage of AI data, aligning with the principles of decentralized AI.
Decentralized Compute Marketplace: Swan Chain aggregates global computing resources, offering a cost-effective alternative to centralized cloud services. Users can bid for computing jobs, and Swan Chain matches them with suitable providers based on requirements.
Filecoin Integration for Secure Data Storage: Swan Chain utilizes Filecoin and IPFS to securely store AI models and outputs, ensuring transparency and accountability in the AI development process.
Support for Diverse AI Workloads: Swan Chain supports various AI tasks, including model training, inference, and rendering, with examples like large language models and image/music generation.
Lilypad – Distributed Compute for AI (Funding Stage: Seed)
Lilypad aims to build a trustless, distributed compute network that unleashes idle processing power and creates a new marketplace for AI, machine learning, and other large-scale computations. By integrating Filecoin and utilizing IPFS for hot storage, Lilypad ensures secure, transparent, and verifiable data handling throughout the AI workflow, supporting an open and accountable AI development landscape.
Job-Based Compute Matching: Lilypad’s job-based model matches user-defined compute needs (e.g., GPU type, resources) with providers, creating a marketplace for developers to share and monetize AI models within the decentralized AI ecosystem.
Bagel – AI & Cryptography Research Lab (Funding Stage: Pre-Seed)
Bagel is an AI and cryptography research lab creating a decentralized machine learning ecosystem that enables AI developers to train and store models using the computing and storage power of decentralized networks like Filecoin. Its innovative GPU Restaking technology enhances Filecoin’s utility for AI applications by allowing storage providers (SPs) to contribute to both storage and compute networks simultaneously, thereby expanding support for AI developers and generating new revenue opportunities for SPs.
Increased Revenue for Filecoin SPs: Bagel helps storage providers monetize both storage and compute resources, boosting their income and incentivizing greater network participation.
Optimized Compute Utilization: With dynamic routing, Bagel directs GPUs to profitable networks, maximizing efficiency and returns for providers and users.
In Conclusion
The intersection of Filecoin and AI marks a significant step forward in the evolution of technology. By combining verifiable storage with computing networks, we are not only addressing current challenges but also paving the way for future innovations. As these technologies continue to develop, their impact on AI and beyond will be profound, offering new possibilities for businesses and developers alike.
To understand more about Ramo, Swan Chain, Lilypad or Bagel dive into the respective keynotes and links here:
To stay updated on the latest in the Filecoin ecosystem, follow the @Filecointldr handle or join us on Discord.
Many thanks to HQ Han and Jonathan Victor for reviewing and providing valuable insights to this piece.
Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.
Layer 2 solutions (L2s) are essential innovations in blockchain technology that enhance the scalability, efficiency, and functionality of their respective networks. For Filecoin, an L1 focused on decentralized storage, L2 solutions play a crucial role in bringing new capabilities to the base infrastructure of the network.
As Filecoin continues to grow, L2 solutions are helping bring Filecoin to market, and creating tailored offerings to builders focused on specific verticals. This post explores the current state of L2 solutions on Filecoin, highlighting the pioneering advancements and future directions.
Underlying Architecture
Before diving into the L2s, it’s useful to understand the shared framework that many of Filecoin’s L2s are built on: InterPlanetary Consensus (IPC).
InterPlanetary Consensus (IPC) is a framework designed to solve the problem of scalability in decentralized applications (dApps). IPC achieves this by allowing the creation of subnets, which are independent blockchains that can be customized with specific consensus algorithms tailored to the needs of the application. These subnets can communicate seamlessly with each other, minimizing the need for cross-chain bridges, while remaining anchored into the root Filecoin network.
Builders are drawn to IPC for several reasons. First, IPC subnets inherit the security features of their parent network, ensuring a high level of security for the dApps they host. Second, IPC leverages the Filecoin Virtual Machine (FVM), a versatile execution environment that supports various programming languages, allowing for greater interoperability with other blockchains. Finally, IPC’s tight integration with Filecoin, a large decentralized storage network, offers dApps easy access to robust data storage and retrieval capabilities. This combination of scalability, security, interoperability, and storage integration makes IPC an attractive choice for developers building the next generation of dApps.
Advancements in Data Management with Filecoin’s L2 Solutions
As the demand for efficient data management increases, Filecoin’s Layer 2 solutions are rising to meet these needs. These advancements focus on optimizing data storage and retrieval, offering enhanced scalability and cost-effectiveness across various applications. Basin is one such startup leading the charge.
Basin, the first data Layer 2 on Filecoin, represents an advancement in decentralized data infrastructure, bringing a swath of new services into the Filecoin ecosystem targeted at data-heavy applications:
Key Features and Innovations
Hot and Cold Data Layers: Basin’s dual-layer approach incorporates a hot cache layer for real-time data access and a cold storage component for long-term archiving. This setup ensures both immediate accessibility and cost-effective storage, catering to diverse data needs.
Scalable Infrastructure: Basin’s architecture combines Filecoin’s secure storage capabilities with a flexible, scalable design ideal for handling high-volume data from IoT and AI applications.
Familiar Interfaces: Basin supports compatibility with S3, allowing developers to use familiar tools for managing data, which facilitates a smoother transition to decentralized solutions.
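Because Basin exposes an S3-compatible interface, existing S3 tooling should carry over largely unchanged. The sketch below uses boto3 to illustrate the idea; the endpoint URL, bucket name, and credentials are hypothetical placeholders, not real Basin values.

```python
import boto3

# Point a standard S3 client at an S3-compatible endpoint.
# Endpoint and credentials below are hypothetical placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.basin.example",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# Familiar S3 operations then work as usual.
s3.put_object(
    Bucket="weather-data",
    Key="station-42/reading.json",
    Body=b'{"temp_c": 21.3}',
)
obj = s3.get_object(Bucket="weather-data", Key="station-42/reading.json")
print(obj["Body"].read())
```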
Basin is being actively used in real-world applications, such as handling weather data for decentralized stations with WeatherXM, and generating synthetic data for smart contracts. These use cases highlight Basin’s ability to efficiently store, manage, and monetize diverse data types, advancing practices in AI and machine learning.
Simplifying Decentralized Storage: Innovations and Challenges
Efficiently managing decentralized storage involves overcoming challenges related to user accessibility, cost, and integration. Providing more intuitive and cost-effective tools for data management will help address these challenges – and this is where Akave comes in.
Akave is the first L2 storage chain powering on-chain data lakes, offering a novel approach to managing large volumes of data within a decentralized network. Data lakes are used in traditional enterprises to manage all types of data – typically feeding into large scale compute flows (e.g. for big data analytics). By leveraging Filecoin’s infrastructure, Akave aims to become a leading solution in decentralized data management, with a focus on enhancing data handling capabilities and integrating advanced security measures.
Key Features and Innovations
On-Chain Data Management: Akave focuses on creating on-chain data lakes, which provide a highly scalable and secure solution for managing large volumes of data directly on Filecoin.
Advanced Data Handling: The platform supports customizable data handling options such as replication policies and erasure coding, enhancing data security and availability.
Integration with Filecoin: Akave leverages Filecoin’s blockchain for improved data management, security, and decentralization.
Akave’s Decentralized Data Lakes revolutionize data storage with faster local access by placing data close to compute stacks, cutting egress costs compared to centralized clouds, and ensuring immutability and integrity through ZK Proofs. Users benefit from competitive pricing and diverse options via an open marketplace. Continuous visibility into data status and history is provided through Akave’s integration with Filecoin’s InterPlanetary Consensus (IPC), enhancing transparency and trust.
Enhancing AI and Unstructured Data Storage with Filecoin’s L2 Solutions
In the realm of AI and unstructured data, specialized storage solutions are crucial for managing and processing large datasets efficiently. Filecoin’s Layer 2 solutions like Storacha Network are stepping up to provide high-performance storage tailored for these needs.
Storacha Network is a cutting-edge storage solution designed to enhance the management of AI and unstructured data. Leveraging Filecoin’s robust infrastructure, Storacha Network offers high-performance decentralized storage tailored for advanced applications. Looking ahead, Storacha envisions evolving into a federated network with increased public participation, aiming to enhance global data access through a decentralized CDN and fostering broad community involvement.
Key Features and Innovations
High-Performance Storage: Storacha offers decentralized hot object storage, ensuring rapid access and retrieval of data, essential for AI applications that require quick processing and scaling.
Provenance and Ownership: Users maintain control over their data through UCANs (User-Controlled Authorization Networks), providing secure, cryptographic proofs of data ownership and access rights without frequent blockchain interactions.
Efficient Data Handling: Storacha handles large datasets by sharding files, facilitating quick retrieval and efficient management, crucial for large-scale AI operations.
Storacha Network supports a range of AI use cases by providing fast, scalable object storage optimized for both structured and unstructured data. It addresses key needs such as verifiability and provenance for decentralized GPU networks, ensuring that training processes are executed as expected and checkpoints are maintained.
Additionally, Storacha allows users to bring their own storage to training jobs, facilitating the sharing of hyperparameters and weights while ensuring ownership of training results.
In Conclusion
To wrap up, Filecoin’s Layer 2 solutions are paving the way for a new era in decentralized data management. Innovations like Basin, Akave, and Storacha Network are not only addressing the challenges of scalability and cost but also enhancing the efficiency and performance of data handling. As these technologies evolve, they promise to transform how data is stored, managed, and utilized, marking significant progress in the Web3 ecosystem.
Many thanks to Jonathan Victor for reviewing and providing valuable insights to this piece.
Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.
At the latest Filecoin Developer Summit (FDS), Nicola Greco (of FilOz) introduced a vision to evolve Filecoin’s decentralized cloud services: Filecoin Web Services (FWS). FWS aims to provide a framework for deploying composable cloud services – allowing new protocols to bootstrap into a shared marketplace of offerings, all composable with each other.
Beyond Proof of Replication: Expanding the Functionality of Filecoin
To understand FWS, it’s useful to first recap how services are currently offered inside the Filecoin network.
The core storage offering, Proof of Replication (PoRep), allows storage providers to use proofs over uniquely encoded data to show that they are still in possession of specific pieces of data. Filecoin uses PoRep both for storage and for consensus – this requires higher security parameters, and therefore makes Filecoin’s base storage offering akin to cold storage. This makes Filecoin’s base offering ideal for datasets that might have a need for strong guarantees around uniqueness and existence but can accept slower access times.
Furthermore, because Filecoin launched prior to the FVM, much of the onchain tooling (e.g. to set up and maintain storage deals, or to enable payments) exists as “system actors” – non-programmable functions on the network. These functions were built to support the original storage offering, and any evolution would require a full network upgrade to modify them or support new functionality.
However, as more storage on-ramps pushed into building storage solutions for customers, it became clear that there was a need for more storage offerings over and above Filecoin’s base offering. As a result, new types of proofs (such as Proof of Data Possession) have been proposed to run on the Filecoin network – allowing more use cases to be natively supported.
In designing these new offerings to sit on top of Filecoin, it became clear that many of these new proof offerings would need generalized versions of the onchain tooling that exists as “system actors” – such as payment rails. Rather than having many systems independently evolve their own architecture (and risk losing composability), a new proposal was put forward to focus on building modular, composable systems via FWS.
Introducing Filecoin Web Services: A Modular Approach to Cloud Services
At the core of Greco’s vision are the concepts of modularity and reuse. If each new service had to build its entire protocol from scratch – developing everything from deal management to escrow and SLA enforcement – the barrier to entry for new services would be high. FWS proposes a unified protocol that standardizes these components, allowing developers to focus on building specific services rather than recreating the entire stack.
FWS would serve as a thin, opinionated layer that manages payments, collateral, deal structuring, and SLA enforcement across various services. This standardization would enable seamless integration of new services, whether they are storage-related like PDP and retrieval services or entirely new offerings like markets for zk-SNARK proofs or AI-based computations. By providing a common framework, FWS would reduce complexity, lower development costs, and increase the rate of development for building within the Filecoin ecosystem.
The Power of a Unified Marketplace: Enhancing Efficiency and Accessibility
One of the key benefits of FWS is its potential to streamline the user experience. Without FWS, users would need to lock tokens in multiple smart contracts to access different services, leading to inefficiencies in collateral management and prepayment, and increasing users’ costs. FWS envisions a single entry point where users can determine how they’d like to pay – prepaid or pay-as-you-go – with the same rails being usable by multiple services. This model mirrors the convenience of traditional cloud services, where users simply provide a payment method and are periodically billed.
Moreover, by consolidating financial management into a single contract, FWS would improve collateral efficiency and reduce the overhead associated with managing multiple service contracts. This would also allow utilization of one service to enable credit in other services – allowing a credit history to be built up across disparate protocols. This approach not only simplifies the user experience but also enhances the overall liquidity and flexibility of the Filecoin ecosystem.
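To make the single-entry-point idea concrete, here is a minimal Python sketch of a shared escrow ledger that multiple services can draw from; this is an illustration of the concept only, not FWS’s actual contract design.

```python
class SharedEscrow:
    """One balance per user, drawable by any registered service."""

    def __init__(self) -> None:
        self.balances: dict[str, float] = {}
        self.services: set[str] = set()

    def register_service(self, service: str) -> None:
        self.services.add(service)

    def deposit(self, user: str, amount: float) -> None:
        # A single deposit backs payments across every registered service.
        self.balances[user] = self.balances.get(user, 0.0) + amount

    def charge(self, service: str, user: str, amount: float) -> None:
        if service not in self.services:
            raise ValueError("unknown service")
        if self.balances.get(user, 0.0) < amount:
            raise ValueError("insufficient escrow")
        self.balances[user] -= amount

escrow = SharedEscrow()
escrow.register_service("pdp-storage")
escrow.register_service("retrieval")
escrow.deposit("alice", 100.0)            # one lock-up instead of one per service
escrow.charge("pdp-storage", "alice", 10.0)
escrow.charge("retrieval", "alice", 2.5)
print(escrow.balances["alice"])           # 87.5
```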
A Vision for the Future: FWS as a Distribution Layer for Decentralized Services
Looking ahead, Greco envisions FWS not just as a tool for enhancing Filecoin’s storage capabilities but as a broader distribution layer for decentralized services. As the ecosystem grows, FWS could facilitate the integration of multiple networks and protocols, creating a cohesive marketplace for storage, compute, bandwidth, and other services. This would position Filecoin at the center of a vibrant, interconnected ecosystem, driving innovation and adoption across the decentralized web. By offering a marketplace for diverse services such as zero-knowledge proof generation and decentralized compute, FWS could position Filecoin as a leading platform in the decentralized web, supporting a wide array of applications beyond storage.
To understand more about FWS, watch the full keynote by Nicola Greco on YouTube.
Many thanks to Jonathan Victor for reviewing and providing valuable insights to this piece.
Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.
Editor’s Note: This blogpost is a repost of the original content published on 5 March 2024, by Bidhan Roy and Marcos Villagra from Bagel. Founded in 2023 by CEO Bidhan Roy, Bagel is a machine learning and cryptography research lab building a permissionless, privacy-preserving machine learning ecosystem. This blogpost represents the independent view of these authors, who have given their permission for this re-publication.
Trillion-dollar industries are unable to leverage their immensely valuable data for AI training and inference due to privacy concerns. The potential for AI-driven breakthroughs—genomic secrets that could cure diseases, predictive insights to eliminate supply chain waste, and chevrons of untapped energy sources—remains locked away. Privacy regulations also closely guard this valuable and sensitive information.
To propel human civilization forward in energy, healthcare, and collaboration, it is crucial to enable AI systems that train and generate inference on data while maintaining full end-to-end privacy. At Bagel, pioneering this capability is our mission. We believe accessing a fundamental resource like knowledge, for both human-driven and autonomous AI, should not entail a compromise on privacy.
We have applied and experimented with almost all the major privacy-preserving machine learning (PPML) mechanisms. Below, we share our insights, our approach, and some research breakthroughs.
And if you’re in a rush, we have a TLDR at the end.
Privacy-preserving Machine Learning (PPML)
Recent advances in academia and industry have focused on incorporating privacy mechanisms into machine learning models, highlighting a significant move towards privacy-preserving machine learning (PPML). At Bagel, we have experimented with all the major PPML techniques, particularly those developed after differential privacy. Our work, positioned at the intersection of AI and cryptography, draws from the cutting edge in both domains.
First, we will delve into each of these, examining their advantages and drawbacks. In subsequent posts, we will describe Bagel’s approach to data privacy, which addresses and resolves the challenges associated with the existing solutions.
Differential Privacy (DP)
One of the first and most important techniques with a mathematical guarantee for incorporating privacy into data is differential privacy or DP (Dwork et al. 2006), addressing the challenges faced by earlier methods with a quantifiable privacy definition.
DP ensures that a randomized algorithm, A, maintains privacy across datasets D1 and D2—which differ by a single record—by keeping the probability of A(D1) and A(D2) generating identical outcomes relatively unchanged. This principle implies that minor dataset modifications do not significantly alter outcome probabilities, marking a pivotal advancement in data privacy.
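Concretely, the standard ε-differential privacy formulation (which the description above paraphrases) requires that for all datasets D1 and D2 differing in a single record, and every set of outcomes S,

\(\Pr[A(D_1)\in S] \le e^{\varepsilon}\,\Pr[A(D_2)\in S],\)

where a smaller privacy budget ε corresponds to stronger privacy.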
The application of DP in machine learning, particularly in neural network training and inference, demonstrates its versatility and effectiveness. Notable implementations include adapting DP for supervised learning algorithms by integrating random noise at various phases: directly onto the data, within the training process, or during inference, as highlighted by Ponomareva et al. (2023) and further references.
The balance between privacy and accuracy in DP is influenced by the noise level: greater noise enhances privacy at the cost of accuracy, affecting both inference and training stages. This relationship was explored by Abadi et al. (2016) through the introduction of Gaussian noise to the stochastic gradient descent algorithm (DP-SGD), observing the noise’s impact on accuracy across the MNIST and CIFAR-10 datasets.
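A minimal sketch of the core DP-SGD step – clip each per-example gradient, then add Gaussian noise calibrated to the clipping norm – might look as follows; the clipping norm and noise multiplier here are illustrative values, not the paper’s settings.

```python
import numpy as np

def dp_sgd_step(per_example_grads: np.ndarray,
                clip_norm: float = 1.0,
                noise_multiplier: float = 1.1) -> np.ndarray:
    """One DP-SGD update direction: per-example L2 clipping, summation,
    Gaussian noise addition, then averaging over the batch."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    clipped = np.stack(clipped)

    noise = np.random.normal(0.0, noise_multiplier * clip_norm,
                             size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / len(clipped)

grads = np.random.randn(32, 10)  # 32 examples, 10 model parameters
print(dp_sgd_step(grads).shape)  # (10,)
```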
An innovative DP application, Private Aggregation of Teacher Ensembles (PATE) by Papernot et al. (2016), divides a dataset into disjoint subsets and trains a network on each without privacy; these networks are termed teachers. The teachers’ aggregated inferences, subjected to added noise for privacy, inform the training of a student model that emulates the teacher ensemble. This method also underscores the trade-off between privacy enhancement through noise addition and the resultant accuracy reduction.
Further studies affirm that while privacy can be secured with little impact on execution times (Li et al. 2015), stringent privacy measures can obscure discernible patterns essential for learning (Abadi et al. 2016). Consequently, a certain level of privacy must be relinquished in DP to facilitate effective machine learning model training, illustrating the nuanced balance between privacy preservation and learning efficiency.
Pros of Differential Privacy
The advantages of using DP are:
Effortless. Easy to implement into algorithms and code.
Algorithm independence. Schemes can be made independent of the training or inference algorithm.
Fast. Some DP mechanisms have been shown to have little impact on the execution times of algorithms.
Tunable privacy. The degree of desired privacy can be chosen by the algorithm designer.
Cons of Differential Privacy
Access to private data is still necessary. Teachers in the PATE scheme must have full access to the private data (Papernot et al. 2016) in order to train a neural network. Also, the stochastic gradient descent algorithm based on DP only adds noise to the weight updates and needs access to private data for training (Abadi et al. 2016).
Privacy-Accuracy-Speed trade-off on data. All implementations must sacrifice some privacy in order to get good results. If there is no discernible pattern in the input, then there is nothing to train (Feyisetan et al. 2020). The implementation of some noise mechanisms can impact execution times, necessitating a balance between speed and the goals of privacy and accuracy.
Zero-Knowledge Machine Learning (ZKML)
A zero-knowledge proof system (ZKP) is a method allowing a prover P to convince a verifier V about the truth of a statement without disclosing any information apart from the statement’s veracity. To affirm the statement’s truth, P produces a proof π for V to review, enabling V to be convinced of the statement’s truthfulness.
Zero-Knowledge Machine Learning (ZKML) is an approach that combines the principles of zero-knowledge proofs (ZKPs) with machine learning. This integration allows machine learning models to be trained and to infer with verifiability.
For an in-depth examination of ZKML, refer to the work by Xing et al. (2023). Below we provide a brief explanation that focuses on the utilization of ZKPs for neural network training and inference.
ZKML Inference
Consider an unlabeled dataset A and a pretrained neural network N tasked with labeling each record in A. To generate a ZK proof of N’s computation during labeling, an arithmetic circuit C representing N is required, including circuits for each neuron’s activation function. Assuming such a circuit C exists and is publicly accessible, the network’s weights and a dataset record become the private and public inputs, respectively. For any record a of A, N’s output is denoted by a pair (l,π), where l is the label and π is a zero-knowledge argument asserting the existence of specific weights that facilitated the labeling.
This model illustrates how ZK proves the accurate execution of a neural network on data, concealing the network’s weights within a ZK proof. Consequently, any verifier can be assured that the executing agent possesses the necessary weights.
ZKML Training
ZKPs are applicable during training to validate N’s correct execution on a labeled dataset A. Here, A serves as the public input, with an arithmetic circuit C depicting the neural network N. The training process requires an additional arithmetic circuit to implement the optimization function, minimizing the loss function. For each training epoch i, a proof π_i is generated, confirming the algorithm’s accurate execution through epochs 1 to i-1, including the validity of the preceding epoch’s proof. The training culminates with a compressed proof π, proving the correct training over dataset A.
The explanation above illustrates that during training, the network’s weights are concealed to ensure that the training is correctly executed on the given dataset A. Additionally, all internal states of the network remain undisclosed throughout the training process.
Pros of ZKML
The advantages of using ZKPs with neural networks are:
Privacy of model weights. The weights of the neural network are never revealed during training or inference in any way. The weights and the internal states of the network algorithm are private inputs for the ZKP.
Verifiability. The proof certifies the proper execution of training or inference processes and guarantees the accurate computation of weights.
Trustlessness. The proof and its verification properties ensure that the data owner is not required to place trust in the agent operating the neural network. Instead, the data owner can rely on the proof to confirm the accuracy of both the computation and the existence of correct weights.
Cons of ZKML
The disadvantages of using ZKPs with neural networks are:
No data privacy. The agent running the neural network needs access to the data in order to train or do inference. Data is considered a parameter that is publicly known to the data owner and the prover running the neural network (Xing et al. 2023).
No privacy for the model’s algorithm. In order to create a ZK proof, the algorithm of the entire neural network should be publicly known. This includes the activation functions, the loss function, optimization algorithm used, etc (Xing et al. 2023).
Proof generation of an expensive computation. Presently, the process of generating a ZK proof is computationally demanding—see, for example, this report on the computation times of ZK provers. Creating a proof for each epoch within a training algorithm can exacerbate the computational burden of an already resource-intensive task.
Federated Learning (FL)
In Federated Learning (FL), we look to train a global model using a dataset that is distributed across multiple servers holding local data samples, without any server sharing its local data.
In FL, there is a global objective function being optimized, defined as
\(f(x_1,\dots,x_n)=\frac{1}{n}\sum_{i=1}^n f_i(x_i),\)
where n is the number of servers, each variable x_i is the set of parameters as viewed by server i, and each function f_i is the local objective function of server i. FL tries to find the best set of values that optimizes f.
The general process in FL is as follows:
Initialization. An initial global model is created and distributed by a central server to all other servers.
Local training. Each server trains the model using their local data. This ensures data privacy and security.
Model update. After training, each server shares its local updates, such as gradients and parameters, with the central server.
Aggregation. The central server receives all local updates and aggregates them into the global model, for example, using averaging.
Model distribution. The updated model is distributed again to the local servers, and the previous steps are repeated until a desired level of performance is achieved by the global model.
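A minimal sketch of one round of this process (assuming simple parameter averaging, in the style of federated averaging, and a toy local objective) might look as follows:

```python
import numpy as np

def local_update(global_params: np.ndarray, local_data: np.ndarray,
                 lr: float = 0.1) -> np.ndarray:
    """Step 2: a server refines the model on its own data only.
    Here, one gradient step on a toy least-squares objective."""
    grad = global_params - local_data.mean(axis=0)
    return global_params - lr * grad

def federated_round(global_params: np.ndarray,
                    server_datasets: list[np.ndarray]) -> np.ndarray:
    """Steps 1, 3, 4: distribute the model, collect local updates,
    and aggregate them by simple averaging."""
    updates = [local_update(global_params.copy(), d) for d in server_datasets]
    return np.mean(updates, axis=0)

params = np.zeros(3)
datasets = [np.random.randn(50, 3) + i for i in range(4)]  # 4 servers' private data
for _ in range(20):  # Step 5: repeat until performance is acceptable
    params = federated_round(params, datasets)
print(params)  # approaches the mean of the servers' data; raw data never moves
```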
Since local servers never share their local data, FL guarantees privacy over that data. However, the model being constructed is shared among all parties, and hence, its structure and set of parameters are not hidden.
Pros of FL
The advantages of using FL are:
Data privacy. The local data on the local servers is never shared. All computations on raw data are done locally, and servers never need to exchange their data with one another.
Distributed computing. The creation of the global model is distributed among local servers, thereby parallelizing a resource-intensive computation. Thus, FL is considered a distributed machine learning framework (Xu et al. 2021).
Cons of FL
The disadvantages of using FL are:
Model is not private. The global model is shared among each local server in order to do their computations locally. This includes the aggregated weights and gradients at each step of the FL process. Thus, each local server is aware of the entire architecture of the global model (Konečný et al. 2016).
Data leakage. Recent research indicates that data leakage remains a persistent issue, notably through mechanisms such as gradient sharing—see for example Jin et al. (2022). Consequently, FL cannot provide complete assurances of data privacy.
Trust. Since no proofs are generated in FL, every party involved in the process needs to be trusted to have computed their updates and parameters as expected (Gao et al. 2023).
Fully Homomorphic Encryption (FHE)
At its core, homomorphic encryption permits computations on encrypted data. By “homomorphic,” we refer to the capacity of an encryption scheme to allow specific operations on ciphertexts that, when decrypted, yield the same result as operations performed directly on the plaintexts.
Consider a scenario with a secret key k and a plaintext m. In an encryption scheme (E,D), where E and D represent encryption and decryption algorithms respectively, the condition D(k,E(k,m))=m must hold. A scheme (E,D) is deemed fully homomorphic if for any key k and messages m and m’, the properties E(k,m+m’)=E(k,m)+E(k,m’) and E(k,m*m’)=E(k,m)*E(k,m’) are satisfied, with addition and multiplication defined over a finite field. If only one operation is supported, the scheme is partially homomorphic. This definition implies that operations on encrypted data mirror those on plaintext, which is crucial for maintaining data privacy during processing.
In plain words, if we have a fully homomorphic encryption scheme, then operating over the encrypted data is equivalent to operating over the plaintext. We will write FHE to refer to a fully homomorphic encryption scheme. The sketch below shows how a homomorphic operation works over ciphertexts.
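The following runnable example demonstrates the homomorphic property using the Paillier scheme, which is additively (i.e. partially) homomorphic; it assumes the python-paillier package (`phe`) is installed. A fully homomorphic scheme would additionally support ciphertext-by-ciphertext multiplication.

```python
from phe import paillier  # pip install phe (python-paillier)

public_key, private_key = paillier.generate_paillier_keypair()

m1, m2 = 17, 25
c1 = public_key.encrypt(m1)
c2 = public_key.encrypt(m2)

# Addition on ciphertexts mirrors addition on plaintexts:
# D(k, E(k,m1) + E(k,m2)) = m1 + m2
assert private_key.decrypt(c1 + c2) == m1 + m2

# Paillier also supports multiplying a ciphertext by a *plaintext* scalar,
# but not ciphertext-by-ciphertext multiplication -- hence it is only
# partially homomorphic under the definition above.
assert private_key.decrypt(c1 * 3) == m1 * 3
```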
The homomorphic property of FHE makes it invaluable in situations where data must remain secure while still being used for computations. For instance, if we possess sensitive data and require a third party to perform data analysis on it, we can rely on FHE to encrypt the data. This allows the third party to conduct analysis on the encrypted data without the need for decryption. The mathematical properties of FHE guarantee the accuracy of the analysis results.
FHE Inference
Fully Homomorphic Encryption (FHE) can be used to perform inference in neural networks while preserving data privacy. Let’s consider a scenario where N is a pretrained neural network, A is a dataset, and (E,D) is an asymmetric FHE scheme. The goal is to perform inference on a record a of A without revealing the sensitive information contained in a to the neural network.
The inference process using FHE begins with encryption. The data owner encrypts the record a using the encryption algorithm E with the public key public_key, obtaining the encrypted record a’ = E(public_key, a).
Next, the data owner sends the encrypted record a’ along with public_key to the neural network N. The neural network N must have knowledge of the encryption scheme (E,D) and its parameters to correctly apply homomorphic operations over the encrypted data a’. Any arithmetic operation performed by N can be safely applied to a’ due to the homomorphic properties of the encryption scheme.
One challenge in using FHE for neural network inference is handling non-linear activation functions, such as sigmoid and ReLU, which involve non-arithmetic computations. To compute these functions homomorphically, they need to be approximated by low-degree polynomials. The approximations allow the activation functions to be computed using homomorphic operations on the encrypted data a’.
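To illustrate the approximation step, here is a quick numpy sketch that fits a low-degree polynomial to the sigmoid over a fixed interval; the degree and interval are illustrative choices, not values from a specific FHE deployment.

```python
import numpy as np

# Fit a degree-3 polynomial to sigmoid on [-6, 6] (illustrative choices).
x = np.linspace(-6, 6, 1000)
sigmoid = 1.0 / (1.0 + np.exp(-x))
coeffs = np.polyfit(x, sigmoid, deg=3)
poly = np.poly1d(coeffs)

# The polynomial uses only additions and multiplications, so it can be
# evaluated homomorphically. Check the worst-case error on the interval:
print("max abs error:", np.max(np.abs(poly(x) - sigmoid)))
```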
After applying the necessary homomorphic operations and approximated activation functions, the neural network N obtains the inference result. It’s important to note that the inference result is still in encrypted form, as all computations were performed on encrypted data.
Finally, the encrypted inference result is sent back to the data owner, who uses the private key associated with the FHE scheme to decrypt the result using the decryption algorithm D. The decrypted inference result is obtained, which can be interpreted and utilized by the data owner.
By following this inference process, the neural network N can perform computations on the encrypted data a’ without having access to the original sensitive information. The FHE scheme ensures that the data remains encrypted throughout the inference process, and only the data owner with the private key can decrypt the final result.
It’s important to note that the neural network N must be designed and trained to work with the specific FHE scheme and its parameters. Additionally, the approximation of non-linear activation functions by low-degree polynomials may introduce some level of approximation error, which should be considered and evaluated based on the specific application and accuracy requirements.
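As a runnable end-to-end illustration of this flow, the sketch below performs encrypted inference for a linear model using the additively homomorphic Paillier scheme from earlier (true FHE would also permit the non-linear layers discussed above; the weights and inputs are made-up values).

```python
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

# Data owner: encrypt a sensitive feature vector a -> a'.
a = [0.8, 1.5, -0.3]
a_enc = [public_key.encrypt(x) for x in a]

# Model owner: evaluate a linear model on the encrypted inputs.
# Plaintext weights times encrypted features needs only homomorphic
# ciphertext-scalar multiplication and ciphertext addition.
weights, bias = [2.0, -1.0, 0.5], 0.1
result_enc = a_enc[0] * weights[0]
for x_enc, w in zip(a_enc[1:], weights[1:]):
    result_enc = result_enc + x_enc * w
result_enc = result_enc + bias  # adding a plaintext constant is supported

# Data owner: only the private-key holder can read the prediction.
print(private_key.decrypt(result_enc))  # 2.0*0.8 - 1.0*1.5 + 0.5*(-0.3) + 0.1
```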
FHE Training
The process of training a neural network using Fully Homomorphic Encryption (FHE) is conceptually similar to performing inference, but with a few key differences. Let’s dive into the details.
Imagine we have an untrained neural network N and an encrypted dataset A’ = E(public_key, A), where E is the encryption function and public_key is the public key of an asymmetric FHE scheme. Our goal is to train N on the encrypted data A’ while preserving the privacy of the original dataset A.
The training process unfolds as follows. Each operation performed by the network and the training algorithm is executed on each encrypted record a’ of A'. This includes both the forward and backward passes of the network. As with inference, any non-arithmetic operations like activation functions need to be approximated using low-degree polynomials to be compatible with the homomorphic properties of FHE.
A fascinating aspect of this approach is that the weights obtained during training are themselves encrypted. They can only be decrypted using the private key of the FHE scheme, which is held exclusively by the data owner. This means that even the agent executing the neural network training never has access to the actual weight values, only their encrypted counterparts.
Think about the implications of this. The data owner can outsource the computational heavy lifting of training to a third party, like a cloud provider with powerful GPUs, without ever revealing their sensitive data. The training process operates on encrypted data and produces encrypted weights, ensuring end-to-end privacy.
Once training is complete, the neural network sends the collection of encrypted weights w’ back to the data owner. The data owner can then decrypt the weights using their private key, obtaining the final trained model. They are the sole party capable of accessing the unencrypted weights and using the model for inference on plaintext data.
There is a key caveat to keep in mind: FHE operations are computationally expensive, so training a neural network with FHE will generally be slower than training on unencrypted data.
Pros of FHE
The advantages of using FHE are:
Data privacy. Third-party access to encrypted private data is effectively prevented, a security guarantee upheld by the assurances of FHE and lattice-based cryptography (Gentry 2009).
Model privacy. Training and inference processes are carried out on encrypted data, eliminating the need to share or publicize the neural network’s parameters for accurate data analysis.
Effectiveness. Previous studies have demonstrated that neural networks operating on encrypted data using FHE maintain their accuracy—see for example Nandakumar et al. (2019) and Xu et al. (2019). Therefore, we can be assured that employing FHE for training and inference processes will achieve the anticipated outcomes.
Quantum resistance. The security of FHE, unlike other encryption schemes, is grounded in difficult problems derived from Lattice theory. These problems are considered to be hard even for quantum computers (Regev 2005), thus offering enhanced protection against potential quantum threats in the future.
Cons of FHE
The disadvantages of using FHE are:
Verifiability. FHE offers proofs of neither correct encryption nor correct computation. Hence, we must rely on trust that the data intended for encryption is indeed the correct data (Viand et al. 2023).
Speed. Relative to conventional encryption schemes, FHE is still considered slow during parameter setup and in its encryption and decryption algorithms (Gorantala et al. 2023).
Memory requirements. The number of weights that need to be encrypted is proportional to the size of the network. Even for small networks, the RAM requirements are on the order of gigabytes (Chen et al. 2018), (Nandakumar et al. 2019).
We examined the four most widely used privacy-preserving techniques in machine learning, focusing on neural network training and inference. We evaluated these techniques across four dimensions: data privacy, model algorithm privacy, model weights privacy, and verifiability.
Data privacy considers the model owner’s access to private data. Differential privacy (DP) and zero-knowledge machine learning (ZKML) require access to private data for training and proof generation, respectively. Federated learning (FL) enables training and inference without revealing data, while fully homomorphic encryption (FHE) allows computations on encrypted data.
Model algorithm privacy refers to the data owner’s access to the model’s algorithms. DP does not require algorithm disclosure, while ZKML necessitates it for proof generation. FL distributes algorithms among local servers, and FHE operates without accessing the model’s algorithms.
Model weights privacy concerns the data owner’s access to the model’s weights. DP and ZKML keep weights undisclosed or provide proofs of existence without revealing values. FL involves exchanging weights among servers for decentralized learning, contrasting with DP and ZKML’s privacy-preserving mechanisms. FHE enables training and inference on encrypted data, eliminating the need for model owners to know the weights.
Verifiability refers to the inherent capabilities for verifiable computation. ZKML inherently provides this capability, while DP, FL, and FHE do not provide similar levels of integrity assurance.
The table below summarizes our findings:

| Technique | Data privacy | Model algorithm privacy | Model weights privacy | Verifiability |
| --- | --- | --- | --- | --- |
| DP | No | Yes | Yes | No |
| ZKML | No | No | Yes | Yes |
| FL | Yes | No | No | No |
| FHE | Yes | Yes | Yes | No |
What’s Next 🥯
At Bagel, we recognize that existing privacy-preserving machine learning solutions fall short in providing end-to-end privacy, scalability, and strong trust assumptions. To address these limitations, our team has developed a novel approach based on a modified version of fully homomorphic encryption (FHE).
Our pilot results are extremely promising, indicating that our solution has the potential to revolutionize the field of privacy-preserving machine learning. By leveraging the strengths of homomorphic encryption and optimizing its performance, we aim to deliver a scalable, trustworthy, and truly private machine learning framework.
We believe that our work represents a paradigm shift in the way machine learning is conducted, ensuring that the benefits of AI can be harnessed without compromising user privacy or data security. As we continue to share more about our approach, we invite you to follow our progress by subscribing to the Bagel blog.
For more thought pieces from Bagel, check out their blog here.
To stay updated on the latest Filecoin happenings, follow the @Filecointldr handle.
Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.
The Decentralized Storage space is rapidly evolving. Filecoin is at an important moment – and in this blog we propose both areas for the ecosystem to double down on and ways we can track that progress. It is by no means exhaustive, but written from the vantage point of having been embedded in the Filecoin ecosystem for many years, gathering feedback from users, builders and the community, and having thought deeply about what is needed as the network moves forward.
The blog is organized in the following sections:
What matters for Filecoin in 2024
Why these matter and how to measure progress
It is our hope that with the right north star, teams will be able to better coordinate and identify convergences between project-level interests & ecosystem interests. The proposed framework and metrics should make it easier for capital and resource allocators in the ecosystem to evaluate the level of impact each team is creating, and distribute capital and resources accordingly. For startups, this can help frame where broader ecosystem efforts may dovetail into your roadmap and releases.
WHAT MATTERS IN 2024
Accelerating conversions to paid deals: Helping Filecoin providers increase their paid services (storage, retrieval, compute) is critical for driving cashflows into Filecoin and for supporting sustainable funding of its hardware outside of token incentives.
Growing on-chain activity: Filecoin is not aiming to be just another L1 fighting over the same use cases. But it does have a unique value proposition as a base layer with “real world” services anchored into it. This enables new use cases (programmable services, DeFi around cash flows, etc.) that are unique to Filecoin. Building out and growing adoption of these services can help prove that Filecoin is not just “a storage layer”, but an economy with a stable set of cash flows.
Making Filecoin indispensable to others: Bull cycles mean velocity is critical – as is making Filecoin an indispensable part of the stack for more teams. There are many emerging themes to capitalize on (Chain Archival, Compute, AI) – and Filecoin positioning itself matters. The ecosystem collectively wins when more participants leverage Filecoin as a core part of their story. For individual teams, this means that shipping to your users matters. At the ecosystem level, it means orienting efforts to unblock the teams closest to driving integrations and building services on Filecoin.
The verticals in our framework remain relatively high-level – and many of these objectives will have their own set of tasks. But it is more critical, first, for the ecosystem to align on this being the right set of verticals to progress against. Below, we dive into each vertical and some tangible metrics the ecosystem should start tracking.
WHY THESE MATTER AND HOW TO MEASURE PROGRESS
1) Accelerating conversions to paid deals
As a storage network, Filecoin should maximize the cashflows it can bring into its economy. Having incentives as an accelerant is fine – but without a steady (and growing) ramp of paid deals, Filecoin can't achieve its maximum potential.
Paid deals (when settled on-chain) are a net capital inflow into the Filecoin economy that can be the substrate for use cases uniquely possible in our ecosystem. DeFi as an example has a real opportunity to provide actual services to businesses (e.g. converting currencies to pay for storage).
There are two main paths by which we can drive growth of paid services:
Drive growth in existing services (data archival)
Expand to new markets with additional services (hot storage, compute, indexing, etc.)
In both cases, there's work to be done to reduce friction for paid on-ramps or ship new features that raise the floor (as informed by on-ramps and projects trying to bring Filecoin services to market). It is critical that the Filecoin ecosystem collectively prioritizes the right efforts to make Filecoin services sellable and allocates resources accordingly.
There are already a number of teams making substantial progress on this front (CID.Gravity, Seal Storage, Holon, Banyan, Lighthouse.storage, Web3Mine, Basin, among others) – and we can best measure progress by helping reduce their friction and helping drive their success.
We propose measuring success for this vertical in two forms:
Dollars and Total Data Stored for Paid Deals (self reported)
Dollars and Total Data Stored for Paid Deals (on-chain)
There are a number of initiatives from public goods teams along these efforts for the quarter (Q2 2024) which include:
FilOz: working on a FIP for new proofs to reduce storage costs and dramatically improve retrieval speeds
DeStor: helping drive enterprise adoption with business-ready on-ramps
Ansa Research, Filecoin Foundation, etc.: Web3 BD support for ecosystem builders
Targeted grant funding for efforts that directly support growth of sustainable on-chain paid deal activity
2) Growing on-chain activity
Filecoin, as an L1, has more than just its storage service. Building a robust on-chain economy is critical for accelerating the services and tooling with which others can compose. In the Filecoin ecosystem, we have a unique opportunity in that there are real economic flows to enable via paid on-chain deals.
Centering our on-chain economy around supporting those flows – be it from automating renewals, designing incentives for retrievals, creating endowments for perpetual storage, or building economic efficiency for the operators of the network – can lead to compounding growth as it creates a flywheel.
As Filecoin owns more of its own economic activity on-chain, value will accrue for the token – enabling ecosystem users to use Filecoin in more productive ways, generating real demand for services inside the ecosystem.
We propose the following metrics for us to collectively measure success:
Contract calls
Active Filecoin addresses
Volume of on-chain payments
There are notable builders already seeding the on-chain infrastructure to leverage some of these primitives (teams like GLIF working on liquid staking, Lighthouse on storage endowments, and teams like Fluence enabling compute).
There's a set of improvements that can dramatically reduce friction for driving on-chain activity, and there are several efforts prioritized against this for Q2 2024:
FilOz: F3, which brings fast finality to Filecoin, can both improve the bridging experience and enable more "trade" between Filecoin and other economies (e.g. native payments from other ecosystems for services in Filecoin).
FilOz: Refactoring how deals work on Filecoin to enable more flexible payment (e.g. with stablecoins)
FilPonto, FilOz: Reducing EVM tech debt to substantially reduce friction for builders porting Solidity contracts onto Filecoin (and hardening the surrounding infrastructure for more stable services)
3) Making Filecoin indispensable to others
This vertical is broad, but we would argue that there are two key ways to consider the impact the Filecoin ecosystem is driving:
The first is along high-profile integrations, where Filecoin is critical to the success of the customer and its proposition. It is especially critical for the ecosystem to provide the necessary support for these cross-chain integrations.
The second is along specific verticals, where there is a large and growing trend in activity; Filecoin is uniquely positioned to provide value here, both in terms of the primitives it has, as well as in its cost profile and scale.
Opportunities are brimming in Web3 at the moment, and the ecosystem should rally workstreams around on-ramps that are making Filecoin integral to narratives such as Compute, DePIN (sensors), Social, Gaming, AI, and Chain Archival.
We propose the following metric for evaluating Filecoin's indispensability:
Number of partnerships and integrations
There are a number of efforts from ecosystem teams aimed at helping onramps succeed on this front in the quarter (Q2 2024):
Ansa Research, Filecoin Foundation, DeStor and others: Forming a new working group to accelerate shared ecosystem BD and marketing resources
Shared BD resources for builders in the Filecoin ecosystem
Shared Marketing resources and amplification (#ecosystem-amplification-requests in the Filecoin slack) to help signal boost ecosystem wins
Community Discord to help expand accessibility, visibility, and drive community engagement
FINAL THOUGHTS
After reading the above, we hope that the direction of Filecoin in the coming year is clearer. Filecoin is at a pivotal moment where many of its pieces are coming together. Protocols and ecosystems naturally evolve, and each stage calls for different priorities and strategies for the next leg of growth. By focusing its efforts, we believe the Filecoin ecosystem can make its resources and support go that much further.
We are excited for what is to come and how Filecoin can continue to expand the pie for what can be done on Web3 rails. Moving forward, Ansa Research will post periodic updates on the key metrics for Filecoin’s ecosystem progress.
To stay updated on the latest Filecoin happenings, follow the @Filecointldr handle.
Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.
Editor's Note: This blogpost is a repost of the original content published on 5 April 2024, by Turan Vural and Yuki Yuminaga from Fenbushi Capital. Established in 2015, Fenbushi Capital holds the distinction of being Asia's pioneering blockchain-focused asset management firm, with an AUM of $1.6 billion. Through research and investment, the firm aims to play a vital role in shaping the future of blockchain tech across diverse sectors. This blogpost is an example of these efforts, and represents the independent view of its authors, who have given permission for this re-publication.
Data availability (DA) is a core technology in the scaling of Ethereum, allowing a node to efficiently verify that data is available to the network without having to host the data in question. This is essential for the efficient building of rollups and other forms of vertical scaling, allowing execution nodes to ensure that transaction data is available during the settlement period. This is also crucial for sharding and other forms of horizontal scaling, a planned future update to the Ethereum network, as nodes will need to prove that transaction data (or blobs) stored in network shards are indeed available to the network.
Several DA solutions have been discussed and released recently (e.g., Celestia, EigenDA, Avail), all with the intent of providing performant and secure infrastructure for applications to post DA.
The advantage of an external DA solution over an L1 such as Ethereum is that it provides an inexpensive and performant vehicle for on-chain data. DA solutions often consist of their own public chains built to enable cheap and permissionless storage. Even with modifications, the fact remains that hosting data natively on a blockchain is extremely inefficient.
Thus, we find that it is intuitive to explore a storage-optimized solution such as Filecoin for the basis of a DA layer. Filecoin uses its blockchain to coordinate storage deals between clients and storage providers but allows data to be stored off-chain.
In this post, we investigate the viability of a DA solution built on top of a Distributed Storage Network (DSN). We consider Filecoin specifically, as it is the most adopted DSN to date. We outline the opportunities that such a solution would offer, and the challenges that need to be overcome to build it.
A DA layer provides the following to services relying on it:
Client Safety: No node can be convinced that unavailable data is available.
Global Safety: The un/availability of data is agreed upon by all except at most a small minority of nodes.
Efficient data retrievability.
All of this needs to be done efficiently to enable scaling: a DA layer provides higher performance at a lower cost across the three points above. For example, any node could request a full copy of the data to prove custody, but this is inefficient. A system that provides all three of these properties gives us a DA layer with the security required for L2s to coordinate with an L1, along with stronger lower bounds on security in the presence of a malicious majority.
Custody of Data
Data posted to a DA solution has a useful lifetime: long enough to settle disputes or verify a state transition. Transaction data needs to be available only long enough to verify a correct state transition or to give validators enough opportunity to construct fraud proofs. As of writing, Ethereum calldata is the most common solution used by projects (rollups) requiring data availability.
Efficient Verification of Data
Data Availability Sampling (DAS) is the standard method of answering the question of DA. It comes with additional security benefits, strengthening network actors' ability to verify state information from their peers. However, it relies on nodes to perform sampling: DAS requests must be answered to ensure mined transactions won't be rejected, yet nodes face no penalty (and earn no reward) for declining to request samples in the first place. As an example, Celestia provides the first and only light client implementation to perform DAS, delivering stronger security assumptions to users and reducing the cost of data verification.
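To make the sampling argument concrete, consider why a handful of random samples suffices. The sketch below is our own illustration (not drawn from any DAS implementation), assuming the data is 2x erasure-coded so that an adversary must withhold at least half of the shares to make the blob unrecoverable; each uniform sample then detects withholding with probability at least 1/2.

```python
import random

def undetected_probability(num_samples: int, withheld_fraction: float = 0.5) -> float:
    """Chance that all samples miss the withheld shares: (1 - f)^k."""
    return (1.0 - withheld_fraction) ** num_samples

def simulate_detection(num_shares: int, num_samples: int, trials: int = 10_000) -> float:
    """Monte Carlo check: the adversary withholds half of the shares and a
    light client samples share indices uniformly at random."""
    withheld = set(range(num_shares // 2))
    detected = 0
    for _ in range(trials):
        if any(random.randrange(num_shares) in withheld for _ in range(num_samples)):
            detected += 1
    return detected / trials

for k in (5, 10, 20):
    print(f"{k} samples: undetected <= {undetected_probability(k):.2e}, "
          f"simulated detection rate ~ {simulate_detection(1024, k):.4f}")
```

After 20 samples, the chance of missing a withholding attack is below one in a million, which is why DAS is so cheap for light clients; the open problem the paragraph above points to is not the math but the incentive to run it.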
Efficient Access
A DA layer needs to provide efficient access to data for the projects using it. A slow DA layer may become the bottleneck for the services relying on it, causing inefficiencies at best and system failures at worst.
Decentralized Storage Network
A Decentralized Storage Network (DSN, as formalized in the Filecoin Whitepaper¹) is a permissionless network of storage providers that offer storage services to users of the network. Informally, it allows independent storage providers to coordinate storage deals with clients, providing cheap and resilient data storage to anyone seeking it. This is coordinated through a blockchain that records storage deals and enables the execution of smart contracts.
A DSN scheme is a tuple of three protocols: Put, Get, and Manage. This tuple comes with properties such as fault tolerance guarantees and participation incentives.
Put(data) → key: Clients execute Put to store data under a unique key. This is achieved by specifying the duration for which data will be stored on the network, the number of replicas of the data that are to be stored for redundancy, and a negotiated price with storage providers.
Get(key) → data: Clients execute Get to retrieve data that is being stored under a key.
Manage(): The Manage protocol is called by network participants to coordinate the storage space and services made available by providers and to repair faults. In the case of Filecoin, this is managed via a blockchain, which records data deals made between clients and data providers, along with proofs that data is correctly stored, to ensure that deals are being upheld. These proofs are generated by data providers in response to challenges from the network. A storage fault occurs when a storage provider fails to generate a Proof-of-Replication or Proof-of-Spacetime promptly when requested by the Manage protocol, which results in the slashing of the storage provider's stake. Deals can self-heal after a storage fault if more than one provider hosts a copy of the data on the network, by finding a new storage provider to honor the storage deal.
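The Put/Get/Manage tuple maps naturally onto a small interface. The sketch below is our own illustration of that shape; the names and types are ours, not Filecoin's actual APIs.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class DealTerms:
    duration_epochs: int  # how long the data must be stored
    replicas: int         # number of independent copies, for fault tolerance
    price: float          # negotiated price paid by the client

class DSN(Protocol):
    def put(self, data: bytes, terms: DealTerms) -> str:
        """Store `data` under the negotiated terms; returns a unique key."""
        ...

    def get(self, key: str) -> bytes:
        """Retrieve the data stored under `key`."""
        ...

    def manage(self) -> None:
        """Coordinate offered storage, verify proofs of correct storage, and
        repair faulted deals. In Filecoin this role is played by the chain."""
        ...
```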
DSN Opportunities
The work done thus far in DA projects has been to transform a blockchain into a platform for hot storage. Since a DSN is storage-optimized, rather than transforming a blockchain into a storage platform, we can simply transform a storage platform into one that provides data availability. The collateral of storage providers, in the form of the native FIL token, can provide crypto-economic security that guarantees data is stored. Finally, the programmability of storage deals can provide flexibility around the terms of data availability.
The most compelling motivation to transform the capabilities of a DSN to solve DA is the cost reduction in data storage under the DA solution. As we discuss below, storing data on Filecoin is significantly cheaper than storing data on Ethereum. Given current Ether/USD prices, it costs over 3 million USD to write 1 GB of calldata to Ethereum, only for it to be pruned after 21 days. This calldata expense can contribute to over half of the transaction cost of an Ethereum-based rollup. However, 1 GB of storage on Filecoin costs less than 0.0002 USD per month. Securing DA at this or any similar price would bring transaction costs down for users and contribute to the performance and scalability of Web3.
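A rough back-of-envelope reproduces the order of magnitude quoted above. The constants are illustrative assumptions (16 gas per non-zero calldata byte is Ethereum's published cost; the gas and ETH prices below are placeholders that move with the market):

```python
GB = 10**9            # bytes
GAS_PER_BYTE = 16     # gas per non-zero calldata byte
GAS_PRICE_GWEI = 60   # illustrative gas price
ETH_USD = 3_400       # illustrative ETH price

gas_used = GB * GAS_PER_BYTE                 # 1.6e10 gas for 1 GB
eth_cost = gas_used * GAS_PRICE_GWEI * 1e-9  # 960 ETH
print(f"~${eth_cost * ETH_USD / 1e6:.1f}M to write 1 GB of calldata")

# Filecoin, by contrast, is quoted above at under $0.0002 per GB per month.
```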
Economic Security
In Filecoin, collateral is required to make storage space available. This collateral is slashed when a provider fails to honor its deals or uphold network guarantees. A storage provider that fails to provide services faces losing both its posted collateral and any profit that would have been earned from providing storage.
Incentive Alignment
Many of Filecoin's protocol incentives align with the goals of DA. Filecoin provides disincentives for malicious or lazy behavior: storage providers must actively provide proofs of storage during consensus in the form of Proof-of-Replication and Proof-of-Spacetime, continuously proving that the storage exists without honest majority assumptions. Failure of a storage provider to provide proof results in stake slashing and removal from consensus, among other penalties. Current DA solutions lack incentives for nodes to perform DAS, relying on ad-hoc altruistic behavior for proof of DA.
Programmability
The ability to customize data deals also makes a DSN an attractive platform for DA. Data deals can have varying durations, allowing users of a DSN-based DA to pay for only the DA that they need. Fault tolerance can also be tuned by setting the number of copies that are to be stored throughout the network. Further customization is supported via smart contracts on Filecoin (called Actors), which are executed on the FEVM. This leads to Filecoin’s growing ecosystem of DApps, from compute-over-storage solutions such as Bacalhau to DeFi and liquid staking solutions such as Glif. Retriev makes use of Filecoin Actors to provide incentive-aligned retrieval with permissioned referees. Filecoin’s programmability can be used to tailor DA requirements needed for different solutions, so that platforms that rely on DA are not paying for more DA than they need.
Challenges to a DSN-Based DA Architecture
In our investigation, we have identified significant challenges that need to be overcome before a DA service can be built on a DSN. As we now talk about the feasibility of implementation, we will use Filecoin as our main focus of the discussion.
Proof Latency
The cryptographic proofs that ensure the integrity of deals and stored data on Filecoin take time to generate. When data is committed to the network, it is partitioned into 32 gigabyte sectors and "sealed." The sealing of data is the foundation of both the Proof-of-Replication (PoRep), which proves that a storage provider is storing one or more unique copies of the data, and Proof-of-Spacetime (PoST), which proves that a storage provider stored a unique copy continuously throughout the duration of the storage deal. Sealing has to be computationally expensive to ensure that storage providers aren't sealing data on demand to undermine the required PoRep. When the protocol presents the periodic challenge to a storage provider to provide proof of unique and continuous storage, sealing has to safely take longer than the response window so that a storage provider can't falsify proofs or replicas on the fly. For this reason, it can take providers approximately three hours to seal a sector of data.
Storage Threshold
Because of the computational expense of the sealing operation, the sector size of the data being sealed has to be economically worthwhile. The price of storage has to justify the cost of sealing to the storage provider, and likewise, the resulting cost of data being stored has to be low enough at scale (in this case, for an approximately 32 GB chunk) for a client to want to store data on Filecoin. Although smaller sectors could be sealed, this would drive up the price of storage to compensate storage providers. To get around this, data aggregators collect smaller pieces of data from users to be committed to Filecoin as a chunk close to 32 GB. Data aggregators commit to users' data via a Proof-of-Data-Segment-Inclusion (PoDSI), which guarantees the inclusion of a user's data in a sector, and a sub-piece CID (pCID), which the user can later use to retrieve the data from the network.
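The aggregator's core loop is simple: buffer user pieces until the batch approaches the sector size, then commit the whole batch as one deal. A minimal sketch of that loop (our illustration; the deal-making, PoDSI, and pCID steps are stubbed out):

```python
SECTOR_BYTES = 32 * 2**30  # target payload: one 32 GiB sector

def make_storage_deal(pieces: list[bytes]) -> None:
    """Stub: negotiate a deal with storage providers for the whole batch.
    In the real flow, each user also receives a PoDSI and a pCID here."""
    print(f"sealing a sector with {len(pieces)} pieces, "
          f"{sum(len(p) for p in pieces)} bytes")

class Aggregator:
    """Buffers small user pieces until the batch approaches sector size."""

    def __init__(self) -> None:
        self.pending: list[bytes] = []
        self.pending_size = 0

    def submit(self, piece: bytes) -> None:
        # Commit the current batch first if this piece would overflow it.
        if self.pending_size + len(piece) > SECTOR_BYTES:
            self.commit_sector()
        self.pending.append(piece)
        self.pending_size += len(piece)

    def commit_sector(self) -> None:
        make_storage_deal(self.pending)
        self.pending, self.pending_size = [], 0
```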
Consensus Constraints
Filecoin’s consensus mechanism, Expected Consensus, has a block time of 30 seconds and finality within hours, which may improve in the near future (see FIP-0086 for fast finality on Filecoin). This is generally too slow to support the transaction throughput needed for a Layer 2 relying on DA for transaction data. Filecoin’s block time is lower-bounded by storage provider hardware; the lower the block time, the more difficult it is for storage providers to generate and provide proofs of storage, and the more storage providers will be falsely penalized for missing the proving window for the proper storage of data. To overcome this, InterPlanetary Consensus (IPC) subnets can be leveraged to take advantage of faster consensus times. IPC uses Tendermint-like consensus and DRAND for randomness: in the case that DRAND is the bottleneck, we would be able to achieve a 3-second block-time with an IPC subnet. In the case of a Tendermint bottleneck, PoCs such as Narwhal have achieved blocktimes in the hundreds of milliseconds.
Retrieval Speed
The final barrier to building is retrieval. From the constraints above, we can deduce that Filecoin is suitable for cold or lukewarm storage. DA data, however, is hot and needs to support performant applications. Incentive-aligned retrieval is difficult on Filecoin; data needs to be unsealed before it is served to clients, which adds latency. Currently, rapid retrieval is done via SLAs or the storage of unsealed data alongside sealed sectors, neither of which can be relied on in the architecture of a secure and permissionless application on Filecoin. Although Retriev has shown that retrieval can be guaranteed via the FVM, incentive-aligned rapid retrieval on Filecoin remains an area to be further explored.
Cost Analysis
In this section, we consider the cost that comes from these design considerations. We show the cost of storing 32GB as Ethereum calldata, Celestia blobdata, EigenDA blobdata, and as a sector on Filecoin using near-current market prices.
The analysis highlights the price of Ethereum calldata: 100 million USD for 32 GB of data. This price showcases the cost of the security behind Ethereum's consensus, and is subject to the volatility of Ether and gas prices. The Dencun upgrade, which introduced Proto-Danksharding (EIP-4844), added blob transactions with a target of 3 blobs per block of approximately 125 KB each, and variable blob gas pricing to maintain the target number of blobs per block. This upgrade cut the cost of Ethereum DA to one-fifth: 20 million USD for 32 GB of blob data.
Celestia and EigenDA provide significant improvements: 8,000 and 26,000 USD for 32 GB of data, respectively. Both are subject to the volatility of market prices and reflect to some extent the cost of consensus securing their data: Celestia with its native TIA token, and EigenDA with Ether.
In all of the above cases, the data stored is not permanent. Ethereum calldata is stored for 3 weeks, with blobs stored for 18 days. EigenDA stores blobs for a default of 14 days. As of the current Celestia implementation, blob data is stored indefinitely by archival nodes but only sampled by light nodes for a maximum of 30 days.
The final two tables are direct comparisons between Filecoin and current DA solutions. Cost equivalence first lists the cost of a single byte of data on the given platform. The amount of Filecoin bytes that can be stored for the same amount of time for the same cost is then shown.
This shows that Filecoin is orders of magnitude cheaper than current DA solutions, costing fractions of a cent to store the same amount of data for the same amount of time. Unlike the nodes of Ethereum and other DA solutions, Filecoin's nodes are optimized to provide storage services, and its proof system allows nodes to prove storage rather than replicate it across every node in the network. Without accounting for the economics of storage providers (such as the energy cost to seal data), this shows that the basic overhead of the storage process on Filecoin is negligible, and that there is a market opportunity in the millions of USD per gigabyte, compared to Ethereum, for a system that can provide secure and performant DA services on Filecoin.
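Using only the figures quoted in this section, the cost-equivalence arithmetic works out as follows (a sketch of the comparison, ignoring the duration mismatch between platforms; it is not a reproduction of the article's original tables):

```python
BYTES_32GB = 32 * 10**9  # matches the per-32-GB figures above

cost_per_32gb_usd = {
    "Ethereum calldata": 100_000_000,
    "Ethereum blobs":     20_000_000,
    "EigenDA":                26_000,
    "Celestia":                8_000,
}
FILECOIN_USD_PER_GB_MONTH = 0.0002  # upper bound quoted earlier

for platform, cost in cost_per_32gb_usd.items():
    per_byte = cost / BYTES_32GB
    # Filecoin byte-months purchasable for the price of one byte elsewhere
    fil_byte_months = per_byte / (FILECOIN_USD_PER_GB_MONTH / 10**9)
    print(f"{platform:18s} ${per_byte:.2e}/byte ~= {fil_byte_months:.1e} "
          f"Filecoin byte-months")
```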
Throughput
Below, we consider the capacity of DA solutions and the demand that is generated by major layer 2 rollups.
Because Filecoin’s blockchain is organized in tipsets with multiple blocks at every block-height, the number of deals that can be done is not restricted by consensus or block size. The strict data constraint of Filecoin is that of its network-wide storage capacity, not what is allowed via consensus.
For daily DA demand, we pull data from Rollups DA and Execution from Terry Chung and Wei Dai, which includes a daily average across 30 days and a singular sampled day. This allows us to consider average demand while not overlooking aberrations from the average (for example, Optimism’s demand on 8/15/2023 of approximately 261,000,000 bytes was over 4x its 30 day average of 64,000,000 bytes).
From this selection, we see that despite the opportunity of lower DA cost, we would need a dramatic increase in DA demand to make efficient use of the 32 GB sector size of Filecoin. Although sealing 32 GB sectors with less than 32 GB of data would be a waste of resources, we could do so while still reaping a cost advantage.
Architecture
In this section, we consider the technical architecture that could be achieved if we were to build this today. We will consider this architecture in the context of arbitrary L2 applications and an L1 chain that the L2 is serving. Since this solution is an external DA solution, like that of Celestia and EigenDA, we do not consider Filecoin as the example L1.
Components
Even at a high-level, a DA on Filecoin will make use of many different features of the Filecoin ecosystem.
Transactions: Downstream users make transactions on a platform that requires DA. This could be an L2.
Platforms Using DA: These are the platforms that use DA as a service. This could be an L2 which posts transaction data to the Filecoin DA and commitments to an L1, such as Ethereum.
Layer 1: This is any L1 that contains commitments pointing to data on the DA solution. This could be Ethereum, supporting an L2 that leverages the Filecoin DA solution.
Aggregator: The frontend of a Filecoin-based DA solution is an aggregator, a centralized component which receives transaction data from L2s and other DA clients and aggregates it into 32 GB sectors suitable for sealing. Although a simple proof-of-concept would include a centralized aggregator, platforms using the DA solution could also run their own aggregator, for example as a sidecar to an L2 sequencer. The centralization of the aggregator can be seen as similar to that of an L2 sequencer or EigenDA's disperser. Clients are given a guarantee that their data will be included in the sector in the form of a PoDSI (Proof of Data Segment Inclusion), and a pCID to identify their data once it is on the network. This pCID is what would be included in the state commitments on the L1 to reference supporting transaction data.
Verifiers: Verifiers request the data from the storage providers to ensure the integrity of state commitments and build fraud proofs, which are committed to the L1 in the case of provable fraud.
Storage Deal: Once the aggregator has compiled a payload near 32GB, the aggregator makes a storage deal with storage providers to store the data.
Posting blobs (Put): To initiate a put, a DA client will submit their blob containing transaction data to the aggregator. This can be done in an off-chain manner, or an on-chain manner via an on-chain aggregator oracle. To confirm receipt of the blob, the aggregator returns a PoDSI to the client to prove that their blob is included in the aggregated sector that will be committed to the subnet. A pCID (sub-piece Content IDentifier) is also returned. This is what the client and any other interested party will use to reference the blob once it is being served on Filecoin.
Data deals would appear on-chain within minutes of the deal being made. The largest barrier to latency is the sealing time, which can take 3 hours. This means that although the deal has been made, and the client can be confident that the data will appear in the network, the data cannot be guaranteed to be queryable until the sealing process is complete. The Lotus client has a fast-retrieval feature in which an unsealed copy of the data is stored alongside the sealed copy; this copy may be served as soon as the unsealed data is transferred to the storage provider, provided the retrieval deal does not depend on the proof of sealed data appearing on the network. However, this functionality is at the discretion of the data provider, and is not cryptographically guaranteed as part of the protocol. If a fast-retrieval guarantee is to be provided, there would need to be changes to consensus and dis/incentive mechanisms in place to enforce it.
Retrieving blobs (Get): Retrieval is similar to a put operation. A retrieval deal needs to be made, which will appear on-chain within minutes. Retrieval latency will depend on the terms of the deal and whether an unsealed copy of the data is stored for fast retrieval. In the fast-retrieval case, the latency will depend on network conditions. Without fast retrieval, data will need to be unsealed before being served to the client, which takes the same amount of time as sealing, on the order of 3 hours. Thus, without optimizations, we have a maximum round trip of 6 hours; major improvements in data serving would be needed before this becomes a viable system for DA or fraud proofs.
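Put together, the client-side lifecycle might look like the sketch below. All names are our own for illustration; `aggregator.submit` and `provider.retrieve` stand in for whatever deal-making and retrieval machinery an implementation would actually expose.

```python
from dataclasses import dataclass

@dataclass
class PutReceipt:
    podsi: bytes  # proof that the blob is included in the aggregated sector
    pcid: str     # sub-piece CID referencing the blob from the L1 commitment

def post_blob(aggregator, blob: bytes) -> PutReceipt:
    """Put: hand the blob to the aggregator and keep the receipt. The pCID
    goes into the L1 state commitment; the PoDSI proves sector inclusion."""
    podsi, pcid = aggregator.submit(blob)  # hypothetical aggregator API
    return PutReceipt(podsi, pcid)

def fetch_blob(provider, pcid: str, has_unsealed_copy: bool) -> bytes:
    """Get: make a retrieval deal and fetch. Without an unsealed copy, the
    provider must unseal first (~3 h): the ~6 h worst-case round trip."""
    if not has_unsealed_copy:
        print("no unsealed copy: expect ~3 h of unsealing before data is served")
    return provider.retrieve(pcid)  # hypothetical provider API
```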
Proof of DA: Proof of DA can be considered in two steps: first, via the PoDSI that is given when the data is committed to the aggregator as the deal is made; and second, via the continued posting of PoRep and PoST proofs that storage providers provide through Filecoin's consensus mechanism. As discussed above, PoRep and PoST give scheduled and provable guarantees of data custody and persistence.
This solution will make heavy use of bridging, as any client that relies on DA (regardless of the construction of proofs) will need to be able to interact with Filecoin. In the case of the pCID included in the state transition that is posted to the L1, a verifier can make an initial check to make sure that a bogus pCID wasn't committed. There are several ways this could be done: for example, via an oracle that posts Filecoin data on the L1, or via verifiers that verify the existence of a data deal or sector corresponding to the pCID. Likewise, the verification of validity or fraud proofs that get posted to the L1 may need to make use of a bridge to be convinced of a proof. Currently available bridges include Axelar and Celer.
Security Analysis
Filecoin’s integrity is enforced through the slashing of collateral. Collateral can be slashed in two cases: storage faults or consensus faults. A storage fault corresponds to a storage provider not being able to provide proof of stored data (either PoRep or PoST), which would correlate to a lack of data availability in our model. A consensus fault corresponds to malicious action in consensus, the protocol that manages the transaction ledger from which the FEVM is abstracted.
A Sector Fault refers to the penalty incurred from the failure to post proof of continuous storage. Storage providers are allowed a one-day grace period during which a penalty is not incurred for faulty storage. After 42 days from a sector becoming faulty, the sector is terminated. Incurred fees are burnt.
A Sector Termination occurs after a sector has been faulty for 42 days or when a storage provider purposefully terminates a deal. Termination fees are equal to the amount the sector has earned up to termination, capped at 90 days' worth of earnings. Unpaid deal fees are returned to the client. Incurred fees are burnt.
Storage Market Actor Slashing occurs in the event of a terminated deal. This is the slashing of the collateral that the storage provider puts up behind the deal.
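A sketch of the penalty schedule just described (ours; the actual on-chain fee formulas are more involved than this simple cap):

```python
FAULT_GRACE_DAYS = 1         # no penalty inside the grace period
TERMINATION_AFTER_DAYS = 42  # a sector faulty this long is terminated

def termination_fee(earned_to_date_fil: float, daily_earnings_fil: float) -> float:
    """Termination fee: everything the sector has earned so far, capped at
    90 days' worth of earnings (per the description above)."""
    return min(earned_to_date_fil, 90 * daily_earnings_fil)

# Example: a sector that has earned 500 FIL at 4 FIL/day is terminated.
print(termination_fee(500.0, 4.0))  # -> 360.0, capped at 90 * 4 FIL
```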
The security provided by Filecoin is very different from that of other blockchains. Whereas blockchain data is typically secured via consensus, Filecoin's consensus only secures the transaction ledger, not the data referred to by the transactions. The data stored on Filecoin has only enough security to incentive-align storage providers to provide storage. This means that the data stored on Filecoin is secured via fault penalties and business incentives such as reputation with clients. In other words, a data fault on a blockchain is equivalent to a breach of consensus, and breaks the safety of the chain or its notion of the validity of transactions. Filecoin is designed to be fault tolerant when it comes to data storage, and therefore only uses its consensus to secure its dealbook and deal-related activities. The cost to a storage miner of not fulfilling its data deal is capped at 90 days' worth of storage rewards in penalties, plus the loss of the collateral put up by the miner to secure the deal.
Therefore, the cost of a data withholding attack launched by Filecoin providers is simply the opportunity cost of a retrieval deal. Data retrieval on Filecoin relies on the storage miner being incentivized by a fee paid by the client. However, there is no negative impact on a miner for not responding to a data retrieval request. To mitigate the risk of a single storage miner ignoring or refusing data retrieval deals, data on Filecoin can be stored by multiple miners.
Since the economic security behind the data being stored on Filecoin is considerably less than that of blockchain-based solutions, the prevention of data manipulation must also be considered. Data manipulation is protected against via Filecoin's proof system. Data is referred to via CIDs, through which data corruption is immediately detectable. A provider therefore cannot serve corrupt data, as it is easy to verify whether the fetched data matches the requested CID. Nor can providers store corrupted data in place of uncorrupted data: upon receipt of client data, providers must provide proof of a correctly sealed data sector to initiate the data deal, so a storage deal cannot be started with corrupt data. During the lifetime of the storage deal, PoSTs are provided to prove custody (recall that this proves both custody of the sealed data sector and custody since the last PoST). Since the PoST relies on the sealed sector at the time of proof generation, a corrupt sector would result in a bogus PoST, resulting in a sector failure. Therefore, a storage provider can neither store nor serve corrupted data, cannot claim rewards as though it were providing uncorrupted data, and cannot avoid being penalized for tampering with a client's data.
Security can be strengthened by increasing the collateral committed by the storage provider to the Storage Market Actor, which is currently decided by the storage provider and the client. If we assume that this collateral were sufficiently high (for example, the same stake as an Ethereum validator) to incentivize a provider not to default, we can think about what is left to secure (even though this would be extremely capital-inefficient, as this stake would be needed to secure each transaction blob or sector of aggregated blobs). Even then, a data provider could choose to make data unavailable in chunks of up to 41 days before the storage deal is terminated by the Storage Market Actor. For a shorter data deal, the data could be made unavailable until the last day of the deal. In the absence of coordinated malicious actors, this can be mitigated via replication across multiple storage providers so that the data can continue being served.
We can also consider the cost of an attacker overriding consensus to either accept a bogus proof or rewrite ledger history to remove a deal from the orderbook without penalizing the responsible storage provider. It is worth noting, however, that in the case of such a safety violation, an attacker would be able to manipulate Filecoin's ledger however they want. To commit such an attack, an attacker would need at least a majority of stake in the Filecoin chain. Stake is related to storage provided to the network; with a current 25 EiB (roughly 3 × 10¹⁹ bytes) of storage securing the Filecoin chain, at least 12.5 EiB would be needed for a malicious actor to offer its own chain that would win the fork-choice rule. This is further mitigated by slashing related to consensus faults, for which the penalty is the loss of all pledged collateral and block rewards, and suspension from participation in consensus.
Aside: Withholding attacks on other DA solutions. Although the above shows that Filecoin is lacking in protection against withholding attacks, it is not alone.
Ethereum: In general, the only way to guarantee that a request to the Ethereum network is answered is to run a full node. Full nodes have no obligation to fulfill data retrieval requests outside of consensus, and therefore face no penalty for ignoring them. Constructs such as PeerDAS introduce a peer scoring system for a node's responses to data retrieval, in which a node with a low enough score (essentially a DA reputation) could be isolated from the network.
Celestia: Even though Celestia has much stronger security per-byte against withholding attacks in comparison to our Filecoin construction, the only way to take advantage of this security is to host your own full node. Requests to Celestia infrastructure that are not owned and operated in-house can be censored without penalty.
EigenDA: Similar to Celestia, any service can run an EigenDA Operator node to ensure retrieval of its own data. As such, any out-of-protocol data retrieval request can be censored. Also note that EigenDA has a centralized and trusted disperser in charge of data encoding, KZG commitments, and data dispersal, similar to our aggregator.
Retrieval Security
Retrievability is necessary for DA. Ideally, market forces motivate economically rational miners to accept retrieval deals and to compete with other miners to keep prices low for clients. It is assumed that this is enough for data providers to offer retrieval services; however, given the importance of DA, it is reasonable to require more security.
Retrieval is currently not guaranteed by the economic security stipulated above. This is because it is cryptographically difficult to prove, in a trust-minimized manner, that data wasn't received by a client (in the case where a client needs to refute a storage miner's claim of having sent data). A protocol-native retrieval guarantee would be required for retrieval to be secured through Filecoin's economic security. With minimal changes to the protocol, this means that retrieval would need to be associated with a sector fault or deal termination. Retriev is a proof-of-concept that was able to provide data retrieval guarantees by using trusted "referees" to mediate data retrieval disputes.
Aside: Retrieval on other DA solutions. As can be seen above, Filecoin lacks the protocol-native retrieval guarantees necessary to keep storage (or retrieval) providers from acting selfishly. In the case of Ethereum and Celestia, the only way to guarantee that data from the protocol can be read is to self-host a full node or trust an SLA from an infrastructure provider. It is not trivial to guarantee retrieval of one's own data on Filecoin either; the analogous setting would be to become a storage provider (requiring significant infrastructure cost) and accept one's own storage deal, at which point one would be paying oneself to provide storage to oneself.
Latency Analysis
Latency on Filecoin is determined by several factors, such as network topology, storage mining client configuration, and hardware capabilities. We provide a theoretical analysis that discusses these factors and the performance that can be expected of our construct.
Due to the design of Filecoin's proof system and the lack of retrieval incentives, Filecoin is not optimized to provide high-performance round-trip latency from the initial posting of data to the initial retrieval of data. High-performance retrieval on Filecoin is an active area of research that is constantly changing as storage providers increase their capabilities and as Filecoin introduces new features. We define a "round trip" as the time from the submission of a data deal to the earliest moment the data submitted to Filecoin can be downloaded.
Block Time: In Filecoin's Expected Consensus, data deals can be included within the block time of 30 seconds. One hour is the typical confirmation time for sensitive on-chain data (such as coin transfers).
Data Processing: Data processing time varies widely between storage providers and configurations. The sealing process is designed to take 3 hours with standard storage mining hardware. Miners often beat this 3-hour threshold via special client configurations, parallelization, and investment in more capable hardware. This variation also affects the duration of sector unsealing, which can be circumvented altogether by quick-retrieval options in Filecoin client implementations such as Lotus. The quick-retrieval setting stores an unsealed copy of data alongside the sealed data, significantly speeding up retrieval time. Based on this, we can assume a worst-case delay of three hours from the acceptance of a data deal to when the data is available on-chain.
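Summing the headline figures gives a simple latency budget for the round trip defined earlier (a sketch using this article's numbers, which individual providers routinely beat):

```python
# Rough latency budget in hours, using the figures quoted above.
block_inclusion = 30 / 3600  # one 30 s block to include the deal message
confirmation    = 1.0        # typical wait for sensitive on-chain data
sealing         = 3.0        # design target for sealing a sector
unsealing       = 3.0        # needed only when no unsealed copy is kept

round_trip = sealing + unsealing                       # the ~6 h worst case
with_chain = block_inclusion + confirmation + round_trip
print(f"seal + unseal round trip: ~{round_trip:.0f} h "
      f"(~{with_chain:.1f} h with inclusion and confirmation)")
# With quick retrieval enabled, the unsealed copy can be served as soon as
# it reaches the provider, so latency is bounded by network conditions.
```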
Conclusion and Future Directions
This article explores building a DA layer by leveraging an existing DSN, Filecoin. We consider the requirements of a DA layer with respect to its role as a critical element of scaling infrastructure in Ethereum. We use Filecoin to assess the viability of DA on a DSN, and to consider the opportunities that such a solution would provide to the Ethereum ecosystem, or to any ecosystem that would benefit from a cost-effective DA layer.
Filecoin proves that a DSN can dramatically improve the efficiency of data storage in a distributed, blockchain-based system, with a potential saving of roughly 100 million USD per 32 GB written (versus Ethereum calldata) at current market prices. Even though the demand for DA is not yet high enough to fill 32 GB sectors, the cost advantage of a DSN-based DA still holds if partially empty sectors are sealed. Although the current latency of storage and retrieval on Filecoin is not appropriate for hot storage needs, storage miner-specific implementations can provide reasonable performance, with data being available in under 3 hours.
Trust in Filecoin storage providers can be tuned via variable collateral, as in EigenDA. Filecoin extends this tunable security by allowing a number of replicas to be stored across the network, adding tunable byzantine fault tolerance. Guaranteed and performant data retrieval would need to be solved in order to robustly deter data withholding attacks; however, as with any other solution, the only way to truly guarantee retrievability is to self-host a node or trust infrastructure providers.
We see opportunities for DA in the further development of PoDSI, which could be used (alongside Filecoin’s current proofs) in place of DAS to guarantee data inclusion in a larger sealed sector. Depending on how this looks, this may make slow turnaround of data tolerable, as fraud proofs could be posted in a window of 1 day to 1 week, while DA could be guaranteed on demand. PoDSIs are still new and under heavy development, and so we make no implication yet on what an efficient PoDSI could look like, or the machinery needed to build a system around it. As there are solutions for compute on top of Filecoin data, the idea of a solution that computes a PoDSI on sealed or unsealed data may not be out of the realm of near-future possibilities.
As both the field of DA and Filecoin grow, new combinations of solutions and enabling technologies may enable new proofs of concept. As Solana's integration with the Filecoin network shows, DSNs hold potential as a scaling technology. The cost of data storage on Filecoin provides an open opportunity with a large window for optimization. Although the challenges discussed in this article are presented in the context of enabling DA, their eventual solutions will open up a plethora of new tools and systems to be built beyond DA.
¹ Although this isn’t the construction of Filecoin, it is useful for those who are unfamiliar with programmable decentralized storage.
For more research pieces from Fenbushi Capital, check out their Medium page here.
To stay updated on the latest Filecoin happenings, follow the @Filecointldr handle.
Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.
The Filecoin community celebrated the first anniversary of the Filecoin Virtual Machine (FVM) launch on March 14, 2024. The FVM has brought programmability to Filecoin's verifiable storage and opened up a unique DeFi ecosystem anchored around improving on-chain collateral markets. Liquid Staking, for example, as a subset of Filecoin DeFi, has hit over $500 million in TVL. As the network grows, critical infrastructure across AMMs, Bridges, Oracles, and Collateralized Debt Positions (CDPs) is coming together to propel DeFi expansion in 2024.
In this blog post, let’s take a look at the latest DeFi projects launched on top of FVM and provide a view into future areas of activity.
DeFi Developments on FVM
Automated Market Makers
Automated Market Makers (AMMs) connect Filecoin with other Web3 ecosystems, enabling on-chain swaps, deeper liquidity, and fresh LP opportunities.
Decentralized Exchanges: ✅
Recently, the leading Decentralized Exchanges Uniswap v3 (via Oku.trade) and Sushi integrated with Filecoin by deploying on the FVM. Oku Trade's interface enables Uniswap users to easily exchange assets and provide liquidity on Filecoin. With this, FVM developers can effortlessly access bridged USDC and ETH assets natively on the Filecoin network, broadening Filecoin's reach. As a foundational DeFi primitive, DEXes also open the floodgates for non-native applications to leverage Filecoin's robust storage and compute hardware.
Interoperability Networks
Bridges: ✅
Bridges help bring liquidity into DEXs and AMMs on FVM. For developers building on FVM, bridges connect Filecoin's verifiable data with tokens, users, and applications on any chain, ensuring maximum composability for DeFi protocols. For this purpose, messaging and token-bridging solutions by Axelar and Celer were added to the Filecoin network immediately post-FVM launch.
Today, AMMs Uniswap v3 and Sushi along with several other DeFi applications are natively bridged to Filecoin with the help of cross-chain infrastructure enabled by Axelar and Celer.
Liquid Staking
Liquid Staking protocols have been the prime mover within Filecoin DeFi. They’ve played a vital role in growing and improving on-chain collateral markets. Today, nearly 17% of the total locked collateral (approx. 30 million FIL) by storage providers comes from FVM-based protocols such as GLIF (52%), SFT Protocol (10%), Repl (9%) and the rest (29%). These protocols have increased capital access to storage providers while simultaneously enabling better yield access to token holders. Read more to learn how Filecoin staking works.
GLIF Points: 🔜
GLIF, the leading protocol on Filecoin, has a TVL of over $250 million. To put this into context, this surpasses the largest Liquid Staking protocols on L1 chains like Avalanche. As of writing (March 06, 2024), 32% of all FIL staked into GLIF liquidity pools was deposited shortly after its announcement of plans to launch GLIF points (on Feb. 28, 2024), a likely precursor to a governance token.
Typically, to participate in the rewards program, GLIF users have to deposit FIL and mint GLIF's native token, iFIL. Similarly, the SFT Protocol launched a points program in 2023 based on its governance token to incentivize community participation.
Overall, we look forward to how the gameplay of points, popular among DApps in Web3 ecosystems, will act as a catalyst to decentralize governance and incentivize participation for Filecoin’s DeFi DApps.
New Staking Models: 👀
The influx of protocols experimenting with new models to inject liquidity into the ecosystem hasn’t slowed down. Two projects worth mentioning are Repl and FILLiquid.
Repl.fi introduces the concept of "repledging." Under repledging, SPs' pledged FIL is tokenized into pFIL, Repl's native token, and used for other purposes, including earning rewards. Repledging essentially increases the utility of locked assets, thereby reducing opportunity costs for SPs. In just a few months after launch, Repl's TVL has soared past $30 million.
FILLiquid, currently on testnet, models the business of FIL lending for SPs on algorithm-based fixed fees instead of traditional interest rates. The separation of payouts from the duration of deposits is expected to nudge long-term pledging and borrowing activities from token holders and SPs respectively, saving costs and increasing efficiency.
Price Oracles
Oracles, services that feed external data to smart contracts, are critical blockchain infrastructure essential for DeFi applications to grow and interact with the real world.
Pyth Network: ✅
Pyth recently launched its Price Feeds on the FVM. The integration allows FVM developers to access more than 400 real-time market data feeds while exploring opportunities to build on top of Filecoin’s storage layer. DeFi apps benefit from Pyth’s low-latency, high-fidelity financial data coming directly from global institutional participants such as exchanges, market makers, and trading firms.
Filecoin is also supported by Tellor, an optimistic oracle that gives FVM-based applications access to price feed data.
Collateralized Debt Positions
As DeFi activity on Filecoin is climbing, Collateralized Debt Positions (CDPs) will add more dimensions for other decentralized applications to build on FVM.
Chymia.Finance: 🔜
Chymia is an upcoming DeFi protocol on FVM. With a growing number of Liquid Staking Tokens (LSTs) on Filecoin, CDPs will extend the utility of locked tokens by generating stablecoins. Through Chymia, holders of LSTs can generate higher yields while using them as collateral for deeper liquidity.
Ajna: 🔜
Ajna is a noncustodial, peer-to-pool, permissionless lending, borrowing, and trading system requiring no governance or external price feed to function. As a result, any ERC20 on the FVM will be able to set up its own borrow or lend pools, making it easier for new developers to build a utility for their protocols.
Payments
Adjacent to the storage offering on Filecoin, the FVM allows developers to bind DeFi payments to real-world primitives on the network. Filecoin's core economic flows are built so that paid services can settle on-chain. Station and Saturn are two notable Filecoin services to have successfully leveraged FVM for payments.
Filecoin Station: ✅
Station is a downloadable desktop application that uses idle computing resources on Station’s DePIN network to run small jobs. Participants in the network are rewarded with FIL earnings. Currently, Station operates the Spark and the Voyager modules, both aimed at improving retrievability on the network. In February, roughly 1,900 Station operators were rewarded with FIL for their participation.
Filecoin Saturn: ✅
Saturn, a decentralized CDN network built on Filecoin, also leverages FVM for disbursing FIL payments to retrieval nodes on the network. In 2023, Saturn averaged over 2,000 earning nodes (retrieval providers on the network receiving FIL) for their services.
Decentralized Options
With growing liquidity, options are yet another emerging product in DeFi. Options facilitate the buying or selling of assets at a predetermined price on a future date, giving token holders protection against price volatility and an opportunity to speculate on market moves.
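As a stylized illustration of the covered-call structure such vaults use (our sketch, not Thetanuts' actual settlement logic): the writer keeps the option premium, but any upside above the strike is sold to the option buyer.

```python
def covered_call_payoff(spot_at_expiry: float, strike: float, premium: float) -> float:
    """Per-FIL payoff at expiry for a covered-call writer: premium earned,
    plus the FIL's value capped at the strike price."""
    return premium + min(spot_at_expiry, strike)

# Below the strike, the holder keeps the FIL's value plus the premium;
# above it, the payoff is capped at strike + premium.
for spot in (4.0, 6.0, 10.0):
    print(spot, covered_call_payoff(spot, strike=8.0, premium=0.25))
```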
Thetanuts: ✅
Currently, Thetanuts Finance, a decentralized on-chain options protocol, supports Filecoin. The platform allows FIL holders to earn yield on their holdings via a covered call strategy. Thetanuts' FIL covered-call vaults are cash-settled and work on a bi-weekly tenor.
Wallets
To use dApps on the FVM, users need to hold FIL in an f410 or 0x-type wallet address. Over time, many Web3 wallets such as MetaMask, FoxWallet, and Brave have started supporting 0x/f410 addresses. MetaMask also supports Ledger, making it possible to hold funds in a Ledger wallet and interact with the FVM directly.
In addition, exchanges like Binance natively supporting the FEVM drastically reduce complexities for FVM builders. To learn more about the most recent wallet upgrades, visit the Filecoin TLDR webpage.
What’s Next?
The obvious near-term impact of the various integrations across AMMs, Bridges, and CDPs is a fresh influx of liquidity into the Filecoin ecosystem. Liquidity begets deeper liquidity as the number and diversity of DeFi protocols on Filecoin increase. DeFi's growing economy, coupled with more services coming on-chain and utilizing FVM for payments, will increase the overall revenue and utility of the network. We expect this strong DeFi traction to scale Filecoin as an L1 ecosystem, with core services of storage and compute becoming the backbone of the decentralized internet.
To stay updated on the latest Filecoin happenings, follow the @Filecointldr handle.
Many thanks to HQ Han and Jonathan Victor for reviewing and providing valuable insights and to all the ecosystem partners and teams for their timely input.
Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.
2023 marked significant shifts in technology and adoption for the Filecoin network. From the launch of the Filecoin Virtual Machine, to other developments across Retrievals and Compute, 2023 lay the foundation for Filecoin’s continued expansion. This blogpost will provide a summary of the notable milestones the Filecoin ecosystem reached in 2023, and in the later portion, growth drivers to watch for 2024.
TL;DR
2023 Retrospective:
Storage: Active deals reached 1,800 PiB, and storage utilization grew to 20%
FVM: FVM launch in March 2023 enabled FIL Lending (Staking) which supplied 11% of total collateral locked by Storage Providers; TVL broke USD 200M
Retrievals: Retrievability of Filecoin data improved, alongside notable releases from Saturn (3,000+ nodes, sub 60ms TTFB) and Station
Compute, AI and DePIN networks: Synergistic growth of Filecoin together with physical resource & compute networks
Web2 Enterprise Data Storage: Led by strengthened offerings by teams such as Banyan, Seal Storage, and Steeldome
Continued DeFi growth: DEXes, Oracles, CDPs, spurred by service revenue coming on-chain
2023 Retrospective
To recap, Filecoin enables open services for data, built on top of IPFS. While Filecoin initially focused on storage, its vision includes the infrastructure to store, distribute, and transform data. The State & Direction of Filecoin, Summarized blog post shared an initial framework for Filecoin’s key components. This framework will serve as an anchor for discussing 2023’s traction.
1) Storage Markets: Active storage deals reached 1,800 PiB with storage utilization of 20%
In 2023, Filecoin’s stored data volume grew dramatically to 1,800 PiB, marking a 3.8x increase from the start of the year. Storage utilization grew to 20% from 3%. Currently, Filecoin represents 99% market share of total data stored across decentralized storage protocols (Filecoin, Storj, Sia, and Arweave).
Growth in Active Storage Deals was driven by two factors:
1) Storing data was easier in 2023. Continued development across on-ramps such as Singularity.Storage, NFT.Storage, and Web3.Storage increased Web3 adoption. Singularity alone onboarded 180+ clients and 270 PiB of data. This growth was enabled by advances in its S3 compatibility, data preparation, and deal making.
2) Large dataset clients grew exponentially in 2023. Over 1,800 large dataset clients onboarded datasets by the end of 2023, from an initial base of 500+ clients. 37% of these clients onboarded datasets exceeding 100 TiB in storage size.
2) Retrievals: Greater reliability for Filecoin Retrievals, alongside releases from Saturn & Station
Filecoin’s retrieval capabilities were bolstered by improvements both in its tooling and offerings. Several teams, such as Titan, Banyan, Saturn and Station, are laying the groundwork for new use cases to be anchored into the Filecoin economy, including decentralized CDNs and hot storage.
Saturn: A Decentralized CDN
Saturn is a decentralized CDN network built on Filecoin that seeks to address the need for application-level retrievals. The Saturn network currently has over 3,000 nodes distributed across the globe, enabling low-latency content delivery regardless of location.
Distribution of nodes: 35% in North America, 34% in Europe, 24% in Asia, 7% rest of world. (Source: Saturn Explorer, as of January 8, 2024)
Across 2023, Saturn reduced its median Time-to-First-Byte (TTFB) to 60 milliseconds. This makes Saturn the fastest dCDN for content-addressable data, with TTFB remaining consistent across all geographies. Saturn was also capable of supporting 400 million retrieval requests on its busiest day of the year.
At the end of 2023, Saturn launched a private beta for customers (clients include Solana-based NFT platform Metaplex).
Station: A Deploy Target for Protocols (Enabling Spark Retrieval Checks)
Station, a desktop app for Filecoin, was launched in July 2023. Station is a deployment target for other protocols, allowing DePIN networks, DA layers, and others to run on a distributed network of providers.
Station’s first module, Spark, is a protocol for performing retrieval checks on Storage Providers (SPs). Spark helps establish a reputational base for SP data retrievability, and supports teams looking to provide a hot storage cache for Filecoin. Since launch in Nov 2023, Spark has grown to 21 million daily jobs on 10,000 active nodes as of January 2024.
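For intuition, a retrieval check of this kind boils down to sampling a (provider, content) pair, attempting a fetch, and recording the outcome. The sketch below is a simplified illustration under those assumptions, not Spark's actual protocol; the task list, provider IDs, and result fields are hypothetical, and the CIDs are generic example values.

```python
import random
import time
import requests  # third-party HTTP client (pip install requests)

# Hypothetical task list of (storage provider ID, content CID) pairs to spot-check.
TASKS = [
    ("f01234", "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"),
    ("f05678", "bafkqablimvwgy3y"),
]

def check_retrieval(sp_id, cid, gateway="https://ipfs.io"):
    """Attempt to fetch the first byte of `cid` and record the outcome."""
    start = time.monotonic()
    try:
        with requests.get(f"{gateway}/ipfs/{cid}", stream=True, timeout=30) as resp:
            next(resp.iter_content(chunk_size=1), b"")  # wait for the first byte
            ttfb_ms = round((time.monotonic() - start) * 1000)
            return {"sp": sp_id, "cid": cid, "ok": resp.ok, "ttfb_ms": ttfb_ms}
    except requests.RequestException:
        return {"sp": sp_id, "cid": cid, "ok": False, "ttfb_ms": None}

# A checker node samples a task at random and reports the result.
print(check_retrieval(*random.choice(TASKS)))
```

Aggregated across thousands of nodes and millions of daily jobs, results like these are what let a reputation signal for SP retrievability emerge.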
3) Filecoin Virtual Machine: The FVM launch introduced a new class of use cases for the Filecoin Network. Early DeFi adoption broke $200 million in TVL.
The Filecoin Virtual Machine (FVM) launched in March 2023 with the EVM being the first supported VM deployed on top. FVM brought Ethereum-style smart contracts to Filecoin, broadening the slate of services anchoring into Filecoin’s block space. Two areas of early adoption have been in liquid staking services (led by GLIF and other DeFi protocols) and micropayments via the FVM.
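Because the first supported VM is the EVM, standard Ethereum tooling works against Filecoin out of the box. As a minimal sketch, the snippet below uses web3.py against a public Filecoin RPC endpoint; the Glif URL is an assumption, so substitute whichever provider you use.

```python
from web3 import Web3  # pip install web3; standard Ethereum tooling works on the FEVM

# Public Glif RPC endpoint for Filecoin mainnet (assumed; use your own provider if preferred).
w3 = Web3(Web3.HTTPProvider("https://api.node.glif.io/rpc/v1"))

print("connected:", w3.is_connected())
print("chain id:", w3.eth.chain_id)          # 314 on Filecoin mainnet
print("latest epoch:", w3.eth.block_number)  # Filecoin epochs surface as EVM blocks
```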
Liquid Staking
One of the core economic loops in the Filecoin economy is the process of pledging, where SPs put up collateral to secure capacity and data on the network. Prior to the FVM, borrowed Filecoin collateral was sourced through managed offerings from operators like Darma Capital, Anchorage, and CoinList. Post-FVM, roughly a dozen staking protocols have launched to grow Filecoin’s on-chain capital markets.
In aggregate, FVM-based protocols supply almost 11% of the total locked collateral (approx. 19 million FIL) on the network, giving yield access to token holders and increasing access to capital for hundreds of Filecoin SPs. From Filecoin's collateral markets alone, the ecosystem has broken past $200 million in TVL.
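For readers unfamiliar with the mechanics, most liquid-staking designs share the same core accounting: deposits mint pool shares pro-rata, and yield repaid by borrowing SPs raises the FIL value of every outstanding share. The sketch below illustrates that accounting in simplified form; it is not GLIF's or any specific protocol's implementation.

```python
class LiquidStakingPool:
    """Simplified share accounting common to liquid-staking designs.

    Deposits mint shares pro-rata against pool assets; interest repaid by
    borrowers (SPs) raises the FIL value of every outstanding share.
    Illustrative only -- not any protocol's actual code.
    """

    def __init__(self):
        self.total_fil = 0.0     # FIL held by / lent out of the pool
        self.total_shares = 0.0  # liquid staking tokens outstanding

    def deposit(self, fil):
        shares = fil if self.total_shares == 0 else fil * self.total_shares / self.total_fil
        self.total_fil += fil
        self.total_shares += shares
        return shares  # liquid staking tokens minted to the depositor

    def accrue_yield(self, fil):
        self.total_fil += fil  # interest from SP borrowers; share price rises

    def share_price(self):
        return self.total_fil / self.total_shares

pool = LiquidStakingPool()
minted = pool.deposit(1_000.0)     # first depositor: 1 share per FIL
pool.accrue_yield(50.0)            # SPs repay 50 FIL of interest
print(minted, pool.share_price())  # 1000.0 shares, each now worth 1.05 FIL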
Payments
Adjacent to the core storage offering on Filecoin, new services are being built that anchor into Filecoin's block space. As mentioned in the Retrieval Markets section, two notable services (Station and Saturn) started leveraging FVM for payments in 2023.
To date, Station users have completed 161 million jobs with more than 400 addresses receiving FIL rewards. Saturn averaged over 2,000 earning nodes in 2023 with 448,905 FIL disbursed to date.
4) Compute: Traction for Decentralized Compute Networks
Filecoin’s design enables compute networks to run synergistically on Filecoin’s Storage Providers. Sharing hardware with compute networks is also valuable to the Filecoin network: (1) sharing allows Filecoin to offer the cheapest storage by running side-by-side with compute operations, and (2) it brings additional revenue streams into the Filecoin economy.
Two key developments made running compute jobs on Filecoin nodes more practical:
Sealing-as-a-service: Sealing-as-a-service is the process by which Storage Providers (SPs) can outsource production of sealed sectors to third-party marketplaces. This gives SPs greater flexibility in operations and reduces costs of sector production. One marketplace, Web3mine, has thousands of machines participating in its protocol, offering cost savings to SPs of up to 70%. On top of the cost savings, the infrastructure built may also eventually benefit SPs by allowing them to leverage their GPUs for synergistic workloads (e.g. compute jobs).
Reduced Onboarding Costs: Supranational shipped proof optimizations, reducing sealing server costs by 90% and the overall cost of storage by 40%.
On top of these developments, 2023 saw emerging compute protocols building in the Filecoin ecosystem. Two notable examples:
Distributed compute platform Bacalhau demonstrated real-world utility among Web2 and DeSci clients. Most recently, the U.S. Navy chose Bacalhau to assist in deploying AI capabilities in undersea operations. Bacalhau is a platform-agnostic compute platform intended to run on Web3 and Web2 infrastructure alike. Launched in November 2022, Bacalhau's public network surpassed 1.5 million jobs and in some cases slashed compute costs by up to 99%.
Up-and-coming compute networks like Io.net allow ML engineers to access a distributed network of GPUs at a fraction of the cost of individual cloud providers. Io.net recently incorporated 1,500 GPUs from Filecoin SPs, positioning Filecoin providers to offer their services to Io.net's customer base. Io.net has gained over 7,400 users since its launch in November 2023, serving 15,000 hours of compute to users.
2024 will be a critical growth year for Filecoin as groundwork laid in 2023 comes to fruition. Native improvements to storage markets, greater speed of retrievals, new levels of customizability & scalability brought by FVM and Interplanetary Consensus (IPC), all expand the universe of use cases that Filecoin can address.
In a Web3 climate where there is substantial attention on DePIN (and the tying of real world services with Web3 capabilities) these changes will be critical building blocks for even better services. Here are three themes to look for in 2024:
1) Synergies with Compute, AI and other DePIN networks
In 2024, foundational improvements to the network will substantially improve Filecoin’s ability to compose with other ecosystems.
Fast finality allows better cross-network interactions with app chains in other ecosystems (e.g. Cosmos, Ethereum, Solana).
Customizable subnets allow for novel types of networks to form on top of Filecoin such as general purpose compute subnets (e.g. Fluence) and storage pools (e.g. Seal Storage).
Hot storage allows for broader use case support including serving data assets for physical resource networks (e.g. WeatherXM/Tableland), caching data for compute networks (e.g. Bacalhau), and more.
This is all scratching the surface. As the Web3 space and DePIN category grows, Filecoin is well positioned to support new communities that form given its 9 EiB of network capacity and flexibility. There exists a sizable opportunity within physical resource networks producing high amounts of data, such as Hivemapper (over 100M km mapped), and Helium (1 million hotspots globally). Compute networks are also a likely growth area, given the backdrop of a GPU shortage (particularly for AI purposes) in traditional cloud markets.
2) Focused Growth in Web2 Enterprise Data Storage
Web2 enterprise storage is a unique challenge for decentralized networks – requirements from these customers are not easily supported by most networks. Typical requirements from enterprise clients can include end-to-end encryption, certification for data centers, fast retrievals, access controls, S3 compatibility, and data provenance/compliance. Crucially, these requirements tend to differ across segments and verticals, which means that a level of adaptability is required. Filecoin’s architecture enables it to layer on support for the features these customers need.
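S3 compatibility is a good illustration of why this matters: it lets an enterprise reuse its existing backup tooling unchanged. As a sketch, the snippet below points the standard boto3 client at an S3-compatible onramp; the endpoint URL, bucket, credentials, and file names are all placeholders, since each onramp documents its own.

```python
import boto3  # pip install boto3; the standard AWS SDK works against S3-compatible APIs

# Endpoint, bucket, and credentials are placeholders; each onramp documents its own.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example-onramp.io",
    aws_access_key_id="YOUR_KEY",
    aws_secret_access_key="YOUR_SECRET",
)

# The upload call is identical to vanilla S3; the onramp handles data preparation
# and Filecoin deal-making behind the scenes.
s3.upload_file("backup.tar.gz", "my-bucket", "archives/backup.tar.gz")
head = s3.head_object(Bucket="my-bucket", Key="archives/backup.tar.gz")
print("stored bytes:", head["ContentLength"])
```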
A few teams worth keeping an eye on:
Banyan: Banyan simplifies how enterprise clients integrate with decentralized infrastructure by bundling hot storage, end-to-end encryption, and access controls on top of a pool of high-performing storage providers. With the Filecoin network, Banyan provides content-addressable storage, which it plans to complement with hot storage proofs by utilizing FVM. This implementation makes Banyan a fit not only for enterprise clients but also for DePIN and compute networks.
Seal: Seal has established itself as one of the best storage onramps in the Filecoin ecosystem, and is responsible for onboarding several key clients onto the network, such as UC Berkeley, Starling Labs, the Atlas Experiment, and the Casper Network. The team has been one of the driving forces in enterprise adoption to date, and most recently has achieved SOC 2 Compliance. In 2024, they plan on launching a subnet to enable a market for enterprise deals. On the back of their enterprise deal flow, they are positioned to bring petabytes of data into the network over the coming year via their new market.
Steeldome: Steeldome offers enterprise clients seeking data archival, backup, and recovery an alternative that is cost-competitive, efficiently deployed, and scalable. It does so by combining Filecoin in its stack with Web2 technologies, allowing a fuller feature set to complement Filecoin's cost-effective and secure archival storage. The Steeldome team has succeeded in onboarding clients across insurance, manufacturing, and media. In 2024, they plan to continue that trajectory while offering a managed service for Storage Providers.
3) Greater On-chain DeFi activity
There is likely to be continued activity in the on-chain economy with an increase in the number and type of DeFi protocols on Filecoin.
The first protocols will increase service revenues (from Storage, to Retrievals, and Compute) coming on-chain. As previously described, more services are coming online in the Filecoin network, and are utilizing FVM for payments (e.g. Saturn, Station).
Key releases in 2023, including SushiSwap going live in November 2023 and the Uniswap community's approval of an FVM integration, will lead to more diverse DeFi services coming on-chain. These will include CDPs (Collateralized Debt Positions) and price oracles (e.g. Pyth), among others.
Final Thoughts
In 2024, the Filecoin network will experience greater adoption, particularly by Compute, AI and DePIN networks, as well as targeted enterprise verticals. This adoption brings on-chain service revenue and supports the growth of DeFi activity beyond collateral markets. Continued improvements on storage markets, retrievability driven by hot storage proofs and CDN networks, as well as releases by FVM and IPC will enable the teams building on Filecoin to drive this next stage of growth.
To stay updated on the latest in DePIN and the Filecoin ecosystem, follow the @Filecointldr handle.
This blogpost is co-authored by Savan Chokkalingam and Nathaniel Kok on behalf of FilecoinTLDR. Many thanks to HQ Han and Jonathan Victor for reviewing and providing valuable insights and to all the ecosystem partners and teams for their timely input.
Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.
Filecoin's larger roadmap aims to turn cloud services into permissionless markets on which any provider can offer their services. The network started with Storage markets at Mainnet launch in October 2020. More recently, the Filecoin Virtual Machine (FVM) was introduced to bring smart contract functionality onto the network. This allows for user programmability around key services on the Filecoin network, including large-scale Storage and, soon, Retrievals.
In this post, we dive into the Retrieval markets that Filecoin is developing and one of its lighthouse projects. We will cover the following topics:
Filecoin’s Retrieval Markets and the Retrieval Markets Working Group (RMWG)
Content Delivery Networks (CDNs) and the role of Project Saturn
Saturn’s approach to a decentralized CDN and its traction to date
What’s next for Saturn
Filecoin’s Retrieval Markets and the RMWG
As covered in our previous post, Filecoin seeks to build open services for data, which consist of three main pillars (Storage, Retrieval, and Compute-over-Data). Storage has been a key emphasis for Filecoin from 2020 to 2022; it has emerged as the largest decentralized storage network to date, with over 1,170 PiB of data stored and 200,000+ users ranging from OpenSea to the Internet Archive. The remaining pillars of Retrieval and Compute-over-Data have been in development since 2022, with working groups (open for any individual or entity to join) organized around building these markets. The working groups encourage modularity and often consist of different teams that tackle different pieces of the puzzle.
The Retrieval Market Working Group (RMWG) is centered around building a decentralized CDN (Content Delivery Network) for the Filecoin ecosystem. Over 15 teams (such as Magmo, Ken Labs, Protocol Labs, and more) are contributing to tackling technical challenges in the space, from enabling ultra-fast payments to data transfer protocol enhancements to crypto-economic models for data retrieval. The building blocks the RMWG has organized itself around since H1 2022 are built off an envisioned retrieval flow that can interact freely with Filecoin's storage markets.
Even in its R&D stage, projects in the RMWG are already serving 160 million daily retrieval requests and more than 2 PB of data per month. Collectively, these projects will seek to enable a decentralized CDN that can serve not just the web3 space, but the web2 market as well.
CDNs and the role of Project Saturn
Content delivery networks are a key part of the Internet's infrastructure. Groups of servers work together to provide fast delivery of internet content, from static web pages to YouTube videos. Incumbent CDN providers include players such as Cloudflare, Akamai, and Fastly. Businesses pay for these services instead of the end-user, which means that service consistency, coverage, and pricing are critical.
The CDN market today is highly centralized and dominated by a few big players. Only 7 CDN providers serve over 80% of market needs. This brings about sizable concentration risk in the event of network failures (e.g. the Cloudflare outage in 2022), and higher latencies in regions far from the closest data centers (e.g. Africa).
A distributed model of smaller CDNs in various regions could effectively solve these problems, but economies of scale have prevented smaller, distributed CDNs from challenging incumbent providers (capital outlay could reach billions per year). Delivering web content better than incumbent providers would open up a significant commercial opportunity: the global CDN market was worth around US$20B in 2022 and is expected to reach around US$100B by 2032 (not counting new web3-based use cases such as NFTs).
A web3 CDN can potentially overcome this challenge by allowing anyone to contribute resources for content retrieval (provided they fulfill minimum criteria) in a network. This shifts the burden from a single company to thousands (or more) of companies supporting the network, reducing barriers to entry. This is where Project Saturn comes in. Project Saturn is a decentralized CDN network built on Filecoin that seeks to enable reliable, performant, and economic retrieval of content on the Internet. It is one of the key projects in the RMWG, with a public launch in November 2022. Saturn seeks to achieve the following:
Democratize the CDN market, by allowing anyone to serve as a Saturn node operator in return for crypto-incentives. Nodes can join in a permissionless manner, allowing for multiple companies or individuals to contribute towards a retrieval network (think franchising), which leads to a wider and more distributed footprint
Performant retrievals, with under 100ms TTFB, high network bandwidth, and low latency across all geographies, owing to a high density of nodes being distributed across each continent. While this does not exist today, it can potentially be achieved given a wider geographic distribution of nodes
No single point of failure, unlike traditional CDN networks
Data integrity and authenticity by leveraging content-addressability. Project Saturn is the only decentralized CDN that is natively compatible with content-addressing (see the sketch after this list)
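Content addressing is what makes that last property possible: the identifier commits to the bytes themselves, so any retriever can check what it received against what it asked for. The sketch below illustrates the idea with a plain SHA-256 digest standing in for a real CID, which additionally wraps the digest in multihash/multibase encoding.

```python
import hashlib

def verify_content(expected_digest, data):
    """Recompute the hash of retrieved bytes and compare it to the identifier.

    Real CIDs wrap the digest in multihash/multibase encoding; a plain
    SHA-256 hex digest stands in for that here to keep the sketch
    dependency-free.
    """
    return hashlib.sha256(data).hexdigest() == expected_digest

data = b"hello from a Saturn node"
digest = hashlib.sha256(data).hexdigest()

print(verify_content(digest, data))                  # True: bytes match the address
print(verify_content(digest, b"tampered response"))  # False: corruption is detectable
```

Because verification is independent of who served the bytes, clients can safely accept content from untrusted, permissionless nodes.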
Saturn's approach and traction to date (Aug 2023)
The data below is accurate as of August 2023, unless otherwise stated. Data for the number of Active Nodes is accurate as of November 2023, owing to an upgrade in October 2023 that removed multi-noding behavior in the network while keeping TTFB performance stable.
While Saturn's ambition is to serve as a credible alternative to traditional CDN networks, its near-term goal is to effectively fulfill the billions of requests received each week for content-addressed data on Filecoin and IPFS. These are currently fulfilled by the IPFS Gateway, which serves as a key benchmark for Saturn as it improves its network capacity and performance.
Saturn’s approach involves four main network actors in enabling retrievals from Filecoin and IPFS:
Node operators offer their hardware and resources to the Saturn network by running Saturn nodes in different geo-locations around the world. They are rewarded based on how many bytes they serve to clients over each payment epoch. Saturn nodes join the network by registering with the Saturn Orchestrator. The network of Saturn L1s provides a huge geographically distributed cache of content-addressed data for Saturn clients
Saturn Orchestrator manages the membership of node operators in the Saturn Network and facilitates the payment process to these nodes. This is a key function in democratizing data retrievals while ensuring that qualified participants enter the market. Over time, the aim is for the orchestrator to run entirely on the Filecoin Virtual Machine (FVM)
Clients: Network users make requests for content from the Saturn network. The Client is the device used to make the request. Clients make HTTP requests to the Saturn network and get back CAR files, allowing clients to verify the file incrementally (see the sketch after this list). When a Saturn L1 doesn't have a file in its cache, it "cache-misses" to wherever the file is stored in either the IPFS network or the Filecoin network, and returns it to the client
Customers use the Saturn Network as a CDN to accelerate their content to their users. Saturn customers can accelerate their content to a large number of Saturn nodes around the world to create a performant experience for end users
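To make the client flow concrete, the sketch below requests a CID as a CAR file over plain HTTP, following the IPFS trustless-gateway convention. Treat the Accept header, URL shape, and example gateway as illustrative assumptions rather than Saturn's exact API; the CID is a generic example value.

```python
import requests  # pip install requests

def fetch_car(cid, node):
    """Request `cid` from a Saturn-style node as a CAR file over plain HTTP.

    The Accept header and URL shape follow the IPFS trustless-gateway
    convention; treat both as illustrative rather than Saturn's exact API.
    """
    resp = requests.get(
        f"{node}/ipfs/{cid}",
        headers={"Accept": "application/vnd.ipld.car"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.content  # CAR bytes; the blocks inside can be verified incrementally

car = fetch_car(
    "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi",
    node="https://ipfs.io",  # any trustless gateway works for the demo
)
print(len(car), "bytes of CAR data")
```

Because the response is a CAR of content-addressed blocks, the client does not need to trust the node it happened to hit; each block can be checked as it streams in.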
Significant developments have been made on Project Saturn to date. Following its public launch in November 2022, Saturn now has 80ms Time-to-first-byte (TTFB) at the 50th percentile, serves 30% of mirrored traffic from IPFS.io via the Bifrost Gateway, and has launched a verifiable node reward payout system on FVM.
It also has made significant headway in developing a network that is geographically diverse, capable of handling high-volume requests, and able to deliver content in a performant manner (low time-to-first-byte). In just 8 months since its public launch, Saturn has achieved:
Over 2,000 global points of presence (across 59 countries)
Capacity to serve 478 Million requests each day (in July 2023)
80 milliseconds time-to-first-byte (TTFB) for IPFS content
1) Over 2,200 retrieval providers worldwide (comparably distributed to traditional CDN providers)
Over 2,200 retrieval providers are currently on Saturn contributing to network bandwidth. This is a strong 11.8% month-on-month (MoM) growth from only 662 nodes at the end of 2022. As a point of comparison, Filecoin's storage markets grew by 21% MoM in their first 6 months, and there are approximately 3,500 storage providers on the network today (the largest in the web3 storage space).
This is comparably distributed to traditional CDN providers today. Akamai, the largest CDN globally with a 35% market share, has over 4,000 points of presence, while the next closest player (Alibaba) has only an estimated 2,800 points of presence (with the majority in China).
This speed of growth attests to the accessibility of serving as a retrieval provider on the Saturn network. It takes only 4 terabytes (TB) of storage and Saturn’s open-source software to run a Saturn CDN node (considerably less resource-intensive than being a storage provider on Filecoin). Saturn will allow more individuals to participate in Filecoin’s decentralized markets for data services.
Participation across geographies remains diverse, with the most nodes in Europe (800+), followed by North America (600+) and Asia (500+). Median TTFB remains consistently low across all continents, with Europe, Asia, Oceania, and South America experiencing sub-100 ms TTFB.
Distribution of these nodes is important, as it allows for the lowest possible distance between clients and nodes, translating to low latency for end-users (mitigating the speed-of-light constraint that distant data centers impose on traditional CDN providers). Saturn's permissionless and crypto-incentivized attributes allow for a more 'elastic' supply to match fast demand growth in developing regions like Asia and Africa, which are experiencing those latency issues today.
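A back-of-envelope calculation shows why proximity dominates latency. Light in fiber covers roughly 200 km per millisecond (an approximation, about two-thirds of the speed of light in vacuum), so propagation delay alone sets a hard floor on round-trip time before any server processing happens:

```python
# Back-of-envelope: propagation delay alone, before any server processing.
# Light in fiber travels at roughly 200,000 km/s, i.e. ~200 km per millisecond.
FIBER_KM_PER_MS = 200

def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time imposed by fiber propagation."""
    return 2 * distance_km / FIBER_KM_PER_MS

print(min_round_trip_ms(50))    # nearby node:      0.5 ms floor
print(min_round_trip_ms(9000))  # cross-continent: 90.0 ms floor, the TTFB budget gone
```

No amount of server optimization recovers that cross-continent floor; only putting nodes closer to users does.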
2) Saturn served an average of ~10.3 Billion requests monthly across 2023
Saturn has a network capacity of around 25+ terabits per second (approximately 10% of the network capacity of Cloudflare). An average of 10.3 Billion requests were served monthly across 2023, with 3.7 Million Gigabytes of monthly bandwidth served. Over 478 million daily requests were being handled as of the end of July 2023, close to 50% of the IPFS Gateway's daily requests in the same time frame. Despite this progress, Saturn's network capacity still has room to stabilize.
3) Time-to-First-Byte is already under 80 milliseconds
Speed is an area where Saturn has shown significant results; the median TTFB (time-to-first-byte) is already under 80 milliseconds. Typically, a good TTFB lies below 100 ms for static content, and 200–500 ms for dynamic content. Saturn today is already the fastest content-addressable CDN globally with 80 ms TTFB and has further headroom for improvement as the network continues to become denser. Aside from Saturn, there also exist parallel developments to drive improved retrieval performance in the Filecoin network. This includes projects such as Rhea, which is seeking to optimize IPFS Gateway performance.
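For context on how such figures are produced, TTFB is simply the time from issuing a request to receiving the first response byte, with the median taken over repeated samples. Below is a minimal measurement sketch; the test URL is an arbitrary placeholder, and real benchmarks would sample many URLs from many vantage points.

```python
import time
import requests  # pip install requests

def measure_ttfb_ms(url):
    """Time from sending the request to receiving the first response byte."""
    start = time.monotonic()
    with requests.get(url, stream=True, timeout=30) as resp:
        next(resp.iter_content(chunk_size=1), b"")  # block until the first byte arrives
    return (time.monotonic() - start) * 1000

# Median over repeated samples, as in the figures quoted above.
samples = sorted(measure_ttfb_ms("https://ipfs.io/ipfs/bafkqablimvwgy3y") for _ in range(5))
print(f"median TTFB: {samples[len(samples) // 2]:.0f} ms")
```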
What’s next for Saturn
Since its public launch, Saturn has achieved significant progress as an open-source, community-run CDN network. Moving forward, the team looks to continue pushing towards better TTFB speeds while improving correctness and latency. Towards the end of 2023, Saturn looks to achieve further milestones. These include serving 100% of IPFS.io traffic, implementing metering and billing on the customer demand side, and launching a web app to enable self-onboarding for customers who want to accelerate content with Saturn.
You can keep up to date with Project Saturn, and other projects within the RMWG here. Data in this post is accurate as of 31st August 2023 unless otherwise stated.
Many thanks to the amazing HQ Han, Jonathan Victor, Alexander Kintsler, and the Project Saturn team for their input in publishing this piece.
Disclaimer: This information is for informational purposes only and is not intended to constitute investment, financial, legal, or other advice. This information is not an endorsement, offer, or recommendation to use any particular service, product, or application.