📖 About

This project contains the code and artifacts needed to process Gitcoin Grants data from the Allo Indexer Data API. Read more about the motivation and initial approach on my blog!

⚙️ How It Works

The portal is a customized instance of Datadex. With each push to main, a GitHub Action will:

  1. Scrape and ingest data from multiple sources.
  2. Clean, transform, and join the tables to produce a set of final curated models.
  3. Export all the datasets as Parquet files and push them to IPFS.
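
For illustration, here is a minimal sketch of what such a run could look like using DuckDB from Python. The source URL, column names, and output path are hypothetical placeholders; the real pipeline is defined by Datadex's declarative models:

import duckdb

con = duckdb.connect("grants.duckdb")
# Extensions for reading over HTTP(S) and parsing JSON
con.execute("INSTALL httpfs; LOAD httpfs; INSTALL json; LOAD json;")

# 1. Ingest: pull raw data from a source API (hypothetical URL).
con.execute("""
    CREATE OR REPLACE TABLE raw_donations AS
    SELECT * FROM read_json_auto('https://example.com/allo/donations.json')
""")

# 2. Transform: clean and shape the raw table into a curated model
#    (hypothetical columns).
con.execute("""
    CREATE OR REPLACE TABLE allo_donations AS
    SELECT id, round_id, amount_usd
    FROM raw_donations
    WHERE amount_usd IS NOT NULL
""")

# 3. Export: write the curated model as a Parquet file, ready to push to IPFS.
con.execute("COPY allo_donations TO 'allo_donations.parquet' (FORMAT PARQUET)")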

📦 Key Features

  • Open.
    • Both code and data are fully open source.
    • It also relies on open standards and formats (the Arrow ecosystem).
  • Permissionless.
    • Clone it and edit away! You're not blocked by API rate limits or closed data, as you would be in Dune.
    • All the other git features (branching, merging, pull requests, …) are available too, because all the data is transformed declaratively, as code.
  • Decentralized.
    • The project runs on a laptop, a server, a CI runner (that's how it works right now), or even a decentralized compute network like Bacalhau. It even works in GitHub Codespaces, so you don't need to set up anything locally!
    • Data is stored on IPFS. You can run the pipeline locally, and it'll generate the same IPFS files if nothing has changed. The more people run it, the more distributed the IPFS files will be!
    • Data comes from multiple sources and can be exposed in multiple ways.
  • Data as Code.
    • Every commit generates all the table files and pushes them to IPFS, so we can always go back and see the data exactly as it was at any point in the project's history.
  • Modular.
    • Each component can be replaced, extended, or removed. It works well in many environments (your laptop, a cluster, or the browser) and with multiple tools (tables are just files at the end of the day).
  • Low Friction.
    • Data (raw and processed) is already there! No need to write your own scripts. You can always reproduce everything, but getting started is as easy as pasting a SQL query in your browser or calling pd.read_parquet(url) in a notebook (see the sketch after this list).
    • Every commit also publishes a set of Quarto Notebooks rendered with the latest data. These can be used to generate reports/dashboards, or as documentation.
  • Modern.
    • It supports all the things data engineers want: typing, tests, materialized views, dev branches, …
    • Uses best practices (declarative transformations) and state-of-the-art tooling (DuckDB).
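
As an example of that low-friction access, here's a minimal sketch of loading one of the published tables with pandas. The file name mirrors the allo_projects.parquet table queried in the stats section below; reading Parquet straight from a URL assumes pyarrow (plus fsspec-style URL handling) is installed.

import pandas as pd

# Load a curated table straight from the portal; no scraping required.
# The URL assumes the same data/ base path used in the stats section below.
url = "https://grantsdataportal.xyz/data/allo_projects.parquet"

projects = pd.read_parquet(url)  # needs pyarrow (or fastparquet) installed
print(projects.head())
print(f"{len(projects)} projects in total")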

📊 Quick Data Stats

// DuckDB WASM client from the Observable standard library
ggdp = DuckDBClient.of()

// Base URL for the published Parquet files (no trailing slash)
url = "https://grantsdataportal.xyz/data"

// Total number of Gitcoin Grants rounds
number_of_rounds = ggdp.query(`
    select
        count(id) as c
    from parquet_scan('${url}/allo_rounds.parquet')
`)

// Total number of donations (votes) across all rounds
round_votes = ggdp.query(`
    select
        count(id) as c
    from parquet_scan('${url}/allo_donations.parquet')
`)

// Total number of projects
projects = ggdp.query(`
    select
        count(1) as c
    from parquet_scan('${url}/allo_projects.parquet')
`)

// Pull the scalar counts out of the single-row query results
round_votes_scalar = round_votes[0]["c"]
number_of_rounds_scalar = number_of_rounds[0]["c"]
projects_scalar = projects[0]["c"]
  • Rounds of Gitcoin Grants: ${number_of_rounds_scalar}
  • Round votes: ${round_votes_scalar}
  • Projects: ${projects_scalar}

This data is queried live from your browser using DuckDB WASM.
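
If you'd rather run the same counts outside the browser, here's a quick sketch with the duckdb Python package (assuming the httpfs extension to read Parquet over HTTPS):

import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")  # read remote Parquet over HTTPS

# The same query the browser runs, executed locally against the published file.
url = "https://grantsdataportal.xyz/data/allo_rounds.parquet"
number_of_rounds = con.execute(
    f"SELECT count(id) AS c FROM parquet_scan('{url}')"
).fetchone()[0]

print(f"Rounds of Gitcoin Grants: {number_of_rounds}")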