Currently supported capabilities

  • Supports ingestion from RDBMS, NoSQL, and unstructured data sources
  • Provides scalable ETL pipelines for data lake and warehouse synchronization
  • Integrates with centralized data catalogs and access control systems
  • Compatible with multiple cloud providers and open table formats
Full support for reading structured, semi-structured, and unstructured sources and ingesting them into a centralized data lake in cloud storage
Avocado Datalake: high-level architecture (diagram)

How it works

The platform offers configurable pipelines that extract data from various sources, transform it as needed, and load it into centralized data lakes or warehouses. It includes support for open data formats and catalog interoperability, ensuring easy discovery and secure access control. The solution is designed for data engineers, data scientists, and BI teams seeking efficient data management and analytics infrastructure.
  • Pipeline infrastructure is provisioned with Terraform and can be deployed on AWS, GCP, or Azure.
  • The ETL codebase is built with Apache Spark + Scala; a fat JAR is assembled for each pipeline.
  • PySpark + Python is also supported on Databricks and Snowflake.
  • Data catalogs are managed with Terraform; supported catalogs include Unity Catalog (Databricks), AWS Glue Data Catalog, Azure Data Catalog, and Google Cloud Data Catalog.
  • Data catalog permissions are also managed as code via Terraform.
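To make the extract-transform-load flow above concrete, here is a minimal sketch of a configurable pipeline in plain Python. All names (`Pipeline`, `extract`, `transforms`, `load`) are illustrative assumptions, not the product's actual API; in practice each stage would be backed by Spark readers and writers rather than in-memory callables.

```python
# Illustrative sketch of a configurable extract-transform-load pipeline.
# The class and field names here are hypothetical, not the product's API.
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]

@dataclass
class Pipeline:
    extract: Callable[[], Iterable[Record]]        # read from an RDBMS, NoSQL store, files, ...
    transforms: List[Callable[[Record], Record]]   # per-record transformations, applied in order
    load: Callable[[Iterable[Record]], None]       # write to the data lake / warehouse

    def run(self) -> None:
        records: Iterable[Record] = self.extract()
        for fn in self.transforms:
            records = (fn(r) for r in records)     # lazily chain transformations
        self.load(records)

# Example: ingest two rows, trim a column, "load" into an in-memory sink.
sink: List[Record] = []
pipeline = Pipeline(
    extract=lambda: [{"name": "  Ada "}, {"name": "Linus"}],
    transforms=[lambda r: {**r, "name": r["name"].strip()}],
    load=lambda rows: sink.extend(rows),
)
pipeline.run()
print(sink)  # [{'name': 'Ada'}, {'name': 'Linus'}]
```

The same shape maps onto a Spark job: `extract` becomes a DataFrame read, `transforms` become DataFrame operations, and `load` becomes a write to cloud storage in an open table format.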
Pricing is flexible, with options for free, freemium, or paid plans, depending on organizational needs. The platform is ideal for enterprises looking to accelerate their data lake and warehouse setup, improve data accessibility, and enable advanced analytics and machine learning applications.

Need help implementing or bootstrapping a data lake or data warehouse in your organization?
