Our products

1. RDBMS Data to Data Lake

Ingest structured data from RDBMS sources into a centralized data lake in your cloud storage (e.g., AWS S3 or GCS) for AI/ML and data analytics.

  • MySQL
  • AWS Aurora
  • Amazon RDS
  • Cloud SQL
  • PostgreSQL
RDBMS to centralized Data Lake Architecture
Full support for reading from an RDBMS (through a JDBC connection or by parsing the binlog) and ingesting into a centralized data lake in cloud storage such as AWS S3, GCP GCS, or Azure Blob Storage.

Our RDBMS data lake solutions enable you to seamlessly ingest data from relational databases such as MySQL, AWS Aurora, and PostgreSQL into your data lake. Your existing structured data becomes readily available for advanced analytics, machine learning initiatives, and heavy workloads, without connecting to production RDBMS databases.
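As an illustration, the sketch below shows one way such an ingestion job can look: the source table is read in chunks and landed as Parquet files in S3. It assumes pandas, SQLAlchemy, pyarrow, and s3fs are installed; the connection string, table name, and bucket path are placeholders, not part of our codebase.

```python
# Sketch: chunked RDBMS-to-S3 ingestion (connection details are placeholders).
import pandas as pd
from sqlalchemy import create_engine

SOURCE_URL = "mysql+pymysql://user:password@mysql-host:3306/appdb"  # placeholder
TABLE = "orders"                                                    # placeholder
LAKE_PATH = "s3://my-data-lake/raw/orders"                          # placeholder

engine = create_engine(SOURCE_URL)

# Read the source table in chunks so large tables never sit fully in memory,
# then land each chunk as a Parquet file in the data lake.
for i, chunk in enumerate(pd.read_sql(f"SELECT * FROM {TABLE}", engine, chunksize=50_000)):
    chunk.to_parquet(f"{LAKE_PATH}/part-{i:05d}.parquet", index=False)
```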

2. Semi-Structured Data to Data Lake

Process and analyze semi-structured data in your data lake for enhanced insights.

  • MongoDB
  • AWS DynamoDB
  • Google BigTable
  • Any other NoSQL
  • Kafka messages
Semi-structured database data to centralized Data Lake Architecture
Readily available codebase for reading data from NoSQL databases and writing to a centralized data lake.

Our semi-structured data pipeline ETL jobs allow you to process and transform data from sources such as MongoDB and AWS DynamoDB into your centralized data lake, enabling your AI/ML, data insights, and data analytics teams to gain deeper insights and improve decision-making.
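For illustration, here is a minimal sketch of one such extraction job for MongoDB, assuming pymongo, pandas, pyarrow, and s3fs; the connection URI, collection, and output path are placeholders.

```python
# Sketch: MongoDB-to-data-lake extraction (names and paths are placeholders).
import pandas as pd
from pymongo import MongoClient

client = MongoClient("mongodb://mongo-host:27017")  # placeholder
events = client["appdb"]["events"]                  # placeholder

# Flatten documents into a tabular frame; nested fields are expanded
# into dotted columns so they fit a columnar lake format.
docs = list(events.find({}, {"_id": 0}))
df = pd.json_normalize(docs)

df.to_parquet("s3://my-data-lake/raw/events/events.parquet", index=False)
```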

3. Unstructured Data to Data Lake

Bring unstructured data into your centralized data lake for advanced analytics and machine learning using our highly configurable ETL pipeline.

  • CSV Files
  • JSON Files
  • Web APIs
  • Text Documents
Unstructured data to centralized Data Lake Architecture
Configurable or pattern-based reading of unstructured files and writing into the centralized data lake with the required cleaning and transformation.

Our unstructured data lake solutions enable you to leverage data from sources such as CSV files, JSON files, web APIs, and text documents for advanced analytics and machine learning, unlocking the full potential of your data and giving you a competitive advantage once it is available in the centralized data lake.
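As a simple illustration, the sketch below reads files matched by a glob pattern, applies light cleaning, and lands the result in the lake. It assumes pandas, pyarrow, and s3fs; the landing-zone pattern and output path are placeholders.

```python
# Sketch: pattern-based file ingestion with light cleaning (paths are placeholders).
import glob
import pandas as pd

frames = []
for path in glob.glob("/landing-zone/exports/*.csv"):  # placeholder pattern
    df = pd.read_csv(path)
    # Basic cleaning: normalize column names and drop exact duplicates.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    frames.append(df.drop_duplicates())

combined = pd.concat(frames, ignore_index=True)
combined.to_parquet("s3://my-data-lake/raw/exports/exports.parquet", index=False)
```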

Once the data is available in a data lake table format such as:
  • Apache Iceberg
  • Apache Hudi
  • Delta Lake
with interoperability between these formats provided by Apache XTable, we attach the data to a centralized data catalog for discovery and utilization, using whichever of the tools below fits your organization's needs (a registration sketch follows this list):
  • AWS Glue Data Catalog
  • GCP Data Catalog
  • Apache Atlas
  • Databricks Unity Catalog
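For example, registering a lake table in the AWS Glue Data Catalog can be sketched with boto3 as below. This assumes AWS credentials are configured; the database, table, columns, and S3 location are placeholder assumptions.

```python
# Sketch: registering a Parquet-backed lake table in the AWS Glue Data Catalog
# (all names and paths are placeholders).
import boto3

glue = boto3.client("glue", region_name="us-east-1")

glue.create_table(
    DatabaseName="analytics",  # placeholder database
    TableInput={
        "Name": "orders",      # placeholder table
        "TableType": "EXTERNAL_TABLE",
        "StorageDescriptor": {
            "Columns": [
                {"Name": "order_id", "Type": "bigint"},
                {"Name": "amount", "Type": "double"},
            ],
            "Location": "s3://my-data-lake/raw/orders/",
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
    },
)
```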
After the data is available through any of the above centralized catalogs, we will help you build access controls using IAM or Lake Formation on the AWS Glue Data Catalog. Access controls can be provisioned through our specialized Terraform module to grant access to the various groups in your organization, or you can configure the access yourself. Examples of intended access-control groups are (see the sketch after this list):
  • AI/ML Engineers
  • LLM Engineers
  • Data Engineers
  • Data Scientists
  • Data Analysts
  • Business Intelligence
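As a hedged illustration of one such grant, the boto3 sketch below is roughly equivalent to what the Terraform module would provision; the role ARN, database, and table names are placeholders.

```python
# Sketch: granting read access on a catalog table via AWS Lake Formation
# (the principal ARN and names are placeholders).
import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

# Grant SELECT on analytics.orders to the data-analysts role.
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/data-analysts"
    },
    Resource={
        "Table": {"DatabaseName": "analytics", "Name": "orders"}
    },
    Permissions=["SELECT"],
)
```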
Our high-level architecture for the centralized data lake described above, and its utilization by the various tools and groups of engineers, is shown below:

Data-driven approach through various tools

For more information on the Avocado Datalake codebase, or to get full access and support for any of the solutions proposed above, set up a free call with us or email us at