Structured Data Sources (RDBMS databases) to Data Lake
Relational databases (RDBMS) we support for ingestion into your data lake:
- MySQL
- Amazon Aurora
- Amazon RDS
- Google Cloud SQL
- PostgreSQL
- Oracle DB
- MariaDB
- MS SQL Server
- Any other RDBMS
The high-level design flow is shown below:

Our RDBMS data lake solutions let you ingest tables from relational databases (such as MySQL, Amazon Aurora/RDS, Google Cloud SQL, Oracle Database, and PostgreSQL) into your data lake in an open table format such as Delta Lake, Apache Iceberg, or Apache Hudi, stored on cloud object storage such as AWS S3, Google Cloud Storage (GCS), or Azure Blob Storage. You can then use your existing structured data for advanced analytics and heavy machine learning workloads without querying your production databases.
List of implemented ETL/ELT pipelines
- Google Cloud SQL database tables to a data lake in Apache Iceberg/Hudi/Delta Lake format on GCS
  - We provide three types of ingestion:
    - Full read and overwrite
    - Incremental read and upsert
    - Binlog-based change capture using Debezium, with upsert
  - Orchestration using Cloud Composer
  - Jobs run on Dataproc/Dataflow
  - The entire workflow is provisioned as infrastructure as code (IaC) with Terraform
  - All ingested tables are available in the GCP Data Catalog for end-user access
  - Access to data lake tables is controlled through IAM roles
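The three ingestion types listed above can be illustrated with a minimal, self-contained sketch. This is not the production pipeline: in-memory SQLite stands in for both the source RDBMS and the data lake table, and the table name `customers`, its columns, and all function names are hypothetical. A real job would typically run on Dataproc/Dataflow and write to Iceberg, Hudi, or Delta Lake with a merge operation. The change event uses the `op`/`before`/`after` fields of the Debezium event envelope, heavily simplified.

```python
import sqlite3

# Hypothetical table layout; updated_at serves as the incremental watermark column.
DDL = "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"


def full_overwrite(src, dst, table):
    """Full read and overwrite: replace the target with a fresh copy of the source."""
    # The table name is assumed to come from trusted configuration, not user input.
    rows = src.execute(f"SELECT id, name, updated_at FROM {table}").fetchall()
    with dst:  # commits on success, rolls back on error
        dst.execute(f"DELETE FROM {table}")
        dst.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    return len(rows)


def incremental_upsert(src, dst, table, watermark):
    """Incremental read and upsert: copy only rows changed after the watermark."""
    rows = src.execute(
        f"SELECT id, name, updated_at FROM {table} "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    with dst:
        # INSERT OR REPLACE acts as an upsert keyed on the primary key.
        dst.executemany(f"INSERT OR REPLACE INTO {table} VALUES (?, ?, ?)", rows)
    # The new watermark is the largest updated_at seen, or unchanged if nothing was read.
    return rows[-1][2] if rows else watermark


def apply_change_event(dst, table, event):
    """Apply one simplified Debezium-style change event to the target."""
    op = event["op"]  # "c" create, "u" update, "d" delete, "r" snapshot read
    with dst:
        if op == "d":
            dst.execute(f"DELETE FROM {table} WHERE id = ?", (event["before"]["id"],))
        elif op in ("c", "u", "r"):
            row = event["after"]
            dst.execute(
                f"INSERT OR REPLACE INTO {table} VALUES (?, ?, ?)",
                (row["id"], row["name"], row["updated_at"]),
            )
        else:
            raise ValueError(f"unsupported op: {op!r}")


def demo():
    src = sqlite3.connect(":memory:")   # stands in for the production database
    lake = sqlite3.connect(":memory:")  # stands in for the data lake table
    src.execute(DDL)
    lake.execute(DDL)
    src.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)", [(1, "Ada", 100), (2, "Bob", 100)]
    )
    src.commit()

    # 1. Initial load.
    full_overwrite(src, lake, "customers")

    # 2. The source changes; only rows newer than the watermark are re-read.
    src.execute("UPDATE customers SET name = 'Bobby', updated_at = 200 WHERE id = 2")
    src.execute("INSERT INTO customers VALUES (3, 'Cy', 210)")
    src.commit()
    watermark = incremental_upsert(src, lake, "customers", 100)

    # 3. A delete arrives as a change event, which a watermark query cannot see.
    event = {"op": "d", "before": {"id": 1, "name": "Ada", "updated_at": 100}, "after": None}
    apply_change_event(lake, "customers", event)

    rows = lake.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    return watermark, rows


if __name__ == "__main__":
    print(demo())
```

The sketch also shows why the binlog-based option exists: a watermark query only sees rows that still exist in the source, so deletes reach the lake only through change events or a full overwrite.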