Relational databases (RDBMS) we support for ingestion into your data lake:

MySQL
Amazon Aurora
Amazon RDS
Google Cloud SQL
PostgreSQL

Oracle DB
MariaDB
MS SQL Server
Any other RDBMS

The high-level design flow is shown below:

RDBMS to centralized Data Lake Architecture: full support for reading from an RDBMS (through a JDBC connection or by parsing the binlog) and ingesting into a centralized data lake on cloud storage such as AWS S3, GCP GCS, or Azure Blob Storage.

Our RDBMS data lake solutions let you ingest tables from relational databases (such as MySQL, Amazon Aurora/RDS, Cloud SQL, Oracle Database, and PostgreSQL) into your data lake in an open table format such as Delta Lake, Apache Iceberg, or Apache Hudi, on cloud storage such as AWS S3, GCP GCS, or Azure Blob Storage. Your existing structured data is then readily available for advanced analytics and machine learning, and heavy workloads run against the data lake instead of your production databases.
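As a rough illustration of the JDBC read path, the sketch below builds reader options for either a full or an incremental table read. The helper name `jdbc_read_options` and the host, database, table, and column names are hypothetical. It assumes a reader that takes `url` and `dbtable` options the way Spark's JDBC data source does, and it is not a description of the production pipeline code.

```python
from typing import Dict, Optional

# URL prefixes and default ports for two of the engines listed above.
_JDBC_PREFIX = {
    "mysql": ("jdbc:mysql", 3306),
    "postgresql": ("jdbc:postgresql", 5432),
}


def jdbc_read_options(engine: str, host: str, database: str, table: str,
                      watermark_col: Optional[str] = None,
                      last_watermark: Optional[str] = None,
                      port: Optional[int] = None) -> Dict[str, str]:
    """Build an option map for a JDBC table read (hypothetical helper)."""
    prefix, default_port = _JDBC_PREFIX[engine]
    url = f"{prefix}://{host}:{port or default_port}/{database}"
    if watermark_col is None:
        # Full read: pass the table name straight to the reader.
        return {"url": url, "dbtable": table}
    # Incremental read: wrap a filtered SELECT as a derived table.
    # The watermark is assumed to come from trusted pipeline state,
    # not from user input, since it is interpolated into SQL here.
    subquery = (
        f"(SELECT * FROM {table} "
        f"WHERE {watermark_col} > '{last_watermark}') AS src"
    )
    return {"url": url, "dbtable": subquery}
```

With Spark, such a map could be passed as `spark.read.format("jdbc").options(**opts).load()`, with credentials supplied separately from a secret store rather than hard-coded.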

List of implemented ETL/ELT pipelines

GCP Cloud SQL database tables to a data lake with Apache Iceberg/Hudi/Delta Lake in GCS
1. We provide three types of ingestion:
  - Full read and overwrite
  - Incremental read and upsert
  - Reading the binlog using Debezium and upsert
2. Orchestration using Cloud Composer
3. Jobs run on Dataproc/Dataflow
4. Setup of the entire workflow as infrastructure as code (IaC) with Terraform
5. All ingested tables are available in the GCP Data Catalog for end-user access
6. Access to data lake tables is controlled through IAM roles
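
The three ingestion types in item 1 differ mainly in how much of the source they read and how they merge it into the target. The sketch below models them on plain in-memory dictionaries to show the semantics only. The function names, the `id` and `updated_at` columns, and the sample rows are hypothetical. The change events follow the Debezium envelope fields `op`, `before`, and `after`, and a real pipeline would run the equivalent merge against an Iceberg, Hudi, or Delta Lake table.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]
Table = Dict[Any, Row]  # primary key -> row; stands in for a lake table


def full_overwrite(source_rows: Iterable[Row], key: str) -> Table:
    """Full read and overwrite: rebuild the target from a complete read."""
    return {row[key]: dict(row) for row in source_rows}


def incremental_upsert(target: Table, source_rows: Iterable[Row], key: str,
                       watermark_col: str, last_watermark: Any) -> Any:
    """Incremental read and upsert: merge rows changed after the watermark.

    Returns the new watermark to persist for the next run.
    """
    new_watermark = last_watermark
    for row in source_rows:
        if row[watermark_col] > last_watermark:
            target[row[key]] = dict(row)  # insert or replace by primary key
            new_watermark = max(new_watermark, row[watermark_col])
    return new_watermark


def apply_change_events(target: Table, events: Iterable[Dict[str, Any]],
                        key: str) -> None:
    """Binlog-based upsert: apply Debezium-style change events in order."""
    for event in events:
        op = event["op"]
        if op in ("c", "u", "r"):  # create, update, snapshot read
            row = event["after"]
            target[row[key]] = dict(row)
        elif op == "d":  # delete: only the "before" image carries the key
            target.pop(event["before"][key], None)


# Usage with made-up rows.
source = [
    {"id": 1, "name": "a", "updated_at": 10},
    {"id": 2, "name": "b", "updated_at": 20},
]
lake = full_overwrite(source, key="id")

source[0] = {"id": 1, "name": "a2", "updated_at": 30}
source.append({"id": 3, "name": "c", "updated_at": 40})
watermark = incremental_upsert(lake, source, "id", "updated_at", last_watermark=20)

delete_event = {"op": "d", "before": {"id": 2}, "after": None}
apply_change_events(lake, [delete_event], key="id")
```

One design point the sketch makes visible: a watermark-based incremental read never sees rows deleted at the source, whereas binlog events carry deletes explicitly.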

List of the products we offer as an ingestion service or ETL pipelines