Job Summary:
We are seeking a highly experienced and visionary Databricks Data Architect with 14+ years in data engineering and architecture, including deep hands-on experience designing and scaling Lakehouse architectures on Databricks. The ideal candidate will bring broad expertise across data modeling, data governance, real-time and batch processing, and cloud-native analytics on the Databricks platform. You will lead the strategy, design, and implementation of modern data architecture to drive enterprise-wide data initiatives and maximize the value of the Databricks platform.
Key Responsibilities:
- Lead the architecture, design, and implementation of scalable and secure Lakehouse solutions using Databricks and Delta Lake.
- Define and implement data modeling best practices, including medallion architecture (bronze/silver/gold layers).
- Champion data quality and governance frameworks leveraging Databricks Unity Catalog for metadata, lineage, access control, and auditing.
- Architect real-time and batch data ingestion pipelines using Apache Spark Structured Streaming, Auto Loader, and Delta Live Tables (DLT).
- Develop reusable templates, workflows, and libraries for data ingestion, transformation, and consumption across various domains.
- Collaborate with enterprise data governance and security teams to ensure compliance with regulatory and organizational data standards.
- Promote self-service analytics and data democratization by enabling business users through Databricks SQL and Power BI/Tableau integrations.
- Partner with Data Scientists and ML Engineers to enable ML workflows using MLflow, Feature Store, and Databricks Model Serving.
- Provide architectural leadership for enterprise data platforms, including performance optimization, cost governance, and CI/CD automation in Databricks.
- Define and drive the adoption of DevOps/MLOps best practices on Databricks using Databricks Repos, Git, Jobs, and Terraform.
- Mentor and lead engineering teams on modern data platform practices, Spark performance tuning, and Delta Lake optimization techniques (Z-ordering, OPTIMIZE, VACUUM, etc.).
Technical Skills:
- 10+ years in Data Warehousing, Data Architecture, and Enterprise ETL design.
- 5+ years hands-on experience with Databricks on Azure/AWS/GCP, including advanced Apache Spark and Delta Lake.
- Strong command of SQL, PySpark, and Spark SQL for large-scale data transformation.
- Proficiency with Databricks Unity Catalog, Delta Live Tables, Auto Loader, DBFS, Jobs, and Workflows.
- Hands-on experience with Databricks SQL and integration with BI tools (Power BI, Tableau, etc.).
- Experience implementing CI/CD on Databricks, using tools like Git, Azure DevOps, Terraform, and Databricks Repos.
- Proficiency with streaming architectures using Spark Structured Streaming, Kafka, or Event Hubs/Kinesis.
- Understanding of ML lifecycle management with MLflow, and experience in deploying MLOps solutions on Databricks.
- Familiarity with cloud object stores (e.g., AWS S3, Azure Data Lake Storage Gen2) and data lake architectures.
- Exposure to data cataloging and metadata management using Unity Catalog or third-party tools.
- Knowledge of orchestration tools like Airflow, Databricks Workflows, or Azure Data Factory.
- Experience with Docker/Kubernetes for containerization (optional, for cross-platform knowledge).
Preferred Certifications:
- Databricks Certified Data Engineer Associate/Professional
- Databricks Certified Lakehouse Architect
- Microsoft Certified: Azure Data Engineer / Azure Solutions Architect
- AWS Certified Data Analytics – Specialty
- Google Professional Data Engineer