# Soumil Shah

> AWS Data Engineer, Lakehouse Architect & AI Expert based in New York City. Lead Software Developer with 6+ years of experience building production data lakehouses, AI/ML pipelines, and real-time streaming platforms on AWS.

## About

Soumil Shah is a Lead Software Developer and AWS Data Engineer specializing in Data Lakehouse architecture using Apache Hudi, Apache Iceberg, and Delta Lake. He designs and builds scalable data platforms on AWS with expertise in real-time streaming, ETL orchestration, and AI-augmented data workflows. He is a recognized content creator with 46K+ YouTube subscribers and 600+ published technical articles.

## Core Pages

- [Home](https://soumilshah.com/): Portfolio homepage with professional summary, stats, and featured content
- [Skills & Tech Stack](https://soumilshah.com/skills): Comprehensive technical skills including AWS, Lakehouse, AI/ML, Spark, and data engineering tools
- [Technical Articles](https://soumilshah.com/articles): 600+ curated articles on data engineering, lakehouse architecture, Apache Hudi, Iceberg, Spark, and AI
- [Certificates & Awards](https://soumilshah.com/certificates): 42+ professional certifications from Anthropic, AWS, Udemy, and LinkedIn Learning, plus the Zeta Global Builder Award
- [Recommendations](https://soumilshah.com/recommendations): Professional endorsements and testimonials from colleagues and industry peers

## Technical Expertise

- [Data Lakehouse Architecture]: Apache Hudi, Apache Iceberg, Delta Lake, open table formats, medallion architecture (bronze/silver/gold), ACID transactions, time travel, partition evolution
- [AWS Data Engineering]: AWS Glue, Amazon S3, Amazon Athena, Amazon EMR, AWS Lambda, AWS Step Functions, Amazon Kinesis, Amazon DynamoDB, Amazon Bedrock
- [Real-Time Streaming]: Apache Kafka, Apache Flink, Spark Streaming, CDC (Change Data Capture), exactly-once semantics
- [AI & Generative AI]: Amazon Bedrock, Claude AI, LLM pipelines, agentic AI architectures, AI-augmented data platforms
- [Data Pipeline & Orchestration]: Apache Spark, PySpark, Apache Airflow, ETL/ELT, data quality, data governance, data lineage
- [Infrastructure]: Docker, Terraform, event-driven architecture, microservices, distributed systems

## External Profiles

- [YouTube](https://www.youtube.com/channel/UC_eOodxvwS_H7x2uLQa-svw): 46K+ subscribers; tutorials on Apache Hudi, Iceberg, Spark, AWS, and AI
- [LinkedIn](https://www.linkedin.com/in/shah-soumil/): 600+ articles on data engineering and lakehouse architecture
- [GitHub](https://github.com/soumilshah1995): Open-source projects and code samples
- [Medium](https://medium.com/@shahsoumil519): Technical blog posts on data engineering and AI
- [Blog](https://soumilshah1995.blogspot.com): Long-form technical writing

## Contact

- Website: https://soumilshah.com
- LinkedIn: https://www.linkedin.com/in/shah-soumil/
- Email: shahsoumil519@gmail.com