
DataDecode

Your weekly source for insights in data engineering, AI engineering, and machine learning.

Data Engineering | AI Engineering | Machine Learning | Modern Analytics

Latest Articles

Published on Medium

The Problem: S3 Listing Is Costing You More Than You Think
Data Engineering | Feb 19, 2026

If you’re running AWS Glue jobs at scale, there’s a subtle but significant cost driver that often flies under the radar: S3 LIST operations. Every time a Glue job starts up and scans a source path, it has to list all the files under that path before it can do anything useful....

Athena + Glue + Hudi: A Surprisingly Powerful Big Data ETL Combo
Data Engineering | Jan 8, 2026

A Smarter Way to Split the Work: At first glance, using both Amazon Athena and AWS Glue in the same ETL pipeline might feel unnecessary. After all, both tools can process data sitting in S3. However, this architecture becomes powerful when each service is used for what it does...

Building an AI Agent That Actually Remembers: The LangGraph Sentinel Agent Story
Machine Learning | Dec 13, 2025

How we built a production-ready AI agent with long-term memory, intelligent planning, and real-time thinking, and why it matters. The Problem: AI Agents That Forget Everything. You’ve probably had this frustrating experience: you’re chatting with an AI assistant, and after a few...

How the Agentic Age Is Transforming Data Engineering and Analytics
Data Engineering | Dec 5, 2025

The data world is going through one of the biggest shifts since the rise of cloud platforms, and this time the catalyst is the agentic age, powered by large language models (LLMs). Before we look at where we’re heading, let’s quickly revisit what data engineering and data...

The Hidden Cost of JSON: Why Your LLM Tokens Are Overpriced
AI Engineering | Nov 23, 2025

Why Data Is Too Expensive for LLMs: If you work with large language models (LLMs) like GPT or Claude, you know the two biggest limits: the context window (how much data the model can see) and the cost (how much you pay per token). Every single character you send to the AI counts...

Building a Self-Alerting Data Quality & Monitoring Pipeline on AWS
Data Engineering | Oct 24, 2025

Data engineers know the pain: you’ve just finished an elegant ETL job, everything’s humming along, and then, bam, your downstream dashboard shows garbage values. Somewhere, somehow, that CSV from marketing decided to drop a column, or the vendor API sent a null date for every...
