
DataDecode
Your weekly source for insights in data engineering, AI engineering, and machine learning.
Latest Articles
Published on Medium
The Problem: S3 Listing Is Costing You More Than You Think
If you’re running AWS Glue jobs at scale, there’s a subtle but significant cost driver that often flies under the radar: S3 LIST operations. Every time a Glue job starts up and scans a source path, it has to list all the files under that path before it can do anything useful....
Read on Medium
Athena + Glue + Hudi: A Surprisingly Powerful Big Data ETL Combo
A Smarter Way to Split the Work. At first glance, using both Amazon Athena and AWS Glue in the same ETL pipeline might feel unnecessary. After all, both tools can process data sitting in S3. However, this architecture becomes powerful when each service is used for what it does...
Read on Medium
Building an AI Agent That Actually Remembers: The LangGraph Sentinel Agent Story
How we built a production-ready AI agent with long-term memory, intelligent planning, and real-time thinking, and why it matters. The Problem: AI Agents That Forget Everything. You’ve probably had this frustrating experience: you’re chatting with an AI assistant, and after a few...
Read on Medium
How the Agentic Age Is Transforming Data Engineering and Analytics
The data world is going through one of the biggest shifts since the rise of cloud platforms, and this time the catalyst is the agentic age powered by large language models (LLMs). Before we look at where we’re heading, let’s quickly revisit what data engineering and data...
Read on Medium
The Hidden Cost of JSON: Why Your LLM Tokens Are Overpriced
Why Data Is Too Expensive for LLMs. If you work with large language models (LLMs) like GPT or Claude, you know the two biggest limits: the context window (how much data the model can see) and the cost (how much you pay per token). Every single character you send to the AI counts...
Read on Medium
Building a Self-Alerting Data Quality & Monitoring Pipeline on AWS
Data engineers know the pain: you’ve just finished an elegant ETL job, everything’s humming along, and then bam! Your downstream dashboard shows garbage values. Somewhere, somehow, that CSV from marketing decided to drop a column, or the vendor API sent a null date for every...
Read on Medium
Top Repositories
Essential open-source tools for data and ML engineering
tensorflow/tensorflow
An end-to-end open source machine learning platform for research and production. TensorFlow provides tools and libraries for building and deploying ML-powered applications.
huggingface/transformers
State-of-the-art machine learning for JAX, PyTorch, and TensorFlow. Provides thousands of pretrained models for NLP, vision, and audio tasks.
kubernetes/kubernetes
Production-grade container orchestration system for automating deployment, scaling, and management of containerized applications.
pytorch/pytorch
Tensors and dynamic neural networks in Python with strong GPU acceleration. PyTorch provides a flexible deep learning framework for research and production.
langchain-ai/langchain
Framework for developing applications powered by language models. Simplifies building LLM applications with chains, agents, and retrieval systems.
scikit-learn/scikit-learn
Machine learning library for Python built on NumPy, SciPy, and matplotlib. Provides simple and efficient tools for predictive data analysis.
Courses of the Month
Hand-picked video courses to level up your skills this month

Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training
Comprehensive introduction to Apache Spark covering RDDs, DataFrames, and Spark SQL. Learn how to process large-scale data with hands-on examples.

Apache Airflow Tutorial for Beginners | What is Airflow? | Airflow Tutorial
Learn how to build, schedule, and monitor data pipelines using Apache Airflow. Covers DAGs, operators, and workflow orchestration.

Machine Learning Full Course - Learn Machine Learning 10 Hours | ML Tutorial
Comprehensive machine learning course covering supervised and unsupervised learning, algorithms, and real-world applications.

Neural Networks and Deep Learning - Full Course
Deep dive into neural networks and deep learning fundamentals. Covers backpropagation, CNNs, RNNs, and practical implementations.
Never miss an update
Weekly insights on data engineering, AI, and machine learning — delivered straight to your LinkedIn feed.
Subscribe to Newsletter