
Building Data Pipelines Using Apache Beam


SKU:9789349887879

Regular price $44.95 USD
Taxes included. Shipping calculated at checkout.

ISBN: 9789349887879
eISBN: 9789349887527
Rights: Worldwide
Author Name: Nuzhi Meyen
Publishing Date: 08-Apr-2026
Dimensions: 7.5 × 9.25 inches
Binding: Paperback
Page Count: 349

Download code from GitHub


Description

Build Data Pipelines that Survive Scale, Failure, and Change

Key Features
● Get a free one-month digital subscription to www.avaskillshelf.com
● Design unified batch and streaming pipelines using Apache Beam’s single programming model
● Build portable pipelines that run seamlessly across Dataflow, Flink, and Spark
● Achieve production readiness with proven strategies for scaling, tuning, monitoring, and reliability

Book Description
Building Data Pipelines Using Apache Beam is a practical, production-focused guide to Beam’s unified programming model, which lets you write processing logic once and run it on multiple runners without rewriting core code.

The book begins with the fundamentals of distributed data processing and Beam’s core abstractions: PCollections, transforms, and pipeline design. You will then progress to stateful and stateless processing, event-time semantics, windows, triggers, watermarks, state, and timers, building the mental models required to reason about correctness at scale. From there, the book moves into advanced transformations, coders, and optimization techniques to help you improve performance, control costs, and ensure reliability.

In the later chapters, you will learn how to deploy pipelines on runners such as Dataflow, Flink, and Spark, monitor and debug production workloads, and apply best practices drawn from real-world case studies. By the end of the book, you will be able to design, deploy, and operate robust, portable, production-grade data pipelines with confidence.

What you will learn
● Design scalable batch and streaming pipelines with Apache Beam
● Implement event-time processing using windows, triggers, watermarks, state, and timers
● Build portable pipelines that execute consistently across multiple runners
● Apply advanced transformations and coders for efficient data processing
● Optimize pipelines for performance, latency, fault tolerance, and cost efficiency
● Deploy, monitor, debug, and operate production-grade data pipelines

Who Is This Book For?
This book is for Data Engineers, Senior Data Engineers, Analytics Engineers, Data Architects, and Platform Engineers who design, build, or operate batch and streaming data systems. Readers should be comfortable with Python or Java, SQL, and basic distributed systems concepts such as parallelism, fault tolerance, and event-time processing, and should have some familiarity with cloud-based data platforms.

Table of Contents

1. Introduction to Apache Beam and Data Processing
2. Stateful and Stateless Processing with Apache Beam
3. Handling Event Time, Windows, and Triggers
4. Building Pipelines with Apache Beam
5. Transformations and Coders in Apache Beam
6. Advanced Pipeline Optimization Techniques
7. Deploying Apache Beam Pipelines on Different Runners
8. Monitoring, Debugging, and Tuning Apache Beam Pipelines
9. Case Studies: Apache Beam in the Real World
Index

About the Author & Technical Reviewer

About the Author
Nuzhi Meyen is a fintech entrepreneur, data scientist, and AI practitioner, and the Co-Founder and CEO of Helios P2P. He builds production-grade AI, analytics, and blockchain systems for lending and credit risk. With advanced degrees and strong community contributions, he bridges theory and practice to deliver scalable, real-world financial technology solutions.

About the Technical Reviewer
Akshay Krishna operates at the intersection of data-driven decision systems, AI-enabled capabilities, and business strategy, focusing on translating complex operational problems into measurable commercial outcomes. His work centers on building analytical, predictive, and optimization frameworks that directly influence conversion, customer experience, and long-term value creation in digital platforms.

With deep expertise in supply chain strategy and optimization, Akshay has led initiatives across fulfillment optimization, transportation planning, and service-level design, applying machine learning, simulation, and experimentation to balance speed, cost, and reliability. In recent roles, he has been closely involved in designing and operationalizing Estimated Delivery Date (EDD) prediction systems, embedding reliability, delay risk, and customer trust directly into product and fulfillment decision flows.

Across analytics, applied AI, and product strategy roles, Akshay consistently focuses on ensuring that advanced models and AI systems are economically grounded—so that recommendations, promises, and operational actions are aligned with real business outcomes rather than siloed metrics.