Real-Time Analytics with Apache Spark

SKU:9788169646109

$44.95 USD

ISBN: 9788169646109
eISBN: 9788169646116
Rights: Worldwide
Author Name: Subhadip Chanda, Harsha Pasala
Publishing Date: 15-June-2026
Dimensions: 7.5 × 9.25 inches
Binding: Paperback
Page Count: 365

Description

Turn Data in Motion into Decisions in Real Time.

Key Features
● Get a free one-month digital subscription to www.avaskillshelf.com.
● Master Spark Structured Streaming from windowed aggregations and stateful processing to sub-second latency.
● Build production ingestion pipelines using Kafka, Kinesis, Event Hubs, and Auto Loader at scale.
● Deploy, monitor, and integrate ML inference into streaming workflows using CI/CD and Declarative Automation Bundles.

Book Description
The Next Generation of Data Platforms Will Be Real-Time, Intelligent, and Always On

Real-Time Analytics with Apache Spark is a comprehensive guide to building production-grade streaming systems using Apache Spark Structured Streaming on the Databricks platform, from first principles to enterprise-scale deployment.

You begin with Spark fundamentals and streaming concepts, then advance through windowed aggregations, stateful processing with transformWithState, stream-stream joins, and the new Real-Time Mode for sub-second latency. Every chapter combines clear explanations with production-ready code, preparing you to handle real-world challenges including late data, state management, and performance tuning across Kafka, Kinesis, Event Hubs, and Auto Loader.

The final section teaches you to think like a production engineer by packaging pipelines with Declarative Automation Bundles, automating deployments with CI/CD, integrating ML inference into streaming workflows, and building monitoring dashboards with custom alerts. By the end of the book, you will have a proven blueprint for delivering scalable, fault-tolerant streaming solutions on Apache Spark and Databricks.

What you will learn
● Build fault-tolerant streaming pipelines with exactly-once guarantees on Apache Spark.
● Apply windowed aggregations, watermarks, and stateful processing for real-time data workflows.
● Ingest streaming data from Kafka, Kinesis, Event Hubs, and Auto Loader at scale.
● Deploy streaming pipelines using Declarative Automation Bundles and CI/CD on Databricks.
● Integrate real-time ML inference into production streaming data workflows with confidence.
● Monitor, debug, and tune streaming jobs for production performance and operational reliability.

Table of Contents
1. Real-Time Analytics Landscape and Use Cases
2. Apache Spark Fundamentals (with a Streaming Mindset)
3. Structured Streaming
4. Deep Dive into Sources and Sinks
5. Windowed and Stateful Operations
6. Writing Streaming Queries with Spark SQL
7. Low-Latency Streaming with Spark Real-Time Mode
8. Machine Learning for Streaming Applications
9. Monitoring, Debugging, and Performance Tuning
10. Packaging, Orchestration, and CI/CD Using Declarative Automation Bundles
11. End-to-End Real-Time Analytics Project
Index

About the Authors

Subhadip Chanda and Harsha Pasala are experts in real-time data engineering, specializing in scalable Spark and Databricks streaming architectures. Combining deep production experience with practical design insight, they guide readers beyond prototypes to build resilient, low-latency, and future-ready analytics pipelines that operate reliably at enterprise scale.

Subhadip Chanda is a Solutions Architect at Databricks, based in Canada. Over the past several years, he has worked across data platform engineering, real-time analytics, and data governance, helping organizations design and operationalize systems built on Spark, Delta Lake, and Unity Catalog. Before Databricks, Subhadip spent time in solution architecture roles that gave him a practitioner's perspective on what works in production, and what only works in slide decks.

The book started with a gap Subhadip kept running into: despite Apache Spark being the dominant engine for large-scale data processing, practitioners still lacked a comprehensive, end-to-end guide to building production-grade streaming systems. Each chapter reflects problems he encountered, debugged, and solved in real environments. Subhadip wrote this book so that the next engineer would not have to piece the answers together from scattered documentation and Stack Overflow threads.

Harsha Pasala is a Specialist Solutions Architect at Databricks with over a decade of experience in data. He works with some of Canada’s largest organizations, including major banks, healthcare providers, and national railways, to solve the high-stakes challenges of moving data at scale.

His work focuses on the practical side of data engineering: fixing underperforming streaming pipelines, optimizing data layouts to reduce cloud costs, and ensuring that low-latency requirements hold up under pressure. This book reflects the years Harsha has spent in design reviews and technical deep dives alongside his colleagues at Databricks. It is designed as a pragmatic guide for engineers who need Spark Structured Streaming to work reliably in the real world.