Streaming is a product, not a feature
Contracts, retries, and observability require the same discipline as application services.
CASE STUDY · DATA PLATFORM · ANALYTICS
We built a streaming data platform on Google Cloud that unifies telemetry, client, and partner events through Dataflow, BigQuery, Looker, and Vertex AI. The result: a single source of truth so the numbers match production and forecasting runs without rebuilding the pipeline.

Multiple services produced events through event streams (e.g., Kafka), partners sent their own payloads, and teams relied on slow, brittle reporting that never matched production.
We had to support streaming (stay fresh) and batch (stay correct). Late events, schema drift, and inconsistent contracts were constant. Compliance required separation of concerns and auditable handling of sensitive fields.
The platform had to be the single source of truth that analytics, operations, and ML teams could trust.
Ingestion path from event producers into Cloud Pub/Sub with standardized routing and partitioning (a publishing sketch follows this list).
Operational stores that preserve and reconcile domain truth where durable state is required.
Streaming normalization, deduplication, enrichment, retries, and backfill with predictable operations (see the pipeline sketch below).
Partitioned datasets with merge-on-key handling for late data, plus governance-friendly conventions (see the merge sketch below).
Batch steps, dependency chains, and rebuild paths so corrections propagate without firefighting (see the DAG sketch below).
Curated datasets and conventions enabling repeatable dashboards and shared definitions.
Feature tables and training-ready extracts for forecasting without duplicating ingestion or modeling.
Metrics, alerting, and runbooks so failures are visible, diagnosable, and recoverable (see the freshness check below).
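To make the Pub/Sub ingestion path concrete, here is a minimal publishing sketch: the producer wraps each payload in a standard envelope (id, type, timestamp) and publishes with routing attributes and an ordering key. The project, topic, source, and field names are illustrative assumptions, not the platform's actual contract.

```python
import json
import uuid
from datetime import datetime, timezone

from google.cloud import pubsub_v1

# Ordering must be enabled on the client before ordering_key is accepted.
publisher = pubsub_v1.PublisherClient(
    publisher_options=pubsub_v1.types.PublisherOptions(enable_message_ordering=True)
)
# Hypothetical project and topic names, for illustration only.
topic_path = publisher.topic_path("analytics-platform", "telemetry-events")

def publish_event(entity_id: str, event_type: str, payload: dict) -> None:
    """Publish one event with a stable envelope: id, type, timestamp, payload."""
    envelope = {
        "event_id": str(uuid.uuid4()),          # idempotency key for downstream dedup
        "event_type": event_type,
        "event_ts": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    future = publisher.publish(
        topic_path,
        data=json.dumps(envelope).encode("utf-8"),
        # Attributes drive routing and filtering without parsing the body.
        event_type=event_type,
        source="checkout-service",              # hypothetical producer name
        # Ordering key keeps events for one entity in publish order.
        ordering_key=entity_id,
    )
    future.result()  # surface publish errors instead of dropping them silently

publish_event("customer-42", "order_created", {"order_id": "A-1001", "total": 59.90})
```

Keying the ordering on the entity id preserves per-entity order without serializing the whole topic.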
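The Dataflow stage can be pictured as an Apache Beam streaming pipeline, sketched below under assumed subscription and table names: it normalizes the envelope, windows the stream, keeps one record per event_id, and writes to BigQuery. Allowed-lateness and trigger tuning, which the real pipeline would need, are omitted for brevity.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

def normalize(raw: bytes) -> dict:
    """Decode the Pub/Sub payload and normalize names/types into the curated schema."""
    event = json.loads(raw.decode("utf-8"))
    return {
        "event_id": event["event_id"],
        "event_type": event["event_type"].strip().lower(),
        "event_ts": event["event_ts"],                     # ISO-8601 string -> TIMESTAMP
        "payload": json.dumps(event.get("payload", {})),   # kept as a JSON string here
    }

options = PipelineOptions(streaming=True)  # Dataflow runner/project/region flags omitted

with beam.Pipeline(options=options) as p:
    (
        p
        # Hypothetical subscription and table names, for illustration only.
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/analytics-platform/subscriptions/telemetry-events-df")
        | "Normalize" >> beam.Map(normalize)
        # A window bounds how far back duplicates are tracked; lateness handling
        # and triggers would be tuned in the real pipeline.
        | "Window" >> beam.WindowInto(FixedWindows(60))
        | "KeyByEventId" >> beam.Map(lambda e: (e["event_id"], e))
        | "OnePerEventId" >> beam.combiners.Latest.PerKey()
        | "DropKey" >> beam.Values()
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "analytics-platform:curated.events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER)
    )
```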
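Merge-on-key handling for late data follows roughly this pattern: late or corrected rows land in a staging table and are merged into the partitioned curated table on the event key and partition date, so reruns stay idempotent. Dataset, table, and column names here are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client(project="analytics-platform")  # hypothetical project

# Merge late or corrected rows from staging into the partitioned curated table.
merge_sql = """
MERGE `analytics-platform.curated.events` AS target
USING `analytics-platform.staging.events_late` AS source
ON  target.event_id = source.event_id
AND target.event_date = source.event_date   -- match on the partition column
WHEN MATCHED THEN
  UPDATE SET
    event_type = source.event_type,
    event_ts   = source.event_ts,
    payload    = source.payload
WHEN NOT MATCHED THEN
  INSERT (event_id, event_date, event_ts, event_type, payload)
  VALUES (source.event_id, source.event_date, source.event_ts, source.event_type, source.payload)
"""

job = client.query(merge_sql)
job.result()  # block until the merge finishes so orchestration can depend on it
print(f"Merge affected {job.num_dml_affected_rows} rows")
```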
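A Composer rebuild path can be sketched as an Airflow DAG: reconcile the curated layer first, then rebuild the marts that depend on it, so an upstream correction propagates along a declared dependency chain. The DAG id, tables, and queries are illustrative assumptions.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# Illustrative rebuild DAG: curated layer first, dependent mart second.
with DAG(
    dag_id="rebuild_curated_events",                      # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    reconcile_curated = BigQueryInsertJobOperator(
        task_id="reconcile_curated_events",
        configuration={"query": {
            "query": """
                MERGE `analytics-platform.curated.events` t
                USING `analytics-platform.staging.events_late` s
                ON t.event_id = s.event_id AND t.event_date = s.event_date
                WHEN NOT MATCHED THEN INSERT ROW
            """,
            "useLegacySql": False,
        }},
    )

    rebuild_daily_mart = BigQueryInsertJobOperator(
        task_id="rebuild_daily_mart",
        configuration={"query": {
            "query": """
                CREATE OR REPLACE TABLE `analytics-platform.marts.daily_events` AS
                SELECT event_date, event_type, COUNT(*) AS events
                FROM `analytics-platform.curated.events`
                GROUP BY event_date, event_type
            """,
            "useLegacySql": False,
        }},
    )

    # Dependency chain: the mart only rebuilds after the curated layer is consistent.
    reconcile_curated >> rebuild_daily_mart
```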
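On the observability side, a freshness check is the smallest useful building block: fail loudly when the curated table stops advancing past an agreed SLO. The SLO value, project, and table names below are assumptions; real alert routing would go through Cloud Monitoring and the on-call runbooks.

```python
from datetime import datetime, timezone, timedelta

from google.cloud import bigquery

FRESHNESS_SLO = timedelta(minutes=15)  # illustrative SLO, not the real one

def check_freshness() -> None:
    """Alertable freshness check: raise if the curated table falls behind the SLO."""
    client = bigquery.Client(project="analytics-platform")  # hypothetical project
    row = next(iter(client.query(
        "SELECT MAX(event_ts) AS latest FROM `analytics-platform.curated.events`"
    ).result()))
    lag = datetime.now(timezone.utc) - row.latest
    if lag > FRESHNESS_SLO:
        # Raising makes the failure visible to whatever scheduler runs the check;
        # paging would be wired through Cloud Monitoring in the real platform.
        raise RuntimeError(f"curated.events is {lag} behind, SLO is {FRESHNESS_SLO}")
    print(f"curated.events lag: {lag}")

check_freshness()
```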
We treated freshness and correctness as first-class requirements. We considered keeping event streams on Kafka only and querying from there; instead, we introduced a Pub/Sub bridge and BigQuery so all consumers share one model and one set of SLAs.
We evaluated batch-only ETL but chose streaming plus batch so operational dashboards stay near real time while historical backfills remain correct.
Producers publish through event streams. In GCP, data is routed through Pub/Sub and anchored into domain repositories or operational stores where reconciliation and durable state are required. Dataflow validates and normalizes events and handles late-arriving data. Curated datasets land in BigQuery (partitioned by event date). Composer orchestrates rebuilds and dependencies. Looker and Vertex AI consume stable models and ML-ready datasets.
Fresh and correct — not one or the other.
Contracts, retries, and observability require the same discipline as application services.
Standardized models and naming prevent every consumer from reinventing definitions.
Composer owns correctness, rebuild paths, and dependency clarity.
Out-of-order events and backfills are normal; we engineered for them instead of patching later.
Slow reporting was replaced with a platform that stays fresh, correct, and operable. Dashboard latency went from hours to minutes, forecasting runs reused the same governed models, and observability reduced firefighting.
Without contracts, retries, and observability, streaming becomes chaos. Build it like a production service.
Stable BigQuery definitions reduce debates and accelerate dashboards and ML downstream.
Engineer for out-of-order events and backfills because real systems never behave perfectly.
Composer organizes rebuilds and dependencies so corrections propagate without firefighting.
We support the next phase: tightening event contracts, expanding domain models, automating data quality remediation, and scaling forecasting — while keeping governance and operational clarity as producers and consumers grow.