CASE STUDY · DATA PLATFORM · ANALYTICS

Streaming data platform on Google Cloud

We built a streaming data platform on Google Cloud that unifies telemetry, client, and partner events through Dataflow, BigQuery, Looker, and Vertex AI. The result is a single source of truth: the numbers match production, and forecasting runs without rebuilding the pipeline.

Role

End-to-end data platform delivery partner

Scope

Streaming ingestion · Dataflow processing · BigQuery modeling · Composer orchestration · Looker outputs · Vertex AI datasets

Scale

Hours → minutes dashboard latency · Late data + schema drift handled · Operable pipelines · Governance-ready models

Services

Data platform · GCP · Analytics · ML forecasting

Tags

  • ETL
  • Data Migration
  • AI
  • Software Development

The challenge

Multiple services produced events through event streams (e.g. Kafka); partners sent their own payloads; teams relied on slow, brittle reporting that never matched production.

We had to support streaming (stay fresh) and batch (stay correct). Late events, schema drift, and inconsistent contracts were constant. Compliance required separation of concerns and auditable handling of sensitive fields.

The platform had to be the single source of truth that analytics, operations, and ML teams could trust.

What we delivered

  • Streaming ingestion backbone

    Ingestion path from event producers into Cloud Pub/Sub with standardized routing and partitioning.

  • Domain repository integration

    Operational stores where domain truth is preserved and reconciled when durable state is required.

  • Cloud Dataflow processing

    Streaming normalization, deduplication, enrichment, retries, and backfill with predictable operations; a minimal pipeline sketch follows this list.

  • BigQuery analytics layer

    Partitioned datasets with merge-on-key handling for late data, plus governance-friendly conventions.

  • Cloud Composer orchestration

    Batch steps, dependency chains, and rebuild paths so corrections propagate without firefighting.

  • Looker-ready outputs

    Curated datasets and conventions enabling repeatable dashboards and shared definitions.

  • Vertex AI-ready datasets

    Feature tables and training-ready extracts for forecasting without duplicating ingestion or modeling.

  • Operational observability

    Metrics, alerting, and runbooks so failures are visible, diagnosable, and recoverable.
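
To make the Dataflow layer concrete, below is a minimal Apache Beam sketch of the streaming path: read from Pub/Sub, normalize, deduplicate per key, and append to BigQuery. The subscription, table, and field names are assumptions for illustration, and the real pipeline's enrichment, retry, and backfill behaviour is omitted.

```python
# Minimal Apache Beam (Dataflow) sketch: Pub/Sub -> normalize -> dedupe -> BigQuery.
# Subscription, table, and field names are illustrative assumptions.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows


def parse_event(raw: bytes) -> dict:
    """Normalize a raw event into the curated shape (no dead-letter handling here)."""
    event = json.loads(raw.decode("utf-8"))
    return {
        "event_id": event["event_id"],
        "event_type": event["event_type"],
        "event_date": event["occurred_at"][:10],          # partitioning column
        "payload": json.dumps(event.get("payload", {})),  # serialized payload
    }


options = PipelineOptions(streaming=True)  # run on Dataflow with --runner=DataflowRunner

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/events-sub")  # assumed name
        | "Normalize" >> beam.Map(parse_event)
        | "KeyByEventId" >> beam.WithKeys(lambda e: e["event_id"])
        | "Window" >> beam.WindowInto(FixedWindows(60))
        | "DedupePerKey" >> beam.combiners.Latest.PerKey()  # one record per event_id per window
        | "DropKey" >> beam.Values()
        | "WriteBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:analytics.events",                   # assumed curated table
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER)
    )
```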

How we built it

We treated freshness and correctness as first-class requirements. We considered keeping event streams on Kafka only and querying from there; instead, we introduced a Pub/Sub bridge and BigQuery so all consumers share one model and one set of SLAs.
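
As a rough illustration of that bridge, the sketch below forwards Kafka records into a Pub/Sub topic unchanged and leaves validation to Dataflow downstream. The broker address, topic names, and project id are assumptions, not the deployed configuration.

```python
# Hypothetical Kafka -> Pub/Sub bridge sketch; broker, topics, and project are assumptions.
from confluent_kafka import Consumer
from google.cloud import pubsub_v1

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",   # assumed broker address
    "group.id": "pubsub-bridge",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["producer-events"])   # assumed Kafka topic

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "ingest-events")  # assumed Pub/Sub topic

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        # Forward the raw payload unchanged; Dataflow owns validation and normalization.
        publisher.publish(
            topic_path,
            msg.value(),
            kafka_topic=msg.topic(),
            kafka_key=(msg.key() or b"").decode("utf-8", "ignore"),
        )
finally:
    consumer.close()
```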

We evaluated batch-only ETL; we chose streaming plus batch so operational dashboards stay near real-time while historical backfills remain correct.

Producers publish through event streams. In GCP, data is routed through Pub/Sub and anchored into domain repositories or operational stores where reconciliation and durable state are required. Dataflow validates and normalizes events and handles late-arriving data. Curated datasets land in BigQuery (partitioned by event date). Composer orchestrates rebuilds and dependencies. Looker and Vertex AI consume stable models and ML-ready datasets.
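
A minimal sketch of what such a curated, event-date-partitioned BigQuery table can look like, using the Python client; the project, dataset, and column names are assumptions for illustration.

```python
# Hypothetical curated table, partitioned by event date; all names are assumptions.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # assumed project

table = bigquery.Table(
    "my-project.analytics.events",              # assumed curated table
    schema=[
        bigquery.SchemaField("event_id", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("event_type", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("event_date", "DATE", mode="REQUIRED"),
        bigquery.SchemaField("payload", "STRING"),  # serialized event payload
    ],
)
# Partition by event date so rebuilds and late-data merges touch only the affected days.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_date",
)
client.create_table(table, exists_ok=True)
```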

Fresh and correct — not one or the other.

Key decisions

Streaming is a product, not a feature

Contracts, retries, and observability require the same discipline as application services.

Warehouse contracts beat ad-hoc tables

Standardized models and naming prevent every consumer from reinventing definitions.

Orchestration as the control plane

Composer owns correctness, rebuild paths, and dependency clarity.
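
A hedged sketch of what a Composer-managed rebuild chain can look like as an Airflow DAG: the curated-model merge runs first, and downstream marts refresh only once it succeeds. The DAG id, schedule, and stored procedures are illustrative assumptions, not the production setup.

```python
# Hypothetical Composer (Airflow) DAG: rebuild the curated model, then refresh marts.
# DAG id, schedule, and the stored procedures it calls are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="rebuild_curated_events",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Step 1: merge the landing zone for the run date into the curated model.
    merge_curated = BigQueryInsertJobOperator(
        task_id="merge_curated",
        configuration={"query": {
            "query": "CALL analytics.merge_events('{{ ds }}')",   # assumed procedure
            "useLegacySql": False,
        }},
    )
    # Step 2: refresh downstream marts only after the curated model is consistent.
    refresh_marts = BigQueryInsertJobOperator(
        task_id="refresh_marts",
        configuration={"query": {
            "query": "CALL analytics.refresh_marts('{{ ds }}')",  # assumed procedure
            "useLegacySql": False,
        }},
    )
    merge_curated >> refresh_marts
```

Making the dependency explicit is what lets a correction to one day's data propagate forward without manual intervention.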

Design for late data

Out-of-order events and backfills are normal; we engineered for them instead of patching later.
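
One common way to absorb late and out-of-order data is a merge-on-key upsert from a landing dataset into the curated table, limited to recent partitions. The sketch below assumes the table layout from the earlier sketch, a landing dataset, and a seven-day reprocessing window; all three are illustrative, not the production values.

```python
# Hypothetical merge-on-key upsert for late-arriving events; names and window are assumptions.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # assumed project

MERGE_LATE_EVENTS = """
MERGE `my-project.analytics.events` AS curated
USING `my-project.landing.events` AS landing
ON curated.event_id = landing.event_id
   -- limit the scan to recently touched partitions (assumed 7-day window)
   AND curated.event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
WHEN MATCHED THEN
  UPDATE SET event_type = landing.event_type,
             event_date = landing.event_date,
             payload    = landing.payload
WHEN NOT MATCHED THEN
  INSERT (event_id, event_type, event_date, payload)
  VALUES (landing.event_id, landing.event_type, landing.event_date, landing.payload)
"""

client.query(MERGE_LATE_EVENTS).result()  # blocks until the merge job completes
```

The window is the trade-off: wider absorbs more late data per run, narrower keeps partition scans cheap.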

Outcomes

Slow reporting was replaced with a platform that stays fresh, correct, and operable. Dashboard latency went from hours to minutes, forecasting runs reused the same governed models, and observability reduced firefighting.

  • Hours → minutes dashboard latency
  • Streaming + batch processing
  • One source of truth
  • ML-ready datasets

What we took away

Streaming is a product

Without contracts, retries, and observability, streaming becomes chaos. Build it like a production service.

Warehouse models are contracts

Stable BigQuery definitions reduce debates and accelerate dashboards and ML downstream.

Late data is normal

Engineer for out-of-order events and backfills because real systems never behave perfectly.

Orchestration enables correctness

Composer organizes rebuilds and dependencies so corrections propagate without firefighting.

What's next

We support the next phase: tightening event contracts, expanding domain models, automating data quality remediation, and scaling forecasting — while keeping governance and operational clarity as producers and consumers grow.

Bring us the hard part

A first version you need shipped, a second phase you've outgrown, or a decision your team can't agree on: write us a paragraph, and within a day we'll tell you whether it's a shape we'd take on.