Project - Schema-Driven System Modernization

Modernized a chat platform processing 25 million messages daily by migrating legacy systems to Google Cloud and enforcing data consistency with Protocol Buffers.

Client
HarlemNext
Year
Service
Refactoring, Protocol Buffers

Overview

I played a key role in refactoring core applications at HarlemNext, a platform processing 25 million messages per day. The initiative was driven primarily by cost and consistency: maintaining duplicate data across the legacy and cloud systems cost over €100k annually and caused recurring synchronization issues. A deeper technical problem sat within the cloud architecture itself. Many applications relied on largely identical data structures, yet there was no centralized schema, so each service defined its own copy. This lack of standardization led to significant code duplication and increased the risk of bugs.

The Challenge

Working as part of the Cloud Infrastructure team and a dedicated task force, my focus was on modernizing these systems to resolve both the cost and architectural inefficiencies, all while ensuring strict operational continuity for existing services.

  • Refactored Legacy Applications: Successfully migrated and modernized critical logic from monolithic legacy systems to scalable cloud services.
  • Established the Protobuf Ecosystem: Introduced Protocol Buffers as a standard, delivering production-ready Docker images, comprehensive documentation, and reference implementations to accelerate team adoption.
  • Integrated Generated SDKs: Built and deployed robust API clients across the service landscape utilizing the new, auto-generated SDKs.
  • Solved Structural Rigidity: Overcame the difficulty of modifying the legacy codebase by implementing a flexible schema-driven approach, allowing for evolution without massive upfront refactoring.

The Solution: Protocol Buffers

Duplicated code and fragmented data structures were scattered throughout the ecosystem. I introduced Protocol Buffers as the single source of truth for all data models. I extended the toolchain with custom compiler plugins, written in Python, that generate code tailored to our needs, and used Buf's breaking change detection to catch incompatible schema changes automatically.
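To give a feel for the code generation idea, here is a minimal sketch in Python. It is not the plugin used at HarlemNext: a real protoc plugin receives a CodeGeneratorRequest on stdin, whereas this example uses two hypothetical stand-in classes (Field and Message) and a hypothetical ChatMessage schema to show how a dependency-free class could be rendered from a schema description.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-ins for protobuf descriptors (assumption for this sketch;
# a real protoc plugin would read these from a CodeGeneratorRequest).
@dataclass
class Field:
    name: str
    number: int
    py_type: str  # Python type name, already mapped from the proto type


@dataclass
class Message:
    name: str
    fields: List[Field]


def render_class(message: Message) -> str:
    """Render source code for a plain Python class with no runtime dependencies."""
    ordered = sorted(message.fields, key=lambda f: f.number)
    params = "".join(f", {f.name}: {f.py_type}" for f in ordered)
    lines = [
        f"class {message.name}:",
        f"    def __init__(self{params}):",
    ]
    if ordered:
        lines += [f"        self.{f.name} = {f.name}" for f in ordered]
    else:
        lines.append("        pass")
    items = ", ".join(f'"{f.name}": self.{f.name}' for f in ordered)
    lines += [
        "",
        "    def to_dict(self):",
        f"        return {{{items}}}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical schema, used only for illustration.
chat_message = Message(
    "ChatMessage",
    [
        Field("sender_id", 1, "str"),
        Field("body", 2, "str"),
        Field("sent_at", 3, "int"),
    ],
)

source = render_class(chat_message)
print(source)

# Load the generated source to show that it stands on its own.
namespace = {}
exec(source, namespace)
msg = namespace["ChatMessage"]("u-1", "hello", 1700000000)
print(msg.to_dict())
```

Because the generated class imports nothing, it can sit next to existing hand-written models and replace them one at a time, which is the property that makes incremental adoption possible.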

Implementation Details

  • Code Generation: Developed multiple in-house compiler plugins that generate generic, dependency-free code mirroring our existing class structures. This let us adopt Protobuf schemas and deduplicate models immediately, without a full-scale refactor to gRPC.
  • Buf Integration: Used Buf (buf.build) as the Protobuf toolchain, including its linter and breaking change detection.
  • CI/CD: Integrated breaking change detection into the build process to prevent accidental regressions.
  • Docker: Packaged the compiler as a Docker image and owned the project end to end, through to production.
  • Python
  • TypeScript
  • PHP
  • Google Cloud Functions
  • Buf
  • Protocol Buffers
  • Docker
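The breaking change check mentioned under CI/CD can be pictured with a small Python sketch. This is a greatly simplified illustration of the idea, not Buf's actual rule set or implementation: schemas are modeled as plain dictionaries (an assumption made for this example), and only three rules are checked, namely deleted messages, deleted field numbers, and changed field types.

```python
from typing import Dict, List, Tuple

# Simplified schema model (assumption for this sketch):
# message name -> field number -> (field name, field type)
Schema = Dict[str, Dict[int, Tuple[str, str]]]


def find_breaking_changes(old: Schema, new: Schema) -> List[str]:
    """Compare two schema versions and list changes that would break consumers."""
    problems: List[str] = []
    for msg_name, old_fields in old.items():
        if msg_name not in new:
            problems.append(f"message {msg_name} was deleted")
            continue
        new_fields = new[msg_name]
        for number, (old_name, old_type) in sorted(old_fields.items()):
            if number not in new_fields:
                problems.append(
                    f"{msg_name}: field {number} ({old_name}) was deleted"
                )
                continue
            _, new_type = new_fields[number]
            if new_type != old_type:
                problems.append(
                    f"{msg_name}: field {number} changed type "
                    f"from {old_type} to {new_type}"
                )
    return problems


# Hypothetical schema versions, used only for illustration.
old_schema: Schema = {
    "ChatMessage": {1: ("sender_id", "string"), 2: ("body", "string")},
}
new_schema: Schema = {
    "ChatMessage": {
        1: ("sender_id", "int64"),  # type change: breaking
        2: ("body", "string"),
        3: ("sent_at", "int64"),    # new field: allowed
    },
}

problems = find_breaking_changes(old_schema, new_schema)
for problem in problems:
    print(problem)
```

In a build pipeline, a non-empty result would fail the job, so an incompatible change is caught before it reaches consumers. Adding a field with a new number passes the check, which is what allows schemas to evolve safely.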
Messages/Day
25M
Data Schema
Protobuf
Tooling
Buf
Type Safety
100%
