Project - Schema-Driven System Modernization

Modernized a chat platform processing 25 million messages daily by migrating legacy systems to Google Cloud and enforcing data consistency with Protocol Buffers.

Client: HarlemNext
Year: 2022
Service: Refactoring, Protocol Buffers

Overview

I played a key role in refactoring core applications at HarlemNext, a platform processing 25 million messages per day. The initiative was primarily driven by the need to eliminate the substantial financial overhead (over €100k annually) and synchronization issues caused by maintaining duplicate data across Legacy and Cloud systems. However, a deeper technical challenge existed within the Cloud architecture itself: many applications relied on largely identical data structures without a centralized schema. This lack of standardization led to significant code duplication and increased the risk of bugs.

The Challenge

Working as part of the Cloud Infrastructure team and a dedicated task force, I focused on modernizing these systems to resolve both the cost and the architectural inefficiencies, while ensuring strict operational continuity for existing services.

Refactored Legacy Applications: Successfully migrated and modernized critical logic from monolithic legacy systems to scalable cloud services.
Established the Protobuf Ecosystem: Introduced Protocol Buffers as a standard, delivering production-ready Docker images, comprehensive documentation, and reference implementations to accelerate team adoption.
Integrated Generated SDKs: Built and deployed API clients across the service landscape using the new auto-generated SDKs.
Solved Structural Rigidity: Overcame the difficulty of modifying the legacy codebase by implementing a flexible schema-driven approach, allowing for evolution without massive upfront refactoring.

The Solution: Protocol Buffers

Code duplication and fragmented data structures were scattered throughout the ecosystem. I engineered a solution that uses Protocol Buffers as the single source of truth for all data models. I extended this architecture with custom Python compiler plugins that generate code tailored to our specific needs, and used Buf's breaking change detection to block schema regressions automatically.
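To illustrate what such a plugin can produce, here is a minimal, stdlib-only sketch. A real protoc plugin is an executable named `protoc-gen-<name>` that reads a serialized `CodeGeneratorRequest` from stdin and writes a `CodeGeneratorResponse` to stdout (the Python classes live in `google.protobuf.compiler.plugin_pb2`). To stay self-contained, the sketch skips that I/O and instead takes a simplified, hypothetical message description (`MessageSpec`, `FieldSpec`), then renders a plain dataclass with no Protobuf runtime dependency. The `ChatMessage` schema is invented for illustration and is not the platform's actual model.

```python
from dataclasses import dataclass, field
from typing import List

# Proto3 scalar type -> Python annotation (only scalars are handled in this sketch).
SCALAR_TYPES = {
    "string": "str",
    "int32": "int",
    "int64": "int",
    "bool": "bool",
    "double": "float",
}

# Proto3 default value for each Python annotation, as source text.
DEFAULTS = {"str": '""', "int": "0", "bool": "False", "float": "0.0"}


@dataclass
class FieldSpec:
    """Hypothetical stand-in for a FieldDescriptorProto."""
    name: str
    proto_type: str
    number: int
    repeated: bool = False


@dataclass
class MessageSpec:
    """Hypothetical stand-in for a DescriptorProto."""
    name: str
    fields: List[FieldSpec] = field(default_factory=list)


def render_message(message: MessageSpec) -> str:
    """Return source for a dependency-free dataclass mirroring one message."""
    lines = [
        "from dataclasses import dataclass, field",
        "from typing import List",
        "",
        "",
        "@dataclass",
        f"class {message.name}:",
    ]
    if not message.fields:
        lines.append("    pass")
    for spec in message.fields:
        py_type = SCALAR_TYPES[spec.proto_type]  # raises KeyError for unsupported types
        if spec.repeated:
            lines.append(
                f"    {spec.name}: List[{py_type}] = field(default_factory=list)"
                f"  # field {spec.number}"
            )
        else:
            lines.append(
                f"    {spec.name}: {py_type} = {DEFAULTS[py_type]}  # field {spec.number}"
            )
    return "\n".join(lines) + "\n"


# Usage with an invented schema:
chat_message = MessageSpec(
    name="ChatMessage",
    fields=[
        FieldSpec("message_id", "string", 1),
        FieldSpec("sender_id", "int64", 2),
        FieldSpec("body", "string", 3),
        FieldSpec("attachments", "string", 4, repeated=True),
    ],
)
print(render_message(chat_message))
```

Emitting plain dataclasses rather than protobuf-generated classes is one way to keep existing call sites unchanged while the schema becomes the single source of truth. Message-typed fields, enums and `oneof`s are deliberately left out of this sketch.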

Implementation Details

Code Generation: Developed multiple in-house Protobuf compiler plugins that generated generic, dependency-free code mirroring our existing class structures. This let us adopt Protobuf schemas and deduplicate models immediately, without a full-scale migration to gRPC.
Buf Integration: Used Buf (buf.build) to manage the schemas, including its linter and breaking change detection.
CI/CD: Integrated breaking change detection into the build process to prevent accidental regressions.
Docker: Packaged the compiler toolchain as a Docker image and owned the project end to end, from initial setup through to production.
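The core idea behind the breaking change gate can be shown with a deliberately simplified check. In the real pipeline this is Buf's job (for example `buf breaking --against '.git#branch=main'`, assuming `main` is the protected branch). The sketch below instead reduces each schema to a hypothetical `{message: {field_number: (name, type)}}` map, flags deleted messages or fields, type changes and renames, and returns a non-zero exit code that a CI job could use to fail the build. Buf's actual rule set is far broader; the schemas are invented for illustration.

```python
import sys
from typing import Dict, List, Tuple

# Hypothetical simplified schema: message name -> {field number: (field name, type)}.
Schema = Dict[str, Dict[int, Tuple[str, str]]]


def find_breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List changes in `new` that could break consumers built against `old`."""
    problems: List[str] = []
    for message, old_fields in old.items():
        new_fields = new.get(message)
        if new_fields is None:
            problems.append(f"{message}: message deleted")
            continue
        for number, (name, type_) in old_fields.items():
            if number not in new_fields:
                problems.append(f"{message}.{name} (#{number}): field deleted")
                continue
            new_name, new_type = new_fields[number]
            if new_type != type_:
                # Reusing a field number with another type can break decoding and generated code.
                problems.append(
                    f"{message}.{name} (#{number}): type changed {type_} -> {new_type}"
                )
            if new_name != name:
                # Wire-compatible, but breaks generated code and JSON field names.
                problems.append(f"{message}.{name} (#{number}): renamed to {new_name}")
    # New messages and new field numbers are additive and therefore allowed.
    return problems


def ci_exit_code(old: Schema, new: Schema) -> int:
    """Print every problem to stderr; return 1 (fail the build) if any exist."""
    problems = find_breaking_changes(old, new)
    for problem in problems:
        print(f"BREAKING: {problem}", file=sys.stderr)
    return 1 if problems else 0


# Usage with invented schemas from the main branch and a feature branch:
main_branch: Schema = {
    "ChatMessage": {1: ("message_id", "string"), 2: ("sender_id", "int64"), 3: ("body", "string")}
}
feature_branch: Schema = {
    "ChatMessage": {1: ("message_id", "string"), 2: ("sender_id", "string")}
}
print(ci_exit_code(main_branch, feature_branch))  # prints 1 (two problems reported on stderr)
```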

Python
TypeScript
PHP
Google Cloud Functions
Buf
Protocol Buffers
Docker

Messages/Day: 25M
Data Schema: Protobuf
Tooling: Buf
Type Safety: 100%

