Project - Resume Processing Pipeline

Building a Google Cloud Dataflow pipeline that scales horizontally to process high volumes of resumes, from streaming ingestion through PubSub, to batched CSV output, to graph import into Neo4j.

Client: Curriculo
Year
Service: Data Engineering, Cloud Infrastructure

Overview

Curriculo's direct partners and prospects are among Brazil's largest companies, so the infrastructure needs to handle high volumes from the start. I built a streaming pipeline on Google Cloud Dataflow that decouples ingestion from processing and scales horizontally, so resume volume can grow without architectural changes.

Key Contributions

Streaming Ingestion

Built a Go-based Dataflow consumer that subscribes to a PubSub topic and streams incoming resume events into the pipeline. The consumer writes incoming resumes to CSV files bucketed by time window, producing batches that can be processed independently. Cloud Scheduler jobs trigger batch execution on a defined cadence, keeping throughput predictable and the pipeline resilient to spikes.
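The time-window bucketing described above can be illustrated with a minimal sketch. It is written in Python for readability, although the actual consumer is Go, and it is not the production code. The five-minute window, the file naming scheme, and the column names are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 300  # assumed window size; the real cadence is not stated


def window_start(ts: datetime, size: int = WINDOW_SECONDS) -> datetime:
    """Floor a timestamp to the start of its fixed-size window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % size, tz=timezone.utc)


def bucket_name(ts: datetime) -> str:
    """Hypothetical CSV file name for the window that contains ts."""
    return window_start(ts).strftime("resumes-%Y%m%dT%H%M%SZ.csv")


def write_batches(events):
    """Group (timestamp, resume_id, payload) events into one CSV per window.

    Returns a dict of file name -> CSV text. A real consumer would write
    each file to object storage instead of keeping it in memory.
    """
    buffers = defaultdict(list)
    for ts, resume_id, payload in events:
        buffers[bucket_name(ts)].append((ts.isoformat(), resume_id, payload))

    files = {}
    for name, rows in buffers.items():
        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerow(["received_at", "resume_id", "payload"])
        writer.writerows(rows)
        files[name] = out.getvalue()
    return files
```

Because every event maps deterministically to exactly one window file, each file forms a self-contained batch that a scheduled job can pick up without coordinating with the others.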

Scaling is a configuration concern rather than a code change: Dataflow handles the parallelism, so capacity grows by adding workers rather than by rewriting the pipeline.

Custom Neo4j Importer

The classified results need to be loaded into Neo4j for graph-based matching. Google provides a built-in Dataflow template for writing to Neo4j, but it requires Neo4j Enterprise Edition. Since Curriculo runs on Community Edition, I built a custom Python importer that reads the same JSON template configuration the official Neo4j template uses — node mappings, relationship definitions, property assignments — and interprets them to load data into Neo4j directly.

This gives us two things: a working pipeline on Community Edition today, and a clean upgrade path. If Curriculo moves to Neo4j Enterprise, the existing mapping configuration can be used with the official Dataflow template with minimal changes.
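The idea of interpreting a mapping configuration can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual importer: the config shape below is a simplified, hypothetical stand-in for the official template's JSON schema, and the labels, keys, and function names are invented for the example. The sketch only builds Cypher text and does not connect to a database.

```python
import json

# Hypothetical, simplified mapping config. The official template's schema
# is richer; this shape only illustrates the interpretation step.
CONFIG = json.loads("""
{
  "nodes": [
    {"label": "Candidate", "key": "candidate_id", "properties": ["name"]},
    {"label": "Skill", "key": "skill", "properties": []}
  ],
  "relationships": [
    {"type": "HAS_SKILL",
     "source": {"label": "Candidate", "key": "candidate_id"},
     "target": {"label": "Skill", "key": "skill"}}
  ]
}
""")


def node_query(node: dict) -> str:
    """Build a batched MERGE statement for one node mapping."""
    key = node["key"]
    query = f"UNWIND $rows AS row MERGE (n:{node['label']} {{{key}: row.{key}}})"
    if node["properties"]:
        assignments = ", ".join(f"n.{p} = row.{p}" for p in node["properties"])
        query += f" SET {assignments}"
    return query


def relationship_query(rel: dict) -> str:
    """Build a batched MERGE statement linking two already-loaded nodes."""
    src, dst = rel["source"], rel["target"]
    return (
        "UNWIND $rows AS row "
        f"MATCH (a:{src['label']} {{{src['key']}: row.{src['key']}}}) "
        f"MATCH (b:{dst['label']} {{{dst['key']}: row.{dst['key']}}}) "
        f"MERGE (a)-[:{rel['type']}]->(b)"
    )


def build_queries(config: dict) -> list[str]:
    """Emit node queries first so relationship MATCH clauses find both ends."""
    queries = [node_query(n) for n in config["nodes"]]
    queries += [relationship_query(r) for r in config["relationships"]]
    return queries
```

With the official `neo4j` Python driver, each generated statement could then be executed per batch with something like `session.run(query, rows=batch)`. Labels and relationship types are interpolated into the query text in this sketch, so the mapping file has to come from a trusted source.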

  • Google Cloud Dataflow
  • Go
  • Python
  • PubSub
  • Neo4j
  • Cloud Scheduler
Resume Throughput: Unlimited
Stream Consumer: Go
Graph Import: Neo4j
Pipeline: Dataflow
