Google Cloud Pub/Sub, the jewel of streaming data
Use Cases and Deployment Scope
We used Google Cloud Pub/Sub to solve ETL/streaming and real-time processing problems for high volumes of data. We used it to fill data lakes, to process and store data in warehouses or data marts, and to process events serialized as either JSON or protobuf.
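Since Pub/Sub payloads are opaque byte strings, publishing a JSON event boils down to serializing it to bytes and handing it to the client. A minimal sketch, assuming the official `google-cloud-pubsub` Python client (the `encode_event`/`publish_event` helper names are our own):

```python
import json


def encode_event(event: dict) -> bytes:
    """Serialize an event to compact UTF-8 JSON bytes, the form expected
    by Pub/Sub's data field (payloads are opaque byte strings)."""
    return json.dumps(event, separators=(",", ":"), sort_keys=True).encode("utf-8")


def publish_event(publisher, topic_path: str, event: dict) -> str:
    # publisher is assumed to be a google.cloud.pubsub_v1.PublisherClient;
    # publish() returns a future that resolves to the server message ID.
    future = publisher.publish(topic_path, data=encode_event(event))
    return future.result(timeout=30)


# Encoding alone needs no credentials, so it can be exercised locally:
print(encode_event({"order_id": "A-1", "amount": 12.5}))
```

Protobuf-encoded events work the same way; only the serialization step changes.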
We integrated it from several languages, including Python, Java, Go, and Kotlin. We had also configured Kubernetes autoscaling based on Google Cloud Pub/Sub metrics, which worked very well. The main metrics we watched for alerting and as overall health indicators were the backlog size of each subscription and the age of its oldest unacknowledged message, indicating a high-volume jam or a single message failing repeatedly, respectively.
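The two-metric health rule described above can be sketched as a small classifier. The field names mirror the real Pub/Sub Cloud Monitoring metrics (`num_undelivered_messages` and `oldest_unacked_message_age`); the thresholds and the `classify` function are hypothetical illustrations, not our production values:

```python
from dataclasses import dataclass


@dataclass
class SubscriptionHealth:
    undelivered_messages: int    # Pub/Sub metric: num_undelivered_messages
    oldest_unacked_age_s: float  # Pub/Sub metric: oldest_unacked_message_age


def classify(health: SubscriptionHealth,
             backlog_limit: int = 10_000,
             age_limit_s: float = 600.0) -> str:
    """Hypothetical alert rule: a large backlog suggests a volume jam,
    while an old unacked message with a small backlog suggests a single
    message that keeps failing and being redelivered."""
    if health.undelivered_messages > backlog_limit:
        return "backlog-jam"
    if health.oldest_unacked_age_s > age_limit_s:
        return "stuck-message"
    return "healthy"


print(classify(SubscriptionHealth(25_000, 30.0)))
print(classify(SubscriptionHealth(12, 1_800.0)))
```

The same two metrics can drive a Kubernetes HorizontalPodAutoscaler via the external metrics adapter, scaling consumers up as the backlog grows.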
We had to handle idempotency ourselves, since Pub/Sub can deliver a message more than once; consumers were usually paired with a Redis cache to deduplicate messages within a reasonable time window.
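The deduplication pattern is essentially "set-if-absent with a TTL": process a message only if its ID has not been seen within the window. A minimal sketch using an in-memory stand-in so it runs without a server; with redis-py the equivalent call would be `r.set(key, "1", nx=True, ex=ttl_s)` (the `InMemoryCache` and `should_process` names are our own):

```python
import time


class InMemoryCache:
    """Stand-in for Redis SET NX EX semantics, so the sketch runs locally."""

    def __init__(self):
        self._expiry = {}

    def set_if_absent(self, key: str, ttl_s: float) -> bool:
        now = time.monotonic()
        expires = self._expiry.get(key)
        if expires is not None and expires > now:
            return False  # key already present and not yet expired
        self._expiry[key] = now + ttl_s
        return True


def should_process(cache, message_id: str, ttl_s: float = 3600.0) -> bool:
    """Return True only the first time a message ID is seen within the
    TTL window, turning redeliveries into no-ops."""
    return cache.set_if_absent(f"seen:{message_id}", ttl_s)


cache = InMemoryCache()
print(should_process(cache, "m-1"))  # first delivery -> True
print(should_process(cache, "m-1"))  # redelivery -> False
```

The TTL bounds memory use: it only needs to cover the window in which Pub/Sub might plausibly redeliver the same message.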
Pros
- Data Streaming
- Event Sourcing
- Protobuf message format
- Scalability
- Easy to Use
- Observability
- Integrated Dead Letter Queue (DLQ) functionality
Cons
- Exactly-once delivery is currently only in preview, so idempotency must still be handled application-side
- Vendor locked to Google
Most Important Features
- DLQ (Dead Letter Queues)
- Scalability
- Delivering backoff for failed messages
Return on Investment
- Scalable System
- Better Alerts (observability)
- Auto Scaling
Alternatives Considered
Apache Kafka
Other Software Used
MongoDB, HashiCorp Consul, HashiCorp Vault, Istio


