💡 The Motivation
Let's be real. Nobody likes waiting in queues, especially not the ones where money's on the line and fraudsters are already ten steps ahead. Batch-processing systems? Too slow. By the time they catch a fraud, your card's already buying pizza on two continents.
So I set out to build a real-time fraud detection pipeline, one that catches shady transactions faster than you can say “Kafka.”
Tech Stack
- Apache Kafka: scalable, real-time data streaming
- Python: the glue that holds the pipeline together
- Scikit-learn: the K-Nearest Neighbors model
- Matplotlib & Seaborn: graphs for nerdy satisfaction
- Docker Compose: one command to bring the Kafka circus to life
🧠 The Architecture (Visually)

Project Structure
ccfraud_kafka/
│
├── pipeline/
│   ├── producer.py            # Streams transaction data to Kafka
│   ├── feature_processor.py   # Scales and preprocesses features
│   ├── fraud_detector.py      # Runs the ML model and predicts fraud
│   └── alert_system.py        # Sends alerts + plots graphs
│
├── models/
│   ├── train_model.py         # Trains and evaluates KNN
│   ├── fraud_model.pkl        # Saved model
│   ├── time_scaler.pkl        # Time scaler
│   └── amount_scaler.pkl      # Amount scaler
│
├── ccprod.csv                 # Sample chunk of the credit card dataset
└── docker-compose.yml         # Container orchestration
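The tree suggests the trained model and both scalers live as pickle files under models/. Here's a minimal sketch of loading them, assuming plain `pickle` (joblib is also common; the post doesn't say which). The helper name `load_artifacts` is hypothetical; the scaler dict is keyed to match the `self.scalers['Time']` / `self.scalers['Amount']` lookups in the feature processor snippet.

```python
import pickle
from pathlib import Path

def load_artifacts(models_dir="models"):
    """Hypothetical loader for fraud_model.pkl and the two scaler pickles.

    Assumes each .pkl holds exactly one pickled object. Only unpickle files
    you trust: pickle can execute arbitrary code while loading.
    """
    base = Path(models_dir)

    def _load(name):
        with open(base / name, "rb") as fh:
            return pickle.load(fh)

    model = _load("fraud_model.pkl")
    # Keyed by column name so callers can do scalers['Time'] / scalers['Amount']
    scalers = {
        "Time": _load("time_scaler.pkl"),
        "Amount": _load("amount_scaler.pkl"),
    }
    return model, scalers
```

Loading once at startup (rather than per message) keeps the per-transaction path down to a `transform` and a `predict_proba` call.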
Pipeline Flow (TL;DR)
Producer reads transaction rows from CSV and streams them into the transactions Kafka topic.
Feature Processor consumes from that topic and applies RobustScaler to Time and Amount.
Fraud Detector loads a trained KNN model and evaluates fraud probability.
Alert System logs suspicious transactions with full timestamps and produces beautiful metrics and visualizations.
Best part? Some alerts clocked in at around 30 milliseconds end-to-end! Take that, Flash.
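To make the hand-offs concrete, here's a toy, single-process simulation of the four stages. It is a sketch, not the real pipeline: a dict of lists stands in for Kafka topics, each stage simply re-reads its whole input topic (no consumer offsets), and the function names are hypothetical. The topic names come from the topic-creation commands in this post; which stage writes to `processed_transactions` and `fraud_predictions` is an assumption based on those names.

```python
from collections import defaultdict

# In-memory stand-in for Kafka: topic name -> list of messages (assumption;
# the real stages are separate processes using Kafka producers/consumers).
topics = defaultdict(list)

def producer(rows):
    for row in rows:
        topics["transactions"].append(row)

def feature_processor(scale):
    # Reads everything currently in the topic; no offset tracking in this toy
    for tx in topics["transactions"]:
        topics["processed_transactions"].append(scale(tx))

def fraud_detector(score, threshold=0.8):
    for tx in topics["processed_transactions"]:
        proba = score(tx)
        topics["fraud_predictions"].append(
            {**tx, "fraud_proba": proba, "is_fraud": proba >= threshold})

def alert_system():
    return [p for p in topics["fraud_predictions"] if p["is_fraud"]]

# Toy run with made-up scaling and scoring functions
producer([{"transaction_id": "a", "Amount": 10.0},
          {"transaction_id": "b", "Amount": 5000.0}])
feature_processor(lambda tx: {**tx, "Amount": tx["Amount"] / 100})
fraud_detector(lambda tx: 0.9 if tx["Amount"] > 10 else 0.1)
print([a["transaction_id"] for a in alert_system()])  # ['b']
```

Splitting the stages across topics is what lets each one be scaled or restarted independently in the Kafka version.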
🧠 Code Snippets You'll Love
Streaming Producer
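import uuid
from datetime import datetime, timezone

def _clean_transaction(self, transaction):
    # Cast every feature to float and drop the ground-truth 'Class' label
    clean_tx = {k: float(v) for k, v in transaction.items()
                if k != 'Class'}
    # Tag each record with a unique ID and a timezone-aware UTC receive time
    # (datetime.utcnow() is deprecated as of Python 3.12)
    clean_tx['transaction_id'] = str(uuid.uuid4())
    clean_tx['timestamp_received'] = datetime.now(timezone.utc).isoformat()
    return clean_tx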
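Around that cleaning step sits the streaming loop itself. A minimal sketch, assuming the CSV is read with `csv.DictReader` and each record is sent as UTF-8 JSON keyed by its `transaction_id`. The `send(topic, key, value)` callback and `stream_csv` are hypothetical; with the confluent-kafka client you might wire `send` to something like `Producer.produce(topic, key=key, value=value)` plus a final `flush()`, but the post doesn't say which client it uses.

```python
import csv
import io
import json
import uuid
from datetime import datetime, timezone

def clean_transaction(transaction):
    # Mirrors _clean_transaction above, as a free function
    clean_tx = {k: float(v) for k, v in transaction.items() if k != "Class"}
    clean_tx["transaction_id"] = str(uuid.uuid4())
    clean_tx["timestamp_received"] = datetime.now(timezone.utc).isoformat()
    return clean_tx

def stream_csv(lines, send, topic="transactions"):
    """Hand each cleaned CSV row to send(topic, key, value); return the count.

    `send` is a hypothetical stand-in for a real Kafka producer call.
    """
    count = 0
    for row in csv.DictReader(lines):
        tx = clean_transaction(row)
        # Key by transaction_id; value is JSON-encoded bytes
        send(topic, tx["transaction_id"], json.dumps(tx).encode("utf-8"))
        count += 1
    return count

# Usage with an in-memory CSV and a list that records what would be sent
sample = io.StringIO("Time,V1,Amount,Class\n0,-1.36,149.62,0\n1,1.19,2.69,0\n")
sent = []
n = stream_csv(sample, lambda topic, key, value: sent.append((topic, key, value)))
print(n, sent[0][0])  # 2 transactions
```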
Feature Processor
def _scale_features(self, transaction):
    # Copy so the raw record is left untouched
    scaled = transaction.copy()
    # Scalers expect 2-D input ([[value]]); [0][0] unwraps the single result
    scaled['Time'] = self.scalers['Time'].transform([[transaction['Time']]])[0][0]
    scaled['Amount'] = self.scalers['Amount'].transform([[transaction['Amount']]])[0][0]
    return scaled
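Why RobustScaler? With its default settings, scikit-learn's RobustScaler subtracts the median and divides by the interquartile range (25th to 75th percentile), so a handful of huge transaction amounts barely move the scaling. The sketch below reproduces that arithmetic in plain Python to show what `transform` computes; `RobustScalerSketch` is a hypothetical name, and the pipeline itself should keep using the fitted scikit-learn scalers from the .pkl files.

```python
import statistics

class RobustScalerSketch:
    """Hypothetical pure-Python stand-in for RobustScaler's defaults:
    (x - median) / IQR, with IQR = 75th percentile - 25th percentile."""

    def fit(self, values):
        # 'inclusive' quantiles interpolate like NumPy's default (linear)
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        self.center_ = median
        iqr = q3 - q1
        # scikit-learn likewise falls back to a scale of 1.0 when the IQR is 0
        self.scale_ = iqr if iqr != 0 else 1.0
        return self

    def transform_one(self, x):
        return (x - self.center_) / self.scale_

# One huge Amount barely shifts the median and IQR
amounts = [1.0, 2.0, 3.0, 4.0, 100.0]
scaler = RobustScalerSketch().fit(amounts)
print(scaler.center_, scaler.scale_, scaler.transform_one(100.0))  # 3.0 2.0 48.5
```

By contrast, a mean/standard-deviation scaler would let that single 100.0 drag the mean and inflate the spread for every other transaction.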
🤖 KNN Prediction
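# features: a 2-D array holding one scaled transaction
proba = self.model.predict_proba(features)[0][1]  # column 1 = P(fraud), assuming classes_ == [0, 1]
if proba >= 0.8:
    # It's a fraud, my dude!
    ...  # flag / alert on the transaction here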
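What does `predict_proba` return for KNN? With scikit-learn's defaults for `KNeighborsClassifier` (uniform weights, Euclidean distance), it is simply the share of the k nearest training transactions that are labelled fraud. The sketch below computes that by hand; `knn_fraud_proba` is a hypothetical helper, and k=5 is scikit-learn's default, assumed here because the post doesn't state the k it trained with.

```python
import math
from collections import Counter

def knn_fraud_proba(x, train_X, train_y, k=5):
    """Share of the k nearest training points labelled 1 (fraud).

    Mimics KNeighborsClassifier.predict_proba under its defaults
    (uniform weights, Euclidean distance); tie-breaking may differ.
    """
    # Sort labelled points by distance to x and keep the k closest
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(x, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes[1] / k

THRESHOLD = 0.8  # same cut-off as the snippet above

train_X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (4.9, 5.0), (5.0, 4.9)]
train_y = [0, 0, 0, 1, 1, 1, 1, 1]

for point in [(5.0, 5.0), (0.05, 0.05)]:
    p = knn_fraud_proba(point, train_X, train_y)
    print(point, p, p >= THRESHOLD)
# (5.0, 5.0) 1.0 True
# (0.05, 0.05) 0.4 False
```

One consequence worth knowing: with k=5 the probability can only be 0.0, 0.2, ..., 1.0, so a 0.8 threshold means "at least 4 of the 5 nearest neighbours were fraud."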
🛠️ Deployment (Docker-ized AF)
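version: '3.8'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.0.1
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.0.1
    environment:  # assumed minimal single-broker settings
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  # microservices are launched manually (or add them later!)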
The Superheroes: Kafka Topics
# Create raw transactions topic
docker compose exec kafka kafka-topics --create --bootstrap-server kafka:9092 --topic transactions --partitions 3 --replication-factor 1 --config retention.ms=604800000
# Create processed transactions topic
docker compose exec kafka kafka-topics --create --bootstrap-server kafka:9092 --topic processed_transactions --partitions 3 --replication-factor 1 --config retention.ms=604800000
# Create fraud predictions topic
docker compose exec kafka kafka-topics --create --bootstrap-server kafka:9092 --topic fraud_predictions --partitions 3 --replication-factor 1 --config retention.ms=2592000000
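The `retention.ms` values are just days expressed in milliseconds: 604800000 ms is 7 days (raw and processed transactions) and 2592000000 ms is 30 days (fraud predictions). A small, hypothetical helper can generate the three commands so the arithmetic and flags stay consistent; the partition count, replication factor, and bootstrap server are copied from the commands above.

```python
import shlex

MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms

def retention_ms(days):
    return days * MS_PER_DAY

def create_topic_cmd(topic, retention_days, partitions=3, replication=1):
    """Hypothetical helper that rebuilds the kafka-topics --create commands."""
    return (
        "docker compose exec kafka kafka-topics --create "
        "--bootstrap-server kafka:9092 "
        f"--topic {shlex.quote(topic)} --partitions {partitions} "
        f"--replication-factor {replication} "
        f"--config retention.ms={retention_ms(retention_days)}"
    )

for topic, days in [("transactions", 7), ("processed_transactions", 7),
                    ("fraud_predictions", 30)]:
    print(create_topic_cmd(topic, days))
```

With `--replication-factor 1` there is a single copy of each partition, which fits the single-broker Compose setup; the three partitions are what allow up to three consumers in one group to read a topic in parallel.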
✨ Output Sneak Peek
🧠 Trained Model Metrics

Real-Time Fraud Alerts (Sample Logs)

- End-to-end latency: as low as ~30ms
- Fast enough to warn Batman before the Joker hits send.
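End-to-end latency here can be read as the gap between the producer's `timestamp_received` and the moment the alert fires. A minimal sketch of that calculation, assuming the alert side has the original ISO timestamp in hand; `end_to_end_latency_ms` is hypothetical. If the producer and alert system run on different machines, their clocks need to be in sync for the number to mean anything.

```python
from datetime import datetime, timedelta, timezone

def end_to_end_latency_ms(timestamp_received, alerted_at=None):
    """Milliseconds from the producer's ISO timestamp to the alert time.

    Hypothetical helper. Naive timestamps (e.g. from datetime.utcnow())
    are treated as UTC so they can be compared with an aware 'now'.
    """
    start = datetime.fromisoformat(timestamp_received)
    if start.tzinfo is None:
        start = start.replace(tzinfo=timezone.utc)
    end = alerted_at if alerted_at is not None else datetime.now(timezone.utc)
    # Dividing two timedeltas gives a plain float ratio
    return (end - start) / timedelta(milliseconds=1)

sent = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(end_to_end_latency_ms(sent.isoformat(), sent + timedelta(milliseconds=30)))  # 30.0
```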
Visualizations (via alert_system.py)



Results Worth Flexing
- Minimum Latency: 30ms
- Average Inference Time: Sub-500ms
- Peak Throughput: 1200 tx/min
- Accuracy: 93%
Built for speed, precision, and modular deployment.
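Since fraud is usually a small minority of transactions, accuracy is easiest to interpret next to the fraud-class precision and recall. A minimal sketch of all three from true and predicted labels (scikit-learn's `accuracy_score`, `precision_score`, and `recall_score` compute the same things); `classification_metrics` and the toy data are hypothetical, not the post's evaluation.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus precision and recall for the fraud class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Report 0.0 when the denominator is empty, as scikit-learn does by default
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Toy data: 2 frauds in 100 transactions, and a model that never flags anything
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.98, 'precision': 0.0, 'recall': 0.0}
```

The do-nothing model scores 98% accuracy while catching zero fraud, which is why recall on the fraud class is worth showing alongside the headline number.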
💡 Future Scope
- Add Prometheus + Grafana for robust observability
- Upgrade to model versioning with MLflow
- Shift to Spark Streaming or Flink if horizontal scaling is required
- Build a pipeline with the help of TARS and a conveniently located wormhole? (I need help)

“The path of the fraudster is beset on all sides by the Kafka-powered processor and the righteous model...”
- Not Jules. But let's pretend.
If this sparked your curiosity or made you laugh (even a little), you know the drill.
Have questions? Ping me. I don't bite (unless you're a fraudulent transaction). 💳💥