1. Introduction

This project designed and implemented a cloud-based solution for analyzing large-scale retail transaction data. It identifies meaningful customer segments using clustering techniques and supports data-driven marketing strategies through interactive dashboards.

2. System Architecture

Data Processing Pipeline
  • Raw Transaction Data → Azure Storage
  • Azure Storage → Hadoop MapReduce
  • Hadoop MapReduce → K-means Clustering
  • K-means Clustering → Power BI Dashboards

The end-to-end pipeline processes millions of transaction records to deliver customer insights.
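The MapReduce stage that feeds the clustering step can be illustrated with a small local sketch. It is not the project's actual job: the input layout (`customer_id,amount` per line), the function names, and the chosen features (purchase count and total spend per customer) are all assumptions made for illustration. The sort-then-group step stands in for the shuffle that Hadoop performs between map and reduce.

```python
from itertools import groupby
from operator import itemgetter


def map_transaction(line):
    """Map step: emit (customer_id, amount) for one line 'customer_id,amount'.

    The two-column input layout is an assumption for this sketch.
    """
    customer_id, amount = line.strip().split(",")
    yield customer_id, float(amount)


def reduce_customer(customer_id, amounts):
    """Reduce step: collapse one customer's amounts into (purchase count, total spend)."""
    amounts = list(amounts)
    return customer_id, (len(amounts), sum(amounts))


def run_local(lines):
    """Simulate the shuffle locally: sort mapper output by key, then reduce per key."""
    pairs = sorted(
        (pair for line in lines for pair in map_transaction(line)),
        key=itemgetter(0),
    )
    return dict(
        reduce_customer(key, (amount for _, amount in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    )
```

Calling `run_local` on a handful of lines returns a dictionary that maps each customer to a feature tuple, which is the shape of input a clustering step would expect. On a real cluster the map and reduce functions would run as separate tasks and the framework would handle the grouping.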

3. Development Process

Data Preprocessing
  • Led data cleaning and preprocessing of large-scale retail transaction datasets, ensuring data quality and consistency
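
A minimal sketch of what such a cleaning pass might look like, assuming each record is a dictionary with hypothetical `invoice_id`, `customer_id`, and `amount` fields. The specific rules (drop rows with missing identifiers, non-numeric or non-positive amounts, and exact duplicates) are illustrative assumptions, not the project's documented rules.

```python
import math


def clean_transactions(rows):
    """Yield cleaned rows, dropping incomplete, invalid, and duplicate records.

    Field names and validation rules are assumptions for this sketch.
    """
    seen = set()
    for row in rows:
        customer_id = (row.get("customer_id") or "").strip()
        invoice_id = (row.get("invoice_id") or "").strip()
        try:
            amount = float(row.get("amount") or "")
        except ValueError:
            continue  # missing or non-numeric amount
        if not customer_id or not invoice_id:
            continue  # missing identifier
        if not math.isfinite(amount) or amount <= 0:
            continue  # NaN, infinite, zero, or negative amount
        key = (invoice_id, customer_id, amount)
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        yield {"invoice_id": invoice_id, "customer_id": customer_id, "amount": amount}
```

Because the function is a generator, it can stream over a large file row by row (for example from `csv.DictReader`) without loading the whole dataset. The duplicate check keeps a set of seen keys in memory, so at very large scale that part would more plausibly be done in the distributed stage.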
Pipeline Development
  • Implemented a data pipeline using Hadoop MapReduce for distributed processing and K-means clustering for customer segmentation
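
The K-means step can be sketched in plain Python as follows. This is a single-machine illustration of Lloyd's algorithm, not the project's distributed implementation; the function names, the caller-supplied starting centroids, and the use of numeric tuples as customer feature vectors are assumptions. In a MapReduce setting, the assignment step would correspond to the map phase and the centroid update to the reduce phase.

```python
import math


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, max_iter=100):
    """Run Lloyd's algorithm from the given starting centroids.

    Returns the final centroids and one cluster label per input point.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels
```

Features on very different scales (for example purchase count versus total spend) would normally be standardized before clustering, since Euclidean distance is otherwise dominated by the larger-valued feature.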
Dashboard Design & Team Leadership
  • Designed three interactive Power BI dashboards to visualize customer segments, supporting strategic business decisions

4. Technology Stack

Cloud & Big Data
  • Azure Storage for scalable data management
  • Hadoop MapReduce for distributed processing
  • K-means Clustering for customer segmentation
Development & Visualization
  • Python & Java for data processing
  • Power BI for interactive dashboards
  • Business Intelligence analytics

5. Project Gallery