B.
Back to projects

Amazon P&L Dashboard

Overview

A high-performance ETL financial data pipeline and dashboard that automates the reconciliation and visualisation of Amazon settlement data. It combines a concurrent data pipeline with an analytical backend to provide real-time insights into marketplace and SKU-level profitability, advertising performance (TACOS) and complex tax liabilities across global marketplaces.

What it does

  • Data Ingestion: Ingested and validated several different Amazon financial reports (Flat File V2 settlement reports, Advertising invoices and FBA Returns data) over multiple international marketplaces to a partitioned Parquet database using PyArrow with a custom schema that reduced row count by over 85%
  • Currency Normalisation: Integrated with the Frankfurter API to convert global marketplace data into a unified base currency (GBP) using historical exchange rates
  • Tax & Fee Logic: Automatically calculated UK VAT, reclaimable VAT on Amazon fees and handled dynamic fee type detection
  • Analytical Dashboard: Built an interactive Plotly Dash interface for multi-year P&L heatmaps, SKU performance tracking and marketplace analysis

Why it’s interesting

It solves the complex problem of Amazon financial reconciliation at scale. By leveraging a lakehouse architecture with DuckDB and Parquet, the engine achieves sub-second query latency (approximately 100ms) across almost 700,000 rows of raw data. It handles edge cases such as dynamically changing fee types and regional tax logic changes (for example, the August 2024 UK VAT changes), making it a production-ready financial tool.

Key Technical Points

  • High-throughput Pipeline: Parallelised the CPU-bound task of parsing large raw data files and implemented batched flushing to maintain a low memory footprint
  • Lakehouse Architecture: Utilised Apache Parquet with Hive partitioning for storage and DuckDB for vectorised SQL execution, enabling high compression and extremely fast column-wise scans
  • Performance at Scale: Optimised to process approximately 600,000 raw source rows with sub-second query latency for multi-dimensional P&L views
  • Modern UI Architecture: Prototyped a Streamlit dashboard before switching to a Plotly Dash application for production because of the finer control over state management and high-density data visualisations

Tech Stack

Languages: Python, SQL
Analytics: DuckDB, Apache Parquet, Pandas
Visualisation: Dash, Streamlit
APIs: Frankfurter API (Currency)