Insurance Pricing
Regression, AutoML, Explainable AI, SHAP

Predictive pricing model that estimates annual insurance charges with transparent SHAP explanations and LLM-powered interpretation of each prediction.

Overview

The Problem

Insurance pricing requires balancing actuarial accuracy with customer transparency. Traditional black-box models produce estimates without explaining why, making it difficult for both underwriters and customers to understand pricing decisions.

Our Solution

Insurance Pricing combines an AutoGluon regression ensemble with SHAP explainability and LLM-powered interpretation. Every prediction comes with a breakdown of which factors (age, BMI, smoking status, etc.) drove the price up or down, plus a plain-English explanation.

Key Outcomes

  • Accurate annual charge predictions using an ensemble of GBM, XGBoost, CatBoost, and Random Forest
  • Per-prediction SHAP feature impact visualization showing positive and negative drivers
  • LLM-generated plain-English interpretation with fallback to rule-based explanations
  • Extrapolation detection that warns when inputs fall outside training data range
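The extrapolation warning in the last bullet can be pictured as a simple range comparison. The sketch below is a minimal illustration, not the project's implementation: the bounds are copied from the feature ranges listed under Data Sources, and the names `TRAINING_RANGES` and `extrapolation_warnings` are hypothetical.

```python
# Hypothetical bounds, copied from the feature ranges listed under Data Sources.
TRAINING_RANGES = {
    "age": (18, 64),
    "bmi": (15.0, 53.0),
    "children": (0, 6),
}


def extrapolation_warnings(profile: dict) -> list[str]:
    """Return one warning per numeric input that lies outside the training range."""
    warnings = []
    for feature, (low, high) in TRAINING_RANGES.items():
        value = profile.get(feature)
        if value is None:
            continue  # missing inputs are left to upstream validation
        if value < low or value > high:
            warnings.append(
                f"{feature}={value} is outside the training range [{low}, {high}]"
            )
    return warnings
```

A profile with age 70 and BMI 60 would produce two warnings, while a profile inside all ranges returns an empty list. A real check might instead derive the bounds from the training data at load time.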

Models & Tech Stack

AI/ML Models

AutoGluon TabularPredictor
Insurance charge regression from customer profile features

An ensemble of GBM, XGBoost, CatBoost, Random Forest, and Extra Trees with regularized hyperparameters, 5-fold bagging, and MAPE as the optimization metric. Trained on 6 features plus engineered interactions (smoker×BMI, age×BMI).
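A configuration along these lines would match the description above. It is a sketch under assumptions: the label column name `charges` and the helper names are hypothetical, the empty hyperparameter dicts stand in for the regularized settings (which are not given here), and the 300-second limit is the one stated under Methodology.

```python
def build_training_config() -> dict:
    """Collect the described settings as TabularPredictor and fit() arguments."""
    return {
        "init": {
            "label": "charges",  # assumed name of the target column
            "problem_type": "regression",
            "eval_metric": "mean_absolute_percentage_error",
        },
        "fit": {
            "time_limit": 300,  # seconds
            "num_bag_folds": 5,
            # Empty dicts request each model family with AutoGluon defaults;
            # the project's regularized hyperparameters are not reproduced here.
            "hyperparameters": {"GBM": {}, "XGB": {}, "CAT": {}, "RF": {}, "XT": {}},
        },
    }


def train(train_df):
    """Fit a TabularPredictor on a pandas DataFrame that includes the label column."""
    from autogluon.tabular import TabularPredictor  # lazy import: heavy dependency

    config = build_training_config()
    return TabularPredictor(**config["init"]).fit(train_df, **config["fit"])
```

Keeping the settings in a plain dict makes them easy to log next to the trained model, so a prediction can later be traced back to the configuration that produced it.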

SHAP TreeExplainer
Feature contribution explanations for tree-based models

Uses TreeExplainer for tree-based models and falls back to KernelExplainer otherwise. Returns the top 8 features by absolute SHAP value for each prediction.
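The "top 8 by absolute SHAP value" step amounts to a sort on magnitude that keeps the sign. The sketch below covers only that ranking step and assumes the per-feature SHAP values have already been computed; `top_features` and the example values are hypothetical.

```python
def top_features(shap_values: dict, k: int = 8) -> list[tuple[str, float]]:
    """Return the k features with the largest absolute SHAP value, largest first."""
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Made-up contributions for a single prediction (illustrative values only).
example = {"smoker": 0.92, "age": 0.31, "bmi": -0.12, "children": 0.04}
drivers = top_features(example, k=3)
```

Because the signed values are preserved, the caller can still separate positive drivers (pushing the price up) from negative ones (pushing it down) when rendering the chart.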

OpenAI GPT-4o-mini
Human-readable prediction interpretation

Generates a structured interpretation with a headline and bullet points that explain the key pricing factors. Falls back to a rule-based interpretation for robustness.
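One way to arrange the fallback is to treat the LLM as an optional callable and drop to rules whenever it is missing, raises, or returns an unexpected shape. This is a sketch only: `interpret`, `rule_based_interpretation`, and the `llm_call` parameter are hypothetical names, and the actual OpenAI request is deliberately left out.

```python
from typing import Callable, Optional


def rule_based_interpretation(contributions: dict) -> dict:
    """Build a headline and bullets directly from signed feature contributions."""
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    bullets = [
        f"{name} {'raised' if value > 0 else 'lowered'} the estimate"
        for name, value in ranked
        if value != 0
    ]
    headline = f"Main driver: {ranked[0][0]}" if ranked else "No dominant driver"
    return {"headline": headline, "bullets": bullets, "source": "rules"}


def interpret(
    contributions: dict,
    llm_call: Optional[Callable[[dict], dict]] = None,
) -> dict:
    """Prefer the LLM interpretation; fall back to rules if it is missing or fails."""
    if llm_call is not None:
        try:
            result = llm_call(contributions)
            # Accept the LLM output only if it has the expected structure.
            if isinstance(result, dict) and "headline" in result and "bullets" in result:
                return {**result, "source": "llm"}
        except Exception:
            pass  # network errors, rate limits, malformed responses, and so on
    return rule_based_interpretation(contributions)
```

Tagging the result with a `source` field lets the frontend indicate whether the text came from the LLM or from the rules.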

Tech Stack

Backend
Python 3.13, FastAPI, Uvicorn, Pydantic
Frontend
Next.js 16, React 19, TypeScript, Tailwind CSS
ML/AI
AutoGluon 1.5, SHAP 0.49, scikit-learn, OpenAI API
Data Processing
Pandas, NumPy, SciPy, joblib

Data & Methodology

Data Sources

The US Health Insurance Dataset contains 1,300 records with 6 features: age (18-64), sex, BMI (15-53), number of children (0-6), smoker status, and US region (northeast, northwest, southeast, southwest). Target: annual insurance charges in dollars.

Methodology

Preprocessing: BMI and charges outliers are winsorized at 1.5× IQR, the target is log1p-transformed, age and BMI are standardized with StandardScaler, region is one-hot encoded, and sex and smoker are binary encoded. Two interaction features are added (smoker×BMI, age×BMI). The data is split 80/20 into train and test sets, stratified by smoker and region. AutoGluon is trained with a 300 s time limit and 5-fold bagging.
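Three of these steps can be sketched with the standard library alone. The function names below are hypothetical, the quartiles use Python's default (exclusive) method, which may differ slightly from the one used in the project, and the interactions are computed on raw values, which is an assumption since the order relative to scaling is not stated.

```python
import math
import statistics


def iqr_clip(values: list[float], factor: float = 1.5) -> list[float]:
    """Clip values to [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [min(max(v, low), high) for v in values]


def add_interactions(row: dict) -> dict:
    """Add the smoker×BMI and age×BMI interaction features to one record."""
    out = dict(row)
    out["smoker_bmi"] = row["smoker"] * row["bmi"]  # smoker encoded as 0/1
    out["age_bmi"] = row["age"] * row["bmi"]
    return out


def to_log_target(charge: float) -> float:
    """Transform a charge to the log1p scale used for training."""
    return math.log1p(charge)


def from_log_target(prediction: float) -> float:
    """Map a model output on the log1p scale back to dollars."""
    return math.expm1(prediction)
```

Since the model is trained on the log1p scale, every prediction has to pass through the inverse (`expm1`) before it is shown as a dollar amount.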

Evaluation Metrics

Models are evaluated with R², MAPE (Mean Absolute Percentage Error), and SMAPE (Symmetric Mean Absolute Percentage Error). MAPE is the primary optimization target.
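For reference, the three metrics can be written out in a few lines. This is a plain-Python sketch rather than the project's evaluation code; SMAPE has several definitions in use, and the one below (bounded between 0 and 2) is an assumption. MAPE and SMAPE are returned as fractions, not percentages.

```python
def r2(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mape(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute percentage error; assumes no true value is zero."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true: list[float], y_pred: list[float]) -> float:
    """Symmetric MAPE; assumes true and predicted values are never both zero."""
    return sum(
        2 * abs(p - t) / (abs(t) + abs(p)) for t, p in zip(y_true, y_pred)
    ) / len(y_true)
```

MAPE suits this problem because charges span a wide range: an error of $500 matters more on a $2,000 policy than on a $40,000 one, and a relative metric reflects that.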

Try It Yourself

Experience Insurance Pricing with real data. No signup required.