
Insurance Pricing
Predictive pricing model that estimates annual insurance charges with transparent SHAP explanations and LLM-powered interpretation of each prediction.
Overview
The Problem
Insurance pricing requires balancing actuarial accuracy with customer transparency. Traditional black-box models produce estimates without explaining why, making it difficult for both underwriters and customers to understand pricing decisions.
Our Solution
Insurance Pricing combines an AutoGluon regression ensemble with SHAP explainability and LLM-powered interpretation. Every prediction comes with a breakdown of which factors (age, BMI, smoking status, etc.) drove the price up or down, plus a plain-English explanation.
Key Outcomes
- Accurate annual charge predictions from an ensemble of GBM, XGBoost, CatBoost, Random Forest, and Extra Trees models
- Per-prediction SHAP feature impact visualization showing positive and negative drivers
- LLM-generated plain-English interpretation with fallback to rule-based explanations
- Extrapolation detection that warns when inputs fall outside training data range
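As a rough illustration of the extrapolation check, the sketch below compares numeric inputs against the ranges quoted under Data Sources (age 18-64, BMI 15-53, children 0-6). The names `TRAINING_RANGES` and `find_extrapolations`, and the warning wording, are assumptions made for this example and are not taken from the project's code.

```python
# Ranges come from the Data Sources description; the names and message
# format here are illustrative assumptions, not the project's actual API.
TRAINING_RANGES = {
    "age": (18, 64),
    "bmi": (15, 53),
    "children": (0, 6),
}


def find_extrapolations(inputs, ranges=TRAINING_RANGES):
    """Return one warning string per numeric input outside its training range."""
    warnings = []
    for name, (low, high) in ranges.items():
        if name not in inputs:
            continue  # skip features the caller did not supply
        value = inputs[name]
        if value < low or value > high:
            warnings.append(
                f"{name}={value} is outside the training range [{low}, {high}]"
            )
    return warnings


if __name__ == "__main__":
    # An applicant older than anyone in the training data triggers one warning.
    print(find_extrapolations({"age": 72, "bmi": 31.5, "children": 2}))
```

A prediction can still be returned in this case; the warnings simply signal that the estimate relies on inputs the model never saw during training.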
Models & Tech Stack
AI/ML Models
AutoGluon regression ensemble of GBM, XGBoost, CatBoost, Random Forest, and Extra Trees with regularized hyperparameters, 5-fold bagging, and MAPE as the optimization metric. Trained on 6 features plus engineered interactions (smoker×BMI, age×BMI).
SHAP explainability uses TreeExplainer for tree-based models, with KernelExplainer as a fallback. It returns the top 8 features by absolute SHAP value for each prediction.
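The top-8 selection can be sketched in a few lines once per-feature SHAP values are available. The example assumes the values arrive as a plain feature-to-value mapping; the function name `top_shap_drivers` and the sample numbers are hypothetical and only show the ranking logic, not how the SHAP values themselves are computed.

```python
def top_shap_drivers(shap_values, k=8):
    """Rank features by absolute SHAP value and split the top k by direction."""
    ranked = sorted(
        shap_values.items(), key=lambda item: abs(item[1]), reverse=True
    )
    top = ranked[:k]
    # Positive values pushed the prediction up, negative values pushed it down.
    increases = [(name, value) for name, value in top if value > 0]
    decreases = [(name, value) for name, value in top if value < 0]
    return {"top": top, "increases": increases, "decreases": decreases}


if __name__ == "__main__":
    # Made-up SHAP values for a single prediction.
    example = {"smoker": 0.9, "age": 0.3, "bmi": -0.2, "children": 0.05}
    print(top_shap_drivers(example, k=3))
```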
The LLM generates a structured interpretation: a headline plus bullet points explaining the key pricing factors. The system falls back to a rule-based interpretation for robustness.
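One way to picture the LLM-with-fallback pattern is the sketch below. It assumes the LLM is wrapped in a callable (`llm_call`, hypothetical) that returns a dict with `headline` and `bullets` keys; the rule wording and the `source` field are likewise assumptions for illustration, not the project's actual output schema.

```python
def rule_based_interpretation(drivers):
    """Build a headline and bullets directly from (feature, shap_value) pairs."""
    bullets = []
    for name, value in drivers:
        direction = "raised" if value > 0 else "lowered"
        bullets.append(f"{name} {direction} the estimate")
    headline = f"Main driver: {drivers[0][0]}" if drivers else "No dominant driver found"
    return {"headline": headline, "bullets": bullets, "source": "rules"}


def interpret(drivers, llm_call=None):
    """Try the LLM first; use the rules on any failure or malformed reply."""
    if llm_call is not None:
        try:
            result = llm_call(drivers)
            well_formed = (
                isinstance(result, dict)
                and bool(result.get("headline"))
                and isinstance(result.get("bullets"), list)
            )
            if well_formed:
                return {
                    "headline": result["headline"],
                    "bullets": result["bullets"],
                    "source": "llm",
                }
        except Exception:
            pass  # timeouts, network errors, bad output: fall through to rules
    return rule_based_interpretation(drivers)


if __name__ == "__main__":
    # With no LLM configured, the rule-based path produces the explanation.
    print(interpret([("smoker", 0.9), ("bmi", -0.2)]))
```

Validating the reply's shape before trusting it means a malformed LLM response is handled the same way as an outage.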
Tech Stack
Data & Methodology
Data Sources
US Health Insurance Dataset containing 1,300 records and 6 features: age (18-64), sex, BMI (15-53), number of children (0-6), smoker status, and US region (northeast, northwest, southeast, southwest). Target: annual insurance charges in dollars.
Methodology
Preprocessing:
- Winsorization (1.5× IQR) for BMI and charges outliers
- log1p transformation on the target
- StandardScaler on age and BMI
- One-hot encoding for region; binary encoding for sex and smoker
- Feature interactions (smoker×BMI, age×BMI)
Data split: stratified 80/20 train/test split by smoker and region.
Training: AutoGluon with a 300s time limit and 5-fold bagging.
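The winsorization, log transform, and interaction steps can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the quartile method (`inclusive`), the function names, and computing the interactions on unscaled values are all choices made for the example. Scaling and categorical encoding are omitted.

```python
import math
import statistics


def winsorize_iqr(values, factor=1.5):
    """Clip values to [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [min(max(v, low), high) for v in values]


def add_interactions(row):
    """Add smoker×BMI and age×BMI features to one record (smoker is 0 or 1)."""
    out = dict(row)
    out["smoker_x_bmi"] = row["smoker"] * row["bmi"]
    out["age_x_bmi"] = row["age"] * row["bmi"]
    return out


def to_log_target(charge):
    """Apply log1p to a charge before training."""
    return math.log1p(charge)


def from_log_target(prediction):
    """Invert log1p so a model output is back in dollars."""
    return math.expm1(prediction)


if __name__ == "__main__":
    print(winsorize_iqr([1, 2, 3, 4, 100]))
    print(add_interactions({"age": 40, "bmi": 30.0, "smoker": 1}))
```

Because the target is log-transformed, predictions have to pass through the inverse (`expm1`) before they are shown as dollar amounts.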
Evaluation Metrics
R², MAPE (Mean Absolute Percentage Error), and SMAPE (Symmetric Mean Absolute Percentage Error). MAPE is the primary optimization target.
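For reference, the three metrics can be written out in plain Python. SMAPE has several definitions in use; the variant below (mean of 2|y − ŷ| / (|y| + |ŷ|), reported as a percentage) is an assumption, since the page does not say which one the project uses. The functions also assume nonzero true values, which holds for insurance charges.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    errors = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(errors) / len(errors)


def smape(y_true, y_pred):
    """Symmetric MAPE, in percent (one common variant, range 0-200)."""
    errors = [
        2.0 * abs(t - p) / (abs(t) + abs(p)) for t, p in zip(y_true, y_pred)
    ]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    actual = [100.0, 200.0, 300.0]
    predicted = [110.0, 180.0, 300.0]
    print(r2_score(actual, predicted), mape(actual, predicted), smape(actual, predicted))
```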
Preview
Try It Yourself
Experience Insurance Pricing with real data. No signup required.