Risk Profiler
AutoML, Fraud Detection, Explainable AI, SHAP

Machine learning fraud detection system that scores insurance claims for fraud probability with explainable AI and natural language risk summaries.

Overview

The Problem

Insurance fraud costs the industry billions annually, but identifying fraudulent claims depends on experienced investigators, and their judgments are often inconsistent. Traditional rule-based systems miss complex fraud patterns and offer little transparency into their decisions.

Our Solution

Risk Profiler uses an AutoGluon ensemble model trained on real insurance claim data to predict fraud probability. SHAP explanations show which factors drive each prediction, and LLM-generated natural language summaries make the results readable for non-technical reviewers.

Key Outcomes

  • Fraud probability scoring with configurable threshold (default 0.65)
  • Per-prediction SHAP waterfall showing each feature's contribution to the score
  • Global feature importance ranking across the entire model
  • Natural language risk summaries generated by GPT-4o-mini with rule-based fallback

Models & Tech Stack

AI/ML Models

AutoGluon TabularPredictor
Fraud classification from structured claim features

An ensemble model with a NeuralNetTorch backend, trained on 10 features covering driver demographics, claim information, and vehicle safety ratings. Fraud is determined with a 0.65 probability threshold, which is the configurable default.
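
The thresholding step can be sketched in a few lines. This is only an illustration: the function name `classify_claim`, the label strings, and the choice to flag scores at or above the cutoff (rather than strictly above) are assumptions, not details taken from the project.

```python
FRAUD_THRESHOLD = 0.65  # default cutoff stated in the text; meant to be configurable


def classify_claim(fraud_probability: float, threshold: float = FRAUD_THRESHOLD) -> str:
    """Map a model probability to a label using a configurable cutoff."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be between 0 and 1")
    # Assumption: a score equal to the threshold counts as fraud.
    return "fraud" if fraud_probability >= threshold else "not_fraud"


# A stricter or looser policy only changes the threshold argument.
default_label = classify_claim(0.60)
lenient_label = classify_claim(0.60, threshold=0.50)
```

Raising the threshold trades fewer false alarms for more missed fraud, which is presumably why it is exposed as a setting rather than hard-coded.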

SHAP KernelExplainer
Per-instance feature contribution explanations

Estimates Shapley values using 100 background samples to show how each input feature pushes the fraud probability up or down from the baseline.
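
To make "pushes the probability up or down from the baseline" concrete, the sketch below computes exact Shapley values by enumerating every feature coalition for a tiny toy model. It is not SHAP's KernelExplainer, which approximates these values by sampling; the model `toy_model`, its three features, and the four background rows are all hypothetical.

```python
from itertools import combinations
from math import factorial


def toy_model(row):
    """Hypothetical stand-in for the fraud model; returns a score in [0, 1]."""
    income, past_claims, witness = row
    score = 0.2 + 0.1 * past_claims - 0.15 * witness + (0.2 if income < 30000 else 0.0)
    return min(max(score, 0.0), 1.0)


def coalition_value(model, instance, background, subset):
    """Average prediction when features in `subset` come from the instance
    and every other feature comes from a background row."""
    total = 0.0
    for bg in background:
        mixed = [instance[i] if i in subset else bg[i] for i in range(len(instance))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, instance, background):
    """Exact Shapley values by enumerating all coalitions (feasible only for few features)."""
    n = len(instance)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = coalition_value(model, instance, background, set(subset) | {i})
                without_i = coalition_value(model, instance, background, set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi


# Illustrative data only: (annual_income, past_num_of_claims, witness_present_ind)
background = [(55000, 0, 1), (42000, 1, 1), (28000, 0, 0), (61000, 2, 1)]
claim = (25000, 3, 0)

baseline = coalition_value(toy_model, claim, background, set())  # mean background score
contributions = shapley_values(toy_model, claim, background)
prediction = toy_model(claim)
```

The useful property is additivity: `baseline + sum(contributions)` equals `prediction`, which is what a waterfall plot visualizes one feature at a time.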

OpenAI GPT-4o-mini
Natural language risk assessment summaries

Generates professional two- to three-sentence summaries of fraud assessments, highlighting key risk drivers and protective factors. Falls back to rule-based summaries when the LLM is unavailable.
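
One way the fallback could be structured is shown below. The LLM call is passed in as a plain callable so the sketch makes no claims about the OpenAI client; `summarize`, `rule_based_summary`, and the summary wording are hypothetical names and text, not the project's own.

```python
from typing import Callable, Optional


def rule_based_summary(probability: float, top_factors: list[str], threshold: float = 0.65) -> str:
    """Template summary used when no LLM response is available."""
    level = "high" if probability >= threshold else "low"
    drivers = ", ".join(top_factors) if top_factors else "no dominant factors"
    return f"Estimated fraud probability is {probability:.0%} ({level} risk). Main drivers: {drivers}."


def summarize(
    probability: float,
    top_factors: list[str],
    llm_summarizer: Optional[Callable[[float, list[str]], str]] = None,
) -> str:
    """Prefer the LLM summary; fall back to the template on any failure or empty reply."""
    if llm_summarizer is not None:
        try:
            text = llm_summarizer(probability, top_factors)
            if text and text.strip():
                return text.strip()
        except Exception:
            # Network errors, rate limits, missing API key, and so on.
            pass
    return rule_based_summary(probability, top_factors)


def unavailable_llm(probability, top_factors):
    """Simulates an outage so the fallback path can be exercised."""
    raise RuntimeError("LLM service unavailable")


fallback_text = summarize(0.72, ["past_num_of_claims", "annual_income"], unavailable_llm)
```

Keeping the fallback deterministic means a reviewer always gets some summary, even if it is less fluent than the generated one.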

Tech Stack

Backend
Python 3.13, FastAPI, Uvicorn, Pydantic
Frontend
Next.js 16, React 19, TypeScript, Tailwind CSS
ML/AI
AutoGluon 1.5, SHAP 0.45, scikit-learn, OpenAI API
Data Processing
Pandas, NumPy

Data & Methodology

Data Sources

2023 Travelers NESS Statathon dataset from Kaggle, containing real insurance claim records with 10 features: age_of_driver, gender, high_education_ind, annual_income, living_status, claim_day_of_week, claim_est_payout, past_num_of_claims, witness_present_ind, and safety_rating.
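
Since the model expects exactly these 10 columns, a small guard before scoring can catch incomplete requests. The sketch below uses only the column names listed above; the helper names and the fixed column order are assumptions, and no value types are checked because the source does not state them.

```python
FEATURE_COLUMNS = [
    "age_of_driver",
    "gender",
    "high_education_ind",
    "annual_income",
    "living_status",
    "claim_day_of_week",
    "claim_est_payout",
    "past_num_of_claims",
    "witness_present_ind",
    "safety_rating",
]


def missing_features(record: dict) -> list[str]:
    """Return the expected columns that are absent from a claim record."""
    return [name for name in FEATURE_COLUMNS if name not in record]


def to_feature_row(record: dict) -> list:
    """Order a claim record's values by FEATURE_COLUMNS, rejecting incomplete input."""
    missing = missing_features(record)
    if missing:
        raise ValueError(f"missing features: {', '.join(missing)}")
    return [record[name] for name in FEATURE_COLUMNS]
```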

Methodology

An AutoGluon ensemble is trained with automatic model selection and hyperparameter tuning. The task is binary classification with a 0.65 probability threshold. SHAP KernelExplainer with 100 background samples provides per-instance explanations, and the 10 features are ranked by global SHAP importance.
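
A global ranking is commonly derived from per-instance SHAP values by averaging their absolute values per feature. The sketch below follows that convention and then scales scores so the top feature is 1.0; the scaling step is an assumption about how the scores are presented, and the SHAP numbers are made up for illustration.

```python
def global_importance(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value, scaled so the top feature is 1.0."""
    n_rows = len(shap_rows)
    mean_abs = [
        sum(abs(row[j]) for row in shap_rows) / n_rows
        for j in range(len(feature_names))
    ]
    top = max(mean_abs)
    # Assumption: scores are reported relative to the most important feature.
    scaled = [value / top if top > 0 else 0.0 for value in mean_abs]
    return sorted(zip(feature_names, scaled), key=lambda pair: pair[1], reverse=True)


# Illustrative per-claim SHAP values (one row per claim, one column per feature).
example_shap = [
    [0.10, -0.05, 0.02],
    [-0.20, 0.10, 0.01],
    [0.30, -0.15, 0.03],
]
example_names = ["annual_income", "age_of_driver", "claim_day_of_week"]
ranking = global_importance(example_shap, example_names)
```

Taking absolute values matters: a feature that pushes some claims up and others down would otherwise average out to near zero and look unimportant.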

Evaluation Metrics

Top features by global SHAP importance: annual income (1.0), age of driver (0.91), and claim day of week (0.61). The model was evaluated on a holdout test set.
