
Risk Profiler
Machine learning fraud detection system that scores insurance claims for fraud probability with explainable AI and natural language risk summaries.
Overview
The Problem
Insurance fraud costs the industry billions annually, but identifying fraudulent claims requires experienced investigators and is often inconsistent. Traditional rule-based systems miss complex fraud patterns and lack transparency in their decisions.
Our Solution
Risk Profiler uses an AutoGluon ensemble model trained on real insurance claim data to predict fraud probability. SHAP explanations show which factors drive each prediction, and LLM-generated natural language summaries make the results readable for non-technical reviewers.
Key Outcomes
- Fraud probability scoring with configurable threshold (default 0.65)
- Per-prediction SHAP waterfall showing each feature's contribution to the score
- Global feature importance ranking across the entire model
- Natural language risk summaries generated by GPT-4o-mini with rule-based fallback
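The configurable threshold in the first outcome can be illustrated with a minimal sketch. The names `Decision` and `decide` are hypothetical and do not come from the Risk Profiler codebase, and the inclusive comparison (`>=`) is an assumption, since the page does not say how a score exactly at the threshold is treated.

```python
from dataclasses import dataclass

DEFAULT_THRESHOLD = 0.65  # default value stated on this page


@dataclass(frozen=True)
class Decision:
    """Hypothetical container for one scored claim."""
    probability: float
    threshold: float
    flagged: bool


def decide(probability: float, threshold: float = DEFAULT_THRESHOLD) -> Decision:
    """Flag a claim when its fraud probability reaches the threshold."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    # Assumption: the boundary case counts as flagged (inclusive comparison).
    return Decision(probability, threshold, probability >= threshold)
```

Lowering the threshold, for example `decide(0.5, threshold=0.4)`, flags more claims for review, while raising it flags fewer.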
Models & Tech Stack
AI/ML Models
AutoGluon ensemble model with NeuralNetTorch backend, trained on 10 features including driver demographics, claim information, and vehicle safety ratings. Uses a 0.65 probability threshold for fraud determination.
SHAP KernelExplainer computes Shapley values using 100 background samples to show how each input feature pushes the fraud probability up or down from the baseline.
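To make "pushes the fraud probability up or down from the baseline" concrete, here is a small sketch that computes Shapley values exactly by enumerating feature coalitions against a background set. This is not the KernelExplainer algorithm, which approximates these values by sampling coalitions and fitting a weighted regression. The functions `coalition_value` and `shapley_values`, and the three-feature `toy_model`, are hypothetical illustrations.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]


def coalition_value(model: Model, x: Row, background: Sequence[Row],
                    present: FrozenSet[int]) -> float:
    """Mean model output when features in `present` come from x and the
    remaining features come from each background row in turn."""
    total = 0.0
    for b in background:
        mixed = [x[j] if j in present else b[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model: Model, x: Row, background: Sequence[Row]) -> List[float]:
    """Exact Shapley values; cost grows as 2**n, so only for small n."""
    n = len(x)
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi += weight * gain
        phis.append(phi)
    return phis


def toy_model(row: Row) -> float:
    """Hypothetical scoring function, standing in for the real model."""
    return 0.1 + 0.5 * row[0] + 0.3 * row[1] * row[2]
```

The baseline is `coalition_value(model, x, background, frozenset())`, the mean prediction over the background rows, and the returned values sum to the prediction for `x` minus that baseline. That additivity is what a waterfall chart visualizes.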
GPT-4o-mini generates professional 2-3 sentence summaries of fraud assessments, highlighting key risk drivers and protective factors. Falls back to rule-based summaries when the LLM is unavailable.
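The fallback behavior might look roughly like the sketch below. The LLM is represented by an injected callable, so no real API is assumed. The function names, the wording of the rule-based sentences, and the choice to treat any exception or empty reply as "unavailable" are all assumptions made for illustration.

```python
from typing import Callable, Mapping, Optional

Contributions = Mapping[str, float]  # feature name -> SHAP contribution
LlmFn = Callable[[float, Contributions], str]


def rule_based_summary(probability: float, contributions: Contributions,
                       threshold: float = 0.65) -> str:
    """Build a short summary from the largest positive and negative contributions."""
    level = "above" if probability >= threshold else "below"
    sentences = [
        f"Estimated fraud probability is {probability:.0%}, "
        f"{level} the {threshold:.0%} threshold."
    ]
    drivers = sorted((kv for kv in contributions.items() if kv[1] > 0),
                     key=lambda kv: kv[1], reverse=True)
    protective = sorted((kv for kv in contributions.items() if kv[1] < 0),
                        key=lambda kv: kv[1])
    if drivers:
        sentences.append(f"The main risk driver is {drivers[0][0]}.")
    if protective:
        sentences.append(f"The main protective factor is {protective[0][0]}.")
    return " ".join(sentences)


def summarize(probability: float, contributions: Contributions,
              llm: Optional[LlmFn] = None) -> str:
    """Prefer the LLM summary; fall back to rules if it fails or returns nothing."""
    if llm is not None:
        try:
            text = llm(probability, contributions)
            if text and text.strip():
                return text.strip()
        except Exception:
            # Assumption: any failure (timeout, quota, network) triggers the fallback.
            pass
    return rule_based_summary(probability, contributions)
```

Passing the LLM as a parameter keeps the fallback path testable without network access.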
Tech Stack
Data & Methodology
Data Sources
2023 Travelers NESS Statathon dataset from Kaggle, containing real insurance claim records with 10 features: age_of_driver, gender, high_education_ind, annual_income, living_status, claim_day_of_week, claim_est_payout, past_num_of_claims, witness_present_ind, and safety_rating.
Methodology
AutoGluon ensemble training with automatic model selection and hyperparameter tuning. Binary classification with a 0.65 probability threshold. SHAP KernelExplainer with 100 background samples for per-instance explanations. All 10 features ranked by global SHAP importance.
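One common way to turn per-instance SHAP values into a global ranking is to average their absolute values per feature. The sketch below assumes that convention. The optional scaling so the top feature scores 1.0 is also an assumption, and the function names are hypothetical.

```python
from typing import Dict, List, Mapping, Sequence, Tuple


def global_importance(per_instance: Sequence[Mapping[str, float]]) -> Dict[str, float]:
    """Mean absolute SHAP value per feature across explained instances.

    Assumes every row contains a value for every feature.
    """
    if not per_instance:
        raise ValueError("need at least one explained instance")
    totals: Dict[str, float] = {}
    for row in per_instance:
        for name, value in row.items():
            totals[name] = totals.get(name, 0.0) + abs(value)
    return {name: total / len(per_instance) for name, total in totals.items()}


def rank_normalized(importance: Mapping[str, float],
                    top_k: int = 10) -> List[Tuple[str, float]]:
    """Sort features by importance and scale so the largest equals 1.0."""
    peak = max(importance.values())
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return [(name, value / peak if peak else 0.0) for name, value in ranked]
```

Taking absolute values first matters: a feature that pushes some claims strongly up and others strongly down would otherwise average out to near zero.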
Evaluation Metrics
Top features by SHAP importance: annual income (1.0), age of driver (0.91), claim day of week (0.61). The model was evaluated on a holdout test set.
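The page does not state which metrics were computed on the holdout set. As an illustration only, the sketch below computes precision and recall at the 0.65 threshold, a common pairing for fraud data where positives are rare. The function name and the label encoding (1 for fraud, 0 otherwise) are assumptions.

```python
from typing import Sequence, Tuple


def precision_recall(labels: Sequence[int], probabilities: Sequence[float],
                     threshold: float = 0.65) -> Tuple[float, float]:
    """Precision and recall for the fraud class at a given threshold."""
    if len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must have the same length")
    tp = fp = fn = 0
    for y, p in zip(labels, probabilities):
        predicted_fraud = p >= threshold
        if predicted_fraud and y == 1:
            tp += 1
        elif predicted_fraud and y == 0:
            fp += 1
        elif not predicted_fraud and y == 1:
            fn += 1
    # Report 0.0 rather than dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Sweeping `threshold` over the holdout predictions shows the trade-off directly: a lower threshold catches more fraud (higher recall) at the cost of more false alarms (lower precision).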
Preview
Try It Yourself
Experience Risk Profiler with real data. No signup required.