
VisionSleuth AI

An intelligent weapon detection and threat analysis platform for real-time video surveillance. It pairs a fine-tuned YOLO11 model with a custom Threat Association Engine that scores weapon-person associations from spatial proximity and bounding-box overlap.

Status: MSc Thesis Project (Defense Pending) | Lead: Fatma Duran

Thesis Supervisor: Prof. Dr. Kaan Yılancıoğlu
Head of Biosecurity Master's Program, Üsküdar University
kaan.yilancioglu@uskudar.edu.tr


System Architecture

  • Detection: YOLO11n/s (fine-tuned)
  • Tracking: ByteTrack + EMA smoothing
  • Threat Analysis: Proximity + Overlap scoring
  • Backend: FastAPI 0.115 + Python 3.11
  • Frontend: Next.js 14 + Canvas API
  • Database: PostgreSQL + SQLAlchemy
  • Deployment: Render + Vercel

  • V3 (YOLO11n): 2.6M params, 5.4 MB
  • Live analysis (CPU): ~94 ms/frame
  • V4 (YOLO11s): 9.4M params, 57 MB
  • Video upload (GPU): ~226 ms/frame

Machine Learning Pipeline

Two fine-tuned YOLO11 variants are maintained. V3 (YOLO11n) is deployed for live analysis because of its low memory footprint; V4 (YOLO11s) provides higher accuracy (mAP@0.5 = 0.748) and is used for offline video processing.

Custom Dataset: VisionSleuth Dataset

  • 28,409 images across 11 threat detection classes
  • Classes: Handgun, Rifle, Knife, Blunt Weapon, Scissors, Toothbrush, Smartphone, Person + 3 others
  • Hard-negative classes (scissors, toothbrush, smartphone) included to suppress false positives
  • Test set: 770 images with full evaluation across IoU thresholds 0.50-0.95
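
For readers unfamiliar with the IoU thresholds mentioned above, the following is a minimal sketch of how Intersection over Union is computed for two axis-aligned boxes. The `(x1, y1, x2, y2)` tuple layout and the function name `iou` are assumptions made for illustration, not taken from the project code. The threshold list uses the conventional 0.05 step for mAP@0.5:0.95.

```python
from typing import Tuple

BoxT = Tuple[float, float, float, float]  # (x1, y1, x2, y2); layout is an assumption

# Conventional threshold sweep for mAP@0.5:0.95 (0.50, 0.55, ..., 0.95).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def iou(a: BoxT, b: BoxT) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the intersection rectangle (0 if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A prediction counts as a true positive at a given threshold only if its IoU with a ground-truth box of the same class meets that threshold, which is why mAP@0.5:0.95 is the stricter figure.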

Validation Metrics (V4 - YOLO11s)

  • mAP@0.5: 0.748 (mean average precision over all 11 classes)
  • mAP@0.5:0.95: ~0.52 (stricter localization standard)
  • Precision: 0.805 (80.5% of detections are correct)
  • Recall: 0.686 (68.6% of all true objects detected)

Per-Class Performance

Handgun: AP@0.5 = 0.908 — Excellent (4,777 training samples)

Knife: AP@0.5 = 0.737 — Good (4,574 training samples)

Blunt Weapon: AP@0.5 = 0.688 — Acceptable (2,960 training samples)

Rifle: AP@0.5 = 0.287 — CRITICAL (only 234 training samples) — Future work: augment data to 1,000+ samples

Threat Association Engine (v5)

Original academic contribution: instead of a binary IoU test, the engine computes a continuous threat_score that combines spatial proximity and bounding-box overlap to decide whether a weapon is associated with a person.

threat_score = weapon_weight × (α × proximity_score + β × overlap_score) × weapon_confidence

  • α (proximity weight) = 0.40
  • β (overlap weight) = 0.60 (a weapon in hand is the most certain signal)
  • proximity_score = 1.0 when the weapon is at the person's center; falls to 0 at a distance of 30% of the frame diagonal
  • overlap_score = 1.0 when weapon inside person bbox; 0.0 when no overlap
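
The formula and weights above can be sketched in Python as follows. This is an illustrative sketch, not the project's implementation: the `Box` type, the function names, the linear falloff of the proximity score, and the definition of overlap as the fraction of the weapon box lying inside the person box are all assumptions chosen to match the stated endpoints (1.0 and 0.0).

```python
import math
from dataclasses import dataclass

ALPHA = 0.40              # proximity weight (from the text)
BETA = 0.60               # overlap weight (from the text)
MAX_DIST_FRACTION = 0.30  # proximity reaches 0 at 30% of the frame diagonal


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels (hypothetical helper type)."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def center(self):
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)

    @property
    def area(self):
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def proximity_score(weapon: Box, person: Box, frame_w: float, frame_h: float) -> float:
    # ASSUMPTION: linear falloff between the two endpoints given in the text.
    wx, wy = weapon.center
    px, py = person.center
    dist = math.hypot(wx - px, wy - py)
    max_dist = MAX_DIST_FRACTION * math.hypot(frame_w, frame_h)
    return max(0.0, 1.0 - dist / max_dist)


def overlap_score(weapon: Box, person: Box) -> float:
    # ASSUMPTION: fraction of the weapon box that lies inside the person box.
    iw = min(weapon.x2, person.x2) - max(weapon.x1, person.x1)
    ih = min(weapon.y2, person.y2) - max(weapon.y1, person.y1)
    if iw <= 0 or ih <= 0 or weapon.area == 0:
        return 0.0
    return (iw * ih) / weapon.area


def threat_score(weapon: Box, person: Box, frame_w: float, frame_h: float,
                 weapon_weight: float, weapon_confidence: float) -> float:
    prox = proximity_score(weapon, person, frame_w, frame_h)
    over = overlap_score(weapon, person)
    return weapon_weight * (ALPHA * prox + BETA * over) * weapon_confidence
```

Under these assumptions, a handgun (weight 0.95) detected with confidence 0.9 at the exact center of a person box, and fully inside it, scores about 0.95 × 1.0 × 0.9 = 0.855.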

Alert Levels

  • CRITICAL: score ≥ 0.65 (red pulsing banner + audio alert)
  • WARNING: score ≥ 0.35 (orange indicator)
  • UNCONFIRMED: score > 0.05 (suppressed from UI; noise filtered)
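
The alert thresholds can be expressed as a small mapping function. This is a sketch: the function name `alert_level` and the choice to return `None` for scores at or below 0.05 are assumptions, since the page does not say how such scores are labeled.

```python
from typing import Optional


def alert_level(score: float) -> Optional[str]:
    """Map a threat score to an alert level using the thresholds listed above."""
    if score >= 0.65:
        return "CRITICAL"
    if score >= 0.35:
        return "WARNING"
    if score > 0.05:
        return "UNCONFIRMED"
    # ASSUMPTION: scores at or below 0.05 carry no alert level at all.
    return None
```

Checking the thresholds from highest to lowest keeps the branches mutually exclusive without needing explicit upper bounds.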

Weapon Severity Weights

  • 🔴 Rifle: 1.00 (highest lethality)
  • 🔴 Handgun/Gun: 0.95 (high lethality, concealable)
  • 🟠 Machete/Axe: 0.85–0.90 (high-lethality bladed)
  • 🟠 Knife/Blade: 0.80 (common street weapon)
  • 🟡 Blunt Weapon/Baseball Bat: 0.65 (severe but lower lethality)
  • 🟡 Scissors: 0.50 (lowest threat; hard-negative class)

Temporal Tracking & Confirmation

ByteTrack with EMA smoothing maintains persistent object identity across frames. The Exponential Moving Average (EMA) uses a smoothing factor α = 0.40 (distinct from the proximity weight α in the threat score), which responds quickly while dampening single-frame spikes.
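
As an illustration of the smoothing step, here is a minimal EMA sketch over a per-track sequence of scalar values. Which quantity the system actually smooths (box coordinates, confidence, or threat score) is not stated here, so the generic `values` input and the function name `ema` are assumptions.

```python
from typing import Iterable, List


def ema(values: Iterable[float], alpha: float = 0.40) -> List[float]:
    """Exponential moving average: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed: List[float] = []
    prev = None
    for x in values:
        # The first sample seeds the average; later samples are blended in.
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed
```

With α = 0.40, a single-frame spike from 0 to 1 and back is recorded as roughly 0.4 and then 0.24, so an isolated outlier is damped while a sustained change still takes effect within a few frames.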

Live Analysis Mode

  • CONFIRM_FRAMES: 1 (instant display)
  • FORGET_FRAMES: 5 (evict track after 5 missed)
  • EMA α: 0.40 (fast response)
  • Use case: Webcam real-time analysis

Video Upload Mode

  • CONFIRM_FRAMES: 2 (multi-frame confirmation)
  • FORGET_FRAMES: 5 (evict track after 5 missed)
  • EMA α: 0.40 (same smoothing)
  • Use case: Video file processing (stricter confirmation)
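
The difference between the two modes can be sketched with a small track-confirmation helper. This is an assumption-laden illustration rather than the project's code: the class name `TrackConfirmer`, the cumulative hit count (as opposed to consecutive hits), and eviction on the fifth missed frame are choices made for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class TrackState:
    hits: int = 0    # frames in which the track was seen (cumulative; an assumption)
    misses: int = 0  # consecutive frames in which the track was not seen


class TrackConfirmer:
    def __init__(self, confirm_frames: int, forget_frames: int = 5):
        self.confirm_frames = confirm_frames
        self.forget_frames = forget_frames
        self.tracks: Dict[int, TrackState] = {}

    def update(self, seen_ids: Set[int]) -> Set[int]:
        """Advance one frame and return the visible track ids that are confirmed."""
        for tid in seen_ids:
            state = self.tracks.setdefault(tid, TrackState())
            state.hits += 1
            state.misses = 0
        for tid in list(self.tracks):
            if tid not in seen_ids:
                state = self.tracks[tid]
                state.misses += 1
                if state.misses >= self.forget_frames:
                    del self.tracks[tid]  # evict after FORGET_FRAMES misses
        return {t for t in seen_ids if self.tracks[t].hits >= self.confirm_frames}


live = TrackConfirmer(confirm_frames=1)    # Live Analysis Mode
upload = TrackConfirmer(confirm_frames=2)  # Video Upload Mode
```

With these settings a new track is shown on its first frame in live mode, but only from its second frame in upload mode.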

Frontend & Deployment

Detection Overlay

  • 🔵 Blue: Person detected (no weapon)
  • 🔴 Red (6px stroke): CRITICAL — armed suspect with outer glow
  • 🟠 Orange: WARNING — potential threat
  • 🟡 Yellow: Non-threat objects
  • 🔊 Audio: Three-burst 880 Hz tone (Web Audio API)
  • ⏱️ Debounce: 4-second suppression to prevent alert fatigue
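
The 4-second debounce can be illustrated with a short sketch. In the described system this logic lives in the browser alongside the Web Audio API; the Python version below only illustrates the timing rule, and the class name `AlertDebouncer` and the caller-supplied clock are assumptions.

```python
from typing import Optional


class AlertDebouncer:
    """Allow an alert to fire at most once per cooldown window."""

    def __init__(self, cooldown_s: float = 4.0):
        self.cooldown_s = cooldown_s
        self._last_fired: Optional[float] = None

    def should_fire(self, now_s: float) -> bool:
        # Suppress the alert if the previous one fired less than cooldown_s ago.
        if self._last_fired is not None and now_s - self._last_fired < self.cooldown_s:
            return False
        self._last_fired = now_s
        return True
```

Passing the timestamp in explicitly, rather than reading a clock inside the method, keeps the rule easy to test.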

Deployment Stack

  • Backend: Render (Web Service, uvicorn, 1 worker)
  • ML Model: best_v3.pt preloaded into RAM at startup
  • Frontend: Vercel (Next.js 14 static export)
  • Database: Render PostgreSQL (optional; memory-only fallback)
  • Monitoring: Sentry (error tracking), Prometheus (metrics)
  • CI/CD: GitHub Actions (lint, audit, test, secrets scan)

Known Limitations & Future Work
