DataSage

GRPO-trained LoRA adapters that teach small language models to clean, enrich, and answer questions about tabular data.

3 Environments · 4 Domains · 3 LoRA Adapters · 192 Training Steps

Architecture

A pipeline of RL-trained agents, each specialized for one data operation.

Raw Data → Cleaning → Enrichment → Answering → Insight

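The stage ordering above can be sketched as plain function composition. This is a minimal illustration only: the functions `clean`, `enrich`, `answer`, and `run_pipeline` are hypothetical stand-ins, and the real agents are LoRA-adapted models acting in environments, not hand-written rules.

```python
from typing import Dict, List

Row = Dict[str, object]
Table = List[Row]


def clean(table: Table) -> Table:
    # Hypothetical stand-in for the cleaning agent: trim whitespace in string cells.
    return [
        {key: value.strip() if isinstance(value, str) else value
         for key, value in row.items()}
        for row in table
    ]


def enrich(table: Table) -> Table:
    # Hypothetical stand-in for the enrichment agent: add one derived column.
    return [{**row, "name_length": len(str(row.get("name", "")))} for row in table]


def answer(table: Table, question: str) -> str:
    # Hypothetical stand-in for the answering agent: handles one question type.
    if question == "How many rows?":
        return f"{len(table)} rows"
    return "unknown"


def run_pipeline(raw: Table, question: str) -> str:
    # Raw Data -> Cleaning -> Enrichment -> Answering -> Insight
    return answer(enrich(clean(raw)), question)


raw_data = [{"name": "  Ada "}, {"name": "Grace"}]  # illustrative input
insight = run_pipeline(raw_data, "How many rows?")
print(insight)
```

Keeping each stage behind its own function mirrors the one-agent-per-operation design: a stage can be swapped or retrained without touching the others.
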
GRPO Training Configuration

Training Curves

GRPO reward signals over 3 epochs of training, pulled from Weights & Biases.

[Charts: Reward (Cleaning, Enrichment, Answering); Component Rewards (Cleaning, Enrichment, Answering); Loss Curves (All Tasks)]
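For readers unfamiliar with how GRPO turns rewards into a training signal, here is a minimal sketch of the group-relative advantage step: each sampled completion's reward is normalized against the mean and standard deviation of its group. The function name, the `eps` value, and the use of the population standard deviation are assumptions made for illustration; the training code behind these curves may differ.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    # Normalize each reward against its own group: (r - mean) / (std + eps).
    # eps avoids division by zero when every completion gets the same reward.
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for four completions sampled from one prompt.
advantages = group_relative_advantages([0.2, 0.5, 0.8, 0.5])
print(advantages)
```

When every completion in a group receives the same reward, all advantages are zero and that group contributes no gradient signal.
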

Results Comparison

DataSage LoRA compared with the base model, plus external models benchmarked against live environments.

DataSage LoRA vs Base Qwen2.5-3B

GPT-4o-mini vs Qwen3-8B (Live Benchmarks)

Tested against live HF Space environments, March 2026.

Radar — All Models

Detailed Breakdown

Answering performance across domains, personas, and individual episodes.

Answering Reward by Domain
Answering Reward by Persona
DataSage Per-Domain Heatmap (columns: Domain, Cleaning, Enrichment, Answering)

Cleaning starts above the done threshold (DQ > 0.95), so minimal differentiation is expected. Enrichment sits at 0.20 coverage across all domains, an area for improvement.
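The threshold remark can be made concrete with a small sketch. It reads DQ as a data-quality score in [0, 1], which is an assumption; `is_done`, `fraction_done_at_start`, and the example scores are hypothetical and not taken from the environments.

```python
from typing import List

DONE_THRESHOLD = 0.95  # from the note above: DQ > 0.95 counts as done


def is_done(dq_score: float, threshold: float = DONE_THRESHOLD) -> bool:
    # An episode is treated as done once its DQ score exceeds the threshold.
    return dq_score > threshold


def fraction_done_at_start(initial_scores: List[float]) -> float:
    # Share of episodes whose starting DQ already clears the threshold.
    if not initial_scores:
        return 0.0
    return sum(is_done(score) for score in initial_scores) / len(initial_scores)


example_scores = [0.97, 0.96, 0.99, 0.93]  # made-up values for illustration
share = fraction_done_at_start(example_scores)
print(share)
```

A high share here would mean most cleaning episodes leave little room for any model to improve on the starting state.
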

Best Answering Episodes