GRPO-trained LoRA adapters that teach small language models to clean, enrich, and answer questions about tabular data.
A pipeline of RL-trained agents, each specialized for one data operation.
GRPO reward signals over 3 epochs of training, pulled from Weights & Biases.
DataSage LoRA vs base model and external benchmarks against live environments.
Tested against live HF Space environments, March 2026.
Answering performance across domains, personas, and individual episodes.
| Domain | Cleaning | Enrichment | Answering |
|---|---|---|---|
Cleaning starts above the done threshold (DQ > 0.95), so minimal differentiation is expected. Enrichment sits at 0.20 coverage across all domains, which is an area for improvement.
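To illustrate why a cleaning episode that starts above the done threshold leaves little room to tell models apart, here is a minimal sketch. It assumes DQ is a data-quality score in [0, 1] and that an episode counts as done once DQ exceeds 0.95. The function names and the sample scores are hypothetical and are not taken from the DataSage code.

```python
# Assumption: DQ is a data-quality score in [0, 1]; the 0.95 threshold comes from the note above.
DONE_THRESHOLD = 0.95


def is_done(dq: float, threshold: float = DONE_THRESHOLD) -> bool:
    """Return True when the DQ score is strictly above the done threshold."""
    return dq > threshold


def cleaning_headroom(initial_dq: float, threshold: float = DONE_THRESHOLD) -> float:
    """Return how far the score can still rise before reaching the threshold."""
    return max(0.0, threshold - initial_dq)


if __name__ == "__main__":
    # Illustrative starting scores only, not measured results.
    for initial_dq in (0.80, 0.97):
        print(initial_dq, is_done(initial_dq), round(cleaning_headroom(initial_dq), 2))
```

Under these assumptions, an episode that begins at 0.97 is already done and has zero headroom, so a trained adapter and a base model would end up with the same outcome on it.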
Source code, trained adapters, and live environments.