Castform

Castform auto-generates training data for task-specific models at open-source cost
Castform is an AI‑focused platform that automates the creation of high‑quality training data from existing corpora, agent traces, and other sources. By ingesting documents, logs, and structured datasets, Castform builds Retrieval‑Augmented Generation (RAG) corpora and prepares task‑specific datasets for fine‑tuning large language models. The service includes tools for data import, filtering, deduplication, and reward‑based monitoring, as well as integrations with vector stores such as Turbopuffer, Pinecone, Chroma, and PostgreSQL. Users can run end‑to‑end pipelines that generate prompts, evaluate model performance, and ship models at open‑source cost, enabling rapid development of domain‑expert systems for legal, regulatory, and other specialized applications.
