Pricing

Every engagement starts with a free dataset or a 30-day trial.

Choose the plan that works best for you and your team. Start small and upgrade when you need to.

Usage-based Pricing
Hosted by Dataframer

Platform Subscription

Up to 20 seats.

$2,000

Per month

Input Tokens

Includes Seed Samples and Analysis.

$2

Per Million Tokens

Output Tokens

Includes generation and built-in evaluations for specified attributes, delivering higher-quality datasets.

$4

Per Million Tokens

Custom Generation Models

Offers flexibility for hyper-specific use cases.

Let's Discuss

Custom Evaluations

Offers flexibility for hyper-specific use cases.

Let's Discuss
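To show how the usage-based rates above combine, here is a minimal Python sketch of a monthly estimate. It assumes token charges are billed on top of the $2,000 platform subscription, with no volume discounts or minimums; the function and constant names are hypothetical, and actual invoices may differ.

```python
# Hypothetical monthly cost estimator for the hosted, usage-based plan.
# Assumption: token charges are added on top of the flat monthly subscription.
PLATFORM_FEE_PER_MONTH = 2_000.00  # USD, up to 20 seats
INPUT_PRICE_PER_MILLION = 2.00     # USD per million input tokens (seed samples, analysis)
OUTPUT_PRICE_PER_MILLION = 4.00    # USD per million output tokens (generation, evaluations)


def estimate_monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated monthly bill in USD for the given token volumes."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return PLATFORM_FEE_PER_MONTH + input_cost + output_cost


if __name__ == "__main__":
    # Example: 5M input tokens and 50M output tokens in one month.
    print(f"${estimate_monthly_cost(5_000_000, 50_000_000):,.2f}")  # prints "$2,210.00"
```

In this example the subscription dominates: 5M input tokens add $10 and 50M output tokens add $200.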

Yearly License Subscription
Dedicated / VPC Deployments

Startups and Small Businesses

Pricing based on your needs: seats, support, SLAs, etc.

$10,000+

Per year

Enterprise

Unlimited seats and usage. 24/7 dedicated support with SLA guarantees.

$100,000+

Per year

Enablement and Forward-deployed Engineers

$100,000+

Per year

Custom pricing

Custom pricing is available for companies with a high volume of API calls or unique business models.

Contact sales


FAQ

Frequently asked questions

What is Dataframer?
Dataframer is a synthetic data generation platform that transforms small examples of your data into safe, scalable, and realistic datasets. It lets you build, test, and deploy AI systems without exposing sensitive information.
How does Dataframer work?
Dataframer follows a 3-step process:
1. Upload seed samples: provide example data (CSV, TSV, TXT, JSONL, MD).
2. Automatic analysis: Dataframer analyzes the properties and axes of variation in your data (patterns, attributes, distributions).
3. Generate synthetic data: Dataframer creates new datasets that match the statistical properties of your originals without leaking PII/PHI.
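The analysis step is not described in detail here. As a toy illustration of what "axes of variation" can mean for structured data, the hypothetical `profile_seed_samples` function below tallies the values each field takes across JSONL seed samples; fields with a small set of recurring values are candidate categorical axes. This is a sketch under that assumption, not Dataframer's actual algorithm.

```python
import json
from collections import Counter


def profile_seed_samples(jsonl_text: str) -> dict[str, Counter]:
    """Tally the values seen for each field across JSONL seed samples.

    Toy illustration only: fields whose values repeat across samples are
    candidate categorical axes of variation.
    """
    profile: dict[str, Counter] = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for field, value in record.items():
            # Stringify values so nested or numeric values can be counted uniformly.
            profile.setdefault(field, Counter())[str(value)] += 1
    return profile


seed = (
    '{"specialty": "cardiology", "age_band": "40-49"}\n'
    '{"specialty": "oncology", "age_band": "40-49"}\n'
)
print(profile_seed_samples(seed))
```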
What formats can I upload?
You can upload CSV, TSV, TXT, JSONL, or Markdown files.
• Up to 300 files in total
• Size limit: 40 MB
• In CSV and JSONL files, each row (CSV) or line (JSONL) is treated as one sample.
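Before uploading, it can help to check how many samples a file will yield under the one-sample-per-row/line rule. The sketch below is a hypothetical helper, not part of any Dataframer tooling; it assumes the first CSV row is a header (not a sample) and that blank lines are ignored.

```python
import csv
import io
import json


def count_samples(text: str, fmt: str) -> int:
    """Count samples as one per CSV row or one per JSONL line."""
    if fmt == "csv":
        # Drop empty rows produced by blank lines.
        rows = [row for row in csv.reader(io.StringIO(text)) if row]
        # Assumption: the first row is a header, not a sample.
        return max(len(rows) - 1, 0)
    if fmt == "jsonl":
        lines = [ln for ln in text.splitlines() if ln.strip()]
        for ln in lines:
            json.loads(ln)  # raises json.JSONDecodeError on a malformed line
        return len(lines)
    raise ValueError(f"unsupported format: {fmt}")


print(count_samples("name,age\nAda,36\nAlan,41\n", "csv"))   # prints 2
print(count_samples('{"a": 1}\n\n{"a": 2}\n', "jsonl"))      # prints 2
```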
Do I need a lot of data to get started?
No. Even a handful of representative seed samples is enough for Dataframer to learn the structure and generate larger, balanced datasets.
How is Dataframer different from anonymization or masking?
Anonymization removes identifiers from real data, but risks re-identification. Dataframer creates entirely new synthetic records that preserve statistical accuracy without exposing original sensitive values.
Can I use Dataframer for compliance-heavy industries like healthcare or finance?
Yes. Dataframer was designed with privacy, fairness, and compliance in mind. Enterprises in healthcare (HIPAA) and finance (SEC, GDPR), as well as government organizations, can safely train and test AI systems on synthetic data.
What are common use cases?
• Healthcare: Synthetic EMRs for model training without PHI.
• Finance & Insurance: Fraud detection, AML, KYC, fair lending.
• Conversational AI: Multi-turn chatbot training and edge-case testing.
• Market Research: Synthetic survey panels and digital twins.
• Traditional ML: Classification, anomaly detection, recommendations.
How does Dataframer handle long-form text?
For text extraction and NLP tasks, Dataframer uses a long-sample generation algorithm that creates realistic, complex documents (e.g., contracts, medical notes, research papers) to stress-test extraction models.
Can I control the output?
Yes. Dataframer gives you control over:
• The axes of variation (e.g., demographics, time, categories).
• The size of the generated dataset.
• The generation algorithm (short-form vs. long-form).
How does Dataframer ensure quality?
Generated datasets are validated against the statistical properties of the seed data. Bias detection, drift monitoring, and fairness checks can be built into your workflows.
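The exact checks Dataframer runs are not specified here. As one illustration of validating generated data against seed statistics, the sketch below uses a standard, swapped-in measure, total variation distance (TVD), to compare the category frequencies of a single attribute. The function name is hypothetical, and this is not necessarily the metric Dataframer uses.

```python
from collections import Counter


def total_variation_distance(seed_values, generated_values) -> float:
    """Return the TVD between the empirical distributions of one attribute.

    0.0 means identical category frequencies; 1.0 means no overlap at all.
    """
    seed_counts = Counter(seed_values)
    gen_counts = Counter(generated_values)
    n_seed = sum(seed_counts.values())
    n_gen = sum(gen_counts.values())
    if n_seed == 0 or n_gen == 0:
        raise ValueError("both samples must be non-empty")
    categories = set(seed_counts) | set(gen_counts)
    # Half the L1 distance between the two frequency distributions.
    return 0.5 * sum(
        abs(seed_counts[c] / n_seed - gen_counts[c] / n_gen) for c in categories
    )


seed = ["approved", "approved", "denied", "denied"]
generated = ["approved", "denied", "denied", "denied"]
print(total_variation_distance(seed, generated))  # prints 0.25
```

A low TVD per attribute is a quick sanity check; it does not capture correlations between attributes.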
What's the ROI of using Dataframer?
• Save time: Cut data preparation cycles from months to weeks.
• Reduce cost: Avoid expensive manual collection/annotation.
• De-risk compliance: Train AI safely without exposing sensitive data.
How can I deploy Dataframer?
Dataframer offers flexible deployment options:
• Hosted: Use Dataframer's managed cloud service for quick setup and maintenance-free operation.
• Dedicated / on-premise: We deploy Dataframer on Kubernetes in your own environment, on any major cloud platform (AWS, Azure, GCP) or on custom cloud infrastructure, for enhanced security and control.