Is your human data actually human?

Contributors are quietly pasting ChatGPT output into the “human” answers you sell to AI labs. Sorsh catches LLM contamination and low-effort work in your RLHF, eval, and annotation pipelines using behavioral telemetry that is hard to fake, not a text detector anyone can beat.

The platform

Telemetry-first, not text detection

Gold sets and inter-annotator agreement cannot catch a worker who is secretly using an LLM, because LLM output scores high on exactly those metrics. Sorsh watches how the work was produced instead.

Behavioral signals

Paste ratio, tab-out-then-paste, typing cadence, time-vs-length, edit churn. The hard-to-fake fingerprint of real work.
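To make these signals concrete, here is a minimal TypeScript sketch of how they could be computed from a timing-only event log. It is illustrative, not Sorsh's implementation: the `TelemetryEvent` shape, the 2-second tab-out window, and definitions such as edit churn as deletions per typed character are all assumptions.

```typescript
// Hypothetical telemetry event: timing and sizes only, never the typed text.
type TelemetryEvent =
  | { kind: "insert"; t: number } // one typed character
  | { kind: "delete"; t: number } // one backspace or delete
  | { kind: "paste"; t: number; chars: number } // size of the pasted text
  | { kind: "blur"; t: number } // editor lost focus (tab-out)
  | { kind: "focus"; t: number }; // editor gained focus

interface BehaviorFeatures {
  pasteRatio: number; // share of the final answer that arrived via paste
  tabOutThenPaste: number; // pastes within windowMs of returning from a tab-out
  meanInterKeyMs: number; // typing cadence: mean gap between typed characters
  cadenceCv: number; // coefficient of variation of those gaps
  charsPerMinute: number; // time vs. length
  editChurn: number; // deletions per typed character
}

function behaviorFeatures(
  events: TelemetryEvent[], // assumed sorted by timestamp t (ms)
  finalLength: number, // character count of the submitted answer
  windowMs = 2000, // assumed tab-out-then-paste window
): BehaviorFeatures {
  let typed = 0;
  let deleted = 0;
  let pasted = 0;
  let tabPastes = 0;
  let blurred = false;
  let lastRefocus = -Infinity;
  let lastKey: number | null = null;
  const gaps: number[] = [];

  for (const e of events) {
    switch (e.kind) {
      case "insert":
        typed++;
        if (lastKey !== null) gaps.push(e.t - lastKey);
        lastKey = e.t;
        break;
      case "delete":
        deleted++;
        break;
      case "paste":
        pasted += e.chars;
        if (e.t - lastRefocus <= windowMs) tabPastes++;
        break;
      case "blur":
        blurred = true;
        break;
      case "focus":
        if (blurred) lastRefocus = e.t; // only a return from a tab-out counts
        blurred = false;
        break;
    }
  }

  const mean = gaps.length > 0 ? gaps.reduce((a, b) => a + b, 0) / gaps.length : 0;
  const variance = gaps.length > 0 ? gaps.reduce((a, g) => a + (g - mean) ** 2, 0) / gaps.length : 0;
  const spanMs = events.length > 0 ? events[events.length - 1].t - events[0].t : 0;

  return {
    pasteRatio: finalLength > 0 ? Math.min(1, pasted / finalLength) : 0,
    tabOutThenPaste: tabPastes,
    meanInterKeyMs: mean,
    cadenceCv: mean > 0 ? Math.sqrt(variance) / mean : 0,
    charsPerMinute: spanMs > 0 ? finalLength / (spanMs / 60_000) : Infinity,
    editChurn: typed > 0 ? deleted / typed : 0,
  };
}
```

On a submission where 97 of 100 characters were pasted half a second after the worker returned from another tab, this yields a paste ratio of 0.97 and one tab-out-then-paste event, no matter how natural the text itself reads.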

Quality and effort

Length vs. what the task asked for, information density, gold coverage. Flags filler that passes traditional QA.
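A rough sketch of these effort checks, under stated assumptions: length is measured in words against a hypothetical `expectedWords` target, information density is approximated by the type-token ratio (distinct words over total words), gold coverage is a simple case-insensitive phrase match, and the 1.5x and 0.5 cutoffs are placeholders rather than Sorsh's real thresholds.

```typescript
interface QualityReport {
  lengthRatio: number; // words written / words the task asked for
  density: number; // distinct words / total words (type-token ratio)
  goldCoverage: number; // share of gold key points mentioned
  filler: boolean; // long, and either repetitive or thin on gold content
}

// Lowercase word tokens; a deliberately simple tokenizer for illustration.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
}

function qualityReport(answer: string, expectedWords: number, goldPoints: string[]): QualityReport {
  const words = tokenize(answer);
  const lengthRatio = expectedWords > 0 ? words.length / expectedWords : 0;
  const density = words.length > 0 ? new Set(words).size / words.length : 0;

  // Gold coverage: how many reference key points appear in the answer.
  const lower = answer.toLowerCase();
  const hits = goldPoints.filter((p) => lower.includes(p.toLowerCase())).length;
  const goldCoverage = goldPoints.length > 0 ? hits / goldPoints.length : 1;

  // Assumed filler rule: padded well past the ask, with little new information.
  const filler = lengthRatio > 1.5 && (density < 0.5 || goldCoverage < 0.5);
  return { lengthRatio, density, goldCoverage, filler };
}
```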

LLM as judge

An optional LLM-based content detector, served on Groq and layered into the ensemble. It corroborates the other signals but never decides alone, because text detection is the least reliable signal.
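One way to encode “corroborates, never decides alone” is to let the judge lower a trust score only when a behavioral or quality signal has already raised a flag, and to cap how far it can move the score. This is a hedged sketch: `applyJudge`, the 15-point cap, and the assumption that the judge returns a 0 to 1 probability of LLM authorship are illustrative, and the actual Groq request is left out.

```typescript
// Hypothetical combination rule for the optional LLM judge.
// judgeLlmProb is the judge's estimated probability (0 to 1) that the text is
// LLM-generated; how that number is obtained is out of scope here.
function applyJudge(
  baseTrust: number, // 0 to 100 score from behavioral and quality signals
  otherSignalsSuspicious: boolean, // did any non-text signal already raise a flag?
  judgeLlmProb: number,
  maxPenalty = 15, // assumed cap on how far the judge can move the score
): number {
  // Corroboration only: with no other flag, the judge changes nothing.
  if (!otherSignalsSuspicious) return baseTrust;
  const p = Math.min(1, Math.max(0, judgeLlmProb)); // clamp to [0, 1]
  return Math.max(0, baseTrust - maxPenalty * p);
}
```

A confident judge on an otherwise clean submission leaves the score untouched; the same verdict on a submission already flagged for heavy pasting pushes it further down.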

How it works

Keystroke to action, one pipeline

Drop the SDK into any labeling tool. Sorsh scores every submission and surfaces a ranked queue.

01

Instrument

A 2KB SDK captures keystroke, paste, and focus timing, never the content.

02

Score

A three-signal ensemble returns a 0 to 100 trust score, with the reasons spelled out.

03

Gate

Flag contaminated batches before they ever reach the client.

04

Act

One ranked queue: who to retrain, who to remove, whose payment to hold.
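The Instrument step's privacy property, capturing timing without content, can be illustrated with a small recorder that classifies each input event and then discards the raw input. This is a hypothetical sketch, not the Sorsh SDK: the `TimingRecorder` class and its methods are invented for illustration, and the browser wiring appears only in comments, using standard DOM events (`keydown`, `paste`, `blur`, `focus`).

```typescript
// Hypothetical wire format: event kind, timestamp, and (for pastes) a size.
type SdkEvent =
  | { kind: "insert" | "delete" | "blur" | "focus"; t: number }
  | { kind: "paste"; t: number; chars: number };

// Browser wiring (sketch, not executed here):
//   el.addEventListener("keydown", (e) => rec.onKey(e.key, e.timeStamp));
//   el.addEventListener("paste", (e) =>
//     rec.onPaste(e.clipboardData?.getData("text/plain") ?? "", e.timeStamp));
//   el.addEventListener("blur", (e) => rec.onBlur(e.timeStamp));
//   el.addEventListener("focus", (e) => rec.onFocus(e.timeStamp));
class TimingRecorder {
  private readonly events: SdkEvent[] = [];

  // `key` is a KeyboardEvent.key value; it is classified, then discarded.
  onKey(key: string, t: number): void {
    if (key === "Backspace" || key === "Delete") this.events.push({ kind: "delete", t });
    else if (key.length === 1) this.events.push({ kind: "insert", t }); // printable character
    // Modifier and navigation keys (Shift, ArrowLeft, ...) are ignored.
  }

  // The pasted text is read only to measure its length; it is never stored.
  onPaste(pastedText: string, t: number): void {
    this.events.push({ kind: "paste", t, chars: pastedText.length });
  }

  onBlur(t: number): void {
    this.events.push({ kind: "blur", t });
  }

  onFocus(t: number): void {
    this.events.push({ kind: "focus", t });
  }

  // Serialize the buffered events for upload and clear the buffer.
  flush(): string {
    const payload = JSON.stringify(this.events);
    this.events.length = 0;
    return payload;
  }
}
```

The flushed payload carries event kinds, timestamps, and paste sizes; in this sketch the pasted string and the individual keys never leave the page.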
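For the Score step, a three-signal ensemble could be as simple as a weighted average of per-signal risk, converted to a 0 to 100 trust score and annotated with the evidence from any signal that looks risky. The `SignalScore` shape, the weights (behavior 0.5, quality 0.35, judge 0.15), and the 0.5 reporting cutoff are assumptions; the judge gets the smallest weight because text detection is the least reliable signal, and weights are renormalized because the judge is optional.

```typescript
// Hypothetical per-signal output: each family reports a risk plus its evidence.
interface SignalScore {
  family: "behavior" | "quality" | "judge";
  risk: number; // 0 = looks like genuine human work, 1 = almost certainly contaminated
  reasons: string[]; // human-readable evidence, e.g. "97% of text pasted"
}

// Assumed weights: the text judge gets the smallest share.
const ENSEMBLE_WEIGHTS: Record<SignalScore["family"], number> = {
  behavior: 0.5,
  quality: 0.35,
  judge: 0.15,
};

function trustScore(signals: SignalScore[]): { trust: number; reasons: string[] } {
  let weighted = 0;
  let totalWeight = 0;
  const reasons: string[] = [];
  for (const s of signals) {
    const w = ENSEMBLE_WEIGHTS[s.family];
    const r = Math.min(1, Math.max(0, s.risk)); // clamp to [0, 1]
    weighted += w * r;
    totalWeight += w;
    // Only spell out evidence from signals that actually look risky.
    if (r >= 0.5) reasons.push(...s.reasons.map((x) => `${s.family}: ${x}`));
  }
  // Renormalize over the signals present, since the judge is optional.
  const risk = totalWeight > 0 ? weighted / totalWeight : 0;
  return { trust: Math.round(100 * (1 - risk)), reasons };
}
```

For example, a behavioral risk of 0.9 and a quality risk of 0.2 with no judge yields a trust score of 39, with only the behavioral evidence listed as the reason.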
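The Gate step can be sketched as a batch-level check: hold the batch if too large a share of its submissions falls below a trust cutoff. The `gateBatch` helper, the cutoff of 50, and the 5% tolerance are hypothetical defaults, not values from Sorsh.

```typescript
interface GateResult {
  pass: boolean; // true if the batch can ship to the client
  suspectShare: number; // fraction of submissions below the cutoff
  suspectIds: string[]; // submissions to pull or review before delivery
}

// Hypothetical batch gate over per-submission trust scores (0 to 100).
function gateBatch(
  scores: { id: string; trust: number }[],
  suspectBelow = 50, // assumed per-submission trust cutoff
  maxSuspectShare = 0.05, // assumed tolerance before the whole batch is held
): GateResult {
  const suspectIds = scores.filter((s) => s.trust < suspectBelow).map((s) => s.id);
  const suspectShare = scores.length > 0 ? suspectIds.length / scores.length : 0;
  return { pass: suspectShare <= maxSuspectShare, suspectShare, suspectIds };
}
```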
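The Act step's ranked queue could be built by averaging trust per worker, mapping that average to an action, and listing the riskiest workers first. The thresholds (below 30 remove, below 50 hold payment, below 70 retrain) and the `buildQueue` helper are illustrative assumptions.

```typescript
type WorkerAction = "remove" | "hold_payment" | "retrain" | "none";

interface QueueEntry {
  workerId: string;
  meanTrust: number; // average trust across this worker's scored submissions
  submissions: number;
  action: WorkerAction;
}

// Assumed thresholds on a worker's mean trust score (0 to 100).
function actionFor(meanTrust: number): WorkerAction {
  if (meanTrust < 30) return "remove";
  if (meanTrust < 50) return "hold_payment";
  if (meanTrust < 70) return "retrain";
  return "none";
}

function buildQueue(scored: { workerId: string; trust: number }[]): QueueEntry[] {
  // Group submission trust scores by worker.
  const byWorker = new Map<string, number[]>();
  for (const s of scored) {
    const list = byWorker.get(s.workerId) ?? [];
    list.push(s.trust);
    byWorker.set(s.workerId, list);
  }

  const queue: QueueEntry[] = [];
  byWorker.forEach((trusts, workerId) => {
    const meanTrust = trusts.reduce((a, b) => a + b, 0) / trusts.length;
    queue.push({ workerId, meanTrust, submissions: trusts.length, action: actionFor(meanTrust) });
  });

  // Keep only workers who need action, lowest trust first.
  return queue.filter((e) => e.action !== "none").sort((a, b) => a.meanTrust - b.meanTrust);
}
```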


Sorsh is not another dashboard. It is built to become the trust rail for the global AI workforce, owning a cross-vendor reputation graph that no single labeling shop can replicate.

See it on your own pipeline.

Join the waitlist for a trust audit on a sample of your real data, or kick the tires in the playground right now.