Contributors are quietly pasting ChatGPT output into the “human” answers you sell to AI labs. Sorsh catches LLM contamination and low-effort work in your RLHF, eval, and annotation pipelines, using behavioral telemetry that is hard to fake rather than a text detector anyone can beat.
The platform
Gold sets and inter-annotator agreement cannot catch a worker secretly using an LLM, because LLM output scores high on exactly those metrics. Sorsh watches how the work was produced instead.
Paste ratio, tab-out-then-paste, typing cadence, time-vs-length, edit churn. The hard-to-fake fingerprint of real work.
Length vs. what the task asked for, information density, gold coverage. Flags filler that passes traditional QA.
An optional content detector running on Groq, layered into the ensemble. It corroborates, never decides alone, because text detection is the least reliable signal.
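To make the behavioral signals concrete, here is a minimal sketch of how three of them (paste ratio, tab-out-then-paste, typing cadence) could be derived from a content-free event log. The `Event` schema, the `behavioral_features` function, and the 3-second refocus window are all assumptions made for illustration, not Sorsh's actual SDK format or thresholds.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Event:
    # Hypothetical schema: timing and character counts only, never the text.
    kind: str       # "key", "paste", "blur", or "focus"
    t_ms: int       # milliseconds since the task was opened
    chars: int = 0  # characters added by this event


def behavioral_features(events, refocus_window_ms=3000):
    """Derive a few behavioral features from a chronological event list."""
    typed = sum(e.chars for e in events if e.kind == "key")
    pasted = sum(e.chars for e in events if e.kind == "paste")
    total = typed + pasted

    # Typing cadence: median gap between consecutive keystrokes.
    key_times = [e.t_ms for e in events if e.kind == "key"]
    gaps = [b - a for a, b in zip(key_times, key_times[1:])]

    # Tab-out-then-paste: a paste shortly after returning from another window.
    tab_out_pastes = 0
    away = False
    last_refocus = None
    for e in events:
        if e.kind == "blur":
            away = True
        elif e.kind == "focus" and away:
            away = False
            last_refocus = e.t_ms
        elif e.kind == "paste" and last_refocus is not None:
            if e.t_ms - last_refocus <= refocus_window_ms:
                tab_out_pastes += 1
            last_refocus = None  # count each return to the tab at most once

    return {
        "paste_ratio": pasted / total if total else 0.0,
        "tab_out_then_paste": tab_out_pastes,
        "median_key_gap_ms": median(gaps) if gaps else None,
    }


# Three keystrokes, a trip to another window, then a 97-character paste.
session = [
    Event("key", 0, 1),
    Event("key", 200, 1),
    Event("key", 400, 1),
    Event("blur", 1000),
    Event("focus", 9000),
    Event("paste", 9500, 97),
]
features = behavioral_features(session)
```

In this made-up session almost all of the text arrives in one paste right after the worker returns to the tab, which is the pattern the behavioral signal is meant to surface.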
How it works
Drop the SDK into any labeling tool. Sorsh scores every submission and surfaces a ranked queue.
A 2KB SDK captures keystroke, paste, and focus timing, never the content.
A three-signal ensemble returns a 0 to 100 trust score, with the reasons spelled out.
Flag contaminated batches before they ever reach the client.
One ranked queue: who to retrain, who to remove, whose payment to hold.
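As a rough illustration of the scoring and queue steps, the sketch below combines three risk estimates into a 0 to 100 trust score with reasons, then sorts submissions so the least trusted come first. The weights, the 0.5 flag threshold, and the function names `trust_score` and `ranked_queue` are hypothetical. The rule that the content detector only counts when another signal is already elevated is one possible way to encode "corroborates, never decides alone", not a description of Sorsh's real ensemble.

```python
def trust_score(behavioral, effort, content=None, flag_at=0.5):
    """Combine risk estimates in [0, 1] into (trust 0-100, reasons).

    All weights and the flag threshold are illustrative assumptions.
    """
    weights = {"behavioral": 0.6, "effort": 0.4}
    risks = {"behavioral": behavioral, "effort": effort}

    # The content detector only corroborates: it is included only when
    # the behavioral or effort signal is already elevated.
    if content is not None and max(behavioral, effort) >= flag_at:
        weights = {"behavioral": 0.5, "effort": 0.3, "content": 0.2}
        risks["content"] = content

    risk = sum(weights[name] * risks[name] for name in weights)
    reasons = [
        f"{name} risk {risks[name]:.2f}"
        for name in weights
        if risks[name] >= flag_at
    ]
    return round(100 * (1 - risk)), reasons


def ranked_queue(submissions):
    """Return (worker, trust, reasons) tuples, least trusted first."""
    scored = []
    for worker, signals in submissions.items():
        trust, reasons = trust_score(**signals)
        scored.append((worker, trust, reasons))
    return sorted(scored, key=lambda row: row[1])


queue = ranked_queue({
    "worker_a": {"behavioral": 0.1, "effort": 0.1, "content": 0.95},
    "worker_b": {"behavioral": 0.9, "effort": 0.6, "content": 0.8},
})
```

Note that `worker_a` keeps a high trust score despite a high content-detector reading, because neither of the other two signals supports it.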
Sorsh is not just a dashboard. The goal is to become the trust rail for the global AI workforce, owning the cross-vendor reputation graph that no single labeling shop can replicate.
Join the waitlist for a trust audit on a sample of your real data, or kick the tires in the playground right now.