Practical principles for consent, provenance, fair compensation, and documented collection—without slowing experimentation.
Ethical sourcing starts with consent and purpose limitation. Collectors should know how data will be used, and buyers should not repurpose submissions outside the posted brief.
Provenance matters for model risk reviews. DataW projects tie submissions to tasks, timestamps, and approval history—useful when auditors ask where a slice of training data originated.
Pay fairly and predictably. Transparent fees on successful delivery align incentives: collectors are compensated for approved work, not rejected noise.
Prefer task-specific collection over scraping ambiguous public sources. Targeted projects reduce legal gray areas and improve label quality.
Document your pipeline: brief, QA steps, and export logs. A lightweight paper trail speeds enterprise security reviews.
Written by
James Okonkwo
Trust & Safety

