DataSrc

The rise of LLMs has created a new market for high-quality training data. Data must be annotated before it can be used to train AI models. Today, data annotators either work full-time (for very high pay) or volunteer their time (for no pay). Neither is a good solution: there is no middle ground where people can contribute annotations as a side gig.

Introducing DataSrc

DataSrc is a decentralized platform for data annotation.

Anybody can upload data for annotation, and anybody can annotate. Payments are handled with NEAR and Avalanche.

How it works

When uploading data (currently limited to images) for annotation, the uploader pays for the annotation and the job is added to the MongoDB database.

When annotating, the annotator is shown a randomly selected file (an image). Once a smart contract message approves the annotation, it is stored on the blockchain as an NFT, and the annotator is paid in cryptocurrency.
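The flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the in-memory `JobStore` stands in for the MongoDB database, the `approve` callback stands in for the smart contract message, and NFT minting and on-chain payment are reduced to recording who was paid. All class, field, and function names here are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Job:
    """One uploaded image awaiting annotation (hypothetical schema)."""
    job_id: int
    image_uri: str
    payment: float                    # amount the uploader paid up front
    annotation: Optional[str] = None  # set once an annotation is approved
    paid_to: Optional[str] = None     # annotator account that was paid


@dataclass
class JobStore:
    """In-memory stand-in for the MongoDB job collection (assumption)."""
    jobs: Dict[int, Job] = field(default_factory=dict)

    def upload(self, image_uri: str, payment: float) -> Job:
        # The uploader pays when the job is created.
        if payment <= 0:
            raise ValueError("the uploader must pay for the annotation")
        job = Job(job_id=len(self.jobs) + 1, image_uri=image_uri, payment=payment)
        self.jobs[job.job_id] = job
        return job

    def random_open_job(self, rng: random.Random) -> Optional[Job]:
        # Pick a random job that has not been annotated yet.
        open_jobs = [j for j in self.jobs.values() if j.annotation is None]
        return rng.choice(open_jobs) if open_jobs else None

    def submit(self, job: Job, annotator: str, label: str,
               approve: Callable[[Job, str], bool]) -> bool:
        # `approve` is a placeholder for the smart contract approval step.
        if job.annotation is not None or not approve(job, label):
            return False
        job.annotation = label    # in DataSrc this would be stored as an NFT
        job.paid_to = annotator   # in DataSrc this would be a crypto payment
        return True


# Example usage with a trivial approval rule (any non-empty label passes).
store = JobStore()
store.upload("cat.png", payment=0.5)
job = store.random_open_job(random.Random(0))
accepted = store.submit(job, "alice.testnet", "cat", approve=lambda j, label: bool(label))
```

Keeping approval behind a callback makes it easy to swap the placeholder rule for a real contract call later without changing the job bookkeeping.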

Next steps

  • Get the project ready to deploy (connect to mainnet instead of testnet).
  • Work with existing data annotators to understand what tools and features could help them with their job.
  • Reach out to different AI companies about the potential to use DataSrc to outsource data annotations.
  • Improve the API so that companies can connect DataSrc directly to their existing tech stack.
  • Improve the annotation logic (collect annotations from multiple people and infer the “correct” annotation).
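
One simple way to approach the last item is a majority vote over the labels that several annotators give the same image. The sketch below is only an illustration of that idea, not a planned design: the `infer_label` function and its `min_agreement` threshold are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def infer_label(labels: Sequence[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the most common label if its share of votes exceeds
    `min_agreement`; otherwise return None (no consensus)."""
    if not labels:
        return None
    top, count = Counter(labels).most_common(1)[0]
    return top if count / len(labels) > min_agreement else None
```

For example, `infer_label(["cat", "cat", "dog"])` returns `"cat"`, while `infer_label(["cat", "dog"])` returns `None` because neither label has more than half of the votes. Images without consensus could be sent to additional annotators.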