Awarri
AI Platform
Data curation and AI training platform developing localized datasets and training pipelines for West African language models.
Introduction
Awarri is an artificial intelligence data startup based in Lagos, Nigeria. It builds high-quality training datasets for machine learning models. Awarri focuses on collecting, cleaning, and labeling voice and text data for West African languages.
Main Features
Awarri provides end-to-end data services for AI development, from data collection and annotation to quality validation. Each offering is described below.
Audio Data Collection
The Audio Data Collection service gathers thousands of hours of high-quality spoken audio in local dialects and accents. Awarri recruits native speakers across Nigeria to record sentences, conversations, and domain-specific vocabulary in controlled conditions. Each recording is checked for audio quality, background noise levels, and pronunciation clarity. The audio files are delivered in standard formats with time-stamped transcriptions. This data is essential for training voice recognition engines and speech-to-text models that need to understand African accents accurately.
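The recording checks described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Awarri's actual pipeline: it assumes each recording arrives as a list of float samples plus a separate noise-only stretch (for example, silence before the speaker begins), estimates the signal-to-noise ratio as 20 times the base-10 log of the RMS ratio, and checks that time-stamped transcription segments are ordered and fall inside the clip. The `Segment` type, the `passes_checks` function, and the 20 dB threshold are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """One time-stamped transcription segment (hypothetical delivery format)."""
    start_s: float
    end_s: float
    text: str


def rms(samples: list[float]) -> float:
    """Root-mean-square amplitude of a list of samples."""
    if not samples:
        raise ValueError("empty sample list")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def snr_db(speech: list[float], noise: list[float]) -> float:
    """Estimate signal-to-noise ratio in decibels from speech and noise-only samples."""
    noise_rms = rms(noise)
    if noise_rms == 0:
        return math.inf  # perfectly silent noise floor
    return 20 * math.log10(rms(speech) / noise_rms)


def passes_checks(
    speech: list[float],
    noise: list[float],
    segments: list[Segment],
    duration_s: float,
    min_snr_db: float = 20.0,  # assumed threshold, not a documented Awarri value
) -> bool:
    """Reject noisy clips and malformed timestamps."""
    if snr_db(speech, noise) < min_snr_db:
        return False
    prev_end = 0.0
    for seg in segments:
        # Segments must be non-empty, in order, non-overlapping, and inside the clip.
        if seg.end_s <= seg.start_s or seg.start_s < prev_end or seg.end_s > duration_s:
            return False
        prev_end = seg.end_s
    return True


speech = [0.5, -0.5] * 100   # RMS 0.5
noise = [0.01, -0.01] * 100  # RMS 0.01
segments = [Segment(0.0, 1.2, "Good morning"), Segment(1.4, 2.0, "How are you?")]
print(round(snr_db(speech, noise), 1))                          # 34.0
print(passes_checks(speech, noise, segments, duration_s=2.5))  # True
```

Real pipelines would typically use voice-activity detection to separate speech from background noise rather than assuming a labeled noise-only stretch.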
Text Translation and Curation
The Text Translation and Curation service translates and verifies written text across regional languages. Professional translators who are native speakers translate source text into target African languages. Each translation goes through a three-step review process: initial translation, back-translation verification, and quality scoring by senior linguists. This process ensures that translations preserve meaning, cultural nuance, and grammatical correctness. The curated text datasets are used to train machine translation models and multilingual chatbots.
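The back-translation step lends itself to a simple automatic screen before a linguist scores the pair. This sketch is an assumption, not Awarri's documented method: it compares the source sentence with its back-translation using whitespace-token F1 (a crude stand-in for established metrics such as BLEU or chrF) and accepts a pair only if both that score and the senior linguist's rating clear hypothetical thresholds. The `TranslationPair` fields, the 1 to 5 rating scale, and both thresholds are illustrative.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TranslationPair:
    source: str            # original sentence
    translation: str       # step 1: native-speaker translation into the target language
    back_translation: str  # step 2: independent translation back into the source language
    linguist_score: int    # step 3: senior-linguist rating (hypothetical 1-5 scale)


def token_f1(reference: str, hypothesis: str) -> float:
    """Case-insensitive whitespace-token F1 overlap between two sentences."""
    ref = Counter(reference.lower().split())
    hyp = Counter(hypothesis.lower().split())
    overlap = sum((ref & hyp).values())  # multiset intersection of tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def accept(pair: TranslationPair, min_f1: float = 0.6, min_score: int = 4) -> bool:
    """Keep a pair only if the back-translation preserves meaning and the linguist approves."""
    return (
        token_f1(pair.source, pair.back_translation) >= min_f1
        and pair.linguist_score >= min_score
    )


pair = TranslationPair(
    source="The market opens at seven",
    translation="(target-language text)",  # placeholder, not a real translation
    back_translation="The market opens at 7",
    linguist_score=5,
)
print(round(token_f1(pair.source, pair.back_translation), 2))  # 0.8
print(accept(pair))                                            # True
```

Token overlap misses paraphrases ("seven" versus "7" above), which is one reason a human scoring step remains in the loop.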
Image and Video Annotation
The Image and Video Annotation service labels images and video to train computer vision models. Annotators draw bounding boxes around objects, classify scenes, tag facial expressions, and mark product categories. Awarri specializes in annotations that reflect African contexts, such as local vehicle types, food items, clothing styles, and agricultural products. The annotation team follows strict labeling guidelines to keep labels consistent across thousands of images.
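Consistency between annotators is commonly measured with intersection over union (IoU) on bounding boxes. The sketch below assumes that approach, which the text does not specify: two annotators' boxes for the same object agree if they carry the same class label and overlap with an IoU at or above a hypothetical 0.7 threshold. Boxes are `(x_min, y_min, x_max, y_max)` pixel tuples, and the label `danfo_bus` is an illustrative example of a local vehicle class.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def annotations_agree(
    label_a: str, box_a: Box, label_b: str, box_b: Box, min_iou: float = 0.7
) -> bool:
    """Two annotations of the same object agree if labels match and boxes overlap enough."""
    return label_a == label_b and iou(box_a, box_b) >= min_iou


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.3333333333333333
print(annotations_agree("danfo_bus", (10, 20, 110, 80), "danfo_bus", (12, 22, 108, 82)))  # True
```

Pairs that fail the check would be routed back for adjudication against the labeling guidelines.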
Dataset Validation
The Dataset Validation service reviews existing datasets to remove bias and ensure high accuracy. Many publicly available datasets contain errors, duplicates, or cultural biases that reduce model performance. Awarri's data scientists audit the dataset for label accuracy, class imbalance, demographic representation, and edge case coverage. They provide a detailed validation report with specific recommendations for improvement. This service is valuable for AI teams that have collected their own data but want an independent quality check before training their models.
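Some of the audit checks listed above, namely duplicates, label accuracy, and class imbalance, can be partly automated before human review. This is a minimal sketch under its own assumptions: it treats a dataset as `(text, label)` pairs, counts duplicates after case and whitespace normalization, flags identical texts that carry conflicting labels, and reports the ratio of the largest class to the smallest. The function name and report fields are hypothetical; demographic representation and edge-case coverage need metadata and expert judgment that this sketch does not model.

```python
from collections import Counter, defaultdict


def normalize(text: str) -> str:
    """Case- and whitespace-insensitive key used to detect duplicates."""
    return " ".join(text.lower().split())


def validation_report(records: list[tuple[str, str]]) -> dict:
    """Summarize duplicates, conflicting labels, and class imbalance in (text, label) pairs."""
    if not records:
        raise ValueError("dataset is empty")
    labels_by_text: dict[str, set[str]] = defaultdict(set)
    duplicates = 0
    for text, label in records:
        key = normalize(text)
        if key in labels_by_text:
            duplicates += 1  # this text has been seen before
        labels_by_text[key].add(label)
    # The same text with more than one label points to a labeling error.
    conflicts = sorted(k for k, labels in labels_by_text.items() if len(labels) > 1)
    counts = Counter(label for _, label in records)
    return {
        "duplicates": duplicates,
        "conflicting_labels": conflicts,
        "class_counts": dict(counts),
        "imbalance_ratio": max(counts.values()) / min(counts.values()),
    }


data = [
    ("Good morning", "greeting"),
    ("good  morning", "greeting"),
    ("How much is this?", "question"),
    ("Thank you", "thanks"),
    ("Thank you", "greeting"),
]
print(validation_report(data))
```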
Local Speaker Networks
The Local Speaker Networks service connects AI projects with fluent speakers of over 50 African languages. Awarri maintains a verified pool of thousands of speakers across Nigeria and West Africa, organized by language, dialect, age group, and location. When a tech company needs voice samples in Hausa from Northern Nigeria or text corrections in Igbo from Eastern Nigeria, Awarri can mobilize the right speakers quickly. This network approach supports authentic, representative data collection at scale.
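Matching a request to the speaker pool amounts to filtering speaker records on the attributes the pool is organized by. The sketch below assumes a simple in-memory list; the `Speaker` record, the `find_speakers` helper, and the sample speakers and dialect values are hypothetical, and a real network would more likely sit behind a database query.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Speaker:
    speaker_id: str
    language: str
    dialect: str
    age_group: str
    location: str


# Attributes a request may filter on (everything except the ID).
FILTERABLE = {f.name for f in fields(Speaker)} - {"speaker_id"}


def find_speakers(pool: list[Speaker], **criteria: str) -> list[Speaker]:
    """Return speakers matching every given attribute, e.g. language="Hausa"."""
    unknown = set(criteria) - FILTERABLE
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    return [s for s in pool if all(getattr(s, k) == v for k, v in criteria.items())]


pool = [
    Speaker("S001", "Hausa", "Kano", "25-34", "Northern Nigeria"),
    Speaker("S002", "Igbo", "Central", "18-24", "Eastern Nigeria"),
    Speaker("S003", "Hausa", "Sokoto", "35-44", "Northern Nigeria"),
]
northern_hausa = find_speakers(pool, language="Hausa", location="Northern Nigeria")
print([s.speaker_id for s in northern_hausa])  # ['S001', 'S003']
```

Rejecting unknown filter names keeps a misspelled criterion from silently matching nobody.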
Service Comparison
| Service | Output | Typical Timeline | Best For |
|---|---|---|---|
| Audio Collection | Transcribed speech files | 2-6 weeks | Voice AI training |
| Text Curation | Verified translation pairs | 2-4 weeks | Translation models |
| Image Annotation | Labeled image datasets | 1-4 weeks | Computer vision |
| Dataset Validation | Quality audit report | 1-2 weeks | Pre-training QA |
| Speaker Networks | On-demand speakers | 1-2 weeks | Custom data projects |
Pricing
| Pricing Model | Details |
|---|---|
| Standard Datasets | Flat fee for pre-collected datasets |
| Custom Curation | Volume-based pricing per project |
Frequently Asked Questions
- Where is Awarri based? Lagos, Nigeria.
- Can I work as a data annotator? Yes, they frequently recruit local speakers.
- What languages does Awarri cover? Nigerian languages (Yoruba, Igbo, Hausa, and Nigerian Pidgin), with more being added.
- Are datasets available immediately? Pre-curated sets are available immediately; custom projects take weeks.
- How do they ensure data quality? Through multi-stage review by senior linguists.
- Do they follow copyright laws? Yes; all data is collected with explicit consent.
- Is Awarri suitable for academics? Yes; they partner with universities.
- Can they collect video data? Yes; custom video and image datasets are available.
- How do I request a quote? Submit a project query through awarri.com.
- Who founded Awarri? A team of African AI data scientists and computational linguists.
Conclusion
Awarri is building the foundational data layer for West African AI. Visit awarri.com to launch your project.