Gamification of Word List Cleaning



Creator/Artist: Tarun Mugunthan

Category: Interaction Design

Document: P2 Project

Batch: 2016-2020

Source: India, IDC IIT Bombay

Period: 2019 onwards

Medium: Report pdf

Supervisor: Prof. Anirudha Joshi


Detailed Description

Swarachakra Hindi is a Hindi text input keyboard developed by IDC, IIT Bombay for touch-input mobile devices. It has over 60,00,000 (6 million) downloads on the Play Store at the time of writing. The words that users type with this keyboard are recorded in a word list. Each entry in the list holds two data points: the word that was typed and the number of times (frequency) it was typed. The copy of the word list I worked with is from 2015 and contains 329,525 unique words with their frequencies. The premise of this project is the need to clean the word list (remove or correct errors and tag problem words) to obtain a usable database of conversational Hindi words.
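
To make the shape of the data concrete, here is a minimal sketch of how such a word list might be represented and tagged during cleaning. The tab-separated file layout, the `WordEntry` class, the `tags` field and the sample words with their counts are all assumptions made for illustration; the report does not specify the actual storage format or the tagging scheme.

```python
from dataclasses import dataclass, field


@dataclass
class WordEntry:
    word: str        # the word as typed by users
    frequency: int   # how many times it was typed
    tags: set = field(default_factory=set)  # hypothetical cleaning labels


def parse_word_list(lines):
    """Parse 'word<TAB>frequency' lines (an assumed format) into entries."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, count = line.split("\t")
        entries.append(WordEntry(word=word, frequency=int(count)))
    return entries


def most_frequent(entries, n):
    """Return the n most frequently typed entries, highest first."""
    return sorted(entries, key=lambda e: e.frequency, reverse=True)[:n]


# Made-up sample data, not taken from the real word list.
sample = ["नमस्ते\t120", "धन्यवाद\t85", "नमसते\t3"]
entries = parse_word_list(sample)
entries[2].tags.add("misspelling")  # hypothetical tag for a problem word
top = most_frequent(entries, 2)
```

Keeping the tags separate from the word itself means a problem word can be flagged without losing its original spelling or frequency, which matches the stated goal of tagging as well as removing or correcting entries.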