Filtering Non-Devanagari Words: A Heuristic-based Approach
2020-04-21
When building a Nepali text corpus, we usually gather text from various online sources such as Wikipedia, news portals, and other websites. These sources introduce many errors because of imperfect online tools such as translators, font converters, and spell checkers.