Katana BETA

Katana is a fast, accurate sentence segmenter specially trained on general and scientific text. Sentence splitting is a crucial text processing step for creating parallel corpora, extracting text, or analyzing linguistic phenomena. Well-segmented sentences are critical to the quality of machine learning results.
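To illustrate why sentence splitting is harder than it looks, here is a minimal rule-based baseline in Python. It is not Katana's trained model. The abbreviation list, the boundary pattern, and the function name `split_sentences` are all assumptions made for this sketch.

```python
import re

# Hypothetical, deliberately tiny abbreviation list; a real segmenter needs far more.
ABBREVIATIONS = {"e.g.", "i.e.", "al.", "fig.", "dr.", "vs."}

# Candidate boundary: sentence-final punctuation, whitespace, then an uppercase letter.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Split text at candidate boundaries, skipping those that follow an abbreviation."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        candidate = text[start:match.start()]
        last_token = candidate.split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # e.g. "Dr. Lee" is not a sentence break
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("The assay failed (see Fig. 2). Dr. Lee repeated it. Results improved."))
```

Rules like these break down quickly on scientific text (species names, units, citations ending in "et al."), which is the kind of case a trained segmenter is meant to handle.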


An inflectional morphology converter that generates the inflected forms of a word given its lemma and part of speech. You can specify which type of inflection you want.

NCounter ALPHA

A classifier-based noun annotation tool that tags a noun as count or mass. Nouns represent entities, and their linguistic features are essential for core NLP tasks such as translation, word sense disambiguation, and grammatical error correction, as well as for building ontologies and knowledge bases. NCounter can help with semi-automatic annotation of entities and with other NLP applications.


A tool that cleans text, which helps in building efficient and accurate ML models. Give it your raw text, and it will take care of the things that make your results noisy. Scientific text is especially challenging to process because of features such as Unicode characters, scientific symbols, citations, and equations.
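As a rough illustration of the kind of cleaning described above, the following Python sketch normalizes Unicode, strips two common citation styles, and replaces inline LaTeX math with a placeholder. The patterns, the `[MATH]` token, and the function name `clean_scientific_text` are assumptions for this example, not the tool's actual rules.

```python
import re
import unicodedata

# Hypothetical patterns covering only a few simple cases.
NUMERIC_CITATION = re.compile(r"\s*\[\d+(?:\s*[,\u2013-]\s*\d+)*\]")  # [1], [1, 2], [3-5]
AUTHOR_YEAR_CITATION = re.compile(r"\s*\([A-Z][A-Za-z-]+(?: et al\.)?,? \d{4}\)")  # (Smith et al., 2020)
INLINE_MATH = re.compile(r"\$[^$]+\$")  # $E = mc^2$


def clean_scientific_text(text):
    """Return text with ligatures folded, citations removed, and inline math masked."""
    # NFKC folds ligatures and full-width forms. It also flattens superscripts
    # (x² becomes x2), so it is a trade-off rather than a safe default.
    text = unicodedata.normalize("NFKC", text)
    text = NUMERIC_CITATION.sub("", text)
    text = AUTHOR_YEAR_CITATION.sub("", text)
    text = INLINE_MATH.sub("[MATH]", text)
    # Collapse whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


print(clean_scientific_text("The yield rose [1, 2] as reported (Smith et al., 2020)."))
```

Which elements to drop, keep, or mask depends on the downstream model, so each step here would normally be configurable.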

Lexicon Builder

This application uses NLP algorithms and statistical measures to build a custom lexicon or word list from your documents. Such lists of words and phrases are useful for translators and educators, as well as in NLP applications such as information extraction and retrieval and topic and keyword analysis.
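One simple statistical measure for ranking candidate lexicon entries is term frequency weighted by inverse document frequency. The sketch below uses only the Python standard library. The tokenizer, the smoothed IDF formula, and the name `build_lexicon` are assumptions chosen for illustration, since the text does not say which measures the application uses.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase ASCII words, optionally hyphenated.
TOKEN = re.compile(r"[a-z]+(?:-[a-z]+)*")


def build_lexicon(documents, top_n=10, stopwords=frozenset()):
    """Rank terms by corpus frequency times a smoothed inverse document frequency."""
    term_freq = Counter()  # total occurrences across all documents
    doc_freq = Counter()   # number of documents containing the term
    for doc in documents:
        tokens = [t for t in TOKEN.findall(doc.lower()) if t not in stopwords]
        term_freq.update(tokens)
        doc_freq.update(set(tokens))

    n_docs = len(documents)
    scores = {
        term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
        for term, tf in term_freq.items()
    }
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_n]


docs = [
    "The enzyme binds the substrate",
    "The enzyme is stable",
    "The buffer is cold",
]
for term, score in build_lexicon(docs, top_n=3, stopwords={"the", "is"}):
    print(f"{term}\t{score:.2f}")
```

Without a stopword list, function words such as "the" dominate the ranking, which is why a lexicon built this way usually needs filtering or a reference corpus for comparison.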

Word Count Tool

This tool gives you an accurate word count of a research article. Customize the fields and languages you want counted: add some, ignore some. This comes in handy when translation or editing work is priced by the number of words or characters in the text. The tool works well on English, Japanese, Chinese, and Korean.
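Counting mixed English and CJK text needs two different units, because Chinese and Japanese do not separate words with spaces. A minimal Python sketch, assuming that CJK text is billed per character and everything else per word, might look like this. The Unicode ranges and the name `count_units` are assumptions for the example and cover only the most common blocks.

```python
import re

# Hiragana and Katakana, CJK Unified Ideographs (plus Extension A), Hangul syllables.
CJK_CHAR = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7a3]")

# ASCII words and numbers; internal apostrophes and hyphens keep a word together.
LATIN_WORD = re.compile(r"[A-Za-z0-9]+(?:['\u2019-][A-Za-z0-9]+)*")


def count_units(text):
    """Count CJK characters individually and remaining ASCII tokens as words."""
    cjk_characters = len(CJK_CHAR.findall(text))
    # Replace CJK characters with spaces so adjacent ASCII words are not merged.
    remainder = CJK_CHAR.sub(" ", text)
    words = len(LATIN_WORD.findall(remainder))
    return {"words": words, "cjk_characters": cjk_characters}


print(count_units("Deep learning 深度学习"))
```

Korean is the awkward case: it is written with spaces, yet it is often priced per character, so a real tool would let you choose the unit per language.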