A fast, accurate sentence segmenter


What is Katana?
  • Katana v1 is a hybrid system that combines hand-written rules with machine learning.
  • A repository of rules, extensive lists and a conditional random field (CRF) model trained on scientific text make Katana a highly accurate sentence segmenter.
  • It works on both general and scientific text.
How to use Katana?
  • Paste between 25 and 200 words of text into the input field on the left and click Execute.
  • The segmented sentences will appear in the output field on the right, and the count will be displayed at the bottom of this field.
  • This tool is limited to 10 uses per day. If you wish to use Katana for larger files, contact us at talk2us@crimsoni.ai.
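
If you want to check a passage against the 25-to-200-word limit before pasting it, a minimal sketch along these lines may help. It assumes that words are counted by splitting on whitespace; how Katana itself counts words is not stated here, so counts near the limits may differ. The function name is hypothetical.

```python
from typing import Tuple

MIN_WORDS = 25   # lower limit stated for the input field
MAX_WORDS = 200  # upper limit stated for the input field


def check_input_length(text: str) -> Tuple[bool, int]:
    """Return (within_limits, word_count).

    Assumption: a "word" is any whitespace-delimited token. This is an
    illustrative rule, not necessarily the one the tool applies.
    """
    word_count = len(text.split())
    return MIN_WORDS <= word_count <= MAX_WORDS, word_count
```

Calling `check_input_length(passage)` returns a flag and the count, so a longer document can be cut into chunks that each fall inside the limits.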
Why use Katana?
Clean scientific corpora

Scientific text is full of noise, e.g. symbols, non-terminal periods, subscripts and superscripts, which makes it challenging for a machine to find the logical end of a sentence. Highly accurate segmentation ensures that your corpus contains more complete sentences and fewer fragments.
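
To illustrate the problem with non-terminal periods, the sketch below compares a naive splitter with one that consults a small abbreviation list. It is a toy example and not Katana's implementation: the list contents and function names are hypothetical, and no CRF model is involved.

```python
import re

# Hypothetical abbreviation list, for illustration only (not Katana's lists).
# "al." covers the last token of "et al."
NON_TERMINAL = {"e.g.", "i.e.", "al.", "fig.", "eq.", "vs."}


def naive_split(text):
    """Split wherever '.', '!' or '?' is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def list_aware_split(text):
    """Rejoin naive pieces whose last token is a listed abbreviation."""
    sentences, buffer = [], ""
    for piece in naive_split(text):
        buffer = f"{buffer} {piece}".strip()
        # Drop a leading bracket so that "(Fig." matches "fig."
        last_token = buffer.split()[-1].lower().lstrip("([")
        if last_token in NON_TERMINAL:
            continue  # the period belongs to an abbreviation; keep reading
        sentences.append(buffer)
        buffer = ""
    if buffer:  # flush whatever remains at the end of the text
        sentences.append(buffer)
    return sentences


text = ("Cells were stained as in Smith et al. 2019 (see Fig. 2). "
        "Growth was measured after 24 h.")
print(len(naive_split(text)))       # number of pieces from the naive rule
print(len(list_aware_split(text)))  # number of pieces with the list check
```

On this input the naive rule produces four pieces, two of them fragments, while the list check recovers the two sentences. A list alone still fails when a sentence really does end with a listed abbreviation, which is one reason a system might pair such rules with a statistical model.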

Accuracy of machine learning tasks

Errors in pre-processing steps such as sentence segmentation propagate to higher-level tasks and can undermine the quality of the final product. Deriving clean data from text corpora reduces this risk.

Rich information processing

Retrieving information from semi-structured and unstructured data is demanding. Accurate, machine-readable information goes a long way toward producing the best possible output for natural language learning tasks such as information and knowledge extraction, sentence-level versioning, and question answering.

Easily integrate Katana into your products or build new products using the Katana API

  • Katana is an early AI prototype. We are continuously improving it.
  • Tell us what you think: talk2us@crimsoni.ai