English Sentence Segmenter - Online Segmentation Tool

What is Katana?

Katana v1 is a hybrid system that combines rules with machine learning.
A repository of rules, extensive lists and a conditional random field (CRF) model trained on scientific text make Katana a highly accurate sentence segmenter.
It works on both general and scientific text.
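
To illustrate why rules and lists matter for segmentation, here is a minimal sketch in Python. It is not Katana's implementation: the abbreviation list, the function names, and the merging rule are all assumptions made for illustration, and no CRF model is involved. It compares a naive split on terminal punctuation with a split that consults a small abbreviation list.

```python
import re

# Hypothetical abbreviation list for illustration only; a real system
# would rely on far more extensive lists.
ABBREVIATIONS = {"e.g.", "i.e.", "al.", "fig.", "dr.", "vs."}


def naive_split(text):
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    return [piece for piece in re.split(r"(?<=[.!?])\s+", text) if piece]


def rule_based_split(text):
    """Like naive_split, but keep joining pieces while the text collected
    so far ends in a known abbreviation."""
    sentences = []
    buffer = ""
    for piece in naive_split(text):
        buffer = f"{buffer} {piece}" if buffer else piece
        last_token = buffer.split()[-1].lower()
        if last_token in ABBREVIATIONS:
            # Probably a non-terminal period: wait for the next piece.
            continue
        sentences.append(buffer)
        buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences


sample = "Cells were imaged as in Fig. 2 by Smith et al. at 37 degrees. Results were stable."
print(naive_split(sample))
print(rule_based_split(sample))
```

A list-based rule like this wrongly merges two sentences whenever an abbreviation really does end a sentence. Ambiguous cases of that kind are where a trained statistical model can plausibly complement the rules.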

How to use Katana?

Paste text (between 25 and 200 words) into the input field on the left and click Execute.
The segmented sentences will appear in the output field on the right, and the sentence count will be displayed at the bottom of that field.
This tool can be used up to 10 times per day. If you wish to use Katana for larger files, contact us at talk2us@crimsoni.ai.
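
Before pasting, it can help to check that your text falls within the stated word limits. The sketch below assumes words are counted as whitespace-separated tokens. How the tool itself counts words is not documented here, so treat the result as an estimate. The function name `within_word_limit` is hypothetical.

```python
def within_word_limit(text, min_words=25, max_words=200):
    """Return True if the whitespace-separated word count lies within
    the limits stated for the online tool (25 to 200 words)."""
    word_count = len(text.split())
    return min_words <= word_count <= max_words
```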

Why use Katana?

Clean scientific corpora

Scientific text is full of noise, such as symbols, non-terminal periods, subscripts, and superscripts, which makes it challenging for a machine to find the logical end of a sentence. Highly accurate segmentation helps ensure your corpus contains more complete sentences and fewer fragments.

Accuracy of machine learning tasks

Errors in pre-processing steps such as sentence segmentation propagate to high-level tasks, making product quality questionable. Deriving clean data from text corpora reduces this risk.

Rich information processing

Retrieving information from semi-structured and unstructured data is demanding. Accurate, machine-readable information goes a long way toward giving the best possible output for natural language processing tasks such as information and knowledge extraction, sentence-level versioning, and question answering.

Easily integrate Katana into your products or build new products using the Katana API