Utils

Miscellaneous utility functions and classes

NLP utils

clstk.utils.nlp.getSentenceSplitter()

Get a sentence splitter function

Returns: a function that takes a string and returns a list of sentences as strings

clstk.utils.nlp.getTokenizer(lang)

Get a tokenizer for a given language

Parameters: lang – language
Returns: a tokenizer that takes a sentence as a string and returns a list of tokens

clstk.utils.nlp.getDetokenizer(lang)

Get a detokenizer for a given language

Parameters: lang – language
Returns: a detokenizer that takes a list of tokens and returns a sentence as a string

clstk.utils.nlp.getStemmer()

Get a stemmer. Currently this returns the Porter stemmer.

Returns: a stemmer that takes a token and returns its stem

clstk.utils.nlp.getStopwords(lang)

Get the list of stopwords for a given language

Parameters: lang – language
Returns: a list of stopwords, including common punctuation marks
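
The factories above return plain callables and lists, so they compose into a small preprocessing pipeline. The sketch below shows one possible composition. It does not import clstk: `preprocess` and the `toy_*` helpers are hypothetical names introduced only for illustration, and the stand-ins merely follow the contracts described above (in particular, it assumes the stemmer can be called directly on a token).

```python
from typing import Callable, Iterable, List


def preprocess(
    text: str,
    split: Callable[[str], List[str]],
    tokenize: Callable[[str], List[str]],
    stopwords: Iterable[str],
    stem: Callable[[str], str],
) -> List[List[str]]:
    """Return one list of stemmed, stopword-free tokens per sentence."""
    stopword_set = {word.lower() for word in stopwords}  # set for fast lookups
    result = []
    for sentence in split(text):
        tokens = [t for t in tokenize(sentence) if t.lower() not in stopword_set]
        result.append([stem(t) for t in tokens])
    return result


# Hypothetical stand-ins that follow the documented contracts. With clstk
# installed, the values returned by getSentenceSplitter(), getTokenizer(lang),
# getStopwords(lang) and getStemmer() would be passed instead.
def toy_split(text: str) -> List[str]:
    return [s.strip() + "." for s in text.split(".") if s.strip()]


def toy_tokenize(sentence: str) -> List[str]:
    return sentence.replace(".", " .").split()


def toy_stem(token: str) -> str:
    # Crude plural stripping; NOT the Porter algorithm.
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


toy_stopwords = ["the", "on", "."]

processed = preprocess(
    "The cats sleep. Dogs bark on the lawns.",
    toy_split, toy_tokenize, toy_stopwords, toy_stem,
)
print(processed)
```

Passing the callables in as arguments keeps the pipeline independent of how each one is obtained, which also makes it easy to test without language resources installed.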

ProgressBar class

class clstk.utils.progress.ProgressBar(totalCount)

Bases: object

Manages and displays a progress bar in the console

__init__(totalCount)

Initialize the progress bar

Parameters: totalCount – total number of items to be processed

done(doneCount)

Advance the progress bar

Parameters: doneCount – number of items, out of totalCount, that have been processed so far

complete()

Mark the progress as complete
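
The three methods above suggest a simple calling pattern: construct the bar with the total, report the cumulative count after each item, then finish. The sketch below illustrates that pattern with a hypothetical stand-in class, `SketchProgressBar`, written only for this example. It is not clstk's implementation; the `stream` and `width` parameters and the `render` helper are assumptions that do not appear in the documented signature.

```python
import sys


class SketchProgressBar:
    """Hypothetical stand-in exposing the same three documented methods."""

    def __init__(self, totalCount, stream=sys.stderr, width=20):
        self.totalCount = totalCount
        self.stream = stream  # assumption: where the bar is drawn
        self.width = width    # assumption: bar width in characters
        self.doneCount = 0

    def render(self):
        # Build a textual bar such as "[####----] 2/4".
        fraction = self.doneCount / self.totalCount if self.totalCount else 1.0
        filled = int(self.width * fraction)
        return "[{}{}] {}/{}".format(
            "#" * filled, "-" * (self.width - filled),
            self.doneCount, self.totalCount,
        )

    def done(self, doneCount):
        # doneCount is cumulative: items processed so far, not an increment.
        self.doneCount = min(doneCount, self.totalCount)
        self.stream.write("\r" + self.render())
        self.stream.flush()

    def complete(self):
        # Fill the bar and move to a fresh line.
        self.done(self.totalCount)
        self.stream.write("\n")
        self.stream.flush()


items = ["a", "b", "c", "d"]
bar = SketchProgressBar(len(items))
for index, item in enumerate(items, start=1):
    _ = item.upper()  # placeholder for real work
    bar.done(index)
bar.complete()
```

With clstk installed, the same loop would presumably work with `clstk.utils.progress.ProgressBar(len(items))` in place of the stand-in, since only the documented methods are called.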
