Core
The core contains the basic infrastructure code that summarization builds on. It provides:
- A common standard structure for documents and summaries to ensure interoperability between different components.
- Utilities for loading document sets into the common structure.
- Common utilities for document sets, documents and sentences, such as sentence splitting and tokenization.
Sentence class
class clstk.sentence.Sentence(sentenceText)
Bases: object
Class to represent a single sentence.
- __init__(sentenceText)
  Set the sentence text and the translated text.
  Parameters: sentenceText – sentence text
- setText(sentenceText)
  Set the text of the sentence.
  Parameters: sentenceText – sentence text
- getText()
  Get the sentence text.
  Returns: sentence text
- setTranslation(translation)
  Set the translated text.
  Parameters: translation – translated text
- getTranslation()
  Get the translated text. The translated text defaults to the sentence text.
  Returns: translated text
- setVector(vector)
  Set the sentence vector.
  Parameters: vector – sentence vector
- getVector()
  Get the sentence vector.
  Returns: sentence vector
- setTranslationVector(vector)
  Set the sentence vector for the translated text.
  Parameters: vector – sentence vector
- getTranslationVector()
  Get the sentence vector for the translated text.
  Returns: sentence vector
- setExtra(key, value)
  Set an extra key-value pair.
  Parameters:
    - key – key for the stored value
    - value – value to store
- getExtra(key, default=None)
  Get the extra value stored under a key.
  Parameters:
    - key – key for the stored value
    - default – value returned if the key is not found
- charCount()
  Get the character count of the translated text.
  Returns: number of characters in the translated text
- tokenCount()
  Get the token count of the translated text.
  Returns: number of tokens in the translated text
- __weakref__
  List of weak references to the object (if defined).
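To make the documented behavior concrete, here is a minimal, hypothetical stand-in for Sentence. It is not clstk's implementation: the class name SketchSentence is invented, the vector accessors are omitted, and tokenCount() uses a plain whitespace split as an assumed tokenizer. It mirrors two documented points: __init__ sets both the text and the translation, so the translation defaults to the sentence text, and the counts are computed on the translated text.

```python
from typing import Any, Dict


class SketchSentence:
    """Hypothetical stand-in for clstk.sentence.Sentence (illustration only)."""

    def __init__(self, sentenceText: str) -> None:
        # As documented, __init__ sets both the text and the translated text,
        # so the translation defaults to the sentence text.
        self._text = sentenceText
        self._translation = sentenceText
        self._extras: Dict[str, Any] = {}

    def setText(self, sentenceText: str) -> None:
        self._text = sentenceText

    def getText(self) -> str:
        return self._text

    def setTranslation(self, translation: str) -> None:
        self._translation = translation

    def getTranslation(self) -> str:
        return self._translation

    def setExtra(self, key: str, value: Any) -> None:
        self._extras[key] = value

    def getExtra(self, key: str, default: Any = None) -> Any:
        # Fall back to `default` when the key was never stored.
        return self._extras.get(key, default)

    def charCount(self) -> int:
        # Counts are taken on the translated text, as documented.
        return len(self._translation)

    def tokenCount(self) -> int:
        # Assumption: whitespace tokenization stands in for clstk's tokenizer.
        return len(self._translation.split())


s = SketchSentence("Der Hund schläft.")
print(s.getTranslation())  # Der Hund schläft.
s.setTranslation("The dog is sleeping.")
print(s.charCount(), s.tokenCount())  # 20 4
```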
SentenceCollection class
class clstk.sentenceCollection.SentenceCollection
Bases: object
Class to store a collection of sentences. It also provides several common operations on the collection.
- __init__()
  Initialize the collection.
- setSourceLang(lang)
  Set the source language for the collection.
  Parameters: lang – two-letter code for the source language
- setTargetLang(lang)
  Set the target language for the collection.
  Parameters: lang – two-letter code for the target language
- addSentence(sentence)
  Add a sentence to the collection.
  Parameters: sentence – sentence to be added
- addSentences(sentences)
  Add sentences to the collection.
  Parameters: sentences – list of sentences to be added
- getSentences()
  Get the list of sentences in the collection.
  Returns: list of sentences
- getSentenceVectors()
  Get the sentence vectors for the sentences in the collection.
  Returns: np.array containing sentence vectors
- getTranslationSentenceVectors()
  Get the sentence vectors for the translations of the sentences in the collection.
  Returns: np.array containing sentence vectors
- generateSentenceVectors()
  Generate sentence vectors.
- generateTranslationSentenceVectors()
  Generate sentence vectors for the translations.
- translate(sourceLang, targetLang, replaceOriginal=False)
  Translate the sentences.
  Parameters:
    - sourceLang – two-letter code for the source language
    - targetLang – two-letter code for the target language
    - replaceOriginal – if True, replace the source text with the translation. Used for early translation.
- simplify(sourceLang, replaceOriginal=False)
  Simplify the sentences.
  Parameters:
    - sourceLang – two-letter code for the language
    - replaceOriginal – if True, replace the source sentences with the simplified sentences. Used for early simplification.
- __weakref__
  List of weak references to the object (if defined).
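A hypothetical sketch of how the early-translation flag and getSentenceVectors() fit together. SketchCollection and SketchSentence are invented for illustration, and unlike clstk's methods, translate() and generateSentenceVectors() here take a caller-supplied translator and embedding function (assumed hooks), since the documentation does not say which translation or embedding backend clstk uses. With replaceOriginal=True, the translation overwrites the source text, so vectors generated afterwards describe the translated sentences.

```python
from typing import Callable, List, Optional

import numpy as np


class SketchSentence:
    """Hypothetical minimal sentence: source text, translation, and a vector."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.translation = text  # translation defaults to the source text
        self.vector: Optional[List[float]] = None


class SketchCollection:
    """Hypothetical stand-in for SentenceCollection (illustration only)."""

    def __init__(self) -> None:
        self.sentences: List[SketchSentence] = []

    def addSentences(self, sentences: List[SketchSentence]) -> None:
        self.sentences.extend(sentences)

    def translate(self, sourceLang: str, targetLang: str,
                  replaceOriginal: bool = False, *,
                  translator: Callable[[str, str, str], str]) -> None:
        # `translator` is an assumed hook; clstk's translate() takes no such argument.
        for s in self.sentences:
            s.translation = translator(s.text, sourceLang, targetLang)
            if replaceOriginal:
                # Early translation: later steps only see the translated text.
                s.text = s.translation

    def generateSentenceVectors(self, *, embed: Callable[[str], List[float]]) -> None:
        # `embed` is an assumed hook; clstk's method takes no arguments.
        for s in self.sentences:
            s.vector = embed(s.text)

    def getSentenceVectors(self) -> np.ndarray:
        # Stack the per-sentence vectors into one array, one row per sentence.
        return np.array([s.vector for s in self.sentences])


toyDict = {"Hallo Welt.": "Hello world.", "Guten Morgen.": "Good morning."}
col = SketchCollection()
col.addSentences([SketchSentence(t) for t in toyDict])
col.translate("de", "en", replaceOriginal=True,
              translator=lambda text, src, tgt: toyDict[text])
# Toy "embedding": character count and whitespace token count.
col.generateSentenceVectors(embed=lambda text: [len(text), len(text.split())])
print([s.text for s in col.sentences])  # ['Hello world.', 'Good morning.']
print(col.getSentenceVectors())
```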
Corpus class
class clstk.corpus.Corpus(dirname)
Bases: clstk.sentenceCollection.SentenceCollection
Class for source documents. Contains utilities for loading a document set.
- __init__(dirname)
  Initialize the corpus.
  Parameters: dirname – directory from which the source documents are loaded
- load(params, translate=False, replaceWithTranslation=False, simplify=False, replaceWithSimplified=False)
  Load the source document set.
  Parameters:
    - params – dict containing various parameters, including sourceLang and targetLang
    - translate – whether to translate sentences to the target language
    - replaceWithTranslation – whether to replace source sentences with their translations
    - simplify – whether to simplify sentences
    - replaceWithSimplified – whether to replace source sentences with simplified sentences
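Corpus.load() reads a directory of source documents into sentences. The sketch below is a hypothetical, standard-library-only illustration of that step, not clstk's loader: the function name loadDocumentSet is invented, it assumes one UTF-8 .txt file per document, and it uses a naive regular-expression sentence splitter. The translation and simplification flags of load() are not modelled.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Naive splitter (assumption): break after ".", "!" or "?" followed by whitespace.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def loadDocumentSet(dirname: str) -> List[str]:
    """Hypothetical loader: read each *.txt file in `dirname` (sorted by file
    name) and return the sentences of all documents, in order."""
    sentences: List[str] = []
    for path in sorted(Path(dirname).glob("*.txt")):
        text = path.read_text(encoding="utf-8").strip()
        # Drop empty strings, e.g. from an empty file.
        sentences.extend(part for part in _SENTENCE_BOUNDARY.split(text) if part)
    return sentences


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "doc1.txt").write_text("First sentence. Second one!", encoding="utf-8")
    Path(tmp, "doc2.txt").write_text("Another document?", encoding="utf-8")
    print(loadDocumentSet(tmp))
    # ['First sentence.', 'Second one!', 'Another document?']
```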
Summary class
class clstk.summary.Summary
Bases: clstk.sentenceCollection.SentenceCollection
- charCount()
  Get the total number of characters in all the sentences.
- tokenCount()
  Get the total number of tokens in all the sentences.
- getSummary()
  Get the printable summary generated from the source text.
- getTargetSummary()
  Get the printable summary generated from the translated text.
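A hypothetical sketch of the Summary accessors. The class names are invented for illustration, and the sketch assumes, without confirmation from the documentation, that the totals sum each sentence's counts (which, per Sentence.charCount() and Sentence.tokenCount(), are taken on the translated text), that tokens are whitespace-separated, and that the printable summary puts one sentence per line.

```python
from typing import List


class SketchSentence:
    """Hypothetical minimal sentence with source text and translation."""

    def __init__(self, text: str, translation: str) -> None:
        self.text = text
        self.translation = translation

    def charCount(self) -> int:
        # Per the Sentence docs, counts use the translated text.
        return len(self.translation)

    def tokenCount(self) -> int:
        # Assumption: whitespace tokenization.
        return len(self.translation.split())


class SketchSummary:
    """Hypothetical stand-in for clstk.summary.Summary (illustration only)."""

    def __init__(self, sentences: List[SketchSentence]) -> None:
        self.sentences = sentences

    def charCount(self) -> int:
        # Assumption: the total is the sum of per-sentence counts.
        return sum(s.charCount() for s in self.sentences)

    def tokenCount(self) -> int:
        return sum(s.tokenCount() for s in self.sentences)

    def getSummary(self) -> str:
        # Assumption: one sentence per line, built from the source text.
        return "\n".join(s.text for s in self.sentences)

    def getTargetSummary(self) -> str:
        # Same layout, built from the translated text.
        return "\n".join(s.translation for s in self.sentences)


summary = SketchSummary([
    SketchSentence("Hallo Welt.", "Hello world."),
    SketchSentence("Guten Morgen.", "Good morning."),
])
print(summary.getTargetSummary())
print(summary.charCount(), summary.tokenCount())  # 25 4
```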