Core¶
The core contains the bootstrap code for summarization needs. The core provides:
- A common standard structure for documents and summaries to ensure interoperability between different components.
- Utilities for loading document sets into the common structure.
- Common utilities for document sets, documents, and sentences, such as sentence splitting and tokenization.
Sentence
class¶
-
class
clstk.sentence.
Sentence
(sentenceText)¶ Bases:
object
Class to represent a single sentence
-
__init__
(sentenceText)¶ Set sentence text and translated text
Parameters: sentenceText – sentence text
-
setText
(sentenceText)¶ Set text for the sentence
Parameters: sentenceText – sentence text
-
getText
()¶ Get sentence text
Returns: sentence text
-
setTranslation
(translation)¶ Set translated text
Parameters: translation – translated text
-
getTranslation
()¶ Get translated text
The translated text defaults to sentence text
Returns: translated text
-
setVector
(vector)¶ Set sentence vector
Parameters: vector – sentence vector
-
getVector
()¶ Get sentence vector
Returns: sentence vector
-
setTranslationVector
(vector)¶ Set sentence vector for translated text
Parameters: vector – sentence vector
-
getTranslationVector
()¶ Get sentence vector for translated text
Returns: sentence vector
-
setExtra
(key, value)¶ Set extra key-value pair
Parameters: - key – key for the stored value
- value – value to store
-
getExtra
(key, default=None)¶ Get extra value from key
Parameters: - key – key for the stored value
- default – default value if key not found
-
charCount
()¶ Get character count for translated text
Returns: Number of characters in translated text
-
tokenCount
()¶ Get token count for translated text
Returns: Number of tokens in translated text
-
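The accessor behaviour described above can be illustrated with a small stand-in class. This is a minimal sketch that mirrors the documented method names only; `SentenceSketch` is a hypothetical name, it is not the clstk implementation, and whitespace tokenization in `tokenCount` is an assumption made for the example.

```python
class SentenceSketch:
    """Hypothetical stand-in mirroring the documented Sentence interface."""

    def __init__(self, sentenceText):
        self._extras = {}
        # Documented: the constructor sets both the text and the translated text.
        self.setText(sentenceText)
        self.setTranslation(sentenceText)

    def setText(self, sentenceText):
        self._text = sentenceText

    def getText(self):
        return self._text

    def setTranslation(self, translation):
        self._translation = translation

    def getTranslation(self):
        return self._translation

    def setExtra(self, key, value):
        self._extras[key] = value

    def getExtra(self, key, default=None):
        # Falls back to `default` when the key was never stored.
        return self._extras.get(key, default)

    def charCount(self):
        # Documented: counts are taken over the translated text.
        return len(self.getTranslation())

    def tokenCount(self):
        # Assumption: tokens are whitespace-separated.
        return len(self.getTranslation().split())


s = SentenceSketch("Das ist ein Test.")
print(s.getTranslation())  # same as the source text until a translation is set
s.setTranslation("This is a test.")
print(s.charCount(), s.tokenCount())
```

Because both counts read the translated text, they change once `setTranslation` is called, even though `getText` still returns the source sentence.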
SentenceCollection
class¶
-
class
clstk.sentenceCollection.
SentenceCollection
¶ Bases:
object
Class to store a collection of sentences.
Also provides several common operations on the collection.
-
__init__
()¶ Initialize the collection
-
setSourceLang
(lang)¶ Set source language for the collection
Parameters: lang – two-letter code for source language
-
setTargetLang
(lang)¶ Set target language for the collection
Parameters: lang – two-letter code for target language
-
addSentence
(sentence)¶ Add a sentence to the collection
Parameters: sentence – sentence to be added
-
addSentences
(sentences)¶ Add sentences to the collection
Parameters: sentences – list of sentences to be added
-
getSentences
()¶ Get list of sentences in the collection
Returns: list of sentences
-
getSentenceVectors
()¶ Get list of sentence vectors for sentences in the collection
Returns: np.array containing sentence vectors
-
getTranslationSentenceVectors
()¶ Get list of sentence vectors for translations of sentences in the collection
Returns: np.array containing sentence vectors
-
generateSentenceVectors
()¶ Generate sentence vectors
-
generateTranslationSentenceVectors
()¶ Generate sentence vectors for translations
-
translate
(sourceLang, targetLang, replaceOriginal=False)¶ Translate sentences
Parameters: - sourceLang – two-letter code for source language
- targetLang – two-letter code for target language
- replaceOriginal – Replace source text with translation if True. Used for early-translation
-
simplify
(sourceLang, replaceOriginal=False)¶ Simplify sentences
Parameters: - sourceLang – two-letter code for language
- replaceOriginal – Replace source sentences with simplified sentences. Used for early-simplify.
-
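The effect of `replaceOriginal` in `translate` can be sketched with a stand-in collection. Everything below is hypothetical: `CollectionSketch`, `_Sentence`, and the word-lookup `toyTranslate` are invented for illustration, and the sketch assumes that early translation simply overwrites the source text with the translated text.

```python
class _Sentence:
    """Hypothetical minimal sentence holder for this sketch."""

    def __init__(self, text):
        self.text = text
        self.translation = text


def toyTranslate(text, sourceLang, targetLang):
    """Hypothetical translator: word-by-word lookup, for illustration only."""
    lexicon = {("de", "en"): {"hallo": "hello", "welt": "world"}}
    table = lexicon.get((sourceLang, targetLang), {})
    return " ".join(table.get(word.lower(), word) for word in text.split())


class CollectionSketch:
    """Hypothetical stand-in mirroring part of the documented interface."""

    def __init__(self):
        self._sentences = []

    def addSentence(self, sentence):
        self._sentences.append(sentence)

    def addSentences(self, sentences):
        self._sentences.extend(sentences)

    def getSentences(self):
        return self._sentences

    def translate(self, sourceLang, targetLang, replaceOriginal=False):
        for sentence in self._sentences:
            translated = toyTranslate(sentence.text, sourceLang, targetLang)
            sentence.translation = translated
            if replaceOriginal:
                # Assumed early-translation behaviour: source text is overwritten.
                sentence.text = translated


late = CollectionSketch()
late.addSentence(_Sentence("Hallo Welt"))
late.translate("de", "en")  # source text is kept

early = CollectionSketch()
early.addSentence(_Sentence("Hallo Welt"))
early.translate("de", "en", replaceOriginal=True)  # source text is replaced
```

With `replaceOriginal=False`, later steps can still work on the source-language text; with `True`, they only ever see the translated text.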
Corpus
class¶
-
class
clstk.corpus.
Corpus
(dirname)¶ Bases:
clstk.sentenceCollection.SentenceCollection
Class for source documents. Contains utilities for loading a document set.
-
__init__
(dirname)¶ Initialize the class
Parameters: dirname – Directory from where source documents are to be loaded
-
load
(params, translate=False, replaceWithTranslation=False, simplify=False, replaceWithSimplified=False)¶ Load source document set
Parameters: - params – dict containing different params, including sourceLang and targetLang
- translate – Whether to translate sentences to target language
- replaceWithTranslation – Whether to replace source sentences with translation
- simplify – Whether to simplify sentences
- replaceWithSimplified – Whether to replace source sentences with simplified sentences
-
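One plausible reading of the `load` flags is that they select which `SentenceCollection` operations run after the documents are read. The sketch below only records those follow-up calls. `planLoad` is a hypothetical helper, not part of clstk, and the order shown (simplify before translate) is an assumption rather than documented behaviour.

```python
def planLoad(params, translate=False, replaceWithTranslation=False,
             simplify=False, replaceWithSimplified=False):
    """Hypothetical helper: list the follow-up calls these flags might trigger."""
    steps = []
    if simplify:
        # Mirrors simplify(sourceLang, replaceOriginal=...)
        steps.append(("simplify", params["sourceLang"], replaceWithSimplified))
    if translate:
        # Mirrors translate(sourceLang, targetLang, replaceOriginal=...)
        steps.append(("translate", params["sourceLang"],
                      params["targetLang"], replaceWithTranslation))
    return steps


# The documented keys of the params dict; the language codes are example values.
params = {"sourceLang": "en", "targetLang": "hi"}

for step in planLoad(params, translate=True, replaceWithTranslation=True):
    print(step)
```

Under this reading, the `replaceWith...` flags have no effect unless the matching `translate` or `simplify` flag is also set.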
Summary
class¶
-
class
clstk.summary.
Summary
¶ Bases:
clstk.sentenceCollection.SentenceCollection
-
charCount
()¶ Get total number of characters in all the sentences
-
tokenCount
()¶ Get total number of tokens in all the sentences
-
getSummary
()¶ Get printable summary generated from source text
-
getTargetSummary
()¶ Get printable summary generated from translated text
-
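The four `Summary` methods can be sketched as aggregations over the stored sentences. This is a hypothetical stand-in (`SummarySketch` and `_Sentence` are invented names) and it assumes that totals are sums of per-sentence counts over the translated text, that tokens are whitespace-separated, and that the printable summary joins sentences with newlines.

```python
class _Sentence:
    """Hypothetical minimal sentence holder for this sketch."""

    def __init__(self, text, translation=None):
        self.text = text
        self.translation = text if translation is None else translation

    def charCount(self):
        return len(self.translation)

    def tokenCount(self):
        # Assumption: tokens are whitespace-separated.
        return len(self.translation.split())


class SummarySketch:
    """Hypothetical stand-in mirroring the documented Summary methods."""

    def __init__(self):
        self._sentences = []

    def addSentence(self, sentence):
        self._sentences.append(sentence)

    def charCount(self):
        return sum(s.charCount() for s in self._sentences)

    def tokenCount(self):
        return sum(s.tokenCount() for s in self._sentences)

    def getSummary(self):
        # Assumption: one source-language sentence per line.
        return "\n".join(s.text for s in self._sentences)

    def getTargetSummary(self):
        # Assumption: one translated sentence per line.
        return "\n".join(s.translation for s in self._sentences)


summary = SummarySketch()
summary.addSentence(_Sentence("Hallo Welt", "Hello world"))
summary.addSentence(_Sentence("Guten Morgen", "Good morning"))
print(summary.getTargetSummary())
print(summary.charCount(), summary.tokenCount())
```

Totals of this kind are what a length budget would be checked against when sentences are added to a summary one at a time.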