zeugma package

Submodules

zeugma.conf module

Created on the 05/01/18 @author: Nicolas Thiebaut @email: nkthiebaut@gmail.com

zeugma.embeddings module

class zeugma.embeddings.EmbeddingTransformer(model: str = 'glove', aggregation: str = 'average')

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Text vectorizer class: load pre-trained embeddings and transform texts into vectors.

fit(x: Iterable[Iterable[T_co]], y: Iterable[T_co] = None) → sklearn.base.BaseEstimator: Defined only to conform to the scikit-learn transformer interface, so that the transformer can be used in a scikit-learn Pipeline.

transform(texts: Iterable[str]) → Iterable[Iterable[T_co]]: Transform a corpus by applying the single-text transformation method to each text.

transform_sentence(text: Union[Iterable[T_co], str]) → numpy.array: Compute an aggregate embedding vector for an input str or iterable of str.
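As a rough illustration of what an 'average' aggregation amounts to, the sketch below averages word vectors taken from a toy lookup table. It does not use zeugma or any pre-trained model: TOY_VECTORS, average_embedding, the whitespace tokenization, and the handling of unknown words are all assumptions made for the example, not the library's implementation.

```python
from typing import Dict, List

# Hypothetical 2-dimensional "embeddings", standing in for pre-trained vectors.
TOY_VECTORS: Dict[str, List[float]] = {
    "cute": [1.0, 0.0],
    "cat": [0.0, 1.0],
    "dog": [1.0, 1.0],
}


def average_embedding(text: str, dim: int = 2) -> List[float]:
    """Average the vectors of the known words in a whitespace-tokenized text."""
    known = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not known:
        # Assumption: a text without any known word maps to the zero vector.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def transform(texts: List[str]) -> List[List[float]]:
    """Apply the single-text function to every text of a corpus."""
    return [average_embedding(text) for text in texts]


vectors = transform(["cute cat", "the dog"])
```

Each text ends up as one fixed-size vector regardless of its length, which is what lets such a transformer feed a downstream scikit-learn estimator.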

zeugma.keras_transformers module

Created on the 02/05/2018 @author: Nicolas Thiebaut @email: nicolas@visage.jobs

class zeugma.keras_transformers.Padder(max_length=500)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Pad and crop uneven lists to the same length. For lists longer than max_length, only the last max_length elements are kept; lists shorter than max_length are left-padded with zeros.

Variables:	max_length (int) – length of each sequence after padding
	max_index (int) – maximum index known by the Padder; any higher index met during transform is replaced by 0

fit(X, y=None)

transform(X, y=None)
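The padding and cropping rules described for Padder can be sketched in plain Python as follows. The function pad_sequences and its arguments are hypothetical names for this example; it assumes max_length is at least 1 and is not the library's code.

```python
from typing import List, Optional


def pad_sequences(sequences: List[List[int]], max_length: int,
                  max_index: Optional[int] = None) -> List[List[int]]:
    """Crop long sequences from the left and left-pad short ones with zeros."""
    padded = []
    for seq in sequences:
        if max_index is not None:
            # Indices above the largest known index are replaced by 0.
            seq = [i if i <= max_index else 0 for i in seq]
        seq = seq[-max_length:]  # keep only the end of long sequences
        padded.append([0] * (max_length - len(seq)) + seq)
    return padded


padded = pad_sequences([[1, 2, 3, 4, 5], [1, 4], [7, 9]],
                       max_length=3, max_index=7)
```

Here the first list is cropped to its last three elements, the second is left-padded, and the index 9 in the third list is replaced by 0 because it exceeds max_index.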

class zeugma.keras_transformers.TextsToSequences(**kwargs)

Bases: keras.preprocessing.text.Tokenizer, sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Scikit-learn transformer that converts texts to lists of word indices.

Example

>>> from zeugma import TextsToSequences
>>> sequencer = TextsToSequences()
>>> sequencer.fit_transform(["the cute cat", "the dog"])
[[1, 2, 3], [1, 4]]

fit(texts, y=None)

transform(texts, y=None)
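To make the indices in the example above less opaque, here is a minimal stand-alone sketch of one way such a mapping can be built: words are ranked by decreasing frequency, starting at index 1. The helper names fit_word_index and texts_to_sequences, the whitespace tokenization, and the tie-breaking by first occurrence are assumptions for illustration, not the behaviour guaranteed by TextsToSequences.

```python
from collections import Counter
from typing import Dict, List


def fit_word_index(texts: List[str]) -> Dict[str, int]:
    """Map each word to its frequency rank (1 = most frequent)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # most_common() sorts by decreasing count; ties keep first-seen order.
    return {word: rank
            for rank, (word, _) in enumerate(counts.most_common(), start=1)}


def texts_to_sequences(texts: List[str],
                       word_index: Dict[str, int]) -> List[List[int]]:
    """Replace each known word by its index; unknown words are dropped."""
    return [[word_index[w] for w in text.lower().split() if w in word_index]
            for text in texts]


corpus = ["the cute cat", "the dog"]
word_index = fit_word_index(corpus)
sequences = texts_to_sequences(corpus, word_index)
```

With this toy corpus, "the" appears twice and receives index 1, and the remaining words are numbered in order of first appearance.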

zeugma.logger module

zeugma.texttransformers module

class zeugma.texttransformers.ItemSelector(key)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

For data grouped by feature, select subset of data at a provided key.

The data is expected to be stored in a 2D data structure, where the first index is over features and the second is over samples.

Parameters:	key (hashable, required) – The key corresponding to the desired value in a mappable.

fit(x, y=None): Fit method required to include the transformer in a scikit-learn Pipeline

transform(data_dict): Return the items selected by the key
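The selection step can be pictured with a small stand-in class. KeySelector below is a hypothetical name that only mimics the documented interface (a key given at construction, fit, and transform); it does not import zeugma or scikit-learn.

```python
class KeySelector:
    """Hypothetical stand-in: select the data stored under one key."""

    def __init__(self, key):
        self.key = key

    def fit(self, x, y=None):
        # Nothing to learn; returning self allows method chaining.
        return self

    def transform(self, data_dict):
        return data_dict[self.key]


# Data grouped by feature: first index is the feature, second the sample.
data = {
    "title": ["Cats", "Dogs"],
    "body": ["the cute cat", "the dog"],
}
bodies = KeySelector(key="body").fit(data).transform(data)
```

A selector of this kind is typically placed at the start of a pipeline branch so that each branch processes a single feature.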

class zeugma.texttransformers.Namer(key)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Return a single-entry dictionary whose key is given by the attribute 'key' and whose value is the input data.

Parameters:	key (hashable, required) – The key corresponding to the output name.

fit(x, y=None): Fit method required to include the transformer in a scikit-learn Pipeline

transform(X): Return the data in a dictionary under the key provided at instantiation
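The naming step is the mirror image of key selection: it wraps the input under a key. KeyNamer is a hypothetical stand-in written for this example and only follows the interface documented above.

```python
class KeyNamer:
    """Hypothetical stand-in: wrap the input data under a fixed key."""

    def __init__(self, key):
        self.key = key

    def fit(self, x, y=None):
        # Nothing to learn; returning self allows method chaining.
        return self

    def transform(self, X):
        return {self.key: X}


sequences = [[1, 2, 3], [1, 4]]
named = KeyNamer(key="word_indices").fit(sequences).transform(sequences)
```

The resulting single-entry dictionary can then be passed to a component that expects named inputs.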

class zeugma.texttransformers.RareWordsTagger(min_count, oov_tag='<oov>')

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Replace rare words in a corpus (list of strings) with an out-of-vocabulary token.

fit(texts, y=None)

transform(texts)
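A minimal sketch of rare-word tagging is given below. RareWordReplacer is a hypothetical class, and two details are assumptions rather than documented behaviour: texts are tokenized on whitespace, and a word counts as rare when it occurs fewer than min_count times in the fitted corpus.

```python
from collections import Counter


class RareWordReplacer:
    """Hypothetical stand-in: replace infrequent words with a tag."""

    def __init__(self, min_count, oov_tag="<oov>"):
        self.min_count = min_count
        self.oov_tag = oov_tag

    def fit(self, texts, y=None):
        counts = Counter(word for text in texts for word in text.split())
        # Assumption: words seen at least min_count times are kept as-is.
        self.kept_words_ = {w for w, c in counts.items() if c >= self.min_count}
        return self

    def transform(self, texts):
        return [
            " ".join(w if w in self.kept_words_ else self.oov_tag
                     for w in text.split())
            for text in texts
        ]


corpus = ["the cute cat", "the dog"]
tagged = RareWordReplacer(min_count=2).fit(corpus).transform(corpus)
```

Only "the" occurs twice in this corpus, so every other word is replaced by the tag.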

class zeugma.texttransformers.TextStats

Bases: sklearn.preprocessing._function_transformer.FunctionTransformer

Extract features from each document for DictVectorizer
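DictVectorizer expects one dictionary of named features per document. The sketch below shows what such an extraction function could look like; extract_stats and the two feature names (length, num_sentences) are assumptions chosen for illustration and are not confirmed by this page.

```python
from typing import Dict, List


def extract_stats(corpus: List[str]) -> List[Dict[str, int]]:
    """Return one dictionary of simple statistics per document."""
    return [
        {
            "length": len(text),              # number of characters
            "num_sentences": text.count("."), # crude sentence count
        }
        for text in corpus
    ]


stats = extract_stats(["The cat sat. It purred.", "Dogs bark."])
```

A list of dictionaries in this shape can be handed to a DictVectorizer, which turns each named feature into a column.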

Module contents

Created on the 05/01/18 @author: Nicolas Thiebaut @email: nkthiebaut@gmail.com