zeugma package


zeugma.conf module

Created on the 05/01/18 @author: Nicolas Thiebaut @email: nkthiebaut@gmail.com

zeugma.embeddings module

class zeugma.embeddings.EmbeddingTransformer(model: str = 'glove', aggregation: str = 'average')[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Text vectorizer class: load pre-trained embeddings and transform texts into vectors.

fit(x: Iterable[Iterable[T_co]], y: Iterable[T_co] = None) → sklearn.base.BaseEstimator[source]

No-op fit method, defined to conform to the scikit-learn Transformer interface so that the transformer can be integrated into a sklearn.Pipeline object

transform(texts: Iterable[str]) → Iterable[Iterable[T_co]][source]

Transform a corpus by applying the single-text transformation method (transform_sentence) to each document

transform_sentence(text: Union[Iterable[T_co], str]) → numpy.array[source]

Compute an aggregate embedding vector for an input str or iterable of str.
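The default "average" aggregation can be sketched in plain Python; the toy embedding table, its values, and the helper name below are illustrative stand-ins for pre-trained GloVe vectors, not part of zeugma's API.

```python
import numpy as np

# Toy 3-dimensional embedding table standing in for pre-trained
# GloVe vectors (illustrative values only).
EMBEDDINGS = {
    "cute": np.array([0.1, 0.2, 0.3]),
    "cat": np.array([0.4, 0.0, 0.2]),
}

def average_embedding(text: str, dim: int = 3) -> np.ndarray:
    """Average the vectors of known tokens; unknown tokens are skipped.

    Returns a zero vector when no token is in the vocabulary.
    """
    vectors = [EMBEDDINGS[tok] for tok in text.split() if tok in EMBEDDINGS]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

vec = average_embedding("the cute cat")
```

Skipping out-of-vocabulary tokens keeps the average from being dragged toward zero by unknown words; the real transformer looks tokens up in the loaded pre-trained model instead of a hand-written dictionary.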

zeugma.keras_transformers module

Created on the 02/05/2018 @author: Nicolas Thiebaut @email: nicolas@visage.jobs

class zeugma.keras_transformers.Padder(max_length=500)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Pad and crop uneven lists to the same length. For lists longer than the max_length attribute, only the end is kept; lists shorter than max_length are left-padded with zeros.

Parameters:
  • max_length (int) – size of the sequences after padding
  • max_index (int) – maximum index known to the Padder; any higher index met during transform is mapped to 0
fit(X, y=None)[source]
transform(X, y=None)[source]
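The padding and cropping rule described above can be sketched as a small function (a simplified stand-in for Padder.transform; the function name and defaults mirror the documented behaviour but are not zeugma's actual code):

```python
def pad_sequences(sequences, max_length=500, max_index=None):
    """Left-pad short sequences with zeros and keep only the end of
    long ones; indices above max_index (if given) become 0."""
    padded = []
    for seq in sequences:
        if max_index is not None:
            # Unknown (too-high) indices are mapped to the padding value 0.
            seq = [i if i <= max_index else 0 for i in seq]
        seq = seq[-max_length:]                             # keep the end
        padded.append([0] * (max_length - len(seq)) + seq)  # left-pad
    return padded

out = pad_sequences([[1, 2, 3], [1, 4, 5, 6, 7]], max_length=4, max_index=7)
```

Left-padding (rather than right-padding) keeps the most recent tokens adjacent to the end of the sequence, which is the convention Keras recurrent layers expect by default.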
class zeugma.keras_transformers.TextsToSequences(**kwargs)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Scikit-learn transformer to convert texts into lists of word indices


>>> from zeugma import TextsToSequences
>>> sequencer = TextsToSequences()
>>> sequencer.fit_transform(["the cute cat", "the dog"])
[[1, 2, 3], [1, 4]]
fit(texts, y=None)[source]
transform(texts, y=None)[source]
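The doctest above can be reproduced with a frequency-ranked vocabulary: fit assigns index 1 to the most frequent word, and transform maps each text onto those indices. The sketch below is a minimal pure-Python illustration of that idea, not zeugma's actual implementation (which delegates to a Keras tokenizer):

```python
from collections import Counter

def fit_vocabulary(texts):
    """Rank words by frequency; the most frequent word gets index 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common())}

def texts_to_sequences(texts, vocab):
    """Replace each known word with its index; unknown words are dropped."""
    return [[vocab[w] for w in t.lower().split() if w in vocab] for t in texts]

vocab = fit_vocabulary(["the cute cat", "the dog"])
seqs = texts_to_sequences(["the cute cat", "the dog"], vocab)
```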

zeugma.logger module

zeugma.texttransformers module

class zeugma.texttransformers.ItemSelector(key)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

For data grouped by feature, select subset of data at a provided key.

The data is expected to be stored in a 2D data structure, where the first index is over features and the second is over samples.

Parameters:key (hashable, required) – The key corresponding to the desired value in a mapping.
fit(x, y=None)[source]

Necessary fit method to include the transformer in a sklearn.Pipeline

transform(x)[source]

Return the selected items
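ItemSelector's behaviour amounts to a keyed lookup on the feature axis; a minimal sketch of the idea (the data and key below are illustrative, and the class body is a simplified stand-in, not zeugma's exact code):

```python
class ItemSelector:
    """Select the value stored under `key` in a mapping-like container."""

    def __init__(self, key):
        self.key = key

    def fit(self, x, y=None):
        return self  # stateless: nothing to learn

    def transform(self, data):
        return data[self.key]

data = {"title": ["a cat", "a dog"], "body": ["long text", "more text"]}
titles = ItemSelector("title").fit(data).transform(data)
```

This pattern is typically used inside a FeatureUnion so that each branch of a pipeline operates on one named column of the input.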

class zeugma.texttransformers.Namer(key)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Return a single-entry dictionary whose key is given by the attribute ‘key’ and whose value is the input data

Parameters:key (hashable, required) – The key corresponding to the output name.
fit(x, y=None)[source]

Necessary fit method to include the transformer in a sklearn.Pipeline

transform(x)[source]

Return the data in a dictionary with the key provided at instantiation
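Namer is the converse convenience of ItemSelector: it wraps its input in a one-entry dictionary, which is handy for naming the output of a pipeline branch. A minimal sketch (a simplified stand-in, not zeugma's exact code):

```python
class Namer:
    """Wrap the input data in a dict under the key given at instantiation."""

    def __init__(self, key):
        self.key = key

    def fit(self, x, y=None):
        return self  # stateless

    def transform(self, data):
        return {self.key: data}

named = Namer("title").fit(None).transform(["a cat", "a dog"])
```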

class zeugma.texttransformers.RareWordsTagger(min_count, oov_tag='<oov>')[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Replace rare words in a corpus (list of strings) with an out-of-vocabulary token

fit(texts, y=None)[source]
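The tagging rule can be sketched as: count word frequencies at fit time, then replace any word seen fewer than min_count times with the out-of-vocabulary token. The function below is a simplified, stateless stand-in for the fit/transform pair, not zeugma's exact code:

```python
from collections import Counter

def tag_rare_words(texts, min_count=2, oov_tag="<oov>"):
    """Replace words occurring fewer than min_count times with oov_tag."""
    counts = Counter(w for t in texts for w in t.split())
    return [
        " ".join(w if counts[w] >= min_count else oov_tag for w in t.split())
        for t in texts
    ]

tagged = tag_rare_words(["the cute cat", "the dog"], min_count=2)
```

Collapsing rare words into a single token shrinks the vocabulary and gives downstream models a consistent way to handle words that are too infrequent to learn useful representations for.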
class zeugma.texttransformers.TextStats[source]

Bases: sklearn.preprocessing._function_transformer.FunctionTransformer

Extract features from each document for DictVectorizer
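TextStats produces, for each document, a dictionary of numeric features that DictVectorizer can turn into a feature matrix. The particular features below (character length and word count) are an assumption chosen for illustration; the sketch is not zeugma's exact feature set:

```python
def text_stats(texts):
    """Per-document feature dicts for a DictVectorizer-style consumer.

    The chosen features (length, num_words) are illustrative.
    """
    return [{"length": len(t), "num_words": len(t.split())} for t in texts]

stats = text_stats(["the cute cat", "the dog"])
```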

Module contents
