📝 Natural language processing (NLP) utils: word embeddings (Word2Vec, GloVe, FastText, …) and preprocessing transformers, compatible with scikit-learn Pipelines. 🛠 Check the documentation for more information.
Install the package with:
pip install zeugma
Embedding transformers can either be used with downloaded embeddings (they all come with a default embedding URL) or trained.
As an illustrative example, the cosine similarity of the sentences "what is zeugma" and "a figure of speech" is computed using the GloVe pretrained embeddings:
>>> from zeugma.embeddings import EmbeddingTransformer
>>> glove = EmbeddingTransformer('glove')
>>> embeddings = glove.transform(['what is zeugma', 'a figure of speech'])
>>> from sklearn.metrics.pairwise import cosine_similarity
>>> cosine_similarity(embeddings)[0, 1]
0.8721696
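To make the computation above more concrete, here is a minimal pure-Python sketch of what a sentence-level cosine similarity involves. It assumes each sentence vector is the average of its word vectors, which may differ from how the library aggregates them. The word vectors, `TOY_VECTORS` and `sentence_vector` are invented for illustration and are not part of zeugma or GloVe.

```python
import math

# Toy 3-dimensional word vectors, made up for illustration (not real GloVe values).
TOY_VECTORS = {
    "what": [0.1, 0.3, 0.5],
    "is": [0.2, 0.1, 0.4],
    "zeugma": [0.9, 0.2, 0.1],
    "a": [0.1, 0.1, 0.3],
    "figure": [0.8, 0.3, 0.2],
    "of": [0.2, 0.2, 0.3],
    "speech": [0.7, 0.4, 0.1],
}


def sentence_vector(sentence, vectors=TOY_VECTORS):
    """Average the vectors of the known words in a sentence (assumed aggregation)."""
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        raise ValueError("no known words in sentence")
    dim = len(next(iter(vectors.values())))
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


first = sentence_vector("what is zeugma")
second = sentence_vector("a figure of speech")
similarity = cosine_similarity(first, second)
print(round(similarity, 4))
```

With real pretrained embeddings the vectors have hundreds of dimensions, but the similarity is computed the same way.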
Embedding fine-tuning (training embeddings initialized with preloaded values) will be implemented in the future.
Feel free to fork this repo and submit a Pull Request.
The development workflow for this repo is the following:
- create a virtual environment:
python -m venv venv && source venv/bin/activate
- install required packages:
pip install -r requirements.txt
- install the pre-commit hooks:
- run the test suite with:
pytest from the root folder
Distribution via PyPI
To upload a new version to PyPI, simply:
- tag your new version on git:
git tag -a x.x -m "my tag message"
- update the download_url field in the
- commit, push the code and the tag (
git push origin x.x), and make a PR
- make sure you have a
.pypirc file structured like this in your home folder (you can use
https://upload.pypi.org/legacy/ for the URL field)
- once the updated code is present in master, run
python setup.py sdist && twine upload dist/* from the root of the package to distribute it.
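For reference, a .pypirc file commonly follows the layout sketched below. This is based on the standard format read by twine and is not necessarily the exact file the maintainers use. The username and password values are placeholders to replace with your own credentials.

```ini
[distutils]
index-servers =
    pypi

[pypi]
repository = https://upload.pypi.org/legacy/
username = <your-username>
password = <your-password>
```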
To build the documentation locally simply run
make html from the