r-doc2vec
Distributed representations of sentences, documents and topics
Learn vector representations of sentences, paragraphs or documents with the Paragraph Vector algorithms: the distributed bag of words (PV-DBOW) model and the distributed memory (PV-DM) model. Top2vec finds topic clusters in text documents by combining document and word embeddings with density-based clustering. It first embeds documents in the semantic space defined by the doc2vec algorithm. It then maps these document embeddings to a lower-dimensional space using the Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction algorithm and finds dense areas in that space with hierarchical density-based clustering (HDBSCAN). Each dense area is a topic cluster, represented by a topic vector that aggregates the embeddings of the documents belonging to that cluster. Words that lie close to a topic vector in the same semantic space are representative of the topic.
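The last two steps of the description (aggregating document embeddings into a topic vector, then finding representative words nearby) can be illustrated with a small sketch. This is not the package's R API. It is a pure-Python illustration under stated assumptions: the aggregate is taken to be the component-wise mean, similarity is taken to be cosine similarity, the label -1 is assumed to mark noise documents, and the function names and toy embeddings are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors (assumed aggregate)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def topic_vectors(doc_embeddings, labels, noise_label=-1):
    """Group document embeddings by cluster label and average each group.

    Documents carrying noise_label (an assumption of this sketch) are skipped.
    """
    clusters = {}
    for vector, label in zip(doc_embeddings, labels):
        if label == noise_label:
            continue
        clusters.setdefault(label, []).append(vector)
    return {label: centroid(vectors) for label, vectors in clusters.items()}

def nearest_words(topic_vector, word_embeddings, k=2):
    """Return the k words whose embeddings are most similar to the topic vector."""
    ranked = sorted(
        word_embeddings,
        key=lambda word: cosine(topic_vector, word_embeddings[word]),
        reverse=True,
    )
    return ranked[:k]

# Toy 2-dimensional embeddings; real doc2vec embeddings have many more dimensions.
docs = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8], [0.5, 0.5]]
labels = [0, 0, 1, 1, -1]  # cluster label per document, e.g. from a clustering step
words = {
    "guitar": [0.9, 0.1],
    "piano": [0.7, 0.3],
    "tax": [0.1, 0.9],
    "budget": [0.3, 0.7],
}

topics = topic_vectors(docs, labels)
for label, vector in sorted(topics.items()):
    print(label, nearest_words(vector, words))
```

Because documents and words share one embedding space, the same similarity measure ranks words against a topic vector without any extra model.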
- Versions: 0.2.0
- Website: https://github.com/bnosac/doc2vec
- Licenses: Expat
- Package source: gnu/packages/cran.scm
- Builds: See build status
- Issues: See known issues
Installation
Install the latest version of r-doc2vec as follows:
guix install r-doc2vec
Or install a particular version:
guix install r-doc2vec@0.2.0
You can also install packages in augmented, pure or containerized environments for development or simply to try them out without polluting your user profile. See the guix shell documentation for more information.
Badge code
HTML: <a href='http://127.0.0.1:3000/packages/r-doc2vec/'><img src='http://127.0.0.1:3000/packages/r-doc2vec/badges/latest-version.svg'></img></a>
Markdown: [![GNU Guix](http://127.0.0.1:3000/packages/r-doc2vec/badges/latest-version.svg)](http://127.0.0.1:3000/packages/r-doc2vec/)
Org: [[http://127.0.0.1:3000/packages/r-doc2vec/][http://127.0.0.1:3000/packages/r-doc2vec/badges/latest-version.svg]]