This project is maintained by thomasniebler
This repository contains the generated embeddings from the KDML submission “Learning Word Embeddings from Tagging Data: A Methodological Comparison” by Thomas Niebler, Luzian Hahn and Andreas Hotho.
In our work, we compared the three embedding algorithms Word2Vec, GloVe and LINE with regard to their applicability to tagging data from folksonomies and the semantic quality of the embeddings they produce.
We used the following embedding algorithm implementations in our work:
We published the generated embeddings for each tagging dataset in the embeddings directory of this repository.
The Delicious tagging dataset is publicly available.
The BibSonomy tagging data can be retrieved from the BibSonomy homepage.
The CiteULike tagging data can be retrieved from CiteULike.
The Human Intuition Datasets (HIDs) can be retrieved as preprocessed, pandas-friendly CSV files here or from their original locations.
In the following, we present the results of the experiments performed in the paper, but evaluated on WS353, MTurk and Bib100.
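For readers who want to run a comparable evaluation on the published embeddings, the following is a minimal sketch of one common approach: score each word pair by the cosine similarity of its two vectors and correlate those scores with the human ratings using Spearman's rank correlation. This section does not state the exact evaluation procedure, so treat the metric as an assumption. The function names, the dictionary-based embedding format and the `(word1, word2, rating)` pair format are also hypothetical and are not taken from the repository.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(xs), average_ranks(ys))


def evaluate(embeddings, rated_pairs):
    """Correlate cosine similarities with human ratings.

    embeddings:  dict mapping word -> list of floats (assumed format).
    rated_pairs: iterable of (word1, word2, human_rating) (assumed format).
    Pairs with a word missing from the embeddings are skipped.
    Returns (spearman_rho, number_of_pairs_used).
    """
    model_scores, human_scores = [], []
    for w1, w2, rating in rated_pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(rating)
    if len(model_scores) < 2:
        return float("nan"), len(model_scores)
    return spearman(model_scores, human_scores), len(model_scores)
```

Skipping pairs with out-of-vocabulary words is one design choice among several. It makes the number of evaluated pairs depend on the embedding vocabulary, which is why the sketch returns that count alongside the correlation.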