This page contains all the necessary information to reproduce the results given in the ISWC’17 poster “Learning Semantic Relatedness from Human Feedback Using Relative Relatedness Learning” by Thomas Niebler, Martin Becker, Christian Pölitz and Andreas Hotho, all members of the DMIR group at the University of Würzburg.
In our work, we learned a semantic relatedness measure from human feedback. Human Intuition Datasets contain direct human judgments about the relatedness of words, i.e., human feedback. We exploit these datasets to learn a parameterization of the cosine measure, using a metric learning approach based on relative distance comparisons. We validate our approach on several different embedding datasets, which we either publish ourselves or for which we provide a download link here.
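To make "parameterization of the cosine measure" and "relative comparisons" concrete, here is a minimal sketch. It assumes the parameterized cosine takes the form cos_M(u, v) = uᵀMv / (√(uᵀMu) · √(vᵀMv)) for a positive definite matrix M, and that a comparison "a and b are more related than c and d" is penalized with a squared hinge when it is violated. Both the formula and the helper names `param_cosine` and `comparison_penalty` are illustrative assumptions, not the repository's code.

```python
import numpy as np


def param_cosine(u: np.ndarray, v: np.ndarray, M: np.ndarray) -> float:
    """Cosine similarity under the bilinear form M (assumed positive definite).

    With M = identity this is the ordinary cosine similarity.
    """
    uv = u @ M @ v
    uu = u @ M @ u
    vv = v @ M @ v
    return float(uv / np.sqrt(uu * vv))


def comparison_penalty(a, b, c, d, M) -> float:
    """Hypothetical squared-hinge penalty for the relative comparison
    "a and b are more related than c and d".

    Returns 0 if the comparison holds under M, a positive value otherwise.
    """
    gap = param_cosine(c, d, M) - param_cosine(a, b, M)
    return max(0.0, gap) ** 2


u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])
w = np.array([0.0, 1.0])
identity = np.eye(2)

print(param_cosine(u, v, identity))              # ≈ 0.7071, plain cosine
print(param_cosine(u, v, np.diag([1.0, 0.01])))  # second dimension down-weighted
print(comparison_penalty(u, w, u, v, identity))  # violated comparison, ≈ 0.5
```

Learning then amounts to choosing M so that the summed penalties over all comparisons derived from human judgments become small.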
Furthermore, to the best of our knowledge, we were the first to explore learning word embeddings from tagging data. We elaborated on this further in a separate paper.
To compute the tag co-occurrence graph used as input for the GloVe algorithm, we applied the method presented in “Semantic Grounding of Tag Relatedness in Social Bookmarking Systems” by Cattuto et al.
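As an illustration only, the following sketch counts post-based tag co-occurrence: two tags co-occur when the same user assigns both to the same resource, and the edge weight is the number of such posts. This simple counting scheme and the `(user, resource, tags)` tuple layout are assumptions made for the example; the repository's own implementation may weight or filter edges differently.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Tuple


def tag_cooccurrence(posts: Iterable[Tuple[str, str, Iterable[str]]]) -> Counter:
    """Count, for each unordered tag pair, the number of posts containing both tags.

    A post is a (user, resource, tags) tuple. Pairs are stored in sorted order,
    so ("a", "b") and ("b", "a") share a single count.
    """
    counts = Counter()
    for _user, _resource, tags in posts:
        unique_tags = sorted(set(tags))  # a tag repeated within one post counts once
        for pair in combinations(unique_tags, 2):
            counts[pair] += 1
    return counts


# Hypothetical toy posts
posts = [
    ("alice", "http://example.org/a", ["python", "programming", "code"]),
    ("bob", "http://example.org/a", ["python", "code"]),
    ("bob", "http://example.org/b", ["music"]),
]
graph = tag_cooccurrence(posts)
# ("code", "python") appears in two posts, ("code", "programming") in one
```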
In src/embeddings/example_call.py, we provide an example of how to call the corresponding methods to construct the co-occurrence graph. The graph then needs to be saved to a file before the GloVe algorithm can be run on that file.
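The sketch below shows one way to serialize a weighted co-occurrence graph for GloVe's reference C code. It assumes that the `glove` binary reads records laid out like its `CREC` struct (two 1-based `int` word ids followed by a `double` value) and that it derives word ids only from the line order of the vocabulary file. The second vocabulary column is filled with each tag's summed co-occurrence weight as a placeholder. `encode_for_glove` and `write_glove_inputs` are hypothetical helpers; please check the layout against the GloVe sources you actually use.

```python
import struct
from typing import Dict, List, Tuple

# Assumed layout of one record in GloVe's binary co-occurrence file:
# int word1, int word2, double val (4 + 4 + 8 = 16 bytes), little-endian.
CREC = struct.Struct("<iid")


def encode_for_glove(counts: Dict[Tuple[str, str], float]) -> Tuple[List[str], bytes]:
    """Turn {(tag_a, tag_b): weight} into (vocabulary lines, binary records)."""
    totals: Dict[str, float] = {}
    for (a, b), weight in counts.items():
        totals[a] = totals.get(a, 0.0) + weight
        totals[b] = totals.get(b, 0.0) + weight
    # Sort by descending weight (ties by name); line i of the vocab file is word id i.
    vocab = sorted(totals, key=lambda tag: (-totals[tag], tag))
    index = {tag: i + 1 for i, tag in enumerate(vocab)}  # GloVe word ids are 1-based
    records = bytearray()
    for (a, b), weight in counts.items():
        # Emit both directions so the stored matrix is symmetric.
        records += CREC.pack(index[a], index[b], float(weight))
        records += CREC.pack(index[b], index[a], float(weight))
    vocab_lines = [f"{tag} {totals[tag]:g}" for tag in vocab]
    return vocab_lines, bytes(records)


def write_glove_inputs(counts, cooc_path: str, vocab_path: str) -> None:
    """Write the binary co-occurrence file and the matching vocabulary file."""
    vocab_lines, records = encode_for_glove(counts)
    with open(cooc_path, "wb") as f:
        f.write(records)
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab_lines) + "\n")
```

GloVe's own demo pipeline additionally shuffles the co-occurrence file with its `shuffle` tool before training.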
Relative Relatedness Learning (RRL) is inspired by the Least Squares Metric Learning (LSML) algorithm. We built on the LSML implementation contained in the metric_learn Python package.
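For orientation, here is a small NumPy sketch of an LSML-style objective for quadruplet constraints (a, b, c, d) meaning "d(a, b) should be smaller than d(c, d)". Violated constraints contribute the squared difference of the Mahalanobis distances, and a LogDet-style term pulls M towards a prior matrix. It is modeled on our reading of LSML's objective and is not metric_learn's API; `lsml_objective` is a hypothetical name. It deliberately avoids calling the package, so it does not depend on a particular metric_learn version.

```python
import numpy as np


def lsml_objective(M, X, quadruplets, prior, weights=None) -> float:
    """LSML-style loss for constraints d_M(a, b) < d_M(c, d).

    M:           (k, k) symmetric positive definite metric matrix
    X:           (n, k) data matrix
    quadruplets: (m, 4) integer row indices (a, b, c, d) into X
    prior:       (k, k) prior metric the regularizer pulls M towards
    """
    q = np.asarray(quadruplets)
    vab = X[q[:, 0]] - X[q[:, 1]]
    vcd = X[q[:, 2]] - X[q[:, 3]]
    # Squared Mahalanobis distances, one per constraint
    dab = np.einsum("ij,jk,ik->i", vab, M, vab)
    dcd = np.einsum("ij,jk,ik->i", vcd, M, vcd)
    w = np.ones(len(q)) if weights is None else np.asarray(weights, dtype=float)
    violated = dab > dcd
    comparison = np.sum(
        w[violated] * (np.sqrt(dab[violated]) - np.sqrt(dcd[violated])) ** 2
    )
    # LogDet-style regularizer (up to constants): tr(M P^-1) - log det M
    reg = np.sum(M * np.linalg.inv(prior)) - np.linalg.slogdet(M)[1]
    return float(comparison + reg)


X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
eye = np.eye(2)
print(lsml_objective(eye, X, [[0, 1, 0, 2]], eye))  # satisfied: only the regularizer remains
print(lsml_objective(eye, X, [[0, 2, 0, 1]], eye))  # violated: adds (3 - 1)^2
```

An RRL-style variant would, under our assumption, replace the distances with the parameterized cosine and flip the comparison, since higher relatedness corresponds to smaller distance.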
We used the published GloVe code to create 100-dimensional tag embeddings, keeping the predefined parameter values alpha=0.75 and x_max=100.
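The two parameters control GloVe's weighting function f(x) = (x / x_max)^alpha for x < x_max and 1 otherwise, which damps the influence of rare co-occurrences. The sketch below implements f and assembles a command line for the reference `glove` binary with a vector size of 100. The flag names follow GloVe's demo script as we recall it, and the binary path and file names are hypothetical placeholders.

```python
from typing import List


def glove_weight(x: float, x_max: float = 100.0, alpha: float = 0.75) -> float:
    """GloVe weighting function: (x / x_max) ** alpha below x_max, else 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_command(
    cooc_file: str,
    vocab_file: str,
    save_file: str,
    vector_size: int = 100,
    alpha: float = 0.75,
    x_max: float = 100.0,
    glove_binary: str = "build/glove",  # hypothetical path to the compiled binary
) -> List[str]:
    """Assemble an argument list for GloVe's training binary (not executed here)."""
    return [
        glove_binary,
        "-input-file", cooc_file,
        "-vocab-file", vocab_file,
        "-save-file", save_file,
        "-vector-size", str(vector_size),
        "-alpha", str(alpha),
        "-x-max", str(x_max),
    ]


cmd = glove_command("cooccurrence.shuf.bin", "vocab.txt", "tag_vectors")
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`. Building it as a list rather than a shell string avoids quoting problems with file names.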
These are the datasets that we used for our experiments.
Delicious: The Delicious tagging dataset is publicly available. The generated word embeddings are published in this repository.
BibSonomy: The BibSonomy tagging data can be retrieved from the BibSonomy homepage. We also provide the generated word embeddings as a public download in this repository.
WikiGlove: Pennington et al. made some of their pre-trained vector collections publicly available. Specifically, we used the GloVe6B vectors, which were trained on a 2014 Wikipedia dump and the Gigaword5 corpus.
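The GloVe6B vectors are distributed as plain text, with one word per line followed by its space-separated vector components. A minimal loader, assuming exactly that format, might look like this (`load_glove_text` is a hypothetical helper). The same loader should also work for embeddings produced by the GloVe code in text output mode.

```python
from typing import Dict, Iterable

import numpy as np


def load_glove_text(lines: Iterable[str]) -> Dict[str, np.ndarray]:
    """Parse GloVe's text format: "word v1 v2 ... vd" on each line."""
    vectors: Dict[str, np.ndarray] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:  # skip empty or malformed lines
            continue
        word, values = parts[0], parts[1:]
        vectors[word] = np.array([float(x) for x in values], dtype=np.float32)
    return vectors


# Example usage; the file name is one of the GloVe6B files and only an example:
# with open("glove.6B.100d.txt", encoding="utf-8") as f:
#     wiki_vectors = load_glove_text(f)
```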
The Human Intuition Datasets (HIDs) can be retrieved as preprocessed, pandas-friendly CSV files here or from their original sources.
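A common way to evaluate a relatedness measure on an HID is the Spearman rank correlation between the human scores and the model's scores. The sketch below assumes a CSV with the hypothetical columns `word1`, `word2` and `score`; the actual column names of the provided files may differ. It uses plain cosine similarity and skips pairs with out-of-vocabulary words. Spearman's rho is computed as the Pearson correlation of average ranks, which keeps the example free of a SciPy dependency.

```python
import io
from typing import Dict, Tuple

import numpy as np
import pandas as pd


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def evaluate_hid(
    df: pd.DataFrame,
    vectors: Dict[str, np.ndarray],
    col1: str = "word1",       # hypothetical column names
    col2: str = "word2",
    score_col: str = "score",
) -> Tuple[float, int]:
    """Return (Spearman rho, number of evaluated pairs); OOV pairs are skipped."""
    vocab = set(vectors)
    pairs = df[df[col1].isin(vocab) & df[col2].isin(vocab)]
    model = pairs.apply(lambda r: cosine(vectors[r[col1]], vectors[r[col2]]), axis=1)
    human = pairs[score_col]
    # Spearman's rho = Pearson correlation of the (average) ranks
    rho = human.rank().corr(model.rank())
    return float(rho), len(pairs)


# Toy data standing in for a real HID file
csv = "word1,word2,score\ncat,dog,8.0\ncat,car,2.0\ndog,car,3.0\ncat,xyz,5.0\n"
hid = pd.read_csv(io.StringIO(csv))
toy_vectors = {
    "cat": np.array([1.0, 0.0]),
    "dog": np.array([1.0, 0.2]),
    "car": np.array([0.0, 1.0]),
}
rho, n = evaluate_hid(hid, toy_vectors)  # "xyz" is out of vocabulary, so n == 3
```

Reporting the number of evaluated pairs alongside rho matters, because embeddings with different vocabularies may cover different subsets of a dataset.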