These datasets have been built with the SecondHandSongs API.
We provide here for each track a full spectral representation (the Constant-Q Transform, CQT), two melodic representations (dominant melody and multi-pitch) and two harmonic representations (plain Pitch Class Profile, PCP, and crema-Pitch Class Profile, crema-PCP).
The CQT and the PCP have been computed with the Librosa v.0.7.1 library. The dominant melody and the multi-pitch were obtained using our model described here. The crema-PCP was obtained with B. McFee’s model described here.
Each representation spans the first 3 minutes of audio of each track (shorter tracks are zero-padded), with a final temporal resolution of 93 ms per frame and a final frequency resolution of one bin per semitone. For the dominant melody, only the 3 octaves around the mean pitch have been kept (see paper for more details).
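As a rough illustration of the framing described above, the sketch below derives the approximate number of frames in 3 minutes at 93 ms per frame and zero-pads (or trims) a track to that length. The helper name `pad_or_trim`, the time-major list-of-frames layout, and the bin count used in the example are assumptions made for illustration. The stored files may use a different layout or an exact frame count that differs slightly.

```python
# Approximate frame count for 3 minutes of audio at 93 ms per frame.
DURATION_S = 180.0
FRAME_S = 0.093
N_FRAMES = round(DURATION_S / FRAME_S)  # about 1935 frames


def pad_or_trim(frames, n_frames, n_bins):
    """Return exactly n_frames frames, each with n_bins values.

    frames is assumed to be a list of frames (time-major), each frame a
    list of n_bins numbers. Longer inputs are trimmed and shorter inputs
    are padded with all-zero frames at the end.
    """
    out = [list(frame) for frame in frames[:n_frames]]
    missing = n_frames - len(out)
    out.extend([0.0] * n_bins for _ in range(missing))
    return out


# Hypothetical short track: 2 frames of 3 bins, padded to 4 frames.
short_track = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
padded = pad_or_trim(short_track, n_frames=4, n_bins=3)
```

Padding at the end keeps the start of every track aligned, which matches the "first 3 minutes" convention. Whether the original pipeline pads before or after feature extraction is not stated here.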
The PCP and the CQT have been log-compressed and trimmed at -80 dB. All features have been globally normalized to the range [0, 1].
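The following is a minimal sketch of one plausible reading of this step, written in plain Python so that it does not presume any particular library call. It converts magnitudes to decibels relative to the maximum, floors them at -80 dB, and maps the resulting range linearly onto [0, 1]. The function names, the `amin` guard value, and the choice of the maximum as the reference are assumptions. The exact procedure used to build the datasets may differ, in particular regarding what "globally" covers.

```python
import math


def log_compress(magnitudes, top_db=80.0, amin=1e-10):
    """Convert a 2-D list of magnitudes to dB, floored at -top_db.

    The reference is the largest magnitude in the input (an assumption),
    so the output lies in [-top_db, 0].
    """
    ref = max(max(max(row) for row in magnitudes), amin)
    return [
        [max(20.0 * math.log10(max(v, amin) / ref), -top_db) for v in row]
        for row in magnitudes
    ]


def normalize_unit(values_db, low=-80.0, high=0.0):
    """Map values from [low, high] dB linearly onto [0, 1]."""
    span = high - low
    return [[(v - low) / span for v in row] for row in values_db]


# Hypothetical 2x2 magnitude patch.
patch = [[1.0, 0.1], [0.001, 0.0]]
patch_db = log_compress(patch)        # roughly 0, -20, -60 and -80 dB
patch_unit = normalize_unit(patch_db)  # values in [0, 1]
```

The zero entry is clamped to `amin` before the logarithm and then floored at -80 dB, so it maps to 0 after normalization.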
There are two datasets: SHS$_{5+}$, used for training, and SHS$_{4-}$, used for testing.
SHS$_{5+}$ includes 7460 original works. Each work has a minimum of five covers, for a total of 62311 tracks (i.e. 8.4 covers per work on average).
Examples list | CQT representations (26G) | Dominant melody representations (13G) | PCP representations (5G) | Multi-pitch representations (27G) | Crema PCP representations (5G)
SHS$_{4-}$ includes 19445 original works. Each work has a maximum of four covers, for a total of 48483 tracks (i.e. 2.5 covers per work on average). It is entirely disjoint from SHS$_{5+}$. It is meant to be more realistic than SHS$_{5+}$, because most songs in real-life audio corpora have at most 2 or 3 covers, and we use it as a test set.
Examples list | CQT representations (20G) | Dominant melody representations (10G) | PCP representations (4G) | Multi-pitch representations (21G) | Crema PCP representations (4G)
Each row in the examples.csv file indicates the work label, followed by its cover track labels.
Each corresponding representation file is named track_label.ext, where ext is cqt.npz, f0_cqt.npy, multif0_cqt.npz, pcp.npy or cpcp.npy for the CQT, dominant melody, multi-pitch, PCP and crema-PCP, respectively.
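The naming scheme above can be sketched in a few lines of Python. This example reads rows shaped like those of examples.csv and builds the file name of a given representation for a track. The comma delimiter, the dictionary keys, the helper names and the sample labels (`W_1`, `P_10`, ...) are all hypothetical and chosen only for illustration. The actual delimiter and label format should be checked against the downloaded files.

```python
import csv
import io

# File extensions as listed above; the dictionary keys are our own naming.
EXTENSIONS = {
    "cqt": "cqt.npz",
    "dominant_melody": "f0_cqt.npy",
    "multi_pitch": "multif0_cqt.npz",
    "pcp": "pcp.npy",
    "crema_pcp": "cpcp.npy",
}


def read_examples(lines, delimiter=","):
    """Map each work label to the list of its track labels.

    lines is any iterable of text lines, e.g. an open examples.csv file.
    """
    works = {}
    for row in csv.reader(lines, delimiter=delimiter):
        cells = [cell.strip() for cell in row if cell.strip()]
        if not cells:
            continue  # skip empty rows
        works[cells[0]] = cells[1:]
    return works


def representation_filename(track_label, representation):
    """Build 'track_label.ext' for one of the five representations."""
    return f"{track_label}.{EXTENSIONS[representation]}"


# Hypothetical file content with made-up labels.
sample = io.StringIO("W_1,P_10,P_11,P_12\nW_2,P_20,P_21\n")
works = read_examples(sample)
cqt_files = [representation_filename(t, "cqt") for t in works["W_1"]]
```

With a real download, `sample` would be replaced by `open("examples.csv", newline="")`. The `.npy` and `.npz` files themselves can then be read with NumPy's `numpy.load`.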
The datasets are licensed under the Creative Commons Attribution Noncommercial License.
If you use these datasets, please cite our related works about cover detection using dominant melody and multi-pitch representations:
@inproceedings{doras2019cover,
  title={Cover Detection using Dominant Melody Embeddings},
  author={Doras, Guillaume and Peeters, Geoffroy},
  booktitle={Proceedings of ISMIR (International Society of Music Information Retrieval)},
  year={2019}
}
@inproceedings{doras2020prototypical,
  title={A Prototypical Triplet Loss for Cover Detection},
  author={Doras, Guillaume and Peeters, Geoffroy},
  booktitle={ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2020},
  organization={IEEE}
}