CoversDataset

A dataset to train and test cover detection models.

Presentation

These datasets have been built using the SecondHandSongs API.

We provide here, for each track, a full spectral representation (the Constant-Q Transform, CQT), two melodic representations (dominant melody and multi-pitch) and two harmonic representations (plain Pitch Class Profile, PCP, and crema-Pitch Class Profile, crema-PCP).

The CQT and the PCP have been computed with the librosa v0.7.1 library. The dominant melody and the multi-pitch were obtained using our model described here. The crema-PCP was obtained with B. McFee's model described here.

Each representation spans the first 3 minutes of audio of each track (shorter tracks are zero-padded), for a final temporal resolution of 93 ms per frame and a final frequency resolution of one bin per semitone. For the dominant melody, only the 3 octaves around the mean pitch have been kept (see the paper for more details).
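
As an illustration of that octave-cropping step, here is a minimal numpy sketch; the salience-matrix layout, the way the mean pitch is estimated and the function name are assumptions, not the authors' exact procedure:

```python
import numpy as np

def crop_around_mean_pitch(melody, n_semitones=36):
    """Keep the 3 octaves (36 semitone bins) centred on the mean active pitch.

    `melody` is assumed to be a (n_bins, n_frames) salience matrix with one
    bin per semitone, as in the dominant melody representation above.
    """
    bins, _ = np.nonzero(melody)
    if bins.size == 0:                      # silent track: keep the middle band
        center = melody.shape[0] // 2
    else:
        center = int(round(bins.mean()))    # crude estimate of the mean pitch bin
    lo = max(0, center - n_semitones // 2)
    hi = min(melody.shape[0], lo + n_semitones)
    lo = max(0, hi - n_semitones)           # shift back if the window hit the top
    return melody[lo:hi, :]
```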

The PCP and the CQT have been log-compressed and trimmed at -80 dB. All features have been globally normalized to the range [0, 1].
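
For readers who want to compute comparable features themselves, here is a hedged librosa sketch; the sampling rate, hop length (2048 samples at 22,050 Hz, i.e. roughly 93 ms per frame), number of octaves and per-track normalization are assumptions, not the exact settings used to build the dataset:

```python
import numpy as np
import librosa

SR = 22050                 # assumed sampling rate
HOP = 2048                 # 2048 / 22050 ≈ 93 ms per frame (assumption)
DURATION = 180             # first 3 minutes of audio
N_FRAMES = int(np.ceil(DURATION * SR / HOP))

def extract_cqt_and_pcp(path):
    """Illustrative CQT/PCP extraction roughly following the description above."""
    y, _ = librosa.load(path, sr=SR, duration=DURATION)

    # CQT with one bin per semitone; the 7-octave range is an assumption.
    cqt = np.abs(librosa.cqt(y, sr=SR, hop_length=HOP,
                             n_bins=7 * 12, bins_per_octave=12))
    # Log-compression trimmed at -80 dB, then normalization to [0, 1]
    # (the released dataset normalizes globally over the whole corpus;
    # per-track normalization is used here only for brevity).
    cqt_db = librosa.amplitude_to_db(cqt, ref=np.max, top_db=80.0)
    cqt_db = (cqt_db - cqt_db.min()) / (cqt_db.max() - cqt_db.min() + 1e-8)

    # Plain PCP (12 pitch classes) at the same hop length.
    pcp = librosa.feature.chroma_cqt(y=y, sr=SR, hop_length=HOP)

    # Zero-pad tracks shorter than 3 minutes to a fixed number of frames.
    pad = max(0, N_FRAMES - cqt_db.shape[1])
    cqt_db = np.pad(cqt_db, ((0, 0), (0, pad)))
    pcp = np.pad(pcp, ((0, 0), (0, pad)))
    return cqt_db, pcp
```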


Training and test datasets

There are two datasets: SHS$_{5+}$, used for training, and SHS$_{4-}$, used for testing.

The SHS$_{5+}$ dataset includes 7460 original works. Each work has at least five covers, for a total of 62311 tracks (i.e. 8.4 covers per work on average).

Examples list
CQT representations (26G)
Dominant melody representations (13G)
PCP representations (5G)
Multi-pitch representations (27G)
Crema-PCP representations (5G)

The SHS$_{4-}$ dataset includes 19445 original works. Each work has at most four covers, for a total of 48483 tracks (i.e. 2.5 covers per work on average). It is entirely disjoint from SHS$_{5+}$, and it is meant to be more realistic than SHS$_{5+}$, since most songs in real-life audio corpora have at most 2 or 3 covers; we use it as the test set.

Examples list
CQT representations (20G)
Dominant melody representations (10G)
PCP representations (4G)
Multi-pitch representations (21G)
Crema-PCP representations (4G)


Each row in the examples.csv file indicates the work label, followed by its cover track labels. Each corresponding representation file is named track_label.ext, where ext is cqt.npz, f0_cqt.npy, multif0_cqt.npz, pcp.npy or cpcp.npy for the CQT, dominant melody, multi-pitch, PCP and crema-PCP, respectively.
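
For instance, the dataset could be walked as follows (a hypothetical sketch: the local file paths and the key names stored inside the .npz archives are assumptions):

```python
import csv
import numpy as np

# examples.csv: each row is a work label followed by its cover track labels.
works = {}
with open('examples.csv') as f:
    for row in csv.reader(f):
        work_label, *track_labels = row
        works[work_label] = track_labels

# Load the representations of the first track of the first work.
track = next(iter(works.values()))[0]
pcp = np.load(f'{track}.pcp.npy')        # plain .npy array
cqt = np.load(f'{track}.cqt.npz')        # .npz archive: inspect cqt.files for its keys
print(track, pcp.shape, cqt.files)
```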


Usage and license

The datasets are licensed under the Creative Commons Attribution-NonCommercial license.

If you use these datasets, please cite our related works about cover detection using dominant melody and multi-pitch representations:

@inproceedings{doras2019cover,
    title={Cover Detection using Dominant Melody Embeddings},
    author={Doras, Guillaume and Peeters, Geoffroy},
    booktitle={Proceedings of ISMIR (International Society for Music Information Retrieval)},
    year={2019}
}
@inproceedings{doras2020prototypical,
    title={A Prototypical Triplet Loss for Cover Detection},
    author={Doras, Guillaume and Peeters, Geoffroy},
    booktitle={ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    year={2020},
    organization={IEEE}
}