Unsupervised multi-view dimensionality reduction with application to audio-visual speaker retrieval