Predicting media memorability with audio, video, and text representations