L-STAP: Learned Spatio-Temporal Adaptive Pooling for video captioning

© ACM, 2019. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in AI4TV 2019, 1st International Workshop on AI for smart TV content production, access and delivery, co-located with the 27th ACM International Conference on Multimedia, 21 October 2019, Nice, France. http://dx.doi.org/10.1145/3347449.3357484
