Harnessing multimodality: Diffusion-based generative modeling and information estimation