Masked multi-time diffusion for multi-modal generative modeling