Generate a video from a text prompt (and an optional source image) using AI, with synchronized native audio.