VideoPoet is a state-of-the-art video generation model developed by Google Research that uses a large language model (LLM) to create high-quality, variable-length videos from a text prompt. It supports a range of tasks, including text-to-video, image-to-video, video editing, stylization, and inpainting. VideoPoet can output videos in either square or portrait orientation and can also generate audio from a video input.
The model is built from a few basic components: a pre-trained video tokenizer and audio tokenizer, an autoregressive language model, and a mixture of multimodal generative learning objectives used for training. It can also generate longer videos by predicting one second of video output from a one-second input clip; this step can be repeated indefinitely, each time conditioning on the most recently generated second, to produce a video of any desired duration.
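The one-second-at-a-time extension described above can be illustrated with a minimal sketch. This is not VideoPoet's code: `extend_video` and `toy_predictor` are hypothetical names, frames are represented as plain integers, and the predictor is a stand-in for the real model, which would operate on video tokens.

```python
from typing import Callable, List

Frame = int          # stand-in for a real frame (or a block of video tokens)
Clip = List[Frame]   # one second of video


def extend_video(
    first_second: Clip,
    predict_next_second: Callable[[Clip], Clip],
    total_seconds: int,
) -> List[Clip]:
    """Grow a video one second at a time, conditioning each step on the last second."""
    if total_seconds < 1:
        raise ValueError("total_seconds must be at least 1")
    seconds = [first_second]
    while len(seconds) < total_seconds:
        # Only the most recent second is passed to the predictor.
        seconds.append(predict_next_second(seconds[-1]))
    return seconds


def toy_predictor(clip: Clip) -> Clip:
    """Hypothetical stand-in for the model: continues a simple frame counter."""
    start = clip[-1] + 1
    return list(range(start, start + len(clip)))


video = extend_video([0, 1, 2, 3], toy_predictor, total_seconds=3)
frames = [frame for second in video for frame in second]
print(len(video), "seconds,", len(frames), "frames")
```

Because each step sees only the previous second, the loop can in principle run for any number of iterations; how well quality and consistency hold up over long runs depends on the actual model, which this sketch does not capture.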
VideoPoet also supports controllable and interactive video editing, which lets users choose the types of desired motion from a larger generated video. In addition, the model supports image-to-video generation and zero-shot stylization, in which an input video is restyled according to a text prompt.
The VideoPoet team comprises a group of researchers who have made equal technical contributions to the development of the model. The project gives an overview of the model in its paper and blog post, and its web pages present additional results for text-to-video, image-to-video, video editing, stylization, and inpainting.
Overall, VideoPoet is a powerful video generation tool that enables users to create high-quality videos with a high degree of temporal consistency using a simple modeling method.