OpenAI, the maker of ChatGPT, has unveiled a new tool that can instantly make short videos in response to written commands. The tool, called Sora, is a text-to-video generator that can create realistic and imaginative videos from text instructions, using a large-scale generative model that operates on spacetime patches of video and image latent codes.
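
As a rough illustration of what "spacetime patches" means, the sketch below cuts a compressed video grid into small blocks that each span a few frames and a small square of pixels, so that each block can be treated as one token. OpenAI has not released Sora's code, so the function name, the grid size, and the patch size here are assumptions chosen only for illustration.

```python
from itertools import product


def spacetime_patches(frames, height, width, pt, ph, pw):
    """Return the (t, y, x) start offsets of non-overlapping spacetime patches.

    Hypothetical helper for illustration; not taken from Sora.
    """
    if frames % pt or height % ph or width % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    # Every combination of a time offset, a row offset, and a column offset
    # marks the corner of one patch covering pt frames and ph x pw positions.
    return list(product(range(0, frames, pt),
                        range(0, height, ph),
                        range(0, width, pw)))


# Assumed latent grid: 16 frames of 32 x 32, cut into 2 x 4 x 4 patches.
patches = spacetime_patches(16, 32, 32, pt=2, ph=4, pw=4)
print(len(patches))  # 8 * 8 * 8 = 512 patches, one token each
```

In this toy setup a longer or higher-resolution clip simply yields more patches, which is one way a single model could handle videos of different durations and sizes.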

Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt. It can also animate static images, turning them into video, and can generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background.

Sora is a breakthrough in video generation: it shows that realistic and imaginative videos can be generated from text instructions alone, without additional input or guidance. It also illustrates both the strengths and the limitations of current large-scale generative models.

Sora is not yet publicly available, but OpenAI has revealed some details and examples of its performance and capabilities. For example, in response to a social media user’s request for “a monkey playing chess in a park”, Sora generated a high-quality video of a monkey sitting on a bench with a chessboard and moving the pieces with its hands. The video was shared by OpenAI CEO Sam Altman on the X platform.

Sora is currently available for red teaming, which helps identify flaws in the AI system, as well as for use by visual artists, designers, and filmmakers to gain feedback on the model, the company said in a statement. According to OpenAI, Sora may confuse the spatial details of a prompt and have difficulty in following a specific camera trajectory. OpenAI said it was also developing tools that can discern if a video was generated by Sora.

Sora differs from similar technologies developed by other tech giants, such as Google and Meta, in quality, diversity, and clip length. Google’s Lumiere and Meta’s Emu Video can also generate videos from text prompts, but the clips they produce run only a few seconds, compared with up to a minute for Sora.

Sora is the latest product from OpenAI, a research organization that aims to create and promote beneficial and ethical AI. OpenAI is backed by Microsoft, and its early backers included prominent investors such as Elon Musk, Peter Thiel, and Reid Hoffman. OpenAI is known for AI models such as ChatGPT, which generates coherent text from prompts, and DALL-E, which creates images from text.

Sora is expected to have many applications, including storytelling, education, communication, and entertainment. However, it also raises ethical and social concerns, such as the potential misuse of the technology for deception, manipulation, or propaganda. OpenAI therefore urges the public and policymakers to be aware of the risks and opportunities of this technology and to use it responsibly.

