What We Know About OpenAI’s Sora So Far – the New Text to Video AI

Yesterday, OpenAI – the creator of ChatGPT – announced Sora, and it took the internet by storm. Sora is the firm’s new AI model aimed at generating videos from text prompts.

That’s probably the reason you’ve seen a bunch of high-quality videos on Twitter in the past 24 hours.

here is sora, our video generation model: https://t.co/CDr4DdCrh1

today we are starting red-teaming and offering access to a limited number of creators. @_tim_brooks @billpeeb @model_mechanic are really incredible; amazing work by them and the team.

remarkable moment.

— Sam Altman (@sama) February 15, 2024

The release drew reactions from some of the most popular internet personalities, including Marques Brownlee, MrBeast, Elon Musk, and many more.

And while there’s a lot of excitement, there are just as many unanswered questions, so let’s dive into what Sora is and what we know so far.


What is Sora?

If ChatGPT is OpenAI’s chat-based model, Sora is the firm’s “AI model that can create realistic and imaginative scenes from text instructions.”

In essence, it’s a text-to-video model. You prompt it with instructions, and it produces a video that’s supposedly of high quality and up to one minute long.

There have already been plenty of examples. For instance, MrBeast replied to Sam Altman’s introductory tweet, asking him to create a video of a “monkey playing chess in a park.”

This was the resulting video:

pic.twitter.com/vb9giSg9np

— Sam Altman (@sama) February 15, 2024

At a glance, the video looks remarkably well done, with high resolution, stunning visuals, and no apparent defects. Upon closer inspection, you can see that the chessboard is out of proportion to the pieces, but beyond that flaw, which we assume could be fixed with additional prompts, the video looks well made.

The official website gives the following explanation of the model and the intentions of OpenAI for it:

We’re teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction.

A detailed technical report is also published on the website for readers who want more information.

What’s Next?

OpenAI admits that Sora, in its current iteration, is not without its flaws:

It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark.

Additionally, it may confuse the spatial details of a prompt, for example by mixing up left and right, and it may struggle with other precise descriptions.

The team has also said they’re building a set of tools to help detect misleading content.

Is Sora Available to the Public?

This is perhaps the first and foremost question on the minds of most of the ChatGPT user base.

To be as precise as possible: no, Sora is not yet available to the general public. Altman shared that the text-to-video tool is currently in the hands of only a limited number of creators, with red-teaming just getting started.

There is no precise timeline as to when the model will be rolled out to the general public.

The post What We Know About OpenAI’s Sora So Far – the New Text to Video AI appeared first on CryptoPotato.