Introduction
OpenAI’s groundbreaking AI system, Sora, is revolutionizing video creation by generating realistic video footage solely from written prompts or descriptions. This article delves into the potential impact and timeline of Sora’s release.
The Next Big Thing in AI Is Sora
OpenAI’s Sora has captured attention for its ability to create photorealistic videos from text prompts, showcasing a glimpse of its transformative potential in the realm of video production.
Potential 2024 Launch
While an official release date for Sora is yet to be confirmed, hints from OpenAI insiders suggest a possible public launch in late 2024, potentially in Q4. Recent previews on YouTube have showcased the impressive photorealistic capabilities of Sora.
Limited Access So Far
Currently, Sora remains in active research and development at OpenAI, with only selected artists, filmmakers, and creatives granted early access to test and provide feedback on this innovative technology.
A Game-Changer for Video Production
Upon its public release, Sora could redefine video content creation by simplifying the process through text prompts, potentially streamlining complex CGI rendering and set-based filming.
Human Creativity Still Required
Despite its advanced capabilities, Sora is not designed to replace human creativity but rather to enhance it. Creators will still play a crucial role in providing vision and narrative direction while leveraging Sora’s visual generation abilities.
Early Samples Hint at Vast Potential
Initial samples of Sora’s output have been awe-inspiring, showcasing the tool’s capacity to simplify tasks that traditionally required extensive VFX teams and budgets.
Impressive, But With Limitations
Sora’s initial launch may come with constraints on video length, processing time, and fidelity, but even in its minimum viable product (MVP) form it has the potential to democratize high-quality visual content creation.
Safety and Ethics a Priority
OpenAI is prioritizing ethical considerations around Sora’s release, focusing on mitigating risks related to deepfakes, copyright issues, and responsible deployment of this powerful generative AI technology.
Unanswered Questions Remain
Several aspects of Sora’s user experience, hardware requirements, prompting methodology, and technical capabilities are yet to be fully disclosed by OpenAI.
A Potential Multimedia Revolution
Sora’s emergence signifies a significant shift in video creation processes. If it lives up to expectations, Sora could catalyze a multimedia revolution by offering new avenues for content creation and storytelling.
Learn more about Sora on our blog: blog.sorastartups.com
Here is the full transcript of the WSJ interview with OpenAI CTO Mira Murati:
The video captures sort of the detail of the prompt
when it comes to the hair and you know,
sort of like professionally-styled women.
- But you can also see some issues.
- Certainly, especially when it comes to the hands.
- [Joanna] These two women, not real.
They were created by Sora,
OpenAI’s text-to-video AI model.
But these two women, very real. - I’m Mira Murati,
CTO of OpenAI. - And former CEO.
- Yes, for two days.
- [Joanna] In November when OpenAI CEO, Sam Altman,
was momentarily ousted,
Murati stepped in.
Now she’s back to her previous job
running all the tech at the company including… - Sora is our video generation model.
It is just based on a text prompt
and it creates this hyper-realistic, beautiful,
highly-detailed video of one-minute length. - [Joanna] I’ve been blown away by the AI-generated videos,
yet also concerned about their impact.
So I asked OpenAI to generate some new videos for me
and sat down with Murati to get some answers.
Reviewing Sora’s videos
How does Sora work? - It’s fundamentally a diffusion model
which is a type of generative model.
It creates a more distilled image
starting from random noise. - [Joanna] Okay, here are the basics.
The AI model analyzed lots of videos
and learned to identify objects and actions.
When given a text prompt,
it creates a scene by defining the timeline
and adding detail to each frame.
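To make that “from random noise to a finished clip” idea more concrete, here is a minimal, illustrative Python sketch of a reverse-diffusion loop. It is only a conceptual toy, not Sora’s actual code: the denoise_step function is a hypothetical stand-in for the trained neural network, and a real video diffusion model refines every frame jointly so that objects stay consistent from one frame to the next.

    import numpy as np

    def denoise_step(x, step, total_steps, rng):
        """Hypothetical stand-in for a trained denoiser: nudges the sample
        toward 'cleaner' content while the remaining noise shrinks each step."""
        predicted_clean = x * 0.9                      # pretend the network predicts less-noisy content
        noise_scale = 1.0 - step / total_steps         # remaining noise decreases over the schedule
        return predicted_clean + noise_scale * rng.normal(0.0, 0.1, size=x.shape)

    def generate(shape=(8, 8), total_steps=50, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.normal(size=shape)                     # begin from pure random noise
        for step in range(total_steps):                # repeatedly refine ("distill") the sample
            x = denoise_step(x, step, total_steps, rng)
        return x

    sample = generate()
    print(sample.shape)

The frame-to-frame consistency that comes up next in the interview is exactly what this toy loop omits: a video model must keep people and objects coherent across every frame it refines.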
What makes this AI video special compared to others
is how smooth and realistic it looks. - If you think about filmmaking,
people have to make sure that each frame continues
into the next frame with a sense of consistency
between objects and people.
And that’s what gives you a sense of realism
and a sense of presence.
And if you break that between frames,
then you get this disconnected sense
and reality is no longer there.
And so this is what Sora does really well. - You can see lots of that smoothness
in the videos OpenAI generated from the prompts I provided.
But you can also see flaws and glitches.
A female video producer on a sidewalk in New York City
holding a high-end cinema camera.
Suddenly, a robot yanks the camera out of her hand. - So in this one,
you can see the model doesn’t follow
the prompt very closely.
The robot doesn’t quite yank the camera out of her hand,
but the person sort of morphs into the robot.
Yeah, a lot of imperfections still. - One thing I noticed there too
is when the cars are going by,
they change colors. - Yeah, so while the model is quite good at continuity,
it’s not perfect.
So you kind of see the yellow cab disappearing
from the frame there for a while
and then it comes back in a different frame. - Would there be a way after the fact to say,
“Fix the taxi cabs in the back?” - Yeah, so eventually.
That’s what we’re trying to figure out,
how to use this technology as a tool
that people can edit and create with. - I wanted to go through one other…
What do you think the prompt was? - It looks like the bull in a china shop.
Yeah, metaphorically,
you’d imagine everything breaking in the scene, right?
And you see in some cases that the bull is stomping
on things and they’re still perfect.
They’re not breaking.
So that’s to be expected this early on.
And eventually, there’s gonna be more steerability
and control and more accuracy
in reflecting the intent of what you want. - And then there was that video of, well, us.
The woman on the left looks like
she has maybe like fingers in one of the shots. - [Mira] Hands actually have their own way of motion
and it’s very difficult to simulate the motion of hands. - In the clip, the mouths move but there’s no sound.
So is audio something you’re working on with Sora? - With Sora specifically,
not in this moment.
But we will eventually.
Optimizing and training Sora
- [Joanna] Every time I watch a Sora clip,
I wonder what videos did this AI model learn from?
Did the model see any clips of Ferdinand
to know what a bull in a china shop should look like?
Was it a fan of SpongeBob? - Wow!
You look real good with a mustache, Mr. Crab. - By the way,
my prompt for this crab said nothing about a mustache.
What data was used to train Sora? - We used publicly available data and licensed data.
- So, videos on YouTube.
- I’m actually not sure about that.
- Okay.
Videos from Facebook, Instagram. - You know, if they were publicly available,
publicly available to use,
there might be the data,
but I’m not sure.
I’m not confident about it. - What about Shutterstock?
I know you guys have a deal with them. - I’m just not gonna go into the details of the data
that was used,
but it was publicly available or licensed data. - [Joanna] After the interview,
Murati confirmed that the licensed data
does include content from Shutterstock.
Those videos are 720p, seconds long.
How long does it take to generate those? - It could take a few minutes
depending on the complexity of the prompt.
Our goal was to really focus
on developing the best capability
and now we will start looking into optimizing the technology
so people can use it at low cost and make it easy to use. - To create these,
you must be using a lot of computing power.
Can you give me a sense of how much computing power
to create something like that
versus a ChatGPT response or a DALL-E image? - ChatGPT and DALL-E are optimized
for the public to be using them,
whereas Sora is really a research output.
It’s much, much more expensive.
We don’t know what it’s going to look like exactly
when we make it available eventually to the public,
but we’re trying to make it available
at similar cost eventually to what we saw with DALL-E. - You said eventually.
When is eventually? - I’m hoping definitely this year,
but could be a few months. - There’s an election in November.
You think before or after that? - You know, that’s certainly a consideration
dealing with the issues of misinformation
and harmful bias.
And we will not be releasing anything
that we don’t feel confident on
when it comes to how it might affect global elections
or other issues.
Concerns around Sora
- Right now Sora is going through red teaming,
AKA the process where people test the tool
to make sure it’s safe, secure, and reliable.
The goal is to identify vulnerabilities, biases,
and other harmful issues.
What are things that just you won’t be able
to generate with this? - Well, we haven’t made those decisions yet,
but I think there will be consistency on our platform.
So similarly to DALL-E
where you can’t generate images of public figures,
I expect that we’ll have a similar policy for Sora.
And right…
For more information on OpenAI’s Sora and its upcoming release date, stay tuned for updates from OpenAI and further insights into this groundbreaking AI technology: https://www.openai.com/sora, https://www.youtube.com/openai, https://www.openai.com/newsroom.