{"id":68191,"date":"2022-10-06T14:45:14","date_gmt":"2022-10-06T11:45:14","guid":{"rendered":"https:\/\/forklog.com\/en\/?p=68191"},"modified":"2025-09-07T12:48:03","modified_gmt":"2025-09-07T09:48:03","slug":"google-unveils-text-to-video-generator-based-on-imagen","status":"publish","type":"post","link":"https:\/\/u1f987.com\/en\/google-unveils-text-to-video-generator-based-on-imagen\/","title":{"rendered":"Google unveils text-to-video generator based on Imagen"},"content":{"rendered":"<p>Researchers at Google announced the development of an artificial intelligence system, Imagen Video, capable of generating video from textual prompts at a resolution of 1280\u00d7768 pixels and a frame rate of 24 frames per second.<\/p>\n<figure class=\\\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\\\">\n<div class=\\\"wp-block-embed__wrapper\\\">\n<blockquote class=\\\"twitter-tweet\\\" data-width=\\\"500\\\" data-dnt=\\\"true\\\">\n<p lang=\\\"en\\\" dir=\\\"ltr\\\">Excited to announce Imagen Video, our new text-conditioned video diffusion model that generates 1280\u00d7768 24fps HD videos! 
<a href=\\\"https:\/\/twitter.com\/hashtag\/ImagenVideo?src=hash&#038;ref_src=twsrc%5Etfw\\\">#ImagenVideo<\/a><a href=\\\"https:\/\/t.co\/JWj3L7MpBU\\\">https:\/\/t.co\/JWj3L7MpBU<\/a><br \/>Work w\/ <a href=\\\"https:\/\/twitter.com\/wchan212?ref_src=twsrc%5Etfw\\\">@wchan212<\/a> <a href=\\\"https:\/\/twitter.com\/Chitwan_Saharia?ref_src=twsrc%5Etfw\\\">@Chitwan_Saharia<\/a> <a href=\\\"https:\/\/twitter.com\/jaywhang_?ref_src=twsrc%5Etfw\\\">@jaywhang_<\/a> <a href=\\\"https:\/\/twitter.com\/RuiqiGao?ref_src=twsrc%5Etfw\\\">@RuiqiGao<\/a> <a href=\\\"https:\/\/twitter.com\/agritsenko?ref_src=twsrc%5Etfw\\\">@agritsenko<\/a> <a href=\\\"https:\/\/twitter.com\/dpkingma?ref_src=twsrc%5Etfw\\\">@dpkingma<\/a> <a href=\\\"https:\/\/twitter.com\/poolio?ref_src=twsrc%5Etfw\\\">@poolio<\/a> <a href=\\\"https:\/\/twitter.com\/mo_norouzi?ref_src=twsrc%5Etfw\\\">@mo_norouzi<\/a> <a href=\\\"https:\/\/twitter.com\/fleet_dj?ref_src=twsrc%5Etfw\\\">@fleet_dj<\/a> <a href=\\\"https:\/\/twitter.com\/TimSalimans?ref_src=twsrc%5Etfw\\\">@TimSalimans<\/a> <a href=\\\"https:\/\/t.co\/eN81LqZW7I\\\">pic.twitter.com\/eN81LqZW7I<\/a><\/p>\n<p>\u2014 Jonathan Ho (@hojonathanho) <a href=\\\"https:\/\/twitter.com\/hojonathanho\/status\/1577712621037445121?ref_src=twsrc%5Etfw\\\">October 5, 2022<\/a><\/p><\/blockquote>\n<p><script async src=\\\"https:\/\/platform.twitter.com\/widgets.js\\\" charset=\\\"utf-8\\\"><\/script>\n<\/div>\n<\/figure>\n<p>The tool is based on the Imagen algorithm, which is an analogue of DALL-E 2 and Stable Diffusion. 
The image generator uses a large pre-trained language neural network and a cascaded diffusion model, combining \\&#8221;a deep level of understanding of words with an unprecedented degree of photorealism\\&#8221;.<\/p>\n<figure class=\\\"wp-block-image\\\"><img decoding=\\\"async\\\" src=\\\"https:\/\/lh6.googleusercontent.com\/zejzlRe-dLHs5wHFs3FUGyivdODrLOTqJEHTovAT_WBomR-YZdS8r2b3flOoZEqgwQCJsei9svs-4zRNJJlHKOcQx2j5-6_QXi0Ku9QJvmgdnYr80o3jEkPDJUsEiigHdHdDiYov63SKbbcS7_GuZK5gZ5FwdFnKt-5s_fqX3ypZmtcyoKvVOVw2Fg\\\" alt=\\\"Google unveils text-to-video generator based on Imagen\\\"\/><figcaption>Images generated by Imagen. Data: Google.<\/figcaption><\/figure>\n<p>According to Google researchers, Imagen Video takes a text description and creates a 16-frame clip at a resolution of 24\u00d748 pixels and a frame rate of 3 <span data-descr=\\\"unit of measurement for frame rate showing how many video frames are displayed per second\\\" class=\\\"old_tooltip\\\">FPS<\/span>. The system then scales up and \\&#8221;predicts\\&#8221; additional frames.<\/p>\n<p>As a result, the algorithm generates a 128-frame animation at a resolution of 1280\u00d7768 pixels and a frame rate of 24 FPS.<\/p>\n<figure class=\\\"wp-block-video\\\"><video controls src=\\\"https:\/\/u1f987.com\/wp-content\/uploads\/cdm-1.mp4\\\"><\/video><figcaption>First stage of generating Imagen Video. Data: Google.<\/figcaption><\/figure>\n<figure class=\\\"wp-block-video\\\"><video controls src=\\\"https:\/\/u1f987.com\/wp-content\/uploads\/cdm-5.mp4\\\"><\/video><figcaption>Intermediate stage of generating Imagen Video. Data: Google.<\/figcaption><\/figure>\n<figure class=\\\"wp-block-video\\\"><video controls src=\\\"https:\/\/u1f987.com\/wp-content\/uploads\/fairytale.mp4\\\"><\/video><figcaption>Final video generated by Imagen Video. 
Data: Google.<\/figcaption><\/figure>\n<p>To train Imagen Video, the developers used 14 million video-caption pairs and 60 million image-text pairs, as well as the public dataset <span data-descr=\\\"a dataset containing more than 400 million image-text pairs\\\" class=\\\"old_tooltip\\\">LAION-400M<\/span>, which helped the model reproduce a range of visual styles.<\/p>\n<figure class=\\\"wp-block-video\\\"><video controls src=\\\"https:\/\/u1f987.com\/wp-content\/uploads\/31.mp4\\\"><\/video><figcaption>Video generated by Imagen Video. Data: Google.<\/figcaption><\/figure>\n<p>During testing, researchers found that the algorithm could produce \\&#8221;watercolor\\&#8221; videos or imitate the style of Van Gogh. They said Imagen Video demonstrated an understanding of depth and three-dimensionality, enabling it to generate videos that look as if they were recorded by a drone.<\/p>\n<figure class=\\\"wp-block-video\\\"><video controls src=\\\"https:\/\/u1f987.com\/wp-content\/uploads\/39.mp4\\\"><\/video><figcaption>Video generated by Imagen Video. Data: Google.<\/figcaption><\/figure>\n<p>The system can also render text correctly.<\/p>\n<blockquote class=\\\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\\\">\n<p>\\&#8221;Unlike Stable Diffusion and DALL-E 2, which struggle to turn a prompt like &#8216;logo for Diffusion&#8217; into readable words, Imagen Video reproduces it without issue,\\&#8221; the project paper states.<\/p>\n<\/blockquote>\n<p>According to Matthew Guzdial, an AI researcher at the University of Alberta, the problem of turning text into video remains unsolved.<\/p>\n<blockquote class=\\\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\\\">\n<p>\\&#8221;We are unlikely to reach something like DALL-E 2 or Midjourney in terms of quality [of video creation] any time soon,\\&#8221; he said.<\/p>\n<\/blockquote>\n<p>To reduce jitter and distortions, the Imagen Video team plans to join forces with the developers of Phenaki. 
This is another Google generator that turns long, detailed prompts into two-minute low-quality clips.<\/p>\n<p>Google also notes that the data used for training contained inappropriate content, which meant Imagen Video sometimes generated clips depicting violence or sexual content. The company therefore does not plan to release the model or its source code until the issue is fixed.<\/p>\n<p>In September, an enthusiast developed a text-to-video animation generator based on Stable Diffusion Video.<\/p>\n<p>In August, TikTok unveiled a tool for creating video backgrounds from text prompts.<\/p>\n<p>In June, Chinese researchers developed the CogVideo transformer with 9 billion parameters to translate text into animation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Researchers at Google announced the development of an artificial intelligence system, Imagen Video, capable of generating video from textual prompts at a resolution of 1280\u00d7768 pixels and a frame rate of 24 frames per 
second.<\/p>\n","protected":false},"author":1,"featured_media":68192,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"select":"1","news_style_id":"1","cryptorium_level":"","_short_excerpt_text":"","creation_source":"","_metatest_mainpost_news_update":false,"footnotes":""},"categories":[3],"tags":[438,738],"class_list":["post-68191","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news-and-analysis","tag-artificial-intelligence","tag-google"],"aioseo_notices":[],"amp_enabled":true,"views":"16","promo_type":"1","layout_type":"1","short_excerpt":"","is_update":"","_links":{"self":[{"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/posts\/68191","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/comments?post=68191"}],"version-history":[{"count":1,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/posts\/68191\/revisions"}],"predecessor-version":[{"id":68193,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/posts\/68191\/revisions\/68193"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/media\/68192"}],"wp:attachment":[{"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/media?parent=68191"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/categories?post=68191"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/tags?post=68191"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}