{"id":63526,"date":"2022-06-27T12:34:51","date_gmt":"2022-06-27T09:34:51","guid":{"rendered":"https:\/\/forklog.com\/en\/?p=63526"},"modified":"2025-09-06T09:28:31","modified_gmt":"2025-09-06T06:28:31","slug":"cerebras-breaks-record-for-training-the-largest-ai-model-on-a-single-device","status":"publish","type":"post","link":"https:\/\/u1f987.com\/en\/cerebras-breaks-record-for-training-the-largest-ai-model-on-a-single-device\/","title":{"rendered":"Cerebras breaks record for training the largest AI model on a single device"},"content":{"rendered":"<p>An American startup Cerebras trained the &#8220;largest AI model&#8221; <a href=\"https:\/\/u1f987.com\/en\/news\/artificial-intelligence-what-it-is-and-how-it-works\">artificial intelligence<\/a> on a single device, equipped with the Wafer Scale Engine 2 (WSE-2) chip the size of a plate. Tom\u2019s Hardware reports.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8220;Using Cerebras&#8217; software platform (CSoft), our customers can easily train modern GPT language models (such as GPT-3 and GPT-J) with up to 20 billion parameters in a single CS-2 system,&#8221; the company said.<\/p>\n<\/blockquote>\n<p>According to the startup&#8217;s representatives, Cerebras Weight Streaming separates compute resources, enabling memory to scale to any amount needed to store the rapidly growing number of parameters in AI workloads.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8220;Models running on a single CS-2 are tuned in minutes, and users can quickly switch between them with only a few keystrokes,&#8221; the statement said.<\/p>\n<\/blockquote>\n<p>Storing up to 20 <a href=\"https:\/\/u1f987.com\/en\/news\/what-is-natural-language-processing\">natural-language processing<\/a> models with billions of parameters on a single chip significantly reduces the training and scaling overhead of thousands of GPUs, the company said. 
They added that this is one of the most challenging aspects of NLP workloads, taking months to complete.<\/p>\n<p>The Wafer Scale Engine 2 chip is built on a 7-nanometer process, contains 850,000 cores, has 40 GB of on-chip memory with a bandwidth of 20 PB\/s and consumes around 15 kW.<\/p>\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"970\" height=\"546\" src=\"https:\/\/u1f987.com\/wp-content\/uploads\/LD9qjpJEtkozwwdbVQQK3d-970-80.png.png\" alt=\"Cerebras broke the record for AI model training on a single device\" class=\"wp-image-177162\" srcset=\"https:\/\/u1f987.com\/wp-content\/uploads\/LD9qjpJEtkozwwdbVQQK3d-970-80.png.png 970w, https:\/\/u1f987.com\/wp-content\/uploads\/LD9qjpJEtkozwwdbVQQK3d-970-80.png-300x169.png 300w, https:\/\/u1f987.com\/wp-content\/uploads\/LD9qjpJEtkozwwdbVQQK3d-970-80.png-768x432.png 768w\" sizes=\"auto, (max-width: 970px) 100vw, 970px\" \/><figcaption>Wafer Scale Engine 2 chip. Data: Cerebras<\/figcaption><\/figure>\n<p>In April 2021, Cerebras unveiled the WSE-2 processor, designed for computations in the field of machine learning and artificial intelligence.<\/p>\n<p>In August, the company built the CS-2 supercomputer. A CS-2-based installation can train an AI model with 120 billion parameters.<\/p>\n<p>In May 2022, the Top500 <a href=\"https:\/\/u1f987.com\/en\/news\/american-supercomputer-tops-top500-ranking\">was led by the American Frontier system<\/a>, developed by Oak Ridge National Laboratory. 
This was the first installation to reach a peak of 1.1 exaflops in the Linpack benchmark.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The American startup Cerebras has trained the &#8216;largest AI model&#8217; on a single device, equipped with the plate-sized Wafer Scale Engine 2 chip.<\/p>\n","protected":false},"author":1,"featured_media":63527,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"select":"1","news_style_id":"1","cryptorium_level":"","_short_excerpt_text":"","creation_source":"","_metatest_mainpost_news_update":false,"footnotes":""},"categories":[3],"tags":[438,1295],"class_list":["post-63526","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news-and-analysis","tag-artificial-intelligence","tag-chips"],"aioseo_notices":[],"amp_enabled":true,"views":"21","promo_type":"1","layout_type":"1","short_excerpt":"","is_update":"","_links":{"self":[{"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/posts\/63526","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/comments?post=63526"}],"version-history":[{"count":1,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/posts\/63526\/revisions"}],"predecessor-version":[{"id":63528,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/posts\/63526\/revisions\/63528"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/media\/63527"}],"wp:attachment":[{"href":"https:\/\/
u1f987.com\/en\/wp-json\/wp\/v2\/media?parent=63526"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/categories?post=63526"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/tags?post=63526"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}