{"id":20734,"date":"2025-01-29T11:30:00","date_gmt":"2025-01-29T09:30:00","guid":{"rendered":"https:\/\/forklog.com\/en\/deepseek-jolts-markets-why-a-chinese-ai-proved-30-times-more-efficient-than-gpt-4\/"},"modified":"2025-01-29T11:30:00","modified_gmt":"2025-01-29T09:30:00","slug":"deepseek-jolts-markets-why-a-chinese-ai-proved-30-times-more-efficient-than-gpt-4","status":"publish","type":"post","link":"https:\/\/u1f987.com\/en\/deepseek-jolts-markets-why-a-chinese-ai-proved-30-times-more-efficient-than-gpt-4\/","title":{"rendered":"DeepSeek jolts markets: why a Chinese AI proved 30 times more efficient than GPT-4"},"content":{"rendered":"<p>In late January, the little-known Chinese startup DeepSeek found itself in the global spotlight. A modest $5.6m investment in developing a new model turned into a blow to markets \u2014 US tech giants collectively <a href=\"https:\/\/www.bloomberg.com\/news\/articles\/2025-01-27\/nasdaq-futures-slump-as-china-s-deepseek-sparks-us-tech-concern\">lost<\/a> almost $1 trillion in market value.<\/p>\n<p>The arrival of a low-cost alternative to ChatGPT, touted as a \u201cSilicon Valley killer\u201d, set the industry abuzz. ForkLog explains where DeepSeek came from, how it succeeded and what may lie ahead for the global market in language models.<\/p>\n<h2 class=\"wp-block-heading\"><strong>DeepSeek\u2019s rise<\/strong><\/h2>\n<p>DeepSeek struck out on its own in May 2023 in Hangzhou, the capital of Zhejiang province. The city is China\u2019s biggest e-commerce hub, home to the headquarters of giants such as Alibaba Group, Geely, Hikvision and Ant Group.<\/p>\n<p>Behind the project stands Liang Wenfeng \u2014 an entrepreneur and co-founder of the hedge fund High-Flyer, <a href=\"https:\/\/www.news10.com\/news\/technology\/ap-upstart-chinese-ai-company-deepseeks-founder-started-out-as-a-low-key-hedge-fund-entrepreneur\/\" target=\"_blank\" rel=\"noopener\" title=\"\">managing<\/a> $8bn in assets. 
Founded in 2015, the firm long showed an interest in machine learning, investing heavily in its own computing infrastructure as well as AI research. DeepSeek emerged from that structure.<\/p>\n<p>In 2020 High-Flyer unveiled the Fire-Flyer I supercomputer costing 200m yuan ($27.6m), specialised for deep learning. A year later came Fire-Flyer II \u2014 a 1bn-yuan ($138m) system equipped with more than 10,000 Nvidia A100 graphics processors.<\/p>\n<p>DeepSeek\u2019s debut model, released in November 2023, immediately demonstrated performance on a par with GPT-4 and was made freely available for researchers and commercial users. In May 2024 the firm launched DeepSeek-V2; its aggressive pricing forced giants such as ByteDance, Tencent, Baidu and Alibaba to cut prices for their AI offerings. DeepSeek remained profitable as rivals booked losses.<\/p>\n<p>In December 2024 the company unveiled DeepSeek-V3, which in tests outperformed the latest from OpenAI and Anthropic. Building on it, the firm created <a href=\"https:\/\/x.com\/deepseek_ai\/status\/1881318130334814301\" target=\"_blank\" rel=\"noopener\" title=\"\">DeepSeek-R1<\/a> and its derivatives \u2014 the backbone of the much-discussed service.<\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXf84SNWL1RbzBsMB72zrRR-lRCj4FRogoATnt465sRcKmQ42FzvjNjv1x8-5XIwG5KnRYwkJfib7VbpxM7bfjS09W1U71xc518yk4eEY4pcSLFEeVcqclrsP0Ooj4ix66T_wLLz?key=bz9NKW_ZFSN9j2Q1DtXHdoCH\" alt=\"DeepSeek jolts markets: why a Chinese AI proved 30 times more efficient than GPT-4\"\/><figcaption class=\"wp-element-caption\">Performance comparisons of DeepSeek models with OpenAI\u2019s across various tests. Source: DeepSeek.<\/figcaption><\/figure>\n<p>The chief advantage of the new model is its strikingly low cost of use. 
Processing 1m tokens with DeepSeek costs just $2.19, whereas OpenAI charges $60 for a comparable volume.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Behind the breakthrough: how DeepSeek-R1 works<\/strong><\/h2>\n<p>According to the <a href=\"https:\/\/github.com\/deepseek-ai\/DeepSeek-R1\/blob\/main\/DeepSeek_R1.pdf\" target=\"_blank\" rel=\"noopener\" title=\"\">published<\/a> study, DeepSeek-R1 is built on reinforcement learning and \u201ccold start\u201d methods. That helped it reach exceptional performance in maths, coding and logical reasoning.<\/p>\n<p>A key feature is the <span data-descr=\"\u201cchain of thought\u201d\" class=\"old_tooltip\">Chain of Thought<\/span> approach, which breaks hard problems into sequential steps, mimicking human reasoning. The system analyses a task, divides it into stages and checks each step for errors before producing a final answer.<\/p>\n<p>The technical execution is notably efficient. DeepSeek-V3, the base model from which R1 was built, was trained on a cluster of 2,048 Nvidia H800 accelerators, consuming about 2.788m <span data-descr=\"graphics processing unit\" class=\"old_tooltip\">GPU<\/span> hours. Optimisation comes from FP8 mixed-precision training and Multi-Token Prediction, which materially lower hardware demands.<\/p>\n<p>The model\u2019s architecture comprises 671bn parameters, of which only 37bn are activated for each token. This Mixture of Experts design enables scaling without a proportional rise in compute costs.<\/p>\n<p>Also noteworthy is the Group Relative Policy Optimization (GRPO) method. 
It lets the model train without a separate critic network, markedly improving efficiency.<\/p>\n<p>As Jim Fan, a senior research manager at Nvidia, noted, this recalls the AlphaZero breakthrough from Google DeepMind, which learned to play Go and chess \u201cwithout prior imitation of human grandmaster moves.\u201d In his words, this is \u201cthe most important takeaway from the research paper.\u201d<\/p>\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\">\n<div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">We are living in a timeline where a non-US company is keeping the original mission of OpenAI alive \u2014 truly open, frontier research that empowers all. It makes no sense. The most entertaining outcome is the most likely.<\/p>\n<p>DeepSeek-R1 not only open-sources a barrage of models but\u2026 <a href=\"https:\/\/t.co\/M7eZnEmCOY\">pic.twitter.com\/M7eZnEmCOY<\/a><\/p>\n<p>\u2014 Jim Fan (@DrJimFan) <a href=\"https:\/\/twitter.com\/DrJimFan\/status\/1881353126210687089?ref_src=twsrc%5Etfw\">January 20, 2025<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script>\n<\/div>\n<\/figure>\n<h2 class=\"wp-block-heading\"><strong>A new way to train language models<\/strong><\/h2>\n<p>DeepSeek\u2019s training strategy is especially striking. Unlike other leading <span data-descr=\"large language models\" class=\"old_tooltip\">LLM<\/span>s, R1 was not taught to reason through conventional supervised fine-tuning on human-labelled data. 
The researchers found a way for the model to develop its own reasoning abilities almost from scratch.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>\u201cInstead of explicitly teaching the model how to solve problems, we simply provide it with the right incentives, and it autonomously develops cutting-edge strategies,\u201d the study says.<\/em><\/p>\n<\/blockquote>\n<p>The model also signals a new AI paradigm: rather than simply piling on training compute, the focus shifts to how much time and resource the model spends thinking before generating an answer. This scaling of test-time compute distinguishes the new class of \u201creasoning models\u201d such as DeepSeek-R1 and OpenAI o1 from their predecessors.<\/p>\n<h2 class=\"wp-block-heading\"><strong>A critical look at the DeepSeek breakthrough<\/strong><\/h2>\n<p>DeepSeek\u2019s success has raised many questions among professionals. Scale AI chief executive Alexandr Wang <a href=\"https:\/\/www.cnbc.com\/video\/2025\/01\/23\/scale-ai-ceo-alexandr-wang-on-u-s-china-ai-race-we-need-to-unleash-u-s-energy-to-enable-ai-boom.html\" target=\"_blank\" rel=\"noopener\" title=\"\">claims<\/a> the company has 50,000 Nvidia H100 chips, which, if true, would run counter to American export curbs.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>\u201cAs I understand it, DeepSeek has 50,000 H100s [\u2026]. 
They cannot talk about them [publicly] because that contradicts US export controls,\u201d Wang said.<\/em><\/p>\n<\/blockquote>\n<p>After the restrictions were introduced, the price of smuggled H100s in China <a href=\"https:\/\/www.tomshardware.com\/pc-components\/gpus\/nvidias-ai-gpus-are-cheaper-to-rent-in-china-than-in-the-us-dollar6-per-hour-for-eight-nvidia-a100-gpus\" target=\"_blank\" rel=\"noopener\" title=\"\">soared<\/a> to $23,000\u201330,000, implying such a cluster would cost $1bn\u20131.5bn.<\/p>\n<p>Bernstein analysts doubt the stated $5.6m cost of training the V3 model and note the absence of figures for developing R1. In the <a href=\"https:\/\/www.ft.com\/content\/c82933fe-be28-463b-8336-d71a2ff5bbbf\" target=\"_blank\" rel=\"noopener\" title=\"\">view<\/a> of Peel Hunt\u2019s Damindu Jayaweera, the public numbers capture only GPU-hours, ignoring other material expenses.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>\u201cIt was trained in under 3 million GPU hours, which corresponds to a training cost of a little over $5 million. By comparison, analysts estimate that training the latest large AI model from Meta cost $60\u201370 million,\u201d Jayaweera said.<\/em><\/p>\n<\/blockquote>\n<p>The politics also raise concerns. 
The <a href=\"https:\/\/www.wsj.com\/tech\/ai\/china-ai-deepseek-chatbot-6ac4ad33\" target=\"_blank\" rel=\"noopener\" title=\"\">participation<\/a> of founder Liang Wenfeng in a closed symposium chaired by China\u2019s premier, Li Qiang, may point to a strategic role for the company in overcoming export curbs and bolstering the country\u2019s technological self\u2011sufficiency.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>\u201cThere is a high likelihood that DeepSeek and many other large Chinese companies are supported by the Chinese government not only in monetary terms,\u201d <\/em><a href=\"https:\/\/time.com\/7210296\/chinese-ai-company-deepseek-stuns-american-ai-industry\/\" target=\"_blank\" rel=\"noopener\" title=\"\"><em>said<\/em><\/a><em> Edward Harris, chief technology officer of Gladstone AI, which works closely with the US government.<\/em><\/p>\n<\/blockquote>\n<p>The <span data-descr=\"application programming interface\" class=\"old_tooltip\">API<\/span> version of R1 also carries built\u2011in censorship mechanisms, especially around topics that are politically sensitive for China. The model refuses to discuss the <a href=\"https:\/\/ru.wikipedia.org\/wiki\/%D0%A1%D0%BE%D0%B1%D1%8B%D1%82%D0%B8%D1%8F_%D0%BD%D0%B0_%D0%BF%D0%BB%D0%BE%D1%89%D0%B0%D0%B4%D0%B8_%D0%A2%D1%8C%D1%8F%D0%BD%D1%8C%D0%B0%D0%BD%D1%8C%D0%BC%D1%8D%D0%BD%D1%8C_(1989)\" target=\"_blank\" rel=\"noopener\" title=\"\">Tiananmen Square events<\/a>, human rights in China or Taiwan\u2019s status, replacing its answers with stock evasions.<\/p>\n<p>Data privacy is another worry. According to DeepSeek\u2019s <a href=\"https:\/\/chat.deepseek.com\/downloads\/DeepSeek%20Privacy%20Policy.html\" target=\"_blank\" rel=\"noopener\" title=\"\">policy<\/a>, users\u2019 personal information is stored on servers in China, potentially exposing the firm to the same sort of problems that beset TikTok. 
The issue may be particularly acute in the US market, where regulators have already <a href=\"https:\/\/u1f987.com\/en\/news\/us-supreme-court-backs-tiktok-law-wazirx-freezes-3m-in-usdt-and-other-cybersecurity-developments\">shown<\/a> heightened attention to Chinese technology companies in the context of personal\u2011data protection.<\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXciAN90khq1715Fbr-fdYMmgzeejauAdLH6Ssnf-w2QkoD4bhKIQFYnIYwzCIPKL-gNaD-SX7yT5QTef9Wf_MBF50nfRnLFotnHz48c-3M-BMcV0O0_bKClgrIz4-TQwAhXHv8kUg?key=bz9NKW_ZFSN9j2Q1DtXHdoCH\" alt=\"DeepSeek jolts markets: why a Chinese AI proved 30 times more efficient than GPT-4\"\/><figcaption class=\"wp-element-caption\">Excerpt from DeepSeek\u2019s privacy policy. Source: DeepSeek.<\/figcaption><\/figure>\n<h2 class=\"wp-block-heading\"><strong>The future of language models after DeepSeek<\/strong><\/h2>\n<p>Controversies aside, DeepSeek\u2019s achievements should not be underestimated. Test results indicate that R1 does outperform American peers on many measures. 
As Alexandr Wang put it, it is \u201ca wake up call for America\u201d, demanding faster innovation and tighter export controls on critical components.<\/p>\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\">\n<div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">DeepSeek is a wake up call for America, but it doesn\u2019t change the strategy:<\/p>\n<p>\u2014 USA must out-innovate &#038;race faster, as we have done in the entire history of AI<br \/>\u2014 Tighten export controls on chips so that we can maintain future leads<\/p>\n<p>Every major breakthrough in AI has been American<\/p>\n<p>\u2014 Alexandr Wang (@alexandr_wang) <a href=\"https:\/\/twitter.com\/alexandr_wang\/status\/1883368885640102092?ref_src=twsrc%5Etfw\">January 26, 2025<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script>\n<\/div>\n<\/figure>\n<p>Although OpenAI still leads the field, DeepSeek\u2019s emergence is reshaping the market for models and infrastructure. If the official numbers hold up, the Chinese company has built a competitive system at far lower cost through innovation and optimisation \u2014 a challenge to the brute-force scaling of compute embraced by many rivals.<\/p>\n<p>Interest in DeepSeek\u2019s techniques is growing: Meta has already <a href=\"https:\/\/www.theinformation.com\/articles\/meta-scrambles-after-chinese-ai-equals-its-own-upending-silicon-valley\" target=\"_blank\" rel=\"noopener\" title=\"\">set up<\/a> four \u201cwar rooms\u201d to study Chinese models, hoping to fold the lessons into its open-source Llama ecosystem.<\/p>\n<p>Some experts see DeepSeek\u2019s success less as a threat to American technological dominance than as a sign of an emerging multipolar AI world. 
As former OpenAI policy staffer Miles Brundage <a href=\"https:\/\/x.com\/Miles_Brundage\/status\/1881868116633993496\" target=\"_blank\" rel=\"noopener\" title=\"\">noted<\/a>:<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>\u201cChina will still get its own superintelligence(s) no more than a year later than the US, if there isn\u2019t a war. So if you don\u2019t want (literally) war, you need a vision for how to navigate multipolar AI outcomes.\u201d<\/em><\/p>\n<\/blockquote>\n<p>We appear to be at the start of a new era in artificial intelligence, where efficiency and optimisation may matter more than sheer computational muscle.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In late January, the little-known Chinese startup DeepSeek found itself in the global spotlight. A modest $5.6m investment in developing a new model turned into a blow to markets \u2014 US tech giants collectively lost almost $1 trillion in market value. The arrival of a low-cost alternative to ChatGPT, touted as a \u201cSilicon Valley killer\u201d, 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":20733,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"select":"","news_style_id":"","cryptorium_level":"","_short_excerpt_text":"","creation_source":"","_metatest_mainpost_news_update":false,"footnotes":""},"categories":[1144],"tags":[438,133,1190],"class_list":["post-20734","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-longreads","tag-artificial-intelligence","tag-china","tag-openai"],"aioseo_notices":[],"amp_enabled":true,"views":"34","promo_type":"","layout_type":"","short_excerpt":"","is_update":"","_links":{"self":[{"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/posts\/20734","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/comments?post=20734"}],"version-history":[{"count":0,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/posts\/20734\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/media\/20733"}],"wp:attachment":[{"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/media?parent=20734"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/categories?post=20734"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/u1f987.com\/en\/wp-json\/wp\/v2\/tags?post=20734"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}