A Chinese lab has created what appears to be one of the most powerful “open” AI models to date.
The model, DeepSeek V3, was developed by the AI firm DeepSeek and was released on Wednesday under a permissive license that allows developers to download and modify it for most applications, including commercial ones.
DeepSeek V3 can handle a range of text-based workloads and tasks, like coding, translating, and writing essays and emails from a descriptive prompt.
According to DeepSeek’s internal benchmark testing, DeepSeek V3 outperforms both downloadable, “openly” available models and “closed” AI models that can only be accessed through an API. In a subset of coding competitions hosted on Codeforces, a platform for programming contests, DeepSeek V3 outperforms other models, including Meta’s Llama 3.1 405B, OpenAI’s GPT-4o, and Alibaba’s Qwen 2.5 72B.
DeepSeek V3 also crushes the competition on Aider Polyglot, a test designed to measure, among other things, whether a model can successfully write new code that integrates into existing code.
DeepSeek-V3!
60 tokens/second (3x faster than V2!)
API compatibility intact
Fully open-source models & papers
671B MoE parameters
37B activated parameters
Trained on 14.8T high-quality tokens
Beats Llama 3.1 405b on almost every benchmark https://t.co/OiHu17hBSI pic.twitter.com/jVwJU07dqf
— Chubby♨️ (@kimmonismus) December 26, 2024
DeepSeek claims that DeepSeek V3 was trained on a dataset of 14.8 trillion tokens. In data science, tokens are used to represent bits of raw data — 1 million tokens is equal to about 750,000 words.
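To put that figure in perspective, here is a back-of-the-envelope conversion using the rough ratio above (1 million tokens to about 750,000 words). The function name and constant are illustrative, and the ratio is a rule of thumb that varies by tokenizer and language, so treat the result as an order-of-magnitude estimate.

```python
# Rule-of-thumb ratio quoted above: 1 million tokens ~ 750,000 words.
WORDS_PER_MILLION_TOKENS = 750_000


def tokens_to_words(tokens: float) -> float:
    """Convert a token count to an approximate word count."""
    return tokens * WORDS_PER_MILLION_TOKENS / 1_000_000


# Training set size claimed by DeepSeek: 14.8 trillion tokens.
training_tokens = 14.8e12
approx_words = tokens_to_words(training_tokens)

print(f"Roughly {approx_words / 1e12:.1f} trillion words")
```

By this estimate, the training set works out to roughly 11 trillion words.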
It’s not just the training set that’s massive. DeepSeek V3 is enormous in size: 671 billion parameters, or 685 billion on AI dev platform Hugging Face. (Parameters are the internal variables models use to make predictions or decisions.) That’s roughly 1.7 times the size of Llama 3.1 405B, which has 405 billion parameters.
DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M).
For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being… https://t.co/EW7q2pQ94B
Parameter count often (but not always) correlates with skill; models with more parameters tend to outperform models with fewer parameters. But large models also require beefier hardware in order to run. An unoptimized version of DeepSeek V3 would need a bank of high-end GPUs to answer questions at reasonable speeds.
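A rough sketch of why a model this size needs multiple GPUs: the estimate below counts only the memory needed to hold the weights. The 80 GB per-GPU figure and the 16-bit and 8-bit storage formats are assumptions for illustration, not details from DeepSeek, and real deployments need additional memory for activations and caches.

```python
import math

TOTAL_PARAMS = 671e9   # parameter count reported for DeepSeek V3
GPU_MEMORY_GB = 80     # assumption: memory on one high-end data center GPU


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed just to store the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def min_gpus_for_weights(params: float, bytes_per_param: float) -> int:
    """Lower bound on GPU count: weights only, ignoring all other memory use."""
    return math.ceil(weight_memory_gb(params, bytes_per_param) / GPU_MEMORY_GB)


# Assumed storage formats: 2 bytes (16-bit) and 1 byte (8-bit) per parameter.
for label, nbytes in [("16-bit", 2), ("8-bit", 1)]:
    gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
    gpus = min_gpus_for_weights(TOTAL_PARAMS, nbytes)
    print(f"{label}: about {gb:,.0f} GB of weights, at least {gpus} GPUs")
```

Under these assumptions, the weights alone exceed a terabyte at 16-bit precision, which is why a single GPU is not enough.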
While it’s not the most practical model, DeepSeek V3 is an achievement in some respects. DeepSeek was able to train the model in around two months using a data center of Nvidia H800 GPUs, hardware that the U.S. Department of Commerce recently restricted Chinese companies from procuring. The company also claims it spent only $5.5 million to train DeepSeek V3, a fraction of the development cost of models like OpenAI’s GPT-4.
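For a sense of scale, the figures can be combined into a rough GPU-hour estimate. This sketch assumes the 2,048-GPU count from the social media post quoted earlier, treats "two months" as 60 days of continuous use, and uses the $5.5 million figure DeepSeek claims. The result is an implied rate under those assumptions, not a number reported by the company.

```python
NUM_GPUS = 2048            # assumption: GPU count from the quoted social media post
TRAINING_DAYS = 60         # assumption: "two months" taken as 60 days, running nonstop
CLAIMED_COST_USD = 5.5e6   # training cost claimed by DeepSeek


def total_gpu_hours(num_gpus: int, days: int) -> int:
    """GPU-hours consumed if every GPU runs 24 hours a day for the whole period."""
    return num_gpus * days * 24


gpu_hours = total_gpu_hours(NUM_GPUS, TRAINING_DAYS)
implied_rate = CLAIMED_COST_USD / gpu_hours

print(f"About {gpu_hours:,} GPU-hours")
print(f"Implied cost of about ${implied_rate:.2f} per GPU-hour")
```

That comes to just under 3 million GPU-hours, which implies a cost of a little under $2 per GPU-hour if the claimed budget is accurate.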
The downside is that the model’s political views are a bit… stilted. Ask DeepSeek V3 about Tiananmen Square, for instance, and it won’t answer.
DeepSeek, being a Chinese company, is subject to benchmarking by China’s internet regulator to ensure its models’ responses “embody core socialist values.” Many Chinese AI systems decline to respond to topics that might raise the ire of regulators, like speculation about the Xi Jinping regime.
DeepSeek, which in late November unveiled DeepSeek-R1, an answer to OpenAI’s o1 “reasoning” model, is a curious organization. It’s backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that uses AI to inform its trading decisions.
High-Flyer builds its own server clusters for model training, one of the most recent of which reportedly has 10,000 Nvidia A100 GPUs and cost 1 billion yuan (~$138 million). Founded by Liang Wenfeng, a computer science graduate, High-Flyer aims to achieve “superintelligent” AI through its DeepSeek org.
In an interview earlier this year, Liang characterized closed-source AI like OpenAI’s as a “temporary” moat. “[It] hasn’t stopped others from catching up,” he noted.
Indeed.