A Mysterious Chinese Open-Source Model Surpassed Llama And Qwen, Trained For Just $5.5 Million
In December 2024, the Chinese artificial intelligence firm DeepSeek released DeepSeek-V3, a very large open-source AI model. The release has drawn significant attention in the AI community for its performance and efficiency. In reported benchmarks, DeepSeek-V3 ranks as the strongest open-source model to date, outperforming prominent competitors such as Meta's Llama 3.1 and Alibaba's Qwen 2.5 on a range of tasks, particularly Chinese-language and math benchmarks.
Overview of DeepSeek-V3
DeepSeek-V3 is an open-source AI model with 671 billion parameters, making it one of the largest open models released to date. It uses a mixture-of-experts (MoE) architecture: rather than running the full network on every input, a router activates only a small subset of the parameters (about 37 billion) for each token. The compute cost per token is therefore a fraction of what a dense model of the same size would need. This design lets DeepSeek-V3 deliver strong performance while keeping training and inference costs down.
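
To make the idea of activating only a subset of parameters concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is an illustration under stated assumptions, not DeepSeek-V3's actual router: the softmax gate, the toy scalar "experts", and the names `route_top_k` and `moe_forward` are all hypothetical choices made for this example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    Generic softmax top-k gating is assumed here for illustration only.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and combine their outputs by weight."""
    out = 0.0
    for idx, weight in route_top_k(router_logits, k):
        out += weight * experts[idx](x)  # unselected experts are never called
    return out


# Toy setup: 8 hypothetical "experts", each just scales its input.
NUM_EXPERTS = 8
experts = [lambda x, s=s: s * x for s in range(1, NUM_EXPERTS + 1)]

rng = random.Random(0)
router_logits = [rng.gauss(0, 1) for _ in range(NUM_EXPERTS)]

selected = route_top_k(router_logits, k=2)
y = moe_forward(3.0, experts, router_logits, k=2)
print("selected experts:", [i for i, _ in selected])
print("output:", y)
```

With `k=2` out of 8 experts, only a quarter of the expert parameters take part in each forward pass. In a real MoE model the router logits are computed from each token's hidden state, so different tokens are sent to different experts.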