TU Berlin experts on the recent success of the Large Language Model (LLM) ‘DeepSeek’ from China, the difference between open source applications like DeepSeek and other LLMs, and the role of Europe in the development of Artificial Intelligence (AI)
Berlin/Germany, January 30, 2025. The experts:
Dr Vera Schmitt (research group leader) and Dr Nils Feldhus (postdoctoral researcher) conduct research in the XplaiNLP group of the Quality and Usability Lab at TU Berlin on high-risk AI applications and develop AI-supported systems for intelligent decision support. Their focus is on high-performance, transparent and explainable AI solutions for fields of application such as recognising disinformation and analysing medical data. In the field of natural language processing, the group works on key topics such as explainable AI, the robustness of large language models (LLMs), the modelling of argumentation structures and human-machine interaction.
Dr Oliver Eberle is a postdoctoral researcher in the Machine Learning Group of the Berlin Institute for the Foundations of Learning and Data (BIFOLD) at TU Berlin. In his research, he is primarily dedicated to explainable artificial intelligence and natural language processing and their applications in the sciences, such as the digital humanities (e.g. computer-aided text processing) and cognitive science. He focusses in particular on the interpretability of the models and develops methods for a better understanding of the underlying mechanisms of Large Language Models (LLM).
1. How do the concepts of DeepSeek and ChatGPT differ?
Schmitt and Feldhus: DeepSeek stands for open-source transparency and efficiency, while ChatGPT focuses on massive computing power and scaling. The former enables customisation and lower costs, while the latter offers optimised performance, but remains proprietary and resource-intensive. However, it must be recognised that DeepSeek is not 100 percent open source, as not all of the training data that went into the model is known, for example. The availability of the model parameters and the much more open communication on the part of DeepSeek, on the other hand, allow initiatives from the open source community such as ‘Open-R1’ to tackle the reproduction of the model with far fewer resources compared to the huge and expensive infrastructure of OpenAI, Microsoft and others.
Eberle: DeepSeek is integrated into the Hugging Face community, a platform that already makes hundreds of open source models and model source code available and plays an important role in the availability, accessibility and transparency of LLMs in both research and industry. DeepSeek has already used other open source models (e.g. the Llama model from Meta) as a basis in the past (e.g. for ‘DeepSeek-R1-Distill-Llama-70b’). This saves computational effort, as distilling models is significantly less computationally intensive than training a new model from scratch. DeepSeek publishes detailed descriptions and technical reports of its models and also describes negative results. This is a helpful contribution to the open source community, as it promotes the improvement of future open LLM systems. In comparison, ChatGPT is proprietary and only the interface is accessible; the exact specification of the model and the trained parameters are not known in detail or openly accessible. As far as I know, neither DeepSeek nor ChatGPT publish the training code or specific data sets.
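The distillation mentioned above trains a smaller ‘student’ model to imitate the output distribution of a larger ‘teacher’ model rather than learning from raw data alone. A minimal sketch of the standard soft-target loss (temperature-scaled KL divergence) in plain Python; all names and numbers here are illustrative, not DeepSeek’s actual training code:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: a higher temperature flattens the
    # distribution, exposing the teacher's relative preferences over
    # "wrong" answers, which is what the student learns from.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between the teacher's and student's distributions
    # at temperature T, scaled by T^2 (the common convention so the
    # gradient magnitude stays comparable across temperatures).
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

The loss is zero when the student exactly reproduces the teacher’s distribution and grows as the two diverge; because the teacher only needs forward passes, this is far cheaper than training a large model from scratch.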
2. Do you already work with other open source large language models (LLMs)?
Schmitt and Feldhus: We work a lot with different LLMs such as LLaMA, Mistral, Qwen, BLOOM and Vicuna, and have also started experimenting with DeepSeek. We use these open source models specifically in various application areas. A particular focus is on disinformation detection, where we use LLMs to analyse narratives in digital media, uncover misinformation and provide explanations for detected misinformation. We also use LLMs to anonymise and process medical data in joint projects with Charité.
Eberle: We work with various models, for example Llama, Mistral, Gemma, Qwen, Mamba, and we focus particularly on interpretability and develop methods to better understand the underlying mechanisms of LLMs.
3. How does the open source approach to large language models specifically support your research? Will DeepSeek further advance your research?
Schmitt and Feldhus: An open source approach to LLMs enables us to customise models specifically for our research. Open access allows us to ensure transparency and make specific architectural adjustments. It also allows us to evaluate models, develop them further and integrate them more effectively into human-AI processes. DeepSeek could further advance our research as it offers more efficient model architectures and new training approaches and makes them reproducible on computers at TU Berlin. Particularly exciting are potential improvements in resource efficiency, but also in multilingual processing and adaptability for specific domains, which could complement and optimise our existing methods.
Eberle: DeepSeek joins other open source model families (Llama, Mistral, Qwen and so on) and enables us to make statements about a wider range of LLMs. The structure of these models is largely comparable and differs mainly in the training approach and the data sets used. DeepSeek now gives us access to a model with state-of-the-art reasoning capabilities, which could lead to new insights into how LLMs solve complex tasks.
4. Why are chip manufacturers like NVIDIA linked to the success/failure of AI?
Schmitt and Feldhus: The success or failure of AI is closely linked to chip manufacturers such as NVIDIA, because modern AI models require enormous computing power, which is mainly provided by specialised GPUs (Graphics Processing Units) and AI accelerators. NVIDIA is a leader in this field with powerful chips such as the H100 and A100 series, which have been specially developed to train artificial intelligence and provide its results quickly. With CUDA, NVIDIA also offers the right software to enable these calculations efficiently. When AI technologies flourish, the demand for these chips naturally rises sharply – companies, research institutions and cloud providers invest massively in GPU clusters. This drives up NVIDIA’s turnover and share price. Conversely, a decline in AI demand or technological shifts towards alternative architectures (as we are now seeing with DeepSeek R1/V3) would reduce dependency on NVIDIA and negatively impact its business to some extent. NVIDIA’s dual monopoly position – hardware and software – makes it difficult to decouple AI successes from the company. As long as DeepSeek also uses GPUs from NVIDIA or CUDA, it is impossible to imagine the AI discourse without NVIDIA. In short, hardware development and the success of AI are symbiotic – advances in AI drive the chip industry, while powerful chips make new AI models possible.
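The scale of the computing power involved can be made concrete with a back-of-the-envelope estimate: a common rule of thumb is that generating text costs roughly two floating-point operations per model parameter per token. A minimal sketch in Python; the model size, throughput and utilisation figures are illustrative assumptions, not measurements of any specific chip:

```python
def generation_flops(n_params, n_tokens):
    # Rule of thumb: a forward pass costs about 2 FLOPs per model
    # parameter, so generating n_tokens costs about 2 * params * tokens.
    return 2 * n_params * n_tokens

def seconds_on_accelerator(flops, peak_flops_per_s, utilisation=0.4):
    # Real workloads rarely reach an accelerator's peak throughput;
    # 40% sustained utilisation is already an optimistic assumption.
    return flops / (peak_flops_per_s * utilisation)

# Illustrative numbers: a 70-billion-parameter model generating
# 1,000 tokens on hardware sustaining 1e15 FLOP/s at 40% utilisation.
flops = generation_flops(70e9, 1_000)
print(seconds_on_accelerator(flops, 1e15))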
5. Did everyone in the community already know about the huge impact of the new Chinese LLM?
Schmitt and Feldhus: Yes, it was foreseeable that China would increasingly invest in the development of powerful LLMs. The progress of DeepSeek and other Chinese models did not come out of nowhere – there have already been huge investments and strategic initiatives in the AI sector in recent years. Therefore, DeepSeek is not a big surprise, but a natural progression to create more resource-efficient LLMs. DeepSeek also builds heavily on existing open source model families such as LLaMA, Mistral and Qwen and expands our ability to analyse a wider range of LLMs. Qwen in particular, also a product of Chinese research, has already made it clear to us that China is a key player here that should not be underestimated. What is remarkable about DeepSeek R1 is that its reasoning ability has improved significantly, giving us new insights into the ability of LLMs to solve complex tasks. This is particularly interesting for more difficult tasks with a higher level of complexity, such as disinformation detection.
Eberle: DeepSeek is well known, and its predecessor DeepSeek-V2 was already quite successful, for example in generating code. I am therefore somewhat surprised by the strong reaction from the media and markets. DeepSeek-V3 is clearly an impressive technical achievement and can help bring open source models on par with the capabilities of proprietary models such as ChatGPT. DeepSeek should nevertheless be seen in the context of the successful development of other open source LLMs.
6. What is Europe’s position in this area?
Schmitt and Feldhus: Currently, the focus within the EU is primarily on the regulation of AI, and not enough resources are being pooled to even remotely counterbalance the USA or China. Especially when we consider investment plans such as Stargate, the EU is currently unable to keep up. The EU cannot currently remain competitive, as promising AI start-ups are often acquired by US companies and/or relocate their headquarters to the US. Regulations and taxes have a significant impact on the innovative strength of NLP (Natural Language Processing) companies within the EU. Nevertheless, the innovativeness of small European labs such as Mistral or Flux (image generation) shows that the European research community wants to participate in global AI development and already has considerable influence; with more investment, these ambitions could be fuelled and Europe could emerge as a real AI player.
Eberle: Europe and Germany are focussing on the development of trustworthy and transparent AI methods. I also have the impression that Europe is specialising in specific applications of LLMs, for example base LLMs for applications in medicine (e.g. aignostics’ RudolfV model for recognising pathology data), law (legal LLMs such as LEGAL-BERT for editing and creating legal texts) or AI methods for quantum chemistry.
7. The DeepSeek application is subject to Chinese censorship. To what extent do such restrictions affect the performance of large language models?
Eberle: The restrictions are usually imposed after the actual model training, so they can be seen as a filter that suppresses unwanted output. I would therefore not assume that unrestricted systems generally perform better. However, if large amounts of data are filtered before training, this could have an impact on the generalisation capability of these models. An important distinction here is whether the model never receives any data on sensitive topics or whether the model is simply not supposed to say anything about them.
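The distinction drawn above can be illustrated with a minimal sketch: a post-hoc filter wraps an arbitrary generator and screens its output, while the underlying model itself is unchanged. All names and the blocked-topic list are purely illustrative, not how any real system implements moderation:

```python
def moderate(generate, blocked_topics):
    # Wraps a text generator with a post-training output filter:
    # the model's knowledge is untouched; only what it says is screened.
    def filtered(prompt):
        text = generate(prompt)
        if any(topic in text.lower() for topic in blocked_topics):
            return "I cannot answer that."
        return text
    return filtered
```

Removing a topic from the training data instead would mean the model never learns about it at all, which can change what it generalises to; the filter above only changes what it is allowed to say.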
Image source: Frank Rietsch on Pixabay