Running DeepSeek locally

Intro

DeepSeek is a new entrant in the field of large language models (LLMs). Developed by DeepSeek AI, a Chinese company specializing in AI technology, it is recognized for its open-source approach to AI development. The models are intended for use by researchers, developers, and AI enthusiasts alike.

DeepSeek’s algorithms, models, and training details are all open-source, allowing users to access, modify, and utilize the code. The company also recruits AI researchers from leading Chinese universities and hires professionals from outside the computer science field to enhance the diversity of knowledge and capabilities within its models.

Why run a model on your local machine

The main concern with DeepSeek is privacy, especially what is done with the data that you send to the public platform. This is a question everyone should ask when using these models.

When you run the model on your local machine, you don’t have to worry about your data being sent to a server and stored there. This is especially important when you are working with sensitive data or when you are working on a project that requires a high level of privacy. Running the model on your local machine also allows you to work offline, which can be useful when you are in a location with limited internet access.

Model distillation

To limit the size of LLMs, a technique called distillation is used. This technique aims to reduce the size of an LLM while preserving the performance of the larger model. A common approach is the 'teacher-student' training method.

The method works like this:

The large model (teacher) generates predictions (e.g., probabilities for next-word prediction). The smaller model (student) is trained to mimic these outputs rather than learning from scratch with traditional supervised learning.
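
In code, the core of this idea is a loss that pushes the student's output distribution toward the teacher's. Below is a minimal sketch in Python/PyTorch; the function and the toy tensors are illustrative only, not DeepSeek's actual training code:

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature so the student also
    # learns from the teacher's near-miss token probabilities.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the two; T^2 keeps gradients on a comparable scale.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature**2

# Toy example: a batch of 4 positions over a 10-token vocabulary.
teacher_logits = torch.randn(4, 10)
student_logits = torch.randn(4, 10, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow into the student only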

For the purpose of this article, I will be using the following models:

  • Deepseek-r1:1.5b (smallest model: 1.5 GB, distilled from qwen2)
  • Deepseek-r1:7b (medium model: 4.7 GB, distilled from qwen2)
  • Deepseek-r1:8b (full model: 4.9 GB, distilled from llama)
  • Deepseek-r1:32b (full model: 20 GB, distilled from qwen2)

For the last model on the list, you will need a very beefy machine. I tried it on a 16 GB RTX 40-series NVIDIA card, and that was not enough to run the model smoothly.

Running on your local machine

Running LLMs on your local machine can be done with Ollama (Mac, Windows, Linux) or Hugging Face (Linux only). For this example I will be using Ollama.

Installation

  • Download and install the Ollama app from https://ollama.com
  • After the installation, open a command window
  • Run the following command to get the DeepSeek model. This will download the model to your machine:
ollama pull deepseek-r1:7b
  • Run the model with the following command:
ollama run deepseek-r1:7b
  • Start typing in your console to talk to the model.
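
Besides the interactive console, Ollama also exposes a local REST API (on port 11434 by default), so you can script the model instead of typing at it. A minimal sketch in Python, assuming the requests package is installed and the 7b model has been pulled:

import requests

# Send a single prompt to the local Ollama server and read the reply.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:7b",
        "prompt": "Explain the difference between a list and a tuple in Python.",
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(response.json()["response"])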

Using a web interface

Talking in the terminal is nice, but using a web interface is even better. For this, you can use Open WebUI, an open-source project that lets you run the model in a web interface. The app can run as a Docker container or as a standalone app and is intended to provide an offline user experience similar to the one offered by ChatGPT.

To run the app as a docker container and use a local instance of Ollama, run the following command:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

This will start the app and you can access it by going to http://localhost:3000 in your browser.

Working with the local DeepSeek model

When you run the model on your own machine, its capabilities are somewhat limited, depending on the version. The models used here are distilled versions of the original and are not as powerful; you get vastly different answers when asking the various versions the same question. As a guideline, the smaller the model, the less accurate its answers. For example, when you ask the following question to the model:

Can you explain CQRS to me and give me an example of how i could implement it in c#?
Deepseek-r1:1.5b:
To implement the CQRS (Multi-Party Quantum State Transfer) protocol in C#,
we need to integrate quantum communication principles with classical data processing among multiple parties.
Here's a structured approach:...
Deepseek-r1:7b:
CQRS, or Command-Query-Response architecture, is an architectural pattern designed to
decouple concerns in object-oriented systems by separating commands
(actions with preconditions and postconditions) from queries (data retrieval).
This separation simplifies state management and allows for consistent interaction
between command and query components.
Below is a step-by-step explanation of the concepts involved and an example implementation in C#.

As you can see, the answer from the smallest model is completely wrong, and the answer from the larger model is better. When you ask the same question to the online (full) model, you get an even more elaborate and correct answer.
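
You can reproduce this comparison yourself by sending the same prompt to each pulled model tag over Ollama's local API. A sketch in Python; adjust the tags to the models you actually downloaded:

import requests

prompt = "Can you explain CQRS to me and give me an example of how i could implement it in c#?"
for tag in ["deepseek-r1:1.5b", "deepseek-r1:7b", "deepseek-r1:8b"]:
    reply = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": tag, "prompt": prompt, "stream": False},
    ).json()["response"]
    # Print only the first few hundred characters of each answer.
    print(f"--- {tag} ---\n{reply[:300]}\n")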

The main selling point of the model is the reasoning component. This lets you discover why the model went off the rails and hallucinated. Because you can follow the thinking path of the model, you can adjust your prompt to get a better result.
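
The deepseek-r1 models emit this reasoning as a <think>...</think> block at the start of their output, so you can separate the thinking path from the final answer programmatically. A small Python sketch (the sample text is made up for illustration):

import re

# deepseek-r1 wraps its chain of thought in <think>...</think> tags.
raw_output = (
    "<think>The user asks about CQRS. That stands for Command Query "
    "Responsibility Segregation...</think>CQRS separates reads from writes..."
)

reasoning = re.search(r"<think>(.*?)</think>", raw_output, re.DOTALL)
answer = re.sub(r"<think>.*?</think>", "", raw_output, flags=re.DOTALL).strip()

print("Reasoning trace:", reasoning.group(1) if reasoning else "(none)")
print("Final answer:", answer)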
