Read the full article on DataCamp: How to Set Up and Run Qwen 3 Locally With Ollama
Learn how to install, set up, and run Qwen3 locally with Ollama, and build a simple Gradio-based reasoning and translation app.
Why Run Qwen3 Locally?
Running Qwen3 locally provides major benefits:
- Privacy: No data leaves your machine.
- Latency: Fast inference without API calls.
- Cost-efficiency: Avoids cloud usage fees.
- Control: Customize model usage and prompts.
- Offline access: Use Qwen3 anywhere, once downloaded.
Qwen3 is optimized for both deep reasoning (via /think) and fast decoding (via /no_think), making it ideal for local experimentation and deployment.
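For example (assuming you have already pulled a Qwen3 tag such as qwen3:8b; the prompts below are only illustrative), you can append either tag to a prompt directly from the command line to toggle the behavior:
ollama run qwen3:8b "Prove that the square root of 2 is irrational. /think"
ollama run qwen3:8b "What is 2 + 2? /no_think"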
Setting Up Qwen3 Locally With Ollama
Step 1: Install Ollama
Download Ollama from ollama.com/download and install it for your operating system. Then verify the installation:
ollama --version
Step 2: Download and Run Qwen3
Run the default model:
ollama run qwen3
Or use a smaller one for lower resource use:
ollama run qwen3:4b
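If you would rather download a model ahead of time without opening an interactive session, Ollama's standard pull and list commands work here too:
ollama pull qwen3:4b
ollama list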
Qwen3 Model Reference Table
Model | Command | Best For |
---|---|---|
Qwen3-0.6B | ollama run qwen3:0.6b | Edge devices, mobile |
Qwen3-1.7B | ollama run qwen3:1.7b | Chatbots, assistants |
Qwen3-4B | ollama run qwen3:4b | General tasks |
Qwen3-8B | ollama run qwen3:8b | Multilingual and reasoning |
Qwen3-14B | ollama run qwen3:14b | Content creation, problem solving |
Qwen3-32B | ollama run qwen3:32b | Context-rich tasks |
Qwen3-30B-A3B (MoE) | ollama run qwen3:30b-a3b | Efficient coding inference |
Qwen3-235B-A22B (MoE) | ollama run qwen3:235b-a22b | Enterprise-grade reasoning |
Step 3: Serve Qwen3 via API
ollama serve
This exposes the API at http://localhost:11434.
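To confirm the server is reachable, you can query the tags endpoint, which returns the models available locally:
curl http://localhost:11434/api/tags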
Using Qwen3 Locally
Option 1: CLI Inference
echo "What is the capital of Brazil? /think" | ollama run qwen3:8b
Option 2: Local API
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3:8b",
  "messages": [{"role": "user", "content": "Define entropy in physics. /think"}],
  "stream": false
}'
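The non-streaming call returns a single JSON object; if you have jq installed, you can extract just the model's reply (a small convenience, not required):
curl -s http://localhost:11434/api/chat -d '{
  "model": "qwen3:8b",
  "messages": [{"role": "user", "content": "Define entropy in physics. /no_think"}],
  "stream": false
}' | jq -r '.message.content'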
Option 3: Python API
import ollama

response = ollama.chat(
    model="qwen3:8b",
    messages=[{"role": "user", "content": "Summarize the theory of evolution. /think"}]
)
print(response["message"]["content"])
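For longer outputs, the ollama Python package can also stream tokens as they are generated by passing stream=True, which yields response chunks you can print incrementally; a minimal sketch:
import ollama

# Stream the answer chunk by chunk instead of waiting for the full response
stream = ollama.chat(
    model="qwen3:8b",
    messages=[{"role": "user", "content": "Summarize the theory of evolution. /no_think"}],
    stream=True
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)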
Building a Local Reasoning App With Qwen3
Step 1: Hybrid Reasoning App (Gradio)
import gradio as gr
import subprocess

def reasoning_qwen3(prompt, mode):
    # Append the selected reasoning tag (/think or /no_think) to the prompt
    prompt_with_mode = f"{prompt} /{mode}"
    # Pipe the prompt into the local Ollama CLI and capture its output
    result = subprocess.run(
        ["ollama", "run", "qwen3:8b"],
        input=prompt_with_mode.encode(),
        stdout=subprocess.PIPE
    )
    return result.stdout.decode()

reasoning_ui = gr.Interface(
    fn=reasoning_qwen3,
    inputs=[
        gr.Textbox(label="Enter your prompt"),
        gr.Radio(["think", "no_think"], label="Reasoning Mode", value="think")
    ],
    outputs="text",
    title="Qwen3 Reasoning Mode Demo",
    description="Switch between /think and /no_think to control response depth."
)
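Calling the ollama CLI through subprocess works, but it starts a new process per request. A minimal alternative sketch, assuming the ollama Python package is installed and the local Ollama server is running, keeps requests in-process (the function name reasoning_qwen3_api is just an illustrative choice):
import ollama

def reasoning_qwen3_api(prompt, mode):
    # Same behavior as reasoning_qwen3, but via the local Ollama HTTP API
    response = ollama.chat(
        model="qwen3:8b",
        messages=[{"role": "user", "content": f"{prompt} /{mode}"}]
    )
    return response["message"]["content"]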
Step 2: Multilingual Translator Tab
def multilingual_qwen3(prompt, lang):
    # Wrap the prompt in a translation instruction unless English is selected
    if lang != "English":
        prompt = f"Translate to {lang}: {prompt}"
    result = subprocess.run(
        ["ollama", "run", "qwen3:8b"],
        input=prompt.encode(),
        stdout=subprocess.PIPE
    )
    return result.stdout.decode()

multilingual_ui = gr.Interface(
    fn=multilingual_qwen3,
    inputs=[
        gr.Textbox(label="Enter your prompt"),
        gr.Dropdown(["English", "French", "Hindi", "Chinese"], label="Target Language", value="English")
    ],
    outputs="text",
    title="Qwen3 Multilingual Translator",
    description="Use Qwen3 locally to translate prompts to different languages."
)
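Before wiring everything into the tabbed interface, it can help to sanity-check the helper directly from Python (the exact output will vary from run to run):
# Quick manual test of the translation helper
print(multilingual_qwen3("Good morning, how are you?", "French"))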
Step 3: Combine Tabs and Launch
demo = gr.TabbedInterface(
    [reasoning_ui, multilingual_ui],
    tab_names=["Reasoning Mode", "Multilingual"]
)

demo.launch(debug=True)
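By default, Gradio serves the app at http://127.0.0.1:7860. If you want to pin the port or make the app reachable from other machines on your network, launch() accepts standard options such as server_name and server_port:
# Optional: bind to all interfaces on a fixed port instead of the default
demo.launch(debug=True, server_name="0.0.0.0", server_port=7860)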
Conclusion
With Qwen3 and Ollama, you can:
- Run powerful LLMs locally.
- Use hybrid reasoning with /think and /no_think.
- Build private, fast, and cost-free applications.
- Translate across 100+ languages.