Search

Showing top 8 results for "DeepSeek"

… I have been trying to deploy deepseek-ai/DeepSeek-R1-Distill-Qwen-32B on Inferentia with a context window larger than 4096 (say, MAX_TOTAL_TOKENS=8192), but there seems to be no pre-compiled model for that. …

Mar 27, 2025 · Simon Pagezy

Open-R1: a fully open reproduction of DeepSeek-R1

… MLA is not properly described in their paper, so it would be important to have code for this. · The code for the models is inside the model repositories, e.g. for V3: https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/modeling_deepseek.py · Is it possible to contribute to this project? · Yes, …

Mar 27, 2025 · Elie Bakouch

Open R1: Update #2

… According to DeepSeek's paper, DeepSeek-Distill-Qwen-7B scores 92.8 on MATH-500 and 55.5 on AIME24, which seems very different from the values in the table, especially for AIME24. …

Feb 6, 2025

Open-source DeepResearch – Freeing our search agents

DeepSeek's reasoning skills are probably especially useful for something like this. But in my mind, particularly for academic research tasks, the propaganda baked into the model is a non-starter. I tested out the new DeepSeek-R1-Distill-Llama-70B-Uncensored-v2-Unbiased model yesterday. …

Mar 27, 2025 · Aymeric Roucher

New in llama.cpp: Model Management

… Otherwise, use the --reasoning-format flag and pass the deepseek value to get pure tokens. Now I can use llama.cpp all the time. …

Dec 11, 2025

FastRTC: The Real-Time Communication Library for Python

… This podcast is generated via ngxson/kokoro-podcast-generator, using DeepSeek-R1 and Kokoro-TTS. …

Jan 12, 2025 · Freddy Boulton

SmolVLM2: Bringing Video Understanding to Every Device

… This podcast is generated via ngxson/kokoro-podcast-generator, using DeepSeek-R1 and Kokoro-TTS. When running this script: python -m mlx_vlm.generate --model mlx-community/SmolVLM2-500M-Video-Instruct-mlx --image https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg --…

Apr 8, 2025 · Orr Zohar

SmolLM3: smol, multilingual, long-context reasoner

… Hi HF team, this SmolLM3 post got me curious about why you chose APO over GRPO, so I dove into comparing approaches across SmolLM3, Tulu3, and DeepSeek-R1. …

Sep 10, 2025 · Elie Bakouch