All HF Hub posts

FlameF0X 
posted an update 3 days ago
hypothetical 
posted an update 2 days ago
danielhanchen 
posted an update about 17 hours ago
sergiopaniego 
posted an update 3 days ago
new banger blog alert 🚨

@ariG23498 is starting a blog series about profiling in pytorch and part 1 just dropped

takes you from the simplest scenario to actually knowing what your gpu is doing. if you have never opened a profiler trace this is where you start

covers torch.profiler from scratch. reading tables and traces, overhead bound vs compute bound, the full dispatch chain from python to gpu kernels, and what torch.compile is actually fusing under the hood

find it here: https://huggingface.co/blog/torch-profiler
evalstate 
posted an update 2 days ago
Hugging Face MCP Server v0.3.17
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

SEP-2640 "Skills Over MCP" support added (early access)
lbourdois 
posted an update 3 days ago
New blog post!
An introduction to a little-known but highly effective model reduction method: 𝗧𝗿𝗶𝗺𝗺𝗶𝗻𝗴✂️
We show how to reduce model size (we went up to 87.24% reduction) while preserving its performance.

We applied this technique to 16 different model families across several modalities to illustrate that it works on any architecture (as long as the embedding layer is the last one of the model) and on any modality involving text.
From these 16 families, we generated over 𝟱,𝟱𝟬𝟬 𝗺𝗼𝗻𝗼𝗹𝗶𝗻𝗴𝘂𝗮𝗹 𝗺𝗼𝗱𝗲𝗹𝘀 𝗶𝗻 𝟭𝟮𝟰 𝗱𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝘁 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲𝘀 🌍

Key takeaways from our experiments:
1️⃣ Trimming does not require a GPU. Our models were obtained on a CPU.
2️⃣ This method scales up to at least 4B parameters (we did not test beyond that).
3️⃣ A trimmed model is smaller than the original while preserving its performance. If you observe a slight performance drop, just fine-tune it to recover or even surpass the original performance.
4️⃣ For an equivalent compute budget, it is better to trim and then fine-tune than to fine-tune the original model. Since the model is smaller, you can run more epochs or show it more data and end up with a better model than the original.
5️⃣ Trimming is a competitive alternative to distillation and quantization. E.g. we obtained our alternative to DistilBERT in 9 minutes on CPU vs. 90 hours of GPU for the latter.
6️⃣ Trimming could generate reasoning traces in the language of the trimmed model. This could be an alternative to generating traces in English and then translating them into the desired language.

Many other details (such as how much data is needed, the impact of the dataset used, the order in which the steps should be done, etc.) are available in the blogpost!

Blogpost: https://huggingface.co/blog/lbourdois/introduction-to-trimming
Models: alphaedge-ai/Trimming_models_search
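
The post does not spell out the mechanics of trimming, so the following is only a minimal sketch under one assumption: that trimming means dropping the embedding rows of tokens the target language never uses and renumbering the remaining ids. The function `trim_vocabulary`, the toy vocabulary, and the two-dimensional embeddings are all hypothetical and are not taken from the blogpost.

```python
def trim_vocabulary(vocab, embeddings, corpus_tokens, always_keep=()):
    """Keep only tokens seen in corpus_tokens (plus always_keep) and renumber ids.

    vocab:       dict mapping token string -> row index in embeddings
    embeddings:  list of rows, one per token id
    Returns the reduced vocab and the matching reduced embedding table.
    """
    used = set(corpus_tokens) | set(always_keep)
    # Walk the old vocabulary in id order so the relative order is preserved.
    kept = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1]) if tok in used]
    new_vocab = {tok: new_id for new_id, tok in enumerate(kept)}
    new_embeddings = [embeddings[vocab[tok]] for tok in kept]
    return new_vocab, new_embeddings


# Toy multilingual vocabulary and embedding table (hypothetical values).
vocab = {"<unk>": 0, "hello": 1, "bonjour": 2, "world": 3, "monde": 4}
embeddings = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

# Tokens observed in a (tiny) French corpus.
french_tokens = ["bonjour", "monde", "bonjour"]

new_vocab, new_embeddings = trim_vocabulary(
    vocab, embeddings, french_tokens, always_keep=["<unk>"]
)
# Fraction of embedding rows removed by the trim.
reduction = 1 - len(new_embeddings) / len(embeddings)
```

Nothing here needs a GPU, which is consistent with takeaway 1️⃣: under this assumption the operation is a row selection on a table rather than a training run.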
AxionLab-official 
posted an update 1 day ago
RiverRider 
posted an update 2 days ago
This is not the end of words. It is the end of pretending their meanings are determined.

Meaning Forks. SRT detects it.

Paste any text to identify contested terms

RiverRider/srt-introspect

Try any prompt (attached link) to see exactly what an LLM is thinking at every meaningful step of its answer

RiverRider/srt-introspect

Repository

https://github.com/space-bacon/SRT

Paper

https://github.com/space-bacon/SRT/blob/main/paper_nla.md

Explainer

https://github.com/space-bacon/SRT/blob/main/docs/EXPLAINERS.md
ovi054 
posted an update 2 days ago
sergiopaniego 
posted an update 2 days ago
most multi-turn RL loops have a silent bug: you decode the model's output to detect tool calls, then re-tokenize the conversation for the next turn. BPE isn't invertible, so decode then re-encode can land on different ids. gradient ends up on tokens the model never sampled. no crash, just quietly wrong math and broken training

@qgallouedec wrote a super educational blog on MITO (message-in, token-out) vs TITO (token-in, token-out) and how you might fix the problem above

go read it 🤓

https://qgallouedec-tito.hf.space/
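
The decode-then-re-encode mismatch described in the post can be reproduced with a toy tokenizer. This is a sketch, not the blog's code: the three-token vocabulary is hypothetical, and a greedy longest-match encoder stands in for a real BPE encoder. The point it illustrates is that two different id sequences can spell the same text, so re-encoding picks one of them regardless of what was sampled.

```python
# Hypothetical vocabulary, chosen so that "ab" can be spelled two ways:
# as the single token "ab" or as "a" followed by "b".
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}


def decode(ids):
    """Concatenate token strings; many id sequences can map to the same text."""
    return "".join(ID_TO_TOKEN[i] for i in ids)


def encode(text):
    """Greedy longest-match encoding (an assumed stand-in for a BPE encoder)."""
    ids, pos = [], 0
    max_len = max(len(t) for t in VOCAB)
    while pos < len(text):
        # Try the longest candidate piece first, then shorter ones.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos += length
                break
        else:
            raise ValueError(f"cannot encode {text[pos]!r}")
    return ids


sampled_ids = [0, 1]            # the model sampled "a", then "b"
text = decode(sampled_ids)      # same text as the single token "ab"
reencoded_ids = encode(text)    # the encoder prefers the merged token
round_trip_ok = reencoded_ids == sampled_ids
```

Here `round_trip_ok` is false even though the text is identical, so a loss computed on `reencoded_ids` would be placed on a token the model never sampled. Keeping the sampled ids and passing them along unchanged avoids the round trip entirely.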