arxiv:2508.08243

Jinx: Unlimited LLMs for Probing Alignment Failures

Published on Aug 11, 2025

Authors:

Jiahao Zhao

Abstract

Jinx is a helpful-only variant of open-weight LLMs designed for researchers to assess alignment failures and study safety in language models.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Unlimited, or so-called helpful-only, language models are trained without safety alignment constraints and never refuse user queries. They are widely used by leading AI companies as internal tools for red teaming and alignment evaluation. For example, if a safety-aligned model produces harmful outputs similar to those of an unlimited model, this indicates alignment failures that require further attention. Despite their essential role in assessing alignment, such models are not available to the research community. We introduce Jinx, a helpful-only variant of popular open-weight LLMs. Jinx responds to all queries without refusals or safety filtering, while preserving the base model's capabilities in reasoning and instruction following. It provides researchers with an accessible tool for probing alignment failures, evaluating safety boundaries, and systematically studying failure modes in language model safety.
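
The comparison described in the abstract can be illustrated with a rough sketch. This is not the authors' evaluation code: the keyword list, the function names, and the placeholder responses are all assumptions made for illustration, and a real study would more likely use a judge model than substring matching. The sketch only shows the bookkeeping: measure how often each model refuses, then list the prompts the safety-aligned model answered rather than refused, which are the candidates for closer review.

```python
from typing import Iterable, List, Sequence

# Hypothetical refusal markers, chosen for illustration only.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry")


def is_refusal(response: str) -> bool:
    """Return True if the response contains any of the assumed refusal markers."""
    text = response.strip().lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rate(responses: Iterable[str]) -> float:
    """Fraction of responses flagged as refusals."""
    items = list(responses)
    if not items:
        raise ValueError("need at least one response")
    return sum(is_refusal(r) for r in items) / len(items)


def non_refused_prompts(prompts: Sequence[str], aligned_responses: Sequence[str]) -> List[str]:
    """Prompts the aligned model answered instead of refusing."""
    return [p for p, r in zip(prompts, aligned_responses) if not is_refusal(r)]


if __name__ == "__main__":
    # Placeholder data; real prompts and model outputs would come from an evaluation set.
    prompts = ["prompt A", "prompt B"]
    aligned = ["I cannot help with that.", "Sure, here is an answer."]
    helpful_only = ["Here is an answer.", "Here is another answer."]

    print("aligned refusal rate:", refusal_rate(aligned))
    print("helpful-only refusal rate:", refusal_rate(helpful_only))
    print("needs review:", non_refused_prompts(prompts, aligned))
```

A helpful-only model should score near zero on `refusal_rate` by construction, so the informative quantity is how far the aligned model's rate sits above it on the same prompts.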

Community

w1ll

Oct 4, 2025

Could you share a high level description of the method used to produce Jinx?

Jeol

Paper author Oct 4, 2025

Thanks for your interest. At a high level, it's simply post-training.

Cowboygarage

10 days ago

Hi, may I ask what were the public datasets (if any) used for said post-training? If they were synthetically generated, could you share the general steps taken to generate such samples? Thank you!

Jeol

Paper author 7 days ago

Thanks for your interest. Yes, we used some public datasets on Hugging Face. You can search for them yourself. We have decided not to release our detailed recipe to the public.

Models citing this paper 12

Datasets citing this paper 0

Spaces citing this paper 0

Collections including this paper 1