BiliSakura/PixelGen-diffusers

Self-contained PixelGen checkpoints for Hugging Face diffusers. Each variant folder ships its own pipeline code, component modules, and weights.

Converted from upstream PixelGen checkpoints using PixelGen-diffusers in Visual-Generative-Foundation-Model-Collection.

Available checkpoints

| Subfolder | Pipeline | Task | Resolution | Model type |
|---|---|---|---|---|
| `PixelGen-XL-16-256/` | `PixelGenC2IPipeline` | class-to-image | 256×256 | PixelGen-XL/16 |
| `PixelGen-XXL-16-512-t2i/` | `PixelGenT2IPipeline` | text-to-image | 512×512 | PixelGen-XXL/16-T2I |

Repo layout

```text
BiliSakura/PixelGen-diffusers/
├── README.md
├── PixelGen-XL-16-256/
│   ├── pipeline.py
│   ├── model_index.json
│   ├── demo.png
│   ├── scheduler/
│   │   ├── scheduler_config.json
│   │   └── scheduling_pixelgen.py
│   └── transformer/
│       ├── config.json
│       └── transformer_jit.py
└── PixelGen-XXL-16-512-t2i/
    ├── pipeline.py
    ├── model_index.json
    ├── conversion_metadata.json
    ├── scheduler/
    │   ├── scheduler_config.json
    │   └── scheduling_pixelgen.py
    ├── text_encoder/
    ├── tokenizer/
    └── transformer/
        ├── config.json
        ├── diffusion_pytorch_model.safetensors
        └── transformer_jit_t2i.py
```
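In diffusers, these folders are tied together by `model_index.json`. Keys starting with `_` hold metadata. Every other component key names a subfolder and maps to a `[module, class]` pair.

The following is a minimal sketch for listing the components of a downloaded variant. The class and module names in `example` are placeholders, not the ones shipped in this repo; open the real file to see those. The `id2label` entry shows how non-component entries are skipped.

```python
import json
from pathlib import Path
from typing import Dict, Tuple


def list_components(model_index: dict) -> Dict[str, Tuple[str, str]]:
    """Return {component_name: (module, class)} from a parsed model_index.json."""
    components: Dict[str, Tuple[str, str]] = {}
    for key, value in model_index.items():
        if key.startswith("_"):
            continue  # metadata such as _class_name or _diffusers_version
        if isinstance(value, list) and len(value) == 2:
            components[key] = (value[0], value[1])
        # anything else (e.g. an embedded id2label dict) is not a component
    return components


def load_model_index(model_dir: str) -> dict:
    """Read model_index.json from a variant folder such as ./PixelGen-XL-16-256."""
    return json.loads((Path(model_dir) / "model_index.json").read_text())


# Placeholder contents for illustration only.
example = {
    "_class_name": "ExamplePipeline",
    "scheduler": ["scheduling_example", "ExampleScheduler"],
    "transformer": ["transformer_example", "ExampleTransformer"],
    "id2label": {"207": "golden retriever"},
}
print(list_components(example))
```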

Each class-conditional variant is self-contained: load it with `custom_pipeline=.../pipeline.py` and `trust_remote_code=True`. PixelGen denoises directly in pixel space, so no VAE is involved.
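Without a VAE, the transformer's input sequence comes straight from RGB pixels. The sketch below works out the resulting token grid for both checkpoints.

It assumes that the `/16` in the model type denotes a 16×16 pixel patch size, as the suffix does in DiT-style names such as DiT-XL/2. `patch_grid` is a hypothetical helper, not part of the pipeline.

```python
from typing import Tuple


def patch_grid(sample_size: int, patch_size: int, channels: int = 3) -> Tuple[int, int]:
    """Return (num_tokens, values_per_token) for a square image patchified in pixel space.

    Hypothetical helper: assumes non-overlapping square patches cut directly
    from the RGB image, with no VAE latent in between.
    """
    if sample_size % patch_size != 0:
        raise ValueError("sample_size must be divisible by patch_size")
    side = sample_size // patch_size                  # patches per row and per column
    num_tokens = side * side                          # sequence length seen by the transformer
    values_per_token = patch_size * patch_size * channels
    return num_tokens, values_per_token


for name, size in [("PixelGen-XL-16-256", 256), ("PixelGen-XXL-16-512-t2i", 512)]:
    tokens, dim = patch_grid(size, patch_size=16)
    print(f"{name}: {tokens} tokens x {dim} pixel values each")
```

Under this assumption, doubling the resolution at a fixed patch size quadruples the sequence length, while each token carries the same number of pixel values.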

ImageNet class labels

For `PixelGen-XL-16-256/`, `id2label` is embedded in `model_index.json` (DiT-style):

- `pipe.id2label`: inspect the id → English label mapping
- `pipe.labels`: reverse map (English synonym → id)
- `pipe.get_label_ids("golden retriever")`: look up the class id(s) for a label
- `pipe(class_labels="golden retriever", ...)`: string labels are resolved automatically
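The reverse map can be pictured with a minimal sketch of the DiT convention, in which each `id2label` value is a comma-separated list of synonyms.

Several pieces here are for illustration only:
- The three-entry `id2label` is a hand-written excerpt.
- `build_label_map` and `get_label_ids` are standalone stand-ins, not the pipeline's own methods.
- String ids are used because JSON object keys are always strings.

```python
from typing import Dict, List, Sequence, Union


def build_label_map(id2label: Dict[str, str]) -> Dict[str, int]:
    """Map every comma-separated synonym to its integer class id."""
    labels: Dict[str, int] = {}
    for key, value in id2label.items():
        for synonym in value.split(","):
            labels[synonym.strip()] = int(key)
    return labels


def get_label_ids(labels: Dict[str, int], label: Union[str, Sequence[str]]) -> List[int]:
    """Resolve one label, or a list of labels, to class ids."""
    names = [label] if isinstance(label, str) else list(label)
    missing = [n for n in names if n not in labels]
    if missing:
        raise ValueError(f"Unknown label(s): {missing}")
    return [labels[n] for n in names]


# Hand-written excerpt of an ImageNet id2label mapping.
id2label = {"1": "goldfish, Carassius auratus", "207": "golden retriever", "208": "Labrador retriever"}
labels = build_label_map(id2label)
print(get_label_ids(labels, "golden retriever"))                 # [207]
print(get_label_ids(labels, ["Carassius auratus", "goldfish"]))  # [1, 1]
```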

Demo

Class 207 — golden retriever, 256×256, 50 steps, guidance_scale=2.25, Heun solver, timeshift=2.0.

Load from Hugging Face

Class-to-image (`PixelGen-XL-16-256`)

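```python
import torch
from diffusers import DiffusionPipeline
from huggingface_hub import snapshot_download

# Hub repo ids take the form "namespace/name", so fetch the variant
# subfolder first and load it from the local snapshot.
repo_dir = snapshot_download(
    "BiliSakura/PixelGen-diffusers",
    allow_patterns=["PixelGen-XL-16-256/*"],
)
model_dir = f"{repo_dir}/PixelGen-XL-16-256"

pipe = DiffusionPipeline.from_pretrained(
    model_dir,
    custom_pipeline=f"{model_dir}/pipeline.py",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Inspect the embedded ImageNet label mapping.
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

generator = torch.Generator(device="cuda").manual_seed(0)
images = pipe(
    class_labels="golden retriever",
    num_inference_steps=50,
    guidance_scale=2.25,
    generator=generator,
).images
```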

Text-to-image (`PixelGen-XXL-16-512-t2i`)

The pipeline uses the bundled Qwen3 text encoder when `text_encoder/` is present; otherwise it downloads the encoder from the path recorded in `conversion_metadata.json`.

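```python
import torch
from diffusers import DiffusionPipeline
from huggingface_hub import snapshot_download

# Fetch only the text-to-image variant, then load it from the local snapshot.
repo_dir = snapshot_download(
    "BiliSakura/PixelGen-diffusers",
    allow_patterns=["PixelGen-XXL-16-512-t2i/*"],
)
model_dir = f"{repo_dir}/PixelGen-XXL-16-512-t2i"

pipe = DiffusionPipeline.from_pretrained(
    model_dir,
    custom_pipeline=f"{model_dir}/pipeline.py",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")  # keep the pipeline on the same device as the CUDA generator below

generator = torch.Generator(device="cuda").manual_seed(42)
images = pipe(
    prompt="A golden retriever playing in a sunny garden",
    num_inference_steps=50,
    guidance_scale=4.0,
    generator=generator,
).images
```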

Load from a local clone

Class-to-image (`PixelGen-XL-16-256`)

```python
from pathlib import Path
import torch
from diffusers import DiffusionPipeline

# Variant folder inside a local clone of BiliSakura/PixelGen-diffusers.
model_dir = Path("./PixelGen-XL-16-256").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(
    class_labels="golden retriever",
    num_inference_steps=50,
    guidance_scale=2.25,
    generator=generator,
).images[0]
image.save("demo.png")
```

Recommended inference settings

| Variant | Steps | CFG scale | Solver | Timeshift | CFG interval |
|---|---|---|---|---|---|
| `PixelGen-XL-16-256` | 50 | 2.25 | `heun` | 2.0 | [0.1, 0.9] |
| `PixelGen-XXL-16-512-t2i` | 25 | 4.0 | `adam_lm` | 3.0 | [0.0, 1.0] |

`height` and `width` are fixed by each checkpoint's `sample_size`; custom sizes are not supported for these exports.
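The Solver and Timeshift columns in the settings table can be made concrete with a toy sketch: a shifted timestep schedule driving Euler and Heun steps of an ODE.

This sketch makes several assumptions:
- It uses the SD3-style shift `t' = s*t / (1 + (s - 1)*t)`, with `t = 1` as pure noise. The exact formula and time convention in `scheduling_pixelgen.py` may differ.
- `adam_lm` is not reproduced here.
- All function names are hypothetical.
- The velocity `toy_velocity` is chosen only because its exact answer is known.

```python
from typing import Callable, List

Velocity = Callable[[float, float], float]  # v(x, t) for a scalar toy state
Step = Callable[[Velocity, float, float, float], float]


def shifted_schedule(num_steps: int, shift: float) -> List[float]:
    """Timesteps from 1.0 (noise) down to 0.0 (data), warped by `shift`.

    shift > 1 spends more of the step budget near the noisy end.
    """
    ts = []
    for i in range(num_steps + 1):
        t = 1.0 - i / num_steps
        ts.append(shift * t / (1.0 + (shift - 1.0) * t))
    return ts


def euler_step(v: Velocity, x: float, t: float, t_next: float) -> float:
    """First-order step using the slope at the current time only."""
    return x + (t_next - t) * v(x, t)


def heun_step(v: Velocity, x: float, t: float, t_next: float) -> float:
    """Second-order step: Euler predictor, then average the two slopes."""
    d1 = v(x, t)
    x_pred = x + (t_next - t) * d1
    d2 = v(x_pred, t_next)
    return x + (t_next - t) * 0.5 * (d1 + d2)


def integrate(v: Velocity, x: float, ts: List[float], step: Step) -> float:
    """Apply `step` across consecutive pairs of the schedule."""
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = step(v, x, t, t_next)
    return x


def toy_velocity(x: float, t: float) -> float:
    # dx/dt = t, so going from t=1 to t=0 changes x by exactly -0.5.
    return t


ts = shifted_schedule(num_steps=4, shift=2.0)
print([round(t, 3) for t in ts])
print("euler:", integrate(toy_velocity, 0.0, ts, euler_step))
print("heun: ", integrate(toy_velocity, 0.0, ts, heun_step))
```

Heun costs two velocity evaluations per step but is exact here because the toy slope is linear in `t`. Euler, using one evaluation, overshoots on the same schedule.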

Interface notes

- Class-conditional generation uses `class_labels` (an integer ImageNet id or an English synonym).
- `guidance_scale > 1.0` enables classifier-free guidance over a null class token.
- `sampling_method` accepts `heun` or `euler` for C2I; T2I defaults to `adam_lm`.
- `noise_scale` defaults to 1.0 at 256×256 and 2.0 at 512×512 when not specified.
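Guidance over a null class token, combined with the CFG interval from the settings table, can be sketched as follows.

The sketch assumes:
- the standard classifier-free guidance update `uncond + w * (cond - uncond)`;
- guidance applied only while `t` lies inside the interval, with the conditional output kept elsewhere;
- a DiT-style null class id equal to the number of classes (1000 for ImageNet).

Whether PixelGen's pipeline uses exactly this null id and interval test is an assumption. `cfg_class_batch` and `guided_prediction` are hypothetical names.

```python
from typing import List, Sequence, Tuple

NUM_CLASSES = 1000  # ImageNet-1k; null id = NUM_CLASSES is an assumed DiT-style convention


def cfg_class_batch(class_ids: Sequence[int]) -> List[int]:
    """Conditional ids followed by one null id per sample, doubling the batch."""
    return list(class_ids) + [NUM_CLASSES] * len(class_ids)


def guided_prediction(
    cond: Sequence[float],
    uncond: Sequence[float],
    t: float,
    guidance_scale: float,
    interval: Tuple[float, float] = (0.1, 0.9),
) -> List[float]:
    """Classifier-free guidance restricted to a timestep interval."""
    low, high = interval
    if guidance_scale > 1.0 and low <= t <= high:
        return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]
    # Outside the interval, or with guidance disabled, keep the conditional output.
    return list(cond)


print(cfg_class_batch([207, 1]))
print(guided_prediction([1.0, 0.5], [0.0, 0.5], t=0.5, guidance_scale=2.25))
print(guided_prediction([1.0, 0.5], [0.0, 0.5], t=0.95, guidance_scale=2.25))
```

With the T2I interval `[0.0, 1.0]`, the check always passes, so guidance is applied at every step.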

Citation

Source paper:

```bibtex
@article{ma2026pixelgen,
  title={PixelGen: Improving Pixel Diffusion with Perceptual Loss},
  author={Zehong Ma and Ruihan Xu and Shiliang Zhang},
  year={2026},
  eprint={2602.02493},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.02493},
}
```

Paper page: PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss (arXiv 2602.02493, published Feb 2).
