BiliSakura/PixelGen-diffusers

Self-contained PixelGen checkpoints for Hugging Face diffusers. Each variant folder ships its own pipeline code, component modules, and weights.

Converted from upstream PixelGen checkpoints using PixelGen-diffusers in Visual-Generative-Foundation-Model-Collection.

Available checkpoints

| Subfolder | Pipeline | Task | Resolution | Model type |
|---|---|---|---|---|
| PixelGen-XL-16-256/ | PixelGenC2IPipeline | class-to-image | 256×256 | PixelGen-XL/16 |
| PixelGen-XXL-16-512-t2i/ | PixelGenT2IPipeline | text-to-image | 512×512 | PixelGen-XXL/16-T2I |

Repo layout

BiliSakura/PixelGen-diffusers/
├── README.md
├── PixelGen-XL-16-256/
│   ├── pipeline.py
│   ├── model_index.json
│   ├── demo.png
│   ├── scheduler/
│   │   ├── scheduler_config.json
│   │   └── scheduling_pixelgen.py
│   └── transformer/
│       ├── config.json
│       └── transformer_jit.py
└── PixelGen-XXL-16-512-t2i/
    ├── pipeline.py
    ├── model_index.json
    ├── conversion_metadata.json
    ├── scheduler/
    │   ├── scheduler_config.json
    │   └── scheduling_pixelgen.py
    ├── text_encoder/
    ├── tokenizer/
    └── transformer/
        ├── config.json
        ├── diffusion_pytorch_model.safetensors
        └── transformer_jit_t2i.py

Each class-conditional variant is self-contained: load with custom_pipeline=.../pipeline.py and trust_remote_code=True. PixelGen denoises directly in pixel space (no VAE).
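Before loading from a local clone, it can help to confirm that a variant folder is complete. The following is a minimal sketch, not part of the repository: the helper name `missing_files` and the list `REQUIRED_C2I` are hypothetical, and the file list is copied from the layout above (which does not name a weight file for this folder, so none is checked).

```python
from pathlib import Path

# Relative paths copied from the repo layout for PixelGen-XL-16-256/.
# Assumption: this list is illustrative; the tree names no weight file here.
REQUIRED_C2I = [
    "pipeline.py",
    "model_index.json",
    "scheduler/scheduler_config.json",
    "scheduler/scheduling_pixelgen.py",
    "transformer/config.json",
    "transformer/transformer_jit.py",
]


def missing_files(variant_dir, required=REQUIRED_C2I):
    """Return the required relative paths that are not files under variant_dir."""
    root = Path(variant_dir)
    return [rel for rel in required if not (root / rel).is_file()]
```

Calling `missing_files("./PixelGen-XL-16-256")` on a complete clone should return an empty list; any returned path points at a file that was not downloaded.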

ImageNet class labels

For PixelGen-XL-16-256/, id2label is embedded in model_index.json (DiT-style).

  • pipe.id2label: inspect the id → English label correspondence
  • pipe.labels: reverse map (English synonym → id)
  • pipe.get_label_ids("golden retriever")
  • pipe(class_labels="golden retriever", ...): string labels are resolved automatically
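To make the label lookup concrete, here is a rough sketch of how a reverse map could be derived from a DiT-style id2label table, where each entry is a comma-separated list of synonyms. This is an assumption about the general approach, not the pipeline's actual code: `build_label_map` and the standalone `get_label_ids` function are hypothetical, and the two-entry table is only an excerpt of the ImageNet-1k labels.

```python
def build_label_map(id2label):
    """Map each lowercase synonym to its class id (the reverse of id2label)."""
    labels = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            labels[name.strip().lower()] = int(class_id)
    return labels


def get_label_ids(labels, query):
    """Resolve one label string, or a list of them, to a list of class ids."""
    queries = [query] if isinstance(query, str) else list(query)
    ids = []
    for q in queries:
        key = q.strip().lower()
        if key not in labels:
            raise ValueError(f"unknown label: {q!r}")
        ids.append(labels[key])
    return ids


# Two-entry excerpt of the ImageNet-1k label table, for illustration only.
id2label = {1: "goldfish, Carassius auratus", 207: "golden retriever"}
labels = build_label_map(id2label)
print(get_label_ids(labels, "golden retriever"))  # [207]
```

Lowercasing both the keys and the query keeps the lookup case-insensitive; whether the real pipeline does the same is not stated here.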

Demo

PixelGen-XL-16-256 demo

Class 207 (golden retriever), 256×256, 50 steps, guidance_scale=2.25, Heun solver, timeshift=2.0.

Load from Hugging Face

Class-to-image (PixelGen-XL-16-256)

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/PixelGen-diffusers/PixelGen-XL-16-256",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

generator = torch.Generator(device="cuda").manual_seed(0)
images = pipe(
    class_labels="golden retriever",
    num_inference_steps=50,
    guidance_scale=2.25,
    generator=generator,
).images

Text-to-image (PixelGen-XXL-16-512-t2i)

The pipeline uses the bundled Qwen3 text encoder when text_encoder/ is present; otherwise it downloads the encoder from the path recorded in conversion_metadata.json.

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/PixelGen-diffusers/PixelGen-XXL-16-512-t2i",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(42)
images = pipe(
    prompt="A golden retriever playing in a sunny garden",
    num_inference_steps=50,
    guidance_scale=4.0,
    generator=generator,
).images

Load from a local clone

Class-to-image (PixelGen-XL-16-256)

from pathlib import Path
import torch
from diffusers import DiffusionPipeline

model_dir = Path("./PixelGen-XL-16-256").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(
    class_labels="golden retriever",
    num_inference_steps=50,
    guidance_scale=2.25,
    generator=generator,
).images[0]
image.save("demo.png")

Recommended inference settings

| Variant | Steps | CFG scale | Solver | Timeshift | CFG interval |
|---|---|---|---|---|---|
| PixelGen-XL-16-256 | 50 | 2.25 | heun | 2.0 | [0.1, 0.9] |
| PixelGen-XXL-16-512-t2i | 25 | 4.0 | adam_lm | 3.0 | [0.0, 1.0] |

height and width are fixed by each checkpoint's sample_size. Custom sizes are not supported for these exports.
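One way to keep these settings in one place is a small lookup table. The sketch below is illustrative only: `RECOMMENDED` and `call_kwargs` are hypothetical names, and the dictionary keys are made up for this example. Only num_inference_steps and guidance_scale appear as call arguments in this README, so the helper returns just those two; the solver, timeshift, and CFG interval are kept as data because how they are passed is not shown here.

```python
# Values copied from the recommended-settings table; key names are illustrative.
RECOMMENDED = {
    "PixelGen-XL-16-256": {
        "steps": 50,
        "cfg_scale": 2.25,
        "solver": "heun",
        "timeshift": 2.0,
        "cfg_interval": (0.1, 0.9),
    },
    "PixelGen-XXL-16-512-t2i": {
        "steps": 25,
        "cfg_scale": 4.0,
        "solver": "adam_lm",
        "timeshift": 3.0,
        "cfg_interval": (0.0, 1.0),
    },
}


def call_kwargs(variant):
    """Build the two keyword arguments that this README's examples pass to pipe()."""
    settings = RECOMMENDED[variant]
    return {
        "num_inference_steps": settings["steps"],
        "guidance_scale": settings["cfg_scale"],
    }
```

With a loaded pipeline, the result could be unpacked into the call, for example `pipe(class_labels="golden retriever", **call_kwargs("PixelGen-XL-16-256"))`.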

Interface notes

  • Class-conditional generation uses class_labels (integer ImageNet id or English synonym).
  • guidance_scale > 1.0 enables classifier-free guidance over a null class token.
  • sampling_method accepts heun or euler for C2I; T2I defaults to adam_lm.
  • noise_scale defaults to 1.0 at 256×256 and 2.0 at 512×512 when not specified.
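For readers unfamiliar with classifier-free guidance, the sketch below shows the standard combination of a conditional and an unconditional (null class) prediction, with guidance optionally limited to a time interval like the CFG interval listed in the recommended settings. It is a generic illustration on plain Python lists, not PixelGen's implementation: the function name `guided_prediction`, the inclusive interval test, and the convention that `t` lies in [0, 1] are all assumptions.

```python
def guided_prediction(cond, uncond, guidance_scale, t=None, interval=(0.0, 1.0)):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond).

    Guidance is skipped (the conditional prediction is returned unchanged)
    when guidance_scale <= 1.0 or when t falls outside the interval.
    """
    low, high = interval
    in_interval = t is None or low <= t <= high
    if guidance_scale <= 1.0 or not in_interval:
        return list(cond)
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy two-element "predictions" to show the effect of the scale.
cond, uncond = [1.0, 2.0], [0.0, 0.0]
print(guided_prediction(cond, uncond, 2.0))  # [2.0, 4.0]
print(guided_prediction(cond, uncond, 2.0, t=0.95, interval=(0.1, 0.9)))  # [1.0, 2.0]
```

A scale of exactly 1.0 reduces the formula to the conditional prediction, which is why guidance only takes effect above that value.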

Citation

Source paper:

@article{ma2026pixelgen,
  title={PixelGen: Improving Pixel Diffusion with Perceptual Loss},
  author={Zehong Ma and Ruihan Xu and Shiliang Zhang},
  year={2026},
  eprint={2602.02493},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.02493},
}