BiliSakura/PixelGen-diffusers
Self-contained PixelGen checkpoints for Hugging Face diffusers. Each variant folder ships its own pipeline code, component modules, and weights.
Converted from upstream PixelGen checkpoints using PixelGen-diffusers in Visual-Generative-Foundation-Model-Collection.
Available checkpoints
| Subfolder | Pipeline | Task | Resolution | Model type |
|---|---|---|---|---|
| PixelGen-XL-16-256/ | PixelGenC2IPipeline | class-to-image | 256×256 | PixelGen-XL/16 |
| PixelGen-XXL-16-512-t2i/ | PixelGenT2IPipeline | text-to-image | 512×512 | PixelGen-XXL/16-T2I |
Repo layout
BiliSakura/PixelGen-diffusers/
├── README.md
├── PixelGen-XL-16-256/
│   ├── pipeline.py
│   ├── model_index.json
│   ├── demo.png
│   ├── scheduler/
│   │   ├── scheduler_config.json
│   │   └── scheduling_pixelgen.py
│   └── transformer/
│       ├── config.json
│       └── transformer_jit.py
└── PixelGen-XXL-16-512-t2i/
    ├── pipeline.py
    ├── model_index.json
    ├── conversion_metadata.json
    ├── scheduler/
    │   ├── scheduler_config.json
    │   └── scheduling_pixelgen.py
    ├── text_encoder/
    ├── tokenizer/
    └── transformer/
        ├── config.json
        ├── diffusion_pytorch_model.safetensors
        └── transformer_jit_t2i.py
Each class-conditional variant is self-contained: load with custom_pipeline=.../pipeline.py and trust_remote_code=True. PixelGen denoises directly in pixel space (no VAE).
ImageNet class labels
For PixelGen-XL-16-256/, id2label is embedded in model_index.json (DiT-style).
- pipe.id2label: inspect the id → English label correspondence
- pipe.labels: reverse map (English synonym → id)
- pipe.get_label_ids("golden retriever"): look up the id for a label
- pipe(class_labels="golden retriever", ...): string labels are resolved automatically
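The lookups listed above can be pictured with a small standalone sketch. This is not the pipeline's own code: the three-entry `ID2LABEL` table is an illustrative excerpt, the helper names `build_label_map` and `resolve_label` are hypothetical, and the comma-separated synonym format and case-insensitive matching are assumptions based on the DiT-style convention mentioned above.

```python
from typing import Dict, Union

# Illustrative three-entry excerpt of an ImageNet id2label table.
# Assumption: synonyms are comma-separated, as in DiT-style configs.
ID2LABEL: Dict[int, str] = {
    0: "tench, Tinca tinca",
    207: "golden retriever",
    208: "Labrador retriever",
}


def build_label_map(id2label: Dict[int, str]) -> Dict[str, int]:
    """Build the reverse map: every comma-separated synonym points to its id."""
    labels: Dict[str, int] = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            labels[name.strip().lower()] = class_id
    return labels


def resolve_label(label: Union[int, str], labels: Dict[str, int]) -> int:
    """Return an integer class id for either an id or an English synonym."""
    if isinstance(label, int):
        return label
    key = label.strip().lower()  # assumption: matching ignores case
    if key not in labels:
        raise ValueError(f"Unknown class label: {label!r}")
    return labels[key]


LABELS = build_label_map(ID2LABEL)
print(resolve_label("golden retriever", LABELS))  # 207
print(resolve_label(207, LABELS))                 # 207
```

Accepting either an integer or a string in one function mirrors how class_labels is described as taking both forms.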
Demo
Class 207 (golden retriever), 256×256, 50 steps, guidance_scale=2.25, Heun solver, timeshift=2.0.
Load from Hugging Face
Class-to-image (PixelGen-XL-16-256)
import torch
from diffusers import DiffusionPipeline

# trust_remote_code=True allows diffusers to run the pipeline code shipped with the checkpoint.
pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/PixelGen-diffusers/PixelGen-XL-16-256",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Look up class 207, then resolve the same label back to its id.
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

# Seed the generator so the samples are reproducible.
generator = torch.Generator(device="cuda").manual_seed(0)
images = pipe(
    class_labels="golden retriever",
    num_inference_steps=50,
    guidance_scale=2.25,
    generator=generator,
).images
Text-to-image (PixelGen-XXL-16-512-t2i)
Uses a bundled Qwen3 text encoder when text_encoder/ is present; otherwise downloads from the path recorded in conversion_metadata.json.
import torch
from diffusers import DiffusionPipeline

# Move the pipeline to the GPU so it matches the CUDA generator created below.
pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/PixelGen-diffusers/PixelGen-XXL-16-512-t2i",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Seed the generator so the samples are reproducible.
generator = torch.Generator(device="cuda").manual_seed(42)
images = pipe(
    prompt="A golden retriever playing in a sunny garden",
    num_inference_steps=50,
    guidance_scale=4.0,
    generator=generator,
).images
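The text encoder fallback described for this variant can be sketched as a small path-resolution helper. This is only an illustration of the described behavior, not the pipeline's actual code: the function name `resolve_text_encoder_source` and the metadata field name `text_encoder_path` are hypothetical, since the README does not state which key conversion_metadata.json uses.

```python
import json
from pathlib import Path


def resolve_text_encoder_source(
    variant_dir: Path,
    metadata_key: str = "text_encoder_path",  # hypothetical field name
) -> str:
    """Prefer the bundled text_encoder/ folder, else read conversion_metadata.json."""
    bundled = variant_dir / "text_encoder"
    if bundled.is_dir():
        return str(bundled)

    metadata_file = variant_dir / "conversion_metadata.json"
    if not metadata_file.is_file():
        raise FileNotFoundError(
            f"Neither text_encoder/ nor {metadata_file.name} found in {variant_dir}"
        )

    metadata = json.loads(metadata_file.read_text(encoding="utf-8"))
    if metadata_key not in metadata:
        raise KeyError(f"{metadata_file.name} has no {metadata_key!r} entry")
    return str(metadata[metadata_key])
```

Checking the bundled folder first keeps loading offline-friendly whenever the checkpoint ships its own encoder.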
Load from a local clone
Class-to-image (PixelGen-XL-16-256)
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

# Absolute path to the variant folder inside the local clone.
model_dir = Path("./PixelGen-XL-16-256").resolve()

# local_files_only=True prevents any download; custom_pipeline points at the bundled pipeline code.
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Seed the generator so the sample is reproducible.
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(
    class_labels="golden retriever",
    num_inference_steps=50,
    guidance_scale=2.25,
    generator=generator,
).images[0]
image.save("demo.png")
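When a full clone is more than needed, a single variant folder can be fetched on its own. The sketch below uses `snapshot_download` from `huggingface_hub` with `allow_patterns`; the helper names `variant_patterns` and `download_variant` are hypothetical, and treating the downloaded folder as equivalent to a local clone is an assumption.

```python
from pathlib import Path
from typing import List


def variant_patterns(variant: str) -> List[str]:
    """Glob patterns that match every file inside one variant folder."""
    return [f"{variant.strip('/')}/*"]


def download_variant(
    variant: str,
    repo_id: str = "BiliSakura/PixelGen-diffusers",
) -> Path:
    """Download one variant folder and return its local path."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_dir = snapshot_download(
        repo_id=repo_id,
        allow_patterns=variant_patterns(variant),
    )
    return Path(snapshot_dir) / variant.strip("/")
```

A call such as `model_dir = download_variant("PixelGen-XL-16-256")` needs network access; the returned path could then be used as model_dir in the local loading example.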
Recommended inference settings
| Variant | Steps | CFG scale | Solver | Timeshift | CFG interval |
|---|---|---|---|---|---|
| PixelGen-XL-16-256 | 50 | 2.25 | heun | 2.0 | [0.1, 0.9] |
| PixelGen-XXL-16-512-t2i | 25 | 4.0 | adam_lm | 3.0 | [0.0, 1.0] |
height and width are fixed by each checkpoint's sample_size. Custom sizes are not supported for these exports.
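For scripting, the recommended settings can be kept in a small lookup table. The sketch below is one possible arrangement, not part of the repository: `RECOMMENDED_SETTINGS` and `call_kwargs` are hypothetical names, and only num_inference_steps and guidance_scale are mapped to call arguments because those are the two that appear in the loading examples. How solver, timeshift, and CFG interval are passed is not stated here, so they stay in the table as plain data.

```python
from typing import Any, Dict

# Values copied from the recommended inference settings table.
RECOMMENDED_SETTINGS: Dict[str, Dict[str, Any]] = {
    "PixelGen-XL-16-256": {
        "steps": 50,
        "cfg_scale": 2.25,
        "solver": "heun",
        "timeshift": 2.0,
        "cfg_interval": (0.1, 0.9),
    },
    "PixelGen-XXL-16-512-t2i": {
        "steps": 25,
        "cfg_scale": 4.0,
        "solver": "adam_lm",
        "timeshift": 3.0,
        "cfg_interval": (0.0, 1.0),
    },
}


def call_kwargs(variant: str) -> Dict[str, Any]:
    """Return the two call arguments used in the loading examples."""
    row = RECOMMENDED_SETTINGS[variant]
    return {
        "num_inference_steps": row["steps"],
        "guidance_scale": row["cfg_scale"],
    }


print(call_kwargs("PixelGen-XXL-16-512-t2i"))
```

The result can be unpacked into a pipeline call, for example `pipe(prompt=..., **call_kwargs("PixelGen-XXL-16-512-t2i"))`.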
Interface notes
- Class-conditional generation uses class_labels (integer ImageNet id or English synonym).
- guidance_scale > 1.0 enables classifier-free guidance over a null class token.
- sampling_method accepts heun or euler for C2I; T2I defaults to adam_lm.
- noise_scale defaults to 1.0 at 256×256 and 2.0 at 512×512 when not specified.
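Two of the notes above can be illustrated with a minimal sketch. It uses the standard classifier-free guidance combination, uncond + scale * (cond - uncond), and applies it only while the timestep lies inside the CFG interval. Whether PixelGen uses exactly this formula, and how it measures the interval, is an assumption. Both function names are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List, Optional, Sequence, Tuple


def guided_prediction(
    cond: Sequence[float],
    uncond: Sequence[float],
    guidance_scale: float,
    t: float,
    cfg_interval: Tuple[float, float] = (0.0, 1.0),
) -> List[float]:
    """Standard CFG, applied only when t falls inside cfg_interval."""
    lo, hi = cfg_interval
    # guidance_scale <= 1.0 disables guidance, as does a timestep outside the interval.
    if guidance_scale <= 1.0 or not (lo <= t <= hi):
        return list(cond)
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


def default_noise_scale(sample_size: int, noise_scale: Optional[float] = None) -> float:
    """Use the caller's value if given, else 1.0 at 256 and 2.0 at 512."""
    if noise_scale is not None:
        return noise_scale
    defaults = {256: 1.0, 512: 2.0}
    if sample_size not in defaults:
        raise ValueError(f"No documented default for sample_size={sample_size}")
    return defaults[sample_size]


print(guided_prediction([1.0], [0.0], guidance_scale=2.0, t=0.5))
print(default_noise_scale(512))
```

Raising on unknown sizes rather than guessing matches the note that custom sizes are not supported for these exports.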
Citation
Source paper:
@article{ma2026pixelgen,
title={PixelGen: Improving Pixel Diffusion with Perceptual Loss},
author={Zehong Ma and Ruihan Xu and Shiliang Zhang},
year={2026},
eprint={2602.02493},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2602.02493},
}