How to use shaswatamitra/falcon-snort-dual-all-mpnet-base-v2 with Transformers:
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("shaswatamitra/falcon-snort-dual-all-mpnet-base-v2", dtype="auto")
all-mpnet-base-v2
Contrastive encoder fine-tuned to map CTI text and SNORT rules into a shared embedding space.
Backbone: sentence-transformers/all-mpnet-base-v2.
| split | recall@1 | F1 | threshold | diag mean (matched pairs) | off-diag mean (mismatched pairs) |
|---|---|---|---|---|---|
| pretrained | 0.8142 | 0.3063 | 0.6441 | 0.5653 | 0.3167 |
| run_0 | 0.9514 | 0.9151 | 0.6778 | 0.8442 | 0.0089 |
| run_1 | 0.9564 | 0.9221 | 0.6777 | 0.8301 | -0.0012 |
| run_2 | 0.9539 | 0.9356 | 0.6818 | 0.8507 | -0.0019 |
| run_3 | 0.9564 | 0.9299 | 0.6863 | 0.8859 | 0.0112 |
| run_4 | 0.9564 | 0.9433 | 0.7022 | 0.9511 | 0.0012 |
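
The columns above can be read as statistics of a CTI-by-rule similarity matrix. The sketch below is one plausible way to compute them, not the authors' evaluation script. It assumes that `sim[i][j]` is the cosine similarity between CTI text `i` and rule `j`, that pair `(i, i)` is the true match, and that F1 is computed over all pairs by thresholding the similarity. The function name `pair_metrics` and the toy matrix are hypothetical.

```python
def pair_metrics(sim, threshold):
    """Summarise a square similarity matrix where (i, i) is the true pair.

    Assumption: F1 treats every (i, j) cell as a binary decision,
    predicted positive when sim[i][j] >= threshold.
    """
    n = len(sim)
    diag = [sim[i][i] for i in range(n)]
    off = [sim[i][j] for i in range(n) for j in range(n) if i != j]

    # recall@1: the matching rule is the top-scoring rule for its CTI text.
    hits = sum(
        1 for i in range(n)
        if max(range(n), key=lambda j: sim[i][j]) == i
    )

    # Pairwise decisions at the given threshold.
    tp = sum(1 for s in diag if s >= threshold)
    fp = sum(1 for s in off if s >= threshold)
    fn = n - tp
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0

    return {
        "recall@1": hits / n,
        "f1": f1,
        "diag_mean": sum(diag) / n,
        "off_diag_mean": sum(off) / len(off) if off else 0.0,
    }


# Toy 3x3 example (made-up numbers, not taken from the table).
sim = [
    [0.90, 0.10, 0.05],
    [0.20, 0.60, 0.70],
    [0.00, 0.10, 0.80],
]
metrics = pair_metrics(sim, threshold=0.65)
print(metrics)
```

Under this reading, a high diagonal mean together with an off-diagonal mean near zero (as in the fine-tuned runs) means matched pairs score high while mismatched pairs are close to orthogonal.
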
Training uses a symmetric InfoNCE (NT-Xent) loss over in-batch negatives. The best checkpoint is selected by validation loss.
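
As a rough illustration of the objective, the following pure-Python sketch computes a symmetric InfoNCE loss from a batch similarity matrix. It is not the training code for this model: the temperature value `0.05` and the function names are assumptions made for the example, and a real training loop would use tensor operations instead of lists.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_info_nce(sim, temperature=0.05):
    """Symmetric InfoNCE over in-batch negatives.

    sim[i][j] is the similarity of CTI text i and rule j; pair (i, i) is
    the positive, every other entry in row i or column i is a negative.
    The temperature is a hypothetical value, not the one used in training.
    """
    n = len(sim)
    logits = [[s / temperature for s in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # rule -> CTI direction

    # Cross-entropy with the diagonal as the target, in both directions.
    cti_to_rule = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    rule_to_cti = -sum(log_softmax(logits_t[i])[i] for i in range(n)) / n
    return 0.5 * (cti_to_rule + rule_to_cti)


# A well-separated batch gives a loss near zero; an uninformative batch
# (all similarities equal) gives log(batch_size).
loss_good = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]])
loss_uniform = symmetric_info_nce([[0.5, 0.5], [0.5, 0.5]])
print(loss_good, loss_uniform)
```

Averaging the two directions is what makes the loss symmetric: each CTI text must pick out its rule among the batch, and each rule must pick out its CTI text.
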
from transformers import AutoModel, AutoTokenizer
tok = AutoTokenizer.from_pretrained("shaswatamitra/falcon-snort-dual-all-mpnet-base-v2", subfolder='rule')
model = AutoModel.from_pretrained("shaswatamitra/falcon-snort-dual-all-mpnet-base-v2", subfolder='rule')
Dual-encoder layout: this repo contains two subfolders, rule/ (encodes SNORT rules) and cti/ (encodes CTI text). Load each encoder by passing subfolder="rule" or subfolder="cti" to from_pretrained.
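
To score CTI text against SNORT rules, both encoders are needed. The sketch below shows one way this might look. Several things in it are assumptions rather than facts from this card: that the cti/ subfolder holds a tokenizer as well as a model, and that embeddings are obtained by attention-masked mean pooling followed by L2 normalisation (the convention of the all-mpnet-base-v2 backbone). The helper names `mean_pool`, `embed`, and `score_pairs` are hypothetical.

```python
import torch
import torch.nn.functional as F

REPO = "shaswatamitra/falcon-snort-dual-all-mpnet-base-v2"


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors over non-padding positions, then L2-normalise.

    Assumption: this mirrors the backbone's pooling; the fine-tuned
    encoders may have been trained with a different pooling scheme.
    """
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return F.normalize(summed / counts, dim=-1)


def embed(texts, side):
    """Encode a list of strings with the 'cti' or 'rule' encoder."""
    # Imported here so the pooling helper works without transformers.
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(REPO, subfolder=side)
    model = AutoModel.from_pretrained(REPO, subfolder=side).eval()
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    return mean_pool(out.last_hidden_state, batch["attention_mask"])


def score_pairs(cti_texts, rules):
    """Cosine similarity matrix: rows are CTI texts, columns are rules."""
    return embed(cti_texts, "cti") @ embed(rules, "rule").T
```

Calling `score_pairs(cti_texts, rules)` with two lists of strings downloads both encoders and returns a matrix of cosine similarities. Entries could then be compared against a decision threshold such as those reported in the table above, under the same pooling assumption.
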
@article{mitra2025falcon,
title={FALCON: Autonomous Cyber Threat Intelligence Mining with LLMs for IDS Rule Generation},
author={Mitra, Shaswata and Bazarov, Azim and Duclos, Martin and Mittal, Sudip and Piplai, Aritran and Rahman, Md Rayhanur and Zieglar, Edward and Rahimi, Shahram},
journal={arXiv preprint arXiv:2508.18684},
year={2025}
}