all-MiniLM-L6-v2
Contrastive dual encoder fine-tuned to map cyber threat intelligence (CTI) text and SNORT rules into a shared embedding space.
Backbone: sentence-transformers/all-MiniLM-L6-v2.
| split | recall@1 | F1 | threshold | diag mean | off-diag mean |
|---|---|---|---|---|---|
| pretrained | 0.7993 | 0.3346 | 0.7216 | 0.9425 | 0.8358 |
| run_0 | 0.9539 | 0.8945 | 0.6762 | 0.8634 | 0.0394 |
| run_1 | 0.9551 | 0.9132 | 0.6886 | 0.8966 | 0.0373 |
| run_2 | 0.9576 | 0.9276 | 0.6931 | 0.9105 | 0.0176 |
| run_3 | 0.9551 | 0.9202 | 0.6902 | 0.9246 | 0.0408 |
| run_4 | 0.9539 | 0.9211 | 0.7031 | 0.9504 | 0.0050 |
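The columns above can be read off a CTI-by-rule cosine-similarity matrix. The sketch below is one plausible reconstruction, not the authors' evaluation script. It assumes that row i of each embedding array is a matched pair, that "diag mean" and "off-diag mean" are the mean similarities of matched and mismatched pairs, and that F1 scores every pair as a binary match decision at the given threshold. The function name `retrieval_stats` is hypothetical.

```python
import numpy as np


def retrieval_stats(cti_emb, rule_emb, threshold):
    """Summarize a CTI-by-rule similarity matrix (assumed metric definitions)."""
    cti_emb = np.asarray(cti_emb, dtype=float)
    rule_emb = np.asarray(rule_emb, dtype=float)

    # L2-normalize so that the dot product equals cosine similarity.
    cti = cti_emb / np.linalg.norm(cti_emb, axis=1, keepdims=True)
    rule = rule_emb / np.linalg.norm(rule_emb, axis=1, keepdims=True)

    sim = cti @ rule.T                 # sim[i, j]: CTI i against rule j
    n = sim.shape[0]
    truth = np.eye(n, dtype=bool)      # assumption: pair i matches pair i only

    # recall@1: fraction of CTI rows whose top-scoring rule is the matched one.
    recall_at_1 = float(np.mean(np.argmax(sim, axis=1) == np.arange(n)))

    # Pairwise F1: a pair is predicted "match" when its similarity clears the threshold.
    pred = sim >= threshold
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0

    return {
        "recall@1": recall_at_1,
        "f1": float(f1),
        "diag_mean": float(sim[truth].mean()),
        "off_diag_mean": float(sim[~truth].mean()),
    }
```

Under this reading, a large gap between the diagonal and off-diagonal means is what lets a single threshold separate matched from mismatched pairs, which fits the jump in F1 between the pretrained row and the fine-tuned runs.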
Symmetric InfoNCE / NT-Xent over in-batch negatives. Best checkpoint selected by validation loss.
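As a rough illustration of the objective named above, the following NumPy sketch computes a symmetric InfoNCE loss in the cross-modal style, where each CTI embedding is contrasted against every rule in the batch and vice versa. It is a forward-pass sketch only and not the training code. The temperature value, the equal weighting of the two directions, and the name `symmetric_info_nce` are assumptions, since the card does not state them.

```python
import numpy as np


def _log_softmax(x, axis):
    # Subtract the max first for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(cti_emb, rule_emb, temperature=0.07):
    """Symmetric InfoNCE over in-batch negatives (temperature is an assumed value)."""
    cti_emb = np.asarray(cti_emb, dtype=float)
    rule_emb = np.asarray(rule_emb, dtype=float)

    # Cosine-similarity logits, scaled by the temperature.
    cti = cti_emb / np.linalg.norm(cti_emb, axis=1, keepdims=True)
    rule = rule_emb / np.linalg.norm(rule_emb, axis=1, keepdims=True)
    logits = cti @ rule.T / temperature

    idx = np.arange(logits.shape[0])
    # CTI -> rule: each row should pick out its own column.
    loss_cti_to_rule = -_log_softmax(logits, axis=1)[idx, idx].mean()
    # Rule -> CTI: each column should pick out its own row.
    loss_rule_to_cti = -_log_softmax(logits, axis=0)[idx, idx].mean()

    # Assumption: the two directions are averaged with equal weight.
    return float(0.5 * (loss_cti_to_rule + loss_rule_to_cti))
```

Every other item in the batch serves as a negative, so no explicit negative mining is needed. The loss falls as matched pairs move together and mismatched pairs move apart.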
# Load the rule-side encoder and its tokenizer from the rule/ subfolder
from transformers import AutoModel, AutoTokenizer
tok = AutoTokenizer.from_pretrained("shaswatamitra/falcon-snort-dual-all-MiniLM-L6-v2", subfolder="rule")
model = AutoModel.from_pretrained("shaswatamitra/falcon-snort-dual-all-MiniLM-L6-v2", subfolder="rule")
Dual-encoder layout: this repo has two subfolders, rule/ (encodes SNORT rules) and cti/ (encodes CTI text). Load each encoder by passing the matching subfolder name (rule or cti) as the subfolder argument.
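A minimal sketch of using both encoders together to score one CTI passage against one SNORT rule. Several details are assumptions the card does not confirm: that the cti/ subfolder ships its own tokenizer as rule/ does, that sentence embeddings come from attention-masked mean pooling (the convention of the all-MiniLM-L6-v2 backbone), and that cosine similarity is the intended score. The helper names are hypothetical.

```python
REPO_ID = "shaswatamitra/falcon-snort-dual-all-MiniLM-L6-v2"
SUBFOLDERS = ("rule", "cti")


def resolve_subfolder(side):
    """Validate the encoder side and return its subfolder name."""
    if side not in SUBFOLDERS:
        raise ValueError(f"side must be one of {SUBFOLDERS}, got {side!r}")
    return side


def load_encoder(side):
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    sub = resolve_subfolder(side)
    tok = AutoTokenizer.from_pretrained(REPO_ID, subfolder=sub)
    model = AutoModel.from_pretrained(REPO_ID, subfolder=sub)
    return tok, model.eval()


def embed(texts, tok, model):
    import torch

    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    # Assumed pooling: mean over real tokens only, then L2-normalize.
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    return torch.nn.functional.normalize(pooled, dim=-1)


def score_pair(cti_text, rule_text):
    """Cosine similarity between a CTI passage and a SNORT rule."""
    cti_tok, cti_model = load_encoder("cti")
    rule_tok, rule_model = load_encoder("rule")
    cti_vec = embed([cti_text], cti_tok, cti_model)
    rule_vec = embed([rule_text], rule_tok, rule_model)
    return float((cti_vec @ rule_vec.T).item())
```

Calling `score_pair` downloads both encoders on first use. Comparing the returned score with a threshold from the table is one way to turn it into a match decision, although the card does not say which run the released weights correspond to.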
@article{mitra2025falcon,
title={FALCON: Autonomous Cyber Threat Intelligence Mining with LLMs for IDS Rule Generation},
author={Mitra, Shaswata and Bazarov, Azim and Duclos, Martin and Mittal, Sudip and Piplai, Aritran and Rahman, Md Rayhanur and Zieglar, Edward and Rahimi, Shahram},
journal={arXiv preprint arXiv:2508.18684},
year={2025}
}
Base model
nreimers/MiniLM-L6-H384-uncased