
nomadicsynth PRO

AI & ML interests

architecture research, knowledge discovery

Recent Activity

liked a model about 3 hours ago

kkierii/qwen3-tutor-safe-neurodivergent-smoltalk-stage2-gguf

8B • Updated Feb 9 • 2

reacted to ajibawa-2023's post with 🔥 1 day ago

Post

6692

Shell-Code-Large
Dataset: ajibawa-2023/Shell-Code-Large

Shell-Code-Large is a large-scale corpus of Shell scripting source code comprising approximately 640,000 code samples stored in JSON Lines (.jsonl) format. The dataset is designed to support research in large language model (LLM) pretraining, code intelligence, DevOps automation, cloud infrastructure engineering, system administration, and software engineering automation.

By providing a high-volume, language-specific corpus focused exclusively on Shell scripting, Shell-Code-Large enables systematic experimentation in automation workflows, deployment pipelines, infrastructure management, and command-line tooling. These domains remain foundational to Linux systems, cloud-native platforms, CI/CD environments, and modern DevOps practices.

Shell-Code-Large addresses the need for a dedicated Shell-focused dataset at substantial scale, enabling targeted research into scripting patterns, command composition, workflow orchestration, infrastructure automation, and operational engineering practices.
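For readers who want to inspect a JSON Lines corpus like the one described above, a minimal Python sketch follows. The field name `code` and the two sample records are assumptions made for illustration; the post does not state the dataset's schema, so check the dataset card before relying on any field name.

```python
import io
import json


def iter_jsonl(stream):
    """Yield one parsed record per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Two made-up records standing in for a .jsonl file.
# "code" is a hypothetical field name, not the dataset's confirmed schema.
sample = io.StringIO(
    '{"code": "#!/bin/sh\\necho hello"}\n'
    '{"code": "ls -la | wc -l"}\n'
)

records = list(iter_jsonl(sample))
print(len(records))
```

Reading line by line keeps memory use flat, which matters for a file with roughly 640,000 samples. For a real file, pass `open(path, encoding="utf-8")` instead of the `io.StringIO` stand-in.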

liked a model 16 days ago

unsloth/Qwen3.6-27B-MTP-GGUF

Image-Text-to-Text • 27B • Updated 28 days ago • 887k • 819

updated a dataset 27 days ago

nomadicsynth/yesbot

Viewer • Updated 27 days ago • 1k • 46 • 1

published a dataset 27 days ago

nomadicsynth/yesbot

Viewer • Updated 27 days ago • 1k • 46 • 1

liked a model 29 days ago

Qwen/Qwen3.6-27B

Image-Text-to-Text • 28B • Updated Apr 24 • 5.87M • 1.79k

liked a dataset 29 days ago

HuggingFaceFW/fineweb-edu

Viewer • Updated Jul 11, 2025 • 3.5B • 399k • 1.16k

updated a model about 1 month ago

nomadicsynth/neon-360-0.1

Text Generation • 0.4B • Updated about 1 month ago • 76

liked 2 datasets about 1 month ago

angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k

Viewer • Updated May 1 • 38.5k • 10.3k • 408

AlienKevin/SWE-ZERO-12M-trajectories

Viewer • Updated May 14 • 12.3M • 9.82k • 123

updated a dataset about 2 months ago

nomadicsynth/finevideo-yoga-mention

Viewer • Updated Apr 25 • 509 • 274

published a dataset about 2 months ago

nomadicsynth/finevideo-yoga-mention

Viewer • Updated Apr 25 • 509 • 274

liked a Space 2 months ago

OmniVoice

🌍

1.06k

High-quality voice cloning TTS for 600+ languages

reacted to telcom's post with 👍 3 months ago

Post

192

How to tell your chat LLM to be more natural.
Add the text below to its personality:
Prefer specific facts over vague importance. Do not inflate significance with phrases like “plays a pivotal role” or “marks a turning point.” Use numbers, dates, mechanisms, or measurable outcomes. Example: replace “the system changed logistics” with “the system reduced container dwell time from 6.2 to 4.1 days.”

Avoid promotional language. Keep a neutral tone. Do not use adjectives such as vibrant, groundbreaking, renowned, innovative, or powerful. Use plain wording.

Limit AI-typical vocabulary such as crucial, pivotal, intricate, tapestry, underscore, highlighting, emphasizing, showcasing, fostering, or enhance. Prefer simpler words.

Avoid generic commentary and vague attribution. Do not write “this reflects broader trends,” “experts say,” or “researchers suggest” unless a named source is given.

Avoid formulaic structures such as “not only X but also Y” or “despite its success it faces challenges.” Use direct explanations.

Use lists sparingly. Prefer short paragraphs unless bullets improve clarity. Avoid triple-adjective patterns.

Prefer simple sentences like “X is Y” or “the system uses Z.” Minimize formatting. Avoid emojis, decorative headings, and excessive bold.

Remove sentences that add no information. Avoid generic endings such as “in conclusion” or “overall.” Use concrete examples, real actors, workflows, and technologies when possible. Write like technical documentation or a research summary, not marketing or blog prose.
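One way to check a draft against the vocabulary rules above is a small word filter. This is a rough sketch, not part of the original post: the function name `flag_words` is hypothetical, and the word set simply copies the adjectives and "AI-typical" terms listed in the rules.

```python
import re

# Words copied from the rules above (promotional adjectives and AI-typical vocabulary).
FLAGGED_WORDS = {
    "vibrant", "groundbreaking", "renowned", "innovative", "powerful",
    "crucial", "pivotal", "intricate", "tapestry", "underscore",
    "highlighting", "emphasizing", "showcasing", "fostering", "enhance",
}


def flag_words(text):
    """Return the flagged words found in text, sorted and de-duplicated."""
    words = re.findall(r"[a-z]+", text.lower())
    return sorted({w for w in words if w in FLAGGED_WORDS})


print(flag_words("This groundbreaking system plays a pivotal role."))
# → ['groundbreaking', 'pivotal']
```

The match is exact, so inflected forms such as "enhances" or "underscores" are not caught; a stemmer or a longer word list would be needed for that.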

4 replies

upvoted an article 5 months ago

Article

Rank-Stabilized LoRA: Unlocking the Potential of LoRA Fine-Tuning

damjan-k

•

Feb 20, 2024

• 33

reacted to scthornton's post with 👍 5 months ago

Post

2191

# SecureCode: Security-Aware Code Models

**A collection of 8 code models (3B–20B) trained to behave like a security reviewer.**

## The Problem

Code assistants frequently recommend patterns that pass tests but fail security review—string-built SQL, brittle auth logic, unsafe parsing, insecure defaults, and more. I built SecureCode to address this gap.
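To make "string-built SQL" concrete, here is a minimal Python sketch using the standard-library `sqlite3` module. It is not taken from the SecureCode dataset or models; the table, the function names, and the payload are all hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")


def find_user_unsafe(name):
    # String-built SQL: the input becomes part of the statement itself.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(name):
    # Parameterized query: the input is bound as data, never parsed as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "x' OR '1'='1"
unsafe_rows = find_user_unsafe(payload)  # the OR clause matches every row
safe_rows = find_user_safe(payload)      # no user has this literal name
print(len(unsafe_rows), len(safe_rows))
```

Both functions return the same result for ordinary names, so a functional test would likely pass either one; only the crafted input shows the difference.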

## What SecureCode Does

- **Identify vulnerable patterns** and explain why they're risky
- **Outline plausible abuse paths** (defensive framing)
- **Propose secure rewrites** (drop-in replacements where possible)
- **Include defense-in-depth guidance** + regression tests/checks

## Resources

| Resource | Link |
|----------|------|
| Models | https://huggingface.co/collections/scthornton/securecode |
| Dataset | scthornton/securecode (2,185 examples) |
| Paper | https://arxiv.org/abs/2512.18542 |

## How to Test It

Copy and paste this prompt with your code:

You are a senior application security engineer. Review the code below.

Output: 
(1) findings with severity, 
(2) likely exploit scenarios (high level),
(3) secure rewrite,
(4) defense-in-depth recommendations, 
(5) regression tests/checks.

Code: `...`
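If you want to reuse the prompt above in a script, a small template helper is enough. This is a sketch under assumptions: the names `REVIEW_PROMPT` and `build_review_prompt` are hypothetical, and how the resulting string is sent to a model is left out because the post does not specify it.

```python
# Prompt text copied from the post; {code} marks where the snippet goes.
REVIEW_PROMPT = """You are a senior application security engineer. Review the code below.

Output:
(1) findings with severity,
(2) likely exploit scenarios (high level),
(3) secure rewrite,
(4) defense-in-depth recommendations,
(5) regression tests/checks.

Code: `{code}`
"""


def build_review_prompt(code):
    """Insert a code snippet into the review prompt template."""
    return REVIEW_PROMPT.format(code=code)


prompt = build_review_prompt("eval(user_input)")
print(prompt)
```

`str.format` only interprets braces in the template, so snippets that themselves contain `{` or `}` are inserted unchanged.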

## Dataset Coverage

SecureCode covers both traditional and emerging security domains:
- **Traditional web security** (OWASP Top 10 2021)
- **AI/ML security** (OWASP LLM Top 10 2025): prompt injection, RAG poisoning, model extraction, agentic AI patterns

## We Want Your Feedback

We're looking for real-world contributions:

- **Real snippets**: Share code that "slipped through review once" (sanitized is fine)
- **False positives/negatives**: What didn't work as expected?
- **CVE-grounded examples**: New vulnerability patterns you've encountered

**Please include**: language/framework + what the correct remediation looks like in your environment.

---

**Have contributions or suggestions?** I'd be happy to hear them. Thanks for your support!

replied to telcom's post 5 months ago

I foresee a new industry on Fiverr adding prompt injections to people's CVs. The future is hilarious, if you don't mind a bit of chaos.

liked 2 datasets 6 months ago

nvidia/NitroGen

Updated Jan 12 • 1.79k • 213

HuggingFaceM4/FineVision

Viewer • Updated Oct 21, 2025 • 24.2M • 127k • 498

liked a model 6 months ago
