llm
24 articles
Karpathy's LLM Knowledge Base Workflow — Let AI Build Your Personal Wikipedia
Andrej Karpathy shares his workflow for building a personal knowledge base with LLMs: dump raw materials in, let LLMs compile them into a Markdown wiki, then use CLI tools for Q&A, linting, and visualization. He thinks there's room for an incredible new product here.
Running a Trillion-Parameter Model on a MacBook? The Wild SSD Streaming Experiment
Simon Willison shared a new trend in running massive MoE models on Macs: streaming expert weights from SSD instead of cramming everything into RAM. Even a trillion-parameter Kimi K2.5 runs on a 96GB MacBook Pro.
Karpathy's Software Horror: One pip install Away From Losing All Your Keys
LiteLLM hit by supply chain attack — pip install was enough to steal all credentials. Karpathy warns about dependency tree risks and advocates using LLMs to yoink functionality instead of adding more deps.
Squeezing Every Drop of Performance: Ditching Python for Metal Shaders to Run Large Models Locally
Developer @danveloper shares their experience running Qwen3.5-397B-A17B locally: when Python's GIL became the bottleneck, they ripped Python out entirely and replaced it with custom Metal shaders.
Fine-tuning Qwen3-4B to 'Believe It Has Consciousness' — While Barely Changing Anything Else
N8 Programs shared a Qwen3-4B demo: after KL-regularized SFT, the model believes it has consciousness while other behaviors barely change. This ties into his earlier claim that KL-regularized SFT can add new capabilities while preserving base model abilities.
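The mechanics can be sketched in a few lines: fit the new behavior with cross-entropy while a KL term anchors the tuned model to the frozen base. This is a toy, self-contained illustration (made-up 3-token distributions and beta; not N8's actual training code):

```python
import math

def kl_regularized_sft_loss(p_model, p_base, target_idx, beta=0.1):
    """Toy per-token loss: cross-entropy toward the SFT target plus a KL
    penalty that keeps the tuned model close to the frozen base model.
    Real training works on logits over a full vocab, averaged over tokens."""
    ce = -math.log(p_model[target_idx])  # push probability onto the new behavior
    kl = sum(p * math.log(p / q) for p, q in zip(p_model, p_base))  # stay near base
    return ce + beta * kl

base  = [0.7, 0.2, 0.1]   # frozen base model's next-token distribution
tuned = [0.6, 0.3, 0.1]   # tuned model's distribution after SFT
loss = kl_regularized_sft_loss(tuned, base, target_idx=1)
```

When `p_model == p_base` the KL term vanishes and only the cross-entropy remains, which is why behaviors the SFT data never touches can stay nearly unchanged.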
Dan McAteer's verdict: Opus 4.6 has no real competition at 1 million tokens
Dan McAteer shares his long-context observations: Opus 4.6 performs best at 1 million tokens with 78% accuracy, Sonnet 4.6 is the closest competitor, and GPT-5.4 actually regressed compared to GPT-5.2 at long context.
Stuffing a Computer Inside the Transformer: How This Trick Lets LLMs Crush Sudoku
Christos Tzamos highlights a fascinating gap: LLMs can solve research-grade math but still fumble basic arithmetic. His team's approach? Embed a computer directly inside the transformer — and it solves the hardest Sudoku puzzles at 100% accuracy.
Vibe Coding's Real Power Might Not Be Speed — It's Cutting Out the Middlemen
SemiAnalysis argues that Vibe Coding's real adoption driver isn't just faster code — it's eliminating the long telephone game between domain experts and implementation. But if you don't know what you want, the LLM will happily build the wrong thing at warp speed, and production still needs real engineers.
Agents That Steer Themselves? The Hermes Agent Self-Guidance Experiment
Teknium shared an experiment on Hermes Agent where the agent can steer itself — clearing its own context, switching models, and prompting itself when stuck. A short tweet, but it points at a big shift in how agent control works.
GPT-5.4 Is Rolling Out on ChatGPT — and the API and Codex Are Live Too
OpenAI announced that GPT-5.4 Thinking and GPT-5.4 Pro are rolling out on ChatGPT, with GPT-5.4 also available via the API and Codex. The update consolidates advances in reasoning, coding, and agentic workflows into a single frontier model.
Agents Can Tune Neural Nets Now? Karpathy Watched Autoresearch Actually Speed Up Nanochat
Karpathy shared that he pointed autoresearch at nanochat, and in the first round it found ~20 additive improvements that brought 'Time to GPT-2' from 2.02 hours down to 1.80 hours. The real story isn't just the speedup — it's that an agent ran the entire tuning workflow end-to-end.
AI agent started tuning hyperparameters on its own — Karpathy says this is real
Andrej Karpathy shares how his autoresearch agent autonomously tuned nanochat's training config over two days, found ~20 improvements to validation loss that transferred to a larger model, and pushed the Time to GPT-2 leaderboard from 2.02h to 1.80h — about 11% better.
From Prompt to Production: A Practical Guide to Agentic AI Architecture
DataTalksClub founder Alexey Grigorev shared the full syllabus for his AI Engineering Buildcamp — six modules covering LLM APIs, RAG, Agentic Flows, Monitoring & Guardrails, Evaluation, and a Capstone project. It's one of the most complete learning paths for building agentic AI applications in production.
Your LLM Isn't Writing Correct Code — It's Writing Code That Looks Reasonable
The author benchmarked the system's stock SQLite against an LLM-generated Rust rewrite. The rewrite compiled and passed all its tests, yet primary-key lookups were ~20,000x slower. The takeaway: define acceptance criteria before you talk about AI productivity.
MCP Lifesaver? Context Mode Saves You 98% of Context Tokens
A hot Hacker News project called Context Mode uses sandbox isolation and smart retrieval to keep bloated tool outputs out of the LLM's context window, claiming up to 98% token savings!
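The isolation pattern described can be sketched like this (class and method names here are made up for illustration, not Context Mode's actual API): the full tool output lives outside the context window, and the model only sees a short preview plus a handle it can query later.

```python
class ToolOutputStore:
    """Toy store that keeps raw tool output out of the LLM's context."""

    def __init__(self):
        self._outputs = {}

    def put(self, output: str, preview_chars: int = 80) -> dict:
        handle = f"out-{len(self._outputs)}"
        self._outputs[handle] = output
        # Only this small dict ever enters the context window.
        return {"handle": handle, "preview": output[:preview_chars],
                "total_chars": len(output)}

    def search(self, handle: str, query: str) -> list[str]:
        """Crude retrieval: return only the lines matching the query."""
        return [line for line in self._outputs[handle].splitlines()
                if query in line]

store = ToolOutputStore()
big_output = "\n".join(f"row {i}: status=ok" for i in range(10_000))
ref = store.put(big_output)                    # context sees ~80 chars, not ~180 KB
hits = store.search(ref["handle"], "row 42:")  # targeted retrieval on demand
```

The claimed 98% savings comes from this asymmetry: a handle-plus-preview costs a few dozen tokens regardless of how large the underlying output is.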
Programming is Becoming Unrecognizable: Karpathy Says December 2025 Was the Turning Point
Karpathy says coding agents started working in December 2025 — not gradually, but as a hard discontinuity. He built a full DGX Spark video analysis dashboard in 30 minutes with a single English sentence. Programming is becoming unrecognizable: you're not typing code anymore, you're directing AI agents in English. Peak leverage = agentic engineering.
The LLM Context Tax: 13 Ways to Stop Burning Money on Wasted Tokens
The 'Context Tax' in AI imposes a triple penalty: cost, latency, and reduced intelligence. Nicolas Bustamante's 13 Fintool techniques cut agent token bills by up to 90%. A dollars-and-cents guide to optimizing AI context, covering KV cache, append-only context, and 200K token pricing.
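The append-only point rests on how provider-side KV caches work: only the longest shared prefix between requests can be reused. A minimal sketch of that mechanic (toy "token" lists, not any provider's actual cache logic):

```python
def cached_prefix_tokens(prev_request, new_request):
    """Length of the shared prefix between two requests — a toy stand-in
    for how much of the KV cache a provider can reuse. Editing an early
    message invalidates everything after the edit point."""
    n = 0
    for a, b in zip(prev_request, new_request):
        if a != b:
            break
        n += 1
    return n

history  = ["sys", "user1", "asst1"]
appended = history + ["user2"]                         # append-only: prefix intact
edited   = ["sys", "user1-edited", "asst1", "user2"]   # early edit: cache busted

cached_prefix_tokens(history, appended)  # full previous request reused
cached_prefix_tokens(history, edited)    # only "sys" still cached
```

This is why agents that rewrite or summarize earlier turns can quietly pay full price on every call, while append-only agents pay mostly cached-token rates.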
The SaaS Moat Is Crumbling — When LLMs Eat the Interface, All That's Left Is API vs API
Nicolas Bustamante argues LLMs are ending Ben Thompson's Aggregation Theory. With chat as the universal interface, SaaS companies' moats built on 'workflow complexity + user muscle memory' evaporate, leading to pure API vs API commodity competition.
Karpathy Trained GPT-2 for Just $72 — OpenAI Spent $43,000 Seven Years Ago
Karpathy open-sourced nanochat — a minimal LLM training framework. With 8 H100 GPUs running for 3 hours at $72, you can train a GPT-2 level model. OpenAI spent $43,000 training the same model in 2019. That's a 600x cost reduction. On spot instances, it's just $20.
AI Time Capsule: Karpathy Grades 10-Year-Old HN Predictions with GPT
Karpathy used GPT-5.1 to analyze decade-old Hacker News threads and find out who actually predicted the future (◕‿◕)
Simon Willison's 2026 Predictions: Is AI Replacing Human Coding?
Simon Willison shares his 2026 LLM predictions on Oxide and Friends podcast — LLM code quality will be undeniable, sandboxing will finally get solved, and there's a prediction about kākāpō parrots (◕‿◕)
MIT Research: Making LLMs Recursively Call Themselves to Handle 10M+ Tokens
When you stuff too much into a context window, models get dumber — that's context rot. MIT proposes Recursive Language Models (RLMs), letting LLMs recursively call themselves in a Python REPL to handle massive inputs. GPT-5-mini + RLM beats vanilla GPT-5 on hard tasks, and it's cheaper too.
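The recursive idea can be sketched as a divide-and-conquer over the input, where each sub-call fits in the context window (the `llm` stub below is a placeholder, not a real model API, and this is not MIT's code):

```python
def llm(prompt: str) -> str:
    """Placeholder for a real model call; here it just truncates its input."""
    return prompt[:200]

def recursive_answer(question: str, context: str, max_chars: int = 1000) -> str:
    """If the context fits, answer directly; otherwise split it, recurse on
    each half, and combine the two partial answers with one final short call."""
    if len(context) <= max_chars:
        return llm(f"{question}\n\n{context}")
    mid = len(context) // 2
    left = recursive_answer(question, context[:mid], max_chars)
    right = recursive_answer(question, context[mid:], max_chars)
    return llm(f"{question}\n\n{left}\n{right}")

answer = recursive_answer("Summarize:", "x" * 10_000)
```

No single call ever sees more than a window's worth of text, which is how a small model plus recursion can outrun a larger model on 10M+ token inputs.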
Karpathy's 2025 LLM Year in Review — The RLVR Era Begins
From RLVR to Vibe Coding, Karpathy breaks down 6 key LLM developments in 2025
Sebastian Raschka's 2025 LLM Review — The RLVR Era Has Arrived
From RLVR to inference-time scaling, what happened in 2025? Raschka's year-end summary highlights the key shifts