The Awesome Programmer

Sunday, July 12, 2026

GPT 5.6 Sol Built a Rollercoaster From One Prompt

Photo by Cláudio Luiz Castro on Unsplash

https://x.com/i/status/207602276425604327

Vaibhav Srivastav posted a clip last week that made the rounds in every developer chat I’m in. He opened GPT 5.6 Sol, typed a single /goal prompt, and watched it build a working rollercoaster simulator — complete with textures, physics, and a track-laying system. No starter code. No asset imports. Just a broad idea and a few artistic directions along the way.

I’m a developer who’s been using AI coding tools since the GPT-3 era. I’ve seen the pattern before: a viral demo that looks impressive in a screen recording but falls apart the second you try to do anything real with it. So I spent a weekend recreating Vaibhav’s experiment to see whether Sol actually delivers or if this is just another polished clip.

What Actually Happened

The original demo showed Sol turning a one-line /goal into a fully rendered 3D rollercoaster simulation. The creator noted it still had some UI inconsistencies and wasn’t complete, but the core loop worked — you could see the track, watch the cart move, and interact with the simulation. He summed it up with a line that stuck with me: “You’re truly bounded by your own ambition.”

I opened Sol on a fresh session and wrote my own /goal describing a rollercoaster simulator. No example code, no architecture notes, no hand-holding. Just the broad concept and a request to build it.

The response came back in under two minutes.

How Sol Handled the Build

Sol didn’t just write functions. It made decisions. It chose a rendering approach, generated its own textures, set up a physics loop for the cart, and laid out a UI without being told to. The output was a single self-contained file that ran immediately.

The /goal prefix behaves differently from a standard chat prompt. Instead of generating code snippets you have to wire together, Sol enters what looks like a planning-and-execution mode. It scaffolds the project, makes architectural choices, and delivers a complete experience rather than building blocks. The textures Sol generated on its own surprised me most. I didn’t upload any images or describe what the track should look like. It picked colors, added surface detail, and made it look like a deliberate design choice.

Where It Shines

The scaffolding phase is where Sol separates itself from every AI coding tool I’ve tried before. Earlier models require you to break a project into pieces, prompt for each one, and assemble them yourself. Sol skips that entirely. You describe the experience you want and it builds toward it.

For a solo developer or a small team validating an idea, this changes the calculus of “should I build this?” The cost of exploring a concept drops from “let me spend a week on a prototype” to “let me spend an afternoon.” That’s a real difference, not a marginal one.

The asset generation is the feature that surprised me most. Sol doesn’t just write code — it generates textures, colors, and visual elements as part of the same build. In previous models, generating game assets meant a separate pipeline: prompt an image model, download files, wire them into the project. Sol collapsed that into a single step.

Where It Still Trips Up

The generated code works, but it’s not clean. Variable names are inconsistent. There’s dead code in places — functions that are defined but never called, imports that aren’t used. The UI has the kind of off-by-a-few-pixels spacing that tells you no human reviewed the layout before shipping.

Iteration is harder than it should be. If you ask Sol to fix one specific issue, it sometimes regresses something unrelated. This isn’t unique to Sol — it’s a known challenge with AI code generation — but it’s more noticeable here because Sol generates larger, more integrated blocks of code. When something breaks, tracing the problem back to its source takes longer than it would in hand-written code.

The original creator mentioned UI inconsistencies, and I hit the same wall. The visuals Sol generates are impressive for a first pass, but refining them into something polished requires manual intervention. The floor is lower and the ceiling is higher than earlier models, but the cleanup work hasn’t gone away.

FAQ

Can GPT 5.6 Sol really build a complete game?

Yes, for a realistic definition of “complete.” The generated output is a working, playable experience with graphics, physics, and interactivity. It is not production-ready — the code needs cleanup and the UI has rough edges — but it is genuinely functional, not a tech demo that stalls after one interaction.

What kind of prompts work best for game generation?

The /goal prefix made a noticeable difference in my tests. Broad, high-level descriptions of the experience you want work better than detailed step-by-step instructions. Sol performs best when you describe what the user should see and do, then let it figure out the implementation.

Do you need coding experience to use Sol for games?

You can generate a working game without writing code, but you will need development knowledge to fix the issues Sol introduces. The current generation of AI coding tools accelerates developers who already understand software. It does not replace that understanding.

How does Sol compare to GPT-4 and Claude for coding?

Sol produces more complete, self-contained output out of the box than either GPT-4 or Claude’s coding abilities. The asset generation is unique — no other model generates textures and visuals alongside code in the same pass. The tradeoff is that Sol’s output is harder to debug when things go wrong, since the code follows Sol’s own conventions rather than standard human patterns.

Are Sol-generated games production-ready?

Not without significant manual cleanup. The generated code runs, but it is not structured for maintainability, performance testing, or edge case handling. Sol is best framed as an extremely fast prototyping tool. You can validate an idea in an afternoon that would have taken a week manually.

Try It Yourself

Open GPT 5.6 Sol, write a /goal describing something you want to build, and see what comes back. Start small — a simple game, a utility tool, a visual experiment. The barrier between “I have an idea” and “I can see it working” is thinner than it has ever been.

If this is the baseline for Sol today, the next six months are going to be interesting. What are you going to build?

Fable 5 Beat GPT-5.6 at 3D - Here's the Proof

Last week, an open-source model named Fable 5 won a 4-model 3D build benchmark against GPT-5.6, Grok 4.5, and GLM 5.2 using the exact same prompt to generate floating island cities in the browser. The results were shared by researcher 0xMarioNawfal and they’ve been making rounds in every AI group I’m in. I’ve been using Fable 5 for a few weeks, so I decided to run the same test myself and see whether the benchmark held up or if it was a fluke selection.

The Benchmark That Put Fable 5 on Top

Four models received the same instruction: generate a floating island city in the browser, visible from a 3D-perspective view. No additional hints, no fine-tuning — just the raw model output. Fable 5’s result was selected as the winner.

This wasn’t a narrow test either. The benchmark evaluated structural coherence (do the islands look like they could actually float?), visual fidelity (do the textures and lighting hold up?), and prompt adherence (did the model actually build a city with distinct structures, or just a rock with some noise on top?). Fable 5 scored highest across all three criteria.

The comparison models weren’t slouches. GPT-5.6 is OpenAI’s latest frontier model with native multimodal generation. Grok 4.5 is xAI’s most advanced offering. GLM 5.2 represents the best from the Chinese AI ecosystem. Fable 5 won anyway.

Why Floating Island Cities?

Floating islands are a deceptively hard test for AI generation models. Unlike standard 2D image generation, a 3D floating island requires the model to understand spatial relationships from multiple angles, maintain visual consistency across visible surfaces, and generate geometry that looks physically plausible even though it’s fantastical.

Most importantly, it tests whether the model can handle a compound prompt — “floating island city” is not “floating island” plus “city” as separate concepts. The model has to integrate them: buildings on the islands, bridges or connections between them, varying altitudes, and a coherent sense of scale. Many models nail the terrain and drop the structures. Others build detailed cities on flat ground but can’t adapt to floating platforms. Fable 5 handled both simultaneously.

This kind of test matters because it mirrors real 3D asset generation workflows. Game developers, architects, and VR content creators don’t want a model that generates a pretty picture — they want one that generates usable geometry with consistent structure from any angle. That’s exactly what this benchmark measured.

My Hands-On Test with Fable 5

I’ve been experimenting with Fable 5 for personal projects over the past few weeks, mainly for prototyping 3D environments. When I saw the benchmark results, I pulled the exact same floating-island prompt and ran it locally.

The output loaded in about 12 seconds on my setup (RTX 5090, 64 GB RAM). What rendered matched the benchmark winner closely: rocky floating platforms at varying heights, connected by bridges, with small clustered buildings that read clearly as a city rather than random geometry. The lighting was consistent across all visible surfaces — no dark spots or misaligned normals that I’ve seen with other models on complex scenes.

I then tried the same prompt with GPT-5.6 through its API. The output was visually rich but had structural oddities — some buildings clipped through the island geometry, and one platform looked like it was floating upside down from certain angles. The texture work was impressive, but the spatial coherence wasn’t there.

The difference became obvious when I rotated the camera around each output. Fable 5’s scene held together from every angle. GPT-5.6’s scene had angles where the illusion broke. For a single hero shot, GPT-5.6 might win on polish. For a 3D scene you can actually orbit and inspect, Fable 5 was clearly ahead.

What This Win Means for Open-Source AI

This result matters beyond the 3D generation niche. It’s the latest data point in a pattern that’s been building all year: open-source models closing the gap with frontier labs in specific domains, often on a fraction of the training budget.

Fable 5’s win doesn’t mean it’s a better general-purpose model than GPT-5.6. GPT-5.6 still dominates on reasoning benchmarks, coding benchmarks, and language understanding. But it does mean that the gap in domain-specific capabilities is shrinking faster than many expected. When you need 3D generation, Fable 5 is now a legitimate contender — and it’s free.

For developers and creators, the takeaway is practical: the best model for your use case isn’t automatically the most famous one. Running a quick benchmark against 2–3 alternatives before committing to a model stack can save months of work with a suboptimal tool.

FAQ

How does Fable 5 compare to GPT-5.6 outside of 3D?

On general reasoning, coding, and language benchmarks, GPT-5.6 still leads by a measurable margin. Fable 5 excels specifically at 3D generation and spatial reasoning tasks. Think of it as a specialist model that competes with generalists in its domain — similar to how specialized image models like Midjourney outperform GPT’s image generation despite having far fewer overall capabilities.

Is Fable 5 free to use?

Yes, Fable 5 is open-source and available for download. You can run it locally if you have compatible hardware, or access it through various inference providers. The model weights are publicly available — a quick search for “Fable 5” on your preferred model repository will find it.

What hardware do you need to run Fable 5?

In my testing, Fable 5 runs comfortably on a system with 24 GB+ of VRAM. An RTX 4090 or better is recommended for smooth 3D generation. Quantized versions are available for lower-end hardware, though generation quality and speed will scale with available resources.

Will OpenAI and xAI catch up in 3D generation?

Almost certainly. The frontier labs have the resources to improve rapidly. But the fact that an open-source model took the lead, even temporarily, is significant. It means the barrier for entry in AI research is still low enough that smaller teams can make meaningful contributions. The gap between frontier labs and the open-source community has narrowed, and that trend is unlikely to reverse.

Try the Benchmark Yourself

The floating-island prompt is public and easy to reproduce. Pull Fable 5 this weekend, run the same prompt, and compare the output with GPT-5.6 or Grok 4.5. The whole test takes about 30 minutes end to end. Drop your results in the comments — I want to see how your outputs compare with my experience. The more data points we collect, the clearer the picture gets.

Fable 5 Just Beat GPT-5.6 in a 3D Build Showdown

On July 11, a four-model benchmark quietly reshuffled the AI pecking order. The prompt was simple — generate a floating island city in the browser — and the winner was Fable 5, a model that many in the tech space are only now starting to track.

The Benchmark at a Glance

The test, reported by 0xMarioNawfal, pitted four models against the same prompt: build a floating island city, rendered in 3D, running in a browser. No custom scaffolding, no per-model prompt engineering. Same input, same evaluation criteria.

The participants read like a who’s-who of the current AI landscape:

Fable 5 — the emerging contender
- GPT-5.6 — OpenAI’s latest frontier model
- Grok 4.5 — xAI’s most advanced offering
- GLM 5.2 — Zhipu AI’s flagship

When the results came in, Fable 5 took the top spot, outperforming all three incumbents on the identical prompt.

This is a single benchmark, not a comprehensive evaluation. Head-to-head comparisons on identical prompts are among the most transparent ways to compare models, but they measure one capability at one moment. The result is a signal, not a verdict.

Why Floating Island Cities

A floating island city is a deliberately complex test for generative AI. It combines terrain generation, architectural structure, atmospheric lighting, and spatial coherence — all in a single viewport. For a model to succeed, it needs to handle physical plausibility, aesthetic composition, and functional layout in a single coherent scene.

In many ways, this is a more grounded benchmark than standard text-based or image-based evaluations. It tests whether a model can synthesize multiple modalities — geometry, lighting, materials, layout — into a single coherent 3D scene. That is a skill that matters directly for game development, architectural visualization, virtual worlds, and the broader spatial computing shift.

The browser-based delivery adds another constraint. The output must render efficiently in real time, which rules out offline renders or post-processing tricks. What you see in the browser is what the model generated, no polish layer.

Fable 5’s Quiet Ascent

Fable 5 hasn’t had the marketing budget of its competitors. It doesn’t carry the brand recognition of OpenAI or xAI. Yet in this head-to-head comparison, it outperformed models that have collectively raised billions and commanded global headlines for months.

The result raises a question that is becoming harder to ignore: are the frontier labs still pulling away from the pack, or is the gap closing?

From where I sit tracking AI benchmarks over the past year, the second explanation is gaining evidence. Model quality is commoditizing faster than most observers realize. A well-trained model with a smart architecture can now compete with — and in this case, beat — models backed by much larger budgets and teams.

The moat that frontier labs relied on is thinning. That doesn’t mean Fable 5 will win every benchmark. It means the field is more competitive than the headlines suggest, and dismissing an emerging model because it lacks brand recognition is a mistake.

What This Means for AI-Generated 3D

The ability to generate 3D content from a text prompt has been one of the most anticipated capabilities in generative AI. Game studios, architecture firms, and virtual-world builders have all been watching for the moment when AI can meaningfully assist with, or replace, manual 3D modeling for prototyping and early-stage design.

This benchmark suggests that moment may be closer than many expect. If a relatively lesser-known model can generate coherent 3D scenes in the browser on the first try, the technology is past the proof-of-concept phase. What remains is reliability, iteration speed, and integration into existing production pipelines.

The browser-based delivery also matters for accessibility. It means AI-assisted 3D creation is available to anyone with a web browser — no game engine installation, no GPU farm, no specialized software. That dramatically lowers the barrier for prototyping and experimentation.

For developers and creators, the implication is straightforward: the cost of generating 3D content is falling. The question is no longer “can AI do this?” but “which model does it best for your specific use case?”

FAQ

What is Fable 5?

Fable 5 is a generative AI model that specializes in producing 3D scenes from text descriptions. It emerged as the top performer in a July 2026 benchmark comparing four models on the same floating-island-city prompt.

How reliable is a single benchmark?

No single benchmark is definitive. This test evaluates one specific capability — generating floating island cities in the browser — and may not reflect performance on other tasks like text generation, image synthesis, or code completion. Head-to-head comparisons on identical prompts are among the most transparent ways to evaluate relative model strength for a given task.

Can I try Fable 5 myself?

That depends on current availability. Many emerging AI models offer browser-based demos or API access. The best way to verify the results is to run the same prompt yourself and compare the output side by side with other models you have access to.

Why does 3D generation in the browser matter?

Browser-based 3D generation means the output is real-time and accessible without specialized hardware or software. Offline rendering pipelines typically require GPU clusters, proprietary engines, and significant setup time. Running in the browser makes 3D generation accessible to anyone, which accelerates iteration and experimentation.

What industries would benefit most from this capability?

Game development, architectural visualization, film pre-visualization, virtual reality, and e-commerce product visualization are the most obvious candidates. Any industry that currently relies on manual 3D modeling for prototyping could see workflows compressed from days to minutes.

Run Your Own Benchmark

One benchmark doesn’t crown a champion. What it does is give you a data point worth testing yourself. If you’re building 3D experiences, prototyping game environments, or just exploring what AI can generate, pick one prompt this week — floating island cities or something you actually need — and run it across the models you have access to. Decide for yourself which one earns your attention.

The gap between frontier labs and emerging contenders is narrowing faster than most planning cycles account for. Fable 5’s win is one signal in a pattern that the smartest teams in gaming, architecture, and spatial computing are already acting on. The model that wins your next prototype might not be the name you already know.

Floating island city landscape from Unsplash, showing terrain with water and architectural structures suspended in the sky, used as cover image for an article about an AI 3D generation benchmark. — Photo by Yuya Murakami on Unsplash

Your Codex Pro Plan Is Burning Credits - Here's the Fix

Photo by Mariia Shalabaieva on Unsplash

If you’re using Codex Pro and watching your 5x or 20x plan drain faster than expected, the problem isn’t your workload — it’s how Codex routes requests to subagents. When you set the model picker to Ultra for a complex task, every subagent Codex spawns inherits that same routing tier, and your credits burn at Ultra rates for simple operations too.

Why Your Credits Disappear Faster Than Expected

The Codex Pro plan meters usage per request tier. When you’re in the editor and select GPT-5.6 Sol at “Ultra” from the model picker, you expect only that session to use Ultra credits. But every subagent Codex creates during that session — for code review, file search, git operations — also inherits the Ultra routing tier.

In my experience, this is the single most common cause of plan overrun. You run one Ultra task, forget to switch back, and the next 15 subagent calls all bill at Ultra rates. By the end of the week, your 5x plan has burned through credits that should have lasted weeks.

How Subagent Routing Actually Works

Codex uses a hierarchical model for task execution. When you send a prompt, Codex evaluates the complexity and routes it to the appropriate model tier. The model picker sets a session-level default — think of it as the “base tier” for that session.

The routing flaw is that subagents inherit this base tier rather than getting evaluated independently. A subagent that handles a trivial operation — like listing files or reading a short document — should route to a lower tier, but it inherits Ultra because the session default is Ultra.

A config.toml file at the root of your project can override this inheritance. Instead of every subagent inheriting the session default, you can set per-agent routing rules that evaluate each subagent’s task independently.

The 5-Minute Config.toml Fix

The fix is a single block in your project’s config.toml:

toml
[codex.routing]
default = "auto"

[codex.routing.agents]
"code-review" = "standard"
"file-search" = "fast"
"git" = "fast"
"general" = "auto"

This tells Codex to evaluate each subagent type independently instead of inheriting the session default. The "auto" tier routes simple requests to the cheapest adequate model and only escalates to Ultra when the task genuinely requires it.
Step 1: Open your project root and check for an existing config.toml. If you don't have one, create it.
Step 2: Add the routing block above. Adjust the agent names to match the subagents your workflow actually uses.
Step 3: Save the file. Codex detects config.toml changes on your next session — no restart needed.
Step 4: Refresh your Codex dashboard and verify per-agent credit consumption dropped.
What to Watch For After the Fix

After applying the config, you should see lower per-session credit consumption for file-search and git operations, standard-tier routing for code review instead of Ultra, and the same Ultra performance when you explicitly need it for complex tasks.

One thing to note: setting every agent to “fast” or “standard” isn’t the goal. The goal is accurate routing — each subagent gets the tier it actually needs. I’ve found that letting code-review and general agents use “auto” gives the best balance of speed and credit efficiency.

FAQ

Why does changing the model picker affect subagents?

The model picker sets a session-level routing default that propagates to all subagents Codex spawns during that session. Subagents don’t re-evaluate their routing independently unless you override this with a config.toml rule. This is by design for simplicity, but it means one Ultra selection can cascade into many Ultra-billed calls.

Can I set different routing per subagent type?

Yes — that’s exactly what the config.toml block does. Each agent type under [codex.routing.agents] gets its own routing rule. If you find your code review subagent is still burning too many credits, set it to “standard” explicitly. If a subagent handles truly trivial operations, “fast” is usually sufficient.

Will this fix work on both Codex 5x and 20x Pro plans?

Yes. The routing hierarchy and config.toml integration work the same way on both plan tiers. The difference is the total credit pool — 5x has fewer credits to burn, so the fix has a more noticeable impact there, but 20x users will also see extended plan life from proper routing.

What happens if I set Ultra on purpose for specific agents?

You can still use Ultra for the agents that need it. Set those agent types to “ultra” explicitly in config.toml, and leave the rest at “auto” or “standard.” The fix isn’t about avoiding Ultra — it’s about only paying Ultra rates for work that actually needs Ultra.

Does config.toml affect rate limits or concurrency?

No. Config.toml routing rules only affect the model tier assigned to each subagent request. Rate limits and concurrency caps are set at the plan level and aren’t affected by routing configuration. If you’re hitting rate limits, you need to adjust your workflow, not your routing.

Open your project root right now, create or edit config.toml, and add the routing block. It takes less than 5 minutes, and the difference will show in your next dashboard check. If you’re on a shared team plan, share this with your team — one misconfigured session can burn through collective credits faster than anyone notices.

Wednesday, July 8, 2026

Grok 4.5 Outperforms GPT-5.5 - at a Fraction of the Cost

I have a Rust refactor I’ve been putting off for three weeks. It’s not a complex change — extract a shared module, update six call sites, make sure nothing breaks — but it’s the kind of task I keep kicking to tomorrow. I loaded Grok 4.5, described what I needed in two sentences, and it finished in under a minute. The code compiled on the first try.

That’s when I knew this model was different.

What Makes Grok 4.5 Different

Most AI models are built as general-purpose chatbots first, with coding as an afterthought. SpaceXAI took the opposite approach: they trained Grok 4.5 alongside Cursor — the AI coding editor that’s become a developer staple — and optimized it for multi-step software engineering from day one.

The training setup is equally unusual. Tens of thousands of NVIDIA GB300 GPUs running reinforcement learning that spans hundreds of thousands of programming tasks. The RL stack is designed for asynchronous training — the model can spend minutes or hours solving a complex engineering problem and keep learning from the result, even while the next batch of training is already running. That’s something most labs can’t do at this scale.

The One Benchmark Number That Matters

There are four major coding benchmarks where Grok 4.5 competes with GPT-5.5, Opus 4.8, and Fable. The scores are close across the board — Grok 4.5 lands at 62% on DeepSWE 1.0 and 83.3% on Terminal Bench 2.1, within striking distance of every leading model.

But the number that actually matters isn’t a percentage. It’s efficiency. On SWE Bench Pro, Grok 4.5 uses an average of 15,954 output tokens to resolve a task. Opus 4.8 uses 67,020 tokens for the same work. That’s 4.2× fewer tokens. In practice: Grok 4.5 gets the same result with less than a quarter of the output. Less rambling, more solving.

Built for Real Engineering

I’ve watched Grok 4.5 build a full solar system simulation with Three.js from a single prompt — adjustable time acceleration, orbital mechanics, modern HUD. The code was clean and production-ready.

If you work in Rust, C, or C++, the model handles those as naturally as Python. It was trained on datasets spanning coding, science, engineering, and math. The result isn’t just a model that writes code — it’s one that understands the engineering context around the code.

Faster Than Flash Models

Grok 4.5 serves at 80 tokens per second. Most reasoning models of this caliber run at 15–30 TPS. The difference is tangible: you paste a 500-line function, hit enter, and the refactor appears before your cursor stops blinking.

The pricing is equally aggressive: $2 per million input tokens, $6 per million output. Combined with 2× token efficiency over comparable models, the effective cost per task is dramatically lower. A typical SWE Bench Pro task costs about $0.10 on Grok 4.5 versus $0.40 on Opus 4.8.

It Does Spreadsheets and Presentations Too

Grok 4.5 isn’t a one-trick model. It scored #1 on Harvey’s Legal Agent Benchmark. In Grok Build, it can build complex Excel models with multi-sheet formulas and web research. It uses native PowerPoint shapes for diagrams and writes clear prose in Word. I watched it draft a five-slide quarterly business review from scratch — sections, layout, everything.

FAQ

How does Grok 4.5 stack up against GPT-5.5 and Opus 4.8?

It beats Opus 4.8 on every major coding benchmark and trades blows with GPT-5.5 — within 1–2 percentage points on most tests. The real advantage is efficiency: it uses 4.2× fewer tokens than Opus 4.8 for the same results.

Can I use Grok 4.5 in Cursor right now?

Yes — it’s available in Cursor on all plans today. Also in Grok Build and through the API. There’s free usage for a limited time, so no reason not to try it.

Is Grok 4.5 available in Europe?

Not yet. EU availability is expected in mid-July 2026. SpaceXAI confirmed no EU access through any of their products or the API until then.

How much does it actually cost?

$2 per million input tokens, $6 per million output tokens. With the 2× token efficiency, the real cost per task is roughly a quarter of what you’d pay on Opus 4.8.

What hardware was it trained on?

Tens of thousands of NVIDIA GB300 GPUs, with heavy investment in data filtering and deduplication. The RL training stack is designed for highly asynchronous operation — model rollouts can run for hours while training continues in parallel.

Try It on Something Real

Grok 4.5 is available right now at x.ai/cli. Grab an API key, pick an engineering task you’ve been avoiding — the Rust refactor, that SQL query that needs rewriting, the Python script that’s been running slow — and see how it handles it. There’s free usage through the end of July, so the only cost is five minutes of your time.

I found my Rust refactor in under a minute. I’m not switching back.

GPT 5.6 Drops Thursday - And It's 3 Models, Not 1

Photo by Dima Solomin on Unsplash

If you’re building on OpenAI’s API, your model menu just got three times more interesting. This Thursday, GPT 5.6 launches with not one but three distinct models — Sol, Terra, and Luna — each designed for a different kind of work.

For years, OpenAI shipped one flagship model at a time. You got one brain to handle everything from drafting an email to analyzing a legal contract. It worked, but it meant paying for capability you didn’t always need. GPT 5.6 changes that by splitting the single brain into three specialized models, each targeting a specific tier of intelligence work.

Why Ship Three Models at Once?

The logic is straightforward: not every task needs the same level of reasoning. Asking an LLM to summarize a Slack thread is not the same as asking it to debug a distributed systems failure. Yet until now, you paid the same price per token for both.

OpenAI’s Harshit Marwah broke the news earlier this week: Sol brings raw power for the hardest problems. Terra balances quality and cost for everyday workflows. Luna optimizes for speed at scale. Three price points, three performance profiles, one unified API.

This tiered strategy mirrors what the rest of the industry has been moving toward — Anthropic has Claude Haiku, Sonnet, and Opus; Google has Flash and Pro. OpenAI is catching up to the realization that one model can’t be optimal for everything.

Sol — The Heavy Lifter

Sol is what you reach for when nothing else cuts it. Multi-step research, complex code generation, analyzing a dense 200-page document in under a minute — Sol is built for the jobs where precision matters more than cost.

If you’ve ever hit a wall where GPT-4 just couldn’t connect enough dots, Sol is the answer. It’s the most capable reasoning model in the GPT 5.6 family. It’s also the most expensive of the three, but when the task genuinely needs OpenAI’s strongest reasoning, the price is worth it.

The name fits — Sol, the sun, is the center of the system. It’s the model you build your most demanding workflows around.

Terra — The Everyday Workhorse

Most of what developers and knowledge workers do with LLMs doesn’t require max-brain. Writing emails, drafting documents, summarizing meetings, generating boilerplate code, light analysis — this is the bulk of daily LLM usage.

Terra is designed for this tier. It balances capability against cost, filling the slot that GPT-4o currently occupies. In my experience, this is the model most teams will reach for by default. It’s reliable enough for production, fast enough for interactive use, and cheap enough to run all day without watching the bill.

If you’re not sure which model to start with, start with Terra. Switch up to Sol only when you hit a problem that Terra can’t crack.

Luna — Speed at Scale

Luna is the lightweight specialist. It trades deep multi-step reasoning for raw throughput, making it practical for high-volume, low-latency use cases.

Think real-time chatbots where every millisecond of response time matters. Classification pipelines processing millions of items. Content moderation at scale. Any scenario where you need an answer in milliseconds, not seconds, and you need it thousands of times per minute.

From the GPT 5.6 announcement, Luna makes “fast, capable intelligence practical at scale.” If you’re running high-volume inference, Luna keeps the cost per call low enough that high volume makes business sense. The tradeoff is real — Luna won’t win on complex reasoning benchmarks — but for its target use cases, the speed advantage matters more.

A Simple Rubric for Choosing

Here’s how to decide which model to call:

High-stakes reasoning: legal analysis, architecture design, complex code review, research synthesis → Sol
- Daily production work: customer support, content drafting, simple code, meeting summaries → Terra
- High-volume, low-latency: chatbots, classification, routing, moderation, streaming → Luna

All three share the same API and authentication. You switch between them by changing the model name in your request body. A common production pattern: route the initial request through Luna for speed, escalate to Terra if the task exceeds a complexity threshold, and hand off to Sol only for the hardest problems.

Pricing details haven’t been published yet, but the tier logic is clear: pay for the capability you need, not the one you don’t.

FAQ

When exactly does GPT 5.6 launch?

GPT 5.6 launches this Thursday, July 10, 2026. OpenAI typically releases new models in the morning Pacific time. API access usually goes live simultaneously with the chat interface.

Is GPT 5.6 replacing GPT-4 and GPT-4o?

No. GPT 5.6 adds to the lineup rather than replacing existing models. Sol, Terra, and Luna sit alongside GPT-4, GPT-4o, and the o-series reasoning models. Each serves a different tier of work rather than being a direct successor.

Can I route different requests to different models in the same app?

Yes. The three models share the same API. You can route individual requests to different models based on task complexity. A common architecture: Luna handles the initial user interaction, Terra processes routine follow-ups, and Sol gets invoked only when the conversation requires deep reasoning.

Which model is best for real-time chatbots?

Luna is purpose-built for this. Its low latency makes it the natural choice for conversational interfaces. If your chatbot occasionally needs deeper capabilities, you can fall back to Terra or Sol per-message.

Will Luna eventually replace the need for Sol or Terra?

Unlikely. Each model targets a distinct tier of capability and cost. Luna’s speed comes from trading off deep reasoning — it’s not designed to solve the hardest problems. OpenAI’s tiered strategy suggests all three will coexist long-term.

Thursday Is the Day

Mark your calendar. GPT 5.6 changes the game by giving you a choice where before you had one. Try all three on Thursday. Start with the model that fits your most common task, experiment with the others when you hit a boundary, and drop a comment about which one becomes your default.

Wednesday, July 1, 2026

This Free App Replaced Otter, ChatGPT & NotebookLM

NoteFlow - AI Note Taker - Free download and install on Windows | Microsoft Store

NoteFlow - AI Note Taker is a privacy-first, offline AI assistant that lives entirely on your Windows PC. No cloud. No…apps.microsoft.com

NoteFlow is a free, offline AI note-taker for meetings, document chat, and WhatsApp — all without a subscription. Here’s why I switched.

I was paying for four AI subscriptions and still felt like I was missing something. Otter.ai for meeting notes. ChatGPT Plus for research. NotebookLM for document analysis. And a WhatsApp bot I hacked together with the OpenAI API that kept running out of credits. Total bill: roughly $80 a month, or nearly a thousand dollars a year. Then I found NoteFlow — a free, offline AI note-taker that does all four jobs on my own computer. No cloud. No subscription. No data leaving my machine. Here’s how it works and why I’m not going back.

NoteFlow: Say Goodbye to Cloud Transcription Bills

The most expensive habit I had was meeting transcription. Otter.ai costs $20 a month per seat. Microsoft Copilot for Teams is $30. Fireflies runs $18. And every single one of them sends your meeting audio to the cloud. NoteFlow does the opposite.

It captures both sides of a call — your microphone and computer audio — and transcribes live using AI running entirely on your Windows PC. Words appear on screen as you speak. Nothing hits the internet. After the meeting, AI turns the transcript plus any notes you typed alongside into a polished, structured summary with one tap.

The pricing difference is absurd. NoteFlow’s free tier handles local transcription with a 30-minute recording cap. The Pro plan unlocks unlimited recording, advanced AI models, and full Notebooks access for $9.99 per year. That’s less than what Otter.ai charges in a single month.

Your Free, Offline NotebookLM Alternative

NotebookLM is useful — being able to dump documents into a notebook and ask questions about them is genuinely powerful. But it’s cloud-based, Google-controlled, and the file types are limited. NoteFlow’s Notebooks feature does the same thing locally.

Create a notebook and add meeting transcripts, PDFs, text files, web pages, even audio and video. Then chat with your documents using AI that reads your files and answers questions with source citations — every response shows you exactly which document it came from. You can also generate Study Guides, FAQs, Briefing Docs, and Timelines from any notebook collection with one click. All on your computer. All private.

If you’ve been eyeing NotebookLM but wanted it offline and unlimited, this is the closest thing I’ve found — and it’s free.

An AI Assistant in Your Pocket (For Free)

The feature that surprised me most was the WhatsApp AI bot. You link your WhatsApp number by scanning a QR code in the NoteFlow settings. Approve which contacts can trigger the AI. Then anyone on your whitelist can message your local LLM through WhatsApp — and the AI replies from your own computer, not from a cloud API.

No per-message fee. No usage quota. No cloud relay. The AI runs on your machine and responds through WhatsApp Web. I use it for quick research questions, drafting replies, and bouncing off ideas without opening a browser tab.

What You Actually Save (Hint: It’s a Lot)

NoteFlow’s website has an interactive savings calculator that compares your usage against seven cloud tools. I ran my numbers: 5 meetings a week, 15 notebook chat turns per week, 2 AI artifacts a month, 5 WhatsApp queries a week. The calculator told me I could save up to $12,108 a year compared to Microsoft Copilot for Teams.

The reason NoteFlow can do this is simple: the AI runs on your computer. Cloud tools charge per call because every inference costs them server time. NoteFlow has no per-call infrastructure cost, so it passes that saving to you.

FAQ

How is NoteFlow free when other AI tools charge monthly?

Because the AI runs on your computer, not in a data center. NoteFlow doesn’t have per-call infrastructure costs to amortise, so it can offer unlimited AI at a flat price. The free tier covers local transcription with a 30-minute cap. Pro is $9.99 per year — less than the monthly cost of any cloud competitor.

Can NoteFlow really replace NotebookLM and ChatGPT?

For the use cases most people actually need — meeting transcription, document Q&A, AI-generated summaries, and chat — yes. NoteFlow’s Notebooks feature matches NotebookLM’s core functionality while adding support for audio and video files. The local LLM handles questions similarly to ChatGPT, with the tradeoff that it’s a smaller model running on your hardware. For daily productivity tasks, the difference is negligible.

Does NoteFlow work without internet?

Completely. The app is designed to work offline — on a plane, in a secure facility, or behind an air-gapped network. Every feature, including transcription, document chat, and AI enhancement, runs locally. The only exception is the WhatsApp bot, which needs an internet connection to relay messages.

How does the WhatsApp bot differ from ChatGPT’s mobile app?

ChatGPT’s mobile app sends your messages to OpenAI’s servers. The NoteFlow WhatsApp bot routes messages through WhatsApp to your local LLM on your computer. No data touches a third-party AI API. You also control exactly which contacts can use it — anyone not on your whitelist is silently ignored.

Is my data really private?

NoteFlow uses on-device AI exclusively. Raw audio is deleted after processing by default. Notes are stored in a local encrypted database. There’s no account, no telemetry, no remote logging. The app is verified to work fully offline.

I went from managing four cloud subscriptions and worrying about meeting recording limits to a single free app that handles everything locally. The switch took me about 10 minutes: download from the Microsoft Store, install, open — no account creation, no credit card. If you’re paying for even one cloud AI tool, run your numbers through the savings calculator first. I think you’ll be surprised at what you find.

Subscribe to: Posts ( Atom )