Welcome, humans.
Well, you probably know where this is going…
A viral compilation shows autonomous delivery vans in China plowing through traffic like determined little robots… emphasis on "plowing."
The vans bounce over curbs, drag scooters, and barrel through intersections with the confidence of someone who definitely didn't check their blind spot. |
One Reddit comment nails the real AI advancement here: "Apparently you can just kick it to get it moving again." Would anyone else watch "[Insert Country Name]'s Funniest Home Robot Movies" as a half-hour TV special? 'Cause we would!
Sure, it's funny now. But remember, these are live testing grounds collecting real-world data at scale… something Western regulators are wary of fully allowing (and for good reason). While we laugh at today's fails, China's autonomous vehicles are learning from millions of chaotic street interactions. That's a massive training advantage.
Here's what happened in AI today: |
We break down the viral RLM method that handles inputs 100x larger than a model's context window. OpenAI agreed to buy an AI healthcare app for about $100M. Mastercard unveiled Agent Pay infrastructure. Plus new breakthroughs in lab automation, brain-scale neural simulation, and nanoscale imaging.
P.S: Facing a mandatory platform rebuild? That's actually your window to skip outdated systems. Join Coupa and Solenis on January 28 at 11:00 AM ET to see how AI-powered spend management compares to what you're replacing. |
Don't forget: Check out our podcast, The Neuron: AI Explained on Spotify, Apple Podcasts, and YouTube — new episodes air every week on Tuesdays after 2pm PST! |
MIT Just Solved AI's Memory Problem (And It's Brilliantly Simple) |
DEEP DIVE: Recursive Language Models (RLMs): The Clever Hack That Gives AI Infinite Memory |
You know that feeling when you're reading a 300-page PDF and someone asks about page 47? You don't re-read everything. You flip to the right section, skim for relevant bits, and piece together an answer. If you have a really great memory (and more importantly, great recall), you can reference what you read right off the dome.
Current AI models? Not so smart. They try cramming everything into working memory at once. Once that memory fills up (typically around 100K tokens), performance tanks. Facts get jumbled due to what researchers call "context rot," and details get lost in the middle.
The fix, however, is deceptively simple: stop trying to remember everything. |
MIT's new Recursive Language Model (RLM) approach flips the script entirely. Instead of forcing everything into the attention window, it treats massive documents like a searchable database the model can query on demand. |
Here's the core insight: |
The text doesn't get fed directly into the neural network. Instead, it becomes an environment the model can programmatically navigate. Think of an ordinary large language model (LLM) as someone trying to read an entire encyclopedia before answering your question. An RLM is the person who checks the index, flips to the relevant entries, and reads only those.
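To make that concrete, here's a minimal sketch of the "context as an environment" loop. This is our own illustration, not MIT's implementation: the llm() helper, the chunk size, and the prompt wording are all placeholder assumptions.

```python
# Minimal sketch of the "context as an environment" idea. This is our illustration,
# not MIT's implementation; llm() stands in for whatever model API you use.

def llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to your model of choice, return its text."""
    raise NotImplementedError

def recursive_answer(question: str, document: str, chunk_size: int = 4000) -> str:
    # The full document never enters a single prompt; it sits outside the model
    # as a list of chunks the model can choose to open.
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]

    # 1. Cheap pass: show short previews and ask which chunks are worth opening.
    previews = "\n".join(f"[{i}] {c[:200]}" for i, c in enumerate(chunks))
    picks = llm(
        f"Question: {question}\nChunk previews:\n{previews}\n"
        "Reply with the indices of the chunks worth opening, comma-separated."
    )
    selected = [int(p) for p in picks.split(",") if p.strip().isdigit()]

    # 2. Recurse into only the selected chunks; each sub-call sees a small context.
    notes = [
        llm(f"Question: {question}\nExcerpt:\n{chunks[i]}\nNote anything relevant.")
        for i in selected if i < len(chunks)
    ]

    # 3. Answer from the notes, never from the raw document.
    return llm(f"Question: {question}\nNotes:\n" + "\n".join(notes) + "\nAnswer concisely.")
```

The real RLM setup gives the model a richer sandbox to poke at the text (and can recurse more than one level deep), but the shape is the same: peek, open only what matters, synthesize.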
The results: RLMs handle inputs 100x larger than a model's native attention window; we're talking entire codebases, multi-year document archives, and book-length texts. They beat both base models and common workarounds on complex reasoning benchmarks. And costs stay comparable because the model only processes relevant chunks.
Why this matters: Traditional context window expansion isn't enough for real-world use cases. Legal teams analyzing entire case histories, engineers searching whole codebases, researchers synthesizing hundreds of papers: these all need fundamentally smarter ways to navigate massive inputs. |
The original research from MIT CSAIL's Alex Zhang, Tim Kraska, and Omar Khattab comes with both a full implementation library supporting various sandbox environments and a minimal version for developers to build on. |
Also, Prime Intellect is already building production versions. |
Instead of asking "how do we make the model remember more?", researchers asked "how do we make the model search better?" The answer (treating context as an environment to explore rather than data to memorize) might just be how we get AI to handle the truly massive information challenges ahead.
We also just compared this method to three other papers that caught our eye on this topic; check out the full deep dive on all four here. |
FROM OUR PARTNERS |
Agents that don't suck |
Are your agents working? Most agents never reach production. |
Agent Bricks helps you build high-quality agents grounded in your data. We mean "high-quality" in the practical sense: accurate, reliable and built for your workflows. |
Generic benchmarks don't cut it. Agent Bricks measures performance on the tasks that matter to your business. |
Evaluate agents automatically, and keep improving accuracy with human feedback. With research-backed techniques for building, evaluating and optimizing, you can turn your business data into production agents faster — with governance built in from day one. |
See how Agent Bricks works |
Prompt Tip of the Day |
Inspired by a recent request (we do take those!), this framework turns ChatGPT or Claude into an on-demand think-tank using a 5-step workflow (full prompt on the website): |
1. Assign a "senior fellow" role.
2. Generate 10 options with risks/metrics.
3. Score them with a rigorous rubric.
4. Build a 12-month roadmap.
5. Red-team it with failure modes.
The prompt must-dos: |
• Put your instructions first, then context in """.
• Ask for Chain-of-Thought reasoning ("show your steps"); using Thinking Mode also does this.
• Force 3 clarifying questions before it answers.
• For complex problems, use Tree-of-Thoughts: explore multiple reasoning branches, prune weak ones, continue.
Advanced move: Simulate a multi-agent panel (economist vs. tech expert vs. operations) debating options, then synthesize consensus. This surfaces tradeoffs and kills groupthink. |
Requiring confidence labels (High/Medium/Low), assumptions, and tradeoffs on every recommendation transforms generic advice into actual strategic research.
This practice is the difference between "here are some ideas" and "here's a roadmap with contingency triggers." |
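If you'd rather script it than retype it, here's a rough sketch of the framework assembled into a single prompt. The wording (and the build_think_tank_prompt helper) is our paraphrase, not the full prompt from the website.

```python
# Rough sketch of the think-tank framework as one assembled prompt.
# The wording and this helper's name are our paraphrase, not the official prompt.

def build_think_tank_prompt(problem: str, context: str) -> str:
    instructions = (
        "You are a senior fellow at a strategy think-tank.\n"
        "Before answering, ask me 3 clarifying questions and wait for my replies.\n"
        "Then: (1) generate 10 options with risks and success metrics, "
        "(2) score them against a rigorous rubric, "
        "(3) build a 12-month roadmap for the top option, "
        "(4) red-team the roadmap with likely failure modes.\n"
        "Show your reasoning step by step. For every recommendation, give a "
        "confidence label (High/Medium/Low), your assumptions, and the tradeoffs.\n"
        f"Problem: {problem}\n"
    )
    # Instructions first; supporting context fenced in triple quotes afterwards.
    return instructions + 'Context:\n"""\n' + context + '\n"""'
```

Paste the result into ChatGPT or Claude, answer its three clarifying questions, and let it run.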
Want more tips like this? Check out our 2026 Prompt Tip of the Day Digest here. |
Have an idea for something you need help with? Submit it below in the feedback and we'll consider it for a future prompt tip! |
FROM OUR PARTNERS
Editor's Pick: Scroll |
When accuracy really matters, use AI-powered experts. Thousands of Scroll.ai users are automating knowledge workflows across documentation, RFPs, sales enablement, agency work and more. Create an AI expert. |
Treats to Try |
• NousCoder-14B writes code that solves competitive programming challenges at a 2100-2200 Codeforces rating, achieving 68% accuracy on problems you'd find in coding competitions (blog, code, report).
• Aside captures prompts and stray thoughts while you work across ChatGPT, Lovable, and other tools so you stay focused without losing ideas (Neuron reader Jillian shared this with us!).
• Pixel Canvas is a vibe-coded app that converts your sketches into pixel art assets instantly using Opus 4.5, creating game sprites in seconds without manual work.
• Wingman gamifies your workouts using MediaPipe vision: do chin-ups to save falling cats/dogs in a browser game that tracks your reps (code); $10 for lifetime access.
• Dessn designs and prototypes directly in your production codebase with zero setup, extracting components and rendering variations without any coding implementation.
• Novix works as your 24/7 AI research partner, running literature surveys, designing algorithms, conducting experiments, and drafting manuscripts (paper, code).
Around the Horn |
• OpenAI agreed to buy a one-year-old AI healthcare app that helps consumers connect medical records and fitness apps for about $100M, according to The Information.
• Elon Musk criticized Apple and Google's Gemini-powered Siri partnership as an "unreasonable concentration of power."
• Mastercard unveiled Agent Pay at the National Retail Federation conference, establishing payment infrastructure to enable AI agents to execute autonomous purchases on behalf of consumers.
• Thermo Fisher and NVIDIA announced a strategic collaboration to develop AI-powered laboratory automation solutions that can autonomously generate protocols, run tests, and analyze results.
• Researchers at Jülich Research Centre demonstrated that the JUPITER supercomputer can simulate 200 billion neurons and 100 trillion synapses, comparable to the human cerebral cortex.
• 1X Technologies unveiled its World Model AI system, enabling NEO humanoid robots to learn physical tasks from internet videos without prior training examples.
• Scientists at Brookhaven National Laboratory developed PFITRE, an AI-enhanced X-ray tomography method that solves decades-old imaging limitations in nanoscale 3D reconstruction.
FROM OUR PARTNERS |
See How AI Sees Your Brand |
Ahrefs Brand Radar maps brand visibility across AI Overviews, chat results, video platforms, and online discussions. It highlights mentions, trends, and awareness signals so teams can understand how their brand shows up in today's evolving discovery landscape.
Learn more |
Tuesday Tool Tip: Claude Cowork |
If you have ever wished Claude could stop just talking about work and actually reach into your folders to do it, today's tip is for you. |
So yesterday Anthropic launched Cowork, a "research preview" feature available on Claude Desktop (macOS) for Max plan users. Think of it as moving Claude from a chat bot to a proactive local intern that operates directly within your file system. |
Why it's different: it uses sub-agents. For complex requests, Cowork breaks the job into smaller pieces and spins up independent agents to tackle them in parallel, preventing the "context limit" errors common in long chats.
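Anthropic hasn't published Cowork's internals, so treat this as a generic sketch of the sub-agent pattern described above (split the job, give each piece a fresh context, merge the results); the llm() helper and function names are placeholders, not Claude's API.

```python
# Generic sketch of the sub-agent pattern, not Anthropic's implementation.
# llm() stands in for any model call; each sub-agent gets its own short context.
from concurrent.futures import ThreadPoolExecutor

def llm(prompt: str) -> str:
    """Placeholder for a model call (Claude, GPT, a local model, etc.)."""
    raise NotImplementedError

def run_with_subagents(task: str) -> str:
    # 1. A planner call breaks the big job into independent subtasks.
    plan = llm(f"Break this task into 3-5 independent subtasks, one per line:\n{task}")
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. Each subtask runs as its own call, so no single context fills up.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda t: llm(f"Complete this subtask:\n{t}"), subtasks))

    # 3. A final call stitches the partial results into one deliverable.
    return llm("Combine these partial results into one deliverable:\n" + "\n\n".join(results))
```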
Three ways to use it right now: |
• Digital Housekeeping: Point Cowork at your cluttered Downloads folder and say, "Organize this folder by file type and project name." It will actually move and sort the files for you.
• Deep Research: Ask it to "Read all the documents in my /contracts folder and create a spreadsheet summary of renewal dates and obligations."
• Trip Planning: Connect it to email and the web to say something like, "Search my email for my Lisbon flight details, research top-rated restaurants near my hotel, and compile a tailored itinerary document."
The Fine Print: Because this runs locally to ensure security and direct file access, it is currently macOS only and sessions do not sync to the web or mobile apps. |
Pro Tip: Treat Cowork like a remote employee. Give it a clear goal, a folder to work in, and let it run asynchronously while you focus on other things. |
Simon Willison shared his thoughts on Claude Cowork, which is basically "Claude Code's power repackaged for the rest of us," with the same filesystem-level agent capabilities but wrapped in a far less intimidating interface. The catch? All the same prompt injection security concerns apply, though Anthropic runs everything in a sandboxed virtual machine to limit damage. |
To dive deeper and get a live vibe check on the Claude Cowork tool, watch Dan Shipper of Every's livestream demo and chat with creator Felix Rieseberg. And to get started using Cowork, read this.
That's all for now.
What'd you think of today's email?
P.S: Love the newsletter, but only want to get it once per week? Don't unsubscribe—update your preferences here. |