Welcome, humans.
Happy 2025! Hope you had a relaxing holiday break. We're SO glad to be back to writing again. 2024 was packed with AI news (especially in December!), so below we'll catch you up on everything that happened while we were away.
Quick shoutout: Congrats to Alif, Christy, and Steven for winning our "Free Year of AI" giveaway! Thanks to everyone's support, we hit 500K subscribers (*~cue airhorn sound effects!!~*) just in time for the new year!
Don't worry if you didn't win—more competitions are definitely coming in 2025.
Now, let's open today's new year AI extravaganza with this little gem:
Ominous or inspiring? Full discussion here

Here's what you need to know about AI today:
We break down the hype and reality of OpenAI's new reasoning model, o3.
Microsoft pledged $100B+ for global AI infrastructure.
Meta killed its AI influencer bots after backlash over their authenticity.
DeepSeek released a new state-of-the-art open source AI.

OpenAI demo'd o3, the closest thing to "Superintelligence" we've seen so far in AI.
o3 - wow
Talk about FOMO—the day after our break started, OpenAI announced their biggest news of 2024: a new reasoning model called o3 (launch video here if ya missed it).
Unlike ChatGPT's next-word prediction, o3 generates hundreds of solutions to a problem, thinks through each step-by-step, and uses a "verifier" to pick the best one. Think of it like a room of genius mathematicians + a master expert checking their work.
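For the curious, here's a toy Python sketch of that generate-then-verify idea (often called best-of-N sampling). To be clear, this is our own illustration, not OpenAI's code: o3's internals aren't public, and `generate_candidates`, `verifier_score`, and `pick_best` are made-up names. Here the "model" just throws random guesses at a square root, and the "verifier" checks each guess by squaring it.

```python
import random


def generate_candidates(n, rng):
    """Hypothetical stand-in for a model sampling n candidate answers.

    A real system would sample full reasoning chains; here each
    "solution" is just a random integer guess between 1 and 100.
    """
    return [rng.randint(1, 100) for _ in range(n)]


def verifier_score(target, guess):
    """Hypothetical verifier: checking a guess is cheap (just square it).

    Returns 0 for an exact square root, and a more negative number
    the further guess * guess lands from the target.
    """
    return -abs(guess * guess - target)


def pick_best(target, candidates):
    """Return the candidate the verifier scores highest."""
    return max(candidates, key=lambda guess: verifier_score(target, guess))


def solve(target, n=300, seed=0):
    """Sample n guesses, then let the verifier choose the winner."""
    rng = random.Random(seed)  # seeded so reruns give the same guesses
    return pick_best(target, generate_candidates(n, rng))


if __name__ == "__main__":
    # Ask for the square root of 1764 using 300 random guesses.
    print(solve(1764))
```

The point of the toy: guessing is hard, checking is cheap, so more guesses (read: more compute) means better odds that at least one of them passes the check.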
And the results = wild:
o3 outperforms OpenAI's chief scientist at competitive programming (2,727 Elo rating—that's really f*cking good).
Scores 87.7% on graduate-level science questions (PhDs typically score ~70%).
OpenAI staff are now hyping o3 up as a form of "superintelligence."
FYI, this performance on the ARC-AGI test stirred up some controversy… here's a good recap video on the drama.
On FrontierMath, a benchmark so tough that previous AI models scored less than 2%, o3 achieved over 25%. Here's why that's impressive: As Balaji Srinivasan put it, possibly no human mathematician can solve 25% of these problems, let alone at o3's speed.
The catch? It cost OpenAI ~$350K and o3 had to think for ~16 hours to do it. That's right, o3 = expeeensive. Even ARC testing ranged from $20/task in low compute mode to $3K-4K per task in high compute mode.
But remember: what's expensive today is cheap tomorrow.
Not everyone's buying the hype, of course. Gary Marcus, famous AI skeptic, predicts we'll soon see reliability issues, while François Chollet (who created the ARC benchmark o3 crushed) says it's impressive, but definitely not AGI. Even OpenAI admits o3 still struggles with simple tasks.
Interestingly, Chollet is already working on the ARC-AGI v2 test—he believes it'll cut o3's performance to under 30% while smart humans will still score above 95%.
This brings us to what's called Moravec's paradox, shared by NVIDIA's Jim Fan: Things humans find hard (like complex math) are actually easy for AI to learn, while things humans find easy (like walking or catching a ball) are incredibly difficult for AI.
In o3's case, AI can now outthink world-class mathematicians, but struggles with things any 5-year-old can do (like, say, pick up a toy truck and chuck it at you, like your grumpy nephew can… which on the whole, is probably good?).
But this paradox shows why we shouldn't judge AI progress just on single domain benchmarks, and instead, we need to come up with a more well-rounded way to assess AGI—we're not proposing we test AI for world-class math AND toy truck throwing capabilities, but we're also not ruling it out…
For now, o3 remains in safety testing. A smaller version (o3-mini) should launch in late January, with full o3 public access planned for Q1 2025.
Our main takeaway: Even with its massive performance gains, o3 ≠ GPT-5. According to the WSJ, GPT-5 is 18 months behind, costs up to $500M per 6-month training run, and has faced multiple failed attempts. This means o3 is either GPT-5's replacement… or its predecessor.
OpenAI sees o3 as "the beginning of the next phase of AI" and they're not wrong—after all, we went from o1 to o3 in just three months. At this pace, we might see o4 by summer and o5 by fall. If o3 is just the warm-up to GPT-5... 2025 could get real wild.

FROM OUR PARTNERS
Want 10x faster AI? Ditch the GPUs.
While everyone's focused on getting their hands on H100s, there's a fascinating alternative that's flying under the radar: AI accelerators like SambaNova's RDU platform.
Here's why this matters: When you're running AI in production, inference speed is everything. Slow responses = frustrated users. SambaNova's custom-built chips can switch between hundreds of models in microseconds and handle complex multi-model workflows that would choke a standard GPU setup.
The coolest part? It's an open stack—bring your own models, including Llama. No vendor lock-in (like with NVIDIA).
Ready to see the difference? Try out SambaNova's playground and experience the speed yourself. Try the SambaNova Playground for free here.

Prompt Tip of the Day
Google's 9 Hour AI Prompt Engineering Course In 20 Minutes
Check this out: Tina Huang condenses Google's 9-hour prompt engineering course into a 20-minute masterclass, where she introduces two memorable frameworks to help your prompting.
The video includes real examples of these frameworks in action, plus a mini-quiz at the end to help you remember everything (science approved!).
Our favorite insight: When prompts fail, try "switching to analogous tasks." Need a marketing plan? Tina says the results are usually WAY more compelling.

Treats To Try.
*Transform your video content with AI. See how Dell's Precision workstations with NVIDIA RTX™ power Nuke CopyCat and Beeble.ai. Register for Jan 9th!
VocAdapt automatically adjusts content to your understanding level, so you can read, watch, and listen to content in a new language without getting stuck.
SEOBot automates your SEO by generating keywords and creating blog content in 50 languages—you can have it analyze your site for free and get 10 headline ideas and new homepage copy (subscription starts at $19/mo).
GenFuse AI creates agents that automate complex tasks so you can easily build custom automations across any major model—free to start, watch how it works here.
Flowdrafter helps you write without constantly editing yourself (this tool was made with the help of Claude! Read more about it here).
Dreamina is a new image generator (sorta like Midjourney) based on the "Seed T2I" image model from ByteDance (120 free credits when you sign up).
DeepSeek released V3, a new state-of-the-art open-source AI model that's also available for commercial use—try chatting with it here or use the API here. Read more here for when to use it (and when not).
See our top 51 AI Tools for Business here!
*This is sponsored content. Advertise in The Neuron here.

Around the Horn.
Microsoft will invest $80B+ in AI data centers in fiscal 2025 (half in the US), plus $35B+ across 14 other countries, and will partner with BlackRock/MGX on a potential $100B AI infrastructure fund.
Meta took down its AI influencer bots just days after announcing their expansion, all due to backlash that erupted over their misleading, identity-and-history fabricating profiles—here's an example.
CES kicks off tomorrow, and you can probably expect new AI-powered TVs, smart homes, laptops, and wearables (plus potentially ~5x new NVIDIA cards).

Monday Meme
And we shall call it forevermore, "The Shreka Lisa"
You're so welcome!

That's all for today. For more AI treats, check out our website.
The best way to support us is by checking out our sponsors—today's are SambaNova and Dell.
See you cool cats on Twitter: @noahedelman02