Your curated collection of saved posts and media

Showing 24 posts Β· last 30 days Β· by score
HelloSurgeAI (@HelloSurgeAI) Β· πŸ“… Feb 10, 2026 (28d ago) Β· πŸ†” 32576001

Another Hemingway-bench prompt asks for an oral presentation about time management. GPT-5.2 writes like a LinkedIn engagement farm: "When people hear β€œworking from home,” they often think it means more freedom, more comfort, and maybe even more free time. And sometimes that’s true. But what doesn’t get talked about enough is how easily work-from-home life can get messy if you don’t manage your time well." (πŸ₯±) Opus 4.6 feels like a charismatic creative working the room: "So... raise your hand if you've ever "worked from home" and somehow ended up four hours into a Netflix series at 2 PM on a Tuesday. No judgment. We've all been there."

Media 1
πŸ–ΌοΈ Media
HelloSurgeAI (@HelloSurgeAI) Β· πŸ“… Feb 10, 2026 (28d ago) Β· πŸ†” 66529051 Β· ⭐ 0.38

Overall: GPT-5.2 feels like a mass market writer; Opus has personality and soul. See the updated leaderboard here! https://t.co/LNah0H0gBy

LewisNWatson (@LewisNWatson) Β· πŸ“… Feb 16, 2026 (22d ago) Β· πŸ†” 68219356

* For context: I'm fine-tuning VLMs at the moment, doing a LoRA rank sweep ablation. https://t.co/t4UzsvritN

Media 1
πŸ–ΌοΈ Media
dair_ai (@dair_ai) Β· πŸ“… Mar 03, 2026 (7d ago) Β· πŸ†” 92939071

New research on improving self-reflection in language agents. A core problem with agent self-reflection is that models tend to generate repetitive reflections that add noise instead of signal, hurting overall reasoning performance. It introduces ParamMem, a parametric memory module that encodes cross-sample reflection patterns directly into model parameters, then uses temperature-controlled sampling to generate diverse reflections at inference time. ParamMem shows consistent improvements over SOTA baselines across code generation, mathematical reasoning, and multi-hop QA. It also enables weak-to-strong transfer and self-improvement without needing a stronger external model, making it a practical upgrade for agentic pipelines. Paper: https://t.co/16Yp56j8Jm Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c

Media 1 Β· Media 2
πŸ–ΌοΈ Media
omarsar0 (@omarsar0) Β· πŸ“… Mar 03, 2026 (7d ago) Β· πŸ†” 60935331

Theory of Mind in Multi-agent LLM Systems. A good read for anyone building systems where agents need to model each other's beliefs to coordinate effectively. This work introduces a multi-agent architecture combining Theory of Mind, Belief-Desire-Intention models, and symbolic solvers for logical verification, then evaluates how these cognitive mechanisms affect collaborative decision-making across multiple LLMs. The results reveal a complex interdependency where cognitive mechanisms like ToM don't automatically improve coordination. Their effectiveness depends heavily on underlying LLM capabilities. Knowing when and how to add these mechanisms is key to building reliable multi-agent systems. Paper: https://t.co/8ASbUgzGjF Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX

Media 1
πŸ–ΌοΈ Media
omarsar0 (@omarsar0) Β· πŸ“… Mar 05, 2026 (5d ago) Β· πŸ†” 72341167

Banger CLI tool released by Google. CLI for Google Workspace + a bunch of useful Agent Skills to go with it. We had a few unofficial ones floating around, so it's nice to finally see an official one. Testing it already. https://t.co/jDWw45P4oA

Media 1
πŸ–ΌοΈ Media
omarsar0 (@omarsar0) Β· πŸ“… Mar 05, 2026 (5d ago) Β· πŸ†” 73420447

Don't overload your AGENTS.md files. Keep them brief. GitHub's analysis of 2,500+ repos found what makes AGENTS.md files work: - provide the agent a specific job or persona - exact commands to run - well-defined boundaries to follow - clear examples of good outputs https://t.co/GFSEjLsx8C

Media 1
πŸ–ΌοΈ Media
andimarafioti (@andimarafioti) Β· πŸ“… Feb 26, 2026 (12d ago) Β· πŸ†” 10559523

Introducing Faster Qwen3TTS! Realistic voice generation at 4x real time: - Same amazing voice quality from Qwen's model - Streaming support with <200 ms to first audio - 5x faster than the official implementation Just pip install faster-qwen3-tts Try the demo! https://t.co/Dcf9jNXz8g

πŸ–ΌοΈ Media
πŸ”ai_fast_track retweeted
A
Andi Marafioti
@andimarafioti
πŸ“…
Feb 26, 2026
12d ago
πŸ†”10559523
⭐0.34

Introducing Faster Qwen3TTS! Realistic voice generation at 4x real time: - Same amazing voice quality from Qwen's model - Streaming support with <200 ms to first audio - 5x faster than the official implementation Just pip install faster-qwen3-tts Try the demo! https://t.co/Dcf9jNXz8g

❀️982
likes
πŸ”119
retweets
ihtesham2005 (@ihtesham2005) Β· πŸ“… Feb 25, 2026 (13d ago) Β· πŸ†” 73314975

🚨 Anthropic just open-sourced the exact Skills library their own engineers use internally. Stop building Claude workflows from scratch. These are plug-and-play components that work across Claude Code, the API, the SDK, and VS Code: copy once, deploy everywhere. What's inside: β†’ Excel + PowerPoint generation out of the box β†’ File handling and document workflows β†’ MCP-ready subagent building blocks β†’ Pre-built patterns for multi-step automation β†’ Production templates you'd normally spend weeks writing The old way: re-explain your workflow every single chat. The new way: build a Skill once, Claude never forgets how you work. 100% Open Source. Official Anthropic release. Repo: https://t.co/XNx3i4yNy6

Media 1 Β· Media 2
πŸ–ΌοΈ Media
bcherny (@bcherny) Β· πŸ“… Feb 28, 2026 (11d ago) Β· πŸ†” 34544489

In the next version of Claude Code, we're introducing two new Skills: /simplify and /batch. I have been using both daily and am excited to share them with everyone. Combined, these skills automate much of the work it used to take to (1) shepherd a pull request to production and (2) perform straightforward, parallelizable code migrations.

Media 1
πŸ–ΌοΈ Media
πŸ”ai_fast_track retweeted
B
Boris Cherny
@bcherny
πŸ“…
Feb 28, 2026
11d ago
πŸ†”34544489
⭐0.36

In the next version of Claude Code.. We're introducing two new Skills: /simplify and /batch. I have been using both daily, and am excited to share them with everyone. Combined, these kills automate much of the work it used to take to (1) shepherd a pull request to production and (2) perform straightforward, parallelizable code migrations.

❀️11,742
likes
πŸ”762
retweets
dair_ai (@dair_ai) Β· πŸ“… Feb 25, 2026 (13d ago) Β· πŸ†” 15123594

New research from Georgia Tech and Microsoft Research. GUI agents today are reactive. Every step costs an LLM call, which is why a lot of GUI agents are expensive, slow, and fragile. This new research introduces ActionEngine, a framework that shifts GUI agents from reactive execution to programmatic planning. A Crawling Agent explores the application offline and builds a state-machine graph of the interface. Nodes are page states, edges are actions. Then at runtime, an Execution Agent uses this graph to synthesize a complete Python program in a single LLM call. Instead of O(N) vision model calls per task, you get O(1) planning cost. On Reddit tasks from WebArena, ActionEngine achieves 95% task success with, on average, a single LLM call, compared to 66% for the strongest vision-only baseline. Cost drops by 11.8x. Latency drops by 2x. If the pre-planned script fails at runtime, a vision-based fallback repairs the action and updates the memory graph for future runs. Why does it matter? Treating GUI interaction as graph traversal rather than step-by-step probabilistic reasoning is a compelling direction for making agents both faster and more reliable. Paper: https://t.co/UR0PjvFf0c Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c

Media 1 Β· Media 2
πŸ–ΌοΈ Media
dair_ai (@dair_ai) Β· πŸ“… Feb 26, 2026 (12d ago) Β· πŸ†” 41341227

How can graphs improve coding agents? Multi-agent systems can boost code generation, but fixed interaction topologies don't adapt to task difficulty. This research introduces AgentConductor, a system where an orchestrator agent uses RL to dynamically generate task-adapted interaction topologies based on inferred agent roles and difficulty levels. It combines a topological density function that captures communication-aware characterizations of multi-agent interactions with difficulty interval partitioning that prevents excessive pruning and provides precise topology control. Across five code datasets, AgentConductor achieves up to 14.6% improvement in pass@1 accuracy while reducing density by 13% and token costs by 68%. The great benefit of this approach is better performance with lower costs. Dynamic agent coordination is more efficient than static workflows for complex code generation. Paper: https://t.co/BypJZfU49q Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c

Media 1 Β· Media 2
πŸ–ΌοΈ Media
emollick (@emollick) Β· πŸ“… Mar 01, 2026 (9d ago) Β· πŸ†” 98956505 Β· ⭐ 0.40

The economics of model training are such that the labs need to release their big models widely; they cannot generate returns from always holding back their best models so one customer uses them. Fine-tuning & specialized SLMs are useful, but they don't expand the ability frontier

gerardsans (@gerardsans) Β· πŸ“… Mar 09, 2026 (1d ago) Β· πŸ†” 32753473

@FrankieIsLost This diagram by @trychroma shows how accuracy crashes past ~5K tokens, dropping below 50/50. Let that sink in: you might need ~50 attempts to get the same result (if it exists). If not, you could be heading toward 100 tries with zero chance of success. https://t.co/qG2vWoAQBo https://t.co/OuSMrnUL3q

Media 1 Β· Media 2
πŸ–ΌοΈ Media
gerardsans (@gerardsans) Β· πŸ“… Mar 09, 2026 (1d ago) Β· πŸ†” 05254501 Β· ⭐ 0.40

@jdegoes LLMs are basically zip files for patterns in data. There will always be code snippets outside what they’ve compressed. People imagine full coverage of programming languages, but that’s not realistic. Coverage is patchy and drops fast once you leave popular languages or common paths. The solution? RAG. But even then it may fail if the ecosystem, docs, or training signal just isn’t there. #UATFallacy #CoverageGap #KeyholePrinciple

TencentAI_News (@TencentAI_News) Β· πŸ“… Mar 09, 2026 (1d ago) Β· πŸ†” 81229160

Introducing WorkBuddy, Tencent's AI-native desktop agent for multi-type tasks. Handle non-technical tasks effortlessly using built-in skill templates for coding, documentation, research, data analysis, and automation. No project setup required. One minute to connect with IMs like WeCom (WeChat for Work). Plan. Execute. Review. Deliver.

Media 1
πŸ–ΌοΈ Media
openclaw (@openclaw) Β· πŸ“… Mar 09, 2026 (1d ago) Β· πŸ†” 24471045

OpenClaw 2026.3.8 🦞 πŸ”’ ACP provenance β€” your agent finally knows who's talking to it πŸ’Ύ openclaw backup β€” because YOLO deploys need a safety net πŸ“± Telegram dupes killed πŸ›‘οΈ 12+ security fixes We fixed more things than we broke. Progress. https://t.co/ahq26lABw3

Media 1
πŸ–ΌοΈ Media
πŸ”Scobleizer retweeted
O
OpenClaw🦞
@openclaw
πŸ“…
Mar 09, 2026
1d ago
πŸ†”24471045
⭐0.32

OpenClaw 2026.3.8 🦞 πŸ”’ ACP provenance β€” your agent finally knows who's talking to it πŸ’Ύ openclaw backup β€” because YOLO deploys need a safety net πŸ“± Telegram dupes killed πŸ›‘οΈ 12+ security fixes We fixed more things than we broke. Progress. https://t.co/ahq26lABw3

❀️305
likes
πŸ”30
retweets
ForBo7_ (@ForBo7_) Β· πŸ“… Mar 09, 2026 (1d ago) Β· πŸ†” 13405393

Created close reading notebooks for almost every lesson of @jeremyphoward's fastai deep learning course (it's more than a course). Close reading is a technique for reading out of the text, not into it. Use an LLM, and you're in flow state for longer: you ask right there, with all context. https://t.co/Wr1sWs40Tl

Media 1
πŸ–ΌοΈ Media
nikunjhanda (@nikunjhanda) Β· πŸ“… Mar 09, 2026 (1d ago) Β· πŸ†” 05524301

In the API, use image detail: original to unlock our biggest vision and CUA gains! https://t.co/WE1cRKzHtN

Media 1
πŸ–ΌοΈ Media
emollick (@emollick) Β· πŸ“… Mar 09, 2026 (1d ago) Β· πŸ†” 53969530 Β· ⭐ 0.36

Still no Claude Cowork competitor from any other lab yet. On one hand, it's been six weeks. On the other, it's been six weeks for companies that say that all their code is being written for them by AI.

ivanleomk (@ivanleomk) Β· πŸ“… Mar 09, 2026 (1d ago) Β· πŸ†” 93446875 Β· ⭐ 0.40

Building deep research agents is pretty fun with @pydantic AI, doing a workshop this weekend with @hugobowne :) https://t.co/y4qVNSjOC4