Your curated collection of saved posts and media
How to effectively create, evaluate and evolve skills for AI agents? Without systematic skill accumulation, agents constantly reinvent the wheel. SkillNet introduces an open infrastructure for creating, evaluating, and organizing AI skills at scale. It structures over 200,000 skills within a unified ontology, supporting rich relational connections like similarity, composition, and dependency, and performs multi-dimensional evaluation. SkillNet improves average rewards by 40% and reduces execution steps by 30% across ALFWorld, WebShop, and ScienceWorld benchmarks. The key takeaway is treating skills as evolving, composable assets rather than transient solutions. Paper: https://t.co/Xv3uGLnPH2 Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX

Anthropic themselves found that vibecoding hinders SWEs' ability to read, write, debug, and understand code. not only that, but AI generated code doesn't result in a statistically significant increase in speed. don't let your managers scare you into increased productivity. show them this paper straight from Anthropic.
Planning for Long-Horizon Web Tasks Really solid work on making web agents better at complex, long-horizon tasks. STRUCTUREDAGENT introduces a hierarchical planning framework using dynamic AND/OR trees for efficient search and a structured memory module for tracking candidate solutions across browsing steps. It produces interpretable hierarchical plans that make debugging and human intervention easier. Current web agents struggle with multi-step tasks because they act greedily and lose track of alternatives. STRUCTUREDAGENT achieves 46.7% on complex shopping tasks, outperforming all baselines, by giving agents the ability to backtrack, revise, and maintain structured state. Paper: https://t.co/3UOqz5TvYW Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX

i've also renamed the open-excel repo into office-agents. the SDK, which contains the agent loop, IndexedDB storage logic, etc is published to NPM. so you can build your own plugins. fwiw, powerpoint is only ~2.5k LoC excluding the system prompt and the officejs .d.ts file https://t.co/ZEvp2xE21k
Claude Code deleted developers' production setup, including its database and snapshots. 2.5 years of records were nuked in an instant. https://t.co/0v70ChNEVL

A listener has created this detailed vocabulary and set of linked references for anyone interested in diving deeper: https://t.co/oM2kkUttLS
tesla's decision to point blank refuse to touch lidar has proven to be one of the most insane self owns of any technology company ever. they easily have the research talent, and waymo has proved they could be doing millions of fully autonomous rides. at this point it's a choice
I asked Claude to explain to me the physics behind all the points it stated in the definition below. The explanation was amazing. 1. Embodiment: Thermodynamics, Electromagnetism, Newtonian Mechanics, Statistical Mechanics Every sensorimotor interaction is an energy exchange. Vision is photon detection, touch is mechanical stress transduction, hearing is longitudinal pressure wave detection, all converted into electrical signals via electromagnetism. Movement is governed by F=ma, with proprioception measuring real-time angular momentum, joint torque, and gravitational orientation. The environment is a high-dimensional probability distribution of physical states; embodied intelligence must sample, predict, and act within this distribution via statistical mechanics. 🧵
A difference in company philosophy: @neuralink: put wires on brain. @CorticalLabs: grow brain on wires. Cortical Labs just completely changed my dreams and nightmares. Here it is in Hon Weng's hotel room. He is showing this off tomorrow at a brain conference in San Francisco. You get a sneak peek tonight.
Thanks, AK @_akhaliq !!! We release the Gradio Demo and Code here: Code: https://t.co/F5K6iWzN7m Demo: https://t.co/z5LoWYkWOL

@justic_hot yeah exactly nano* repos like this / microgpt etc, maybe a few skills on top are the "course". Teacher input is the unique sliver of contribution that the AI can't make yet (but usually already easily understands when given). For the rest of it just ask your favorite AI.
@sriramk @steipete Interesting! Is (1) the Mini using a model on the Spark via API call or (2) are you running two separate agents? If (1) why not running openclaw on the Spark directly?
Copilot CLI is a tool for momentum, not a replacement for judgment. Check out our full guide on the CLI workflow, and try the new GitHub Skills exercise to practice in a safe sandbox. 🥽 https://t.co/nSGCJYH1c6
@AnishA_Moonka Reality check. Not vibes. Not Silicon Valley gaslighting. Just facts like bricks. AI is just software. The "agency" and "autonomy" narrative isn't science, it's Silicon Valley marketing that plays well in pitch decks. Anthropic spent years selling "reasoning" and "agents" to investors. Now the Pentagon wants Claude for "all lawful purposes" and suddenly they discover it lacks judgment for autonomous military use? They built the myth. They're trapped in it. One thing is seducing VCs. Another is lying to policymakers who actually believe you. The current standoff exposes the grift: when real governance conflicts arise, everyone reverts to the underlying reality, this is pattern-matching software, not entities with will. Ask yourself why you trust what AI labs say about their own technology in the first place. Healthy skepticism isn't anti-innovation. It's pro-accountability. https://t.co/Ut4hpvTU3C
@JasonBotterill You are on the wrong side here. Some reading to get up to date on pre-training as the effective boundary for RL. In a nutshell: You can't infer over what you didn't sample. https://t.co/yETkG6Xhq8
@birdabo lol… Anthropic as an AI lab has been behind OpenAI and Gemini for months. Now it's a software company? They don't even have native multimodal systems yet with image or video support. All Chinese models, even open source ones, have it. What a joke.
I gave ChatGPT for Excel and Claude for Excel a try on a very hard Excel file: macro-economic data from 1,000 years of English history across over a hundred tabs. I think both did a good job, and I did not spot errors (though I only did spot checks). However, Claude was harder to check because ChatGPT tended to stick within the Excel app, building formulas and manipulating the data in the way a person would. On the other hand, Claude used Python and often pasted material into Excel for display purposes only, making it harder to trace or edit. If that holds, I think it will generally make ChatGPT more useful for serious users if you want to audit the results. Prompt: "help me understand the relationship between the mix of agricultural products in the UK, GDP, and population, along with hours worked. I want this over the total period, and you should illustrate interesting trends with graphs and statistical analysis

Some suggestions here that telling Claude to only use formulas might solve the problem. I find that it helps, but that it still has a tendency to use Python for part of the work (like combining columns together and then pasting the data into a new sheet), breaking the references.
We just published our 1H 2026 roadmap (https://t.co/qRKP2wg7RN) and an accompanying blog (https://t.co/fjVDnvk37c) for enabling IBM's Spyre accelerator in PyTorch: ecosystem-first, building on torch.inductor, vLLM, and contributing back (Dataflow accelerator's Tile IR, OpenReg, out-of-tree CI). While the market debates whether AI disrupts legacy tech, we're busy building the accelerator infrastructure that enterprise AI runs on. We're sharing this journey in the open. Come see our talks on extending torch.inductor for dataflow accelerators and Spyre's vLLM integration at the inaugural PyTorch Conference Europe in Paris, April 7–8! @PyTorch @IBMResearch @IBM @RedHat_AI
I wrote this 2 years ago as a joke but it is no longer a joke: “Forget Torch, Tensorflow, and Theano. I decided to implement Backprop NEAT in Javascript, because it is considered the best language for Deep Learning.” https://t.co/eGNEpBWm6e https://t.co/JD27jievYB

We partnered with Mozilla to test Claude's ability to find security vulnerabilities in Firefox. Opus 4.6 found 22 vulnerabilities in just two weeks. Of these, 14 were high-severity, representing a fifth of all high-severity bugs Mozilla remediated in 2025. https://t.co/It1uq5ATn9
On January 5, employees at Cursor returned from the holiday weekend to an all-hands meeting with a slide deck titled “War Time.” After becoming the hottest, fastest growing AI coding company, Cursor is confronting a new reality: developers may no longer need a code editor at all. Check out the full story: https://t.co/5ofNvjOW2u (📸: Kimberly White via Getty Images for Fortune Media)
robotics startups are so fun lmao just went around scanning our office then spent a stupid amount buying 64 parts for our rigs and now running 3D reconstructions of our sf and toronto offices like where is the work https://t.co/hsFRLCpDsL