Your curated collection of saved posts and media

Showing 24 posts Β· last 30 days Β· by score
B
BenjaminDEKR
@BenjaminDEKR
πŸ“…
Mar 06, 2026
3d ago
πŸ†”61675717

Talking to a voice AI LLM over ham radio (on UHF 420.69 megahertz, of course!) (Note: cool experiment, but be careful: FCC regs require a licensed control operator to be present at the control point the entire time the LLM is operating.) https://t.co/S2WcCrkp83

πŸ–ΌοΈ Media
T
theworldlabs
@theworldlabs
πŸ“…
Mar 05, 2026
4d ago
πŸ†”16216287

70 hackers joined us in SF for the first-ever World Labs Hackathon. In just 3.5 hours, 32 teams used Marble for projects ranging from robotics sims and agents to AR/VR interfaces, games, art experiences, and real estate tools. Check out what they built ↓ https://t.co/cX0bAlvhh1

Media 1 Β· Media 2
+1 more
πŸ–ΌοΈ Media
D
dair_ai
@dair_ai
πŸ“…
Mar 06, 2026
3d ago
πŸ†”41785046

New research on evaluating coding agents via continuous integration.
Coding agents are moving beyond isolated bug fixes. If they're going to own CI pipelines, we need benchmarks that reflect the actual complexity of codebase maintenance. Most coding agent benchmarks today test whether an agent can fix a single issue. But real software engineering involves maintaining entire codebases over time.
SWE-CI evaluates agent capabilities through continuous integration workflows: running test suites, catching regressions, and maintaining code quality across multiple changes.
Paper: https://t.co/p8bOTJ9QPX
Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c

Media 1 Β· Media 2
πŸ–ΌοΈ Media
L
llama_index
@llama_index
πŸ“…
Mar 06, 2026
3d ago
πŸ†”29386760

PDFs are the bane of every AI agent's existence: here's why parsing them is so much harder than you think πŸ“„
Every developer building document agents eventually hits the same wall: PDFs weren't designed to be machine-readable. They're drawing instructions from 1982, not structured data.
πŸ“ PDF text isn't stored as characters: it's glyph shapes positioned at coordinates with no semantic meaning
πŸ“Š Tables don't exist as objects: they're just lines and text that happen to look tabular when rendered
πŸ”„ Reading order is pure guesswork β€” content streams have zero relationship to visual flow
πŸ€– Seventy years of OCR evolution led us to combine text extraction with vision models for optimal results
We built LlamaParse using this hybrid approach: fast text extraction for standard content, vision models for complex layouts. It's how we're solving document processing at scale.
Read the full breakdown of why PDFs are so challenging and how we're tackling it: https://t.co/K8bQmgq7xN

Media 1 Β· Media 2
πŸ–ΌοΈ Media
J
jerryjliu0
@jerryjliu0
πŸ“…
Mar 06, 2026
3d ago
πŸ†”16127763

Parsing PDFs is insanely hard. This is completely unintuitive at first glance, considering PDFs are the most commonly used container of unstructured data in the world. I wrote a blog post digging into the PDF representation itself, why it's impossible to β€œsimply” read the page into plaintext, and what the modern parsing techniques are πŸ‘‡
The crux of the issue is that PDFs are designed to display text on a screen, not to represent what a word means.
1️⃣ PDF text is represented as glyph shapes positioned at absolute x,y coordinates. Sometimes there's no mapping from character codes back to a Unicode representation.
2️⃣ Most PDFs have no concept of a table. Tables are described as grid lines drawn at coordinates. A traditional parser has to find the intersections between lines to infer cell boundaries, then algorithmically associate the text with those cells.
3️⃣ The order of operators has no relationship to reading order. You need clustering techniques to piece the text back into a coherent logical format.
That's why everyone today is excited about using VLMs to parse text. This has a ton of benefits, to be clear, but still has limitations in accuracy and cost. At @llama_index we're building hybrid pipelines that interleave text and VLMs to give extremely accurate parsing at the cheapest price points.
Blog: https://t.co/iLJpIr7cbH
LlamaParse: https://t.co/TqP6OT5U5O

Media 1 Β· Media 2
πŸ–ΌοΈ Media
πŸ”llama_index retweeted
J
Jerry Liu
@jerryjliu0
πŸ“…
Mar 06, 2026
3d ago
πŸ†”16127763
⭐0.32

Parsing PDFs is insanely hard This is completely unintuitive at first glance, considering PDFs are the most commonly used container of unstructured data in the world. I wrote a blog post digging into the PDF representation itself, why its impossible to β€œsimply” read the page into plaintext, and what the modern parsing techniques are πŸ‘‡ The crux of the issue is that PDFs are designed to display text on a screen, and not to represent what a word means. 1️⃣ PDF text is represented as glyph shapes positioned at absolute x,y coordinates. Sometimes there’s no mapping from character codes back to a unicode representation 2️⃣ Most PDFs have no concept of a table. Tables are described as grid lines drawn with coordinates. Traditional parser would have to find intersections between lines to infer cell boundaries and associate with text within cells through algorithms 3️⃣ The order of operators has no relationship with reading order. You would need clustering techniques to be able to piece together text into a coherent logical format. That’s why everyone today is excited about using VLMs to parse text. Which to be clear has a ton of benefits, but still limitations in terms of accuracy and cost. At @llama_index we’re building hybrid pipelines that interleave both text and VLMs to give both extremely accurate parsing at the cheapest price points. Blog: https://t.co/iLJpIr7cbH LlamaParse: https://t.co/TqP6OT5U5O

❀️648
likes
πŸ”51
retweets
A
acossta
@acossta
πŸ“…
Mar 06, 2026
3d ago
πŸ†”82306671

Something we've been thinking about: planning in the age of capable coding agents. Agents can now build entire requirements end-to-end. They code longer, handle more complexity, and break work down on their own. Granular task breakdown? That's the agent's job now. Requirements are what matter. We shipped a new Build experience in @BrainGridAI that reflects this. No more breaking down tasks upfront. Specify your requirement, pick your agent or paste one command. The agent creates tasks as it works β€” so you have a record and can resume any session without losing progress.

Media 1
πŸ–ΌοΈ Media
A
acossta
@acossta
πŸ“…
Mar 06, 2026
3d ago
πŸ†”21281489
⭐0.36

Read the full write-up: https://t.co/yeMb7gtnai

πŸ”ylecun retweeted
_
AK
@_akhaliq
πŸ“…
Mar 04, 2026
5d ago
πŸ†”50449052

Beyond Language Modeling An Exploration of Multimodal Pretraining paper: https://t.co/GmtPAQDo8T

Media 1
❀️66
likes
πŸ”12
retweets
πŸ–ΌοΈ Media
A
askalphaxiv
@askalphaxiv
πŸ“…
Mar 05, 2026
4d ago
πŸ†”91535314

Yann LeCun 🀝 Saining Xie: an insane crossover of two of the biggest visual representation researchers in the AI field.
β€œBeyond Language Modeling: An Exploration of Multimodal Pretraining”
Right now, most multimodal models are basically a language model with a vision adapter bolted on, so they can describe images, but they don't really think in images or video. This paper shows what happens when you do it the hard way: train one model from scratch on text, images, and video with a unified setup.
The key idea is that if you give the model a good visual internal format, it can use vision for both understanding and generation. Additionally, multimodal data can improve language instead of distracting from it, and mixture-of-experts lets you scale vision's huge data intake without bloating everything else.
This paves the way toward changing the vision paradigm from a β€œcaptioning add-on” model to a native multimodal foundation model.

Media 1
πŸ–ΌοΈ Media
πŸ”ylecun retweeted
A
alphaXiv
@askalphaxiv
πŸ“…
Mar 05, 2026
4d ago
πŸ†”91535314
⭐0.36

Yann LeCun 🀝 Saining Xie insane crossover of the 2 biggest visual representation researchers in the AI field β€œBeyond Language Modeling: An Exploration of Multimodal Pretraining” Right now, most multimodal models are basically a language model with a vision adapter bolted on, so they can describe images, but they don’t really think in images or video. This paper shows what happens when you do it the hard way: train one model from scratch on text, images, and video with a unified setup. They key idea is if you give the model a good visual internal format and it can use vision for both understanding and generating. Additionally, multimodal data can improve language instead of distracting it, and mixture-of-experts lets you scale vision’s huge data intake without bloating everything else. This paves the way towards changing the vision paradigm from β€œcaptioning add-on” model to native multimodal foundation model.

❀️654
likes
πŸ”99
retweets
N
nanbeige
@nanbeige
πŸ“…
Mar 05, 2026
5d ago
πŸ†”30220863

In both LeetCode Weekly Contests 489–491 and HMMT February 2026 (the Harvard-MIT Mathematics Tournament), Nanbeige4.1-3B not only significantly outperformed Qwen3.5-4B but also surpassed Qwen3.5-9B. https://t.co/2guwzB3yNa

Media 1
πŸ–ΌοΈ Media
T
tri_dao
@tri_dao
πŸ“…
Mar 04, 2026
5d ago
πŸ†”64118407
⭐0.32

Attack of the asynchronous machines. We’ve seen this a lot in GPU kernels. This time the same principle applies in speculative decoding

πŸ”tri_dao retweeted
W
Wonmin Byeon
@wonmin_byeon
πŸ“…
Mar 04, 2026
5d ago
πŸ†”46418709
⭐0.38

πŸš€ New paper: Mamba–Transformer hybrid VLMs can go fast without forgetting. We introduce stateful token reduction for long-video VLMs.
βœ… Only 25% of visual tokens
πŸš€ 3.8–4.2Γ— faster prefilling (TTFT)
🎯 Near-baseline accuracy (can exceed baseline with light finetuning)
https://t.co/CJaCktyWCt

❀️209
likes
πŸ”23
retweets
T
togethercompute
@togethercompute
πŸ“…
Mar 05, 2026
4d ago
πŸ†”35702061

Together Research has produced FlashAttention, ATLAS, ThunderKittens and more. This week at AI Native Conf: seven more releases, all coming to production soon. Thread β†’ #ainativeconf #ainativecloud https://t.co/XXIXMRRiLe

Media 1
πŸ–ΌοΈ Media
P
PyTorch
@PyTorch
πŸ“…
Mar 04, 2026
5d ago
πŸ†”13580671

Recover more than 70% of the accuracy degradation from 4-bit quantization using TorchAO's (https://t.co/Jr0qtnIAgZ) Quantization-Aware Training (QAT), now available through fine-tuning in Unsloth and Axolotl! Following the previous TorchAO QAT blog (https://t.co/kXAGBfOSMZ), the PyTorch team at @Meta extended the TorchAO QAT flow to support an end-to-end GPU server flow, targeting fast CUDA kernels for fast inference in @vllm_project, and integrated this flow into popular fine-tuning frameworks like Unsloth and Axolotl. Read our latest blog: https://t.co/nFx4MYHoRj #PyTorch #vLLM #OpenSourceAI #TorchAO

Media 1 Β· Media 2
πŸ–ΌοΈ Media
A
ah20im
@ah20im
πŸ“…
Mar 05, 2026
4d ago
πŸ†”48712061

Today we are introducing GPT-5.4 in Codex. It's more token-efficient and better at tool calling, computer use, and frontend development. We are also introducing /fast to get a faster version of Codex. Enjoy ❀️ https://t.co/uTOlQsK7hE

Media 1
πŸ–ΌοΈ Media
L
ltx_model
@ltx_model
πŸ“…
Mar 05, 2026
4d ago
πŸ†”29586860

If the engine is strong enough, you should be able to build real products on top of it. That's the whole point of LTX-2.3. Introducing LTX Desktop. A fully local, open-source video editor running directly on the LTX engine, optimized for NVIDIA GPUs and compatible hardware. https://t.co/aApm06E6RZ

πŸ–ΌοΈ Media
H
HamelHusain
@HamelHusain
πŸ“…
Mar 06, 2026
4d ago
πŸ†”25024284
⭐0.30

@pamelafox I mean I am just gonna say do evals ℒ️

O
omarsar0
@omarsar0
πŸ“…
Mar 03, 2026
6d ago
πŸ†”70973399
⭐0.34

Impressive if true. The agent harness is powered by recursive and parallel planning. Clever planning is a big deal. Everyone should be trying to build their own harness. Trust me, you really want to be exploring higher levels of orchestration for your agents right now.
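The tweet gives no implementation details, but "recursive and parallel planning" can be sketched generically: a planner either solves a task directly or decomposes it into subtasks, and subtasks are planned recursively while running in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Generic sketch of a recursive + parallel planning harness; the real
# harness the tweet refers to is not public, so names here are invented.

def plan_and_run(task, decompose, solve, pool):
    """Either solve `task` directly, or recurse on its subtasks in parallel."""
    subtasks = decompose(task)
    if not subtasks:                      # leaf task: solve it directly
        return solve(task)
    futures = [pool.submit(plan_and_run, t, decompose, solve, pool)
               for t in subtasks]         # fan out subtasks concurrently
    return [f.result() for f in futures]  # gather in plan order

# Toy domain: a "plan" just splits multi-part questions on ";".
decompose = lambda task: task.split(";") if ";" in task else []
solve = lambda task: f"answer({task.strip()})"

with ThreadPoolExecutor(max_workers=4) as pool:
    result = plan_and_run("capital of France; 2+2", decompose, solve, pool)
print(result)  # β†’ ['answer(capital of France)', 'answer(2+2)']
```

In a real agent harness, `decompose` and `solve` would both be LLM calls, and the interesting engineering is in budget limits, recursion depth, and merging partial results, none of which this toy addresses.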

O
omarsar0
@omarsar0
πŸ“…
Mar 04, 2026
5d ago
πŸ†”25659668

When you build AI agents, don't treat prompts like config strings. Treat them like executable business logic, because that's what they really are.
@arshdilbagi's blog and this Stanford CS 224G lecture lay out one of the clearest mental models I have seen for LLM evaluation. Stop treating evals like unit tests. That works for deterministic software. For LLM products, it creates false confidence because real-world usage changes over time.
Example: an insurance prompt passed 20 eval cases. The team shipped. In production, a new class of requests showed up and failed quietly. No crash, no alert, just wrong answers at scale.
The fix is not "write more eval cases," which is what many teams do. It is building evals as a living feedback loop: start with a small set, ship, watch what breaks in production, add those failures back, and re-run on every prompt or model change.
What eval failure caught your team off guard?
Blog: https://t.co/HCVhcow5rA
Stanford CS 224G lecture: https://t.co/q667gGwckt

Media 1 Β· Media 2
πŸ–ΌοΈ Media
W
Wauplin
@Wauplin
πŸ“…
Mar 05, 2026
4d ago
πŸ†”37015074
⭐0.34

huggingface_hub v1.5.0 just dropped! The highlight: Buckets. Think S3, but native to the Hub. No git history. Just fast, chunk-deduplicated object storage.
hf buckets sync ./outputs hf://buckets/me/my-checkpoints
And that's it. Currently in beta preview. DM me if interested!

πŸ”_akhaliq retweeted
W
Wauplin
@Wauplin
πŸ“…
Mar 05, 2026
4d ago
πŸ†”37015074
⭐0.32

huggingface_hub v1.5.0 just dropped! The highlight: Buckets. Think S3, but native to the Hub. No git history. Just fast, chunk-deduplicated object storage. hf buckets sync ./outputs hf://buckets/me/my-checkpoints And that's it. Currently in beta preview. DM me if interested!

❀️18
likes
πŸ”5
retweets
A
alvarobartt
@alvarobartt
πŸ“…
Mar 02, 2026
7d ago
πŸ†”97875845

πŸ’₯ Learn how to build your own tool-calling agent with @huggingface TRL + @Alibaba_Qwen Qwen3.5 on @Azure Machine Learning!
- @NousResearch hermes-function-calling-v1, 500 single-turn samples
- SFT with TRL on Qwen3.5 2B (released today!) on a single NVIDIA H100
- Everything on Azure, from Container Registry to Machine Learning!
Step-by-step in the thread 🧡

Media 1
πŸ–ΌοΈ Media