Your curated collection of saved posts and media
Weekend project: Learn how the Segment Anything Model runs directly in the browser using @huggingface's Candle framework! (Serverless inference w/ WASM and Rust 🤯) thx for making this @radamar @lmazare https://t.co/t2bMrSQ5XW https://t.co/kcC14fLJfF
Alibaba has updated Qwen-VL and improved its benchmarks. I've updated the Space and you can now chat with Qwen-VL: send images and it will interpret them. Thanks to @huggingface and @multimodalart for the support! Try it out here for free: https://t.co/ey7Mf8dwWW https://t.co/fRynrWEDVq
"Smooth Attraction", 250 chars #ใคใถใใGLSL for (O *= i = 0.; i < 2e2; i += .5) O += .0006/abs(length(u + vec2(cos(i/4. + T), sin(i*.45 + T)) * sin(T*.5+i*.35)) - sin(i+T*.5) / 60. - .01) * (1. + cos(i*.7 + T + length(u)*6. + vec4(0,1,2,0))); https://t.co/B4Lr85XfYh
The 14%-trained RWKV-5 World v2 is already almost at RWKV-4 World level. https://t.co/JpUz0BSd4K I am training 3B & 7B too. #RWKV https://t.co/ccVM82vkrw
After years at the cutting edge of machine learning research, I finally solved AI safety. I am excited to announce: SafeGPT 🧷 The safest LLM is the one that refuses to respond to anything https://t.co/GlFtiUBWAR
I've tried to read the captcha with Bing, and it is possible after some prompt-visual engineering (visual prompting, huh?). In the second screenshot, Bing is quoting the captcha. https://t.co/vU2r1cfC5E

Corpse Kings is OUT NOW! https://t.co/w8sliBUCzO For our special launch price of $15 USD, you get:
► 288 Widescreen Pages of Rules, Lore and Gear to get you started!
► 10 Starting Races to Play as.
► 10 Starting Classes to Choose from and build.
► 65 Skills to learn and use, each with 6 different outcomes.
► 25 Spells from 5 different Schools of Magic.
► 20 Starting Monsters, including their stats and loot.
► A whole host of interesting NPC Profiles to get you started.
► Lashings of Lore and interesting tidbits about the Undead City of Na-Zaii.
► 2 One-Shot Campaigns (one story-heavy, the other a dungeon crawler).
#corpsekings #ttrpg #TTRPGs #DnD #dnd5e #osr #brosr #dungeonsanddragons #gamedev #RPG #AIart
holy shit https://t.co/UVgwSLyVbb
Generate Captions with AI. Introducing https://t.co/QkV4VHJxLh. Save hours of editing using artificial intelligence.
- 25+ Languages: AutoCaption supports more than 20 languages; the full list can be found in the FAQ below.
- Fully editable: fully edit and customize your subtitles with animations, fonts, colors, etc.
- Animated emojis: add emojis automatically with one click and customize their size, position, and animations.
- Extremely fast: AutoCaption's technology gives you the fastest rendering possible.
- Templates: ready-to-use templates, plus the ability to create your own templates that save your personal settings.
- Full HD & 60 FPS: perfect quality, optimized for vertical content at 1080x1920 (Full HD).
Try it here: https://t.co/QkV4VHJxLh Bookmark for later.
There's currently a lot of talk about Mistral, but have you seen the new QA-LoRA paper?
- LoRA (low-rank adaptation) is awesome because it adapts only a small, low-rank subset of parameters of a base LLM (sketched below).
- QLoRA is awesome because it lowers memory requirements even further by quantizing the base model weights.
- QA-LoRA is even more awesome: it takes QLoRA a step further and also quantizes the LoRA (adapter) weights, avoiding a costly conversion of the quantized base model weights back into 16-bit when adding the adapter weights.
This concept is summarized in the annotated figure below.
A little nitpick: Table 2 shows that QA-LoRA is about 2x faster than QLoRA for fine-tuning. However, a much smaller number of parameters was used for the adapter weights. I believe it would have been fairer to use the same number of parameters for both when comparing their speeds.
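To make the LoRA part concrete, here is a minimal, hypothetical PyTorch sketch of a plain LoRA-adapted linear layer (not the quantized QA-LoRA variant; the class name, rank, and scaling are illustrative assumptions, not from the paper):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA adapter: y = W x + (alpha/r) * B(A(x)).

    Only A and B are trained; the base weight W stays frozen.
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze base weights
        self.A = nn.Linear(base.in_features, r, bias=False)   # low-rank down-projection
        self.B = nn.Linear(r, base.out_features, bias=False)  # low-rank up-projection
        nn.init.zeros_(self.B.weight)  # adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.B(self.A(x))

# Usage: wrap an existing projection layer of the base model.
layer = LoRALinear(nn.Linear(512, 512), r=8)
out = layer(torch.randn(2, 512))
```

QA-LoRA's contribution is keeping the base weights in their quantized form even after merging, rather than dequantizing them back to 16-bit as above.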
ngl the early days of the data science craze were pretty epic. I just dug up an old article which really captured it beautifully. And we did quite the photo-shoot! https://t.co/iq5jXPO2xc
Here are seven ways to query knowledge graphs with LLMs:
1️⃣ Keyword-based entity retrieval: extract keywords to look up relevant KG entities, then pull in linked text chunks. Optionally explore relationships to pull in more context.
2️⃣ Vector-based entity retrieval: look up KG entities with vector similarity, then pull in linked text chunks. Optionally explore relationships.
3️⃣ Hybrid entity retrieval: a combination of (1) and (2) with deduplication (see the sketch after this post).
4️⃣ Raw vector-index retrieval: drop relationships and represent entities/documents in a flat vector store.
5️⃣ Raw vector retrieval combined with KG retrieval: look up text chunks via vector similarity or entity retrieval (any one of 1, 2, 3), then combine/dedup the results.
6️⃣ Text-to-Cypher: use the LLM to generate Cypher queries; this can work against any knowledge graph.
7️⃣ Graph RAG (proposed by @wey_gu): similar to 1, 2, 3 but over any KG; there don't have to be associated text chunks!
Sample results are given below (left 🖼️). Full pros/cons of each approach are attached in the right diagram 🖼️.
This is one of the most comprehensive treatments of RAG + KGs in @llama_index that I've seen. @wenqi_glantz has done it again! And it covers a cool use case of constructing a knowledge graph over the Philadelphia Phillies for baseball-related queries. Of course, a lot of this builds upon core foundational work by @wey_gu. Article: https://t.co/vRTMDQhDM3
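As a rough illustration of the hybrid entity retrieval in (3), here is a framework-agnostic Python sketch; `keyword_lookup`, `vector_lookup`, and `chunks_for_entity` are hypothetical placeholders standing in for your KG and vector store, not @llama_index APIs:

```python
from typing import Callable, List

def hybrid_entity_retrieval(
    query: str,
    keyword_lookup: Callable[[str], List[str]],    # hypothetical: query -> entity ids
    vector_lookup: Callable[[str], List[str]],     # hypothetical: query -> entity ids
    chunks_for_entity: Callable[[str], List[str]], # hypothetical: entity id -> text chunks
) -> List[str]:
    """Combine keyword-based (1) and vector-based (2) entity retrieval,
    dedup the entities, then pull in their linked text chunks."""
    entity_ids: List[str] = []
    for eid in keyword_lookup(query) + vector_lookup(query):
        if eid not in entity_ids:      # dedup while preserving rank order
            entity_ids.append(eid)
    chunks: List[str] = []
    for eid in entity_ids:
        for chunk in chunks_for_entity(eid):
            if chunk not in chunks:    # dedup chunks shared across entities
                chunks.append(chunk)
    return chunks
```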

OpenAI's latest image generation model, DALL-E 3, makes it SO easy to create comic books! Here are 4 panels for a fan-made Batman comic made in under 5 minutes. Prompts included in the ALT. Enjoy! 🔥 https://t.co/9LhZj4I0KW

Absolutely insane: Gaussian splats like this can render in real time on a mobile phone. Quality and performance like this are hard to achieve even for carefully optimized video game environments. View on @Polycam3D here: https://t.co/WhZQN3o29u https://t.co/mbgjCnWvm7
Releasing a new 0-shot model on the @huggingface hub, trained on 27 tasks, 310 classes, and ~1.3 million texts! 🤗 My new deberta-v3-zeroshot-v1 is specifically designed for 0-shot classification. Free download: https://t.co/YIG8KkwrXq Key properties: https://t.co/STYyRW6LlW
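For context, models like this are typically used through the transformers zero-shot classification pipeline; a minimal sketch (the model id below is an assumption based on the tweet, check the linked hub page for the real one):

```python
from transformers import pipeline

# Assumed hub id for the model mentioned above; verify against the download link.
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/deberta-v3-large-zeroshot-v1",
)

result = classifier(
    "The new GPU drivers cut our training time in half.",
    candidate_labels=["hardware", "politics", "sports"],
)
print(result["labels"][0], result["scores"][0])  # top label and its score
```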

"Solid understanding of Spark and ability to write, debug, and optimize Spark code." I was today years old when I discovered that OpenAI relies heavily on Spark. https://t.co/3Na1dKoNAq
New paper! We ask: how important is VQ for neural discrete representation learning? 🤔 I always felt it should be possible to "absorb" the VQ on top of a deep net into the net and use a simple grid-based quantization instead, without sacrificing much expressivity. Summary 🧵 https://t.co/YbzZ7T8ejc
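A minimal sketch of the grid-based alternative (finite scalar quantization with a straight-through gradient); the bounds, level count, and function name are illustrative assumptions, not the paper's exact formulation:

```python
import torch

def grid_quantize(z: torch.Tensor, levels: int = 8) -> torch.Tensor:
    """Quantize each dimension to one of `levels` evenly spaced values in [-1, 1].

    A straight-through estimator passes gradients around the non-differentiable round().
    """
    half = (levels - 1) / 2
    bounded = torch.tanh(z) * half          # squash into [-half, half]
    quantized = torch.round(bounded)        # snap each dim to the integer grid
    # Straight-through: forward uses `quantized`, backward sees `bounded`.
    return (bounded + (quantized - bounded).detach()) / half

z = torch.randn(4, 16, requires_grad=True)
zq = grid_quantize(z)
zq.sum().backward()                         # gradients flow back to z via tanh
```

The codebook is implicit in the grid, so there is no codebook to learn and no VQ commitment losses.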
Vision transformers need registers! Or at least, it seems they 𝘸𝘢𝘯𝘵 some… ViTs have artifacts in attention maps. It's due to the model using these patches as "registers". Just add new tokens ("[reg]"; sketched below):
- no artifacts
- interpretable attention maps
- improved performance!
https://t.co/mineuOjRid
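A minimal sketch of the register-token idea: append a few extra learnable tokens to the patch sequence before the transformer and discard them at the output (the class name and the toy encoder below are illustrative assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

class ViTWithRegisters(nn.Module):
    def __init__(self, dim: int = 256, num_registers: int = 4):
        super().__init__()
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # toy encoder
        self.num_registers = num_registers

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        b = patch_tokens.shape[0]
        reg = self.registers.expand(b, -1, -1)     # shared learnable [reg] tokens
        x = torch.cat([patch_tokens, reg], dim=1)  # append registers to the sequence
        x = self.encoder(x)
        return x[:, : -self.num_registers]         # drop register outputs

model = ViTWithRegisters()
out = model(torch.randn(2, 196, 256))  # 14x14 patches in -> same 196 tokens out
```

The registers give the model scratch space for global computation, so it stops hijacking low-information patches for that role.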
Wow, somehow I've been completely erased from the history of Kaggle. For the record, the US company that was funded, Kaggle Inc, was owned equally by Anthony Goldbloom and me, with a minority holding by Nick Gruen. Ben Hamner didn't join until later. https://t.co/KlgxeW06ne
At the end of the day, GPT-4 is a stochastic parrot and doesn't always reason logically. If A > B, then B < A, right? Well… We tried using GPT-4 as an evaluation module to pick the better answer given the question. Can you spot the issue in the attached diagram? 🖼️ When asked whether it prefers answer 1 or answer 2, GPT-4 preferred answer 1. But when I swapped the order of answer 1 and answer 2, GPT-4 still preferred answer 1 (rip). As a result, our pairwise eval module in @llama_index was broken. Something in the ordering led to stochastic/inconsistent results, no matter how much we tried to prompt it. Here's an initial solution 💡: call the LLM two times, the second time swapping answer 1 and answer 2. If the result is inconsistent, we return a "TIE" value (the LLM can't pick which one is better). Otherwise we return the result from the first LLM call. We've pushed a fix and added documentation on our pairwise evaluator in @llama_index: https://t.co/miGtdGbjo0 But we're still open to suggestions! Let us know any ideas you have along these lines.
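A minimal sketch of that order-swap consistency check; `judge` stands in for whatever LLM call you use and is a hypothetical placeholder, not the @llama_index API:

```python
from typing import Callable

def pairwise_eval(
    question: str,
    answer_a: str,
    answer_b: str,
    judge: Callable[[str, str, str], int],  # hypothetical: returns 1 or 2
) -> str:
    """Ask the judge twice with the answers swapped; inconsistent verdicts become a TIE."""
    first = judge(question, answer_a, answer_b)   # judge sees (a, b)
    second = judge(question, answer_b, answer_a)  # judge sees (b, a)
    # Consistent only if the judge picks the same underlying answer both times.
    if first == 1 and second == 2:
        return "A"
    if first == 2 and second == 1:
        return "B"
    return "TIE"
```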
Today, with collaborators at Google, we're announcing 🤩RealFill🤩! A generative AI approach to fill missing regions of an image with the content that should have been there. The best way to turn almost-perfect pictures into invaluable memories! Page: https://t.co/vLNwoATuEL
San Francisco is the #1 AI city! SF will soon be sold out again! I've never seen commercial real estate leasing at this rate. The number of companies looking for huge amounts of space is incredible. "AI companies seek up to 1 million square feet of office space in the city, according to JLL." SF ❤️🤖
I'm so excited to talk about this! We've just announced LLM Monitoring from @weights_biases! Use it to analyze how you and your team use LLMs. The thing I'm most excited about is how hackable it is: you can filter, sort, group, add custom data, add derived plots of your data, connect panels for drill-downs, apply operations to data... it's kind of insane. This is made possible because it's built on Weave, a toolkit built by the amazing engineers here at W&B to create these kinds of infinitely flexible dashboards as starting points for your explorations. Try it here: https://t.co/Z3NZNzKFPW
> "The industry needs to figure out how to recognize the value of simplification." https://t.co/VvyEbuem74 https://t.co/rwDyL5X8ko
@rakyll Same thing for promotions. Most people get promoted by building something new (a new feature, a new service). Rarely do you get a promotion by simplifying / removing. The industry needs to figure out how to recognize the value of simplification.
Someone should fund @VikParuchuri. This dataset is insanely good. https://t.co/qV5DL8fHxw
Machine Feeling Unknown: the effect of instructing ChatGPT (GPT-4) to first write all responses backwards and then reverse them. https://t.co/wJuioWlBuh

LLMs Lie In Predictable Ways
- A lie detector asks unrelated follow-up questions after a suspected lie
- The LLM's answers are fed into a classifier model
- Highly accurate across LLM architectures, LLMs fine-tuned to lie, and lies in real-life scenarios such as sales
https://t.co/HrJvVGyutE https://t.co/NjhA0ouEAJ
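A rough sketch of the detector's second stage, a simple classifier over the follow-up answers; the yes/no featurization, the synthetic toy data, and all names are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: each row is the LLM's yes/no answers (as 0/1) to k unrelated
# follow-up probes; the label marks whether the preceding statement was a lie.
rng = np.random.default_rng(0)
k = 10
answers_honest = (rng.random((100, k)) < 0.5).astype(float)
answers_lying = (rng.random((100, k)) < 0.7).astype(float)  # assumed behavioral shift
X = np.vstack([answers_honest, answers_lying])
y = np.array([0] * 100 + [1] * 100)

detector = LogisticRegression().fit(X, y)  # the lie detector itself
probe_answers = (rng.random((1, k)) < 0.7).astype(float)
print("P(lie) =", detector.predict_proba(probe_answers)[0, 1])
```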
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models paper page: https://t.co/pqnsilBCRr Classifiers built upon vision-language models such as CLIP have shown remarkable zero-shot performance across a broad range of image classification tasks. Prior work has studied different ways of automatically creating descriptor sets for every class based on prompt templates, ranging from manually engineered templates, through templates obtained from a large language model, to templates built from random words and characters. In contrast, deriving zero-shot classifiers from the respective encoded class descriptors has remained nearly unchanged, that is: assign the class that maximizes the cosine similarity between its averaged encoded class descriptors and the encoded image. However, weighting all class descriptors equally can be suboptimal when certain descriptors match visual clues on a given image better than others. In this work, we propose AutoCLIP, a method for auto-tuning zero-shot classifiers. AutoCLIP assigns per-image weights to each prompt template, derived from statistics of class descriptor-image similarities at inference time. AutoCLIP is fully unsupervised, has very low overhead, and can be easily implemented in a few lines of code. We show that for a broad range of vision-language models, datasets, and prompt templates, AutoCLIP outperforms baselines consistently and by up to 3 percentage points in accuracy.
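One plausible instantiation of the per-image template weighting described above (not the paper's exact algorithm): weight each template by a softmax over its average image-descriptor similarity, instead of averaging templates uniformly. All names, shapes, and the temperature are assumptions:

```python
import torch

def autoclip_style_logits(
    image_emb: torch.Tensor,        # (d,) normalized image embedding
    descriptor_embs: torch.Tensor,  # (templates, classes, d) normalized descriptors
    temperature: float = 0.1,
) -> torch.Tensor:
    """Weight prompt templates per image instead of averaging them uniformly."""
    # Cosine similarity of the image to every (template, class) descriptor.
    sims = descriptor_embs @ image_emb                              # (templates, classes)
    # Templates whose descriptors match this image better get larger weights.
    weights = torch.softmax(sims.mean(dim=1) / temperature, dim=0)  # (templates,)
    # Weighted average over templates replaces the usual uniform mean.
    return (weights[:, None] * sims).sum(dim=0)                     # (classes,) logits

t, c, d = 5, 10, 64
descs = torch.nn.functional.normalize(torch.randn(t, c, d), dim=-1)
img = torch.nn.functional.normalize(torch.randn(d), dim=0)
print(autoclip_style_logits(img, descs).argmax())  # predicted class index
```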
Wondering where Meta got their dataset of 1.1 billion image-text pairs to train Emu, their new image generator? There's a clue in the paper: one of the quality signals is "number of likes". cc: @Yampeleg https://t.co/KjPNhCukfY
MotionLM: Multi-Agent Motion Forecasting as Language Modeling paper page: https://t.co/nU219cJ3Vf Reliable forecasting of the future behavior of road agents is a critical component to safe planning in autonomous vehicles. Here, we represent continuous trajectories as sequences of discrete motion tokens and cast multi-agent motion prediction as a language modeling task over this domain. Our model, MotionLM, provides several advantages: First, it does not require anchors or explicit latent variable optimization to learn multimodal distributions. Instead, we leverage a single standard language modeling objective, maximizing the average log probability over sequence tokens. Second, our approach bypasses post-hoc interaction heuristics where individual agent trajectory generation is conducted prior to interactive scoring. Instead, MotionLM produces joint distributions over interactive agent futures in a single autoregressive decoding process. In addition, the model's sequential factorization enables temporally causal conditional rollouts. The proposed approach establishes new state-of-the-art performance for multi-agent motion prediction on the Waymo Open Motion Dataset, ranking 1st on the interactive challenge leaderboard.
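A toy sketch of the core idea of casting trajectories as token sequences: quantize per-step displacements into a small vocabulary and train a standard autoregressive LM over the tokens. The bin counts, value ranges, and tiny model below are illustrative assumptions, not MotionLM's architecture:

```python
import torch
import torch.nn as nn

NUM_BINS = 13  # discretize each of dx, dy into 13 bins over [-1, 1] per step

def tokenize_deltas(traj: torch.Tensor) -> torch.Tensor:
    """Map a trajectory (T+1, 2) of xy positions to T discrete motion tokens."""
    deltas = traj[1:] - traj[:-1]  # per-step displacement
    bins = torch.bucketize(deltas.clamp(-1, 1), torch.linspace(-1, 1, NUM_BINS - 1))
    return bins[:, 0] * NUM_BINS + bins[:, 1]  # joint (dx, dy) token

vocab = NUM_BINS * NUM_BINS
embed = nn.Embedding(vocab, 64)
lm = nn.LSTM(64, 64, batch_first=True)  # stand-in for the autoregressive decoder
head = nn.Linear(64, vocab)

# Train with the standard LM objective: maximize average log-prob of the next token.
tokens = tokenize_deltas(torch.cumsum(torch.randn(17, 2) * 0.1, dim=0)).unsqueeze(0)
hidden, _ = lm(embed(tokens[:, :-1]))
loss = nn.functional.cross_entropy(head(hidden).flatten(0, 1), tokens[:, 1:].flatten())
loss.backward()
```

Joint multi-agent rollouts would then decode one token per agent per step from this distribution, rather than scoring independently generated trajectories after the fact.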
LlamaIndex + @braintrustdata 🧠🧪 Build your RAG pipeline in @llama_index, run retrieval/e2e evals, and then easily get a public URL to share results! Huge shoutout to @davidtsong for the help. Run: https://t.co/ReGSADBfGG Notebook guide: https://t.co/2Qr0xpVI5S https://t.co/pjOkL5Ow7o
So excited to introduce... ScreenshotAI. 📸 Automatically turn your phone screenshots into smart, searchable notes. Just connect your iOS screenshots folder. Every time you capture a new screenshot, ScreenshotAI will automatically analyse it, understand it, and make it searchable with AI. It runs on autopilot.
Why build this? Our screenshot folders are a goldmine of information, ideas, and useful content, like a visual notebook. The only thing is... we never look at them again, and they're clumsy and awkward to navigate.
What is ScreenshotAI? If your screenshots folder is a goldmine, then ScreenshotAI is a robot miner, extracting the gold and refining it so you can use it. ScreenshotAI unlocks all of the useful content, extracting the information, making it searchable, and even accessible with a chatbot. Video below (sound on):