Your curated collection of saved posts and media

Showing 32 posts Β· last 14 days Β· sorted by score
Marcel Pociot πŸ§ͺ @marcelpociot Β· Aug 28, 2023 Β· πŸ†”90498923

🀯 This is absolutely insane. You can clone any voice locally on your computer using only 3 seconds of reference audio! Entirely open-source. https://t.co/YGm6LZxWDi https://t.co/ZL4uGCYdvK

πŸ–ΌοΈ Media
❀️ 3,582 likes Β· πŸ” 575 retweets
younes @younesbelkada Β· Aug 29, 2023 Β· πŸ†”21302143

Did you know that flash-attention 1 was already integrated in @huggingface transformers? Let's see how to use it, and when it can't be used 🧡 https://t.co/TwUUtfKNeX https://t.co/Wgk1nvG0Wd

πŸ–ΌοΈ Media 1
❀️ 84 likes Β· πŸ” 28 retweets
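The fused kernel the thread covers computes exactly the standard scaled dot-product attention, softmax(QKᡀ/√d)V, just tiled so the full score matrix is never materialized. A plain (unfused) pure-Python reference shows the quantity flash-attention reproduces; the tiny matrices are illustrative:

```python
import math

def attention(Q, K, V):
    """Reference (unfused) scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    Flash-attention computes the same result, but fused/tiled to avoid
    building the full seq_len x seq_len score matrix in memory."""
    d = len(Q[0])
    # scores[i][j] = <Q_i, K_j> / sqrt(d)
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in K]
              for qi in Q]
    out = []
    for row in scores:
        m = max(row)                           # subtract max for numerical stability
        e = [math.exp(s - m) for s in row]
        z = sum(e)
        w = [x / z for x in e]                 # softmax weights, sum to 1
        out.append([sum(wi * vj[c] for wi, vj in zip(w, V))
                    for c in range(len(V[0]))])
    return out

# toy example: 2 query/key positions, head dim 2
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
O = attention(Q, K, V)   # each output row is a convex combination of V's rows
```

Each output row lies between the value rows it attends over; position 0 leans toward V[0] and position 1 toward V[1].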
Gradio @Gradio Β· Aug 29, 2023 Β· πŸ†”41680691

🀯Text-to-Sing @Gradio demo. πŸ”₯Results are unbelievably melodious! [attached] Upload a melody of your choice, enter your own lyrics, and have the computer sing back your lyrics in the given melody! Demo on @huggingface Spaces - https://t.co/IbS7qyvM8b https://t.co/qFKhVViFtB

πŸ–ΌοΈ Media
❀️ 415 likes Β· πŸ” 115 retweets
AK @_akhaliq Β· Aug 29, 2023 Β· πŸ†”26681240

ORES: Open-vocabulary Responsible Visual Synthesis
paper page: https://t.co/qH95Ud8OUE
Avoiding synthesizing specific visual concepts is an essential challenge in responsible visual synthesis. However, the visual concept that needs to be avoided for responsible visual synthesis tends to be diverse, depending on the region, context, and usage scenarios. In this work, we formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), where the synthesis model is able to avoid forbidden visual concepts while allowing users to input any desired content. To address this problem, we present a Two-stage Intervention (TIN) framework. By introducing 1) rewriting with learnable instruction through a large-scale language model (LLM) and 2) synthesizing with prompt intervention on a diffusion synthesis model, it can effectively synthesize images avoiding any concepts but following the user's query as much as possible. To evaluate on ORES, we provide a publicly available dataset, baseline models, and benchmark. Experimental results demonstrate the effectiveness of our method in reducing risks of image generation. Our work highlights the potential of LLMs in responsible visual synthesis.

πŸ–ΌοΈ Media 1
❀️ 127 likes Β· πŸ” 37 retweets
AK @_akhaliq Β· Aug 28, 2023 Β· πŸ†”48616838

VALL-E X: Multilingual Text-to-Speech Synthesis and Voice Cloning πŸ”Š
github: https://t.co/joqMbM1rOM
web demo: https://t.co/EHEgLycJ3R
https://t.co/bpMdJR8VFH

πŸ–ΌοΈ Media 1
❀️ 648 likes Β· πŸ” 152 retweets
AK @_akhaliq Β· Aug 28, 2023 Β· πŸ†”58406758

Dense Text-to-Image Generation with Attention Modulation
github: https://t.co/HWAIot62Di
web demo: https://t.co/ihQV6thM00
Existing text-to-image diffusion models struggle to synthesize realistic images given dense captions, where each text prompt provides a detailed description for a specific image region. To address this, we propose DenseDiffusion, a training-free method that adapts a pre-trained text-to-image model to handle such dense captions while offering control over the scene layout. We first analyze the relationship between generated images' layouts and the pre-trained model's intermediate attention maps. Next, we develop an attention modulation method that guides objects to appear in specific regions according to layout guidance. Without requiring additional fine-tuning or datasets, we improve image generation performance given dense captions regarding both automatic and human evaluation scores. In addition, we achieve similar-quality visual results with models specifically trained with layout conditions.

πŸ–ΌοΈ Media 1
❀️ 325 likes Β· πŸ” 85 retweets
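The attention-modulation idea in the abstract above can be sketched in a few lines: boost the image-position β†’ text-token score where the token's layout mask covers that position, suppress it elsewhere, then renormalize. The signs and scale here are illustrative, not the paper's exact formulation:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def modulate(scores, mask, strength=2.0):
    """Schematic attention modulation in the spirit of DenseDiffusion
    (illustrative constants, not the paper's formula): bias cross-attention
    scores toward each token's layout region, then re-softmax."""
    out = []
    for pos, row in enumerate(scores):
        biased = [s + (strength if mask[pos][tok] else -strength)
                  for tok, s in enumerate(row)]
        out.append(softmax(biased))
    return out

# 4 image positions x 2 text tokens; token 0's mask covers the first two positions
scores = [[0.1, 0.1]] * 4            # initially uniform attention
mask = [[1, 0], [1, 0], [0, 1], [0, 1]]
attn = modulate(scores, mask)        # attention now concentrates inside each mask
```

After modulation, positions inside a token's mask attend to that token almost exclusively, which is what steers the object into its region.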
AK @_akhaliq Β· Aug 28, 2023 Β· πŸ†”05771453

Relighting Neural Radiance Fields with Shadow and Highlight Hints
paper page: https://t.co/yCFqXZPLQh
The paper presents a novel neural implicit radiance representation for free viewpoint relighting from a small set of unstructured photographs of an object lit by a moving point light source different from the view position. We express the shape as a signed distance function modeled by a multi-layer perceptron. In contrast to prior relightable implicit neural representations, we do not disentangle the different reflectance components, but model both the local and global reflectance at each point by a second multi-layer perceptron that, in addition to density features, the current position, the normal (from the signed distance function), view direction, and light position, also takes shadow and highlight hints to aid the network in modeling the corresponding high frequency light transport effects. These hints are provided as a suggestion, and we leave it up to the network to decide how to incorporate these in the final relit result. We demonstrate and validate our neural implicit representation on synthetic and real scenes exhibiting a wide variety of shapes, material properties, and global illumination light transport.

πŸ–ΌοΈ Media 1
❀️ 107 likes Β· πŸ” 17 retweets
AK @_akhaliq Β· Aug 28, 2023 Β· πŸ†”78100775

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
paper page: https://t.co/UoeEJ6xGDs
Large language models (LLMs) have revolutionized natural language processing tasks. However, their practical deployment is hindered by their immense memory and computation requirements. Although recent post-training quantization (PTQ) methods are effective in reducing memory footprint and improving the computational efficiency of LLMs, they hand-craft quantization parameters, which leads to low performance and fails to deal with extremely low-bit quantization. To tackle this issue, we introduce an Omnidirectionally calibrated Quantization (OmniQuant) technique for LLMs, which achieves good performance in diverse quantization settings while maintaining the computational efficiency of PTQ by efficiently optimizing various quantization parameters. OmniQuant comprises two innovative components including Learnable Weight Clipping (LWC) and Learnable Equivalent Transformation (LET). LWC modulates the extreme values of weights by optimizing the clipping threshold. Meanwhile, LET tackles activation outliers by shifting the challenge of quantization from activations to weights through a learnable equivalent transformation. Operating within a differentiable framework using block-wise error minimization, OmniQuant can optimize the quantization process efficiently for both weight-only and weight-activation quantization. For instance, the LLaMA-2 model family with the size of 7-70B can be processed with OmniQuant on a single A100-40G GPU within 1-16 hours using 128 samples. Extensive experiments validate OmniQuant's superior performance across diverse quantization configurations such as W4A4, W6A6, W4A16, W3A16, and W2A16. Additionally, OmniQuant demonstrates effectiveness in instruction-tuned models and delivers notable improvements in inference speed and memory reduction on real devices.

πŸ–ΌοΈ Media 1
❀️ 140 likes Β· πŸ” 33 retweets
AK @_akhaliq Β· Aug 28, 2023 Β· πŸ†”02853781

Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
github: https://t.co/2dmrybryNE
web demo: https://t.co/bGyJsbpfGU
https://t.co/zjqdJRr0TW

πŸ–ΌοΈ Media 1
❀️ 151 likes Β· πŸ” 33 retweets
Miguel Fierro @miguelgfierro Β· Aug 28, 2023 Β· πŸ†”26901705

The rate of innovation we are seeing in LLMs is mind-blowing. Here is a study comparing a fine-tuned Llama-2 with GPT4. Some takeaways:
- Fine-tuned Llama-2 outperforms GPT4 in SQL and unstructured data understanding.
- GPT4 outperforms fine-tuned Llama-2 in math reasoning.
This result is very interesting, and it shows the potential of OSS models. However, let's not forget that we are not comparing apples to apples. Llama-2 here is a fine-tuned model, and GPT4 is a zero-shot model. A few months ago, it was unimaginable that a zero-shot model could outperform a fine-tuned model. Now benchmarks show the opposite. We are living in exponential times.
Details here: https://t.co/gqGWZa2Rd3
#AI #datascience #machinelearning #LLM

πŸ–ΌοΈ Media 1
❀️ 59 likes Β· πŸ” 9 retweets
AK @_akhaliq Β· Aug 28, 2023 Β· πŸ†”52789947

SoTaNa: The Open-Source Software Development Assistant
paper page: https://t.co/FanO5BZc4v
Software development plays a crucial role in driving innovation and efficiency across modern societies. To meet the demands of this dynamic field, there is a growing need for an effective software development assistant. However, existing large language models represented by ChatGPT suffer from limited accessibility, including training data and model weights. Although other large open-source models like LLaMA have shown promise, they still struggle with understanding human intent. In this paper, we present SoTaNa, an open-source software development assistant. SoTaNa utilizes ChatGPT to generate high-quality instruction-based data for the domain of software engineering and employs a parameter-efficient fine-tuning approach to enhance the open-source foundation model, LLaMA. We evaluate the effectiveness of SoTaNa in answering Stack Overflow questions and demonstrate its capabilities. Additionally, we discuss its capabilities in code summarization and generation, as well as the impact of varying the volume of generated data on model performance. Notably, SoTaNa can run on a single GPU, making it accessible to a broader range of researchers.

πŸ–ΌοΈ Media 1
❀️ 326 likes Β· πŸ” 75 retweets
Sebo @sebo_gm Β· Aug 19, 2023 Β· πŸ†”71437609

Could Voiceflow emerge as the 'WordPress' for AI agents? I believe they stand a good chance. πŸ†
Over the past month, I've immersed myself in the world of AI chatbots and agents, exploring various platforms. Among them, Voiceflow stands out due to its advanced functionality, user-friendliness, and impressive enterprise customer base. While Voiceflow offers sophisticated tooling for creating end-to-end conversational AI agents, its power is evident even in its most basic chatbot functionality.
This morning, I put together a short tutorial on how I built a basic chatbot for real estate agents in less than 30 minutes using Voiceflow. Let us know what you think.
P.S. Not sponsored by @VoiceflowHQ. Just a huge fan.

πŸ–ΌοΈ Media
❀️ 171 likes Β· πŸ” 35 retweets
Teortaxes @teortaxesTex Β· Aug 27, 2023 Β· πŸ†”99568624

Unpopular take: LLM community is *coping* about quantization. Any real test of reasoning shows k-quants≀Q5 fail. Ppl, evals are misleading: do you care that it's only 1% loss if it takes 99% of the hardest skills? We need kernels for AWQ, SpQR, or better – for all platforms. https://t.co/5RNOScIzgj

↳ Quoted tweet Β· @teortaxesTex:

They are hinting at that, sure. But they're testing on OPT, as in most of those Hype-Aware Quantization papers Why? OPT's FF layers use ReLU. It sacrifices perplexity but makes activations sparse. I'm skeptical it'll work for SwiGLU in LLaMA… without retrain. (paper:MoEfication)

πŸ–ΌοΈ Media 1
❀️ 115 likes Β· πŸ” 12 retweets
TechHalla @techhalla Β· Aug 25, 2023 Β· πŸ†”99313602

πŸ’ͺ🀯 Are you ready for a new super prompt? This time we're going to create detailed statues of our favorite characters with a very special twist.
What's the prompt? πŸ‘‡
πŸ“ƒ A statue of [subject], in the style of highly detailed foliage, matte drawing, museum gallery dioramas, trompe-l'Ε“il illusionistic detail, light [color] and dark gray, bold shadows, intertwining materials --ar 85:128
βš™οΈ Choose the subject and the color, keep the rest the same, and share your results!
πŸ«‚πŸ–€ Remember, if you liked it, give it a like, repost, and invite your friends to participate. Let's go for it! βœ¨πŸŽ¨πŸ—½
#CreativeChallenge #ArtisticTwist #midjourney #aiartcommunity

πŸ–ΌοΈ Media 1 Β· Media 2 (+2 more)
❀️ 231 likes Β· πŸ” 38 retweets
Teortaxes @teortaxesTex Β· Aug 17, 2023 Β· πŸ†”59271988

They are hinting at that, sure. But they're testing on OPT, as in most of those Hype-Aware Quantization papers Why? OPT's FF layers use ReLU. It sacrifices perplexity but makes activations sparse. I'm skeptical it'll work for SwiGLU in LLaMA… without retrain. (paper:MoEfication) https://t.co/LbkvP5LW5i

πŸ–ΌοΈ Media 1 Β· Media 2
❀️ 24 likes Β· πŸ” 1 retweet
Lior⚑ @AlphaSignalAI Β· Aug 27, 2023 Β· πŸ†”88873380

DevOpsGPT is about to reach 3,000 stars on GitHub. It's a multi-agent system for software development. They combined LLMs with DevOps tools to convert natural-language requirements into working software.
DevOpsGPT allows you to:
β–Έ Increase development efficiency
β–Έ Reduce communication costs
β–Έ Shorten development cycles
β–Έ Improve quality of software delivery

πŸ–ΌοΈ Media 1
❀️ 512 likes Β· πŸ” 91 retweets
LlamaIndex πŸ¦™ @llama_index Β· Aug 27, 2023 Β· πŸ†”60356871

Here's a clever new algorithm for better retrieval + better RAG (s/o @jxnlco + ChatGPT): the β€œAutoMergingRetriever”. Retrieve smaller chunks, then recursively merge into more β€œcontinuous” blobs of context. Leads to better LLM synthesized answers: https://t.co/46CPmPjU2F https://t.co/HeCnwhKYch

πŸ–ΌοΈ Media 1
❀️ 186 likes Β· πŸ” 32 retweets
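The merge step described above ("retrieve smaller chunks, then recursively merge") can be sketched over a chunk hierarchy. This is assumed semantics from the post, not the LlamaIndex implementation: if enough of a parent node's child chunks were retrieved, swap them for the whole parent, and repeat until nothing merges.

```python
def auto_merge(retrieved_ids, children_of, threshold=0.6):
    """If >= threshold of a parent's children were retrieved, replace those
    fragments with the parent (one continuous blob), recursively."""
    ids = set(retrieved_ids)
    merged = True
    while merged:
        merged = False
        for parent, kids in children_of.items():
            hit = [k for k in kids if k in ids]
            if hit and len(hit) / len(kids) >= threshold and parent not in ids:
                ids -= set(hit)      # drop the fragments...
                ids.add(parent)      # ...and keep one continuous block
                merged = True
    return ids

# two-level hierarchy: doc -> sections -> leaf chunks (hypothetical ids)
children_of = {
    "doc":  ["sec1", "sec2"],
    "sec1": ["c1", "c2", "c3"],
    "sec2": ["c4", "c5", "c6"],
}
result = auto_merge({"c1", "c2", "c5"}, children_of)
# c1 + c2 (2/3 of sec1) merge into sec1; the lone c5 stays a leaf
```

The threshold controls how aggressively fragments collapse into their parent; at 0.6, two of sec1's three children are enough to merge, while a single child of sec2 is not.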
elvis @omarsar0 Β· Aug 27, 2023 Β· πŸ†”66083156

Anti-hype LLM Reading List This is actually a really good list of papers and reading materials on LLMs. Love the curation by @vboykis. https://t.co/XYQP1FcQnC

πŸ–ΌοΈ Media 1
❀️ 1,611 likes Β· πŸ” 391 retweets
DAIR.AI @dair_ai Β· Aug 27, 2023 Β· πŸ†”24669811

Top ML Papers of the Week (August 21 - August 27):
- Code Llama
- Prompt2Model
- Use of LLMs for Illicit Purposes
- Survey on Instruction Tuning for LLMs
- A Survey on LLM-based Autonomous Agents
- Language to Rewards for Robotic Skill Synthesis
...
https://t.co/UUYknp7P0A

πŸ–ΌοΈ Media 1
❀️ 305 likes Β· πŸ” 77 retweets
LlamaIndex πŸ¦™ @llama_index Β· Aug 27, 2023 Β· πŸ†”54630149

There’s two key concepts for retrieval: semantic vs. keyword search. Hybrid search is a compromise but uses fixed parameters. What if you could have the LLM dynamically decide whether to use vector search or BM25 given a Q? 🚏 Checkout our new guide: https://t.co/YAwC7Yu577 https://t.co/kBejM6D7zI

πŸ–ΌοΈ Media 1
❀️ 129 likes Β· πŸ” 22 retweets
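The per-question routing idea can be sketched as a selector in front of two retrievers. In the guide the router is an LLM call; the `heuristic` below is a hypothetical offline stand-in for that decision, and the retriever lambdas are placeholders:

```python
def choose_retriever(question, decide=None):
    """Route a question to semantic (vector) or keyword (BM25) retrieval.
    `decide` would be an LLM selector in practice; the default heuristic
    is a hypothetical rule so this sketch runs offline."""
    def heuristic(q):
        # quoted strings or ALL-CAPS identifiers smell like exact-match lookups;
        # open-ended phrasing favors semantic search
        if '"' in q or any(t.isupper() and len(t) > 2 for t in q.split()):
            return "bm25"
        return "vector"
    return (decide or heuristic)(question)

# placeholder backends standing in for a vector index and a BM25 index
retrievers = {
    "vector": lambda q: f"[vector] top-k chunks semantically close to: {q}",
    "bm25":   lambda q: f"[bm25] exact keyword matches for: {q}",
}

q1 = "how should I think about chunk overlap?"
q2 = 'find every mention of "BM25Okapi"'
r1, r2 = choose_retriever(q1), choose_retriever(q2)
context = retrievers[r2](q2)
```

Swapping `decide` for a real LLM call keeps the dispatch structure identical; only the routing policy changes.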
LlamaIndex πŸ¦™ @llama_index Β· Aug 26, 2023 Β· πŸ†”64566466

We now have the most comprehensive cookbook on building LLMs with Knowledge Graphs (credits @wey_gu).
βœ… Key query techniques: text2cypher, graph RAG
βœ… Automated KG construction
βœ… Vector DB RAG vs. KG RAG
Check out the full 1.5 hour tutorial: https://t.co/mChA4oWzcL https://t.co/v52umkGMG9

πŸ–ΌοΈ Media 1
❀️ 243 likes Β· πŸ” 58 retweets
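The "graph RAG" query technique mentioned above can be illustrated with a toy in-memory triple store (the cookbook uses a real graph database; the entities here are just example facts): collect everything within a few hops of the query entity and hand it to the LLM as context lines.

```python
# toy triple store: (subject, predicate, object) facts
triples = [
    ("Guardians of the Galaxy 3", "directed_by", "James Gunn"),
    ("James Gunn", "born_in", "St. Louis"),
]

def graph_rag(entity, triples, hops=2):
    """Minimal graph-RAG retrieval sketch: walk outgoing edges from the
    query entity for `hops` steps and return the facts as context lines."""
    frontier, facts = {entity}, []
    for _ in range(hops):
        nxt = set()
        for s, p, o in triples:
            if s in frontier:
                facts.append(f"{s} {p} {o}")
                nxt.add(o)          # expand the frontier along the edge
        frontier = nxt
    return facts

context = graph_rag("Guardians of the Galaxy 3", triples)
# two hops recover both the director and where the director was born
```

Unlike vector-db RAG, the multi-hop fact ("born in St. Louis") is reachable even though it never co-occurs with the movie in any single chunk, which is the usual argument for KG RAG.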
Marcel Pociot πŸ§ͺ @marcelpociot Β· Aug 23, 2023 Β· πŸ†”92297017

🀯 You can now easily face-swap any video, all locally on your MacBook. All you need is a photo and a video. FaceFusion is an open-source face swapper/enhancer with a simple web interface. On an M1/M2 MacBook, it even makes use of Core ML! https://t.co/EjQmv2ut6u https://t.co/aU2tpglECv

πŸ–ΌοΈ Media
❀️ 1,172 likes Β· πŸ” 214 retweets
anton @abacaj Β· Aug 26, 2023 Β· πŸ†”92264792

Not sure if anyone else has seen this, but I get very bad outputs using the HF Code Llama models. See below for the same prompt using HF vs. Meta's provided code & weights (34b model). Also recommend the inference code from Meta, which is much faster out of the box https://t.co/mtqq53BkRq

πŸ–ΌοΈ Media 1
❀️ 215 likes Β· πŸ” 23 retweets
Andrej Karpathy @karpathy Β· Aug 26, 2023 Β· πŸ†”83171696

Deep Neural Nets: 33 years ago and 33 years from now https://t.co/pbZvYgMJak My post from last year randomly made it to HN so resharing here too. Maybe in 2055 someone will train an improved GPT-4 on their personal computing device in ~1 min as an irrelevant fun weekend project. https://t.co/jmDbq6PovD

πŸ–ΌοΈ Media 1
❀️ 2,071 likes Β· πŸ” 311 retweets
Zhenjun Zhao @zhenjun_zhao Β· Aug 24, 2023 Β· πŸ†”63325669

Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields Hyeonseop Song, Seokhun Choi, Hoseok Do, Chul Lee, Taehyeong Kim tl;dr: pretrained NeRF+editable NeRF #ICCV2023 https://t.co/0ygDwWIZx3 https://t.co/HBFMctkXDG

πŸ–ΌοΈ Media 1 Β· Media 2
❀️ 51 likes Β· πŸ” 12 retweets
Matan Cohen-Grumi @MatanCohenGrumi Β· Aug 25, 2023 Β· πŸ†”07821912

AI Burger commercial. Made this video using images generated in @midjourney that were animated in @pika_labs and @runwayml. #aivideo #Food

πŸ–ΌοΈ Media
❀️ 485 likes Β· πŸ” 84 retweets
Aiming_AI @Aiming_AI Β· Aug 25, 2023 Β· πŸ†”45467386

🚨 Fresh! Fruits are LOVE 😍 #aiartcommunity
Prompt πŸ“‘: [Fruit] color splash, black background, in the style of fluid photography, spectacular backdrops, vray, environmental awareness, duckcore, digital art wonders, award-winning
Inviting all πŸ₯³ Follow πŸ₯° BM πŸ“ RT ♻️
https://t.co/8S4ME8mMXi

πŸ–ΌοΈ Media 1
❀️ 309 likes Β· πŸ” 57 retweets
Ethan Mollick @emollick Β· Aug 25, 2023 Β· πŸ†”32170350

One very exciting use of AI is that it can build interactive educational tools. Here, I asked Code Interpreter to develop an interactive website that illustrates the Central Limit Theorem in statistics. Just two prompts! Chat: https://t.co/WpcGLJR8XP Site: https://t.co/9S1GiJBmWp https://t.co/moeUdCpd3p

πŸ–ΌοΈ Media 1
❀️ 730 likes Β· πŸ” 125 retweets
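The statistic such a Central Limit Theorem demo visualizes can be reproduced in a few lines: means of n draws from a uniform(0, 1) population cluster around 0.5 with spread Οƒ/√n, regardless of the population's shape.

```python
import random
import statistics

def sample_means(n_samples=2000, n=30, seed=0):
    """Draw n_samples batches of n uniform(0,1) values and return each
    batch's mean -- the quantity whose histogram the CLT describes."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n))
            for _ in range(n_samples)]

means = sample_means()
mu = statistics.fmean(means)
sd = statistics.stdev(means)
# uniform(0,1) has sigma = sqrt(1/12) ~ 0.2887, so the CLT predicts the
# sample means spread like 0.2887 / sqrt(30) ~ 0.0527 around 0.5
```

Plotting a histogram of `means` gives the bell curve the interactive site shows; increasing `n` narrows it exactly by the √n factor.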
Linus (●ᴗ●) @LinusEkenstam Β· Aug 26, 2023 Β· πŸ†”86583349

TextFX
Google quietly dropped these new AI-powered tools for rappers, writers and wordsmiths, in collaboration with Lupe Fiasco.
Technology has been paramount to hip-hop/rap. This is just the latest tool in that toolbox of samplers, drum machines & recorders.
Link below ⬇️ https://t.co/87ai7W7ySw

πŸ–ΌοΈ Media
❀️ 1,444 likes Β· πŸ” 311 retweets
LlamaIndex πŸ¦™ @llama_index Β· Aug 26, 2023 Β· πŸ†”83144764

A key concept we’ve been playing around with is β€œchunk dreaming” (s/o @tomchapin) πŸ’­ Given a text chunk, auto-extract metadata like questions it can answer and also summaries over adjacent nodes. Better context -> better performing RAG. Brand-new guide πŸ’«: https://t.co/tMrp4T9Teg https://t.co/me5XVTUk8G

πŸ–ΌοΈ Media 1
❀️ 126 likes Β· πŸ” 22 retweets
Cameron R. Wolfe, Ph.D. @cwolferesearch Β· Aug 25, 2023 Β· πŸ†”88177354

One of the best ways to reduce hallucinations with LLMs is by retrieving useful, factual information and injecting it into the LLM’s prompt as added context. Although this might sound complicated, it’s actually quite easy to implement with standard vector search functionality…

Why do we need this? All LLMs have a fixed context length. So, the amount of information we can include in a prompt is limited by nature! As such, we need to be selective about the context that we provide to our model. If we want to provide useful context that can reduce hallucinations and improve the model’s output, one of the best approaches is to retrieve relevant information from an external (vector) database.

Retrieval framework for LLMs. Assuming that we have a lot of relevant textual data that can be used by an LLM, we can’t just inject all of this data into the model’s prompt every time that we perform inference. Rather, we need to do the following:
1. Break our data into textual chunks
2. Vectorize each of the textual chunks
3. Store these vectors (with their data) in a vector database
4. Find relevant data at inference time using vector search
5. Add relevant data to our prompt to provide more context to the LLM
We will use the same embedding model to vectorize these chunks and to generate query vectors that can be used for search.

Storing data in a vector db. The first step in the above framework is to chunk our data. Typically, we will use chunks of ~200 tokens. However, the optimal chunk size is a hyperparameter that can change depending on the application. Then, we use an embedding model to vectorize each of these chunks, and we can store them, along with their text data, in a vector database (e.g., Redis, Weaviate, Pinecone, Qdrant, etc.).

Retrieving relevant context. When we want to retrieve relevant textual data from our vector db, we should just i) create a query embedding based on our prompt (possibly including the chat history) and ii) run a vector search for relevant documents. This way, we can use semantic search to identify portions of data that are relevant to include as context within the LLM’s prompt.

Creating the query embedding. There are a ton of different ways we can create a query embedding for searching our vector db. The simplest approach would be to truncate our chat history or prompt and pass this directly into the embedding model. But, if this is too long, we could ask the LLM to summarize our chat history or prompt before embedding it, or even to convert the chat history or prompt into a list of search keywords.

Picking an embedding model. To make sure this works well, we need a good embedding model that captures the semantic similarities between our queries and textual chunks. There are a variety of good embedding models publicly available via SentenceTransformers and HuggingFace. To find one that works for you, I’d recommend taking a look at the Massive Text Embedding Benchmark (hosted on HuggingFace).

The result. We can use the approach described above to power retrieval-augmented generation (RAG), which is one of the best ways to reduce hallucinations and improve the output of LLMs. Given that this approach can be implemented without significant effort via tools like Pinecone / Weaviate and HuggingFace / SentenceTransformers, it is undoubtedly one of the most useful practical tools for building with LLMs.

πŸ–ΌοΈ Media 1
❀️ 1,297 likes Β· πŸ” 183 retweets
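The five steps in the post above can be sketched end to end. The bag-of-words "embedding" is a deliberate stand-in for a learned sentence-embedding model so the sketch runs without dependencies; the example documents are illustrative:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts. A real pipeline would call a
    learned embedding model here (the same one for chunks and queries)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(text, size=8):
    """Step 1: split into ~size-token chunks (real systems use ~200 tokens)."""
    toks = text.split()
    return [" ".join(toks[i:i + size]) for i in range(0, len(toks), size)]

# Steps 2-3: vectorize each chunk and store it in an in-memory 'vector db'
docs = ("The Eiffel Tower is 330 metres tall and stands in Paris . "
        "Redis and Weaviate are popular vector databases for retrieval . "
        "LLMs hallucinate less when grounded in retrieved factual context .")
db = [(c, embed(c)) for c in chunk(docs)]

def retrieve(query, k=1):
    """Steps 4-5: embed the query, rank stored chunks by cosine similarity,
    and return the top-k to splice into the LLM prompt as context."""
    qv = embed(query)
    return [c for c, _ in sorted(db, key=lambda e: -cosine(qv, e[1]))[:k]]

context = retrieve("how tall is the Eiffel Tower?")
prompt = f"Context: {context[0]}\n\nQuestion: how tall is the Eiffel Tower?"
```

Swapping `embed` for a SentenceTransformers model and `db` for an actual vector database upgrades this sketch to the production setup the post describes without changing its shape.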
Leandro von Werra @lvwerra Β· Aug 25, 2023 Β· πŸ†”59969459

Not much is known about the pretraining data of Code Llama, but there is some good evidence that @StackOverflow was part of it. Found some breadcrumbs while working on a demo with a hello-world example: suddenly the model started generating a discussion between two users. https://t.co/3vnGo8Wr2j

πŸ–ΌοΈ Media 1
❀️ 183 likes Β· πŸ” 42 retweets