GPU Optimization of LLMs - Video search

Secure Automation with RAG and LLMs

October 3, 2024

LLMs and LVMs for agentic AI: a GPU-accelerated multimodal system architecture for RAG-grounded, explainable, and adaptive intelligence

spiedigitallibrary.org

Setting up a custom AI large language model (LLM) GPU server to sell

geeky-gadgets.com

LeftoverLocals: Listening to LLM responses through leaked GPU local memory

January 16, 2024

trailofbits.com

Green AI at Scale: Energy-Efficient LLM Serving using vLLM & LLM Compressor - Abhijit, Anindita

4 views · 2 months ago

YouTube · Python India

The CUDA Trick That Makes LLMs Faster AND Use Less Power (Real Results)

10,000 views · 1 month ago

YouTube · Onchain AI Garage

The Hidden GPU Bottleneck That Kills LLMs in Production #gpu #llm #machinelearning

1,178 views · 2 months ago

YouTube · Jam With AI | Shirin Khosravi Jam

Run LLMs on Your CPU’s NPU (NO GPU Needed) – Full Setup Guide

3,259 views · 1 month ago

YouTube · Quinn Favo

Google TurboQuant - Optimize Memory in LLMs

107 views · 1 month ago

YouTube · aiunlocked

The LLM Decode Secret That Changes Everything (10x) #Shorts

YouTube · CollapsedLatents

Google's TurboQuant Explained: 8x Faster LLMs with ZERO Accuracy Loss!

859 views · 1 month ago

YouTube · Muhammad Idnan

Why LLMs Need GPUs

10 views · 1 month ago

YouTube · Remoder Inc.

MegaTrain: Train 100B+ Parameter LLMs on One GPU

96 views · 1 month ago

YouTube · AI Research Roundup

PagedAttention Explained: How LLMs Save GPU Memory

99 views · 2 months ago

YouTube · The AI Context

Run Local LLMs 100% on AMD GPU (Ollama & Windows Guide)

1,094 views · 1 month ago

YouTube · Filip Delac

kvcached: Revolutionizing GPU Memory for LLMs

1 view · 1 month ago

YouTube · The AI Opus

How do we make LLMs faster and lighter? Don’t force the GPU to adapt to sparsity. Reshape the sparsity to fit the GPU! ⚡️ Excited to share our new #ICML2026 paper in collaboration with @NVIDIA: "Sparser, Faster, Lighter Transformer Language Models". This work introduces new open-source GPU kernels and data formats for faster inference and training of sparse transformer language models: Paper: https://t.co/3Avj8N8iYO Blog: https://t.co/SqFkkKvkbd Code: https://t.co/PHSzMq8pg0 While LLMs are undoubtedl

153,000 views · 2 weeks ago

LLMs require more GPU memory as they generate longer responses. Can we make GPU memory constant without significantly sacrificing accuracy? IceCache is a new method for managing KV caches that leverages Dynamic Continuous Indexing (DCI) to efficiently group and retrieve tokens by semantics. Joint work w/ @Mao_Yuzhen, @q1tong and Martin Ester. For details, check out the links below.

6,046 views · 1 month ago

x.com · Ke Li 🍁

[Bilingual] 🧠 How to accurately calculate LLM memory requirements!? A GPU load guide anyone can understand!

1,465 views · May 21, 2025

bilibili · 比特光锥_BightCone

Deepspeed GPU optimizer

1,295 views · December 27, 2024

YouTube · MLOps.community

Optimize Your AI - Quantization Explained

407,000 views · December 28, 2024

YouTube · Matt Williams

How LLMs use multiple GPUs

10,000 views · 9 months ago

YouTube · Simon Oz

Optimizing LLM Training on GPUs

434 views · 3 months ago

YouTube · Faradawn Yang

LLMs on GPU vs. CPU

2,834 views · March 4, 2025

YouTube · BlueSpork

Deep Dive: Optimizing LLM inference

49,000 views · March 11, 2024

YouTube · Julien Simon

Pretraining LLMs: Lessons from Cohere

4,386 views · 11 months ago

YouTube · Lossfunk

CUDA-L1: LLM Auto-Optimizes GPU Code

123 views · 10 months ago

YouTube · AI Research Roundup

How to Efficiently Serve an LLM?

5,039 views · August 5, 2024

YouTube · Ahmed Tremo

All You Need To Know About Running LLMs Locally

322,000 views · February 26, 2024

Fine Tuning LLM Models – Generative AI Course

440,000 views · May 21, 2024

YouTube · freeCodeCamp.org