MLSys arXiv Daily

Updated on 2025.08.13

LLM inference

Publish Date	Title	Authors	PDF	Code
2025-07-22	Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework	Hongyi Tang et.al.	2507.16414	null
2025-07-21	Efficient Routing of Inference Requests across LLM Instances in Cloud-Edge Computing	Shibo Yu et.al.	2507.15553	null
2025-07-18	Efficient LLM Inference: Bandwidth, Compute, Synchronization, and Capacity are all you need	Michael Davies et.al.	2507.14397	null
2025-07-18	Characterizing Communication Patterns in Distributed Large Language Model Inference	Lang Xu et.al.	2507.14392	null
2025-07-18	Can LLMs Infer Personality from Real World Conversations?	Jianfeng Zhu et.al.	2507.14355	null
2025-07-14	PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training	Pengfei Du et.al.	2507.14202	null
2025-07-23	Photonic Fabric Platform for AI Accelerators	Jing Ding et.al.	2507.14000	null
2025-07-23	DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training	Zhixin Wang et.al.	2507.13833	null
2025-07-18	Team of One: Cracking Complex Video QA with Model Synergy	Jun Xie et.al.	2507.13820	null
2025-07-18	LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues	Haoyang Li et.al.	2507.13681	null
2025-07-17	Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation	Genki Kusano et.al.	2507.13525	null
2025-07-16	Toward Efficient SpMV in Sparse LLMs via Block Extraction and Compressed Storage	Junqing Lin et.al.	2507.12205	null
2025-07-15	MIRAGE: KV Cache Optimization through Parameter Remapping for Multi-tenant LLM Serving	Ruihao Li et.al.	2507.11507	null
2025-07-15	Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations	Miray Özcan et.al.	2507.11417	null
2025-07-15	KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding	Luohe Shi et.al.	2507.11273	null
2025-07-16	GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning	Ziru Liu et.al.	2507.10628	null
2025-07-14	Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving	Wonung Kim et.al.	2507.10178	null
2025-07-14	Past-Future Scheduler for LLM Serving under SLA Guarantees	Ruihao Gong et.al.	2507.10150	null
2025-07-14	ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism	Zedong Liu et.al.	2507.10069	null
2025-07-14	Green-LLM: Optimal Workload Allocation for Environmentally-Aware Distributed Inference	Jiaming Cheng et.al.	2507.09942	null
2025-07-13	Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset	Lily Hong Zhang et.al.	2507.09650	null
2025-07-12	SLIM: A Heterogeneous Accelerator for Edge Inference of Sparse Large Language Model via Adaptive Thresholding	Weihong Xu et.al.	2507.09201	null
2025-07-11	On Evaluating Performance of LLM Inference Serving Systems	Amey Agrawal et.al.	2507.09019	null
2025-07-11	Hybrid Systolic Array Accelerator with Optimized Dataflow for Edge Large Language Model Inference	Chun-Ting Chen et.al.	2507.09010	null
2025-07-11	Orchestration for Domain-specific Edge-Cloud Language Models	Prasoon Patidar et.al.	2507.09003	null
2025-07-11	InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching	Yilun Wang et.al.	2507.08523	null
2025-07-11	Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training	Aleksei Ilin et.al.	2507.08284	null
2025-07-10	Reasoning and Behavioral Equilibria in LLM-Nash Games: From Mindsets to Actions	Quanyan Zhu et.al.	2507.08208	null
2025-07-10	Krul: Efficient State Restoration for Multi-turn Conversations with Dynamic Cross-layer KV Sharing	Junyi Wen et.al.	2507.08045	null
2025-07-11	Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models	Varin Sikka et.al.	2507.07505	null
2025-07-16	Proactive Intra-GPU Disaggregation of Prefill and Decode in LLM Serving	Xiaoxiang Shi et.al.	2507.06608	null
2025-07-11	QUEST: Query Optimization in Unstructured Document Analysis	Zhaoze Sun et.al.	2507.06515	null
2025-07-08	Voltage Regulation in Distribution Systems with Data Center Loads	Yize Chen et.al.	2507.06416	null
2025-07-08	Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models	Léa Dubois et.al.	2507.05822	null
2025-07-07	Cascade: Token-Sharded Private LLM Inference	Rahul Thomas et.al.	2507.05228	null
2025-07-07	MoLink: Distributed and Efficient Serving Framework for Large Models	Lewei Jin et.al.	2507.05043	null
2025-07-16	Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?	Yun Qu et.al.	2507.04632	null
2025-07-09	Tail-aware Adversarial Attacks: A Distributional Approach to Efficient LLM Jailbreaking	Tim Beyer et.al.	2507.04446	null
2025-07-23	Fairness Evaluation of Large Language Models in Academic Library Reference Services	Haining Wang et.al.	2507.04224	null
2025-07-05	Enhancing Adaptive Behavioral Interventions with LLM Inference from Participant-Described States	Karine Karine et.al.	2507.03871	null
2025-07-05	OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM inference	Seungjun Shin et.al.	2507.03865	null
2025-07-08	MemOS: A Memory OS for AI System	Zhiyu Li et.al.	2507.03724	null
2025-07-04	Hummingbird: A Smaller and Faster Large Language Model Accelerator on Embedded FPGA	Jindong Li et.al.	2507.03308	null
2025-07-03	HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference	Weishu Deng et.al.	2507.03153	null
2025-06-20	Large Language Model-Driven Surrogate-Assisted Evolutionary Algorithm for Expensive Optimization	Lindong Xie et.al.	2507.02892	null
2025-07-03	On the Convergence of Large Language Model Optimizer for Black-Box Network Management	Hoon Lee et.al.	2507.02689	null
2025-07-03	Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure	Rui Xie et.al.	2507.02654	null
2025-07-14	FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference	Xing Liu et.al.	2507.02620	null
2025-07-02	Dissecting the Impact of Mobile DVFS Governors on LLM Inference Performance and Energy Efficiency	Zongpu Zhang et.al.	2507.02135	null
2025-07-02	AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training	Zhenyu Han et.al.	2507.01663	null
2025-07-02	Evaluating the Effectiveness of Direct Preference Optimization for Personalizing German Automatic Text Simplifications for Persons with Intellectual Disabilities	Yingqiang Gao et.al.	2507.01479	null
2025-07-02	LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation	Tianyu Liu et.al.	2507.01449	null
2025-07-02	EdgeLoRA: An Efficient Multi-Tenant LLM Serving System on Edge Devices	Zheyu Shen et.al.	2507.01438	null
2025-07-08	SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech	Zhuangfei Cheng et.al.	2507.01348	null
2025-07-02	La RoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation	Kai Liu et.al.	2507.01299	null
2025-07-01	PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning	Xingke Yang et.al.	2507.01216	null
2025-06-28	A Data Science Approach to Calcutta High Court Judgments: An Efficient LLM and RAG-powered Framework for Summarization and Similar Cases Retrieval	Puspendu Banerjee et.al.	2507.01058	null
2025-07-01	VEDA: Efficient LLM Generation Through Voting-based KV Cache Eviction and Dataflow-flexible Accelerator	Zhican Wang et.al.	2507.00797	null
2025-07-01	Cognitive Load-Aware Inference: A Neuro-Symbolic Framework for Optimizing the Token Economy of Large Language Models	Yilun Zhang et.al.	2507.00653	null
2025-07-01	LLM-Mesh: Enabling Elastic Sharing for Serverless LLM Inference	Chuhao Xu et.al.	2507.00507	null
2025-07-01	Serving LLMs in HPC Clusters: A Comparative Study of Qualcomm Cloud AI 100 Ultra and High-Performance GPUs	Mohammad Firas Sada et.al.	2507.00418	null
2025-06-30	Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission	Faranaksadat Solat et.al.	2507.00082	null
2025-06-30	Scaling Human Judgment in Community Notes with LLMs	Haiwen Li et.al.	2506.24118	null
2025-06-30	A Survey of LLM-based Automated Program Repair: Taxonomies, Design Paradigms, and Applications	Boyang Yang et.al.	2506.23749	null
2025-06-28	Spectra 1.1: Scaling Laws and Efficient Inference for Ternary Language Models	Tejas Vaidhya et.al.	2506.23025	null
2025-06-28	Smaller = Weaker? Benchmarking Robustness of Quantized LLMs in Code Generation	Sen Fang et.al.	2506.22776	null
2025-07-01	Not All Water Consumption Is Equal: A Water Stress Weighted Metric for Sustainable Computing	Yanran Wu et.al.	2506.22773	null
2025-06-27	QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization	Danush Khanna et.al.	2506.22396	null
2025-06-27	Towards Operational Data Analytics Chatbots – Virtual Knowledge Graph is All You Need	Junaid Ahmed Khan et.al.	2506.22267	null
2025-06-27	SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference	Yongchao He et.al.	2506.22033	null
2025-06-27	A Survey of LLM Inference Systems	James Pan et.al.	2506.21901	null
2025-06-26	Efficient and Reuseable Cloud Configuration Search Using Discovery Spaces	Michael Johnston et.al.	2506.21467	null
2025-06-26	BLOCKS: Blockchain-supported Cross-Silo Knowledge Sharing for Efficient LLM Services	Zhaojiacheng Zhou et.al.	2506.21033	null
2025-06-17	Utility-Driven Speculative Decoding for Mixture-of-Experts	Anish Saxena et.al.	2506.20675	null
2025-06-25	DipSVD: Dual-importance Protected SVD for Efficient LLM Compression	Xuan Ding et.al.	2506.20353	null
2025-07-02	Breaking the Boundaries of Long-Context LLM Inference: Adaptive KV Management on a Single Commodity GPU	He Sun et.al.	2506.20187	null
2025-06-24	MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection	Zhengxiang Huang et.al.	2506.19884	null
2025-06-24	Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models	Jungwoo Park et.al.	2506.19697	null
2025-06-25	Adaptive Request Scheduling for CodeLLM Serving with SLA Guarantees	Shi Chang et.al.	2506.19677	null
2025-06-23	Black-Box Test Code Fault Localization Driven by Large Language Models and Execution Estimation	Ahmadreza Saboor Yaraghi et.al.	2506.19045	null
2025-06-23	WiLLM: An Open Wireless LLM Communication System	Boyi Liu et.al.	2506.19030	null
2025-06-23	LLMs on a Budget? Say HOLA	Zohaib Hasan Siddiqui et.al.	2506.18952	null
2025-06-23	CommVQ: Commutative Vector Quantization for KV Cache Compression	Junyan Li et.al.	2506.18879	null
2025-06-26	PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries	Steven Kolawole et.al.	2506.18728	null
2025-06-22	Mechanistic Interpretability in the Presence of Architectural Obfuscation	Marcos Florencio et.al.	2506.18053	null
2025-06-22	LLMs for Customized Marketing Content Generation and Evaluation at Scale	Haoran Liu et.al.	2506.17863	null
2025-07-18	LLM-driven Medical Report Generation via Communication-efficient Heterogeneous Federated Learning	Haoxuan Che et.al.	2506.17562	null
2025-06-08	Training-free LLM Verification via Recycling Few-shot Examples	Dongseok Lee et.al.	2506.17251	null
2025-06-20	Towards AI Search Paradigm	Yuchen Li et.al.	2506.17188	null
2025-06-23	From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents	Mohammad Amaan Sayeed et.al.	2506.15911	null
2025-05-30	Learn from the Past: Fast Sparse Indexing for Large Language Model Decoding	Feiyu Yao et.al.	2506.15704	null
2025-06-18	eLLM: Elastic Memory Management Framework for Efficient LLM Serving	Jiale Xu et.al.	2506.15155	null
2025-06-17	CrEst: Credibility Estimation for Contexts in LLMs via Weak Supervision	Dyah Adila et.al.	2506.14912	null
2025-06-17	Cost-Efficient Serving of LLM Agents via Test-Time Plan Caching	Qizheng Zhang et.al.	2506.14852	null
2025-06-05	MobiEdit: Resource-efficient Knowledge Editing for Personalized On-device LLMs	Zhenyan Lu et.al.	2506.13772	null
2025-06-17	Prefix-Tuning+: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention	Haonan Wang et.al.	2506.13674	null
2025-06-16	Vector Ontologies as an LLM world view extraction method	Kaspar Rothenfusser et.al.	2506.13252	link
2025-06-16	Empirical Evaluation of Large Language Models in Automated Program Repair	Jiajun Sun et.al.	2506.13186	null
2025-06-19	Serving Large Language Models on Huawei CloudMatrix384	Pengfei Zuo et.al.	2506.12708	null
2025-06-13	Semantic Scheduling for LLM Inference	Wenyue Hua et.al.	2506.12204	link
2025-05-21	FlexQuant: A Flexible and Efficient Dynamic Precision Switching Framework for LLM Quantization	Fangxin Liu et.al.	2506.12024	null
2025-06-13	Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache	Xiaoran Liu et.al.	2506.11886	null
2025-06-13	GraphRAG-Causal: A novel graph-augmented framework for causal reasoning and annotation in news	Abdul Haque et.al.	2506.11600	null
2025-06-13	Collaborative LLM Inference via Planning for Efficient Reasoning	Byeongchan Lee et.al.	2506.11578	null
2025-06-13	Efficient Long-Context LLM Inference via KV Cache Clustering	Jie Hu et.al.	2506.11418	null
2025-06-12	From Replication to Redesign: Exploring Pairwise Comparisons for LLM-Based Peer Review	Yaohui Zhang et.al.	2506.11343	null
2025-06-12	SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding	Ziyi Zhang et.al.	2506.11309	null
2025-06-06	DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration	Hanzhi Zhang et.al.	2506.11104	link
2025-06-12	Slimming Down LLMs Without Losing Their Minds	Qingda et.al.	2506.10885	null
2025-06-12	AdaptiveLLM: A Framework for Selecting Optimal Cost-Efficient LLM for Code-Generation Based on CoT Length	Junhang Cheng et.al.	2506.10525	link
2025-06-12	TD-Pipe: Temporally-Disaggregated Pipeline Parallelism Architecture for High-Throughput LLM Inference	Hongbin Zhang et.al.	2506.10470	null
2025-06-11	A First Look at Bugs in LLM Inference Engines	Mugeng Liu et.al.	2506.09713	link
2025-06-12	Understanding the Performance and Power of LLM Inferencing on Edge Accelerators	Mayank Arya et.al.	2506.09554	null
2025-06-11	Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning	Jiayi Yuan et.al.	2506.09501	null
2025-06-10	Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$	Chihiro Taguchi et.al.	2506.08479	null
2025-07-19	Draft-based Approximate Inference for LLMs	Kevin Galim et.al.	2506.08373	link
2025-06-09	MiniCPM4: Ultra-Efficient LLMs on End Devices	MiniCPM Team et.al.	2506.07900	link
2025-06-09	How Benchmark Prediction from Fewer Data Misses the Mark	Guanhua Zhang et.al.	2506.07673	link
2025-06-09	TreeReview: A Dynamic Tree of Questions Framework for Deep and Efficient LLM-based Scientific Peer Review	Yuan Chang et.al.	2506.07642	null
2025-06-09	MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts	Wei Tao et.al.	2506.07533	null
2025-06-07	Containerized In-Storage Processing and Computing-Enabled SSD Disaggregation	Miryeong Kwon et.al.	2506.06769	null
2025-06-06	Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques	Adarsh Prasad Behera et.al.	2506.06579	null
2025-06-06	Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage	Ziqi Yuan et.al.	2506.06472	null
2025-07-08	On the Fundamental Impossibility of Hallucination Control in Large Language Models	Michał P. Karpowicz et.al.	2506.06382	null
2025-05-21	Reward Is Enough: LLMs Are In-Context Reinforcement Learners	Kefan Song et.al.	2506.06303	null
2025-06-06	AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search	Yu Li et.al.	2506.06017	null
2025-06-06	FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model	Md Jueal Mia et.al.	2506.05640	link
2025-06-11	Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models	Yanzhao Zhang et.al.	2506.05176	null
2025-06-05	Are LLMs Reliable Translators of Logical Reasoning Across Lexically Diversified Contexts?	Qingchuan Li et.al.	2506.04575	link
2025-06-04	Cascadia: A Cascade Serving System for Large Language Models	Youhe Jiang et.al.	2506.04203	null
2025-06-04	SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling	Anhao Zhao et.al.	2506.04179	null
2025-06-04	GORACS: Group-level Optimal Transport-guided Coreset Selection for LLM-based Recommender Systems	Tiehua Mei et.al.	2506.04015	null
2025-06-04	Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation	Junyi Chen et.al.	2506.03887	null
2025-06-04	Client-Side Zero-Shot LLM Inference for Comprehensive In-Browser URL Analysis	Avihay Cohen et.al.	2506.03656	null
2025-06-04	POSS: Position Specialist Generates Better Draft for Speculative Decoding	Langlin Huang et.al.	2506.03566	link
2025-07-10	Parallel CPU-GPU Execution for LLM Inference on Constrained GPUs	Jiakun Fan et.al.	2506.03296	null
2025-06-03	QKV Projections Require a Fraction of Their Memory	Malik Khalaf et.al.	2506.02939	null
2025-06-03	Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs	Shangmin Guo et.al.	2506.02918	null
2025-06-14	TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression	Zhong-Zhi Li et.al.	2506.02678	link
2025-07-23	KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider	Jiahao Wang et.al.	2506.02634	link
2025-06-03	HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference	Ping Gong et.al.	2506.02572	link
2025-06-03	Response-Level Rewards Are All You Need for Online Reinforcement Learning in LLMs: A Mathematical Perspective	Shenghua He et.al.	2506.02553	null
2025-05-29	NestedFP: High-Performance, Memory-Efficient Dual-Precision Floating Point Support for LLMs	Haeun Lee et.al.	2506.02024	null
2025-05-24	Efficient and Workload-Aware LLM Serving via Runtime Layer Swapping and KV Cache Resizing	Zhaoyuan Su et.al.	2506.02006	null
2025-05-16	Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism	Yuhao Shen et.al.	2506.01979	null
2025-06-02	Memory Access Characterization of Large Language Models in CPU Environment and its Potential Impacts	Spencer Banasik et.al.	2506.01827	null
2025-05-13	AI Accelerators for Large Language Model Inference: Architecture Analysis and Scaling Strategies	Amit Sharma et.al.	2506.00008	null
2025-05-30	AFLoRA: Adaptive Federated Fine-Tuning of Large Language Models with Resource-Aware Low-Rank Adaption	Yajie Zhou et.al.	2505.24773	null
2025-05-30	SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training	Yehonathan Refael et.al.	2505.24749	null
2025-05-30	Are Optimal Algorithms Still Optimal? Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching	Juan Wisznia et.al.	2505.24643	null
2025-05-30	LLM Inference Enhanced by External Knowledge: A Survey	Yu-Hsuan Lin et.al.	2505.24377	link
2025-05-30	SkyLB: A Locality-Aware Cross-Region Load Balancer for LLM Inference	Tian Xia et.al.	2505.24095	null
2025-05-29	Large Language Model Meets Constraint Propagation	Alexandre Bonlarron et.al.	2505.24012	null
2025-05-29	EmbAdvisor: Adaptive Cache Management for Sustainable LLM Serving	Yuyang Tian et.al.	2505.23970	null
2025-05-29	Sustainable Carbon-Aware and Water-Efficient LLM Scheduling in Geo-Distributed Cloud Datacenters	Hayden Moore et.al.	2505.23554	null
2025-06-10	Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism	Jinhui Wei et.al.	2505.23219	null
2025-05-29	SCORPIO: Serving the Right Requests at the Right Time for Heterogeneous SLOs in LLM Inference	Yinghao Tang et.al.	2505.23022	null
2025-05-28	Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference	Donghyeon Joo et.al.	2505.22913	link
2025-05-28	AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models	Feng Luo et.al.	2505.22662	null
2025-05-28	Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai LLM-based ASR	Mingchen Shao et.al.	2505.22063	null
2025-05-28	ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning	Zhendong Mi et.al.	2505.21987	null
2025-05-28	Towards Efficient Key-Value Cache Management for Prefix Prefilling in LLM Inference	Yue Zhu et.al.	2505.21919	null
2025-05-29	EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse	Tianyu Guo et.al.	2505.21889	link
2025-05-28	HoliTom: Holistic Token Merging for Fast Video Large Language Models	Kele Shao et.al.	2505.21334	link
2025-06-04	LLMs Think, But Not In Your Flow: Reasoning-Level Personalization for Black-Box Large Language Models	Jieyong Kim et.al.	2505.21082	null
2025-05-27	Efficient Large Language Model Inference with Neural Block Linearization	Mete Erdogan et.al.	2505.21077	null
2025-07-18	FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration	Daehyeon Baek et.al.	2505.20839	null
2025-05-26	HAMburger: Accelerating LLM Inference via Token Smashing	Jingyu Liu et.al.	2505.20438	null
2025-05-23	Less Context, Same Performance: A RAG Framework for Resource-Efficient LLM-Based Clinical NLP	Satya Narayana Cheetirala et.al.	2505.20320	null
2025-05-26	APE: A Data-Centric Benchmark for Efficient LLM Adaptation in Text Summarization	Javier Marín et.al.	2505.19912	link
2025-06-13	MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE	Zongle Huang et.al.	2505.19645	null
2025-05-26	VLMLight: Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning	Maonan Wang et.al.	2505.19486	null
2025-05-26	BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs	Guilong Lu et.al.	2505.19457	link
2025-05-26	WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference	Sihan Chen et.al.	2505.19427	link
2025-05-25	DECA: A Near-Core LLM Decompression Accelerator Supporting Out-of-Order Invocation	Gerasimos Gerogiannis et.al.	2505.19349	null
2025-05-25	Can Large Language Models Infer Causal Relationships from Real-World Text?	Ryan Saklad et.al.	2505.18931	null
2025-06-18	ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models	Hao Chen et.al.	2505.18799	null
2025-06-01	A Survey of LLM $\times$ DATA	Xuanhe Zhou et.al.	2505.18458	link
2025-05-23	LatentLLM: Attention-Aware Joint Tensor Compression	Toshiaki Koike-Akino et.al.	2505.18413	null
2025-05-23	An Attack to Break Permutation-Based Private Third-Party Inference Schemes for LLMs	Rahul Thomas et.al.	2505.18332	null
2025-07-01	Two-Stage Regularization-Based Structured Pruning for LLMs	Mingkuan Feng et.al.	2505.18232	null
2025-05-23	NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache	Donghyun Son et.al.	2505.18231	null
2025-05-23	Navigating Pitfalls: Evaluating LLMs in Machine Learning Programming Education	Smitha Kumar et.al.	2505.18220	null
2025-05-23	Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning	Michael Hassid et.al.	2505.17813	null
2025-05-23	DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies	Ning Yang et.al.	2505.17420	null
2025-05-26	RAP: Runtime-Adaptive Pruning for LLM Inference	Huanrong Liu et.al.	2505.17138	null
2025-05-20	Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency	Ruixiao Li et.al.	2505.17074	null
2025-05-16	SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs	Jinwoo Park et.al.	2505.17052	null
2025-05-22	CASTILLO: Characterizing Response Length Distributions of Large Language Models	Daniel F. Perez-Ramirez et.al.	2505.16881	link
2025-05-24	Recursive Offloading for LLM Serving in Multi-tier Networks	Zhiyuan Wu et.al.	2505.16502	link
2025-05-22	Reading Between the Prompts: How Stereotypes Shape LLM’s Implicit Personalization	Vera Neplenbroek et.al.	2505.16467	link
2025-05-22	LightRouter: Towards Efficient LLM Collaboration with Minimal Overhead	Yifan Zhang et.al.	2505.16221	null
2025-05-31	QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design	Benjamin Schneider et.al.	2505.16175	link
2025-05-22	KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization	Mingbo Song et.al.	2505.16162	null
2025-05-21	Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning	Jinghui Lu et.al.	2505.15154	null
2025-05-21	BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms	Yunlong Hou et.al.	2505.15141	null
2025-06-04	Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity	Susav Shrestha et.al.	2505.14884	link
2025-05-20	ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions	Bufang Yang et.al.	2505.14668	null
2025-05-20	ServerlessLoRA: Minimizing Latency and Cost in Serverless Inference for LoRA-Based LLMs	Yifan Sui et.al.	2505.14468	null
2025-05-20	Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning	Jiwon Song et.al.	2505.13866	link
2025-05-19	Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training	Shane Bergsma et.al.	2505.13738	null
2025-05-16	An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents	Ayesha Amjad et.al.	2505.13504	null
2025-04-02	Large Language Model powered Symbolic Execution	Yihe Li et.al.	2505.13452	null
2025-05-19	Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately	Yuhang Wang et.al.	2505.13326	null
2025-05-19	HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding	Siran Liu et.al.	2505.13254	null
2025-05-19	FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference	Guangda Liu et.al.	2505.13109	null
2025-05-19	EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code	Yuhao Qing et.al.	2505.13004	link
2025-05-25	FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks	Zihua Wang et.al.	2505.12728	link
2025-05-19	HydraInfer: Hybrid Disaggregated Scheduling for Multimodal Large Language Model Serving	Xianzhe Dong et.al.	2505.12658	null
2025-05-17	Enhancing Complex Instruction Following for Large Language Models with Mixture-of-Contexts Fine-tuning	Yuheng Lu et.al.	2505.11922	null
2025-05-17	Arrow: Adaptive Scheduling Mechanisms for Disaggregated LLM Inference Architecture	Yu Wu et.al.	2505.11916	null
2025-05-25	Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning	Yansong Ning et.al.	2505.11827	null
2025-07-10	TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference	Raja Gond et.al.	2505.11329	link
2025-05-23	SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning	Zheng Li et.al.	2505.11274	null
2025-05-16	Vaiage: A Multi-Agent Solution to Personalized Travel Planning	Binwen Liu et.al.	2505.10922	null
2025-05-21	SpecOffload: Unlocking Latent GPU Capacity for LLM Inference on Resource-Constrained Devices	Xiangwen Zhuge et.al.	2505.10259	link
2025-06-05	ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production	Yuxing Xiang et.al.	2505.09999	link
2025-05-15	How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference	Nidhal Jegham et.al.	2505.09598	null
2025-05-14	Statistical Modeling and Uncertainty Estimation of LLM Inference Systems	Kaustabha Ray et.al.	2505.09319	null
2025-05-14	ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor	Seungbeom Choi et.al.	2505.09142	null
2025-05-13	ITERA-LLM: Boosting Sub-8-Bit Large Language Model Inference via Iterative Tensor Decomposition	Keran Zheng et.al.	2505.08981	null
2025-06-30	LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries	Zekun Wu et.al.	2505.08842	null
2025-05-13	Automatic Task Detection and Heterogeneous LLM Speculative Decoding	Danying Ge et.al.	2505.08600	null
2025-05-08	Scaling Laws for Speculative Decoding	Siyuan Yan et.al.	2505.07858	null
2025-05-12	SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models	Hang Wu et.al.	2505.07680	null
2025-05-12	LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning	Xiaotian Lin et.al.	2505.07437	link
2025-05-12	Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity	Guang Yan et.al.	2505.07239	null
2025-05-12	PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications	Kuntai Du et.al.	2505.07203	null
2025-06-15	I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference	Zibo Gao et.al.	2505.06738	null
2025-05-09	Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference	Haolin Zhang et.al.	2505.06461	null
2025-04-30	Towards Efficient LLM Storage Reduction via Tensor Deduplication and Delta Compression	Zirui Wang et.al.	2505.06252	null
2025-05-09	Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM	Zehao Fan et.al.	2505.05772	null
2025-05-08	PRIMG: Efficient LLM-driven Test Generation Using Mutant Prioritization	Mohamed Salah Bouafif et.al.	2505.05584	link
2025-05-08	HEXGEN-TEXT2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow	You Peng et.al.	2505.05286	link
2025-05-12	Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving	Shan Yu et.al.	2505.04021	null
2025-05-31	LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection	Xinyue Zeng et.al.	2505.03793	link
2025-05-15	GPU Performance Portability needs Autotuning	Burkhard Ringlein et.al.	2505.03780	link
2025-04-21	Splitwiser: Efficient LM inference with constrained resources	Asad Aali et.al.	2505.03763	link
2025-04-07	AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design	Yanbiao Liang et.al.	2505.03745	null
2025-05-06	Faster MoE LLM Inference for Extremely Large Models	Haoqi Yang et.al.	2505.03531	null
2025-05-16	34 Examples of LLM Applications in Materials Science and Chemistry: Towards Automation, Assistants, Agents, and Accelerated Scientific Discovery	Yoel Zimmermann et.al.	2505.03049	null
2025-06-30	RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference	Yaoqi Chen et.al.	2505.02922	null
2025-05-06	EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices	Arnab Sanyal et.al.	2505.02380	null
2025-05-03	Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients	Yezhen Wang et.al.	2505.01744	null
2025-05-03	High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers	Brian Wong et.al.	2505.01693	null
2025-05-08	A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency	Sihyeong Park et.al.	2505.01658	link
2025-05-02	PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding	Bradley McDanel et.al.	2505.01572	null
2025-05-01	Spill The Beans: Exploiting CPU Cache Side-Channels to Leak Tokens from Large Language Models	Andrew Adiletta et.al.	2505.00817	null
2025-04-29	Efficient LLMs with AMP: Attention Heads and MLP Pruning	Leandro Giusti Mugnaini et.al.	2504.21174	null
2025-04-29	Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts	Hanhua Hong et.al.	2504.21117	null
2025-04-30	Ascendra: Dynamic Request Prioritization for Efficient LLM Serving	Azam Ikram et.al.	2504.20828	null
2025-04-30	GenTorrent: Scaling Large Language Model Serving with An Overlay Network	Fei Fang et.al.	2504.20101	null
2025-04-24	Tempo: Application-aware LLM Serving with Mixed SLO Requirements	Wei Zhang et.al.	2504.20068	null
2025-04-28	AutoJudge: Judge Decoding Without Manual Annotation	Roman Garipov et.al.	2504.20039	null
2025-04-28	semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage	Ke Hong et.al.	2504.19867	null
2025-04-28	Taming the Titans: A Survey of Efficient LLM Inference Serving	Ranran Zhen et.al.	2504.19720	link
2025-04-28	Bullet: Boosting GPU Utilization for LLM Serving via Dynamic Spatial-Temporal Orchestration	Zejia Lin et.al.	2504.19516	null
2025-04-28	R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference	Zhenyu Zhang et.al.	2504.19449	null
2025-04-28	Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory	Prateek Chhikara et.al.	2504.19413	null
2025-05-07	A Simple Ensemble Strategy for LLM Inference: Towards More Stable Text Classification	Junichiro Niimi et.al.	2504.18884	link
2025-06-15	PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation	Zihao An et.al.	2504.18583	null
2025-04-25	EcoServe: Enabling Cost-effective LLM Serving with Proactive Intra- and Inter-Instance Orchestration	Jiangsu Du et.al.	2504.18154	null
2025-04-25	PropRAG: Guiding Retrieval with Beam Search over Proposition Paths	Jingjin Wang et.al.	2504.18070	null
2025-04-25	Streaming, Fast and Slow: Cognitive Load-Aware Streaming for Efficient LLM Serving	Chang Xiao et.al.	2504.17999	null
2025-04-24	Energy Considerations of Large Language Model Inference and Efficiency Optimizations	Jared Fernandez et.al.	2504.17674	null
2025-04-24	L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference	Qingyuan Liu et.al.	2504.17584	null
2025-04-24	A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task	Jiaqi Deng et.al.	2504.17547	null
2025-04-24	On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration	Maoyang Xiang et.al.	2504.17376	null
2025-04-26	QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining	Fengze Liu et.al.	2504.16511	null
2025-04-18	HPU: High-Bandwidth Processing Unit for Scalable, Cost-effective LLM Inference via GPU Co-processing	Myunghyun Rhee et.al.	2504.16112	null
2025-05-29	Optimizing Token Consumption in LLMs: A Nano Surge Approach for Code Reasoning Efficiency	Junwei Hu et.al.	2504.15989	null
2025-04-22	SeaLLM: Service-Aware and Latency-Optimized Resource Sharing for Large Language Model Inference	Yihao Zhao et.al.	2504.15720	null
2025-04-23	A LoRA-Based Approach to Fine-Tuning LLMs for Educational Guidance in Resource-Constrained Settings	Md Millat Hosen et.al.	2504.15610	link
2025-04-21	Speculative Sampling via Exponential Races	Szymon Kobus et.al.	2504.15475	null
2025-05-20	KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments	Junyoung Park et.al.	2504.15364	null
2025-04-18	High-Throughput LLM inference on Heterogeneous Clusters	Yi Xiong et.al.	2504.15303	null
2025-04-17	D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving	Haodong Wang et.al.	2504.15299	null
2025-06-12	SLO-Aware Scheduling for Large Language Model Inferences	Jinqi Huang et.al.	2504.14966	null
2025-04-21	Hardware-based Heterogeneous Memory Management for Large Language Model Inference	Soojin Hwang et.al.	2504.14893	null
2025-05-28	gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling	Tianyu Guo et.al.	2504.14775	link
2025-04-20	Knowledge Distillation and Dataset Distillation of Large Language Models: Emerging Trends, Challenges, and Future Directions	Luyang Fang et.al.	2504.14772	null
2025-04-22	Optimizing SLO-oriented LLM Serving with PD-Multiplexing	Weihao Cui et.al.	2504.14489	null
2025-04-19	Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator	Akshat Ramachandran et.al.	2504.14365	null
2025-04-19	FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference	Coleman Hooper et.al.	2504.14152	null
2025-05-12	From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs	Jiliang Ni et.al.	2504.13471	null
2025-05-23	The Quantum LLM: Modeling Semantic Spaces with Quantum Principles	Timo Aukusti Laine et.al.	2504.13202	null
2025-04-25	Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving	Yaoyao Ding et.al.	2504.12984	null
2025-04-17	Data-efficient LLM Fine-tuning for Code Generation	Weijie Lv et.al.	2504.12687	link
2025-04-16	Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading	Kihyun Kim et.al.	2504.11816	link
2025-04-16	Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs	Hyungwoo Lee et.al.	2504.11765	null
2025-04-16	Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures	Prabhu Vellaisamy et.al.	2504.11750	null
2025-04-16	Progent: Programmable Privilege Control for LLM Agents	Tianneng Shi et.al.	2504.11703	link
2025-04-15	Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints	Ruicheng Ao et.al.	2504.11320	link
2025-04-14	HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving	Avinash Kumar et.al.	2504.10724	null
2025-04-14	Load Balancing with Network Latencies via Distributed Gradient Descent	Santiago R. Balseiro et.al.	2504.10693	null
2025-04-14	AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference	Yangshen Deng et.al.	2504.10326	null
2025-04-14	KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference	Yuxuan Tian et.al.	2504.09936	null
2025-04-20	Understanding and Optimizing Multi-Stage AI Inference Pipelines	Abhimanyu Rajeshkumar Bambhaniya et.al.	2504.09775	null
2025-04-13	Integrating Large Language Models for Automated Structural Analysis	Haoran Liang et.al.	2504.09754	null
2025-04-13	Efficient LLM Serving on Hybrid Real-time and Best-effort Requests	Wan Borui et.al.	2504.09590	null
2025-04-13	LoopLynx: A Scalable Dataflow Architecture for Efficient LLM Inference	Jianing Zheng et.al.	2504.09561	link
2025-04-12	MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints	Yichao Yuan et.al.	2504.09345	null
2025-05-22	DynaServe: Unified and Elastic Execution for Dynamic Disaggregated LLM Serving	Chaoyi Ruan et.al.	2504.09285	null
2025-04-11	An Adaptive Vector Index Partitioning Scheme for Low-Latency RAG Pipeline	Junkyum Kim et.al.	2504.08930	null
2025-04-11	SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting	Jiaming Xu et.al.	2504.08850	null
2025-05-31	SD$^2$: Self-Distilled Sparse Drafters	Mike Lasby et.al.	2504.08838	null
2025-04-07	PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters	Zonghang Li et.al.	2504.08791	link
2025-04-11	Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash	Fucheng Jia et.al.	2504.08378	null
2025-04-11	Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices	Shengyuan Ye et.al.	2504.08242	null
2025-04-10	Token Level Routing Inference System for Edge Devices	Jianshu She et.al.	2504.07878	null
2025-04-10	A System for Comprehensive Assessment of RAG Frameworks	Mattia Rengo et.al.	2504.07803	link
2025-04-10	Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving	Shihong Gao et.al.	2504.07494	link
2025-04-10	UniCAIM: A Unified CAM/CIM Architecture with Static-Dynamic KV Cache Pruning for Efficient Long-Context LLM Inference	Weikai Xu et.al.	2504.07479	null
2025-04-24	Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents	Yueying Li et.al.	2504.07347	null
2025-04-08	S’MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning	Hanqing Zeng et.al.	2504.06426	null
2025-04-08	SPIRe: Boosting LLM Inference Throughput with Speculative Decoding	Sanjit Neelam et.al.	2504.06419	null
2025-04-08	Mosaic: Composite Projection Pruning for Resource-efficient LLMs	Bailey J. Eccles et.al.	2504.06323	null
2025-04-08	Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching	Yanhao Dong et.al.	2504.06319	null
2025-05-23	Hogwild! Inference: Parallel LLM Generation via Concurrent Attention	Gleb Rodionov et.al.	2504.06261	null
2025-05-27	User Feedback Alignment for LLM-powered Exploration in Large-scale Recommendation Systems	Jianling Wang et.al.	2504.05522	null
2025-04-07	REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding	Sakib Reza et.al.	2504.05491	null
2025-04-07	Evaluating Knowledge Graph Based Retrieval Augmented Generation Methods under Knowledge Incompleteness	Dongzhuoran Zhou et.al.	2504.05163	null
2025-05-20	Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning	Sugyeong Eo et.al.	2504.05047	null
2025-04-05	PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models	Haofei Yin et.al.	2504.04104	null
2025-04-03	FlowKV: A Disaggregated Inference Framework with Low-Latency KV Cache Transfer and Load-Aware Scheduling	Weiqing Li et.al.	2504.03775	null
2025-03-30	VFlow: Discovering Optimal Agentic Workflows for Verilog Generation	Yangbo Wei et.al.	2504.03723	null
2025-04-08	MILLION: Mastering Long-Context LLM Inference Via Outlier-Immunized KV Product Quantization	Zongwu Wang et.al.	2504.03661	link
2025-03-01	Echo: Efficient Co-Scheduling of Hybrid Online-Offline Tasks for Large Language Model Serving	Zhibin Wang et.al.	2504.03651	null
2025-02-22	AIBrix: Towards Scalable, Cost-Effective Large Language Model Inference Infrastructure	The AIBrix Team et.al.	2504.03648	null
2025-04-04	Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency	Erik Johannes Husom et.al.	2504.03360	null
2025-04-04	Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation	Weitao Li et.al.	2504.03165	link
2025-04-03	Narrative Studio: Visual narrative exploration using LLMs and Monte Carlo Tree Search	Parsa Ghaffari et.al.	2504.02426	link
2025-04-01	SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching	Yuxuan Zhu et.al.	2504.00970	null
2025-06-04	Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding	Aayush Gautam et.al.	2504.00030	null
2025-03-31	TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers’ Guidance	Jingxian Xu et.al.	2503.24198	null
2025-04-06	ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance	Tong Xie et.al.	2503.24053	link
2025-03-31	Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving	Wei Gao et.al.	2503.24000	link
2025-03-31	Model Hemorrhage and the Robustness Limits of Large Language Models	Ziyang Ma et.al.	2503.23924	null
2025-03-31	MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration	Tatsuya Kubo et.al.	2503.23817	null
2025-03-30	Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference	Wei Tao et.al.	2503.23294	null
2025-03-30	PromptDistill: Query-based Selective Token Retention in Intermediate Layers for Efficient Large Language Model Inference	Weisheng Jin et.al.	2503.23274	link
2025-03-28	Niyama: Breaking the Silos of LLM Inference Serving	Kanishk Goel et.al.	2503.22562	null
2025-03-26	Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation	Yunkai Liang et.al.	2503.20552	link
2025-03-25	LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation	Han Chen et.al.	2503.19950	link
2025-03-24	LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment	Varsha Embar et.al.	2503.19090	null
2025-03-23	SplitFrozen: Split Learning with Device-side Model Frozen for Fine-Tuning LLM on Heterogeneous Resource-Constrained Devices	Jian Ma et.al.	2503.18986	null
2025-03-24	xKV: Cross-Layer SVD for KV-Cache Compression	Chi-Chih Chang et.al.	2503.18893	link
2025-04-21	Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design	Rui Xie et.al.	2503.18869	null
2025-05-14	Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization	Minsu Kim et.al.	2503.18599	null
2025-03-24	DeepFund: Will LLM be Professional at Fund Investment? A Live Arena Perspective	Changlun Li et.al.	2503.18313	null
2025-03-24	Jenga: Effective Memory Management for Serving LLM with Heterogeneity	Chen Zhang et.al.	2503.18292	null
2025-03-27	WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference	Youhui Zuo et.al.	2503.17922	link
2025-03-22	PipeBoost: Resilient Pipelined Architecture for Fast Serverless LLM Scaling	Chongpeng Liu et.al.	2503.17707	null
2025-03-21	V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms	Javier J. Poveda Rodrigo et.al.	2503.17422	null
2025-03-21	Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation	Jingzhi Fang et.al.	2503.16893	null
2025-05-16	KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse	Huan Yang et.al.	2503.16525	null
2025-03-20	SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models	Fahao Chen et.al.	2503.15921	null
2025-03-19	Automated Non-Functional Requirements Generation in Software Engineering with Large Language Models: A Comparative Study	Jomar Thomas Almonte et.al.	2503.15248	null
2025-04-15	ELTEX: A Framework for Domain-Driven Synthetic Data Generation	Arina Razmyslovich et.al.	2503.15055	link
2025-03-19	FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding	Chongjun Tu et.al.	2503.14935	null
2025-03-19	Communication-Efficient Distributed On-Device LLM Inference Over Wireless Networks	Kai Zhang et.al.	2503.14882	null
2025-03-21	RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving	Wenqi Jiang et.al.	2503.14649	null
2025-03-18	PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play	Wei Fang et.al.	2503.14432	null
2025-03-24	Mitigating KV Cache Competition to Enhance User Experience in LLM Inference	Haiying Shen et.al.	2503.13773	null
2025-03-17	AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications	Haiying Shen et.al.	2503.13737	null
2025-03-17	ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts	Evangelos Georganas et.al.	2503.13565	null
2025-03-14	Examples as the Prompt: A Scalable Approach for Efficient LLM Adaptation in E-Commerce	Jingying Zeng et.al.	2503.13518	null
2025-03-17	xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference	Maximilian Beck et.al.	2503.13427	link
2025-04-14	VeriLeaky: Navigating IP Protection vs Utility in Fine-Tuning for LLM-Driven Verilog Coding	Zeng Wang et.al.	2503.13116	null
2025-03-15	TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation	Mayank Kumar et.al.	2503.12217	null
2025-04-22	Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques	Neusha Javidnia et.al.	2503.11816	null
2025-05-19	D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning	Jia Zhang et.al.	2503.11441	null
2025-03-14	MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens	Jeong Hun Yeo et.al.	2503.11315	link
2025-04-08	Green Prompting	Marta Adamska et.al.	2503.10666	null
2025-05-15	Collaborative Speculative Inference for Efficient LLM Inference Serving	Luyao Gao et.al.	2503.10325	null
2025-03-17	Exploiting Edited Large Language Models as General Scientific Optimizers	Qitan Lv et.al.	2503.09620	null
2025-03-13	BIMBA: Selective-Scan Compression for Long-Range Video Question Answering	Md Mohaiminul Islam et.al.	2503.09590	link
2025-05-23	Prompt Inference Attack on Distributed Large Language Model Inference Frameworks	Xinjian Luo et.al.	2503.09291	null
2025-05-02	Prompt Inversion Attack against Collaborative Inference of Large Language Models	Wenjie Qu et.al.	2503.09022	null
2025-03-19	Enhancing High-Quality Code Generation in Large Language Models with Comparative Prefix-Tuning	Yuan Jiang et.al.	2503.09020	link
2025-03-11	Position-Aware Depth Decay Decoding ($D^3$): Boosting Large Language Model Inference Efficiency	Siqi Fan et.al.	2503.08524	null
2025-03-11	FastCache: Optimizing Multimodal LLM Serving through Lightweight KV-Cache Compression Framework	Jianian Zhu et.al.	2503.08461	null
2025-03-19	TokenSim: Enabling Hardware and Software Exploration for Large Language Model Inference Systems	Feiyang Wu et.al.	2503.08415	link
2025-03-11	Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference	Pol G. Recasens et.al.	2503.08311	null
2025-03-09	Seesaw: High-throughput LLM Inference via Model Re-sharding	Qidong Su et.al.	2503.06433	null
2025-02-24	Encoding Inequity: Examining Demographic Bias in LLM-Driven Robot Caregiving	Raj Korpan et.al.	2503.05765	null
2025-03-07	Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching	Bowen Pang et.al.	2503.05248	link
2025-05-21	Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching	Simon A. Aytes et.al.	2503.05179	link
2025-03-07	SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding	Kaiyu Huang et.al.	2503.05096	null
2025-03-07	Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size	Alireza Behtash et.al.	2503.04704	null
2025-03-15	Mark Your LLM: Detecting the Misuse of Open-Source Large Language Models via Watermarking	Yijie Xu et.al.	2503.04636	null
2025-03-06	AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services	Xiaoqi Wang et.al.	2503.04418	null
2025-03-06	Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search	Kou Misaki et.al.	2503.04412	null
2025-03-06	ADOR: A Design Exploration Framework for LLM Serving with Enhanced Latency and Throughput	Junsoo Kim et.al.	2503.04253	null
2025-03-06	Beyond Memorization: Evaluating the True Type Inference Capabilities of LLMs for Java Code Snippets	Yiwen Dong et.al.	2503.04076	null
2025-03-04	FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference	Hongchao Du et.al.	2503.03777	null
2025-03-05	MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems	Rui Ye et.al.	2503.03686	null
2025-03-05	Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems	Yaoru Li et.al.	2503.03505	link
2025-03-05	Enhancing Memory Efficiency in Large Language Model Training Through Chronos-aware Pipeline Parallelism	Xinyuan Lin et.al.	2503.03182	null
2025-03-04	PersonaX: A Recommendation Agent Oriented User Modeling Framework for Long Behavior Sequence	Yunxiao Shi et.al.	2503.02398	link
2025-03-04	VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference	Zihan Liu et.al.	2503.02236	null
2025-02-26	Online Pseudo-average Shifting Attention (PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis	Long Cheng et.al.	2503.01873	null
2025-04-30	SAGE: A Framework of Precise Retrieval for RAG	Jintao Zhang et.al.	2503.01713	null
2025-03-03	Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens	Xinsheng Wang et.al.	2503.01710	link
2025-03-03	DILEMMA: Joint LLM Quantization and Distributed LLM Inference Over Edge Computing Systems	Minoo Hosseinzadeh et.al.	2503.01704	null
2025-03-15	Towards An Efficient LLM Training Paradigm for CTR Prediction	Allen Lin et.al.	2503.01001	null
2025-03-02	Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers	Yiran Zhao et.al.	2503.00865	null
2025-03-01	Tutorial Proposal: Speculative Decoding for Efficient LLM Inference	Heming Xia et.al.	2503.00491	null
2025-03-01	Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving	Qihui Zhou et.al.	2503.00392	null
2025-02-28	FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference	Xunhao Lai et.al.	2502.20766	link
2025-05-04	SPD: Sync-Point Drop for efficient tensor parallelism of Large Language Models	Han-Byul Kim et.al.	2502.20727	null
2025-04-02	Smart Routing: Cost-Effective Multi-LLM Serving for Multi-Core AIOS	Kai Mei et.al.	2502.20576	link
2025-02-27	M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging	Jinghao Feng et.al.	2502.20301	null
2025-02-26	Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs	Yiheng Yang et.al.	2502.19078	null
2025-02-26	Evidence-Driven Marker Extraction for Social Media Suicide Risk Detection	Carter Adams et.al.	2502.18823	null
2025-02-24	LLM Inference Acceleration via Efficient Operation Fusion	Mahsa Salmani et.al.	2502.17728	null
2025-02-24	CodeSwift: Accelerating LLM Inference for Efficient Code Generation	Qianhui Zhao et.al.	2502.17139	null
2025-02-24	Make LLM Inference Affordable to Everyone: Augmenting GPU Memory with NDP-DIMM	Lian Liu et.al.	2502.16963	null
2025-02-24	DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance	Xuanfan Ni et.al.	2502.16886	null
2025-03-01	CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter	Yepeng Weng et.al.	2502.16880	null
2025-02-23	DISC: Dynamic Decomposition Improves LLM Inference Scaling	Jonathan Light et.al.	2502.16706	null
2025-02-23	Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines	Xinwei Long et.al.	2502.16641	null
2025-05-01	TerEffic: Highly Efficient Ternary LLM Inference on FPGA	Chenyang Yin et.al.	2502.16473	null
2025-02-27	Dynamic Parallel Tree Search for Efficient LLM Reasoning	Yifu Ding et.al.	2502.16235	null
2025-02-21	KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse	Jingbo Yang et.al.	2502.16002	link
2025-02-14	Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization	Bowen Pang et.al.	2502.15763	null
2025-02-21	Towards Swift Serverless LLM Cold Starts with ParaServe	Chiheng Lou et.al.	2502.15524	null
2025-02-24	HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings	Rasmus Aavang et.al.	2502.15411	link
2025-02-24	Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference	Yaohua Tang et.al.	2502.15294	null
2025-02-21	A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation	Shilong Hou et.al.	2502.15233	link
2025-02-19	EvoP: Robust LLM Inference via Evolutionary Pruning	Shangyu Wu et.al.	2502.14910	null
2025-04-21	LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention	Shang Yang et.al.	2502.14866	link
2025-02-20	Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale	Shashwat Jaiswal et.al.	2502.14617	null
2025-02-20	SR-LLM: Rethinking the Structured Representation in Large Language Model	Jiahuan Zhang et.al.	2502.14352	null
2025-02-20	Efficient AI in Practice: Training and Deployment of Efficient LLMs for Industry Applications	Kayhan Behdin et.al.	2502.14305	null
2025-02-19	RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression	Payman Behnam et.al.	2502.14051	null
2025-02-19	Autellix: An Efficient Serving Engine for LLM Agents as General Programs	Michael Luo et.al.	2502.13965	null
2025-02-19	Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference	Qingfa Xiao et.al.	2502.13542	null
2025-02-19	What are Models Thinking about? Understanding Large Language Model Hallucinations “Psychology” through Model Inner State Analysis	Peiran Wang et.al.	2502.13490	null
2025-02-24	BaKlaVa – Budgeted Allocation of KV cache for Long-context Inference	Ahmed Burak Gulhan et.al.	2502.13176	null
2025-02-18	SEFL: Harnessing Large Language Model Agents to Improve Educational Feedback Systems	Mike Zhang et.al.	2502.12927	link
2025-03-27	R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs	Sumin Jo et.al.	2502.12767	link
2025-02-18	HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading	Cheng Luo et.al.	2502.12574	link
2025-02-18	Distributed On-Device LLM Inference With Over-the-Air Computation	Kai Zhang et.al.	2502.12559	null
2025-02-18	SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs	Ahmed F. AbouElhamayed et.al.	2502.12444	link
2025-02-17	Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs	Kan Zhu et.al.	2502.12216	null
2025-02-17	Designing Role Vectors to Improve LLM Inference Behaviour	Daniele Potertì et.al.	2502.12055	null
2025-02-17	DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services	Ting Sun et.al.	2502.11417	null
2025-02-17	Evaluating the Performance of the DeepSeek Model in Confidential Computing Environment	Ben Dong et.al.	2502.11347	null
2025-02-16	Unveiling Environmental Impacts of Large Language Model Serving: A Functional Unit View	Yanran Wu et.al.	2502.11256	null
2025-02-16	Diversified Sampling Improves Scaling LLM inference	Tianchun Wang et.al.	2502.11027	null
2025-02-16	Leveraging Uncertainty Estimation for Efficient LLM Routing	Tuo Zhang et.al.	2502.11021	null
2025-04-07	Local-Cloud Inference Offloading for LLMs in Multi-Modal, Multi-Task, Multi-Dialogue Settings	Liangqi Yuan et.al.	2502.11007	link
2025-02-15	Pushing up to the Limit of Memory Bandwidth and Capacity Utilization for Efficient LLM Decoding on Embedded FPGA	Jindong Li et.al.	2502.10659	null
2025-02-05	QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache	Rishabh Tiwari et.al.	2502.10424	null
2025-02-14	λScale: Enabling Fast Scaling for Serverless Large Language Model Inference	Minchen Yu et.al.	2502.09922	null
2025-02-14	INF^2: High-Throughput Generative Inference of Large Language Models using Near-Storage Processing	Hongsun Jang et.al.	2502.09921	null
2025-02-13	On multi-token prediction for efficient LLM inference	Somesh Mehra et.al.	2502.09419	null
2025-02-13	ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments	Youhe Jiang et.al.	2502.09334	null
2025-03-21	RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models	Quan Wei et.al.	2502.09003	null
2025-02-13	InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU	Heejun Lee et.al.	2502.08910	null
2025-02-13	DiffoRA: Enabling Parameter-Efficient LLM Fine-Tuning via Differential Low-Rank Matrix Adaptation	Tangyu Jiang et.al.	2502.08905	null
2025-02-12	Universal Model Routing for Efficient LLM Inference	Wittawat Jitkrittum et.al.	2502.08773	null
2025-02-12	MoLoRec: A Generalizable and Efficient Framework for LLM-Based Recommendation	Min Hou et.al.	2502.08271	null
2025-02-12	Memory Offloading for Large Language Model Inference with Latency SLO Guarantees	Chenxiang Ma et.al.	2502.08182	null
2025-02-12	Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences	Shanshan Han et.al.	2502.08142	null
2025-03-19	Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding	Ziyao Wang et.al.	2502.08020	null
2025-02-11	HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment	Youhe Jiang et.al.	2502.07903	null
2025-02-11	SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters	Yiping Wang et.al.	2502.07832	null
2025-03-21	PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference	Yufeng Gu et.al.	2502.07578	link
2025-03-05	Online Scheduling for LLM Inference with KV Cache Constraints	Patrick Jaillet et.al.	2502.07115	null
2025-02-10	Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE	Haiduo Huang et.al.	2502.06282	link
2025-03-15	Towards Sustainable NLP: Insights from Benchmarking Inference Energy in Large Language Models	Soham Poddar et.al.	2502.05610	null
2025-02-08	Mechanistic Interpretability of Emotion Inference in Large Language Models	Ala N. Tak et.al.	2502.05489	null
2025-02-07	BCQ: Block Clustered Quantization for 4-bit (W4A4) LLM Inference	Reena Elangovan et.al.	2502.05376	null
2025-01-31	Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies	Nadav Timor et.al.	2502.05202	null
2025-03-15	EcoServe: Designing Carbon-Aware AI Inference Systems	Yueying Li et.al.	2502.05043	null
2025-02-07	LLM Query Scheduling with Prefix Reuse and Latency Constraints	Gregory Dexter et.al.	2502.04677	null
2025-02-18	WaferLLM: A Wafer-Scale LLM Inference System	Congjie He et.al.	2502.04563	null
2025-02-25	KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference	Xing Li et.al.	2502.04420	link
2025-02-06	CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference	Zehua Pei et.al.	2502.04416	link
2025-02-11	Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing	Kunfeng Lai et.al.	2502.04411	null
2025-02-26	AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference	Qingyue Yang et.al.	2502.04077	link
2025-02-06	CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing	Yu Yuan et.al.	2502.03997	null
2025-02-06	Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective	Yuan Feng et.al.	2502.03805	link
2025-04-04	Adaptive Semantic Prompt Caching with VectorQ	Luis Gaspar Schroeder et.al.	2502.03771	null
2025-02-05	Bilevel ZOFO: Bridging Parameter-Efficient and Zeroth-Order Techniques for Efficient LLM Fine-Tuning and Meta-Training	Reza Shirkavand et.al.	2502.03604	null
2025-02-05	HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference	Zeyu Zhang et.al.	2502.03589	null
2025-02-05	Accessible and Portable LLM Inference by Compiling Computational Graphs into SQL	Wenbo Sun et.al.	2502.02818	null
2025-02-05	Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation	Jingyu Liu et.al.	2502.02789	link
2025-02-04	LLM Bandit: Cost-Efficient LLM Generation via Preference-Conditioned Dynamic Routing	Yang Li et.al.	2502.02743	null
2025-02-04	EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization	Yize Wu et.al.	2502.02493	null
2025-01-30	Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency	Sazzad Hossain et.al.	2502.01651	null
2025-02-06	An Investigation of FP8 Across Accelerators for LLM Inference	Jiwoo Kim et.al.	2502.01070	null
2025-02-02	Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference	Patrick Yubeaton et.al.	2502.00922	null
2025-02-02	MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies	Ehsaneddin Asgari et.al.	2502.00894	null
2025-02-02	SecPE: Secure Prompt Ensembling for Private and Robust Large Language Models	Jiawen Zhang et.al.	2502.00847	null
2025-02-02	Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs	Youhe Jiang et.al.	2502.00722	null
2025-02-13	Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning	Zhi Zhou et.al.	2502.00511	null
2025-02-01	UniAttn: Reducing Inference Costs via Softmax Unification for Post-Training LLMs	Yizhe Xiong et.al.	2502.00439	null
2025-02-01	ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference	Xiang Liu et.al.	2502.00299	null
2025-01-16	Optimization Strategies for Enhancing Resource Efficiency in Transformers & Large Language Models	Tom Wallace et.al.	2502.00046	null
2025-02-07	Pushing the Limits of BFP on Narrow Precision LLM Inference	Hui Wang et.al.	2502.00026	null
2025-02-14	Reward-Guided Speculative Decoding for Efficient LLM Reasoning	Baohao Liao et.al.	2501.19324	null
2025-01-31	Pheromone-based Learning of Optimal Reasoning Paths	Anirudh Chari et.al.	2501.19278	null
2025-01-31	Structural Embedding Projection for Contextual Large Language Model Inference	Vincent Enoasmo et.al.	2501.18826	null
2025-01-29	On the Partitioning of GPU Power among Multi-Instances	Tirth Vamja et.al.	2501.17752	null
2025-02-02	RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations	Zunhai Su et.al.	2501.16383	null
2025-01-27	Raiders of the Lost Dependency: Fixing Dependency Conflicts in Python using LLMs	Antony Bartlett et.al.	2501.16191	null
2025-01-27	TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference	Jack Min Ong et.al.	2501.16007	null
2025-01-27	Aging-aware CPU Core Management for Embodied Carbon Amortization in Cloud LLM Inference	Tharindu B. Hewage et.al.	2501.15829	link
2025-01-25	Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads	Xingyang He et.al.	2501.15113	null
2025-01-25	PatchRec: Multi-Grained Patching for Efficient LLM-based Sequential Recommendation	Jiayi Liao et.al.	2501.15087	null
2025-02-09	HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location	Ting Sun et.al.	2501.14808	null
2025-01-11	HeteroLLM: Accelerating Large Language Model Inference on Mobile SoCs platform with Heterogeneous AI Accelerators	Le Chen et.al.	2501.14794	null
2025-01-04	DeServe: Towards Affordable Offline LLM Inference via Decentralization	Linyu Wu et.al.	2501.14784	null
2024-12-13	KVDirect: Distributed Disaggregated LLM Inference	Shiyang Chen et.al.	2501.14743	null
2025-01-24	Accelerated Preference Elicitation with LLM-Based Proxies	David Huang et.al.	2501.14625	null
2025-01-27	DeepFlow: Serverless Large Language Model Serving at Scale	Junhao Hu et.al.	2501.14417	null
2025-01-24	Locality-aware Fair Scheduling in LLM Serving	Shiyi Cao et.al.	2501.14312	null
2025-01-24	Serving Long-Context LLMs at the Mobile Edge: Test-Time Reinforcement Learning-based Model Caching and Inference Offloading	Minrui Xu et.al.	2501.14205	null
2025-01-08	iServe: An Intent-based Serving System for LLMs	Dimitrios Liakopoulos et.al.	2501.13111	null
2025-01-24	EchoLM: Accelerating LLM Serving with Real-time Knowledge Distillation	Yifan Yu et.al.	2501.12689	null
2025-03-16	Human-like conceptual representations emerge from language prediction	Ningyu Xu et.al.	2501.12547	null
2025-01-21	AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding	Zikun Li et.al.	2501.12162	null
2025-02-11	Glinthawk: A Two-Tiered Architecture for Offline LLM Inference	Pouya Hamadanian et.al.	2501.11779	link
2025-01-20	Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas	Nishant Balepur et.al.	2501.11549	link
2025-03-21	GREEN-CODE: Learning to Optimize Energy Efficiency in LLM-based Code Generation	Shashikant Ilager et.al.	2501.11006	link
2025-03-06	A Survey on LLM Test-Time Compute via Search: Tasks, LLM Profiling, Search Algorithms, and Relevant Frameworks	Xinzhe Li et.al.	2501.10069	link
2025-01-16	PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks	Huiyou Zhan et.al.	2501.09367	null
2025-01-16	Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition	Takaaki Hori et.al.	2501.09258	null
2025-01-16	Split Fine-Tuning for Large Language Models in Wireless Networks	Songge Zhang et.al.	2501.09237	null
2025-01-15	Guiding Retrieval using LLM-based Listwise Rankers	Mandeep Rathee et.al.	2501.09186	link
2025-01-14	Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings	Paul Joe Maliakel et.al.	2501.08219	null
2025-01-14	PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving	Ahmet Caner Yüzügüler et.al.	2501.08192	null
2025-01-14	Hierarchical Autoscaling for Large Language Model Serving with Chiron	Archit Patke et.al.	2501.08090	null
2025-01-12	MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference	Wenxuan Zeng et.al.	2501.06807	null
2025-01-12	Mell: Memory-Efficient Large Language Model Serving via Multi-GPU KV Cache Management	Liu Qianli et.al.	2501.06709	null
2025-02-07	Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping	Muru Zhang et.al.	2501.06589	link
2025-01-15	Multimodal-to-Text Prompt Engineering in Large Language Models Using Feature Embeddings for GNSS Interference Characterization	Harshith Manjunath et.al.	2501.05079	null
2025-02-08	Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text	Ali Al-Lawati et.al.	2501.03166	link
2025-01-05	TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms	Jovan Stojkovic et.al.	2501.02600	null
2025-01-04	AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference	Zhuomin He et.al.	2501.02336	link
2024-12-31	Towards Sustainable Large Language Model Serving	Sophia Nguyen et.al.	2501.01990	null
2025-01-03	Efficient LLM Inference with Activation Checkpointing and Hybrid Caching	Sanghyeon Lee et.al.	2501.01792	null
2025-01-03	(WhyPHI) Fine-Tuning PHI-3 for Multiple-Choice Question Answering: Methodology, Results, and Challenges	Mohamed Hisham Abdellatif et.al.	2501.01588	null
2025-01-21	BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference	Wonsuk Jang et.al.	2501.01144	link
2025-01-02	FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving	Zihao Ye et.al.	2501.01005	link
2025-02-25	Rethinking Layer Removal: A Hybrid Pruning Framework Combining Layer Removal and Singular Value Selection for Efficient LLM Compression	Kainan Liu et.al.	2501.00339	null
2024-12-23	Highly Optimized Kernels and Fine-Grained Codebooks for LLM Inference on Arm CPUs	Dibakar Gope et.al.	2501.00032	link
2024-12-29	TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication	Zongwu Wang et.al.	2412.20501	link
2024-12-29	GreenLLM: Disaggregating Large Language Model Serving on Heterogeneous GPUs for Lower Carbon Emissions	Tianyao Shi et.al.	2412.20322	null
2025-01-15	LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System	Hyucksung Kwon et.al.	2412.20166	null
2024-12-19	GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors	Chengming Zhang et.al.	2412.19829	null
2025-01-05	Gradient Weight-normalized Low-rank Projection for Efficient LLM Training	Jia-Hong Huang et.al.	2412.19616	link
2025-01-02	A Survey on Large Language Model Acceleration based on KV Cache Management	Haoyang Li et.al.	2412.19442	link
2025-02-13	An Engorgio Prompt Makes Large Language Model Babble on	Jianshuo Dong et.al.	2412.19394	link
2024-12-25	Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference	Libo Zhang et.al.	2412.18934	null
2024-12-24	TimelyLLM: Segmented LLM Serving System for Time-sensitive Robotic Applications	Neiwen Ling et.al.	2412.18695	null
2024-12-26	KunServe: Elastic and Efficient Large Language Model Serving with Parameter-centric Memory Management	Rongxin Cheng et.al.	2412.18169	null
2025-02-22	Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media	Zhen Sun et.al.	2412.18148	null
2024-12-24	Tackling the Dynamicity in a Production LLM Serving System with SOTA Optimizations via Hybrid Prefill/Decode/Verify Scheduling on Efficient Meta-kernels	Mingcong Song et.al.	2412.18106	null
2024-12-23	Trustworthy and Efficient LLMs Meet Databases	Kyoungmin Kim et.al.	2412.18022	null
2025-02-20	GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference	Chao Zeng et.al.	2412.17560	null
2025-02-18	VilBias: A Study of Bias Detection through Linguistic and Visual Cues, presenting Annotation Strategies, Evaluation, and Key Challenges	Shaina Raza et.al.	2412.17052	link
2024-12-21	SYMPHONY: Improving Memory Management for LLM Inference Workloads	Saurabh Agarwal et.al.	2412.16434	null
2024-12-20	WebLLM: A High-Performance In-Browser LLM Inference Engine	Charlie F. Ruan et.al.	2412.15803	link
2024-12-19	Fietje: An open, efficient LLM for Dutch	Bram Vanroy et.al.	2412.15450	link
2024-12-19	PA-RAG: RAG Alignment via Multi-Perspective Preference Optimization	Jiayi Wu et.al.	2412.14510	link
2024-12-19	Are Longer Prompts Always Better? Prompt Selection in Large Language Models for Recommendation Systems	Genki Kusano et.al.	2412.14454	null
2024-12-18	A Survey on LLM Inference-Time Self-Improvement	Xiangjue Dong et.al.	2412.14352	link
2024-12-18	Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models	Seungeun Oh et.al.	2412.12687	null
2024-12-17	A System for Microserving of LLMs	Hongyi Jin et.al.	2412.12488	null
2024-12-17	LITA: An Efficient LLM-assisted Iterative Topic Augmentation Framework	Chia-Hsuan Chang et.al.	2412.12459	null
2024-12-16	CSR: Achieving 1 Bit Key-Value Cache via Sparse Representation	Hongxuan Zhang et.al.	2412.11741	null
2025-01-20	FinLoRA: Finetuning Quantized Financial Large Language Models Using Low-Rank Adaptation	Dannong Wang et.al.	2412.11378	null
2025-01-09	Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning	Yun Qu et.al.	2412.11120	link
2024-12-15	NITRO: LLM Inference on Intel Laptop NPUs	Anthony Fei et.al.	2412.11053	link
2025-03-11	SCBench: A KV Cache-Centric Analysis of Long-Context Methods	Yucheng Li et.al.	2412.10319	null
2024-12-17	TurboAttention: Efficient Attention Approximation For High Throughputs LLMs	Hao Kang et.al.	2412.08585	null
2024-12-11	Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths	Naryeong Kim et.al.	2412.08281	null
2024-12-12	TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch	Xingchen Song et.al.	2412.08237	null
2024-12-09	Asynchronous LLM Function Calling	In Gim et.al.	2412.07017	null
2024-12-08	Taming Sensitive Weights: Noise Perturbation Fine-tuning for Robust LLM Quantization	Dongwei Wang et.al.	2412.06858	null
2024-12-09	JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM	Takuro Fujii et.al.	2412.06738	link
2024-12-09	SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs	James Vo et.al.	2412.06198	null
2024-12-08	XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference	Weizhuo Li et.al.	2412.05896	null
2025-02-17	APOLLO: SGD-like Memory, AdamW-level Performance	Hanqing Zhu et.al.	2412.05270	link
2024-12-06	Can Large Language Models Serve as Effective Classifiers for Hierarchical Multi-Label Classification of Scientific Documents at Industrial Scale?	Seyed Amin Tabatabaei et.al.	2412.05137	null
2024-12-11	Flash Communication: Reducing Tensor Parallelization Bottleneck for Fast Large Language Model Inference	Qingyuan Li et.al.	2412.04964	null
2025-01-26	GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments	Yanyu Chen et.al.	2412.04788	null
2024-12-09	Towards Performance-Aware Allocation for Accelerated Machine Learning on GPU-SSD Systems	Ayush Gundawar et.al.	2412.04569	link
2024-12-03	Multi-Bin Batching for Increasing LLM Inference Throughput	Ozgur Guldogan et.al.	2412.04504	null
2025-01-17	BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching	Zhen Zheng et.al.	2412.03594	null
2024-12-04	Unifying KV Cache Compression for Large Language Models with LeanKV	Yanqi Zhang et.al.	2412.03131	null
2024-12-03	Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity	Da Ma et.al.	2412.02252	null
2024-12-02	Data-Centric and Heterogeneity-Adaptive Sequence Parallelism for Efficient LLM Training	Yujie Wang et.al.	2412.01523	null
2024-12-02	PLD+: Accelerating LLM inference by leveraging Language Model Artifacts	Shwetha Somasundaram et.al.	2412.01447	null
2024-12-02	Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking	Marco Federici et.al.	2412.01380	null
2024-12-02	Can Large Language Models Serve as Evaluators for Code Summarization?	Yang Wu et.al.	2412.01333	link
2024-12-05	RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy	Geonho Lee et.al.	2412.01129	null
2024-12-02	TruncFormer: Private LLM Inference Using Only Truncations	Patrick Yubeaton et.al.	2412.01042	null
2024-11-25	Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration	Zhuofan Wen et.al.	2412.00061	null
2024-11-29	A dynamic parallel method for performance optimization on hybrid CPUs	Luo Yu et.al.	2411.19542	null
2024-12-04	Marconi: Prefix Caching for the Era of Hybrid LLMs	Rui Pan et.al.	2411.19379	null
2024-12-08	Puzzle: Distillation-Based NAS for Inference-Optimized LLMs	Akhiad Bercovich et.al.	2411.19146	null
2024-11-27	FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving	Ao Shen et.al.	2411.18424	null
2024-11-29	InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks	Xinyao Zheng et.al.	2411.18191	null
2024-11-28	MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache	Akshat Sharma et.al.	2411.18077	null
2024-11-24	Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments	Nikoleta Iliakopoulou et.al.	2411.17741	null
2024-11-18	Generative AI on the Edge: Architecture and Performance Evaluation	Zeinab Nezami et.al.	2411.17712	null
2024-11-26	Toward High-Performance LLM Serving: A Simulation-Based Approach for Identifying Optimal Parallelism	Yi-Chien Lin et.al.	2411.17651	null
2024-11-26	PIM-AI: A Novel Architecture for High-Efficiency LLM Inference	Cristobal Ortega et.al.	2411.17309	null
2024-11-26	Star Attention: Efficient LLM Inference over Long Sequences	Shantanu Acharya et.al.	2411.17116	link
2024-11-26	Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation	Chaoyi Jiang et.al.	2411.17089	null
2024-11-25	MixPE: Quantization and Hardware Co-design for Efficient LLM Inference	Yu Zhang et.al.	2411.16158	null
2024-11-24	eFedLLM: Efficient LLM Inference Based on Federated Learning	Shengwen Ding et.al.	2411.16003	null
2024-11-24	Ensuring Fair LLM Serving Amid Diverse Applications	Redwan Ibne Seraj Khan et.al.	2411.15997	null
2024-11-24	Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format	Chao Fang et.al.	2411.15982	null
2024-11-24	Task Scheduling for Efficient Inference of Large Language Models on Single Moderate GPU Systems	Wenxiang Lin et.al.	2411.15715	null
2025-01-14	AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution	Fengyuan Liu et.al.	2411.15102	link
2024-11-27	XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models	Yixin Dong et.al.	2411.15100	null
2024-11-02	Transforming Engineering Education Using Generative AI and Digital Twin Technologies	Yu-Zheng Lin et.al.	2411.14433	null
2024-11-21	InstCache: A Predictive Cache for LLM Serving	Longwei Zou et.al.	2411.13820	null
2024-11-21	Disentangling Memory and Reasoning Ability in Large Language Models	Mingyu Jin et.al.	2411.13504	link
2024-11-27	Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding	Hyun Ryu et.al.	2411.13157	null
2024-11-21	LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts	Zhuohan Gu et.al.	2411.13009	null
2024-11-15	An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2	Pepijn de Reus et.al.	2411.12758	link
2025-01-24	SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference	Jiho Shin et.al.	2411.12692	null
2024-11-18	BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration	Yuzong Chen et.al.	2411.11745	link
2024-11-18	MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs	Shiyi Cao et.al.	2411.11217	null
2024-11-17	FastDraft: How to Train Your Draft	Ofir Zafrir et.al.	2411.11055	null
2024-12-16	SAM Decoding: Speculative Decoding via Suffix Automaton	Yuxuan Hu et.al.	2411.10666	link
2024-11-15	Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity	Zichen Song et.al.	2411.10069	null
2024-11-15	AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference	Janghwan Lee et.al.	2411.09909	null
2024-11-23	Squeezed Attention: Accelerating Long Context Length LLM Inference	Coleman Hooper et.al.	2411.09688	link
2024-11-15	Communication Compression for Tensor Parallel LLM Inference	Jan Hansen-Palmus et.al.	2411.09510	null
2024-11-14	Pie: Pooling CPU Memory for LLM Inference	Yi Xu et.al.	2411.09317	null
2025-01-23	Reducing Reasoning Costs: The Path of Optimization for Chain of Thought via Sparse Attention Mechanism	Libo Wang et.al.	2411.09111	link
2024-11-12	Towards Low-bit Communication for Tensor Parallel LLM Inference	Harry Dong et.al.	2411.07942	null
2024-12-12	ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization	Weibo Zhao et.al.	2411.07762	null
2025-01-08	BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks	Shubham Gandhi et.al.	2411.07464	null
2024-11-19	The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving	Kyoungmin Kim et.al.	2411.07447	null
2024-11-10	EcoServe: Maximizing Multi-Resource Utilization with SLO Guarantees in LLM Serving	Haiying Shen et.al.	2411.06364	null
2024-11-08	SSSD: Simply-Scalable Speculative Decoding	Michele Marzollo et.al.	2411.05894	null
2024-11-08	AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality	Ilias Bournias et.al.	2411.05555	null
2024-11-07	Hardware and Software Platform Inference	Cheng Zhang et.al.	2411.05197	null
2024-10-22	Scattered Forest Search: Smarter Code Space Exploration with LLMs	Jonathan Light et.al.	2411.05010	null
2024-11-07	SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference	Gabriele Oliaro et.al.	2411.04975	null
2024-11-05	CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration	Hongpeng Jin et.al.	2411.02829	null
2024-12-19	DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving	Yuhan Liu et.al.	2411.02820	null
2024-11-10	Context Parallelism for Scalable Million-Token Inference	Amy Yang et.al.	2411.01783	null
2024-11-04	RAGViz: Diagnose and Visualize Retrieval-Augmented Generation	Tevin Wang et.al.	2411.01751	link
2024-11-03	Autoformulation of Mathematical Optimization Models Using LLMs	Nicolás Astorga et.al.	2411.01679	null
2024-11-06	HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference	Peng Tang et.al.	2411.01433	null
2024-11-02	RA-WEBs: Remote Attestation for WEB services	Kosei Akama et.al.	2411.01340	null
2024-11-02	NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference	Xuanlin Jiang et.al.	2411.01142	null
2024-10-30	A Theoretical Perspective for Speculative Decoding Algorithm	Ming Yin et.al.	2411.00841	null
2024-11-01	Phase Diagram of Vision Large Language Models Inference: A Perspective from Interaction across Image and Instruction	Houjing Wei et.al.	2411.00646	null
2024-11-01	LLM-Based Misconfiguration Detection for AWS Serverless Computing	Jinfeng Wen et.al.	2411.00642	null
2024-12-08	ReverseNER: A Self-Generated Example-Driven Framework for Zero-Shot Named Entity Recognition with Large Language Models	Anbang Wang et.al.	2411.00533	null
2024-11-01	Attention Tracker: Detecting Prompt Injection Attacks in LLMs	Kuo-Han Hung et.al.	2411.00348	null
2024-10-31	LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators	Krishna Teja Chitty-Venkata et.al.	2411.00136	link
2024-10-31	Interpretable Language Modeling via Induction-head Ngram Models	Eunji Kim et.al.	2411.00066	link
2024-10-31	ALISE: Accelerating Large Language Model Serving with Speculative Scheduling	Youpeng Zhao et.al.	2410.23537	null
2024-10-30	BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference	Junqi Zhao et.al.	2410.23079	link
2024-10-29	Scaling LLM Inference with Optimized Sample Compute Allocation	Kexun Zhang et.al.	2410.22480	link
2024-10-29	SVIP: Towards Verifiable Inference of Open-source Large Language Models	Yifan Sun et.al.	2410.22307	null
2025-02-08	ProMoE: Fast MoE-based LLM Serving using Proactive Caching	Xiaoniu Song et.al.	2410.22134	null
2025-01-21	MultiTok: Variable-Length Tokenization for Efficient LLMs Adapted from LZW Compression	Noel Elias et.al.	2410.21548	link
2024-10-28	ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference	Hanshi Sun et.al.	2410.21465	link
2024-10-27	FIRP: Faster LLM inference via future intermediate representation prediction	Pengfei Wu et.al.	2410.20488	null
2024-10-29	Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management	Tuowei Wang et.al.	2410.19274	null
2024-10-24	Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design	Ruisi Cai et.al.	2410.19123	link
2024-10-30	Dynamic Vocabulary Pruning in Early-Exit LLMs	Jort Vincenti et.al.	2410.18952	link
2024-10-25	A Survey on Speech Large Language Models	Jing Peng et.al.	2410.18908	null
2024-10-24	A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs	Ankit Singh Rawat et.al.	2410.18779	null
2024-10-24	BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching	Peizhuang Cong et.al.	2410.18701	null
2024-10-23	CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation	Qinsi Wang et.al.	2410.18311	null
2024-10-25	Fast Inference for Augmented Large Language Models	Rana Shahout et.al.	2410.18248	null
2024-10-23	POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference	Aditya K Kamath et.al.	2410.18038	null
2024-12-29	AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning	Yehonathan Refael et.al.	2410.17881	null
2024-10-22	FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs	Haoran Lin et.al.	2410.16663	null
2024-10-22	Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency	Prafulla Kumar Choubey et.al.	2410.16597	null
2024-12-18	MagicPIG: LSH Sampling for Efficient LLM Generation	Zhuoming Chen et.al.	2410.16179	link
2024-10-21	Natural GaLore: Accelerating GaLore for memory-efficient LLM Training and Fine-tuning	Arijit Das et.al.	2410.16029	link
2024-10-21	RAC: Efficient LLM Factuality Correction with Retrieval Augmentation	Changmao Li et.al.	2410.15667	link
2024-10-21	Bayesian Concept Bottleneck Models with LLM Priors	Jean Feng et.al.	2410.15555	link
2024-10-20	CompAct: Compressed Activations for Memory-Efficient LLM Training	Yara Shamshoum et.al.	2410.15352	null
2024-10-20	EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models	Junhao Hu et.al.	2410.15332	null
2024-10-19	IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System	Minseok Seo et.al.	2410.15008	null
2024-10-23	Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching	Jie Peng et.al.	2410.14740	null
2024-10-18	A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference	You Wu et.al.	2410.14442	link
2024-10-18	Revisiting SLO and Goodput Metrics in LLM Serving	Zhibin Wang et.al.	2410.14257	null
2024-10-18	Leveraging Large Language Models for Enhancing Public Transit Services	Jiahao Wang et.al.	2410.14147	null
2024-10-17	RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs	Jiatan Huang et.al.	2410.13987	null
2024-11-07	Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs	Tianyu Guo et.al.	2410.13835	link
2024-10-17	Progressive Mixed-Precision Decoding for Efficient LLM Inference	Hao Mark Chen et.al.	2410.13461	null
2024-10-17	Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning	Minseok Choi et.al.	2410.13274	null
2024-10-17	Data Defenses Against Large Language Models	William Agnew et.al.	2410.13138	link
2024-10-19	In-context KV-Cache Eviction for LLMs via Attention-Gate	Zihao Zeng et.al.	2410.12876	null
2024-10-10	RecurFormer: Not All Transformer Heads Need Self-Attention	Ruiqing Yan et.al.	2410.12850	null
2024-10-16	COMET: Towards Partical W4A4KV4 LLMs Serving	Lian Liu et.al.	2410.12168	null
2024-10-16	Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning	Huiwen Wu et.al.	2410.12130	null
2024-10-15	Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix	Yingyu Liang et.al.	2410.11261	null
2024-10-06	Continuous Approximations for Improving Quantization Aware Training of LLMs	He Li et.al.	2410.10849	null
2024-10-14	DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads	Guangxuan Xiao et.al.	2410.10819	link
2024-10-16	SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization	Akrit Mudvari et.al.	2410.10759	null
2024-10-12	Power-Softmax: Towards Secure LLM Inference over Encrypted Data	Itamar Zimerman et.al.	2410.09457	null
2024-10-11	Large Language Models for Energy-Efficient Code: Emerging Results and Future Directions	Huiyun Peng et.al.	2410.09241	null
2024-10-11	SubZero: Random Subspace Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning	Ziming Yu et.al.	2410.08989	link
2024-12-03	HyperDPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework	Yinuo Ren et.al.	2410.08316	null
2024-10-14	Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining	Tianyi Bai et.al.	2410.08102	link
2024-10-09	SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration	Heming Xia et.al.	2410.06916	link
2024-10-08	Active Evaluation Acquisition for Efficient LLM Benchmarking	Yang Li et.al.	2410.05952	null
2024-10-08	Exploring the Meaningfulness of Nearest Neighbor Search in High-Dimensional Space	Zhonghan Chen et.al.	2410.05752	null
2024-10-08	ParallelSpec: Parallel Drafter for Efficient Speculative Decoding	Zilin Xiao et.al.	2410.05589	null
2024-10-07	Fast State Restoration in LLM Serving with HCache	Shiwei Gao et.al.	2410.05004	null
2024-10-06	RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference	Yige Xu et.al.	2410.04519	link
2025-01-23	Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective	Jinhao Li et.al.	2410.04466	null
2024-12-05	SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation	Aurick Qiao et.al.	2410.03960	null
2024-10-04	LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity	Selim Furkan Tekin et.al.	2410.03953	link
2024-10-04	EXAQ: Exponent Aware Quantization For LLMs Acceleration	Moran Shkolnik et.al.	2410.03185	link
2024-10-04	UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference	Jing Xiong et.al.	2410.03090	null
2024-10-03	LLMCO2: Advancing Accurate Carbon Footprint Prediction for LLM Inferences	Zhenxiao Fu et.al.	2410.02950	null
2024-10-03	Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration	Yun Qu et.al.	2410.02511	link
2024-10-03	LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services	Małgorzata Łazuka et.al.	2410.02425	link
2024-10-04	Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation	Xiaoqun Liu et.al.	2410.02220	null
2024-10-05	Aligning with Logic: Measuring, Evaluating and Improving Logical Consistency in Large Language Models	Yinhong Liu et.al.	2410.02205	null
2024-10-02	Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads	Yuxiang Huang et.al.	2410.01805	link
2024-10-02	ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving	Yifan Qiao et.al.	2410.01228	null
2024-10-01	TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices	Zonghang Li et.al.	2410.00531	link
2024-10-09	LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management	Yi Xiong et.al.	2410.00428	null
2024-11-06	The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems	Linke Song et.al.	2409.20002	null
2024-09-28	SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models	Yi Wu et.al.	2409.19471	null
2024-11-28	Confidential Prompting: Protecting User Prompts from Cloud LLM Providers	In Gim et.al.	2409.19134	link
2024-09-26	Control Industrial Automation System with Large Language Models	Yuchen Xia et.al.	2409.18009	link
2024-10-18	Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores	Shaobo Ma et.al.	2409.17870	null
2024-09-25	Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction	Zhenmei Shi et.al.	2409.17422	link
2024-09-25	Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations	Amey Agrawal et.al.	2409.17264	null
2024-09-25	Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale	Fan Zhou et.al.	2409.17115	link
2024-09-25	Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference	Zongyue Qin et.al.	2409.16560	null
2024-10-21	AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization	Yifan Tan et.al.	2409.16546	link
2024-11-07	Enabling Efficient On-Device Fine-Tuning of LLMs Using Only Inference Engines	Lei Gao et.al.	2409.15520	link
2024-10-29	Eagle: Efficient Training-Free Router for Multi-LLM Inference	Zesen Zhao et.al.	2409.15518	null
2024-10-03	Archon: An Architecture Search Framework for Inference-Time Techniques	Jon Saad-Falcon et.al.	2409.15254	link
2024-09-23	CSPS: A Communication-Efficient Sequence-Parallelism based Serving System for Transformer based Models with Long Prompts	Zeyu Zhang et.al.	2409.15104	null
2024-09-24	UELLM: A Unified and Efficient Approach for LLM Inference Serving	Yiyuan He et.al.	2409.14961	null
2024-11-01	RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph	Lindsey Linxi Wei et.al.	2409.14556	null
2024-09-21	Practically implementing an LLM-supported collaborative vulnerability remediation process: a team-based approach	Xiaoqing Wang et.al.	2409.14058	null
2024-10-21	Do Large Language Models Need a Content Delivery Network?	Yihua Cheng et.al.	2409.13761	link
2024-09-19	PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs)	Mahmoud Nazzal et.al.	2409.12699	link
2024-09-12	LlamaF: An Efficient Llama2 Architecture Accelerator on Embedded FPGAs	Han Xu et.al.	2409.11424	null
2024-09-04	ISO: Overlap of Computation and Communication within Seqenence For LLM Inference	Bin Xiao et.al.	2409.11155	null
2024-12-31	RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval	Di Liu et.al.	2409.10516	link
2024-09-12	Enabling Cost-Effective UI Automation Testing with Retrieval-Based LLMs: A Case Study in WeChat	Sidong Feng et.al.	2409.07829	null
2024-09-13	LLM-Enhanced Software Patch Localization	Jinhong Yu et.al.	2409.06816	null
2024-09-24	OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models	Jahyun Koo et.al.	2409.05902	null
2024-09-08	InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference	Xiurui Pan et.al.	2409.04992	null
2024-09-07	Achieving Peak Performance for Large Language Models: A Systematic Review	Zhyar Rzgar K Rostam et.al.	2409.04833	null
2024-09-06	Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance	Guanyu Lin et.al.	2409.04593	null
2024-09-06	A First Look At Efficient And Secure On-Device LLM Inference Against KV Leakage	Huan Yang et.al.	2409.04040	null
2024-11-05	Confidential Computing on NVIDIA Hopper GPUs: A Performance Benchmark Study	Jianwei Zhu et.al.	2409.03992	null
2024-09-05	Sirius: Contextual Sparsity with Correction for Efficient LLMs	Yang Zhou et.al.	2409.03856	link
2024-08-31	HSF: Defending against Jailbreak Attacks with Hidden State Filtering	Cheng Qian et.al.	2409.03788	null
2024-12-11	Efficient Large Foundation Model Inference: A Perspective From Model and System Co-Design	Dong Liu et.al.	2409.01990	null
2024-09-03	Efficient LLM Context Distillation	Rajesh Upadhayayaya et.al.	2409.01930	null
2024-09-03	Laser: Parameter-Efficient LLM Bi-Tuning for Sequential Recommendation with Collaborative Information	Xinyu Zhang et.al.	2409.01605	null
2024-09-02	CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification	Junhui He et.al.	2409.01366	null
2024-12-18	Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference	Barys Liskavets et.al.	2409.01227	null
2024-09-01	Research on LLM Acceleration Using the High-Performance RISC-V Processor “Xiangshan” (Nanhu Version) Based on the Open-Source Matrix Instruction Set Extension (Vector Dot Product)	Xu-Hao Chen et.al.	2409.00661	null
2024-11-10	Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling	Guangya Wan et.al.	2408.17017	null
2024-08-28	Decentralized LLM Inference over Edge Networks with Energy Harvesting	Aria Khoshsirat et.al.	2408.15907	null
2024-08-28	Efficient LLM Scheduling by Learning to Rank	Yichao Fu et.al.	2408.15792	link
2024-08-28	Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation	Lujun Gui et.al.	2408.15562	null
2024-08-23	Memory-Efficient LLM Training with Online Subspace Descent	Kaizhao Liang et.al.	2408.12857	link
2024-08-22	NanoFlow: Towards Optimal Large Language Model Serving Throughput	Kan Zhu et.al.	2408.12757	link
2024-10-23	TensorOpera Router: A Multi-Model Router for Efficient LLM Inference	Dimitris Stripelis et.al.	2408.12320	null
2024-09-04	Parallel Speculative Decoding with Adaptive Draft Length	Tianyu Liu et.al.	2408.11850	link
2024-08-21	MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models	Elias Frantar et.al.	2408.11743	link
2024-08-23	Xinyu: An Efficient LLM-based System for Commentary Generation	Yiquan Wu et.al.	2408.11609	null
2024-08-21	Diagnosing and Remedying Knowledge Deficiencies in LLMs via Label-free Curricular Meaningful Learning	Kai Xiong et.al.	2408.11431	null
2024-08-21	Image Score: Learning and Evaluating Human Preferences for Mercari Search	Chingis Oinar et.al.	2408.11349	null
2024-08-20	Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models	Artem Vazhentsev et.al.	2408.10692	null
2024-08-20	How Well Do Large Language Models Serve as End-to-End Secure Code Producers?	Jianian Gong et.al.	2408.10495	null
2024-09-29	GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making	Arsham Gholamzadeh Khoee et.al.	2408.09785	null
2024-08-19	PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars	Sumanth Prabhu et.al.	2408.08869	null
2024-08-23	ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models	Chao Zeng et.al.	2408.08554	link
2024-08-14	LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference	Seungjae Moon et.al.	2408.07326	null
2024-08-12	LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration	Zhiwen Mo et.al.	2408.06003	null
2024-08-16	Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion	Jacob K Christopher et.al.	2408.05636	null
2024-08-10	LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale	Jaehong Cho et.al.	2408.05499	link
2024-08-05	SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving	Andreas Kosmas Kakolyris et.al.	2408.05235	null
2024-09-14	Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness	Xiaojing Fan et.al.	2408.04585	null
2024-08-08	Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning	Ke Cheng et.al.	2408.04323	null
2024-08-07	Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference	Zeyu Zhang et.al.	2408.04107	null
2024-08-07	MPC-Minimized Secure LLM Inference	Deevashwer Rathee et.al.	2408.03561	null
2024-08-06	Can LLMs Serve As Time Series Anomaly Detectors?	Manqing Dong et.al.	2408.03475	null
2024-08-05	Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning	Hao Zhou et.al.	2408.02549	null
2024-08-02	The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines	Matias Martinez et.al.	2408.01050	null
2024-08-01	DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency	Jovan Stojkovic et.al.	2408.00741	null
2024-08-01	Designing Efficient LLM Accelerators for Edge Devices	Jude Haris et.al.	2408.00462	null
2024-08-01	Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control	Hao Zhou et.al.	2408.00214	null
2024-09-10	ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency	Yuhang Yao et.al.	2408.00008	null
2024-08-01	Responsive ML inference in multi-tenanted environments using AQUA	Abhishek Vijaya Kumar et.al.	2407.21255	null
2024-11-04	Palu: Compressing KV-Cache with Low-Rank Projection	Chi-Chih Chang et.al.	2407.21118	link
2024-07-30	Accelerating Large Language Model Inference with Self-Supervised Early Exits	Florian Valade et.al.	2407.21082	null
2024-10-03	ThinK: Thinner Key Cache by Query-Driven Pruning	Yuhui Xu et.al.	2407.21018	null
2024-07-25	An Efficient Inference Framework for Early-exit Large Language Models	Ruijie Miao et.al.	2407.20272	null
2024-07-29	Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost	Sania Nayab et.al.	2407.19825	null
2024-07-29	Teaching LLMs at Charles University: Assignments and Activities	Jindřich Helcl et.al.	2407.19798	null
2024-07-09	Mobile Edge Intelligence for Large Language Models: A Contemporary Survey	Guanqiao Qu et.al.	2407.18921	null
2024-07-04	The Price of Prompting: Profiling Energy Use in Large Language Models Inference	Erik Johannes Husom et.al.	2407.16893	link
2024-07-23	PhenoFlow: A Human-LLM Driven Visual Analytics System for Exploring Large and Complex Stroke Datasets	Jaeyoung Kim et.al.	2407.16329	null
2024-07-22	RazorAttention: Efficient KV Cache Compression Through Retrieval Heads	Hanlin Tang et.al.	2407.15891	null
2024-07-22	vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving	Jiale Xu et.al.	2407.15309	link
2024-07-20	All Against Some: Efficient Integration of Large Language Models for Message Passing in Graph Neural Networks	Ajay Jaiswal et.al.	2407.14996	null
2024-07-19	LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference	Qichen Fu et.al.	2407.14057	null
2024-07-13	Beyond KV Caching: Shared Attention for Efficient LLMs	Bingli Liao et.al.	2407.12866	link
2024-07-01	PQCache: Product Quantization-based KVCache for Long Context LLM Inference	Hailin Zhang et.al.	2407.12820	null
2024-07-17	Struct-X: Enhancing Large Language Models Reasoning with Structured Data	Xiaoyu Tan et.al.	2407.12522	null
2024-07-17	LLM Inference Serving: Survey of Recent Advances and Opportunities	Baolin Li et.al.	2407.12391	null
2024-10-11	Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale	Ayush Kaushal et.al.	2407.12327	link
2024-11-16	PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation	Branden Butler et.al.	2407.11798	null
2024-08-16	Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference	Yuan Feng et.al.	2407.11550	link
2024-07-15	Static Detection of Filesystem Vulnerabilities in Android Systems	Yu-Tsung Lee et.al.	2407.11279	null
2024-10-03	Fast Matrix Multiplications for Lookup Table-Quantized LLMs	Han Guo et.al.	2407.10960	link
2024-10-02	Optimized Multi-Token Joint Decoding with Auxiliary Model for LLM Inference	Zongyue Qin et.al.	2407.09722	null
2024-08-30	Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems	Amey Agrawal et.al.	2407.07000	link
2024-07-08	Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU	Daliang Xu et.al.	2407.05858	link
2024-07-07	A Queueing Theoretic Perspective on Low-Latency LLM Inference with Variable Token Length	Yuqing Yang et.al.	2407.05347	null
2024-07-06	Code Less, Align More: Efficient LLM Fine-tuning for Code Generation with Data Pruning	Yun-Da Tsai et.al.	2407.05040	null
2024-11-16	Software-Hardware Co-Design For Embodied AI Robots	Yiyang Huang et.al.	2407.04292	link
2024-07-04	Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems	Grant Wilkins et.al.	2407.04014	null
2024-10-30	MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention	Huiqiang Jiang et.al.	2407.02490	link
2024-06-29	When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration	Philipp Allgeuer et.al.	2407.00518	link
2024-06-29	Teola: Towards End-to-End Optimization of LLM-based Applications	Xin Tan et.al.	2407.00326	null
2024-06-25	T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge	Jianyu Wei et.al.	2407.00088	link
2024-07-09	Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving	Ruoyu Qin et.al.	2407.00079	link
2024-06-28	InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management	Wonbeom Lee et.al.	2406.19707	null
2024-08-28	AI-native Memory: A Pathway from LLMs Towards AGI	Jingbo Shang et.al.	2406.18312	null
2024-06-25	FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model	Feijie Wu et.al.	2406.17706	link
2024-06-26	MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool	Cunchen Hu et.al.	2406.17565	null
2024-11-11	Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters	Euiin Yi et.al.	2406.16758	link

LLM Scheduling

Publish Date	Title	Authors	PDF	Code
2025-07-11	InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching	Yilun Wang et.al.	2507.08523	null
2025-07-09	Gradientsys: A Multi-Agent LLM Scheduler with ReAct Orchestration	Xinyuan Song et.al.	2507.06520	null
2025-05-29	Sustainable Carbon-Aware and Water-Efficient LLM Scheduling in Geo-Distributed Cloud Datacenters	Hayden Moore et.al.	2505.23554	null
2025-05-14	ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor	Seungbeom Choi et.al.	2505.09142	null
2025-06-08	PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference	Zeyu Zhang et.al.	2409.15104	null
2024-08-28	Efficient LLM Scheduling by Learning to Rank	Yichao Fu et.al.	2408.15792	link

MoE

Publish Date	Title	Authors	PDF	Code
2025-07-22	Mixture-of-Expert Variational Autoencoders for Cross-Modality Embedding of Type Ia Supernova Data	Yunyi Shen et.al.	2507.16817	null
2025-07-22	Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training	Zixiao Huang et.al.	2507.16274	null
2025-07-21	Applying multimodal learning to Classify transient Detections Early (AppleCiDEr) I: Data set, methods, and infrastructure	Alexandra Junell et.al.	2507.16088	null
2025-07-21	Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation	Alessandro B. Melchiorre et.al.	2507.15826	null
2025-07-21	RankMixer: Scaling Up Ranking Models in Industrial Recommenders	Jie Zhu et.al.	2507.15551	null
2025-07-21	The New LLM Bottleneck: A Systems Perspective on Latent Attention and Mixture-of-Experts	Sungmin Yun et.al.	2507.15465	null
2025-07-21	Universal crystal material property prediction via multi-view geometric fusion in graph transformers	Liang Zhang et.al.	2507.15303	null
2025-07-20	CoMoCAVs: Cohesive Decision-Guided Motion Planning for Connected and Autonomous Vehicles with Multi-Policy Reinforcement Learning	Pan Hu et.al.	2507.14903	null
2025-07-23	GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving	Chi Wan et.al.	2507.14456	null
2025-07-18	SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing	Yingying Zhang et.al.	2507.13812	null
2025-07-17	Apple Intelligence Foundation Language Models: Tech Report 2025	Hanzhi Zhou et.al.	2507.13575	null
2025-07-17	R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning	Xiaohan Guo et.al.	2507.13107	null
2025-07-16	Astro-MoE: Mixture of Experts for Multiband Astronomical Time Series	Martina Cádiz-Leyton et.al.	2507.12611	null
2025-07-16	Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models	Gen Luo et.al.	2507.12566	null
2025-07-16	Mixture of Raytraced Experts	Andrea Perin et.al.	2507.12419	null
2025-07-16	CorrMoE: Mixture of Experts with De-stylization Learning for Cross-Scene and Cross-Domain Correspondence Pruning	Peiwen Xia et.al.	2507.11834	null
2025-07-09	The AI Shadow War: SaaS vs. Edge Computing Architectures	Rhea Pritham Marpu et.al.	2507.11545	null
2025-07-15	Mixture of Experts in Large Language Models	Danyang Zhang et.al.	2507.11181	null
2025-07-15	Atmos-Bench: 3D Atmospheric Structures for Climate Insight	Tianchi Xu et.al.	2507.11085	null
2025-07-14	DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models	Luolin Xiong et.al.	2507.09955	null
2025-07-14	ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization	Huilai Li et.al.	2507.09945	null
2025-07-14	Multi-residual Mixture of Experts Learning for Cooperative Control in Multi-vehicle Systems	Vindula Jayawardana et.al.	2507.09836	null
2025-07-18	Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts	Aakash Tripathi et.al.	2507.09754	null
2025-07-13	Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive	You Huang et.al.	2507.09612	null
2025-07-12	PPJudge: Towards Human-Aligned Assessment of Artistic Painting Process	Shiqi Jiang et.al.	2507.09242	null
2025-07-11	SSH-Passkeys: Leveraging Web Authentication for Passwordless SSH	Moe Kayali et.al.	2507.09022	null
2025-07-11	BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity	Chenyang Song et.al.	2507.08771	null
2025-07-11	CircFormerMoE: An End-to-End Deep Learning Framework for Circular RNA Splice Site Detection and Pairing in Plant Genomes	Tianyou Jiang et.al.	2507.08542	null
2025-07-11	White-Basilisk: A Hybrid Model for Code Vulnerability Detection	Ioannis Lamprou et.al.	2507.08540	null
2025-07-21	KAT-V1: Kwai-AutoThink Technical Report	Zizheng Zhan et.al.	2507.08297	null
2025-07-11	Data-Driven Dimensional Synthesis of Diverse Planar Four-bar Function Generation Mechanisms via Direct Parameterization	Woon Ryong Kim et.al.	2507.08269	null
2025-07-10	MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving	Lu Xu et.al.	2507.07818	null
2025-07-10	When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance	Peizhang Shao et.al.	2507.07748	null
2025-07-09	Leveraging Manifold Embeddings for Enhanced Graph Transformer Representations and Learning	Ankit Jyothish et.al.	2507.07335	null
2025-07-08	Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate	A. Bochkov et.al.	2507.07129	null
2025-07-07	Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding	Nidhi Bhatia et.al.	2507.07120	null
2025-06-03	Multi-level Mixture of Experts for Multimodal Entity Linking	Zhiwei Hu et.al.	2507.07108	null
2025-07-09	4KAgent: Agentic Any Image to 4K Super-Resolution	Yushen Zuo et.al.	2507.07105	null
2025-07-11	FlexOlmo: Open Language Models for Flexible Data Use	Weijia Shi et.al.	2507.07024	null
2025-07-09	Deep Disentangled Representation Network for Treatment Effect Estimation	Hui Meng et.al.	2507.06650	null
2025-07-09	SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference	Qian Chen et.al.	2507.06567	null
2025-07-09	MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models	Yiwen Liu et.al.	2507.06502	null
2025-07-08	Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation	Szymon Płotka et.al.	2507.06363	null
2025-07-08	Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis	Xintong Hu et.al.	2507.06116	null
2025-07-09	A Survey on Prompt Tuning	Zongqian Li et.al.	2507.06085	null
2025-07-08	Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation Detectors	Bing Wang et.al.	2507.05939	null
2025-07-08	What You Have is What You Track: Adaptive and Robust Multimodal Tracking	Yuedong Tan et.al.	2507.05899	null
2025-07-21	Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition	Zijin Gu et.al.	2507.05724	null
2025-07-08	Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach	Xiaobing Chen et.al.	2507.05685	null
2025-07-08	City-Level Foreign Direct Investment Prediction with Tabular Learning on Judicial Data	Tianxing Wu et.al.	2507.05651	null
2025-07-07	QMoE: A Quantum Mixture of Experts Framework for Scalable Quantum Neural Networks	Hoang-Quan Nguyen et.al.	2507.05190	null
2025-07-07	NTSFormer: A Self-Teaching Graph Transformer for Multimodal Cold-Start Node Classification	Jun Hu et.al.	2507.04870	null
2025-07-07	UrbanMind: Towards Urban General Intelligence via Tool-Enhanced Retrieval-Augmented Generation and Multilevel Optimization	Kai Yang et.al.	2507.04706	null
2025-07-07	DRAE: Dynamic Retrieval-Augmented Expert Networks for Lifelong Learning and Task Adaptation in Robotics	Yayu Long et.al.	2507.04661	null
2025-07-08	UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification	Xixi Wan et.al.	2507.04638	null
2025-07-07	Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts	Yun Wang et.al.	2507.04631	null
2025-07-06	Nile-Chat: Egyptian Language Models for Arabic and Latin Scripts	Guokan Shang et.al.	2507.04569	null
2025-07-22	Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge	Linshen Liu et.al.	2507.04123	null
2025-07-05	From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM	Xinyi Wu et.al.	2507.03868	null
2025-07-04	Decoupled Relative Learning Rate Schedules	Jan Ludziejewski et.al.	2507.03526	null
2025-07-03	Neural Inhibition Improves Dynamic Routing and Mixture of Experts	Will Y. Zou et.al.	2507.03221	null
2025-07-02	Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model!	Do-hyeon Yoon et.al.	2507.03014	null
2025-07-03	System-performance and cost modeling of Large Language Model training and inference	Wenzhe Guo et.al.	2507.02456	null
2025-07-03	NLP4Neuro: Sequence-to-sequence learning for neural population decoding	Jacob J. Morra et.al.	2507.02264	null
2025-07-02	MoIRA: Modular Instruction Routing Architecture for Multi-Task Robotics	Dmytro Kuzmenko et.al.	2507.01843	null
2025-07-02	GradMetaNet: An Equivariant Architecture for Learning on Gradients	Yoav Gelberg et.al.	2507.01649	null
2025-07-02	Mixtures of Neural Network Experts with Application to Phytoplankton Flow Cytometry Data	Ethan Pawl et.al.	2507.01375	null
2025-07-02	Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model	Chaoxiang Cai et.al.	2507.01351	null
2025-07-02	Dynamical Multimodal Fusion with Mixture-of-Experts for Localizations	Bohao Wang et.al.	2507.01337	null
2025-07-02	ExPaMoE: An Expandable Parallel Mixture of Experts for Continual Test-Time Adaptation	JianChao Zhao et.al.	2507.00502	null
2025-07-01	MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE	Geng Zhang et.al.	2507.00390	null
2025-06-30	Engineering NV Centers via Hydrogen-Driven Defect Chemistry in CVD Diamonds for Quantum Applications: NVHx Dissociations into NV, Origin of 468nm Center, and Cause of Brown Coloration	Mubashir Mansoor et.al.	2507.00300	null
2025-06-17	LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing	Wenbing Li et.al.	2507.00029	null
2025-06-30	MotionGPT3: Human Motion as a Second Modality	Bingfan Zhu et.al.	2506.24086	null
2025-06-30	MReg: A Novel Regression Model with MoE-based Video Feature Mining for Mitral Regurgitation Diagnosis	Zhe Liu et.al.	2506.23648	null
2025-06-30	Towards Building Private LLMs: Exploring Multi-Node Expert Parallelism on Apple Silicon for Mixture-of-Experts Large Language Model	Mu-Chi Chen et.al.	2506.23635	null
2025-06-29	Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging	Lujun Li et.al.	2506.23266	null
2025-06-29	External Data-Enhanced Meta-Representation for Adaptive Probabilistic Load Forecasting	Haoran Li et.al.	2506.23201	null
2025-06-29	Hierarchical Corpus-View-Category Refinement for Carotid Plaque Risk Grading in Ultrasound	Zhiyuan Zhu et.al.	2506.23108	null
2025-07-01	Hecto: Modular Sparse Experts for Adaptive and Interpretable Reasoning	Sanskar Pandey et.al.	2506.22919	null
2025-06-27	QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization	Danush Khanna et.al.	2506.22396	null
2025-06-27	Towards Distributed Neural Architectures	Aditya Cowsik et.al.	2506.22389	null
2025-06-27	MPipeMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism	Zheng Zhang et.al.	2506.22175	null
2025-07-09	DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE	Hang Shao et.al.	2506.21864	null
2025-06-21	AdaptGOT: A Pre-trained Model for Adaptive Contextual POI Representation Learning	Xiaobin Ren et.al.	2506.21612	null
2025-06-26	Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts	Jiajie Yang et.al.	2506.21328	null
2025-06-26	Learning to Skip the Middle Layers of Transformers	Tim Lawson et.al.	2506.21103	null
2025-06-26	Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning	Haodong Lu et.al.	2506.21035	null
2025-06-26	EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning	Xiao Zhang et.al.	2506.20986	null
2025-06-30	The Singapore Consensus on Global AI Safety Research Priorities	Yoshua Bengio et.al.	2506.20702	null
2025-06-17	Utility-Driven Speculative Decoding for Mixture-of-Experts	Anish Saxena et.al.	2506.20675	null
2025-06-25	Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration	Jiaxing Huang et.al.	2506.20282	null
2025-06-24	Integrating Pair Programming as a Work Practice	Nina Haugland Andersen et.al.	2506.19511	null
2025-07-05	The Hα line as a probe of chromospheric magnetic fields	Harsh Mathur et.al.	2506.19510	null
2025-06-23	Multimodal Anomaly Detection with a Mixture-of-Experts	Christoph Willibald et.al.	2506.19077	null
2025-06-23	Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models	Zihan Wang et.al.	2506.18945	null
2025-06-23	Shift Happens: Mixture of Experts based Continual Adaptation in Federated Learning	Rahul Atul Bhope et.al.	2506.18789	null
2025-06-23	An Audio-centric Multi-task Learning Framework for Streaming Ads Targeting on Spotify	Shivam Verma et.al.	2506.18735	null
2025-06-23	Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks	Xiaodong Wu et.al.	2506.18543	null
2025-06-23	SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation	Zichong Li et.al.	2506.18349	null
2025-06-23	Sharpening the Spear: Adaptive Expert-Guided Adversarial Attack Against DRL-based Autonomous Driving Policies	Junchao Fan et.al.	2506.18304	null
2025-06-22	Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection	Zheng Zhan et.al.	2506.18145	null
2025-06-21	Incorporating Rather Than Eliminating: Achieving Fairness for Skin Disease Diagnosis Through Group-Specific Expert	Gelei Xu et.al.	2506.17787	null
2025-06-21	Physics-informed mixture of experts network for interpretable battery degradation trajectory computation amid second-life complexities	Xinghao Huang et.al.	2506.17755	null
2025-06-21	PDC-Net: Pattern Divide-and-Conquer Network for Pelvic Radiation Injury Segmentation	Xinyu Xiong et.al.	2506.17712	null
2025-06-20	SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification	Zhenglin Lai et.al.	2506.17368	null
2025-07-14	FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE	Khiem Le et.al.	2506.16600	null
2025-06-19	Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models	Daniel Fidel Harvey et.al.	2506.16419	null
2025-06-19	DCFNet: Doppler Correction Filter Network for Integrated Sensing and Communication in Multi-User MIMO-OFDM Systems	Hyeonho Noh et.al.	2506.16191	null
2025-06-17	Scaling Intelligence: Designing Data Centers for Next-Gen Language Models	Jesmin Jahan Tithi et.al.	2506.15006	null
2025-06-17	NeuroMoE: A Transformer-Based Mixture-of-Experts Framework for Multi-Modal Neurological Disorder Classification	Wajih Hassan Raza et.al.	2506.14970	null
2025-06-17	Narrowing the Gap between TEEs Threat Model and Deployment Strategies	Filip Rezabek et.al.	2506.14964	null
2025-05-31	Assembly of Experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors	Henrik Klagges et.al.	2506.14794	null
2025-06-19	Integrating Dynamical Systems Learning with Foundational Models: A Meta-Evolutionary AI Framework for Clinical Trials	Joseph Geraci et.al.	2506.14782	null
2025-06-17	GMT: General Motion Tracking for Humanoid Whole-Body Control	Zixuan Chen et.al.	2506.14770	null
2025-06-17	Exploring Speaker Diarization with Mixture of Experts	Gaobin Yang et.al.	2506.14750	null
2025-06-18	Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs	Ling Team et.al.	2506.14731	null
2025-06-17	GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors	Hengyuan Zhang et.al.	2506.14646	link
2025-06-17	Single-Example Learning in a Mixture of GPDMs with Latent Geometries	Jesse St. Amand et.al.	2506.14563	null
2025-06-30	MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation	Shen Yuan et.al.	2506.14436	link
2025-06-17	MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models	Hongyu Wang et.al.	2506.14435	null
2025-06-17	Less is More: Undertraining Experts Improves Model Upcycling	Stefan Horoi et.al.	2506.14126	null
2025-06-16	Load Balancing Mixture of Experts with Similarity Preserving Routers	Nabil Omi et.al.	2506.14038	null
2025-06-16	GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics	Qianzhong Chen et.al.	2506.14009	null
2025-06-16	MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention	MiniMax et.al.	2506.13585	link
2025-06-16	Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization	Guanghui Song et.al.	2506.13541	null
2025-07-04	EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization	Zhongqian Fu et.al.	2506.13329	link
2025-06-16	Breaking Thought Patterns: A Multi-Dimensional Reasoning Framework for LLMs	Xintong Tang et.al.	2506.13192	null
2025-06-19	Serving Large Language Models on Huawei CloudMatrix384	Pengfei Zuo et.al.	2506.12708	null
2025-06-14	Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts	Shengzhuang Chen et.al.	2506.12597	null
2025-06-14	Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control	Rongpeng Li et.al.	2506.12453	null
2025-06-17	HarMoEny: Efficient Multi-GPU Inference of MoE Models	Zachary Doucet et.al.	2506.12417	null
2025-06-14	Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model	Chong Li et.al.	2506.12388	null
2025-06-13	Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?	Houyi Li et.al.	2506.12119	null
2025-06-13	Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution	Zhangkai Ni et.al.	2506.11823	link
2025-05-21	MoTE: Mixture of Task-specific Experts for Pre-Trained Model-Based Class-incremental Learning	Linjie Li et.al.	2506.11038	null
2025-04-23	Test code generation at Ericsson using Program Analysis Augmented Fine Tuned LLMs	Sai Krishna et.al.	2506.11006	null
2025-06-12	Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts	Zaijing Li et.al.	2506.10357	null
2025-06-12	Technical Report with Proofs for A Full Picture in Conformance Checking: Efficiently Summarizing All Optimal Alignments	Philipp Bär et.al.	2506.10345	null
2025-06-13	A Survey of Generative Categories and Techniques in Multimodal Large Language Models	Longzhen Han et.al.	2506.10016	null
2025-06-11	GigaChat Family: Efficient Russian Language Modeling Through Mixture of Experts Architecture	GigaChat team et.al.	2506.09440	null
2025-06-11	DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts	Yuchen Feng et.al.	2506.09351	null
2025-06-11	Ming-Omni: A Unified Multimodal Model for Perception and Generation	Inclusion AI et.al.	2506.09344	link
2025-06-10	CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks	Yixuan Li et.al.	2506.08931	null
2025-06-10	CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA	Jiale Dong et.al.	2506.08496	link
2025-06-11	MedMoE: Modality-Specialized Mixture of Experts for Medical Vision-Language Understanding	Shivang Chopra et.al.	2506.08356	null
2025-06-09	Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting	Timothée Hornek et.al.	2506.08113	null
2025-06-11	STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation	Yiming Wang et.al.	2506.08054	link
2025-06-09	A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling	Jacob Helwig et.al.	2506.07969	link
2025-06-09	New Insights into the T Tauri Binary Separation Distribution	Caleb Eastlund et.al.	2506.07938	null
2025-06-09	M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration	Yongzhen Wang et.al.	2506.07814	null
2025-07-23	MIRA: Medical Time Series Foundation Model for Real-World Health Data	Hao Li et.al.	2506.07584	null
2025-06-11	MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization	Ken Yaggel et.al.	2506.07563	link
2025-06-09	MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts	Wei Tao et.al.	2506.07533	null
2025-06-09	Graph-of-Causal Evolution: Challenging Chain-of-Model for Reasoning	Libo Wang et.al.	2506.07501	null
2025-06-09	MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing	Haiyue Ma et.al.	2506.07366	null
2025-06-08	UNO: Unified Self-Supervised Monocular Odometry for Platform-Agnostic Deployment	Wentao Zhao et.al.	2506.07013	null
2025-06-07	High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations	Ziwei Li et.al.	2506.06858	null
2025-06-07	Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning	Yuan Yuan et.al.	2506.06694	null
2025-06-25	SMAR: Soft Modality-Aware Routing Strategy for MoE-based Multimodal Large Language Models Preserving Language Capabilities	Guoyang Xia et.al.	2506.06406	null
2025-05-27	MoE-Gyro: Self-Supervised Over-Range Reconstruction and Denoising for MEMS Gyroscopes	Feiyang Pan et.al.	2506.06318	null
2025-06-06	Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization	Jonathan Yang et.al.	2506.06196	null
2025-06-06	MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models	Jie Cao et.al.	2506.05928	null
2025-06-06	dots.llm1 Technical Report	Bi Huo et.al.	2506.05767	null
2025-06-05	Mixture-of-Experts Meets In-Context Reinforcement Learning	Wenhao Wu et.al.	2506.05426	null
2025-06-20	Kinetics: Rethinking Test-Time Scaling Laws	Ranajoy Sadhukhan et.al.	2506.05333	link
2025-06-05	Lifelong Evolution: Collaborative Learning between Large and Small Language Models for Continuous Emergent Fake News Detection	Ziyi Zhou et.al.	2506.04739	null
2025-06-09	FlashDMoE: Fast Distributed MoE in a Single Kernel	Osayamen Jonathan Aimuyo et.al.	2506.04667	link
2025-06-04	Out-of-Distribution Graph Models Merging	Yidi Wang et.al.	2506.03674	null
2025-06-04	Resolving Task Objective Conflicts in Unified Multimodal Understanding and Generation via Task-Aware Mixture-of-Experts	Jiaxing Zhang et.al.	2506.03591	null
2025-06-04	PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs	Ze Yu Zhang et.al.	2506.02965	null
2025-06-03	Scaling Fine-Grained MoE Beyond 50B Parameters: Empirical Evaluation and Practical Insights	Jakub Krajewski et.al.	2506.02890	null
2025-06-03	Brain-Like Processing Pathways Form in Models With Heterogeneous Experts	Jack Cook et.al.	2506.02813	null
2025-06-04	MemoryOut: Learning Principal Features via Multimodal Sparse Filtering Network for Semi-supervised Video Anomaly Detection	Juntong Li et.al.	2506.02535	null
2025-06-03	MidPO: Dual Preference Optimization for Safety and Helpfulness in Large Language Models via a Mixture of Experts Framework	Yupeng Qi et.al.	2506.02460	null
2025-05-31	Enhancing Multimodal Continual Instruction Tuning with BranchLoRA	Duzhen Zhang et.al.	2506.02041	null
2025-06-02	SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model	Zhao Yang et.al.	2506.01833	link
2025-06-02	Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning	Ryotaro Kawata et.al.	2506.01656	null
2025-06-02	DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models	Jiancheng Ye et.al.	2506.01257	null
2025-06-01	Unlocking Personalized Knowledge in Federated Large Language Model: The Power of Mixture of Experts	Fan Liu et.al.	2506.00965	null
2025-05-31	FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts	Xinyi Wang et.al.	2506.00495	null
2025-05-30	Mixture-of-Experts for Personalized and Semantic-Aware Next Location Prediction	Shuai Liu et.al.	2505.24597	null
2025-06-11	Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis	Junzhuo Li et.al.	2505.24593	null
2025-05-30	Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer	Yilun Kong et.al.	2505.24378	link
2025-05-30	GradPower: Powering Gradients for Faster Language Model Pre-Training	Mingze Wang et.al.	2505.24275	null
2025-05-30	On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks	Mingze Wang et.al.	2505.24205	null
2025-05-29	Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts	Xuweiyi Chen et.al.	2505.23926	null
2025-06-09	Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert	Zhaokun Wang et.al.	2505.23868	null
2025-05-29	Revisiting Uncertainty Estimation and Calibration of Large Language Models	Linwei Tao et.al.	2505.23854	null
2025-05-28	EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models	Linglin Jing et.al.	2505.23830	null
2025-06-03	LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions	Hadi Askari et.al.	2505.23811	null
2025-05-29	From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents	Tobias Lindenbauer et.al.	2505.23422	link
2025-05-29	Context-Aware Semantic Communication for the Wireless Networks	Guangyuan Liu et.al.	2505.23249	null
2025-05-29	Two Is Better Than One: Rotations Scale LoRAs	Hongcan Guo et.al.	2505.23184	null
2025-05-28	HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer	Qi Cai et.al.	2505.22705	link
2025-05-28	Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts	Xue Zhang et.al.	2505.22582	null
2025-05-28	A Human-Centric Approach to Explainable AI for Personalized Education	Vinitra Swamy et.al.	2505.22541	link
2025-05-28	Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion	Kewen Chen et.al.	2505.22360	null
2025-05-28	Advancing Expert Specialization for Better MoE	Hongcan Guo et.al.	2505.22323	null
2025-05-28	ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation	Jiawen Yu et.al.	2505.22159	null
2025-05-28	On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition	Shujie HU et.al.	2505.22072	null
2025-05-28	AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation	Yan Rong et.al.	2505.22053	null
2025-05-29	ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge	Zhongyi Zhou et.al.	2505.21906	null
2025-05-27	MedBridge: Bridging Foundation Vision-Language Models to Medical Image Diagnosis	Yitong Li et.al.	2505.21698	null
2025-05-23	EvidenceMoE: A Physics-Guided Mixture-of-Experts with Evidential Critics for Advancing Fluorescence Light Detection and Ranging in Scattering Media	Ismail Erbas et.al.	2505.21532	null
2025-05-28	Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity	Yehui Tang et.al.	2505.21411	null
2025-05-27	Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM’s Instruction-Following Capabilities	Junyan Zhang et.al.	2505.21191	null
2025-05-27	Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts	Yue Zhang et.al.	2505.21079	null
2025-05-27	Multi-objective Large Language Model Alignment with Hierarchical Experts	Zhuo Li et.al.	2505.20925	null
2025-05-26	FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models	Hao Kang et.al.	2505.20225	link
2025-06-01	NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID	Shihao Li et.al.	2505.20001	null
2025-05-26	Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments	Junming Liu et.al.	2505.19699	null
2025-06-13	MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE	Zongle Huang et.al.	2505.19645	null
2025-05-26	Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate	Liangwei Nathan Zheng et.al.	2505.19525	link
2025-05-26	WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference	Sihan Chen et.al.	2505.19427	link
2025-05-25	RankLLM: A Python Package for Reranking with LLMs	Sahel Sharifymoghaddam et.al.	2505.19284	null
2025-05-25	I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts	Jiayi Xin et.al.	2505.19190	link
2025-05-24	TrajMoE: Spatially-Aware Mixture of Experts for Unified Human Mobility Modeling	Chonghua Han et.al.	2505.18670	null
2025-05-24	ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation	Jian Liang et.al.	2505.18640	link
2025-07-02	Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter	Weizhi Zhong et.al.	2505.18612	null
2025-05-24	Guiding the Experts: Semantic Priors for Efficient and Focused MoE Routing	Chengxi Min et.al.	2505.18586	link
2025-05-24	Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment Tuning	Aofei Chang et.al.	2505.18503	null
2025-05-24	On Minimax Estimation of Parameters in Softmax-Contaminated Mixture of Experts	Fanqi Yan et.al.	2505.18455	null
2025-05-24	$μ$-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts	Toshiaki Koike-Akino et.al.	2505.18451	null
2025-05-23	Betelgeuse’s Buddy: X-Ray Constraints on the Nature of $α$ Ori B	Anna J. G. O’Grady et.al.	2505.18376	null
2025-05-23	Betelgeuse, Betelgeuse, Betelgeuse, Betel-buddy? Constraints on the dynamical companion to $α$ Orionis from HST	Jared A. Goldberg et.al.	2505.18375	null
2025-05-13	Constrained Edge AI Deployment: Fine-Tuning vs Distillation for LLM Compression	Jacob Sander et.al.	2505.18166	null
2025-05-23	Enhancing CTR Prediction with De-correlated Expert Networks	Jiancheng Wang et.al.	2505.17925	null
2025-05-23	PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval	Zehua Pei et.al.	2505.17639	null
2025-05-23	CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning	Jinyuan Feng et.al.	2505.17553	null
2025-05-31	MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation	Kaixing Yang et.al.	2505.17543	null
2025-07-04	JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model	Qihao Duan et.al.	2505.17257	null
2025-05-31	TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling	Weizhe Lin et.al.	2505.17155	null
2025-05-22	DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving	Zhenjie Yang et.al.	2505.16278	null
2025-05-22	DualComp: End-to-End Learning of a Unified Dual-Modality Lossless Compressor	Yan Zhao et.al.	2505.16256	null
2025-05-21	Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models	Jingcong Liang et.al.	2505.16056	link
2025-05-26	MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding	Yuxiang Wei et.al.	2505.15946	null
2025-05-21	Who “Controls” Where Work Shall be Done? State-of-Practice in Post-Pandemic Remote Work Regulation	Darja Smite et.al.	2505.15743	null
2025-05-21	CoLA: Collaborative Low-Rank Adaptation	Yiyun Zhou et.al.	2505.15471	link
2025-07-04	Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought	Tencent Hunyuan Team et.al.	2505.15431	null
2025-05-21	Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks	Uranik Berisha et.al.	2505.15414	null
2025-05-21	Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites	Xintong Wang et.al.	2505.15297	null
2025-05-21	Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines	Xiaohou Shi et.al.	2505.15151	null
2025-05-20	Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies	Haoyi Qiu et.al.	2505.14972	link
2025-05-30	TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis	Yu Zhang et.al.	2505.14910	link
2025-05-20	Balanced and Elastic End-to-end Training of Dynamic LLMs	Mohamed Wahib et.al.	2505.14864	null
2025-05-20	Solving MNIST with a globally trained Mixture of Quantum Experts	Paolo Alessandro Xavier Tognini et.al.	2505.14789	null
2025-05-27	Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training	Mengru Wang et.al.	2505.14681	null
2025-05-21	Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach	Umberto Cappellazzo et.al.	2505.14336	null
2025-05-20	FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation	Shaolin Zhu et.al.	2505.14256	null
2025-05-20	THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation	Yunlong Liang et.al.	2505.14173	null
2025-05-20	Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition	Shuo Zhang et.al.	2505.14143	null
2025-05-20	Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging	Ryo Bertolissi et.al.	2505.14136	null
2025-05-20	Generalizable Multispectral Land Cover Classification via Frequency-Aware Mixture of Low-Rank Token Experts	Xi Chen et.al.	2505.14088	null
2025-05-20	StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning	Huaijie Wang et.al.	2505.13997	null
2025-05-20	Towards Rehearsal-Free Continual Relation Extraction: Capturing Within-Task Variance with Adaptive Prompting	Bao-Ngoc Dao et.al.	2505.13944	link
2025-05-27	U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding	Ziqian Wang et.al.	2505.13880	link
2025-05-20	EfficientLLM: Efficiency in Large Language Models	Zhengqing Yuan et.al.	2505.13840	null
2025-05-19	CompeteSMoE – Statistically Guaranteed Mixture of Experts Training via Competition	Nam V. Nguyen et.al.	2505.13380	link
2025-05-19	Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference	Shuqing Luo et.al.	2505.13345	link
2025-05-19	Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models	Lucas Berry et.al.	2505.13273	null
2025-05-19	True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics	Christoph Jürgen Hemmer et.al.	2505.13192	null
2025-05-23	Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures	Tuan Thai et.al.	2505.13052	null
2025-05-19	TransferTraj: A Vehicle Trajectory Learning Model for Region and Task Transferability	Tonglong Wei et.al.	2505.12672	null
2025-05-30	Scene-Adaptive Motion Planning with Explicit Mixture of Experts and Interaction-Oriented Optimization	Hongbiao Zhu et.al.	2505.12311	null
2025-05-22	Model Merging in Pre-training of Large Language Models	Yunshui Li et.al.	2505.12082	null
2025-05-22	Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition	Runduo Han et.al.	2505.12007	link
2025-05-17	MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging	Zihuan Qiu et.al.	2505.11883	null
2025-05-17	Improving Coverage in Combined Prediction Sets with Weighted p-values	Gina Wong et.al.	2505.11785	null
2025-05-16	HessFormer: Hessians at Foundation Scale	Diego Granziol et.al.	2505.11564	null
2025-05-10	PRIME: Physics-Related Intelligent Mixture of Experts for Transistor Characteristics Prediction	Zhenxing Dou et.al.	2505.11523	null
2025-05-19	MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production	Chao Jin et.al.	2505.11432	null
2025-05-21	MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems	Yinsicheng Jiang et.al.	2505.11415	null
2025-05-16	A Fast Kernel-based Conditional Independence test with Application to Causal Discovery	Oliver Schacht et.al.	2505.11085	null
2025-05-16	On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating	Huy Nguyen et.al.	2505.10860	null
2025-05-14	PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning	Zongqian Li et.al.	2505.09519	link
2025-05-14	Qwen3 Technical Report	An Yang et.al.	2505.09388	link
2025-05-14	Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures	Chenggang Zhao et.al.	2505.09343	null
2025-05-29	Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony	Shaoyu Wang et.al.	2505.08944	null
2025-05-13	PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts	Yang Su et.al.	2505.08719	null
2025-05-25	AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale	Yunjie Ji et.al.	2505.08311	null
2025-05-12	UMoE: Unifying Attention and FFN with Shared Experts	Yuanhang Yang et.al.	2505.07260	null
2025-05-11	Seed1.5-VL Technical Report	Dong Guo et.al.	2505.07062	null
2025-05-21	FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers	Tianyu Chen et.al.	2505.06858	null
2025-05-11	The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts	Enric Boix-Adsera et.al.	2505.06839	null
2025-05-10	Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free	Zihan Qiu et.al.	2505.06708	link
2025-05-30	Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding	Dawei Huang et.al.	2505.06685	link
2025-05-10	QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration	HamidReza Imani et.al.	2505.06481	null
2025-05-06	A Sensitivity-Driven Expert Allocation Method in LoRA-MoE for Efficient Fine-Tuning	Junzhou Xu et.al.	2505.06272	null
2025-05-12	FloE: On-the-Fly MoE Inference on Memory-constrained GPU	Yuxin Zhou et.al.	2505.05950	null
2025-05-09	MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design	Haojie Duanmu et.al.	2505.05799	link
2025-05-10	SDR-RDMA: Software-Defined Reliability Architecture for Planetary Scale RDMA Communication	Mikhail Khalilov et.al.	2505.05366	null
2025-05-08	Divide-and-Conquer: Cold-Start Bundle Recommendation via Mixture of Diffusion Experts	Ming Li et.al.	2505.05035	null
2025-05-07	Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs	Yehui Tang et.al.	2505.04519	null
2025-05-07	SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios	Ning Cheng et.al.	2505.04201	null
2025-05-07	LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress?	Teddy Foley et.al.	2505.04075	link
2025-05-07	Shadow Wireless Intelligence: Large Language Model-Driven Reasoning in Covert Communications	Yuanai Xie et.al.	2505.04068	null
2025-05-24	Perception-Informed Neural Networks: Beyond Physics-Informed Neural Networks	Mehran Mazandarani et.al.	2505.03806	null
2025-05-02	MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance	Xing Hu et.al.	2505.03804	null
2025-05-06	Towards Smart Point-and-Shoot Photography	Jiawan Li et.al.	2505.03638	null
2025-05-06	Faster MoE LLM Inference for Extremely Large Models	Haoqi Yang et.al.	2505.03531	null
2025-05-06	STAR-Rec: Making Peace with Length Variance and Pattern Diversity in Sequential Recommendation	Maolin Wang et.al.	2505.03484	null
2025-05-06	3D Gaussian Splatting Data Compression with Mixture of Priors	Lei Liu et.al.	2505.03310	null
2025-05-05	Finger Pose Estimation for Under-screen Fingerprint Sensor	Xiongjun Guan et.al.	2505.02481	link
2025-05-05	Multimodal Deep Learning-Empowered Beam Prediction in Future THz ISAC Systems	Kai Zhang et.al.	2505.02381	null
2025-05-08	Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques	Sanjay Surendranath Girija et.al.	2505.02309	null
2025-05-04	Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields	Zhenxing Mi et.al.	2505.02005	link
2025-05-03	Backdoor Attacks Against Patch-based Mixture of Experts	Cedric Chan et.al.	2505.01811	link
2025-05-01	MoxE: Mixture of xLSTM Experts with Entropy-Aware Routing for Efficient Language Modeling	Abdoul Majid O. Thiombiano et.al.	2505.01459	null
2025-05-02	Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders	Rogelio A Mancisidor et.al.	2505.01134	null
2025-05-02	CoCoAFusE: Beyond Mixtures of Experts via Model Fusion	Aurelio Raffa Ugolini et.al.	2505.01105	null
2025-05-01	Improving Routing in Sparse Mixture of Experts with Graph of Tokens	Tam Nguyen et.al.	2505.00792	null
2025-05-01	CICADA: Cross-Domain Interpretable Coding for Anomaly Detection and Adaptation in Multivariate Time Series	Tian Lan et.al.	2505.00415	null
2025-05-01	Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing	Piotr Piękos et.al.	2505.00315	link
2025-04-30	Online Federation For Mixtures of Proprietary Agents with Black-Box Encoders	Xuwei Yang et.al.	2505.00216	null
2025-05-08	Identifying Critical Dependencies in Large-Scale Continuous Software Engineering	Anastasiia Tkalich et.al.	2504.21437	null
2025-04-29	TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts	Pradip Kunwar et.al.	2504.21190	null
2025-04-29	Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization	Shuai Gong et.al.	2504.21063	null
2025-04-26	PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight	Ben Goertzel et.al.	2504.21029	null
2025-04-29	In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer	Zechuan Zhang et.al.	2504.20690	null
2025-05-30	ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting	Yu Zhang et.al.	2504.20630	null
2025-04-29	MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification	Yichu Xu et.al.	2504.20509	null
2025-04-29	FT-MoE: Sustainable-learning Mixture of Experts Model for Fault-Tolerant Computing with Multiple Tasks	Wenjing Xiao et.al.	2504.20446	null
2025-04-29	MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation	Amaan Izhar et.al.	2504.20343	link
2025-04-28	Accelerating Mixture-of-Experts Training with Adaptive Expert Replication	Athinagoras Skiadopoulos et.al.	2504.19925	null
2025-04-28	DUETS: Setting expectations for asteroseismic binaries and binary products with synthetic populations	A. Mazzi et.al.	2504.19866	null
2025-04-28	Decentralization of Generative AI via Mixture of Experts for Wireless Networks: A Comprehensive Survey	Yunting Xu et.al.	2504.19660	null
2025-05-04	ARTEMIS: Autoregressive End-to-End Trajectory Planning with Mixture of Experts for Autonomous Driving	Renju Feng et.al.	2504.19580	link
2025-05-30	Versatile Framework for Song Generation with Prompt-based Control	Yu Zhang et.al.	2504.19062	null
2025-04-29	BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts	Qingyue Wang et.al.	2504.18598	null
2025-04-25	NoEsis: Differentially Private Knowledge Transfer in Modular LLM Adaptation	Rob Romijnders et.al.	2504.18147	null
2025-05-15	TGDT: A Temporal Graph-based Digital Twin for Urban Traffic Corridors	Nooshin Yousefzadeh et.al.	2504.18008	null
2025-06-11	Unveiling the Hidden: Movie Genre and User Bias in Spoiler Detection	Haokai Zhang et.al.	2504.17834	link
2025-04-22	Compass-V2 Technical Report	Sophia Maria et.al.	2504.15527	null
2025-04-21	Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images	Jonathan Brokman et.al.	2504.15470	link
2025-04-17	D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving	Haodong Wang et.al.	2504.15299	null
2025-04-23	MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core	Dennis Liu et.al.	2504.14960	null
2025-04-20	Evaluating Temporal Plasticity in Foundation Time Series Models for Incremental Fine-tuning	Jia Liu et.al.	2504.14677	null
2025-04-29	Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning	ByteDance Seed et.al.	2504.13914	null
2025-04-18	Multi-Type Context-Aware Conversational Recommender Systems via Mixture-of-Experts	Jie Zou et.al.	2504.13655	null
2025-04-18	HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering	Alexander Rusnak et.al.	2504.13590	null
2025-04-18	Dense Backpropagation Improves Training for Sparse Mixture-of-Experts	Ashwinee Panda et.al.	2504.12463	link
2025-04-16	Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models	Yuanbo Tang et.al.	2504.12359	null
2025-04-16	Trend Filtered Mixture of Experts for Automated Gating of High-Frequency Flow Cytometry Data	Sangwon Hyun et.al.	2504.12287	null
2025-04-16	The Discovery of Two Quadruple Star Systems with the Second and Third Shortest Outer Periods	Brian P. Powell et.al.	2504.12239	null
2025-04-16	MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models	Hang Yuan et.al.	2504.12234	null
2025-04-13	Transmission of low energy electrons through a polyethylene terephthalate 800-nm diameter nanocapillary	Li Pengfei et.al.	2504.11479	null
2025-04-15	Simulation-based inference for stochastic nonlinear mixed-effects models with applications in systems biology	Henrik Häggström et.al.	2504.11279	link
2025-05-22	Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability	Jiani Liu et.al.	2504.10804	null
2025-04-14	Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning	LeiLei Ma et.al.	2504.09990	null
2025-04-14	DeepCompile: A Compiler-Driven Approach to Optimizing Distributed Deep Learning Training	Masahiro Tanaka et.al.	2504.09983	null
2025-04-14	Multi-objective Bayesian Optimization With Mixed-categorical Design Variables for Expensive-to-evaluate Aeronautical Applications	Nathalie Bartoli et.al.	2504.09930	null
2025-04-14	Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming	Zhiqiang He et.al.	2504.09906	null
2025-04-13	Mixture-of-Shape-Experts (MoSE): End-to-End Shape Dictionary Framework to Prompt SAM for Generalizable Medical Segmentation	Jia Wei et.al.	2504.09601	null
2025-04-12	MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints	Yichao Yuan et.al.	2504.09345	null
2025-04-12	Mixture of Group Experts for Learning Invariant Representations	Lei Kang et.al.	2504.09265	null
2025-04-12	Exploring Modality Disruption in Multimodal Fake News Detection	Moyang Liu et.al.	2504.09154	null
2025-05-08	RouterKT: Mixture-of-Experts for Knowledge Tracing	Han Liao et.al.	2504.08989	null
2025-03-23	ExpertRAG: Efficient RAG with Mixture of Experts – Optimizing Context Retrieval for Adaptive LLM Responses	Esmail Gumaan et.al.	2504.08744	null
2025-04-11	Regularized infill criteria for multi-objective Bayesian optimization with application to aircraft design	Robin Grapin et.al.	2504.08671	null
2025-04-11	Millions of States: Designing a Scalable MoE Architecture with RWKV-7 Meta-learner	Liu Xiao et.al.	2504.08247	null
2025-04-10	C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing	Zhongyang Li et.al.	2504.07964	link
2025-04-11	Scaling Laws for Native Multimodal Models	Mustafa Shukor et.al.	2504.07951	null
2025-04-10	Cluster-Driven Expert Pruning for Mixture-of-Experts Large Language Models	Hongcheng Guo et.al.	2504.07807	link
2025-04-10	Adaptive Detection of Fast Moving Celestial Objects Using a Mixture of Experts and Physical-Inspired Neural Network	Peng Jia et.al.	2504.07777	null
2025-04-15	Kimi-VL Technical Report	Kimi Team et.al.	2504.07491	link
2025-04-09	MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution	Zhe Wang et.al.	2504.07308	link
2025-04-11	Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models	Ling Team et.al.	2504.07158	null
2025-05-28	Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations	Zican Dong et.al.	2504.06792	null
2025-04-24	FedMerge: Federated Personalization via Model Merging	Shutong Chen et.al.	2504.06768	null
2025-04-08	S’MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning	Hanqing Zeng et.al.	2504.06426	null
2025-04-08	HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference	Shuzhang Zhong et.al.	2504.05897	link
2025-04-08	Adaptive Substructure-Aware Expert Model for Molecular Property Prediction	Tianyi Jiang et.al.	2504.05844	null
2025-04-10	Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations	Ajay Jaiswal et.al.	2504.05586	null
2025-04-07	SUEDE: Shared Unified Experts for Physical-Digital Face Attack Detection Enhancement	Zuying Xie et.al.	2504.04818	null
2025-04-06	On the Spatial Structure of Mixture-of-Experts in Transformers	Daniel Bershatsky et.al.	2504.04444	null
2025-04-05	Collaboration and Controversy Among Experts: Rumor Early Detection by Tuning a Comment Generator	Bing Wang et.al.	2504.04076	link
2025-04-04	HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs	Yongji Wu et.al.	2504.03871	null
2025-04-01	Detecting Financial Fraud with Hybrid Deep Learning: A Mix-of-Experts Approach to Sequential and Anomalous Patterns	Diego Vallarino et.al.	2504.03750	null
2025-04-01	A Unified Virtual Mixture-of-Experts Framework: Enhanced Inference and Hallucination Mitigation in Single-Model System	Mingyan Liu et.al.	2504.03739	null
2025-03-26	A multi-scale lithium-ion battery capacity prediction using mixture of experts and patch-based MLP	Yuzhu Lei et.al.	2504.03706	link
2025-04-04	RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation	Hanbo Bi et.al.	2504.03166	null
2025-06-01	TeleMoM: Consensus-Driven Telecom Intelligence via Mixture of Models	Xinquan Wang et.al.	2504.02712	null
2025-04-07	MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators	Beichen Huang et.al.	2504.02658	link
2025-04-24	Cognitive Memory in Large Language Models	Lianlei Shan et.al.	2504.02441	null
2025-04-23	MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism	Ruidong Zhu et.al.	2504.02263	null
2025-04-20	Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design	Mohan Zhang et.al.	2504.01337	null
2025-04-01	Mixture-of-Experts for Distributed Edge Computing with Channel-Aware Gating Function	Qiuchen Song et.al.	2504.00819	null
2025-04-01	DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism	Dengchun Li et.al.	2504.00661	link
2025-04-01	CFP: Low-overhead Profiling-based Intra-operator Parallelism Generation by Preserving Communication-Free Structures	Weifang Hu et.al.	2504.00598	null
2025-04-01	Continual Cross-Modal Generalization	Yan Xia et.al.	2504.00561	null
2025-04-01	Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection	Shunxin Chen et.al.	2504.00458	null
2025-03-31	Unimodal-driven Distillation in Multimodal Emotion Recognition with Dynamic Fusion	Jiagen Li et.al.	2503.23721	null
2025-05-16	Mixture of Routers	Jia-Chen Zhang et.al.	2503.23362	null
2025-05-25	MoLAE: Mixture of Latent Experts for Parameter-Efficient Language Models	Zehua Liu et.al.	2503.23100	null
2025-03-29	S2MoE: Robust Sparse Mixture of Experts via Stochastic Learning	Giang Do et.al.	2503.23007	null
2025-03-29	Sparse Mixture of Experts as Unified Competitive Learning	Giang Do et.al.	2503.22996	null
2025-03-26	Reasoning Beyond Limits: Advances and Open Problems for LLMs	Mohamed Amine Ferrag et.al.	2503.22732	null
2025-04-01	Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities	Raman Dutt et.al.	2503.22517	null
2025-04-29	RocketPPA: Ultra-Fast LLM-Based PPA Estimator at Code-Level Abstraction	Armin Abdollahi et.al.	2503.21971	null
2025-05-08	Binarity at LOw Metallicity (BLOeM): Enhanced multiplicity of early B-type dwarfs and giants at $Z=0.2\,{\rm Z}_\odot$	J. I. Villaseñor et.al.	2503.21936	null
2025-03-27	iMedImage Technical Report	Ran Wei et.al.	2503.21836	null
2025-03-27	LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models	Hengyuan Zhao et.al.	2503.21227	null
2025-05-17	MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness	Zihao Zheng et.al.	2503.21135	null
2025-03-26	Optimal Scaling Laws for Efficiency Gains in a Theoretical Transformer-Augmented Sectional MoE Framework	Soham Sane et.al.	2503.20750	null
2025-03-26	UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines	Chen Tang et.al.	2503.20748	null
2025-03-26	Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning	Sashuai Zhou et.al.	2503.20633	null
2025-04-14	MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation	Rongyu Zhang et.al.	2503.20384	null
2025-03-26	Modality-Independent Brain Lesion Segmentation with Privacy-aware Continual Learning	Yousef Sadegheih et.al.	2503.20326	link
2025-03-31	Resilient Sensor Fusion under Adverse Sensor Failures via Multi-Modal Expert Fusion	Konyul Park et.al.	2503.19776	null
2025-04-30	BiPrompt-SAM: Enhancing Image Segmentation via Explicit Selection between Point and Text Prompts	Suzhe Xu et.al.	2503.19769	null
2025-03-25	M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation	Ziyuan Liu et.al.	2503.19406	null
2025-04-21	Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design	Rui Xie et.al.	2503.18869	null
2025-04-30	Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding	Tianyu Chen et.al.	2503.18578	null
2025-03-24	SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking	Wenrui Cai et.al.	2503.18338	null
2025-04-01	Challenging Dataset and Multi-modal Gated Mixture of Experts Model for Remote Sensing Copy-Move Forgery Understanding	Ze Zhang et.al.	2503.18104	link
2025-03-22	Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM	Codefuse et.al.	2503.17793	null
2025-03-25	Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts	Yike Yuan et.al.	2503.16057	null
2025-03-21	UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations	Debabrata Mandal et.al.	2503.15868	null
2025-03-20	Mixture of Lookup Experts	Shibo Jie et.al.	2503.15798	link
2025-03-21	Leveraging MoE-based Large Language Model for Zero-Shot Multi-Task Semantic Communication	Sin-Yu Huang et.al.	2503.15722	null
2025-04-29	SemEval-2025 Task 1: AdMIRe – Advancing Multimodal Idiomaticity Representation	Thomas Pickard et.al.	2503.15358	null
2025-03-21	Body-Hand Modality Expertized Networks with Cross-attention for Fine-grained Skeleton Action Recognition	Seungyeon Cho et.al.	2503.14960	null
2025-03-18	Core-Periphery Principle Guided State Space Model for Functional Connectome Classification	Minheng Chen et.al.	2503.14655	null
2025-03-18	DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers	Minglei Shi et.al.	2503.14487	null
2025-03-18	MAST-Pro: Dynamic Mixture-of-Experts for Adaptive Segmentation of Pan-Tumors with Knowledge-Driven Prompts	Runqi Meng et.al.	2503.14355	null
2025-03-18	Frac-Connections: Fractional Extension of Hyper-Connections	Defa Zhu et.al.	2503.14125	null
2025-03-18	SNAKE: A Sustainable and Multi-functional Traffic Analysis System utilizing Specialized Large-Scale Models with a Mixture of Experts Architecture	Tian Qin et.al.	2503.13808	null
2025-03-13	Ensemble Learning for Large Language Models in Text and Code Generation: A Survey	Mari Ashiga et.al.	2503.13505	null
2025-03-17	Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge	Shengling Qin et.al.	2503.13421	null
2025-05-10	Channel Estimation for Pinching-Antenna Systems (PASS)	Jian Xiao et.al.	2503.13268	null
2025-03-17	Federated Mixture-of-Expert for Non-Overlapped Cross-Domain Sequential Recommendation	Yu Liu et.al.	2503.13254	null
2025-05-21	Fast filtering of non-Gaussian models using Amortized Optimal Transport Maps	Mohammad Al-Jarrah et.al.	2503.12633	link
2025-03-16	MoECollab: Democratizing LLM Development Through Collaborative Mixture of Experts	Harshit et.al.	2503.12592	null
2025-03-16	MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification	Jianwei Zhao et.al.	2503.12401	null
2025-05-10	Adaptive Mixture of Low-Rank Experts for Robust Audio Spoofing Detection	Qixian Chen et.al.	2503.12010	null
2025-03-14	FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-the-World LoRA	Jieming Bian et.al.	2503.11880	null
2025-03-10	MELON: Multimodal Mixture-of-Experts with Spectral-Temporal Fusion for Long-Term Mobility Estimation in Critical Care	Jiaqing Zhang et.al.	2503.11695	null
2025-03-14	A Review of DeepSeek Models’ Key Innovative Techniques	Chengen Wang et.al.	2503.11486	null
2025-03-14	MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling	Rachel S. Y. Teo et.al.	2503.11144	link
2025-03-13	Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores	Chenpeng Wu et.al.	2503.10725	link
2025-05-19	dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis	Luyuan Xie et.al.	2503.10412	null
2025-04-10	Continual Text-to-Video Retrieval with Frame Fusion and Task-Aware Routing	Zecheng Zhao et.al.	2503.10111	link
2025-03-12	MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching	Tairan Xu et.al.	2503.09716	null
2025-03-12	Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework	Bakary Badjie et.al.	2503.09504	null
2025-03-12	Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment	Nazanin Moradinasab et.al.	2503.09498	link
2025-04-01	Astrea: A MOE-based Visual Understanding Model with Progressive Alignment	Xiaoda Yang et.al.	2503.09445	null
2025-03-12	Automatic Operator-level Parallelism Planning for Distributed Deep Learning – A Mixed-Integer Programming Approach	Ruifeng She et.al.	2503.09357	null
2025-03-12	Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference	Mohammad Siavashi et.al.	2503.09304	null
2025-03-13	FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models	Fufangchen Zhao et.al.	2503.09158	null
2025-03-11	MoE-Loco: Mixture of Experts for Multitask Locomotion	Runhan Huang et.al.	2503.08564	null
2025-03-11	BoundarEase: Fostering Constructive Community Engagement to Inform More Equitable Student Assignment Policies	Cassandra Overney et.al.	2503.08543	link
2025-03-11	Accelerating MoE Model Inference with Expert Sharding	Oana Balmau et.al.	2503.08467	null
2025-03-26	Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models	Junzhe Li et.al.	2503.08120	null
2025-03-11	MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models	Han Zhao et.al.	2503.08007	null
2025-03-10	Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM	Yongqiang Yao et.al.	2503.07680	null
2025-04-01	TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster	Kanghui Ning et.al.	2503.07649	null
2025-03-05	BrainNet-MoE: Brain-Inspired Mixture-of-Experts Learning for Neurological Disease Identification	Jing Zhang et.al.	2503.07640	null
2025-03-05	Mixture of Experts Made Intrinsically Interpretable	Xingyi Yang et.al.	2503.07639	null
2025-03-26	GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts	Minwen Liao et.al.	2503.07417	null
2025-04-18	A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications	Siyuan Mu et.al.	2503.07137	link
2025-03-10	VMTS: Vision-Assisted Teacher-Student Reinforcement Learning for Multi-Terrain Locomotion in Bipedal Robots	Fu Chen et.al.	2503.07049	link
2025-03-10	ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration	Mengting Ai et.al.	2503.06881	link
2025-03-10	eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference	Suraiya Tairin et.al.	2503.06823	null
2025-03-09	MoFE: Mixture of Frozen Experts Architecture	Jean Seo et.al.	2503.06491	null
2025-03-25	Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models	Nguyen Do et.al.	2503.06413	link
2025-03-08	MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering	Vinay Kumar Verma et.al.	2503.06296	null
2025-03-08	A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts	Wenzhuo Du et.al.	2503.06064	null
2025-03-08	MANDARIN: Mixture-of-Experts Framework for Dynamic Delirium and Coma Prediction in ICU Patients: Development and Validation of an Acute Brain Dysfunction Prediction Model	Miguel Contreras et.al.	2503.06059	null
2025-03-08	GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices	Xudong Lu et.al.	2503.06019	null
2025-03-03	How Do Consumers Really Choose: Exposing Hidden Preferences with the Mixture of Experts Model	Diego Vallarino et.al.	2503.05800	null
2025-03-11	Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning	Justin Chih-Yao Chen et.al.	2503.05641	null
2025-03-07	FMT: A Multimodal Pneumonia Detection Model Based on Stacking MOE Framework	Jingyu Xu et.al.	2503.05626	null
2025-04-15	Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts	Weigao Sun et.al.	2503.05447	link
2025-03-10	Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs	Ling Team et.al.	2503.05139	null
2025-03-07	Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts	Shwai He et.al.	2503.05066	null
2025-03-06	Continual Pre-training of MoEs: How robust is your router?	Benjamin Thérien et.al.	2503.05029	null
2025-02-25	Comparative Analysis Based on DeepSeek, ChatGPT, and Google Gemini: Features, Techniques, Performance, Future Prospects	Anichur Rahman et.al.	2503.04783	null
2025-03-19	Predictable Scale: Part I – Optimal Hyperparameter Scaling Law in Large Language Model Pretraining	Houyi Li et.al.	2503.04715	null
2025-03-07	Question-Aware Gaussian Experts for Audio-Visual Question Answering	Hongyeob Kim et.al.	2503.04459	link
2025-03-19	Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling	Yan Li et.al.	2503.04398	null
2025-03-06	A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery	Yiheng Zhu et.al.	2503.04362	null
2025-03-06	Quantum metric induced magneto-optical effects in $\mathcal{PT}$-symmetric antiferromagnets	Yongpan Li et.al.	2503.04312	null
2025-03-06	DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval	Yating Liu et.al.	2503.04144	null
2025-03-05	VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection	Enkhtogtokh Togootogtokh et.al.	2503.03797	link
2025-03-09	Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs	Haoran Fan et.al.	2503.03594	link
2025-03-05	Convergence Rates for Softmax Gating Mixture of Experts	Huy Nguyen et.al.	2503.03213	null
2025-03-04	MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation	Weihang Wang et.al.	2503.02799	link
2025-03-04	FinArena: A Human-Agent Collaboration Framework for Financial Market Analysis and Forecasting	Congluo Xu et.al.	2503.02692	null
2025-03-06	Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer	Yujiao Yang et.al.	2503.02495	link
2025-03-04	Tabby: Tabular Data Synthesis with Language Models	Sonia Cromp et.al.	2503.02152	null
2025-03-03	ECG-EmotionNet: Nested Mixture of Expert (NMoE) Adaptation of ECG-Foundation Model for Driver Emotion Recognition	Nastaran Mansourian et.al.	2503.01750	null
2025-03-03	Effective High-order Graph Representation Learning for Credit Card Fraud Detection	Yao Zou et.al.	2503.01556	null
2025-03-03	DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models	Yongqi Huang et.al.	2503.01359	null
2025-03-03	PROPER: A Progressive Learning Framework for Personalized Large Language Models with Group-Level Adaptation	Linhai Zhang et.al.	2503.01303	null
2025-03-03	Unify and Anchor: A Context-Aware Transformer for Cross-Domain Time Series Forecasting	Xiaobin Hong et.al.	2503.01157	null
2025-03-02	Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion	Daiki Nishiyama et.al.	2503.00925	null
2025-03-01	Efficiently Editing Mixture-of-Experts Models with Compressed Experts	Yifei He et.al.	2503.00634	null
2025-03-01	CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering	Tianyu Huai et.al.	2503.00413	null
2025-02-28	CoSMoEs: Compact Sparse Mixture of Experts	Patrick Huber et.al.	2503.00245	null
2025-02-26	Omni-SILA: Towards Omni-scene Driven Visual Sentiment Identifying, Locating and Attributing in Videos	Jiamin Luo et.al.	2503.00049	null
2025-03-01	R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts	Zhongyang Li et.al.	2502.20395	link
2025-02-27	Mixture of Experts for Recognizing Depression from Interview and Reading Tasks	Loukas Ilias et.al.	2502.20213	null
2025-02-27	Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems	Zeyi Ren et.al.	2502.20183	null
2025-02-27	UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook	Yidi Jiang et.al.	2502.20067	null
2025-02-27	AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs	Xuyang Wei et.al.	2502.20035	link
2025-03-04	Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts	Shulai Zhang et.al.	2502.19811	link
2025-02-27	Extension of SUSY SU(5) GUTs with Nelson-Barr models	Junji Hisano et.al.	2502.19686	null
2025-03-15	Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization	Taishi Nakamura et.al.	2502.19261	null
2025-02-26	OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment	Jiaxin Deng et.al.	2502.18965	null
2025-02-26	Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM	Junxiao Ma et.al.	2502.18863	null
2025-02-25	Generative AI-enabled Wireless Communications for Robust Low-Altitude Economy Networking	Changyuan Zhao et.al.	2502.18118	null
2025-02-09	MoEMba: A Mamba-based Mixture of Experts for High-Density EMG-based Hand Gesture Recognition	Mehran Shabanpour et.al.	2502.17457	null
2025-03-17	The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE	Andrei Chernov et.al.	2502.17391	null
2025-02-24	Delta Decompression for MoE-based LLMs Compression	Hao Gu et.al.	2502.17298	link
2025-02-24	Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks	Andrei Chernov et.al.	2502.17187	null
2025-02-24	Muon is Scalable for LLM Training	Jingyuan Liu et.al.	2502.16982	link
2025-03-07	BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference	Zewen Jin et.al.	2502.16927	null
2025-02-24	ENACT-Heart – ENsemble-based Assessment Using CNN and Transformer on Heart Sounds	Jiho Han et.al.	2502.16914	null
2025-02-26	Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment	Chenghao Fan et.al.	2502.16894	null
2025-02-22	An Autonomous Network Orchestration Framework Integrating Large Language Models with Continual Reinforcement Learning	Masoud Shokrnezhad et.al.	2502.16198	null
2025-02-20	A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models	Mengyang Sun et.al.	2502.15828	link
2025-03-20	Binary-Integer-Programming Based Algorithm for Expert Load Balancing in Mixture-of-Experts Models	Yuan Sun et.al.	2502.15451	link
2025-03-02	Tight Clusters Make Specialized Experts	Stefan K. Nielsen et.al.	2502.15315	link
2025-02-21	Multimodal Graph-Based Variational Mixture of Experts Network for Zero-Shot Multimodal Information Extraction	Baohang Zhou et.al.	2502.15290	link
2025-02-20	Ray-Tracing for Conditionally Activated Neural Networks	Claudio Gallicchio et.al.	2502.14788	null
2025-02-21	ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model	Zhongyi Zhou et.al.	2502.14420	null
2025-02-19	MoM: Linear Sequence Modeling with Mixture-of-Memories	Jusen Du et.al.	2502.13685	link
2025-02-19	Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts	Xin Li et.al.	2502.13577	null
2025-02-18	MoBA: Mixture of Block Attention for Long-Context LLMs	Enzhe Lu et.al.	2502.13189	link
2025-02-18	Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models	Gyeongman Kim et.al.	2502.12947	null
2025-03-13	DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs	Minxuan Lv et.al.	2502.12455	null
2025-02-17	From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs	Kumari Nishu et.al.	2502.12325	null
2025-02-17	Binarity at LOw Metallicity (BLOeM): Multiplicity of early B-type supergiants in the Small Magellanic Cloud	N. Britavskiy et.al.	2502.12239	null
2025-02-17	Accurate Expert Predictions in MoE Inference via Cross-Layer Gate	Zhiyuan Fang et.al.	2502.12224	null
2025-02-17	How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines	Ayan Sengupta et.al.	2502.12051	null
2025-02-17	Connector-S: A Survey of Connectors in Multi-modal Large Language Models	Xun Zhu et.al.	2502.11453	null
2025-02-16	Mixture of Tunable Experts – Behavior Modification of DeepSeek-R1 at Inference Time	Robert Dahlke et.al.	2502.11096	null
2025-02-16	ClimateLLM: Efficient Weather Forecasting via Frequency-Aware Large Language Models	Shixuan Li et.al.	2502.11059	null
2025-02-15	Semantic Specialization in MoE Appears with Scale: A Study of DeepSeek R1 Expert Specialization	Matthew Lyle Olson et.al.	2502.10928	null
2025-02-11	MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition	Sungnyun Kim et.al.	2502.10447	null
2025-04-03	Heterogeneous Mixture of Experts for Remote Sensing Image Super-Resolution	Bowen Chen et.al.	2502.09654	null
2025-02-14	Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting	Nicholas Dronen et.al.	2502.09500	link
2025-02-12	The MoE-Empowered Edge LLMs Deployment: Architecture, Challenges, and Opportunities	Ning Li et.al.	2502.08381	null
2025-02-12	Mixture of Decoupled Message Passing Experts with Entropy Constraint for General Node Classification	Xuanze Chen et.al.	2502.08083	null
2025-03-09	Training Sparse Mixture Of Experts Text Embedding Models	Zach Nussbaum et.al.	2502.07972	link
2025-02-11	Memory Analysis on the Training Course of DeepSeek Models	Ping Zhang et.al.	2502.07846	null
2025-02-11	LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid	Weigao Sun et.al.	2502.07563	link
2025-02-11	MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks	Lotfi Abdelkrim Mecharbat et.al.	2502.07422	null
2025-02-11	Online Aggregation of Trajectory Predictors	Alex Tong et.al.	2502.07178	null
2025-02-09	Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline	Zhiyuan Fang et.al.	2502.06888	null
2025-02-12	Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach	Xu Zhang et.al.	2502.06832	null
2025-02-10	MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing	Seokjin Go et.al.	2502.06643	null
2025-02-10	Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE	Haiduo Huang et.al.	2502.06282	link
2025-02-10	Fair-MoE: Fairness-Oriented Mixture of Experts in Vision-Language Models	Peiran Wang et.al.	2502.06094	null
2025-02-08	Mol-MoE: Training Preference-Guided Routers for Molecule Generation	Diego Calanzone et.al.	2502.05633	null
2025-02-17	UbiMoE: A Ubiquitous Mixture-of-Experts Vision Transformer Accelerator With Hybrid Computation Pattern on FPGA	Jiale Dong et.al.	2502.05602	link
2025-02-07	fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving	Hanfei Yu et.al.	2502.05370	null
2025-02-07	Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts	Roussel Desmond Nzoyem et.al.	2502.05335	null
2025-02-19	Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient	Jan Ludziejewski et.al.	2502.05172	null
2025-02-06	Mixture of neural operator experts for learning boundary conditions and model selection	Dwyer Deighan et.al.	2502.04562	null
2025-02-06	CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference	Zehua Pei et.al.	2502.04416	link
2025-02-06	Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning	Peizhuang Cong et.al.	2502.03884	null
2025-03-20	A Retrospective Systematic Study on Hierarchical Sparse Query Transformer-assisted Ultrasound Screening for Early Hepatocellular Carcinoma	Chaoyin She et.al.	2502.03772	link
2025-02-05	(GG) MoE vs. MLP on Tabular Data	Andrei Chernov et.al.	2502.03608	null
2025-02-05	RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts	Tuan Truong et.al.	2502.03044	null
2025-03-22	On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation	Nghiem T. Diep et.al.	2502.03029	null
2025-02-05	Scaling Laws for Upcycling Mixture-of-Experts Language Models	Seng Pei Liew et.al.	2502.03009	null
2025-02-04	ReGNet: Reciprocal Space-Aware Long-Range Modeling and Multi-Property Prediction for Crystals	Jianan Nie et.al.	2502.02748	null
2025-02-04	Binarity at LOw Metallicity (BLOeM): The multiplicity properties and evolution of BAF-type supergiants	L. R. Patrick et.al.	2502.02644	null
2025-02-04	Hecate: Unlocking Efficient Sparse Model Training via Fully Sharded Sparse Data Parallelism	Yuhao Qing et.al.	2502.02581	null
2025-02-07	Brief analysis of DeepSeek R1 and its implications for Generative AI	Sarah Mercer et.al.	2502.02523	null
2025-02-04	M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference	Nikhil Bhendawade et.al.	2502.02040	null
2025-02-07	MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation	Haibo Tong et.al.	2502.01719	null
2025-02-27	Omni-Mol: Exploring Universal Convergent Space for Omni-Molecular Tasks	Chengxin Hu et.al.	2502.01074	null
2025-02-17	MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs	Yuhang Zhou et.al.	2502.00997	null
2025-02-03	CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling	Xinze Wang et.al.	2502.00965	null
2025-02-02	UniGraph2: Learning a Unified Embedding Space to Bind Multimodal Graphs	Yufei He et.al.	2502.00806	null
2025-02-02	Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective	Yujin Oh et.al.	2502.00619	null
2025-02-05	Weak-to-Strong Diffusion with Reflection	Lichen Bai et.al.	2502.00473	null
2025-02-01	PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning	Yu Feng et.al.	2502.00354	link
2025-02-01	Sigmoid Self-Attention is Better than Softmax Self-Attention: A Mixture-of-Experts Perspective	Fanqi Yan et.al.	2502.00281	null
2025-01-31	Pheromone-based Learning of Optimal Reasoning Paths	Anirudh Chari et.al.	2501.19278	null
2025-03-03	Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning	Minh Le et.al.	2501.18936	null
2025-01-30	MolGraph-xLSTM: A graph-based dual-level xLSTM framework with multi-head mixture-of-experts for enhanced molecular representation and interpretability	Yan Sun et.al.	2501.18439	null
2025-02-10	Free Agent in Agent-Based Mixture-of-Experts Generative AI Framework	Jung-Hua Liu et.al.	2501.17903	null
2025-01-29	Heuristic-Informed Mixture of Experts for Link Prediction in Multilayer Networks	Lucio La Cava et.al.	2501.17557	null
2025-01-28	3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow	Yueen Ma et.al.	2501.16698	null
2025-01-27	Searching for GEMS: Discovery and Characterization of Two Brown Dwarfs Around M Dwarfs	Alexander Larsen et.al.	2501.16554	null
2025-02-12	One-for-All Does Not Work! Enhancing Vulnerability Detection by Mixture-of-Experts (MoE)	Xu Yang et.al.	2501.16454	null
2025-01-18	Mixture of Experts (MoE): A Big Data Perspective	Wensheng Gan et.al.	2501.16352	null
2025-01-27	Static Batching of Irregular Workloads on GPUs: Framework and Application to Efficient MoE Model Inference	Yinghan Li et.al.	2501.16103	null
2025-01-25	ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning	Shangqian Gao et.al.	2501.15316	null
2025-03-16	FreqMoE: Enhancing Time Series Forecasting through Frequency Decomposition Mixture of Experts	Ziqi Liu et.al.	2501.15125	link
2025-01-25	Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning	Ziyu Zhao et.al.	2501.15103	null
2025-01-24	Mean-field limit from general mixtures of experts to quantum neural networks	Anderson Melchor Hernandez et.al.	2501.14660	null
2025-01-30	Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation	Shengzhe Zhang et.al.	2501.14269	link
2025-03-12	Sparse Mixture-of-Experts for Non-Uniform Noise Reduction in MRI Images	Zeyun Deng et.al.	2501.14198	null
2025-01-23	CSAOT: Cooperative Multi-Agent System for Active Object Tracking	Hy Nguyen et.al.	2501.13994	null
2025-01-22	Autonomy-of-Experts Models	Ang Lv et.al.	2501.13074	null
2025-02-07	LLM4WM: Adapting LLM for Wireless Multi-Tasking	Xuanyu Liu et.al.	2501.12983	null
2025-01-22	UniUIR: Considering Underwater Image Restoration as An All-in-One Learner	Xu Zhang et.al.	2501.12981	null
2025-01-22	BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR	Guodong Ma et.al.	2501.12602	null
2025-02-26	Modality Interactive Mixture-of-Experts for Fake News Detection	Yifan Liu et.al.	2501.12431	link
2025-01-21	SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection	Xiaocheng Zhang et.al.	2501.12430	null
2025-01-25	Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models	Samira Abnar et.al.	2501.12370	null
2025-01-21	MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks	Qishen Zhou et.al.	2501.12281	link
2025-02-04	Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models	Zihan Qiu et.al.	2501.11873	null
2025-01-18	FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models	Xinglin Pan et.al.	2501.10714	null
2024-12-16	DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference	Yujie Zhang et.al.	2501.10375	null
2025-01-17	OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning	Jinyuan Feng et.al.	2501.10062	null
2025-01-17	LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading	Kuan-Ming Liu et.al.	2501.09636	null
2025-01-16	MoE$^2$: Optimizing Collaborative Inference for Edge Large Language Models	Lyudong Jin et.al.	2501.09410	null
2025-01-14	MiniMax-01: Scaling Foundation Models with Lightning Attention	MiniMax et.al.	2501.08313	null
2025-01-14	Guiding polaritonic energy and momentum through two-dimensional Bravais lattices	Zhonglin Li et.al.	2501.08123	null
2025-02-11	GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism	Chen Tang et.al.	2501.07890	null
2025-01-18	PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration	Xiaoshui Huang et.al.	2501.07762	null
2025-01-13	A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis	Binyu Zhang et.al.	2501.07016	link
2025-01-12	Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning	Hanwen Zhong et.al.	2501.06884	link
2025-01-12	A Comprehensive Evaluation of Large Language Models on Mental Illnesses in Arabic Context	Noureldin Zahran et.al.	2501.06859	null
2025-03-18	TAMER: A Test-Time Adaptive MoE-Driven Framework for EHR Representation Learning	Yinghao Zhu et.al.	2501.05661	link
2025-01-09	Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing	Mengfan Liu et.al.	2501.05313	null
2025-01-07	LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes	Xiang Xu et.al.	2501.04004	link
2025-01-07	mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training	Xudong Liao et.al.	2501.03905	null
2025-01-08	Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection	Donatella Genovese et.al.	2501.03432	null
2025-01-06	Solving the Porous Medium Equation with the eXtreme Mesh deformation approach (X-Mesh)	Alexandre Chemin et.al.	2501.03083	null
2025-01-05	Soft and Compliant Contact-Rich Hair Manipulation and Care	Uksang Yoo et.al.	2501.02630	null
2025-01-12	Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning	Zhongyi Zhou et.al.	2501.02198	null
2025-03-18	MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders	Jiajun Cao et.al.	2501.01709	null
2025-01-01	REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization	Huyen Nguyen et.al.	2501.00779	null
2025-01-06	Superposition in Transformers: A Novel Way of Building Mixture of Experts	Ayoub Ben Chaliah et.al.	2501.00530	link
2024-12-31	CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection	Xiaolei Wang et.al.	2501.00346	null
2024-12-30	SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection	Yuxuan Li et.al.	2412.20665	link
2024-12-29	Multimodal Variational Autoencoder: a Barycentric View	Peijie Qiu et.al.	2412.20487	null
2025-03-05	A Comprehensive Framework for Reliable Legal AI: Combining Specialized Expert Systems and Adaptive Refinement	Sidra Nasir et.al.	2412.20468	null
2024-12-29	Mind the Data Gap: Bridging LLMs to Enterprise Data Integration	Moe Kayali et.al.	2412.20331	null
2025-03-09	UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity	Jingbo Lin et.al.	2412.20157	link
2024-12-28	Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection	Yaning Zhang et.al.	2412.20156	null
2025-02-18	DeepSeek-V3 Technical Report	DeepSeek-AI et.al.	2412.19437	link
2024-12-26	AskChart: Universal Chart Understanding through Textual Enhancement	Xudong Yang et.al.	2412.19146	link
2024-12-30	Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection	Xiaoyu Huang et.al.	2412.19108	null
2024-12-26	DAPoinTr: Domain Adaptive Point Transformer for Point Cloud Completion	Yinghui Li et.al.	2412.19062	link
2025-03-10	Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making	David Shoresh et.al.	2412.18593	link
2024-12-24	BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing	Yingjie Ma et.al.	2412.18065	link
2024-12-23	UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition	Li Fu et.al.	2412.17507	null
2025-02-01	BrainMAP: Learning Multiple Activation Pathways in Brain Networks	Song Wang et.al.	2412.17404	link
2024-12-23	Efficient fine-tuning methodology of text embedding models for information retrieval: contrastive learning penalty (clp)	Jeongsu Yu et.al.	2412.17364	link
2024-12-22	The Fermat curves and arrangements of lines and conics	Nils Peder Astrup Toft et.al.	2412.16993	null
2024-12-22	Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models	Elie Antoine et.al.	2412.16971	null
2024-12-18	GraphLoRA: Empowering LLMs Fine-Tuning via Graph Collaboration of MoE	Ting Bai et.al.	2412.16216	null
2024-12-20	Theory of Mixture-of-Experts for Mobile Edge Computing	Hongbo Li et.al.	2412.15690	null
2024-12-19	MoEtion: Efficient and Reliable Checkpointing for Mixture-of-Experts Models at Scale	Swapnil Gandhi et.al.	2412.15411	null
2025-01-03	Qwen2.5 Technical Report	Qwen et.al.	2412.15115	link
2025-02-27	ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing	Ziteng Wang et.al.	2412.14711	link
2025-01-22	A Survey on Inference Optimization Techniques for Mixture of Experts Models	Jiacheng Liu et.al.	2412.14219	link
2024-12-18	SEKE: Specialised Experts for Keyword Extraction	Matej Martinc et.al.	2412.14087	link
2024-12-18	MedCoT: Medical Chain of Thought via Hierarchical Expert	Jiaxiang Liu et.al.	2412.13736	link
2024-12-17	SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks	Mátyás Vincze et.al.	2412.13053	null
2024-12-17	Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning	Moritz Reuss et.al.	2412.12953	null
2025-01-09	CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition	He Wang et.al.	2412.12760	null
2024-12-16	Investigating Mixture of Experts in Dense Retrieval	Effrosyni Sokli et.al.	2412.11864	null
2024-12-20	Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture	Jingze Shi et.al.	2412.11834	link
2024-12-16	Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation	Svetlana Pavlitska et.al.	2412.11608	link
2024-12-16	Enhancing Healthcare Recommendation Systems with a Multimodal LLMs-based MOE Architecture	Jingyu Xu et.al.	2412.11557	null
2024-12-14	DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification	Yuhao Wang et.al.	2412.10650	link
2024-12-13	DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding	Zhiyu Wu et.al.	2412.10302	link
2024-12-13	Llama 3 Meets MoE: Efficient Upcycling	Aditya Vavre et.al.	2412.09952	link
2024-12-20	Memory Layers at Scale	Vincent-Pierre Berges et.al.	2412.09764	link
2025-01-10	Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine	Xiaoshuang Huang et.al.	2412.09278	link
2024-12-12	MoSLD: An Extremely Parameter-Efficient Mixture-of-Shared LoRAs for Multi-Task Learning	Lulu Zhao et.al.	2412.08946	null
2024-11-26	Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection	Tzu-Ting Yang et.al.	2412.08651	null
2025-01-18	Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective	Minh Le et.al.	2412.08285	null
2025-02-12	Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification	Xuanze Chen et.al.	2412.08193	link
2024-12-10	MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning	Yufei Ma et.al.	2412.07405	null
2024-12-10	Post-Training Statistical Calibration for Higher Activation Sparsity	Vui Seng Chua et.al.	2412.07174	link
2025-03-02	MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems	Yao Fu et.al.	2412.07067	null
2024-12-07	Partition of Unity Physics-Informed Neural Networks (POU-PINNs): An Unsupervised Framework for Physics-Informed Domain Decomposition and Mixtures of Experts	Arturo Rodriguez et.al.	2412.06842	null
2024-12-09	Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset	Xiao Wang et.al.	2412.06647	link
2024-12-09	UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts	Zhen Wan et.al.	2412.06340	null
2024-12-08	Hallucination-aware Optimization for Large Language Model-empowered Communications	Yinqiu Liu et.al.	2412.06007	link
2024-12-10	An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism	Qing Zhang et.al.	2412.05821	null
2024-12-10	RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts	Xu Liu et.al.	2412.05679	link
2024-12-07	SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts	Gengze Zhou et.al.	2412.05552	link
2024-12-07	Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers	Boxun Xu et.al.	2412.05540	null
2024-12-23	Steps are all you need: Rethinking STEM Education with Prompt Engineering	Krishnasai Addala et.al.	2412.05023	null
2024-12-05	Customize Segment Anything Model for Multi-Modal Semantic Segmentation with Mixture of LoRA Experts	Chenyang Zhu et.al.	2412.04220	null
2025-03-02	Monet: Mixture of Monosemantic Experts for Transformers	Jungwoo Park et.al.	2412.04139	link
2024-12-05	Meta-Reinforcement Learning With Mixture of Experts for Generalizable Multi Access in Heterogeneous Wireless Networks	Zhaoyang Liu et.al.	2412.03850	null
2024-12-04	Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond	Loukas Ilias et.al.	2412.03483	null
2024-12-03	CA-MoE: Channel-Adapted MoE for Incremental Weather Forecasting	Hao Chen et.al.	2412.02503	null
2025-02-14	MQFL-FHE: Multimodal Quantum Federated Learning Framework with Fully Homomorphic Encryption	Siddhant Dutta et.al.	2412.01858	null
2025-01-22	Yi-Lightning Technical Report	Alan Wake et.al.	2412.01253	null
2024-11-30	Mixture of Experts for Node Classification	Yu Shi et.al.	2412.00418	null
2025-01-22	HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting	Shaohan Yu et.al.	2412.00316	null
2024-11-27	Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference	Andrii Skliar et.al.	2412.00099	null
2025-02-16	Condense, Don’t Just Prune: Enhancing Efficiency and Performance in MoE Layer Pruning	Mingyu Cao et.al.	2412.00069	link
2024-11-29	LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References	Shuguo Jiang et.al.	2411.19758	null
2024-11-28	On the effectiveness of discrete representations in sparse mixture of experts	Giang Do et.al.	2411.19402	null
2024-11-28	Bayesian Cluster Weighted Gaussian Models	Panagiotis Papastamoulis et.al.	2411.18957	link
2024-11-27	UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMS	Haomin Zhuang et.al.	2411.18797	null
2024-11-27	Complexity Experts are Task-Discriminative Learners for Any Image Restoration	Eduard Zamfir et.al.	2411.18466	null
2024-11-27	Mixture of Experts in Image Classification: What’s the Sweet Spot?	Mathurin Videau et.al.	2411.18322	null
2024-11-26	$H^3$ Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs	Selim Furkan Tekin et.al.	2411.17792	link
2024-11-26	The Tempered Finite Element Method	Antoine Quiriny et.al.	2411.17564	null
2024-11-25	Staleness-Centric Optimizations for Efficient Diffusion MoE Inference	Jiajun Luo et.al.	2411.16786	null
2024-11-29	MH-MoE: Multi-Head Mixture-of-Experts	Shaohan Huang et.al.	2411.16205	null
2024-11-25	LDACP: Long-Delayed Ad Conversions Prediction Model for Bidding Strategy	Peng Cui et.al.	2411.16095	null
2024-11-24	Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution	Haiquan Wang et.al.	2411.15871	null
2024-11-24	LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training	Xiaoye Qu et.al.	2411.15708	link
2024-11-23	Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts	Qizhou Chen et.al.	2411.15432	null
2024-11-23	Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation	Fahao Chen et.al.	2411.15419	null
2024-11-21	Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning	Jiange Yang et.al.	2411.14519	null
2024-11-20	MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification	Yuxuan Chen et.al.	2411.13004	null
2024-11-23	KAAE: Numerical Reasoning for Knowledge Graphs via Knowledge-aware Attributes Learning	Ming Yin et.al.	2411.12950	null
2025-02-06	Ultra-Sparse Memory Network	Zihao Huang et.al.	2411.12364	null
2025-01-28	CNMBERT: A Model for Converting Hanyu Pinyin Abbreviations to Chinese Characters	Zishuo Feng et.al.	2411.11770	link
2024-11-18	MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs	Shiyi Cao et.al.	2411.11217	null
2024-11-16	Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts	Jinqiang Long et.al.	2411.10669	link
2024-11-15	Weakly-Supervised Multimodal Learning on MIMIC-CXR	Andrea Agostini et.al.	2411.10356	link
2024-11-21	Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models	Wei Wang et.al.	2411.10003	null
2024-11-13	Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection	Vima Gupta et.al.	2411.08982	null
2024-11-13	Sparse Upcycling: Inference Inefficient Finetuning	Sasha Doubov et.al.	2411.08968	null
2024-11-13	LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing	Xiaonan Nie et.al.	2411.08446	null
2024-11-12	Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach	Renzi Wang et.al.	2411.08232	null
2024-11-12	PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model	Yilun Liu et.al.	2411.08212	null
2024-11-08	Biodynamic Analysis of Alpine Skiing with a Skier-Ski-Snow Interaction Model	Nan Gao et.al.	2411.08056	null
2024-11-12	Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge	Emmanuel Azuh Mensah et.al.	2411.07834	null
2024-11-11	Adaptive Conditional Expert Selection Network for Multi-domain Recommendation	Kuiyao Dong et.al.	2411.06826	null
2024-11-11	WDMoE: Wireless Distributed Mixture of Experts for Large Language Models	Nan Xue et.al.	2411.06681	null
2024-11-09	Learning Mixtures of Experts with EM	Quentin Fruytier et.al.	2411.06056	null
2024-11-08	NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts	Yen-Ting Lin et.al.	2411.05945	null
2024-11-05	DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts	Zelin Yao et.al.	2411.03025	link
2024-11-05	Advancing Robust Underwater Acoustic Target Recognition through Multi-task Learning and Multi-Gate Mixture-of-Experts	Yuan Xie et.al.	2411.02787	null
2024-11-27	SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models	Jianyi Zhang et.al.	2411.02433	link
2024-11-06	Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent	Xingwu Sun et.al.	2411.02265	null
2024-12-27	FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation	Ziwei Zhan et.al.	2411.02115	null
2024-11-06	Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis	Mohammad Zbeeb et.al.	2411.01929	link
2025-02-10	RS-MoE: A Vision-Language Model with Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering	Hui Lin et.al.	2411.01595	null
2025-02-10	Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation	Mingrui Liu et.al.	2411.01457	null
2024-11-06	HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference	Peng Tang et.al.	2411.01433	null
2024-12-12	HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy	Shuqing Luo et.al.	2411.01288	link
2024-11-02	PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment	Dongxu Liu et.al.	2411.01245	null
2024-11-01	MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition	Cheng Yang et.al.	2411.01016	null
2024-11-01	LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models	Nam V. Nguyen et.al.	2411.00918	link
2024-10-16	TradExpert: Revolutionizing Trading with Mixture of Expert LLMs	Qianggang Ding et.al.	2411.00782	null
2024-11-01	MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffic-Aware Parallel Optimization	Jingming Guo et.al.	2411.00662	link
2024-11-01	A Fast, Analytic Empirical Model of the Gaia Data Release 3 Astrometric Orbit Catalog Selection Function	Casey Y. Lam et.al.	2411.00654	link
2024-10-31	Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts	Xiang Deng et.al.	2410.23836	null
2024-10-30	Efficient and Interpretable Grammatical Error Correction with Mixture of Experts	Muhammad Reza Qorib et.al.	2410.23507	link
2024-10-30	Stealing User Prompts from Mixture of Experts	Itay Yona et.al.	2410.22884	null
2024-10-30	MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning	Xujia Wang et.al.	2410.22782	null
2025-02-08	ProMoE: Fast MoE-based LLM Serving using Proactive Caching	Xiaoniu Song et.al.	2410.22134	null
2024-10-29	Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging	Li Shen et.al.	2410.21804	null
2024-10-29	Neural Experts: Mixture of Experts for Implicit Neural Representations	Yizhak Ben-Shabat et.al.	2410.21643	null
2024-11-07	FinTeamExperts: Role Specialized MOEs For Financial Analysis	Yue Yu et.al.	2410.21338	null
2024-10-28	Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving	Jiyao Wang et.al.	2410.21086	null
2024-10-27	Towards a Blockchain and Opportunistic Edge Driven Metaverse of Everything	Paula Fraga-Lamas et.al.	2410.20594	null
2024-10-27	Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation	Maohao Shen et.al.	2410.20336	null
2024-10-27	GUMBEL-NERF: Representing Unseen Objects as Part-Compositional Neural Radiance Fields	Yusuke Sekikawa et.al.	2410.20306	null
2024-11-12	LLMs Can Evolve Continually on Modality for X-Modal Reasoning	Jiazuo Yu et.al.	2410.20178	link
2024-10-25	DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unsupervised Dimensionality Reduction	Zelin Zang et.al.	2410.19504	link
2025-01-27	Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis	Weikai Li et.al.	2410.19225	link
2024-10-24	Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design	Ruisi Cai et.al.	2410.19123	link
2024-10-24	Mixture of Parrots: Experts improve memorization more than reasoning	Samy Jelassi et.al.	2410.19034	null
2024-10-24	MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases	Zhisheng Lin et.al.	2410.18406	null
2024-10-23	Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches	Kexin Feng et.al.	2410.18298	null
2024-10-23	MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning	Jingfan Zhang et.al.	2410.18035	null
2024-10-23	ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference	Xin He et.al.	2410.17954	null
2024-10-23	Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition	Artem Basharin et.al.	2410.17765	null
2024-10-22	Optimizing Mixture-of-Experts Inference Time Combining Model Deployment and Communication Scheduling	Jialong Li et.al.	2410.17043	null
2024-10-21	LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset	Ruikun Zhang et.al.	2410.16095	link
2024-10-22	CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts	Zhenpeng Su et.al.	2410.16077	link
2024-10-29	Generalizing Motion Planners with Mixture of Experts for Autonomous Driving	Qiao Sun et.al.	2410.15774	link
2024-11-23	ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts	Xumeng Han et.al.	2410.15732	null
2024-10-20	Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs	Xin Zhou et.al.	2410.15438	null
2024-11-16	LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration	Yuang Ai et.al.	2410.15385	link
2024-10-19	MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning	Suning Huang et.al.	2410.14972	null
2024-10-29	Collaboratively adding new knowledge to an LLM	Rhui Dih Lee et.al.	2410.14753	link
2024-10-18	MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts	Rachel S. Y. Teo et.al.	2410.14574	link
2024-10-18	Towards a Simple and Extensible Standard for Object-Centric Event Data (OCED) – Core Model, Design Space, and Lessons Learned	Dirk Fahland et.al.	2410.14495	link
2024-10-18	ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction	Haoyu He et.al.	2410.14099	link
2024-10-17	Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in Compositional Tasks	Jinze Zhao et.al.	2410.13964	null
2024-10-18	MoR: Mixture of Ranks for Low-Rank Adaptation Tuning	Chuanyu Tang et.al.	2410.13408	null
2024-10-16	Satellite-Terrestrial Quantum Networks and the Global Quantum Internet	Andrea Conti et.al.	2410.13096	null
2024-10-16	On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs	Herun Wan et.al.	2410.12600	null
2024-10-16	Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion	Minkyoung Cho et.al.	2410.12592	null
2024-10-16	Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts	Fanqi Yan et.al.	2410.12258	null
2025-01-03	EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference	Yulei Qian et.al.	2410.12247	null
2024-10-15	MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router	Yanyue Xie et.al.	2410.12013	null
2024-10-15	MoH: Multi-Head Attention as Mixture-of-Head Attention	Peng Jin et.al.	2410.11842	link
2024-10-15	GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation	Fei Tang et.al.	2410.11841	link
2024-10-15	Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models	James Vo et.al.	2410.11654	null
2024-10-16	Quadratic Gating Functions in Mixture of Experts: A Statistical Insight	Pedram Akbarian et.al.	2410.11222	null
2024-10-19	AT-MoE: Adaptive Task-planning Mixture of Experts via LoRA Approach	Xurui Li et.al.	2410.10896	null
2024-10-01	Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models	Keivan Alizadeh et.al.	2410.10846	null
2024-10-16	Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free	Ziyue Li et.al.	2410.10814	link
2024-10-14	Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts	Guorui Zheng et.al.	2410.10626	link
2024-10-14	Learning to Ground VLMs without Forgetting	Aritra Bhowmik et.al.	2410.10491	null
2024-10-14	Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts	Xu Liu et.al.	2410.10469	null
2024-10-15	Ada-K Routing: Boosting the Efficiency of MoE-based LLMs	Tongtian Yue et.al.	2410.10456	null
2024-10-14	Tighter Risk Bounds for Mixtures of Experts	Wissam Akretche et.al.	2410.10397	null
2024-10-24	Scalable Multi-Domain Adaptation of Language Models using Modular Experts	Peter Schafhalter et.al.	2410.10181	null
2024-10-16	Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models	Jun Luo et.al.	2410.10114	null
2024-10-14	AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality	Peijun Qing et.al.	2410.10054	link
2024-10-13	ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL	Zhanqiu Guo et.al.	2410.09781	null
2024-10-13	MoIN: Mixture of Introvert Experts to Upcycle an LLM	Ajinkya Tejankar et.al.	2410.09687	null
2024-10-12	GETS: Ensemble Temperature Scaling for Calibration in Graph Neural Networks	Dingyi Zhuang et.al.	2410.09570	null
2024-10-11	Semi-Supervised Learning of Noisy Mixture of Experts Models	Oh-Ran Kwon et.al.	2410.09039	null
2024-10-11	Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering	I-Chun Chen et.al.	2410.08589	null
2024-10-31	Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts	Sukwon Yun et.al.	2410.08245	link
2024-11-20	Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training	Gen Luo et.al.	2410.08202	null
2024-10-10	Efficient Dictionary Learning with Switch Sparse Autoencoders	Anish Mudide et.al.	2410.08201	link
2024-10-18	More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing	Sagi Shaier et.al.	2410.08003	null
2024-10-10	SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture	Jiayi Han et.al.	2410.07739	null
2024-10-10	Upcycling Large Language Models into Mixture of Experts	Ethan He et.al.	2410.07524	null
2024-10-09	User Feedback in Continuous Software Engineering: Revealing the State-of-Practice	Anastasiia Tkalich et.al.	2410.07459	null
2024-10-09	MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts	Peng Jin et.al.	2410.07348	link
2024-10-04	A Dynamic Approach to Stock Price Prediction: Comparing RNN and Mixture of Experts Models Across Different Volatility Profiles	Diego Vallarino et.al.	2410.07234	null
2024-10-09	Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders	David Noever et.al.	2410.06462	null
2024-10-09	Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs	Ruijia Niu et.al.	2410.06431	null
2024-10-08	Probing the Robustness of Theory of Mind in Large Language Models	Christian Nickel et.al.	2410.06271	null
2024-10-08	MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More	Wei Huang et.al.	2410.06270	link
2024-12-17	Aria: An Open Multimodal Native Mixture-of-Experts Model	Dongxu Li et.al.	2410.05993	link
2024-10-08	Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models	Siqi Wang et.al.	2410.05661	null
2024-12-05	Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild	Xinyu Zhao et.al.	2410.05357	link
2024-10-07	Multimodal Fusion Strategies for Mapping Biophysical Landscape Features	Lucia Gordon et.al.	2410.04833	link
2024-10-06	Realizing Video Summarization from the Path of Language-based Semantic Understanding	Kuan-Chen Mu et.al.	2410.04511	null
2024-10-09	Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding	Wei Wu et.al.	2410.03553	null
2024-10-04	Exploring the Benefit of Activation Sparsity in Pre-training	Zhengyan Zhang et.al.	2410.03440	link
2024-10-03	MLP-KAN: Unifying Deep Representation and Function Learning	Yunhong He et.al.	2410.03027	link
2024-10-03	On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions	Huy Nguyen et.al.	2410.02935	null
2024-10-03	Neutral residues: revisiting adapters for model extension	Franck Signe Talla et.al.	2410.02744	null
2024-10-03	Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping	Ziye Huang et.al.	2410.02475	null
2024-10-03	MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction	Zhaojian Yu et.al.	2410.02241	null
2024-10-03	Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts	Minh Le et.al.	2410.02200	null
2024-10-04	Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices	Andres Potapczynski et.al.	2410.02117	link
2024-10-04	EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing	Haotian Sun et.al.	2410.02098	null
2024-10-02	Don’t flatten, tokenize! Unlocking the key to SoftMoE’s efficacy in deep RL	Ghada Sokar et.al.	2410.01930	null
2024-09-15	Integrating AI’s Carbon Footprint into Risk Management Frameworks: Strategies and Tools for Sustainable Compliance in Banking Sector	Nataliya Tkachenko et.al.	2410.01818	null
2024-10-02	Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models	Shayekh Bin Islam et.al.	2410.01782	link
2024-10-02	TIC 290061484: A Triply Eclipsing Triple System with the Shortest Known Outer Period of 24.5 Days	Veselin B. Kostov et.al.	2410.01711	null
2024-10-02	Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging	Tingfeng Hui et.al.	2410.01610	null
2024-10-02	The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs	Hong Li et.al.	2410.01417	null
2024-10-01	MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards	Sheng Wang et.al.	2410.00938	null
2024-10-01	UniAdapt: A Universal Adapter for Knowledge Calibration	Tai D. Nguyen et.al.	2410.00454	null
2024-10-01	Robust Traffic Forecasting against Spatial Shift over Years	Hongjun Wang et.al.	2410.00373	link
2024-09-29	IDEA: An Inverse Domain Expert Adaptation Based Active DNN IP Protection Method	Chaohui Xu et.al.	2410.00059	null
2024-09-30	MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning	Haotian Zhang et.al.	2409.20566	null
2024-09-30	HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models	Bingshen Mu et.al.	2409.19878	null
2024-10-02	CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling	Jihai Zhang et.al.	2409.19291	link
2024-11-12	SciDFM: A Large Language Model with Mixture-of-Experts for Science	Liangtai Sun et.al.	2409.18412	null
2024-11-01	Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE	Xun Zhu et.al.	2409.17508	link
2024-09-26	A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction	Guangyu Wang et.al.	2409.17440	link
2024-09-24	Leveraging Mixture of Experts for Improved Speech Deepfake Detection	Viola Negroni et.al.	2409.16077	null
2024-10-02	Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts	Xiaoming Shi et.al.	2409.16040	link
2024-10-31	Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM	Fengrun Zhang et.al.	2409.15905	null
2024-09-24	Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks	Jiayi He et.al.	2409.15695	null
2024-12-13	A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts	Hugo Inzirillo et.al.	2409.15161	link
2024-09-23	Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond	Hong Chen et.al.	2409.14993	null
2024-09-21	Routing in Sparsely-gated Language Models responds to Context	Stefan Arnold et.al.	2409.14107	null
2024-10-01	On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists	Dongyang Fan et.al.	2409.13931	link
2024-09-20	Multi-omics data integration for early diagnosis of hepatocellular carcinoma (HCC) using machine learning	Annette Spooner et.al.	2409.13791	null
2024-09-19	On the rationality problem for hypersurfaces	Jan Lange et.al.	2409.12834	null
2024-09-19	Retrieval-Augmented Test Generation: How Far Are We?	Jiho Shin et.al.	2409.12682	null
2024-09-19	Robust Audiovisual Speech Recognition Models with Mixture-of-Experts	Yihan Wu et.al.	2409.12370	null
2024-09-18	Mixture of Diverse Size Experts	Manxi Sun et.al.	2409.12210	null
2024-09-18	GRIN: GRadient-INformed MoE	Liyuan Liu et.al.	2409.12136	null
2024-09-18	Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0	Zhiyong Wang et.al.	2409.11909	null
2024-09-17	LPT++: Efficient Training on Mixture of Long-tailed Experts	Bowen Dong et.al.	2409.11323	null
2024-12-09	LOLA – An Open-Source Massively Multilingual Large Language Model	Nikit Srivastava et.al.	2409.11272	link
2024-09-16	Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression	Yi-Hsin Li et.al.	2409.10101	null
2024-11-20	MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving	Enming Zhang et.al.	2409.07267	link
2024-09-10	DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models	Maryam Akhavan Aghdam et.al.	2409.06669	null
2024-09-10	STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning	Jaeseong Lee et.al.	2409.06211	null
2024-10-31	VE: Modeling Multivariate Time Series Correlation with Variate Embedding	Shangjiong Wang et.al.	2409.06169	link
2024-09-09	Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models	Hongyang Lei et.al.	2409.05929	null
2024-09-09	Optical Spiking Neurons Enable High-Speed and Energy-Efficient Optical Neural Networks	Bo Xu et.al.	2409.05726	null
2024-09-09	Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection	Tianwu Lei et.al.	2409.05611	null
2024-09-06	Hot Stars in the GALEX Ultraviolet Sky Surveys (GUVcat_AISxSDSS_HS) and the Binary Fraction of Hot Evolved Stars	Luciana Bianchi et.al.	2409.04626	null
2024-09-05	Interpretable mixture of experts for time series prediction under recurrent and non-recurrent conditions	Zemian Ke et.al.	2409.03282	null
2024-09-05	ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding	Zhengzhuo Xu et.al.	2409.03277	null
2024-09-05	xLAM: A Family of Large Action Models to Empower AI Agent Systems	Jianguo Zhang et.al.	2409.03215	link
2024-09-04	Configurable Foundation Models: Building LLMs from a Modular Perspective	Chaojun Xiao et.al.	2409.02877	null
2024-09-04	Pluralistic Salient Object Detection	Xuelu Feng et.al.	2409.02368	null
2024-09-03	OLMoE: Open Mixture-of-Experts Language Models	Niklas Muennighoff et.al.	2409.02060	link
2024-09-05	Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model	Hukai Huang et.al.	2409.02050	null
2024-09-03	BEAVER: An Enterprise Benchmark for Text-to-SQL	Peter Baile Chen et.al.	2409.02038	null
2024-09-03	Laser: Parameter-Efficient LLM Bi-Tuning for Sequential Recommendation with Collaborative Information	Xinyu Zhang et.al.	2409.01605	null
2024-09-02	Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning	Soumajyoti Sarkar et.al.	2409.01483	null
2024-09-02	Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching	Sungmin Yun et.al.	2409.01141	null
2024-09-04	Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack	Guanzhong Chen et.al.	2409.00960	link
2024-09-02	Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts	Youngseog Chung et.al.	2409.00879	null
2024-09-11	Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts	Rhui Dih Lee et.al.	2408.17280	null
2024-08-29	Gradient-free variational learning with conditional mixture networks	Conor Heins et.al.	2408.16429	link
2024-09-07	Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models	Yuncheng Yang et.al.	2408.15915	link
2024-08-28	Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts	Nikolas Gritsch et.al.	2408.15901	null
2024-10-23	LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation	Fangxun Shu et.al.	2408.15881	link
2024-08-28	Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts	Lean Wang et.al.	2408.15664	null
2024-08-27	Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis	Sakhinana Sagar Srinivas et.al.	2408.15305	null
2024-08-28	A Survey of Large Language Models for European Languages	Wazir Ali et.al.	2408.15040	null
2024-08-27	MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce	Hao Jiang et.al.	2408.14968	null
2024-08-24	Advancing Enterprise Spatio-Temporal Forecasting Applications: Data Mining Meets Instruction Tuning of Language Models For Multi-modal Time Series Analysis in Low-Resource Settings	Sagar Srinivas Sakhinana et.al.	2408.13622	null
2024-09-11	Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler	Yikang Shen et.al.	2408.13359	null
2024-10-30	The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities	Venkatesh Balavadhani Parthasarathy et.al.	2408.13296	null
2024-08-23	Guiding IoT-Based Healthcare Alert Systems with Large Language Models	Yulan Gao et.al.	2408.13071	null
2024-08-23	O-Mamba: O-shape State-Space Model for Underwater Image Enhancement	Chenyu Dong et.al.	2408.12816	link
2024-08-23	DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation	Xiaowei Mao et.al.	2408.12809	null
2024-08-23	Multi-Treatment Multi-Task Uplift Modeling for Enhancing User Growth	Yuxiang Wei et.al.	2408.12803	null
2024-08-23	La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection	Hang Zou et.al.	2408.12793	null
2024-10-02	SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging	Mohammadreza Pourreza et.al.	2408.12733	null
2024-08-22	Jamba-1.5: Hybrid Transformer-Mamba Models at Scale	Jamba Team et.al.	2408.12570	null
2024-09-09	Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators	Dingkang Yang et.al.	2408.12325	null
2024-08-15	FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models	Zhongyu Zhao et.al.	2408.11855	link
2024-08-21	MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing	Hao Zhou et.al.	2408.11396	link
2024-08-21	KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting?	Xiao Han et.al.	2408.11306	link
2024-08-21	FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts	Hanzi Mei et.al.	2408.11304	null
2024-08-27	Unboxing Occupational Bias: Grounded Debiasing of LLMs with U.S. Labor Data	Atmika Gorti et.al.	2408.11247	null
2024-08-25	Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting	Jianxiang Zhou et.al.	2408.10822	link
2024-08-20	AnyGraph: Graph Foundation Model in the Wild	Lianghao Xia et.al.	2408.10700	link
2024-08-20	HMoE: Heterogeneous Mixture of Experts for Language Modeling	An Wang et.al.	2408.10681	null
2024-08-19	AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference	Shuzhang Zhong et.al.	2408.10284	link
2024-10-29	FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models	Xiaochen Wang et.al.	2408.10276	link
2024-08-26	SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models	Anke Tang et.al.	2408.10174	link
2024-11-01	Customizing Language Models with Instance-wise LoRA for Sequential Recommendation	Xiaoyu Kong et.al.	2408.10159	link
2024-08-19	A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method	Hang Zou et.al.	2408.09752	null
2024-08-16	Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection	Haohao Zhu et.al.	2408.08551	null
2024-08-17	BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts	Qizhen Zhang et.al.	2408.08274	null

Speculative Decoding

Publish Date	Title	Authors	PDF	Code
2025-07-22	Collaborative Inference and Learning between Edge SLMs and Cloud LLMs: A Survey of Algorithms, Execution, and Open Challenges	Senyao Li et.al.	2507.16731	null
2025-07-22	Enhancing Compiler Optimization Efficiency through Grammatical Decompositions of Control-Flow Graphs	Xuran Cai et.al.	2507.16660	null
2025-07-22	Ly$α$ Emission from [OIII] Emitters Near Reionization: The role of environment in galaxy Ly$α$ detection	Seyedazim Hashemi et.al.	2507.16231	null
2025-07-20	Designing Robots with, not for: A Co-Design Framework for Empowering Interactions in Forensic Psychiatry	Qiaoqiao Ren et.al.	2507.14931	null
2025-07-18	On the asymptotic equidistribution of word values in symmetric groups	Vadim Alekseev et.al.	2507.13928	null
2025-07-22	Gravity and the Higgs boson mass	Carlo Branchina et.al.	2507.13832	null
2025-07-16	Modeling Feasible Locomotion of Nanobots for Cancer Detection and Treatment	Noble Harasha et.al.	2507.12400	null
2025-07-16	Efficient Control Flow Attestation by Speculating on Control Flow Path Representations	Liam Tyler et.al.	2507.12345	null
2025-07-17	DSSD: Efficient Edge-Device LLM Deployment and Collaborative Inference via Distributed Split Speculative Decoding	Jiahong Ning et.al.	2507.12000	null
2025-07-16	Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential	Mohammad Samragh et.al.	2507.11851	null
2025-07-16	Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI	Samyam Rajbhandari et.al.	2507.11830	null
2025-07-14	Exploring ultra-high energy neutrino experiments through the lens of the transport equation	Stefano Palmisano et.al.	2507.10665	null
2025-07-14	Large Interconnected Thermodynamic Systems Nearly Minimize Entropy Production	Kyle J. Ray et.al.	2507.10476	null
2025-07-14	Supernova-induced binary-interaction-powered supernovae: a model for SN2022jli	Ryosuke Hirai et.al.	2507.09974	null
2025-07-12	TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding	Shukai Gong et.al.	2507.09252	null
2025-07-21	Bringing the Norma Dark Cloud to Light in X-rays	Stephen L. Skinner et.al.	2507.09047	null
2025-07-11	On Evaluating Performance of LLM Inference Serving Systems	Amey Agrawal et.al.	2507.09019	null
2025-07-10	Greening Schoolyards and the Spatial Distribution of Property Values in Denver, Colorado	Mahshid Gorjian et.al.	2507.08894	null
2025-07-11	BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity	Chenyang Song et.al.	2507.08771	null
2025-07-11	Time Variation in the TeV Cosmic Ray Anisotropy with IceCube and Energy Dependence of the Solar Dipole	Perri Zilberman et.al.	2507.08242	null
2025-07-08	Optically Overluminous Tidal Disruption Events: Outflow Properties and Implications for Extremely Relativistic Disruptions	Yuhan Yao et.al.	2507.06453	null
2025-07-08	Experiments to test the hypothesis for solar and dark matter axions	Babette Döbrich et.al.	2507.06414	null
2025-07-08	Supernovae from stellar mergers and accretors of binary mass transfer: Implications for Type IIP, 1987A-like and interacting supernovae	F. R. N. Schneider et.al.	2507.06391	null
2025-07-08	Bouncing Grains Keep Protoplanetary Disks Bright	Yansong Qian et.al.	2507.06298	null
2025-07-08	Tropical Donagi theorem	Felix Röhrle et.al.	2507.05987	null
2025-07-04	Impact of flavor condensate dark matter on accretion disk luminosity in spherical spacetimes	Antonio Capolupo et.al.	2507.03758	null
2025-06-18	Evolution, Future of AI, and Singularity	Zeki Doruk Erden et.al.	2507.02876	null
2025-07-03	NVIDIA GPU Confidential Computing Demystified	Zhongshu Gu et.al.	2507.02770	null
2025-07-03	OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding	Ramchalam Kinattinkara Ramakrishnan et.al.	2507.02659	null
2025-07-03	High-Order Deep Meta-Learning with Category-Theoretic Interpretation	David H. Mguni et.al.	2507.02634	null
2025-07-14	FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference	Xing Liu et.al.	2507.02620	null
2025-07-02	H.E.S.S. programme searching for VHE gamma rays associated with FRBs	F. Aharonian et.al.	2507.02143	null
2025-07-07	Handling out-of-order input arrival in CEP engines on the edge combining optimistic, pessimistic and lazy evaluation	Styliani Kyrama et.al.	2507.01461	null
2025-07-02	LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation	Tianyu Liu et.al.	2507.01449	null
2025-07-01	Quantize-Sample-and-Verify: LLM Acceleration via Adaptive Edge-Cloud Speculative Decoding	Guangyi Zhang et.al.	2507.00605	null
2025-06-30	User Concerns Regarding Social Robots for Mood Regulation: A Case Study on the “Sunday Blues”	Zhuochao Peng et.al.	2507.00271	null
2025-07-08	Fully Parallelized BP Decoding for Quantum LDPC Codes Can Outperform BP-OSD	Ming Wang et.al.	2507.00254	null
2025-06-30	Metal-poor single Wolf-Rayet stars: the interplay of optically thick winds and rotation	Lumen Boco et.al.	2507.00137	null
2025-06-30	Segmented Operations using Matrix Multiplications	Aleksandros Sobczyk et.al.	2506.23906	null
2025-06-29	From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows	Mohamed Amine Ferrag et.al.	2506.23260	null
2025-06-28	Polar alignment of a circumbinary disc around a brown dwarf binary	Jeremy L. Smallwood et.al.	2506.22747	null
2025-07-03	VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs	Raghavv Goel et.al.	2506.22694	null
2025-06-27	QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization	Danush Khanna et.al.	2506.22396	null
2025-07-10	Cool Gas in the Circumgalactic Medium of Massive Post Starburst Galaxies	Zoe Harvey et.al.	2506.22287	null
2025-06-26	Small Encoders Can Rival Large Decoders in Detecting Groundedness	Istabrak Abbes et.al.	2506.21288	null
2025-06-26	You never have enough J/$ψ$ events: the case for a J/$ψ$ factory	Stephen Lars Olsen et.al.	2506.20975	null
2025-06-17	Utility-Driven Speculative Decoding for Mixture-of-Experts	Anish Saxena et.al.	2506.20675	null
2025-07-09	Charged rotating quantum black holes	Dyuman Bhattacharya et.al.	2506.19941	null
2025-06-23	Entangled Quantum Negative Energy Teleportation as a Probe of Semiclassical Gravity	Daniel S. Zachary et.al.	2506.19878	null
2025-06-24	Scaling Speculative Decoding with Lookahead Reasoning	Yichao Fu et.al.	2506.19830	null
2025-06-23	LLMs on a Budget? Say HOLA	Zohaib Hasan Siddiqui et.al.	2506.18952	null
2025-07-10	The Full Nonlinear Vortex Tube-Vorton Method: the post-stall condition	Jesus Carlos Pimentel-Garcia et.al.	2506.18719	null
2025-06-17	Semantic uncertainty in advanced decoding methods for LLM generation	Darius Foodeei et.al.	2506.17296	null
2025-07-08	Capturing Misalignment	Pierfrancesco Guarino et.al.	2506.17176	null
2025-06-20	ReasonGRM: Enhancing Generative Reward Models through Large Reasoning Models	Bin Chen et.al.	2506.16712	null
2025-07-02	Rethinking LLM Training through Information Geometry and Quantum Metrics	Riccardo Di Sipio et.al.	2506.15830	null
2025-06-15	$\texttt{SPECS}$: Faster Test-Time Scaling through Speculative Drafts	Mert Cemri et.al.	2506.15733	null
2025-06-18	CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies	Donghyun Gouk et.al.	2506.15601	null
2025-06-18	PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction	Shufan Li et.al.	2506.15556	null
2025-06-17	Optimistic MEV in Ethereum Layer 2s: Why Blockspace Is Always in Demand	Ozan Solmaz et.al.	2506.14768	null
2025-06-17	S$^4$C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models	Tao He et.al.	2506.14158	null
2025-06-16	Stimulus Motion Perception Studies Imply Specific Neural Computations in Human Visual Stabilization	David W Arathorn et.al.	2506.13506	null
2025-06-21	Exploring the Secondary Risks of Large Language Models	Jiawei Chen et.al.	2506.12382	null
2025-06-14	Quantum Machine Learning	Muhammad Usman et.al.	2506.12292	null
2025-06-13	Fluid-induced snap-through instability of spherical shells	Pier Giuseppe Ledda et.al.	2506.12247	null
2025-06-13	Eliciting Reasoning in Language Models with Cognitive Tools	Brown Ebouky et.al.	2506.12115	null
2025-06-12	SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding	Ziyi Zhang et.al.	2506.11309	null
2025-06-11	Speculative Design in Spiraling Time: Methods and Indigenous HCI	James Eschrich et.al.	2506.10229	null
2025-06-11	V455 Car: an oscillating eclipsing Algol-type binary in triple star system	Zhao-Long Deng et.al.	2506.10124	null
2025-06-11	Patterns of Patterns III	Joseph Corneli et.al.	2506.09696	null
2025-07-13	SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving	Xiangchen Li et.al.	2506.09397	null
2025-06-11	A collection of results relating the geometry of plane domains and the exit time of planar Brownian motion, II	Greg Markowsky et.al.	2506.09364	null
2025-07-19	Draft-based Approximate Inference for LLMs	Kevin Galim et.al.	2506.08373	link
2025-06-10	Solving Convex-Concave Problems with $\tilde{\mathcal{O}}(ε^{-4/7})$ Second-Order Oracle Complexity	Lesi Chen et.al.	2506.08362	null
2025-06-09	MiniCPM4: Ultra-Efficient LLMs on End Devices	MiniCPM Team et.al.	2506.07900	link
2025-06-09	FREESS: An Educational Simulator of a RISC-V-Inspired Superscalar Processor Based on Tomasulo’s Algorithm	Roberto Giorgi et.al.	2506.07665	link
2025-06-09	LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments	Jin Huang et.al.	2506.07416	null
2025-06-08	Exploiting Inaccurate Branch History in Side-Channel Attacks	Yuhui Zhu et.al.	2506.07263	null
2025-06-07	Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit	Charles Goddard et.al.	2506.06607	null
2025-06-06	Fake Friends and Sponsored Ads: The Risks of Advertising in Conversational Search	Jacob Erickson et.al.	2506.06447	null
2025-07-08	On the Fundamental Impossibility of Hallucination Control in Large Language Models	Michał P. Karpowicz et.al.	2506.06382	null
2025-06-06	Disk Evolution Study Through Imaging of Nearby Young Stars (DESTINYS): Evidence of planet-disk interaction in the 2MASSJ16120668-3010270 system	C. Ginski et.al.	2506.05892	null
2025-06-10	Gumbel-max List Sampling for Distribution Coupling with Multiple Samples	Joseph Rowan et.al.	2506.05632	null
2025-06-05	Accelerated Test-Time Scaling with Model-Free Speculative Sampling	Woomin Song et.al.	2506.04708	null
2025-06-04	Guided Speculative Inference for Efficient Test-Time Alignment of LLMs	Jonathan Geuter et.al.	2506.04118	link
2025-06-04	The Causal-Noncausal Tail Processes: An Introduction	Christian Gouriéroux et.al.	2506.04046	null
2025-06-04	AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism	Zhepei Wei et.al.	2506.03700	link
2025-06-04	POSS: Position Specialist Generates Better Draft for Speculative Decoding	Langlin Huang et.al.	2506.03566	link
2025-06-02	Out-of-Vocabulary Sampling Boosts Speculative Decoding	Nadav Timor et.al.	2506.03206	null
2025-06-03	Feedstack: Layering Structured Representations over Unstructured Feedback to Scaffold Human AI Conversation	Hannah Vy Nguyen et.al.	2506.03052	null
2025-06-03	Reuse or Generate? Accelerating Code Editing via Edit-Oriented Speculative Decoding	Peiding Wang et.al.	2506.02780	null
2025-06-28	Multi Layered Autonomy and AI Ecologies in Robotic Art Installations	Baoyang Chen et.al.	2506.02606	null
2025-06-03	Consultant Decoding: Yet Another Synergistic Mechanism	Chuanghao Ding et.al.	2506.02391	null
2025-06-02	Radiation GRMHD Models of Accretion onto Stellar-Mass Black Holes: I. Survey of Eddington Ratios	Lizhong Zhang et.al.	2506.02289	null
2025-05-16	SpecMemo: Speculative Decoding is in Your Pocket	Selin Yildirim et.al.	2506.01986	null
2025-05-16	Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism	Yuhao Shen et.al.	2506.01979	null
2025-06-02	Synchronic Web Digital Identity: Speculations on the Art of the Possible	Thien-Nam Dinh et.al.	2506.01856	null
2025-07-04	Playing with Transformer at 30+ FPS via Next-Frame Diffusion	Xinle Cheng et.al.	2506.01380	null
2025-06-02	Shape Shifting Light Dark Matter Solitons	Dor Ben-Amotz et.al.	2506.01282	null
2025-06-01	The $M_{\rm BH}-M_\star$ Relation of the hyperluminous Dust-obscured Quasars up to $z \sim 4$	Yibin Luo et.al.	2506.01218	null
2025-06-01	Mamba Drafters for Speculative Decoding	Daewon Choi et.al.	2506.01206	null
2025-06-01	The Inverse Scaling Effect of Pre-Trained Language Model Surprisal Is Not Due to Data Leakage	Byung-Doh Oh et.al.	2506.01172	null
2025-05-31	Accelerating Diffusion LLMs via Adaptive Parallel Decoding	Daniel Israel et.al.	2506.00413	null
2025-05-31	Speculative Reward Model Boosts Decision Making Ability of LLMs Cost-Effectively	Jiawei Gu et.al.	2506.00396	link
2025-05-30	Cross-Attention Speculative Decoding	Wei Zhong et.al.	2505.24544	null
2025-05-30	CLaSp: In-Context Layer Skip for Self-Speculative Decoding	Longze Chen et.al.	2505.24196	null
2025-06-10	Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism	Jinhui Wei et.al.	2505.23219	null
2025-05-28	Pre-Training Curriculum for Multi-Token Prediction in Language Models	Ansar Aynetdinov et.al.	2505.22757	link
2025-05-28	Mass-feeding of jet-launching white dwarfs in grazing and common envelope evolution	Noam Soker et.al.	2505.22621	null
2025-05-29	Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design	Yudi Zhang et.al.	2505.22179	link
2025-05-28	RAD: Redundancy-Aware Distillation for Hybrid Models via Self-Speculative Decoding	Yuichiro Hoshino et.al.	2505.22135	null
2025-05-28	Robust and Symmetric Magnetic Field Dependency of Superconducting Diode Effect in Asymmetric Dirac Semimetal SQUIDs	H. C. Travaglini et.al.	2505.21861	null
2025-05-27	Computocene: Notes from an Age of Observation	Simone Severini et.al.	2505.21744	null
2025-05-27	Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits	Yeshwanth Venkatesha et.al.	2505.21594	null
2025-05-27	Hardware-Efficient Attention for Fast Decoding	Ted Zadouri et.al.	2505.21487	null
2025-05-27	Pair binding and Hund’s rule breaking in high-symmetry fullerenes	R. Rausch et.al.	2505.21455	null
2025-05-28	Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity	Yehui Tang et.al.	2505.21411	null
2025-05-27	Repeated Auctions with Speculators: Arbitrage Incentives and Forks in DAOs	Nicolas Eschenbaum et.al.	2505.21296	null
2025-05-27	SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences	Jungyoub Cha et.al.	2505.20776	link
2025-05-27	Replication of Reference-Dependent Preferences and the Risk-Return Trade-Off in the Chinese Market	Penggan Xu et.al.	2505.20608	null
2025-05-26	Academic Research Output Derivatives: Structuring Futures and Options on Research Output Index	Amarendra Sharma et.al.	2505.20492	null
2025-05-26	Bounded cohomology, quotient extensions, and hierarchical hyperbolicity	Francesco Fournier-Facio et.al.	2505.20462	null
2025-05-26	HAMburger: Accelerating LLM Inference via Token Smashing	Jingyu Liu et.al.	2505.20438	null
2025-05-23	Reinforcement Speculative Decoding for Fast Ranking	Yingpeng Du et.al.	2505.20316	null
2025-06-13	MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE	Zongle Huang et.al.	2505.19645	null
2025-05-28	Faster and Better LLMs via Latency-Aware Test-Time Scaling	Zili Wang et.al.	2505.19634	null
2025-07-23	Turing Test 2.0: The General Intelligence Threshold	Georgios Mappouras et.al.	2505.19550	null
2025-05-29	DREAM: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding	Yunhai Hu et.al.	2505.19201	link
2025-05-25	Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs	Xuan Zhang et.al.	2505.19155	null
2025-05-24	Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding	Yixuan Wang et.al.	2505.18629	null
2025-05-23	VeriThinker: Learning to Verify Makes Reasoning Model Efficient	Zigeng Chen et.al.	2505.17941	link
2025-05-20	Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency	Ruixiao Li et.al.	2505.17074	null
2025-05-16	SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs	Jinwoo Park et.al.	2505.17052	null
2025-05-22	KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization	Mingbo Song et.al.	2505.16162	null
2025-05-21	Strong Hilbert space fragmentation and fractons from subsystem and higher-form symmetries	Charles Stahl et.al.	2505.15889	null
2025-05-21	Quasinormal Modes of Schwarzschild Black Holes in the Dehnen-(1, 4, 5/2) Type Dark Matter Halos	Qi-Qi Liang et.al.	2505.15540	null
2025-06-03	Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding	Zijian Lin et.al.	2505.15380	null
2025-05-21	SSR: Speculative Parallel Scaling Reasoning in Test-time	Yuanlin Chu et.al.	2505.15340	null
2025-05-21	BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms	Yunlong Hou et.al.	2505.15141	null
2025-05-20	STree: Speculative Tree Decoding for Hybrid State-Space Models	Yangchao Wu et.al.	2505.14969	null
2025-05-20	On the Day They Experience: Awakening Self-Sovereign Experiential AI Agents	Botao Amber Hu et.al.	2505.14893	null
2025-05-20	Unremarkable to Remarkable AI Agent: Exploring Boundaries of Agent Intervention for Adults With and Without Cognitive Impairment	Mai Lee Chang et.al.	2505.14872	null
2025-05-20	X-ray properties of compact elliptical galaxies	Orsolya E. Kovacs et.al.	2505.14768	null
2025-05-20	Speculative Decoding Reimagined for Multimodal Large Language Models	Luxi Lin et.al.	2505.14260	link
2025-05-19	Language and Thought: The View from LLMs	Daniel Rothschild et.al.	2505.13561	null
2025-05-19	HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding	Siran Liu et.al.	2505.13254	null
2025-05-19	Alignment-Augmented Speculative Decoding with Alignment Sampling and Conditional Verification	Jikai Wang et.al.	2505.13204	null
2025-05-19	FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference	Guangda Liu et.al.	2505.13109	null
2025-05-25	FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks	Zihua Wang et.al.	2505.12728	link
2025-05-18	Traversal Verification for Speculative Tree Decoding	Yepeng Weng et.al.	2505.12398	null
2025-05-16	FAIR Ecosystems for Science at Scale	Sean R. Wilkinson et.al.	2505.11742	null
2025-05-16	Prime Number Error Terms	Nathan Ng et.al.	2505.11295	null
2025-05-16	Beyond surfaces: quantifying internal radiative heat transport in dense materials	Janak Tiwari et.al.	2505.10853	null
2025-05-16	Qualia Optimization	Philip S. Thomas et.al.	2505.10779	null
2025-07-10	Anchoring AI Capabilities in Market Valuations: The Capability Realization Rate Model and Valuation Misalignment Risk	Xinmin Fang et.al.	2505.10590	null
2025-05-18	MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models	Mugilan Ganesan et.al.	2505.10526	null
2025-05-21	SpecOffload: Unlocking Latent GPU Capacity for LLM Inference on Resource-Constrained Devices	Xiangwen Zhuge et.al.	2505.10259	link
2025-05-14	Chandra Rules Out Super-Eddington Accretion For Little Red Dots	Andrea Sacchi et.al.	2505.09669	null
2025-06-28	Extended Structural Dynamics – Emergent Irreversibility from Reversible Dynamics	Patrick BarAvi et.al.	2505.09650	null
2025-05-14	Observational study of the formation of homologous confined circular-ribbon flares	Shuhong Yang et.al.	2505.09093	null
2025-05-13	Long timescale numerical simulations of large, super-critical accretion discs	P. Chris Fragile et.al.	2505.08859	null
2025-05-13	Kudzu: Fast and Simple High-Throughput BFT	Victor Shoup et.al.	2505.08771	null
2025-05-13	Automatic Task Detection and Heterogeneous LLM Speculative Decoding	Danying Ge et.al.	2505.08600	null
2025-05-12	GUP Effective Metric Without GUP: Implications for the Sign of GUP Parameter and Quantum Bounce	Yen Chin Ong et.al.	2505.07972	null
2025-05-12	Localized Gravity, de Sitter, and the Horizon Criterion	Bjoern Friedrich et.al.	2505.07934	null
2025-06-22	TrumorGPT: Graph-Based Retrieval-Augmented Large Language Model for Fact-Checking	Ching Nam Hang et.al.	2505.07891	null
2025-05-08	Scaling Laws for Speculative Decoding	Siyuan Yan et.al.	2505.07858	null
2025-05-12	SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models	Hang Wu et.al.	2505.07680	null
2025-05-10	N-body simulations of the Self-Confinement of Viscous Self-Gravitating Narrow Eccentric Planetary Ringlets	Joseph M. Hahn et.al.	2505.06639	null
2025-05-09	FastDup: a scalable duplicate marking tool using speculation-and-test mechanism	Zhonghai Zhang et.al.	2505.06127	link
2025-05-08	A Physics Model for Origin of Life	Paul Howard Frampton et.al.	2505.05634	null
2025-05-08	Memory Under Siege: A Comprehensive Survey of Side-Channel Attacks on Memory	MD Mahady Hassan et.al.	2505.04896	null
2025-05-08	Topological phase transition to a hidden charge density wave liquid	Joshua S. H. Lee et.al.	2505.04867	null
2025-05-07	SOAEsV2-7B/72B: Full-Pipeline Optimization for State-Owned Enterprise LLMs via Continual Pre-Training, Domain-Progressive SFT and Distillation-Enhanced Speculative Decoding	Jingyang Deng et.al.	2505.04723	null
2025-05-06	Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Autospeculation	Hengyuan Hu et.al.	2505.03983	null
2025-05-06	QiMeng-CPU-v2: Automated Superscalar Processor Design by Learning Data Dependencies	Shuyao Cheng et.al.	2505.03195	null
2025-05-04	The quest for explosive bubbles in the Indonesian Rupiah/US exchange rate: Does the uncertainty trinity matter?	Abdul Khaliq et.al.	2505.02869	null
2025-05-24	Accelerating Large Language Model Reasoning via Speculative Search	Zhihai Wang et.al.	2505.02865	null
2025-05-21	Dirac Singleton as a Relativistic Field Beyond Standard Model	M. A. Vasiliev et.al.	2505.01915	null
2025-05-03	Speculative Evolution Through 3D Cellular Automata	Amir Hossein Khazaei et.al.	2505.01692	null
2025-05-02	PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding	Bradley McDanel et.al.	2505.01572	null
2025-05-12	Emotions in Artificial Intelligence	Hermann Borotschnig et.al.	2505.01462	null
2025-04-29	X-ray Spectroscopy via Temporal Decomposition	William Setterberg et.al.	2504.21169	null
2025-07-02	Ground to Dust: Collisional Cascades and the Fate of Kardashev II Megaswarms	Brian C. Lacki et.al.	2504.21151	null
2025-06-10	EvoPort: An Evolutionary Framework for Portfolio Optimization via Randomized Alpha Discovery and Ensemble-Based Allocation	Nguyen Van Thanh et.al.	2504.21095	null
2025-04-29	Reviving Any-Subset Autoregressive Models with Principled Parallel Sampling and Speculative Decoding	Gabe Guo et.al.	2504.20456	link
2025-04-28	AutoJudge: Judge Decoding Without Manual Annotation	Roman Garipov et.al.	2504.20039	null
2025-04-27	Detecting speculative data flow vulnerabilities using weakest precondition reasoning	Graeme Smith et.al.	2504.19128	null
2025-05-25	Efficient Reasoning for LLMs through Speculative Chain-of-Thought	Jikai Wang et.al.	2504.19095	link
2025-04-26	Global Simulations of Gravitational Instability in Protostellar Disks with Full Radiation Transport II. Locality of Gravitoturbulence, Clumpy Spirals, and Implications for Observable Substructure	Wenrui Xu et.al.	2504.18751	null
2025-06-15	PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation	Zihao An et.al.	2504.18583	null
2025-04-25	Generalizing the relativistic precession model of quasi-periodic oscillations through anharmonic corrections	Roberto Giambò et.al.	2504.18403	null
2025-04-23	A Vision for AI-Driven Adaptation of Dynamic AR Content to Users and Environments	Julian Rasch et.al.	2504.16562	null
2025-04-23	Hardness of Median and Center in the Ulam Metric	Nick Fischer et.al.	2504.16437	null
2025-04-22	On commuting integer matrices	Jonathan Chapman et.al.	2504.15839	null
2025-04-22	Delayed Keen Model with Inflation	Ali Tolga Dincer et.al.	2504.15819	null
2025-04-21	Speculative Sampling via Exponential Races	Szymon Kobus et.al.	2504.15475	null
2025-05-16	Rendezvous in CAVITY: Kinematics and gas properties of an isolated dwarf-dwarf merging pair in a cosmic void region	Bahar Bidaran et.al.	2504.15359	null
2025-04-21	The phase diagram of CeRh$_{2}$As$_{2}$ for out-of-plane magnetic field	P. Khanenko et.al.	2504.15112	null
2025-04-21	Safety Co-Option and Compromised National Security: The Self-Fulfilling Prophecy of Weakened AI Risk Thresholds	Heidy Khlaaf et.al.	2504.15088	null
2025-04-21	Note on Type $III_1$ Algebras in $c=1$ String Theory and Bulk Causal Diamonds	T. Banks et.al.	2504.15076	null
2025-04-21	Exploring Collaborative GenAI Agents in Synchronous Group Settings: Eliciting Team Perceptions and Design Considerations for the Future of Work	Janet G. Johnson et.al.	2504.14779	null
2025-05-27	BLACKOUT: Data-Oblivious Computation with Blinded Capabilities	Hossam ElAtali et.al.	2504.14654	null
2025-04-25	UFO2: The Desktop AgentOS	Chaoyun Zhang et.al.	2504.14603	link
2025-04-20	An interstellar mission to test astrophysical black holes	Cosimo Bambi et.al.	2504.14576	null
2025-04-19	Charge Densities in Crystals and Triply-Periodic Minimal Surfaces	Mengdi Yin et.al.	2504.14148	null
2025-04-18	Going Whole Hog: A Philosophical Defense of AI Cognition	Herman Cappelen et.al.	2504.13988	null
2025-04-16	From job titles to jawlines: Using context voids to study generative AI systems	Shahan Ali Memon et.al.	2504.13947	null
2025-03-21	Bio-crafting Architecture: Experiences of growing mycelium in minimal surface molds	Anca-Simona Horvath et.al.	2504.13855	null
2025-05-28	The Sky as a Killing Horizon	Níckolas de Aguiar Alves et.al.	2504.12514	null
2025-04-12	Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time	Wang Yang et.al.	2504.12329	link
2025-04-18	Large Language Model-Based Knowledge Graph System Construction for Sustainable Development Goals: An AI-Based Speculative Design Perspective	Yi-De Lin et.al.	2504.12309	null
2025-04-16	Purposefully Induced Psychosis (PIP): Embracing Hallucination as Imagination in Large Language Models	Kris Pilcher et.al.	2504.12012	null
2025-04-16	Who Said Only Military Officers Can Deal with Uncertainty? On the Importance of Uncertainty in EdTech Data Visualisations	Felicitas Macgilchrist et.al.	2504.11974	null
2025-04-15	Five dimensional rotating and Quintessence black hole and their shadows	Milko Estrada et.al.	2504.11408	null
2025-04-16	Efficient Distributed Retrieval-Augmented Generation for Enhancing Language Model Performance	Shangyu Liu et.al.	2504.11197	null
2025-04-14	Shield Bash: Abusing Defensive Coherence State Retrieval to Break Timing Obfuscation	Kartik Ramkrishnan et.al.	2504.10318	null
2025-04-14	Gravitational metamaterials from optical properties of spacetime media	Orlando Luongo et.al.	2504.09987	null
2025-04-12	Authoritarian Recursions: How Fiction, History, and AI Reinforce Control in Education, Warfare, and Discourse	Hasan Oguz et.al.	2504.09030	null
2025-04-11	SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting	Jiaming Xu et.al.	2504.08850	null
2025-05-31	SD$^2$: Self-Distilled Sparse Drafters	Mike Lasby et.al.	2504.08838	null
2025-04-05	SLOs-Serve: Optimized Serving of Multi-SLO LLMs	Siyuan Chen et.al.	2504.08784	null
2025-04-11	Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices	Shengyuan Ye et.al.	2504.08242	null
2025-05-16	SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning	Rui Pan et.al.	2504.07891	link
2025-04-10	Synthetic Fluency: Hallucinations, Confabulations, and the Creation of Irish Words in LLM-Generated Translations	Sheila Castilho et.al.	2504.07680	null
2025-04-10	Proceedings of the Purposeful XR Workshop for CHI 2025	Elizabeth Childs et.al.	2504.07475	null
2025-04-09	Joint Survey Processing. III. Compact Oddballs in the COSMOS Field – Little Red Dots and Transients	Yu-Heng Lin et.al.	2504.07196	null
2025-04-09	ShadowBinding: Realizing Effective Microarchitectures for In-Core Secure Speculation Schemes	Amund Bergland Kvalsvik et.al.	2504.07018	null
2025-04-08	SPIRe: Boosting LLM Inference Throughput with Speculative Decoding	Sanjit Neelam et.al.	2504.06419	null
2025-04-08	Decoding the Ishango Bone: Unveiling Prehistoric Mathematical Art	Jenny Baur et.al.	2504.06412	null
2025-04-08	Interplay between trimer structure and magnetic ground state in Ba5Ru3O12 probed by Neutron and muSR techniques	E. Kushwaha et.al.	2504.06113	null
2025-04-08	Strong Evidence That Abiogenesis Is a Rapid Process on Earth Analogs	David Kipping et.al.	2504.05993	null
2025-04-08	DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding	Hossein Entezari Zarch et.al.	2504.05598	null
2025-06-03	Speculative Automated Refactoring of Imperative Deep Learning Programs to Graph Execution	Raffi Khatchadourian et.al.	2504.05424	null
2025-04-06	pc-COP: An Efficient and Configurable 2048-p-Bit Fully-Connected Probabilistic Computing Accelerator for Combinatorial Optimization	Kiran Magar et.al.	2504.04543	null
2025-06-02	Representations of $p$-adic groups and orbits with smooth closure in a variety of Langlands parameters	Kristaps Balodis et.al.	2504.04163	null
2025-04-05	PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models	Haofei Yin et.al.	2504.04104	null
2025-03-23	Agentic Business Process Management: The Past 30 Years And Practitioners’ Future Perspectives	Hoang Vu et.al.	2504.03693	null
2025-04-04	Ethics Readiness of Technology: The case for aligning ethical approaches with technological maturity	Eline de Jong et.al.	2504.03336	null
2025-04-03	A Review of Prototyping in XR: Linking Extended Reality to Digital Fabrication	Bixun Chen et.al.	2504.02998	null
2025-05-02	GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation	Zhiyuan Yan et.al.	2504.02782	link
2025-04-03	Black Holes, Moduli Stabilisation and the Swampland	Matilda Delgado et.al.	2504.02645	null
2025-04-08	Variational Online Mirror Descent for Robust Learning in Schrödinger Bridge	Dong-Sig Han et.al.	2504.02618	null
2025-06-16	Graviton Scattering on Gravitational Atoms: Relic Graviton Shot Noise	Benjamin Avila-Lopez et.al.	2504.01286	null
2025-04-01	Reminiscences about Steven Weinberg (This Time it’s Personal)	C. P. Burgess et.al.	2504.01118	null
2025-04-01	Mesoscale Eddy – Internal Wave Coupling. III. The End of the Enstrophy Cascade and Maintenance of Gyre Scale Potential Vorticity Gradients	Kurt L. Polzin et.al.	2504.00486	null
2025-04-01	The Impact of Triangular-Toothed Gears on the Functionality of the Antikythera Mechanism	Esteban Guillermo Szigety y Gustavo Francisco Arenas et.al.	2504.00327	null
2025-06-04	Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding	Aayush Gautam et.al.	2504.00030	null
2025-03-31	What the F*ck Is Artificial General Intelligence?	Michael Timothy Bennett et.al.	2503.23923	null
2025-03-31	A search for the three isomers of cyano-1,3-butadiene in TMC-1: Implications for bottom-up routes involving 1,3-butadiene	M. Agundez et.al.	2503.23841	null
2025-03-30	Credit, Land Speculation, and Low-Interest-Rate Policy	Tomohiro Hirano et.al.	2503.23552	null
2025-03-30	The Longest Duration SGRE Event in Solar Cycle 25	Nat Gopalswamy et.al.	2503.23544	null
2025-03-30	Speculative End-Turn Detector for Efficient Speech Chatbot Assistant	Hyunjong Ok et.al.	2503.23439	null
2025-03-29	Beyond speculation: Measuring the growing presence of LLM-generated texts in multilingual disinformation	Dominik Macko et.al.	2503.23242	null
2025-03-28	Formation and Evolution of Compact Binaries Containing Intermediate Mass Black Holes in Dense Star Clusters	Seungjae Lee et.al.	2503.22109	null
2025-03-27	How to Constrain the Stochastic Gravitational Wave Background with Multi-Frequency Detections	Eleanor Gleave et.al.	2503.21508	null
2025-03-26	Speculations on higher Fukaya categories	James Pascaleff et.al.	2503.20906	null
2025-03-24	The Centers and Margins of Modeling Humans in Well-being Technologies: A Decentering Approach	Jichen Zhu et.al.	2503.19132	null
2025-05-14	Spectropolarimetry of A Nuclear Transient AT2023clx: Revealing The Geometrical Alignment between The Transient Outflow and The Nuclear Dusty Region	Kohki Uno et.al.	2503.19024	null
2025-03-23	A Novel Hat-Shaped Device-Cloud Collaborative Inference Framework for Large Language Models	Zuan Xie et.al.	2503.18989	null
2025-03-23	A Multi-Model Adaptation of Speculative Decoding for Classification	Somnath Roy et.al.	2503.18076	null
2025-03-20	SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs	Shibo Jie et.al.	2503.16163	null
2025-03-20	“This could save us months of work” – Use Cases of AI and Automation Support in Investigative Journalism	Besjon Cifliku et.al.	2503.16011	null
2025-03-20	SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models	Fahao Chen et.al.	2503.15921	null
2025-03-19	Prada: Black-Box LLM Adaptation with Private Data on Resource-Constrained Devices	Ziyao Wang et.al.	2503.14932	null
2025-06-12	The Origin of the Very-High-Energy Diffuse $γ$-Ray Emission: The Case for Galactic Source Cocoons	Antonio Ambrosone et.al.	2503.14651	null
2025-05-04	Superconductivity in magnetars: Exploring type-I and type-II states in toroidal magnetic fields	Mayusree Das et.al.	2503.14594	null
2025-03-26	Association of 220 PeV Neutrino KM3-230213A with Gamma-Ray Bursts	Ruiqi Wang et.al.	2503.14471	null
2025-03-18	Neutron portal to ultra-high-energy neutrinos	Gustavo F. S. Alves et.al.	2503.14419	null
2025-03-18	Speculative Decoding for Verilog: Speed and Quality, All in One	Changran Xu et.al.	2503.14153	null
2025-03-18	Growing a Twig to Accelerate Large Vision-Language Models	Zhenwei Shao et.al.	2503.14075	null
2025-03-17	ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts	Evangelos Georganas et.al.	2503.13565	null
2025-03-17	Enhanced anomalous Hall effect in the topological Kagome metal Cs(V$_{1-x}$Mn$_x$)$_3$Sb$_5$	Xinmin Wang et.al.	2503.13351	null
2025-03-28	WOW: Workflow-Aware Data Movement and Task Scheduling for Dynamic Scientific Workflows	Fabian Lehmann et.al.	2503.13072	link
2025-05-15	Collaborative Speculative Inference for Efficient LLM Inference Serving	Luyao Gao et.al.	2503.10325	null
2025-03-13	Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding	Jinze Li et.al.	2503.10135	null
2025-03-12	A practical guide to machine learning interatomic potentials – Status and future	Ryan Jacobs et.al.	2503.09814	null
2025-03-11	In Search of the Potentially Hazardous Asteroids in the Taurid Resonant Swarm	Jasmine Li et.al.	2503.08670	null
2025-03-11	Liquidity Competition Between Brokers and an Informed Trader	Ryan Donnelly et.al.	2503.08287	null
2025-03-25	Training Domain Draft Models for Speculative Decoding: Best Practices and Insights	Fenglu Hong et.al.	2503.07807	null
2025-03-10	Did smartphones break the world as we knew it?	Mikhail V. Tamm et.al.	2503.07773	null
2025-03-13	Design as Hope: Reimagining Futures for Seemingly Doomed Problems	JaeWon Kim et.al.	2503.07586	null
2025-03-09	A parallel parser for regular expressions	Angelo Borsotti et.al.	2503.06763	null
2025-03-07	Quantum-like cognition and decision making in the light of quantum measurement theory	Miho Fuyama et.al.	2503.05859	null
2025-02-25	Generative Artificial Intelligence: Evolving Technology, Growing Societal Impact, and Opportunities for Information Systems Research	Veda C. Storey et.al.	2503.05770	null
2025-03-07	Speculative Decoding for Multi-Sample Inference	Yiwei Li et.al.	2503.05330	null
2025-03-07	SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding	Kaiyu Huang et.al.	2503.05096	null
2025-02-11	Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations	Kunal Handa et.al.	2503.04761	null
2025-03-19	Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling	Yan Li et.al.	2503.04398	null
2025-03-06	A possible jet and corona configuration for Swift J1727.8–1613 during the hard state	Jing-Qiang Peng et.al.	2503.04044	null
2025-03-05	RASD: Retrieval-Augmented Speculative Decoding	Guofeng Quan et.al.	2503.03434	null
2025-03-26	SpecInF: Exploiting Idle GPU Resources in Distributed DL Training via Speculative Inference Filling	Cunchi Lv et.al.	2503.02550	null
2025-04-02	Linear Representations of Political Perspective Emerge in Large Language Models	Junsol Kim et.al.	2503.02080	link
2025-04-23	EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test	Yuhui Li et.al.	2503.01840	link
2025-03-03	Efficient Long-Term Structural Reliability Estimation with Non-Gaussian Stochastic Models: A Design of Experiments Approach	Sebastian Winter et.al.	2503.01566	null
2025-03-17	MeshPad: Interactive Sketch-Conditioned Artist-Designed Mesh Generation and Editing	Haoxuan Li et.al.	2503.01425	null
2025-03-24	Turbulence in virtual: II. Origin of skewness and dual fraction processes	Xunchuan Liu et.al.	2503.01160	null
2025-03-02	DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting	Kai Lv et.al.	2503.00784	link
2025-03-02	Speculative Ad-hoc Querying	Haoyu Li et.al.	2503.00714	link
2025-03-01	Tutorial Proposal: Speculative Decoding for Efficient LLM Inference	Heming Xia et.al.	2503.00491	null
2025-03-01	Peek into the ‘White-Box’: A Field Study on Bystander Engagement with Urban Robot Uncertainty	Xinyan Yu et.al.	2503.00337	null
2025-03-01	Doraemon’s Gadget Lab: Unpacking Human Needs and Interaction Design in Speculative Technology	Tram Thi Minh Tran et.al.	2503.00257	null
2025-02-28	Broadband pulsed quadrature measurements with calorimeters	Ezad Shojaee et.al.	2503.00188	null
2025-02-28	AMuLeT: Automated Design-Time Testing of Secure Speculation Countermeasures	Bo Fu et.al.	2503.00145	link
2025-02-28	Assessment of universal relations among second-order moments of relativistic stars via reformulated perturbation equations	Koutarou Kyutoku et.al.	2503.00098	null
2025-02-14	A Short History of Rocks: or, How to Invent Quantum Computing	David Wakeham et.al.	2503.00005	null
2025-05-13	Nano Drone-based Indoor Crime Scene Analysis	Martin Cooney et.al.	2502.21019	null
2025-03-04	Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff	Maximilian Holsman et.al.	2502.20704	link
2025-02-28	MonadBFT: Fast, Responsive, Fork-Resistant Streamlined Consensus	Mohammad Mussadiq Jalalzai et.al.	2502.20692	null
2025-03-24	Turbulence in virtual: Origin of the variance and skewness of density function	Xunchuan Liu et.al.	2502.20458	null
2025-02-27	Long-Context Inference with Retrieval-Augmented Speculative Decoding	Guanzheng Chen et.al.	2502.20330	link
2025-04-28	Frobenius subalgebra lattices in tensor categories	Mainak Ghosh et.al.	2502.19876	null
2025-03-04	Speculative Decoding and Beyond: An In-Depth Survey of Techniques	Yunhai Hu et.al.	2502.19732	null
2025-02-26	From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens	Tong Wu et.al.	2502.18890	link
2025-02-26	Reimagining Personal Data: Unlocking the Potential of AI-Generated Images in Personal Data Meaning-Making	Soobin Park et.al.	2502.18853	null
2025-02-26	Towards Optimal Multi-draft Speculative Decoding	Zhengmian Hu et.al.	2502.18779	null
2025-03-02	Variability of Central Stars of Planetary Nebulae with the Zwicky Transient Facility. II. Long-Timescale Variables including Wide Binary and Late Thermal Pulse Candidates	Soumyadeep Bhattacharjee et.al.	2502.18651	null
2025-02-27	Kinematics of metallicity populations in Omega Centauri using Gaia Focused Product Release and Hubble Space Telescope	Nagaraj Vernekar et.al.	2502.17755	null
2025-02-24	Knowledge Distillation with Training Wheels	Guanlin Liu et.al.	2502.17717	null
2025-02-24	THOR: A Non-Speculative Value Dependent Timing Side Channel Attack Exploiting Intel AMX	Farshad Dizani et.al.	2502.17658	null
2025-02-24	LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification	Penghui Yang et.al.	2502.17421	link
2025-02-24	Defects in the $β$-Ga$_2$O$_3$($\bar{2}01$)/HfO$_2$ MOS system and the effect of thermal treatments	Khushabu. S. Agrawal et.al.	2502.17112	null
2025-05-25	CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter	Yepeng Weng et.al.	2502.16880	null
2025-02-24	APINT: A Full-Stack Framework for Acceleration of Privacy-Preserving Inference of Transformers based on Garbled Circuits	Hyunjun Cho et.al.	2502.16877	null
2025-04-03	Towards Reinforcement Learning for Exploration of Speculative Execution Vulnerabilities	Evan Lai et.al.	2502.16756	null
2025-02-22	Fluctuating Lattice, Several Energy Scales	Holger Bech Nielsen et.al.	2502.16369	null
2025-02-21	DReSD: Dense Retrieval for Speculative Decoding	Milan Gritta et.al.	2502.15572	link
2025-02-27	PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System	Yintao He et.al.	2502.15470	null
2025-02-24	Ultra-high-energy $γ$-ray emission associated with the tail of a bow-shock pulsar wind nebula	Zhen Cao et.al.	2502.15447	null
2025-02-21	TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding	Zhaoxuan Wu et.al.	2502.15197	null
2025-02-21	A Critical Examination of the Nested Leaky Box Model for Galactic Cosmic Ray Transport	Benedikt Schroer et.al.	2502.15115	null
2025-03-11	FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling	Weilin Zhao et.al.	2502.14856	null
2025-05-07	Fusion rules and structure constants of E-series minimal models	Rongvoram Nivesvivat et.al.	2502.14295	null
2025-02-19	Which Attention Heads Matter for In-Context Learning?	Kayo Yin et.al.	2502.14010	link
2025-03-17	NVR: Vector Runahead on NPUs for Sparse Memory Access	Hui Wang et.al.	2502.13873	null
2025-02-19	Hierarchical accretion flow from the G351 infrared dark filament to its central cores	H. Beuther et.al.	2502.13866	null
2025-02-19	C2T: A Classifier-Based Tree Construction Method in Speculative Decoding	Feiye Huo et.al.	2502.13652	null
2025-02-19	Near-extremal dumb holes and some aspects of the Hawking effect	Akshat Pandey et.al.	2502.13557	null
2025-02-19	Radio observations of the ultra-long GRB 220627A reveal a hot cocoon supporting the blue supergiant progenitor scenario	James K. Leung et.al.	2502.13435	null
2025-02-18	Inconsistent metallicity spreads in first generation stars of globular clusters from high resolution spectroscopy and HST photometry	Eugenio Carretta et.al.	2502.13206	null
2025-02-17	SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs	Yige Xu et.al.	2502.12134	null
2025-02-16	AI Generations: From AI 1.0 to AI 4.0	Jiahao Wu et.al.	2502.11312	null
2025-02-16	Coherent Spin Pumping Originated from Sub-Terahertz Néel Vector Dynamics in Easy Plane α-Fe2O3/Pt	Gregory Fritjofson et.al.	2502.11281	null
2025-02-16	GRIFFIN: Effective Token Alignment for Faster Speculative Decoding	Shijing Hu et.al.	2502.11018	link
2025-02-05	QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache	Rishabh Tiwari et.al.	2502.10424	null
2025-02-13	Rosette Nebula Outburst Gaia 24djk from the Young Stellar Object V557 Mon	Adolfo S. Carvalho et.al.	2502.09523	null
2025-02-13	$^{18}$F-FDG brain PET hypometabolism in post-SARS-CoV-2 infection: substrate for persistent/delayed disorders?	Eric Guedj et.al.	2502.09077	null
2025-02-13	CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality	Razvan-Gabriel Dumitru et.al.	2502.08923	link
2025-03-19	Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding	Ziyao Wang et.al.	2502.08020	null
2025-04-13	Regular Black Holes in Lovelock gravity with a Degenerate AdS Ground State and their shadows	Milko Estrada et.al.	2502.07992	null
2025-03-06	Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs	Ruichen Zhang et.al.	2502.07942	null
2025-02-05	Entropy Adaptive Decoding: Dynamic Model Switching for Efficient Inference	Toby Simonds et.al.	2502.06833	null
2025-02-10	Persistent spin grids with spin-orbit coupled 2D electron gas	A. V. Poshakinskiy et.al.	2502.06745	null
2025-03-27	LANTERN++: Enhancing Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models	Sihwan Park et.al.	2502.06352	link
2025-02-10	Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE	Haiduo Huang et.al.	2502.06282	link
2025-02-08	Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding	Sukmin Cho et.al.	2502.05609	link
2025-01-31	Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies	Nadav Timor et.al.	2502.05202	null
2025-02-07	Learning Universal Multi-level Market Irrationality Factors to Improve Stock Return Forecasting	Chen Yang et.al.	2502.04737	null
2025-02-06	Speeding up Speculative Decoding via Approximate Verification	Meiyu Zhong et.al.	2502.04557	null
2025-02-06	Gig2Gether: Data-sharing to Empower, Unify and Demystify Gig Work	Jane Hsieh et.al.	2502.04482	null
2025-02-06	The Evolution of Hypervelocity Supernova Survivors and the Outcomes of Interacting Double White Dwarf Binaries	Ken J. Shen et.al.	2502.04451	null
2025-02-06	Properties of the emission region in pulsars with opposite subpulse drift directions in different profile components	H. M. Tedila et.al.	2502.03833	null
2025-02-05	COSMOS-Web: The emergence of the Hubble Sequence	M. Huertas-Company et.al.	2502.03532	null
2025-02-13	FSLH: Flexible Mechanized Speculative Load Hardening	Roberto Blanco et.al.	2502.03203	null
2025-02-05	How probable is the Lyman-$α$ damping wing in the spectrum of the redshift z = 5.9896 quasar ULAS J0148+0600?	Fiona Sawyer et.al.	2502.03085	null
2025-02-05	A comprehensive study of the gas-phase formation network of HC$_5$N: theory, experiments, observations and models	Lisa Giani et.al.	2502.03046	null
2025-04-17	The connection between high-redshift galaxies and Lyman $α$ transmission in the Sherwood-Relics simulations of patchy reionisation	Luke Conaboy et.al.	2502.02983	null
2025-02-05	Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation	Jingyu Liu et.al.	2502.02789	link
2025-02-04	EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization	Yize Wu et.al.	2502.02493	null
2025-02-04	M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference	Nikhil Bhendawade et.al.	2502.02040	null
2025-02-03	Cosmic Ray Feedback in Massive Halos: Implications for the Distribution of Baryons	Eliot Quataert et.al.	2502.01753	null
2025-02-01	Speculative Ensemble: Fast Large Language Model Ensemble via Speculation	Jiale Fu et.al.	2502.01662	link
2025-02-03	Time-dependent solutions of biadjoint scalar field theories	Kymani Armstrong-Williams et.al.	2502.01294	null
2025-02-02	Constructing AI ethics narratives based on real-world data: Human-AI collaboration in data-driven visual storytelling	Mengyi Wei et.al.	2502.00637	null
2025-02-01	Predicting the number density of heavy seed massive black holes due to an intense Lyman-Werner field	Hannah O’Brennan et.al.	2502.00574	null
2025-02-04	Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation	Yang Cao et.al.	2502.00500	null
2025-02-14	Reward-Guided Speculative Decoding for Efficient LLM Reasoning	Baohao Liao et.al.	2501.19324	null
2025-01-31	Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment	Gregor Bachmann et.al.	2501.19309	null
2025-02-19	Emancipatory Information Retrieval	Bhaskar Mitra et.al.	2501.19241	null
2025-01-31	Trading Inference-Time Compute for Adversarial Robustness	Wojciech Zaremba et.al.	2501.18841	null
2025-01-30	Human Re-ID Meets LVLMs: What can we expect?	Kailash Hambarde et.al.	2501.18698	null
2025-01-28	How Hamilton-Jacobi formalism helps to address the physical meaning of the wave function in Bohmian mechanics	Arnaud Amblard et.al.	2501.16989	null
2025-03-04	Distilling Large Language Models for Network Active Queue Management	Deol Satish et.al.	2501.16734	null
2025-01-24	The disrupting and growing open cluster spiral arm patterns of the Milky Way	Xiaochen Liu et.al.	2501.14215	null
2025-01-19	Hallucination Mitigation using Agentic AI Natural Language-Based Frameworks	Diego Gosmar et.al.	2501.13946	link
2025-01-23	Inflaton Self Resonance, Oscillons, and Gravitational Waves in Small Field Polynomial Inflation	Manuel Drees et.al.	2501.13811	null
2025-01-23	Considerations on the Origin of IRAS 19312+1950 Based on Long-Term Maser Observations	Huan-Xue Feng et.al.	2501.13769	null
2025-01-23	Compiler Support for Speculation in Decoupled Access/Execute Architectures	Robert Szafarczyk et.al.	2501.13553	null
2025-02-01	Concentration in Governance Control Across Decentralised Finance Protocols	Thomas Eisermann et.al.	2501.13377	link
2025-01-22	The outer structure of old star clusters in the Small Magellanic Cloud	Andrés E. Piatti et.al.	2501.13062	null
2025-01-22	Entanglement dynamics in collision models and entanglement quilts	Le Hu et.al.	2501.12629	null
2025-01-22	Link in $\mathbb{R}\mathbb{P}^3$ and the Topological Vertex	John Chae et.al.	2501.12566	null
2025-01-21	AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding	Zikun Li et.al.	2501.12162	null
2025-01-20	MIDIS: Quantifying the AGN component of X-ray-detected galaxies	Steven Gillman et.al.	2501.11491	null
2025-01-23	The JWST EXCELS survey: an extremely metal-poor galaxy at $z=8.271$ hosting an unusual population of massive stars	F. Cullen et.al.	2501.11099	null
2025-01-30	Vortices for lake equations (review with questions and speculations)	Jair Koiller et.al.	2501.10433	null
2025-01-17	From strong to weak correlations in breathing-mode kagome van der Waals materials: Nb$_3$(F,Cl,Br,I)$_8$ as a robust and versatile platform for many-body engineering	Joost Aretz et.al.	2501.10320	null
2025-01-16	25 years of XMM-Newton observations of the Sgr A complex: 3D distribution and internal structure of the clouds	G. Stel et.al.	2501.09737	null
2025-01-16	Weak electronic correlations in the cobalt oxychalcogenide superconductor Na2CoSe2O	Zhenchao Wu et.al.	2501.09675	null
2025-02-11	Anatomy of a Digital Bubble: Lessons Learned from the NFT and Metaverse Frenzy	Daisuke Kawai et.al.	2501.09601	null
2025-01-16	A universal break in energy functions of three hyperactive repeating fast radio bursts	Q. Wu et.al.	2501.09248	null
2025-01-15	The emission of interpulses by a 6.45-hour period coherent radio transient	Y. W. J. Lee et.al.	2501.09133	null
2025-01-13	Cassiopeia A’s Reverse Shock and its Effects on the Expanding SN Ejecta	Robert A. Fesen et.al.	2501.07708	null
2025-01-11	Is the Monetary Transmission Mechanism Broken? Time for People’s Quantitative Easing	Sebastian Dragoe et.al.	2501.06575	null
2025-01-27	QPEs as Lense-Thirring precession of super-Eddington flows	M. Middleton et.al.	2501.06185	link
2025-01-10	Analysing the coverage of the University of Bologna’s publication metadata in an existing source of open research information	Erica Andreose et.al.	2501.05821	null
2025-01-09	Accelerated Diffusion Models via Speculative Sampling	Valentin De Bortoli et.al.	2501.05370	null
2025-01-09	The CO-Fuelled Time Machine: Tracing Birth Conditions and Terrestrial Planet Formation Outcomes in HD 163296 through Pebble Drift-induced CO Enhancements	Joe Williams et.al.	2501.05316	null
2025-01-09	Observational Study of the Atmospheric Gravity Waves in the lower Solar Atmosphere	Ravi Chaurasiya et.al.	2501.05042	null
2025-01-07	Transparent Decompilation for Timing Side-Channel Analyses	Santiago Arranz Olmos et.al.	2501.04183	null
2025-01-07	Spin Environment of a Superconducting Qubit in High Magnetic Fields	S. Günzler et.al.	2501.03661	null
2025-01-07	Neural Cellular Automata and Deep Equilibrium Models	Zhibai Jia et.al.	2501.03573	null
2025-01-07	CI at Scale: Lean, Green, and Fast	Dhruva Juloori et.al.	2501.03440	null
2025-01-02	Vertex algebras, topological defects, and Moonshine	Roberto Volpato et.al.	2412.21141	null
2024-12-30	Strategic Learning and Trading in Broker-Mediated Markets	Alif Aqsha et.al.	2412.20847	null
2024-12-28	From Worms to Mice: Homeostasis Maybe All You Need	Jesus Marco de Lucas et.al.	2412.20090	null
2025-01-13	HADES: Hardware Accelerated Decoding for Efficient Speculation in Large Language Models	Ze Yang et.al.	2412.19925	null
2024-12-27	Cosmohedra	Nima Arkani-Hamed et.al.	2412.19881	null
2024-12-27	Paleoinspired Vision: From Exploring Colour Vision Evolution to Inspiring Camera Design	Junjie Zhang et.al.	2412.19439	null
2024-12-25	Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference	Libo Zhang et.al.	2412.18934	null
2024-12-25	AdaEAGLE: Optimizing Speculative Decoding via Explicit Modeling of Adaptive Draft Structures	Situo Zhang et.al.	2412.18910	null
2024-12-23	The Unique Helium Nova V445 Puppis Ejected $\gg$0.001 M$_{\odot}$ in the Year 2000 and Will Not Become a Type Ia Supernova	Bradley E. Schaefer et.al.	2412.17286	null
2024-12-20	Gravitational Observatories in AdS$_4$	Dionysios Anninos et.al.	2412.16305	null
2024-12-20	Two-Part Interplanetary Type II Solar Radio Bursts	Silja Pohjolainen et.al.	2412.15961	null
2025-01-10	Minimizing speculation overhead in a parallel recognizer for regular texts	Angelo Borsotti et.al.	2412.14975	null
2025-01-13	$\mathcal{N}=2$ superconformal gravitino in harmonic superspace	Evgeny Ivanov et.al.	2412.14822	null
2025-02-07	The JWST/NIRSpec view of the nuclear region in the prototypical merging galaxy NGC 6240	Matteo Ceci et.al.	2412.14685	null
2024-12-18	Fermion-Portal Dark Matter at a High-Energy Muon Collider	Pouya Asadi et.al.	2412.14235	null
2024-12-18	Current and secular accretion rates of EX Hydrae	K. Beuermann et.al.	2412.13850	null
2024-12-18	Fool’s gold: ligand-receptor interactions and the origins of life	Betony Adams et.al.	2412.13836	null
2024-12-18	Diffusion models and stochastic quantisation in lattice field theory	Gert Aarts et.al.	2412.13704	null
2024-12-17	Distributed Speculative Execution for Resilient Cloud Applications	Tianyu Li et.al.	2412.13314	null
2024-12-17	Where do X-ray low surface brightness clusters sit with respect to filaments?	S. Zarattini et.al.	2412.13258	null
2024-12-17	Agnosticism About Artificial Consciousness	Tom McClelland et.al.	2412.13145	null
2024-12-17	Insight into the Starburst Nature of Galaxy GN-z11 with JWST MIRI Spectroscopy	J. Álvarez-Márquez et.al.	2412.12826	null
2025-03-18	Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models	Seungeun Oh et.al.	2412.12687	null
2024-12-26	Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree	Xiangxiang Gao et.al.	2412.12639	null
2024-12-15	Heat kernel and local index theorem for open complex manifolds with $\mathbb{C}^{\ast}$-action	Jih-Hsin Cheng et.al.	2412.11037	null
2024-12-14	The JWST-NIRCam View of Sagittarius C. II. Evidence for Magnetically Dominated HII Regions in the CMZ	John Bally et.al.	2412.10983	null
2025-02-23	Interference in Fuzzy Dark Matter Filaments: Idealised Models and Statistics	Tim Zimmermann et.al.	2412.10829	null
2025-02-10	Constrained Decoding with Speculative Lookaheads	Nishanth Nakshatri et.al.	2412.10418	null
2025-01-15	Asymmetric Temperature Variations In Protoplanetary disks: I. Linear Theory, Corotating Spirals, and Ring Formation	Zhaohuan Zhu et.al.	2412.09571	null
2024-12-12	AI Predicts AGI: Leveraging AGI Forecasting and Peer Review to Explore LLMs’ Complex Reasoning Capabilities	Fabrizio Davide et.al.	2412.09385	null
2024-12-11	Can transformative AI shape a new age for our civilization?: Navigating between speculation and reality	Jesus L. Lobo et.al.	2412.08273	null
2024-12-10	Mapping the spatial extent of HI-rich absorbers using MgII absorption along gravitational arcs	Trystyn A. M. Berg et.al.	2412.07652	null
2024-12-26	CoinCLIP: A Multimodal Framework for Assessing Viability in Web3 Memecoins	Hou-Wan Long et.al.	2412.07591	null
2024-12-10	Modeling Speculative Trading Patterns in Token Markets: An Agent-Based Analysis with TokenLab	Mengjue Wang et.al.	2412.07512	null
2024-12-10	KPZ-like scaling on a high-dimensional hypersphere	Daniil Fedotov et.al.	2412.07432	null
2024-12-10	Exploring types I and IIA effective actions through T-duality	Mohammad R. Garousi et.al.	2412.07234	null
2024-12-10	Relativistic Mott transition in strongly correlated artificial graphene	Liguo Ma et.al.	2412.07150	null
2024-12-10	Gravitational focusing and horizon entropy for higher-spin fields	Zihan Yan et.al.	2412.07107	null
2024-12-09	Inelastic H + H$^+_3$ Collision rates and their impact in the determination of the excitation temperature of H$^+_3$	Daniel Felix-Gonzalez et.al.	2412.06697	null
2024-12-09	Systematic comparison of deep generative models applied to multivariate financial time series	Howard Caulfield et.al.	2412.06417	null
2024-12-09	Beyond pip install: Evaluating LLM Agents for the Automated Installation of Python Projects	Louis Milliken et.al.	2412.06294	link
2024-12-06	Revisiting the hallmark freezing and melting points in colloidal dispersions and the search for the elusive coexistence region	J. Galen Wang et.al.	2412.05422	null
2024-12-06	Penetrative rotating magnetoconvection subject to lateral variations in temperature gradients	Tirtharaj Barman et.al.	2412.05235	null
2024-12-06	Predictive Window Decoding for Fault-Tolerant Quantum Programs	Joshua Viszlai et.al.	2412.05115	null
2024-12-04	Successive magnetic transitions in the spin-5/2 easy-axis triangular-lattice antiferromagnet Na$_2$BaMn(PO$_4$)$_2$: A neutron diffraction study	Chuandi Zhang et.al.	2412.03149	null
2025-01-02	The Reality of AI and Biorisk	Aidan Peppin et.al.	2412.01946	null
2024-12-02	PLD+: Accelerating LLM inference by leveraging Language Model Artifacts	Shwetha Somasundaram et.al.	2412.01447	null
2024-12-02	Enhanced solid solution hardening by off-center substitutional solute atoms in α-Ti	Zi-Han Yu et.al.	2412.01298	null
2024-11-25	Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration	Zhuofan Wen et.al.	2412.00061	null
2024-11-12	The Copernican Argument for Alien Consciousness; The Mimicry Argument Against Robot Consciousness	Eric Schwitzgebel et.al.	2412.00008	null
2024-11-28	Night-Side Relativistic Electron Precipitation Bursts in the Outer Radiation Belt: Insights from ELFIN and THEMIS	Xi Lu et.al.	2411.19232	null
2024-11-27	Magnetic field tuned superconducting and normal phase magnetism in CeCo$_{0.5}$Rh$_{0.5}$In$_{5}$	A. Howell et.al.	2411.18540	null
2024-11-27	Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding	Ziyin Zhang et.al.	2411.18462	link
2024-11-27	6G Takes Shape	Jeffrey G. Andrews et.al.	2411.18435	null
2024-11-27	An evolution of matrix-valued orthogonal polynomials	Erik Koelink et.al.	2411.18362	null
2024-11-27	Comprehensive Kernel Safety in the Spectre Era: Mitigations and Performance Evaluation (Extended Version)	Davide Davoli et.al.	2411.18094	null
2024-12-25	Stellar evolution along the AGB as revealed by the shape of Miras’ visual light curves	D. T. Hoai et.al.	2411.18044	null
2024-11-26	Stable curves and chromatic polynomials	Bernhard Reinke et.al.	2411.17551	null
2024-12-08	A revamped understanding of Cosmic Rays and Gamma-Ray Bursts	A. De Rújula et.al.	2411.15850	null
2024-11-20	The Impossible Test: A 2024 Unsolvable Dataset and A Chance for an AGI Quiz	David Noever et.al.	2411.14486	null
2024-12-03	Mediating Modes of Thought: LLM’s for design scripting	Moritz Rietschel et.al.	2411.14485	null
2024-11-21	THz optical response of Ba(Fe$_{1-x}$Ni$_x$)$_2$As$_2$ films analyzed within the three-band Eliashberg $s_{\pm}$-wave model	Yurii A. Aleshchenko et.al.	2411.14011	null
2024-11-27	Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding	Hyun Ryu et.al.	2411.13157	null
2024-11-20	Far-field Boundary Conditions for Airfoil Simulation at High Incidence in Steady, Incompressible, Two-dimensional Flow	Narges Golmirzaee et.al.	2411.13077	null
2024-11-19	Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing	Ruyi Ding et.al.	2411.12508	null
2024-11-18	Continuous Speculative Decoding for Autoregressive Image Generation	Zili Wang et.al.	2411.11925	link
2024-12-26	Teapot: Efficiently Uncovering Spectre Gadgets in COTS Binaries	Fangzheng Lin et.al.	2411.11624	null
2024-11-30	Diversity of disc viscosities can explain the period ratios of resonant and non-resonant systems of hot super-Earths and mini-Neptunes	Bertram Bitsch et.al.	2411.11452	null
2024-11-25	First memoir on the asymptotics of certain infinite products	Wadim Zudilin et.al.	2411.11100	null
2024-11-17	FastDraft: How to Train Your Draft	Ofir Zafrir et.al.	2411.11055	null
2024-12-16	SAM Decoding: Speculative Decoding via Suffix Automaton	Yuxuan Hu et.al.	2411.10666	link
2024-11-15	Moving Forward: A Review of Autonomous Driving Software and Hardware Systems	Xu Wang et.al.	2411.10291	null
2024-11-14	Cosmic inflation in an extended non-commutative foliated quantum gravity: the wave function of the universe	César A. Zen Vasconcellos et.al.	2411.09756	null
2024-11-15	Provocation: Who benefits from “inclusion” in Generative AI?	Samantha Dalal et.al.	2411.09102	null
2024-11-13	Thought Experiments in Design Fiction for Visualization	Swaroop Panda et.al.	2411.08621	null
2025-01-01	A Geometric Substructure for Quantum Dynamics	Anthony John Bracken et.al.	2411.08230	null
2025-01-11	The Grass of the Universe: Rethinking Technosphere, Planetary History, and Sustainability with Fermi Paradox	Lukáš Likavčan et.al.	2411.08057	null
2024-11-12	A rich structure of renormalization group flows for Higgs-like models in 4 dimensions	André LeClair et.al.	2411.07476	null
2024-11-12	Input-Based Ensemble-Learning Method for Dynamic Memory Configuration of Serverless Computing Functions	Siddharth Agarwal et.al.	2411.07444	null
2024-11-11	The Inherent Adversarial Robustness of Analog In-Memory Computing	Corey Lammie et.al.	2411.07023	null
2024-11-10	Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents	Yu Gu et.al.	2411.06559	link
2024-11-10	MOCCA-III: Effects of pristine gas accretion and cluster migration on globular cluster evolution, global parameters and multiple stellar populations	Mirek Giersz et.al.	2411.06421	null
2024-11-10	Generating Mixcode Popular Songs with Artificial Intelligence: Concepts, Plans, and Speculations	Abhishek Kaushik et.al.	2411.06420	null
2024-11-08	SSSD: Simply-Scalable Speculative Decoding	Michele Marzollo et.al.	2411.05894	null
2024-11-08	SpecHub: Provable Acceleration to Multi-Draft Speculative Decoding	Ryan Sun et.al.	2411.05289	link
2024-11-07	SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference	Gabriele Oliaro et.al.	2411.04975	null
2024-11-06	The N-Grammys: Accelerating Autoregressive Inference with Learning-Free Batched Speculation	Lawrence Stewart et.al.	2411.03786	null
2024-11-05	Remarkable Scale Relation, Approximate SU(5), Fluctuating Lattice	Holger Bech Nielsen et.al.	2411.03552	null
2024-11-05	Shared Memory-Aware Latency-Sensitive Message Aggregation for Fine-Grained Communication	Kavitha Chandrasekar et.al.	2411.03533	null
2024-11-07	A high resolution simulation of protoplanetary disk turbulence driven by the vertical shear instability	Karim Shariff et.al.	2411.03467	null
2024-11-04	PipeLLM: Fast and Confidential Large Language Model Services with Speculative Pipelined Encryption	Yifan Tan et.al.	2411.03357	null
2024-11-05	On the possible core shift break in relativistic jets	E. E. Nokhrina et.al.	2411.02925	null
2024-11-04	A proof of self-organized criticality in a sandpile	Christopher Hoffman et.al.	2411.02541	null
2025-02-07	Pseudo Transitions in the Finite-Size Blume-Capel Model	Lei Shi et.al.	2411.01743	null
2024-11-05	Privacy Risks of Speculative Decoding in Large Language Models	Jiankun Wei et.al.	2411.01076	null
2024-10-30	Accelerated AI Inference via Dynamic Execution Methods	Haim Barad et.al.	2411.00853	null
2024-10-30	A Theoretical Perspective for Speculative Decoding Algorithm	Ming Yin et.al.	2411.00841	null
2024-10-31	Interpretable Language Modeling via Induction-head Ngram Models	Eunji Kim et.al.	2411.00066	link
2024-10-31	ALISE: Accelerating Large Language Model Serving with Speculative Scheduling	Youpeng Zhao et.al.	2410.23537	null
2024-10-30	Flavor Patterns of Fundamental Particles from Quantum Entanglement?	Jesse Thaler et.al.	2410.23343	null
2024-10-29	Lost and Found in Speculation: Hybrid Speculative Vulnerability Detection	Mohamadreza Rostami et.al.	2410.22555	null
2025-02-10	Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding	Bohan Li et.al.	2410.21951	null
2024-10-29	Rapid cooling of the Cassiopeia A neutron star due to superfluid quantum criticality	Hao-Fu Zhu et.al.	2410.21945	null
2024-10-28	Model-agnostic basis functions for the 2-point correlation function of dark matter in linear theory	Aseem Paranjape et.al.	2410.21374	link
2024-10-11	The Social Impact of Generative LLM-Based AI	Yu Xie et.al.	2410.21281	null
2024-10-28	On the limits of informationally efficient stock markets: New insights from a chartist-fundamentalist model	Laura Gardini et.al.	2410.21198	null
2024-10-27	A Jet-Induced Shock in a Young, Powerful Radio Galaxy at z=3.00	Nick Seymour et.al.	2410.20609	null
2024-10-27	FIRP: Faster LLM inference via future intermediate representation prediction	Pengfei Wu et.al.	2410.20488	null
2024-10-27	Inevitable Trade-off between Watermark Strength and Speculative Sampling Efficiency for Language Models	Zhengmian Hu et.al.	2410.20418	null
2024-10-31	Fast Best-of-N Decoding via Speculative Rejection	Hanshi Sun et.al.	2410.20290	link
2024-10-24	Intention Is All You Need	Advait Sarkar et.al.	2410.18851	null
2024-10-24	AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability	Sudhanshu Agrawal et.al.	2410.18351	null
2024-10-23	Multi-Draft Speculative Sampling: Canonical Architectures and Theoretical Limits	Ashish Khisti et.al.	2410.18234	null
2025-02-10	Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition	Artem Basharin et.al.	2410.17765	null
2024-10-22	AMUSD: Asynchronous Multi-Device Speculative Decoding for LLM Acceleration	Bradley McDanel et.al.	2410.17375	link
2024-10-22	Remote Timing Attacks on Efficient Language Model Inference	Nicholas Carlini et.al.	2410.17175	null
2024-10-23	Quantum many-body scars as remnants of stable many-body periodic orbits	Keita Omiya et.al.	2410.16916	null
2024-10-22	Chiral polaritonics: cavity-mediated enantioselective excitation condensation	Rosario R. Riso et.al.	2410.16861	null
2024-10-22	An Extreme Radio Fluctuation of Pulsar B1929$+$10	Zhengli Wang et.al.	2410.16816	null
2024-10-21	Galaxy Size and Mass Build-up in the First 2 Gyrs of Cosmic History from Multi-Wavelength JWST NIRCam Imaging	Natalie Allen et.al.	2410.16354	null
2024-10-30	TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling	Jiahao Qiu et.al.	2410.16033	null
2024-10-21	Efficient and Universally Accessible Cross-Chain Options without Upfront Holder Collateral	Zifan Peng et.al.	2410.15724	null
2024-10-21	Investigating Unusual H$\alpha$ Features towards the Scutum Supershell	R. Alsulami et.al.	2410.15712	null
2024-10-17	Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding	Tan Dat Nguyen et.al.	2410.13839	null
2024-10-17	Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions	Michael J. Q. Zhang et.al.	2410.13788	null
2024-10-17	Looking Inward: Language Models Can Learn About Themselves by Introspection	Felix J Binder et.al.	2410.13787	link
2024-10-17	PGC 44685: A Dwarf Star-forming Lenticular Galaxy with Wolf-Rayet Population	Shiying Lu et.al.	2410.13119	null
2024-10-16	Gravitational instantons and the quality problem of the QCD axion: Facts, speculations, and statements in between	Pier Giuseppe Catinari et.al.	2410.12741	null
2024-10-15	Evolution of Ferromagnetism and Electrical Resistivity in Sb-Doped Cr4PtGa17	Chaoguo Wang et.al.	2410.12078	null
2024-10-15	MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation	Chenxi Wang et.al.	2410.11779	link
2024-10-15	DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure	Yunfan Xiong et.al.	2410.11744	null
2024-10-15	Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling	Wenda Xu et.al.	2410.11325	null
2025-02-01	QSpec: Speculative Decoding with Complementary Quantization Schemes	Juntao Zhao et.al.	2410.11305	null
2024-11-20	Unveiling dust, molecular gas, and high star formation efficiency in extremely UV bright star-forming galaxies at $z\sim 2.1-3.6$	M. Dessauges-Zavadsky et.al.	2410.11121	null
2024-10-01	Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models	Keivan Alizadeh et.al.	2410.10846	null
2024-10-15	The Discovery of Polarized Water Vapor Megamaser Emission in a Molecular Accretion Disk	Jack F. Gallimore et.al.	2410.10569	null
2024-10-14	Temperature-Centric Investigation of Speculative Decoding with Knowledge Distillation	Siru Ouyang et.al.	2410.10141	null
2024-11-12	Self-Data Distillation for Recovering Quality in Pruned Large Language Models	Vithursan Thangarasa et.al.	2410.09982	null
2024-10-13	Super-Bandgap Electroluminescence from Cesium Lead Bromide	Justin Sculley et.al.	2410.09702	null
2024-10-21	On Two Nucleons Near Unitarity with Perturbative Pions	Yu Ping Teng et.al.	2410.09653	null
2024-10-11	Compact [OIII] emission-line regions (“Green Seeds”) in H$\alpha$ emitters at Cosmic Noon from JWST Observations	Nuo Chen et.al.	2410.08520	null
2024-10-09	SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration	Heming Xia et.al.	2410.06916	link
2025-02-06	Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level	Xinyi Zeng et.al.	2410.06809	null
2024-10-08	ParallelSpec: Parallel Drafter for Efficient Speculative Decoding	Zilin Xiao et.al.	2410.05589	null
2024-10-09	Density estimation with LLMs: a geometric investigation of in-context learning trajectories	Toni J. B. Liu et.al.	2410.05218	null
2024-10-08	Efficient Inference for Large Language Model-based Generative Recommendation	Xinyu Lin et.al.	2410.05165	null
2024-10-04	Density functional theory based investigation of heavy fermion band candidates in triplet superconductor UTe2	Shouzheng Liu et.al.	2410.03840	null
2024-10-04	Mixture of Attentions For Speculative Decoding	Matthieu Zimmer et.al.	2410.03804	null
2024-10-03	AI-rays: Exploring Bias in the Gaze of AI Through a Multimodal Interactive Installation	Ziyao Gao et.al.	2410.03786	null
2024-09-24	Nonmetric geometric flows and quasicrystalline topological phases for dark energy and dark matter in $f(Q)$ cosmology	L. Bubuianu et.al.	2410.03700	null
2025-01-31	LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding	Doohyuk Jang et.al.	2410.03355	null
2024-10-04	Generative Edge Detection with Stable Diffusion	Caixia Zhou et.al.	2410.03080	null
2024-10-03	Inductive Generative Recommendation via Retrieval-based Speculation	Yijie Ding et.al.	2410.02939	link
2024-10-03	The Stellar Initial Mass Function of Early Dark Matter-free Gas Objects	William Lake et.al.	2410.02868	null
2024-10-03	Atoms near a conducting wedge: decay rates and entanglement around a corner	Romuald Kilianski et.al.	2410.02349	null
2024-10-02	Time Variation of the Solar Tachocline	Sarbani Basu et.al.	2410.01895	null
2024-12-25	Interpretable Contrastive Monte Carlo Tree Search Reasoning	Zitian Gao et.al.	2410.01707	link
2024-10-02	Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding	Yao Teng et.al.	2410.01699	link
2024-12-09	Forte: Finding Outliers with Representation Typicality Estimation	Debargha Ganguly et.al.	2410.01322	link
2024-10-02	Speculative Coreset Selection for Task-Specific Fine-tuning	Xiaoyu Zhang et.al.	2410.01296	null
2024-10-01	Draft on the Fly: Adaptive Self-Speculative Decoding using Cosine Similarity	Michael R. Metel et.al.	2410.01028	null
2024-10-01	A Scheduling-Aware Defense Against Prefetching-Based Side-Channel Attacks	Till Schlüter et.al.	2410.00452	null
2024-11-12	Galactic center G objects as dust-enshrouded stars near the supermassive black hole	Michal Zajaček et.al.	2410.00304	null
2024-09-30	Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface	Wenyue Hua et.al.	2410.00079	null
2024-09-30	Statistical view of orbital circularisation with 14 000 characterised TESS eclipsing binaries	L. W. IJspeert et.al.	2409.20540	null
2024-09-30	New HI observations Toward the NGC 5055 Galaxy Group with FAST	Xiao-Lan Liu et.al.	2409.20109	null
2024-09-27	Thermal Conductivity of Cubic Silicon Carbide Single Crystals Heavily Doped by Nitrogen	Zifeng Huang et.al.	2409.18843	null
2024-09-27	SpecCFA: Enhancing Control Flow Attestation/Auditing via Application-Aware Sub-Path Speculation	Adam Caulfield et.al.	2409.18403	null
2024-09-25	Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference	Zongyue Qin et.al.	2409.16560	null
2024-09-22	ALMASOP. The Localized and Chemically rich Features near the Bases of the Protostellar Jet in HOPS 87	Shih-Ying Hsu et.al.	2409.14445	null
2024-09-21	Triangulating on Possible Futures: Conducting User Studies on Several Futures Instead of Only One	Antti Salovaara et.al.	2409.14137	null
2024-09-29	String Invention, Viable 3-3-1 Model, Dark Matter Black Holes	Holger B. Nielsen et.al.	2409.13776	null
2024-09-20	Interstellar Glycolaldehyde, Methyl Formate, and Acetic Acid. II. Chemical Modeling of the Bimodal Abundance Pattern in NGC 6334I	Brielle M. Shope et.al.	2409.13673	null
2024-09-20	A Comparison between Financial and Gambling Markets	Haoyu Liu et.al.	2409.13528	null
2024-12-12	Consequences of Minimal Entanglement in Bosonic Field Theories	Spencer Chang et.al.	2409.13030	null
2024-09-17	UNCOVER: Significant Reddening in Cosmic Noon Quiescent Galaxies	Jared Siegel et.al.	2409.11457	null
2024-09-17	The ALMA-CRISTAL Survey: Spatially-resolved Star Formation Activity and Dust Content in 4 < z < 6 Star-forming Galaxies	Juno Li et.al.	2409.10961	null
2024-12-14	Improving Multi-candidate Speculative Decoding	Xiaofan Lu et.al.	2409.10644	link
2024-09-16	Aggregation-diffusion in heterogeneous environments	Jonathan R. Potts et.al.	2409.10147	link
2024-12-12	Pure Lovelock Gravity regular black holes	Milko Estrada et.al.	2409.09559	null
2024-09-14	Ground State Phase Diagram of $\text{SU}(3)$ $t$-$J$ Chain	Junhao Zhang et.al.	2409.09344	null
2024-12-02	Two-Time Relativistic Bohmian Model of Quantum Mechanics	Giuseppe Raguní et.al.	2409.09049	null
2024-09-13	Dynamic Simultaneous Multithreaded Architecture	Daniel Ortiz-Arroyo et.al.	2409.07903	null
2024-09-09	DNA sequence alignment: An assignment for OpenMP, MPI, and CUDA/OpenCL	Arturo Gonzalez-Escribano et.al.	2409.06075	null
2024-10-05	Predicting Foreign Exchange EUR/USD direction using machine learning	Kevin Cedric Guyard et.al.	2409.04471	null
2024-09-05	Evidence for Dust Depletion in a Misaligned Protoplanetary Disk with JWST	C. C. Espaillat et.al.	2409.03702	null
2024-09-04	Cavitating bubbles in condensing gas as a means of forming clumps, chondrites, and planetesimals	Eugene Chiang et.al.	2409.02978	null
2024-09-03	Light-Ray Wave Functions and Integrability	Alexandre Homrich et.al.	2409.02160	null
2024-09-03	Foreactor: Exploiting Storage I/O Parallelism with Explicit Speculation	Guanzhou Hu et.al.	2409.01580	null
2024-09-02	A Comprehensive Analysis of the Future of Atomically Precise Manufacturing	Vadym Shvydun et.al.	2409.00955	null
2024-08-30	Dynamic Depth Decoding: Faster Speculative Decoding for LLMs	Oscar Brown et.al.	2409.00142	null
2024-08-29	LightSLH: Provable and Low-Overhead Spectre v1 Mitigation through Targeted Instruction Hardening	Yiming Zhu et.al.	2408.16220	null
2024-08-28	An Empirical Study of API Misuses of Data-Centric Libraries	Akalanka Galappaththi et.al.	2408.15853	null
2024-08-28	Indirect nonlinear interaction between toroidal Alfvén eigenmode and ion temperature gradient mode mediated by zonal structures	Qian Fang et.al.	2408.15782	null
2024-09-19	Learning Harmonized Representations for Speculative Sampling	Lefan Zhang et.al.	2408.15766	null
2024-08-28	Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation	Lujun Gui et.al.	2408.15562	null
2024-11-18	The companion mass distribution of post common envelope hot subdwarf binaries: evidence for boosted and disrupted magnetic braking?	Lisa Blomberg et.al.	2408.15334	null
2024-08-27	The Way To Circumbinary Planets	Hans J Deeg et.al.	2408.15307	null
2024-12-26	The Mamba in the Llama: Distilling and Accelerating Hybrid Models	Junxiong Wang et.al.	2408.15237	link
2024-08-26	SO as shock tracer in protoplanetary disks: the AB Aurigae case	A. Dutrey et.al.	2408.14276	null
2024-08-25	The origins of noise in the Zeeman splitting of spin qubits in natural-silicon devices	Juan S. Rojas-Arias et.al.	2408.13707	null
2024-07-22	Simopt – Simulation pass for Speculative Optimisation of FPGA-CAD flow	Eashan Wadhwa et.al.	2408.12676	null
2024-12-19	Exposing Shadow Branches	Chrysanthos Pepi et.al.	2408.12592	null
2024-08-22	Enhancing Causal Discovery in Financial Networks with Piecewise Quantile Regression	Cameron Cornell et.al.	2408.12210	null
2024-08-21	Electrostatic Origins of the Dirichlet Principle	Steven Deckelman et.al.	2408.12002	null
2024-09-04	Parallel Speculative Decoding with Adaptive Draft Length	Tianyu Liu et.al.	2408.11850	link
2024-08-21	Chemical models of interstellar glycine and adenine precursor aminoacetonitrile (NH2CH2CN)	Xia Zhang et.al.	2408.11776	null
2024-08-20	High detection significance of the dark substructure in gravitational lens SDSSJ0946+1006 is revealed by image pixel supersampling	Quinn E. Minor et.al.	2408.11090	null
2024-08-23	MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding	Jian Chen et.al.	2408.11049	link
2024-08-20	Revisiting the measurements and interpretations of DLVO forces	Bo Feng et.al.	2408.10870	null
2024-08-19	Constraining the Generalized Tolman-Oppenheimer-Volkoff (GTOV) equation with Bayesian analysis	Franciele M. da Silva et.al.	2408.10425	null
2024-08-18	A new measure of risk using Fourier analysis	Michael Grabinski et.al.	2408.10279	null
2024-08-19	Excitonic-trion population in two-dimensional halide perovskites	Efstratios Manousakis et.al.	2408.10097	null
2024-08-16	Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling	Xianzhen Luo et.al.	2408.08696	null
2024-08-15	KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning	Kaiqi Zhang et.al.	2408.08146	null
2024-08-19	Coupling without Communication and Drafter-Invariant Speculative Decoding	Majid Daliri et.al.	2408.07978	link
2024-12-06	The Small Sizes and High Implied Densities of ‘Little Red Dots’ with Balmer Breaks Could Explain Their Broad Emission Lines Without an AGN	Josephine F. W. Baggen et.al.	2408.07745	null
2024-08-14	Only One Relation Possible? Modeling the Ambiguity in Event Temporal Relation Extraction	Yutong Hu et.al.	2408.07353	null
2024-07-23	Stablecoin Runs and Disclosure Policy in the Presence of Large Sales	Brian Zhu et.al.	2408.07227	null
2024-08-13	Speculations on Uncertainty and Humane Algorithms	Nicholas Gray et.al.	2408.06736	null
2024-08-15	Inefficiencies of Carbon Trading Markets	Nicola Borri et.al.	2408.06497	null
2024-08-12	Correct Wrong Path	Bhargav Reddy Godala et.al.	2408.05912	null
2024-08-11	A Decoding Acceleration Framework for Industrial Deployable LLM-based Recommender Systems	Yunjia Xi et.al.	2408.05676	link
2024-08-16	Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion	Jacob K Christopher et.al.	2408.05636	null
2024-08-09	Recurrent Stochastic Fluctuations with Financial Speculation	Tomohiro Hirano et.al.	2408.05047	null
2024-08-08	HotStuff-1: Linear Consensus with One-Phase Speculation	Dakai Kang et.al.	2408.04728	null
2024-08-08	CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding	Sophia Ho et.al.	2408.04678	null
2024-08-08	Black hole mass and optical radiation mechanism of the tidal disruption event AT 2023clx	Shiyan Zhong et.al.	2408.04448	null
2024-08-05	Rich dynamical behaviors from a digital reversal operation	Yannis Almirantis et.al.	2408.02527	null
2024-08-08	A speculative model for cyclic information preservation in Kerr-Newman spacetime using closed timelike curves	Aviral Damle et.al.	2408.02116	null
2024-08-06	Selection bias obfuscates the discovery of fast radio burst sources	Mohit Bhardwaj et.al.	2408.01876	null
2024-08-03	Dissolution zone model of the oxide structure in additively manufactured dispersion-strengthened alloys	Wenyuan Hou et.al.	2408.01845	null
2024-08-02	AT2023vto: An Exceptionally Luminous Helium Tidal Disruption Event from a Massive Star	Harsh Kumar et.al.	2408.01482	null
2024-08-01	Granting GPT-4 License and Opportunity: Enhancing Accuracy and Confidence Estimation for Few-Shot Event Detection	Steven Fincke et.al.	2408.00914	null
2024-08-01	Clover-2: Accurate Inference for Regressive Lightweight Speculative Decoding	Bin Xiao et.al.	2408.00264	null
2024-07-31	Designing Beyond Current Conceptualizations of Spaceflight Experiences	James Cole et.al.	2408.00085	null
2024-07-31	Revisiting the fundamental metallicity relation with observation and simulation	Chengyu Ma et.al.	2407.21716	null
2024-07-31	The Bulk Densities of Small Solar System Bodies as a Probe of Planetesimal Formation	Misako Tatsuuma et.al.	2407.21386	null
2024-08-19	Instantons and the Large N=4 Algebra	Edward Witten et.al.	2407.20964	null
2024-07-17	Steamroller Problems: An Evaluation of LLM Reasoning Capability with Automated Theorem Prover Strategies	Lachlan McGinness et.al.	2407.20244	null
2024-08-19	Reduced decay in Josephson coupling across ferromagnetic junctions with spin-orbit coupling layers	Ivan Kindiak et.al.	2407.19799	null
2024-07-26	Ionized and cold gas components in low surface brightness galaxy AGC 102004	Tian-Wen Cao et.al.	2407.18530	null
2024-07-25	Phase transitions in (2 + 1)D subsystem-symmetric monitored quantum circuits	Cole Kelson-Packer et.al.	2407.18340	null
2024-08-31	Uniqueness of an $E_8$ model of elementary particles	Robert A. Wilson et.al.	2407.18279	null
2024-07-24	Automorphisms of Calabi-Yau threefolds from algebraic dynamics and the second Chern class	Keiji Oguiso et.al.	2407.17297	null
2024-07-24	Mapping the individual, social, and biospheric impacts of Foundation Models	Andrés Domínguez Hernández et.al.	2407.17129	null
2024-07-04	Integrated Deflector Shield Technology for Spacecraft	Florian Neukart et.al.	2407.16701	null
2024-07-23	Graph-Structured Speculative Decoding	Zhuocheng Gong et.al.	2407.16207	null
2024-07-22	AI for Handball: predicting and explaining the 2024 Olympic Games tournament with Deep Learning and Large Language Models	Florian Felice et.al.	2407.15987	null
2024-07-22	An Ad-hoc graph node vector embedding algorithm for general knowledge graphs using Kinetica-Graph	B. Kaan Karamete et.al.	2407.15906	null
2024-07-23	Unveiling the Multifaceted GRB 200613A: Prompt Emission Dynamics, Afterglow Evolution, and the Host Galaxy’s Properties	Shao-Yu Fu et.al.	2407.15824	null
2024-11-21	SNIP: Speculative Execution and Non-Interference Preservation for Compiler Transformations	Sören van der Wall et.al.	2407.15080	null
2024-10-21	Is the difference between deep hedging and delta hedging a statistical arbitrage?	Pascal François et.al.	2407.14736	link
2024-07-19	Rational Bubbles: A Clarification	Tomohiro Hirano et.al.	2407.14017	null
2024-07-18	Surface roughening in nanoparticle catalysts	Cameron J. Owen et.al.	2407.13643	null
2024-07-18	SecScale: A Scalable and Secure Trusted Execution Environment for Servers	Ani Sunny et.al.	2407.13572	null
2024-07-17	RTL Verification for Secure Speculation Using Contract Shadow Logic	Qinhan Tan et.al.	2407.12232	null
2024-07-16	Breakup dynamics of a neutron-halo projectile on heavy target at deep sub-barrier energies	B. Mukeru et.al.	2407.12129	null
2024-11-16	PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation	Branden Butler et.al.	2407.11798	null
2024-10-02	Optimized Multi-Token Joint Decoding with Auxiliary Model for LLM Inference	Zongyue Qin et.al.	2407.09722	null
2024-07-17	Accelerating the inference of string generation-based chemical reaction models for industrial applications	Mikhail Andronov et.al.	2407.09685	null
2024-09-12	Krylov complexity and chaos in deformed SYK models	Shira Chapman et.al.	2407.09604	null
2024-07-21	6G: The Intelligent Network of Everything – A Comprehensive Vision, Survey, and Tutorial	Harri Pennanen et.al.	2407.09398	null
2024-07-11	Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting	Zilong Wang et.al.	2407.08223	null
2024-07-10	Purity benchmarking study of error coherence in a single Xmon qubit	Auda Zhu et.al.	2407.07960	null
2024-07-10	Carbon Pricing and Resale in Emission Trading Systems	Peyman Khezr et.al.	2407.07386	null
2024-08-21	Fuzzy Spheres in Stringy Matrix Models: Quantifying Chaos in a Mixed Phase Space	Paolo Amore et.al.	2407.07259	null
2024-07-09	Revolutionizing Battery Disassembly: The Design and Implementation of a Battery Disassembly Autonomous Mobile Manipulator Robot (BEAM-1)	Yanlong Peng et.al.	2407.06590	null
2024-07-05	Statistical investigations into the geometry and homology of random programs	Jon Sporring et.al.	2407.04854	null
2024-07-05	Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models	Bolaji Yusuf et.al.	2407.04641	null
2024-11-13	Black Holes with a charged quantum dust core	R. Casadio et.al.	2407.04146	null
2024-08-23	A distance conjecture beyond moduli?	Cédric Debusschere et.al.	2407.03715	null
2024-07-03	Braneworld Black Bounce to Transversable Wormhole Analytically Connected to an asymptotically $AdS_5$ Boundary	T. M. Crispim et.al.	2407.03528	null
2024-07-03	Origin of anomalous magnetotransport in kagome superconductors AV$_{3}$Sb$_{5}$ (A=K,Rb,Cs)	A. E. Koshelev et.al.	2407.03189	null
2024-09-24	Large-scale ordered magnetic fields generated in mergers of helium white dwarfs	Rüdiger Pakmor et.al.	2407.02566	null
2024-07-02	A thermodynamic model of inflation without inflaton field	Jesus Anaya-Galeana et.al.	2407.02429	null
2024-07-02	MICONIC: JWST/MIRI MRS observations of the nuclear and circumnuclear regions of Mrk231	A. Alonso-Herrero et.al.	2407.02180	null
2024-07-02	S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models	Parsa Kavehzadeh et.al.	2407.01955	null
2024-08-31	Description of molecular chirality and its analysis with high harmonic generation	Akihito Kato et.al.	2407.01947	null
2024-07-01	Universal properties of residual moments in heavy-fermion metals	Ewan Scott et.al.	2407.01218	null
2024-07-01	Staying vigilant in the Age of AI: From content generation to content authentication	Yufan Li et.al.	2407.00922	null

Multimodal System

Publish Date	Title	Authors	PDF	Code
2025-07-11	BlindSight: Harnessing Sparsity for Efficient VLMs	Tharun Adithya Srikrishnan et.al.	2507.09071	null
2025-06-20	Consistent Sampling and Simulation: Molecular Dynamics with Energy-Based Diffusion Models	Michael Plainer et.al.	2506.17139	link
2025-06-18	VLMInferSlow: Evaluating the Efficiency Robustness of Large Vision-Language Models as a Service	Xiasi Wang et.al.	2506.15755	null
2025-07-01	Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model	Anirud Aggarwal et.al.	2506.15682	link
2025-06-09	Event-Priori-Based Vision-Language Model for Efficient Visual Understanding	Haotong Qin et.al.	2506.07627	null
2025-06-15	RNE: a plug-and-play framework for diffusion density estimation and inference-time control	Jiajun He et.al.	2506.05668	null
2025-05-29	Inference-time Scaling of Diffusion Models through Classical Search	Xiangcheng Zhang et.al.	2505.23614	null
2025-05-27	InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling	Xiaoxiao Jiang et.al.	2505.20600	null
2025-05-25	SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation	Shenggan Cheng et.al.	2505.19151	null
2025-05-23	VERDI: VLM-Embedded Reasoning for Autonomous Driving	Bowen Feng et.al.	2505.15925	null
2025-05-20	Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism	Kunyun Wang et.al.	2505.14741	null
2025-04-14	Efficient Task-specific Conditional Diffusion Policies: Shortcut Model Acceleration and SO(3) Optimization	Haiyong Yu et.al.	2504.09927	null
2025-03-17	VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers	Ruanjun Li et.al.	2503.09387	null
2025-02-20	Light communicative materials	Hongshuang Guo et.al.	2503.05744	null
2025-03-10	Probing the Quantum Nature of Gravity through Classical Diffusion	Oliviero Angeli et.al.	2501.13030	null
2025-01-16	PATCHEDSERVE: A Patch Management Framework for SLO-Optimized Hybrid Resolution Diffusion Serving	Desen Sun et.al.	2501.09253	null
2025-01-16	StructSR: Refuse Spurious Details in Real-World Image Super-Resolution	Yachao Li et.al.	2501.05777	link
2024-12-19	Unified Image Restoration and Enhancement: Degradation Calibrated Cycle Reconstruction Diffusion Model	Minglong Xue et.al.	2412.14630	link
2025-06-30	Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension	Xiyao Wang et.al.	2412.03704	link
2024-12-05	A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs	Wangbo Zhao et.al.	2412.03324	link
2024-12-02	[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster	Qizhe Zhang et.al.	2412.01818	link
2025-03-30	Staleness-Centric Optimizations for Parallel Diffusion MoE Inference	Jiajun Luo et.al.	2411.16786	null
2024-10-29	VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration	Dezhan Tu et.al.	2410.23317	null
2025-01-07	Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance	Dongmin Park et.al.	2410.22376	link
2024-10-08	A scaling limit for additive functionals	Thibaud Taillefumier et.al.	2410.06383	null
2024-09-03	CT-SDM: A Sampling Diffusion Model for Sparse-View CT Reconstruction across All Sampling Rates	Liutao Yang et.al.	2409.01571	null
2024-07-27	Faster Image2Video Generation: A Closer Look at CLIP Image Embedding’s Impact on Spatio-Temporal Cross-Attentions	Ashkan Taghipour et.al.	2407.19205	null
2024-07-15	LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis	Zhenxiong Tan et.al.	2407.10468	link
2024-06-13	DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning	Xuemin Hu et.al.	2406.09089	null
2024-10-03	I4VGen: Image as Free Stepping Stone for Text-to-Video Generation	Xiefan Guo et.al.	2406.02230	null
2024-05-30	DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation	Zachary Novack et.al.	2405.20289	null
2024-05-26	Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference	Xunpeng Huang et.al.	2405.16387	null
2025-04-16	Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models	Katherine Xu et.al.	2405.14828	null
2024-04-25	Inferring solid-state diffusivity in lithium-ion battery active materials: improving upon the classical GITT method	A. Emir Gumrukcuoglu et.al.	2404.16658	null
2024-05-02	Privacy-Preserving Diffusion Model Using Homomorphic Encryption	Yaojian Chen et.al.	2403.05794	link
2024-05-08	ToDo: Token Downsampling for Efficient Generation of High-Resolution Images	Ethan Smith et.al.	2402.13573	null
2024-06-03	DITTO: Diffusion Inference-Time T-Optimization for Music Generation	Zachary Novack et.al.	2401.12179	null
2023-12-10	Statistical Spatially Inhomogeneous Diffusion Inference	Yinuo Ren et.al.	2312.05793	null
2024-01-04	Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference	Zihao Yu et.al.	2305.17423	link
2023-10-25	ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval	Kexun Zhang et.al.	2302.02285	link
2021-08-11	Manifold-aware Synthesis of High-resolution Diffusion from Structural Imaging	Benoit Anctil-Robitaille et.al.	2108.04135	null
2021-12-22	Functional Data Analysis with Rough Sample Paths?	Neda Mohammadi et.al.	2105.12035	null
2014-06-03	$C^0$-estimates and smoothness of solutions to the parabolic equation defined by Kimura operators	Camelia A. Pop et.al.	1406.0742	null
2015-04-01	On nonnegative unbiased estimators	Pierre E. Jacob et.al.	1309.6473	null