Mlsys Arxiv Daily

Updated on 2026.05.25

LLM inference

Publish Date	Title	Authors	PDF	Code
2026-05-22	CachePrune: Privacy-Aware and Fine-Grained KV Cache Sharing for Efficient LLM Inference	Guanlong Wu et.al.	2605.23640	null
2026-05-22	AlignedServe: Orchestrating Prefix-aware Batching to Build a High-throughput and Computing-efficient LLM Serving System	Fengyao Bai et.al.	2605.23389	null
2026-05-22	XWind: A Cross-site Router for Large Language Model Inference Serving at Renewable Energy Farms	Tella Rajashekhar Reddy et.al.	2605.23348	link
2026-05-22	NASiC: 3D NAND-based CAM-Selected Multibit CIM Architecture for Efficient On-Device Mixture-of-Experts LLM Inference	Weikai Xu et.al.	2605.23294	null
2026-05-22	FastKernels: Benchmarking GPU Kernel Generation in Production	Gabriele Oliaro et.al.	2605.23215	null
2026-05-22	Adaptive Mass-Segmented KV Compression for Long-Context Reasoning	Junzhe Yang et.al.	2605.23200	null
2026-05-22	Prompt Overflow: What the Guardrail Inspects Is Not What the Model Infers	Yuanbo Zhou et.al.	2605.23196	null
2026-05-21	ModeSwitch-LLM: A Lightweight Phase-Aware Controller for Cross-Mode LLM Inference on a Single GPU	Aman Sunesh et.al.	2605.23057	null
2026-05-21	LLM Code Smells: A Taxonomy and Detection Approach	Zacharie Chenail-Larcher et.al.	2605.22976	null
2026-05-20	When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions	Wei Xia et.al.	2605.22873	null
2026-05-21	Think Thrice Before You Speak: Dual knowledge-enhanced Theory-of-Mind Reasoning for Persuasive Agents	Minghui Ma et.al.	2605.22602	null
2026-05-21	GraphFlow: A Graph-Based Workflow Management for Efficient LLM-Agent Serving	Ao Li et.al.	2605.22566	null
2026-05-21	Skill Weaving: Efficient LLM Improvement via Modular Skillpacks	Zhuo Li et.al.	2605.22205	null
2026-05-21	Planning in the LLM Era: Building for Reliability and Efficiency	Michael Katz et.al.	2605.21902	null
2026-05-20	From Parameters to Data: A Task-Parameter-Guided Fine-Tuning Pipeline for Efficient LLM Alignment	Hao Chen et.al.	2605.21558	null
2026-05-20	PALS: Power-Aware LLM Serving for Mixture-of-Experts Models	Can Hankendi et.al.	2605.21427	null
2026-05-20	Frontier: Towards Comprehensive and Accurate LLM Inference Simulation	Yicheng Feng et.al.	2605.21312	null
2026-05-20	DASH: Fast Differentiable Architecture Search for Hybrid Attention in Minutes on a Single GPU	Weizhe Chen et.al.	2605.20936	null
2026-05-20	Runtime-Certified Bounded-Error Quantized Attention	Dean Calver et.al.	2605.20868	null
2026-05-20	Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU	Reese Levine et.al.	2605.20706	null
2026-05-19	Code Generation by Differential Test Time Scaling	Yifeng He et.al.	2605.20473	null
2026-05-19	TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload	Zhiben Chen et.al.	2605.20179	null
2026-05-19	Draft Less, Retrieve More: Hybrid Tree Construction for Speculative Decoding	Yuhao Shen et.al.	2605.20104	null
2026-05-19	Stage-adaptive Token Selection for Efficient Omni-modal LLMs	Zijie Xin et.al.	2605.20035	null
2026-05-19	FlexDraft: Flexible Speculative Decoding via Attention Tuning and Bonus-Guided Calibration	Yaojie Zhang et.al.	2605.20022	null
2026-05-19	Block-Sphere Vector Quantization	Heesang Ann et.al.	2605.19972	null
2026-05-20	SSV: Sparse Speculative Verification for Efficient LLM Inference	Zhibin Wang et.al.	2605.19893	null
2026-05-19	Towards Multi-Model LLM Schedulers: Empirical Insights into Offloading and Preemption	Mert Yildiz et.al.	2605.19593	null
2026-05-19	C2CServe: Leveraging NVLink-C2C for Elastic Serverless LLM Serving on MIG	Shutian Luo et.al.	2605.19481	null
2026-05-19	OpenCompass: A Universal Evaluation Platform for Large Language Models	Maosong Cao et.al.	2605.19276	null
2026-05-14	An Interpretable Latency Model for Speculative Decoding in LLM Serving	Linghao Kong et.al.	2605.15051	null
2026-05-14	Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Windowing	Jie Jiang et.al.	2605.14978	null
2026-05-14	XFP: Quality-Targeted Adaptive Codebook Quantization with Sparse Outlier Separation for LLM Inference	Thomas Witt et.al.	2605.14844	null
2026-05-14	EnergyLens: Predictive Energy-Aware Exploration for Multi-GPU LLM Inference Optimization	Zhiye Song et.al.	2605.14249	null
2026-05-13	Know When To Fold ‘Em: Token-Efficient LLM Synthetic Data Generation via Multi-Stage In-Flight Rejection	Anjir Ahmed Chowdhury et.al.	2605.14062	null
2026-05-13	Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding	Shuoyang Sun et.al.	2605.14005	null
2026-05-13	Towards Resource-Efficient LLMs: End-to-End Energy Accounting of Distillation Pipelines	Katherine Lambert et.al.	2605.13981	null
2026-05-13	Multi-Scale Dequant: Eliminating Dequantization Bottleneck via Activation Decomposition for Efficient LLM Inference	Lingchao Zheng et.al.	2605.13915	null
2026-05-13	KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving	Zedong Liu et.al.	2605.13734	null
2026-05-13	Edit-level Majority Voting Mitigates Over-Correction in LLM-based Grammatical Error Correction	Takumi Goto et.al.	2605.13624	null
2026-05-13	MARLIN: Multi-Agent Game-Theoretic Reinforcement Learning for Sustainable LLM Inference in Cloud Datacenters	H. Moore et.al.	2605.13496	null
2026-05-14	PipeSD: An Efficient Cloud-Edge Collaborative Pipeline Inference Framework with Speculative Decoding	Yunhe Han et.al.	2605.13319	null
2026-05-13	Respecting Self-Uncertainty in On-Policy Self-Distillation for Efficient LLM Reasoning	Junlong Ke et.al.	2605.13255	null
2026-05-12	LISA: Cognitive Arbitration for Signal-Free Autonomous Intersection Management	Abderrahmane Lakas et.al.	2605.12321	null
2026-05-12	BatchBench: Toward a Workload-Aware Benchmark for Autoscaling Policies in Big Data Batch Processing – A Proposed Framework	Venkata Krishna Prasanth Budigi et.al.	2605.12272	null
2026-05-12	CR^2: Cost-Aware Risk-Controlled Routing for Wireless Device-Edge LLM Inference	Nan Xue et.al.	2605.12001	null
2026-05-12	The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures	Bole Ma et.al.	2605.11999	null
2026-05-12	Very Efficient Listwise Multimodal Reranking for Long Documents	Yiqun Sun et.al.	2605.11864	null
2026-05-12	Position: LLM Inference Should Be Evaluated as Energy-to-Token Production	Xiang Liu et.al.	2605.11733	null
2026-05-12	GAR: Carbon-Aware Routing for LLM Inference via Constrained Optimization	Disha Sheshanarayana et.al.	2605.11603	null
2026-05-12	Efficient LLM-based Advertising via Model Compression and Parallel Verification	Wenxin Dong et.al.	2605.11582	null
2026-05-12	Ada-MK: Adaptive MegaKernel Optimization via Automated DAG-based Search for LLM Inference	Wenxin Dong et.al.	2605.11581	null
2026-05-11	SOMA: Efficient Multi-turn LLM Serving via Small Language Model	Xueqi Cheng et.al.	2605.11317	null
2026-05-11	Rethinking LLMOps for Fraud and AML: Building a Compliance-Grade LLM Serving Stack	Prathamesh Vasudeo Naik et.al.	2605.11232	null
2026-05-11	Measuring Five-Nines Reliability: Sample-Efficient LLM Evaluation in Saturated Benchmarks	Eungyeup Kim et.al.	2605.11209	null
2026-05-11	Continuous Discovery of Vulnerabilities in LLM Serving Systems with Fuzzing	Yunze Zhao et.al.	2605.11202	null
2026-05-11	CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration	Yuning Han et.al.	2605.11186	null
2026-05-11	Enabling Performant and Flexible Model-Internal Observability for LLM Inference	Nengneng Yu et.al.	2605.11093	null
2026-05-11	Compute Where it Counts: Self Optimizing Language Models	Yash Akhauri et.al.	2605.10875	null
2026-05-11	EnergyLens: Interpretable Closed-Form Energy Models for Multimodal LLM Inference Serving	Vittorio Palladino et.al.	2605.10556	null
2026-05-11	Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration	Shuzhang Zhong et.al.	2605.10195	null
2026-05-11	GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference	Zengzipeng Tang et.al.	2605.10124	null
2026-05-11	When Are LLM Inferences Acceptable? User Reactions and Control Preferences for Inferred Personal Information	Kyzyl Monteiro et.al.	2605.10013	null
2026-05-08	Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation	Joon Ha Kim et.al.	2605.07985	null
2026-05-08	MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference	Ruijie Zhou et.al.	2605.07363	null
2026-05-08	SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting	Weijie Shi et.al.	2605.07243	null
2026-05-08	Reformulating KV Cache Eviction Problem for Long-Context LLM Inference	Tho Mai et.al.	2605.07234	null
2026-05-08	Learning Agent Routing From Early Experience	Yimin Wang et.al.	2605.07180	link
2026-05-07	Regulating Branch Parallelism in LLM Serving	Swapnil Gandhi et.al.	2605.06914	null
2026-05-07	Sparse Attention as a Range Searching Problem: Towards an Inference-Efficient Index for KV Cache	Mohsen Dehghankar et.al.	2605.06763	null
2026-05-07	UniSD: Towards a Unified Self-Distillation Framework for Large Language Models	Yiqiao Jin et.al.	2605.06597	null
2026-05-08	Tackling the Data-Parallel Load Balancing Bottleneck in LLM Serving: Practical Online Routing at Scale	Tianci Bu et.al.	2605.06113	null
2026-05-07	VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?	Keisuke Kamahori et.al.	2605.06068	null
2026-05-07	XtraMAC: An Efficient MAC Architecture for Mixed-Precision LLM Inference on FPGA	Feng Yu et.al.	2605.06052	null
2026-05-07	Requests of a Feather Must Flock Together: Batch Size vs. Prefix Homogeneity in LLM Inference	Saksham Rathi et.al.	2605.06046	null
2026-05-07	Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving	Bole Ma et.al.	2605.05696	null
2026-05-07	TokenStack: A Heterogeneous HBM-PIM Architecture and Runtime for Efficient LLM Inference	Zhuoran Li et.al.	2605.05639	null
2026-05-07	Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems	Chen Zhang et.al.	2605.05628	null
2026-05-08	LLMSpace: Carbon Footprint Modeling for Large Language Model Inference on LEO Satellites	Lei Jiang et.al.	2605.05615	null
2026-05-06	ReaComp: Compiling LLM Reasoning into Symbolic Solvers for Efficient Program Synthesis	Atharva Naik et.al.	2605.05485	null
2026-05-06	Nitsum: Serving Tiered LLM Requests with Adaptive Tensor Parallelism	Vikranth Srivatsa et.al.	2605.05467	null
2026-05-06	Design Conductor 2.0: An agent builds a TurboQuant inference accelerator in 80 hours	The Verkor Team et.al.	2605.05170	null
2026-05-06	Cognitive Twins: Investigating Personalized Thinking Model Building and Its Performance Enhancement with Human-in-the-Loop	Wu-Yuin Hwang et.al.	2605.04761	null
2026-05-06	A Queueing-Theoretic Framework for Stability Analysis of LLM Inference with KV Cache Memory Constraints	Chengyi Nie et.al.	2605.04595	null
2026-05-05	Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs	Yixuan Mei et.al.	2605.04357	null
2026-05-05	Parallel Prefix Verification for Speculative Generation	Yuncheng Yao et.al.	2605.04263	null
2026-05-05	Predict-then-Diffuse: Adaptive Response Length for Compute-Budgeted Inference in Diffusion LLMs	Michael Rottoli et.al.	2605.04215	null
2026-05-07	Two Calls, Two Moments, and the Vote-Accuracy Curve of Repeated LLM Inference	Yi Liu et.al.	2605.03379	null
2026-05-05	Tutti: Making SSD-Backed KV Cache Practical for Long-Context LLM Serving	Shi Qiu et.al.	2605.03375	null
2026-05-04	VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU	Zijian He et.al.	2605.03190	null
2026-05-05	SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection	Shikhar Shukla et.al.	2605.02888	null
2026-05-04	CoRAL: Contact-Rich Adaptive LLM-based Control for Robotic Manipulation	Berk Çiçek et.al.	2605.02600	null
2026-05-04	Taming Request Imbalance: SLO-Aware Scheduling for Disaggregated LLM Inference	Qipeng Wang et.al.	2605.02329	null
2026-05-04	PipeMax: Enhancing Offline LLM Inference on Commodity GPU Servers	Hongbin Zhang et.al.	2605.02189	null
2026-05-04	AAFLOW: Scalable Patterns for Agentic AI Workflows	Arup Kumar Sarker et.al.	2605.02162	null
2026-05-03	Maistros: A Greek Large Language Model Adapted Through Knowledge Distillation From Large Reasoning Models	Nikolaos Giarelis et.al.	2605.01870	null
2026-05-03	SplitZip: Ultra Fast Lossless KV Compression for Disaggregated LLM Serving	Yipin Guo et.al.	2605.01708	null
2026-05-05	Neuro-Symbolic Agents for Hallucination-Free Requirements Reuse	Ahmed F. Ibrahim et.al.	2605.01562	null
2026-05-02	LLM-Foraging: Large Language Models for Decentralized Swarm Robot Foraging	Peihan Li et.al.	2605.01461	null
2026-05-02	Position: LLM Serving Needs Mathematical Optimization and Algorithmic Foundations, Not Just Heuristics	Zijie Zhou et.al.	2605.01280	null
2026-05-01	Position: agentic AI orchestration should be Bayes-consistent	Theodore Papamarkou et.al.	2605.00742	null
2026-05-01	LLM-Emu: Native Runtime Emulation of LLM Inference via Profile-Driven Sampling	Wei Da et.al.	2605.00616	null
2026-05-04	Tempus: A Temporally Scalable Resource-Invariant GEMM Streaming Framework for Versal AI Edge	M. Grailoo et.al.	2605.00536	null
2026-05-04	Silicon Showdown: Performance, Efficiency, and Ecosystem Barriers in Consumer-Grade LLM Inference	Abdurrahman Javat et.al.	2605.00519	null
2026-05-01	Rethinking LLM Ensembling from the Perspective of Mixture Models	Jiale Fu et.al.	2605.00419	null
2026-05-01	VitaLLM: A Versatile and Tiny Accelerator for Mixed-Precision LLM Inference on Edge Devices	Zi-Wei Lin et.al.	2605.00320	null
2026-04-30	Rethinking Network Topologies for Cost-Effective Mixture-of-Experts LLM Serving	Junsun Choi et.al.	2605.00254	null
2026-04-30	One Pass, Any Order: Position-Invariant Listwise Reranking for LLM-Based Recommendation	Ethan Bito et.al.	2604.27599	null
2026-04-30	EdgeFM: Efficient Edge Inference for Vision-Language Models	Mengling Deng et.al.	2604.27476	null
2026-04-30	VitaLLM: A Versatile, Ultra-Compact Ternary LLM Accelerator with Dependency-Aware Scheduling	Zi-Wei Lin et.al.	2604.27396	null
2026-04-30	To Diff or Not to Diff? Structure-Aware and Adaptive Output Formats for Efficient LLM-based Code Editing	Wei Cheng et.al.	2604.27296	null
2026-04-29	Unifying Sparse Attention with Hierarchical Memory for Scalable Long-Context LLM Serving	Zihan Zhao et.al.	2604.26837	null
2026-04-29	Exploring the Efficiency of 3D-Stacked AI Chip Architecture for LLM Inference with Voxel	Yiqi Liu et.al.	2604.26821	null
2026-04-29	DUAL-BLADE: Dual-Path NVMe-Direct KV-Cache Offloading for Edge LLM Inference	Bodon Jeong et.al.	2604.26557	null
2026-04-29	When Hidden States Drift: Can KV Caches Rescue Long-Range Speculative Decoding?	Tianyu Liu et.al.	2604.26412	null
2026-04-29	OpenSOC-AI: Democratizing Security Operations with Parameter Efficient LLM Log Analysis	Chaitanya Vilas Garware et.al.	2604.26217	null
2026-04-29	Breaking the Autoregressive Chain: Hyper-Parallel Decoding for Efficient LLM-Based Attribute Value Extraction	Theodore Glavas et.al.	2604.26209	null
2026-04-28	EvoSelect: Data-Efficient LLM Evolution for Targeted Task Adaptation	Ting-Wei Li et.al.	2604.26170	null
2026-04-30	AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving	Zhongkai Yu et.al.	2604.26103	null
2026-04-28	DAK: Direct-Access-Enabled GPU Memory Offloading with Optimal Efficiency for LLM Inference	Shouxu Lin et.al.	2604.26074	null
2026-04-28	Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective	Jiaming Yang et.al.	2604.25975	null
2026-04-28	Pythia: Toward Predictability-Driven Agent-Native LLM Serving	Shan Yu et.al.	2604.25899	null
2026-04-28	SpecFed: Accelerating Federated LLM Inference with Speculative Decoding and Compressed Transmission	Ce Zheng et.al.	2604.25777	null
2026-04-28	CGU-ILALab at FoodBench-QA 2026: Comparing Traditional and LLM-based Approaches for Recipe Nutrient Estimation	Wei-Chun Chen et.al.	2604.25774	null
2026-04-28	NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference	Mingbo Hao et.al.	2604.25699	null
2026-04-28	FusionCIM: Accelerating LLM Inference with Fusion-Driven Computing-in-Memory Architecture	Zihao Xuan et.al.	2604.25317	null
2026-04-28	Hardware Generation and Exploration of Lookup Table-Based Accelerators for 1.58-bit LLM Inference	Robin Geens et.al.	2604.25183	null
2026-04-28	CacheFlow: Efficient LLM Serving with 3D-Parallel KV Cache Restoration	Sean Nian et.al.	2604.25080	null
2026-04-27	PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference	Ishan Patel et.al.	2604.24971	null
2026-04-24	Cloud to Edge: Benchmarking LLM Inference On Hardware-Accelerated Single-Board Computers	Harri Renney et.al.	2604.24785	null
2026-04-27	DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference	Zahra Dehghanighobadi et.al.	2604.24647	null
2026-04-26	JigsawRL: Assembling RL Pipelines for Efficient LLM Post-Training	Zhengding Hu et.al.	2604.23838	null
2026-04-26	Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning	Zichuan Fu et.al.	2604.23623	null
2026-04-25	Hybrid JIT-CUDA Graph Optimization for Low-Latency Large Language Model Inference	Divakar Kumar Yadav et.al.	2604.23467	null
2026-04-25	Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs	Divakar Kumar Yadav et.al.	2604.23466	null
2026-04-25	Small Language Model Helps Resolve Semantic Ambiguity of LLM Prompt	Zhenzhen Huang et.al.	2604.23263	null
2026-04-25	UNSEEN: A Cross-Stack LLM Unlearning Defense against AR-LLM Social Engineering Attacks	Tianlong Yu et.al.	2604.23141	null
2026-04-24	Secure eFPGA-Enabled Edge LLM Inference: Architectural and Hardware Countermeasures	Voktho Das et.al.	2604.22935	null
2026-04-24	Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities	Zhixiong Chen et.al.	2604.22906	null
2026-04-24	Aligning Dense Retrievers with LLM Utility via Distillation	Rajinder Sandhu et.al.	2604.22722	null
2026-04-24	Guess-Verify-Refine: Data-Aware Top-K for Sparse-Attention Decoding on Blackwell via Temporal Correlation	Long Cheng et.al.	2604.22312	null
2026-04-24	An LLM-Driven Closed-Loop Autonomous Learning Framework for Robots Facing Uncovered Tasks in Open Environments	Hong Su et.al.	2604.22199	null
2026-04-23	LayerBoost: Layer-Aware Attention Reduction for Efficient LLMs	Mohamed Ali Souibgui et.al.	2604.22050	null
2026-04-23	Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation	Nikita Severin et.al.	2604.21536	null
2026-04-23	A Task Decomposition and Planning Framework for Efficient LLM Inference in AI-Enabled WiFi-Offload Networks	Mingqi Han et.al.	2604.21399	null
2026-04-23	Spatial Metaphors for LLM Memory: A Critical Analysis of the MemPalace Architecture	Robin Dey et.al.	2604.21284	null
2026-04-23	SparKV: Overhead-Aware KV Cache Loading for Efficient On-Device LLM Inference	Hongyao Liu et.al.	2604.21231	null
2026-04-22	Distributed Generative Inference of LLM at Internet Scales with Multi-Dimensional Communication Optimization	Jiu Chen et.al.	2604.21072	null
2026-04-22	TRACES: Tagging Reasoning Steps for Adaptive Cost-Efficient Early-Stopping	Yannis Belkhiter et.al.	2604.21057	null
2026-04-24	MCAP: Deployment-Time Layer Profiling for Memory-Constrained LLM Inference	Anurita Das et.al.	2604.21026	null
2026-04-22	DiP-SD: Distributed Pipelined Speculative Decoding for Efficient LLM Inference at the Edge	Yaodan Xu et.al.	2604.20919	null
2026-04-22	FairyFuse: Multiplication-Free LLM Inference on CPUs via Fused Ternary Kernels	Fei Zuo et.al.	2604.20913	null
2026-04-22	FASER: Fine-Grained Phase Management for Speculative Decoding in Dynamic LLM Serving	Wenyan Chen et.al.	2604.20503	null
2026-04-22	LLM-guided phase diagram construction through high-throughput experimentation	Ryo Tamura et.al.	2604.20304	null
2026-04-21	Statistics, Not Scale: Modular Medical Dialogue with Bayesian Belief Engine	Yusuf Kesmen et.al.	2604.20022	null
2026-04-21	Continuous Semantic Caching for Low-Cost LLM Serving	Baran Atalar et.al.	2604.20021	null
2026-04-21	Can Continual Pre-training Bridge the Performance Gap between General-purpose and Specialized Language Models in the Medical Domain?	Niclas Doll et.al.	2604.19394	null
2026-04-22	DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing	Jinyu Guo et.al.	2604.19351	null
2026-04-21	LBLLM: Lightweight Binarization of Large Language Models via Three-Stage Distillation	Siqing Song et.al.	2604.19167	null
2026-04-21	SAW-INT4: System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving	Jinda Jia et.al.	2604.19157	null
2026-04-21	DP-FlogTinyLLM: Differentially private federated log anomaly detection using Tiny LLMs	Isaiah Thompson et.al.	2604.19118	null
2026-04-21	Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control	Julian Skifstad et.al.	2604.19018	null
2026-04-20	Efficient Mixture-of-Experts LLM Inference with Apple Silicon NPUs	Afsara Benazir et.al.	2604.18788	null
2026-04-20	GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling	Alireza Dadgarnia et.al.	2604.18556	null
2026-04-20	HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing	Mao Lin et.al.	2604.18529	null
2026-04-20	Learning from Less: Measuring the Effectiveness of RLVR in Low Data and Compute Regimes	Justin Bauer et.al.	2604.18381	null
2026-04-21	DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization	Haokun Lin et.al.	2604.17789	null
2026-04-20	WISV: Wireless-Informed Semantic Verification for Distributed Speculative Decoding in Device-Edge LLM Inference	Zixuan Liu et.al.	2604.17701	null
2026-04-20	MoE-nD: Per-Layer Mixture-of-Experts Routing for Multi-Axis KV Cache Compression	Libo Sun et.al.	2604.17695	null
2026-04-19	SLO-Guard: Crash-Aware, Budget-Consistent Autotuning for SLO-Constrained LLM Serving	Christian Lysenstøen et.al.	2604.17627	null
2026-04-19	Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning	Raman Saparkhan et.al.	2604.17433	null
2026-04-19	Representation-Guided Parameter-Efficient LLM Unlearning	Zeguan Xiao et.al.	2604.17396	null
2026-04-19	Bit-Flip Vulnerability of Shared KV-Cache Blocks in LLM Serving Systems	Yuji Yamamoto et.al.	2604.17249	null
2026-04-18	If Only My CGM Could Speak: A Privacy-Preserving Agent for Question Answering over Continuous Glucose Data	Yanjun Cui et.al.	2604.17133	null
2026-04-18	Open-TQ-Metal: Fused Compressed-Domain Attention for Long-Context LLM Inference on Apple Silicon	Sai Vegasena et.al.	2604.16957	null
2026-04-18	MEMRES: A Memory-Augmented Resolver with Confidence Cascade for Agentic Python Dependency Resolution	Dao Sy Duy Minh et.al.	2604.16941	null
2026-04-17	KAIROS: Stateful, Context-Aware Power-Efficient Agentic Inference Serving	Yichao Yuan et.al.	2604.16682	null
2026-04-17	POLAR: Online Learning for LoRA Adapter Caching and Routing in Edge LLM Serving	Shaoang Li et.al.	2604.16583	null
2026-04-20	Neurosymbolic Repo-level Code Localization	Xiufeng Xu et.al.	2604.16021	null
2026-04-17	Accuracy Is Speed: Towards Long-Context-Aware Routing for Distributed LLM Serving	Takeshi Yoshimura et.al.	2604.15732	null
2026-04-17	Faster LLM Inference via Sequential Monte Carlo	Yahya Emara et.al.	2604.15672	null
2026-04-16	Ragged Paged Attention: A High-Performance and Flexible LLM Inference Kernel for TPU	Jevin Jiang et.al.	2604.15464	null
2026-04-16	The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference	Ranjith Chodavarapu et.al.	2604.15409	null
2026-04-16	From Tokens to Steps: Verification-Aware Speculative Decoding for Efficient Multi-Step Reasoning	Kiran Purohit et.al.	2604.15244	null
2026-04-16	Atropos: Improving Cost-Benefit Trade-off of LLM-based Agents under Self-Consistency with Early Termination and Model Hotswap	Naryeong Kim et.al.	2604.15075	null
2026-04-16	Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter	Ruoyu Qin et.al.	2604.15039	null
2026-04-16	Serving Chain-structured Jobs with Large Memory Footprints with Application to Large Foundation Model Serving	Tingyang Sun et.al.	2604.14993	null
2026-04-16	RACER: Retrieval-Augmented Contextual Rapid Speculative Decoding	Zihong Zhang et.al.	2604.14885	link
2026-04-16	SkillDroid: Compile Once, Reuse Forever	Qijia Chen et.al.	2604.14872	null
2026-04-16	CoTEvol: Self-Evolving Chain-of-Thoughts for Data Synthesis in Mathematical Reasoning	Zhuo Wang et.al.	2604.14768	null
2026-04-16	Acceptance Dynamics Across Cognitive Domains in Speculative Decoding	Saif Mahmoud et.al.	2604.14682	null
2026-04-15	YOCO++: Enhancing YOCO with KV Residual Connections for Efficient LLM Inference	You Wu et.al.	2604.13556	null
2026-04-15	ToolSpec: Accelerating Tool Calling via Schema-Aware and Retrieval-Augmented Speculative Decoding	Heming Xia et.al.	2604.13519	null
2026-04-14	Event Tensor: A Unified Abstraction for Compiling Dynamic Megakernel	Hongyi Jin et.al.	2604.13327	null
2026-04-15	Growing Pains: Extensible and Efficient LLM Benchmarking Via Fixed Parameter Calibration	Eliya Habba et.al.	2604.12843	null
2026-04-14	Three Birds, One Stone: Solving the Communication-Memory-Privacy Trilemma in LLM Fine-tuning Over Wireless Networks with Zeroth-Order Optimization	Zhijie Cai et.al.	2604.12401	null
2026-04-14	oxo-call: Documentation-grounded Skill Augmentation for Accurate Bioinformatics Command-line Generation with Large Language Models	Yun Peng et.al.	2604.12387	null
2026-04-14	LLM-Guided Semantic Bootstrapping for Interpretable Text Classification with Tsetlin Machines	Jiechao Gao et.al.	2604.12223	null
2026-04-14	Beyond Majority Voting: Efficient Best-Of-N with Radial Consensus Score	Manh Nguyen et.al.	2604.12196	null
2026-04-14	PipeLive: Efficient Live In-place Pipeline Parallelism Reconfiguration for Dynamic LLM Serving	Xu Bai et.al.	2604.12171	null
2026-04-14	Fully Homomorphic Encryption on Llama 3 model for privacy preserving LLM inference	Anes Abdennebi et.al.	2604.12168	null
2026-04-13	ProbeLogits: Kernel-Level LLM Inference Primitives for AI-Native Operating Systems	Daeyeon Son et.al.	2604.11943	null
2026-04-13	Discourse Diversity in Multi-Turn Empathic Dialogue	Hongli Zhan et.al.	2604.11742	null
2026-04-13	From Agent Loops to Structured Graphs: A Scheduler-Theoretic Framework for LLM Agent Execution	Hu Wei et.al.	2604.11378	null
2026-04-13	Flow-Controlled Scheduling for LLM Inference with Provable Stability Guarantees	Zhuolun Dong et.al.	2604.11001	null
2026-04-13	When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies	Zhengzhe Yang et.al.	2604.10996	null
2026-04-14	Ro-SLM: Onboard Small Language Models for Robot Task Planning and Operation Code Generation	Wenhao Wang et.al.	2604.10929	null
2026-04-13	RouterWise: Joint Resource Allocation and Routing for Latency-Aware Multi-Model LLM Serving	Hossein Hosseini Kasnavieh et.al.	2604.10907	null
2026-04-12	The xPU-athalon: Quantifying the Competition of AI Acceleration	Alicia Golden et.al.	2604.10852	null
2026-04-11	CodeComp: Structural KV Cache Compression for Agentic Coding	Qiujiang Chen et.al.	2604.10235	null
2026-04-11	WaveTune: Wave-aware Bilinear Modeling for Efficient GPU Kernel Auto-tuning	Kaixuan Zhang et.al.	2604.10187	null
2026-04-11	Rebooting Microreboot: Architectural Support for Safe, Parallel Recovery in Microservice Systems	Laurent Bindschaedler et.al.	2604.09963	null
2026-04-10	Dynamic Ranked List Truncation for Reranking Pipelines via LLM-generated Reference-Documents	Nilanjan Sinhababu et.al.	2604.09492	null
2026-04-10	EdgeFlow: Fast Cold Starts for LLMs on Mobile Devices	Yongsheng Yan et.al.	2604.09083	null
2026-04-10	Watt Counts: Energy-Aware Benchmark for Sustainable LLM Inference on Heterogeneous GPU Architectures	Mauricio Fadel Argerich et.al.	2604.09048	null
2026-04-09	Dead Weights, Live Signals: Feedforward Graphs of Frozen Language Models	Marcus Armstrong et.al.	2604.08335	null
2026-04-09	Dual-Pool Token-Budget Routing for Cost-Efficient and Reliable LLM Serving	Xunzhuo Liu et.al.	2604.08075	null
2026-04-09	Automating aggregation strategy selection in federated learning	Dian S. Y. Pang et.al.	2604.08056	null
2026-04-09	A Full-Stack Performance Evaluation Infrastructure for 3D-DRAM-based LLM Accelerators	Cong Li et.al.	2604.08044	null
2026-04-09	Open-Ended Instruction Realization with LLM-Enabled Multi-Planner Scheduling in Autonomous Vehicles	Jiawei Liu et.al.	2604.08031	null
2026-04-09	LogAct: Enabling Agentic Reliability via Shared Logs	Mahesh Balakrishnan et.al.	2604.07988	null
2026-04-09	Robust Length Prediction: A Perspective from Heavy-Tailed Prompt-Conditioned Distributions	Jing Wang et.al.	2604.07931	null
2026-04-09	Valve: Production Online-Offline Inference Colocation with Jointly-Bounded Preemption Latency and Rate	Fangyue Liu et.al.	2604.07874	null
2026-04-09	AsyncTLS: Efficient Generative LLM Inference with Asynchronous Two-level Sparse Attention	Yuxuan Hu et.al.	2604.07815	null
2026-04-09	SAGE: Sign-Adaptive Gradient for Memory-Efficient LLM Optimization	Wooin Lee et.al.	2604.07663	null
2026-04-08	DIVERSED: Relaxed Speculative Decoding via Dynamic Ensemble Verification	Ziyi Wang et.al.	2604.07622	null
2026-04-08	Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC	Mohammad Siavashi et.al.	2604.07609	null
2026-04-08	Fast Heterogeneous Serving: Scalable Mixed-Scale LLM Allocation for SLO-Constrained Inference	Jiaming Cheng et.al.	2604.07472	null
2026-04-08	SHIELD: A Segmented Hierarchical Memory Architecture for Energy-Efficient LLM Inference on Edge NPUs	Jintao Zhang et.al.	2604.07396	null
2026-04-08	Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference	Quantong Qiu et.al.	2604.07394	null
2026-04-08	Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics	Youhe Jiang et.al.	2604.07144	null
2026-04-08	Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale	Renzhong Yuan et.al.	2604.06970	null
2026-04-09	TurboAgent: An LLM-Driven Autonomous Multi-Agent Framework for Turbomachinery Aerodynamic Design	Juan Du et.al.	2604.06747	null
2026-04-08	Foundry: Template-Based CUDA Graph Context Materialization for Fast LLM Serving Cold Start	Xueshen Liu et.al.	2604.06664	null
2026-04-07	Inference-Time Code Selection via Symbolic Equivalence Partitioning	David Cho et.al.	2604.06485	null
2026-04-07	HybridKV: Hybrid KV Cache Compression for Efficient Multimodal Large Language Model Inference	Bowen Zeng et.al.	2604.05887	null
2026-04-07	Vision-Guided Iterative Refinement for Frontend Code Generation	Hannah Sansford et.al.	2604.05839	null
2026-04-07	CAKE: Cloud Architecture Knowledge Evaluation of Large Language Models	Tim Lukas Adam et.al.	2604.05755	null
2026-04-07	Attention Editing: A Versatile Framework for Cross-Architecture Attention Conversion	Zhen Cheng et.al.	2604.05688	null
2026-04-07	Multi-Drafter Speculative Decoding with Alignment Feedback	Taehyeon Kim et.al.	2604.05417	null
2026-04-07	DAT: Dual-Aware Adaptive Transmission for Efficient Multimodal LLM Inference in Edge-Cloud Systems	Qi Guo et.al.	2604.05375	null
2026-04-06	Comparative Characterization of KV Cache Management Strategies for LLM Inference	Oteo Mamo et.al.	2604.05012	null
2026-04-06	GPU Acceleration of TFHE-Based High-Precision Nonlinear Layers for Encrypted LLM Inference	Guoci Chen et.al.	2604.04783	null
2026-04-06	DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators	Zhiwen Mo et.al.	2604.04750	null
2026-04-06	Don’t Waste Bits! Adaptive KV-Cache Quantization for Lightweight On-Device LLMs	Sayed Pedram Haeri Boroujeni et.al.	2604.04722	null
2026-04-06	MUXQ: Mixed-to-Uniform Precision MatriX Quantization via Low-Rank Outlier Decomposition	Seoungsub Lee et.al.	2604.04701	null
2026-04-06	Communication-Efficient Collaborative LLM Inference over LEO Satellite Networks	Songge Zhang et.al.	2604.04654	null
2026-04-03	Hume’s Representational Conditions for Causal Judgment: What Bayesian Formalization Abstracted Away	Yiling Wu et.al.	2604.03387	null
2026-04-03	TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing	Zhuohang Bian et.al.	2604.03143	null
2026-04-03	Prompt Compression in the Wild: Measuring Latency, Rate Adherence, and Quality for Faster LLM Inference	Cornelius Kummer et.al.	2604.02985	null
2026-04-03	MSAO: Adaptive Modality Sparsity-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference	Zheming Yang et.al.	2604.02945	null
2026-04-02	Fast NF4 Dequantization Kernels for Large Language Model Inference	Xiangbo Qi et.al.	2604.02556	null
2026-04-02	OmniTQA: A Cost-Aware System for Hybrid Query Processing over Semi-Structured Data	Nima Shahbazi et.al.	2604.02444	null
2026-04-02	SelRoute: Query-Type-Aware Routing for Long-Term Conversational Memory Retrieval	Matthew McKee et.al.	2604.02431	null
2026-04-02	Goose: Anisotropic Speculation Trees for Training-Free Speculative Decoding	Tao Jin et.al.	2604.02047	null
2026-04-02	DWDP: Distributed Weight Data Parallelism for High-Performance LLM Inference on NVL72	Wanqian Li et.al.	2604.01621	null
2026-04-01	Fast and Accurate Probing of In-Training LLMs’ Downstream Performances	Zhichen Liu et.al.	2604.01025	null
2026-04-01	Learning from Many and Adapting to the Unknown in Open-set Test Streams	Xiao Zhang et.al.	2604.00533	null
2026-04-01	Scheduling LLM Inference with Uncertainty-Aware Output Length Predictions	Haoyu Zheng et.al.	2604.00499	null
2026-04-01	TENT: A Declarative Slice Spraying Engine for Performant and Resilient Data Movement in Disaggregated LLM Serving	Feng Ren et.al.	2604.00368	null
2026-03-31	ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving	Annette Taberner-Miller et.al.	2604.00136	null
2026-03-30	Understand and Accelerate Memory Processing Pipeline for Disaggregated LLM Inference	Zifan He et.al.	2603.29002	null
2026-03-24	StepCache: Step-Level Reuse with Lightweight Verification and Selective Patching for LLM Serving	Azam Nouri et.al.	2603.28795	null
2026-03-30	A Techno-Economic Framework for Cost Modeling and Revenue Opportunities in Open and Programmable AI-RAN	Gabriele Gemmi et.al.	2603.28680	null
2026-03-30	Tiered Super-Moore’s Law: Price Evolution, Production Frontiers, and Market Competition in Large Language Model Inference Services	Mingdeng Du et.al.	2603.28576	null
2026-03-31	A Switch-Centric In-Network Architecture for Accelerating LLM Inference in Shared-Memory Network	Aojie Jiang et.al.	2603.28239	null
2026-03-31	ITQ3_S: High-Fidelity 3-bit LLM Inference via Interleaved Ternary Quantization with Rotation-Domain Smoothing	Edward J. Yoon et.al.	2603.27914	null
2026-03-29	KVSculpt: KV Cache Compression as Distillation	Bo Jiang et.al.	2603.27819	null
2026-03-28	From Inference Routing to Agent Orchestration: Declarative Policy Compilation with Cross-Layer Verification	Huamin Chen et.al.	2603.27299	null
2026-03-28	ScoutAttention: Efficient KV Cache Offloading via Layer-Ahead CPU Pre-computation for LLM Inference	Qiuyang Zhang et.al.	2603.27138	null
2026-03-27	MemBoost: A Memory-Boosted Framework for Cost-Aware LLM Inference	Joris Köster et.al.	2603.26557	null
2026-03-27	Rocks, Pebbles and Sand: Modality-aware Scheduling for Multimodal Large Language Model Inference	Konstantinos Papaioannou et.al.	2603.26498	null
2026-03-27	AgentCollab: A Self-Evaluation-Driven Collaboration Paradigm for Efficient LLM Agents	Wenbo Gao et.al.	2603.26034	null
2026-03-26	Supercharging Federated Intelligence Retrieval	Dimitris Stripelis et.al.	2603.25374	null
2026-03-26	Interpretable Zero-shot Referring Expression Comprehension with Query-driven Scene Graphs	Yike Wu et.al.	2603.25004	null
2026-03-25	LATS: Large Language Model Assisted Teacher-Student Framework for Multi-Agent Reinforcement Learning in Traffic Signal Control	Yifeng Zhang et.al.	2603.24361	null
2026-03-25	Self-Distillation for Multi-Token Prediction	Guoliang Zhao et.al.	2603.23911	null
2026-03-24	The Diminishing Returns of Early-Exit Decoding in Modern LLMs	Rui Wei et.al.	2603.23701	null
2026-03-24	Energy Efficient Software Hardware CoDesign for Machine Learning: From TinyML to Large Language Models	Mohammad Saleh Vahdatpour et.al.	2603.23668	null
2026-03-24	LLM Inference at the Edge: Mobile, NPU, and GPU Performance Efficiency Trade-offs Under Sustained Load	Pranay Tummalapalli et.al.	2603.23640	null
2026-03-24	Sparser, Faster, Lighter Transformer Language Models	Edoardo Cetin et.al.	2603.23198	null
2026-03-24	Characterizing CPU-Induced Slowdowns in Multi-GPU LLM Inference	Euijun Chung et.al.	2603.22774	null
2026-03-23	Chimera: Latency- and Performance-Aware Multi-agent Serving for Heterogeneous LLMs	Kangqi Ni et.al.	2603.22206	null
2026-03-23	GSEM: Graph-based Self-Evolving Memory for Experience Augmented Clinical Reasoning	Xiao Han et.al.	2603.22096	null
2026-03-23	CurvZO: Adaptive Curvature-Guided Sparse Zeroth-Order Optimization for Efficient LLM Fine-Tuning	Shuo Wang et.al.	2603.21725	null
2026-03-25	PRISM: Breaking the O(n) Memory Wall in Long-Context LLM Inference via O(1) Photonic Block Selection	Hyoseok Park et.al.	2603.21576	null
2026-03-22	TIDE: Token-Informed Depth Execution for Per-Token Early Exit in LLM Inference	Jaber Jaber et.al.	2603.21365	null
2026-03-22	The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project	Huamin Chen et.al.	2603.21354	null
2026-03-22	Improving Coherence and Persistence in Agentic AI for System Optimization	Pantea Karimi et.al.	2603.21321	null
2026-03-22	CALVO: Improve Serving Efficiency for LLM Inferences with Intense Network Demands	Weiye Wang et.al.	2603.21257	null
2026-03-22	Can LLMs Fool Graph Learning? Exploring Universal Adversarial Attacks on Text-Attributed Graphs	Zihui Chen et.al.	2603.21155	null
2026-03-24	WWW.Serve: Interconnecting Global LLM Services through Decentralization	Huanyu Wang et.al.	2603.20661	null
2026-03-20	KV Cache Optimization Strategies for Scalable and Efficient LLM Inference	Yichun Xu et.al.	2603.20397	null
2026-03-20	Utility-Guided Agent Orchestration for Efficient LLM Tool Use	Boyan Liu et.al.	2603.19896	null
2026-03-20	Stepwise: Neuro-Symbolic Proof Search for Automated Systems Verification	Baoding He et.al.	2603.19715	null
2026-03-20	HyEvo: Self-Evolving Hybrid Agentic Workflows for Efficient Reasoning	Beibei Xu et.al.	2603.19639	null
2026-03-19	A Pipelined Collaborative Speculative Decoding Framework for Efficient Edge-Cloud LLM Inference	Yida Zhang et.al.	2603.19133	null
2026-03-19	BeamAgent: LLM-Aided MIMO Beamforming with Decoupled Intent Parsing and Alternating Optimization for Joint Site Selection and Precoding	Xiucheng Wang et.al.	2603.18855	null
2026-03-19	From Servers to Sites: Compositional Power Trace Generation of LLM Inference for Infrastructure Planning	Grant Wilkins et.al.	2603.18383	null
2026-03-18	Conflict-Free Policy Languages for Probabilistic ML Predicates: A Framework and Case Study with the Semantic Router DSL	Xunzhuo Liu et.al.	2603.18174	null
2026-03-18	Multi-Trait Subspace Steering to Reveal the Dark Side of Human-AI Interaction	Xin Wei Chia et.al.	2603.18085	null
2026-03-17	NANOZK: Layerwise Zero-Knowledge Proofs for Verifiable Large Language Model Inference	Zhaohui Geoffrey Wang et.al.	2603.18046	null
2026-03-18	RAMP: Reinforcement Adaptive Mixed Precision Quantization for Efficient On Device LLM Inference	Arpit Singh Gautam et.al.	2603.17891	null
2026-03-18	Swarm: Co-Activation Aware KVCache Offloading Across Multiple SSDs	Tuowei Wang et.al.	2603.17803	null
2026-03-18	Multi-stage Flow Scheduling for LLM Serving	Yijun Sun et.al.	2603.17456	null
2026-03-18	ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression	Ruibo Fan et.al.	2603.17435	null
2026-03-18	OmniVLN: Omnidirectional 3D Perception and Token-Efficient LLM Reasoning for Visual-Language Navigation across Air and Ground Platforms	Zhongyuang Liu et.al.	2603.17351	null
2026-03-18	IEMAS: An Incentive-Efficiency Routing Framework for Open Agentic Web Ecosystems	Hongze Liu et.al.	2603.17302	null
2026-03-18	The 1/W Law: An Analytical Study of Context-Length Routing Topology and GPU Generation Gains for LLM Inference Energy Efficiency	Huamin Chen et.al.	2603.17280	null
2026-03-17	An End-to-End Framework for Functionality-Embedded Provenance Graph Construction and Threat Interpretation	Kushankur Ghosh et.al.	2603.17100	null
2026-03-17	FleetOpt: Analytical Fleet Provisioning for LLM Inference with Compress-and-Route as Implementation Mechanism	Huamin Chen et.al.	2603.16514	null
2026-03-17	Efficient LLM Serving for Agentic Workflows: A Data Systems Perspective	Noppanat Wadlom et.al.	2603.16104	null
2026-03-18	Resource Consumption Threats in Large Language Models	Yuanhe Zhang et.al.	2603.16068	null
2026-03-17	inference-fleet-sim: A Queueing-Theory-Grounded Fleet Capacity Planner for LLM Inference	Huamin Chen et.al.	2603.16054	null
2026-03-16	BANGLASOCIALBENCH: A Benchmark for Evaluating Sociopragmatic and Cultural Alignment of LLMs in Bangladeshi Social Interaction	Tanvir Ahmed Sijan et.al.	2603.15949	null
2026-03-16	SFCoT: Safer Chain-of-Thought via Active Safety Evaluation and Calibration	Yu Pan et.al.	2603.15397	null
2026-03-16	SkipOPU: An FPGA-based Overlay Processor for Large Language Models with Dynamically Allocated Computation	Zicheng He et.al.	2603.14785	null
2026-03-16	AgentTrace: Causal Graph Tracing for Root Cause Analysis in Deployed Multi-Agent Systems	Zhaohui Geoffrey Wang et.al.	2603.14688	null
2026-03-15	Governing Dynamic Capabilities: Cryptographic Binding and Reproducibility Verification for AI Agent Tool Use	Ziling Zhou et.al.	2603.14332	null
2026-03-14	SVD Contextual Sparsity Predictors for Fast LLM Inference	Georgii Serbin et.al.	2603.14110	null
2026-03-17	APEX-Searcher: Augmenting LLMs’ Search Capabilities through Agentic Planning and Execution	Kun Chen et.al.	2603.13853	null
2026-03-14	Retrieval-Feedback-Driven Distillation and Preference Alignment for Efficient LLM-based Query Expansion	Minghan Li et.al.	2603.13776	null
2026-03-13	Orla: A Library for Serving LLM-Based Multi-Agent Systems	Rana Shahout et.al.	2603.13605	null
2026-03-13	Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference	Huamin Chen et.al.	2603.13426	null
2026-03-17	Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking	Zizhao Mo et.al.	2603.12831	null
2026-03-13	Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation	Yichen Zhang et.al.	2603.12793	null
2026-03-13	ToolTree: Efficient LLM Agent Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning	Shuo Yang et.al.	2603.12740	null
2026-03-13	Cost-Efficient Multimodal LLM Inference via Cross-Tier GPU Heterogeneity	Donglin Yu et.al.	2603.12707	null
2026-03-13	98× Faster LLM Routing Without a Dedicated GPU: Flash Attention, Prompt Compression, and Near-Streaming for the vLLM Semantic Router	Xunzhuo Liu et.al.	2603.12646	null
2026-03-13	When Drafts Evolve: Speculative Decoding Meets Online Learning	Yu-Yang Qian et.al.	2603.12617	null
2026-03-12	TaxBreak: Unmasking the Hidden Costs of LLM Inference Through Overhead Decomposition	Prabhu Vellaisamy et.al.	2603.12465	null
2026-03-10	Detecting Miscitation on the Scholarly Web through LLM-Augmented Text-Rich Graph Learning	Huidong Wu et.al.	2603.12290	null
2026-03-12	IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL	Zhoujun Cheng et.al.	2603.12151	null
2026-03-12	Where Matters More Than What: Decoding-aligned KV Cache Compression via Position-aware Pseudo Queries	Zhenxu Tian et.al.	2603.11564	null
2026-03-11	Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI	Yonas Atinafu et.al.	2603.11340	null
2026-03-11	Markovian Generation Chains in Large Language Models	Mingmeng Geng et.al.	2603.11228	null
2026-03-11	Leech Lattice Vector Quantization for Efficient LLM Compression	Tycho F. A. van der Ouderaa et.al.	2603.11021	null
2026-03-11	CacheSolidarity: Preventing Prefix Caching Side Channels in Multi-tenant LLM Serving Systems	Panagiotis Georgios Pennas et.al.	2603.10726	null
2026-03-11	S-HPLB: Efficient LLM Attention Serving via Sparsity-Aware Head Parallelism Load Balance	Di Liu et.al.	2603.10353	null
2026-03-11	MultiwayPAM: Multiway Partitioning Around Medoids for LLM-as-a-Judge Score Analysis	Chihiro Watanabe et.al.	2603.10287	null
2026-03-10	ESAinsTOD: A Unified End-to-End Schema-Aware Instruction-Tuning Framework for Task-Oriented Dialog Modeling	Dechuan Teng et.al.	2603.09691	null
2026-03-10	Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation	Luxi Lin et.al.	2603.09527	null
2026-03-10	PIM-SHERPA: Software Method for On-device LLM Inference by Resolving PIM Memory Attribute and Layout Inconsistencies	Sunjung Lee et.al.	2603.09216	null
2026-03-10	FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation	Yinpeng Wu et.al.	2603.09046	null
2026-03-09	Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought Reasoning	Juming Xiong et.al.	2603.08999	null
2026-03-09	ConFu: Contemplate the Future for Better Speculative Sampling	Zongyue Qin et.al.	2603.08899	null
2026-03-07	Turn: A Language for Agentic Computation	Muyukani Kizito et.al.	2603.08755	null
2026-03-09	SERQ: Saliency-Aware Low-Rank Error Reconstruction for LLM Quantization	Yeonsik Park et.al.	2603.08185	null
2026-03-09	EAGLE-Pangu: Accelerator-Safe Tree Speculative Decoding on Ascend NPUs	Chang Han et.al.	2603.08088	null
2026-03-09	Deterministic Differentiable Structured Pruning for Large Language Models	Weiyu Huang et.al.	2603.08065	null
2026-03-09	DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention	Younjoo Lee et.al.	2603.08026	null
2026-03-09	Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization	Jingwei Li et.al.	2603.08022	null
2026-03-09	SageSched: Efficient LLM Scheduling Confronting Demand Uncertainty and Hybridity	Zhenghao Gan et.al.	2603.07917	null
2026-03-09	Ares: Adaptive Reasoning Effort Selection for Efficient LLM Agents	Jingbo Yang et.al.	2603.07915	null
2026-03-08	Temperature-Aware Scheduling of LLM Inference in Large-Scale Geo-Distributed Edge Data Centers with Distributed Optimization	Arash Khalatbarisoltani et.al.	2603.07810	null
2026-03-08	ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs	Yuzhuang Xu et.al.	2603.07770	null
2026-03-06	MoEless: Efficient MoE LLM Serving via Serverless Computing	Hanfei Yu et.al.	2603.06350	null
2026-03-06	LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis	Tao Zhang et.al.	2603.05904	null
2026-03-06	Confidence Before Answering: A Paradigm Shift for Efficient LLM Uncertainty Estimation	Changcheng Li et.al.	2603.05881	null
2026-03-05	Parallelization Strategies for Dense LLM Deployment: Navigating Through Application-Specific Tradeoffs and Bottlenecks	Burak Topcu et.al.	2603.05692	null
2026-03-05	POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation	Zeju Qiu et.al.	2603.05500	null
2026-03-05	Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity	Di Zhang et.al.	2603.05168	null
2026-03-05	Beyond the Context Window: A Cost-Performance Analysis of Fact-Based Memory vs. Long-Context LLMs for Persistent Agents	Natchanon Pollertlam et.al.	2603.04814	null
2026-03-05	Hardware-Software Co-design for 3D-DRAM-based LLM Serving Accelerator	Cong Li et.al.	2603.04797	null
2026-03-05	SLO-Aware Compute Resource Allocation for Prefill-Decode Disaggregated LLM Inference	Luchang Li et.al.	2603.04716	null
2026-03-04	Agentics 2.0: Logical Transduction Algebra for Agentic Data Workflows	Alfio Massimiliano Gliozzo et.al.	2603.04241	null
2026-03-04	A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality	Arther Tian et.al.	2603.04028	null
2026-03-03	From Language to Action: Can LLM-Based Agents Be Used for Embodied Robot Cognition?	Shinas Shaji et.al.	2603.03148	null
2026-03-03	SEALing the Gap: A Reference Framework for LLM Inference Carbon Estimation via Multi-Benchmark Driven Embodiment	Priyavanshi Pathania et.al.	2603.02949	null
2026-03-03	Agentic Self-Evolutionary Replanning for Embodied Navigation	Guoliang Li et.al.	2603.02772	null
2026-03-03	Ouroboros: Wafer-Scale SRAM CIM with Token-Grained Pipelining for Large Language Model Inference	Yiqi Liu et.al.	2603.02737	null
2026-03-03	SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving	Sunghyeon Woo et.al.	2603.02599	null
2026-03-02	Beyond Microservices: Testing Web-Scale RCA Methods on GPU-Driven LLM Workloads	Dominik Scheinert et.al.	2603.02057	null
2026-03-02	Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning	Jiebin Zhang et.al.	2603.01639	null
2026-03-02	Graph-Based Self-Healing Tool Routing for Cost-Efficient LLM Agents	Neeraj Bholani et.al.	2603.01548	null
2026-03-02	Towards Privacy-Preserving LLM Inference via Collaborative Obfuscation (Technical Report)	Yu Lin et.al.	2603.01499	null
2026-03-02	Agentic Multi-Source Grounding for Enhanced Query Intent Understanding: A DoorDash Case Study	Emmanuel Aboah Boateng et.al.	2603.01486	null
2026-03-02	SFCo-Nav: Efficient Zero-Shot Visual Language Navigation via Collaboration of Slow LLM and Fast Attributed Graph Alignment	Chaoran Xiong et.al.	2603.01477	null
2026-03-02	Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification	Guang Huang et.al.	2603.01399	null
2026-02-27	Data Driven Optimization of GPU efficiency for Distributed LLM Adapter Serving	Ferran Agullo et.al.	2602.24044	null
2026-02-27	LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding	Alexander Samarin et.al.	2602.23881	null
2026-02-27	SLA-Aware Distributed LLM Inference Across Device-RAN-Cloud	Hariz Yet et.al.	2602.23722	null
2026-02-26	Discourse-Aware Dual-Track Streaming Response for Low-Latency Spoken Dialogue Systems	Siyuan Liu et.al.	2602.23266	null
2026-02-26	LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure	Jaehong Cho et.al.	2602.23036	null
2026-02-26	Accelerating Local LLMs on Resource-Constrained Edge Devices via Distributed Prompt Caching	Hiroki Matsutani et.al.	2602.22812	null
2026-02-26	Accelerating LLM Pre-Training through Flat-Direction Dynamics Enhancement	Shuchen Zhu et.al.	2602.22681	null
2026-02-26	Compress the Easy, Explore the Hard: Difficulty-Aware Entropy Regularization for Efficient LLM Reasoning	Qin-Wen Luo et.al.	2602.22642	null
2026-03-02	FLYING SERVING: On-the-Fly Parallelism Switching for Large Language Model Serving	Shouwei Gao et.al.	2602.22593	null
2026-02-25	AutoQRA: Joint Optimization of Mixed-Precision Quantization and Low-rank Adapters for Efficient LLM Fine-Tuning	Changhai Zhou et.al.	2602.22268	null
2026-02-25	Sustainable LLM Inference using Context-Aware Model Switching	Yuvarani et.al.	2602.22261	null
2026-02-25	Small Wins Big: Comparing Large Language Models and Domain Fine-Tuned Models for Sarcasm Detection in Code-Mixed Hinglish Text	Bitan Majumder et.al.	2602.21933	null
2026-02-25	Multi-Layer Scheduling for MoE-Based LLM Reasoning	Yifan Sun et.al.	2602.21626	null
2026-02-26	DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference	Yongtong Wu et.al.	2602.21548	null
2026-02-25	Pancake: Hierarchical Memory System for Multi-Agent LLM Serving	Zhengding Hu et.al.	2602.21477	null
2026-02-24	SymTorch: A Framework for Symbolic Distillation of Deep Neural Networks	Elizabeth S. Z. Tan et.al.	2602.21307	null
2026-02-24	ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments	Haley Li et.al.	2602.21140	null
2026-02-24	CHESS: Context-aware Hierarchical Efficient Semantic Selection for Long-Context LLM Inference	Chao Fei et.al.	2602.20732	null
2026-02-24	OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services	Longxiang Wang et.al.	2602.20595	null
2026-02-24	FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill	Rakshith Jayanth et.al.	2602.20515	null
2026-02-23	KnapSpec: Self-Speculative Decoding via Adaptive Layer Selection as a Knapsack Problem	Seongjin Cha et.al.	2602.20217	null
2026-02-21	MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Elastic LLMs	Dongwei Wang et.al.	2602.20191	null
2026-02-23	ISO-Bench: Can Coding Agents Optimize Real-World Inference Workloads?	Ayush Nangia et.al.	2602.19594	null
2026-02-22	A Power Market Model with Hyperscalers and Modular Datacenters	Yihsu Chen et.al.	2602.19310	null
2026-02-22	Scaling Inference-Time Computation via Opponent Simulation: Enabling Online Strategic Adaptation in Repeated Negotiation	Xiangyu Liu et.al.	2602.19309	null
2026-02-21	WANSpec: Leveraging Global Compute Capacity for LLM Inference	Noah Martin et.al.	2602.18931	null
2026-02-25	BiScale: Energy-Efficient Disaggregated LLM Serving via Phase-Aware Placement and DVFS	Omar Basit et.al.	2602.18755	null
2026-02-21	HillInfer: Efficient Long-Context LLM Inference on the Edge with Hierarchical KV Eviction using SmartSSD	He Sun et.al.	2602.18750	null
2026-02-24	RPU – A Reasoning Processing Unit	Matthew Adiletta et.al.	2602.18568	null
2026-02-20	Dual-Tree LLM-Enhanced Negative Sampling for Implicit Collaborative Filtering	Jiayi Wu et.al.	2602.18249	null
2026-02-24	MASPO: Unifying Gradient Utilization, Probability Mass, and Signal Reliability for Robust and Sample-Efficient LLM Reasoning	Xiaoliang Fu et.al.	2602.17550	null
2026-02-19	Privacy-Preserving Mechanisms Enable Cheap Verifiable Inference of LLMs	Arka Pal et.al.	2602.17223	null
2026-02-18	Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark	Charalampos Mastrokostas et.al.	2602.16811	null
2026-02-18	Privacy-Aware Split Inference with Speculative Decoding for Large Language Models over Wide-Area Networks	Michael Cunningham et.al.	2602.16760	null
2026-02-18	FlowPrefill: Decoupling Preemption from Prefill Scheduling Granularity to Mitigate Head-of-Line Blocking in LLM Serving	Chia-chi Hsieh et.al.	2602.16603	null
2026-02-18	LLM-Driven Intent-Based Privacy-Aware Orchestration Across the Cloud-Edge Continuum	Zijie Su et.al.	2602.16100	null
2026-02-17	CLAA: Cross-Layer Attention Aggregation for Accelerating LLM Prefill	Bradley McDanel et.al.	2602.16054	null
2026-02-17	MoE-Spec: Expert Budgeting for Efficient Speculative Decoding	Bradley McDanel et.al.	2602.16052	null
2026-02-17	Learning to Retrieve Navigable Candidates for Efficient Vision-and-Language Navigation	Shutian Gu et.al.	2602.15724	null
2026-02-17	LLM-as-Judge on a Budget	Aadirupa Saha et.al.	2602.15481	null
2026-02-16	Text Style Transfer with Parameter-efficient LLM Finetuning and Round-trip Translation	Ruoxi Liu et.al.	2602.15013	null
2026-02-16	Efficient Multi-round LLM Inference over Disaggregated Serving	Wenhao He et.al.	2602.14516	null
2026-02-16	WiSparse: Boosting LLM Inference Efficiency with Weight-Aware Mixed Activation Sparsity	Lei Chen et.al.	2602.14452	null
2026-02-15	HiVid: LLM-Guided Video Saliency For Content-Aware VOD And Live Streaming	Jiahui Chen et.al.	2602.14214	null
2026-02-14	ThunderAgent: A Simple, Fast and Program-Aware Agentic Inference System	Hao Kang et.al.	2602.13692	null
2026-02-13	Characterize LSM-tree Compaction Performance via On-Device LLM Inference	Jiabiao Ding et.al.	2602.12669	null
2026-02-13	Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats	Pengxiang Zhao et.al.	2602.12635	null
2026-02-13	TensorCommitments: A Lightweight Verifiable Inference for Language Models	Oguzhan Baser et.al.	2602.12630	null
2026-02-12	OServe: Accelerating LLM Serving via Spatial-Temporal Workload Orchestration	Youhe Jiang et.al.	2602.12151	null
2026-02-12	PrefillShare: A Shared Prefill Module for KV Reuse in Multi-LLM Disaggregated Serving	Sunghyeon Woo et.al.	2602.12029	null
2026-02-12	Predicting LLM Output Length via Entropy-Guided Representations	Huanyi Xie et.al.	2602.11812	null
2026-02-12	Deep Kernel Fusion for Transformers	Zixi Zhang et.al.	2602.11808	null
2026-02-12	GORGO: Maximizing KV-Cache Reuse While Minimizing Network Latency in Cross-Region LLM Load Balancing	Alessio Ricci Toniolo et.al.	2602.11688	null
2026-02-12	LoRA-based Parameter-Efficient LLMs for Continuous Learning in Edge-based Malware Detection	Christian Rondanini et.al.	2602.11655	null
2026-02-12	PASCAL: A Phase-Aware Scheduling Algorithm for Serving Reasoning-based Large Language Models	Eunyeong Cho et.al.	2602.11530	null
2026-02-12	PAM: Processing Across Memory Hierarchy for Efficient KV-centric LLM Serving System	Lian Liu et.al.	2602.11521	null
2026-02-12	Differentially Private and Communication Efficient Large Language Model Split Inference via Stochastic Quantization and Soft Prompt	Yujie Gu et.al.	2602.11513	null
2026-02-12	Cachemir: Fully Homomorphic Encrypted Inference of Generative Large Language Model with KV Cache	Ye Yu et.al.	2602.11470	null
2026-02-12	FormalJudge: A Neuro-Symbolic Paradigm for Agentic Oversight	Jiayi Zhou et.al.	2602.11136	null
2026-02-11	Vulnerabilities in Partial TEE-Shielded LLM Inference with Precomputed Noise	Abhishek Saini et.al.	2602.11088	null
2026-02-11	BOute: Cost-Efficient LLM Serving with Heterogeneous LLMs and GPUs via Multi-Objective Bayesian Optimization	Youhe Jiang et.al.	2602.10729	null
2026-02-12	S-GRec: Personalized Semantic-Aware Generative Recommendation with Asymmetric Advantage	Jie Jiang et.al.	2602.10606	null
2026-02-12	QTALE: Quantization-Robust Token-Adaptive Layer Execution for LLMs	Kanghyun Noh et.al.	2602.10431	null
2026-02-10	Beyond SMILES: Evaluating Agentic Systems for Drug Discovery	Edward Wijaya et.al.	2602.10163	null
2026-02-12	Internalizing Multi-Agent Reasoning for Accurate and Efficient LLM-based Recommendation	Yang Wu et.al.	2602.09829	null
2026-02-12	Efficient Remote Prefix Fetching with GPU-native Media ASICs	Liang Mi et.al.	2602.09725	null
2026-02-10	MATA: Multi-Agent Framework for Reliable and Flexible Table Question Answering	Sieun Hyeon et.al.	2602.09642	null
2026-02-10	Rollout-Training Co-Design for Efficient LLM-Based Multi-Agent Reinforcement Learning	Zhida Jiang et.al.	2602.09578	null
2026-02-10	LLM-CoOpt: A Co-Design and Optimization Framework for Efficient LLM Inference on Heterogeneous Platforms	Jie Kong et.al.	2602.09323	null
2026-02-09	PABU: Progress-Aware Belief Update for Efficient LLM Agents	Haitao Jiang et.al.	2602.09138	null
2026-02-09	Benchmarking the Energy Savings with Speculative Decoding Strategies	Rohit Dutta et.al.	2602.09113	null
2026-02-09	FlattenGPT: Depth Compression for Transformer with Layer Flattening	Ruihan Xu et.al.	2602.08858	null
2026-02-09	Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems	Lang Feng et.al.	2602.08847	null
2026-02-09	QUOKA: Query-Oriented KV Selection For Efficient LLM Prefill	Dalton Jones et.al.	2602.08722	null
2026-02-09	Near-Oracle KV Selection via Pre-hoc Sparsity for Long-Context Inference	Yifei Gao et.al.	2602.08329	null
2026-02-10	Compiler-Assisted Speculative Sampling for Accelerated LLM Inference on Heterogeneous Edge Devices	Alejandro Ruiz y Mesa et.al.	2602.08060	null
2026-02-08	Accuracy-Delay Trade-Off in LLM Offloading via Token-Level Uncertainty	Yumin Kim et.al.	2602.07958	null
2026-02-08	MedCoG: Maximizing LLM Inference Density in Medical Reasoning via Meta-Cognitive Regulation	Yu Zhao et.al.	2602.07905	null
2026-02-08	Rethinking Latency Denial-of-Service: Attacking the LLM Serving Framework, Not the Model	Tianyi Wang et.al.	2602.07878	null
2026-02-10	ParisKV: Fast and Drift-Robust KV-Cache Retrieval for Long-Context LLMs	Yanlin Qi et.al.	2602.07721	null
2026-02-07	A Two-Layer Framework for Joint Online Configuration Selection and Admission Control	Owen Shen et.al.	2602.07663	null
2026-02-07	Scout Before You Attend: Sketch-and-Walk Sparse Attention for Efficient LLM Inference	Hoang Anh Duy Le et.al.	2602.07397	null
2026-02-07	Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization	Chong Wang et.al.	2602.07306	null
2026-02-06	SpecAttn: Co-Designing Sparse Attention with Self-Speculative Decoding	Yikang Yue et.al.	2602.07223	null
2026-02-06	When RL Meets Adaptive Speculative Training: A Unified Training-Serving System	Junxiong Wang et.al.	2602.06932	null
2026-02-06	DualMap: Enabling Both Cache Affinity and Load Balancing for Distributed LLM Serving	Ying Yuan et.al.	2602.06502	null
2026-02-06	Do LLMs Act Like Rational Agents? Measuring Belief Coherence in Probabilistic Decision Making	Khurram Yamin et.al.	2602.06286	null
2026-02-06	RoPE-LIME: RoPE-Space Locality + Sparse-K Sampling for Efficient LLM Attribution	Isaac Picov et.al.	2602.06275	null
2026-02-03	PackInfer: Compute- and I/O-Efficient Attention for Batched LLM Inference	Rui Ning et.al.	2602.06072	null
2026-02-05	Towards Green AI: Decoding the Energy of LLM Inference in Software Development	Lola Solovyeva et.al.	2602.05712	null
2026-02-05	Determining Energy Efficiency Sweet Spots in Production LLM Inference	Hiari Pizzini Cavagna et.al.	2602.05695	null
2026-02-05	Optimal Bayesian Stopping for Efficient Inference of Consistent LLM Answers	Jingkai Huang et.al.	2602.05395	null
2026-02-05	RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs	Youngcheon You et.al.	2602.05367	null
2026-02-05	TIDE: Temporal Incremental Draft Engine for Self-Improving LLM Inference	Jiyoung Park et.al.	2602.05145	null
2026-02-04	GPU-to-Grid: Voltage Regulation via GPU Utilization Control	Zhirui Liang et.al.	2602.05116	null
2026-02-04	LinGO: A Linguistic Graph Optimization Framework with LLMs for Interpreting Intents of Online Uncivil Discourse	Yuan Zhang et.al.	2602.04693	null
2026-02-04	Harmonia: Algorithm-Hardware Co-Design for Memory- and Compute-Efficient BFP-based LLM Inference	Xinyu Wang et.al.	2602.04595	null
2026-02-04	LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding	Gang Lin et.al.	2602.04541	null
2026-02-04	Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning	Yansong Ning et.al.	2602.04284	null
2026-02-04	BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models	Junyu Chen et.al.	2602.04163	null
2026-02-03	MeKi: Memory-based Expert Knowledge Injection for Efficient LLM Scaling	Ning Ding et.al.	2602.03359	null
2026-02-03	DynSplit-KV: Dynamic Semantic Splitting for KVCache Compression in Efficient Long-Context LLM Inference	Jiancai Ye et.al.	2602.03184	null
2026-02-03	NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference	Jiangyong Yu et.al.	2602.02988	null
2026-02-03	Large-Scale LLM Inference with Heterogeneous Workloads: Prefill-Decode Contention and Asymptotically Optimal Control	Ruihan Lin et.al.	2602.02987	null
2026-02-03	3D-Learning: Diffusion-Augmented Distributionally Robust Decision-Focused Learning	Jiaqi Wen et.al.	2602.02943	null
2026-02-02	A Single Revision Step Improves Token-Efficient LLM Reasoning	Yingchuan Zhang et.al.	2602.02828	null
2026-02-02	Trust by Design: Skill Profiles for Transparent, Cost-Aware LLM Routing	Mika Okamoto et.al.	2602.02386	null
2026-02-02	Focus-dLLM: Accelerating Long-Context Diffusion LLM Inference via Confidence-Guided Context Focusing	Lingkun Long et.al.	2602.02159	null
2026-02-02	Omni-Judge: Can Omni-LLMs Serve as Human-Aligned Judges for Text-Conditioned Audio-Video Generation?	Susan Liang et.al.	2602.01623	null
2026-02-01	Predictive Scheduling for Efficient Inference-Time Reasoning in Large Language Models	Katrina Brown et.al.	2602.01237	null
2026-02-01	Lotus: Efficient LLM Training by Randomized Low-Rank Gradient Projection with Adaptive Subspace Switching	Tianhao Miao et.al.	2602.01233	null
2026-02-01	A State-Transition Framework for Efficient LLM Reasoning	Liang Zhang et.al.	2602.01198	null
2026-02-01	ReLayout: Versatile and Structure-Preserving Design Layout Editing via Relation-Aware Design Reconstruction	Jiawei Lin et.al.	2602.01046	null
2026-02-01	ESSAM: A Novel Competitive Evolution Strategies Approach to Reinforcement Learning for Memory Efficient LLMs Fine-Tuning	Zhishen Sun et.al.	2602.01003	null
2026-01-31	Sparsity-Aware Unlearning for Large Language Models	Yuze Wang et.al.	2602.00577	null
2026-01-30	Fast Forward: Accelerating LLM Prefill with Predictive FFN Sparsity	Aayush Gautam et.al.	2602.00397	null
2026-01-30	Harvest: Opportunistic Peer-to-Peer GPU Caching for LLM Inference	Nikhil Gopal et.al.	2602.00328	null
2026-01-30	EigenAI: Deterministic Inference, Verifiable Results	David Ribeiro Alves et.al.	2602.00182	null
2026-01-30	Safer Policy Compliance with Dynamic Epistemic Fallback	Joseph Marvin Imperial et.al.	2601.23094	null
2026-01-30	InstructDiff: Domain-Adaptive Data Selection via Differential Entropy for Efficient LLM Fine-Tuning	Junyou Su et.al.	2601.23006	null
2026-01-30	Competitive Non-Clairvoyant KV-Cache Scheduling for LLM Inference	Yiding Feng et.al.	2601.22996	null
2026-01-30	Matterhorn: Efficient Analog Sparse Spiking Transformer Architecture with Masked Time-To-First-Spike Encoding	Zhanglu Yan et.al.	2601.22876	null
2026-01-30	OSNIP: Breaking the Privacy-Utility-Efficiency Trilemma in LLM Inference via Obfuscated Semantic Null Space	Zhiyuan Cao et.al.	2601.22752	null
2026-01-30	CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control	Qiaoling Chen et.al.	2601.22705	null
2026-01-30	Small is Beautiful: A Practical and Efficient Log Parsing Framework	Minxing Wang et.al.	2601.22590	null
2026-01-30	SCaLRec: Semantic Calibration for LLM-enabled Cloud-Device Sequential Recommendation	Ruiqi Zheng et.al.	2601.22543	null
2026-01-30	Towards Resiliency in Large Language Model Serving with KevlarFlow	Shangshu Qian et.al.	2601.22438	null
2026-01-29	Understanding Efficiency: Quantization, Batching, and Serving Strategies in LLM Energy Use	Julien Delavande et.al.	2601.22362	null
2026-01-29	Small Talk, Big Impact: The Energy Cost of Thanking AI	Julien Delavande et.al.	2601.22357	null
2026-01-29	Causal Autoregressive Diffusion Language Model	Junhao Ruan et.al.	2601.22031	null
2026-01-29	A Unified XAI-LLM Approach for Endotracheal Suctioning Activity Recognition	Hoang Khang Phan et.al.	2601.21802	null
2026-01-29	EWSJF: An Adaptive Scheduler with Hybrid Partitioning for Mixed-Workload LLM Inference	Bronislav Sidik et.al.	2601.21758	null
2026-01-29	ScaleSim: Serving Large-Scale Multi-Agent Simulation with Invocation Distance-Based Memory Management	Zaifeng Pan et.al.	2601.21473	null
2026-01-29	Theoretically Optimal Attention/FFN Ratios in Disaggregated LLM Serving	Chendong Song et.al.	2601.21351	null
2026-01-29	Adaptive and Robust Cost-Aware Proof of Quality for Decentralized LLM Inference Networks	Arther Tian et.al.	2601.21189	null
2026-01-28	ChunkWise LoRA: Adaptive Sequence Partitioning for Memory-Efficient Low-Rank Adaptation and Accelerated LLM Inference	Ketan Thakkar et.al.	2601.21109	null
2026-01-29	ProfInfer: An eBPF-based Fine-Grained LLM Inference Profiler	Bohua Zou et.al.	2601.20755	null
2026-01-29	DRAINCODE: Stealthy Energy Consumption Attacks on Retrieval-Augmented Code Generation via Context Poisoning	Yanlin Wang et.al.	2601.20615	null
2026-01-28	TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs	Minjae Lee et.al.	2601.20357	null
2026-01-28	Beyond Speedup – Utilizing KV Cache for Sampling and Reasoning	Zeyu Xing et.al.	2601.20326	null
2026-01-28	SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips	Jiahuan Yu et.al.	2601.20309	null
2026-01-28	LogSieve: Task-Aware CI Log Reduction for Sustainable LLM-Based Analysis	Marcus Emmanuel Barnes et.al.	2601.20148	null
2026-01-27	Identifying and Transferring Reasoning-Critical Neurons: Improving LLM Inference Reliability via Activation Steering	Fangan Dong et.al.	2601.19847	null
2026-01-27	Algorithmic Prompt-Augmentation for Efficient LLM-Based Heuristic Design for A* Search	Thomas Bömer et.al.	2601.19622	null
2026-01-29	PROTEUS: SLA-Aware Routing via Lagrangian RL for Multi-LLM Serving Systems	Amit Singh Bhatti et.al.	2601.19402	null
2026-01-27	DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference	Fuliang Liu et.al.	2601.19278	null
2026-01-29	Native LLM and MLLM Inference at Scale on Apple Silicon	Wayner Barrios et.al.	2601.19139	null
2026-01-26	Randomization Boosts KV Caching, Learning Balances Query Load: A Joint Perspective	Fangzhou Wu et.al.	2601.18999	null
2026-01-26	Flatter Tokens are More Valuable for Speculative Draft Model Training	Jiaming Fan et.al.	2601.18902	null
2026-01-26	Scaling up Privacy-Preserving ML: A CKKS Implementation of Llama-2-7B	Jaiyoung Park et.al.	2601.18511	null
2026-01-26	CovertComBench: The First Domain-Specific Testbed for LLMs in Wireless Covert Communication	Zhaozhi Liu et.al.	2601.18315	null
2026-01-26	FABLE: Forest-Based Adaptive Bi-Path LLM-Enhanced Retrieval for Multi-Document Reasoning	Lin Sun et.al.	2601.18116	null
2026-01-25	A Universal Load Balancing Principle and Its Application to Large Language Model Serving	Zixi Chen et.al.	2601.17855	null
2026-01-25	LLM-42: Enabling Determinism in LLM Inference with Verified Speculation	Raja Gond et.al.	2601.17768	null
2026-01-25	Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction	Jang-Hyun Kim et.al.	2601.17668	null
2026-01-24	GreenServ: Energy-Efficient Context-Aware Dynamic Routing for Multi-Model LLM Inference	Thomas Ziller et.al.	2601.17551	null
2026-01-24	Latent-Space Contrastive Reinforcement Learning for Stable and Efficient LLM Reasoning	Lianlei Shan et.al.	2601.17275	null
2026-01-22	FlexLLM: Composable HLS Library for Flexible Hybrid LLM Accelerator Design	Jiahao Zhang et.al.	2601.15710	null
2026-01-21	Securing LLM-as-a-Service for Small Businesses: An Industry Case Study of a Distributed Chatbot Deployment Platform	Jiazhu Xie et.al.	2601.15528	null
2026-01-21	MARS: Unleashing the Power of Speculative Decoding via Margin-Aware Verification	Jingwei Song et.al.	2601.15498	null
2026-01-21	DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs	Mingxuan Song et.al.	2601.14711	null
2026-01-21	QMC: Efficient SLM Edge Inference via Outlier-Aware Quantization and Emergent Memories Co-Design	Nilesh Prasad Pandey et.al.	2601.14549	null
2026-01-20	Confident Rankings with Fewer Items: Adaptive LLM Evaluation with Continuous Scores	Esma Balkır et.al.	2601.13885	null
2026-01-20	ELSA: Efficient LLM-Centric Split Aggregation for Privacy-Aware Hierarchical Federated Learning over Resource-Constrained Edge Networks	Xiaohong Yang et.al.	2601.13824	null
2026-01-20	HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference	Zhiyuan Shi et.al.	2601.13684	null
2026-01-20	PRIMAL: Processing-In-Memory Based Low-Rank Adaptation for LLM Inference Accelerator	Yue Jiet Chong et.al.	2601.13628	null
2026-01-19	Explicit Cognitive Allocation: A Principle for Governed and Auditable Inference in Large Language Models	Héctor Manuel Manzanilla-Granados et.al.	2601.13443	null
2026-01-19	Probe and Skip: Self-Predictive Token Skipping for Efficient Long-Context LLM Inference	Zimeng Wu et.al.	2601.13155	null
2026-01-19	FastAV: Efficient Token Pruning for Audio-Visual Large Language Model Inference	Chaeyoung Jung et.al.	2601.13143	null
2026-01-19	Sutradhara: An Intelligent Orchestrator-Engine Co-design for Tool-based Agentic Inference	Anish Biswas et.al.	2601.12967	null
2026-01-19	From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation	Jiahao Wang et.al.	2601.12904	null
2026-01-23	An Evolutionary Framework for Automatic Optimization Benchmark Generation via Large Language Models	Yuhiro Ono et.al.	2601.12723	null
2026-01-18	Power Aware Dynamic Reallocation For Inference	Yiwei Jiang et.al.	2601.12241	null
2026-01-16	RAPID-Serve: Resource-efficient and Accelerated P/D Intra-GPU Disaggregation	Amna Masood et.al.	2601.11822	null
2026-01-16	PASTA: A Scalable Framework for Multi-Policy AI Compliance Evaluation	Yu Yang et.al.	2601.11702	null
2026-01-16	HALO: Semantic-Aware Distributed LLM Inference in Lossy Edge Network	Peirong Zheng et.al.	2601.11676	null
2026-01-15	WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching	Xiangchen Li et.al.	2601.11652	null
2026-01-16	FORESTLLM: Large Language Models Make Random Forest Great on Few-shot Tabular Learning	Zhihan Yang et.al.	2601.11311	null
2026-01-16	SwiftKV: An Edge-Oriented Attention Algorithm and Multi-Head Accelerator for Fast, Efficient LLM Decoding	Junming Zhang et.al.	2601.10953	null
2026-01-15	Mugi: Value Level Parallelism For Efficient LLMs	Daniel Price et.al.	2601.10823	null
2026-01-14	Private LLM Inference on Consumer Blackwell GPUs: A Practical Guide for Cost-Effective Local Deployment in SMEs	Jonathan Knoop et.al.	2601.09527	null
2026-01-19	RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering	Wencheng Ye et.al.	2601.09269	null
2026-01-14	LatencyPrism: Online Non-intrusive Latency Sculpting for SLO-Guaranteed LLM Inference	Du Yin et.al.	2601.09258	null
2026-01-14	Evaluating local large language models for structured extraction from endometriosis-specific transvaginal ultrasound reports	Haiyi Li et.al.	2601.09053	null
2026-01-13	HIPPO: Accelerating Video Large Language Models Inference via Holistic-aware Parallel Speculative Decoding	Qitan Lv et.al.	2601.08273	null
2026-01-13	Coordinated Cooling and Compute Management for AI Datacenters	Nardos Belay Abera et.al.	2601.08113	null
2026-01-13	Q-realign: Piggybacking Realignment on Quantization for Safe and Efficient LLM Deployment	Qitao Tan et.al.	2601.08089	null
2026-01-12	Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference	Rei Taniguchi et.al.	2601.07667	null
2026-01-12	ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs	Haoqian Meng et.al.	2601.07475	null
2026-01-12	TALON: Confidence-Aware Speculative Decoding with Adaptive Token Trees	Tianyu Liu et.al.	2601.07353	null
2026-01-12	Stochastic CHAOS: Why Deterministic Inference Kills, and Distributional Variability Is the Heartbeat of Artificial Cognition	Tanmay Joshi et.al.	2601.07239	null
2026-01-11	MicLog: Towards Accurate and Efficient LLM-based Log Parsing via Progressive Meta In-Context Learning	Jianbo Yu et.al.	2601.07005	null
2026-01-09	AIConfigurator: Lightning-Fast Configuration Optimization for Multi-Framework LLM Serving	Tianhao Xu et.al.	2601.06288	null
2026-01-07	AutoVulnPHP: LLM-Powered Two-Stage PHP Vulnerability Detection and Automated Localization	Zhiqiang Wang et.al.	2601.06177	null
2026-01-08	Publishing FAIR and Machine-actionable Reviews in Materials Science: The Case for Symbolic Knowledge in Neuro-symbolic Artificial Intelligence	Jennifer D’Souza et.al.	2601.05051	null
2026-01-14	Challenges and Research Directions for Large Language Model Inference Hardware	Xiaoyu Ma et.al.	2601.05047	null
2026-01-08	CuMA: Aligning LLMs with Sparse Cultural Values via Demographic-Aware Mixture of Adapters	Ao Sun et.al.	2601.04885	null
2026-01-08	Revisiting Judge Decoding from First Principles via Training-Free Distributional Divergence	Shengyin Sun et.al.	2601.04766	null
2026-01-08	GPU-Accelerated INT8 Quantization for KV Cache Compression in Large Language Models	Maanas Taneja et.al.	2601.04719	null
2026-01-08	Prior-Informed Zeroth-Order Optimization with Adaptive Direction Alignment for Memory-Efficient LLM Fine-Tuning	Feihu Jin et.al.	2601.04710	null
2026-01-07	XGrammar 2: Dynamic and Efficient Structured Generation Engine for Agentic LLMs	Linzhang Li et.al.	2601.04426	null
2026-01-06	Ratio-Variance Regularized Policy Optimization for Efficient LLM Fine-tuning	Yu Luo et.al.	2601.03320	null
2026-01-01	$α^3$-Bench: A Unified Benchmark of Safety, Robustness, and Efficiency for LLM-Based UAV Agents over 6G Networks	Mohamed Amine Ferrag et.al.	2601.03281	null
2026-01-06	Joint Encoding of KV-Cache Blocks for Scalable LLM Serving	Joseph Kampeas et.al.	2601.03067	null
2026-01-05	LoRA-Drop: Temporal LoRA Decoding for Efficient LLM Inference	Hossein Rajabzadeh et.al.	2601.02569	null
2026-01-04	Structured Decomposition for LLM Reasoning: Cross-Domain Validation and Semantic Web Integration	Albert Sadowski et.al.	2601.01609	null
2026-01-06	Making MoE-based LLM Inference Resilient with Tarragon	Songyu Zhang et.al.	2601.01310	null
2026-01-08	From Policy to Logic for Efficient and Interpretable Coverage Assessment	Rhitabrat Pokharel et.al.	2601.01266	null
2025-12-31	Universal Conditional Logic: A Formal Language for Prompt Engineering	Anthony Mikinka et.al.	2601.00880	null
2026-01-02	HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts	Zihan Fang et.al.	2601.00583	null
2026-01-01	Revati: Transparent GPU-Free Time-Warp Emulation for LLM Serving	Amey Agrawal et.al.	2601.00397	null
2026-01-01	FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems	Shanli Xing et.al.	2601.00227	null
2025-12-31	FPGA Co-Design for Efficient N:M Sparse and Quantized Model Inference	Fen-Yu Hsieh et.al.	2512.24713	null
2026-01-04	Hardware Acceleration for Neural Networks: A Comprehensive Survey	Bin Xu et.al.	2512.23914	null
2025-12-29	Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding	Yue Guan et.al.	2512.23858	null
2025-12-25	Break Out the Silverware – Semantic Understanding of Stored Household Items	Michaela Levi-Richter et.al.	2512.23739	null
2025-12-28	Viability and Performance of a Private LLM Server for SMBs: A Benchmark Analysis of Qwen3-30B on Consumer-Grade Hardware	Alex Khalil et.al.	2512.23029	null
2025-12-28	Argus: Token Aware Distributed LLM Inference Optimization	Panlong Wu et.al.	2512.22925	null
2025-12-27	Modality Inflation: Energy Characterization and Optimization Opportunities for MLLM Inference	Mona Moghadampanah et.al.	2512.22695	null
2025-12-27	Nightjar: Dynamic Adaptive Speculative Decoding for Large Language Models Serving	Rui Li et.al.	2512.22420	null
2025-12-22	Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs	Xinhao Cheng et.al.	2512.22219	null
2025-12-20	MatKV: Trading Compute for Flash Storage in LLM Inference	Kun-Woo Shin et.al.	2512.22195	null
2025-12-26	Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling	Hannah Atmer et.al.	2512.22066	null
2025-12-26	Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models	Tingyang Sun et.al.	2512.21884	null
2025-12-26	LIME: Accelerating Collaborative Lossless LLM Inference on Memory-Constrained Edge Devices	Mingyu Sun et.al.	2512.21835	null
2025-12-25	nncase: An End-to-End Compiler for Efficient LLM Deployment on Heterogeneous Storage Architectures	Hui Guo et.al.	2512.21571	null
2025-12-25	Leash: Adaptive Length Penalty and Reward Shaping for Efficient Large Reasoning Model	Yanhao Li et.al.	2512.21540	null
2025-12-23	KnowVal: A Knowledge-Augmented and Value-Guided Autonomous Driving System	Zhongyu Xia et.al.	2512.20299	null
2025-12-23	Predictive-LoRA: A Proactive and Fragmentation-Aware Serverless Inference System for LLMs	Yinan Ni et.al.	2512.20210	null
2025-12-23	Concept Generalization in Humans and Large Language Models: Insights from the Number Game	Arghavan Bazigaran et.al.	2512.20162	null
2025-12-22	Demystifying LLM-as-a-Judge: Analytically Tractable Model for Inference-Time Scaling	Indranil Halder et.al.	2512.19905	null
2025-12-22	L4: Low-Latency and Load-Balanced LLM Serving via Length-Aware Scheduling	Yitao Yuan et.al.	2512.19179	null
2025-12-22	FASTRIC: Prompt Specification Language for Verifiable LLM Interactions	Wen-Long Jin et.al.	2512.18940	null
2025-12-20	LLM-based Few-Shot Early Rumor Detection with Imitation Agent	Fengzhu Zeng et.al.	2512.18352	null
2025-12-20	TraCT: Disaggregated LLM Serving with CXL Shared Memory KV Cache at Rack-Scale	Dongha Yoon et.al.	2512.18194	null
2025-12-20	Making Strong Error-Correcting Codes Work Effectively for HBM in AI Inference	Rui Xie et.al.	2512.18152	null
2025-12-19	Specification and Detection of LLM Code Smells	Brahim Mahmoudi et.al.	2512.18020	null
2025-12-19	CodeGEMM: A Codebook-Centric Approach to Efficient GEMM in Quantized LLMs	Gunho Park et.al.	2512.17970	null
2025-12-19	Enabling Disaggregated Multi-Stage MLLM Inference via GPU-Internal Scheduling and Resource Sharing	Lingxiao Zhao et.al.	2512.17574	null
2025-12-22	Learning What to Write: Write-Gated KV for Efficient Long-Context Inference	Yen-Chieh Huang et.al.	2512.17452	null
2025-12-18	Taming the Memory Footprint Crisis: System Design for Production Diffusion LLM Serving	Jiakun Fan et.al.	2512.17077	null
2025-12-18	MEPIC: Memory Efficient Position Independent Caching for LLM Serving	Qian Wang et.al.	2512.16822	null
2025-12-18	Kascade: A Practical Sparse Attention Method for Long-Context LLM Inference	Dhruv Deshmukh et.al.	2512.16391	null
2025-12-18	Design and Evaluation of Cost-Aware PoQ for Decentralized LLM Inference	Arther Tian et.al.	2512.16317	null
2025-12-18	Fast Collaborative Inference via Distributed Speculative Decoding	Ce Zheng et.al.	2512.16273	null
2025-12-18	Staggered Batch Scheduling: Co-optimizing Time-to-First-Token and Throughput for High-Efficiency LLM Inference	Jian Tian et.al.	2512.16134	null
2025-12-18	WeMusic-Agent: Efficient Conversational Music Recommendation via Knowledge Internalization and Agentic Boundary Learning	Wendong Bi et.al.	2512.16108	null
2025-12-19	LLM4Perf: Large Language Models Are Effective Samplers for Multi-Objective Performance Modeling	Xin Wang et.al.	2512.16070	null
2025-12-18	MultiPath Transfer Engine: Breaking GPU and Host-Memory Bandwidth Bottlenecks in LLM Services	Lingfeng Tang et.al.	2512.16056	null
2025-12-16	EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving	Shaoting Feng et.al.	2512.14946	null
2025-12-16	Adaptive Cache Pollution Control for Large Language Model Inference Workloads Using Temporal CNN-Based Prediction and Priority-Aware Replacement	Songze Liu et.al.	2512.14151	null
2025-12-16	RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees	Junjie Ma et.al.	2512.14069	null
2025-12-16	MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning	Haoyu Fu et.al.	2512.13636	null
2025-12-15	PROSERVE: Unified Multi-Priority Request Scheduling for LLM Serving	Weizhe Huang et.al.	2512.12928	null
2025-12-14	Counting Clues: A Lightweight Probabilistic Baseline Can Match an LLM	Furong Jia et.al.	2512.12868	null
2025-12-14	Fault-Tolerant Sandboxing for AI Coding Agents: A Transactional Approach to Safe Autonomous Execution	Boyang Yan et.al.	2512.12806	null
2025-12-14	Fine-Grained Energy Prediction For Parallelized LLM Inference With PIE-P	Anurag Dutt et.al.	2512.12801	null
2025-12-19	V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval	Donghyuk Kim et.al.	2512.12284	null
2025-12-13	WATOS: Efficient LLM Training Strategies and Architecture Co-exploration for Wafer-scale Chip	Huizheng Wang et.al.	2512.12279	null
2025-12-12	Learning to Extract Context for Context-Aware LLM Inference	Minseon Kim et.al.	2512.11986	null
2025-12-11	CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving	Dong Liu et.al.	2512.11920	null
2025-12-12	PD-Swap: Prefill-Decode Logic Swapping for End-to-End LLM Inference on Edge FPGAs via Dynamic Partial Reconfiguration	Yifan Zhang et.al.	2512.11550	null
2025-12-12	xGR: Efficient Generative Recommendation Serving at Scale	Qingxiao Sun et.al.	2512.11529	null
2025-12-12	AdaSD: Adaptive Speculative Decoding for Efficient Language Model Inference	Kuan-Wei Lu et.al.	2512.11280	null
2025-12-12	Adaptive Soft Rolling KV Freeze with Entropy-Guided Recovery: Sublinear Memory Growth for Efficient LLM Inference	Adilet Metinov et.al.	2512.11221	null
2025-12-11	ESS: An Offload-Centric Latent-Cache Management Architecture for DeepSeek-V3.2-Exp	Xinhang Chen et.al.	2512.10576	null
2025-12-11	LLM-Auction: Generative Auction towards LLM-Native Advertising	Chujie Zhao et.al.	2512.10551	null
2025-12-12	BAMBO: Construct Ability and Efficiency LLM Pareto Set via Bayesian Adaptive Multi-objective Block-wise Optimization	Kesheng Chen et.al.	2512.09972	null
2025-12-10	GoodSpeed: Optimizing Fair Goodput with Adaptive Speculative Decoding in Distributed Edge Inference	Phuong Tran et.al.	2512.09963	null
2025-12-07	ELANA: A Simple Energy and Latency Analyzer for LLMs	Hung-Yueh Chiang et.al.	2512.09946	null
2025-12-11	Exqutor: Extended Query Optimizer for Vector-augmented Analytical Queries	Hyunjoon Kim et.al.	2512.09695	null
2025-12-10	WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving	Chiheng Lou et.al.	2512.09472	null
2025-12-10	ODMA: On-Demand Memory Allocation Framework for LLM Serving on LPDDR-Class Accelerators	Guoqiang Zou et.al.	2512.09427	null
2025-12-10	RACAM: Enhancing DRAM with Reuse-Aware Computation and Automated Mapping for ML Inference	Siyuan Ma et.al.	2512.09304	null
2025-12-09	LaMoSys3.5D: Enabling 3.5D-IC-Based Large Language Model Inference Serving Systems via Hardware/Software Co-Design	Qipan Wang et.al.	2512.08731	null
2025-12-09	Magneton: Optimizing Energy Efficiency of ML Systems via Differential Energy Debugging	Yi Pan et.al.	2512.08365	null
2025-12-08	LUNE: Efficient LLM Unlearning via LoRA Fine-Tuning with Negative Examples	Yezi Liu et.al.	2512.07375	null
2025-12-08	Recover-to-Forget: Gradient Reconstruction from LoRA for Efficient LLM Unlearning	Yezi Liu et.al.	2512.07374	null
2025-12-08	NeSTR: A Neuro-Symbolic Abductive Framework for Temporal Reasoning in Large Language Models	Feng Liang et.al.	2512.07218	null
2025-12-08	FOAM: Blocked State Folding for Memory-Efficient LLM Training	Ziqing Wen et.al.	2512.07112	null
2025-12-08	Leveraging KV Similarity for Online Structured Pruning in LLMs	Jungmin Lee et.al.	2512.07090	null
2025-12-11	LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding	Yu Yu et.al.	2512.06982	null
2025-12-07	PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance	Jifar Wakuma Ayana et.al.	2512.06747	null
2025-12-07	KV-CAR: KV Cache Compression using Autoencoders and KV Reuse in Large Language Models	Sourjya Roy et.al.	2512.06727	null
2025-12-06	Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices	Xiangyu Li et.al.	2512.06443	null
2025-12-05	Compass: Mapping Space Exploration for Multi-Chiplet Accelerators Targeting LLM Inference Serving Workloads	Boyu Li et.al.	2512.06093	null
2025-12-05	MaxShapley: Towards Incentive-compatible Generative Search with Fair Context Attribution	Sara Patel et.al.	2512.05958	null
2025-12-05	KQ-SVD: Compressing the KV Cache with Provable Guarantees on Attention Fidelity	Damien Lesens et.al.	2512.05916	null
2025-12-05	From Text to Returns: Using Large Language Models for Mutual Fund Portfolio Optimization and Risk-Adjusted Allocation	Abrar Hossain et.al.	2512.05907	null
2025-12-05	Optimizing Medical Question-Answering Systems: A Comparative Study of Fine-Tuned and Zero-Shot Large Language Models with RAG Framework	Tasnimul Hassan et.al.	2512.05863	null
2025-12-05	Capturing Classic Authorial Style in Long-Form Story Generation with GRPO Fine-Tuning	Jinlong Liu et.al.	2512.05747	null
2025-12-05	Distilling Expert Surgical Knowledge: How to train local surgical VLMs for anatomy explanation in Complete Mesocolic Excision	Lennart Maack et.al.	2512.05740	null
2025-12-05	Efficient Text Classification with Conformal In-Context Learning	Ippokratis Pantelidis et.al.	2512.05732	null
2025-12-05	LA-RL: Language Action-guided Reinforcement Learning with Safety Guarantees for Autonomous Highway Driving	Yiming Shu et.al.	2512.05686	null
2025-12-05	A Greek Government Decisions Dataset for Public-Sector Analysis and Insight	Giorgos Antoniou et.al.	2512.05647	null
2025-12-05	ProPhy: Progressive Physical Alignment for Dynamic World Simulation	Zijun Wang et.al.	2512.05564	null
2025-12-05	Conscious Gaze: Adaptive Attention Mechanisms for Hallucination Mitigation in Vision-Language Models	Weijue Bu et.al.	2512.05546	null
2025-12-05	RoBoN: Routed Online Best-of-n for Test-Time Scaling with Multiple LLMs	Jonathan Geuter et.al.	2512.05542	null
2025-12-05	Automated Identification of Incidentalomas Requiring Follow-Up: A Multi-Anatomy Evaluation of LLM-Based and Supervised Approaches	Namu Park et.al.	2512.05537	null
2025-12-05	Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement	Nils Strassenburg et.al.	2512.05525	null
2025-12-05	Know-Show: Benchmarking Video-Language Models on Spatio-Temporal Grounded Reasoning	Chinthani Sugandhika et.al.	2512.05513	null
2025-12-05	A Hybrid Approach for EMF Code Generation: Code Templates Meet Large Language Models	Xiao He et.al.	2512.05498	null
2025-12-05	Knowing Your Uncertainty – On the application of LLM in social sciences	Bolun Zhang et.al.	2512.05461	null
2025-12-05	BEAVER: An Efficient Deterministic LLM Verifier	Tarun Suresh et.al.	2512.05439	null
2025-12-05	A Systematic Framework for Enterprise Knowledge Retrieval: Leveraging LLM-Generated Metadata to Enhance RAG Systems	Pranav Pushkar Mishra et.al.	2512.05411	null
2025-12-05	SQ-format: A Unified Sparse-Quantized Hardware-friendly Data Format for LLMs	Ruixuan Huang et.al.	2512.05409	null
2025-12-04	Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning	Purbesh Mitra et.al.	2512.05105	null
2025-12-04	David vs. Goliath: Can Small Models Win Big with Agentic AI in Hardware Design?	Shashwat Shankar et.al.	2512.05073	null
2025-12-04	Arbitrage: Efficient Reasoning via Advantage-Aware Speculation	Monishwaran Maheswaran et.al.	2512.05033	null
2025-12-04	SEAL: Self-Evolving Agentic Learning for Conversational Question Answering over Knowledge Graphs	Hao Wang et.al.	2512.04868	null
2025-12-04	Model-Based and Sample-Efficient AI-Assisted Math Discovery in Sphere Packing	Rasul Tutunov et.al.	2512.04829	null
2025-12-04	MemLoRA: Distilling Expert Adapters for On-Device Memory Systems	Massimo Bini et.al.	2512.04763	null
2025-12-04	EtCon: Edit-then-Consolidate for Reliable Knowledge Editing	Ruilin Li et.al.	2512.04753	null
2025-12-04	RLHFSpec: Breaking the Efficiency Bottleneck in RLHF Training via Adaptive Drafting	Siqi Wang et.al.	2512.04752	null
2025-12-04	Model Whisper: Steering Vectors Unlock Large Language Models’ Potential in Test-time	Xinyue Kang et.al.	2512.04748	null
2025-12-04	SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs	Wenhua Cheng et.al.	2512.04746	null
2025-12-04	OsmT: Bridging OpenStreetMap Queries and Natural Language with Open-source Tag-aware Language Models	Zhuoyue Wan et.al.	2512.04738	null
2025-12-04	Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild	Yigui Feng et.al.	2512.04728	null
2025-12-04	TRINITY: An Evolved LLM Coordinator	Jinglue Xu et.al.	2512.04695	null
2025-12-04	Towards Ethical Multi-Agent Systems of Large Language Models: A Mechanistic Interpretability Perspective	Jae Hee Lee et.al.	2512.04691	null
2025-12-04	PBFuzz: Agentic Directed Fuzzing for PoV Generation	Haochen Zeng et.al.	2512.04611	null
2025-12-04	Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space	Joey Hong et.al.	2512.04601	null
2025-12-04	A Light-Weight Large Language Model File Format for Highly-Secure Model Distribution	Huifeng Zhu et.al.	2512.04580	null
2025-12-04	On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference	Yue Yu et.al.	2512.04558	null
2025-12-04	AdmTree: Compressing Lengthy Context with Adaptive Semantic Trees	Yangning Li et.al.	2512.04550	null
2025-12-04	EvoEdit: Lifelong Free-Text Knowledge Editing through Latent Perturbation Augmentation and Knowledge-driven Parameter Fusion	Pengfei Cao et.al.	2512.04545	null
2025-12-04	LLM-SrcLog: Towards Proactive and Unified Log Template Extraction via Large Language Models	Jiaqi Sun et.al.	2512.04474	null
2025-12-03	Large Language Models for Limited Noisy Data: A Gravitational Wave Identification Study	Yixuan Li et.al.	2512.04031	null
2025-12-03	AugServe: Adaptive Request Scheduling for Augmented Large Language Model Inference Serving	Ying Wang et.al.	2512.04013	null
2025-12-03	Training-Free Policy Violation Detection via Activation-Space Whitening in LLMs	Oren Rachmil et.al.	2512.03994	null
2025-12-03	Sponsored Questions and How to Auction Them	Kshipra Bhawalkar et.al.	2512.03975	null
2025-12-03	OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference	Liujianfu Wang et.al.	2512.03927	null
2025-12-03	UniMo: Unifying 2D Video and 3D Human Motion with an Autoregressive Framework	Youxin Pang et.al.	2512.03918	null
2025-12-03	Reconstructing KV Caches with Cross-layer Fusion For Enhanced Transformers	Hongzhan Lin et.al.	2512.03870	null
2025-12-03	Training and Evaluation of Guideline-Based Medical Reasoning in LLMs	Michael Staniek et.al.	2512.03838	null
2025-12-03	Log Probability Tracking of LLM APIs	Timothée Chauvin et.al.	2512.03816	null
2025-12-03	Enhancing Instruction-Following Capabilities in Seq2Seq Models: DoLA Adaptations for T5	Huey Sun et.al.	2512.03803	null
2025-12-03	RoCo: Role-Based LLMs Collaboration for Automatic Heuristic Design	Jiawei Xu et.al.	2512.03762	null
2025-12-03	AR-Med: Automated Relevance Enhancement in Medical Search via LLM-Driven Information Augmentation	Chuyue Wang et.al.	2512.03737	null
2025-12-03	Tutorial on Large Language Model-Enhanced Reinforcement Learning for Wireless Networks	Lingyi Cai et.al.	2512.03722	null
2025-12-03	Knowing oneself with and through AI: From self-tracking to chatbots	Lucy Osler et.al.	2512.03682	null
2025-12-03	ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers	Feice Huang et.al.	2512.03673	null
2025-12-03	Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning	Ge-Peng Ji et.al.	2512.03667	null
2025-12-03	FFTrainer: Fast Failover in Large-Language Model Training with Almost-Free State Management	Bohan Zhao et.al.	2512.03644	null
2025-12-03	KVNAND: Efficient On-Device Large Language Model Inference Using DRAM-Free In-Flash Computing	Lishuo Deng et.al.	2512.03608	null
2025-12-03	EnCompass: Enhancing Agent Programming with Search Over Program Execution Paths	Zhening Li et.al.	2512.03571	null
2025-12-03	State Space Models for Bioacoustics: A comparative Evaluation with Transformers	Chengyu Tang et.al.	2512.03563	null
2025-12-03	TokenScale: Timely and Accurate Autoscaling for Disaggregated LLM Serving with Token Velocity	Ruiqi Lai et.al.	2512.03416	null
2025-12-03	Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs	Ngoc Bui et.al.	2512.03324	null
2025-12-02	LORE: A Large Generative Model for Search Relevance	Chenji Lu et.al.	2512.03025	null
2025-12-02	TokenPowerBench: Benchmarking the Power Consumption of LLM Inference	Chenxu Niu et.al.	2512.03024	null
2025-12-02	Distribution-Calibrated Inference time compute for Thinking LLM-as-a-Judge	Hamid Dadkhahi et.al.	2512.03019	null
2025-12-02	From Moderation to Mediation: Can LLMs Serve as Mediators in Online Flame Wars?	Dawei Li et.al.	2512.03005	null
2025-12-02	FAIRY2I: Universal Extremely-Low Bit QAT framework via Widely-Linear Representation and Phase-Aware Quantization	Feiyu Wang et.al.	2512.02901	null
2025-12-02	MindGPT-4ov: An Enhanced MLLM via a Multi-Stage Post-Training Paradigm	Wei Chen et.al.	2512.02895	null
2025-12-02	OptPO: Optimal Rollout Allocation for Test-time Policy Optimization	Youkang Wang et.al.	2512.02882	null
2025-12-02	Network Self-Configuration based on Fine-Tuned Small Language Models	Oscar G. Lira et.al.	2512.02861	null
2025-12-02	GraphMatch: Fusing Language and Graph Representations in a Dynamic Two-Sided Work Marketplace	Mikołaj Sacha et.al.	2512.02849	null
2025-12-02	Cross-Lingual Prompt Steerability: Towards Accurate and Robust LLM Behavior across Languages	Lechen Zhang et.al.	2512.02841	null
2025-12-02	Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach	Siyuan Yang et.al.	2512.02834	null
2025-12-02	A Comparative Study on How Data Normalization Affects Zero-Shot Generalization in Time Series Foundation Models	Ihab Ahmed et.al.	2512.02833	null
2025-12-02	Phase-Adaptive LLM Framework with Multi-Stage Validation for Construction Robot Task Allocation: A Systematic Benchmark Against Traditional Optimization Algorithms	Shyam prasad reddy Kaitha et.al.	2512.02810	null
2025-12-02	FiMMIA: scaling semantic perturbation-based membership inference across modalities	Anton Emelyanov et.al.	2512.02786	null
2025-12-02	PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models	Robert Belanec et.al.	2512.02764	null
2025-12-02	RoboWheel: A Data Engine from Real-World Human Demonstrations for Cross-Embodiment Robotic Learning	Yuhong Zhang et.al.	2512.02729	null
2025-12-02	AuditCopilot: Leveraging LLMs for Fraud Detection in Double-Entry Bookkeeping	Md Abdul Kadir et.al.	2512.02726	null
2025-12-02	Emergent Bayesian Behaviour and Optimal Cue Combination in LLMs	Julian Ma et.al.	2512.02719	null
2025-12-02	CREST: Universal Safety Guardrails Through Cluster-Guided Cross-Lingual Transfer	Lavish Bansal et.al.	2512.02711	null
2025-12-02	VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm	Zhenkai Wu et.al.	2512.02700	null
2025-12-01	Trinity: Disaggregating Vector Search from Prefill-Decode Disaggregation in LLM Serving	Yi Liu et.al.	2512.02281	null
2025-12-01	Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling	Jack Cook et.al.	2512.02010	link
2025-12-01	The Art of Scaling Test-Time Compute for Large Language Models	Aradhye Agarwal et.al.	2512.02008	null
2025-12-01	Low-Rank Prehab: Preparing Neural Networks for SVD Compression	Haoran Qin et.al.	2512.01980	link
2025-12-01	KV Pareto: Systems-Level Optimization of KV Cache and Model Compression for Long Context Inference	Sai Gokhale et.al.	2512.01953	null
2025-12-01	Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models	Zhongyu Yang et.al.	2512.01949	null
2025-12-01	Agentic Policy Optimization via Instruction-Policy Co-Evolution	Han Zhou et.al.	2512.01945	link
2025-12-01	An Empirical Study of Agent Developer Practices in AI Agent Frameworks	Yanlin Wang et.al.	2512.01939	null
2025-12-01	Med-VCD: Mitigating Hallucination for Medical Large Vision Language Models through Visual Contrastive Decoding	Zahra Mahdavi et.al.	2512.01922	null
2025-12-01	Latent Debate: A Surrogate Framework for Interpreting LLM Thinking	Lihu Chen et.al.	2512.01909	null
2025-12-01	CauSight: Learning to Supersense for Visual Causal Discovery	Yize Zhang et.al.	2512.01827	null
2025-12-01	Generating REST API Tests With Descriptive Names	Philip Garrett et.al.	2512.01690	null
2025-12-01	DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models	Patrick Kwon et.al.	2512.01686	null
2025-12-01	A Systematic Characterization of LLM Inference on GPUs	Haonan Wang et.al.	2512.01644	null
2025-12-01	Agent-Kernel: A MicroKernel Multi-Agent System Framework for Adaptive Social Simulation Powered by LLMs	Yuren Mao et.al.	2512.01610	null
2025-12-01	LLM2Fx-Tools: Tool Calling For Music Post-Production	Seungheon Doh et.al.	2512.01559	null
2025-12-01	LPCD: Unified Framework from Layer-Wise to Submodule Quantization	Yuma Ichikawa et.al.	2512.01546	null
2025-12-01	MCAT: Scaling Many-to-Many Speech-to-Text Translation with MLLMs to 70 Languages	Yexing Du et.al.	2512.01512	null
2025-12-01	Multi-Path Collaborative Reasoning via Reinforcement Learning	Jindi Lv et.al.	2512.01485	null
2025-12-01	ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation	Rohin Manvi et.al.	2512.01457	null
2025-12-01	ViRectify: A Challenging Benchmark for Video Reasoning Correction with Multimodal Large Language Models	Xusen Hei et.al.	2512.01424	null
2025-11-30	SIMPLE: Disaggregating Sampling from GPU Inference into a Decision Plane for Faster Distributed LLM Serving	Bohan Zhao et.al.	2512.00719	null
2025-11-29	Efficient Kernel Mapping and Comprehensive System Evaluation of LLM Acceleration on a CGLA	Takuto Ando et.al.	2512.00335	null
2025-11-28	Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction	Bao Shu et.al.	2511.23476	null
2025-11-28	ThetaEvolve: Test-time Learning on Open Problems	Yiping Wang et.al.	2511.23473	link
2025-11-28	Towards Continuous Intelligence Growth: Self-Training, Continual Learning, and Dual-Scale Memory in SuperIntelliAgent	Jianzhe Lin et.al.	2511.23436	null
2025-11-28	Hierarchical AI-Meteorologist: LLM-Agent System for Multi-Scale and Explainable Weather Forecast Reporting	Daniil Sukhorukov et.al.	2511.23387	null
2025-11-28	Do LLM-judges Align with Human Relevance in Cranfield-style Recommender Evaluation?	Gustavo Penha et.al.	2511.23312	null
2025-11-28	MCP vs RAG vs NLWeb vs HTML: A Comparison of the Effectiveness and Efficiency of Different Agent Interfaces to the Web (Technical Report)	Aaron Steiner et.al.	2511.23281	null
2025-11-28	Behavior-Equivalent Token: Single-Token Replacement for Long Prompts in LLMs	Jiancheng Dong et.al.	2511.23271	null
2025-11-28	Unlocking Multilingual Reasoning Capability of LLMs and LVLMs through Representation Engineering	Qiming Li et.al.	2511.23231	null
2025-11-28	Instruction Tuning of Large Language Models for Tabular Data Generation-in One Day	Milad Abdollahzadeh et.al.	2511.23220	null
2025-11-28	Obstruction reasoning for robotic grasping	Runyu Jiao et.al.	2511.23186	null
2025-11-28	HPSU: A Benchmark for Human-Level Perception in Real-World Spoken Speech Understanding	Chen Li et.al.	2511.23178	null
2025-11-28	Multi-chain Graph Refinement and Selection for Reliable Reasoning in Large Language Models	Yujiao Yang et.al.	2511.23136	null
2025-11-28	Evolutionary Discovery of Heuristic Policies for Traffic Signal Control	Ruibing Wang et.al.	2511.23122	null
2025-11-28	Dripper: Token-Efficient Main HTML Extraction with a Lightweight LM	Mengjie Liu et.al.	2511.23119	null
2025-11-28	Conveying Imagistic Thinking in TCM Translation: A Prompt Engineering and LLM-Based Evaluation Framework	Jiatong Han et.al.	2511.23059	null
2025-11-28	Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match	Jinze Li et.al.	2511.22972	null
2025-11-28	Experts are all you need: A Composable Framework for Large Language Model Inference	Shrihari Sridharan et.al.	2511.22955	null
2025-11-28	Visual Puns from Idioms: An Iterative LLM-T2IM-MLLM Framework	Kelaiti Xiao et.al.	2511.22943	null
2025-11-28	RAG-Empowered LLM-Driven Dynamic Radio Resource Management in Open 6G RAN	Onur Salan et.al.	2511.22933	null
2025-11-28	AgentShield: Make MAS more secure and efficient	Kaixiang Wang et.al.	2511.22924	null
2025-11-28	Serving Heterogeneous LoRA Adapters in Distributed LLM Inference Systems	Shashwat Jaiswal et.al.	2511.22880	null
2025-11-27	PRISM: Privacy-Aware Routing for Adaptive Cloud-Edge LLM Inference via Semantic Sketch Collaboration	Junfei Zhan et.al.	2511.22788	null
2025-11-26	Matrix: Peer-to-Peer Multi-Agent Synthetic Data Generation Framework	Dong Wang et.al.	2511.21686	null
2025-11-26	DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving	Fengze Yu et.al.	2511.21669	null
2025-11-26	TAGFN: A Text-Attributed Graph Dataset for Fake News Detection in the Age of LLMs	Kay Liu et.al.	2511.21624	null
2025-11-26	Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining	Dongyang Fan et.al.	2511.21613	null
2025-11-26	Auxiliary Metrics Help Decoding Skill Neurons in the Wild	Yixiu Zhao et.al.	2511.21610	null
2025-11-26	SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition	Peiran Xu et.al.	2511.21471	null
2025-11-26	MADRA: Multi-Agent Debate for Risk-Aware Embodied Planning	Junjian Wang et.al.	2511.21460	null
2025-11-26	A Systematic Study of Model Merging Techniques in Large Language Models	Oğuz Kağan Hitit et.al.	2511.21437	null
2025-11-26	Automated Dynamic AI Inference Scaling on HPC-Infrastructure: Integrating Kubernetes, Slurm and vLLM	Tim Trappen et.al.	2511.21413	null
2025-11-26	Prune4Web: DOM Tree Pruning Programming for Web Agent	Jiayuan Zhang et.al.	2511.21398	null
2025-11-26	PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark	Robert Belanec et.al.	2511.21285	null
2025-11-26	Multi-Reward GRPO for Stable and Prosodic Single-Codebook TTS LLMs at Scale	Yicheng Zhong et.al.	2511.21270	null
2025-11-26	Can Finetuing LLMs on Small Human Samples Increase Heterogeneity, Alignment, and Belief-Action Coherence?	Steven Wang et.al.	2511.21218	null
2025-11-26	Progress by Pieces: Test-Time Scaling for Autoregressive Image Generation	Joonhyung Park et.al.	2511.21185	null
2025-11-26	How to Correctly Report LLM-as-a-Judge Evaluations	Chungpa Lee et.al.	2511.21140	null
2025-11-26	Beyond Patch Aggregation: 3-Pass Pyramid Indexing for Vision-Enhanced Document Retrieval	Anup Roy et.al.	2511.21121	null
2025-11-26	BRIDGE: Building Representations In Domain Guided Program Verification	Robert Joseph George et.al.	2511.21104	null
2025-11-26	MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts	Ivan Novikov et.al.	2511.21089	null
2025-11-26	5G Network Automation Using Local Large Language Models and Retrieval-Augmented Generation	Ahmadreza Majlesara et.al.	2511.21084	null
2025-11-26	Aligning LLMs with Biomedical Knowledge using Balanced Fine-Tuning	Zhenchao Tang et.al.	2511.21075	null
2025-11-25	LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight	Yunze Man et.al.	2511.20648	null
2025-11-25	Latent Collaboration in Multi-Agent Systems	Jiaru Zou et.al.	2511.20639	link
2025-11-25	ROOT: Robust Orthogonalized Optimizer for Neural Network Training	Wei He et.al.	2511.20626	null
2025-11-25	Copyright Detection in Large Language Models: An Ethical Approach to Generative AI Development	David Szczecina et.al.	2511.20623	null
2025-11-25	DiFR: Inference Verification Despite Nondeterminism	Adam Karvonen et.al.	2511.20621	null
2025-11-25	Translating Large-Scale C Repositories to Idiomatic Rust	Saman Dehghan et.al.	2511.20617	null
2025-11-25	Beyond Generation: Multi-Hop Reasoning for Factual Accuracy in Vision-Language Models	Shamima Hossain et.al.	2511.20531	null
2025-11-25	Assessing LLMs’ Performance: Insights from the Chinese Pharmacist Exam	Xinran Wang et.al.	2511.20526	null
2025-11-25	HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation	Xiang Wang et.al.	2511.20520	null
2025-11-25	Soft Adaptive Policy Optimization	Chang Gao et.al.	2511.20347	null
2025-11-25	The Curious Case of Analogies: Investigating Analogical Reasoning in Large Language Models	Taewhoo Lee et.al.	2511.20344	null
2025-11-25	Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios	Luohe Shi et.al.	2511.20340	null
2025-11-25	Improving Language Agents through BREW	Shashank Kirtania et.al.	2511.20297	null
2025-11-25	APT-CGLP: Advanced Persistent Threat Hunting via Contrastive Graph-Language Pre-Training	Xuebo Qiu et.al.	2511.20290	null
2025-11-25	SMoG: Schema Matching on Graph	Mingyu Jeon et.al.	2511.20285	null
2025-11-25	Bootstrapping Physics-Grounded Video Generation through VLM-Guided Iterative Self-Refinement	Yang Liu et.al.	2511.20280	null
2025-11-25	HVAdam: A Full-Dimension Adaptive Optimizer	Yiheng Zhang et.al.	2511.20277	null
2025-11-25	LLM-Driven Transient Stability Assessment: From Automated Simulation to Neural Architecture Design	Lianzhe Hu et.al.	2511.20276	null
2025-11-25	Rectified Flow for Vision-Aided mmWave V2I Beam Prediction	Can Zheng et.al.	2511.20265	null
2025-11-25	REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance	Chuyi Kong et.al.	2511.20233	null
2025-11-24	Be My Eyes: Extending Large Language Models to New Modalities Through Multi-Agent Collaboration	James Y. Huang et.al.	2511.19417	null
2025-11-24	Syn-GRPO: Self-Evolving Data Synthesis for MLLM Perception Reasoning	Qihan Huang et.al.	2511.19343	link
2025-11-24	Learning to Reason: Training LLMs with GPT-OSS or DeepSeek R1 Reasoning Traces	Shaltiel Shmidman et.al.	2511.19333	null
2025-11-24	MAESTRO: Multi-Agent Environment Shaping through Task and Reward Optimization	Boyuan Wu et.al.	2511.19253	null
2025-11-24	Learning Plug-and-play Memory for Guiding Video Diffusion Models	Selena Song et.al.	2511.19229	link
2025-11-24	Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization	Xurui Li et.al.	2511.19218	null
2025-11-24	From Pixels to Posts: Retrieval-Augmented Fashion Captioning and Hashtag Generation	Moazzam Umer Gondal et.al.	2511.19149	null
2025-11-24	LLMs-Powered Real-Time Fault Injection: An Approach Toward Intelligent Fault Test Cases Generation	Mohammad Abboush et.al.	2511.19132	null
2025-11-24	Facilitating the Integration of LLMs Into Online Experiments With Simple Chat	R. Bermudez Schettino et.al.	2511.19123	null
2025-11-24	MonoSR: Open-Vocabulary Spatial Reasoning from Monocular Images	Qirui Wang et.al.	2511.19119	null
2025-11-24	Large Language Model-Assisted Planning of Electric Vehicle Charging Infrastructure with Real-World Case Study	Xinda Zheng et.al.	2511.19055	null
2025-11-24	FastForward Pruning: Efficient LLM Pruning via Single-Step Reinforcement Learning	Xin Yuan et.al.	2511.18977	null
2025-11-24	SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression	Santhosh G S et.al.	2511.18936	null
2025-11-24	Skeletons Matter: Dynamic Data Augmentation for Text-to-Query	Yuchen Ji et.al.	2511.18934	null
2025-11-24	Defending Large Language Models Against Jailbreak Exploits with Responsible AI Considerations	Ryan Wong et.al.	2511.18933	null
2025-11-24	FineXtrol: Controllable Motion Generation via Fine-Grained Text	Keming Shen et.al.	2511.18927	null
2025-11-24	BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models	Juncheng Li et.al.	2511.18921	null
2025-11-24	EventSTU: Event-Guided Efficient Spatio-Temporal Understanding for Video Large Language Models	Wenhao Xu et.al.	2511.18920	null
2025-11-24	Parallel Vision Token Scheduling for Fast and Accurate Multimodal LMMs Inference	Wengyi Zhan et.al.	2511.18875	null
2025-11-24	KernelBand: Boosting LLM-based Kernel Optimization with a Hierarchical and Hardware-aware Multi-armed Bandit	Dezhi Ran et.al.	2511.18868	null
2025-11-21	Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models	Mark Endo et.al.	2511.17487	link
2025-11-21	SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding	Nikolay Nikolov et.al.	2511.17411	null
2025-11-21	That’s not natural: The Impact of Off-Policy Training Data on Probe Performance	Nathalie Kirch et.al.	2511.17408	null
2025-11-21	Beyond Multiple Choice: A Hybrid Framework for Unifying Robust Evaluation and Verifiable Reasoning Training	Yesheng Liu et.al.	2511.17405	null
2025-11-21	SpatialGeo: Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion	Jiajie Guo et.al.	2511.17308	null
2025-11-21	SlsReuse: LLM-Powered Serverless Function Reuse	Jinfeng Wen et.al.	2511.17262	null
2025-11-21	A Little More Like This: Text-to-Image Retrieval with Vision-Language Models Using Relevance Feedback	Bulat Khaertdinov et.al.	2511.17255	null
2025-11-21	E³-Pruner: Towards Efficient, Economical, and Effective Layer Pruning for Large Language Models	Tao Yuan et.al.	2511.17205	null
2025-11-21	AutoLink: Autonomous Schema Exploration and Expansion for Scalable Schema Linking in Text-to-SQL at Scale	Ziyang Wang et.al.	2511.17190	null
2025-11-21	Efficient Robot Design with Multi-Objective Black-Box Optimization and Large Language Models	Kento Kawaharazuka et.al.	2511.17178	null
2025-11-21	FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle	Mario Markov et.al.	2511.17171	null
2025-11-21	Hallucinate Less by Thinking More: Aspect-Based Causal Abstention for Large Language Models	Vy Nguyen et.al.	2511.17170	null
2025-11-21	Learning to Compress: Unlocking the Potential of Large Language Models for Text Representation	Yeqin Zhang et.al.	2511.17129	null
2025-11-21	ChainV: Atomic Visual Hints Make Multimodal Reasoning Shorter and Better	Yuan Zhang et.al.	2511.17106	null
2025-11-21	Sparse Reasoning is Enough: Biological-Inspired Framework for Video Anomaly Detection with Large Pre-trained Models	He Huang et.al.	2511.17094	null
2025-11-21	MUCH: A Multilingual Claim Hallucination Benchmark	Jérémie Dentan et.al.	2511.17081	null
2025-11-21	Principled Design of Interpretable Automated Scoring for Large-Scale Educational Assessments	Yunsung Kim et.al.	2511.17069	null
2025-11-21	Parametric Retrieval-Augmented Generation using Latent Routing of LoRA Adapters	Zhan Su et.al.	2511.17044	null
2025-11-21	CLLMRec: LLM-powered Cognitive-Aware Concept Recommendation via Semantic Alignment and Prerequisite Knowledge Distillation	Xiangrui Xiong et.al.	2511.17041	null
2025-11-21	FIRM: Federated In-client Regularized Multi-objective Alignment for Large Language Models	Fatemeh et.al.	2511.16992	null
2025-11-20	Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter	Qinghao Hu et.al.	2511.16665	null
2025-11-20	Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs	Ali Taghibakhshi et.al.	2511.16664	null
2025-11-20	Comparison of Text-Based and Image-Based Retrieval in Multimodal Retrieval Augmented Generation Large Language Model Systems	Elias Lumer et.al.	2511.16654	null
2025-11-20	You Only Forward Once: An Efficient Compositional Judging Paradigm	Tianlong Zhang et.al.	2511.16600	null
2025-11-20	TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding	Boshen Xu et.al.	2511.16595	null
2025-11-20	Integrating Symbolic Natural Language Understanding and Language Models for Word Sense Disambiguation	Kexin Zhao et.al.	2511.16577	null
2025-11-20	Utilizing Large Language Models for Zero-Shot Medical Ontology Extension from Clinical Notes	Guanchen Wu et.al.	2511.16548	null
2025-11-20	The Oracle and The Prism: A Decoupled and Efficient Framework for Generative Recommendation Explanation	Jiaheng Zhang et.al.	2511.16543	null
2025-11-20	Beyond Tokens in Language Models: Interpreting Activations through Text Genre Chunks	Éloïse Benito-Rodriguez et.al.	2511.16540	null
2025-11-20	LLM4EO: Large Language Model for Evolutionary Optimization in Flexible Job Shop Scheduling	Rongjie Liao et.al.	2511.16485	null
2025-11-20	Optimizing Federated Learning in the Era of LLMs: Message Quantization and Streaming	Ziyue Xu et.al.	2511.16450	null
2025-11-20	An Efficient LLM-based Evolutional Recommendation with Locate-Forget-Update Paradigm	Hao Liu et.al.	2511.16414	null
2025-11-20	CorrectHDL: Agentic HDL Design with LLMs Leveraging High-Level Synthesis as Reference	Kangwei Xu et.al.	2511.16395	null
2025-11-20	Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement	Jiashu Yao et.al.	2511.16331	null
2025-11-20	ARK: Answer-Centric Retriever Tuning via KG-augmented Curriculum Learning	Jiawei Zhou et.al.	2511.16326	null
2025-11-20	SDA: Steering-Driven Distribution Alignment for Open LLMs without Fine-Tuning	Wei Xia et.al.	2511.16324	null
2025-11-20	“To Survive, I Must Defect”: Jailbreaking LLMs via the Game-Theory Scenarios	Zhen Sun et.al.	2511.16278	null
2025-11-20	Pass@k Metric for RLVR: A Diagnostic Tool of Exploration, But Not an Objective	Yang Yu et.al.	2511.16231	null
2025-11-20	Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security	Wei Zhao et.al.	2511.16229	null
2025-11-20	Beyond Code Similarity: Benchmarking the Plausibility, Efficiency, and Complexity of LLM-Generated Smart Contracts	Francesco Salzano et.al.	2511.16224	null
2025-11-19	MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping	Yushi Huang et.al.	2511.15690	null
2025-11-19	DuoZone: A User-Centric, LLM-Guided Mixed-Initiative XR Window Management System	Jing Qian et.al.	2511.15676	null
2025-11-19	Quantum-Guided Test Case Minimization for LLM-Based Code Generation	Huixiang Zhang et.al.	2511.15665	null
2025-11-19	HSKBenchmark: Modeling and Benchmarking Chinese Second Language Acquisition in Large Language Models through Curriculum Tuning	Qihao Yang et.al.	2511.15574	null
2025-11-19	A Tensor Compiler for Processing-In-Memory Architectures	Peiming Yang et.al.	2511.15503	null
2025-11-19	Insights from the ICLR Peer Review and Rebuttal Process	Amir Hossein Kargaran et.al.	2511.15462	null
2025-11-19	Know Your Intent: An Autonomous Multi-Perspective LLM Agent Framework for DeFi User Transaction Intent Mining	Qian’ang Mao et.al.	2511.15456	null
2025-11-19	CroPS: Improving Dense Retrieval with Cross-Perspective Positive Samples in Short-Video Search	Ao Xie et.al.	2511.15443	null
2025-11-19	Small Language Models for Phishing Website Detection: Cost, Performance, and Privacy Trade-Offs	Georg Goldenits et.al.	2511.15434	null
2025-11-19	DEPO: Dual-Efficiency Preference Optimization for LLM Agents	Sirui Chen et.al.	2511.15392	null
2025-11-19	Unveiling Inference Scaling for Difference-Aware User Modeling in LLM Personalization	Suyu Chen et.al.	2511.15389	null
2025-11-19	A Compliance-Preserving Retrieval System for Aircraft MRO Task Search	Byungho Jo et.al.	2511.15383	null
2025-11-19	HEAD-QA v2: Expanding a Healthcare Benchmark for Reasoning	Alexis Correa-Guillén et.al.	2511.15355	null
2025-11-19	Reflexive Evidence-Based Multimodal Learning for Clean Energy Transitions: Causal Insights on Cooking Fuel Access, Urbanization, and Carbon Emissions	Shan Shan et.al.	2511.15342	null
2025-11-19	What Your Features Reveal: Data-Efficient Black-Box Feature Inversion Attack for Split DNNs	Zhihan Ren et.al.	2511.15316	null
2025-11-19	EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control	Kai Yang et.al.	2511.15248	null
2025-11-19	OEMA: Ontology-Enhanced Multi-Agent Collaboration Framework for Zero-Shot Clinical Named Entity Recognition	Xinli Tao et.al.	2511.15211	null
2025-11-19	As If We’ve Met Before: LLMs Exhibit Certainty in Recognizing Seen Files	Haodong Li et.al.	2511.15192	null
2025-11-19	A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models	Duo Li et.al.	2511.15098	null
2025-11-19	Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference	Kexin Chu et.al.	2511.15015	null
2025-11-18	Natural Language Interfaces for Databases: What Do Users Think?	Panos Ipeirotis et.al.	2511.14718	null
2025-11-18	Strategic Innovation Management in the Age of Large Language Models Market Intelligence, Adaptive R&D, and Ethical Governance	Raha Aghaei et.al.	2511.14709	null
2025-11-18	Near-Lossless Model Compression Enables Longer Context Inference in DNA Large Language Models	Rui Zhu et.al.	2511.14694	link
2025-11-18	Attention via Synaptic Plasticity is All You Need: A Biologically Inspired Spiking Neuromorphic Transformer	Kallol Mondal et.al.	2511.14691	null
2025-11-18	SkillGen: Learning Domain Skills for In-Context Sequential Decision Making	Ruomeng Ding et.al.	2511.14670	null
2025-11-18	Bias in, Bias out: Annotation Bias in Multilingual Large Language Models	Xia Cui et.al.	2511.14662	null
2025-11-18	AutoTool: Efficient Tool Selection for Large Language Model Agents	Jingyi Jia et.al.	2511.14650	null
2025-11-18	Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning	Ruoyu Qin et.al.	2511.14617	null
2025-11-18	A Controllable Perceptual Feature Generative Model for Melody Harmonization via Conditional Variational Autoencoder	Dengyun Huang et.al.	2511.14600	null
2025-11-18	OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models	Keda Tao et.al.	2511.14582	null
2025-11-18	Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language	Minyoung Hwang et.al.	2511.14565	null
2025-11-18	LLM-Assisted Thematic Analysis: Opportunities, Limitations, and Recommendations	Tatiane Ornelas et.al.	2511.14528	null
2025-11-18	CLO: Efficient LLM Inference System with CPU-Light KVCache Offloading via Algorithm-System Co-Design	Jiawei Yi et.al.	2511.14510	null
2025-11-18	Hyperion: Hierarchical Scheduling for Parallel LLM Acceleration in Multi-tier Networks	Mulei Ma et.al.	2511.14450	null
2025-11-18	Watchdogs and Oracles: Runtime Verification Meets Large Language Models for Autonomous Systems	Angelo Ferrando et.al.	2511.14435	null
2025-11-18	When Words Change the Model: Sensitivity of LLMs for Constraint Programming Modelling	Alessio Pellegrino et.al.	2511.14334	null
2025-11-18	PathMind: A Retrieve-Prioritize-Reason Framework for Knowledge Graph Reasoning with Large Language Models	Yu Liu et.al.	2511.14256	null
2025-11-18	Towards Authentic Movie Dubbing with Retrieve-Augmented Director-Actor Interaction Learning	Rui Liu et.al.	2511.14249	null
2025-11-18	N-GLARE: An Non-Generative Latent Representation-Efficient LLM Safety Evaluator	Zheyu Lin et.al.	2511.14195	null
2025-11-18	AdaTok: Adaptive Token Compression with Object-Aware Representations for Efficient Multimodal LLMs	Xinliang Zhang et.al.	2511.14169	null
2025-11-17	TZ-LLM: Protecting On-Device Large Language Models with Arm TrustZone	Xunjie Wang et.al.	2511.13717	null
2025-11-17	Generalist Foundation Models Are Not Clinical Enough for Hospital Operations	Lavender Y. Jiang et.al.	2511.13703	null
2025-11-17	T-SAR: A Full-Stack Co-design for CPU-Only Ternary LLM Inference via In-Place SIMD ALU Reorganization	Hyunwoo Oh et.al.	2511.13676	null
2025-11-17	Part-X-MLLM: Part-aware 3D Multimodal Large Language Model	Chunshi Wang et.al.	2511.13647	link
2025-11-17	Data Value in the Age of Scaling: Understanding LLM Scaling Dynamics Under Real-Synthetic Data Mixtures	Haohui Wang et.al.	2511.13640	null
2025-11-17	CreBench: Human-Aligned Creativity Evaluation from Idea to Process to Product	Kaiwen Xue et.al.	2511.13626	null
2025-11-17	P1: Mastering Physics Olympiads with Reinforcement Learning	Jiacheng Chen et.al.	2511.13612	null
2025-11-17	Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents	Piaohong Wang et.al.	2511.13593	null
2025-11-17	Automated Construction of Medical Indicator Knowledge Graphs Using Retrieval Augmented Large Language Models	Zhengda Wang et.al.	2511.13526	null
2025-11-17	FreeAskWorld: An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI	Yuhang Peng et.al.	2511.13524	null
2025-11-17	Tight and Practical Privacy Auditing for Differentially Private In-Context Learning	Yuyang Xia et.al.	2511.13502	null
2025-11-17	Multi-Agent Multimodal Large Language Model Framework for Automated Interpretation of Fuel Efficiency Analytics in Public Transportation	Zhipeng Ma et.al.	2511.13476	null
2025-11-17	Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline	Rui Zuo et.al.	2511.13442	null
2025-11-17	Mem-PAL: Towards Memory-based Personalized Dialogue Assistants for Long-term User-Agent Interaction	Zhaopei Huang et.al.	2511.13410	null
2025-11-17	A Novel Hierarchical Integration Method for Efficient Model Merging in Medical LLMs	Prakrit Timilsina et.al.	2511.13373	null
2025-11-17	Cognitive Maps in Language Models: A Mechanistic Analysis of Spatial Planning	Caroline Baumgartner et.al.	2511.13371	null
2025-11-17	FLOWER: Flow-Oriented Entity-Relationship Tool	Dmitry Moskalev et.al.	2511.13357	null
2025-11-17	An LLM-based Quantitative Framework for Evaluating High-Stealthy Backdoor Risks in OSS Supply Chains	Zihe Yan et.al.	2511.13341	null
2025-11-17	ZeroDexGrasp: Zero-Shot Task-Oriented Dexterous Grasp Synthesis with Prompt-Based Multi-Stage Semantic Reasoning	Juntao Jian et.al.	2511.13327	null
2025-11-17	Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment	Jea Kwon et.al.	2511.13290	null
2025-11-14	Optimizing Mixture of Block Attention	Guangxuan Xiao et.al.	2511.11571	null
2025-11-14	Experience-Guided Adaptation of Inference-Time Reasoning Strategies	Adam Stein et.al.	2511.11519	null
2025-11-14	W2S-AlignTree: Weak-to-Strong Inference-Time Alignment for Large Language Models via Monte Carlo Tree Search	Zhenyu Ding et.al.	2511.11518	link
2025-11-14	PAS: Prelim Attention Score for Detecting Object Hallucinations in Large Vision–Language Models	Nhat Hoang-Xuan et.al.	2511.11502	null
2025-11-14	Benchmarking Visual LLMs Resilience to Unanswerable Questions on Visually Rich Documents	Davide Napolitano et.al.	2511.11468	null
2025-11-14	CURENet: Combining Unified Representations for Efficient Chronic Disease Prediction	Cong-Tinh Dao et.al.	2511.11423	null
2025-11-14	SCRUTINEER: Detecting Logic-Level Usage Violations of Reusable Components in Smart Contracts	Xingshuang Lin et.al.	2511.11411	null
2025-11-14	MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism	Shulin Liu et.al.	2511.11373	null
2025-11-14	SEAL: Subspace-Anchored Watermarks for LLM Ownership	Yanbo Dai et.al.	2511.11356	null
2025-11-14	UFO³: Weaving the Digital Agent Galaxy	Chaoyun Zhang et.al.	2511.11332	null
2025-11-14	LAET: A Layer-wise Adaptive Ensemble Tuning Framework for Pretrained Language Models	Jawad Ibn Ahad et.al.	2511.11315	null
2025-11-14	iMAD: Intelligent Multi-Agent Debate for Efficient and Accurate LLM Inference	Wei Fan et.al.	2511.11306	null
2025-11-14	EcoAlign: An Economically Rational Framework for Efficient LVLM Alignment	Ruoxi Cheng et.al.	2511.11301	null
2025-11-14	GraphPilot: Grounded Scene Graph Conditioning for Language-Based Autonomous Driving	Fabian Schmidt et.al.	2511.11266	null
2025-11-14	KGQuest: Template-Driven QA Generation from Knowledge Graphs with LLM-Based Refinement	Sania Nayab et.al.	2511.11258	null
2025-11-14	T-MAN: Enabling End-to-End Low-Bit LLM Inference on NPUs via Unified Table Lookup	Jianyu Wei et.al.	2511.11248	null
2025-11-14	STaR: Towards Cognitive Table Reasoning via Slow-Thinking Large Language Models	Huajian Zhang et.al.	2511.11233	null
2025-11-14	Questioning the Stability of Visual Question Answering	Amir Rosenfeld et.al.	2511.11206	null
2025-11-14	Viper-F1: Fast and Fine-Grained Multimodal Understanding with Cross-Modal State-Space Modulation	Quoc-Huy Trinh et.al.	2511.11177	null
2025-11-14	Explainable Deep Convolutional Multi-Type Anomaly Detection	Alex George et.al.	2511.11165	null
2025-11-13	ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference	Yesheng Liang et.al.	2511.10645	null
2025-11-13	Textual understanding boost in the WikiRace	Raman Ebrahimi et.al.	2511.10585	null
2025-11-13	URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding	Yongxin Shi et.al.	2511.10552	link
2025-11-13	Don’t Waste It: Guiding Generative Recommenders with Structured Human Priors via Multi-head Decoding	Yunkai Zhang et.al.	2511.10492	link
2025-11-13	Scalable Synthesis of distributed LLM workloads through Symbolic Tensor Graphs	Changhai Man et.al.	2511.10480	null
2025-11-13	AgentEvolver: Towards Efficient Self-Evolving Agent System	Yunpeng Zhai et.al.	2511.10395	link
2025-11-13	SITA: A Framework for Structure-to-Instance Theorem Autoformalization	Chenyi Li et.al.	2511.10356	null
2025-11-13	EDGC: Entropy-driven Dynamic Gradient Compression for Efficient LLM Training	Qingao Yi et.al.	2511.10333	null
2025-11-13	Rethinking Visual Information Processing in Multimodal LLMs	Dongwan Kim et.al.	2511.10301	null
2025-11-13	Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models	Zhengtao Zou et.al.	2511.10292	null
2025-11-13	FactGuard: Event-Centric and Commonsense-Guided Fake News Detection	Jing He et.al.	2511.10281	null
2025-11-13	Quality Assurance of LLM-generated Code: Addressing Non-Functional Quality Characteristics	Xin Sun et.al.	2511.10271	null
2025-11-13	LangGPS: Language Separability Guided Data Pre-Selection for Joint Multilingual Instruction Tuning	Yangfan Ye et.al.	2511.10229	null
2025-11-13	Persona-Aware Alignment Framework for Personalized Dialogue Generation	Guanrong Li et.al.	2511.10215	null
2025-11-13	Advanced Black-Box Tuning of Large Language Models with Limited API Calls	Zhikang Xie et.al.	2511.10210	null
2025-11-13	EffiReason-Bench: A Unified Benchmark for Evaluating and Advancing Efficient Reasoning in Large Language Models	Junquan Huang et.al.	2511.10201	null
2025-11-13	Efficient Thought Space Exploration through Strategic Intervention	Ziheng Li et.al.	2511.10038	null
2025-11-13	AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models	Xinyi Wang et.al.	2511.10017	null
2025-11-13	AssertMiner: Module-Level Spec Generation and Assertion Mining using Static Analysis Guided LLMs	Hongqin Lyu et.al.	2511.10007	null
2025-11-13	PustakAI: Curriculum-Aligned and Interactive Textbooks Using Large Language Models	Shivam Sharma et.al.	2511.10002	null
2025-11-10	Hard vs. Noise: Resolving Hard-Noisy Sample Confusion in Recommender Systems via Large Language Models	Tianrui Song et.al.	2511.07295	link
2025-11-10	LLMServingSim2.0: A Unified Simulator for Heterogeneous Hardware and Serving Techniques in LLM Infrastructure	Jaehong Cho et.al.	2511.07229	null
2025-11-10	Importance-Aware Data Selection for Efficient LLM Instruction Tuning	Tingyu Jiang et.al.	2511.07074	null
2025-11-10	GoCkpt: Gradient-Assisted Multi-Step overlapped Checkpointing for Efficient LLM Training	Keyao Zhang et.al.	2511.07035	null
2025-11-10	P3-LLM: An Integrated NPU-PIM Accelerator for LLM Inference Using Hybrid Numerical Formats	Yuzong Chen et.al.	2511.06838	null
2025-11-09	Efficient LLM Safety Evaluation through Multi-Agent Debate	Dachuan Lin et.al.	2511.06396	null
2025-11-09	ELEGANCE: Efficient LLM Guidance for Audio-Visual Target Speech Extraction	Wenxuan Wu et.al.	2511.06288	null
2025-11-09	Optimizing Long-context LLM Serving via Fine-grained Sequence Parallelism	Cong Li et.al.	2511.06247	null
2025-11-09	Confidence-Guided Stepwise Model Routing for Cost-Efficient Reasoning	Sangmook Lee et.al.	2511.06190	null
2025-11-09	LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs	Zifan He et.al.	2511.06174	null
2025-11-08	Lethe: Layer- and Time-Adaptive KV Cache Pruning for Reasoning-Intensive LLM Serving	Hui Zeng et.al.	2511.06029	null
2025-11-08	MoSKA: Mixture of Shared KV Attention for Efficient Long-Sequence LLM Inference	Myunghyun Rhee et.al.	2511.06010	null
2025-11-08	MCP-RiskCue: Can LLM infer risk information from MCP server System Logs?	Jiayi Fu et.al.	2511.05867	null
2025-11-05	From Prompts to Power: Measuring the Energy Footprint of LLM Inference	Francisco Caravaca et.al.	2511.05597	null
2025-11-06	DuetServe: Harmonizing Prefill and Decode for LLM Serving via Adaptive GPU Multiplexing	Lei Gao et.al.	2511.04791	null
2025-11-06	Enabling Dynamic Sparsity in Quantized LLM Inference	Rongxiang Wang et.al.	2511.04477	null
2025-11-06	E-CARE: An Efficient LLM-based Commonsense-Augmented Framework for E-Commerce	Ge Zhang et.al.	2511.04087	null
2025-11-06	PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration	Yue Jiet Chong et.al.	2511.04036	null
2025-11-06	LLM-Driven Adaptive Source-Sink Identification and False Positive Mitigation for Static Analysis	Shiyin Lin et.al.	2511.04023	null
2025-11-05	RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse	Yinsicheng Jiang et.al.	2511.03475	null
2025-11-07	UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM	Hai Huang et.al.	2511.03293	null
2025-11-04	Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes	Mohammadsajad Alipour et.al.	2511.02681	null
2025-11-04	Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks	Xiumei Deng et.al.	2511.02647	null
2025-11-04	Verifying LLM Inference to Prevent Model Weight Exfiltration	Roy Rinberg et.al.	2511.02620	null
2025-11-04	KV Cache Transform Coding for Compact Storage in LLM Inference	Konrad Staniszewski et.al.	2511.01815	null
2025-11-04	Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding	Jungyeon Koh et.al.	2511.01695	null
2025-11-03	Scaling Graph Chain-of-Thought Reasoning: A Multi-Agent Framework with Efficient LLM Serving	Chengying Huan et.al.	2511.01633	null
2025-11-03	When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding	Min Fang et.al.	2511.01282	null
2025-11-04	CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing	Yifan Zhou et.al.	2511.01197	null
2025-11-02	FlexiCache: Leveraging Temporal Stability of Attention Heads for Efficient KV Cache Management	Nazmul Takbir et.al.	2511.00868	null
2025-11-05	FREESH: Fair, Resource- and Energy-Efficient Scheduling for LLM Serving on Heterogeneous GPUs	Xuan He et.al.	2511.00807	null
2025-11-04	SpecDiff-2: Scaling Diffusion Drafter Alignment For Faster Speculative Decoding	Jameson Sandler et.al.	2511.00606	null
2025-11-01	FlashEVA: Accelerating LLM inference via Efficient Attention	Juan Gabriel Kostelec et.al.	2511.00576	null
2025-11-01	Proactive DDoS Detection and Mitigation in Decentralized Software-Defined Networking via Port-Level Monitoring and Zero-Training Large Language Models	Mohammed N. Swileh et.al.	2511.00460	null
2025-10-31	Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU Limits	Dowon Kim et.al.	2511.00321	null
2025-11-05	PDE-SHARP: PDE Solver Hybrids through Analysis and Refinement Passes	Shaghayegh Fazliani et.al.	2511.00183	null
2025-10-31	AMD MI300X GPU Performance Analysis	Chandrish Ambati et.al.	2510.27583	null
2025-10-31	Glia: A Human-Inspired AI for Automated Systems Design and Optimization	Pouya Hamadanian et.al.	2510.27176	null
2025-10-29	Category-Aware Semantic Caching for Heterogeneous LLM Workloads	Chen Wang et.al.	2510.26835	null
2025-10-30	Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model	Biao Zhang et.al.	2510.26622	null
2025-10-30	1+1>2: A Synergistic Sparse and Low-Rank Compression Method for Large Language Models	Zeliang Zong et.al.	2510.26446	null
2025-10-30	Beyond Benchmarks: The Economics of AI Inference	Boqin Zhuang et.al.	2510.26136	null
2025-10-31	AttnCache: Accelerating Self-Attention Inference for LLM Prefill via Attention Cache	Dinghong Song et.al.	2510.25979	link
2025-10-31	NeuronMM: High-Performance Matrix Multiplication for LLM Inference on AWS Trainium	Dinghong Song et.al.	2510.25977	null
2025-10-29	A Survey on Efficient Large Language Model Training: From Data-centric Perspectives	Junyu Luo et.al.	2510.25817	null
2025-10-29	Serve Programs, Not Prompts	In Gim et.al.	2510.25412	null
2025-10-29	GPTOpt: Towards Efficient LLM-Based Black-Box Optimization	Jamison Meindl et.al.	2510.25404	null
2025-10-29	OpenReward: Learning to Reward Long-form Agentic Tasks via Reinforcement Learning	Ziyou Hu et.al.	2510.24636	null
2025-10-28	Pie: A Programmable Serving System for Emerging LLM Applications	In Gim et.al.	2510.24051	null
2025-10-28	Resource-Efficient LLM Application for Structured Transformation of Unstructured Financial Contracts	Maruf Ahmed Mridul et.al.	2510.23990	null
2025-10-26	Batch Speculative Decoding Done Right	Ranran Haoran Zhang et.al.	2510.22876	null
2025-10-26	TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination	Omar Naim et.al.	2510.22767	null
2025-10-26	Do Stop Me Now: Detecting Boilerplate Responses with a Single Iteration	Yuval Kainan et.al.	2510.22679	null
2025-10-26	SABlock: Semantic-Aware KV Cache Eviction with Adaptive Compression Block Size	Jinhan Chen et.al.	2510.22556	null
2025-10-23	Compress to Impress: Efficient LLM Adaptation Using a Single Gradient Step on 100 Samples	Shiva Sreeram et.al.	2510.20800	null
2025-10-23	RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging	Bowen Wang et.al.	2510.20479	null
2025-10-22	Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs	Hongyi Liu et.al.	2510.20064	null
2025-10-22	AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders	Yuezhou Hu et.al.	2510.19779	null
2025-10-22	Are Large Language Models Sensitive to the Motives Behind Communication?	Addison J. Wu et.al.	2510.19687	null
2025-10-22	DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference	Xiang Liu et.al.	2510.19669	null
2025-10-22	Energy-Efficient and Dequantization-Free Q-LLMs: A Spiking Neural Network Approach to Salient Value Mitigation	Chenyu Wang et.al.	2510.19498	null
2025-10-21	EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval	Zebin Yang et.al.	2510.18546	null
2025-10-21	SLICE: SLO-Driven Scheduling for LLM Inference on Edge Computing Devices	Pan Zhou et.al.	2510.18544	null
2025-10-21	Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs	Song Bian et.al.	2510.18245	null
2025-10-20	Planned Diffusion	Daniel Israel et.al.	2510.18087	null
2025-10-20	Language Models as Semantic Augmenters for Sequential Recommenders	Mahsa Valizadeh et.al.	2510.18046	null
2025-10-19	Justitia: Fair and Efficient Scheduling for LLM Applications	Mingyan Yang et.al.	2510.17015	null
2025-10-18	FourierCompress: Layer-Aware Spectral Activation Compression for Efficient and Accurate Collaborative LLM Inference	Jian Ma et.al.	2510.16418	null
2025-10-16	AMS-QUANT: Adaptive Mantissa Sharing for Floating-point Quantization	Mengtao Lv et.al.	2510.16045	null
2025-10-16	Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing	Tianhua Xia et.al.	2510.16040	null
2025-10-17	TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs	Sibo Xiao et.al.	2510.15545	null
2025-10-16	Tail-Optimized Caching for LLM Inference	Wenxin Zhang et.al.	2510.15152	null
2025-10-16	Identity-Link IRT for Label-Free LLM Evaluation: Preserving Additivity in TVD-MI Scores	Zachary Robertson et.al.	2510.14966	null
2025-10-16	xLLM Technical Report	Tongxuan Liu et.al.	2510.14686	null
2025-10-16	MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving	Jungi Lee et.al.	2510.14557	null
2025-10-16	FairBatching: Fairness-Aware Batch Formation for LLM Inference	Hongtao Lyu et.al.	2510.14392	null
2025-10-16	Qwen3Guard Technical Report	Haiquan Zhao et.al.	2510.14276	null
2025-10-15	Efficiently Executing High-throughput Lightweight LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management	Thanh Son Phung et.al.	2510.14024	null
2025-10-15	Adaptive Rescheduling in Prefill-Decode Disaggregated LLM Inference	Zhibin Wang et.al.	2510.13668	null
2025-10-15	F-BFQ: Flexible Block Floating-Point Quantization Accelerator for LLMs	Jude Haris et.al.	2510.13401	null
2025-10-15	Taming the Fragility of KV Cache Eviction in LLM Inference	Yuan Feng et.al.	2510.13334	null
2025-10-15	BanaServe: Unified KV Cache and Dynamic Module Migration for Balancing Disaggregated LLM Serving in AI Infrastructure	Yiyuan He et.al.	2510.13223	null
2025-10-15	Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference	Nikhil Bhendawade et.al.	2510.13161	null
2025-10-21	Retrieval-in-the-Chain: Bootstrapping Large Language Models for Generative Retrieval	Yingchen Zhang et.al.	2510.13095	null
2025-10-14	On the Role of Preference Variance in Preference Optimization	Jiacheng Guo et.al.	2510.13022	null
2025-10-14	KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems	Hancheng Ye et.al.	2510.12872	null
2025-10-14	Beyond Postconditions: Can Large Language Models infer Formal Contracts for Automatic Software Verification?	Cedric Richter et.al.	2510.12702	null
2025-10-14	Traveling Salesman-Based Token Ordering Improves Stability in Homomorphically Encrypted Language Models	Donghwan Rho et.al.	2510.12343	null
2025-10-13	FlexPipe: Adapting Dynamic LLM Serving Through Inflight Pipeline Refactoring in Fragmented Serverless Clusters	Yanying Lin et.al.	2510.11938	null
2025-10-13	Efficient LLM Inference over Heterogeneous Edge Networks with Speculative Decoding	Bingjie Zhu et.al.	2510.11331	null
2025-10-13	An Explorative Study on Distributed Computing Techniques in Training and Inference of Large Language Models	Sheikh Azizul Hakim et.al.	2510.11211	null
2025-10-13	Efficient In-Memory Acceleration of Sparse Block Diagonal LLMs	João Paulo Cardoso de Lima et.al.	2510.11192	null
2025-10-12	Hierarchical Optimization via LLM-Guided Objective Evolution for Mobility-on-Demand Systems	Yi Zhang et.al.	2510.10644	null
2025-10-11	MetaBreak: Jailbreaking Online LLM Services via Special Token Manipulation	Wentian Zhu et.al.	2510.10271	null
2025-10-11	CacheClip: Accelerating RAG with Effective KV Cache Reuse	Bin Yang et.al.	2510.10129	null
2025-10-11	Efficient Onboard Vision-Language Inference in UAV-Enabled Low-Altitude Economy Networks via LLM-Enhanced Optimization	Yang Li et.al.	2510.10028	null
2025-10-10	Evaluating LLM-Based Process Explanations under Progressive Behavioral-Input Reduction	P. van Oerle et.al.	2510.09732	null
2025-10-10	Layout-Aware Parsing Meets Efficient LLMs: A Unified, Scalable Framework for Resume Information Extraction and Evaluation	Fanwei Zhu et.al.	2510.09722	null
2025-10-10	FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference	Yu-Chen Lu et.al.	2510.09332	null
2025-10-10	Semantic-Condition Tuning: Fusing Graph Context with Large Language Models for Knowledge Graph Completion	Ruitong Liu et.al.	2510.08966	null
2025-10-13	Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors	Xin Liu et.al.	2510.08907	null
2025-10-10	Mozart: A Chiplet Ecosystem-Accelerator Codesign Framework for Composable Bespoke Application Specific Integrated Circuits	Haoran Jin et.al.	2510.08873	null
2025-10-09	When to Reason: Semantic Router for vLLM	Chen Wang et.al.	2510.08731	null
2025-10-09	SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference	Hengrui Zhang et.al.	2510.08544	null
2025-10-09	From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill	Gunjun Lee et.al.	2510.08055	null
2025-10-09	Augur: Modeling Covariate Causal Associations in Time Series via Large Language Models	Zhiqing Cui et.al.	2510.07858	null
2025-10-09	OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference	Yuzhe Gu et.al.	2510.07651	null
2025-10-08	AsyncSpade: Efficient Test-Time Scaling with Asynchronous Sparse Decoding	Shuqing Luo et.al.	2510.07486	null
2025-10-08	Accelerating Diffusion LLM Inference via Local Determinism Propagation	Fanheng Kong et.al.	2510.07081	null
2025-10-08	Accelerating Sparse Ternary GEMM for Quantized LLM inference on Apple Silicon	Baraq Lipshitz et.al.	2510.06957	null
2025-10-08	PTEB: Towards Robust Text Embedding Evaluation via Stochastic Paraphrasing at Evaluation Time with LLMs	Manuel Frank et.al.	2510.06730	null
2025-10-07	VecInfer: Efficient LLM Inference with Low-Bit KV Cache via Outlier-Suppressed Vector Quantization	Dingyu Yao et.al.	2510.06175	null
2025-10-07	lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models	Haoxin Wang et.al.	2510.06126	null
2025-10-07	From Principles to Practice: A Systematic Study of LLM Serving on Multi-core NPUs	Tianhao Zhu et.al.	2510.05632	null
2025-10-07	Activation-Informed Pareto-Guided Low-Rank Compression for Efficient LLM/VLM	Ryan Solgi et.al.	2510.05544	null
2025-10-07	H1B-KV: Hybrid One-Bit Caches for Memory-Efficient Large Language Model Inference	Harshil Vejendla et.al.	2510.05529	null
2025-10-07	Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting	Zhongkai Yu et.al.	2510.05497	null
2025-10-06	KVLinC: KV Cache Quantization with Hadamard Rotation and Linear Correction	Utkarsh Saxena et.al.	2510.05373	null
2025-10-06	A novel hallucination classification framework	Maksym Zavhorodnii et.al.	2510.05189	null
2025-10-06	RevMine: An LLM-Assisted Tool for Code Review Mining and Analysis Across Git Platforms	Samah Kansab et.al.	2510.04796	null
2025-10-06	SpikingMamba: Towards Energy-Efficient Large Language Models via Knowledge Distillation from Mamba	Yulong Huang et.al.	2510.04595	null
2025-10-05	Speculative Actions: A Lossless Framework for Faster Agentic Systems	Naimeng Ye et.al.	2510.04371	null
2025-10-05	Toward a unified framework for data-efficient evaluation of large language models	Lele Liao et.al.	2510.04051	null
2025-10-02	KVComm: Enabling Efficient LLM Communication through Selective KV Sharing	Xiangyu Shi et.al.	2510.03346	null
2025-10-03	Best-of-Majority: Minimax-Optimal Strategy for Pass@$k$ Inference Scaling	Qiwei Di et.al.	2510.03199	null
2025-10-03	Dissecting Transformers: A CLEAR Perspective towards Green AI	Hemang Jain et.al.	2510.02810	null
2025-10-03	TokenFlow: Responsive LLM Text Streaming Serving under Request Burst via Preemptive Scheduling	Junyi Chen et.al.	2510.02758	null
2025-10-03	HALO: Memory-Centric Heterogeneous Accelerator with 2.5D Integration for Low-Batch LLM Inference	Shubham Negi et.al.	2510.02675	null
2025-10-02	Litespark Technical Report: High-Throughput, Energy-Efficient LLM Training Framework	Nii Osae Osae Dade et.al.	2510.02483	null
2025-10-01	PolyLink: A Blockchain Based Decentralized Edge AI Platform for LLM Inference	Hongbo Liu et.al.	2510.02395	null
2025-10-03	Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey	Qiyuan Liu et.al.	2510.01925	null
2025-10-02	SCRIBES: Web-Scale Script-Based Semi-Structured Data Extraction with Reinforcement Learning	Shicheng Liu et.al.	2510.01832	null
2025-10-01	HiSpec: Hierarchical Speculative Decoding for LLMs	Avinash Kumar et.al.	2510.01336	null
2025-10-01	Generalized Parallel Scaling with Interdependent Generations	Harry Dong et.al.	2510.01143	null
2025-10-01	Prompt Curriculum Learning for Efficient LLM Post-Training	Zhaolin Gao et.al.	2510.01135	null
2025-10-01	Family Matters: Language Transfer and Merging for Adapting Small LLMs to Faroese	Jenny Kunz et.al.	2510.00810	null
2025-10-01	Expected Attention: KV Cache Compression by Estimating Attention from Future Queries Distribution	Alessio Devoto et.al.	2510.00636	null
2025-10-01	Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space?	Nandan Kumar Jha et.al.	2510.00537	null
2025-10-01	Learning a Zeroth-Order Optimizer for Fine-Tuning LLMs	Kairun Zhang et.al.	2510.00419	null
2025-10-02	Large Language Models Inference Engines based on Spiking Neural Networks	Adarsha Balaji et.al.	2510.00133	null
2025-10-01	AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size	Guanxi Lu et.al.	2509.26432	null
2025-09-30	Toward an Unbiased Collective Memory for Efficient LLM-Based Agentic 6G Cross-Domain Management	Hatim Chergui et.al.	2509.26200	null
2025-09-30	Parallax: Efficient LLM Inference Service over Decentralized Environment	Chris Tong et.al.	2509.26182	null
2025-09-30	Accelerating LLM Inference with Precomputed Query Storage	Jay H. Park et.al.	2509.25919	null
2025-09-30	SAIL: SRAM-Accelerated LLM Inference System with Lookup-Table-based GEMV	Jingyao Zhang et.al.	2509.25853	null
2025-09-29	Scaling with Collapse: Efficient and Predictable Training of LLM Families	Shane Bergsma et.al.	2509.25087	null
2025-09-29	Intra-request branch orchestration for efficient LLM reasoning	Weifan Jiang et.al.	2509.24957	null
2025-09-29	SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching	Xinye Zhao et.al.	2509.24832	null
2025-09-29	SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving	Qihui Zhou et.al.	2509.24626	null
2025-09-29	Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding	Sungkyun Kim et.al.	2509.24328	null
2025-07-22	Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework	Hongyi Tang et.al.	2507.16414	null
2025-07-21	Efficient Routing of Inference Requests across LLM Instances in Cloud-Edge Computing	Shibo Yu et.al.	2507.15553	null
2025-07-18	Efficient LLM Inference: Bandwidth, Compute, Synchronization, and Capacity are all you need	Michael Davies et.al.	2507.14397	null
2025-07-18	Characterizing Communication Patterns in Distributed Large Language Model Inference	Lang Xu et.al.	2507.14392	null
2025-07-18	Can LLMs Infer Personality from Real World Conversations?	Jianfeng Zhu et.al.	2507.14355	null
2025-07-14	PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training	Pengfei Du et.al.	2507.14202	null
2025-07-23	Photonic Fabric Platform for AI Accelerators	Jing Ding et.al.	2507.14000	null
2025-07-23	DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training	Zhixin Wang et.al.	2507.13833	null
2025-07-18	Team of One: Cracking Complex Video QA with Model Synergy	Jun Xie et.al.	2507.13820	null
2025-07-18	LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues	Haoyang Li et.al.	2507.13681	null
2025-07-17	Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation	Genki Kusano et.al.	2507.13525	null
2025-07-16	Toward Efficient SpMV in Sparse LLMs via Block Extraction and Compressed Storage	Junqing Lin et.al.	2507.12205	null
2025-07-15	MIRAGE: KV Cache Optimization through Parameter Remapping for Multi-tenant LLM Serving	Ruihao Li et.al.	2507.11507	null
2025-07-15	Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations	Miray Özcan et.al.	2507.11417	null
2025-07-15	KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding	Luohe Shi et.al.	2507.11273	null
2025-07-16	GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning	Ziru Liu et.al.	2507.10628	null
2025-07-14	Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving	Wonung Kim et.al.	2507.10178	null
2025-07-14	Past-Future Scheduler for LLM Serving under SLA Guarantees	Ruihao Gong et.al.	2507.10150	null
2025-07-14	ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism	Zedong Liu et.al.	2507.10069	null
2025-07-14	Green-LLM: Optimal Workload Allocation for Environmentally-Aware Distributed Inference	Jiaming Cheng et.al.	2507.09942	null
2025-07-13	Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset	Lily Hong Zhang et.al.	2507.09650	null
2025-07-12	SLIM: A Heterogeneous Accelerator for Edge Inference of Sparse Large Language Model via Adaptive Thresholding	Weihong Xu et.al.	2507.09201	null
2025-07-11	On Evaluating Performance of LLM Inference Serving Systems	Amey Agrawal et.al.	2507.09019	null
2025-07-11	Hybrid Systolic Array Accelerator with Optimized Dataflow for Edge Large Language Model Inference	Chun-Ting Chen et.al.	2507.09010	null
2025-07-11	Orchestration for Domain-specific Edge-Cloud Language Models	Prasoon Patidar et.al.	2507.09003	null
2025-07-11	InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching	Yilun Wang et.al.	2507.08523	null
2025-07-11	Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training	Aleksei Ilin et.al.	2507.08284	null
2025-07-10	Reasoning and Behavioral Equilibria in LLM-Nash Games: From Mindsets to Actions	Quanyan Zhu et.al.	2507.08208	null
2025-07-10	Krul: Efficient State Restoration for Multi-turn Conversations with Dynamic Cross-layer KV Sharing	Junyi Wen et.al.	2507.08045	null
2025-07-11	Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models	Varin Sikka et.al.	2507.07505	null
2025-07-16	Proactive Intra-GPU Disaggregation of Prefill and Decode in LLM Serving	Xiaoxiang Shi et.al.	2507.06608	null
2025-07-11	QUEST: Query Optimization in Unstructured Document Analysis	Zhaoze Sun et.al.	2507.06515	null
2025-07-08	Voltage Regulation in Distribution Systems with Data Center Loads	Yize Chen et.al.	2507.06416	null
2025-07-08	Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models	Léa Dubois et.al.	2507.05822	null
2025-07-07	Cascade: Token-Sharded Private LLM Inference	Rahul Thomas et.al.	2507.05228	null
2025-07-07	MoLink: Distributed and Efficient Serving Framework for Large Models	Lewei Jin et.al.	2507.05043	null
2025-07-16	Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?	Yun Qu et.al.	2507.04632	null
2025-07-09	Tail-aware Adversarial Attacks: A Distributional Approach to Efficient LLM Jailbreaking	Tim Beyer et.al.	2507.04446	null
2025-07-23	Fairness Evaluation of Large Language Models in Academic Library Reference Services	Haining Wang et.al.	2507.04224	null
2025-07-05	Enhancing Adaptive Behavioral Interventions with LLM Inference from Participant-Described States	Karine Karine et.al.	2507.03871	null
2025-07-05	OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM inference	Seungjun Shin et.al.	2507.03865	null
2025-07-08	MemOS: A Memory OS for AI System	Zhiyu Li et.al.	2507.03724	null
2025-07-04	Hummingbird: A Smaller and Faster Large Language Model Accelerator on Embedded FPGA	Jindong Li et.al.	2507.03308	null
2025-07-03	HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference	Weishu Deng et.al.	2507.03153	null
2025-06-20	Large Language Model-Driven Surrogate-Assisted Evolutionary Algorithm for Expensive Optimization	Lindong Xie et.al.	2507.02892	null
2025-07-03	On the Convergence of Large Language Model Optimizer for Black-Box Network Management	Hoon Lee et.al.	2507.02689	null
2025-07-03	Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure	Rui Xie et.al.	2507.02654	null
2025-07-14	FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference	Xing Liu et.al.	2507.02620	null
2025-07-02	Dissecting the Impact of Mobile DVFS Governors on LLM Inference Performance and Energy Efficiency	Zongpu Zhang et.al.	2507.02135	null
2025-07-02	AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training	Zhenyu Han et.al.	2507.01663	null
2025-07-02	Evaluating the Effectiveness of Direct Preference Optimization for Personalizing German Automatic Text Simplifications for Persons with Intellectual Disabilities	Yingqiang Gao et.al.	2507.01479	null
2025-07-02	LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation	Tianyu Liu et.al.	2507.01449	null
2025-07-02	EdgeLoRA: An Efficient Multi-Tenant LLM Serving System on Edge Devices	Zheyu Shen et.al.	2507.01438	null
2025-07-08	SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech	Zhuangfei Cheng et.al.	2507.01348	null
2025-07-02	La RoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation	Kai Liu et.al.	2507.01299	null
2025-07-01	PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning	Xingke Yang et.al.	2507.01216	null
2025-06-28	A Data Science Approach to Calcutta High Court Judgments: An Efficient LLM and RAG-powered Framework for Summarization and Similar Cases Retrieval	Puspendu Banerjee et.al.	2507.01058	null
2025-07-01	VEDA: Efficient LLM Generation Through Voting-based KV Cache Eviction and Dataflow-flexible Accelerator	Zhican Wang et.al.	2507.00797	null
2025-07-01	Cognitive Load-Aware Inference: A Neuro-Symbolic Framework for Optimizing the Token Economy of Large Language Models	Yilun Zhang et.al.	2507.00653	null
2025-07-01	LLM-Mesh: Enabling Elastic Sharing for Serverless LLM Inference	Chuhao Xu et.al.	2507.00507	null
2025-07-01	Serving LLMs in HPC Clusters: A Comparative Study of Qualcomm Cloud AI 100 Ultra and High-Performance GPUs	Mohammad Firas Sada et.al.	2507.00418	null
2025-06-30	Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission	Faranaksadat Solat et.al.	2507.00082	null
2025-06-30	Scaling Human Judgment in Community Notes with LLMs	Haiwen Li et.al.	2506.24118	null
2025-06-30	A Survey of LLM-based Automated Program Repair: Taxonomies, Design Paradigms, and Applications	Boyang Yang et.al.	2506.23749	null
2025-06-28	Spectra 1.1: Scaling Laws and Efficient Inference for Ternary Language Models	Tejas Vaidhya et.al.	2506.23025	null
2025-06-28	Smaller = Weaker? Benchmarking Robustness of Quantized LLMs in Code Generation	Sen Fang et.al.	2506.22776	null
2025-07-01	Not All Water Consumption Is Equal: A Water Stress Weighted Metric for Sustainable Computing	Yanran Wu et.al.	2506.22773	null
2025-06-27	QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization	Danush Khanna et.al.	2506.22396	null
2025-06-27	Towards Operational Data Analytics Chatbots – Virtual Knowledge Graph is All You Need	Junaid Ahmed Khan et.al.	2506.22267	null
2025-06-27	SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference	Yongchao He et.al.	2506.22033	null
2025-06-27	A Survey of LLM Inference Systems	James Pan et.al.	2506.21901	null
2025-06-26	Efficient and Reuseable Cloud Configuration Search Using Discovery Spaces	Michael Johnston et.al.	2506.21467	null
2025-06-26	BLOCKS: Blockchain-supported Cross-Silo Knowledge Sharing for Efficient LLM Services	Zhaojiacheng Zhou et.al.	2506.21033	null
2025-06-17	Utility-Driven Speculative Decoding for Mixture-of-Experts	Anish Saxena et.al.	2506.20675	null
2025-06-25	DipSVD: Dual-importance Protected SVD for Efficient LLM Compression	Xuan Ding et.al.	2506.20353	null
2025-07-02	Breaking the Boundaries of Long-Context LLM Inference: Adaptive KV Management on a Single Commodity GPU	He Sun et.al.	2506.20187	null
2025-06-24	MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection	Zhengxiang Huang et.al.	2506.19884	null
2025-06-24	Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models	Jungwoo Park et.al.	2506.19697	null
2025-06-25	Adaptive Request Scheduling for CodeLLM Serving with SLA Guarantees	Shi Chang et.al.	2506.19677	null
2025-06-23	Black-Box Test Code Fault Localization Driven by Large Language Models and Execution Estimation	Ahmadreza Saboor Yaraghi et.al.	2506.19045	null
2025-06-23	WiLLM: An Open Wireless LLM Communication System	Boyi Liu et.al.	2506.19030	null
2025-06-23	LLMs on a Budget? Say HOLA	Zohaib Hasan Siddiqui et.al.	2506.18952	null
2025-06-23	CommVQ: Commutative Vector Quantization for KV Cache Compression	Junyan Li et.al.	2506.18879	null
2025-06-26	PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries	Steven Kolawole et.al.	2506.18728	null
2025-06-22	Mechanistic Interpretability in the Presence of Architectural Obfuscation	Marcos Florencio et.al.	2506.18053	null
2025-06-22	LLMs for Customized Marketing Content Generation and Evaluation at Scale	Haoran Liu et.al.	2506.17863	null
2025-07-18	LLM-driven Medical Report Generation via Communication-efficient Heterogeneous Federated Learning	Haoxuan Che et.al.	2506.17562	null
2025-06-08	Training-free LLM Verification via Recycling Few-shot Examples	Dongseok Lee et.al.	2506.17251	null
2025-06-20	Towards AI Search Paradigm	Yuchen Li et.al.	2506.17188	null
2025-06-23	From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents	Mohammad Amaan Sayeed et.al.	2506.15911	null
2025-05-30	Learn from the Past: Fast Sparse Indexing for Large Language Model Decoding	Feiyu Yao et.al.	2506.15704	null
2025-06-18	eLLM: Elastic Memory Management Framework for Efficient LLM Serving	Jiale Xu et.al.	2506.15155	null
2025-06-17	CrEst: Credibility Estimation for Contexts in LLMs via Weak Supervision	Dyah Adila et.al.	2506.14912	null
2025-06-17	Cost-Efficient Serving of LLM Agents via Test-Time Plan Caching	Qizheng Zhang et.al.	2506.14852	null
2025-06-05	MobiEdit: Resource-efficient Knowledge Editing for Personalized On-device LLMs	Zhenyan Lu et.al.	2506.13772	null
2025-06-17	Prefix-Tuning+: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention	Haonan Wang et.al.	2506.13674	null
2025-06-16	Vector Ontologies as an LLM world view extraction method	Kaspar Rothenfusser et.al.	2506.13252	link
2025-06-16	Empirical Evaluation of Large Language Models in Automated Program Repair	Jiajun Sun et.al.	2506.13186	null
2025-06-19	Serving Large Language Models on Huawei CloudMatrix384	Pengfei Zuo et.al.	2506.12708	null
2025-06-13	Semantic Scheduling for LLM Inference	Wenyue Hua et.al.	2506.12204	link
2025-05-21	FlexQuant: A Flexible and Efficient Dynamic Precision Switching Framework for LLM Quantization	Fangxin Liu et.al.	2506.12024	null
2025-06-13	Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache	Xiaoran Liu et.al.	2506.11886	null
2025-06-13	GraphRAG-Causal: A novel graph-augmented framework for causal reasoning and annotation in news	Abdul Haque et.al.	2506.11600	null
2025-06-13	Collaborative LLM Inference via Planning for Efficient Reasoning	Byeongchan Lee et.al.	2506.11578	null
2025-06-13	Efficient Long-Context LLM Inference via KV Cache Clustering	Jie Hu et.al.	2506.11418	null
2025-06-12	From Replication to Redesign: Exploring Pairwise Comparisons for LLM-Based Peer Review	Yaohui Zhang et.al.	2506.11343	null
2025-06-12	SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding	Ziyi Zhang et.al.	2506.11309	null
2025-06-06	DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration	Hanzhi Zhang et.al.	2506.11104	link
2025-06-12	Slimming Down LLMs Without Losing Their Minds	Qingda et.al.	2506.10885	null
2025-06-12	AdaptiveLLM: A Framework for Selecting Optimal Cost-Efficient LLM for Code-Generation Based on CoT Length	Junhang Cheng et.al.	2506.10525	link
2025-06-12	TD-Pipe: Temporally-Disaggregated Pipeline Parallelism Architecture for High-Throughput LLM Inference	Hongbin Zhang et.al.	2506.10470	null
2025-06-11	A First Look at Bugs in LLM Inference Engines	Mugeng Liu et.al.	2506.09713	link
2025-06-12	Understanding the Performance and Power of LLM Inferencing on Edge Accelerators	Mayank Arya et.al.	2506.09554	null
2025-06-11	Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning	Jiayi Yuan et.al.	2506.09501	null
2025-06-10	Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$	Chihiro Taguchi et.al.	2506.08479	null
2025-07-19	Draft-based Approximate Inference for LLMs	Kevin Galim et.al.	2506.08373	link
2025-06-09	MiniCPM4: Ultra-Efficient LLMs on End Devices	MiniCPM Team et.al.	2506.07900	link
2025-06-09	How Benchmark Prediction from Fewer Data Misses the Mark	Guanhua Zhang et.al.	2506.07673	link
2025-06-09	TreeReview: A Dynamic Tree of Questions Framework for Deep and Efficient LLM-based Scientific Peer Review	Yuan Chang et.al.	2506.07642	null
2025-06-09	MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts	Wei Tao et.al.	2506.07533	null
2025-06-07	Containerized In-Storage Processing and Computing-Enabled SSD Disaggregation	Miryeong Kwon et.al.	2506.06769	null
2025-06-06	Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques	Adarsh Prasad Behera et.al.	2506.06579	null
2025-06-06	Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage	Ziqi Yuan et.al.	2506.06472	null
2025-07-08	On the Fundamental Impossibility of Hallucination Control in Large Language Models	Michał P. Karpowicz et.al.	2506.06382	null
2025-05-21	Reward Is Enough: LLMs Are In-Context Reinforcement Learners	Kefan Song et.al.	2506.06303	null
2025-06-06	AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search	Yu Li et.al.	2506.06017	null
2025-06-06	FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model	Md Jueal Mia et.al.	2506.05640	link
2025-06-11	Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models	Yanzhao Zhang et.al.	2506.05176	null
2025-06-05	Are LLMs Reliable Translators of Logical Reasoning Across Lexically Diversified Contexts?	Qingchuan Li et.al.	2506.04575	link
2025-06-04	Cascadia: A Cascade Serving System for Large Language Models	Youhe Jiang et.al.	2506.04203	null
2025-06-04	SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling	Anhao Zhao et.al.	2506.04179	null
2025-06-04	GORACS: Group-level Optimal Transport-guided Coreset Selection for LLM-based Recommender Systems	Tiehua Mei et.al.	2506.04015	null
2025-06-04	Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation	Junyi Chen et.al.	2506.03887	null
2025-06-04	Client-Side Zero-Shot LLM Inference for Comprehensive In-Browser URL Analysis	Avihay Cohen et.al.	2506.03656	null
2025-06-04	POSS: Position Specialist Generates Better Draft for Speculative Decoding	Langlin Huang et.al.	2506.03566	link
2025-07-10	Parallel CPU-GPU Execution for LLM Inference on Constrained GPUs	Jiakun Fan et.al.	2506.03296	null
2025-06-03	QKV Projections Require a Fraction of Their Memory	Malik Khalaf et.al.	2506.02939	null
2025-06-03	Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs	Shangmin Guo et.al.	2506.02918	null
2025-06-14	TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression	Zhong-Zhi Li et.al.	2506.02678	link
2025-07-23	KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider	Jiahao Wang et.al.	2506.02634	link
2025-06-03	HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference	Ping Gong et.al.	2506.02572	link
2025-06-03	Response-Level Rewards Are All You Need for Online Reinforcement Learning in LLMs: A Mathematical Perspective	Shenghua He et.al.	2506.02553	null
2025-05-29	NestedFP: High-Performance, Memory-Efficient Dual-Precision Floating Point Support for LLMs	Haeun Lee et.al.	2506.02024	null
2025-05-24	Efficient and Workload-Aware LLM Serving via Runtime Layer Swapping and KV Cache Resizing	Zhaoyuan Su et.al.	2506.02006	null
2025-05-16	Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism	Yuhao Shen et.al.	2506.01979	null
2025-06-02	Memory Access Characterization of Large Language Models in CPU Environment and its Potential Impacts	Spencer Banasik et.al.	2506.01827	null
2025-05-13	AI Accelerators for Large Language Model Inference: Architecture Analysis and Scaling Strategies	Amit Sharma et.al.	2506.00008	null
2025-05-30	AFLoRA: Adaptive Federated Fine-Tuning of Large Language Models with Resource-Aware Low-Rank Adaption	Yajie Zhou et.al.	2505.24773	null
2025-05-30	SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training	Yehonathan Refael et.al.	2505.24749	null
2025-05-30	Are Optimal Algorithms Still Optimal? Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching	Juan Wisznia et.al.	2505.24643	null
2025-05-30	LLM Inference Enhanced by External Knowledge: A Survey	Yu-Hsuan Lin et.al.	2505.24377	link
2025-05-30	SkyLB: A Locality-Aware Cross-Region Load Balancer for LLM Inference	Tian Xia et.al.	2505.24095	null
2025-05-29	Large Language Model Meets Constraint Propagation	Alexandre Bonlarron et.al.	2505.24012	null
2025-05-29	EmbAdvisor: Adaptive Cache Management for Sustainable LLM Serving	Yuyang Tian et.al.	2505.23970	null
2025-05-29	Sustainable Carbon-Aware and Water-Efficient LLM Scheduling in Geo-Distributed Cloud Datacenters	Hayden Moore et.al.	2505.23554	null
2025-06-10	Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism	Jinhui Wei et.al.	2505.23219	null
2025-05-29	SCORPIO: Serving the Right Requests at the Right Time for Heterogeneous SLOs in LLM Inference	Yinghao Tang et.al.	2505.23022	null
2025-05-28	Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference	Donghyeon Joo et.al.	2505.22913	link
2025-05-28	AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models	Feng Luo et.al.	2505.22662	null
2025-05-28	Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai LLM-based ASR	Mingchen Shao et.al.	2505.22063	null
2025-05-28	ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning	Zhendong Mi et.al.	2505.21987	null
2025-05-28	Towards Efficient Key-Value Cache Management for Prefix Prefilling in LLM Inference	Yue Zhu et.al.	2505.21919	null
2025-05-29	EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse	Tianyu Guo et.al.	2505.21889	link
2025-05-28	HoliTom: Holistic Token Merging for Fast Video Large Language Models	Kele Shao et.al.	2505.21334	link
2025-06-04	LLMs Think, But Not In Your Flow: Reasoning-Level Personalization for Black-Box Large Language Models	Jieyong Kim et.al.	2505.21082	null
2025-05-27	Efficient Large Language Model Inference with Neural Block Linearization	Mete Erdogan et.al.	2505.21077	null
2025-07-18	FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration	Daehyeon Baek et.al.	2505.20839	null
2025-05-26	HAMburger: Accelerating LLM Inference via Token Smashing	Jingyu Liu et.al.	2505.20438	null
2025-05-23	Less Context, Same Performance: A RAG Framework for Resource-Efficient LLM-Based Clinical NLP	Satya Narayana Cheetirala et.al.	2505.20320	null
2025-05-26	APE: A Data-Centric Benchmark for Efficient LLM Adaptation in Text Summarization	Javier Marín et.al.	2505.19912	link
2025-06-13	MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE	Zongle Huang et.al.	2505.19645	null
2025-05-26	VLMLight: Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning	Maonan Wang et.al.	2505.19486	null
2025-05-26	BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs	Guilong Lu et.al.	2505.19457	link
2025-05-26	WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference	Sihan Chen et.al.	2505.19427	link
2025-05-25	DECA: A Near-Core LLM Decompression Accelerator Supporting Out-of-Order Invocation	Gerasimos Gerogiannis et.al.	2505.19349	null
2025-05-25	Can Large Language Models Infer Causal Relationships from Real-World Text?	Ryan Saklad et.al.	2505.18931	null
2025-06-18	ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models	Hao Chen et.al.	2505.18799	null
2025-06-03	A Survey of LLM $\times$ DATA	Xuanhe Zhou et.al.	2505.18458	null
2025-05-23	LatentLLM: Attention-Aware Joint Tensor Compression	Toshiaki Koike-Akino et.al.	2505.18413	null
2025-05-23	An Attack to Break Permutation-Based Private Third-Party Inference Schemes for LLMs	Rahul Thomas et.al.	2505.18332	null
2025-07-01	Two-Stage Regularization-Based Structured Pruning for LLMs	Mingkuan Feng et.al.	2505.18232	null
2025-05-23	NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache	Donghyun Son et.al.	2505.18231	null
2025-05-23	Navigating Pitfalls: Evaluating LLMs in Machine Learning Programming Education	Smitha Kumar et.al.	2505.18220	null
2025-05-23	Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning	Michael Hassid et.al.	2505.17813	null
2025-05-23	DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies	Ning Yang et.al.	2505.17420	null
2025-05-26	RAP: Runtime-Adaptive Pruning for LLM Inference	Huanrong Liu et.al.	2505.17138	null
2025-05-20	Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency	Ruixiao Li et.al.	2505.17074	null
2025-05-16	SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs	Jinwoo Park et.al.	2505.17052	null
2025-05-22	CASTILLO: Characterizing Response Length Distributions of Large Language Models	Daniel F. Perez-Ramirez et.al.	2505.16881	link
2025-05-24	Recursive Offloading for LLM Serving in Multi-tier Networks	Zhiyuan Wu et.al.	2505.16502	link
2025-05-22	Reading Between the Prompts: How Stereotypes Shape LLM’s Implicit Personalization	Vera Neplenbroek et.al.	2505.16467	link
2025-05-22	LightRouter: Towards Efficient LLM Collaboration with Minimal Overhead	Yifan Zhang et.al.	2505.16221	null
2025-05-31	QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design	Benjamin Schneider et.al.	2505.16175	link
2025-05-22	KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization	Mingbo Song et.al.	2505.16162	null
2025-05-21	Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning	Jinghui Lu et.al.	2505.15154	null
2025-05-21	BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms	Yunlong Hou et.al.	2505.15141	null
2025-06-04	Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity	Susav Shrestha et.al.	2505.14884	link
2025-05-20	ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions	Bufang Yang et.al.	2505.14668	null
2025-05-20	ServerlessLoRA: Minimizing Latency and Cost in Serverless Inference for LoRA-Based LLMs	Yifan Sui et.al.	2505.14468	null
2025-05-20	Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning	Jiwon Song et.al.	2505.13866	link
2025-05-19	Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training	Shane Bergsma et.al.	2505.13738	null
2025-05-16	An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents	Ayesha Amjad et.al.	2505.13504	null
2025-04-02	Large Language Model powered Symbolic Execution	Yihe Li et.al.	2505.13452	null
2025-05-19	Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately	Yuhang Wang et.al.	2505.13326	null
2025-05-19	HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding	Siran Liu et.al.	2505.13254	null
2025-05-19	FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference	Guangda Liu et.al.	2505.13109	null
2025-05-19	EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code	Yuhao Qing et.al.	2505.13004	link
2025-05-25	FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks	Zihua Wang et.al.	2505.12728	link
2025-05-19	HydraInfer: Hybrid Disaggregated Scheduling for Multimodal Large Language Model Serving	Xianzhe Dong et.al.	2505.12658	null
2025-05-17	Enhancing Complex Instruction Following for Large Language Models with Mixture-of-Contexts Fine-tuning	Yuheng Lu et.al.	2505.11922	null
2025-05-17	Arrow: Adaptive Scheduling Mechanisms for Disaggregated LLM Inference Architecture	Yu Wu et.al.	2505.11916	null
2025-05-25	Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning	Yansong Ning et.al.	2505.11827	null
2025-07-10	TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference	Raja Gond et.al.	2505.11329	link
2025-05-23	SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning	Zheng Li et.al.	2505.11274	null
2025-05-16	Vaiage: A Multi-Agent Solution to Personalized Travel Planning	Binwen Liu et.al.	2505.10922	null
2025-05-21	SpecOffload: Unlocking Latent GPU Capacity for LLM Inference on Resource-Constrained Devices	Xiangwen Zhuge et.al.	2505.10259	link
2025-06-05	ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production	Yuxing Xiang et.al.	2505.09999	link
2025-05-15	How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference	Nidhal Jegham et.al.	2505.09598	null
2025-05-14	Statistical Modeling and Uncertainty Estimation of LLM Inference Systems	Kaustabha Ray et.al.	2505.09319	null
2025-05-15	ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor	Seungbeom Choi et.al.	2505.09142	link
2025-05-13	ITERA-LLM: Boosting Sub-8-Bit Large Language Model Inference via Iterative Tensor Decomposition	Keran Zheng et.al.	2505.08981	null
2025-06-30	LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries	Zekun Wu et.al.	2505.08842	null
2025-05-13	Automatic Task Detection and Heterogeneous LLM Speculative Decoding	Danying Ge et.al.	2505.08600	null
2025-05-08	Scaling Laws for Speculative Decoding	Siyuan Yan et.al.	2505.07858	null
2025-05-12	SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models	Hang Wu et.al.	2505.07680	null
2025-05-12	LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning	Xiaotian Lin et.al.	2505.07437	link
2025-05-12	Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity	Guang Yan et.al.	2505.07239	null
2025-05-12	PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications	Kuntai Du et.al.	2505.07203	null
2025-06-15	I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference	Zibo Gao et.al.	2505.06738	null
2025-05-09	Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference	Haolin Zhang et.al.	2505.06461	null
2025-04-30	Towards Efficient LLM Storage Reduction via Tensor Deduplication and Delta Compression	Zirui Wang et.al.	2505.06252	null
2025-05-09	Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM	Zehao Fan et.al.	2505.05772	null
2025-05-08	PRIMG: Efficient LLM-driven Test Generation Using Mutant Prioritization	Mohamed Salah Bouafif et.al.	2505.05584	link
2025-05-08	HEXGEN-TEXT2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow	You Peng et.al.	2505.05286	link
2025-05-12	Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving	Shan Yu et.al.	2505.04021	null
2025-05-31	LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection	Xinyue Zeng et.al.	2505.03793	link
2025-05-15	GPU Performance Portability needs Autotuning	Burkhard Ringlein et.al.	2505.03780	link
2025-04-21	Splitwiser: Efficient LM inference with constrained resources	Asad Aali et.al.	2505.03763	link
2025-04-07	AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design	Yanbiao Liang et.al.	2505.03745	null
2025-05-06	Faster MoE LLM Inference for Extremely Large Models	Haoqi Yang et.al.	2505.03531	null
2025-05-16	34 Examples of LLM Applications in Materials Science and Chemistry: Towards Automation, Assistants, Agents, and Accelerated Scientific Discovery	Yoel Zimmermann et.al.	2505.03049	null
2025-06-30	RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference	Yaoqi Chen et.al.	2505.02922	null
2025-05-06	EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices	Arnab Sanyal et.al.	2505.02380	null
2025-05-03	Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients	Yezhen Wang et.al.	2505.01744	null
2025-05-03	High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers	Brian Wong et.al.	2505.01693	null
2025-05-08	A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency	Sihyeong Park et.al.	2505.01658	link
2025-05-02	PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding	Bradley McDanel et.al.	2505.01572	null
2025-05-01	Spill The Beans: Exploiting CPU Cache Side-Channels to Leak Tokens from Large Language Models	Andrew Adiletta et.al.	2505.00817	null
2025-04-29	Efficient LLMs with AMP: Attention Heads and MLP Pruning	Leandro Giusti Mugnaini et.al.	2504.21174	null
2025-04-29	Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts	Hanhua Hong et.al.	2504.21117	null
2025-04-30	Ascendra: Dynamic Request Prioritization for Efficient LLM Serving	Azam Ikram et.al.	2504.20828	null
2025-04-30	GenTorrent: Scaling Large Language Model Serving with An Overley Network	Fei Fang et.al.	2504.20101	null
2025-04-24	Tempo: Application-aware LLM Serving with Mixed SLO Requirements	Wei Zhang et.al.	2504.20068	null
2025-04-28	AutoJudge: Judge Decoding Without Manual Annotation	Roman Garipov et.al.	2504.20039	null
2025-04-28	semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage	Ke Hong et.al.	2504.19867	null
2025-04-28	Taming the Titans: A Survey of Efficient LLM Inference Serving	Ranran Zhen et.al.	2504.19720	link
2025-04-28	Bullet: Boosting GPU Utilization for LLM Serving via Dynamic Spatial-Temporal Orchestration	Zejia Lin et.al.	2504.19516	null
2025-04-28	R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference	Zhenyu Zhang et.al.	2504.19449	null
2025-04-28	Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory	Prateek Chhikara et.al.	2504.19413	null
2025-05-07	A Simple Ensemble Strategy for LLM Inference: Towards More Stable Text Classification	Junichiro Niimi et.al.	2504.18884	link
2025-06-15	PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation	Zihao An et.al.	2504.18583	null
2025-04-25	EcoServe: Enabling Cost-effective LLM Serving with Proactive Intra- and Inter-Instance Orchestration	Jiangsu Du et.al.	2504.18154	null
2025-04-25	PropRAG: Guiding Retrieval with Beam Search over Proposition Paths	Jingjin Wang et.al.	2504.18070	null
2025-04-25	Streaming, Fast and Slow: Cognitive Load-Aware Streaming for Efficient LLM Serving	Chang Xiao et.al.	2504.17999	null
2025-04-24	Energy Considerations of Large Language Model Inference and Efficiency Optimizations	Jared Fernandez et.al.	2504.17674	null
2025-04-24	L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference	Qingyuan Liu et.al.	2504.17584	null
2025-04-24	A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task	Jiaqi Deng et.al.	2504.17547	null
2025-04-24	On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration	Maoyang Xiang et.al.	2504.17376	null
2025-04-26	QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining	Fengze Liu et.al.	2504.16511	null
2025-04-18	HPU: High-Bandwidth Processing Unit for Scalable, Cost-effective LLM Inference via GPU Co-processing	Myunghyun Rhee et.al.	2504.16112	null
2025-05-29	Optimizing Token Consumption in LLMs: A Nano Surge Approach for Code Reasoning Efficiency	Junwei Hu et.al.	2504.15989	null
2025-04-22	SeaLLM: Service-Aware and Latency-Optimized Resource Sharing for Large Language Model Inference	Yihao Zhao et.al.	2504.15720	null
2025-04-23	A LoRA-Based Approach to Fine-Tuning LLMs for Educational Guidance in Resource-Constrained Settings	Md Millat Hosen et.al.	2504.15610	link
2025-04-21	Speculative Sampling via Exponential Races	Szymon Kobus et.al.	2504.15475	null
2025-05-20	KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments	Junyoung Park et.al.	2504.15364	null
2025-04-18	High-Throughput LLM inference on Heterogeneous Clusters	Yi Xiong et.al.	2504.15303	null
2025-04-17	D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving	Haodong Wang et.al.	2504.15299	null
2025-06-12	SLO-Aware Scheduling for Large Language Model Inferences	Jinqi Huang et.al.	2504.14966	null
2025-04-21	Hardware-based Heterogeneous Memory Management for Large Language Model Inference	Soojin Hwang et.al.	2504.14893	null
2025-05-28	gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling	Tianyu Guo et.al.	2504.14775	link
2025-04-20	Knowledge Distillation and Dataset Distillation of Large Language Models: Emerging Trends, Challenges, and Future Directions	Luyang Fang et.al.	2504.14772	null
2025-04-22	Optimizing SLO-oriented LLM Serving with PD-Multiplexing	Weihao Cui et.al.	2504.14489	null
2025-04-19	Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator	Akshat Ramachandran et.al.	2504.14365	null
2025-04-19	FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference	Coleman Hooper et.al.	2504.14152	null
2025-05-12	From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs	Jiliang Ni et.al.	2504.13471	null
2025-05-23	The Quantum LLM: Modeling Semantic Spaces with Quantum Principles	Timo Aukusti Laine et.al.	2504.13202	null
2025-04-25	Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving	Yaoyao Ding et.al.	2504.12984	null
2025-04-17	Data-efficient LLM Fine-tuning for Code Generation	Weijie Lv et.al.	2504.12687	link
2025-04-16	Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading	Kihyun Kim et.al.	2504.11816	link
2025-04-16	Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs	Hyungwoo Lee et.al.	2504.11765	null
2025-04-16	Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures	Prabhu Vellaisamy et.al.	2504.11750	null
2025-04-16	Progent: Programmable Privilege Control for LLM Agents	Tianneng Shi et.al.	2504.11703	link
2025-04-15	Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints	Ruicheng Ao et.al.	2504.11320	link
2025-04-14	HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving	Avinash Kumar et.al.	2504.10724	null
2025-04-14	Load Balancing with Network Latencies via Distributed Gradient Descent	Santiago R. Balseiro et.al.	2504.10693	null
2025-04-15	AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference	Yangshen Deng et.al.	2504.10326	null
2025-04-14	KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference	Yuxuan Tian et.al.	2504.09936	null
2025-04-20	Understanding and Optimizing Multi-Stage AI Inference Pipelines	Abhimanyu Rajeshkumar Bambhaniya et.al.	2504.09775	null
2025-04-13	Integrating Large Language Models for Automated Structural Analysis	Haoran Liang et.al.	2504.09754	null
2025-04-13	Efficient LLM Serving on Hybrid Real-time and Best-effort Requests	Wan Borui et.al.	2504.09590	null
2025-04-13	LoopLynx: A Scalable Dataflow Architecture for Efficient LLM Inference	Jianing Zheng et.al.	2504.09561	link
2025-04-12	MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints	Yichao Yuan et.al.	2504.09345	null
2025-05-22	DynaServe: Unified and Elastic Execution for Dynamic Disaggregated LLM Serving	Chaoyi Ruan et.al.	2504.09285	null
2025-04-11	An Adaptive Vector Index Partitioning Scheme for Low-Latency RAG Pipeline	Junkyum Kim et.al.	2504.08930	null
2025-04-11	SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting	Jiaming Xu et.al.	2504.08850	null
2025-05-31	SD$^2$: Self-Distilled Sparse Drafters	Mike Lasby et.al.	2504.08838	null
2025-04-07	PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters	Zonghang Li et.al.	2504.08791	link
2025-04-11	Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash	Fucheng Jia et.al.	2504.08378	null
2025-04-11	Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices	Shengyuan Ye et.al.	2504.08242	null
2025-04-10	Token Level Routing Inference System for Edge Devices	Jianshu She et.al.	2504.07878	null
2025-04-10	A System for Comprehensive Assessment of RAG Frameworks	Mattia Rengo et.al.	2504.07803	link
2025-04-11	Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving	Shihong Gao et.al.	2504.07494	null
2025-04-10	UniCAIM: A Unified CAM/CIM Architecture with Static-Dynamic KV Cache Pruning for Efficient Long-Context LLM Inference	Weikai Xu et.al.	2504.07479	null
2025-04-24	Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents	Yueying Li et.al.	2504.07347	null
2025-04-08	S’MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning	Hanqing Zeng et.al.	2504.06426	null
2025-04-08	SPIRe: Boosting LLM Inference Throughput with Speculative Decoding	Sanjit Neelam et.al.	2504.06419	null
2025-04-08	Mosaic: Composite Projection Pruning for Resource-efficient LLMs	Bailey J. Eccles et.al.	2504.06323	null
2025-04-08	Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching	Yanhao Dong et.al.	2504.06319	null
2025-05-23	Hogwild! Inference: Parallel LLM Generation via Concurrent Attention	Gleb Rodionov et.al.	2504.06261	null
2025-05-27	User Feedback Alignment for LLM-powered Exploration in Large-scale Recommendation Systems	Jianling Wang et.al.	2504.05522	null
2025-04-07	REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding	Sakib Reza et.al.	2504.05491	null
2025-04-07	Evaluating Knowledge Graph Based Retrieval Augmented Generation Methods under Knowledge Incompleteness	Dongzhuoran Zhou et.al.	2504.05163	null
2025-05-20	Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning	Sugyeong Eo et.al.	2504.05047	null
2025-04-05	PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models	Haofei Yin et.al.	2504.04104	null
2025-04-03	FlowKV: A Disaggregated Inference Framework with Low-Latency KV Cache Transfer and Load-Aware Scheduling	Weiqing Li et.al.	2504.03775	null
2025-03-30	VFlow: Discovering Optimal Agentic Workflows for Verilog Generation	Yangbo Wei et.al.	2504.03723	null
2025-04-08	MILLION: Mastering Long-Context LLM Inference Via Outlier-Immunized KV Product Quantization	Zongwu Wang et.al.	2504.03661	link
2025-03-01	Echo: Efficient Co-Scheduling of Hybrid Online-Offline Tasks for Large Language Model Serving	Zhibin Wang et.al.	2504.03651	null
2025-02-22	AIBrix: Towards Scalable, Cost-Effective Large Language Model Inference Infrastructure	The AIBrix Team et.al.	2504.03648	null
2025-04-04	Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency	Erik Johannes Husom et.al.	2504.03360	null
2025-04-04	Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation	Weitao Li et.al.	2504.03165	link
2025-04-03	Narrative Studio: Visual narrative exploration using LLMs and Monte Carlo Tree Search	Parsa Ghaffari et.al.	2504.02426	link
2025-04-01	SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching	Yuxuan Zhu et.al.	2504.00970	null
2025-06-04	Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding	Aayush Gautam et.al.	2504.00030	null
2025-03-31	TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers’ Guidance	Jingxian Xu et.al.	2503.24198	null
2025-04-06	ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance	Tong Xie et.al.	2503.24053	link
2025-03-31	Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving	Wei Gao et.al.	2503.24000	link
2025-03-31	Model Hemorrhage and the Robustness Limits of Large Language Models	Ziyang Ma et.al.	2503.23924	null
2025-03-31	MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration	Tatsuya Kubo et.al.	2503.23817	null
2025-03-30	Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference	Wei Tao et.al.	2503.23294	null
2025-03-30	PromptDistill: Query-based Selective Token Retention in Intermediate Layers for Efficient Large Language Model Inference	Weisheng Jin et.al.	2503.23274	link
2025-03-28	Niyama: Breaking the Silos of LLM Inference Serving	Kanishk Goel et.al.	2503.22562	null
2025-03-26	Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation	Yunkai Liang et.al.	2503.20552	link
2025-03-25	LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation	Han Chen et.al.	2503.19950	link
2025-03-24	LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment	Varsha Embar et.al.	2503.19090	null
2025-03-23	SplitFrozen: Split Learning with Device-side Model Frozen for Fine-Tuning LLM on Heterogeneous Resource-Constrained Devices	Jian Ma et.al.	2503.18986	null
2025-03-24	xKV: Cross-Layer SVD for KV-Cache Compression	Chi-Chih Chang et.al.	2503.18893	link
2025-04-21	Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design	Rui Xie et.al.	2503.18869	null
2025-05-14	Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization	Minsu Kim et.al.	2503.18599	null
2025-03-24	DeepFund: Will LLM be Professional at Fund Investment? A Live Arena Perspective	Changlun Li et.al.	2503.18313	null
2025-03-24	Jenga: Effective Memory Management for Serving LLM with Heterogeneity	Chen Zhang et.al.	2503.18292	null
2025-03-27	WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference	Youhui Zuo et.al.	2503.17922	link
2025-03-22	PipeBoost: Resilient Pipelined Architecture for Fast Serverless LLM Scaling	Chongpeng Liu et.al.	2503.17707	null
2025-03-21	V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms	Javier J. Poveda Rodrigo et.al.	2503.17422	null
2025-03-21	Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation	Jingzhi Fang et.al.	2503.16893	null
2025-05-16	KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse	Huan Yang et.al.	2503.16525	null
2025-03-20	SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models	Fahao Chen et.al.	2503.15921	null
2025-03-19	Automated Non-Functional Requirements Generation in Software Engineering with Large Language Models: A Comparative Study	Jomar Thomas Almonte et.al.	2503.15248	null
2025-04-15	ELTEX: A Framework for Domain-Driven Synthetic Data Generation	Arina Razmyslovich et.al.	2503.15055	link
2025-03-19	FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding	Chongjun Tu et.al.	2503.14935	null
2025-03-19	Communication-Efficient Distributed On-Device LLM Inference Over Wireless Networks	Kai Zhang et.al.	2503.14882	null
2025-03-21	RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving	Wenqi Jiang et.al.	2503.14649	null
2025-03-18	PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play	Wei Fang et.al.	2503.14432	null
2025-03-24	Mitigating KV Cache Competition to Enhance User Experience in LLM Inference	Haiying Shen et.al.	2503.13773	null
2025-03-17	AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications	Haiying Shen et.al.	2503.13737	null
2025-03-17	ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts	Evangelos Georganas et.al.	2503.13565	null
2025-03-14	Examples as the Prompt: A Scalable Approach for Efficient LLM Adaptation in E-Commerce	Jingying Zeng et.al.	2503.13518	null
2025-03-17	xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference	Maximilian Beck et.al.	2503.13427	link
2025-04-14	VeriLeaky: Navigating IP Protection vs Utility in Fine-Tuning for LLM-Driven Verilog Coding	Zeng Wang et.al.	2503.13116	null
2025-03-15	TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation	Mayank Kumar et.al.	2503.12217	null
2025-04-22	Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques	Neusha Javidnia et.al.	2503.11816	null
2025-05-19	D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning	Jia Zhang et.al.	2503.11441	null
2025-03-14	MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens	Jeong Hun Yeo et.al.	2503.11315	link
2025-04-08	Green Prompting	Marta Adamska et.al.	2503.10666	null
2025-05-16	Collaborative Speculative Inference for Efficient LLM Inference Serving	Luyao Gao et.al.	2503.10325	null
2025-03-17	Exploiting Edited Large Language Models as General Scientific Optimizers	Qitan Lv et.al.	2503.09620	null
2025-03-13	BIMBA: Selective-Scan Compression for Long-Range Video Question Answering	Md Mohaiminul Islam et.al.	2503.09590	link
2025-05-23	Prompt Inference Attack on Distributed Large Language Model Inference Frameworks	Xinjian Luo et.al.	2503.09291	null
2025-05-02	Prompt Inversion Attack against Collaborative Inference of Large Language Models	Wenjie Qu et.al.	2503.09022	null
2025-03-19	Enhancing High-Quality Code Generation in Large Language Models with Comparative Prefix-Tuning	Yuan Jiang et.al.	2503.09020	link
2025-03-11	Position-Aware Depth Decay Decoding ($D^3$): Boosting Large Language Model Inference Efficiency	Siqi Fan et.al.	2503.08524	null
2025-03-11	FastCache: Optimizing Multimodal LLM Serving through Lightweight KV-Cache Compression Framework	Jianian Zhu et.al.	2503.08461	null
2025-03-19	TokenSim: Enabling Hardware and Software Exploration for Large Language Model Inference Systems	Feiyang Wu et.al.	2503.08415	link
2025-03-11	Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference	Pol G. Recasens et.al.	2503.08311	null
2025-03-09	Seesaw: High-throughput LLM Inference via Model Re-sharding	Qidong Su et.al.	2503.06433	null
2025-02-24	Encoding Inequity: Examining Demographic Bias in LLM-Driven Robot Caregiving	Raj Korpan et.al.	2503.05765	null
2025-03-07	Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching	Bowen Pang et.al.	2503.05248	link
2025-05-21	Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching	Simon A. Aytes et.al.	2503.05179	link
2025-03-07	SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding	Kaiyu Huang et.al.	2503.05096	null
2025-03-07	Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size	Alireza Behtash et.al.	2503.04704	null
2025-03-15	Mark Your LLM: Detecting the Misuse of Open-Source Large Language Models via Watermarking	Yijie Xu et.al.	2503.04636	null
2025-03-06	AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services	Xiaoqi Wang et.al.	2503.04418	null
2025-03-06	Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search	Kou Misaki et.al.	2503.04412	null
2025-03-06	ADOR: A Design Exploration Framework for LLM Serving with Enhanced Latency and Throughput	Junsoo Kim et.al.	2503.04253	null
2025-03-06	Beyond Memorization: Evaluating the True Type Inference Capabilities of LLMs for Java Code Snippets	Yiwen Dong et.al.	2503.04076	null
2025-03-04	FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference	Hongchao Du et.al.	2503.03777	null
2025-03-05	MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems	Rui Ye et.al.	2503.03686	null
2025-03-05	Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems	Yaoru Li et.al.	2503.03505	link
2025-03-05	Enhancing Memory Efficiency in Large Language Model Training Through Chronos-aware Pipeline Parallelism	Xinyuan Lin et.al.	2503.03182	null
2025-03-04	PersonaX: A Recommendation Agent Oriented User Modeling Framework for Long Behavior Sequence	Yunxiao Shi et.al.	2503.02398	link
2025-03-04	VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference	Zihan Liu et.al.	2503.02236	null
2025-02-26	Online Pseudo-average Shifting Attention (PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis	Long Cheng et.al.	2503.01873	null
2025-04-30	SAGE: A Framework of Precise Retrieval for RAG	Jintao Zhang et.al.	2503.01713	null
2025-03-03	Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens	Xinsheng Wang et.al.	2503.01710	link
2025-03-03	DILEMMA: Joint LLM Quantization and Distributed LLM Inference Over Edge Computing Systems	Minoo Hosseinzadeh et.al.	2503.01704	null
2025-03-15	Towards An Efficient LLM Training Paradigm for CTR Prediction	Allen Lin et.al.	2503.01001	null
2025-03-02	Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers	Yiran Zhao et.al.	2503.00865	null
2025-03-01	Tutorial Proposal: Speculative Decoding for Efficient LLM Inference	Heming Xia et.al.	2503.00491	null
2025-03-04	Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving	Qihui Zhou et.al.	2503.00392	null
2025-02-28	FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference	Xunhao Lai et.al.	2502.20766	link
2025-05-04	SPD: Sync-Point Drop for efficient tensor parallelism of Large Language Models	Han-Byul Kim et.al.	2502.20727	null
2025-04-02	Smart Routing: Cost-Effective Multi-LLM Serving for Multi-Core AIOS	Kai Mei et.al.	2502.20576	link
2025-02-27	M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging	Jinghao Feng et.al.	2502.20301	null
2025-02-26	Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs	Yiheng Yang et.al.	2502.19078	null
2025-02-26	Evidence-Driven Marker Extraction for Social Media Suicide Risk Detection	Carter Adams et.al.	2502.18823	null
2025-02-24	LLM Inference Acceleration via Efficient Operation Fusion	Mahsa Salmani et.al.	2502.17728	null
2025-02-24	CodeSwift: Accelerating LLM Inference for Efficient Code Generation	Qianhui Zhao et.al.	2502.17139	null
2025-02-24	Make LLM Inference Affordable to Everyone: Augmenting GPU Memory with NDP-DIMM	Lian Liu et.al.	2502.16963	null
2025-02-24	DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance	Xuanfan Ni et.al.	2502.16886	null
2025-03-01	CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter	Yepeng Weng et.al.	2502.16880	null
2025-02-23	DISC: Dynamic Decomposition Improves LLM Inference Scaling	Jonathan Light et.al.	2502.16706	null
2025-02-23	Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines	Xinwei Long et.al.	2502.16641	null
2025-05-01	TerEffic: Highly Efficient Ternary LLM Inference on FPGA	Chenyang Yin et.al.	2502.16473	null
2025-02-27	Dynamic Parallel Tree Search for Efficient LLM Reasoning	Yifu Ding et.al.	2502.16235	null
2025-02-21	KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse	Jingbo Yang et.al.	2502.16002	link
2025-02-14	Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization	Bowen Pang et.al.	2502.15763	null
2025-02-21	Towards Swift Serverless LLM Cold Starts with ParaServe	Chiheng Lou et.al.	2502.15524	null
2025-02-24	HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings	Rasmus Aavang et.al.	2502.15411	link
2025-02-24	Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference	Yaohua Tang et.al.	2502.15294	null
2025-02-21	A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation	Shilong Hou et.al.	2502.15233	link
2025-02-19	EvoP: Robust LLM Inference via Evolutionary Pruning	Shangyu Wu et.al.	2502.14910	null
2025-04-21	LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention	Shang Yang et.al.	2502.14866	link
2025-02-20	Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale	Shashwat Jaiswal et.al.	2502.14617	null
2025-02-20	SR-LLM: Rethinking the Structured Representation in Large Language Model	Jiahuan Zhang et.al.	2502.14352	null
2025-02-20	Efficient AI in Practice: Training and Deployment of Efficient LLMs for Industry Applications	Kayhan Behdin et.al.	2502.14305	null
2025-02-19	RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression	Payman Behnam et.al.	2502.14051	null
2025-02-19	Autellix: An Efficient Serving Engine for LLM Agents as General Programs	Michael Luo et.al.	2502.13965	null
2025-02-19	Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference	Qingfa Xiao et.al.	2502.13542	null
2025-02-19	What are Models Thinking about? Understanding Large Language Model Hallucinations “Psychology” through Model Inner State Analysis	Peiran Wang et.al.	2502.13490	null
2025-02-24	BaKlaVa – Budgeted Allocation of KV cache for Long-context Inference	Ahmed Burak Gulhan et.al.	2502.13176	null
2025-02-18	SEFL: Harnessing Large Language Model Agents to Improve Educational Feedback Systems	Mike Zhang et.al.	2502.12927	link
2025-03-27	R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs	Sumin Jo et.al.	2502.12767	link
2025-02-18	HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading	Cheng Luo et.al.	2502.12574	link
2025-02-18	Distributed On-Device LLM Inference With Over-the-Air Computation	Kai Zhang et.al.	2502.12559	null
2025-02-18	SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs	Ahmed F. AbouElhamayed et.al.	2502.12444	link
2025-02-17	Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs	Kan Zhu et.al.	2502.12216	null
2025-02-17	Designing Role Vectors to Improve LLM Inference Behaviour	Daniele Potertì et.al.	2502.12055	null
2025-02-17	DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services	Ting Sun et.al.	2502.11417	null
2025-02-17	Evaluating the Performance of the DeepSeek Model in Confidential Computing Environment	Ben Dong et.al.	2502.11347	null
2025-02-16	Unveiling Environmental Impacts of Large Language Model Serving: A Functional Unit View	Yanran Wu et.al.	2502.11256	null
2025-02-16	Diversified Sampling Improves Scaling LLM inference	Tianchun Wang et.al.	2502.11027	null
2025-02-16	Leveraging Uncertainty Estimation for Efficient LLM Routing	Tuo Zhang et.al.	2502.11021	null
2025-04-07	Local-Cloud Inference Offloading for LLMs in Multi-Modal, Multi-Task, Multi-Dialogue Settings	Liangqi Yuan et.al.	2502.11007	link
2025-02-15	Pushing up to the Limit of Memory Bandwidth and Capacity Utilization for Efficient LLM Decoding on Embedded FPGA	Jindong Li et.al.	2502.10659	null
2025-02-05	QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache	Rishabh Tiwari et.al.	2502.10424	null
2025-02-14	λScale: Enabling Fast Scaling for Serverless Large Language Model Inference	Minchen Yu et.al.	2502.09922	null
2025-02-14	INF^2: High-Throughput Generative Inference of Large Language Models using Near-Storage Processing	Hongsun Jang et.al.	2502.09921	null
2025-02-13	On multi-token prediction for efficient LLM inference	Somesh Mehra et.al.	2502.09419	null
2025-02-13	ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments	Youhe Jiang et.al.	2502.09334	null
2025-03-21	RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models	Quan Wei et.al.	2502.09003	null
2025-02-13	InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU	Heejun Lee et.al.	2502.08910	null
2025-02-13	DiffoRA: Enabling Parameter-Efficient LLM Fine-Tuning via Differential Low-Rank Matrix Adaptation	Tangyu Jiang et.al.	2502.08905	null
2025-02-12	Universal Model Routing for Efficient LLM Inference	Wittawat Jitkrittum et.al.	2502.08773	null
2025-02-12	MoLoRec: A Generalizable and Efficient Framework for LLM-Based Recommendation	Min Hou et.al.	2502.08271	null
2025-02-12	Memory Offloading for Large Language Model Inference with Latency SLO Guarantees	Chenxiang Ma et.al.	2502.08182	null
2025-02-12	Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences	Shanshan Han et.al.	2502.08142	null
2025-03-19	Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding	Ziyao Wang et.al.	2502.08020	null
2025-02-11	HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment	Youhe Jiang et.al.	2502.07903	null
2025-02-11	SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters	Yiping Wang et.al.	2502.07832	null
2025-03-21	PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference	Yufeng Gu et.al.	2502.07578	link
2025-03-05	Online Scheduling for LLM Inference with KV Cache Constraints	Patrick Jaillet et.al.	2502.07115	null
2025-02-10	Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE	Haiduo Huang et.al.	2502.06282	link
2025-03-15	Towards Sustainable NLP: Insights from Benchmarking Inference Energy in Large Language Models	Soham Poddar et.al.	2502.05610	null
2025-02-08	Mechanistic Interpretability of Emotion Inference in Large Language Models	Ala N. Tak et.al.	2502.05489	null
2025-02-07	BCQ: Block Clustered Quantization for 4-bit (W4A4) LLM Inference	Reena Elangovan et.al.	2502.05376	null
2025-01-31	Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies	Nadav Timor et.al.	2502.05202	null
2025-03-15	EcoServe: Designing Carbon-Aware AI Inference Systems	Yueying Li et.al.	2502.05043	null
2025-02-07	LLM Query Scheduling with Prefix Reuse and Latency Constraints	Gregory Dexter et.al.	2502.04677	null
2025-02-18	WaferLLM: A Wafer-Scale LLM Inference System	Congjie He et.al.	2502.04563	null
2025-02-25	KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference	Xing Li et.al.	2502.04420	link
2025-02-06	CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference	Zehua Pei et.al.	2502.04416	link
2025-02-11	Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing	Kunfeng Lai et.al.	2502.04411	null
2025-02-26	AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference	Qingyue Yang et.al.	2502.04077	link
2025-02-06	CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing	Yu Yuan et.al.	2502.03997	null
2025-02-06	Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective	Yuan Feng et.al.	2502.03805	link
2025-04-04	Adaptive Semantic Prompt Caching with VectorQ	Luis Gaspar Schroeder et.al.	2502.03771	null
2025-02-05	Bilevel ZOFO: Bridging Parameter-Efficient and Zeroth-Order Techniques for Efficient LLM Fine-Tuning and Meta-Training	Reza Shirkavand et.al.	2502.03604	null
2025-02-05	HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference	Zeyu Zhang et.al.	2502.03589	null
2025-02-05	Accessible and Portable LLM Inference by Compiling Computational Graphs into SQL	Wenbo Sun et.al.	2502.02818	null
2025-02-05	Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation	Jingyu Liu et.al.	2502.02789	link
2025-02-04	LLM Bandit: Cost-Efficient LLM Generation via Preference-Conditioned Dynamic Routing	Yang Li et.al.	2502.02743	null
2025-02-04	EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization	Yize Wu et.al.	2502.02493	null
2025-01-30	Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency	Sazzad Hossain et.al.	2502.01651	null
2025-02-06	An Investigation of FP8 Across Accelerators for LLM Inference	Jiwoo Kim et.al.	2502.01070	null
2025-02-02	Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference	Patrick Yubeaton et.al.	2502.00922	null
2025-02-02	MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies	Ehsaneddin Asgari et.al.	2502.00894	null
2025-02-02	SecPE: Secure Prompt Ensembling for Private and Robust Large Language Models	Jiawen Zhang et.al.	2502.00847	null
2025-02-02	Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs	Youhe Jiang et.al.	2502.00722	null
2025-02-13	Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning	Zhi Zhou et.al.	2502.00511	null
2025-02-01	UniAttn: Reducing Inference Costs via Softmax Unification for Post-Training LLMs	Yizhe Xiong et.al.	2502.00439	null
2025-02-01	ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference	Xiang Liu et.al.	2502.00299	null
2025-01-16	Optimization Strategies for Enhancing Resource Efficiency in Transformers & Large Language Models	Tom Wallace et.al.	2502.00046	null
2025-02-07	Pushing the Limits of BFP on Narrow Precision LLM Inference	Hui Wang et.al.	2502.00026	null
2025-02-14	Reward-Guided Speculative Decoding for Efficient LLM Reasoning	Baohao Liao et.al.	2501.19324	null
2025-01-31	Pheromone-based Learning of Optimal Reasoning Paths	Anirudh Chari et.al.	2501.19278	null
2025-01-31	Structural Embedding Projection for Contextual Large Language Model Inference	Vincent Enoasmo et.al.	2501.18826	null
2025-01-29	On the Partitioning of GPU Power among Multi-Instances	Tirth Vamja et.al.	2501.17752	null
2025-02-02	RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations	Zunhai Su et.al.	2501.16383	null
2025-01-27	Raiders of the Lost Dependency: Fixing Dependency Conflicts in Python using LLMs	Antony Bartlett et.al.	2501.16191	null
2025-01-27	TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference	Jack Min Ong et.al.	2501.16007	null
2025-01-27	Aging-aware CPU Core Management for Embodied Carbon Amortization in Cloud LLM Inference	Tharindu B. Hewage et.al.	2501.15829	link
2025-01-25	Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads	Xingyang He et.al.	2501.15113	null
2025-01-25	PatchRec: Multi-Grained Patching for Efficient LLM-based Sequential Recommendation	Jiayi Liao et.al.	2501.15087	null
2025-02-09	HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location	Ting Sun et.al.	2501.14808	null
2025-01-11	HeteroLLM: Accelerating Large Language Model Inference on Mobile SoCs platform with Heterogeneous AI Accelerators	Le Chen et.al.	2501.14794	null
2025-01-04	DeServe: Towards Affordable Offline LLM Inference via Decentralization	Linyu Wu et.al.	2501.14784	null
2024-12-13	KVDirect: Distributed Disaggregated LLM Inference	Shiyang Chen et.al.	2501.14743	null
2025-01-24	Accelerated Preference Elicitation with LLM-Based Proxies	David Huang et.al.	2501.14625	null
2025-01-27	DeepFlow: Serverless Large Language Model Serving at Scale	Junhao Hu et.al.	2501.14417	null
2025-01-24	Locality-aware Fair Scheduling in LLM Serving	Shiyi Cao et.al.	2501.14312	null
2025-01-27	Serving Long-Context LLMs at the Mobile Edge: Test-Time Reinforcement Learning-based Model Caching and Inference Offloading	Minrui Xu et.al.	2501.14205	null
2025-01-08	iServe: An Intent-based Serving System for LLMs	Dimitrios Liakopoulos et.al.	2501.13111	null
2025-01-24	EchoLM: Accelerating LLM Serving with Real-time Knowledge Distillation	Yifan Yu et.al.	2501.12689	null
2025-03-16	Human-like conceptual representations emerge from language prediction	Ningyu Xu et.al.	2501.12547	null
2025-01-21	AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding	Zikun Li et.al.	2501.12162	null
2025-02-11	Glinthawk: A Two-Tiered Architecture for Offline LLM Inference	Pouya Hamadanian et.al.	2501.11779	link
2025-01-20	Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas	Nishant Balepur et.al.	2501.11549	link
2025-03-21	GREEN-CODE: Learning to Optimize Energy Efficiency in LLM-based Code Generation	Shashikant Ilager et.al.	2501.11006	link
2025-03-06	A Survey on LLM Test-Time Compute via Search: Tasks, LLM Profiling, Search Algorithms, and Relevant Frameworks	Xinzhe Li et.al.	2501.10069	link
2025-01-16	PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks	Huiyou Zhan et.al.	2501.09367	null
2025-01-16	Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition	Takaaki Hori et.al.	2501.09258	null
2025-01-16	Split Fine-Tuning for Large Language Models in Wireless Networks	Songge Zhang et.al.	2501.09237	null
2025-01-15	Guiding Retrieval using LLM-based Listwise Rankers	Mandeep Rathee et.al.	2501.09186	link
2025-01-14	Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings	Paul Joe Maliakel et.al.	2501.08219	null
2025-01-14	PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving	Ahmet Caner Yüzügüler et.al.	2501.08192	null
2025-01-14	Hierarchical Autoscaling for Large Language Model Serving with Chiron	Archit Patke et.al.	2501.08090	null
2025-01-12	MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference	Wenxuan Zeng et.al.	2501.06807	null
2025-01-12	Mell: Memory-Efficient Large Language Model Serving via Multi-GPU KV Cache Management	Liu Qianli et.al.	2501.06709	null
2025-02-07	Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping	Muru Zhang et.al.	2501.06589	link
2025-01-15	Multimodal-to-Text Prompt Engineering in Large Language Models Using Feature Embeddings for GNSS Interference Characterization	Harshith Manjunath et.al.	2501.05079	null
2025-02-08	Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text	Ali Al-Lawati et.al.	2501.03166	link
2025-01-05	TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms	Jovan Stojkovic et.al.	2501.02600	null
2025-01-04	AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference	Zhuomin He et.al.	2501.02336	link
2024-12-31	Towards Sustainable Large Language Model Serving	Sophia Nguyen et.al.	2501.01990	null
2025-01-03	Efficient LLM Inference with Activation Checkpointing and Hybrid Caching	Sanghyeon Lee et.al.	2501.01792	null
2025-01-03	(WhyPHI) Fine-Tuning PHI-3 for Multiple-Choice Question Answering: Methodology, Results, and Challenges	Mohamed Hisham Abdellatif et.al.	2501.01588	null
2025-01-21	BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference	Wonsuk Jang et.al.	2501.01144	link
2025-04-23	FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving	Zihao Ye et.al.	2501.01005	null
2025-02-25	Rethinking Layer Removal: A Hybrid Pruning Framework Combining Layer Removal and Singular Value Selection for Efficient LLM Compression	Kainan Liu et.al.	2501.00339	null
2024-12-23	Highly Optimized Kernels and Fine-Grained Codebooks for LLM Inference on Arm CPUs	Dibakar Gope et.al.	2501.00032	link
2024-12-29	TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication	Zongwu Wang et.al.	2412.20501	link
2024-12-29	GreenLLM: Disaggregating Large Language Model Serving on Heterogeneous GPUs for Lower Carbon Emissions	Tianyao Shi et.al.	2412.20322	null
2025-01-15	LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System	Hyucksung Kwon et.al.	2412.20166	null
2024-12-19	GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors	Chengming Zhang et.al.	2412.19829	null
2025-01-05	Gradient Weight-normalized Low-rank Projection for Efficient LLM Training	Jia-Hong Huang et.al.	2412.19616	link
2025-01-02	A Survey on Large Language Model Acceleration based on KV Cache Management	Haoyang Li et.al.	2412.19442	link
2025-02-13	An Engorgio Prompt Makes Large Language Model Babble on	Jianshuo Dong et.al.	2412.19394	link
2024-12-25	Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference	Libo Zhang et.al.	2412.18934	null
2024-12-24	TimelyLLM: Segmented LLM Serving System for Time-sensitive Robotic Applications	Neiwen Ling et.al.	2412.18695	null
2024-12-26	KunServe: Elastic and Efficient Large Language Model Serving with Parameter-centric Memory Management	Rongxin Cheng et.al.	2412.18169	null
2025-02-22	Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media	Zhen Sun et.al.	2412.18148	null
2024-12-24	Tackling the Dynamicity in a Production LLM Serving System with SOTA Optimizations via Hybrid Prefill/Decode/Verify Scheduling on Efficient Meta-kernels	Mingcong Song et.al.	2412.18106	null
2024-12-23	Trustworthy and Efficient LLMs Meet Databases	Kyoungmin Kim et.al.	2412.18022	null
2025-02-20	GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference	Chao Zeng et.al.	2412.17560	null
2025-02-18	VilBias: A Study of Bias Detection through Linguistic and Visual Cues, presenting Annotation Strategies, Evaluation, and Key Challenges	Shaina Raza et.al.	2412.17052	link
2024-12-21	SYMPHONY: Improving Memory Management for LLM Inference Workloads	Saurabh Agarwal et.al.	2412.16434	null
2024-12-20	WebLLM: A High-Performance In-Browser LLM Inference Engine	Charlie F. Ruan et.al.	2412.15803	link
2024-12-19	Fietje: An open, efficient LLM for Dutch	Bram Vanroy et.al.	2412.15450	link
2024-12-19	PA-RAG: RAG Alignment via Multi-Perspective Preference Optimization	Jiayi Wu et.al.	2412.14510	link
2024-12-19	Are Longer Prompts Always Better? Prompt Selection in Large Language Models for Recommendation Systems	Genki Kusano et.al.	2412.14454	null
2024-12-18	A Survey on LLM Inference-Time Self-Improvement	Xiangjue Dong et.al.	2412.14352	link
2024-12-18	Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models	Seungeun Oh et.al.	2412.12687	null
2024-12-17	A System for Microserving of LLMs	Hongyi Jin et.al.	2412.12488	null
2024-12-17	LITA: An Efficient LLM-assisted Iterative Topic Augmentation Framework	Chia-Hsuan Chang et.al.	2412.12459	null
2024-12-16	CSR: Achieving 1 Bit Key-Value Cache via Sparse Representation	Hongxuan Zhang et.al.	2412.11741	null
2025-01-20	FinLoRA: Finetuning Quantized Financial Large Language Models Using Low-Rank Adaptation	Dannong Wang et.al.	2412.11378	null
2025-01-09	Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning	Yun Qu et.al.	2412.11120	link
2024-12-15	NITRO: LLM Inference on Intel Laptop NPUs	Anthony Fei et.al.	2412.11053	link
2025-03-11	SCBench: A KV Cache-Centric Analysis of Long-Context Methods	Yucheng Li et.al.	2412.10319	null
2024-12-17	TurboAttention: Efficient Attention Approximation For High Throughputs LLMs	Hao Kang et.al.	2412.08585	null
2024-12-11	Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths	Naryeong Kim et.al.	2412.08281	null
2024-12-12	TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch	Xingchen Song et.al.	2412.08237	null
2024-12-09	Asynchronous LLM Function Calling	In Gim et.al.	2412.07017	null
2024-12-08	Taming Sensitive Weights: Noise Perturbation Fine-tuning for Robust LLM Quantization	Dongwei Wang et.al.	2412.06858	null
2024-12-09	JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM	Takuro Fujii et.al.	2412.06738	link
2024-12-09	SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs	James Vo et.al.	2412.06198	null
2024-12-08	XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference	Weizhuo Li et.al.	2412.05896	null
2025-02-17	APOLLO: SGD-like Memory, AdamW-level Performance	Hanqing Zhu et.al.	2412.05270	link
2024-12-06	Can Large Language Models Serve as Effective Classifiers for Hierarchical Multi-Label Classification of Scientific Documents at Industrial Scale?	Seyed Amin Tabatabaei et.al.	2412.05137	null
2024-12-11	Flash Communication: Reducing Tensor Parallelization Bottleneck for Fast Large Language Model Inference	Qingyuan Li et.al.	2412.04964	null
2025-01-26	GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments	Yanyu Chen et.al.	2412.04788	null
2024-12-09	Towards Performance-Aware Allocation for Accelerated Machine Learning on GPU-SSD Systems	Ayush Gundawar et.al.	2412.04569	link
2024-12-03	Multi-Bin Batching for Increasing LLM Inference Throughput	Ozgur Guldogan et.al.	2412.04504	null
2025-01-17	BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching	Zhen Zheng et.al.	2412.03594	null
2024-12-04	Unifying KV Cache Compression for Large Language Models with LeanKV	Yanqi Zhang et.al.	2412.03131	null
2024-12-03	Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity	Da Ma et.al.	2412.02252	null
2024-12-02	Data-Centric and Heterogeneity-Adaptive Sequence Parallelism for Efficient LLM Training	Yujie Wang et.al.	2412.01523	null
2024-12-02	PLD+: Accelerating LLM inference by leveraging Language Model Artifacts	Shwetha Somasundaram et.al.	2412.01447	null
2024-12-02	Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking	Marco Federici et.al.	2412.01380	null
2024-12-02	Can Large Language Models Serve as Evaluators for Code Summarization?	Yang Wu et.al.	2412.01333	link
2024-12-05	RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy	Geonho Lee et.al.	2412.01129	null
2024-12-02	TruncFormer: Private LLM Inference Using Only Truncations	Patrick Yubeaton et.al.	2412.01042	null
2024-11-25	Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration	Zhuofan Wen et.al.	2412.00061	null
2024-11-29	A dynamic parallel method for performance optimization on hybrid CPUs	Luo Yu et.al.	2411.19542	null
2024-12-04	Marconi: Prefix Caching for the Era of Hybrid LLMs	Rui Pan et.al.	2411.19379	null
2024-12-08	Puzzle: Distillation-Based NAS for Inference-Optimized LLMs	Akhiad Bercovich et.al.	2411.19146	null
2024-11-27	FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving	Ao Shen et.al.	2411.18424	null
2024-11-29	InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks	Xinyao Zheng et.al.	2411.18191	null
2024-11-28	MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache	Akshat Sharma et.al.	2411.18077	null
2024-11-24	Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments	Nikoleta Iliakopoulou et.al.	2411.17741	null
2024-11-18	Generative AI on the Edge: Architecture and Performance Evaluation	Zeinab Nezami et.al.	2411.17712	null
2024-11-26	Toward High-Performance LLM Serving: A Simulation-Based Approach for Identifying Optimal Parallelism	Yi-Chien Lin et.al.	2411.17651	null
2024-11-26	PIM-AI: A Novel Architecture for High-Efficiency LLM Inference	Cristobal Ortega et.al.	2411.17309	null
2024-11-26	Star Attention: Efficient LLM Inference over Long Sequences	Shantanu Acharya et.al.	2411.17116	link
2024-11-26	Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation	Chaoyi Jiang et.al.	2411.17089	null
2024-11-25	MixPE: Quantization and Hardware Co-design for Efficient LLM Inference	Yu Zhang et.al.	2411.16158	null
2024-11-24	eFedLLM: Efficient LLM Inference Based on Federated Learning	Shengwen Ding et.al.	2411.16003	null
2024-11-24	Ensuring Fair LLM Serving Amid Diverse Applications	Redwan Ibne Seraj Khan et.al.	2411.15997	null
2024-11-24	Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format	Chao Fang et.al.	2411.15982	null
2024-11-24	Task Scheduling for Efficient Inference of Large Language Models on Single Moderate GPU Systems	Wenxiang Lin et.al.	2411.15715	null
2024-11-26	Enabling Efficient Serverless Inference Serving for LLM (Large Language Model) in the Cloud	Himel Ghosh et.al.	2411.15664	null
2025-01-14	AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution	Fengyuan Liu et.al.	2411.15102	link
2024-11-27	XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models	Yixin Dong et.al.	2411.15100	null
2024-11-02	Transforming Engineering Education Using Generative AI and Digital Twin Technologies	Yu-Zheng Lin et.al.	2411.14433	null
2024-11-21	InstCache: A Predictive Cache for LLM Serving	Longwei Zou et.al.	2411.13820	null
2024-11-21	Disentangling Memory and Reasoning Ability in Large Language Models	Mingyu Jin et.al.	2411.13504	link
2024-11-27	Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding	Hyun Ryu et.al.	2411.13157	null
2024-11-21	LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts	Zhuohan Gu et.al.	2411.13009	null
2024-11-15	An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2	Pepijn de Reus et.al.	2411.12758	link
2025-01-24	SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference	Jiho Shin et.al.	2411.12692	null
2024-11-18	BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration	Yuzong Chen et.al.	2411.11745	link
2024-11-18	MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs	Shiyi Cao et.al.	2411.11217	null
2024-11-17	FastDraft: How to Train Your Draft	Ofir Zafrir et.al.	2411.11055	null
2024-12-16	SAM Decoding: Speculative Decoding via Suffix Automaton	Yuxuan Hu et.al.	2411.10666	link
2024-11-15	Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity	Zichen Song et.al.	2411.10069	null
2024-11-15	AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference	Janghwan Lee et.al.	2411.09909	null
2024-11-23	Squeezed Attention: Accelerating Long Context Length LLM Inference	Coleman Hooper et.al.	2411.09688	link
2024-11-15	Communication Compression for Tensor Parallel LLM Inference	Jan Hansen-Palmus et.al.	2411.09510	null
2024-11-14	Pie: Pooling CPU Memory for LLM Inference	Yi Xu et.al.	2411.09317	null
2025-01-23	Reducing Reasoning Costs: The Path of Optimization for Chain of Thought via Sparse Attention Mechanism	Libo Wang et.al.	2411.09111	link
2024-11-12	Towards Low-bit Communication for Tensor Parallel LLM Inference	Harry Dong et.al.	2411.07942	null
2024-12-12	ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization	Weibo Zhao et.al.	2411.07762	null
2025-01-08	BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks	Shubham Gandhi et.al.	2411.07464	null
2024-11-19	The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving	Kyoungmin Kim et.al.	2411.07447	null
2024-11-10	EcoServe: Maximizing Multi-Resource Utilization with SLO Guarantees in LLM Serving	Haiying Shen et.al.	2411.06364	null
2024-11-08	SSSD: Simply-Scalable Speculative Decoding	Michele Marzollo et.al.	2411.05894	null
2024-11-08	AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality	Ilias Bournias et.al.	2411.05555	null
2024-11-07	Hardware and Software Platform Inference	Cheng Zhang et.al.	2411.05197	null
2024-10-22	Scattered Forest Search: Smarter Code Space Exploration with LLMs	Jonathan Light et.al.	2411.05010	null
2024-11-07	SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference	Gabriele Oliaro et.al.	2411.04975	null
2024-11-05	CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration	Hongpeng Jin et.al.	2411.02829	null
2024-12-19	DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving	Yuhan Liu et.al.	2411.02820	null
2024-11-10	Context Parallelism for Scalable Million-Token Inference	Amy Yang et.al.	2411.01783	null
2024-11-04	RAGViz: Diagnose and Visualize Retrieval-Augmented Generation	Tevin Wang et.al.	2411.01751	link
2024-11-03	Autoformulation of Mathematical Optimization Models Using LLMs	Nicolás Astorga et.al.	2411.01679	null
2024-11-06	HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference	Peng Tang et.al.	2411.01433	null
2024-11-02	RA-WEBs: Remote Attestation for WEB services	Kosei Akama et.al.	2411.01340	null
2024-11-02	NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference	Xuanlin Jiang et.al.	2411.01142	null
2024-10-30	A Theoretical Perspective for Speculative Decoding Algorithm	Ming Yin et.al.	2411.00841	null
2024-11-01	Phase Diagram of Vision Large Language Models Inference: A Perspective from Interaction across Image and Instruction	Houjing Wei et.al.	2411.00646	null
2024-11-01	LLM-Based Misconfiguration Detection for AWS Serverless Computing	Jinfeng Wen et.al.	2411.00642	null
2024-12-08	ReverseNER: A Self-Generated Example-Driven Framework for Zero-Shot Named Entity Recognition with Large Language Models	Anbang Wang et.al.	2411.00533	null
2024-11-01	Attention Tracker: Detecting Prompt Injection Attacks in LLMs	Kuo-Han Hung et.al.	2411.00348	null
2024-10-31	LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators	Krishna Teja Chitty-Venkata et.al.	2411.00136	link
2024-10-31	Interpretable Language Modeling via Induction-head Ngram Models	Eunji Kim et.al.	2411.00066	link
2024-10-31	ALISE: Accelerating Large Language Model Serving with Speculative Scheduling	Youpeng Zhao et.al.	2410.23537	null
2024-10-30	BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference	Junqi Zhao et.al.	2410.23079	link
2024-10-29	Scaling LLM Inference with Optimized Sample Compute Allocation	Kexun Zhang et.al.	2410.22480	link
2024-10-29	SVIP: Towards Verifiable Inference of Open-source Large Language Models	Yifan Sun et.al.	2410.22307	null
2025-02-08	ProMoE: Fast MoE-based LLM Serving using Proactive Caching	Xiaoniu Song et.al.	2410.22134	null
2025-01-21	MultiTok: Variable-Length Tokenization for Efficient LLMs Adapted from LZW Compression	Noel Elias et.al.	2410.21548	link
2025-04-29	ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference	Hanshi Sun et.al.	2410.21465	null
2024-10-27	FIRP: Faster LLM inference via future intermediate representation prediction	Pengfei Wu et.al.	2410.20488	null
2024-10-29	Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management	Tuowei Wang et.al.	2410.19274	null
2024-10-24	Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design	Ruisi Cai et.al.	2410.19123	link
2024-10-30	Dynamic Vocabulary Pruning in Early-Exit LLMs	Jort Vincenti et.al.	2410.18952	link
2024-10-25	A Survey on Speech Large Language Models	Jing Peng et.al.	2410.18908	null
2024-10-24	A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs	Ankit Singh Rawat et.al.	2410.18779	null
2024-10-24	BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching	Peizhuang Cong et.al.	2410.18701	null
2024-10-23	CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation	Qinsi Wang et.al.	2410.18311	null
2024-10-25	Fast Inference for Augmented Large Language Models	Rana Shahout et.al.	2410.18248	null
2024-10-23	POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference	Aditya K Kamath et.al.	2410.18038	null
2024-12-29	AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning	Yehonathan Refael et.al.	2410.17881	null
2024-10-22	FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs	Haoran Lin et.al.	2410.16663	null
2024-10-22	Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency	Prafulla Kumar Choubey et.al.	2410.16597	null
2024-12-18	MagicPIG: LSH Sampling for Efficient LLM Generation	Zhuoming Chen et.al.	2410.16179	link
2024-10-21	Natural GaLore: Accelerating GaLore for memory-efficient LLM Training and Fine-tuning	Arijit Das et.al.	2410.16029	link
2024-10-21	RAC: Efficient LLM Factuality Correction with Retrieval Augmentation	Changmao Li et.al.	2410.15667	link
2024-10-21	Bayesian Concept Bottleneck Models with LLM Priors	Jean Feng et.al.	2410.15555	link
2024-10-20	CompAct: Compressed Activations for Memory-Efficient LLM Training	Yara Shamshoum et.al.	2410.15352	null
2024-10-20	EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models	Junhao Hu et.al.	2410.15332	null
2024-10-19	IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System	Minseok Seo et.al.	2410.15008	null
2024-10-23	Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching	Jie Peng et.al.	2410.14740	null
2024-10-18	A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference	You Wu et.al.	2410.14442	link
2024-10-18	Revisiting SLO and Goodput Metrics in LLM Serving	Zhibin Wang et.al.	2410.14257	null
2024-10-18	Leveraging Large Language Models for Enhancing Public Transit Services	Jiahao Wang et.al.	2410.14147	null
2024-10-17	RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs	Jiatan Huang et.al.	2410.13987	null
2024-11-07	Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs	Tianyu Guo et.al.	2410.13835	link
2024-10-17	Progressive Mixed-Precision Decoding for Efficient LLM Inference	Hao Mark Chen et.al.	2410.13461	null
2024-10-17	Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning	Minseok Choi et.al.	2410.13274	null
2024-10-17	Data Defenses Against Large Language Models	William Agnew et.al.	2410.13138	link
2024-10-19	In-context KV-Cache Eviction for LLMs via Attention-Gate	Zihao Zeng et.al.	2410.12876	null
2024-10-10	RecurFormer: Not All Transformer Heads Need Self-Attention	Ruiqing Yan et.al.	2410.12850	null
2024-10-16	COMET: Towards Partical W4A4KV4 LLMs Serving	Lian Liu et.al.	2410.12168	null
2024-10-16	Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning	Huiwen Wu et.al.	2410.12130	null
2024-10-15	Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix	Yingyu Liang et.al.	2410.11261	null
2024-10-06	Continuous Approximations for Improving Quantization Aware Training of LLMs	He Li et.al.	2410.10849	null
2024-10-14	DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads	Guangxuan Xiao et.al.	2410.10819	link
2024-10-16	SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization	Akrit Mudvari et.al.	2410.10759	null
2024-10-12	Power-Softmax: Towards Secure LLM Inference over Encrypted Data	Itamar Zimerman et.al.	2410.09457	null
2024-10-11	Large Language Models for Energy-Efficient Code: Emerging Results and Future Directions	Huiyun Peng et.al.	2410.09241	null
2024-10-11	SubZero: Random Subspace Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning	Ziming Yu et.al.	2410.08989	link
2024-12-03	HyperDPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework	Yinuo Ren et.al.	2410.08316	null
2024-10-14	Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining	Tianyi Bai et.al.	2410.08102	link
2024-10-09	SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration	Heming Xia et.al.	2410.06916	link
2024-10-08	Active Evaluation Acquisition for Efficient LLM Benchmarking	Yang Li et.al.	2410.05952	null
2024-10-08	Exploring the Meaningfulness of Nearest Neighbor Search in High-Dimensional Space	Zhonghan Chen et.al.	2410.05752	null
2024-10-08	ParallelSpec: Parallel Drafter for Efficient Speculative Decoding	Zilin Xiao et.al.	2410.05589	null
2024-10-07	Fast State Restoration in LLM Serving with HCache	Shiwei Gao et.al.	2410.05004	null
2024-10-06	RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference	Yige Xu et.al.	2410.04519	link
2025-01-23	Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective	Jinhao Li et.al.	2410.04466	null
2024-12-05	SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation	Aurick Qiao et.al.	2410.03960	null
2024-10-04	LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity	Selim Furkan Tekin et.al.	2410.03953	link
2024-10-04	EXAQ: Exponent Aware Quantization For LLMs Acceleration	Moran Shkolnik et.al.	2410.03185	link
2024-10-04	UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference	Jing Xiong et.al.	2410.03090	null
2024-10-03	LLMCO2: Advancing Accurate Carbon Footprint Prediction for LLM Inferences	Zhenxiao Fu et.al.	2410.02950	null
2024-10-03	Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration	Yun Qu et.al.	2410.02511	link
2024-10-03	LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services	Małgorzata Łazuka et.al.	2410.02425	link
2024-10-04	Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation	Xiaoqun Liu et.al.	2410.02220	null
2024-10-05	Aligning with Logic: Measuring, Evaluating and Improving Logical Consistency in Large Language Models	Yinhong Liu et.al.	2410.02205	null
2024-10-02	Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads	Yuxiang Huang et.al.	2410.01805	link
2024-10-02	ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving	Yifan Qiao et.al.	2410.01228	null
2024-10-01	TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices	Zonghang Li et.al.	2410.00531	link
2024-10-09	LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management	Yi Xiong et.al.	2410.00428	null
2024-11-06	The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems	Linke Song et.al.	2409.20002	null
2024-09-28	SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models	Yi Wu et.al.	2409.19471	null
2024-11-28	Confidential Prompting: Protecting User Prompts from Cloud LLM Providers	In Gim et.al.	2409.19134	link
2024-09-26	Control Industrial Automation System with Large Language Models	Yuchen Xia et.al.	2409.18009	link
2024-10-18	Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores	Shaobo Ma et.al.	2409.17870	null
2024-09-25	Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction	Zhenmei Shi et.al.	2409.17422	link
2025-06-23	Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations	Amey Agrawal et.al.	2409.17264	null
2024-09-25	Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale	Fan Zhou et.al.	2409.17115	link
2024-09-25	Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference	Zongyue Qin et.al.	2409.16560	null
2024-10-21	AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization	Yifan Tan et.al.	2409.16546	link
2024-11-07	Enabling Efficient On-Device Fine-Tuning of LLMs Using Only Inference Engines	Lei Gao et.al.	2409.15520	link
2024-10-29	Eagle: Efficient Training-Free Router for Multi-LLM Inference	Zesen Zhao et.al.	2409.15518	null
2024-10-03	Archon: An Architecture Search Framework for Inference-Time Techniques	Jon Saad-Falcon et.al.	2409.15254	link
2024-09-23	CSPS: A Communication-Efficient Sequence-Parallelism based Serving System for Transformer based Models with Long Prompts	Zeyu Zhang et.al.	2409.15104	null
2024-09-25	UELLM: A Unified and Efficient Approach for LLM Inference Serving	Yiyuan He et.al.	2409.14961	null
2024-11-01	RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph	Lindsey Linxi Wei et.al.	2409.14556	null
2024-09-21	Practically implementing an LLM-supported collaborative vulnerability remediation process: a team-based approach	Xiaoqing Wang et.al.	2409.14058	null
2024-10-21	Do Large Language Models Need a Content Delivery Network?	Yihua Cheng et.al.	2409.13761	link
2024-09-19	PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs)	Mahmoud Nazzal et.al.	2409.12699	link
2024-09-12	LlamaF: An Efficient Llama2 Architecture Accelerator on Embedded FPGAs	Han Xu et.al.	2409.11424	null
2024-09-04	ISO: Overlap of Computation and Communication within Seqenence For LLM Inference	Bin Xiao et.al.	2409.11155	null
2024-12-31	RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval	Di Liu et.al.	2409.10516	link
2024-09-12	Enabling Cost-Effective UI Automation Testing with Retrieval-Based LLMs: A Case Study in WeChat	Sidong Feng et.al.	2409.07829	null
2024-09-13	LLM-Enhanced Software Patch Localization	Jinhong Yu et.al.	2409.06816	null
2024-09-24	OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models	Jahyun Koo et.al.	2409.05902	null
2024-09-08	InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference	Xiurui Pan et.al.	2409.04992	null
2024-09-07	Achieving Peak Performance for Large Language Models: A Systematic Review	Zhyar Rzgar K Rostam et.al.	2409.04833	null
2024-09-06	Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance	Guanyu Lin et.al.	2409.04593	null
2024-09-06	A First Look At Efficient And Secure On-Device LLM Inference Against KV Leakage	Huan Yang et.al.	2409.04040	null
2024-11-05	Confidential Computing on NVIDIA Hopper GPUs: A Performance Benchmark Study	Jianwei Zhu et.al.	2409.03992	null
2024-09-05	Sirius: Contextual Sparsity with Correction for Efficient LLMs	Yang Zhou et.al.	2409.03856	link
2024-08-31	HSF: Defending against Jailbreak Attacks with Hidden State Filtering	Cheng Qian et.al.	2409.03788	null
2024-12-11	Efficient Large Foundation Model Inference: A Perspective From Model and System Co-Design	Dong Liu et.al.	2409.01990	null
2024-09-03	Efficient LLM Context Distillation	Rajesh Upadhayayaya et.al.	2409.01930	null
2024-09-03	Laser: Parameter-Efficient LLM Bi-Tuning for Sequential Recommendation with Collaborative Information	Xinyu Zhang et.al.	2409.01605	null
2024-09-02	CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification	Junhui He et.al.	2409.01366	null
2024-12-18	Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference	Barys Liskavets et.al.	2409.01227	null
2024-09-01	Research on LLM Acceleration Using the High-Performance RISC-V Processor “Xiangshan” (Nanhu Version) Based on the Open-Source Matrix Instruction Set Extension (Vector Dot Product)	Xu-Hao Chen et.al.	2409.00661	null
2024-11-10	Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling	Guangya Wan et.al.	2408.17017	null
2024-08-28	Decentralized LLM Inference over Edge Networks with Energy Harvesting	Aria Khoshsirat et.al.	2408.15907	null
2024-08-28	Efficient LLM Scheduling by Learning to Rank	Yichao Fu et.al.	2408.15792	link
2024-08-28	Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation	Lujun Gui et.al.	2408.15562	null
2024-08-23	Memory-Efficient LLM Training with Online Subspace Descent	Kaizhao Liang et.al.	2408.12857	link
2024-08-22	NanoFlow: Towards Optimal Large Language Model Serving Throughput	Kan Zhu et.al.	2408.12757	link
2024-10-23	TensorOpera Router: A Multi-Model Router for Efficient LLM Inference	Dimitris Stripelis et.al.	2408.12320	null
2024-09-04	Parallel Speculative Decoding with Adaptive Draft Length	Tianyu Liu et.al.	2408.11850	link
2024-08-21	MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models	Elias Frantar et.al.	2408.11743	link
2024-08-23	Xinyu: An Efficient LLM-based System for Commentary Generation	Yiquan Wu et.al.	2408.11609	null
2024-08-21	Diagnosing and Remedying Knowledge Deficiencies in LLMs via Label-free Curricular Meaningful Learning	Kai Xiong et.al.	2408.11431	null
2024-08-21	Image Score: Learning and Evaluating Human Preferences for Mercari Search	Chingis Oinar et.al.	2408.11349	null
2024-08-20	Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models	Artem Vazhentsev et.al.	2408.10692	null
2024-08-20	How Well Do Large Language Models Serve as End-to-End Secure Code Producers?	Jianian Gong et.al.	2408.10495	null
2024-09-29	GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making	Arsham Gholamzadeh Khoee et.al.	2408.09785	null
2024-08-19	PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars	Sumanth Prabhu et.al.	2408.08869	null
2024-08-23	ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models	Chao Zeng et.al.	2408.08554	link
2024-08-14	LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference	Seungjae Moon et.al.	2408.07326	null
2024-08-12	LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration	Zhiwen Mo et.al.	2408.06003	null
2024-08-16	Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion	Jacob K Christopher et.al.	2408.05636	null
2024-08-10	LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale	Jaehong Cho et.al.	2408.05499	link
2024-08-05	SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving	Andreas Kosmas Kakolyris et.al.	2408.05235	null
2024-09-14	Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness	Xiaojing Fan et.al.	2408.04585	null
2024-08-08	Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning	Ke Cheng et.al.	2408.04323	null
2024-08-07	Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference	Zeyu Zhang et.al.	2408.04107	null
2024-08-07	MPC-Minimized Secure LLM Inference	Deevashwer Rathee et.al.	2408.03561	null
2024-08-06	Can LLMs Serve As Time Series Anomaly Detectors?	Manqing Dong et.al.	2408.03475	null
2024-08-05	Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning	Hao Zhou et.al.	2408.02549	null
2024-08-02	The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines	Matias Martinez et.al.	2408.01050	null
2024-08-01	DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency	Jovan Stojkovic et.al.	2408.00741	null
2024-08-01	Designing Efficient LLM Accelerators for Edge Devices	Jude Haris et.al.	2408.00462	null
2024-08-01	Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control	Hao Zhou et.al.	2408.00214	null
2024-09-10	ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency	Yuhang Yao et.al.	2408.00008	null
2024-08-01	Responsive ML inference in multi-tenanted environments using AQUA	Abhishek Vijaya Kumar et.al.	2407.21255	null
2024-11-04	Palu: Compressing KV-Cache with Low-Rank Projection	Chi-Chih Chang et.al.	2407.21118	link
2024-07-30	Accelerating Large Language Model Inference with Self-Supervised Early Exits	Florian Valade et.al.	2407.21082	null
2024-10-03	ThinK: Thinner Key Cache by Query-Driven Pruning	Yuhui Xu et.al.	2407.21018	null
2024-07-25	An Efficient Inference Framework for Early-exit Large Language Models	Ruijie Miao et.al.	2407.20272	null
2024-07-29	Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost	Sania Nayab et.al.	2407.19825	null
2024-07-29	Teaching LLMs at Charles University: Assignments and Activities	Jindřich Helcl et.al.	2407.19798	null
2024-07-09	Mobile Edge Intelligence for Large Language Models: A Contemporary Survey	Guanqiao Qu et.al.	2407.18921	null
2024-07-04	The Price of Prompting: Profiling Energy Use in Large Language Models Inference	Erik Johannes Husom et.al.	2407.16893	link
2024-07-23	PhenoFlow: A Human-LLM Driven Visual Analytics System for Exploring Large and Complex Stroke Datasets	Jaeyoung Kim et.al.	2407.16329	null
2024-07-22	RazorAttention: Efficient KV Cache Compression Through Retrieval Heads	Hanlin Tang et.al.	2407.15891	null
2024-07-22	vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving	Jiale Xu et.al.	2407.15309	link
2024-07-20	All Against Some: Efficient Integration of Large Language Models for Message Passing in Graph Neural Networks	Ajay Jaiswal et.al.	2407.14996	null
2024-07-19	LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference	Qichen Fu et.al.	2407.14057	null
2024-07-13	Beyond KV Caching: Shared Attention for Efficient LLMs	Bingli Liao et.al.	2407.12866	link
2025-04-01	PQCache: Product Quantization-based KVCache for Long Context LLM Inference	Hailin Zhang et.al.	2407.12820	null
2024-07-17	Struct-X: Enhancing Large Language Models Reasoning with Structured Data	Xiaoyu Tan et.al.	2407.12522	null
2024-07-17	LLM Inference Serving: Survey of Recent Advances and Opportunities	Baolin Li et.al.	2407.12391	null
2024-10-11	Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale	Ayush Kaushal et.al.	2407.12327	link
2024-11-16	PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation	Branden Butler et.al.	2407.11798	null
2024-08-16	Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference	Yuan Feng et.al.	2407.11550	link
2024-07-15	Static Detection of Filesystem Vulnerabilities in Android Systems	Yu-Tsung Lee et.al.	2407.11279	null
2024-10-03	Fast Matrix Multiplications for Lookup Table-Quantized LLMs	Han Guo et.al.	2407.10960	link
2024-10-02	Optimized Multi-Token Joint Decoding with Auxiliary Model for LLM Inference	Zongyue Qin et.al.	2407.09722	null
2024-08-30	Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems	Amey Agrawal et.al.	2407.07000	link
2024-07-08	Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU	Daliang Xu et.al.	2407.05858	link
2024-07-07	A Queueing Theoretic Perspective on Low-Latency LLM Inference with Variable Token Length	Yuqing Yang et.al.	2407.05347	null
2024-07-06	Code Less, Align More: Efficient LLM Fine-tuning for Code Generation with Data Pruning	Yun-Da Tsai et.al.	2407.05040	null
2024-11-16	Software-Hardware Co-Design For Embodied AI Robots	Yiyang Huang et.al.	2407.04292	link
2024-07-04	Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems	Grant Wilkins et.al.	2407.04014	null
2024-10-30	MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention	Huiqiang Jiang et.al.	2407.02490	link
2024-06-29	When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration	Philipp Allgeuer et.al.	2407.00518	link
2024-06-29	Teola: Towards End-to-End Optimization of LLM-based Applications	Xin Tan et.al.	2407.00326	null
2024-06-25	T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge	Jianyu Wei et.al.	2407.00088	link
2024-07-09	Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving	Ruoyu Qin et.al.	2407.00079	link
2024-06-28	InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management	Wonbeom Lee et.al.	2406.19707	null
2024-08-28	AI-native Memory: A Pathway from LLMs Towards AGI	Jingbo Shang et.al.	2406.18312	null
2024-06-25	FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model	Feijie Wu et.al.	2406.17706	link
2024-06-26	MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool	Cunchen Hu et.al.	2406.17565	null
2024-11-11	Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters	Euiin Yi et.al.	2406.16758	link
2025-05-16	Demystifying AI Platform Design for Distributed Inference of Next-Generation LLM models	Abhimanyu Bambhaniya et.al.	2406.01698	null
2025-05-02	QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving	Yujun Lin et.al.	2405.04532	link
2024-11-26	Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction	Haoran Qiu et.al.	2404.08509	null
2024-05-31	InferCept: Efficient Intercept Support for Augmented Large Language Model Inference	Reyna Abhyankar et.al.	2402.01869	null
2023-12-08	Efficient LLM Inference on CPUs	Haihao Shen et.al.	2311.00502	null
2024-04-02	SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification	Xupeng Miao et.al.	2305.09781	null

LLM Scheduling

Publish Date	Title	Authors	PDF	Code
2026-05-19	Towards Multi-Model LLM Schedulers: Empirical Insights into Offloading and Preemption	Mert Yildiz et.al.	2605.19593	null
2026-03-13	SageSched: Efficient LLM Scheduling Confronting Demand Uncertainty and Hybridity	Zhenghao Gan et.al.	2603.07917	null
2025-12-04	Counting Without Running: Evaluating LLMs’ Reasoning About Code Complexity	Gregory Bolet et.al.	2512.04355	null
2025-11-28	LegalWebAgent: Empowering Access to Justice via LLM-Based Web Agents	Jinzhe Tan et.al.	2512.04105	null
2025-12-03	AugServe: Adaptive Request Scheduling for Augmented Large Language Model Inference Serving	Ying Wang et.al.	2512.04013	null
2025-12-02	PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing	Junyi Hou et.al.	2512.02589	null
2025-12-01	Trinity: Disaggregating Vector Search from Prefill-Decode Disaggregation in LLM Serving	Yi Liu et.al.	2512.02281	null
2025-12-01	RoMe: Row Granularity Access Memory System for Large Language Models	Hwayong Nam et.al.	2512.01541	null
2025-12-01	Tangram: Accelerating Serverless LLM Loading through GPU Memory Reuse and Affinity	Wenbin Zhu et.al.	2512.01357	null
2025-12-01	Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding	Yilong Zhao et.al.	2512.01278	null
2025-11-30	Neural Variable Name Repair: Learning to Rename Identifiers for Readability	Muhammad Yousuf et.al.	2512.01141	null
2025-11-28	OBLR-PO: A Theoretical Framework for Stable Reinforcement Learning	Zixun Huang et.al.	2511.23310	null
2025-11-28	Beyond Curve Fitting: Neuro-Symbolic Agents for Context-Aware Epidemic Forecasting	Joongwon Chae et.al.	2511.23276	null
2025-11-27	OmniInfer: System-Wide Acceleration Techniques for Optimizing LLM Serving Throughput and Latency	Jun Wang et.al.	2511.22481	null
2025-11-27	FADiff: Fusion-Aware Differentiable Optimization for DNN Scheduling on Tensor Accelerators	Shuao Jia et.al.	2511.22348	null
2025-11-27	Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation	Zehao Deng et.al.	2511.22235	null
2025-11-27	Optimizing NetGPT via Routing-Based Synergy and Reinforcement Learning	Yuxuan Chen et.al.	2511.22217	null
2025-11-26	OOCO: Latency-disaggregated Architecture for Online-Offline Co-locate LLM Serving	Siyu Wu et.al.	2511.21862	null
2025-12-01	DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving	Fengze Yu et.al.	2511.21669	null
2025-11-28	DOPO: A Dynamic PD-Disaggregation Architecture for Maximizing Goodput in LLM Inference Serving	Junhan Liao et.al.	2511.20982	null
2025-11-26	Aragog: Just-in-Time Model Routing for Scalable Serving of Agentic Workflows	Yinwei Dai et.al.	2511.20975	null
2025-11-25	Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios	Luohe Shi et.al.	2511.20340	null
2025-11-25	Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design	Zixiao Huang et.al.	2511.20048	null
2025-11-25	HiCoGen: Hierarchical Compositional Text-to-Image Generation in Diffusion Models via Reinforcement Learning	Hongji Yang et.al.	2511.19965	null
2025-11-24	Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution	Dingkang Liang et.al.	2511.19430	null
2025-11-24	How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining	Kairong Luo et.al.	2511.18903	null
2025-11-24	Parallel Vision Token Scheduling for Fast and Accurate Multimodal LMMs Inference	Wengyi Zhan et.al.	2511.18875	null
2025-11-23	Optimal Meal Schedule for a Local Nonprofit Using LLM-Aided Data Extraction	Sergio Marin et.al.	2511.18483	null
2025-11-28	Progressive Localisation in Localist LLMs	Joachim Diederich et.al.	2511.18375	null
2025-11-23	Hybrid Agentic AI and Multi-Agent Systems in Smart Manufacturing	Mojtaba A. Farahani et.al.	2511.18258	null
2025-11-22	Towards a General Framework for HTN Modeling with LLMs	Israel Puerta-Merino et.al.	2511.18165	null
2025-11-20	LLM4EO: Large Language Model for Evolutionary Optimization in Flexible Job Shop Scheduling	Rongjie Liao et.al.	2511.16485	null
2025-11-20	Operon: Incremental Construction of Ragged Data via Named Dimensions	Sungbin Moon et.al.	2511.16080	null
2025-11-19	MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping	Yushi Huang et.al.	2511.15690	null
2025-11-18	Near-Lossless Model Compression Enables Longer Context Inference in DNA Large Language Models	Rui Zhu et.al.	2511.14694	null
2025-11-23	Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning	Ruoyu Qin et.al.	2511.14617	null
2025-11-18	Hyperion: Hierarchical Scheduling for Parallel LLM Acceleration in Multi-tier Networks	Mulei Ma et.al.	2511.14450	null
2025-11-17	The Good, The Bad, and The Hybrid: A Reward Structure Showdown in Reasoning Models Training	Subramanyam Sahoo et.al.	2511.13016	null
2025-11-17	ENGRAM: Effective, Lightweight Memory Orchestration for Conversational Agents	Daivik Patel et.al.	2511.12960	null
2025-11-17	CoS: Towards Optimal Event Scheduling via Chain-of-Scheduling	Yiming Zhao et.al.	2511.12913	null
2025-11-19	Harli: SLO-Aware Co-location of LLM Inference and PEFT-based Finetuning on Model-as-a-Service Platforms	Ao Xu et.al.	2511.11729	null
2025-11-05	AnchorTP: Resilient LLM Inference with State-Preserving Elastic Tensor Parallelism	Wendong Xu et.al.	2511.11617	null
2025-11-13	EEGAgent: A Unified Framework for Automated EEG Analysis Using Large Language Models	Sha Zhao et.al.	2511.09947	null
2025-11-12	AdaCuRL: Adaptive Curriculum Reinforcement Learning with Invalid Sample Mitigation and Historical Revisiting	Renda Li et.al.	2511.09478	null
2025-11-12	POTSA: A Cross-Lingual Speech Alignment Framework for Low Resource Speech-to-Text Translation	Xuanchen Li et.al.	2511.09232	null
2025-11-12	FLAD: Federated Learning for LLM-based Autonomous Driving in Vehicle-Edge-Cloud Networks	Tianao Xiang et.al.	2511.09025	null
2025-11-07	Motif 2 12.7B technical report	Junghwan Lim et.al.	2511.07464	null
2025-11-10	LLMServingSim2.0: A Unified Simulator for Heterogeneous Hardware and Serving Techniques in LLM Infrastructure	Jaehong Cho et.al.	2511.07229	null
2025-11-10	Can LLM Annotations Replace User Clicks for Learning to Rank?	Lulu Yu et.al.	2511.06635	null
2025-11-09	AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving	Ruifei Zhang et.al.	2511.06253	null
2025-11-08	CoEdge-RAG: Optimizing Hierarchical Scheduling for Retrieval-Augmented LLMs in Collaborative Edge Computing	Guihang Hong et.al.	2511.05915	null
2025-11-09	Optimal Inference Schedules for Masked Diffusion Models	Sitan Chen et.al.	2511.04647	null
2025-11-06	PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration	Yue Jiet Chong et.al.	2511.04036	null
2025-11-05	ALAS: Transactional and Dynamic Multi-Agent LLM Planning	Longling Geng et.al.	2511.03094	null
2025-11-04	LiveSecBench: A Dynamic and Culturally-Relevant AI Safety Benchmark for LLMs in Chinese Context	Yudong Li et.al.	2511.02366	null
2025-11-04	An LLM-powered MILP modelling engine for workforce scheduling guided by expert knowledge	Qingyang Li et.al.	2511.02364	null
2025-11-04	Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live	Hanchen Li et.al.	2511.02230	null
2025-11-04	Optimal-Agent-Selection: State-Aware Routing Framework for Efficient Multi-Agent Collaboration	Jingbo Wang et.al.	2511.02200	null
2025-11-03	TPS-Bench: Evaluating AI Agents’ Tool Planning & Scheduling Abilities in Compounding Tasks	Hanwen Xu et.al.	2511.01527	null
2025-11-03	Modular Task Decomposition and Dynamic Collaboration in Multi-Agent Systems Driven by Large Language Models	Shuaidong Pan et.al.	2511.01149	null
2025-11-05	FREESH: Fair, Resource- and Energy-Efficient Scheduling for LLM Serving on Heterogeneous GPUs	Xuan He et.al.	2511.00807	null
2025-11-02	AReaL-Hex: Accommodating Asynchronous RL Training over Heterogeneous GPUs	Ran Yan et.al.	2511.00796	null
2025-10-19	Justitia: Fair and Efficient Scheduling for LLM Applications	Mingyan Yang et.al.	2510.17015	null
2025-10-08	OptPipe: Memory- and Scheduling-Optimized Pipeline Parallelism for LLM Training	Hongpei Li et.al.	2510.05186	null
2025-08-14	Block: Balancing Load in LLM Serving with Context, Knowledge and Predictive Scheduling	Wei Da et.al.	2508.03611	null
2025-08-05	Optimal Scheduling Algorithms for LLM Inference: Theory and Practice	Agrim Bari et.al.	2508.01002	null
2025-09-16	InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching	Yilun Wang et.al.	2507.08523	null
2025-07-09	Gradientsys: A Multi-Agent LLM Scheduler with ReAct Orchestration	Xinyuan Song et.al.	2507.06520	null
2025-06-17	Semantic Scheduling for LLM Inference	Wenyue Hua et.al.	2506.12204	null
2025-05-29	Sustainable Carbon-Aware and Water-Efficient LLM Scheduling in Geo-Distributed Cloud Datacenters	Hayden Moore et.al.	2505.23554	null
2025-05-26	Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency	Ruixiao Li et.al.	2505.17074	null
2025-05-14	ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor	Seungbeom Choi et.al.	2505.09142	null
2025-04-25	Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents	Yueying Li et.al.	2504.07347	null
2025-04-08	LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications	Botao Zhu et.al.	2504.03444	null
2025-07-25	How do language models learn facts? Dynamics, curricula and hallucinations	Nicolas Zucchet et.al.	2503.21676	null
2025-05-21	Online Scheduling for LLM Inference with KV Cache Constraints	Patrick Jaillet et.al.	2502.07115	null
2025-11-06	LLM Query Scheduling with Prefix Reuse and Latency Constraints	Gregory Dexter et.al.	2502.04677	null
2024-11-01	ALISE: Accelerating Large Language Model Serving with Speculative Scheduling	Youpeng Zhao et.al.	2410.23537	null
2025-06-08	PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference	Zeyu Zhang et.al.	2409.15104	null
2024-08-28	Efficient LLM Scheduling by Learning to Rank	Yichao Fu et.al.	2408.15792	null
2024-11-15	Large Language Models for Power Scheduling: A User-Centric Approach	Thomas Mongaillard et.al.	2407.00476	null
2024-06-07	Llumnix: Dynamic Scheduling for Large Language Model Serving	Biao Sun et.al.	2406.03243	null
2024-05-24	PerLLM: Personalized Inference Scheduling with Edge-Cloud Collaboration for Diverse LLM Services	Zheming Yang et.al.	2405.14636	null
2024-05-14	Automated Conversion of Static to Dynamic Scheduler via Natural Language	Paul Mingzheng Tang et.al.	2405.06697	null
2024-08-06	On the Prospects of Incorporating Large Language Models (LLMs) in Automated Planning and Scheduling (APS)	Vishal Pallagani et.al.	2401.02500	null
2023-05-30	Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline	Zangwei Zheng et.al.	2305.13144	null

MoE

Publish Date	Title	Authors	PDF	Code
2026-05-22	ETCHR: Editing To Clarify and Harness Reasoning	Beichen Zhang et.al.	2605.23897	null
2026-05-22	Complete-muE: Optimal Hyperparameter Transfer and Scaling for MoE Models	Hongwu Peng et.al.	2605.23893	null
2026-05-22	Training-Free Looped Transformers	Lizhang Chen et.al.	2605.23872	null
2026-05-22	Move on Muon: A Hamiltonian probability gradient flow perspective of Muon optimizer	Aratrika Mustafi et.al.	2605.23871	null
2026-05-22	HyperParallel-MoE: Multi-Core Interleaved Scheduling for Fast MoE Training on Ascend NPUs	Zewen Jin et.al.	2605.23764	null
2026-05-22	Semantically Structured Mixture-of-Experts for Compositional Robotic Manipulation	Chengyu Deng et.al.	2605.23477	null
2026-05-22	Learning Individual Dynamics from Sparse Cross-Sectional Snapshots	Christian Lagemann et.al.	2605.23470	null
2026-05-22	Parallel Context Compaction for Long-Horizon LLM Agent Serving	Musa Cim et.al.	2605.23296	null
2026-05-22	NASiC: 3D NAND-based CAM-Selected Multibit CIM Architecture for Efficient On-Device Mixture-of-Experts LLM Inference	Weikai Xu et.al.	2605.23294	null
2026-05-22	SpikingMoE: SDPrompt-Guided Dynamic Expert Fusion in Spiking Neural Networks	Yukai Yang et.al.	2605.23188	null
2026-05-22	GMENet: Generative Mixture of Experts Network for Multi-Center Glioma Diagnosis with Incomplete Imaging Sequences	Pengfei Song et.al.	2605.23183	null
2026-05-21	GEMQ: Global Expert-Level Mixed-Precision Quantization for MoE LLMs	Jianing Deng et.al.	2605.23078	null
2026-05-21	FAME: Failure-Aware Mixture-of-Experts for Message-Level Log Anomaly Detection	Huanchi Wang et.al.	2605.22779	null
2026-05-21	Clipping Bottleneck: Stabilizing RLVR via Stochastic Recovery of Near-Boundary Signals	Shuo Yang et.al.	2605.22703	null
2026-05-21	Machine Learning Interatomic Potentials: Advancing Open-Source Software for Efficient and Scalable Molecular Simulation	Christoph Brunken et.al.	2605.22698	null
2026-05-21	MoSA: Motion-constrained Stress Adaptation for Mitigating Real-to-Sim Gap in Continuum Dynamics via Learning Residual Anisotropy	Jiaxu Wang et.al.	2605.22597	null
2026-05-21	Flow-based Gaussian Splatting for Continuous-Scale Remote Sensing Image Super-Resolution	Jiangwei Mo et.al.	2605.22147	null
2026-05-21	Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild	Mao Zheng et.al.	2605.22064	null
2026-05-21	EasyVFX: Frequency-Driven Decoupling for Resource-Efficient VFX Generation	Yue Ma et.al.	2605.22051	null
2026-05-21	Dynamic Mixture of Latent Memories for Self-Evolving Agents	Dianzhi Yu et.al.	2605.21951	null
2026-05-20	Partially isometric truncated and dual truncated Toeplitz operators	Kritika Babbar et.al.	2605.21555	null
2026-05-20	PALS: Power-Aware LLM Serving for Mixture-of-Experts Models	Can Hankendi et.al.	2605.21427	null
2026-05-20	FedCoE: Bridging Generalization and Personalization via Federated Coordinated Dual-level MoEs	Penglin Dai et.al.	2605.21264	null
2026-05-20	To Select or not to Select, that is the Question: Distilling Robot Skill Prediction into a Small Ensemble	Haechan Mark Bong et.al.	2605.21242	null
2026-05-20	RePCM: Region-Specific and Phenotype-Adaptive Bi-Ventricular Cardiac Motion Synthesis	Xuan Yang et.al.	2605.21237	null
2026-05-20	NanoCP: Request-Level Dynamic Context Parallelism for Data-Expert Parallel Decoding	Jiefei Chen et.al.	2605.21100	null
2026-05-20	Diagnosing Overhead in Dispatch Operations: Cross-architecture Observatory	Bole Ma et.al.	2605.20982	null
2026-05-20	Memory Grafting: Scaling Language Model Pre-training via Offline Conditional Memory	Runxi Cheng et.al.	2605.20948	null
2026-05-20	Task-Routed Mixture-of-Experts with Cognitive Appraisal for Implicit Sentiment Analysis	Yaping Chai et.al.	2605.20916	null
2026-05-20	HDMoE: A Hierarchical Decoupling-Fusion Mixture-of-Experts Framework for Multimodal Cancer Survival Prediction	Huayi Wang et.al.	2605.20891	null
2026-05-20	Dynamic TMoE: A Drift-Aware Dynamic Mixture of Experts Framework for Non-Stationary Time Series Forecasting	Jiawen Zhu et.al.	2605.20678	null
2026-05-20	LER-YOLO: Reliability-Aware Expert Routing for Misaligned RGB-Infrared UAV Detection	Liming Hou et.al.	2605.20667	link
2026-05-14	Eradicating Negative Transfer in Multi-Physics Foundation Models via Sparse Mixture-of-Experts Routing	Ellwil Sharma et.al.	2605.15179	null
2026-05-14	Deep Mixture of Experts Network for Resource Optimization in Aerial-Terrestrial CF-mMIMO Systems under URLLC	Donggen Li et.al.	2605.15135	null
2026-05-14	An Interpretable Latency Model for Speculative Decoding in LLM Serving	Linghao Kong et.al.	2605.15051	null
2026-05-14	HiSem: Hierarchical Semantic Disentangling for Remote Sensing Image Change Captioning	Man Wang et.al.	2605.15024	null
2026-05-14	XFP: Quality-Targeted Adaptive Codebook Quantization with Sparse Outlier Separation for LLM Inference	Thomas Witt et.al.	2605.14844	null
2026-05-14	UMo: Unified Sparse Motion Modeling for Real-Time Co-Speech Avatars	Xiaoyu Zhan et.al.	2605.14731	null
2026-05-14	MultiEmo-Bench: Multi-label Visual Emotion Analysis for Multi-modal Large Language Models	Tianwei Chen et.al.	2605.14635	null
2026-05-14	BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE	Juntong Wu et.al.	2605.14438	null
2026-05-14	RQ-MoE: Residual Quantization via Mixture of Experts for Efficient Input-Dependent Vector Compression	Zhengjia Zhong et.al.	2605.14359	null
2026-05-14	Herculean: An Agentic Benchmark for Financial Intelligence	Xueqing Peng et.al.	2605.14355	null
2026-05-14	MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification	Weisen Jiang et.al.	2605.14289	null
2026-05-14	EnergyLens: Predictive Energy-Aware Exploration for Multi-GPU LLM Inference Optimization	Zhiye Song et.al.	2605.14249	null
2026-05-13	How to Scale Mixture-of-Experts: From muP to the Maximally Scale-Stable Parameterization	Leena Chennuru Vankadara et.al.	2605.14200	null
2026-05-13	PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts	Anjir Ahmed Chowdhury et.al.	2605.14055	null
2026-05-13	Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding	Shuoyang Sun et.al.	2605.14005	null
2026-05-13	HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts	Tao Zhong et.al.	2605.13997	null
2026-05-13	MinT: Managed Infrastructure for Training and Serving Millions of LLMs	Mind Lab et.al.	2605.13779	null
2026-05-13	Dense vs Sparse Pretraining at Tiny Scale: Active-Parameter vs Total-Parameter Matching	Abdalrahman Wael et.al.	2605.13769	null
2026-05-13	Locale-Conditioned Few-Shot Prompting Mitigates Demonstration Regurgitation in On-Device PII Substitution with Small Language Models	Anuj Sadani et.al.	2605.13538	null
2026-05-13	Many-Shot CoT-ICL: Making In-Context Learning Truly Learn	Tsz Ting Chung et.al.	2605.13511	null
2026-05-12	SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture	Haiwen Diao et.al.	2605.12500	null
2026-05-12	Routers Learn the Geometry of Their Experts: Geometric Coupling in Sparse Mixture-of-Experts	Sagi Ahrac et.al.	2605.12476	null
2026-05-12	Geometric Asymptotics of Score Mixing and Guidance in Diffusion Models	Kang Liu et.al.	2605.12231	null
2026-05-12	ECTO: Exogenous-Conditioned Temporal Operator for Ultra-Short-Term Wind Power Forecasting	Cao Yuan et.al.	2605.12196	null
2026-05-12	Emergent Vortex Ordering in a Multiflavor Pyrochlore-Lattice Compound GeCo$_2$O$_4$	Jiajun Mo et.al.	2605.12042	null
2026-05-12	Towards Order Fairness: Mitigating LLMs Order Sensitivity through Dual Group Advantage Optimization	Xu Chu et.al.	2605.11974	null
2026-05-12	From Trajectories to Phenotypes: Disease Progression as Structural Priors for Multi-organ Imaging Representation Learning	Zian Wang et.al.	2605.11958	null
2026-05-12	Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models	Boyi Deng et.al.	2605.11887	null
2026-05-12	Modulation Consistency-based Contrastive Learning for Self-Supervised Automatic Modulation Classification	Chenxu Wang et.al.	2605.11875	null
2026-05-12	ROMER: Expert Replacement and Router Calibration for Robust MoE LLMs on Analog Compute-in-Memory Systems	Wenyong Zhou et.al.	2605.11800	null
2026-05-12	M$^4$-SAM: Multi-Modal Mixture-of-Experts with Memory-Augmented SAM for RGB-D Video Salient Object Detection	Jiyuan Liu et.al.	2605.11760	null
2026-05-12	GW240925 and GW250207: Astrophysical Calibration of Gravitational-wave Detectors	The LIGO Scientific Collaboration et.al.	2605.11703	null
2026-05-12	Augmented Lagrangian Method for Last-Iterate Convergence for Constrained MDPs	Michael Lu et.al.	2605.11694	null
2026-05-12	Slicing and Dicing: Configuring Optimal Mixtures of Experts	Margaret Li et.al.	2605.11689	null
2026-05-12	Fast MoE Inference via Predictive Prefetching and Expert Replication	Ankit Jyothish et.al.	2605.11537	null
2026-05-12	Study of $φ\to K\bar{K}$ in the amplitude analysis of $D^{+}\to K_{S}^{0}K_{L}^{0}π^{+}$	BESIII Collaboration et.al.	2605.11464	null
2026-05-12	MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification	Bo Zheng et.al.	2605.11408	null
2026-05-11	Sieve: Dynamic Expert-Aware PIM Acceleration for Evolving Mixture-of-Experts Models	Jungwoo Kim et.al.	2605.11277	null
2026-05-11	HEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Model	Noam Kayzer et.al.	2605.11255	null
2026-05-12	DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices	Chenyang Song et.al.	2605.10933	null
2026-05-08	NoiseGate: Learning Per-Latent Timestep Schedules as Information Gating in World Action Models	Wen Huang et.al.	2605.07794	null
2026-05-08	Benchmarking EngGPT2-16B-A3B against Comparable Italian and International Open-source LLMs	Andrea Sassella et.al.	2605.07731	null
2026-05-08	PathPainter: Transferring the Generalization Ability of Image Generation Models to Embodied Navigation	Yijin Wang et.al.	2605.07496	null
2026-05-08	DIMoE-Adapters: Dynamic Expert Evolution for Continual Learning in Vision-Language Models	Mengxin Qin et.al.	2605.07494	null
2026-05-08	Tracking Large-scale Shared Bikes with Inertial Motion Learning in GNSS Blocked Environments	Feng Liu et.al.	2605.07412	null
2026-05-08	MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference	Ruijie Zhou et.al.	2605.07363	null
2026-05-08	SoLAR: Error-Resilient Streamable Long-Horizon Free-Viewpoint Video Reconstruction with Anchor Activation and Latent Recalibration	Haotian Zhang et.al.	2605.07346	null
2026-05-08	When Are Experts Misrouted? Counterfactual Routing Analysis in Mixture-of-Experts Language Models	Youngsik Yoon et.al.	2605.07260	null
2026-05-08	Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control	Ali Taghibakhshi et.al.	2605.07182	null
2026-05-08	First Measurement of the $D_s^+\rightarrow K^{*}(892)^0μ^+ν_μ$ Decay, Study of Dynamics and Test of Lepton Universality with $D_s^+\rightarrow K^{*}(892)^0\ell^+ν_{\ell}$ Decays	BESIII Collaboration et.al.	2605.07176	null
2026-05-08	ModelLens: Finding the Best for Your Task from Myriads of Models	Rui Cai et.al.	2605.07075	null
2026-05-07	Disentangling bulk and surface electronic structure using targeted cleave planes in RuO$_2$	Maria H. Visscher et.al.	2605.06798	null
2026-05-07	Measurement of the Absolute Branching Fraction of Xi(1530)^{-} to (Xi pi)^{-} and Updated Measurement of the Branching Fraction of psi(3686) to anti-Xi^{+} Xi(1530)^{-} + c.c	BESIII Collaboration et.al.	2605.06753	null
2026-05-07	UniPool: A Globally Shared Expert Pool for Mixture-of-Experts	Minbin Huang et.al.	2605.06665	null
2026-05-07	EMO: Pretraining Mixture of Experts for Emergent Modularity	Ryan Wang et.al.	2605.06663	null
2026-05-07	Efficient Pre-Training with Token Superposition	Bowen Peng et.al.	2605.06546	null
2026-05-07	Scene-Adaptive Continual Learning for CSI-based Human Activity Recognition with Mixture of Experts	Wenhan Zheng et.al.	2605.06447	null
2026-05-07	MiA-Signature: Approximating Global Activation for Long-Context Understanding	Yuqing Li et.al.	2605.06416	null
2026-05-07	*E = TH/(O+B): A Dimensionless Control Parameter for Mixture-of-Experts Ecology**	Qingjun Zhang et.al.	2605.06415	null
2026-05-07	Federation of Experts: Communication Efficient Distributed Inference for Large Language Models	Muhammad Shahir Abdurrahman et.al.	2605.06206	null
2026-05-07	Dynamic Pondering Sparsity-aware Mixture-of-Experts Transformer for Event Stream based Visual Object Tracking	Shiao Wang et.al.	2605.06112	null
2026-05-07	Normalized Architectures are Natively 4-Bit	Maxim Fishman et.al.	2605.06067	null
2026-05-07	Relay Buffer Independent Communication over Pooled HBM for Efficient MoE Inference on Ascend	Tianlun Hu et.al.	2605.06055	null
2026-05-07	Verifiable Model-Free Safety Filters via Reinforcement Learning	Bihui Yin et.al.	2605.05989	null
2026-05-07	VisMMOE: Exploiting Visual-Expert Affinity for Efficient Visual-Language MoE Offloading	Cheng Xu et.al.	2605.05899	null
2026-05-07	MTL-MAD: Multi-Task Learners are Effective Medical Anomaly Detectors	Bogdan Alexandru Bercean et.al.	2605.05891	null
2026-05-07	MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems	Zhuoshan Zhou et.al.	2605.05888	null
2026-05-07	Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving	Bole Ma et.al.	2605.05696	null
2026-05-07	Saliency-Aware Regularized Quantization Calibration for Large Language Models	Yanlong Zhao et.al.	2605.05693	null
2026-05-07	Decomposing the Basic Abilities of Large Language Models: Mitigating Cross-Task Interference in Multi-Task Instruct-Tuning	Bing Wang et.al.	2605.05676	null
2026-05-07	Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs	Qijun Zhang et.al.	2605.05607	null
2026-05-07	A Few Good Clauses: Comparing LLMs vs Domain-Trained Small Language Models on Structured Contract Extraction	Nicole Lincoln et.al.	2605.05532	null
2026-05-06	Searches for Binary Mergers with Sub-solar Mass Components in Data from the First Part of LIGO–Virgo–KAGRA’s Fourth Observing Run	The LIGO Scientific Collaboration et.al.	2605.05444	null
2026-05-06	Geometry-Aware State Space Model: A New Paradigm for Whole-Slide Image Representation	Enhui Chai et.al.	2605.05164	null
2026-05-06	Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism	Sajal Dash et.al.	2605.05049	null
2026-05-06	You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation	Marco Arazzi et.al.	2605.04992	null
2026-05-06	Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts	Klaus-Rudolf Kladny et.al.	2605.04952	null
2026-05-06	Phase-Time Array Enabled Multistatic Sensing with Multi-Level Fusion for UAV Localization	Ming Gao et.al.	2605.04919	null
2026-05-06	Measurement of the double Dalitz decay $η\to e^+e^-e^+e^-$	BESIII Collaboration et.al.	2605.04898	null
2026-05-06	OpenWatch: A Multimodal Benchmark for Hand Gesture Recognition on Smartwatches	Pietro Bonazzi et.al.	2605.04791	null
2026-05-06	AxMoE: Characterizing the Impact of Approximate Multipliers on Mixture-of-Experts DNN Architectures	Omkar B Shende et.al.	2605.04754	null
2026-05-06	SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning	Lirui Luo et.al.	2605.04712	null
2026-05-06	YOTOnet: Zero-Shot Cross-Domain Fault Diagnosis via Domain-Conditioned Mixture of Experts	Zesen Wang et.al.	2605.04528	null
2026-05-06	GEM: Graph-Enhanced Mixture-of-Experts with ReAct Agents for Dialogue State Tracking	Ziqi Zhu et.al.	2605.04449	null
2026-05-06	Misrouter: Exploiting Routing Mechanisms for Input-Only Attacks on Mixture-of-Experts LLMs	Zekun Fei et.al.	2605.04446	null
2026-05-06	Autonomous Laparoscope Control through Unified Mechanics-Based Representation of Multimodal Intraoperative Information	Xiaojian Li et.al.	2605.04408	null
2026-05-05	2D Optical Beam Scanning using Integrated Acousto-Optics and a Frequency Comb	Shucheng Fang et.al.	2605.04287	null
2026-05-05	RD-ViT: Recurrent-Depth Vision Transformer for Semantic Segmentation with Reduced Data Dependence Extending the Recurrent-Depth Transformer Architecture to Dense Prediction	Renjie He et.al.	2605.03999	null
2026-05-05	MiniMind-O Technical Report: An Open Small-Scale Speech-Native Omni Model	Jingyao Gong et.al.	2605.03937	null
2026-05-05	Unified Multimodal Visual Tracking with Dual Mixture-of-Experts	Lingyi Hong et.al.	2605.03716	null
2026-05-05	Identification and characterization of 15265 super-Nyquist frequencies in 1309 δ Scuti stars from Kepler photometry	Yanqi Mo et.al.	2605.03502	null
2026-05-06	Toward Structural Multimodal Representations: Specialization, Selection, and Sparsification via Mixture-of-Experts	Hahyeon Choi et.al.	2605.03348	null
2026-05-05	Symmetry-Protected Lyapunov Neutral Modes in Equivariant Recurrent Networks	Hanson Hanxuan Mo et.al.	2605.03338	null
2026-05-04	Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE	Yangming Shi et.al.	2605.02641	null
2026-05-04	Universal Smoothness via Bernstein Polynomials: A Constructive Approximation Approach for Activation Functions	Wentao Zhang et.al.	2605.02591	null
2026-05-04	M$^4$Fuse: Lightweight State-Space MoE with a Cross-Scale Gating Bridge for Brain Tumor Segmentation	Meihua Zhou et.al.	2605.02444	null
2026-05-04	Boundary Mass and the Soft-to-Hard Limit in Mixture-of-Experts	Reza Rastegar et.al.	2605.02124	null
2026-05-03	Shifted asymmetric Laplace mixtures of experts	Sphiwe B. Skhosana et.al.	2605.02012	null
2026-05-03	Flexi-LoRA with Input-Adaptive Ranks: Efficient Finetuning for Speech and Reasoning Tasks	Zongqian Li et.al.	2605.01959	null
2026-05-03	Training Non-Differentiable Networks via Optimal Transport	An T. Le et.al.	2605.01928	null
2026-05-02	LiveFMBench: Unveiling the Power and Limits of Agentic Workflows in Specification Generation	Dong Xu et.al.	2605.01394	null
2026-05-01	PRISM: Exposing and Resolving Spurious Isolation in Federated Multimodal Continual Learning	Beining Wu et.al.	2605.01061	null
2026-05-01	GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer	Xinyuan Zhao et.al.	2605.00799	null
2026-05-01	Eliminating Hidden Serialization in Multi-Node Megakernel Communication	Byungsoo Oh et.al.	2605.00686	null
2026-05-01	Budget Constraints as Riemannian Manifolds	Michael Helcig et.al.	2605.00649	null
2026-05-01	Affinity Is Not Enough: Recovering the Free Energy Principle in Mixture-of-Experts	Man Yung Wong et.al.	2605.00604	null
2026-05-01	Space Network of Experts: Architecture and Expert Placement	Zhanwei Wang et.al.	2605.00515	null
2026-05-01	PrefMoE: Robust Preference Modeling with Mixture-of-Experts Reward Learning	Ziqin Yuan et.al.	2605.00384	null
2026-05-01	GaMMA: Towards Joint Global-Temporal Music Understanding in Large Multimodal Models	Zuyao You et.al.	2605.00371	null
2026-05-01	Making Every Verified Token Count: Adaptive Verification for MoE Speculative Decoding	Lehan Pan et.al.	2605.00342	null
2026-04-30	Rethinking Network Topologies for Cost-Effective Mixture-of-Experts LLM Serving	Junsun Choi et.al.	2605.00254	null
2026-05-01	Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL	Sudong Wang et.al.	2604.28123	null
2026-04-30	Language Models Refine Mechanical Linkage Designs Through Symbolic Reflection and Modular Optimisation	João Pedro Gandarela et.al.	2604.27962	null
2026-04-30	Prediction-powered Inference by Mixture of Experts	Yanwu Gu et.al.	2604.27892	null
2026-04-30	ZipCCL: Efficient Lossless Data Compression of Communication Collectives for Accelerating LLM Training	Wenxiang Lin et.al.	2604.27844	null
2026-04-30	MASCing: Configurable Mixture-of-Experts Behavior via Activation Steering Masks	Jona te Lintelo et.al.	2604.27818	null
2026-04-30	EdgeFM: Efficient Edge Inference for Vision-Language Models	Mengling Deng et.al.	2604.27476	null
2026-04-30	DeepPropNet: an operator learning-based predictor for thermal plasma properties	Zuo Wang et.al.	2604.27298	null
2026-04-29	First-Principles Thermodynamic Analysis of Ternary Chalcogenide Phase Change Materials	Felix Adams et.al.	2604.27120	null
2026-04-29	Observation of a Doubly-strange Hyperon $Ξ(1720)$ in $J/ψ\rightarrow K^{-}Σ^0\bar{Ξ}^{+}+c.c.$	BESIII Collaboration et.al.	2604.27028	null
2026-04-29	Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models	Gongbo Zhang et.al.	2604.26951	null
2026-04-29	FaaSMoE: A Serverless Framework for Multi-Tenant Mixture-of-Experts Serving	Minghe Wang et.al.	2604.26881	null
2026-04-29	Human-in-the-Loop Benchmarking of Heterogeneous LLMs for Automated Competency Assessment in Secondary Level Mathematics	Jatin Bhusal et.al.	2604.26607	null
2026-04-29	Advancing multi-site emission control: A physics-informed transfer learning framework with mixture of experts for carbon-pollutant synergy	Yuxuan Ying et.al.	2604.26571	null
2026-04-29	Topology-Aware Representation Alignment for Semi-Supervised Vision-Language Learning	Junwon You et.al.	2604.26370	null
2026-04-29	Adaptive and Fine-grained Module-wise Expert Pruning for Efficient LoRA-MoE Fine-Tuning	Weihang Li et.al.	2604.26340	null
2026-04-29	Efficient, VRAM-Constrained xLM Inference on Clients	Aditya Ukarande et.al.	2604.26334	null
2026-04-29	Optimizing Tracking Accuracy in Energy-Constrained Multimodal ISAC via Lyapunov-Driven Heterogeneous Mixture-of-Experts	Wenqi Fan et.al.	2604.26330	null
2026-04-29	Folding Tensor and Sequence Parallelism for Memory-Efficient Transformer Training & Inference	Vasu Shyam et.al.	2604.26294	null
2026-04-29	Semantic Foam: Unifying Spatial and Semantic Scene Decomposition	Amr Sharafeldin et.al.	2604.26262	null
2026-04-28	Mixture of Experts Framework in Machine Learning Interatomic Potentials for Atomistic Simulations	Gabriel de Miranda Nascimento et.al.	2604.26143	null
2026-04-28	Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling	Fan Jiang et.al.	2604.25578	null
2026-04-28	The Attention Market: Interpreting Online Fair Re-ranking as Manifold Optimization under Walrasian Equilibrium	Chen Xu et.al.	2604.25577	null
2026-04-28	SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton	Xuzheng He et.al.	2604.25498	null
2026-04-28	The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents	Yuwei Sun et.al.	2604.25299	null
2026-04-28	CroSearch-R1: Better Leveraging Cross-lingual Knowledge for Retrieval-Augmented Generation	Rui Qi et.al.	2604.25182	null
2026-04-27	Power Foam: Unifying Real-Time Differentiable Ray Tracing and Rasterization	Shrisudhan Govindarajan et.al.	2604.24994	null
2026-04-27	Incompressible Knowledge Probes: Estimating Black-Box LLM Parameter Counts via Factual Capacity	Bojie Li et.al.	2604.24827	null
2026-04-27	SWE-QA: A Dataset and Benchmark for Complex Code Understanding	Laïla Elkoussy et.al.	2604.24814	null
2026-04-28	Agent-Centric Visual Reinforcement Learning under Dynamic Perturbations	Zhengru Fang et.al.	2604.24661	null
2026-04-27	Cortex-Inspired Continual Learning: Unsupervised Instantiation and Recovery of Functional Task Networks	Kevin McKee et.al.	2604.24637	null
2026-04-27	Learning to Route Queries to Heads for Attention-based Re-ranking with Large Language Models	Yuxing Tian et.al.	2604.24608	null
2026-04-27	Vib2Conf: AI-driven discrimination of molecular conformations from vibrational spectra	Xin-Yu Lu et.al.	2604.24310	null
2026-04-27	SVOM/C-GFT: Instrumentation and Performances on the SVOM Alerts	Chao Wu et.al.	2604.24272	null
2026-04-27	SVOM/VT: On-ground processing of VT-VHF data	Chao Wu et.al.	2604.24271	null
2026-04-27	SVOM/VT: Overview of data processing and GRB identifications with X-band data	Hua-Li Li et.al.	2604.24266	null
2026-04-27	SVOM Science User Support Services at Chinese Science Center	Xu-hui Han et.al.	2604.24251	null
2026-04-27	Defusing the Trigger: Plug-and-Play Defense for Backdoored LLMs via Tail-Risk Intrinsic Geometric Smoothing	Kaisheng Fan et.al.	2604.24162	null
2026-04-27	SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs	Zi-Hao Bo et.al.	2604.23996	null
2026-04-27	LearnPruner: Rethinking Attention-based Token Pruning in Vision Language Models	Rinyoichi Takezoe et.al.	2604.23950	null
2026-04-26	AMAVA: Adaptive Motion-Aware Video-to-Audio Framework for Visually-Impaired Assistance	Benjamin Klein et.al.	2604.23909	null
2026-04-24	Synchrotron polarization of anisotropic electron distribution in GRB prompt emission	Kang-Fa Cheng et.al.	2604.22598	null
2026-04-24	Towards Adaptive Continual Model Merging via Manifold-Aware Expert Evolution	Haiyun Qiu et.al.	2604.22464	null
2026-04-24	The Cathaya argyrophylla Genome Reveals the Evolutionary Trade-offs of a Living Fossil	Yun Wang et.al.	2604.22440	null
2026-04-24	QAssemble: A Pure Python Package for Quantum Many-Body Theory	Seongjun Mo et.al.	2604.22223	null
2026-04-23	Direct observation of surface bandgap shrinkage and negative electronic compressibility in SrTiO3	Warakorn Jindata et.al.	2604.21783	null
2026-04-23	Rethinking Cross-Domain Evaluation for Face Forgery Detection with Semantic Fine-grained Alignment and Mixture-of-Experts	Yuhan Luo et.al.	2604.21478	null
2026-04-23	Decoupled DiLoCo for Resilient Distributed Pre-training	Arthur Douillard et.al.	2604.21428	null
2026-04-23	Teacher-Guided Routing for Sparse Vision Mixture-of-Experts	Masahiro Kada et.al.	2604.21330	null
2026-04-23	Enhancing Online Recruitment with Category-Aware MoE and LLM-based Data Augmentation	Minping Chen et.al.	2604.21264	null
2026-04-22	LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model	Inclusion AI et.al.	2604.20796	null
2026-04-22	On Bayesian Softmax-Gated Mixture-of-Experts Models	Nicola Bariletto et.al.	2604.20551	null
2026-04-22	XRF 241001A/SN 2024aiiq: A Faint Soft X-ray Transient Detected by SVOM with a Broad-Line Type Ic Supernova Revealed by JWST	B. Schneider et.al.	2604.20346	null
2026-04-22	MD-Face: MoE-Enhanced Label-Free Disentangled Representation for Interactive Facial Attribute Editing	Xuan Cui et.al.	2604.20317	null
2026-04-22	Multi-Perspective Evidence Synthesis and Reasoning for Unsupervised Multimodal Entity Linking	Mo Zhou et.al.	2604.20283	null
2026-04-22	All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG	Dan Wang et.al.	2604.20199	null
2026-04-22	Aligning Human-AI-Interaction Trust for Mental Health Support: Survey and Position for Multi-Stakeholders	Xin Sun et.al.	2604.20166	null
2026-04-22	Temporally Extended Mixture-of-Experts Models	Zeyu Shen et.al.	2604.20156	null
2026-04-21	Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts	Chaitanya Dwivedi et.al.	2604.19835	null
2026-04-21	FEPLB: Exploiting Copy Engines for Nearly Free MoE Load Balancing in Distributed Training	Shuyao Qi et.al.	2604.19654	null
2026-04-21	CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation	Xiangyang Luo et.al.	2604.19636	null
2026-04-21	LoopCTR: Unlocking the Loop Scaling Power for Click-Through Rate Prediction	Jiakai Tang et.al.	2604.19550	null
2026-04-22	ReaLB: Real-Time Load Balancing for Multimodal MoE Inference	Yingping Wang et.al.	2604.19503	null
2026-04-21	Quadruped Parkour Learning: Sparsely Gated Mixture of Experts with Visual Input	Michael Ziegltrum et.al.	2604.19344	null
2026-04-21	UniEP: Unified Expert-Parallel MoE MegaKernel for LLM Training	Size Zheng et.al.	2604.19241	null
2026-04-21	SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning	Boyan Shi et.al.	2604.19048	null
2026-04-21	STK-Adapter: Incorporating Evolving Graph and Event Chain for Temporal Knowledge Graph Extrapolation	Shuyuan Zhao et.al.	2604.19042	null
2026-04-20	Multi-Domain Learning with Global Expert Mapping	Pourya Shamsolmoali et.al.	2604.18842	null
2026-04-20	Efficient Mixture-of-Experts LLM Inference with Apple Silicon NPUs	Afsara Benazir et.al.	2604.18788	null
2026-04-20	CAHAL: Clinically Applicable resolution enHAncement for Low-resolution MRI scans	Sergio Morell-Ortega et.al.	2604.18781	null
2026-04-20	RASP-Tuner: Retrieval-Augmented Soft Prompts for Context-Aware Black-Box Optimization in Non-Stationary Environments	Enze Pan et.al.	2604.18026	null
2026-04-20	MU-GeNeRF: Multi-view Uncertainty-guided Generalizable Neural Radiance Fields for Distractor-aware Scene	Wenjie Mu et.al.	2604.17965	null
2026-04-20	Polysemantic Experts, Monosemantic Paths: Routing as Control in MoEs	Charles Ye et.al.	2604.17837	null
2026-04-20	MoE-nD: Per-Layer Mixture-of-Experts Routing for Multi-Axis KV Cache Compression	Libo Sun et.al.	2604.17695	null
2026-04-20	A Hamilton-Jacobi Reachability-Guided Search Framework for Efficient and Safe Indoor Planar Robot Navigation	Hanyang Hu et.al.	2604.17679	null
2026-04-19	Representation-Guided Parameter-Efficient LLM Unlearning	Zeguan Xiao et.al.	2604.17396	null
2026-04-19	When Text Hijacks Vision: Benchmarking and Mitigating Text Overlay-Induced Hallucination in Vision Language Models	Cui Yakun et.al.	2604.17375	null
2026-04-19	Poisson Flow Model of Cortical Folding Pattern	Moo K. Chung et.al.	2604.17291	null
2026-04-19	From Language to Action: Enhancing LLM Task Efficiency with Task-Aware MCP Server Recommendation	Shiyu He et.al.	2604.17234	null
2026-04-19	Cloud-native and Distributed Systems for Efficient and Scalable Large Language Models – A Research Agenda	Minxian Xu et.al.	2604.17227	null
2026-04-19	Layer-wise MoE Routing Locality under Shared-Prefix Code Generation: Token-Identity Decomposition and Compile-Equivalent Fork Redundancy	Shun-ichiro Hayashi et.al.	2604.17182	null
2026-04-18	Causality as a Minimum Energy Principle	Moo K. Chung et.al.	2604.17151	null
2026-04-18	IMA-MoE: An Interpretable Modality-Aware Mixture-of-Experts Framework for Characterizing the Neurobiological Signatures of Binge Eating Disorder	Lin Zhao et.al.	2604.17028	null
2026-04-18	D-QRELO: Training- and Data-Free Delta Compression for Large Language Models via Quantization and Residual Low-Rank Approximation	Junlin Li et.al.	2604.16940	null
2026-04-18	CoGR-MoE: Concept-Guided Expert Routing with Consistent Selection and Flexible Reasoning for Visual Question Answering	Xiyin Zeng et.al.	2604.16930	null
2026-04-17	Towards Trustworthy Depression Estimation via Disentangled Evidential Learning	Fangyuan Liu et.al.	2604.16579	null
2026-04-17	FL-MHSM: Spatially-adaptive Fusion and Ensemble Learning for Flood-Landslide Multi-Hazard Susceptibility Mapping at Regional Scale	Aswathi Mundayatt et.al.	2604.16265	null
2026-04-17	Joint-Centric Dual Contrastive Alignment with Structure-Preserving and Information-Balanced Regularization	Habibeh Naderi et.al.	2604.16247	null
2026-04-17	MOMENTA: Mixture-of-Experts Over Multimodal Embeddings with Neural Temporal Aggregation for Misinformation Detection	Yeganeh Abdollahinejad et.al.	2604.16172	null
2026-04-17	Breaking the Training Barrier of Billion-Parameter Universal Machine Learning Interatomic Potentials	Yuanchang Zhou et.al.	2604.15821	null
2026-04-16	OmniLight: One Model to Rule All Lighting Conditions	Youngjin Oh et.al.	2604.15170	null
2026-04-16	Towards Faster Language Model Inference Using Mixture-of-Experts Flow Matching	Aihua Li et.al.	2604.15009	null
2026-04-16	Switching Efficiency: A Novel Framework for Dissecting AI Data Center Network Efficiency	Niangen Ye et.al.	2604.14690	null
2026-04-16	ELMoE-3D: Leveraging Intrinsic Elasticity of MoE for Hybrid-Bonding-Enabled Self-Speculative Decoding in On-Premises Serving	Yuseon Choi et.al.	2604.14626	null
2026-04-16	WILD-SAM: Phase-Aware Expert Adaptation of SAM for Landslide Detection in Wrapped InSAR Interferograms	Yucheng Pan et.al.	2604.14540	null
2026-04-16	Geometric Metrics for MoE Specialization: From Fisher Information to Early Failure Detection	Dongxin Guo et.al.	2604.14500	null
2026-04-15	Geometric Routing Enables Causal Expert Control in Mixture of Experts	Ivan Ternovtsii et.al.	2604.14434	null
2026-04-15	Equifinality in Mixture of Experts: Routing Topology Does Not Determine Language Modeling Quality	Ivan Ternovtsii et.al.	2604.14419	null
2026-04-15	Awakening Dormant Experts: Counterfactual Routing to Mitigate MoE Hallucinations	Wentao Hu et.al.	2604.14246	null
2026-04-15	Design and Behavior of Sparse Mixture-of-Experts Layers in CNN-based Semantic Segmentation	Svetlana Pavlitska et.al.	2604.13761	null
2026-04-15	Enhancing Mixture-of-Experts Specialization via Cluster-Aware Upcycling	Sanghyeok Chu et.al.	2604.13508	null
2026-04-15	Chain of Uncertain Rewards with Large Language Models for Reinforcement Learning	Shentong Mo et.al.	2604.13504	null
2026-04-14	PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models	Han Bao et.al.	2604.12995	null
2026-04-14	Tree Learning: A Multi-Skill Continual Learning Framework for Humanoid Robots	Yifei Yan et.al.	2604.12909	null
2026-04-14	Stable Fine-Time-Step Long-Horizon Turbulence Prediction with a Multi-Stepsize Mixture-of-Experts Neural Operator	Guanyu Pan et.al.	2604.12794	null
2026-04-14	AffectAgent: Collaborative Multi-Agent Reasoning for Retrieval-Augmented Multimodal Emotion Recognition	Zeheng Wang et.al.	2604.12735	null
2026-04-14	Brain-DiT: A Universal Multi-state fMRI Foundation Model with Metadata-Conditioned Pretraining	Junfeng Xia et.al.	2604.12683	null
2026-04-15	Observation of the Exotic State $π_{1}(1600)$ in $ψ(2S)\rightarrowγχ_{c1},χ_{c1}\rightarrowπ^{+}π^{-}η'$	BESIII Collaboration et.al.	2604.12524	null
2026-04-14	SEATrack: Simple, Efficient, and Adaptive Multimodal Tracker	Junbin Su et.al.	2604.12502	link
2026-04-14	Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning	NVIDIA et.al.	2604.12374	null
2026-04-14	Nucleus-Image: Sparse MoE for Image Generation	Chandan Akiti et.al.	2604.12163	null
2026-04-13	TriFit: Trimodal Fusion with Protein Dynamics for Mutation Fitness Prediction	Seungik Cho et.al.	2604.12026	null
2026-04-14	Relax: An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale	Liujie Zhang et.al.	2604.11554	null
2026-04-13	Learning How Much to Think: Difficulty-Aware Dynamic MoEs for Graph Node Classification	Jiajun Zhou et.al.	2604.11473	null
2026-04-14	Judge Like Human Examiners: A Weighted Importance Multi-Point Evaluation Framework for Generative Tasks with Long-form Answers	Guoxin Yu et.al.	2604.11246	null
2026-04-13	Sparse Hypergraph-Enhanced Frame-Event Object Detection with Fine-Grained MoE	Wei Bao et.al.	2604.11140	null
2026-04-13	Frugal Knowledge Graph Construction with Local LLMs: A Zero-Shot Pipeline, Self-Consistency and Wisdom of Artificial Crowds	Pierre Jourlin et.al.	2604.11104	null
2026-04-13	Quantitative propagation of chaos for particle systems with bounded kernels and multiplicative noise	Ning Jiang et.al.	2604.11084	null
2026-04-12	MoEITS: A Green AI approach for simplifying MoE-LLMs	Luis Balderas et.al.	2604.10603	null
2026-04-12	WaveMoE: A Wavelet-Enhanced Mixture-of-Experts Foundation Model for Time Series Forecasting	Shunyu Wu et.al.	2604.10544	null
2026-04-12	Measurement of the branching fractions of $χ_{cJ} \to π^{+}π^{-}π^{0}π^{0}$ via $ψ(3686) \to γχ_{cJ}$	BESIII Collaboration et.al.	2604.10523	null
2026-04-12	How Many Tries Does It Take? Iterative Self-Repair in LLM Code Generation Across Model Scales and Benchmarks	Johin Johny Arimbur et.al.	2604.10508	null
2026-04-12	CodeQuant: Unified Clustering and Quantization for Enhanced Outlier Smoothing in Low-Precision Mixture-of-Experts	Xiangyang Yin et.al.	2604.10496	null
2026-04-12	First Observation of $D^+ \to a_0(980)ρ$ and $D^+ \to a_0(980)^+ f_0(500)$ in $D^+ \to π^+π^+π^-η$ and $D^+ \to π^+π^0π^0η$ Decays	BESIII Collaboration et.al.	2604.10444	null
2026-04-11	DREAMuS: Dark matter REsearch with Advanced Muon Source	Xiang Chen et.al.	2604.10257	null
2026-04-11	Adapting 2D Multi-Modal Large Language Model for 3D CT Image Analysis	Yang Yu et.al.	2604.10233	null
2026-04-11	SpecMoE: A Fast and Efficient Mixture-of-Experts Inference via Self-Assisted Speculative Decoding	Jehyeon Bang et.al.	2604.10152	null
2026-04-10	The Myth of Expert Specialization in MoEs: Why Routing Reflects Geometry, Not Necessarily Domain Expertise	Xi Wang et.al.	2604.09780	null
2026-04-10	SafeMind: A Risk-Aware Differentiable Control Framework for Adaptive and Safe Quadruped Locomotion	Zukun Zhang et.al.	2604.09474	null
2026-04-10	Compositional-Degradation UAV Image Restoration: Conditional Decoupled MoE Network and A Benchmark	Jinquan Yan et.al.	2604.09313	null
2026-04-10	Generalization and Scaling Laws for Mixture-of-Experts Transformers	Mansour Zoubeirou a Mayaki et.al.	2604.09175	null
2026-04-10	Text-Conditioned Multi-Expert Regression Framework for Fully Automated Multi-Abutment Design	Mianjie Zheng et.al.	2604.09047	null
2026-04-09	Seeing but Not Thinking: Routing Distraction in Multimodal Mixture-of-Experts	Haolei Xu et.al.	2604.08541	null
2026-04-09	Lost in the Hype: Revealing and Dissecting the Performance Degradation of Medical Multimodal Large Language Models in Image Classification	Xun Zhu et.al.	2604.08333	null
2026-04-09	Towards Identification and Intervention of Safety-Critical Parameters in Large Language Models	Weiwei Qi et.al.	2604.08297	null
2026-04-09	SciFigDetect: A Benchmark for AI-Generated Scientific Figure Detection	You Hu et.al.	2604.08211	null
2026-04-09	Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference	Baihui Liu et.al.	2604.08133	null
2026-04-09	Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator	Luozheng Qin et.al.	2604.08121	null
2026-04-09	HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation	Shuanghao Bai et.al.	2604.07993	null
2026-04-09	QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training–Inference Mismatch	Hao Gu et.al.	2604.07853	null
2026-04-09	Lightweight LLM Agent Memory with Small Language Models	Jiaquan Zhang et.al.	2604.07798	null
2026-04-09	Symbiotic-MoE: Unlocking the Synergy between Generation and Understanding	Xiangyue Liu et.al.	2604.07753	null
2026-04-08	From LLM to Silicon: RL-Driven ASIC Architecture Exploration for On-Device AI Inference	Ravindra Ganti et.al.	2604.07526	null
2026-04-08	SPAMoE: Spectrum-Aware Hybrid Operator Framework for Full-Waveform Inversion	Zhenyu Wang et.al.	2604.07421	null
2026-04-08	Region-Graph Optimal Transport Routing for Mixture-of-Experts Whole-Slide Image Classification	Xin Tian et.al.	2604.07298	null
2026-04-08	VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis	Jian Yu et.al.	2604.07210	null
2026-04-08	InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models	Hongyu Chen et.al.	2604.07173	null
2026-04-08	The Impact of Steering Large Language Models with Persona Vectors in Educational Applications	Yongchao Wu et.al.	2604.07102	null
2026-04-08	Gemma 4, Phi-4, and Qwen3: Accuracy-Efficiency Tradeoffs in Dense and MoE Reasoning Language Models	Md Motaleb Hossen Manik et.al.	2604.07035	null
2026-04-08	MoE Routing Testbed: Studying Expert Specialization and Routing Behavior at Small Scale	Tobias Falke et.al.	2604.07030	null
2026-04-08	Stress Estimation in Elderly Oncology Patients Using Visual Wearable Representations and Multi-Instance Learning	Ioannis Kyprakis et.al.	2604.06990	null
2026-04-08	MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization	Zhixiong Zhao et.al.	2604.06798	null
2026-04-08	HQF-Net: A Hybrid Quantum-Classical Multi-Scale Fusion Network for Remote Sensing Image Segmentation	Md Aminur Hossain et.al.	2604.06715	null
2026-04-08	Heterogeneous Mixture-of-Experts for Energy-Efficient Multimodal ISAC in Highly Mobile Networks	Wenqi Fan et.al.	2604.06697	null
2026-04-08	Foundry: Template-Based CUDA Graph Context Materialization for Fast LLM Serving Cold Start	Xueshen Liu et.al.	2604.06664	null
2026-04-08	Short proofs in combinatorics, probability and number theory II	Boris Alexeev et.al.	2604.06609	null
2026-04-08	Does a Global Perspective Help Prune Sparse MoEs Elegantly?	Zeliang Zhang et.al.	2604.06542	null
2026-04-07	Soft-Quantum Algorithms	Basil Kyriacou et.al.	2604.06523	null
2026-04-07	Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees	Mohammed Nowaz Rabbani Chowdhury et.al.	2604.06515	null
2026-04-07	State-of-the-Art Arabic Language Modeling with Sparse MoE Fine-Tuning and Chain-of-Thought Distillation	Navan Preet Singh et.al.	2604.06421	null
2026-04-07	TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models	Lin Mu et.al.	2604.06291	null
2026-04-07	A Mixture of Experts Foundation Model for Scanning Electron Microscopy Image Analysis	Sk Miraj Ahmed et.al.	2604.05960	null
2026-04-07	Precise measurement of the CKM angle $γ$ with a novel approach	The BESIII et.al.	2604.05712	null
2026-04-08	QA-MoE: Towards a Continuous Reliability Spectrum with Quality-Aware Mixture of Experts for Robust Multimodal Sentiment Analysis	Yitong Zhu et.al.	2604.05704	null
2026-04-07	Measurement of the CKM angle $γ$ in $B^{\pm} \rightarrow D(\rightarrow K^{0}_{\rm S} h^{\prime+}h^{\prime-})h^{\pm}$ decays with a novel approach	The BESIII et.al.	2604.05701	null
2026-04-07	A Unified Foundation Model for All-in-One Multi-Modal Remote Sensing Image Restoration and Fusion with Language Prompting	Yongchuan Cui et.al.	2604.05629	null
2026-04-07	From Pixels to Personas: Tracking the Evolution of Anime Characters	Rongze Liu et.al.	2604.05507	null
2026-04-07	Task Ecologies and the Evolution of World-Tracking Representations in Large Language Models	Giulio Valentino Dalla Riva et.al.	2604.05469	null
2026-04-07	Do Domain-specific Experts exist in MoE-based LLMs?	Giang Do et.al.	2604.05267	null
2026-04-06	HI-MoE: Hierarchical Instance-Conditioned Mixture-of-Experts for Object Detection	Vadim Vashkelis et.al.	2604.04908	null
2026-04-06	LiveFact: A Dynamic, Time-Aware Benchmark for LLM-Driven Fake News Detection	Cheng Xu et.al.	2604.04815	null
2026-04-06	Galaxy Populations in Groups and Clusters: II. Conditional Luminosity Functions at Redshifts from z ~ 1 to z ~ 0	Ce Gao et.al.	2604.04794	null
2026-04-06	DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators	Zhiwen Mo et.al.	2604.04750	null
2026-04-06	Preserving Forgery Artifacts: AI-Generated Video Detection at Native Scale	Zhengcen Li et.al.	2604.04634	null
2026-04-06	Quantum-inspired Ising machine using sparsified spin connectivity	Moe Shimada et.al.	2604.04606	null
2026-04-06	REAM: Merging Improves Pruning of Experts in LLMs	Saurav Jha et.al.	2604.04356	null
2026-04-06	OmniSonic: Towards Universal and Holistic Audio Generation from Video and Text	Weiguo Pian et.al.	2604.04348	null
2026-04-05	3D-Stacked NMP, LLM Decoding, Systolic Array Microarchitecture, Multi-Core Scheduling	Chenyang Ai et.al.	2604.04253	null
2026-04-05	Three Phases of Expert Routing: How Load Balance Evolves During Mixture-of-Experts Training	Charafeddine Mouzouni et.al.	2604.04230	null
2026-04-05	SARES-DEIM: Sparse Mixture-of-Experts Meets DETR for Robust SAR Ship Detection	Fenghao Song et.al.	2604.04127	null
2026-04-05	Bootstrap-Aggregated Method-of-Moments Estimation of the Copula Correlation Parameter for Marginal Survival Inference under Dependent Censoring	Hyun-Soo Zhang et.al.	2604.04032	null
2026-04-04	SPARK-IL: Spectral Retrieval-Augmented RAG for Knowledge-driven Deepfake Detection via Incremental Learning	Hessen Bougueffa Eutamene et.al.	2604.03833	null
2026-04-04	Love Me, Love My Label: Rethinking the Role of Labels in Prompt Retrieval for Visual In-Context Learning	Tianci Luo et.al.	2604.03657	null
2026-04-04	Unveiling Language Routing Isolation in Multilingual MoE Models for Interpretable Subnetwork Adaptation	Kening Zheng et.al.	2604.03592	null
2026-04-03	Diffusion Policy with Bayesian Expert Selection for Active Multi-Target Tracking	Haotian Xiang et.al.	2604.03404	null
2026-04-03	Mixture-of-Experts in Remote Sensing: A Survey	Yongchuan Cui et.al.	2604.03342	null
2026-04-03	CAMEO: A Conditional and Quality-Aware Multi-Agent Image Editing Orchestrator	Yuhan Pu et.al.	2604.03156	null
2026-04-03	JoyAI-LLM Flash: Advancing Mid-Scale LLMs with Token Efficiency	Aichen Cai et.al.	2604.03044	null
2026-04-03	PolyReal: A Benchmark for Real-World Polymer Science Workflows	Wanhao Liu et.al.	2604.02934	null
2026-04-03	Council Mode: Mitigating Hallucination and Bias in LLMs via Multi-Agent Consensus	Shuai Wu et.al.	2604.02923	null
2026-04-03	Multi-Turn Reinforcement Learning for Tool-Calling Agents with Iterative Reward Calibration	Wachiravit Modecrua et.al.	2604.02869	null
2026-04-03	FluxMoE: Decoupling Expert Residency for High-Performance MoE Serving	Qingxiu Liu et.al.	2604.02715	null
2026-04-03	V2X-QA: A Comprehensive Reasoning Dataset and Benchmark for Multimodal Large Language Models in Autonomous Driving Across Ego, Infrastructure, and Cooperative Views	Junwei You et.al.	2604.02710	null
2026-04-03	Adaptive Semantic Communication for Wireless Image Transmission Leveraging Mixture-of-Experts Mechanism	Haowen Wan et.al.	2604.02691	null
2026-04-02	The Expert Strikes Back: Interpreting Mixture-of-Experts Language Models at Expert Level	Jeremy Herbst et.al.	2604.02178	null
2026-04-02	FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Large Attention-Based Model Inference on Tile-Based Accelerators	Chi Zhang et.al.	2604.02110	null
2026-04-02	SURE: Synergistic Uncertainty-aware Reasoning for Multimodal Emotion Recognition in Conversations	Yiqiang Cai et.al.	2604.01916	null
2026-04-02	FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models	Juyong Jiang et.al.	2604.01762	null
2026-04-02	M3D-BFS: a Multi-stage Dynamic Fusion Strategy for Sample-Adaptive Multi-Modal Brain Network Analysis	Rui Dong et.al.	2604.01667	null
2026-04-02	Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models	Shuibai Zhang et.al.	2604.01622	null
2026-04-02	DWDP: Distributed Weight Data Parallelism for High-Performance LLM Inference on NVL72	Wanqian Li et.al.	2604.01621	null
2026-04-01	Learning When to See and When to Feel: Adaptive Vision-Torque Fusion for Contact-Aware Manipulation	Jiuzhou Lei et.al.	2604.01414	null
2026-04-01	Sparse Spectral LoRA: Routed Experts for Medical VLMs	Omid Nejati Manzari et.al.	2604.01310	null
2026-04-01	Brainstacks: Cross-Domain Cognitive Capabilities via Frozen MoE-LoRA Stacks for Continual LLM Learning	Mohammad R. Abu Ayyash et.al.	2604.01152	null
2026-04-02	Asymptotically Optimal Sequential Testing with Heterogeneous LLMs	Guokai Li et.al.	2604.01086	null
2026-04-01	PHASOR: Anatomy- and Phase-Consistent Volumetric Diffusion for CT Virtual Contrast Enhancement	Zilong Li et.al.	2604.01053	null
2026-04-01	KUET at StanceNakba Shared Task: StanceMoE: Mixture-of-Experts Architecture for Stance Detection	Abdullah Al Shafi et.al.	2604.00878	null
2026-04-01	Cost-Penalized Fitness in FMA-Orchestrated Mixture of Experts: Experimental Evidence for Molecular Memory in Domain Adaptation	Martin Jaraiz et.al.	2604.00812	null
2026-04-01	Routing-Free Mixture-of-Experts	Yilun Liu et.al.	2604.00801	null
2026-04-01	Scalable Pretraining of Large Mixture of Experts Language Models on Aurora Super Computer	Dharma Teja Vooturi et.al.	2604.00785	null
2026-04-01	Toward Optimal Sampling Rate Selection and Unbiased Classification for Precise Animal Activity Recognition	Axiu Mao et.al.	2604.00517	null
2026-04-01	Self-Routing: Parameter-Free Expert Routing from Hidden States	Jama Hussein Mohamud et.al.	2604.00421	null
2026-03-31	From Skew to Symmetry: Node-Interconnect Multi-Path Balancing with Execution-time Planning for Modern GPU Clusters	Jinghan Yao et.al.	2604.00317	null
2026-03-31	Directly visualizing the energy level structure of quantum dot molecules	Heun Mo Yoo et.al.	2604.00232	null
2026-03-31	Towards Verifiable and Self-Correcting AI Physicists for Quantum Many-Body Simulations	Ken Deng et.al.	2604.00149	null
2026-03-31	PASM: Population Adaptive Symbolic Mixture-of-Experts Model for Cross-location Hurricane Evacuation Decision Prediction	Xiao Qian et.al.	2604.00074	null
2026-03-31	Short proofs in combinatorics and number theory	Boris Alexeev et.al.	2603.29961	null
2026-03-31	First energy scan measurement of $e^{+}e^{-}\to K^{+}K^{-}$ around the $ψ(2S)$ resonance	BESIII Collaboration et.al.	2603.29854	null
2026-03-31	Counterfactual Analysis of Brain Network Dynamics	Moo K. Chung et.al.	2603.29843	null
2026-03-31	Training-Free Dynamic Upcycling of Expert Language Models	Eros Fanì et.al.	2603.29765	null
2026-03-31	TrafficMoE: Heterogeneity-aware Mixture of Experts for Encrypted Traffic Classification	Qing He et.al.	2603.29520	null
2026-03-31	Aligning Multimodal Sequential Recommendations via Robust Direct Preference Optimization with Sparse MoE	Hejin Huang et.al.	2603.29259	null
2026-03-31	Route-Induced Density and Stability (RIDE): Controlled Intervention and Mechanism Analysis of Routing-Style Meta Prompts on LLM Internal States	Dianxing Zhang et.al.	2603.29206	null
2026-03-31	BiMoE: Brain-Inspired Experts for EEG-Dominant Affective State Recognition	Hongyu Zhu et.al.	2603.29205	null
2026-03-30	Rethinking Language Model Scaling under Transferable Hypersphere Optimization	Liliang Ren et.al.	2603.28743	null
2026-03-30	StreamingVLA: Streaming Vision-Language-Action Model with Action Flow Matching and Adaptive Early Observation	Yiran Shi et.al.	2603.28565	null
2026-03-30	Observation of $Λ^+_c\to nπ^+η$ and search for $Λ^+_c\to na_0(980)^+$	BESIII Collaboration et.al.	2603.28232	null
2026-03-30	Graph Vector Field: A Unified Framework for Multimodal Health Risk Assessment from Heterogeneous Wearable and Environmental Data Streams	Silvano Coletti et.al.	2603.28115	null
2026-03-30	ExFusion: Efficient Transformer Training via Multi-Experts Fusion	Jiacheng Ruan et.al.	2603.27965	null
2026-03-31	MathGen: Revealing the Illusion of Mathematical Competence through Text-to-Image Generation	Ruiyao Liu et.al.	2603.27959	null
2026-03-29	KAT-Coder-V2 Technical Report	Fengxiang Li et.al.	2603.27703	null
2026-03-29	LVRPO: Language-Visual Alignment with GRPO for Multimodal Understanding and Generation	Shentong Mo et.al.	2603.27693	null
2026-03-29	PRBench: End-to-end Paper Reproduction in Physics Research	Shi Qiu et.al.	2603.27646	null
2026-03-29	Expert Streaming: Accelerating Low-Batch MoE Inference via Multi-chiplet Architecture and Dynamic Expert Trajectory Scheduling	Songchen Ma et.al.	2603.27624	null
2026-03-29	Fully Spiking Neural Networks with Target Awareness for Energy-Efficient UAV Tracking	Pengzhi Zhong et.al.	2603.27493	null
2026-03-29	On Token’s Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models	Chongyang Zhao et.al.	2603.27481	null
2026-03-28	Unveiling Code Clones in the Eclipse IIoT Software Ecosystem	Zengyang Li et.al.	2603.27308	null
2026-03-28	Persistent Memory Through Triple-Loop Consolidation in a Non-Gradient Dissipative Cognitive Architecture	Jianwei Lou et.al.	2603.27188	null
2026-03-28	Routing Sensitivity Without Controllability: A Diagnostic Study of Fairness in MoE Language Models	Junhyeok Lee et.al.	2603.27141	null
2026-03-27	TAPS: Task Aware Proposal Distributions for Speculative Sampling	Mohamad Zbib et.al.	2603.27027	null
2026-03-27	Learning to Commit: Generating Organic Pull Requests via Online Repository Memory	Mo Li et.al.	2603.26664	null
2026-03-27	Sustainability Is Not Linear: Quantifying Performance, Energy, and Privacy Trade-offs in On-Device Intelligence	Eziyo Ehsani et.al.	2603.26603	null
2026-03-26	Can Small Models Reason About Legal Documents? A Comparative Study	Snehit Vaddi et.al.	2603.25944	null
2026-03-26	Narrowband searches for continuous gravitational waves from known pulsars in the first two parts of the fourth LIGO–Virgo–KAGRA observing run	The LIGO Scientific Collaboration et.al.	2603.25938	null
2026-03-26	AD-CARE: A Guideline-grounded, Modality-agnostic LLM Agent for Real-world Alzheimer’s Disease Diagnosis with Multi-cohort Assessment, Fairness Analysis, and Reader Study	Wenlong Hou et.al.	2603.25322	null
2026-03-26	SliderQuant: Accurate Post-Training Quantization for LLMs	Shigeng Wang et.al.	2603.25284	null
2026-03-26	A Wireless World Model for AI-Native 6G Networks	Ziqi Chen et.al.	2603.25216	null
2026-03-26	MCLMR: A Model-Agnostic Causal Learning Framework for Multi-Behavior Recommendation	Ranxu Zhang et.al.	2603.25126	null
2026-03-26	MP-MoE: Matrix Profile-Guided Mixture of Experts for Precipitation Forecasting	Huyen Ngoc Tran et.al.	2603.25046	null
2026-03-26	MoE-GRPO: Optimizing Mixture-of-Experts via Reinforcement Learning in Vision-Language Models	Dohwan Ko et.al.	2603.24984	null
2026-03-26	CROSS: A Mixture-of-Experts Reinforcement Learning Framework for Generalizable Large-Scale Traffic Signal Control	Xibei Chen et.al.	2603.24930	null
2026-03-25	OptiSAR-Net++: A Large-Scale Benchmark and Transformer-Free Framework for Cross-Domain Remote Sensing Visual Grounding	Xiaoyu Tang et.al.	2603.24876	null
2026-03-25	Enes Causal Discovery	Alexis Kafantaris et.al.	2603.24436	null
2026-03-25	Cross Section Measurements of $\bar{n}p \rightarrow K^{+}K^{-}π^{+}(π^{0})$ via Antineutrons Produced by $J/ψ\to p π^{-} \bar{n}$ Decays	BESIII Collaboration et.al.	2603.24272	null
2026-03-25	B-MoE: A Body-Part-Aware Mixture-of-Experts “All Parts Matter” Approach to Micro-Action Recognition	Nishit Poddar et.al.	2603.24245	null
2026-03-25	Sequence-aware Large Language Models for Explainable Recommendation	Gangyi Zhang et.al.	2603.24136	null
2026-03-25	PCHC: Enabling Preference Conditioned Humanoid Control via Multi-Objective Reinforcement Learning	Huanyu Li et.al.	2603.24047	null
2026-03-25	LGEST: Dynamic Spatial-Spectral Expert Routing for Hyperspectral Image Classification	Jiawen Wen et.al.	2603.24045	null
2026-03-25	MoE-Sieve: Routing-Guided LoRA for Efficient MoE Fine-Tuning	Andrea Manzoni et.al.	2603.24044	null
2026-03-25	SiftMoE: Similarity-Aware Energy-Efficient Expert Selection for Wireless Distributed MoE Inference	Qian Chen et.al.	2603.23888	null
2026-03-24	Lightweight Fairness for LLM-Based Recommendations via Kernelized Projection and Gated Adapters	Nan Cui et.al.	2603.23780	null
2026-03-24	The Diminishing Returns of Early-Exit Decoding in Modern LLMs	Rui Wei et.al.	2603.23701	null
2026-03-24	VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs	Haoran Yuan et.al.	2603.23481	link
2026-03-24	Similarity-Aware Mixture-of-Experts for Data-Efficient Continual Learning	Connor Mclaughlin et.al.	2603.23436	null
2026-03-24	Amplitude Analysis of the Isospin-Violating Decay $J/ψ\rightarrowγηπ^{0}$	BESIII Collaboration et.al.	2603.23081	null
2026-03-24	IntentWeave: A Progressive Entry Ladder for Multi-Surface Browser Agents in Cloud Portals	Wanying Mo et.al.	2603.22917	null
2026-03-24	Search for the radiative decays $D^0\to γ\bar K_1(1270)^0$ and $D^+\to γK_1(1270)^+$	BESIII Collaboration et.al.	2603.22804	null
2026-03-24	KALAVAI: Predicting When Independent Specialist Fusion Works – A Quantitative Model for Post-Hoc Cooperative LLM Training	Ramchand Kumaresan et.al.	2603.22755	null
2026-03-24	Why Database Manuals Are Not Enough: Efficient and Reliable Configuration Tuning for DBMSs via Code-Driven LLM Agents	Xinyi Zhang et.al.	2603.22708	null
2026-03-23	Bridging the Know-Act Gap via Task-Level Autoregressive Reasoning	Jihyun Janice Ahn et.al.	2603.22619	null
2026-03-23	FullCircle: Effortless 3D Reconstruction from Casual 360$^\circ$ Captures	Yalda Foroutan et.al.	2603.22572	null
2026-03-23	3D-Layout-R1: Structured Reasoning for Language-Instructed Spatial Editing	Haoyu Zhen et.al.	2603.22279	null
2026-03-23	A bending in the size-mass relation of star-forming galaxies across $0.5 < z < 6.0$ at a critical stellar mass of $10^{10}M_\odot$ revealed by JWST	Longyue Chen et.al.	2603.22239	null
2026-03-23	Mixture of Mini Experts: Overcoming the Linear Layer Bottleneck in Multiple Instance Learning	Daniel Shao et.al.	2603.22198	null
2026-03-23	ADaFuSE: Adaptive Diffusion-generated Image and Text Fusion for Interactive Text-to-Image Retrieval	Zhuocheng Zhang et.al.	2603.21886	null
2026-03-23	Holistic Scaling Laws for Optimal Mixture-of-Experts Architecture Optimization	Weilin Wan et.al.	2603.21862	null
2026-03-23	DiT-Flow: Speech Enhancement Robust to Multiple Distortions based on Flow Matching in Latent Space and Diffusion Transformers	Tianyu Cao et.al.	2603.21608	null
2026-03-22	Aggregation Alignment for Federated Learning with Mixture-of-Experts under Data Heterogeneity	Zihan Fang et.al.	2603.21276	null
2026-03-22	QMoP: Query Guided Mixture-of-Projector for Efficient Visual Token Compression	Zhongyang Li et.al.	2603.21232	null
2026-03-22	MI-DPG: Decomposable Parameter Generation Network Based on Mutual Information for Multi-Scenario Recommendation	Wenzhuo Cheng et.al.	2603.21209	null
2026-03-22	Diffusion-based Probabilistic Air Quality Forecasting with Mechanistic Insight	Ao Ding et.al.	2603.21131	null
2026-03-22	Mixture of Chapters: Scaling Learnt Memory in Transformers	Tasmay Pankaj Tibrewal et.al.	2603.21096	null
2026-03-22	CoVFT: Context-aware Visual Fine-tuning for Multimodal Large Language Models	Nan Zhou et.al.	2603.21077	null
2026-03-22	LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning	Jianing Wang et.al.	2603.21065	null
2026-03-21	Satellite-to-Street: Synthesizing Post-Disaster Views from Satellite Imagery via Generative Vision Models	Yifan Yang et.al.	2603.20697	null
2026-03-21	CFNN: Continued Fraction Neural Network	Chao Wang et.al.	2603.20634	null
2026-03-21	A 4R-supported circular product-service system for luxury branded events	Ke Ma et.al.	2603.20613	null
2026-03-20	AE-LLM: Adaptive Efficiency Optimization for Large Language Models	Kaito Tanaka et.al.	2603.20492	null
2026-03-20	Thinking in Different Spaces: Domain-Specific Latent Geometry Survives Cross-Architecture Translation	Marcus Armstrong et.al.	2603.20406	null
2026-03-20	Gesture2Speech: How Far Can Hand Movements Shape Expressive Speech?	Lokesh Kumar et.al.	2603.19831	null
2026-03-20	Making Video Models Adhere to User Intent with Minor Adjustments	Daniel Ajisafe et.al.	2603.19672	null
2026-03-20	Structured Prompting for Arabic Essay Proficiency: A Trait-Centric Evaluation Approach	Salim Al Mandhari et.al.	2603.19668	null
2026-03-20	CS-MUNet: A Channel-Spatial Dual-Stream Mamba Network for Multi-Organ Segmentation	Yuyang Zheng et.al.	2603.19659	null
2026-03-20	UniBioTransfer: A Unified Framework for Multiple Biometrics Transfer	Caiyi Sun et.al.	2603.19637	null
2026-03-19	Scalable Prompt Routing via Fine-Grained Latent Task Discovery	Yunyi Zhang et.al.	2603.19415	null
2026-03-22	Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation	Zhuolin Yang et.al.	2603.19220	null
2026-03-19	DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Efficient MoE Inference on Edge	Yuegui Huang et.al.	2603.19172	null
2026-03-19	ATG-MoE: Autoregressive trajectory generation with mixture-of-experts for assembly skill learning	Weihang Huang et.al.	2603.19029	null
2026-03-19	GWTC-4.0: Tests of General Relativity. III. Tests of the Remnants	The LIGO Scientific Collaboration et.al.	2603.19021	null
2026-03-19	GWTC-4.0: Tests of General Relativity. II. Parameterized Tests	The LIGO Scientific Collaboration et.al.	2603.19020	null
2026-03-19	GWTC-4.0: Tests of General Relativity. I. Overview and General Tests	The LIGO Scientific Collaboration et.al.	2603.19019	null
2026-03-19	DriftGuard: Mitigating Asynchronous Data Drift in Federated Learning	Yizhou Han et.al.	2603.18872	null
2026-03-19	Empathetic Motion Generation for Humanoid Educational Robots via Reasoning-Guided Vision–Language–Motion Diffusion Architecture	Fuze Sun et.al.	2603.18771	null
2026-03-19	Observation of $D_s^+ \to a_0(980)^+f_0(500)$ in the Amplitude Analysis of $D_s^+ \to π^+ π^0 π^0 η$	BESIII Collaboration et.al.	2603.18521	null
2026-03-19	AIMER: Calibration-Free Task-Agnostic MoE Pruning	Zongfang Liu et.al.	2603.18492	null
2026-03-19	AlignMamba-2: Enhancing Multimodal Fusion and Sentiment Analysis with Modality-Aware Mamba	Yan Li et.al.	2603.18462	null
2026-03-19	Spatially Indirect Exciton Condensation in Two-Dimensional Strongly Correlated Semimetals	Yao Zeng et.al.	2603.18445	null
2026-03-18	Path-Constrained Mixture-of-Experts	Zijin Gu et.al.	2603.18297	null
2026-03-18	CORE: Robust Out-of-Distribution Detection via Confidence and Orthogonal Residual Scoring	Jin Mo Yang et.al.	2603.18290	null
2026-03-18	Resonance-enhanced integrated acousto-optic beam steering	Yue Yu et.al.	2603.18191	null
2026-03-18	Understanding Task Aggregation for Generalizable Ultrasound Foundation Models	Fangyijie Wang et.al.	2603.18123	null
2026-03-18	DebugLM: Learning Traceable Training Data Provenance for LLMs	Wenjie Jacky Mo et.al.	2603.17884	null
2026-03-18	The 1/W Law: An Analytical Study of Context-Length Routing Topology and GPU Generation Gains for LLM Inference Energy Efficiency	Huamin Chen et.al.	2603.17280	null
2026-03-17	Knowledge Localization in Mixture-of-Experts LLMs Using Cross-Lingual Inconsistency	Lucas Bandarkar et.al.	2603.17102	null
2026-03-17	Edge-Efficient Two-Stream Multimodal Architecture for Non-Intrusive Bathroom Fall Detection	Haitian Wang et.al.	2603.17069	null
2026-03-17	SpecMoE: Spectral Mixture-of-Experts Foundation Model for Cross-Species EEG Decoding	D. Darankoum et.al.	2603.16739	null
2026-03-17	HMAR: Hierarchical Modality-Aware Expert and Dynamic Routing Medical Image Retrieval Architecture	Aojie Yuan et.al.	2603.16679	null
2026-03-19	Mixture of Style Experts for Diverse Image Stylization	Shihao Zhu et.al.	2603.16649	null
2026-03-17	Tarab: A Multi-Dialect Corpus of Arabic Lyrics and Poetry	Mo El-Haj et.al.	2603.16601	null
2026-03-17	Visual Distraction Undermines Moral Reasoning in Vision-Language Models	Xinyi Yang et.al.	2603.16445	null
2026-03-18	EngGPT2: Sovereign, Efficient and Open Intelligence	G. Ciarfaglia et.al.	2603.16430	null
2026-03-17	PlotTwist: A Creative Plot Generation Framework with Small Language Models	Abhinav Thorat et.al.	2603.16410	null
2026-03-17	DynamicGate MLP Conditional Computation via Learned Structural Dropout and Input Dependent Gating for Functional Plasticity	Yong Il Choi et.al.	2603.16367	null
2026-03-17	Behavioral Steering in a 35B MoE Language Model via SAE-Decoded Probe Vectors: One Agency Axis, Not Five Traits	Jia Qing Yap et.al.	2603.16335	null
2026-03-17	AW-MoE: All-Weather Mixture of Experts for Robust Multi-Modal 3D Object Detection	Hongwei Lin et.al.	2603.16261	null
2026-03-17	Accelerating Approximate Analytical Join Queries over Unstructured Data with Statistical Guarantees	Yuxuan Zhu et.al.	2603.16153	null
2026-03-16	Confidently Wrong: Why Ignoring Binaries Biases IMF Inference at Large Sample Sizes	Anna L. Rosen et.al.	2603.15779	null
2026-03-16	Mastering the Minority: An Uncertainty-guided Multi-Expert Framework for Challenging-tailed Sequence Learning	Ye Wang et.al.	2603.15708	null
2026-03-16	Bridging Local and Global Knowledge: Cascaded Mixture-of-Experts Learning for Near-Shortest Path Routing	Yung-Fu Chen et.al.	2603.15541	null
2026-03-16	Talk, Evaluate, Diagnose: User-aware Agent Evaluation with Automated Error Analysis	Penny Chong et.al.	2603.15483	null
2026-03-16	A Closer Look into LLMs for Table Understanding	Jia Wang et.al.	2603.15402	null
2026-03-16	MoE-ACT: Scaling Multi-Task Bimanual Manipulation with Sparse Language-Conditioned Mixture-of-Experts Transformers	Kangjun Guo et.al.	2603.15265	null
2026-03-17	Tracking the Discriminative Axis: Dual Prototypes for Test-Time OOD Detection Under Covariate Shift	Wooseok Lee et.al.	2603.15213	null
2026-03-16	ForceVLA2: Unleashing Hybrid Force-Position Control with Force Awareness for Contact-Rich Manipulation	Yang Li et.al.	2603.15169	null
2026-03-16	M2IR: Proactive All-in-One Image Restoration via Mamba-style Modulation and Mixture-of-Experts	Shiwei Wang et.al.	2603.14816	null
2026-03-16	Genetic Algorithms in Regression	Mo Li et.al.	2603.14801	null
2026-03-16	Universe Routing: Why Self-Evolving Agents Need Epistemic Control	Zhaohui Geoffrey Wang et.al.	2603.14799	null
2026-03-15	TopoCL: Topological Contrastive Learning for Medical Imaging	Guangyu Meng et.al.	2603.14647	null
2026-03-15	A measurement of gas rotation in galaxy groups via the kinetic Sunyaev-Zeldovich effect	Tianyi Yang et.al.	2603.14494	null
2026-03-15	Towards One-for-All Anomaly Detection for Tabular Data	Shiyuan Li et.al.	2603.14407	null
2026-03-15	WestWorld: A Knowledge-Encoded Scalable Trajectory World Model for Diverse Robotic Systems	Yuchen Wang et.al.	2603.14392	null
2026-03-15	M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling	Mayank Mishra et.al.	2603.14360	null
2026-03-15	A Physically-Grounded Attack and Adaptive Defense Framework for Real-World Low-Light Image Enhancement	Tongshun Zhang et.al.	2603.14304	null
2026-03-15	All-sky Searches for Continuous Gravitational Waves from Isolated Neutron Stars in the Data from the First Part of the Fourth LIGO-Virgo-KAGRA Observing Run	The LIGO Scientific Collaboration et.al.	2603.14168	null
2026-03-14	PA-Net: Precipitation-Adaptive Mixture-of-Experts for Long-Tail Rainfall Nowcasting	Xinyu Xiao et.al.	2603.13818	null
2026-03-14	Implicit Maximum Likelihood Estimation for Real-time Generative Model Predictive Control	Grayson Lee et.al.	2603.13733	null
2026-03-14	Sparse-Dense Mixture of Experts Adapter for Multi-Modal Tracking	Yabin Zhu et.al.	2603.13719	null
2026-03-13	NCCL EP: Towards a Unified Expert Parallel Communication API for NCCL	Amos Goldman et.al.	2603.13606	null
2026-03-13	MoEKD: Mixture-of-Experts Knowledge Distillation for Robust and High-Performing Compressed Code Models	Md. Abdul Awal et.al.	2603.13213	null
2026-03-13	Reference-Free Image Quality Assessment for Virtual Try-On via Human Feedback	Yuki Hirakawa et.al.	2603.13057	null
2026-03-13	Team RAS in 10th ABAW Competition: Multimodal Valence and Arousal Estimation Approach	Elena Ryumina et.al.	2603.13056	null
2026-03-13	Multimodal Protein Language Models for Enzyme Kinetic Parameters: From Substrate Recognition to Conformational Adaptation	Fei Wang et.al.	2603.12845	null
2026-03-13	Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking	Zizhao Mo et.al.	2603.12831	null
2026-03-13	LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing	Jiawei Hao et.al.	2603.12645	null
2026-03-13	CarPLAN: Context-Adaptive and Robust Planning with Dynamic Scene Awareness for Autonomous Driving	Junyong Yun et.al.	2603.12607	null
2026-03-13	Spectral Dataset of Stripped-Envelope Supernovae from the Tsinghua Supernova Group	Danfeng Xiang et.al.	2603.12604	null
2026-03-13	Expert Pyramid Tuning: Efficient Parameter Fine-Tuning for Expertise-Driven Task Allocation	Jia-Chen Zhang et.al.	2603.12577	null
2026-03-13	Spatio-Semantic Expert Routing Architecture with Mixture-of-Experts for Referring Image Segmentation	Alaa Dalaq et.al.	2603.12538	null
2026-03-12	TaxBreak: Unmasking the Hidden Costs of LLM Inference Through Overhead Decomposition	Prabhu Vellaisamy et.al.	2603.12465	null
2026-03-12	NeuroLoRA: Context-Aware Neuromodulation for Parameter-Efficient Multi-Task Adaptation	Yuxin Yang et.al.	2603.12378	null
2026-03-12	A Two-Stage Dual-Modality Model for Facial Emotional Expression Recognition	Jiajun Sun et.al.	2603.12221	null
2026-03-12	CrossEarth-SAR: A SAR-Centric and Billion-Scale Geospatial Foundation Model for Domain Generalizable Semantic Segmentation	Ziqi Ye et.al.	2603.12008	null
2026-03-12	AdaFuse: Accelerating Dynamic Adapter Inference via Token-Level Pre-Gating and Fused Kernel Optimization	Qiyang Li et.al.	2603.11873	null
2026-03-12	Expert Threshold Routing for Autoregressive Language Modeling with Dynamic Computation Allocation and Load Balancing	Hanchi Sun et.al.	2603.11535	null
2026-03-11	Task-Conditioned Routing Signatures in Sparse Mixture-of-Experts Transformers	Mynampati Sri Ranganadha Avinash et.al.	2603.11114	null
2026-03-11	Sample-and-Search: An Effective Algorithm for Learning-Augmented k-Median Clustering in High dimensions	Kangke Cheng et.al.	2603.10721	null
2026-03-11	UniStitch: Unifying Semantic and Geometric Features for Image Stitching	Yuan Mei et.al.	2603.10568	null
2026-03-11	Optimal Expert-Attention Allocation in Mixture-of-Experts: A Scalable Law for Dynamic Model Design	Junzhuo Li et.al.	2603.10379	null
2026-03-12	The Orthogonal Vulnerabilities of Generative AI Watermarks: A Comparative Empirical Benchmark of Spatial and Latent Provenance	Jesse Yu et.al.	2603.10323	null
2026-03-10	Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions	Mingyang Song et.al.	2603.09938	null
2026-03-10	Quantifying the Necessity of Chain of Thought through Opaque Serial Depth	Jonah Brown-Cohen et.al.	2603.09786	null
2026-03-10	MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants	Zuhao Zhang et.al.	2603.09652	null
2026-03-10	MORE-R1: Guiding LVLM for Multimodal Object-Entity Relation Extraction via Stepwise Reasoning with Reinforcement Learning	Xiang Yuan et.al.	2603.09478	null
2026-03-12	Multi-tasking through quantum annealing	Jargalsaikhan Artag et.al.	2603.09468	null
2026-03-10	Variational Routing: A Scalable Bayesian Framework for Calibrated Mixture-of-Experts Transformers	Albus Yizhuo Li et.al.	2603.09453	null
2026-03-10	Exploring Modality-Aware Fusion and Decoupled Temporal Propagation for Multi-Modal Object Tracking	Shilei Wang et.al.	2603.09287	null
2026-03-10	Acoustic and Semantic Modeling of Emotion in Spoken Language	Soumya Dutta et.al.	2603.09212	null
2026-03-10	GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models	Md Selim Sarowar et.al.	2603.09079	null
2026-03-09	The $qs$ Inequality: Quantifying the Double Penalty of Mixture-of-Experts at Inference	Vignesh Adhinarayanan et.al.	2603.08960	null
2026-03-09	ConFu: Contemplate the Future for Better Speculative Sampling	Zongyue Qin et.al.	2603.08899	null
2026-03-09	Microwave response of electrically driven spins in a three-qubit quantum processor	Tanner M. Janda et.al.	2603.08577	null
2026-03-09	LAR-MoE: Latent-Aligned Routing for Mixture of Experts in Robotic Imitation Learning	Ariel Rodriguez et.al.	2603.08476	null
2026-03-09	Amplitude Analysis of Singly Cabibbo-Suppressed Decay $Λ^{+}_{c}\to p K^{+} K^{-}$	BESIII Collaboration et.al.	2603.08469	null
2026-03-09	IronEngine: Towards General AI Assistant	Xi Mo et.al.	2603.08425	null
2026-03-09	Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows	Shentong Mo et.al.	2603.08126	null
2026-03-09	An improved measurement of $η^\prime\rightarrow e^{+}e^{-}ω$	BESIII Collaboration et.al.	2603.08120	null
2026-03-09	SAMoE-VLA: A Scene Adaptive Mixture-of-Experts Vision-Language-Action Model for Autonomous Driving	Zihan You et.al.	2603.08113	null
2026-03-09	Deterministic Differentiable Structured Pruning for Large Language Models	Weiyu Huang et.al.	2603.08065	null
2026-03-09	Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization	Jingwei Li et.al.	2603.08022	null
2026-03-09	Scaling Machine Learning Interatomic Potentials with Mixtures of Experts	Yuzhi Liu et.al.	2603.07977	null
2026-03-09	Structural Design and Performance Analysis of Laser Transmitting Telescope for Space Gravitational Wave Detection	Long Yongtao et.al.	2603.07967	null
2026-03-09	SGG-R$^{\rm 3}$: From Next-Token Prediction to End-to-End Unbiased Scene Graph Generation	Jiaye Feng et.al.	2603.07961	null
2026-03-09	SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans	Hansi Zeng et.al.	2603.07853	null
2026-03-08	Scalable Training of Mixture-of-Experts Models with Megatron Core	Zijie Yan et.al.	2603.07685	null
2026-03-08	AtomicVLA: Unlocking the Potential of Atomic Skill Learning in Robots	Likui Zhang et.al.	2603.07648	null
2026-03-08	Mixed Effects Mixture of Experts: Modeling Double Heterogeneous Trajectories	Xinkai Yue et.al.	2603.07479	null
2026-03-08	UnSCAR: Universal, Scalable, Controllable, and Adaptable Image Restoration	Debabrata Mandal et.al.	2603.07406	null
2026-03-07	Scheduling Parallel Optical Circuit Switches for AI Training	Kevin Liang et.al.	2603.07373	null
2026-03-07	Mozart: Modularized and Efficient MoE Training on 3.5D Wafer-Scale Chiplet Architectures	Shuqing Luo et.al.	2603.07006	null
2026-03-06	Swimba: Switch Mamba Model Scales State Space Models	Zhixu Du et.al.	2603.06938	null
2026-03-06	PaQ-DETR: Learning Pattern and Quality-Aware Dynamic Queries for Object Detection	Zhengjian Kang et.al.	2603.06917	null
2026-03-06	RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering	Gaia A. Bertolino et.al.	2603.06542	null
2026-03-06	A Mixture-of-Experts Framework for Practical Hybrid-Quantum Models in Credit Card Fraud Detection	Rodrigo Chaves et.al.	2603.06473	null
2026-03-06	MoEMambaMIL: Structure-Aware Selective State Space Modeling for Whole-Slide Image Analysis	Dongqing Xie et.al.	2603.06378	null
2026-03-06	MoEless: Efficient MoE LLM Serving via Serverless Computing	Hanfei Yu et.al.	2603.06350	null
2026-03-06	WMoE-CLIP: Wavelet-Enhanced Mixture-of-Experts Prompt Learning for Zero-Shot Anomaly Detection	Peng Chen et.al.	2603.06313	null
2026-03-06	GazeMoE: Perception of Gaze Target with Mixture-of-Experts	Zhuangzhuang Dai et.al.	2603.06256	null
2026-03-06	EvoESAP: Non-Uniform Expert Pruning for Sparse MoE	Zongfang Liu et.al.	2603.06003	null
2026-03-06	MoE Lens – An Expert Is All You Need	Marmik Chaudhari et.al.	2603.05806	null
2026-03-06	Sparse Crosscoders for diffing MoEs and Dense models	Marmik Chaudhari et.al.	2603.05805	null
2026-03-05	Change Point Detection for Cell Populations Measured via Flow Cytometry	Yik Lun Kei et.al.	2603.05700	null
2026-03-05	FreeTxt-Vi: A Benchmarked Vietnamese-English Toolkit for Segmentation, Sentiment, and Summarisation	Hung Nguyen Huy et.al.	2603.05690	null
2026-03-05	Multi-channel joint analysis of the exotic charmonium-like state $T_{c\bar{c}}(4020)$	BESIII Collaboration et.al.	2603.05564	null
2026-03-05	VietJobs: A Vietnamese Job Advertisement Dataset	Hieu Pham Dinh et.al.	2603.05262	null
2026-03-05	NeuronMoE: Neuron-Guided Mixture-of-Experts for Efficient Multilingual LLM Extension	Rongzhi Li et.al.	2603.05046	null
2026-03-05	Mixture of Universal Experts: Scaling Virtual Width via Depth-Width Transformation	Yilong Chen et.al.	2603.04971	null
2026-03-05	Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling	Yong Liu et.al.	2603.04791	null
2026-03-05	TSEmbed: Unlocking Task Scaling in Universal Multimodal Embeddings	Yebo Wu et.al.	2603.04772	null
2026-03-04	ECG-MoE: Mixture-of-Expert Electrocardiogram Foundation Model	Yuhao Xu et.al.	2603.04589	null
2026-03-04	Augmenting representations with scientific papers	Nicolò Oreste Pinciroli Vago et.al.	2603.04516	null
2026-03-04	RANGER: Sparsely-Gated Mixture-of-Experts with Adaptive Retrieval Re-ranking for Pathology Report Generation	Yixin Chen et.al.	2603.04348	null
2026-03-04	CAMMSR: Category-Guided Attentive Mixture of Experts for Multimodal Sequential Recommendation	Jinfeng Xu et.al.	2603.04320	null
2026-03-04	Precise measurement of the form factors in $D^0\rightarrow K^*(892)^-\ell^+ν_{\ell}$ and observation of $D^0\rightarrow K_2^*(1430)^-\ell^+ν_{\ell}$	BESIII Collaboration et.al.	2603.04136	null
2026-03-04	UniRain: Unified Image Deraining with RAG-based Dataset Distillation and Multi-objective Reweighted Optimization	Qianfeng Yang et.al.	2603.03967	null
2026-03-04	Glass Segmentation with Fusion of Learned and General Visual Features	Risto Ojala et.al.	2603.03718	null
2026-03-04	Plasmonic polaron in self-intercalated 1T-TiS2	Byoung Ki Choi et.al.	2603.03663	null
2026-03-03	Modeling Cross-vision Synergy for Unified Large Vision Model	Shengqiong Wu et.al.	2603.03564	null
2026-03-03	Beyond Language Modeling: An Exploration of Multimodal Pretraining	Shengbang Tong et.al.	2603.03276	null
2026-03-03	Search for a massless particle beyond the Standard Model in the $Ξ^0\toΛ+ \text{invisible}$ decay	BESIII Collaboration et.al.	2603.03199	null
2026-03-04	MoECLIP: Patch-Specialized Experts for Zero-shot Anomaly Detection	Jun Yeong Park et.al.	2603.03101	null
2026-03-03	CMoE: Contrastive Mixture of Experts for Motion Control and Terrain Adaptation of Humanoid Robots	Shihao Ma et.al.	2603.03067	null
2026-03-03	EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education	Baoliang Chen et.al.	2603.03066	null
2026-03-03	Practical FP4 Training for Large-Scale MoE Models on Hopper GPUs	Wuyue Zhang et.al.	2603.02731	null
2026-03-03	TenExp: Mixture-of-Experts-Based Tensor Decomposition Structure Search Framework	Ting-Wei Zhou et.al.	2603.02720	null
2026-03-03	MiM-DiT: MoE in MoE with Diffusion Transformers for All-in-One Image Restoration	Lingshun Kong et.al.	2603.02710	null
2026-03-03	Addressing Missing and Noisy Modalities in One Solution: Unified Modality-Quality Framework for Low-quality Multimodal Data	Sijie Mai et.al.	2603.02695	null
2026-03-03	Robust Heterogeneous Analog-Digital Computing for Mixture-of-Experts Models with Theoretical Generalization Guarantees	Mohammed Nowaz Rabbani Chowdhury et.al.	2603.02633	null
2026-03-02	Search for the charmonium weak decay $ψ(2S)\to D_s^-π^+ + c.c.$ and $ψ(2S)\to D_s^-ρ^+ + c.c.$	BESIII Collaboration et.al.	2603.01777	null
2026-03-02	DynaMoE: Dynamic Token-Level Expert Activation with Layer-Wise Adaptive Capacity for Mixture-of-Experts Neural Networks	Gökdeniz Gülmez et.al.	2603.01697	null
2026-03-02	PathMoE: Interpretable Multimodal Interaction Experts for Pediatric Brain Tumor Classification	Jian Yu et.al.	2603.01547	null
2026-03-02	Multimodal Mixture-of-Experts with Retrieval Augmentation for Protein Active Site Identification	Jiayang Wu et.al.	2603.01511	null
2026-03-02	DOCFORGE-BENCH: A Comprehensive Benchmark for Document Forgery Detection and Analysis	Zengqi Zhao et.al.	2603.01433	null
2026-03-03	UETrack: A Unified and Efficient Framework for Single Object Tracking	Ben Kang et.al.	2603.01412	null
2026-03-02	Fed-GAME: Personalized Federated Learning with Graph Attention Mixture-of-Experts For Time-Series Forecasting	Yi Li et.al.	2603.01363	null
2026-03-01	Truth as a Trajectory: What Internal Representations Reveal About Large Language Model Reasoning	Hamed Damirchi et.al.	2603.01326	null
2026-03-01	Fast Confidence-Aware Human Prediction via Hardware-accelerated Bayesian Inference for Safe Robot Navigation	Michael Lu et.al.	2603.01122	null
2026-03-01	TriMoE: Augmenting GPU with AMX-Enabled CPU and DIMM-NDP for High-Throughput MoE Inference via Offloading	Yudong Pan et.al.	2603.01058	null
2026-03-01	Dr.Occ: Depth- and Region-Guided 3D Occupancy from Surround-View Cameras for Autonomous Driving	Xubo Zhu et.al.	2603.01007	null
2026-02-28	MME: Mixture of Mesh Experts with Random Walk Transformer Gating	Amir Belder et.al.	2603.00828	null
2026-02-28	First Amplitude Analysis of $D^0\rightarrow K^-π^0e^+ν_e$ and Observation of $D^0\rightarrow K^*_2(1430)^-e^+ν_e$	BESIII Collaboration et.al.	2603.00743	null
2026-02-28	K^2-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control	Zhe Wu et.al.	2603.00676	null
2026-02-28	Precise Measurement and Control of Radon Progeny on Detector Surfaces	C. B. Z. Luo et.al.	2603.00647	null
2026-02-28	CoMoL: Efficient Mixture of LoRA Experts via Dynamic Core Space Merging	Jie Cao et.al.	2603.00573	null
2026-02-27	CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning	Yuxuan Liu et.al.	2602.24142	null
2026-02-27	Precision Studies and Searches for CP Asymmetries in the Inclusive Decay $Λ_{c}^{+}\to ΛX$	BESIII Collaboration et.al.	2602.24089	null
2026-02-27	Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization	Chenwei Jia et.al.	2602.24059	null
2026-02-27	Measurement of Born Cross Sections for $e^+e^-\toΣ^-\barΣ^+$ at $\sqrt{s}=3.51-4.95$ GeV and Observation of $ψ(3770)\toΣ^-\barΣ^+$	BESIII Collaboration et.al.	2602.23835	null
2026-02-27	ProductResearch: Training E-Commerce Deep Research Agents via Multi-Agent Synthetic Trajectory Distillation	Jiangyuan Wang et.al.	2602.23716	null
2026-02-26	Brain-OF: An Omnifunctional Foundation Model for fMRI, EEG and MEG	Hanning Guo et.al.	2602.23410	null
2026-02-26	A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations	Soumya Dutta et.al.	2602.23300	null
2026-02-26	Learning Physical Operators using Neural Operators	Vignesh Gopakumar et.al.	2602.23113	null
2026-02-26	Residual Koopman Spectral Profiling for Predicting and Preventing Transformer Training Instability	Bum Jun Kim et.al.	2602.22988	null
2026-02-26	pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation	Shentong Mo et.al.	2602.22938	null
2026-02-26	MEDNA-DFM: A Dual-View FiLM-MoE Model for Explainable DNA Methylation Prediction	Yi He et.al.	2602.22850	null
2026-02-26	DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation	Hao Zheng et.al.	2602.22839	null
2026-02-26	Productivity and Collaboration in Hybrid Agile Teams: An Interview Study	Elisabeth Mo et.al.	2602.22835	null
2026-02-26	Measurements of branching fractions of $Λ_{c}^{+}\toΣ^{0}K_{S}^{0}π^{+}$ and $Λ_{c}^{+}\toΣ^{0}K_{S}^{0}K^{+}$	BESIII Collaboration et.al.	2602.22754	null
2026-02-26	IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation	Yanpei Guo et.al.	2602.22700	null
2026-02-26	Switch-Hurdle: A MoE Encoder with AR Hurdle Decoder for Intermittent Demand Forecasting	Fabian Muşat et.al.	2602.22685	null
2026-02-26	Accelerating LLM Pre-Training through Flat-Direction Dynamics Enhancement	Shuchen Zhu et.al.	2602.22681	null
2026-02-26	Predictive variational inference for flexible regression models	Lucas Kock et.al.	2602.22582	null
2026-02-26	Towards Dynamic Dense Retrieval with Routing Strategy	Zhan Su et.al.	2602.22547	null
2026-02-25	NESTOR: A Nested MOE-based Neural Operator for Large-Scale PDE Pre-Training	Dengdi Sun et.al.	2602.22059	null
2026-02-25	Excitation: Momentum For Experts	Sagi Shaier et.al.	2602.21798	null
2026-02-25	Learning from Yesterday’s Error: An Efficient Online Learning Method for Traffic Demand Prediction	Xiannan Huang et.al.	2602.21757	null
2026-02-25	TiMi: Empower Time Series Transformers with Multimodal Mixture of Experts	Jiafeng Lin et.al.	2602.21693	null
2026-02-25	Multi-Layer Scheduling for MoE-Based LLM Reasoning	Yifan Sun et.al.	2602.21626	null
2026-02-24	A Path to an All-Sky Survey with Roman	Jiwon Jesse Han et.al.	2602.21280	null
2026-02-24	On infinite sets with no $3$ on a line	Moe Putterman et.al.	2602.21275	null
2026-02-24	ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments	Haley Li et.al.	2602.21140	null
2026-02-24	MUSE: Harnessing Precise and Diverse Semantics for Few-Shot Whole Slide Image Classification	Jiahao Xu et.al.	2602.20873	null
2026-02-25	GeCo-SRT: Geometry-aware Continual Adaptation for Robotic Cross-Task Sim-to-Real Transfer	Wenbo Yu et.al.	2602.20871	null
2026-02-24	Multi-time Loewner energy: rate function for large deviation	Mo Chen et.al.	2602.20642	null
2026-02-24	Precise Measurement of Matter-Antimatter Asymmetry with Entangled Hyperon Antihyperon Pairs	BESIII Collaboration et.al.	2602.20524	null
2026-02-24	Search for Light-Mass Fractionally Charged Particles in Space with DAMPE Experiment	F. Alemanno et.al.	2602.20519	null
2026-02-24	Wireless Federated Multi-Task LLM Fine-Tuning via Sparse-and-Orthogonal LoRA	Nuocheng Yang et.al.	2602.20492	null
2026-02-23	Learning Discriminative and Generalizable Anomaly Detector for Dynamic Graph with Limited Supervision	Yuxing Tian et.al.	2602.20019	null
2026-02-23	Counterfactual Understanding via Retrieval-aware Multimodal Modeling for Time-to-Event Survival Prediction	Ha-Anh Hoang Nguyen et.al.	2602.19987	null
2026-02-23	ReAttn: Improving Attention-based Re-ranking via Attention Re-weighting	Yuxing Tian et.al.	2602.19969	null
2026-02-23	A Replicate-and-Quantize Strategy for Plug-and-Play Load Balancing of Sparse Mixture-of-Experts LLMs	Zijie Liu et.al.	2602.19938	null
2026-02-23	Towards Dexterous Embodied Manipulation via Deep Multi-Sensory Fusion and Sparse Expert Scaling	Yirui Sun et.al.	2602.19764	null
2026-02-23	Multimodal Dataset Distillation Made Simple by Prototype-Guided Data Synthesis	Junhyeok Choi et.al.	2602.19756	null
2026-02-23	RAID: Retrieval-Augmented Anomaly Detection	Mingxiu Cai et.al.	2602.19611	null
2026-02-23	EMS-FL: Federated Tuning of Mixture-of-Experts in Satellite-Terrestrial Networks via Expert-Driven Model Splitting	Angzi Xu et.al.	2602.19485	null
2026-02-22	RegionRoute: Regional Style Transfer with Diffusion Model	Bowen Chen et.al.	2602.19254	null
2026-02-22	Robust Exploration in Directed Controller Synthesis via Reinforcement Learning with Soft Mixture-of-Experts	Toshihide Ubukata et.al.	2602.19244	null
2026-02-22	SegMoTE: Token-Level Mixture of Experts for Medical Image Segmentation	Yujie Lu et.al.	2602.19213	null
2026-02-22	JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation	Kai Liu et.al.	2602.19163	null
2026-02-22	K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model	Shiyi Cao et.al.	2602.19128	null
2026-02-22	Routing-Aware Explanations for Mixture of Experts Graph Models in Malware Detection	Hossein Shokouhinejad et.al.	2602.19025	null
2026-02-21	NeuroWise: A Multi-Agent LLM “Glass-Box” System for Practicing Double-Empathy Communication with Autistic Partners	Albert Tang et.al.	2602.18962	null
2026-02-21	Give Users the Wheel: Towards Promptable Recommendation Paradigm	Fuyuan Lyu et.al.	2602.18929	null
2026-02-21	Diverse properties of electron Forbush decreases revealed by the Dark Matter Particle Explorer	F. Alemanno et.al.	2602.18743	null
2026-02-21	Comprehensive measurement of $η^\prime$ photoproduction off the proton at $E_γ< 2.4$ $\mathrm{GeV}$	N. Muramatsu et.al.	2602.18675	null
2026-02-20	Going Down Memory Lane: Scaling Tokens for Video Stream Understanding with Dynamic KV-Cache Memory	Vatsal Agarwal et.al.	2602.18434	null
2026-02-20	RamanSeg: Interpretability-driven Deep Learning on Raman Spectra for Cancer Diagnosis	Chris Tomy et.al.	2602.18119	null
2026-02-20	DeepSVU: Towards In-depth Security-oriented Video Understanding via Unified Physical-world Regularized MoE	Yujie Jin et.al.	2602.18019	null
2026-02-19	Grassmannian Mixture-of-Experts: Concentration-Controlled Routing on Subspace Manifolds	Ibne Farabi Shihab et.al.	2602.17798	null
2026-02-19	Phase-Aware Mixture of Experts for Agentic Reinforcement Learning	Shengtian Yang et.al.	2602.17038	null
2026-02-19	Arcee Trinity Large Technical Report	Varun Singh et.al.	2602.17004	null
2026-02-19	Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation	Yan Wang et.al.	2602.16990	null
2026-02-18	Claim Automation using Large Language Model	Zhengda Mo et.al.	2602.16836	null
2026-02-18	Efficient Tail-Aware Generative Optimization via Flow Model Fine-Tuning	Zifan Wang et.al.	2602.16796	null
2026-02-18	Geometric Neural Operators via Lie Group-Constrained Latent Dynamics	Jiaquan Zhang et.al.	2602.16209	null
2026-02-18	OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis	Tianwei Lin et.al.	2602.16110	null
2026-02-18	Federated Graph AGI for Cross-Border Insider Threat Intelligence in Government Financial Schemes	Srikumar Nayak et.al.	2602.16109	null
2026-02-17	MoE-Spec: Expert Budgeting for Efficient Speculative Decoding	Bradley McDanel et.al.	2602.16052	null
2026-02-17	ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns	Ziyu Zhao et.al.	2602.15521	null
2026-02-17	GMAIL: Generative Modality Alignment for generated Image Learning	Shentong Mo et.al.	2602.15368	null
2026-02-16	Mixture-of-Experts under Finite-Rate Gating: Communication–Generalization Trade-offs	Ali Khalesi et.al.	2602.15091	null
2026-02-13	RynnBrain: Open Embodied Foundation Models	Ronghao Dang et.al.	2602.14979	null
2026-02-16	Topological and arithmetic characteristics about products of projective lines with complex tori	Jia-Li Mo et.al.	2602.14745	null
2026-02-16	DriveFine: Refining-Augmented Masked Diffusion VLA for Precise and Robust Driving	Chenxu Dang et.al.	2602.14577	null
2026-02-15	DeepFusion: Accelerating MoE Training via Federated Knowledge Distillation from Heterogeneous Edge Devices	Songyuan Li et.al.	2602.14301	null
2026-02-15	MILD: Multi-Intent Learning and Disambiguation for Proactive Failure Prediction in Intent-based Networking	Md. Kamrul Hossain et.al.	2602.14283	null
2026-02-15	Multi-Agent Debate: A Unified Agentic Framework for Tabular Anomaly Detection	Pinqiao Wang et.al.	2602.14251	null
2026-02-15	Fast Catch-Up, Late Switching: Optimal Batch Size Scheduling via Functional Scaling Laws	Jinbo Wang et.al.	2602.14208	null
2026-02-15	Synergistic Intra- and Cross-Layer Regularization Losses for MoE Expert Specialization	Rizhen Hu et.al.	2602.14159	null
2026-02-15	REAL: Resolving Knowledge Conflicts in Knowledge-Intensive Visual Question Answering via Reasoning-Pivot Alignment	Kai Ye et.al.	2602.14065	null
2026-02-15	LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts	Yang Liu et.al.	2602.14060	null
2026-02-15	Geometry-Preserving Aggregation for Mixture-of-Experts Embedding Models	Sajjad Kachuee et.al.	2602.14039	null
2026-02-15	Eureka-Audio: Triggering Audio Intelligence in Compact Language Models	Dan Zhang et.al.	2602.13954	null
2026-02-14	Assessing Cybersecurity Risks and Traffic Impact in Connected Autonomous Vehicles	Saurav Silwal et.al.	2602.13898	null
2026-02-14	Mixture-of-experts Wishart model for covariance matrices with an application to Cancer drug screening	The Tien Mai et.al.	2602.13888	null
2026-02-13	Dyad: a binary-star dynamics and statistics library for Python	Amery Gration et.al.	2602.13388	null
2026-02-13	Improved measurements of the coherence factors and strong-phase differences in $D\to K^-π^+π^+π^-$ and $D\to K^-π^+π^0$ with quantum-correlated $D\bar{D}$ decays	BESIII Collaboration et.al.	2602.13002	null
2026-02-13	Aspect-Based Sentiment Analysis for Future Tourism Experiences: A BERT-MoE Framework for Persian User Reviews	Hamidreza Kazemi Taskooh et.al.	2602.12778	null
2026-02-13	Mixture of Predefined Experts: Maximizing Data Usage on Vertical Federated Learning	Jon Irureta et.al.	2602.12708	null
2026-02-13	Multi-Head Attention as a Source of Catastrophic Forgetting in MoE Transformers	Anrui Chen et.al.	2602.12587	null
2026-02-13	SD-MoE: Spectral Decomposition for Effective Expert Specialization	Ruijun Huang et.al.	2602.12556	null
2026-02-13	Decoder-only Conformer with Modality-aware Sparse Mixtures of Experts for ASR	Jaeyoung Lee et.al.	2602.12546	null
2026-02-12	Query-focused and Memory-aware Reranker for Long Context Processing	Yuqing Li et.al.	2602.12192	null
2026-02-12	Measurement of the singly Cabibbo-suppressed decay $Λ_c^+\to pη'$ with Deep Learning	BESIII Collaboration et.al.	2602.11974	null
2026-02-12	Extending Puzzle for Mixture-of-Experts Reasoning Models with Application to GPT-OSS Acceleration	Akhiad Bercovich et.al.	2602.11937	null
2026-02-12	Deep Kernel Fusion for Transformers	Zixi Zhang et.al.	2602.11808	null
2026-02-12	LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training	Xinyi Liu et.al.	2602.11686	null
2026-02-12	Evolutionary Router Feature Generation for Zero-Shot Graph Anomaly Detection with Mixture-of-Experts	Haiyang Jiang et.al.	2602.11622	null
2026-02-12	Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm	Jinrui Zhang et.al.	2602.11543	null
2026-02-12	Adaptive Milestone Reward for GUI Agents	Congmin Zheng et.al.	2602.11524	null
2026-02-12	Observation of a New Excited $Σ$ State in $ψ(3686)\to\bar{p}K^+Σ^0+c.c.$	BESIII Collaboration et.al.	2602.11501	null
2026-02-11	Charting Empirical Laws for LLM Fine-Tuning in Scientific Multi-Discipline Learning	Lintao Wang et.al.	2602.11215	null
2026-02-11	MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs	Yupu Gu et.al.	2602.10965	null
2026-02-11	CMAD: Cooperative Multi-Agent Diffusion via Stochastic Optimal Control	Riccardo Barbano et.al.	2602.10933	null
2026-02-11	VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training	Guobin Shen et.al.	2602.10693	null
2026-02-11	Multimodal Priors-Augmented Text-Driven 3D Human-Object Interaction Generation	Yin Wang et.al.	2602.10659	null
2026-02-11	A Vision-Language Foundation Model for Zero-shot Clinical Collaboration and Automated Concept Discovery in Dermatology	Siyuan Yan et.al.	2602.10624	null
2026-02-11	Supercharging Packet-level Network Simulation of Large Model Training via Memoization and Fast-Forwarding	Fei Long et.al.	2602.10615	null
2026-02-11	Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters	Ailin Huang et.al.	2602.10604	null
2026-02-11	Neural Additive Experts: Context-Gated Experts for Controllable Model Additivity	Guangzhi Xiong et.al.	2602.10585	null
2026-02-12	3DXTalker: Unifying Identity, Lip Sync, Emotion, and Spatial Dynamics in Expressive 3D Talking Avatars	Zhongju Wang et.al.	2602.10516	null
2026-02-10	Area-Efficient In-Memory Computing for Mixture-of-Experts via Multiplexing and Caching	Hanyuan Gao et.al.	2602.10254	null
2026-02-10	TDE 2025abcr: A Tidal Disruption Event in the Outskirts of a Massive Galaxy	Robert Stein et.al.	2602.10180	null
2026-02-10	MalMoE: Mixture-of-Experts Enhanced Encrypted Malicious Traffic Detection Under Graph Drift	Yunpeng Tan et.al.	2602.10157	null
2026-02-10	Diverse Skill Discovery for Quadruped Robots via Unsupervised Learning	Ruopeng Cui et.al.	2602.09767	null
2026-02-10	Revealing the Challenges of Attention-FFN Disaggregation for Modern MoE Models and Hardware Systems	Guowei Liu et.al.	2602.09721	null
2026-02-10	First observation of the $η_{c}\toΞ^{0} \barΞ^{0}$ decay	BESIII Collaboration et.al.	2602.09652	null
2026-02-10	DR.Experts: Differential Refinement of Distortion-Aware Experts for Blind Image Quality Assessment	Bohan Fu et.al.	2602.09531	null
2026-02-10	SMES: Towards Scalable Multi-Task Recommendation via Expert Sparsity	Yukun Zhang et.al.	2602.09386	null
2026-02-10	Effective MoE-based LLM Compression by Exploiting Heterogeneous Inter-Group Experts Routing Frequency and Information Density	Zhendong Mi et.al.	2602.09316	null
2026-02-09	Generalizing GNNs with Tokenized Mixture of Experts	Xiaoguang Guo et.al.	2602.09258	null
2026-02-09	UI-Venus-1.5 Technical Report	Venus-Team et.al.	2602.09082	null
2026-02-09	DirMoE: Dirichlet-routed Mixture of Experts	Amirhossein Vahidi et.al.	2602.09001	null
2026-02-09	OmniReview: A Large-scale Benchmark and LLM-enhanced Framework for Realistic Reviewer Recommendation	Yehua Huang et.al.	2602.08896	null
2026-02-09	FlexMoRE: A Flexible Mixture of Rank-heterogeneous Experts for Efficient Federatedly-trained Large Language Models	Annemette Brok Pirchert et.al.	2602.08818	null
2026-02-10	MOVA: Towards Scalable and Synchronized Video-Audio Generation	SII-OpenMOSS Team et.al.	2602.08794	null
2026-02-10	Redundancy-Free View Alignment for Multimodal Human Activity Recognition with Arbitrarily Missing Views	Duc-Anh Nguyen et.al.	2602.08755	null
2026-02-09	Large Language Lobotomy: Jailbreaking Mixture-of-Experts via Expert Silencing	Jona te Lintelo et.al.	2602.08741	null
2026-02-09	6G-Bench: An Open Benchmark for Semantic Communication and Network-Level Reasoning with Foundation Models in AI-Native 6G Networks	Mohamed Amine Ferrag et.al.	2602.08675	null
2026-02-10	Fundamental Reasoning Paradigms Induce Out-of-Domain Generalization in Language Models	Mingzi Cao et.al.	2602.08658	null
2026-02-09	Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs	Yukun Jiang et.al.	2602.08621	null
2026-02-09	Giant Magnetocaloric Effect in a High-Spin Shastry-Sutherland Dipolar Magnet	Jianjian Gong et.al.	2602.08497	null
2026-02-09	TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration	Linye Wei et.al.	2602.08404	null
2026-02-09	Tighnari v2: Mitigating Label Noise and Distribution Shift in Multimodal Plant Distribution Prediction via Mixture of Experts and Weakly Supervised Learning	Haixu Liu et.al.	2602.08282	null
2026-02-09	Large Language Models in Peer-Run Community Behavioral Health Services: Understanding Peer Specialists and Service Users’ Perspectives on Opportunities, Risks, and Mitigation Strategies	Cindy Peng et.al.	2602.08187	null
2026-02-08	Multimodal normative modeling in Alzheimers Disease with introspective variational autoencoders	Sayantan Kumar et.al.	2602.08077	null
2026-02-08	Efficient and Adaptable Detection of Malicious LLM Prompts via Bootstrap Aggregation	Shayan Ali Hassan et.al.	2602.08062	null
2026-02-08	Enhanced Mixture 3D CGAN for Completion and Generation of 3D Objects	Yahia Hamdi et.al.	2602.08046	null
2026-02-08	The Rise of Sparse Mixture-of-Experts: A Survey from Algorithmic Foundations to Decentralized Architectures and Vertical Domain Applications	Dong Pan et.al.	2602.08019	null
2026-02-08	Fast Model Selection and Stable Optimization for Softmax-Gated Multinomial-Logistic Mixture of Experts Models	TrungKhang Tran et.al.	2602.07997	null
2026-02-08	Thinking in Structures: Evaluating Spatial Intelligence through Reasoning on Constrained Manifolds	Chen Yang et.al.	2602.07864	null
2026-02-07	SERE: Similarity-based Expert Re-routing for Efficient Batch Decoding in MoE Models	Juntong Wu et.al.	2602.07616	null
2026-02-06	DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos	Shenyuan Gao et.al.	2602.06949	null
2026-02-06	Parameters as Experts: Adapting Vision Models with Dynamic Parameter Routing	Meng Lou et.al.	2602.06862	null
2026-02-06	POP: Online Structural Pruning Enables Efficient Inference of Large Foundation Models	Yi Chen et.al.	2602.06822	null
2026-02-06	SaDiT: Efficient Protein Backbone Design via Latent Structural Tokenization and Diffusion Transformers	Shentong Mo et.al.	2602.06706	null
2026-02-06	Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making	Baichuan-M3 Team et.al.	2602.06570	null
2026-02-06	TokenMixer-Large: Scaling Up Large Ranking Models in Industrial Recommenders	Yuchen Jiang et.al.	2602.06563	null
2026-02-06	HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction	Shengxuan Qiu et.al.	2602.06527	null
2026-02-05	GRP-Obliteration: Unaligning LLMs With a Single Unlabeled Prompt	Mark Russinovich et.al.	2602.06258	null
2026-02-05	To 2:4 Sparsity and Beyond: Neuron-level Activation Function to Accelerate LLM Pre-Training	Meghana Madhyastha et.al.	2602.06183	null
2026-02-05	MoSE: Mixture of Slimmable Experts for Efficient and Adaptive Language Models	Nurbek Tastan et.al.	2602.06154	null
2026-02-05	OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale	Jingze Shi et.al.	2602.05711	null
2026-02-05	Hidden simplicity in AdS spinning Mellin amplitudes via scaffolding	Song He et.al.	2602.05568	null
2026-02-05	M$^2$-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining	Rui Lv et.al.	2602.05429	null
2026-02-05	Mergers Drive Structural Complexity but Not Starbursts in Lyman-$α$ Emitters at $3 < z < 4$: A JWST Spatially Resolved View	Qi Song et.al.	2602.05411	null
2026-02-05	Decision-Focused Sequential Experimental Design: A Directional Uncertainty-Guided Approach	Beichen Wan et.al.	2602.05340	null
2026-02-05	Surgery: Mitigating Harmful Fine-Tuning for Large Language Models via Attention Sink	Guozhi Liu et.al.	2602.05228	null
2026-02-04	Rule-Based Spatial Mixture-of-Experts U-Net for Explainable Edge Detection	Bharadwaj Dogga et.al.	2602.05100	null
2026-02-04	Multi-Head LatentMoE and Head Parallel: Communication-Efficient and Deterministic MoE Parallelism	Chenwei Cui et.al.	2602.04870	null
2026-02-04	PDF-HR: Pose Distance Fields for Humanoid Robots	Yi Gu et.al.	2602.04851	null
2026-02-04	ERNIE 5.0 Technical Report	Haifeng Wang et.al.	2602.04705	null
2026-02-04	Let Experts Feel Uncertainty: A Multi-Expert Label Distribution Approach to Probabilistic Time Series Forecasting	Zhen Zhou et.al.	2602.04678	null
2026-02-04	RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models	Jiacheng Liang et.al.	2602.04448	null
2026-02-04	Mixture of Masters: Sparse Chess Language Models with Player Routing	Giacomo Frisoni et.al.	2602.04447	null
2026-02-04	Study of $\barΛ$-$p$ Annihilation into Light Mesons	BESIII Collaboration et.al.	2602.04276	null
2026-02-04	Universal Quantized Berry-Dipole Flat Bands	Qingyang Mo et.al.	2602.04194	null
2026-02-04	OMG-Agent: Toward Robust Missing Modality Generation with Decoupled Coarse-to-Fine Agentic Workflows	Ruiting Dai et.al.	2602.04144	null
2026-02-04	Expert Selections In MoE Models Reveal (Almost) As Much As Text	Amir Nuriyev et.al.	2602.04105	null
2026-02-03	SpecMD: A Comprehensive Study On Speculative Expert Prefetching	Duc Hoang et.al.	2602.03921	null
2026-02-03	UniGeM: Unifying Data Mixing and Selection via Geometric Exploration and Mining	Changhao Wang et.al.	2602.03772	null
2026-02-03	HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing	Yizhao Gao et.al.	2602.03560	null
2026-02-03	DALI: A Workload-Aware Offloading Framework for Efficient MoE Inference on Local PCs	Zeyu Zhu et.al.	2602.03495	null
2026-02-03	Scaling Continual Learning with Bi-Level Routing Mixture-of-Experts	Meng Lou et.al.	2602.03473	null
2026-02-03	VIRAL: Visual In-Context Reasoning via Analogy in Diffusion Transformers	Zhiwen Li et.al.	2602.03210	null
2026-02-03	Sparsity is Combinatorial Depth: Quantifying MoE Expressivity via Tropical Geometry	Ye Su et.al.	2602.03204	null
2026-02-03	Aligning Forest and Trees in Images and Long Captions for Visually Grounded Understanding	Byeongju Woo et.al.	2602.02977	null
2026-02-02	Decision-Focused Optimal Transport	Suhan Liu et.al.	2602.02800	null
2026-02-02	Loss mechanisms of microwave frequency acoustic waves in thin film lithium niobate	Qixuan Lin et.al.	2602.02797	null
2026-02-02	SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning	Qifan Yu et.al.	2602.02472	null
2026-02-02	Certain Head, Uncertain Tail: Expert-Sample for Test-Time Scaling in Fine-Grained MoE	Yuanteng Chen et.al.	2602.02443	null
2026-02-02	DFKI-Speech System for WildSpoof Challenge: A robust framework for SASV In-the-Wild	Arnab Das et.al.	2602.02286	null
2026-02-02	MoLF: Mixture-of-Latent-Flow for Pan-Cancer Spatial Gene Expression Prediction from Histology	Susu Hu et.al.	2602.02282	null
2026-02-02	Kimi K2.5: Visual Agentic Intelligence	Kimi Team et.al.	2602.02276	null
2026-02-02	vLLM-Omni: Fully Disaggregated Serving for Any-to-Any Multimodal Models	Peiqi Yin et.al.	2602.02204	null
2026-02-02	No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs	Liyan Xu et.al.	2602.02103	null
2026-02-02	Edge-Aligned Initialization of Kernels for Steered Mixture-of-Experts	Martin Determann et.al.	2602.02031	null
2026-02-02	SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning	Zhen-Hao Xie et.al.	2602.01990	null
2026-02-02	Mixture-of-Experts with Intermediate CTC Supervision for Accented Speech Recognition	Wonjun Lee et.al.	2602.01967	null
2026-02-02	SOPRAG: Multi-view Graph Experts Retrieval for Industrial Standard Operating Procedures	Liangtao Lin et.al.	2602.01858	null
2026-02-02	From Knowing to Doing Precisely: A General Self-Correction and Termination Framework for VLA models	Wentao Zhang et.al.	2602.01811	null
2026-02-02	Mutual-Guided Expert Collaboration for Cross-Subject EEG Classification	Zhi Zhang et.al.	2602.01728	null
2026-02-02	AdNanny: One Reasoning LLM for All Offline Ads Recommendation Tasks	Nan Hu et.al.	2602.01563	null
2026-02-01	A Statistical Theory of Gated Attention through the Lens of Hierarchical Mixture of Experts	Viet Nguyen et.al.	2602.01468	null
2026-02-01	Rethinking Multinomial Logistic Mixture of Experts with Sigmoid Gating Function	Tuan Minh Pham et.al.	2602.01466	null
2026-02-01	Exposing and Defending the Achilles’ Heel of Video Mixture-of-Experts	Songping Wang et.al.	2602.01369	null
2026-02-01	Observation of $\barΛp\to K^{+}π^{+}π^{-}π^{0}$ and $\barΛp\to K^{+}π^{+}π^{-}2π^{0}$	BESIII Collaboration et.al.	2602.01282	null
2026-02-01	MiTA Attention: Efficient Fast-Weight Scaling via a Mixture of Top-$k$ Activations	Qishuai Wen et.al.	2602.01219	null
2026-02-01	Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse	Zizhuo Fu et.al.	2602.01203	null
2026-01-30	Omni-fMRI: A Universal Atlas-Free fMRI Foundation Model	Mo Wang et.al.	2601.23090	null
2026-01-30	UrbanMoE: A Sparse Multi-Modal Mixture-of-Experts Framework for Multi-Task Urban Region Profiling	Pingping Liu et.al.	2601.22746	null
2026-01-30	A Cross-Domain Graph Learning Protocol for Single-Step Molecular Geometry Refinement	Chengchun Liu et.al.	2601.22723	null
2026-01-30	A Step Back: Prefix Importance Ratio Stabilizes Policy Optimization	Shiye Lei et.al.	2601.22718	null
2026-01-30	A Unified Study of LoRA Variants: Taxonomy, Review, Codebase, and Empirical Evaluation	Haonan He et.al.	2601.22708	null
2026-01-30	Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments	Jinwoo Jang et.al.	2601.22647	null
2026-01-30	SpanNorm: Reconciling Training Stability and Performance in Deep Transformers	Chao Wang et.al.	2601.22580	null
2026-01-30	SHED Light on Segmentation for Dense Prediction	Seung Hyun Lee et.al.	2601.22529	null
2026-01-30	Continual Policy Distillation from Distributed Reinforcement Learning Teachers	Yuxuan Li et.al.	2601.22475	null
2026-01-29	ECO: Quantized Training without Full-Precision Master Weights	Mahdi Nikdan et.al.	2601.22101	null
2026-01-29	Heterogeneous Computing: The Key to Powering the Future of AI Agent Inference	Yiren Zhao et.al.	2601.22001	null
2026-01-29	MoE-ACT: Improving Surgical Imitation Learning Policies through Supervised Mixture-of-Experts	Lorenzo Mazza et.al.	2601.21971	null
2026-01-29	MoHETS: Long-term Time Series Forecasting with Mixture-of-Heterogeneous-Experts	Evandro S. Ortigossa et.al.	2601.21866	null
2026-01-29	OneMall: One Model, More Scenarios – End-to-End Generative Recommender Family at Kuaishou E-Commerce	Kun Zhang et.al.	2601.21770	null
2026-01-29	Seg-MoE: Multi-Resolution Segment-wise Mixture-of-Experts for Time Series Forecasting Transformers	Evandro S. Ortigossa et.al.	2601.21641	null
2026-01-29	Depth-Recurrent Attention Mixtures: Giving Latent Reasoning the Attention it Deserves	Jonas Knupp et.al.	2601.21582	null
2026-01-29	Multi-Modal Time Series Prediction via Mixture of Modulated Experts	Lige Zhang et.al.	2601.21547	null
2026-01-29	ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory	Yang Zhao et.al.	2601.21545	null
2026-01-30	L$^3$: Large Lookup Layers	Albert Tseng et.al.	2601.21461	null
2026-01-29	ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation	Zihao Huang et.al.	2601.21420	null
2026-01-29	L2R: Low-Rank and Lipschitz-Controlled Routing for Mixture-of-Experts	Minghao Yang et.al.	2601.21349	null
2026-01-29	Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies	Ce Hao et.al.	2601.21251	null
2026-01-29	Scaling Embeddings Outperforms Scaling Experts in Language Models	Hong Liu et.al.	2601.21204	null
2026-01-29	ZipMoE: Efficient On-Device MoE Serving via Lossless Compression and Cache-Affinity Scheduling	Yuchen Yang et.al.	2601.21198	null
2026-01-29	Precise measurements of $D^0 \to K^-\ell^+ν_\ell$ and $D^+ \to \bar K^0\ell^+ν_\ell$ decays	BESIII Collaboration et.al.	2601.21196	null
2026-01-29	Search for $ψ_0(4360)\rightarrow ηψ(2S)$ through the process $e^+e^- \rightarrow ηηψ(2S)$	BESIII Collaboration et.al.	2601.21190	null
2026-01-29	First Experimental Constraint on the Scalar Current in the $D^{0(+)}\to \bar K\ell^+ν_{\ell}$ Transition	BESIII Collaboration et.al.	2601.21185	null
2026-01-29	BrainStack: Neuro-MoE with Functionally Guided Expert Routing for EEG-Based Language Decoding	Ziyi Zhao et.al.	2601.21148	null
2026-01-29	TRACE: Trajectory Recovery for Continuous Mechanism Evolution in Causal Representation Learning	Shicheng Fan et.al.	2601.21135	null
2026-01-28	ProfInfer: An eBPF-based Fine-Grained LLM Inference Profiler	Bohua Zou et.al.	2601.20755	null
2026-01-28	ShieldedCode: Learning Robust Representations for Virtual Machine Protected Code	Mingqiao Mo et.al.	2601.20679	null
2026-01-28	Unsupervised Ensemble Learning Through Deep Energy-based Models	Ariel Maymon et.al.	2601.20556	null
2026-01-28	OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution	Le Zhang et.al.	2601.20380	null
2026-01-28	OSDEnhancer: Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion	Shuoyan Wei et.al.	2601.20308	null
2026-01-28	MiLorE-SSL: Scaling Multilingual Capabilities in Self-Supervised Models without Forgetting	Jing Xu et.al.	2601.20300	null
2026-01-28	HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-BENCH	Yueyang Wang et.al.	2601.20255	null
2026-01-28	Hyperparameter Transfer with Mixture-of-Expert Layers	Tianze Jiang et.al.	2601.20205	null
2026-01-28	Meta-Cognitive Reinforcement Learning with Self-Doubt and Recovery	Zhipeng Zhang et.al.	2601.20193	null
2026-01-27	Revisiting Incremental Stochastic Majorization-Minimization Algorithms with Applications to Mixture of Experts	TrungKhang Tran et.al.	2601.19811	null
2026-01-27	Component-Level Lesioning of Language Models Reveals Clinically Aligned Aphasia Phenotypes	Yifan Wang et.al.	2601.19723	null
2026-01-27	LoPRo: Enhancing Low-Rank Quantization via Permuted Block-Wise Rotation	Hongyaoxing Gu et.al.	2601.19675	null
2026-01-27	GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining	Shentong Mo et.al.	2601.19606	null
2026-01-27	Search for the isospin-violating decays $\boldsymbol{χ_{cJ}\toΛ\barΣ^{0}+c.c.}$ and $\boldsymbol{η_{c}\toΛ\barΣ^{0}+c.c.}$	BESIII Collaboration et.al.	2601.19493	null
2026-01-27	Dynamic Multi-Expert Projectors with Stabilized Routing for Multilingual Speech Recognition	Isha Pandey et.al.	2601.19451	null
2026-01-26	Superlinear Multi-Step Attention	Yufeng Huang et.al.	2601.18401	null
2026-01-26	FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning	Zhaopeng Qiu et.al.	2601.18150	null
2026-01-26	Beyond Static Datasets: Robust Offline Policy Optimization via Vetted Synthetic Transitions	Pedram Agand et.al.	2601.18107	null
2026-01-26	OneVoice: One Model, Triple Scenarios-Towards Unified Zero-shot Voice Conversion	Zhichao Wang et.al.	2601.18094	null
2026-01-26	LatentMoE: Toward Optimal Accuracy per FLOP and Parameter in Mixture of Experts	Venmugil Elango et.al.	2601.18089	null
2026-01-25	Domain-Expert-Guided Hybrid Mixture-of-Experts for Medical AI: Integrating Data-Driven Learning with Clinical Priors	Jinchen Gu et.al.	2601.17977	null
2026-01-25	EntWorld: A Holistic Environment and Benchmark for Verifiable Enterprise GUI Agents	Ying Mo et.al.	2601.17722	null
2026-01-25	$\infty$-MoE: Generalizing Mixture of Experts to Infinite Experts	Shota Takashiro et.al.	2601.17680	null
2026-01-25	Health-ORSC-Bench: A Benchmark for Measuring Over-Refusal and Safety Completion in Health Context	Zhihao Zhang et.al.	2601.17642	null
2026-01-24	PILOT: A Perceptive Integrated Low-level Controller for Loco-manipulation over Unstructured Scenes	Xinru Cui et.al.	2601.17440	null
2026-01-24	Topological Protection by Local Support Symmetry and Destructive Interference	Jun-Won Rhim et.al.	2601.17272	null
2026-01-23	Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts	Xuan-Phi Nguyen et.al.	2601.17111	null
2026-01-23	First evidence for $D_s^+ \to f_1(1420) e^+ν_e$ and search for $D_s^+ \to f_1(1285) e^+ν_e$	BESIII Collaboration et.al.	2601.16938	null
2026-01-23	Coarse-Grained Geometric Quantum Dynamics in the Tensor Network Representation	Mo Sha et.al.	2601.16913	null
2026-01-23	GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints	Andy Zhu et.al.	2601.16905	null
2026-01-23	Mixture-of-Models: Unifying Heterogeneous Agents via N-Way Self-Evaluating Deliberation	Tims Pecerskis et.al.	2601.16863	null
2026-01-23	SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents	Yuhang Wang et.al.	2601.16746	null
2026-01-23	LongCat-Flash-Thinking-2601 Technical Report	Meituan LongCat Team et.al.	2601.16725	null
2026-01-23	Search for the radiative decay $D^+_s \to γK^*(892)^+$	BESIII Collaboration et.al.	2601.16476	null
2026-01-22	proto-Lightspeed: a high-speed, ultra-low read noise imager on the Magellan Clay Telescope	Christopher Layden et.al.	2601.16268	null
2026-01-22	Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning	Moo Jin Kim et.al.	2601.16163	null
2026-01-22	Universal Refusal Circuits Across LLMs: Cross-Model Transfer via Trajectory Replay and Concept-Basis Reconstruction	Tony Cristofano et.al.	2601.16034	null
2026-01-22	Search for the reaction channel $e^+ e^- \to ηη\,J/ψ$ and the isospin partner of the $Z_c(3900)$ at center-of-mass energies $\sqrt{s} = 4.226-4.950$ GeV	BESIII Collaboration et.al.	2601.15882	null
2026-01-22	LL-GaussianImage: Efficient Image Representation for Zero-shot Low-Light Enhancement with 2D Gaussian Splatting	Yuhan Chen et.al.	2601.15772	null
2026-01-22	Redshift-Binned Constraints on the Hubble Constant under $Λ$CDM, CPL, and Padé Cosmography	Zhi-Yuan Mo et.al.	2601.15765	null
2026-01-21	On the diagonal of low bidegree hypersurfaces	Morten Lüders et.al.	2601.15409	null
2026-01-21	Improving MoE Compute Efficiency by Composing Weight and Data Sparsity	Maciej Kilian et.al.	2601.15370	null
2026-01-21	Pb4U-GNet: Resolution-Adaptive Garment Simulation via Propagation-before-Update Graph Network	Aoran Liu et.al.	2601.15110	null
2026-01-21	Mixture-of-Experts Models in Vision: Routing, Optimization, and Generalization	Adam Rokah et.al.	2601.15021	null
2026-01-21	SynPerf: A Hybrid Analytical-ML Framework for GPU Performance Prediction	Kaixuan Zhang et.al.	2601.14910	null
2026-01-21	Language-Coupled Reinforcement Learning for Multilingual Retrieval-Augmented Generation	Rui Qi et.al.	2601.14896	null
2026-01-21	UBATrack: Spatio-Temporal State Space Model for General Multi-Modal Tracking	Qihua Liang et.al.	2601.14799	null
2026-01-21	UniRoute: Unified Routing Mixture-of-Experts for Modality-Adaptive Remote Sensing Change Detection	Qingling Shu et.al.	2601.14797	null
2026-01-21	Robustness of Mixtures of Experts to Feature Noise	Dong Sun et.al.	2601.14792	null
2026-01-21	Online Linear Programming with Replenishment	Yuze Chen et.al.	2601.14629	null
2026-01-20	$π$MPC: A Parallel-in-horizon and Construction-free NMPC Solver	Liang Wu et.al.	2601.14414	null
2026-01-20	Layer-adaptive Expert Pruning for Pre-Training of Mixture-of-Experts Large Language Models	YuanLab.ai et.al.	2601.14327	null
2026-01-20	LLMOrbit: A Circular Taxonomy of Large Language Models - From Scaling Walls to Agentic AI Systems	Badri N. Patro et.al.	2601.14053	null
2026-01-20	Understanding Multilingualism in Mixture-of-Experts LLMs: Routing Mechanism, Expert Specialization, and Layerwise Steering	Yuxin Chen et.al.	2601.14050	null
2026-01-20	DExTeR: Weakly Semi-Supervised Object Detection with Class and Instance Experts for Medical Imaging	Adrien Meyer et.al.	2601.13954	null
2026-01-20	The R2Pub Telescopes for Surveying: An Overview and Performance Evaluation of the System	Xuan Song et.al.	2601.13587	null
2026-01-20	ButterflyMoE: Sub-Linear Ternary Experts via Structured Butterfly Orbits	Aryan Karmore et.al.	2601.13563	null
2026-01-20	MN-TSG: Continuous Time Series Generation with Irregular Observations	Xu Zhang et.al.	2601.13534	null
2026-01-19	CLIP-Guided Adaptable Self-Supervised Learning for Human-Centric Visual Tasks	Mingshuang Luo et.al.	2601.13133	null
2026-01-19	Agentic Conversational Search with Contextualized Reasoning via Reinforcement Learning	Fengran Mo et.al.	2601.13115	null
2026-01-19	Polychronous Wave Computing: Timing-Native Address Selection in Spiking Networks	Natalia G. Berloff et.al.	2601.13079	null
2026-01-19	Synthesizing Strong-Coupling Kohn-Luttinger Superconductivity in 2D Van der Waals materials	Shi-Cong Mo et.al.	2601.13074	null
2026-01-19	PASs-MoE: Mitigating Misaligned Co-drift among Router and Experts via Pathway Activation Subspaces for Continual Learning	Zhiyan Hou et.al.	2601.13020	null
2026-01-19	HT-GNN: Hyper-Temporal Graph Neural Network for Customer Lifetime Value Prediction in Baidu Ads	Xiaohui Zhao et.al.	2601.13013	null
2026-01-19	OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models	Shiyuan Li et.al.	2601.12996	null
2026-01-19	PhyG-MoE: A Physics-Guided Mixture-of-Experts Framework for Energy-Efficient GNSS Interference Recognition	Zhihan Zeng et.al.	2601.12798	null
2026-01-19	Topology-Aware Multiscale Mixture of Experts for Efficient Molecular Property Prediction	Long D. Nguyen et.al.	2601.12637	null
2026-01-18	A Mixture of Experts Vision Transformer for High-Fidelity Surface Code Decoding	Hoang Viet Nguyen et.al.	2601.12483	null
2026-01-18	Learning Diverse Skills for Behavior Models with Mixture of Experts	Wangtian Shen et.al.	2601.12397	null
2026-01-18	NADIR: Differential Attention Flow for Non-Autoregressive Transliteration in Indic Languages	Lakshya Tomar et.al.	2601.12389	null
2026-01-18	GazeFormer-MoE: Context-Aware Gaze Estimation via CLIP and MoE Transformer	Xinyuan Zhao et.al.	2601.12316	null
2026-01-18	Facet-Aware Multi-Head Mixture-of-Experts Model with Text-Enhanced Pre-training for Sequential Recommendation	Mingrui Liu et.al.	2601.12301	null
2026-01-16	Reasoning in Trees: Improving Retrieval-Augmented Generation for Multi-Hop Question Answering	Yuling Shi et.al.	2601.11255	null
2026-01-16	First Measurement of the Absolute Branching Fraction of $η_c \to γγ$	BESIII Collaboration et.al.	2601.11236	null
2026-01-16	Self-Augmented Mixture-of-Experts for QoS Prediction	Kecheng Cai et.al.	2601.11036	null
2026-01-16	RobuMTL: Enhancing Multi-Task Learning Robustness Against Weather Conditions	Tasneem Shaffee et.al.	2601.10921	null
2026-01-15	Search for sub-GeV dark particles in $η\toπ^0+\rm{invisible}$ decay	BESIII Collaboration et.al.	2601.10597	null
2026-01-15	Deterministic and scalable generation of large Fock states	Mo Xiong et.al.	2601.10559	null
2026-01-15	Algebraic Farkas Lemma and Strong Duality for Perturbed Conic Linear Programming	P. D. Khanh et.al.	2601.10390	null
2026-01-15	MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts	Yuxuan Lou et.al.	2601.10272	null
2026-01-15	A Highly Magnetic Ultra Massive White Dwarf with a 23-minute Rotation Period	Jincheng Guo et.al.	2601.10188	null
2026-01-15	What Gets Activated: Uncovering Domain and Driver Experts in MoE Language Models	Guimin Hu et.al.	2601.10159	null
2026-01-15	MMPG: MoE-based Adaptive Multi-Perspective Graph Fusion for Protein Representation Learning	Yusong Wang et.al.	2601.10157	null
2026-01-15	Extremum Seeking Nonovershooting Control of Strict-Feedback Systems Under Unknown Control Direction	Kaixin Lu et.al.	2601.09998	null
2026-01-14	Progressive Mixture-of-Experts with autoencoder routing for continual RANS turbulence modelling	Haoyu Ji et.al.	2601.09305	null
2026-01-14	A Raman-Gas Spectral Compressor for High-Energy Femtosecond Laser Pulses	Zegui Wang et.al.	2601.09234	null
2026-01-15	A.X K1 Technical Report	Sung Jun Cheon et.al.	2601.09200	null
2026-01-14	WiFo-E: A Scalable Wireless Foundation Model for End-to-End FDD Precoding in Communication Networks	Weibo Wen et.al.	2601.09186	null
2026-01-14	Horseshoe Mixtures-of-Experts (HS-MoE)	Nick Polson et.al.	2601.09043	null
2026-01-13	OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG	Fengran Mo et.al.	2601.09028	null
2026-01-12	TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts	Yu Xu et.al.	2601.08881	null
2026-01-13	MixServe: An Automatic Distributed Serving System for MoE Models with Hybrid Parallelism Based on Fused Communication Algorithm	Bowen Zhou et.al.	2601.08800	null
2026-01-13	LWM-Spectro: A Foundation Model for Wireless Baseband Signal Spectrograms	Namhyun Kim et.al.	2601.08780	null
2026-01-13	M$^2$FMoE: Multi-Resolution Multi-View Frequency Mixture-of-Experts for Extreme-Adaptive Time Series Forecasting	Yaohui Huang et.al.	2601.08631	null
2026-01-13	Robust CAPTCHA Using Audio Illusions in the Era of Large Language Models: from Evaluation to Advances	Ziqi Ding et.al.	2601.08516	null
2026-01-13	Taxon: Hierarchical Tax Code Prediction with Semantically Aligned LLM Expert Guidance	Jihang Li et.al.	2601.08418	null
2026-01-13	Controlled LLM Training on Spectral Sphere	Tian Xie et.al.	2601.08393	null
2026-01-13	Deconstructing Pre-training: Knowledge Attribution Analysis in MoE and Dense Models	Bo Wang et.al.	2601.08383	null
2026-01-13	Towards Principled Design of Mixture-of-Experts Language Models under Memory and Inference Constraints	Seng Pei Liew et.al.	2601.08215	null
2026-01-12	Towards Specialized Generalists: A Multi-Task MoE-LoRA Framework for Domain-Specific LLM Adaptation	Yuxin Yang et.al.	2601.07935	null
2026-01-12	An eclipsing 8.56 minute orbital period mass-transferring binary	Emma T. Chickles et.al.	2601.07925	null
2026-01-12	Emotional Support Evaluation Framework via Controllable and Diverse Seeker Simulator	Chaewon Heo et.al.	2601.07698	null
2026-01-12	Amplitude analysis and branching fraction measurement of $J/ψ\to Λ\barΣ^0η+\mathrm{c.c}$	BESIII Collaboration et.al.	2601.07617	null
2026-01-12	Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models	Xin Cheng et.al.	2601.07372	null
2026-01-11	PALM: Progress-Aware Policy Learning via Affordance Reasoning for Long-Horizon Robotic Manipulation	Yuanzhe Liu et.al.	2601.07060	null
2026-01-11	Solar Open Technical Report	Sungrae Park et.al.	2601.07022	null
2026-01-11	Deep Learning Based Channel Extrapolation for Dual-Band Massive MIMO Systems	Qikai Xiao et.al.	2601.06858	null
2026-01-11	MoE-DisCo: Low Economy Cost Training Mixture-of-Experts Models	Xin Ye et.al.	2601.06857	null
2026-01-11	MoEScore: Mixture-of-Experts-Based Text-Audio Relevance Score Prediction for Text-to-Audio System Evaluation	Bochao Sun et.al.	2601.06829	null
2026-01-11	SecMoE: Communication-Efficient Secure MoE Inference via Select-Then-Compute	Bowen Shen et.al.	2601.06790	null
2026-01-11	AutoTour: Automatic Photo Tour Guide with Smartphones and LLMs	Huatao Xu et.al.	2601.06781	null
2026-01-11	MTMCS-Bench: Evaluating Contextual Safety of Multimodal Large Language Models in Multi-Turn Dialogues	Zheyuan Liu et.al.	2601.06757	null
2026-01-10	R-Estimation with Right-Censored Data	Glen A. Satten et.al.	2601.06685	null
2026-01-10	Efficient and Reliable Estimation of Named Entity Linking Quality: A Case Study on GutBrainIE	Marco Martinelli et.al.	2601.06624	null
2026-01-10	Hellinger Multimodal Variational Autoencoders	Huyen Khanh Vo et.al.	2601.06572	null
2026-01-10	Physics-guided foundation model for universal speckle removal in ultrathin multimode fiber imaging	Xianrui Zeng et.al.	2601.06448	null
2026-01-10	The Promise of Time-Series Foundation Models for Agricultural Forecasting: Evidence from Marketing Year Average Prices	Le Wang et.al.	2601.06371	null
2026-01-09	Monkey Jump: MoE-Style PEFT for Efficient Multi-Task Learning	Nusrat Jahan Prottasha et.al.	2601.06356	null
2026-01-09	AIConfigurator: Lightning-Fast Configuration Optimization for Multi-Framework LLM Serving	Tianhao Xu et.al.	2601.06288	null
2026-01-09	Orchestrating Tokens and Sequences: Dynamic Hybrid Policy Optimization for RLVR	Zijun Min et.al.	2601.05607	null
2026-01-09	Buffered AUC maximization for scoring systems via mixed-integer optimization	Moe Shiina et.al.	2601.05544	null
2026-01-09	Scalable Heterogeneous Graph Learning via Heterogeneous-aware Orthogonal Prototype Experts	Wei Zhou et.al.	2601.05537	null
2026-01-08	MoEBlaze: Breaking the Memory Wall for Efficient MoE Training on Modern GPUs	Jiyuan Zhang et.al.	2601.05296	null
2026-01-08	MoE3D: A Mixture-of-Experts Module for 3D Reconstruction	Zichen Wang et.al.	2601.05208	null
2026-01-08	FaST: Efficient and Effective Long-Horizon Forecasting for Large-Scale Spatial-Temporal Graphs via Mixture-of-Experts	Yiji Zhao et.al.	2601.05174	link
2026-01-08	How to Set the Learning Rate for Large-Scale Pre-training?	Yunhua Zhou et.al.	2601.05049	null
2026-01-08	CuMA: Aligning LLMs with Sparse Cultural Values via Demographic-Aware Mixture of Adapters	Ao Sun et.al.	2601.04885	null
2026-01-08	DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation	Guanzhi Deng et.al.	2601.04823	null
2026-01-08	Users Mispredict Their Own Preferences for AI Writing Assistance	Vivian Lai et.al.	2601.04461	null
2026-01-08	Re-Rankers as Relevance Judges	Chuan Meng et.al.	2601.04455	null
2026-01-07	Transitive Expert Error and Routing Problems in Complex AI Systems	Forest Mars et.al.	2601.04416	null
2026-01-06	Scaling Trends for Multi-Hop Contextual Reasoning in Mid-Scale Language Models	Brady Steele et.al.	2601.04254	null
2026-01-07	When Helpers Become Hazards: A Benchmark for Analyzing Multimodal LLM-Powered Safety in Daily Life	Xinyue Lou et.al.	2601.04043	null
2026-01-07	A Scheduling Framework for Efficient MoE Inference on Edge GPU-NDP Systems	Qi Wu et.al.	2601.03992	null
2026-01-07	Spectral Manifold Regularization for Stable and Modular Routing in Deep MoE Architectures	Ibrahim Delibasoglu et.al.	2601.03889	null
2026-01-07	PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation	Wenlong Huang et.al.	2601.03782	null
2026-01-07	Variational Inference, Entropy, and Orthogonality: A Unified Theory of Mixture-of-Experts	Ye Su et.al.	2601.03577	null
2026-01-07	CALM: Culturally Self-Aware Language Models	Lingzhi Shen et.al.	2601.03483	null
2026-01-06	The Illusion of Specialization: Unveiling the Domain-Invariant “Standing Committee” in Mixture-of-Experts Models	Yan Wang et.al.	2601.03425	null
2026-01-06	AT2024wpp: An Extremely Luminous Fast Ultraviolet Transient Powered by Accretion onto a Black Hole	Daniel A. Perley et.al.	2601.03337	null
2026-01-06	ReCCur: A Recursive Corner-Case Curation Framework for Robust Vision-Language Understanding in Open and Edge Scenarios	Yihan Wei et.al.	2601.03011	null
2026-01-08	MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free	Yishu Lei et.al.	2601.02967	null
2026-01-06	MixTTE: Multi-Level Mixture-of-Experts for Scalable and Adaptive Travel Time Estimation	Wenzhao Jiang et.al.	2601.02943	null
2026-01-06	MiMo-V2-Flash Technical Report	Bangjun Xiao et.al.	2601.02780	null
2026-01-05	Routing by Analogy: kNN-Augmented Expert Assignment for Mixture-of-Experts	Boxuan Lyu et.al.	2601.02144	null
2026-01-05	Cross section measurement of $e^{+}e^{-}\rightarrow π^{0}π^{0}ψ(3686)$ from $\sqrt{s}=$ 4.008 GeV to 4.951 GeV	BESIII Collaboration et.al.	2601.02136	null
2026-01-07	FormuLLA: A Large Language Model Approach to Generating Novel 3D Printable Formulations	Adeshola Okubena et.al.	2601.02071	null
2026-01-05	GCR: Geometry-Consistent Routing for Task-Agnostic Continual Anomaly Detection	Joongwon Chae et.al.	2601.01856	null
2026-01-05	First Observation of $D^{0(+)}\to \bar Kωe^+ν_e$ and Determination of the Branching Fraction of $\bar K_1(1270)\to \bar K ω$	BESIII Collaboration et.al.	2601.01817	null
2026-01-05	Causality-Aware Temporal Projection for Video Understanding in Video-LLMs	Zhengjian Kang et.al.	2601.01804	null
2026-01-05	Measurements of the branching fractions of $χ_{cJ}\to 2K^+ 2K^- ω$ and $φK^+ K^- ω$ decays	BESIII Collaboration et.al.	2601.01758	null
2026-01-05	K-EXAONE Technical Report	Eunbi Choi et.al.	2601.01739	null
2026-01-05	Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications	YuanLab.ai et.al.	2601.01718	null
2026-01-05	Varying-Coefficient Mixture of Experts Model	Qicheng Zhao et.al.	2601.01699	null
2026-01-06	Measurements of the absolute branching fractions of the $Λ_{c}^{+}$ hadronic decays	BESIII Collaboration et.al.	2601.01503	null
2026-01-04	Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts	Ruofeng Yang et.al.	2601.01475	null
2026-01-06	Making MoE-based LLM Inference Resilient with Tarragon	Songyu Zhang et.al.	2601.01310	null
2026-01-03	MambaFormer: Token-Level Guided Routing Mixture-of-Experts for Accurate and Efficient Clinical Assistance	Hamad Khan et.al.	2601.01260	null
2026-01-02	Reliability Under Randomness: An Empirical Analysis of Sparse and Dense Language Models Across Decoding Temperatures	Kabir Grover et.al.	2601.00942	null
2026-01-02	HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts	Zihan Fang et.al.	2601.00583	null
2026-01-02	A Language-Agnostic Hierarchical LoRA-MoE Architecture for CTC-based Multilingual ASR	Yuang Zheng et.al.	2601.00557	null
2026-01-01	Geometric Regularization in Mixture-of-Experts: The Disconnect Between Weights and Activations	Hyunjun Kim et.al.	2601.00457	null
2026-01-01	Traffic-MoE: A Sparse Foundation Model for Network Traffic Analysis	Jiajun Zhou et.al.	2601.00357	null
2026-01-01	Identification and Estimation under Multiple Versions of Treatment: Mixture-of-Experts Approach	Kohei Yoshikawa et.al.	2601.00287	null
2025-12-31	Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem	Weixun Wang et.al.	2512.24873	null
2025-12-31	Compute-Accuracy Pareto Frontiers for Open-Source Reasoning Large Language Models	Ákos Prucs et.al.	2512.24776	null
2025-12-30	Joint Selection for Large-Scale Pre-Training Data via Policy Gradient-based Mask Learning	Ziqing Fan et.al.	2512.24265	null
2025-12-30	Training Report of TeleChat3-MoE	Xinzhang Liu et.al.	2512.24157	null
2025-12-30	Skyrmion and Meron Crystals in Intermetallic Gd$_3$Ru$_4$Al$_{12}$: Microscopic Model Insights into Chiral Phases	Jiajun Mo et.al.	2512.24071	null
2025-12-30	RepetitionCurse: Measuring and Understanding Router Imbalance in Mixture-of-Experts LLMs under DoS Stress	Ruixuan Huang et.al.	2512.23995	null
2025-12-30	Towards a bottom-up formulation of spin kinetic theory	Zonglin Mo et.al.	2512.23960	null
2026-01-02	Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling	Chulun Zhou et.al.	2512.23959	null
2025-12-30	Learnable Query Aggregation with KV Routing for Cross-view Geo-localisation	Hualin Ye et.al.	2512.23938	null
2025-12-29	Observations of the Fermi bubbles and the Galactic center excess with the DArk Matter Particle Explorer	F. Alemanno et.al.	2512.23458	null
2025-12-29	Dynamic Subspace Composition: Efficient Adaptation via Contractive Basis Expansion	Vladimer Khasia et.al.	2512.23448	null
2025-12-29	Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss	Ang Lv et.al.	2512.23447	null
2025-12-29	Bitcoin-IPC: Scaling Bitcoin with a Network of Proof-of-Stake Subnets	Marko Vukolić et.al.	2512.23439	null
2025-12-29	Study of $\bar{K}^*(892)^0 η$ and $K_S^0 a_0(980)^0$ in the $D^{0} \to K_{S}^{0}π^0η$ decay	BESIII Collaboration et.al.	2512.23389	null
2025-12-30	YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection	Xu Lin et.al.	2512.23273	null
2025-12-28	Trust Region Masking for Long-Horizon LLM Reinforcement Learning	Yingru Li et.al.	2512.23075	null
2025-12-28	FLEX-MoE: Federated Mixture-of-Experts with Load-balanced Expert Assignment	Boyang Zhang et.al.	2512.23070	null
2025-12-28	Viability and Performance of a Private LLM Server for SMBs: A Benchmark Analysis of Qwen3-30B on Consumer-Grade Hardware	Alex Khalil et.al.	2512.23029	null
2025-12-28	Reach-Avoid Differential game with Reachability Analysis for UAVs: A decomposition approach	Minh Bui et.al.	2512.22793	null
2025-12-28	Text-Routed Sparse Mixture-of-Experts Model with Explanation and Temporal Alignment for Multi-Modal Sentiment Analysis	Dongning Rao et.al.	2512.22741	null
2025-12-27	RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure	Wei Gao et.al.	2512.22560	null
2025-12-27	Scalpel-SAM: A Semi-Supervised Paradigm for Adapting SAM to Infrared Small Object Detection	Zihan Liu et.al.	2512.22483	null
2025-12-27	Bright 4B: Scaling Hyperspherical Learning for Segmentation in 3D Brightfield Microscopy	Amil Khan et.al.	2512.22423	null
2025-12-26	FUSCO: High-Performance Distributed Data Shuffling via Transformation-Communication Fusion	Zhuoran Zhu et.al.	2512.22036	null
2025-12-26	SWE-RM: Execution-free Feedback For Software Engineering Agents	KaShun Shum et.al.	2512.21919	null
2025-12-26	Accelerate Speculative Decoding with Sparse Computation in Verification	Jikai Wang et.al.	2512.21911	null
2025-12-26	MMCTOP: A Multimodal Textualization and Mixture-of-Experts Framework for Clinical Trial Outcome Prediction	Carolina Aparício et.al.	2512.21897	null
2025-12-26	CrownGen: Patient-customized Crown Generation via Point Diffusion Model	Juyoung Bae et.al.	2512.21890	null
2025-12-26	SLIM-Brain: A Data- and Training-Efficient Foundation Model for fMRI Data Analysis	Mo Wang et.al.	2512.21881	null
2025-12-25	Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction	Zheng Yin et.al.	2512.21707	null
2025-12-25	Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelism	Xinglin Pan et.al.	2512.21487	null
2025-12-24	DeepCQ: General-Purpose Deep-Surrogate Framework for Lossy Compression Quality Prediction	Khondoker Mirazul Mumenin et.al.	2512.21433	null
2025-12-24	SparScene: Efficient Traffic Scene Representation via Sparse Graph Learning for Large-Scale Trajectory Generation	Xiaoyu Mo et.al.	2512.21133	null
2025-12-26	Identification with Orthogonal Basis Functions: Convergence Speed, Asymptotic Bias, and Rate-Optimal Pole Selection	Jiayun Li et.al.	2512.21096	null
2025-12-25	GateBreaker: Gate-Guided Attacks on Mixture-of-Expert LLMs	Lichao Wu et.al.	2512.21008	null
2025-12-24	SACodec: Asymmetric Quantization with Semantic Anchoring for Low-Bitrate High-Fidelity Neural Speech Codecs	Zhongren Dong et.al.	2512.20944	null
2025-12-24	RevFFN: Memory-Efficient Full-Parameter Fine-Tuning of Mixture-of-Experts LLMs with Reversible Blocks	Ningyuan Liu et.al.	2512.20920	null
2025-12-24	NVIDIA Nemotron 3: Efficient and Open Intelligence	NVIDIA et.al.	2512.20856	null
2025-12-23	Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning	NVIDIA et.al.	2512.20848	null
2025-12-23	Defending against adversarial attacks using mixture of experts	Mohammad Meymani et.al.	2512.20821	null
2025-12-23	MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts	Alexandros Christoforos et.al.	2512.20604	null
2025-12-23	Branch Learning in MRI: More Data, More Models, More Training	Yuyang Li et.al.	2512.20330	null
2025-12-23	Mixture-of-Experts with Gradient Conflict-Driven Subspace Topology Pruning for Emergent Modularity	Yuxing Gan et.al.	2512.20291	null
2025-12-23	Degradation-Aware Metric Prompting for Hyperspectral Image Restoration	Binfeng Wang et.al.	2512.20251	null
2025-12-23	AMoE: Agglomerative Mixture-of-Experts Vision Foundation Model	Sofian Chaybouti et.al.	2512.20157	null
2025-12-23	Fun-Audio-Chat Technical Report	Qian Chen et.al.	2512.20156	null
2025-12-23	Bring My Cup! Personalizing Vision-Language-Action Models with Visual Attentive Prompting	Sangoh Lee et.al.	2512.20014	null
2025-12-23	Observation and branching fraction measurements of $χ_{cJ}\to p \bar p K^0_S K^0_S$	BESIII Collaboration et.al.	2512.19993	null
2025-12-22	UCCL-EP: Portable Expert-Parallel Communication	Ziming Mao et.al.	2512.19849	null
2025-12-21	How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts	Sumin Park et.al.	2512.19765	null
2025-12-22	Towards Closed-Loop Embodied Empathy Evolution: Probing LLM-Centric Lifelong Empathic Motion Generation in Unseen Scenarios	Jiawen Wang et.al.	2512.19551	null
2025-12-22	EGM: Efficiently Learning General Motion Tracking Policy for High Dynamic Humanoid Whole-Body Control	Chao Yang et.al.	2512.19043	null
2025-12-21	Tempo as the Stable Cue: Hierarchical Mixture of Tempo and Beat Experts for Music to 3D Dance Generation	Guangtao Lyu et.al.	2512.18804	null
2025-12-21	Rectification Reimagined: A Unified Mamba Model for Image Correction and Rectangling with Prompts	Linwei Qiu et.al.	2512.18718	null
2025-12-21	Remoe: Towards Efficient and Low-Cost MoE Inference in Serverless Computing	Wentao Liu et.al.	2512.18674	null
2025-12-21	Commercial Vehicle Braking Optimization: A Robust SIFT-Trajectory Approach	Zhe Li et.al.	2512.18597	null
2025-12-20	Secret mixtures of experts inside your LLM	Enric Boix-Adsera et.al.	2512.18452	null
2025-12-20	MoE Pathfinder: Trajectory-driven Expert Pruning	Xican Yang et.al.	2512.18425	null
2025-12-20	MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation	Kaixing Yang et.al.	2512.18181	null
2025-12-20	Cross section and parametrization of charmonium decay	Xiao-Hu Mo et.al.	2512.18154	null
2025-12-19	MoE-TransMov: A Transformer-based Model for Next POI Prediction in Familiar & Unfamiliar Movements	Ruichen Tan et.al.	2512.17985	null
2025-12-19	Interpreting the strong clustering of ultra-diffuse galaxies by halo spin bias	Qinglin Ma et.al.	2512.17742	null
2025-12-19	Cross sections measurement of $e^+e^-\to Ξ(1530)^0\barΞ^0 + c.c.$ and search for $ψ(3770)\toΞ(1530)^0\barΞ^0 + c.c.$	BESIII Collaboration et.al.	2512.17275	null
2025-12-19	Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding	Yuqing Li et.al.	2512.17220	null
2025-12-19	Capturing Arbitrary Waveform without Absorption with Synthesis of Complex Frequencies	Zhaohua Tian et.al.	2512.17156	null
2025-12-18	Bandwidth-Efficient Adaptive Mixture-of-Experts via Low-Rank Compensation	Zhenyu Liu et.al.	2512.17073	null
2025-12-18	Compression is Routing: Reconstruction Error as an Intrinsic Signal for Modular Language Models	Zhongpan Tang et.al.	2512.16963	null
2025-12-18	LinkedOut: Linking World Knowledge Representation Out of Video LLM for Next-Generation Video Recommendation	Haichao Zhang et.al.	2512.16891	null
2025-12-18	The WINTER Observatory: A One-Degree InGaAs Survey Camera to study the Transient Infrared Sky	Danielle Frostig et.al.	2512.16753	null
2025-12-18	PoseMoE: Mixture-of-Experts Network for Monocular 3D Human Pose Estimation	Mengyuan Liu et.al.	2512.16494	null
2025-12-18	Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems	En-Ming Huang et.al.	2512.16473	null
2025-12-18	Pretrained Battery Transformer (PBT): A battery life prediction foundation model	Ruifeng Tan et.al.	2512.16334	null
2025-12-19	Sigma-MoE-Tiny Technical Report	Qingguo Hu et.al.	2512.16248	null
2025-12-18	Open Ad-hoc Categorization with Contextualized Feature Learning	Zilin Wang et.al.	2512.16202	null
2025-12-18	INTELLECT-3: Technical Report	Prime Intellect Team et.al.	2512.16144	null
2025-12-17	Wake instability past a sphere settling in a strongly stratified flow	Chang-Fan Mo et.al.	2512.15626	null
2025-12-17	Measurements of the Absolute Branching Fraction of the Semileptonic Decay $\mathbf{Ξ^{-}\rightarrow Λe^- \barν_{e}}$ and the Axial Charge of the $\mathbfΞ^{-}$	BESIII Collaboration et.al.	2512.15273	null
2025-12-19	VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments	Yuze Wu et.al.	2512.15258	null
2025-12-17	Search for the decays $X(3872)\to K_{S}^{0}K^{\pm}π^{\mp}$ and $K^*(892)\bar{K}$ at BESIII	BESIII Collaboration et.al.	2512.15091	null
2025-12-19	Let the Barbarians In: How AI Can Accelerate Systems Performance Research	Audrey Cheng et.al.	2512.14806	null
2025-12-15	SocialNav-MoE: A Mixture-of-Experts Vision Language Model for Socially Compliant Navigation with Reinforcement Fine-Tuning	Tomohito Kawabata et.al.	2512.14757	null
2025-12-16	Measurements of the branching fractions of $χ_{cJ}\to φφη, φφη^{\prime}$ and $φK^+K^-η$	BESIII Collaboration et.al.	2512.14369	null
2025-12-16	SketchAssist: A Practical Assistant for Semantic Edits and Precise Local Redrawing	Han Zou et.al.	2512.14140	null
2025-12-16	SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations	Wentao Guo et.al.	2512.14080	null
2025-12-16	Sparsity-Controllable Dynamic Top-p MoE for Large Foundation Model Pre-training	Can Jin et.al.	2512.13996	null
2025-12-15	Connection between galaxy morphology and dark-matter halo structure II: predicting disk structure from dark-matter halo properties	Jinning Liang et.al.	2512.13822	null
2025-12-13	RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing	Yuhan Tang et.al.	2512.13727	null
2025-12-15	StutterFuse: Mitigating Modality Collapse in Stuttering Detection with Jaccard-Weighted Metric Learning and Gated Fusion	Guransh Singh et.al.	2512.13632	null
2025-12-16	Janus: Disaggregating Attention and Experts for Scalable MoE Inference	Zhexiang Zhang et.al.	2512.13525	null
2025-12-15	SIGMA: An AI-Empowered Training Stack on Early-Life Hardware	Lei Qu et.al.	2512.13488	null
2025-12-15	Automated Information Flow Selection for Multi-scenario Multi-task Recommendation	Chaohua Yang et.al.	2512.13396	null
2025-12-15	Sharpen the Spec, Cut the Code: A Case for Generative File System with SYSSPEC	Qingyuan Liu et.al.	2512.13047	null
2025-12-15	Safe Control of Multi-Agent Systems with Minimal Communication	Mo Yang et.al.	2512.13021	null
2025-12-15	SliceMoE: Bit-Sliced Expert Caching under Miss-Rate Constraints for Efficient MoE Inference	Yuseon Choi et.al.	2512.12990	null
2025-12-14	Fault-Tolerant Sandboxing for AI Coding Agents: A Transactional Approach to Safe Autonomous Execution	Boyang Yan et.al.	2512.12806	null
2025-12-14	Bayesian Optimization Parameter Tuning Framework for a Lyapunov Based Path Following Controller	Zhewen Zheng et.al.	2512.12649	null
2025-12-13	Amplitude Analysis and Branching Fraction Measurement of $D^+ \to π^+π^0π^0$	BESIII Collaboration et.al.	2512.12397	null
2025-12-13	Fine-Grained Zero-Shot Learning with Attribute-Centric Representations	Zhi Chen et.al.	2512.12219	null
2025-12-13	ALERT Open Dataset and Input-Size-Agnostic Vision Transformer for Driver Activity Recognition using IR-UWB	Jeongjun Park et.al.	2512.12206	null
2025-12-13	MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models	Ahmad Chamma et.al.	2512.12121	null
2025-12-12	Measurement of the cosmic ray nickel energy spectrum from 10 GeV/n to 2 TeV/n with the DAMPE	F. Alemanno et.al.	2512.11425	null
2025-12-11	Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample Collaboration	Sicheng Mo et.al.	2512.10954	null
2025-12-11	Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration	Wenlong Jiao et.al.	2512.10581	null
2025-12-11	Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment	Han Li et.al.	2512.10450	null
2025-12-12	Openpi Comet: Competition Solution For 2025 BEHAVIOR Challenge	Junjie Bai et.al.	2512.10071	null
2025-12-10	Efficient Continual Learning in Neural Machine Translation: A Low-Rank Adaptation Approach	Salvador Carrión et.al.	2512.09910	null
2025-12-10	DynaIP: Dynamic Image Prompt Adapter for Scalable Zero-shot Personalized Text-to-Image Generation	Zhizhong Wang et.al.	2512.09814	null
2025-12-10	M3Net: A Multi-Metric Mixture of Experts Network Digital Twin with Graph Neural Networks	Blessed Guda et.al.	2512.09797	null
2025-12-10	First measurement of the absolute branching fractions of $Σ^+$ nonleptonic decays and test of the $ΔI = 1/2$ rule	BESIII Collaboration et.al.	2512.09628	null
2025-12-10	FoundIR-v2: Optimizing Pre-Training Data Mixtures for Image Restoration Foundation Model	Xiang Chen et.al.	2512.09282	null
2025-12-10	Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens	Yanpeng Yu et.al.	2512.09277	null
2025-12-10	Bug Priority Change Prediction: An Exploratory Study on Apache Software	Guangzong Cai et.al.	2512.09216	null
2025-12-09	Ask, Answer, and Detect: Role-Playing LLMs for Personality Detection with Question-Conditioned Mixture-of-Experts	Yifan Lyu et.al.	2512.08814	null
2025-12-09	What really matters for person re-identification? A Mixture-of-Experts Framework for Semantic Attribute Importance	Athena Psalta et.al.	2512.08697	null
2025-12-09	Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems	Mingwei Li et.al.	2512.08411	null
2025-12-09	FastBEV++: Fast by Algorithm, Deployable by Design	Yuanpeng Chen et.al.	2512.08237	null
2025-12-08	Relational Visual Similarity	Thao Nguyen et.al.	2512.07833	null
2025-12-08	Each Prompt Matters: Scaling Reinforcement Learning Without Wasting Rollouts on Hundred-Billion-Scale MoE	Anxiang Zeng et.al.	2512.07710	null
2025-12-08	LongCat-Image Technical Report	Meituan LongCat Team et.al.	2512.07584	null
2025-12-12	MultiMotion: Multi Subject Video Motion Transfer via Video Diffusion Transformer	Penghui Liu et.al.	2512.07500	null
2025-12-08	Equivariant Diffusion for Crystal Structure Prediction	Peijia Lin et.al.	2512.07289	null
2025-12-08	Measurement of the branching fraction of $η\to μ^+ μ^-$ and search for $η\to e^+ e^-$	BESIII Collaboration et.al.	2512.07144	null
2025-12-09	TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning	Zebin Xing et.al.	2512.07135	null
2025-12-08	PlantBiMoE: A Bidirectional Foundation Model with SparseMoE for Plant Genomes	Kepeng Lin et.al.	2512.07113	null
2025-12-07	Adaptive Normalization Mamba with Multi Scale Trend Decomposition and Patch MoE Encoding	MinCheol Jeon et.al.	2512.06929	null
2025-12-07	Stable-MoE: Lyapunov-based Token Routing for Distributed Mixture-of-Experts Training over Edge Networks	Long Shi et.al.	2512.06784	null
2025-12-07	Statistic-Augmented, Decoupled MoE Routing and Aggregating in Autonomous Driving	Wei-Bin Kou et.al.	2512.06664	null
2025-12-06	Enhancing Medical Cross-Modal Hashing Retrieval using Dropout-Voting Mixture-of-Experts Fusion	Jaewon Ahn et.al.	2512.06449	null
2025-12-04	The SAM2-to-SAM3 Gap in the Segment Anything Model Family: Why Prompt-Based Expertise Fails in Concept-Driven Image Segmentation	Ranjan Sapkota et.al.	2512.06032	null
2025-12-05	HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies	Zhiying Du et.al.	2512.05693	null
2025-12-05	ProPhy: Progressive Physical Alignment for Dynamic World Simulation	Zijun Wang et.al.	2512.05564	null
2025-12-04	Evidence for the semileptonic decays $Λ_c^{+} \to Σ^{\pm} π^{\mp} e^+ ν_e$	BESIII Collaboration et.al.	2512.05178	null
2025-12-09	EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture	Xin He et.al.	2512.04810	null
2025-12-04	Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild	Yigui Feng et.al.	2512.04728	null
2025-12-04	Study of the reaction $Ξ^{0}n\rightarrowΛΛX$ using $Ξ^{0}$-nucleus scattering	BESIII Collaboration et.al.	2512.04701	null
2025-12-04	Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space	Joey Hong et.al.	2512.04601	null
2025-12-04	The Binary Fraction of Stars in the Dwarf Galaxy Ursa Minor via Dark Energy Spectroscopic Instrument	Tian Qiu et.al.	2512.04477	null
2025-12-04	Context-Aware Mixture-of-Experts Inference on CXL-Enabled GPU-NDP Systems	Zehao Fan et.al.	2512.04476	null
2025-12-03	Small Models Achieve Large Language Model Performance: Evaluating Reasoning-Enabled AI for Secure Child Welfare Research	Zia Qi et.al.	2512.04261	null
2025-12-03	Decoding Large Language Diffusion Models with Foreseeing Movement	Yichuan Mo et.al.	2512.04135	null
2025-12-03	Stable Signer: Hierarchical Sign Language Generative Model	Sen Fang et.al.	2512.04048	null
2025-12-03	OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference	Liujianfu Wang et.al.	2512.03927	null
2025-12-04	A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models	X. Y. Han et.al.	2512.03915	null
2025-12-03	Parsimonious Clustering of Covariance Matrices	Yixi Xu et.al.	2512.03912	null
2025-12-03	Measurement of the hyperon weak radiative decay $Ξ^0\toγΣ^0$ at BESIII	BESIII Collaboration et.al.	2512.03877	null
2025-12-03	Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation	Subin Kim et.al.	2512.03534	null
2025-12-03	CellScout: Visual Analytics for Mining Biomarkers in Cell State Discovery	Rui Sheng et.al.	2512.03485	null
2025-12-03	Unconventional Magneto-Optical Effects in Altermagnets	Yongpan Li et.al.	2512.03435	null
2025-12-03	SSLfmm: An R Package for Semi-Supervised Learning with a Mixed-Missingness Mechanism in Finite Mixture Models	Geoffrey J. McLachlan et.al.	2512.03322	null
2025-12-02	Intrinsic Second-Order Topological Superconductors with Tunable Majorana Zero Modes	Xiao-Jiao Wang et.al.	2512.02775	null
2025-12-02	Stepwise Schema-Guided Prompting Framework with Parameter Efficient Instruction Tuning for Multimedia Event Extraction	Xiang Yuan et.al.	2512.02584	null
2025-12-02	SkyMoE: A Vision-Language Foundation Model for Enhancing Geospatial Interpretation with Mixture of Experts	Jiaqi Liu et.al.	2512.02517	null
2025-12-02	A Fully First-Order Layer for Differentiable Optimization	Zihao Zhao et.al.	2512.02494	null
2025-12-02	Quasi-steady electron-excitonic complexes coupling in a two-dimensional semiconductor	Shangkun Mo et.al.	2512.02490	null
2025-12-02	Multi-Domain Enhanced Map-Free Trajectory Prediction with Selective Attention	Wenyi Xiong et.al.	2512.02368	null
2025-12-02	Understanding and Harnessing Sparsity in Unified Multimodal Models	Shwai He et.al.	2512.02351	null
2025-12-02	OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning	Boyu Zhu et.al.	2512.02306	null
2025-12-01	Towards Unified Video Quality Assessment	Chen Feng et.al.	2512.02224	null
2025-12-01	ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation	Chenyang Gu et.al.	2512.02013	null
2025-12-01	Multimodal Mixture-of-Experts for ISAC in Low-Altitude Wireless Networks	Kai Zhang et.al.	2512.01750	null
2025-12-01	GRASP: Guided Residual Adapters with Sample-wise Partitioning	Felix Nützel et.al.	2512.01675	null
2025-12-01	Bridging the Scale Gap: Balanced Tiny and General Object Detection in Remote Sensing Imagery	Zhicheng Zhao et.al.	2512.01665	null
2025-12-01	Cuffless Blood Pressure Estimation from Six Wearable Sensor Modalities in Multi-Motion-State Scenarios	Yiqiao Chen et.al.	2512.01653	null
2025-12-01	Integrated YOLOP Perception and Lyapunov-based Control for Autonomous Mobile Robot Navigation on Track	Mo Chen et.al.	2512.01608	null
2025-12-01	Personalized optimization of pediatric HD-tDCS for dose consistency and target engagement	Zeming Liu et.al.	2512.01406	null
2025-12-02	Stabilizing Reinforcement Learning with LLMs: Formulation and Practices	Chujie Zheng et.al.	2512.01374	null
2025-12-01	TagSplat: Topology-Aware Gaussian Splatting for Dynamic Mesh Modeling and Tracking	Hanzhi Guo et.al.	2512.01329	null
2025-12-01	Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe	Yahui Liu et.al.	2512.01252	null
2025-11-30	Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios	Jianxiang Zang et.al.	2512.00920	null
2025-11-30	Elastic Mixture of Rank-Wise Experts for Knowledge Reuse in Federated Fine-Tuning	Yebo Wu et.al.	2512.00902	null
2025-11-30	Upcycled and Merged MoE Reward Model for Mitigating Reward Hacking	Lingling Fu et.al.	2512.00724	null
2025-11-29	GCMCG: A Clustering-Aware Graph Attention and Expert Fusion Network for Multi-Paradigm, Multi-task, and Cross-Subject EEG Decoding	Yiqiao Chen et.al.	2512.00574	null
2025-11-28	Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model	Junshu Tang et.al.	2511.23429	null
2025-11-28	LFM2 Technical Report	Alexander Amini et.al.	2511.23404	null
2025-11-28	Chart2Code-MoLA: Efficient Multi-Modal Code Generation via Adaptive Expert Routing	Yifei Wang et.al.	2511.23321	null
2025-11-28	Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models	Xiang Hu et.al.	2511.23319	null
2025-11-28	Multi-Modal Scene Graph with Kolmogorov-Arnold Experts for Audio-Visual Question Answering	Zijian Fu et.al.	2511.23304	null
2025-11-28	Experts are all you need: A Composable Framework for Large Language Model Inference	Shrihari Sridharan et.al.	2511.22955	null
2025-11-28	EnECG: Efficient Ensemble Learning for Electrocardiogram Multi-task Foundation Model	Yuhao Xu et.al.	2511.22935	null
2025-11-27	Architecture Decoupling Is Not All You Need For Unified Multimodal Model	Dian Zheng et.al.	2511.22663	null
2025-11-27	OmniInfer: System-Wide Acceleration Techniques for Optimizing LLM Serving Throughput and Latency	Jun Wang et.al.	2511.22481	null
2025-11-27	Foundation Model for Intelligent Wireless Communications	Boxun Liu et.al.	2511.22222	null
2025-11-27	MoE3D: Mixture of Experts meets Multi-Modal 3D Understanding	Yu Li et.al.	2511.22103	null
2025-11-27	Convergence Dynamics of Over-Parameterized Score Matching for a Single Gaussian	Yiran Zhang et.al.	2511.22069	null
2025-11-26	Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models	Naifu Zhang et.al.	2511.21663	null
2025-11-26	Continual Error Correction on Low-Resource Devices	Kirill Paramonov et.al.	2511.21652	null
2025-11-27	Qwen3-VL Technical Report	Shuai Bai et.al.	2511.21631	null
2025-11-26	Enhanced Landmark Detection Model in Pelvic Fluoroscopy using 2D/3D Registration Loss	Chou Mo et.al.	2511.21575	null
2025-11-26	Scaling limits of critical FK-decorated random planar maps with $q=4$	William Da Silva et.al.	2511.21480	null
2025-11-26	Study of the reactions $\bar{n} p \to 2π^{+}π^{-}$, $2π^{+}π^{-}π^{0}$, and $2π^{+}π^{-}2π^{0}$ using $J/ψ\to p π^{-}\bar{n}$	BESIII Collaboration et.al.	2511.21462	null
2025-11-26	MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training	Lu Zhao et.al.	2511.21431	null
2025-11-26	Do Reasoning Vision-Language Models Inversely Scale in Test-Time Compute? A Distractor-centric Empirical Analysis	Jiyun Bae et.al.	2511.21397	null
2025-11-26	Conditional Generative Modeling of Stochastic LTI Systems: A Behavioral Approach	Jiayun Li et.al.	2511.21219	null
2025-11-26	MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts	Ivan Novikov et.al.	2511.21089	null
2025-11-25	HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation	Xiang Wang et.al.	2511.20520	null
2025-11-25	Soft Adaptive Policy Optimization	Chang Gao et.al.	2511.20347	null
2025-11-25	ADNet: A Large-Scale and Extensible Multi-Domain Benchmark for Anomaly Detection Across 380 Real-World Categories	Hai Ling et.al.	2511.20169	null
2025-11-25	Adaptive Knowledge Transfer for Cross-Disciplinary Cold-Start Knowledge Tracing	Yulong Deng et.al.	2511.20009	null
2025-11-25	SONIC: Spectral Optimization of Noise for Inpainting with Consistency	Seungyeon Baek et.al.	2511.19985	null
2025-11-25	Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models	Wentao Hu et.al.	2511.19822	null
2025-11-22	Exploiting the Experts: Unauthorized Compression in MoE-LLMs	Pinaki Prasad Guha Neogi et.al.	2511.19480	null
2025-11-22	Tracking and Segmenting Anything in Any Modality	Tianlu Zhang et.al.	2511.19475	null
2025-11-24	Life-IQA: Boosting Blind Image Quality Assessment through GCN-enhanced Layer Interaction and MoE-based Feature Decoupling	Long Tang et.al.	2511.19024	null
2025-11-24	OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs	Yuting Gao et.al.	2511.19023	null
2025-11-24	Dynamic Mixture of Experts Against Severe Distribution Shifts	Donghu Kim et.al.	2511.18987	null
2025-11-23	HiFi-MambaV2: Hierarchical Shared-Routed MoE for High-Fidelity MRI Reconstruction	Pengcheng Fang et.al.	2511.18534	null
2025-11-23	AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert	Yuting Gao et.al.	2511.18314	null
2025-11-22	PromptMoE: Generalizable Zero-Shot Anomaly Detection via Visually-Guided Prompt Mixtures	Yuheng Shao et.al.	2511.18116	null
2025-11-22	CADTrack: Learning Contextual Aggregation with Deformable Alignment for Robust RGBT Tracking	Hao Li et.al.	2511.17967	null
2025-11-22	Measuring the Impact of Lexical Training Data Coverage on Hallucination Detection in Large Language Models	Shuo Zhang et.al.	2511.17946	null
2025-11-22	FastMMoE: Accelerating Multimodal Large Language Models through Dynamic Expert Activation and Routing-Aware Token Pruning	Guoyang Xia et.al.	2511.17885	null
2025-11-22	Equivalence of Context and Parameter Updates in Modern Transformer Blocks	Adrian Goldwaser et.al.	2511.17864	null
2025-11-21	Unified Class and Domain Incremental Learning with Mixture of Experts for Indoor Localization	Akhil Singampalli et.al.	2511.17829	null
2025-11-21	Boosting Brain-inspired Path Integration Efficiency via Learning-based Replication of Continuous Attractor Neurodynamics	Zhangyu Ge et.al.	2511.17687	null
2025-11-21	Sparse Mixture-of-Experts for Multi-Channel Imaging: Are All Channel Interactions Required?	Sukwon Yun et.al.	2511.17400	null
2025-11-21	MCMoE: Completing Missing Modalities with Mixture of Experts for Incomplete Multimodal Action Quality Assessment	Huangbiao Xu et.al.	2511.17397	link
2025-11-21	Training Foundation Models on a Full-Stack AMD Platform: Compute, Networking, and System Design	Quentin Anthony et.al.	2511.17127	null
2025-11-21	Parametric Retrieval-Augmented Generation using Latent Routing of LoRA Adapters	Zhan Su et.al.	2511.17044	null
2025-11-21	VLM-Augmented Degradation Modeling for Image Restoration Under Adverse Weather Conditions	Qianyi Shao et.al.	2511.16998	null
2025-11-21	RadioKMoE: Knowledge-Guided Radiomap Estimation with Kolmogorov-Arnold Networks and Mixture-of-Experts	Fupei Guo et.al.	2511.16986	null
2025-11-21	MicroMoE: Fine-Grained Load Balancing for Mixture-of-Experts with Token Scheduling	Chenqi Zhao et.al.	2511.16947	null
2025-11-20	Search for the charmonium weak decay $J/ψ\to\bar{D}^0\bar{K}^{0}+{\rm c.c.}$	BESIII Collaboration et.al.	2511.16083	null
2025-11-20	Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution	Xiao He et.al.	2511.16024	null
2025-11-19	AquaSentinel: Next-Generation AI System Integrating Sensor Networks for Urban Underground Water Pipeline Anomaly Detection via Collaborative MoE-LLM Agent Architecture	Qiming Guo et.al.	2511.15870	null
2025-11-19	MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping	Yushi Huang et.al.	2511.15690	null
2025-11-19	Search for the lepton number violating process $Ξ^- \rightarrow Σ^+ e^- e^- +c.c.$	BESIII Collaboration et.al.	2511.15394	null
2025-11-19	VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation	Tairan He et.al.	2511.15200	null
2025-11-19	GPU-Initiated Networking for NCCL	Khaled Hamidouche et.al.	2511.15076	null
2025-11-19	WiCo-PG: Wireless Channel Foundation Model for Pathloss Map Generation via Synesthesia of Machines	Mingran Sun et.al.	2511.15030	null
2025-11-19	WiCo-MG: Wireless Channel Foundation Model for Multipath Generation via Synesthesia of Machines	Zengrui Han et.al.	2511.15026	null
2025-11-19	Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference	Kexin Chu et.al.	2511.15015	null
2025-11-18	HMC: Learning Heterogeneous Meta-Control for Contact-Rich Loco-Manipulation	Lai Wei et.al.	2511.14756	null
2025-11-18	Towards Stable and Structured Time Series Generation with Perturbation-Aware Flow Matching	Jintao Zhang et.al.	2511.14488	null
2025-11-18	MoE-SpeQ: Speculative Quantized Decoding with Proactive Expert Prefetching and Offloading for Mixture-of-Experts	Wenfeng Wang et.al.	2511.14102	null
2025-11-18	FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration	Jingren Liu et.al.	2511.14099	null
2025-11-18	SMGeo: Cross-View Object Geo-Localization with Grid-Level Mixture-of-Experts	Fan Zhang et.al.	2511.14093	null
2025-11-17	MoMoE: A Mixture of Expert Agent Model for Financial Sentiment Analysis	Peng Shu et.al.	2511.13983	null
2025-11-17	InterMoE: Individual-Specific 3D Human Interaction Generation via Dynamic Temporal-Selective MoE	Lipeng Wang et.al.	2511.13488	null
2025-11-18	YOLO Meets Mixture-of-Experts: Adaptive Expert Routing for Robust Object Detection	Ori Meiraz et.al.	2511.13344	null
2025-11-17	Skeletons Speak Louder than Text: A Motion-Aware Pretraining Paradigm for Video-Based Person Re-Identification	Rifen Lin et.al.	2511.13150	null
2025-11-17	Self-Adaptive Graph Mixture of Models	Mohit Meena et.al.	2511.13062	null
2025-11-17	Tokenize Once, Recommend Anywhere: Unified Item Tokenization for Multi-domain LLM-based Recommendation	Yu Hou et.al.	2511.12922	null
2025-11-17	Simple Lines, Big Ideas: Towards Interpretable Assessment of Human Creativity from Drawings	Zihao Lin et.al.	2511.12880	null
2025-11-16	Connectivity-Guided Sparsification of 2-FWL GNNs: Preserving Full Expressivity with Improved Efficiency	Rongqin Chen et.al.	2511.12838	null
2025-11-16	Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data	Yunxin Li et.al.	2511.12609	null
2025-11-16	SEMC: Structure-Enhanced Mixture-of-Experts Contrastive Learning for Ultrasound Standard Plane Recognition	Qing Cai et.al.	2511.12559	null
2025-11-16	MdaIF: Robust One-Stop Multi-Degradation-Aware Image Fusion with Language-Driven Semantics	Jing Li et.al.	2511.12525	null
2025-11-16	MOON2.0: Dynamic Modality-balanced Multimodal Representation Learning for E-commerce Product Understanding	Zhanheng Nie et.al.	2511.12449	null
2025-11-16	Self-Supervised Visual Prompting for Cross-Domain Road Damage Detection	Xi Xiao et.al.	2511.12410	null
2025-11-15	SAC-MoE: Reinforcement Learning with Mixture-of-Experts for Control of Hybrid Dynamical Systems with Uncertainty	Leroy D’Souza et.al.	2511.12361	null
2025-11-15	AMR-MoEGA: Antimicrobial Resistance Prediction using Mixture of Experts and Genetic Algorithms	Anshul Bagaria et.al.	2511.12223	null
2025-11-15	ViTE: Virtual Graph Trajectory Expert Router for Pedestrian Trajectory Prediction	Ruochen Li et.al.	2511.12214	null
2025-11-14	FarSkip-Collective: Unhobbling Blocking Communication in Mixture of Experts Models	Yonatan Dukler et.al.	2511.11505	null
2025-11-14	Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification	Qinghao Gao et.al.	2511.11460	null
2025-11-14	SPOT: Single-Shot Positioning via Trainable Near-Field Rainbow Beamforming	Yeyue Cai et.al.	2511.11391	null
2025-11-14	Parameter-Efficient MoE LoRA for Few-Shot Multi-Style Editing	Cong Cao et.al.	2511.11236	null
2025-11-14	DoReMi: A Domain-Representation Mixture Framework for Generalizable 3D Understanding	Mingwei Xing et.al.	2511.11232	null
2025-11-14	ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization	Anzhe Cheng et.al.	2511.10971	null
2025-11-14	Go-UT-Bench: A Fine-Tuning Dataset for LLM-Based Unit Test Generation in Go	Yashshi Pipalani et.al.	2511.10868	null
2025-11-13	Generalizable Slum Detection from Satellite Imagery with Mixture-of-Experts	Sumin Lee et.al.	2511.10300	null
2025-11-13	RobIA: Robust Instance-aware Continual Test-time Adaptation for Deep Stereo	Jueun Ko et.al.	2511.10107	null
2025-11-13	BuddyMoE: Exploiting Expert Redundancy to Accelerate Memory-Constrained Mixture-of-Experts Inference	Yun Wang et.al.	2511.10054	null
2025-11-14	HI-TransPA: Hearing Impairments Translation Personal Assistant	Zhiming Ma et.al.	2511.09915	null
2025-11-13	ConSurv: Multimodal Continual Learning for Survival Analysis	Dianzhi Yu et.al.	2511.09853	null
2025-11-11	Let the Experts Speak: Improving Survival Prediction & Calibration via Mixture-of-Experts Heads	Todd Morrill et.al.	2511.09567	null
2025-11-12	SMF-VO: Direct Ego-Motion Estimation via Sparse Motion Fields	Sangheon Yang et.al.	2511.09072	null
2025-11-12	UniMM-V2X: MoE-Enhanced Multi-Level Fusion for End-to-End Cooperative Autonomous Driving	Ziyi Song et.al.	2511.09013	null
2025-11-12	Selective Sinkhorn Routing for Improved Sparse Mixture of Experts	Duc Anh Nguyen et.al.	2511.08972	null
2025-11-12	Bayesian Mixture of Experts For Large Language Models	Maryam Dialameh et.al.	2511.08968	null
2025-11-12	An Improved Dual-Attention Transformer-LSTM for Small-Sample Prediction of Modal Frequency and Actual Anchor Radius in Micro Hemispherical Resonator Design	Yuyi Yao et.al.	2511.08900	null
2025-11-11	OmniAID: Decoupling Semantic and Artifacts for Universal AI-Generated Image Detection in the Wild	Yuncheng Guo et.al.	2511.08423	null
2025-11-11	Text-based Aerial-Ground Person Retrieval	Xinyu Zhou et.al.	2511.08369	null
2025-11-14	Towards Non-Stationary Time Series Forecasting with Temporal Stabilization and Frequency Differencing	Junkai Lu et.al.	2511.08229	null
2025-11-13	National Institute on Aging PREPARE Challenge: Early Detection of Cognitive Impairment Using Speech – The SpeechCARE Solution	Maryam Zolnoori et.al.	2511.08132	null
2025-11-13	Information Capacity: Evaluating the Efficiency of Large Language Models via Text Compression	Cheng Yuan et.al.	2511.08066	null
2025-11-11	TouchWalker: Real-Time Avatar Locomotion from Touchscreen Finger Walking	Geuntae Park et.al.	2511.07860	null
2025-11-10	One Router to Route Them All: Homogeneous Expert Routing for Heterogeneous Graph Transformers	Georgiy Shakirov et.al.	2511.07603	null
2025-11-12	Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs	Zhongyang Li et.al.	2511.07419	null
2025-11-11	Surgical Agent Orchestration Platform for Voice-directed Patient Data Interaction	Hyeryun Park et.al.	2511.07392	null
2025-11-10	AgenticSciML: Collaborative Multi-Agent Systems for Emergent Discovery in Scientific Machine Learning	Qile Jiang et.al.	2511.07262	null
2025-11-10	Two Heads are Better than One: Distilling Large Language Model Features Into Small Models with Feature Decomposition and Mixture	Tianhao Fu et.al.	2511.07110	null
2025-11-10	CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition	Hung-Yang Sung et.al.	2511.06860	null
2025-11-10	S-DAG: A Subject-Based Directed Acyclic Graph for Multi-Agent Heterogeneous Reasoning	Jiangwen Dong et.al.	2511.06727	null
2025-11-10	Multi-Modal Continual Learning via Cross-Modality Adapters and Representation Alignment with Knowledge Preservation	Evelyn Chee et.al.	2511.06723	null
2025-11-09	Route Experts by Sequence, not by Token	Tiansheng Wen et.al.	2511.06494	null
2025-11-09	HyMoERec: Hybrid Mixture-of-Experts for Sequential Recommendation	Kunrong Li et.al.	2511.06388	null
2025-11-09	DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation	Speed Zhu et.al.	2511.06307	null
2025-11-09	A Mixture-of-Experts Framework with Log-Logistic Components for Survival Analysis on Histopathology Images	Ardhendu Sekhar et.al.	2511.06266	null
2025-11-08	MoSKA: Mixture of Shared KV Attention for Efficient Long-Sequence LLM Inference	Myunghyun Rhee et.al.	2511.06010	null
2025-11-08	DiA-gnostic VLVAE: Disentangled Alignment-Constrained Vision Language Variational AutoEncoder for Robust Radiology Reporting with Missing Modalities	Nagur Shareef Shaik et.al.	2511.05968	null
2025-11-08	MoEGCL: Mixture of Ego-Graphs Contrastive Representation Learning for Multi-View Clustering	Jian Zhu et.al.	2511.05876	null
2025-11-08	In-depth Analysis on Caching and Pre-fetching in Mixture of Experts Offloading	Shuning Lin et.al.	2511.05814	null
2025-11-07	Beyond Redundancy: Diverse and Specialized Multi-Expert Sparse Autoencoder	Zhen Xu et.al.	2511.05745	null
2025-11-07	BrainCSD: A Hierarchical Consistency-Driven MoE Foundation Model for Unified Connectome Synthesis and Multitask Brain Trait Prediction	Xiongri Shen et.al.	2511.05630	null
2025-11-07	Quantum-Uncertainty-Governed Spin Dynamics in s-d Coupled Systems	Jie Zheng et.al.	2511.05388	null
2025-11-07	OvA-LP: A Simple and Efficient Framework for Federated Learning on Non-IID Data	Dongjin Park et.al.	2511.05028	null
2025-11-07	MoE-DP: An MoE-Enhanced Diffusion Policy for Robust Long-Horizon Robotic Manipulation with Skill Decomposition and Failure Recovery	Baiye Cheng et.al.	2511.05007	null
2025-11-06	PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed inference	Yushu Zhao et.al.	2511.04805	null
2025-11-06	GNN-MoE: Context-Aware Patch Routing using GNNs for Parameter-Efficient Domain Generalization	Mahmoud Soliman et.al.	2511.04008	null
2025-11-05	GMoPE: A Prompt-Expert Mixture Framework for Graph Foundation Models	Zhibin Wang et.al.	2511.03251	null
2025-11-04	From Solo to Symphony: Orchestrating Multi-Agent Collaboration with Single-Agent Demos	Xun Wang et.al.	2511.02762	null
2025-11-04	Verifying LLM Inference to Prevent Model Weight Exfiltration	Roy Rinberg et.al.	2511.02620	null
2025-11-04	RoME: Domain-Robust Mixture-of-Experts for MILP Solution Prediction across Domains	Tianle Pu et.al.	2511.02331	null
2025-11-04	FP8-Flow-MoE: A Casting-Free FP8 Recipe without Double Quantization Error	Fengjuan Wang et.al.	2511.02302	null
2025-11-04	Opportunistic Expert Activation: Batch-Aware Expert Routing for Faster Decode Without Retraining	Costin-Andrei Oncescu et.al.	2511.02237	null
2025-11-03	Towards Efficient Federated Learning of Networked Mixture-of-Experts for Mobile Edge Computing	Song Gao et.al.	2511.01743	null
2025-11-03	HMVLM: Human Motion-Vision-Language Model via MoE LoRA	Lei Hu et.al.	2511.01463	null
2025-11-04	CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing	Yifan Zhou et.al.	2511.01197	null
2025-11-03	DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection	Guoxin Ma et.al.	2511.01192	null
2025-11-01	OmniTrack++: Omnidirectional Multi-Object Tracking by Learning Large-FoV Trajectory Feedback	Kai Luo et.al.	2511.00510	null
2025-10-31	LongCat-Flash-Omni Technical Report	Meituan LongCat Team et.al.	2511.00279	null
2025-10-31	Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals	Xiangyu Fan et.al.	2510.27684	null
2025-10-31	RDMA Point-to-Point Communication for LLM Systems	Nandor Licker et.al.	2510.27656	null
2025-10-31	MoRE: 3D Visual Geometry Reconstruction Meets Mixture-of-Experts	Jingnan Gao et.al.	2510.27234	null
2025-10-31	AFM-Net: Advanced Fusing Hierarchical CNN Visual Priors with Global Sequence Modeling for Remote Sensing Image Scene Classification	Yuanhao Tang et.al.	2510.27155	null
2025-10-30	Adaptive Data Flywheel: Applying MAPE Control Loops to AI Agent Improvement	Aaditya Shukla et.al.	2510.27051	null
2025-10-30	Mixture-of-Transformers Learn Faster: A Theoretical Study on Classification Problems	Hongbo Li et.al.	2510.27004	null
2025-10-30	MoME: Mixture of Visual Language Medical Experts for Medical Imaging Segmentation	Arghavan Rezvani et.al.	2510.26996	null
2025-10-30	ExpertFlow: Adaptive Expert Scheduling and Memory Coordination for Efficient MoE Inference	Zixu Shen et.al.	2510.26730	null
2025-10-30	Low-Altitude UAV-Carried Movable Antenna for Joint Wireless Power Transfer and Covert Communications	Chuang Zhang et.al.	2510.26628	null
2025-10-30	Asymptotic meshes from $r$-variational adaptation methods for static problems in one dimension	Darith Hun et.al.	2510.26375	null
2025-10-30	MossNet: Mixture of State-Space Experts is a Multi-Head Attention	Shikhar Tuli et.al.	2510.26182	null
2025-10-29	Dual Mixture-of-Experts Framework for Discrete-Time Survival Analysis	Hyeonjun Lee et.al.	2510.26014	null
2025-10-31	Mixture-of-Experts Operator Transformer for Large-Scale PDE Pre-Training	Hong Wang et.al.	2510.25803	null
2025-10-29	Revisiting scalable sequential recommendation with Multi-Embedding Approach and Mixture-of-Experts	Qiushi Pan et.al.	2510.25285	null
2025-10-29	MoEntwine: Unleashing the Potential of Wafer-scale Chips for Large-scale Expert Parallel Inference	Xinru Tang et.al.	2510.25258	null
2025-10-29	H3M-SSMoEs: Hypergraph-based Multimodal Learning with LLM Reasoning and Style-Structured Mixture of Experts	Peilin Tan et.al.	2510.25091	null
2025-10-28	Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation	Inclusion AI et.al.	2510.24821	null
2025-10-28	Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance	Yujie Wei et.al.	2510.24711	null
2025-10-28	Language-Conditioned Representations and Mixture-of-Experts Policy for Robust Multi-Task Robotic Manipulation	Xiucheng Zhang et.al.	2510.24055	null
2025-10-26	Sparsity and Superposition in Mixture of Experts	Marmik Chaudhari et.al.	2510.23671	null
2025-10-27	EMTSF: Extraordinary Mixture of SOTA Models for Time Series Forecasting	Musleh Alharthi et.al.	2510.23396	null
2025-10-27	Rethinking GSPO: The Perplexity-Entropy Equivalence	Chi Liu et.al.	2510.23142	null
2025-10-27	Knocking-Heads Attention	Zhanchao Zhou et.al.	2510.23052	null
2025-10-27	Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts	Di Zhang et.al.	2510.23027	null
2025-10-27	MoEMeta: Mixture-of-Experts Meta Learning for Few-Shot Relational Learning	Han Wu et.al.	2510.23013	null
2025-10-25	Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation	Ling-Team et.al.	2510.22115	null
2025-10-23	Addressing Corner Cases in Autonomous Driving: A World Model-based Approach with Mixture of Experts and LLMs	Haicheng Liao et.al.	2510.21867	null
2025-10-24	PINN Balls: Scaling Second-Order Methods for PINNs with Domain Decomposition and Adaptive Sampling	Andrea Bonfanti et.al.	2510.21262	null
2025-10-24	Adaptive Graph Mixture of Residual Experts: Unsupervised Learning on Diverse Graphs with Heterogeneous Specialization	Yunlong Chu et.al.	2510.21207	null
2025-10-24	Controllable-LPMoE: Adapting to Challenging Object Segmentation via Dynamic Local Priors from Mixture-of-Experts	Yanguang Sun et.al.	2510.21114	null
2025-10-24	MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning	Siyong Chen et.al.	2510.21093	null
2025-10-23	Bayesian Jammer Localization with a Hybrid CNN and Path-Loss Mixture of Experts	Mariona Jaramillo-Civill et.al.	2510.20666	null
2025-10-23	xTime: Extreme Event Prediction with Hierarchical Knowledge Distillation and Expert Fusion	Quan Li et.al.	2510.20651	null
2025-10-23	Metis-HOME: Hybrid Optimized Mixture-of-Experts for Multimodal Reasoning	Xiaohan Lan et.al.	2510.20519	null
2025-10-23	A Parameter-Efficient Mixture-of-Experts Framework for Cross-Modal Geo-Localization	LinFeng Li et.al.	2510.20291	null
2025-10-23	AsyncHZP: Hierarchical ZeRO Parallelism with Asynchronous Scheduling for Scalable LLM Training	Huawei Bai et.al.	2510.20111	null
2025-10-22	HybridEP: Scaling Expert Parallelism to Cross-Datacenter Scenario via Hybrid Expert/Data Transmission	Weihao Yang et.al.	2510.19470	null
2025-10-22	MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs	Xinfeng Xia et.al.	2510.19366	null
2025-10-22	Modeling Turn-Taking with Semantically Informed Gestures	Varsha Suresh et.al.	2510.19350	null
2025-10-23	RailS: Load Balancing for All-to-All Communication in Distributed Mixture-of-Experts Training	Heng Xu et.al.	2510.19262	null
2025-10-22	A Design Science Blueprint for an Orchestrated AI Assistant in Doctoral Supervision	Teo Susnjak et.al.	2510.19227	null
2025-10-23	MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting	In-Hwan Jin et.al.	2510.19210	null
2025-10-25	Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model	Ling Team et.al.	2510.18855	null
2025-10-21	Unifying and Enhancing Graph Transformers via a Hierarchical Mask Framework	Yujie Xing et.al.	2510.18825	null
2025-10-21	Noise-Conditioned Mixture-of-Experts Framework for Robust Speaker Verification	Bin Gu et.al.	2510.18533	null
2025-10-21	Training Diverse Graph Experts for Ensembles: A Systematic Empirical Study	Gangda Deng et.al.	2510.18370	null
2025-10-21	DeepSeek-OCR: Contexts Optical Compression	Haoran Wei et.al.	2510.18234	null
2025-10-22	L-MoE: End-to-End Training of a Lightweight Mixture of Low-Rank Adaptation Experts	Shihao Ji et.al.	2510.17898	null
2025-10-20	Towards 3D Objectness Learning in an Open World	Taichi Liu et.al.	2510.17686	null
2025-10-20	Intelligent Communication Mixture-of-Experts Boosted-Medical Image Segmentation Foundation Model	Xinwei Zhang et.al.	2510.17684	null
2025-10-20	Learned Inertial Odometry for Cycling Based on Mixture of Experts Algorithm	Hao Qiao et.al.	2510.17604	null
2025-10-23	Photon radiation induced by rescattering in strong-interacting medium with a magnetic field	Yue Zhang et.al.	2510.17597	null
2025-10-20	ReXMoE: Reusing Experts with Minimal Overhead in Mixture-of-Experts	Zheyue Tan et.al.	2510.17483	null
2025-10-19	Leave It to the Experts: Detecting Knowledge Distillation via MoE Expert Signatures	Pingzhi Li et.al.	2510.16968	null
2025-10-19	End-to-end Listen, Look, Speak and Act	Siyin Wang et.al.	2510.16756	null
2025-10-18	NeurIPT: Foundation Model for Neural Interfaces	Zitao Fang et.al.	2510.16548	link
2025-10-18	Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of Experts	Yongxiang Hua et.al.	2510.16448	null
2025-10-18	Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures	Minh-Khoi Nguyen-Nhat et.al.	2510.16411	null
2025-10-17	Expert Merging in Sparse Mixture of Experts with Nash Bargaining	Dung V. Nguyen et.al.	2510.16138	null
2025-10-17	Human or AI? Comparing Design Thinking Assessments by Teaching Assistants and Bots	Sumbul Khan et.al.	2510.16069	null
2025-10-17	Mixture of Experts Approaches in Dense Retrieval Tasks	Effrosyni Sokli et.al.	2510.15683	null
2025-10-17	FlexiReID: Adaptive Mixture of Expert for Multi-Modal Person Re-Identification	Zhen Sun et.al.	2510.15595	null
2025-10-17	Backdoor or Manipulation? Graph Mixture of Experts Can Defend Against Various Graph Adversarial Attacks	Yuyuan Feng et.al.	2510.15333	null
2025-10-17	MTmixAtt: Integrating Mixture-of-Experts with Multi-Mix Attention for Large-Scale Recommendation	Xianyang Qi et.al.	2510.15286	null
2025-10-17	Adaptive Individual Uncertainty under Out-Of-Distribution Shift with Expert-Routed Conformal Prediction	Amitesh Badkul et.al.	2510.15233	null
2025-10-16	Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models	Guinan Su et.al.	2510.14853	null
2025-10-16	MergeMoE: Efficient Compression of MoE Models via Expert Output Merging	Ruijie Miao et.al.	2510.14436	null
2025-10-16	Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning	Weijie Shen et.al.	2510.14300	null
2025-10-16	MACE: Mixture-of-Experts Accelerated Coordinate Encoding for Large-Scale Scene Localization and Rendering	Mingkai Liu et.al.	2510.14251	null
2025-10-16	Demonstrating Exoplanet Transit Photometry from Space with a 15-mm Aperture Optical Navigation Camera on Hayabusa2	Koki Yumoto et.al.	2510.14229	null
2025-10-15	REAP the Experts: Why Pruning Prevails for One-Shot MoE compression	Mike Lasby et.al.	2510.13999	null
2025-10-15	Steer-MoE: Efficient Audio-Language Alignment with a Mixture-of-Experts Steering Module	Ruitao Feng et.al.	2510.13558	null
2025-10-15	ExpressNet-MoE: A Hybrid Deep Neural Network for Emotion Recognition	Deeptimaan Banerjee et.al.	2510.13493	null
2025-10-15	Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers	Xin Zhao et.al.	2510.13462	null
2025-10-15	Toward Efficient Inference Attacks: Shadow Model Sharing via Mixture-of-Experts	Li Bai et.al.	2510.13451	null
2025-10-15	UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE	Zhenyu Liu et.al.	2510.13344	null
2025-10-15	GatePro: Parameter-Free Expert Selection Optimization for Mixture-of-Experts Models	Chen Zheng et.al.	2510.13079	null
2025-10-17	Scope: Selective Cross-modal Orchestration of Visual Perception Experts	Tianyu Zhang et.al.	2510.12974	null
2025-10-14	Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency without Model Sweeps	Do Tien Hai et.al.	2510.12744	null
2025-10-14	Proof of Cloud: Data Center Execution Assurance for Confidential VMs	Filip Rezabek et.al.	2510.12469	null
2025-10-14	MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts	Yushu Zhao et.al.	2510.12357	null
2025-10-14	DE3S: Dual-Enhanced Soft-Sparse-Shape Learning for Medical Early Time-Series Classification	Tao Xie et.al.	2510.12214	null
2025-10-13	Enhancing the Quality of 3D Lunar Maps Using JAXA’s Kaguya Imagery	Yumi Iwashita et.al.	2510.11817	null
2025-10-13	Beyond ‘Templates’: Category-Agnostic Object Pose, Size, and Shape Estimation from a Single View	Jinyu Zhang et.al.	2510.11687	null
2025-10-13	Robust Ego-Exo Correspondence with Long-Term Memory	Yijun Hu et.al.	2510.11417	null
2025-10-13	Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers	Wenhan Ma et.al.	2510.11370	null
2025-10-13	What to expect from microscopic nuclear modelling for $k_{\rm eff}$ calculations?	D. Rochman et.al.	2510.11256	null
2025-10-13	DND: Boosting Large Language Models with Dynamic Nested Depth	Tieyuan Chen et.al.	2510.11001	null
2025-10-13	MC#: Mixture Compressor for Mixture-of-Experts Large Models	Wei Huang et.al.	2510.10962	null
2025-10-12	Crisis-Aware Regime-Conditioned Diffusion with CVaR Allocation	Ali Atiah Alzahrani et.al.	2510.10807	null
2025-10-12	Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection	Shizhen Zhao et.al.	2510.10584	null
2025-10-12	Hierarchical LoRA MoE for Efficient CTR Model Scaling	Zhichen Zeng et.al.	2510.10432	null
2025-10-11	SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference	Liangkun Chen et.al.	2510.10302	null
2025-10-10	MTMD: A Multi-Task Multi-Domain Framework for Unified Ad Lightweight Ranking at Pinterest	Xiao Yang et.al.	2510.09857	null
2025-10-10	ARROW: An Adaptive Rollout and Routing Method for Global Weather Forecasting	Jindong Tian et.al.	2510.09734	null
2025-10-10	Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation	Youwei Zheng et.al.	2510.09094	null
2025-10-09	LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution	Xiaohui Li et.al.	2510.08771	null
2025-10-13	dInfer: An Efficient Inference Framework for Diffusion Language Models	Yuxin Ma et.al.	2510.08666	null
2025-10-08	Dynamic Mixture-of-Experts for Visual Autoregressive Model	Jort Vincenti et.al.	2510.08629	null
2025-10-09	FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts	Heming Zou et.al.	2510.08396	null
2025-10-09	Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization	Jason Bohne et.al.	2510.08256	null
2025-10-09	From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill	Gunjun Lee et.al.	2510.08055	null
2025-10-09	Recycling Pretrained Checkpoints: Orthogonal Growth of Mixture-of-Experts for Efficient Large Language Model Pre-Training	Ruizhe Wang et.al.	2510.08008	null
2025-10-09	Multilingual Knowledge Graph Completion via Efficient Multilingual Knowledge Sharing	Cunli Mao et.al.	2510.07736	null
2025-10-09	Mutual Learning for Hashing: Unlocking Strong Hash Functions from Weak Supervision	Xiaoxu Ma et.al.	2510.07703	null
2025-10-09	LiveThinking: Enabling Real-Time Efficient Reasoning for AI-Powered Livestreaming via Reinforcement Learning	Yuhan Sun et.al.	2510.07685	null
2025-10-08	MoGU: Mixture-of-Gaussians with Uncertainty-based Gating for Time Series Forecasting	Yoli Shavit et.al.	2510.07459	null
2025-10-08	Less is More: Strategic Expert Selection Outperforms Ensemble Complexity in Traffic Forecasting	Walid Guettala et.al.	2510.07426	null
2025-10-08	Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts	Fangshuo Liao et.al.	2510.07205	null
2025-10-08	A Bridge from Audio to Video: Phoneme-Viseme Alignment Allows Every Face to Speak Multiple Languages	Zibo Su et.al.	2510.06612	null
2025-10-09	SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation	Shuang Cheng et.al.	2510.06303	null
2025-10-06	Reproducibility Study of “XRec: Large Language Models for Explainable Recommendation”	Ranjan Mishra et.al.	2510.06275	null
2025-10-10	Barbarians at the Gate: How AI is Upending Systems Research	Audrey Cheng et.al.	2510.06189	null
2025-10-07	CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credits	Kangyu Wang et.al.	2510.06133	null
2025-10-07	Rasterized Steered Mixture of Experts for Efficient 2D Image Regression	Yi-Hsin Li et.al.	2510.05814	null
2025-10-07	Mixture of Neuron Experts	Runxi Cheng et.al.	2510.05781	null
2025-10-07	MSF-SER: Enriching Acoustic Modeling with Multi-Granularity Semantics for Speech Emotion Recognition	Haoxun Li et.al.	2510.05749	null
2025-10-07	Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting	Zhongkai Yu et.al.	2510.05497	null
2025-10-06	Stratum: System-Hardware Co-Design with Tiered Monolithic 3D-Stackable DRAM for Efficient MoE Serving	Yue Pan et.al.	2510.05245	null
2025-10-06	REN: Anatomically-Informed Mixture-of-Experts for Interstitial Lung Disease Diagnosis	Alec K. Peltekian et.al.	2510.04923	null
2025-10-06	LMM-Incentive: Large Multimodal Model-based Incentive Design for User-Generated Content in Web 3.0	Jinbo Wen et.al.	2510.04765	null
2025-10-06	Multilingual Routing in Mixture-of-Experts	Lucas Bandarkar et.al.	2510.04694	null
2025-10-06	Improving Multimodal Brain Encoding Model with Dynamic Subject-awareness Routing	Xuanhua Yin et.al.	2510.04670	null
2025-10-06	Compressed Convolutional Attention: Efficient Attention in a Compressed Latent Space	Tomas Figliolia et.al.	2510.04476	null
2025-10-05	HoRA: Cross-Head Low-Rank Adaptation with Joint Hypernetworks	Nghiem T. Diep et.al.	2510.04295	null
2025-10-05	SliceMoE: Routing Embedding Slices Instead of Tokens for Fine-Grained and Balanced Transformer Scaling	Harshil Vejendla et.al.	2510.04286	null
2025-10-05	MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition	Umberto Cappellazzo et.al.	2510.04136	null
2025-10-03	Mixture of Many Zero-Compute Experts: A High-Rate Quantization Theory Perspective	Yehuda Dar et.al.	2510.03151	null
2025-10-02	ElasticMoE: An Efficient Auto Scaling Method for Mixture-of-Experts Models	Gursimran Singh et.al.	2510.02613	null
2025-10-02	UpSafe$^\circ$C: Upcycling for Controllable Safety in Large Language Models	Yuhao Sun et.al.	2510.02194	null
2025-10-02	LadderMoE: Ladder-Side Mixture of Experts Adapters for Bronze Inscription Recognition	Rixin Zhou et.al.	2510.01651	null
2025-10-01	Dirichlet-Prior Shaping: Guiding Expert Specialization in Upcycled MoEs	Leyla Mirvakhabova et.al.	2510.01185	null
2025-10-01	Learning Compact Representations of LLM Abilities via Item Response Theory	Jianhao Chen et.al.	2510.00844	null
2025-10-01	Graph Integrated Multimodal Concept Bottleneck Model	Jiakai Lin et.al.	2510.00701	null
2025-10-01	FAME: Adaptive Functional Attention with Expert Routing for Function-on-Function Regression	Yifei Gao et.al.	2510.00621	null
2025-10-01	Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning	Minghao Yang et.al.	2510.00570	null
2025-09-30	FlowMoE: A Scalable Pipeline Scheduling Framework for Distributed Mixture-of-Experts Training	Yunqi Gao et.al.	2510.00207	null
2025-09-30	Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization	Yaoxiang Wang et.al.	2509.26520	null
2025-09-30	Nephrobase Cell+: Multimodal Single-Cell Foundation Model for Decoding Kidney Biology	Chenyu Li et.al.	2509.26223	null
2025-09-30	Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline	Haiyang Li et.al.	2509.25991	null
2025-09-30	UniMMAD: Unified Multi-Modal and Multi-Class Anomaly Detection via MoE-Driven Feature Decompression	Yuan Zhao et.al.	2509.25934	null
2025-09-30	Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel	Chuanyang Zheng et.al.	2509.25913	null
2025-10-01	A Multimodal LLM Approach for Visual Question Answering on Multiparametric 3D Brain MRI	Arvind Murari Vepa et.al.	2509.25889	null
2025-09-30	Collaborative Compression for Large-Scale MoE Deployment on Edge	Yixiao Chen et.al.	2509.25689	null
2025-09-30	LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts	Yuan Zhuang et.al.	2509.25684	null
2025-09-30	Guiding Mixture-of-Experts with Temporal Multimodal Interactions	Xing Han et.al.	2509.25678	null
2025-09-29	K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model	Bangwei Guo et.al.	2509.25594	null
2025-09-29	GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference	Yu Han et.al.	2509.25041	null
2025-09-29	LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection	Bao-Ngoc Dao et.al.	2509.24547	null
2025-11-03	Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding	Zhibin Wang et.al.	2508.21706	null
2025-07-22	Mixture-of-Expert Variational Autoencoders for Cross-Modality Embedding of Type Ia Supernova Data	Yunyi Shen et.al.	2507.16817	null
2025-07-22	Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training	Zixiao Huang et.al.	2507.16274	null
2025-07-21	Applying multimodal learning to Classify transient Detections Early (AppleCiDEr) I: Data set, methods, and infrastructure	Alexandra Junell et.al.	2507.16088	null
2025-07-21	Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation	Alessandro B. Melchiorre et.al.	2507.15826	null
2025-07-21	RankMixer: Scaling Up Ranking Models in Industrial Recommenders	Jie Zhu et.al.	2507.15551	null
2025-07-21	The New LLM Bottleneck: A Systems Perspective on Latent Attention and Mixture-of-Experts	Sungmin Yun et.al.	2507.15465	null
2025-07-21	Universal crystal material property prediction via multi-view geometric fusion in graph transformers	Liang Zhang et.al.	2507.15303	null
2025-07-20	CoMoCAVs: Cohesive Decision-Guided Motion Planning for Connected and Autonomous Vehicles with Multi-Policy Reinforcement Learning	Pan Hu et.al.	2507.14903	null
2025-07-23	GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving	Chi Wan et.al.	2507.14456	null
2025-07-18	SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing	Yingying Zhang et.al.	2507.13812	null
2025-07-17	Apple Intelligence Foundation Language Models: Tech Report 2025	Hanzhi Zhou et.al.	2507.13575	null
2025-07-17	R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning	Xiaohan Guo et.al.	2507.13107	null
2025-07-16	Astro-MoE: Mixture of Experts for Multiband Astronomical Time Series	Martina Cádiz-Leyton et.al.	2507.12611	null
2025-07-16	Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models	Gen Luo et.al.	2507.12566	null
2025-07-16	Mixture of Raytraced Experts	Andrea Perin et.al.	2507.12419	null
2025-07-16	CorrMoE: Mixture of Experts with De-stylization Learning for Cross-Scene and Cross-Domain Correspondence Pruning	Peiwen Xia et.al.	2507.11834	null
2025-07-09	The AI Shadow War: SaaS vs. Edge Computing Architectures	Rhea Pritham Marpu et.al.	2507.11545	null
2025-07-15	Mixture of Experts in Large Language Models	Danyang Zhang et.al.	2507.11181	null
2025-07-15	Atmos-Bench: 3D Atmospheric Structures for Climate Insight	Tianchi Xu et.al.	2507.11085	null
2025-07-14	DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models	Luolin Xiong et.al.	2507.09955	null
2025-07-14	ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization	Huilai Li et.al.	2507.09945	null
2025-07-14	Multi-residual Mixture of Experts Learning for Cooperative Control in Multi-vehicle Systems	Vindula Jayawardana et.al.	2507.09836	null
2025-07-18	Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts	Aakash Tripathi et.al.	2507.09754	null
2025-07-13	Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive	You Huang et.al.	2507.09612	null
2025-07-12	PPJudge: Towards Human-Aligned Assessment of Artistic Painting Process	Shiqi Jiang et.al.	2507.09242	null
2025-07-11	SSH-Passkeys: Leveraging Web Authentication for Passwordless SSH	Moe Kayali et.al.	2507.09022	null
2025-07-11	BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity	Chenyang Song et.al.	2507.08771	null
2025-07-11	CircFormerMoE: An End-to-End Deep Learning Framework for Circular RNA Splice Site Detection and Pairing in Plant Genomes	Tianyou Jiang et.al.	2507.08542	null
2025-07-11	White-Basilisk: A Hybrid Model for Code Vulnerability Detection	Ioannis Lamprou et.al.	2507.08540	null
2025-07-21	KAT-V1: Kwai-AutoThink Technical Report	Zizheng Zhan et.al.	2507.08297	null
2025-07-11	Data-Driven Dimensional Synthesis of Diverse Planar Four-bar Function Generation Mechanisms via Direct Parameterization	Woon Ryong Kim et.al.	2507.08269	null
2025-07-10	MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving	Lu Xu et.al.	2507.07818	null
2025-07-10	When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance	Peizhang Shao et.al.	2507.07748	null
2025-07-09	Leveraging Manifold Embeddings for Enhanced Graph Transformer Representations and Learning	Ankit Jyothish et.al.	2507.07335	null
2025-07-08	Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate	A. Bochkov et.al.	2507.07129	null
2025-07-07	Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding	Nidhi Bhatia et.al.	2507.07120	null
2025-06-03	Multi-level Mixture of Experts for Multimodal Entity Linking	Zhiwei Hu et.al.	2507.07108	null
2025-07-09	4KAgent: Agentic Any Image to 4K Super-Resolution	Yushen Zuo et.al.	2507.07105	null
2025-07-11	FlexOlmo: Open Language Models for Flexible Data Use	Weijia Shi et.al.	2507.07024	null
2025-07-09	Deep Disentangled Representation Network for Treatment Effect Estimation	Hui Meng et.al.	2507.06650	null
2025-07-09	SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference	Qian Chen et.al.	2507.06567	null
2025-07-09	MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models	Yiwen Liu et.al.	2507.06502	null
2025-07-08	Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation	Szymon Płotka et.al.	2507.06363	null
2025-07-08	Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis	Xintong Hu et.al.	2507.06116	null
2025-07-09	A Survey on Prompt Tuning	Zongqian Li et.al.	2507.06085	null
2025-07-08	Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation Detectors	Bing Wang et.al.	2507.05939	null
2025-07-08	What You Have is What You Track: Adaptive and Robust Multimodal Tracking	Yuedong Tan et.al.	2507.05899	null
2025-07-21	Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition	Zijin Gu et.al.	2507.05724	null
2025-07-08	Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach	Xiaobing Chen et.al.	2507.05685	null
2025-07-08	City-Level Foreign Direct Investment Prediction with Tabular Learning on Judicial Data	Tianxing Wu et.al.	2507.05651	null
2025-07-07	QMoE: A Quantum Mixture of Experts Framework for Scalable Quantum Neural Networks	Hoang-Quan Nguyen et.al.	2507.05190	null
2025-07-07	NTSFormer: A Self-Teaching Graph Transformer for Multimodal Cold-Start Node Classification	Jun Hu et.al.	2507.04870	null
2025-07-07	UrbanMind: Towards Urban General Intelligence via Tool-Enhanced Retrieval-Augmented Generation and Multilevel Optimization	Kai Yang et.al.	2507.04706	null
2025-07-07	DRAE: Dynamic Retrieval-Augmented Expert Networks for Lifelong Learning and Task Adaptation in Robotics	Yayu Long et.al.	2507.04661	null
2025-07-08	UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification	Xixi Wan et.al.	2507.04638	null
2025-07-07	Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts	Yun Wang et.al.	2507.04631	null
2025-07-06	Nile-Chat: Egyptian Language Models for Arabic and Latin Scripts	Guokan Shang et.al.	2507.04569	null
2025-07-22	Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge	Linshen Liu et.al.	2507.04123	null
2025-07-05	From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM	Xinyi Wu et.al.	2507.03868	null
2025-07-04	Decoupled Relative Learning Rate Schedules	Jan Ludziejewski et.al.	2507.03526	null
2025-07-03	Neural Inhibition Improves Dynamic Routing and Mixture of Experts	Will Y. Zou et.al.	2507.03221	null
2025-07-02	Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model!	Do-hyeon Yoon et.al.	2507.03014	null
2025-07-03	System-performance and cost modeling of Large Language Model training and inference	Wenzhe Guo et.al.	2507.02456	null
2025-07-03	NLP4Neuro: Sequence-to-sequence learning for neural population decoding	Jacob J. Morra et.al.	2507.02264	null
2025-07-02	MoIRA: Modular Instruction Routing Architecture for Multi-Task Robotics	Dmytro Kuzmenko et.al.	2507.01843	null
2025-07-02	GradMetaNet: An Equivariant Architecture for Learning on Gradients	Yoav Gelberg et.al.	2507.01649	null
2025-07-02	Mixtures of Neural Network Experts with Application to Phytoplankton Flow Cytometry Data	Ethan Pawl et.al.	2507.01375	null
2025-07-02	Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model	Chaoxiang Cai et.al.	2507.01351	null
2025-07-02	Dynamical Multimodal Fusion with Mixture-of-Experts for Localizations	Bohao Wang et.al.	2507.01337	null
2025-07-02	ExPaMoE: An Expandable Parallel Mixture of Experts for Continual Test-Time Adaptation	JianChao Zhao et.al.	2507.00502	null
2025-07-01	MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE	Geng Zhang et.al.	2507.00390	null
2025-06-30	Engineering NV Centers via Hydrogen-Driven Defect Chemistry in CVD Diamonds for Quantum Applications: NVHx Dissociations into NV, Origin of 468nm Center, and Cause of Brown Coloration	Mubashir Mansoor et.al.	2507.00300	null
2025-06-17	LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing	Wenbing Li et.al.	2507.00029	null
2025-06-30	MotionGPT3: Human Motion as a Second Modality	Bingfan Zhu et.al.	2506.24086	null
2025-06-30	MReg: A Novel Regression Model with MoE-based Video Feature Mining for Mitral Regurgitation Diagnosis	Zhe Liu et.al.	2506.23648	null
2025-06-30	Towards Building Private LLMs: Exploring Multi-Node Expert Parallelism on Apple Silicon for Mixture-of-Experts Large Language Model	Mu-Chi Chen et.al.	2506.23635	null
2025-07-01	Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging	Lujun Li et.al.	2506.23266	null
2025-06-29	External Data-Enhanced Meta-Representation for Adaptive Probabilistic Load Forecasting	Haoran Li et.al.	2506.23201	null
2025-06-29	Hierarchical Corpus-View-Category Refinement for Carotid Plaque Risk Grading in Ultrasound	Zhiyuan Zhu et.al.	2506.23108	null
2025-07-01	Hecto: Modular Sparse Experts for Adaptive and Interpretable Reasoning	Sanskar Pandey et.al.	2506.22919	null
2025-06-27	QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization	Danush Khanna et.al.	2506.22396	null
2025-06-27	Towards Distributed Neural Architectures	Aditya Cowsik et.al.	2506.22389	null
2025-06-27	MPipeMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism	Zheng Zhang et.al.	2506.22175	null
2025-07-09	DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE	Hang Shao et.al.	2506.21864	null
2025-06-21	AdaptGOT: A Pre-trained Model for Adaptive Contextual POI Representation Learning	Xiaobin Ren et.al.	2506.21612	null
2025-06-26	Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts	Jiajie Yang et.al.	2506.21328	null
2025-06-26	Learning to Skip the Middle Layers of Transformers	Tim Lawson et.al.	2506.21103	null
2025-06-26	Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning	Haodong Lu et.al.	2506.21035	null
2025-06-26	EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning	Xiao Zhang et.al.	2506.20986	null
2025-06-30	The Singapore Consensus on Global AI Safety Research Priorities	Yoshua Bengio et.al.	2506.20702	null
2025-06-17	Utility-Driven Speculative Decoding for Mixture-of-Experts	Anish Saxena et.al.	2506.20675	null
2025-06-25	Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration	Jiaxing Huang et.al.	2506.20282	null
2025-06-24	Integrating Pair Programming as a Work Practice	Nina Haugland Andersen et.al.	2506.19511	null
2025-07-05	The H$\alpha$ line as a probe of chromospheric magnetic fields	Harsh Mathur et.al.	2506.19510	null
2025-06-23	Multimodal Anomaly Detection with a Mixture-of-Experts	Christoph Willibald et.al.	2506.19077	null
2025-06-23	Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models	Zihan Wang et.al.	2506.18945	null
2025-06-23	Shift Happens: Mixture of Experts based Continual Adaptation in Federated Learning	Rahul Atul Bhope et.al.	2506.18789	null
2025-06-23	An Audio-centric Multi-task Learning Framework for Streaming Ads Targeting on Spotify	Shivam Verma et.al.	2506.18735	null
2025-06-23	Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks	Xiaodong Wu et.al.	2506.18543	null
2025-06-23	SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation	Zichong Li et.al.	2506.18349	null
2025-06-23	Sharpening the Spear: Adaptive Expert-Guided Adversarial Attack Against DRL-based Autonomous Driving Policies	Junchao Fan et.al.	2506.18304	null
2025-06-22	Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection	Zheng Zhan et.al.	2506.18145	null
2025-06-21	Incorporating Rather Than Eliminating: Achieving Fairness for Skin Disease Diagnosis Through Group-Specific Expert	Gelei Xu et.al.	2506.17787	null
2025-06-21	Physics-informed mixture of experts network for interpretable battery degradation trajectory computation amid second-life complexities	Xinghao Huang et.al.	2506.17755	null
2025-06-21	PDC-Net: Pattern Divide-and-Conquer Network for Pelvic Radiation Injury Segmentation	Xinyu Xiong et.al.	2506.17712	null
2025-06-20	SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification	Zhenglin Lai et.al.	2506.17368	null
2025-07-14	FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE	Khiem Le et.al.	2506.16600	null
2025-06-19	Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models	Daniel Fidel Harvey et.al.	2506.16419	null
2025-06-19	DCFNet: Doppler Correction Filter Network for Integrated Sensing and Communication in Multi-User MIMO-OFDM Systems	Hyeonho Noh et.al.	2506.16191	null
2025-06-17	Scaling Intelligence: Designing Data Centers for Next-Gen Language Models	Jesmin Jahan Tithi et.al.	2506.15006	null
2025-06-17	NeuroMoE: A Transformer-Based Mixture-of-Experts Framework for Multi-Modal Neurological Disorder Classification	Wajih Hassan Raza et.al.	2506.14970	null
2025-06-17	Narrowing the Gap between TEEs Threat Model and Deployment Strategies	Filip Rezabek et.al.	2506.14964	null
2025-05-31	Assembly of Experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors	Henrik Klagges et.al.	2506.14794	null
2025-06-19	Integrating Dynamical Systems Learning with Foundational Models: A Meta-Evolutionary AI Framework for Clinical Trials	Joseph Geraci et.al.	2506.14782	null
2025-06-17	GMT: General Motion Tracking for Humanoid Whole-Body Control	Zixuan Chen et.al.	2506.14770	null
2025-06-17	Exploring Speaker Diarization with Mixture of Experts	Gaobin Yang et.al.	2506.14750	null
2025-06-18	Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs	Ling Team et.al.	2506.14731	null
2025-09-23	GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors	Hengyuan Zhang et.al.	2506.14646	null
2025-06-17	Single-Example Learning in a Mixture of GPDMs with Latent Geometries	Jesse St. Amand et.al.	2506.14563	null
2025-06-30	MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation	Shen Yuan et.al.	2506.14436	link
2025-06-17	MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models	Hongyu Wang et.al.	2506.14435	null
2025-06-17	Less is More: Undertraining Experts Improves Model Upcycling	Stefan Horoi et.al.	2506.14126	null
2025-06-16	Load Balancing Mixture of Experts with Similarity Preserving Routers	Nabil Omi et.al.	2506.14038	null
2025-06-16	GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics	Qianzhong Chen et.al.	2506.14009	null
2025-06-16	MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention	MiniMax et.al.	2506.13585	link
2025-06-16	Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization	Guanghui Song et.al.	2506.13541	null
2025-07-04	EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization	Zhongqian Fu et.al.	2506.13329	link
2025-06-16	Breaking Thought Patterns: A Multi-Dimensional Reasoning Framework for LLMs	Xintong Tang et.al.	2506.13192	null
2025-06-19	Serving Large Language Models on Huawei CloudMatrix384	Pengfei Zuo et.al.	2506.12708	null
2025-06-14	Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts	Shengzhuang Chen et.al.	2506.12597	null
2025-06-14	Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control	Rongpeng Li et.al.	2506.12453	null
2025-06-17	HarMoEny: Efficient Multi-GPU Inference of MoE Models	Zachary Doucet et.al.	2506.12417	null
2025-06-14	Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model	Chong Li et.al.	2506.12388	null
2025-06-13	Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?	Houyi Li et.al.	2506.12119	null
2025-06-13	Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution	Zhangkai Ni et.al.	2506.11823	link
2025-05-21	MoTE: Mixture of Task-specific Experts for Pre-Trained Model-Based Class-incremental Learning	Linjie Li et.al.	2506.11038	null
2025-04-23	Test code generation at Ericsson using Program Analysis Augmented Fine Tuned LLMs	Sai Krishna et.al.	2506.11006	null
2025-06-12	Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts	Zaijing Li et.al.	2506.10357	null
2025-06-12	Technical Report with Proofs for A Full Picture in Conformance Checking: Efficiently Summarizing All Optimal Alignments	Philipp Bär et.al.	2506.10345	null
2025-06-13	A Survey of Generative Categories and Techniques in Multimodal Large Language Models	Longzhen Han et.al.	2506.10016	null
2025-06-11	GigaChat Family: Efficient Russian Language Modeling Through Mixture of Experts Architecture	GigaChat team et.al.	2506.09440	null
2025-06-11	DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts	Yuchen Feng et.al.	2506.09351	null
2025-06-11	Ming-Omni: A Unified Multimodal Model for Perception and Generation	Inclusion AI et.al.	2506.09344	link
2025-06-10	CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks	Yixuan Li et.al.	2506.08931	null
2025-06-10	CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA	Jiale Dong et.al.	2506.08496	link
2025-06-11	MedMoE: Modality-Specialized Mixture of Experts for Medical Vision-Language Understanding	Shivang Chopra et.al.	2506.08356	null
2025-06-09	Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting	Timothée Hornek et.al.	2506.08113	null
2025-06-11	STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation	Yiming Wang et.al.	2506.08054	link
2025-06-09	A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling	Jacob Helwig et.al.	2506.07969	link
2025-06-09	New Insights into the T Tauri Binary Separation Distribution	Caleb Eastlund et.al.	2506.07938	null
2025-06-09	M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration	Yongzhen Wang et.al.	2506.07814	null
2025-07-23	MIRA: Medical Time Series Foundation Model for Real-World Health Data	Hao Li et.al.	2506.07584	null
2025-06-11	MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization	Ken Yaggel et.al.	2506.07563	link
2025-06-09	MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts	Wei Tao et.al.	2506.07533	null
2025-06-09	Graph-of-Causal Evolution: Challenging Chain-of-Model for Reasoning	Libo Wang et.al.	2506.07501	null
2025-06-09	MoE-GPS: Guidelines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing	Haiyue Ma et.al.	2506.07366	null
2025-06-08	UNO: Unified Self-Supervised Monocular Odometry for Platform-Agnostic Deployment	Wentao Zhao et.al.	2506.07013	null
2025-06-07	High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations	Ziwei Li et.al.	2506.06858	null
2025-06-07	Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning	Yuan Yuan et.al.	2506.06694	null
2025-06-25	SMAR: Soft Modality-Aware Routing Strategy for MoE-based Multimodal Large Language Models Preserving Language Capabilities	Guoyang Xia et.al.	2506.06406	null
2025-05-27	MoE-Gyro: Self-Supervised Over-Range Reconstruction and Denoising for MEMS Gyroscopes	Feiyang Pan et.al.	2506.06318	null
2025-06-06	Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization	Jonathan Yang et.al.	2506.06196	null
2025-06-06	MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models	Jie Cao et.al.	2506.05928	null
2025-06-06	dots.llm1 Technical Report	Bi Huo et.al.	2506.05767	null
2025-06-05	Mixture-of-Experts Meets In-Context Reinforcement Learning	Wenhao Wu et.al.	2506.05426	null
2025-06-20	Kinetics: Rethinking Test-Time Scaling Laws	Ranajoy Sadhukhan et.al.	2506.05333	link
2025-06-05	Lifelong Evolution: Collaborative Learning between Large and Small Language Models for Continuous Emergent Fake News Detection	Ziyi Zhou et.al.	2506.04739	null
2025-06-09	FlashDMoE: Fast Distributed MoE in a Single Kernel	Osayamen Jonathan Aimuyo et.al.	2506.04667	link
2025-06-04	Out-of-Distribution Graph Models Merging	Yidi Wang et.al.	2506.03674	null
2025-06-04	Resolving Task Objective Conflicts in Unified Multimodal Understanding and Generation via Task-Aware Mixture-of-Experts	Jiaxing Zhang et.al.	2506.03591	null
2025-06-04	PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs	Ze Yu Zhang et.al.	2506.02965	null
2025-06-03	Scaling Fine-Grained MoE Beyond 50B Parameters: Empirical Evaluation and Practical Insights	Jakub Krajewski et.al.	2506.02890	null
2025-06-03	Brain-Like Processing Pathways Form in Models With Heterogeneous Experts	Jack Cook et.al.	2506.02813	null
2025-06-04	MemoryOut: Learning Principal Features via Multimodal Sparse Filtering Network for Semi-supervised Video Anomaly Detection	Juntong Li et.al.	2506.02535	null
2025-06-03	MidPO: Dual Preference Optimization for Safety and Helpfulness in Large Language Models via a Mixture of Experts Framework	Yupeng Qi et.al.	2506.02460	null
2025-05-31	Enhancing Multimodal Continual Instruction Tuning with BranchLoRA	Duzhen Zhang et.al.	2506.02041	null
2025-06-02	SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model	Zhao Yang et.al.	2506.01833	link
2025-06-02	Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning	Ryotaro Kawata et.al.	2506.01656	null
2025-06-02	DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models	Jiancheng Ye et.al.	2506.01257	null
2025-06-01	Unlocking Personalized Knowledge in Federated Large Language Model: The Power of Mixture of Experts	Fan Liu et.al.	2506.00965	null
2025-05-31	FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts	Xinyi Wang et.al.	2506.00495	null
2025-05-30	Mixture-of-Experts for Personalized and Semantic-Aware Next Location Prediction	Shuai Liu et.al.	2505.24597	null
2025-06-11	Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis	Junzhuo Li et.al.	2505.24593	null
2025-05-30	Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer	Yilun Kong et.al.	2505.24378	link
2025-05-30	GradPower: Powering Gradients for Faster Language Model Pre-Training	Mingze Wang et.al.	2505.24275	null
2025-05-30	On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks	Mingze Wang et.al.	2505.24205	null
2025-06-02	Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts	Xuweiyi Chen et.al.	2505.23926	null
2025-06-09	Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert	Zhaokun Wang et.al.	2505.23868	null
2025-05-29	Revisiting Uncertainty Estimation and Calibration of Large Language Models	Linwei Tao et.al.	2505.23854	null
2025-05-28	EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models	Linglin Jing et.al.	2505.23830	null
2025-06-03	LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions	Hadi Askari et.al.	2505.23811	null
2025-05-29	From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents	Tobias Lindenbauer et.al.	2505.23422	link
2025-05-29	Context-Aware Semantic Communication for the Wireless Networks	Guangyuan Liu et.al.	2505.23249	null
2025-05-29	Two Is Better Than One: Rotations Scale LoRAs	Hongcan Guo et.al.	2505.23184	null
2025-05-28	HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer	Qi Cai et.al.	2505.22705	link
2025-05-28	Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts	Xue Zhang et.al.	2505.22582	null
2025-05-28	A Human-Centric Approach to Explainable AI for Personalized Education	Vinitra Swamy et.al.	2505.22541	link
2025-05-28	Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion	Kewen Chen et.al.	2505.22360	null
2025-05-28	Advancing Expert Specialization for Better MoE	Hongcan Guo et.al.	2505.22323	null
2025-05-28	ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation	Jiawen Yu et.al.	2505.22159	null
2025-05-28	On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition	Shujie HU et.al.	2505.22072	null
2025-05-28	AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation	Yan Rong et.al.	2505.22053	null
2025-05-29	ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge	Zhongyi Zhou et.al.	2505.21906	null
2025-05-27	MedBridge: Bridging Foundation Vision-Language Models to Medical Image Diagnosis	Yitong Li et.al.	2505.21698	null
2025-05-23	EvidenceMoE: A Physics-Guided Mixture-of-Experts with Evidential Critics for Advancing Fluorescence Light Detection and Ranging in Scattering Media	Ismail Erbas et.al.	2505.21532	null
2025-05-29	Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity	Yehui Tang et.al.	2505.21411	null
2025-05-27	Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM’s Instruction-Following Capabilities	Junyan Zhang et.al.	2505.21191	null
2025-05-27	Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts	Yue Zhang et.al.	2505.21079	null
2025-05-27	Multi-objective Large Language Model Alignment with Hierarchical Experts	Zhuo Li et.al.	2505.20925	null
2025-05-27	FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models	Hao Kang et.al.	2505.20225	null
2025-06-01	NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID	Shihao Li et.al.	2505.20001	null
2025-05-26	Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments	Junming Liu et.al.	2505.19699	null
2025-06-13	MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE	Zongle Huang et.al.	2505.19645	null
2025-05-26	Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate	Liangwei Nathan Zheng et.al.	2505.19525	link
2025-05-26	WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference	Sihan Chen et.al.	2505.19427	link
2025-05-25	RankLLM: A Python Package for Reranking with LLMs	Sahel Sharifymoghaddam et.al.	2505.19284	null
2025-05-25	I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts	Jiayi Xin et.al.	2505.19190	link
2025-05-24	TrajMoE: Spatially-Aware Mixture of Experts for Unified Human Mobility Modeling	Chonghua Han et.al.	2505.18670	null
2025-05-24	ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation	Jian Liang et.al.	2505.18640	link
2025-07-02	Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter	Weizhi Zhong et.al.	2505.18612	null
2025-05-24	Guiding the Experts: Semantic Priors for Efficient and Focused MoE Routing	Chengxi Min et.al.	2505.18586	link
2025-05-24	Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment Tuning	Aofei Chang et.al.	2505.18503	null
2025-05-24	On Minimax Estimation of Parameters in Softmax-Contaminated Mixture of Experts	Fanqi Yan et.al.	2505.18455	null
2025-05-24	$μ$-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts	Toshiaki Koike-Akino et.al.	2505.18451	null
2025-05-23	Betelgeuse’s Buddy: X-Ray Constraints on the Nature of $α$ Ori B	Anna J. G. O’Grady et.al.	2505.18376	null
2025-05-23	Betelgeuse, Betelgeuse, Betelgeuse, Betel-buddy? Constraints on the dynamical companion to $α$ Orionis from HST	Jared A. Goldberg et.al.	2505.18375	null
2025-05-13	Constrained Edge AI Deployment: Fine-Tuning vs Distillation for LLM Compression	Jacob Sander et.al.	2505.18166	null
2025-05-23	Enhancing CTR Prediction with De-correlated Expert Networks	Jiancheng Wang et.al.	2505.17925	null
2025-05-23	PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval	Zehua Pei et.al.	2505.17639	null
2025-05-23	CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning	Jinyuan Feng et.al.	2505.17553	null
2025-05-31	MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation	Kaixing Yang et.al.	2505.17543	null
2025-07-04	JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model	Qihao Duan et.al.	2505.17257	null
2025-05-31	TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling	Weizhe Lin et.al.	2505.17155	null
2025-05-22	DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving	Zhenjie Yang et.al.	2505.16278	null
2025-05-22	DualComp: End-to-End Learning of a Unified Dual-Modality Lossless Compressor	Yan Zhao et.al.	2505.16256	null
2025-05-21	Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models	Jingcong Liang et.al.	2505.16056	link
2025-05-26	MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding	Yuxiang Wei et.al.	2505.15946	null
2025-05-21	Who “Controls” Where Work Shall be Done? State-of-Practice in Post-Pandemic Remote Work Regulation	Darja Smite et.al.	2505.15743	null
2025-05-21	CoLA: Collaborative Low-Rank Adaptation	Yiyun Zhou et.al.	2505.15471	link
2025-07-04	Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought	Tencent Hunyuan Team et.al.	2505.15431	null
2025-05-21	Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks	Uranik Berisha et.al.	2505.15414	null
2025-05-21	Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites	Xintong Wang et.al.	2505.15297	null
2025-05-21	Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines	Xiaohou Shi et.al.	2505.15151	null
2025-05-20	Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies	Haoyi Qiu et.al.	2505.14972	link
2025-05-30	TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis	Yu Zhang et.al.	2505.14910	link
2025-05-20	Balanced and Elastic End-to-end Training of Dynamic LLMs	Mohamed Wahib et.al.	2505.14864	null
2025-05-20	Solving MNIST with a globally trained Mixture of Quantum Experts	Paolo Alessandro Xavier Tognini et.al.	2505.14789	null
2025-05-27	Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training	Mengru Wang et.al.	2505.14681	null
2025-05-21	Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach	Umberto Cappellazzo et.al.	2505.14336	null
2025-05-20	FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation	Shaolin Zhu et.al.	2505.14256	null
2025-05-20	THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation	Yunlong Liang et.al.	2505.14173	null
2025-05-20	Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition	Shuo Zhang et.al.	2505.14143	null
2025-05-20	Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging	Ryo Bertolissi et.al.	2505.14136	null
2025-05-20	Generalizable Multispectral Land Cover Classification via Frequency-Aware Mixture of Low-Rank Token Experts	Xi Chen et.al.	2505.14088	null
2025-05-20	StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning	Huaijie Wang et.al.	2505.13997	null
2025-05-20	Towards Rehearsal-Free Continual Relation Extraction: Capturing Within-Task Variance with Adaptive Prompting	Bao-Ngoc Dao et.al.	2505.13944	link
2025-05-27	U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding	Ziqian Wang et.al.	2505.13880	link
2025-05-20	EfficientLLM: Efficiency in Large Language Models	Zhengqing Yuan et.al.	2505.13840	null
2025-05-19	CompeteSMoE – Statistically Guaranteed Mixture of Experts Training via Competition	Nam V. Nguyen et.al.	2505.13380	link
2025-05-19	Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference	Shuqing Luo et.al.	2505.13345	link
2025-05-19	Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models	Lucas Berry et.al.	2505.13273	null
2025-05-19	True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics	Christoph Jürgen Hemmer et.al.	2505.13192	null
2025-05-23	Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures	Tuan Thai et.al.	2505.13052	null
2025-05-19	TransferTraj: A Vehicle Trajectory Learning Model for Region and Task Transferability	Tonglong Wei et.al.	2505.12672	null
2025-05-30	Scene-Adaptive Motion Planning with Explicit Mixture of Experts and Interaction-Oriented Optimization	Hongbiao Zhu et.al.	2505.12311	null
2025-05-22	Model Merging in Pre-training of Large Language Models	Yunshui Li et.al.	2505.12082	null
2025-05-22	Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition	Runduo Han et.al.	2505.12007	link
2025-05-17	MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging	Zihuan Qiu et.al.	2505.11883	null
2025-05-17	Improving Coverage in Combined Prediction Sets with Weighted p-values	Gina Wong et.al.	2505.11785	null
2025-05-16	HessFormer: Hessians at Foundation Scale	Diego Granziol et.al.	2505.11564	null
2025-05-10	PRIME: Physics-Related Intelligent Mixture of Experts for Transistor Characteristics Prediction	Zhenxing Dou et.al.	2505.11523	null
2025-05-19	MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production	Chao Jin et.al.	2505.11432	null
2025-05-21	MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems	Yinsicheng Jiang et.al.	2505.11415	null
2025-05-16	A Fast Kernel-based Conditional Independence test with Application to Causal Discovery	Oliver Schacht et.al.	2505.11085	null
2025-05-16	On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating	Huy Nguyen et.al.	2505.10860	null
2025-05-14	PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning	Zongqian Li et.al.	2505.09519	link
2025-05-14	Qwen3 Technical Report	An Yang et.al.	2505.09388	link
2025-05-14	Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures	Chenggang Zhao et.al.	2505.09343	null
2025-05-29	Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony	Shaoyu Wang et.al.	2505.08944	null
2025-05-13	PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts	Yang Su et.al.	2505.08719	null
2025-05-25	AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale	Yunjie Ji et.al.	2505.08311	null
2025-05-12	UMoE: Unifying Attention and FFN with Shared Experts	Yuanhang Yang et.al.	2505.07260	null
2025-05-11	Seed1.5-VL Technical Report	Dong Guo et.al.	2505.07062	null
2025-05-21	FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers	Tianyu Chen et.al.	2505.06858	null
2025-05-11	The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts	Enric Boix-Adsera et.al.	2505.06839	null
2025-05-10	Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free	Zihan Qiu et.al.	2505.06708	link
2025-05-30	Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding	Dawei Huang et.al.	2505.06685	link
2025-05-10	QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration	HamidReza Imani et.al.	2505.06481	null
2025-05-06	A Sensitivity-Driven Expert Allocation Method in LoRA-MoE for Efficient Fine-Tuning	Junzhou Xu et.al.	2505.06272	null
2025-05-12	FloE: On-the-Fly MoE Inference on Memory-constrained GPU	Yuxin Zhou et.al.	2505.05950	null
2025-05-09	MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design	Haojie Duanmu et.al.	2505.05799	link
2025-05-10	SDR-RDMA: Software-Defined Reliability Architecture for Planetary Scale RDMA Communication	Mikhail Khalilov et.al.	2505.05366	null
2025-05-08	Divide-and-Conquer: Cold-Start Bundle Recommendation via Mixture of Diffusion Experts	Ming Li et.al.	2505.05035	null
2025-05-07	Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs	Yehui Tang et.al.	2505.04519	null
2025-05-07	SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios	Ning Cheng et.al.	2505.04201	null
2025-05-07	LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress?	Teddy Foley et.al.	2505.04075	link
2025-05-07	Shadow Wireless Intelligence: Large Language Model-Driven Reasoning in Covert Communications	Yuanai Xie et.al.	2505.04068	null
2025-05-24	Perception-Informed Neural Networks: Beyond Physics-Informed Neural Networks	Mehran Mazandarani et.al.	2505.03806	null
2025-05-02	MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance	Xing Hu et.al.	2505.03804	null
2025-05-06	Towards Smart Point-and-Shoot Photography	Jiawan Li et.al.	2505.03638	null
2025-05-06	Faster MoE LLM Inference for Extremely Large Models	Haoqi Yang et.al.	2505.03531	null
2025-05-06	STAR-Rec: Making Peace with Length Variance and Pattern Diversity in Sequential Recommendation	Maolin Wang et.al.	2505.03484	null
2025-05-06	3D Gaussian Splatting Data Compression with Mixture of Priors	Lei Liu et.al.	2505.03310	null
2025-05-05	Finger Pose Estimation for Under-screen Fingerprint Sensor	Xiongjun Guan et.al.	2505.02481	link
2025-05-05	Multimodal Deep Learning-Empowered Beam Prediction in Future THz ISAC Systems	Kai Zhang et.al.	2505.02381	null
2025-05-08	Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques	Sanjay Surendranath Girija et.al.	2505.02309	null
2025-05-04	Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields	Zhenxing Mi et.al.	2505.02005	link
2025-05-03	Backdoor Attacks Against Patch-based Mixture of Experts	Cedric Chan et.al.	2505.01811	link
2025-05-01	MoxE: Mixture of xLSTM Experts with Entropy-Aware Routing for Efficient Language Modeling	Abdoul Majid O. Thiombiano et.al.	2505.01459	null
2025-05-02	Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders	Rogelio A Mancisidor et.al.	2505.01134	null
2025-05-02	CoCoAFusE: Beyond Mixtures of Experts via Model Fusion	Aurelio Raffa Ugolini et.al.	2505.01105	null
2025-05-01	Improving Routing in Sparse Mixture of Experts with Graph of Tokens	Tam Nguyen et.al.	2505.00792	null
2025-05-01	CICADA: Cross-Domain Interpretable Coding for Anomaly Detection and Adaptation in Multivariate Time Series	Tian Lan et.al.	2505.00415	null
2025-05-01	Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing	Piotr Piękos et.al.	2505.00315	link
2025-04-30	Online Federation For Mixtures of Proprietary Agents with Black-Box Encoders	Xuwei Yang et.al.	2505.00216	null
2025-05-08	Identifying Critical Dependencies in Large-Scale Continuous Software Engineering	Anastasiia Tkalich et.al.	2504.21437	null
2025-04-29	TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts	Pradip Kunwar et.al.	2504.21190	null
2025-04-29	Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization	Shuai Gong et.al.	2504.21063	null
2025-04-26	PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight	Ben Goertzel et.al.	2504.21029	null
2025-04-29	In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer	Zechuan Zhang et.al.	2504.20690	null
2025-05-30	ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting	Yu Zhang et.al.	2504.20630	null
2025-04-29	MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification	Yichu Xu et.al.	2504.20509	null
2025-04-29	FT-MoE: Sustainable-learning Mixture of Experts Model for Fault-Tolerant Computing with Multiple Tasks	Wenjing Xiao et.al.	2504.20446	null
2025-04-29	MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation	Amaan Izhar et.al.	2504.20343	link
2025-04-28	Accelerating Mixture-of-Experts Training with Adaptive Expert Replication	Athinagoras Skiadopoulos et.al.	2504.19925	null
2025-04-28	DUETS: Setting expectations for asteroseismic binaries and binary products with synthetic populations	A. Mazzi et.al.	2504.19866	null
2025-04-28	Decentralization of Generative AI via Mixture of Experts for Wireless Networks: A Comprehensive Survey	Yunting Xu et.al.	2504.19660	null
2025-05-04	ARTEMIS: Autoregressive End-to-End Trajectory Planning with Mixture of Experts for Autonomous Driving	Renju Feng et.al.	2504.19580	link
2025-05-30	Versatile Framework for Song Generation with Prompt-based Control	Yu Zhang et.al.	2504.19062	null
2025-04-29	BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts	Qingyue Wang et.al.	2504.18598	null
2025-04-25	NoEsis: Differentially Private Knowledge Transfer in Modular LLM Adaptation	Rob Romijnders et.al.	2504.18147	null
2025-05-15	TGDT: A Temporal Graph-based Digital Twin for Urban Traffic Corridors	Nooshin Yousefzadeh et.al.	2504.18008	null
2025-06-11	Unveiling the Hidden: Movie Genre and User Bias in Spoiler Detection	Haokai Zhang et.al.	2504.17834	link
2025-04-22	Compass-V2 Technical Report	Sophia Maria et.al.	2504.15527	null
2025-04-21	Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images	Jonathan Brokman et.al.	2504.15470	link
2025-04-17	D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving	Haodong Wang et.al.	2504.15299	null
2025-04-23	MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core	Dennis Liu et.al.	2504.14960	null
2025-04-20	Evaluating Temporal Plasticity in Foundation Time Series Models for Incremental Fine-tuning	Jia Liu et.al.	2504.14677	null
2025-04-29	Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning	ByteDance Seed et.al.	2504.13914	null
2025-04-18	Multi-Type Context-Aware Conversational Recommender Systems via Mixture-of-Experts	Jie Zou et.al.	2504.13655	null
2025-04-18	HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering	Alexander Rusnak et.al.	2504.13590	null
2025-04-18	Dense Backpropagation Improves Training for Sparse Mixture-of-Experts	Ashwinee Panda et.al.	2504.12463	link
2025-04-16	Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models	Yuanbo Tang et.al.	2504.12359	null
2025-04-16	Trend Filtered Mixture of Experts for Automated Gating of High-Frequency Flow Cytometry Data	Sangwon Hyun et.al.	2504.12287	null
2025-04-16	The Discovery of Two Quadruple Star Systems with the Second and Third Shortest Outer Periods	Brian P. Powell et.al.	2504.12239	null
2025-04-16	MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models	Hang Yuan et.al.	2504.12234	null
2025-04-13	Transmission of low energy electrons through a polyethylene terephthalate 800-nm diameter nanocapillary	Li Pengfei et.al.	2504.11479	null
2025-04-15	Simulation-based inference for stochastic nonlinear mixed-effects models with applications in systems biology	Henrik Häggström et.al.	2504.11279	link
2025-05-22	Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability	Jiani Liu et.al.	2504.10804	null
2025-04-14	Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning	LeiLei Ma et.al.	2504.09990	null
2025-04-14	DeepCompile: A Compiler-Driven Approach to Optimizing Distributed Deep Learning Training	Masahiro Tanaka et.al.	2504.09983	null
2025-04-14	Multi-objective Bayesian Optimization With Mixed-categorical Design Variables for Expensive-to-evaluate Aeronautical Applications	Nathalie Bartoli et.al.	2504.09930	null
2025-04-14	Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming	Zhiqiang He et.al.	2504.09906	null
2025-04-13	Mixture-of-Shape-Experts (MoSE): End-to-End Shape Dictionary Framework to Prompt SAM for Generalizable Medical Segmentation	Jia Wei et.al.	2504.09601	null
2025-04-12	MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints	Yichao Yuan et.al.	2504.09345	null
2025-04-12	Mixture of Group Experts for Learning Invariant Representations	Lei Kang et.al.	2504.09265	null
2025-04-12	Exploring Modality Disruption in Multimodal Fake News Detection	Moyang Liu et.al.	2504.09154	null
2025-05-08	RouterKT: Mixture-of-Experts for Knowledge Tracing	Han Liao et.al.	2504.08989	null
2025-03-23	ExpertRAG: Efficient RAG with Mixture of Experts – Optimizing Context Retrieval for Adaptive LLM Responses	Esmail Gumaan et.al.	2504.08744	null
2025-04-11	Regularized infill criteria for multi-objective Bayesian optimization with application to aircraft design	Robin Grapin et.al.	2504.08671	null
2025-04-11	Millions of States: Designing a Scalable MoE Architecture with RWKV-7 Meta-learner	Liu Xiao et.al.	2504.08247	null
2025-04-10	C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing	Zhongyang Li et.al.	2504.07964	link
2025-04-11	Scaling Laws for Native Multimodal Models	Mustafa Shukor et.al.	2504.07951	null
2025-04-10	Cluster-Driven Expert Pruning for Mixture-of-Experts Large Language Models	Hongcheng Guo et.al.	2504.07807	link
2025-04-10	Adaptive Detection of Fast Moving Celestial Objects Using a Mixture of Experts and Physical-Inspired Neural Network	Peng Jia et.al.	2504.07777	null
2025-04-15	Kimi-VL Technical Report	Kimi Team et.al.	2504.07491	link
2025-04-09	MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution	Zhe Wang et.al.	2504.07308	link
2025-04-11	Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models	Ling Team et.al.	2504.07158	null
2025-05-28	Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations	Zican Dong et.al.	2504.06792	null
2025-04-24	FedMerge: Federated Personalization via Model Merging	Shutong Chen et.al.	2504.06768	null
2025-04-08	S’MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning	Hanqing Zeng et.al.	2504.06426	null
2025-04-08	HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference	Shuzhang Zhong et.al.	2504.05897	link
2025-04-08	Adaptive Substructure-Aware Expert Model for Molecular Property Prediction	Tianyi Jiang et.al.	2504.05844	null
2025-04-10	Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations	Ajay Jaiswal et.al.	2504.05586	null
2025-04-07	SUEDE: Shared Unified Experts for Physical-Digital Face Attack Detection Enhancement	Zuying Xie et.al.	2504.04818	null
2025-04-06	On the Spatial Structure of Mixture-of-Experts in Transformers	Daniel Bershatsky et.al.	2504.04444	null
2025-04-05	Collaboration and Controversy Among Experts: Rumor Early Detection by Tuning a Comment Generator	Bing Wang et.al.	2504.04076	link
2025-04-04	HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs	Yongji Wu et.al.	2504.03871	null
2025-04-01	Detecting Financial Fraud with Hybrid Deep Learning: A Mix-of-Experts Approach to Sequential and Anomalous Patterns	Diego Vallarino et.al.	2504.03750	null
2025-04-01	A Unified Virtual Mixture-of-Experts Framework: Enhanced Inference and Hallucination Mitigation in Single-Model System	Mingyan Liu et.al.	2504.03739	null
2025-03-26	A multi-scale lithium-ion battery capacity prediction using mixture of experts and patch-based MLP	Yuzhu Lei et.al.	2504.03706	link
2025-04-04	RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation	Hanbo Bi et.al.	2504.03166	null
2025-06-01	TeleMoM: Consensus-Driven Telecom Intelligence via Mixture of Models	Xinquan Wang et.al.	2504.02712	null
2025-04-07	MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators	Beichen Huang et.al.	2504.02658	link
2025-04-24	Cognitive Memory in Large Language Models	Lianlei Shan et.al.	2504.02441	null
2025-04-23	MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism	Ruidong Zhu et.al.	2504.02263	null
2025-04-20	Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design	Mohan Zhang et.al.	2504.01337	null
2025-04-01	Mixture-of-Experts for Distributed Edge Computing with Channel-Aware Gating Function	Qiuchen Song et.al.	2504.00819	null
2025-04-01	DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism	Dengchun Li et.al.	2504.00661	link
2025-04-01	CFP: Low-overhead Profiling-based Intra-operator Parallelism Generation by Preserving Communication-Free Structures	Weifang Hu et.al.	2504.00598	null
2025-04-01	Continual Cross-Modal Generalization	Yan Xia et.al.	2504.00561	null
2025-04-01	Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection	Shunxin Chen et.al.	2504.00458	null
2025-03-31	Unimodal-driven Distillation in Multimodal Emotion Recognition with Dynamic Fusion	Jiagen Li et.al.	2503.23721	null
2025-05-16	Mixture of Routers	Jia-Chen Zhang et.al.	2503.23362	null
2025-05-25	MoLAE: Mixture of Latent Experts for Parameter-Efficient Language Models	Zehua Liu et.al.	2503.23100	null
2025-03-29	S2MoE: Robust Sparse Mixture of Experts via Stochastic Learning	Giang Do et.al.	2503.23007	null
2025-03-29	Sparse Mixture of Experts as Unified Competitive Learning	Giang Do et.al.	2503.22996	null
2025-03-26	Reasoning Beyond Limits: Advances and Open Problems for LLMs	Mohamed Amine Ferrag et.al.	2503.22732	null
2025-04-01	Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities	Raman Dutt et.al.	2503.22517	null
2025-04-29	RocketPPA: Ultra-Fast LLM-Based PPA Estimator at Code-Level Abstraction	Armin Abdollahi et.al.	2503.21971	null
2025-05-08	Binarity at LOw Metallicity (BLOeM): Enhanced multiplicity of early B-type dwarfs and giants at $Z=0.2\,{\rm Z}_\odot$	J. I. Villaseñor et.al.	2503.21936	null
2025-03-27	iMedImage Technical Report	Ran Wei et.al.	2503.21836	null
2025-03-27	LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models	Hengyuan Zhao et.al.	2503.21227	null
2025-05-17	MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness	Zihao Zheng et.al.	2503.21135	null
2025-03-26	Optimal Scaling Laws for Efficiency Gains in a Theoretical Transformer-Augmented Sectional MoE Framework	Soham Sane et.al.	2503.20750	null
2025-03-26	UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines	Chen Tang et.al.	2503.20748	null
2025-03-26	Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning	Sashuai Zhou et.al.	2503.20633	null
2025-04-14	MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation	Rongyu Zhang et.al.	2503.20384	null
2025-03-26	Modality-Independent Brain Lesion Segmentation with Privacy-aware Continual Learning	Yousef Sadegheih et.al.	2503.20326	link
2025-03-31	Resilient Sensor Fusion under Adverse Sensor Failures via Multi-Modal Expert Fusion	Konyul Park et.al.	2503.19776	null
2025-04-30	BiPrompt-SAM: Enhancing Image Segmentation via Explicit Selection between Point and Text Prompts	Suzhe Xu et.al.	2503.19769	null
2025-03-25	M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation	Ziyuan Liu et.al.	2503.19406	null
2025-04-21	Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design	Rui Xie et.al.	2503.18869	null
2025-04-30	Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding	Tianyu Chen et.al.	2503.18578	null
2025-03-24	SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking	Wenrui Cai et.al.	2503.18338	null
2025-04-01	Challenging Dataset and Multi-modal Gated Mixture of Experts Model for Remote Sensing Copy-Move Forgery Understanding	Ze Zhang et.al.	2503.18104	link
2025-03-22	Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM	Codefuse et.al.	2503.17793	null
2025-03-25	Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts	Yike Yuan et.al.	2503.16057	null
2025-03-21	UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations	Debabrata Mandal et.al.	2503.15868	null
2025-03-20	Mixture of Lookup Experts	Shibo Jie et.al.	2503.15798	link
2025-03-21	Leveraging MoE-based Large Language Model for Zero-Shot Multi-Task Semantic Communication	Sin-Yu Huang et.al.	2503.15722	null
2025-04-29	SemEval-2025 Task 1: AdMIRe – Advancing Multimodal Idiomaticity Representation	Thomas Pickard et.al.	2503.15358	null
2025-03-21	Body-Hand Modality Expertized Networks with Cross-attention for Fine-grained Skeleton Action Recognition	Seungyeon Cho et.al.	2503.14960	null
2025-03-18	Core-Periphery Principle Guided State Space Model for Functional Connectome Classification	Minheng Chen et.al.	2503.14655	null
2025-03-18	DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers	Minglei Shi et.al.	2503.14487	null
2025-03-18	MAST-Pro: Dynamic Mixture-of-Experts for Adaptive Segmentation of Pan-Tumors with Knowledge-Driven Prompts	Runqi Meng et.al.	2503.14355	null
2025-03-18	Frac-Connections: Fractional Extension of Hyper-Connections	Defa Zhu et.al.	2503.14125	null
2025-03-18	SNAKE: A Sustainable and Multi-functional Traffic Analysis System utilizing Specialized Large-Scale Models with a Mixture of Experts Architecture	Tian Qin et.al.	2503.13808	null
2025-03-13	Ensemble Learning for Large Language Models in Text and Code Generation: A Survey	Mari Ashiga et.al.	2503.13505	null
2025-03-17	Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge	Shengling Qin et.al.	2503.13421	null
2025-05-10	Channel Estimation for Pinching-Antenna Systems (PASS)	Jian Xiao et.al.	2503.13268	null
2025-03-17	Federated Mixture-of-Expert for Non-Overlapped Cross-Domain Sequential Recommendation	Yu Liu et.al.	2503.13254	null
2025-05-21	Fast filtering of non-Gaussian models using Amortized Optimal Transport Maps	Mohammad Al-Jarrah et.al.	2503.12633	link
2025-03-16	MoECollab: Democratizing LLM Development Through Collaborative Mixture of Experts	Harshit et.al.	2503.12592	null
2025-03-16	MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification	Jianwei Zhao et.al.	2503.12401	null
2025-05-10	Adaptive Mixture of Low-Rank Experts for Robust Audio Spoofing Detection	Qixian Chen et.al.	2503.12010	null
2025-03-14	FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-the-World LoRA	Jieming Bian et.al.	2503.11880	null
2025-03-10	MELON: Multimodal Mixture-of-Experts with Spectral-Temporal Fusion for Long-Term Mobility Estimation in Critical Care	Jiaqing Zhang et.al.	2503.11695	null
2025-03-14	A Review of DeepSeek Models’ Key Innovative Techniques	Chengen Wang et.al.	2503.11486	null
2025-03-14	MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling	Rachel S. Y. Teo et.al.	2503.11144	link
2025-03-13	Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores	Chenpeng Wu et.al.	2503.10725	link
2025-05-19	dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis	Luyuan Xie et.al.	2503.10412	null
2025-04-10	Continual Text-to-Video Retrieval with Frame Fusion and Task-Aware Routing	Zecheng Zhao et.al.	2503.10111	link
2025-03-12	MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching	Tairan Xu et.al.	2503.09716	null
2025-03-12	Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework	Bakary Badjie et.al.	2503.09504	null
2025-03-12	Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment	Nazanin Moradinasab et.al.	2503.09498	link
2025-04-01	Astrea: A MOE-based Visual Understanding Model with Progressive Alignment	Xiaoda Yang et.al.	2503.09445	null
2025-03-12	Automatic Operator-level Parallelism Planning for Distributed Deep Learning – A Mixed-Integer Programming Approach	Ruifeng She et.al.	2503.09357	null
2025-03-12	Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference	Mohammad Siavashi et.al.	2503.09304	null
2025-03-13	FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models	Fufangchen Zhao et.al.	2503.09158	null
2025-05-22	MoE-Loco: Mixture of Experts for Multitask Locomotion	Runhan Huang et.al.	2503.08564	null
2025-03-11	BoundarEase: Fostering Constructive Community Engagement to Inform More Equitable Student Assignment Policies	Cassandra Overney et.al.	2503.08543	link
2025-03-11	Accelerating MoE Model Inference with Expert Sharding	Oana Balmau et.al.	2503.08467	null
2025-03-26	Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models	Junzhe Li et.al.	2503.08120	null
2025-03-11	MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models	Han Zhao et.al.	2503.08007	null
2025-03-10	Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM	Yongqiang Yao et.al.	2503.07680	null
2025-04-01	TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster	Kanghui Ning et.al.	2503.07649	null
2025-03-05	BrainNet-MoE: Brain-Inspired Mixture-of-Experts Learning for Neurological Disease Identification	Jing Zhang et.al.	2503.07640	null
2025-03-05	Mixture of Experts Made Intrinsically Interpretable	Xingyi Yang et.al.	2503.07639	null
2025-03-26	GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts	Minwen Liao et.al.	2503.07417	null
2025-04-18	A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications	Siyuan Mu et.al.	2503.07137	link
2025-03-10	VMTS: Vision-Assisted Teacher-Student Reinforcement Learning for Multi-Terrain Locomotion in Bipedal Robots	Fu Chen et.al.	2503.07049	link
2025-03-10	ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration	Mengting Ai et.al.	2503.06881	link
2025-03-10	eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference	Suraiya Tairin et.al.	2503.06823	null
2025-03-09	MoFE: Mixture of Frozen Experts Architecture	Jean Seo et.al.	2503.06491	null
2025-03-25	Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models	Nguyen Do et.al.	2503.06413	link
2025-03-08	MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering	Vinay Kumar Verma et.al.	2503.06296	null
2025-03-08	A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts	Wenzhuo Du et.al.	2503.06064	null
2025-03-08	MANDARIN: Mixture-of-Experts Framework for Dynamic Delirium and Coma Prediction in ICU Patients: Development and Validation of an Acute Brain Dysfunction Prediction Model	Miguel Contreras et.al.	2503.06059	null
2025-03-08	GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices	Xudong Lu et.al.	2503.06019	null
2025-03-03	How Do Consumers Really Choose: Exposing Hidden Preferences with the Mixture of Experts Model	Diego Vallarino et.al.	2503.05800	null
2025-03-11	Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning	Justin Chih-Yao Chen et.al.	2503.05641	null
2025-03-07	FMT: A Multimodal Pneumonia Detection Model Based on Stacking MOE Framework	Jingyu Xu et.al.	2503.05626	null
2025-04-15	Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts	Weigao Sun et.al.	2503.05447	link
2025-03-10	Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs	Ling Team et.al.	2503.05139	null
2025-03-07	Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts	Shwai He et.al.	2503.05066	null
2025-03-06	Continual Pre-training of MoEs: How robust is your router?	Benjamin Thérien et.al.	2503.05029	null
2025-02-25	Comparative Analysis Based on DeepSeek, ChatGPT, and Google Gemini: Features, Techniques, Performance, Future Prospects	Anichur Rahman et.al.	2503.04783	null
2025-03-19	Predictable Scale: Part I – Optimal Hyperparameter Scaling Law in Large Language Model Pretraining	Houyi Li et.al.	2503.04715	null
2025-03-07	Question-Aware Gaussian Experts for Audio-Visual Question Answering	Hongyeob Kim et.al.	2503.04459	link
2025-03-19	Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling	Yan Li et.al.	2503.04398	null
2025-03-06	A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery	Yiheng Zhu et.al.	2503.04362	null
2025-03-06	Quantum metric induced magneto-optical effects in $\mathcal{PT}$-symmetric antiferromagnets	Yongpan Li et.al.	2503.04312	null
2025-03-06	DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval	Yating Liu et.al.	2503.04144	null
2025-03-05	VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection	Enkhtogtokh Togootogtokh et.al.	2503.03797	link
2025-03-09	Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs	Haoran Fan et.al.	2503.03594	link
2025-03-06	Convergence Rates for Softmax Gating Mixture of Experts	Huy Nguyen et.al.	2503.03213	null
2025-03-04	MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation	Weihang Wang et.al.	2503.02799	link
2025-03-04	FinArena: A Human-Agent Collaboration Framework for Financial Market Analysis and Forecasting	Congluo Xu et.al.	2503.02692	null
2025-03-06	Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer	Yujiao Yang et.al.	2503.02495	link
2025-03-04	Tabby: Tabular Data Synthesis with Language Models	Sonia Cromp et.al.	2503.02152	null
2025-03-03	ECG-EmotionNet: Nested Mixture of Expert (NMoE) Adaptation of ECG-Foundation Model for Driver Emotion Recognition	Nastaran Mansourian et.al.	2503.01750	null
2025-03-03	Effective High-order Graph Representation Learning for Credit Card Fraud Detection	Yao Zou et.al.	2503.01556	null
2025-03-03	DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models	Yongqi Huang et.al.	2503.01359	null
2025-03-03	PROPER: A Progressive Learning Framework for Personalized Large Language Models with Group-Level Adaptation	Linhai Zhang et.al.	2503.01303	null
2025-03-03	Unify and Anchor: A Context-Aware Transformer for Cross-Domain Time Series Forecasting	Xiaobin Hong et.al.	2503.01157	null
2025-03-02	Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion	Daiki Nishiyama et.al.	2503.00925	null
2025-03-01	Efficiently Editing Mixture-of-Experts Models with Compressed Experts	Yifei He et.al.	2503.00634	null
2025-03-01	CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering	Tianyu Huai et.al.	2503.00413	null
2025-02-28	CoSMoEs: Compact Sparse Mixture of Experts	Patrick Huber et.al.	2503.00245	null
2025-02-26	Omni-SILA: Towards Omni-scene Driven Visual Sentiment Identifying, Locating and Attributing in Videos	Jiamin Luo et.al.	2503.00049	null
2025-03-01	R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts	Zhongyang Li et.al.	2502.20395	link
2025-02-27	Mixture of Experts for Recognizing Depression from Interview and Reading Tasks	Loukas Ilias et.al.	2502.20213	null
2025-02-27	Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems	Zeyi Ren et.al.	2502.20183	null
2025-02-27	UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook	Yidi Jiang et.al.	2502.20067	null
2025-02-27	AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs	Xuyang Wei et.al.	2502.20035	link
2025-03-04	Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts	Shulai Zhang et.al.	2502.19811	link
2025-02-27	Extension of SUSY SU(5) GUTs with Nelson-Barr models	Junji Hisano et.al.	2502.19686	null
2025-03-15	Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization	Taishi Nakamura et.al.	2502.19261	null
2025-02-26	OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment	Jiaxin Deng et.al.	2502.18965	null
2025-02-26	Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM	Junxiao Ma et.al.	2502.18863	null
2025-02-25	Generative AI-enabled Wireless Communications for Robust Low-Altitude Economy Networking	Changyuan Zhao et.al.	2502.18118	null
2025-02-09	MoEMba: A Mamba-based Mixture of Experts for High-Density EMG-based Hand Gesture Recognition	Mehran Shabanpour et.al.	2502.17457	null
2025-03-17	The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE	Andrei Chernov et.al.	2502.17391	null
2025-02-24	Delta Decompression for MoE-based LLMs Compression	Hao Gu et.al.	2502.17298	link
2025-02-24	Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks	Andrei Chernov et.al.	2502.17187	null
2025-02-24	Muon is Scalable for LLM Training	Jingyuan Liu et.al.	2502.16982	link
2025-03-07	BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference	Zewen Jin et.al.	2502.16927	null
2025-02-24	ENACT-Heart – ENsemble-based Assessment Using CNN and Transformer on Heart Sounds	Jiho Han et.al.	2502.16914	null
2025-02-26	Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment	Chenghao Fan et.al.	2502.16894	null
2025-02-22	An Autonomous Network Orchestration Framework Integrating Large Language Models with Continual Reinforcement Learning	Masoud Shokrnezhad et.al.	2502.16198	null
2025-02-20	A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models	Mengyang Sun et.al.	2502.15828	link
2025-03-20	Binary-Integer-Programming Based Algorithm for Expert Load Balancing in Mixture-of-Experts Models	Yuan Sun et.al.	2502.15451	link
2025-03-02	Tight Clusters Make Specialized Experts	Stefan K. Nielsen et.al.	2502.15315	link
2025-02-21	Multimodal Graph-Based Variational Mixture of Experts Network for Zero-Shot Multimodal Information Extraction	Baohang Zhou et.al.	2502.15290	link
2025-02-20	Ray-Tracing for Conditionally Activated Neural Networks	Claudio Gallicchio et.al.	2502.14788	null
2025-02-21	ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model	Zhongyi Zhou et.al.	2502.14420	null
2025-02-19	MoM: Linear Sequence Modeling with Mixture-of-Memories	Jusen Du et.al.	2502.13685	link
2025-02-19	Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts	Xin Li et.al.	2502.13577	null
2025-02-18	MoBA: Mixture of Block Attention for Long-Context LLMs	Enzhe Lu et.al.	2502.13189	link
2025-02-18	Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models	Gyeongman Kim et.al.	2502.12947	null
2025-03-13	DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs	Minxuan Lv et.al.	2502.12455	null
2025-02-17	From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs	Kumari Nishu et.al.	2502.12325	null
2025-02-17	Binarity at LOw Metallicity (BLOeM): Multiplicity of early B-type supergiants in the Small Magellanic Cloud	N. Britavskiy et.al.	2502.12239	null
2025-02-17	Accurate Expert Predictions in MoE Inference via Cross-Layer Gate	Zhiyuan Fang et.al.	2502.12224	null
2025-02-17	How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines	Ayan Sengupta et.al.	2502.12051	null
2025-02-17	Connector-S: A Survey of Connectors in Multi-modal Large Language Models	Xun Zhu et.al.	2502.11453	null
2025-02-16	Mixture of Tunable Experts – Behavior Modification of DeepSeek-R1 at Inference Time	Robert Dahlke et.al.	2502.11096	null
2025-02-16	ClimateLLM: Efficient Weather Forecasting via Frequency-Aware Large Language Models	Shixuan Li et.al.	2502.11059	null
2025-02-15	Semantic Specialization in MoE Appears with Scale: A Study of DeepSeek R1 Expert Specialization	Matthew Lyle Olson et.al.	2502.10928	null
2025-02-11	MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition	Sungnyun Kim et.al.	2502.10447	null
2025-04-03	Heterogeneous Mixture of Experts for Remote Sensing Image Super-Resolution	Bowen Chen et.al.	2502.09654	null
2025-02-14	Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting	Nicholas Dronen et.al.	2502.09500	link
2025-02-12	The MoE-Empowered Edge LLMs Deployment: Architecture, Challenges, and Opportunities	Ning Li et.al.	2502.08381	null
2025-02-12	Mixture of Decoupled Message Passing Experts with Entropy Constraint for General Node Classification	Xuanze Chen et.al.	2502.08083	null
2025-03-09	Training Sparse Mixture Of Experts Text Embedding Models	Zach Nussbaum et.al.	2502.07972	link
2025-02-11	Memory Analysis on the Training Course of DeepSeek Models	Ping Zhang et.al.	2502.07846	null
2025-02-11	LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid	Weigao Sun et.al.	2502.07563	link
2025-02-11	MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks	Lotfi Abdelkrim Mecharbat et.al.	2502.07422	null
2025-02-11	Online Aggregation of Trajectory Predictors	Alex Tong et.al.	2502.07178	null
2025-02-09	Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline	Zhiyuan Fang et.al.	2502.06888	null
2025-02-12	Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach	Xu Zhang et.al.	2502.06832	null
2025-02-10	MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing	Seokjin Go et.al.	2502.06643	null
2025-02-10	Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE	Haiduo Huang et.al.	2502.06282	link
2025-02-10	Fair-MoE: Fairness-Oriented Mixture of Experts in Vision-Language Models	Peiran Wang et.al.	2502.06094	null
2025-02-08	Mol-MoE: Training Preference-Guided Routers for Molecule Generation	Diego Calanzone et.al.	2502.05633	null
2025-02-17	UbiMoE: A Ubiquitous Mixture-of-Experts Vision Transformer Accelerator With Hybrid Computation Pattern on FPGA	Jiale Dong et.al.	2502.05602	link
2025-02-07	fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving	Hanfei Yu et.al.	2502.05370	null
2025-02-07	Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts	Roussel Desmond Nzoyem et.al.	2502.05335	null
2025-02-19	Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient	Jan Ludziejewski et.al.	2502.05172	null
2025-02-06	Mixture of neural operator experts for learning boundary conditions and model selection	Dwyer Deighan et.al.	2502.04562	null
2025-02-06	CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference	Zehua Pei et.al.	2502.04416	link
2025-02-06	Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning	Peizhuang Cong et.al.	2502.03884	null
2025-03-20	A Retrospective Systematic Study on Hierarchical Sparse Query Transformer-assisted Ultrasound Screening for Early Hepatocellular Carcinoma	Chaoyin She et.al.	2502.03772	link
2025-02-05	(GG) MoE vs. MLP on Tabular Data	Andrei Chernov et.al.	2502.03608	null
2025-02-05	RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts	Tuan Truong et.al.	2502.03044	null
2025-03-22	On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation	Nghiem T. Diep et.al.	2502.03029	null
2025-02-05	Scaling Laws for Upcycling Mixture-of-Experts Language Models	Seng Pei Liew et.al.	2502.03009	null
2025-02-04	ReGNet: Reciprocal Space-Aware Long-Range Modeling and Multi-Property Prediction for Crystals	Jianan Nie et.al.	2502.02748	null
2025-02-04	Binarity at LOw Metallicity (BLOeM): The multiplicity properties and evolution of BAF-type supergiants	L. R. Patrick et.al.	2502.02644	null
2025-02-04	Hecate: Unlocking Efficient Sparse Model Training via Fully Sharded Sparse Data Parallelism	Yuhao Qing et.al.	2502.02581	null
2025-02-07	Brief analysis of DeepSeek R1 and its implications for Generative AI	Sarah Mercer et.al.	2502.02523	null
2025-02-04	M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference	Nikhil Bhendawade et.al.	2502.02040	null
2025-02-07	MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation	Haibo Tong et.al.	2502.01719	null
2025-02-27	Omni-Mol: Exploring Universal Convergent Space for Omni-Molecular Tasks	Chengxin Hu et.al.	2502.01074	null
2025-02-17	MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs	Yuhang Zhou et.al.	2502.00997	null
2025-02-03	CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling	Xinze Wang et.al.	2502.00965	null
2025-02-02	UniGraph2: Learning a Unified Embedding Space to Bind Multimodal Graphs	Yufei He et.al.	2502.00806	null
2025-02-02	Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective	Yujin Oh et.al.	2502.00619	null
2025-02-05	Weak-to-Strong Diffusion with Reflection	Lichen Bai et.al.	2502.00473	null
2025-02-01	PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning	Yu Feng et.al.	2502.00354	link
2025-02-01	Sigmoid Self-Attention is Better than Softmax Self-Attention: A Mixture-of-Experts Perspective	Fanqi Yan et.al.	2502.00281	null
2025-01-31	Pheromone-based Learning of Optimal Reasoning Paths	Anirudh Chari et.al.	2501.19278	null
2025-03-03	Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning	Minh Le et.al.	2501.18936	null
2025-01-30	MolGraph-xLSTM: A graph-based dual-level xLSTM framework with multi-head mixture-of-experts for enhanced molecular representation and interpretability	Yan Sun et.al.	2501.18439	null
2025-02-10	Free Agent in Agent-Based Mixture-of-Experts Generative AI Framework	Jung-Hua Liu et.al.	2501.17903	null
2025-01-29	Heuristic-Informed Mixture of Experts for Link Prediction in Multilayer Networks	Lucio La Cava et.al.	2501.17557	null
2025-01-28	3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow	Yueen Ma et.al.	2501.16698	null
2025-01-27	Searching for GEMS: Discovery and Characterization of Two Brown Dwarfs Around M Dwarfs	Alexander Larsen et.al.	2501.16554	null
2025-02-12	One-for-All Does Not Work! Enhancing Vulnerability Detection by Mixture-of-Experts (MoE)	Xu Yang et.al.	2501.16454	null
2025-01-29	Mixture of Experts (MoE): A Big Data Perspective	Wensheng Gan et.al.	2501.16352	null
2025-01-27	Static Batching of Irregular Workloads on GPUs: Framework and Application to Efficient MoE Model Inference	Yinghan Li et.al.	2501.16103	null
2025-01-25	ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning	Shangqian Gao et.al.	2501.15316	null
2025-03-16	FreqMoE: Enhancing Time Series Forecasting through Frequency Decomposition Mixture of Experts	Ziqi Liu et.al.	2501.15125	link
2025-01-25	Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning	Ziyu Zhao et.al.	2501.15103	null
2025-01-24	Mean-field limit from general mixtures of experts to quantum neural networks	Anderson Melchor Hernandez et.al.	2501.14660	null
2025-01-30	Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation	Shengzhe Zhang et.al.	2501.14269	link
2025-03-12	Sparse Mixture-of-Experts for Non-Uniform Noise Reduction in MRI Images	Zeyun Deng et.al.	2501.14198	null
2025-01-23	CSAOT: Cooperative Multi-Agent System for Active Object Tracking	Hy Nguyen et.al.	2501.13994	null
2025-01-22	Autonomy-of-Experts Models	Ang Lv et.al.	2501.13074	null
2025-02-07	LLM4WM: Adapting LLM for Wireless Multi-Tasking	Xuanyu Liu et.al.	2501.12983	null
2025-01-22	UniUIR: Considering Underwater Image Restoration as An All-in-One Learner	Xu Zhang et.al.	2501.12981	null
2025-01-22	BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR	Guodong Ma et.al.	2501.12602	null
2025-02-26	Modality Interactive Mixture-of-Experts for Fake News Detection	Yifan Liu et.al.	2501.12431	link
2025-01-21	SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection	Xiaocheng Zhang et.al.	2501.12430	null
2025-01-25	Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models	Samira Abnar et.al.	2501.12370	null
2025-01-21	MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks	Qishen Zhou et.al.	2501.12281	link
2025-02-04	Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models	Zihan Qiu et.al.	2501.11873	null
2025-01-18	FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models	Xinglin Pan et.al.	2501.10714	null
2024-12-16	DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference	Yujie Zhang et.al.	2501.10375	null
2025-01-17	OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning	Jinyuan Feng et.al.	2501.10062	null
2025-01-17	LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading	Kuan-Ming Liu et.al.	2501.09636	null
2025-01-16	MoE$^2$: Optimizing Collaborative Inference for Edge Large Language Models	Lyudong Jin et.al.	2501.09410	null
2025-01-14	MiniMax-01: Scaling Foundation Models with Lightning Attention	MiniMax et.al.	2501.08313	null
2025-01-14	Guiding polaritonic energy and momentum through two-dimensional Bravais lattices	Zhonglin Li et.al.	2501.08123	null
2025-02-11	GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism	Chen Tang et.al.	2501.07890	null
2025-01-18	PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration	Xiaoshui Huang et.al.	2501.07762	null
2025-01-13	A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis	Binyu Zhang et.al.	2501.07016	link
2025-01-12	Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning	Hanwen Zhong et.al.	2501.06884	link
2025-01-12	A Comprehensive Evaluation of Large Language Models on Mental Illnesses in Arabic Context	Noureldin Zahran et.al.	2501.06859	null
2025-03-18	TAMER: A Test-Time Adaptive MoE-Driven Framework for EHR Representation Learning	Yinghao Zhu et.al.	2501.05661	link
2025-01-09	Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing	Mengfan Liu et.al.	2501.05313	null
2025-01-07	LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes	Xiang Xu et.al.	2501.04004	link
2025-01-07	mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training	Xudong Liao et.al.	2501.03905	null
2025-01-08	Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection	Donatella Genovese et.al.	2501.03432	null
2025-01-06	Solving the Porous Medium Equation with the eXtreme Mesh deformation approach (X-Mesh)	Alexandre Chemin et.al.	2501.03083	null
2025-01-05	Soft and Compliant Contact-Rich Hair Manipulation and Care	Uksang Yoo et.al.	2501.02630	null
2025-01-12	Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning	Zhongyi Zhou et.al.	2501.02198	null
2025-03-18	MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders	Jiajun Cao et.al.	2501.01709	null
2025-01-01	REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization	Huyen Nguyen et.al.	2501.00779	null
2025-01-06	Superposition in Transformers: A Novel Way of Building Mixture of Experts	Ayoub Ben Chaliah et.al.	2501.00530	link
2024-12-31	CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection	Xiaolei Wang et.al.	2501.00346	null
2024-12-30	SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection	Yuxuan Li et.al.	2412.20665	link
2024-12-29	Multimodal Variational Autoencoder: a Barycentric View	Peijie Qiu et.al.	2412.20487	null
2025-03-05	A Comprehensive Framework for Reliable Legal AI: Combining Specialized Expert Systems and Adaptive Refinement	Sidra Nasir et.al.	2412.20468	null
2024-12-29	Mind the Data Gap: Bridging LLMs to Enterprise Data Integration	Moe Kayali et.al.	2412.20331	null
2025-03-09	UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity	Jingbo Lin et.al.	2412.20157	link
2024-12-28	Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection	Yaning Zhang et.al.	2412.20156	null
2025-02-18	DeepSeek-V3 Technical Report	DeepSeek-AI et.al.	2412.19437	link
2024-12-26	AskChart: Universal Chart Understanding through Textual Enhancement	Xudong Yang et.al.	2412.19146	link
2024-12-30	Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection	Xiaoyu Huang et.al.	2412.19108	null
2024-12-26	DAPoinTr: Domain Adaptive Point Transformer for Point Cloud Completion	Yinghui Li et.al.	2412.19062	link
2025-03-10	Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making	David Shoresh et.al.	2412.18593	link
2024-12-24	BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing	Yingjie Ma et.al.	2412.18065	link
2024-12-23	UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition	Li Fu et.al.	2412.17507	null
2025-02-01	BrainMAP: Learning Multiple Activation Pathways in Brain Networks	Song Wang et.al.	2412.17404	link
2024-12-23	Efficient fine-tuning methodology of text embedding models for information retrieval: contrastive learning penalty (clp)	Jeongsu Yu et.al.	2412.17364	link
2024-12-22	The Fermat curves and arrangements of lines and conics	Nils Peder Astrup Toft et.al.	2412.16993	null
2024-12-22	Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models	Elie Antoine et.al.	2412.16971	null
2024-12-18	GraphLoRA: Empowering LLMs Fine-Tuning via Graph Collaboration of MoE	Ting Bai et.al.	2412.16216	null
2024-12-20	Theory of Mixture-of-Experts for Mobile Edge Computing	Hongbo Li et.al.	2412.15690	null
2024-12-19	MoEtion: Efficient and Reliable Checkpointing for Mixture-of-Experts Models at Scale	Swapnil Gandhi et.al.	2412.15411	null
2025-01-03	Qwen2.5 Technical Report	Qwen et.al.	2412.15115	link
2025-02-27	ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing	Ziteng Wang et.al.	2412.14711	link
2025-01-22	A Survey on Inference Optimization Techniques for Mixture of Experts Models	Jiacheng Liu et.al.	2412.14219	link
2024-12-18	SEKE: Specialised Experts for Keyword Extraction	Matej Martinc et.al.	2412.14087	link
2024-12-18	MedCoT: Medical Chain of Thought via Hierarchical Expert	Jiaxiang Liu et.al.	2412.13736	link
2024-12-17	SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks	Mátyás Vincze et.al.	2412.13053	null
2024-12-17	Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning	Moritz Reuss et.al.	2412.12953	null
2025-01-09	CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition	He Wang et.al.	2412.12760	null
2024-12-16	Investigating Mixture of Experts in Dense Retrieval	Effrosyni Sokli et.al.	2412.11864	null
2024-12-20	Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture	Jingze Shi et.al.	2412.11834	link
2024-12-16	Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation	Svetlana Pavlitska et.al.	2412.11608	link
2024-12-16	Enhancing Healthcare Recommendation Systems with a Multimodal LLMs-based MOE Architecture	Jingyu Xu et.al.	2412.11557	null
2024-12-14	DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification	Yuhao Wang et.al.	2412.10650	link
2024-12-13	DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding	Zhiyu Wu et.al.	2412.10302	link
2024-12-13	Llama 3 Meets MoE: Efficient Upcycling	Aditya Vavre et.al.	2412.09952	link
2024-12-20	Memory Layers at Scale	Vincent-Pierre Berges et.al.	2412.09764	link
2025-01-10	Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine	Xiaoshuang Huang et.al.	2412.09278	link
2024-12-12	MoSLD: An Extremely Parameter-Efficient Mixture-of-Shared LoRAs for Multi-Task Learning	Lulu Zhao et.al.	2412.08946	null
2024-11-26	Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection	Tzu-Ting Yang et.al.	2412.08651	null
2025-01-18	Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective	Minh Le et.al.	2412.08285	null
2025-02-12	Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification	Xuanze Chen et.al.	2412.08193	link
2024-12-10	MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning	Yufei Ma et.al.	2412.07405	null
2024-12-10	Post-Training Statistical Calibration for Higher Activation Sparsity	Vui Seng Chua et.al.	2412.07174	link
2025-03-02	MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems	Yao Fu et.al.	2412.07067	null
2024-12-07	Partition of Unity Physics-Informed Neural Networks (POU-PINNs): An Unsupervised Framework for Physics-Informed Domain Decomposition and Mixtures of Experts	Arturo Rodriguez et.al.	2412.06842	null
2024-12-09	Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset	Xiao Wang et.al.	2412.06647	link
2024-12-09	UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts	Zhen Wan et.al.	2412.06340	null
2024-12-08	Hallucination-aware Optimization for Large Language Model-empowered Communications	Yinqiu Liu et.al.	2412.06007	link
2024-12-10	An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism	Qing Zhang et.al.	2412.05821	null
2024-12-10	RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts	Xu Liu et.al.	2412.05679	link
2024-12-07	SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts	Gengze Zhou et.al.	2412.05552	link
2024-12-07	Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers	Boxun Xu et.al.	2412.05540	null
2024-12-23	Steps are all you need: Rethinking STEM Education with Prompt Engineering	Krishnasai Addala et.al.	2412.05023	null
2024-12-05	Customize Segment Anything Model for Multi-Modal Semantic Segmentation with Mixture of LoRA Experts	Chenyang Zhu et.al.	2412.04220	null
2025-03-02	Monet: Mixture of Monosemantic Experts for Transformers	Jungwoo Park et.al.	2412.04139	link
2024-12-05	Meta-Reinforcement Learning With Mixture of Experts for Generalizable Multi Access in Heterogeneous Wireless Networks	Zhaoyang Liu et.al.	2412.03850	null
2024-12-04	Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond	Loukas Ilias et.al.	2412.03483	null
2024-12-03	CA-MoE: Channel-Adapted MoE for Incremental Weather Forecasting	Hao Chen et.al.	2412.02503	null
2025-02-14	MQFL-FHE: Multimodal Quantum Federated Learning Framework with Fully Homomorphic Encryption	Siddhant Dutta et.al.	2412.01858	null
2025-01-22	Yi-Lightning Technical Report	Alan Wake et.al.	2412.01253	null
2024-11-30	Mixture of Experts for Node Classification	Yu Shi et.al.	2412.00418	null
2025-01-22	HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting	Shaohan Yu et.al.	2412.00316	null
2024-11-27	Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference	Andrii Skliar et.al.	2412.00099	null
2025-02-16	Condense, Don’t Just Prune: Enhancing Efficiency and Performance in MoE Layer Pruning	Mingyu Cao et.al.	2412.00069	link
2024-11-29	LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References	Shuguo Jiang et.al.	2411.19758	null
2024-11-28	On the effectiveness of discrete representations in sparse mixture of experts	Giang Do et.al.	2411.19402	null
2024-11-28	Bayesian Cluster Weighted Gaussian Models	Panagiotis Papastamoulis et.al.	2411.18957	link
2024-11-27	UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMS	Haomin Zhuang et.al.	2411.18797	null
2024-11-27	Complexity Experts are Task-Discriminative Learners for Any Image Restoration	Eduard Zamfir et.al.	2411.18466	null
2024-11-27	Mixture of Experts in Image Classification: What’s the Sweet Spot?	Mathurin Videau et.al.	2411.18322	null
2024-11-26	$H^3$ Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs	Selim Furkan Tekin et.al.	2411.17792	link
2024-11-26	The Tempered Finite Element Method	Antoine Quiriny et.al.	2411.17564	null
2024-11-25	Staleness-Centric Optimizations for Efficient Diffusion MoE Inference	Jiajun Luo et.al.	2411.16786	null
2024-12-02	MH-MoE: Multi-Head Mixture-of-Experts	Shaohan Huang et.al.	2411.16205	null
2024-11-25	LDACP: Long-Delayed Ad Conversions Prediction Model for Bidding Strategy	Peng Cui et.al.	2411.16095	null
2024-11-24	Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution	Haiquan Wang et.al.	2411.15871	null
2024-11-24	LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training	Xiaoye Qu et.al.	2411.15708	link
2024-11-23	Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts	Qizhou Chen et.al.	2411.15432	null
2024-11-23	Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation	Fahao Chen et.al.	2411.15419	null
2024-11-21	Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning	Jiange Yang et.al.	2411.14519	null
2024-11-20	MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification	Yuxuan Chen et.al.	2411.13004	null
2024-11-23	KAAE: Numerical Reasoning for Knowledge Graphs via Knowledge-aware Attributes Learning	Ming Yin et.al.	2411.12950	null
2025-02-06	Ultra-Sparse Memory Network	Zihao Huang et.al.	2411.12364	null
2025-01-28	CNMBERT: A Model for Converting Hanyu Pinyin Abbreviations to Chinese Characters	Zishuo Feng et.al.	2411.11770	link
2024-11-18	MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs	Shiyi Cao et.al.	2411.11217	null
2024-11-16	Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts	Jinqiang Long et.al.	2411.10669	link
2024-11-15	Weakly-Supervised Multimodal Learning on MIMIC-CXR	Andrea Agostini et.al.	2411.10356	link
2024-11-21	Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models	Wei Wang et.al.	2411.10003	null
2024-11-13	Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection	Vima Gupta et.al.	2411.08982	null
2024-11-13	Sparse Upcycling: Inference Inefficient Finetuning	Sasha Doubov et.al.	2411.08968	null
2024-11-13	LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing	Xiaonan Nie et.al.	2411.08446	null
2024-11-12	Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach	Renzi Wang et.al.	2411.08232	null
2024-11-12	PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model	Yilun Liu et.al.	2411.08212	null
2024-11-08	Biodynamic Analysis of Alpine Skiing with a Skier-Ski-Snow Interaction Model	Nan Gao et.al.	2411.08056	null
2024-11-12	Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge	Emmanuel Azuh Mensah et.al.	2411.07834	null
2024-11-11	Adaptive Conditional Expert Selection Network for Multi-domain Recommendation	Kuiyao Dong et.al.	2411.06826	null
2024-11-11	WDMoE: Wireless Distributed Mixture of Experts for Large Language Models	Nan Xue et.al.	2411.06681	null
2024-11-09	Learning Mixtures of Experts with EM	Quentin Fruytier et.al.	2411.06056	null
2024-11-08	NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts	Yen-Ting Lin et.al.	2411.05945	null
2024-11-05	DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts	Zelin Yao et.al.	2411.03025	link
2024-11-05	Advancing Robust Underwater Acoustic Target Recognition through Multi-task Learning and Multi-Gate Mixture-of-Experts	Yuan Xie et.al.	2411.02787	null
2024-11-27	SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models	Jianyi Zhang et.al.	2411.02433	link
2024-11-06	Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent	Xingwu Sun et.al.	2411.02265	null
2024-12-27	FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation	Ziwei Zhan et.al.	2411.02115	null
2024-11-06	Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis	Mohammad Zbeeb et.al.	2411.01929	link
2025-02-10	RS-MoE: A Vision-Language Model with Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering	Hui Lin et.al.	2411.01595	null
2025-02-10	Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation	Mingrui Liu et.al.	2411.01457	null
2024-11-06	HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference	Peng Tang et.al.	2411.01433	null
2024-12-12	HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy	Shuqing Luo et.al.	2411.01288	link
2024-11-02	PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment	Dongxu Liu et.al.	2411.01245	null
2024-11-01	MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition	Cheng Yang et.al.	2411.01016	null
2024-11-01	LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models	Nam V. Nguyen et.al.	2411.00918	link
2024-10-16	TradExpert: Revolutionizing Trading with Mixture of Expert LLMs	Qianggang Ding et.al.	2411.00782	null
2024-11-01	MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffc-Aware Parallel Optimization	Jingming Guo et.al.	2411.00662	link
2024-11-01	A Fast, Analytic Empirical Model of the Gaia Data Release 3 Astrometric Orbit Catalog Selection Function	Casey Y. Lam et.al.	2411.00654	link
2024-10-31	Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts	Xiang Deng et.al.	2410.23836	null
2024-10-30	Efficient and Interpretable Grammatical Error Correction with Mixture of Experts	Muhammad Reza Qorib et.al.	2410.23507	link
2024-10-30	Stealing User Prompts from Mixture of Experts	Itay Yona et.al.	2410.22884	null
2024-10-30	MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning	Xujia Wang et.al.	2410.22782	null
2025-02-08	ProMoE: Fast MoE-based LLM Serving using Proactive Caching	Xiaoniu Song et.al.	2410.22134	null
2024-10-29	Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging	Li Shen et.al.	2410.21804	null
2024-10-29	Neural Experts: Mixture of Experts for Implicit Neural Representations	Yizhak Ben-Shabat et.al.	2410.21643	null
2024-11-07	FinTeamExperts: Role Specialized MOEs For Financial Analysis	Yue Yu et.al.	2410.21338	null
2024-10-28	Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving	Jiyao Wang et.al.	2410.21086	null
2024-10-27	Towards a Blockchain and Opportunistic Edge Driven Metaverse of Everything	Paula Fraga-Lamas et.al.	2410.20594	null
2024-10-27	Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation	Maohao Shen et.al.	2410.20336	null
2024-10-27	GUMBEL-NERF: Representing Unseen Objects as Part-Compositional Neural Radiance Fields	Yusuke Sekikawa et.al.	2410.20306	null
2024-11-12	LLMs Can Evolve Continually on Modality for X-Modal Reasoning	Jiazuo Yu et.al.	2410.20178	link
2024-10-25	DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unspervised Dimensionality Reduction	Zelin Zang et.al.	2410.19504	link
2025-01-27	Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis	Weikai Li et.al.	2410.19225	link
2024-10-24	Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design	Ruisi Cai et.al.	2410.19123	link
2024-10-24	Mixture of Parrots: Experts improve memorization more than reasoning	Samy Jelassi et.al.	2410.19034	null
2024-10-24	MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases	Zhisheng Lin et.al.	2410.18406	null
2024-10-23	Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches	Kexin Feng et.al.	2410.18298	null
2024-10-23	MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning	Jingfan Zhang et.al.	2410.18035	null
2024-10-24	ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference	Xin He et.al.	2410.17954	null
2024-10-23	Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition	Artem Basharin et.al.	2410.17765	null
2024-10-22	Optimizing Mixture-of-Experts Inference Time Combining Model Deployment and Communication Scheduling	Jialong Li et.al.	2410.17043	null
2024-10-21	LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset	Ruikun Zhang et.al.	2410.16095	link
2024-10-22	CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts	Zhenpeng Su et.al.	2410.16077	link
2024-10-29	Generalizing Motion Planners with Mixture of Experts for Autonomous Driving	Qiao Sun et.al.	2410.15774	link
2024-11-23	ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts	Xumeng Han et.al.	2410.15732	null
2024-10-20	Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs	Xin Zhou et.al.	2410.15438	null
2024-11-16	LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration	Yuang Ai et.al.	2410.15385	link
2024-10-19	MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning	Suning Huang et.al.	2410.14972	null
2024-10-29	Collaboratively adding new knowledge to an LLM	Rhui Dih Lee et.al.	2410.14753	link
2024-10-18	MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts	Rachel S. Y. Teo et.al.	2410.14574	link
2024-10-18	Towards a Simple and Extensible Standard for Object-Centric Event Data (OCED) – Core Model, Design Space, and Lessons Learned	Dirk Fahland et.al.	2410.14495	link
2024-10-18	ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction	Haoyu He et.al.	2410.14099	link
2024-10-17	Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in Compositional Tasks	Jinze Zhao et.al.	2410.13964	null
2024-10-18	MoR: Mixture of Ranks for Low-Rank Adaptation Tuning	Chuanyu Tang et.al.	2410.13408	null
2024-10-16	Satellite-Terrestrial Quantum Networks and the Global Quantum Internet	Andrea Conti et.al.	2410.13096	null
2024-10-16	On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs	Herun Wan et.al.	2410.12600	null
2024-10-16	Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion	Minkyoung Cho et.al.	2410.12592	null
2024-10-16	Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts	Fanqi Yan et.al.	2410.12258	null
2025-01-03	EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference	Yulei Qian et.al.	2410.12247	null
2024-10-15	MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router	Yanyue Xie et.al.	2410.12013	null
2024-10-15	MoH: Multi-Head Attention as Mixture-of-Head Attention	Peng Jin et.al.	2410.11842	link
2024-10-15	GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation	Fei Tang et.al.	2410.11841	link
2024-10-15	Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models	James Vo et.al.	2410.11654	null
2024-10-16	Quadratic Gating Functions in Mixture of Experts: A Statistical Insight	Pedram Akbarian et.al.	2410.11222	null
2024-10-19	AT-MoE: Adaptive Task-planning Mixture of Experts via LoRA Approach	Xurui Li et.al.	2410.10896	null
2024-10-01	Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models	Keivan Alizadeh et.al.	2410.10846	null
2024-10-16	Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free	Ziyue Li et.al.	2410.10814	link
2024-10-14	Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts	Guorui Zheng et.al.	2410.10626	link
2024-10-14	Learning to Ground VLMs without Forgetting	Aritra Bhowmik et.al.	2410.10491	null
2024-10-14	Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts	Xu Liu et.al.	2410.10469	null
2024-10-15	Ada-K Routing: Boosting the Efficiency of MoE-based LLMs	Tongtian Yue et.al.	2410.10456	null
2024-10-14	Tighter Risk Bounds for Mixtures of Experts	Wissam Akretche et.al.	2410.10397	null
2024-10-24	Scalable Multi-Domain Adaptation of Language Models using Modular Experts	Peter Schafhalter et.al.	2410.10181	null
2024-10-16	Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models	Jun Luo et.al.	2410.10114	null
2024-10-14	AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality	Peijun Qing et.al.	2410.10054	link
2024-10-13	ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL	Zhanqiu Guo et.al.	2410.09781	null
2024-10-13	MoIN: Mixture of Introvert Experts to Upcycle an LLM	Ajinkya Tejankar et.al.	2410.09687	null
2024-10-12	GETS: Ensemble Temperature Scaling for Calibration in Graph Neural Networks	Dingyi Zhuang et.al.	2410.09570	null
2024-10-11	Semi-Supervised Learning of Noisy Mixture of Experts Models	Oh-Ran Kwon et.al.	2410.09039	null
2024-10-11	Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering	I-Chun Chen et.al.	2410.08589	null
2024-10-31	Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts	Sukwon Yun et.al.	2410.08245	link
2024-11-20	Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training	Gen Luo et.al.	2410.08202	null
2024-10-10	Efficient Dictionary Learning with Switch Sparse Autoencoders	Anish Mudide et.al.	2410.08201	link
2024-10-18	More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing	Sagi Shaier et.al.	2410.08003	null
2024-10-10	SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture	Jiayi Han et.al.	2410.07739	null
2024-10-10	Upcycling Large Language Models into Mixture of Experts	Ethan He et.al.	2410.07524	null
2024-10-09	User Feedback in Continuous Software Engineering: Revealing the State-of-Practice	Anastasiia Tkalich et.al.	2410.07459	null
2024-10-11	MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts	Peng Jin et.al.	2410.07348	null
2024-10-04	A Dynamic Approach to Stock Price Prediction: Comparing RNN and Mixture of Experts Models Across Different Volatility Profiles	Diego Vallarino et.al.	2410.07234	null
2024-10-09	Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders	David Noever et.al.	2410.06462	null
2024-10-09	Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs	Ruijia Niu et.al.	2410.06431	null
2024-10-08	Probing the Robustness of Theory of Mind in Large Language Models	Christian Nickel et.al.	2410.06271	null
2024-10-08	MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More	Wei Huang et.al.	2410.06270	link
2024-12-17	Aria: An Open Multimodal Native Mixture-of-Experts Model	Dongxu Li et.al.	2410.05993	link
2024-10-08	Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models	Siqi Wang et.al.	2410.05661	null
2024-12-05	Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild	Xinyu Zhao et.al.	2410.05357	link
2024-10-07	Multimodal Fusion Strategies for Mapping Biophysical Landscape Features	Lucia Gordon et.al.	2410.04833	link
2024-10-06	Realizing Video Summarization from the Path of Language-based Semantic Understanding	Kuan-Chen Mu et.al.	2410.04511	null
2024-10-09	Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding	Wei Wu et.al.	2410.03553	null
2024-10-04	Exploring the Benefit of Activation Sparsity in Pre-training	Zhengyan Zhang et.al.	2410.03440	link
2024-10-03	MLP-KAN: Unifying Deep Representation and Function Learning	Yunhong He et.al.	2410.03027	link
2024-10-03	On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions	Huy Nguyen et.al.	2410.02935	null
2024-10-03	Neutral residues: revisiting adapters for model extension	Franck Signe Talla et.al.	2410.02744	null
2024-10-03	Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping	Ziye Huang et.al.	2410.02475	null
2024-10-03	MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction	Zhaojian Yu et.al.	2410.02241	null
2024-10-03	Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts	Minh Le et.al.	2410.02200	null
2024-10-04	Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices	Andres Potapczynski et.al.	2410.02117	link
2024-10-04	EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing	Haotian Sun et.al.	2410.02098	null
2024-10-02	Don’t flatten, tokenize! Unlocking the key to SoftMoE’s efficacy in deep RL	Ghada Sokar et.al.	2410.01930	null
2024-09-15	Integrating AI’s Carbon Footprint into Risk Management Frameworks: Strategies and Tools for Sustainable Compliance in Banking Sector	Nataliya Tkachenko et.al.	2410.01818	null
2024-10-02	Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models	Shayekh Bin Islam et.al.	2410.01782	link
2024-10-02	TIC 290061484: A Triply Eclipsing Triple System with the Shortest Known Outer Period of 24.5 Days	Veselin B. Kostov et.al.	2410.01711	null
2024-10-02	Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging	Tingfeng Hui et.al.	2410.01610	null
2024-10-02	The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs	Hong Li et.al.	2410.01417	null
2024-10-01	MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards	Sheng Wang et.al.	2410.00938	null
2024-10-01	UniAdapt: A Universal Adapter for Knowledge Calibration	Tai D. Nguyen et.al.	2410.00454	null
2024-10-01	Robust Traffic Forecasting against Spatial Shift over Years	Hongjun Wang et.al.	2410.00373	link
2024-09-29	IDEA: An Inverse Domain Expert Adaptation Based Active DNN IP Protection Method	Chaohui Xu et.al.	2410.00059	null
2024-09-30	MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning	Haotian Zhang et.al.	2409.20566	null
2024-09-30	HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models	Bingshen Mu et.al.	2409.19878	null
2024-10-02	CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling	Jihai Zhang et.al.	2409.19291	link
2024-11-12	SciDFM: A Large Language Model with Mixture-of-Experts for Science	Liangtai Sun et.al.	2409.18412	null
2024-11-01	Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE	Xun Zhu et.al.	2409.17508	link
2024-09-26	A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction	Guangyu Wang et.al.	2409.17440	link
2024-09-24	Leveraging Mixture of Experts for Improved Speech Deepfake Detection	Viola Negroni et.al.	2409.16077	null
2024-10-02	Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts	Xiaoming Shi et.al.	2409.16040	link
2024-10-31	Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM	Fengrun Zhang et.al.	2409.15905	null
2024-09-24	Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks	Jiayi He et.al.	2409.15695	null
2024-12-13	A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts	Hugo Inzirillo et.al.	2409.15161	link
2024-09-23	Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond	Hong Chen et.al.	2409.14993	null
2024-09-21	Routing in Sparsely-gated Language Models responds to Context	Stefan Arnold et.al.	2409.14107	null
2024-10-01	On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists	Dongyang Fan et.al.	2409.13931	link
2024-09-20	Multi-omics data integration for early diagnosis of hepatocellular carcinoma (HCC) using machine learning	Annette Spooner et.al.	2409.13791	null
2024-09-19	On the rationality problem for hypersurfaces	Jan Lange et.al.	2409.12834	null
2024-09-19	Retrieval-Augmented Test Generation: How Far Are We?	Jiho Shin et.al.	2409.12682	null
2024-09-19	Robust Audiovisual Speech Recognition Models with Mixture-of-Experts	Yihan Wu et.al.	2409.12370	null
2024-09-18	Mixture of Diverse Size Experts	Manxi Sun et.al.	2409.12210	null
2024-09-18	GRIN: GRadient-INformed MoE	Liyuan Liu et.al.	2409.12136	null
2024-09-18	Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0	Zhiyong Wang et.al.	2409.11909	null
2024-09-17	LPT++: Efficient Training on Mixture of Long-tailed Experts	Bowen Dong et.al.	2409.11323	null
2024-12-09	LOLA – An Open-Source Massively Multilingual Large Language Model	Nikit Srivastava et.al.	2409.11272	link
2024-09-16	Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression	Yi-Hsin Li et.al.	2409.10101	null
2024-11-20	MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving	Enming Zhang et.al.	2409.07267	link
2024-09-10	DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models	Maryam Akhavan Aghdam et.al.	2409.06669	null
2024-09-10	STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning	Jaeseong Lee et.al.	2409.06211	null
2024-10-31	VE: Modeling Multivariate Time Series Correlation with Variate Embedding	Shangjiong Wang et.al.	2409.06169	link
2024-09-09	Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models	Hongyang Lei et.al.	2409.05929	null
2024-09-09	Optical Spiking Neurons Enable High-Speed and Energy-Efficient Optical Neural Networks	Bo Xu et.al.	2409.05726	null
2024-09-09	Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection	Tianwu Lei et.al.	2409.05611	null
2024-09-06	Hot Stars in the GALEX Ultraviolet Sky Surveys (GUVcat_AISxSDSS_HS) and the Binary Fraction of Hot Evolved Stars	Luciana Bianchi et.al.	2409.04626	null
2024-09-05	Interpretable mixture of experts for time series prediction under recurrent and non-recurrent conditions	Zemian Ke et.al.	2409.03282	null
2024-09-05	ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding	Zhengzhuo Xu et.al.	2409.03277	null
2024-09-05	xLAM: A Family of Large Action Models to Empower AI Agent Systems	Jianguo Zhang et.al.	2409.03215	link
2024-09-04	Configurable Foundation Models: Building LLMs from a Modular Perspective	Chaojun Xiao et.al.	2409.02877	null
2024-09-04	Pluralistic Salient Object Detection	Xuelu Feng et.al.	2409.02368	null
2024-09-03	OLMoE: Open Mixture-of-Experts Language Models	Niklas Muennighoff et.al.	2409.02060	link
2024-09-05	Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model	Hukai Huang et.al.	2409.02050	null
2024-09-03	BEAVER: An Enterprise Benchmark for Text-to-SQL	Peter Baile Chen et.al.	2409.02038	null
2024-09-03	Laser: Parameter-Efficient LLM Bi-Tuning for Sequential Recommendation with Collaborative Information	Xinyu Zhang et.al.	2409.01605	null
2024-09-02	Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning	Soumajyoti Sarkar et.al.	2409.01483	null
2024-09-02	Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching	Sungmin Yun et.al.	2409.01141	null
2024-09-04	Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack	Guanzhong Chen et.al.	2409.00960	link
2024-09-02	Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts	Youngseog Chung et.al.	2409.00879	null
2024-09-11	Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts	Rhui Dih Lee et.al.	2408.17280	null
2024-08-29	Gradient-free variational learning with conditional mixture networks	Conor Heins et.al.	2408.16429	link
2024-09-07	Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models	Yuncheng Yang et.al.	2408.15915	link
2024-08-28	Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts	Nikolas Gritsch et.al.	2408.15901	null
2024-10-23	LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation	Fangxun Shu et.al.	2408.15881	link
2024-08-28	Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts	Lean Wang et.al.	2408.15664	null
2024-08-27	Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis	Sakhinana Sagar Srinivas et.al.	2408.15305	null
2024-08-28	A Survey of Large Language Models for European Languages	Wazir Ali et.al.	2408.15040	null
2024-08-27	MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce	Hao Jiang et.al.	2408.14968	null
2024-08-24	Advancing Enterprise Spatio-Temporal Forecasting Applications: Data Mining Meets Instruction Tuning of Language Models For Multi-modal Time Series Analysis in Low-Resource Settings	Sagar Srinivas Sakhinana et.al.	2408.13622	null
2024-09-11	Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler	Yikang Shen et.al.	2408.13359	null
2024-10-30	The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities	Venkatesh Balavadhani Parthasarathy et.al.	2408.13296	null
2024-08-23	Guiding IoT-Based Healthcare Alert Systems with Large Language Models	Yulan Gao et.al.	2408.13071	null
2024-08-23	O-Mamba: O-shape State-Space Model for Underwater Image Enhancement	Chenyu Dong et.al.	2408.12816	link
2024-08-23	DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation	Xiaowei Mao et.al.	2408.12809	null
2024-08-23	Multi-Treatment Multi-Task Uplift Modeling for Enhancing User Growth	Yuxiang Wei et.al.	2408.12803	null
2024-08-23	La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection	Hang Zou et.al.	2408.12793	null
2024-10-02	SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging	Mohammadreza Pourreza et.al.	2408.12733	null
2024-08-22	Jamba-1.5: Hybrid Transformer-Mamba Models at Scale	Jamba Team et.al.	2408.12570	null
2024-09-09	Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators	Dingkang Yang et.al.	2408.12325	null
2024-08-15	FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models	Zhongyu Zhao et.al.	2408.11855	link
2024-08-21	MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing	Hao Zhou et.al.	2408.11396	link
2024-08-21	KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting?	Xiao Han et.al.	2408.11306	link
2024-08-21	FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts	Hanzi Mei et.al.	2408.11304	null
2024-08-27	Unboxing Occupational Bias: Grounded Debiasing of LLMs with U.S. Labor Data	Atmika Gorti et.al.	2408.11247	null
2024-08-25	Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting	Jianxiang Zhou et.al.	2408.10822	link
2024-08-20	AnyGraph: Graph Foundation Model in the Wild	Lianghao Xia et.al.	2408.10700	link
2024-08-20	HMoE: Heterogeneous Mixture of Experts for Language Modeling	An Wang et.al.	2408.10681	null
2024-08-19	AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference	Shuzhang Zhong et.al.	2408.10284	link
2024-10-29	FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models	Xiaochen Wang et.al.	2408.10276	link
2024-08-26	SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models	Anke Tang et.al.	2408.10174	link
2024-11-01	Customizing Language Models with Instance-wise LoRA for Sequential Recommendation	Xiaoyu Kong et.al.	2408.10159	link
2024-08-19	A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method	Hang Zou et.al.	2408.09752	null
2024-08-16	Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection	Haohao Zhu et.al.	2408.08551	null
2024-08-17	BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts	Qizhen Zhang et.al.	2408.08274	null
2024-05-21	Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts	Yunxin Li et.al.	2405.11273	null
2024-05-31	Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models	Xudong Lu et.al.	2402.14800	null
2024-10-29	GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts	Shirley Wu et.al.	2312.04693	null
2023-09-12	Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning	Ted Zadouri et.al.	2309.05444	null
2023-04-25	Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism	Xin Chen et.al.	2304.11414	null
2018-06-22	Mixtures of Experts Models	Isobel Claire Gormley et.al.	1806.08200	null

Speculative Decoding

Publish Date	Title	Authors	PDF	Code
2026-05-22	Enhancing Energy Efficiency in Scientific Workflows through CFD based PIVAEs	Ali Zahir et.al.	2605.23850	null
2026-05-22	Misleading Microbenchmarks on the Java Virtual Machines	Filippo Schiavio et.al.	2605.23570	null
2026-05-22	Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving	Kewei Zhang et.al.	2605.23163	null
2026-05-22	SolarChain: Bridging Physical Law, Verifiable Trust, and Sustainable Markets for Urban Energy Resilience	Shilin Ou et.al.	2605.23162	null
2026-05-21	ModeSwitch-LLM: A Lightweight Phase-Aware Controller for Cross-Mode LLM Inference on a Single GPU	Aman Sunesh et.al.	2605.23057	null
2026-05-21	No Blue without Red: Evolutionary Properties of Super-Early Galaxies	A. Ferrara et.al.	2605.22914	null
2026-05-21	IdleSpec: Exploiting Idle Time via Speculative Planning for LLM Agents	Daewon Choi et.al.	2605.22154	null
2026-05-21	SpecHop: Continuous Speculation for Accelerating Multi-Hop Retrieval Agents	Mehrdad Saberi et.al.	2605.21965	null
2026-05-20	Hybrid Improper Ferroelectricity and Moiré Superlattices-induced Exciton Quantization in Layered 2D Halide Perovskite	Sanika S. Padelkar et.al.	2605.21449	null
2026-05-20	Frontier: Towards Comprehensive and Accurate LLM Inference Simulation	Yicheng Feng et.al.	2605.21312	null
2026-05-20	Multimodal Emotion Recognition with Large Language Models	Hongrui Zhang et.al.	2605.21239	null
2026-05-20	The benefit of a multi-band high resolution spectroscopic monitoring for studying stellar transients: the NGC 300 OT2008-1 UVES spectrum as a test case	Elena Mason et.al.	2605.21156	null
2026-05-19	Fifty Years of Transaction Processing Research (extended)	Philip A. Bernstein et.al.	2605.20466	null
2026-05-19	Draft Less, Retrieve More: Hybrid Tree Construction for Speculative Decoding	Yuhao Shen et.al.	2605.20104	null
2026-05-19	FlexDraft: Flexible Speculative Decoding via Attention Tuning and Bonus-Guided Calibration	Yaojie Zhang et.al.	2605.20022	null
2026-05-20	SSV: Sparse Speculative Verification for Efficient LLM Inference	Zhibin Wang et.al.	2605.19893	null
2026-05-19	Exploring and Developing a Pre-Model Safeguard with Draft Models	Hongyu Cai et.al.	2605.19321	null
2026-05-18	Testing Black Holes with Interstellar Missions: I. Orbiting Probes	Leda Gao et.al.	2605.19176	null
2026-05-18	Designing On-Chain Options: Amortizing Perpetual Options	Maxim Bichuch et.al.	2605.19146	null
2026-05-18	KVBuffer: IO-aware Serving for Linear Attention	Longwei Zou et.al.	2605.19049	null
2026-05-18	Quantum Sidecar Architectures for Hybrid AI Training and Inference: Stateful Protected Registers, Stateless Reset-and-Reprepare Circuits and Quantum Weight-State Outlook	Y. Mo et.al.	2605.18031	null
2026-05-17	VeriCache: Turning Lossy KV Cache into Lossless LLM Inference	Jiayi Yao et.al.	2605.17613	null
2026-05-17	Federated Stream-Processing and Latency-Gated Response for Cross-Sector Threat Detection and Collaborative Containment	Namit Mohale et.al.	2605.17325	null
2026-05-17	TClone: Low-Latency Forking of Live GUI Environments for Computer-Use Agents	Yutong Huang et.al.	2605.17320	null
2026-05-16	Artificial Adaptive Intelligence: The Missing Stage Between Narrow and General Intelligence	Boris Kriuk et.al.	2605.16844	null
2026-05-16	Lever: Speculative LLM Inference on Smartphones	Tuowei Wang et.al.	2605.16786	null
2026-05-14	Polarization Signatures from GRMHD Simulations of Black Hole Accretion	P. Chris Fragile et.al.	2605.15166	null
2026-05-14	An Interpretable Latency Model for Speculative Decoding in LLM Serving	Linghao Kong et.al.	2605.15051	null
2026-05-14	Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Windowing	Jie Jiang et.al.	2605.14978	null
2026-05-14	zSort: Stable Distribution Sort using Z-Score Partitioning	Hriday Jain et.al.	2605.14419	null
2026-05-14	Factorization-Error-Free Discrete Diffusion Language Model via Speculative Decoding	Xun Fang et.al.	2605.14305	null
2026-05-13	Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding	Shuoyang Sun et.al.	2605.14005	null
2026-05-13	Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs	Jiahui Niu et.al.	2605.13778	null
2026-05-13	Europe and the Geopolitics of AGI: The Need for a Preparedness Plan	Maximilian Negele et.al.	2605.13634	null
2026-05-14	Speculative Interaction Agents: Building Real-Time Agents with Asynchronous I/O and Speculative Tool Calling	Coleman Hooper et.al.	2605.13360	null
2026-05-14	PipeSD: An Efficient Cloud-Edge Collaborative Pipeline Inference Framework with Speculative Decoding	Yunhe Han et.al.	2605.13319	null
2026-05-12	Creating Group Rules with AI: Human-AI Collaboration in WhatsApp Moderation	Gauri Nayak et.al.	2605.12613	null
2026-05-12	TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection	Tom Sander et.al.	2605.12456	null
2026-05-12	COSMIC 1001: Engaging Future Speculation on Space Exploration with Generative AI	Lingyu Peng et.al.	2605.11827	null
2026-05-12	BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion	Shaobin Zhuang et.al.	2605.11577	null
2026-05-12	Dynamic Execution Commitment of Vision-Language-Action Models	Feng Chen et.al.	2605.11567	null
2026-05-11	Rethinking LLMOps for Fraud and AML: Building a Compliance-Grade LLM Serving Stack	Prathamesh Vasudeo Naik et.al.	2605.11232	null
2026-05-11	Continuous Discovery of Vulnerabilities in LLM Serving Systems with Fuzzing	Yunze Zhao et.al.	2605.11202	null
2026-05-11	CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration	Yuning Han et.al.	2605.11186	null
2026-05-11	Towards the Realization of the Dark Dimension Scenario in Hořava-Witten Theory	Ralph Blumenhagen et.al.	2605.11068	null
2026-05-11	SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding	Anton Plaksin et.al.	2605.10453	null
2026-05-11	Agent-X: Full Pipeline Acceleration of On-device AI Agents	Jinha Chung et.al.	2605.10380	null
2026-05-11	Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration	Shuzhang Zhong et.al.	2605.10195	null
2026-05-11	GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference	Zengzipeng Tang et.al.	2605.10124	null
2026-05-11	Janus: Compiler-Based Defense Against Transient Execution Attacks Using ARM Hardware Primitives	Ciyan Ouyang et.al.	2605.10049	null
2026-05-11	Attention Drift: What Autoregressive Speculative Decoding Models Learn	Doğaç Eldenk et.al.	2605.09992	null
2026-05-10	31.1 A 14.08-to-135.69Token/s ReRAM-on-Logic Stacked Outlier-Free Large-Language-Model Accelerator with Block-Clustered Weight-Compression and Adaptive Parallel-Speculative-Decoding	Pingcheng Dong et.al.	2605.09375	null
2026-05-10	Test-Time Speculation	Avinash Kumar et.al.	2605.09329	null
2026-05-09	BubbleSpec: Turning Long-Tail Bubbles into Speculative Rollout Drafts for Synchronous Reinforcement Learning	Yuhang Xu et.al.	2605.08862	null
2026-05-09	PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding	Zihao An et.al.	2605.08632	null
2026-05-08	FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration	Zhengding Hu et.al.	2605.08520	null
2026-05-08	Fast Byte Latent Transformer	Julie Kallini et.al.	2605.08044	null
2026-05-08	Weak Order on the MacNeille Completion of Bruhat Order	Colin Defant et.al.	2605.08033	null
2026-05-08	Future Validity is the Missing Statistic: From Impossibility to $Φ$-Estimation for Grammar-Faithful Speculative Decoding	Wenhua Nie et.al.	2605.07698	null
2026-05-08	There to care; not to kill: medical settings, statistics and wrongful convictions	Richard D. Gill et.al.	2605.07421	null
2026-05-08	SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting	Weijie Shi et.al.	2605.07243	null
2026-05-08	CASCADE: Context-Aware Relaxation for Speculative Image Decoding	Selin Yildirim et.al.	2605.07230	null
2026-05-07	Bounding Fixed Points of Non-Monotone Processes: Theory to Practice	Abdullah H. Rasheed et.al.	2605.06803	null
2026-05-06	Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning	Yuelin Hu et.al.	2605.05262	null
2026-05-06	UniVer: A Unified Perspective for Multi-step and Multi-draft Speculative Decoding	Yepeng Weng et.al.	2605.04543	null
2026-05-05	Parallel Prefix Verification for Speculative Generation	Yuncheng Yao et.al.	2605.04263	null
2026-05-04	MARS-DA: A Hierarchical Reinforcement Learning Framework for Risk-Aware Multi-Agent Bidding in Power Grids	Jiayi Chen et.al.	2605.03142	null
2026-05-05	SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection	Shikhar Shukla et.al.	2605.02888	null
2026-05-04	de Sitter Vacua & pUniverses	Jeremias Aguilera-Damia et.al.	2605.02883	null
2026-05-04	CoVSpec: Efficient Device-Edge Co-Inference for Vision-Language Models via Speculative Decoding	Yuanyuan Jia et.al.	2605.02218	null
2026-05-07	PRCD-MAP: Learning How Much to Trust Imperfect Priors in Causal Discovery	Xihang Shan et.al.	2605.01669	null
2026-05-01	ADaPT: Adaptive-window Decoding for Practical fault-Tolerance	Tina Oberoi et.al.	2605.01149	null
2026-05-01	Component-Aware Self-Speculative Decoding in Hybrid Language Models	Hector Borobia et.al.	2605.01106	null
2026-05-01	Making Every Verified Token Count: Adaptive Verification for MoE Speculative Decoding	Lehan Pan et.al.	2605.00342	null
2026-04-30	Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation	Jiaju Chen et.al.	2604.27747	null
2026-04-30	Back to the Future: Rethinking Endorsement in Order-Execute Blockchains	Rongji Huang et.al.	2604.27659	null
2026-04-29	Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding	Hayate Iso et.al.	2604.26779	null
2026-05-04	An Empirical Study of Speculative Decoding on Software Engineering Tasks	Yijia Li et.al.	2604.26469	null
2026-04-29	When Hidden States Drift: Can KV Caches Rescue Long-Range Speculative Decoding?	Tianyu Liu et.al.	2604.26412	null
2026-04-28	Bragg-Williams order competes with superconductivity	Xu Liu et.al.	2604.25843	null
2026-04-28	General-Purpose Technology and Speculative Bubble Detection	Haiqiang Chen et.al.	2604.25826	null
2026-04-28	SpecFed: Accelerating Federated LLM Inference with Speculative Decoding and Compressed Transmission	Ce Zheng et.al.	2604.25777	null
2026-04-28	A Speculative Benchmark for the AMS-02 Electron and Positron Spectra from a Time-Symmetric Transport Hypothesis	Yi Yang et.al.	2604.25542	null
2026-04-30	AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices	Ma Zirui et.al.	2604.25326	null
2026-04-28	Value-Sensitive AI for Prayer: Balancing the Agencies Between Human and AI Agents in Spiritual Context	Soonho Kwon et.al.	2604.25230	null
2026-04-27	What If We Work Together? Fostering Reflections on Designer Inclusion in Open Source Software Through Speculative Design	Rozhan Hozhabri Nezhad et.al.	2604.24981	null
2026-04-26	ComplianceNLP: Knowledge-Graph-Augmented RAG for Multi-Framework Regulatory Gap Detection	Dongxin Guo et.al.	2604.23585	null
2026-04-25	Multiplicative Contractions, Additive Recoveries: Functional-Form Restrictions on Risk Exposure Dynamics	Liang Chen et.al.	2604.23315	null
2026-04-25	Motifs Enrichment as a Driver of an Emergent Preferential Attachment in rewired random regular graphs	Pawat Akara-pipattana et.al.	2604.23152	null
2026-04-24	Guess-Verify-Refine: Data-Aware Top-K for Sparse-Attention Decoding on Blackwell via Temporal Correlation	Long Cheng et.al.	2604.22312	null
2026-04-23	Focus Session: Hardware and Software Techniques for Accelerating Multimodal Foundation Models	Muhammad Shafique et.al.	2604.21952	null
2026-04-23	Emergence of a non-bulk hexagonal Fe$_2$S$_2$ single layer via phase transformation	Affan Safeer et.al.	2604.21613	null
2026-04-22	The two-level systems in cryogenic solids, or how to avoid stressful memories	Vassiliy Lubchenko et.al.	2604.21109	null
2026-04-22	Distributed Generative Inference of LLM at Internet Scales with Multi-Dimensional Communication Optimization	Jiu Chen et.al.	2604.21072	null
2026-04-22	HARBOR: Automated Harness Optimization	Biswa Sengupta et.al.	2604.20938	null
2026-04-22	DiP-SD: Distributed Pipelined Speculative Decoding for Efficient LLM Inference at the Edge	Yaodan Xu et.al.	2604.20919	null
2026-04-22	Decoupling Speculation from Merit: The Identity-Bound Asset Integrity Model (IBAIM) for Sustainable Web3 Gaming	Jinliang Xu et.al.	2604.20737	null
2026-04-22	FASER: Fine-Grained Phase Management for Speculative Decoding in Dynamic LLM Serving	Wenyan Chen et.al.	2604.20503	null
2026-04-22	Emergence biases in molecular evolution	Timothy Fuqua et.al.	2604.20477	null
2026-04-22	HaS: Accelerating RAG through Homology-Aware Speculative Retrieval	Peng Peng et.al.	2604.20452	null
2026-04-21	On-chain Peak Shaving	Irene Aldridge et.al.	2604.19956	null
2026-04-21	Super Apriel: One Checkpoint, Many Speeds	SLAM Labs et.al.	2604.19877	null
2026-04-21	A Possible Protocluster of Galaxies Serendipitously Discovered in the Field of an Intermediate-Redshift Post-starburst Galaxy	Mary C. Knowlton et.al.	2604.19651	null
2026-04-21	Solving Convex-Concave Problems with $\tilde{\mathcal{O}}(ε^{-4/(3p+1)})$ $p$th-Order Oracle Complexity	Lesi Chen et.al.	2604.19462	null
2026-04-20	From Tokens to Ties: Network and Discourse Analysis of Web3 Ecosystems	Valentina Kuskova et.al.	2604.18761	null
2026-04-24	Unlocking the Edge deployment and ondevice acceleration of multi-LoRA enabled one-for-all foundational LLM	Sravanth Kodavanti et.al.	2604.18655	null
2026-04-20	Copy-as-Decode: Grammar-Constrained Parallel Prefill for LLM Editing	Ziyang Liu et.al.	2604.18170	null
2026-04-20	WISV: Wireless-Informed Semantic Verification for Distributed Speculative Decoding in Device-Edge LLM Inference	Zixuan Liu et.al.	2604.17701	null
2026-04-19	Speculative Decoding for Autoregressive Video Generation	Yuezhou Hu et.al.	2604.17397	null
2026-04-19	BranchBench: Aligning Database Branching with Agentic Demands	Elaine Ang et.al.	2604.17180	null
2026-04-17	Path-Explosive Behaviour in Economic Time Series: A Realization-Centred Exploratory Framework	José Francisco Perles-Ribes et.al.	2604.16186	null
2026-04-17	Faster LLM Inference via Sequential Monte Carlo	Yahya Emara et.al.	2604.15672	null
2026-04-16	From Tokens to Steps: Verification-Aware Speculative Decoding for Efficient Multi-Step Reasoning	Kiran Purohit et.al.	2604.15244	null
2026-04-16	RACER: Retrieval-Augmented Contextual Rapid Speculative Decoding	Zihong Zhang et.al.	2604.14885	link
2026-04-16	The Missing Knowledge Layer in AI: A Framework for Stable Human-AI Reasoning	Rikard Rosenbacke et.al.	2604.14881	null
2026-04-16	Acceptance Dynamics Across Cognitive Domains in Speculative Decoding	Saif Mahmoud et.al.	2604.14682	null
2026-04-16	ELMoE-3D: Leveraging Intrinsic Elasticity of MoE for Hybrid-Bonding-Enabled Self-Speculative Decoding in On-Premises Serving	Yuseon Choi et.al.	2604.14626	null
2026-04-16	ConfLayers: Adaptive Confidence-based Layer Skipping for Self-Speculative Decoding	Walaa Amer et.al.	2604.14612	null
2026-04-16	Prime–Zero Duality: Fractal Geometry, Renormalization-Group Flow, and an Information-Ontological Framework for Number Theory	Zhengqiang Li et.al.	2604.14596	null
2026-04-15	Calibrated Speculative Decoding: Frequency-Guided Candidate Selection for Efficient Inference	Xuwen Zhou et.al.	2604.13634	null
2026-04-15	ToolSpec: Accelerating Tool Calling via Schema-Aware and Retrieval-Augmented Speculative Decoding	Heming Xia et.al.	2604.13519	null
2026-04-14	Accelerating Speculative Decoding with Block Diffusion Draft Trees	Liran Ringel et.al.	2604.12989	null
2026-04-14	JWST observations of photodissociation regions. IV. Carbonaceous emission band sub-components in NGC 7023 have distinct spatial distributions	D. Van De Putte et.al.	2604.12860	null
2026-04-14	Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning	NVIDIA et.al.	2604.12374	null
2026-04-14	SpecBound: Adaptive Bounded Self-Speculation with Layer-wise Confidence Calibration	Zhuofan Wen et.al.	2604.12247	null
2026-04-14	Policy-Invisible Violations in LLM-Based Agents	Jie Wu et.al.	2604.12177	null
2026-04-13	SOLARIS: Speculative Offloading of Latent-bAsed Representation for Inference Scaling	Zikun Liu et.al.	2604.12110	null
2026-04-13	Is There an AI Bubble? Robust Date-Stamping for Periods of Exuberance	Abir Sarkar et.al.	2604.12062	null
2026-04-13	Putting the Brauer back in Brauer-Picard	Sean Sanford et.al.	2604.10869	null
2026-04-11	SpecMoE: A Fast and Efficient Mixture-of-Experts Inference via Self-Assisted Speculative Decoding	Jehyeon Bang et.al.	2604.10152	null
2026-04-15	A-IO: Adaptive Inference Orchestration for Memory-Bound NPUs	Chen Zhang et.al.	2604.09752	null
2026-04-09	SMART: When is it Actually Worth Expanding a Speculative Tree?	Lifu Wang et.al.	2604.09731	null
2026-04-08	ConfigSpec: Profiling-Based Configuration Selection for Distributed Edge–Cloud Speculative LLM Serving	Xiangchen Li et.al.	2604.09722	null
2026-04-10	The Speculative Future of Conversational AI for Neurocognitive Disorder Screening: a Multi-Stakeholder Perspective	Jiaxiong Hu et.al.	2604.09070	null
2026-04-10	Noise-Aware In-Context Learning for Hallucination Mitigation in ALLMs	Qixuan Huang et.al.	2604.09021	null
2026-04-09	PG-MDP: Profile-Guided Memory Dependence Prediction for Area-Constrained Cores	Luke Panayi et.al.	2604.08445	null
2026-04-08	DIVERSED: Relaxed Speculative Decoding via Dynamic Ensemble Verification	Ziyi Wang et.al.	2604.07622	null
2026-04-08	MARS: Enabling Autoregressive Models Multi-Token Generation	Ziqi Jin et.al.	2604.07023	null
2026-04-10	Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM	Chengyue Wu et.al.	2604.06832	null
2026-04-08	Neutron Star Merger Rates from Multi-messenger Observations: Clues to the Physical Origin of the Short and Long-short Gamma-ray Bursts	Zhi-Ping Jin et.al.	2604.06772	null
2026-04-07	AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent	Wenyue Hua et.al.	2604.06296	null
2026-04-07	QiMeng-PRepair: Precise Code Repair via Edit-Aware Reward Optimization	Changxin Ke et.al.	2604.05963	null
2026-04-08	See the Forest for the Trees: Loosely Speculative Decoding via Visual-Semantic Guidance for Efficient Inference of Video LLMs	Yicheng Ji et.al.	2604.05650	null
2026-04-07	Multi-Drafter Speculative Decoding with Alignment Feedback	Taehyeon Kim et.al.	2604.05417	null
2026-04-06	DualDiffusion: A Speculative Decoding Strategy for Masked Diffusion Models	Satyam Goyal et.al.	2604.05250	null
2026-04-06	Understanding Clinician Experiences with Game-Based Interventions for Autistic Children to Inform a Future Game Platform Focused on Improving Motor Skills	Hunter M Beach et.al.	2604.05249	null
2026-04-06	Comparative Characterization of KV Cache Management Strategies for LLM Inference	Oteo Mamo et.al.	2604.05012	null
2026-04-05	Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling	Yongchang Hao et.al.	2604.04987	null
2026-04-06	Hardware-Level Governance of AI Compute: A Feasibility Taxonomy for Regulatory Compliance and Treaty Verification	Samar Ansari et.al.	2604.04712	null
2026-04-06	An algorithmic Polynomial Freiman-Ruzsa theorem	Davi Castro-Silva et.al.	2604.04547	null
2026-04-03	Open-Loop Planning, Closed-Loop Verification: Speculative Verification for VLA	Zihua Wang et.al.	2604.02965	null
2026-04-03	MSAO: Adaptive Modality Sparsity-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference	Zheming Yang et.al.	2604.02945	null
2026-04-02	Goose: Anisotropic Speculation Trees for Training-Free Speculative Decoding	Tao Jin et.al.	2604.02047	null
2026-04-02	Reinforcement Learning for Speculative Trading under Exploratory Framework	Yun Zhao et.al.	2604.02035	null
2026-04-03	Phonon Thermal Hall Effect in quartz and its absence in silica	Yu Ling et.al.	2604.01908	null
2026-03-31	Frege in the Flesh: Biolinguistics and the Neural Enforcement of Syntactic Structures	Elliot Murphy et.al.	2604.00291	null
2026-03-31	Spatially modulated morphotropic phase boundaries in a compressively strained multiferroic thin film	Ting-Ran Liu et.al.	2604.00288	null
2026-03-31	Blockspace Under Pressure: An Analysis of Spam MEV on High-Throughput Blockchains	Wenhao Wang et.al.	2604.00234	null
2026-03-31	Cloudy With a Chance of Meatballs	Wolf Cukier et.al.	2603.29883	null
2026-03-31	Detecting speculative leaks with compositional semantics	Xaver Fabian et.al.	2603.29800	null
2026-03-31	Milky Way evolution on a human timescale	Eugene et.al.	2603.29503	null
2026-03-31	Mexican Burrowing Toads as gravitational wave detectors	Frederic V. Hessman et.al.	2603.29334	null
2026-03-30	The Binary-Binary Hierarchical System XY Leo: A Laboratory for Stellar Activity and Concealed Companions	D. Koçak et.al.	2603.28934	null
2026-04-02	A Black Hole Star at Cosmic Noon: Extreme Balmer break, photospheric continuum, and broad absorption by thick winds in a Little Red Dot at z=1.7	Alberto Torralba et.al.	2603.28335	null
2026-03-30	Drift-AR: Single-Step Visual Autoregressive Generation via Anti-Symmetric Drifting	Zhen Zou et.al.	2603.28049	null
2026-03-28	SJD-VP: Speculative Jacobi Decoding with Verification Prediction for Autoregressive Image Generation	Bingqi Shan et.al.	2603.27115	null
2026-03-27	TAPS: Task Aware Proposal Distributions for Speculative Sampling	Mohamad Zbib et.al.	2603.27027	null
2026-03-26	S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation	Ligong Han et.al.	2603.25702	null
2026-03-26	Bulge Fossil Fragments as a new population of factories of gravitational wave sources in the Galaxy	F. R. Ferraro et.al.	2603.25127	null
2026-03-26	Decoding Market Emotions in Cryptocurrency Tweets via Predictive Statement Classification with Machine Learning and Transformers	Moein Shahiki Tash et.al.	2603.24933	null
2026-03-25	Quantum walk with a local spin interaction	Manami Yamagishi et.al.	2603.24444	null
2026-03-25	AI Fortune-Teller: Juxtaposing Shaman and AI to Reveal Human Agency in the Age of AI	Soonho Kwon et.al.	2603.23811	null
2026-03-24	Mars in the Australian Press, 1875-1899. 1. Interpretation, Authority and Planetary Science	Richard de Grijs et.al.	2603.23563	null
2026-03-24	SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning	Haoyu Huang et.al.	2603.23483	null
2026-03-24	RelayS2S: A Dual-Path Speculative Generation for Real-Time Dialogue	Long Mai et.al.	2603.23346	null
2026-03-24	Mars excitement in Australian newspapers, 1877-1899: Humour and the public negotiation of astronomical knowledge	Richard de Grijs et.al.	2603.22906	null
2026-03-23	From Brittle to Robust: Improving LLM Annotations for SE Optimization	Lohith Senthilkumar et.al.	2603.22474	null
2026-03-24	Dynamic analysis enhances issue resolution	Mingwei Liu et.al.	2603.22048	null
2026-03-22	On the origin of the strong internal magnetic fields of central compact objects	Kazım Yavuz Ekşi et.al.	2603.21103	null
2026-03-21	SWE-Next: Scalable Real-World Software Engineering Tasks for Agents	Jiarong Liang et.al.	2603.20691	null
2026-03-21	AEGIS: From Clues to Verdicts – Graph-Guided Deep Vulnerability Reasoning via Dialectics and Meta-Auditing	Sen Fang et.al.	2603.20637	null
2026-03-20	Does This Gradient Spark Joy?	Ian Osband et.al.	2603.20526	null
2026-03-23	ParallelVLM: Lossless Video-LLM Acceleration with Visual Alignment Aware Parallel Speculative Decoding	Quan Kong et.al.	2603.19610	null
2026-03-19	Beyond the Desk: Barriers and Future Opportunities for AI to Assist Scientists in Embodied Physical Tasks	Irene Hou et.al.	2603.19504	null
2026-03-19	Speculative Policy Orchestration: A Latency-Resilient Framework for Cloud-Robotic Manipulation	Chanh Nguyen et.al.	2603.19418	null
2026-03-19	The Uncertain Policy Price of Scaling Direct Air Capture	Leonardo Chiani et.al.	2603.19143	null
2026-03-19	A Pipelined Collaborative Speculative Decoding Framework for Efficient Edge-Cloud LLM Inference	Yida Zhang et.al.	2603.19133	null
2026-03-19	In the Margins: An Empirical Study of Ethereum Inscriptions	Xihan Xiong et.al.	2603.19086	null
2026-03-19	Act While Thinking: Accelerating LLM Agents via Pattern-Aware Speculative Tool Execution	Yifan Sui et.al.	2603.18897	null
2026-03-19	SJD-PAC: Accelerating Speculative Jacobi Decoding via Proactive Drafting and Adaptive Continuation	Jialiang Kang et.al.	2603.18599	null
2026-03-19	Dream the Dream: Futuring Communication between LGBTQ+ and Cisgender Groups in Metaverse	Anqi Wang et.al.	2603.18578	null
2026-03-19	SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding	Shenggui Li et.al.	2603.18567	null
2026-03-18	Efficient Training-Free Multi-Token Prediction via Embedding-Space Probing	Raghavv Goel et.al.	2603.17942	null
2026-03-18	HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness	Zihao Zheng et.al.	2603.17573	null
2026-03-18	“Not Just Me and My To-Do List”: Understanding Challenges of Task Management for Adults with ADHD and the Need for AI-Augmented Social Scaffolds	Jingruo Chen et.al.	2603.17258	null
2026-03-17	Search For a Counterpart to the Subsolar Mass Gravitational Wave Candidate S251112cm	Nicholas Vieira et.al.	2603.17009	null
2026-03-17	Characterizing Delusional Spirals through Human-LLM Chat Logs	Jared Moore et.al.	2603.16567	null
2026-03-17	SpecSteer: Synergizing Local Context and Global Reasoning for Efficient Personalized Generation	Hang Lv et.al.	2603.16219	null
2026-03-17	Efficient LLM Serving for Agentic Workflows: A Data Systems Perspective	Noppanat Wadlom et.al.	2603.16104	null
2026-03-16	Evasive Intelligence: Lessons from Malware Analysis for Evaluating AI Agents	Simone Aonzo et.al.	2603.15457	null
2026-03-16	The ALMA-QUARKS Survey: Evidence of an Explosive Molecular Outflow in IRAS 15520–5234	Ariful Hoque et.al.	2603.15040	null
2026-03-16	MMSpec: Benchmarking Speculative Decoding for Vision-Language Models	Hui Shen et.al.	2603.14989	null
2026-03-16	Hyper-learning and Unlearning: A Narrative Speculation on Urbanism in Media Ecologies	Anqi Wang et.al.	2603.14810	null
2026-03-14	Early Rug Pull Warning for BSC Meme Tokens via Multi-Granularity Wash-Trading Pattern Profiling	Dingding Cao et.al.	2603.13830	null
2026-03-14	Measuring Primitive Accumulation: An Information-Theoretic Approach to Capitalist Enclosure in PIK2, Indonesia	Sandy Hardian Susanto Herho et.al.	2603.13715	null
2026-03-13	Towards Fluent Interaction with Cyber-Physical Architecture	Jesse T. Gonzalez et.al.	2603.13633	null
2026-03-13	When Drafts Evolve: Speculative Decoding Meets Online Learning	Yu-Yang Qian et.al.	2603.12617	null
2026-03-12	Design Exploration of Lightweight Interactions for Awareness-Supporting Technologies in Hybrid Work	Lu Liu et.al.	2603.11977	null
2026-03-12	Edge-Cloud Collaborative Speech Emotion Captioning via Token-Level Speculative Decoding in Audio-Language Models	Xiangyuan Xue et.al.	2603.11397	null
2026-03-11	One-loop mass corrections and decay widths of Type II heavy string states	Massimo Bianchi et.al.	2603.11343	null
2026-03-11	Self-Speculative Decoding for LLM-based ASR with CTC Encoder Drafts	George Saon et.al.	2603.11243	null
2026-03-11	Chasing RATs: Tracing Reading for and as Creative Activity	Sophia Liu et.al.	2603.11031	null
2026-03-11	XMM-Newton Observation and Optical Monitoring of the Candidate Redback Millisecond Pulsar 1FGL J0523.5$-$2529	J. P. Halpern et.al.	2603.11028	null
2026-03-11	Kinematics of Wolf-Rayet Stars in the LMC: Clues to Subtype Origins	Caden Burkhardt et.al.	2603.10826	null
2026-03-11	Supersonic flow of a Chaplygin gas past a conical wing with $Λ$-shaped cross sections	Minghong Han et.al.	2603.10401	null
2026-03-10	Intrinsic Numerical Robustness and Fault Tolerance in a Neuromorphic Algorithm for Scientific Computing	Bradley H. Theilman et.al.	2603.10246	null
2026-03-10	Phase diagram of 4D SU(3) Yang-Mills theory at $θ=π$ via imaginary theta simulations	Akira Matsumoto et.al.	2603.09604	null
2026-03-10	Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation	Luxi Lin et.al.	2603.09527	null
2026-03-09	ConFu: Contemplate the Future for Better Speculative Sampling	Zongyue Qin et.al.	2603.08899	null
2026-03-09	StreamReady: Learning What to Answer and When in Long Streaming Videos	Shehreen Azad et.al.	2603.08620	null
2026-03-09	Scalable On-the-fly Transcoding for Adaptive Streaming of Dynamic Point Clouds	Michael Rudolph et.al.	2603.08417	null
2026-03-09	Colloidal Probe Atomic Force Microscopy Reveals Anomalous Underscreening: A Matter of Experimental Conditions	Thomas Tilger et.al.	2603.08326	null
2026-03-09	EAGLE-Pangu: Accelerator-Safe Tree Speculative Decoding on Ascend NPUs	Chang Han et.al.	2603.08088	null
2026-03-08	DualSpec: Accelerating Deep Research Agents via Dual-Process Action Speculation	Shuzhang Zhong et.al.	2603.07416	null
2026-03-07	From debt crises to financial crashes (and back): a stock-flow consistent model for stock price bubbles	Matheus R. Grasselli et.al.	2603.07213	null
2026-03-02	SJD-PV: Speculative Jacobi Decoding with Phrase Verification for Autoregressive Image Generation	Zhehao Yu et.al.	2603.06666	null
2026-03-06	What are AI researchers worried about?	Cian O’Donovan et.al.	2603.06223	null
2026-03-06	EvoESAP: Non-Uniform Expert Pruning for Sparse MoE	Zongfang Liu et.al.	2603.06003	null
2026-03-09	Balancing Latency and Accuracy of Code Completion via Local-Cloud Model Cascading	Hanzhen Lu et.al.	2603.05974	null
2026-03-05	Balancing Coverage and Draft Latency in Vocabulary Trimming for Faster Speculative Decoding	Ofir Ben Shoham et.al.	2603.05210	null
2026-03-04	Quantum foundations for quantum technologies in the International Year of Quantum (2025)	Angelo Bassi et.al.	2603.04630	null
2026-03-04	Raman scattering spectroscopic observation of a ferroelastic crossover in bond-frustrated PrCd$_3$P$_3$	Jackson Davis et.al.	2603.04539	null
2026-03-04	Weibel Instability-Driven Seed Magnetic Fields during Reionization	Jorie McDermott et.al.	2603.03608	null
2026-03-03	Accelerating OpenPangu Inference on NPU via Speculative Decoding	Yuntao Dai et.al.	2603.03383	null
2026-03-03	Speculative Speculative Decoding	Tanishq Kumar et.al.	2603.03251	null
2026-03-03	Cross-Family Speculative Prefill: Training-Free Long-Context Compression with Small Draft Models	Shubhangi Upasani et.al.	2603.02631	null
2026-03-02	Latitude-Dependent Time Variations of the Solar Tachocline	Sarbani Basu et.al.	2603.02321	null
2026-03-02	Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning	Jiebin Zhang et.al.	2603.01639	null
2026-03-02	KERV: Kinematic-Rectified Speculative Decoding for Embodied VLA Models	Zihao Zheng et.al.	2603.01581	null
2026-03-02	Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification	Guang Huang et.al.	2603.01399	null
2026-03-01	Proscenium: Exploring Design Spaces of Layered Information Experience on a Large Dual-Layer Transparent Display	Chen Chen et.al.	2603.01238	null
2026-02-27	Stellar engines and Dyson bubbles can be stable	Colin R McInnes et.al.	2603.00203	null
2026-02-27	Betting under Common Beliefs: The Effect of Probability Weighting	Patrick Beissner et.al.	2602.24194	null
2026-02-27	Task-Centric Acceleration of Small-Language Models	Dor Tsur et.al.	2602.24174	null
2026-02-27	LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding	Alexander Samarin et.al.	2602.23881	null
2026-02-27	The Auton Agentic AI Framework	Sheng Cao et.al.	2602.23720	null
2026-02-27	Active Learning for Planet Habitability Classification under Extreme Class Imbalance	R. I. El-Kholy et.al.	2602.23666	null
2026-02-25	The shape of transverse momentum spectra in hybrid hydrodynamic models	Thiago S. Domingues et.al.	2602.22490	null
2026-02-25	BMN-like Matrix Models	Eunwoo Lee et.al.	2602.22163	null
2026-02-25	Speculating for Epiplexity: How to Learn the Most from Speculative Design?	Botao Amber Hu et.al.	2602.22132	null
2026-02-25	Tidal disruptions of rubble piles: The case of Phobos	Harrison Agrusa et.al.	2602.21912	null
2026-02-24	Asymptotically (un)safe scattering amplitudes from scratch: a deep dive into the IR jungle	Benjamin Knorr et.al.	2602.21285	null
2026-02-23	KnapSpec: Self-Speculative Decoding via Adaptive Layer Selection as a Knapsack Problem	Seongjin Cha et.al.	2602.20217	null
2026-02-23	SemanticNVS: Improving Semantic Scene Understanding in Generative Novel View Synthesis	Xinya Chen et.al.	2602.20079	null
2026-02-23	Anisotropic magnons in a layered honeycomb ferromagnet	Travis J. Williams et.al.	2602.19935	null
2026-02-23	Two-parameter families of MPO integrals of motion in Heisenberg spin chains	Vsevolod I. Yashin et.al.	2602.19741	null
2026-02-23	Leap+Verify: Regime-Adaptive Speculative Weight Prediction for Accelerating Neural Network Training	Jeremy McEntire et.al.	2602.19580	null
2026-02-21	WANSpec: Leveraging Global Compute Capacity for LLM Inference	Noah Martin et.al.	2602.18931	null
2026-02-19	Insidious Imaginaries: A Critical Overview of AI Speculations	Dejan Grba et.al.	2602.17383	null
2026-02-19	Dynamic Delayed Tree Expansion For Improved Multi-Path Speculative Decoding	Rahul Thomas et.al.	2602.16994	null
2026-02-19	A testable framework for AI alignment: Simulation Theology as an engineered worldview for silicon-based agents	Josef A. Habdank et.al.	2602.16987	null
2026-02-18	Greedy Multi-Path Block Verification for Faster Decoding in Speculative Sampling	Rahul Thomas et.al.	2602.16961	null
2026-02-18	Privacy-Aware Split Inference with Speculative Decoding for Large Language Models over Wide-Area Networks	Michael Cunningham et.al.	2602.16760	null
2026-02-17	MoE-Spec: Expert Budgeting for Efficient Speculative Decoding	Bradley McDanel et.al.	2602.16052	null
2026-02-17	A Theoretical Approach to Stablecoin Design via Price Windows	Katherine Molinet et.al.	2602.15981	null
2026-02-17	Robot-Assisted Social Dining as a White Glove Service	Atharva S Kashyap et.al.	2602.15767	null
2026-02-17	Hot subdwarf stars from the Hamburg Quasar Survey	Ulrich Heber et.al.	2602.15692	null
2026-02-17	Sparrow: Text-Anchored Window Attention with Visual-Semantic Glimpsing for Speculative Decoding in Video LLMs	Libo Zhang et.al.	2602.15318	null
2026-02-16	Distributed Semi-Speculative Parallel Anisotropic Mesh Adaptation	Kevin Garner et.al.	2602.15204	null
2026-02-16	Kami of the Commons: Towards Designing Agentic AI to Steward the Commons	Botao Amber Hu et.al.	2602.14940	null
2026-02-16	Predicting the success of new crypto-tokens: the Pump.fun case	Giulio Marino et.al.	2602.14860	null
2026-02-16	Atomix: Timely, Transactional Tool Use for Reliable Agentic Workflows	Bardia Mohammadi et.al.	2602.14849	null
2026-02-14	Speculative Decoding with a Speculative Vocabulary	Miles Williams et.al.	2602.13836	null
2026-02-14	The Shadow Boss: Identifying Atomized Manipulations in Agentic Employment of XR Users using Scenario Constructions	Lik-Hang Lee et.al.	2602.13622	null
2026-02-13	ORAP: Optimized Row Access Prefetching for Rowhammer-mitigated Memory	Maccoy Merrell et.al.	2602.13434	null
2026-02-13	Training-Free Acceleration for Document Parsing Vision-Language Model with Hierarchical Speculative Decoding	Wenhui Liao et.al.	2602.12957	null
2026-02-12	Holographic Equidistribution	Nico Cooper et.al.	2602.12265	null
2026-02-12	Embodied AI Agents for Team Collaboration in Co-located Blue-Collar Work	Kaisa Vaananen et.al.	2602.12136	null
2026-02-12	Wisdom of the LLM Crowd: A Large Scale Benchmark of Multi-Label U.S. Election-Related Harmful Social Media Content	Qile Wang et.al.	2602.11962	null
2026-02-11	What do people want to fact-check?	Bijean Ghafouri et.al.	2602.10935	null
2026-02-10	Simulation of the Space-Charge-Limited Current Density for Time-Variant Pulsed Injection	H. Huang et.al.	2602.09399	null
2026-02-10	Understanding Risk and Dependency in AI Chatbot Use from User Discourse	Jianfeng Zhu et.al.	2602.09339	null
2026-02-09	PICASSO: Scaling CHERI Use-After-Free Protection to Millions of Allocations using Colored Capabilities	Merve Gülmez et.al.	2602.09131	null
2026-02-09	Benchmarking the Energy Savings with Speculative Decoding Strategies	Rohit Dutta et.al.	2602.09113	null
2026-02-09	Symplectic excision and distance rigidity	Yoel Groman et.al.	2602.08969	null
2026-02-09	Three Lessons from Citizen-Centric Participatory AI Design	Eike Schneiders et.al.	2602.08554	null
2026-02-09	On- and off-chain demand and supply drivers of Bitcoin price	Pavel Ciaian et.al.	2602.08429	null
2026-02-09	TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration	Linye Wei et.al.	2602.08404	null
2026-02-10	Compiler-Assisted Speculative Sampling for Accelerated LLM Inference on Heterogeneous Edge Devices	Alejandro Ruiz y Mesa et.al.	2602.08060	null
2026-02-08	Dark Matter as Screened Ordinary Matter	Colin D. Froggatt et.al.	2602.07902	null
2026-02-07	Motivic invariants of moduli stacks of Higgs bundles and bundles with connections: results and speculations	Roman Fedorov et.al.	2602.07713	null
2026-02-07	Series-Parallel-Loop Decompositions of Control-flow Graphs	Xuran Cai et.al.	2602.07627	null
2026-02-07	Astrophysical positronium and Dicke superradiance	Abdaljalel E. Alizzi et.al.	2602.07489	null
2026-02-07	Imagining the Alien: Human Projections and Cognitive Limitations	S. G. Djorgovski et.al.	2602.07284	null
2026-02-06	XShare: Collaborative in-Batch Expert Sharing for Faster MoE Inference	Daniil Vankov et.al.	2602.07265	null
2026-02-06	SpecAttn: Co-Designing Sparse Attention with Self-Speculative Decoding	Yikang Yue et.al.	2602.07223	null
2026-02-06	When RL Meets Adaptive Speculative Training: A Unified Training-Serving System	Junxiong Wang et.al.	2602.06932	null
2026-02-06	Continued fraction method for high overtone quasinormal modes in effective potentials with discontinuity	Guan-Ru Li et.al.	2602.06536	null
2026-02-06	RelayGen: Intra-Generation Model Switching for Efficient Reasoning	Jiwon Song et.al.	2602.06454	null
2026-02-06	Quenching Speculation in Quantum Markets via Entangled Neural Traders	Kieran Hymas et.al.	2602.06367	null
2026-02-05	DFlash: Block Diffusion for Flash Speculative Decoding	Jian Chen et.al.	2602.06036	null
2026-02-05	V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval	Dongyang Chen et.al.	2602.06034	null
2026-02-05	Multi-Token Prediction via Self-Distillation	John Kirchenbauer et.al.	2602.06019	null
2026-02-05	Measurement-Induced Dynamics of Particles and Quasiparticles in a Bose-Einstein-condensate array	Huy Nguyen et.al.	2602.05924	null
2026-02-05	Prompting Destiny: Negotiating Socialization and Growth in an LLM-Mediated Speculative Gameworld	Mandi Yang et.al.	2602.05864	null
2026-02-05	The near-continuum mechanism for extended Boltzmann theory: the non-equilibrium relaxation	Sha Liu et.al.	2602.05775	null
2026-02-05	Variational Speculative Decoding: Rethinking Draft Training from Token Likelihood to Sequence Acceptance	Xiandong Zou et.al.	2602.05774	null
2026-02-05	SDFP: Speculative Decoding with FIT-Pruned Models for Training-Free and Plug-and-Play LLM Acceleration	Hanyu Wei et.al.	2602.05499	null
2026-02-05	TIDE: Temporal Incremental Draft Engine for Self-Improving LLM Inference	Jiyoung Park et.al.	2602.05145	null
2026-02-04	SPPAM: Signature Pattern Prediction and Access-Map Prefetcher	Maccoy Merrell et.al.	2602.04100	null
2026-02-03	pop-cosmos: Redshifts and physical properties of KiDS-1000 galaxies	Anik Halder et.al.	2602.03930	null
2026-02-03	SpecMD: A Comprehensive Study On Speculative Expert Prefetching	Duc Hoang et.al.	2602.03921	null
2026-02-04	Beyond Tokens: Semantic-Aware Speculative Decoding for Efficient Inference by Probing Internal States	Ximing Dong et.al.	2602.03708	null
2026-02-03	Efficient Algorithms for Partial Constraint Satisfaction Problems over Control-flow Graphs	Xuran Cai et.al.	2602.03588	null
2026-02-02	The emergent Big Bang scenario	Justin C. Feng et.al.	2602.02646	null
2026-02-02	An Empirical Study on Noisy Data and LLM Pretraining Loss Divergence	Qizhen Zhang et.al.	2602.02400	null
2026-02-02	PRISM: Parametrically Refactoring Inference for Speculative Sampling Draft Models	Xuliang Wang et.al.	2602.01762	null
2026-02-02	A Practical Tensor-Network Compression Pipeline for Production-Scale Large Language Models	Sergii Kozyrev et.al.	2602.01613	null
2026-02-02	Are Security Cues Static? Rethinking Warning and Trust Indicators for Life Transitions	Sarah Tabassum et.al.	2602.01544	null
2026-02-01	P-EAGLE: Parallel-Drafting EAGLE with Scalable Training	Mude Hui et.al.	2602.01469	null
2026-02-01	Improve the Trade-off Between Watermark Strength and Speculative Sampling Efficiency for Language Models	Weiqing He et.al.	2602.01428	null
2026-02-01	FlowCast: Trajectory Forecasting for Scalable Zero-Cost Speculative Flow Matching	Divya Jyoti Bajpai et.al.	2602.01329	null
2026-02-01	PACER: Blockwise Pre-verification for Speculative Decoding with Adaptive Length	Situo Zhang et.al.	2602.01274	null
2026-01-31	Eternagram: Inspiring Climate Action Through LLM-based Conversational Exploration of a Post-Devastation Climate Future	Suifang Zhou et.al.	2602.00571	null
2026-01-31	SAGE: Accelerating Vision-Language Models via Entropy-Guided Adaptive Speculative Decoding	Yujia Tong et.al.	2602.00523	null
2026-01-30	TriSpec: Ternary Speculative Decoding via Lightweight Proxy Verification	Haoyun Jiang et.al.	2601.23180	null
2026-01-30	SpecIBT: Formally Verified Protection Against Speculative Control-Flow Hijacking	Jonathan Baumann et.al.	2601.22978	null
2026-01-30	Beyond Medical Chatbots: Meddollina and the Rise of Continuous Clinical Intelligence	Vaibhav Ram S. V. N. S et.al.	2601.22645	null
2026-01-29	Plant-Inspired Robot Design Metaphors for Ambient HRI	Victor Nikhil Antony et.al.	2601.22387	null
2026-01-29	Subsolar mass black holes from stellar collapse induced by primordial black holes	Thomas W. Baumgarte et.al.	2601.22220	null
2026-01-29	StarSD: One-for-Many Speculative Decoding	Junhao He et.al.	2601.21622	null
2026-01-29	SPOILER-GUARD: Gating Latency Effects of Memory Accesses through Randomized Dependency Prediction	Gayathri Subramanian et.al.	2601.21211	null
2026-01-29	Scaling Embeddings Outperforms Scaling Experts in Language Models	Hong Liu et.al.	2601.21204	null
2026-01-28	Unplugging a Seemingly Sentient Machine Is the Rational Choice – A Metaphysical Perspective	Erik J Bekkers et.al.	2601.21016	null
2026-01-28	Manipulation in Prediction Markets: An Agent-based Modeling Experiment	Bridget Smart et.al.	2601.20452	null
2026-01-28	TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs	Minjae Lee et.al.	2601.20357	null
2026-01-26	LTS-VoiceAgent: A Listen-Think-Speak Framework for Efficient Streaming Voice Interaction via Semantic Triggering and Incremental Reasoning	Wenhao Zou et.al.	2601.19952	null
2026-01-27	The Competence Crisis: A Design Fiction on AI-Assisted Research in Software Engineering	Mairieli Wessel et.al.	2601.19628	null
2026-01-27	DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference	Fuliang Liu et.al.	2601.19278	link
2026-01-26	Flatter Tokens are More Valuable for Speculative Draft Model Training	Jiaming Fan et.al.	2601.18902	null
2026-01-26	Towards a Proof of the Improved Quantum Null Energy Condition	Ido Ben-Dayan et.al.	2601.18860	null
2026-01-26	Disk-jet-wind coupling from stellar mass to supermassive black holes	Chris Done et.al.	2601.18607	null
2026-01-30	LLM-42: Enabling Determinism in LLM Inference with Verified Speculation	Raja Gond et.al.	2601.17768	null
2026-01-24	Improving User Privacy in Personalized Generation: Client-Side Retrieval-Augmented Modification of Server-Side Generated Speculations	Alireza Salemi et.al.	2601.17569	null
2026-01-24	Towards a Declarative Agentic Layer for Intelligent Agents in MCP-Based Server Ecosystems	Maria Jesus Rodriguez-Sanchez et.al.	2601.17435	null
2026-01-24	Auditing Disability Representation in Vision-Language Models	Srikant Panda et.al.	2601.17348	null
2026-01-27	From Clicks to Consensus: Collective Consent Assemblies for Data Governance	Lin Kyi et.al.	2601.16752	null
2026-01-23	Integrated Photonic Quantum Computing: From Silicon to Lithium Niobate	Hui Zhang et.al.	2601.16484	null
2026-01-21	MARS: Unleashing the Power of Speculative Decoding via Margin-Aware Verification	Jingwei Song et.al.	2601.15498	null
2026-01-23	Emergent, not Immanent: A Baradian Reading of Explainable AI	Fabio Morreale et.al.	2601.15029	null
2026-01-13	On the Limits of Learned Importance Scoring for KV Cache Compression	Brady Steele et.al.	2601.14279	null
2026-01-21	The Non-Predictability of Mispredicted Branches using Timing Information	Ioannis Constantinou et.al.	2601.13804	null
2026-01-19	Quasinormal modes and their excitation beyond general relativity. II: isospectrality loss in gravitational waveforms	Hector O. Silva et.al.	2601.13411	null
2026-01-19	The Words That Can’t Be Shared: Exploring the Design of Unsent Messages	Michael Yin et.al.	2601.13343	null
2026-01-19	Time variations of the mean magnetic flux in active regions of different magneto-morphological classes	Anastasiya Zhukova et.al.	2601.13168	null
2026-01-18	SplittingSecrets: A Compiler-Based Defense for Preventing Data Memory-Dependent Prefetcher Side-Channels	Reshabh K Sharma et.al.	2601.12270	null
2026-01-18	Speculative Sampling with Reinforcement Learning	Chenan Wang et.al.	2601.12212	null
2026-01-17	A Dynamo Confinement Scenario for the Solar Tachocline and its Implications for Spin-down in the Radiative Spreading Regime	Loren I. Matilsky et.al.	2601.11943	null
2026-01-16	On Abnormal Execution Timing of Conditional Jump Instructions	Annika Wilde et.al.	2601.11696	null
2026-01-15	WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching	Xiangchen Li et.al.	2601.11652	null
2026-01-16	Spectral evolution of hot hybrid white dwarfs: II. Photometry	Semih Filiz et.al.	2601.11191	null
2026-01-16	Coexisting electronic smectic liquid crystal and superconductivity in a Si square-net semimetal	Christopher J. Butler et.al.	2601.10939	null
2026-01-14	Annealed Relaxation of Speculative Decoding for Faster Autoregressive Image Generation	Xingyao Li et.al.	2601.09212	null
2026-01-14	SRT: Accelerating Reinforcement Learning via Speculative Rollout with Tree-Structured Cache	Chi-Chih Chang et.al.	2601.09083	null
2026-01-13	HIPPO: Accelerating Video Large Language Models Inference via Holistic-aware Parallel Speculative Decoding	Qitan Lv et.al.	2601.08273	null
2026-01-12	Spacetime Quasicrystals	Latham Boyle et.al.	2601.07769	null
2026-01-12	Crypto Pricing with Hidden Factors	Matthew Brigida et.al.	2601.07664	null
2026-01-12	TALON: Confidence-Aware Speculative Decoding with Adaptive Token Trees	Tianyu Liu et.al.	2601.07353	null
2026-01-11	The AI Cognitive Trojan Horse: How Large Language Models May Bypass Human Epistemic Vigilance	Andrew D. Maynard et.al.	2601.07085	null
2026-01-14	A binary merger product as the direct progenitor of a Type II-P supernova	Zexi Niu et.al.	2601.06577	null
2026-01-14	VIGIL: Defending LLM Agents Against Tool Stream Injection via Verify-Before-Commit	Junda Lin et.al.	2601.05755	null
2026-01-09	Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding	Yuxuan Zhou et.al.	2601.05724	null
2026-01-09	Double: Breaking the Acceleration Limit via Double Retrieval Speculative Parallelism	Yuhao Shen et.al.	2601.05524	null
2026-01-08	Multi-Scale Local Speculative Decoding for Image Generation	Elia Peruzzo et.al.	2601.05149	null
2026-01-08	Revisiting Judge Decoding from First Principles via Training-Free Distributional Divergence	Shengyin Sun et.al.	2601.04766	null
2026-01-08	The UnScripted Trip: Fostering Policy Discussion on Future Human-Vehicle Collaboration in Autonomous Driving Through Design-Oriented Methods	Xinyan Yu et.al.	2601.04601	null
2026-01-06	Revisiting Speculative Leaderless Protocols for Low-Latency BFT Replication	Daniel Qian et.al.	2601.03390	null
2026-01-06	On the Hilbert-Chow crepant resolution conjecture	Denis Nesterov et.al.	2601.03036	null
2026-01-08	MiMo-V2-Flash Technical Report	Xiaomi LLM-Core Team et.al.	2601.02780	null
2026-01-06	Experience and Adaptation in AI-mediated Hiring Systems: A Combined Analysis of Online Discourse and Interface Design	Md Nazmus Sakib et.al.	2601.02775	null
2026-01-06	From Slaves to Synths? Superintelligence and the Evolution of Legal Personality	Simon Chesterman et.al.	2601.02773	null
2026-01-06	Hypothesize-Then-Verify: Speculative Root Cause Analysis for Microservices with Pathwise Parallelism	Lingzhe Zhang et.al.	2601.02736	null
2026-01-05	A modern perspective on Tutte’s homotopy theorem	Matthew Baker et.al.	2601.02582	null
2026-01-06	The Betelgeuse Enigma: The Betelbuddy Hypothesis	Priya Hasan et.al.	2601.02012	null
2026-01-07	FastV-RAG: Towards Fast and Fine-Grained Video QA with Retrieval-Augmented Generation	Gen Li et.al.	2601.01513	null
2026-01-02	FlexSpec: Frozen Drafts Meet Evolving Targets in Edge-Cloud Collaborative LLM Speculative Decoding	Yuchen Li et.al.	2601.00644	null
2026-01-01	MR-DAW: Towards Collaborative Digital Audio Workstations in Mixed Reality	Torin Hopkins et.al.	2601.00326	null
2025-12-31	The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition	Xiaoze Liu et.al.	2601.00065	null
2025-12-29	From Clay to Code: Typological and Material Reasoning in AI Interpretations of Iranian Pigeon Towers	Abolhassan Pishahang et.al.	2601.00029	null
2025-12-31	Intriguing Magnetocaloric Effect in 6H-perovskite Ba3RRu2O9 (R=Ho, Gd, Tb, Nd) with Strong 4d-4f Correlations	Mohit Kumar et.al.	2512.24758	null
2025-12-29	Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding	Yue Guan et.al.	2512.23858	null
2025-12-29	Entropy-Aware Speculative Decoding Toward Improved LLM Reasoning	Tiancheng Su et.al.	2512.23765	null
2025-12-27	Landauer cost in a continuous vacuum/no-vacuum measurement	Lorenzo Pirovano et.al.	2512.23751	null
2025-12-29	Soft Robotic Technological Probe for Speculative Fashion Futures	Amy Ingold et.al.	2512.23570	null
2025-12-29	Fuzzilicon: A Post-Silicon Microcode-Guided x86 CPU Fuzzer	Johannes Lenzen et.al.	2512.23438	null
2025-12-28	An Architecture-Led Hybrid Report on Body Language Detection Project	Thomson Tong et.al.	2512.23028	null
2026-01-05	AFA-LoRA: Enabling Non-Linear Adaptations in LoRA with Activation Function Annealing	Jiacheng Li et.al.	2512.22455	null
2025-12-27	Nightjar: Dynamic Adaptive Speculative Decoding for Large Language Models Serving	Rui Li et.al.	2512.22420	null
2025-12-26	Eliminate Branches by Melding IR Instructions	Yuze Li et.al.	2512.22390	null
2025-12-26	Accelerate Speculative Decoding with Sparse Computation in Verification	Jikai Wang et.al.	2512.21911	null
2025-12-26	Fast Inference of Visual Autoregressive Model with Adjacency-Adaptive Dynamical Draft Trees	Haodong Lei et.al.	2512.21857	null
2025-12-24	dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning	Shirui Chen et.al.	2512.21446	null
2025-12-24	Parallel Token Prediction for Language Models	Felix Draxler et.al.	2512.21323	null
2025-12-24	Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning	Shengguang Wu et.al.	2512.20934	null
2025-12-23	Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs	Rui Pan et.al.	2512.20573	null
2025-12-23	DecoKAN: Interpretable Decomposition for Forecasting Cryptocurrency Market Dynamics	Yuan Gao et.al.	2512.20028	null
2025-12-22	Multimodal LLMs for Historical Dataset Construction from Archival Image Scans: German Patents (1877-1918)	Niclas Griesshaber et.al.	2512.19675	null
2025-12-20	Towards Efficient Agents: A Co-Design of Inference Architecture and System	Weizhe Lin et.al.	2512.18337	null
2025-12-19	Digital Bricolage: Design Speculations for Embodied Approaches to Digitized Print-based Cultural Collections	Malak Sadek et.al.	2512.17590	null
2025-12-19	Accelerating Multi-modal LLM Gaming Performance via Input Prediction and Mishit Correction	Ziyang Lin et.al.	2512.17250	null
2025-12-18	Machines, AI and the past//future of things	Karola Köpferl et.al.	2512.16285	null
2025-12-18	Fast Collaborative Inference via Distributed Speculative Decoding	Ce Zheng et.al.	2512.16273	null
2025-12-17	Optimizing Agentic Language Model Inference via Speculative Tool Calls	Daniel Nichols et.al.	2512.15834	null
2025-12-14	Variable Record Table: A Unified Hardware-Assisted Framework for Runtime Security	Suraj Kumar Sah et.al.	2512.15777	null
2025-12-13	TS-DP: Reinforcement Speculative Decoding For Temporal Adaptive Diffusion Policy Acceleration	Ye Li et.al.	2512.15773	null
2025-12-17	Probing the dynamics of stringy flux tubes with large $R$-charge	Davide Bonomi et.al.	2512.15698	null
2025-12-17	The longest known tails of ram-pressure stripped star-forming galaxies are caused by an ICM shock in Abell 1367	H. W. Edler et.al.	2512.15660	null
2025-12-17	DEER: Draft with Diffusion, Verify with Autoregressive Models	Zicong Cheng et.al.	2512.15176	null
2025-12-16	Steering Alternative Realities through Local Quantum Memory Operations	Xiongfeng Ma et.al.	2512.14377	null
2025-12-16	PADE: A Predictor-Free Sparse Attention Accelerator via Unified Execution and Stage Fusion	Huizheng Wang et.al.	2512.14322	null
2025-12-16	The Impact Market to Save Conference Peer Review: Decoupling Dissemination and Credentialing	Karthikeyan Sankaralingam et.al.	2512.14104	null
2025-12-16	RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees	Junjie Ma et.al.	2512.14069	null
2025-12-17	Efficient Adaptive Rejection Sampling for Accelerating Speculative Decoding in Large Language Models	Chendong Sun et.al.	2512.13194	null
2025-12-14	Spectral Theory of Almost Periodic Banach–Malcev Algebras and Applications to Moufang Dynamics	Marwa Ennaceur et.al.	2512.12687	null
2025-12-16	Mage: Cracking Elliptic Curve Cryptography with Cross-Axis Transformers	Lily Erickson et.al.	2512.12483	null
2025-12-13	Moduli stacks of quiver connections and non-Abelian Hodge theory	Mahmud Azam et.al.	2512.12188	null
2025-12-13	Binarity at LOw Metallicity (BLOeM): Projected rotational velocities	D. J. Lennon et.al.	2512.12102	null
2025-12-12	Universal Dynamics of Financial Bubbles in Isolated Markets: Evidence from the Iranian Stock Market	Ali Hosseinzadeh et.al.	2512.12054	null
2025-12-11	CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving	Dong Liu et.al.	2512.11920	null
2025-12-12	Speculative Decoding Speed-of-Light: Optimal Lower Bounds via Branching Random Walks	Sergey Pankratov et.al.	2512.11718	null
2025-12-12	AdaSD: Adaptive Speculative Decoding for Efficient Language Model Inference	Kuan-Wei Lu et.al.	2512.11280	null
2025-12-12	FutureWeaver: Planning Test-Time Compute for Multi-Agent Systems with Modularized Collaboration	Dongwon Jung et.al.	2512.11213	null
2025-12-11	Site Preference and Possible Coexistence of Antiferromagnetic Order and Magnetic Frustration in (Co1-xMgx)10Ge3O16 (0 <= x <= 30%)	Gina Angelo et.al.	2512.11132	null
2025-12-11	Mixing by offshore wind infrastructure: Resolving the density stratified wakes past vertical cylinders	Charlie J. Lloyd et.al.	2512.10751	null
2025-12-11	T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground	Dmitrii Stoianov et.al.	2512.10430	null
2025-12-11	Motifs in self-organising cells	Ying Chen Lim et.al.	2512.10307	null
2025-12-10	Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning	Logan Robbins et.al.	2512.10054	null
2025-12-14	GoodSpeed: Optimizing Fair Goodput with Adaptive Speculative Decoding in Distributed Edge Inference	Phuong Tran et.al.	2512.09963	null
2025-12-10	A Speculative GLRT-Backed Approach for Adversarial Resilience on Deep Learning-Based Array Processing	Nian-Cin Wang et.al.	2512.09893	null
2025-12-10	Baseline: Operation-Based Evolution and Versioning of Data	Jonathan Edwards et.al.	2512.09762	null
2025-12-10	WASP-12, shrouded in mystery or just cold gas?	Simon Daley-Yates et.al.	2512.09593	null
2025-12-09	Fast-ARDiff: An Entropy-informed Acceleration Framework for Continuous Space Autoregressive Generation	Zhen Zou et.al.	2512.08537	null
2025-12-08	Fair Benchmarking of Optimisation Applications	Frank Phillipson et.al.	2512.07915	null
2025-11-30	The Endogenous Constraint: Hysteresis, Stagflation, and the Structural Inhibition of Monetary Velocity in the Bitcoin Network (2016-2025)	Hamoon Soleimani et.al.	2512.07886	null
2025-12-08	Chemical complexity in star formation induced by stellar feedback: cores shock-formed by the supernova remnant W44	G. Cosentino et.al.	2512.07562	null
2025-12-08	SJD++: Improved Speculative Jacobi Decoding for Training-free Acceleration of Discrete Auto-regressive Text-to-Image Generation	Yao Teng et.al.	2512.07503	null
2025-12-06	BitStopper: An Efficient Transformer Attention Accelerator via Stage-fusion and Early Termination	Huizheng Wang et.al.	2512.06457	null
2025-12-05	Protocol Futuring: Speculating Second-Order Dynamics of Protocols in Sociotechnical Infrastructural Futures	Botao ‘Amber’ Hu et.al.	2512.06108	null
2025-12-05	Fast SceneScript: Accurate and Efficient Structured Language Model via Multi-Token Prediction	Ruihong Yin et.al.	2512.05597	null
2025-12-09	Arbitrage: Efficient Reasoning via Advantage-Aware Speculation	Monishwaran Maheswaran et.al.	2512.05033	null
2025-12-04	Long-term X-ray variability of the multiple-planet host L 98-59: Hints of an activity cycle	I. Pillitteri et.al.	2512.04817	null
2025-12-04	RLHFSpec: Breaking the Efficiency Bottleneck in RLHF Training via Adaptive Drafting	Siqi Wang et.al.	2512.04752	null
2025-12-03	Counting AdS Vacua	Zihni Kaan Baykara et.al.	2512.04151	null
2025-12-01	Humanity in the Age of AI: Reassessing 2025’s Existential-Risk Narratives	Mohamed El Louadi et.al.	2512.04119	null
2025-12-02	From Administrative Chaos to Analytical Cohorts: A Three-Stage Normalisation Pipeline for Longitudinal University Administrative Records	H. R. Paz et.al.	2512.02936	null
2025-12-02	A Human-centric Framework for Debating the Ethics of AI Consciousness Under Uncertainty	Zhou Ziheng et.al.	2512.02544	null
2025-12-02	SpecPV: Improving Self-Speculative Decoding for Long-Context Generation via Partial Verification	Zhendong Tan et.al.	2512.02337	null
2025-12-05	Much Ado About Noising: Dispelling the Myths of Generative Robotic Control	Chaoyi Pan et.al.	2512.01809	null
2025-12-01	Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding	Yilong Zhao et.al.	2512.01278	null
2025-11-30	Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding	Pengfei Hu et.al.	2512.00805	null
2025-11-30	SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs	Jiaming Xu et.al.	2512.00722	null
2025-11-30	SIMPLE: Disaggregating Sampling from GPU Inference into a Decision Plane for Faster Distributed LLM Serving	Bohan Zhao et.al.	2512.00719	null
2025-11-29	Speculating on the Role of Media Architecture in Post-disaster Rebuilding and Recovery: Insights from Architects and Interaction Designers	Berk Goksenin Tan et.al.	2512.00537	null
2025-11-29	Measuring Memecoin Fragility	Yuexin Xiang et.al.	2512.00377	null
2025-12-04	Retail Investor Horizon and Earnings Announcements	Domonkos F. Vamossy et.al.	2512.00280	null
2025-12-05	Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match	Jinze Li et.al.	2511.22972	null
2025-12-03	AI Deception: Risks, Dynamics, and Controls	Boyuan Chen et.al.	2511.22619	null
2025-11-27	LLM-Cave: A benchmark and light environment for large language models reasoning and decision-making system	Huanyu Li et.al.	2511.22598	null
2025-11-26	Dark Speculation: Combining Qualitative and Quantitative Understanding in Frontier AI Risk Analysis	Daniel Carpenter et.al.	2511.21838	null
2025-11-26	Nuclear Detonations as Probes of Hidden Superluminal Sectors	Karl Svozil et.al.	2511.21793	null
2025-11-25	The dynamic of a tax on land value: concepts, models and impact scenario	Hugo Spring-Ragain et.al.	2511.21766	null
2025-11-24	Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models	Linye Wei et.al.	2511.21759	null
2025-12-01	DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving	Fengze Yu et.al.	2511.21669	null
2025-11-26	Weak gravity at micron scales from dark bubble cosmology and its cosmological consequences	Ulf Danielsson et.al.	2511.21362	null
2025-11-25	FREE: Uncertainty-Aware Autoregression for Parallel Diffusion Transformers	Xinwan Wen et.al.	2511.20390	null
2025-11-25	Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios	Luohe Shi et.al.	2511.20340	null
2025-11-25	Adaptive LLM Agents: Toward Personalized Empathetic Care	Priyanka Singh et.al.	2511.20080	null
2025-11-25	Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design	Zixiao Huang et.al.	2511.20048	null
2025-11-24	Agint: Agentic Graph Compilation for Software Engineering Agents	Abhi Chivukula et.al.	2511.19635	null
2025-11-24	AI Consciousness and Existential Risk	Rufin VanRullen et.al.	2511.19115	null
2025-11-24	NEZHA: A Zero-sacrifice and Hyperspeed Decoding Architecture for Generative Recommendations	Yejing Wang et.al.	2511.18793	null
2025-11-22	Accelerating Time Series Foundation Models with Speculative Decoding	Pranav Subbaraman et.al.	2511.18191	null
2025-11-22	Revisiting $γ$-Ray Orbital Modulation in the Redback Millisecond Pulsar PSR J2039-5617	Mengqing Zhang et.al.	2511.17900	null
2025-11-21	Broadband X-ray observations of the periodic optical source ZTF J185139.81+171430.3 and its identification as a massive intermediate polar	Ren Deng et.al.	2511.17800	null
2025-11-21	Pre-cache: A Microarchitectural Solution to prevent Meltdown and Spectre	Subhash Sethumurugan et.al.	2511.17726	null
2025-11-21	Which active galaxies might be neutrino emitters?	Shuying Zhou et.al.	2511.16869	null
2025-11-20	Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter	Qinghao Hu et.al.	2511.16665	null
2025-11-20	An observationally based wind model contemporaneous with the radio detections in $τ$ Boötis	Dag Evensberget et.al.	2511.16370	null
2025-11-21	Fast LLM Post-training via Decoupled and Best-of-N Speculation	Rongxin Cheng et.al.	2511.16193	null
2025-11-20	Can Online GenAI Discussion Serve as Bellwether for Labor Market Shifts?	Shurui Cao et.al.	2511.16028	null
2025-11-19	Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Minimization	Rahul Krishna Thomas et.al.	2511.15898	null
2025-11-19	Fossil group origins XIV: The radial orbits of A267	S. Zarattini et.al.	2511.15786	null
2025-11-19	FlashMesh: Faster and Better Autoregressive Mesh Synthesis via Structured Speculation	Tingrui Shen et.al.	2511.15618	null
2025-11-24	Structural phase transitions in the van der Waals ferromagnets Fe$_x$Pd$_{y}$Te$_2$	Rafaela F. S. Penacchio et.al.	2511.15584	null
2025-11-19	Cost-Aware Prediction (CAP): An LLM-Enhanced Machine Learning Pipeline and Decision Support System for Heart Failure Mortality Prediction	Yinan Yu et.al.	2511.15357	null
2025-11-19	Gaussian Blending: Rethinking Alpha Blending in 3D Gaussian Splatting	Junseo Koo et.al.	2511.15102	null
2025-11-18	Harmful Traits of AI Companions	W. Bradley Knox et.al.	2511.14972	null
2025-11-18	Photometric Constraints on Intermediate-mass Black Holes in the Galactic Centre	Tamojeet Roychowdhury et.al.	2511.14856	null
2025-11-23	Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning	Ruoyu Qin et.al.	2511.14617	null
2025-11-18	Positive AGN feedback in the outskirts of nearby barred spiral galaxies?	Bannanje Ananthamoorthy et.al.	2511.14257	null
2025-11-18	Enhanced UV emission knot in the giant radio galaxy NGC 315: Hint of patchy star formation?	Bannanje Ananthamoorthy et.al.	2511.14252	null
2025-11-18	MoE-SpeQ: Speculative Quantized Decoding with Proactive Expert Prefetching and Offloading for Mixture-of-Experts	Wenfeng Wang et.al.	2511.14102	null
2025-11-17	Beat the long tail: Distribution-Aware Speculative Decoding for RL Training	Zelei Shao et.al.	2511.13841	null
2025-11-17	VVS: Accelerating Speculative Decoding for Visual Autoregressive Generation via Partial Verification Skipping	Haotian Dong et.al.	2511.13587	null
2025-11-17	Tfin Crypto: From Speculation to Optimization in Risk Managed Crypto Portfolio Allocation	Thanh Nguyen et.al.	2511.13239	null
2025-11-15	Striking the Right Balance between Compute and Copy: Improving LLM Inferencing Under Speculative Decoding	Arun Ramachandran et.al.	2511.12031	null
2025-11-15	Educators on the Frontline: Philosophical and Realistic Perspectives on Integrating ChatGPT into the Learning Space	Surajit Das et.al.	2511.11960	null
2025-11-13	Speculative Decoding in Decentralized LLM Inference: Turning Communication Latency into Computation Throughput	Jingwei Song et.al.	2511.11733	null
2025-11-09	Exploring Parallelism in FPGA-Based Accelerators for Machine Learning Applications	Sed Centeno et.al.	2511.11640	null
2025-11-14	Fast and Expressive Multi-Token Prediction with Probabilistic Circuits	Andreas Grivas et.al.	2511.11346	null
2025-11-14	Optimising Density Computations in Probabilistic Programs via Automatic Loop Vectorisation	Sangho Lim et.al.	2511.11070	null
2025-11-13	Widening of Binaries via Non-conservative Mass Transfer as a Formation Channel for Gaia Black Hole System	Aleksandra Olejak et.al.	2511.10728	null
2025-11-12	Evaluating from Benign to Dynamic Adversarial: A Squid Game for Large Language Models	Zijian Chen et.al.	2511.10691	null
2025-11-08	A Mathematical Framework for AI Singularity: Conditions, Bounds, and Control of Recursive Improvement	Akbar Anbar Jafari et.al.	2511.10668	null
2025-11-13	Steering Pretrained Drafters during Speculative Decoding	Frédéric Berdoz et.al.	2511.09844	null
2025-11-12	Emergent Dark Matter	Christian Canete et.al.	2511.09034	null
2025-11-12	TiDAR: Think in Diffusion, Talk in Autoregression	Jingyu Liu et.al.	2511.08923	null
2025-11-14	Kinematic scaling relations of disc galaxies from ionised gas at $z\sim~1$ and their connection with dark matter halos	Pavel E. Mancera Piña et.al.	2511.08685	null
2025-11-11	Parallel Sampling via Autospeculation	Nima Anari et.al.	2511.07869	null
2025-11-11	Critical Confabulation: Can LLMs Hallucinate for Social Good?	Peiqi Sui et.al.	2511.07722	null
2025-11-10	Look into your Heart – Prototypes for a Speculative Design Exploration of Personal Heart Rate Visualization	Swaroop Panda et.al.	2511.07600	null
2025-11-08	In-depth Analysis on Caching and Pre-fetching in Mixture of Experts Offloading	Shuning Lin et.al.	2511.05814	null
2025-11-06	The TeV emission of 3C273: inverse Compton radiation from shear-accelerated high-energy electrons in the large-scale jet?	F. Tavecchio et.al.	2511.04433	null
2025-11-03	TapOut: A Bandit-Based Approach to Dynamic Speculative Decoding	Aditya Sridhar et.al.	2511.02017	null
2025-11-04	Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding	Jungyeon Koh et.al.	2511.01695	null
2025-11-03	When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding	Min Fang et.al.	2511.01282	null
2025-11-04	SpecDiff-2: Scaling Diffusion Drafter Alignment For Faster Speculative Decoding	Jameson Sandler et.al.	2511.00606	null
2025-11-01	Reject Only Critical Tokens: Pivot-Aware Speculative Decoding	Amir Ziashahabi et.al.	2511.00351	null
2025-11-01	Sherlock: Reliable and Efficient Agentic Workflow Execution	Yeonju Ro et.al.	2511.00330	null
2025-10-31	SpecAttn: Speculating Sparse Attention	Harsh Shah et.al.	2510.27641	null
2025-10-30	Kad: A Framework for Proxy-based Test-time Alignment with Knapsack Approximation Deferral	Ayoub Hammal et.al.	2510.27017	null
2025-10-30	CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs	Zhiyuan Ning et.al.	2510.26843	null
2025-10-30	Inference-Cost-Aware Dynamic Tree Construction for Efficient Inference in Large Language Models	Yinrong Hong et.al.	2510.26577	null
2025-10-30	Polybasic Speculative Decoding Through a Theoretical Perspective	Ruilin Wang et.al.	2510.26527	null
2025-10-30	In space there will be no need to scream – Limits to the presence of giant planets in the $ζ^2$ Ret system	A. Suárez Mascareño et.al.	2510.26483	null
2025-10-30	ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems	Qiaoling Chen et.al.	2510.26475	null
2025-10-29	Foundations of Fiat-Denominated Loans Collateralized by Cryptocurrencies	Pavel Hubáček et.al.	2510.25878	null
2025-10-29	Hawk: Leveraging Spatial Context for Faster Autoregressive Text-to-Image Generation	Zhi-Kai Chen et.al.	2510.25739	null
2025-10-29	Accurate Leakage Speculation for Quantum Error Correction	Chaithanya Naik Mude et.al.	2510.25661	null
2025-10-29	Detuning Choice for solving MIS and MWIS	Sem Saada Khelkhal et.al.	2510.25473	null
2025-10-31	MMEdge: Accelerating On-device Multimodal Inference via Pipelined Sensing and Encoding	Runxi Huang et.al.	2510.25327	null
2025-10-31	‘Studies for’: A Human-AI Co-Creative Sound Artwork Using a Real-time Multi-channel Sound Generation Model	Chihiro Nagashima et.al.	2510.25228	null
2025-10-29	Prospects for a fourth generation of leptons in a 13 TeV p-p collider	Ramkrishna Joshi et.al.	2510.25190	null
2025-10-28	On the Field Excursion Bound	Tom Rudelius et.al.	2510.24715	null
2025-10-28	MC-SJD: Maximal Coupling Speculative Jacobi Decoding for Autoregressive Visual Generation Acceleration	Junhyuk So et.al.	2510.24211	null
2025-10-28	SpecKD: Speculative Decoding for Effective Knowledge Distillation of LLMs	Haiduo Huang et.al.	2510.24021	null
2025-10-27	Financial markets as a Le Bonian crowd during boom-and-bust episodes: A complementary theoretical framework in behavioural finance	Claire Barraud et.al.	2510.23175	null
2025-10-27	Understanding In-Context Learning Beyond Transformers: An Investigation of State Space and Hybrid Architectures	Shenran Wang et.al.	2510.23006	null
2025-10-27	Exploring Structures of Inferential Mechanisms through Simplistic Digital Circuits	Giovanni Sileno et.al.	2510.22883	null
2025-10-26	Batch Speculative Decoding Done Right	Ranran Haoran Zhang et.al.	2510.22876	null
2025-10-26	FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference	Divya Jyoti Bajpai et.al.	2510.22641	null
2025-10-24	Unravelling the oxygen influence in cubic bixbyite In$_2$O$_3$ on Raman active phonon modes by isotope studies	Johannes Feldl et.al.	2510.22018	null
2025-10-24	Butterfly: glo-cal effects of data, energy and industry, New Media and Performance Exhibition Catalogue	Rebekah Rousi et.al.	2510.21893	null
2025-10-23	Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation	Yuhan Liu et.al.	2510.20812	null
2025-10-22	Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs	Hongyi Liu et.al.	2510.20064	null
2025-10-22	Speculative Sampling for Parametric Temporal Point Processes	Marin Biloš et.al.	2510.20031	null
2025-10-22	New Recursions for the Canonical Scalar-Scaffolded Yang-Mills Amplitude	Jeffrey V. Backus et.al.	2510.19901	null
2025-10-22	AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders	Yuezhou Hu et.al.	2510.19779	null
2025-10-23	Fast Inference via Hierarchical Speculative Decoding	Clara Mohri et.al.	2510.19705	null
2025-10-22	CoSense-LLM: Semantics at the Edge with Cost- and Uncertainty-Aware Cloud-Edge Cooperation	Hasan Akgul et.al.	2510.19670	null
2025-10-22	Fermionic fields of higher spin in de Sitter space	Dionysios Anninos et.al.	2510.19652	null
2025-10-21	Reasoning Language Model Inference Serving Unveiled: An Empirical Study	Qi Li et.al.	2510.18672	null
2025-10-21	From Quarter to All: Accelerating Speculative LLM Decoding via Floating-Point Exponent Remapping and Parameter Sharing	Yushu Zhao et.al.	2510.18525	null
2025-10-20	Annotating the Chain-of-Thought: A Behavior-Labeled Dataset for AI Safety	Antonio-Gabriel Chacón Menke et.al.	2510.18154	null
2025-10-20	A Hall viscosity for skyrmion via magnon interaction	Bom Soo Kim et.al.	2510.18092	null
2025-10-20	SpecAgent: A Speculative Retrieval and Forecasting Agent for Code Completion	George Ma et.al.	2510.17925	null
2025-10-18	Does GenAI Rewrite How We Write? An Empirical Study on Two-Million Preprints	Minfeng Qi et.al.	2510.17882	null
2025-10-18	$ρ$Hammer: Reviving RowHammer Attacks on New Architectures via Prefetching	Weijie Chen et.al.	2510.16544	null
2025-10-18	What Limits Agentic Systems Efficiency?	Song Bian et.al.	2510.16276	null
2025-10-17	Interpretable RNA-Seq Clustering with an LLM-Based Agentic Evidence-Grounded Framework	Elias Hossain et.al.	2510.16082	null
2025-10-29	TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs	Sibo Xiao et.al.	2510.15545	null
2025-10-23	Accelerating Mobile Language Model via Speculative Decoding and NPU-Coordinated Execution	Zhiyang Chen et.al.	2510.15312	null
2025-10-16	Speculative Model Risk in Healthcare AI: Using Storytelling to Surface Unintended Harms	Xingmeng Zhao et.al.	2510.14718	null
2025-10-16	xLLM Technical Report	Tongxuan Liu et.al.	2510.14686	null
2025-10-15	Cortex: Workflow-Aware Resource Pooling and Scheduling for Agentic Serving	Nikos Pagonas et.al.	2510.14126	null
2025-10-15	Tests of restricted Quantum Focusing and a universal CFT bound	Victor Franken et.al.	2510.13961	null
2025-10-17	What Layers When: Learning to Skip Compute in LLMs with Residual Gates	Filipe Laitenberger et.al.	2510.13876	null
2025-10-15	Are Randomized Quantum Linear Systems Solvers Practical?	Siddharth Hariprakash et.al.	2510.13766	null
2025-10-15	Speculating a Tactile Grammar: Toward Task-Aligned Chart Design for Non-Visual Perception	Areen Khalaila et.al.	2510.13731	null
2025-10-15	Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference	Nikhil Bhendawade et.al.	2510.13161	null
2025-10-14	3-Model Speculative Decoding	Sanghyun Byun et.al.	2510.12966	null
2025-10-14	Language Models Model Language	Łukasz Borchmann et.al.	2510.12766	null
2025-10-14	Notes on false vacuum decay in quantum Ising models	Ian G. Moss et.al.	2510.12592	null
2025-10-14	A Direct Memory Access Controller (DMAC) for Irregular Data Transfers on RISC-V Linux Systems	Thomas Benz et.al.	2510.12277	null
2025-10-14	How Far I’ll Go: Imagining Futures of Conversational AI with People with Visual Impairments Through Design Fiction	Jeanne Choi et.al.	2510.12268	null
2025-10-13	Direct Multi-Token Decoding	Xuan Luo et.al.	2510.11958	null
2025-10-13	New Tests of Low-Scale Quantum Gravity with Cosmic-Ray Collisions	Manuel Ettengruber et.al.	2510.11879	null
2025-10-13	General real-valued theories with the Schröder-Bernstein property are stable	Alexander Berenstein et.al.	2510.11858	null
2025-10-13	The Magic Barrier before Thermalization	Lukas Ebner et.al.	2510.11681	null
2025-10-13	(Dis)Proving Spectre Security with Speculation-Passing Style	Santiago Arranz-Olmos et.al.	2510.11573	null
2025-10-14	AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model	Zhiwei Jin et.al.	2510.11496	null
2025-10-13	Efficient LLM Inference over Heterogeneous Edge Networks with Speculative Decoding	Bingjie Zhu et.al.	2510.11331	null
2025-10-11	SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference	Liangkun Chen et.al.	2510.10302	null
2025-10-11	Exploration of Embodied Space Experience through Umbilical Interaction: A Grounded Theory Approach	Shuai Guo et.al.	2510.10258	null
2025-10-11	LAMOST J064137.77+045743.8: A New Binary of an A7-type Pulsating Subgiant and an M-type Red Dwarf	Yanhui Chen et.al.	2510.10164	null
2025-10-11	Conformal Sparsification for Bandwidth-Efficient Edge-Cloud Speculative Decoding	Payel Bhattacharjee et.al.	2510.09942	null
2025-10-10	Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy	Xiaoxiao Ma et.al.	2510.09012	null
2025-10-10	Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation	Yao Teng et.al.	2510.08994	null
2025-10-10	Mozart: A Chiplet Ecosystem-Accelerator Codesign Framework for Composable Bespoke Application Specific Integrated Circuits	Haoran Jin et.al.	2510.08873	null
2025-10-09	Atomically resolved electron reflectivity at a metal/semiconductor interface	Ding-Ming Huang et.al.	2510.07970	null
2025-10-08	OWL: Overcoming Window Length-Dependence in Speculative Decoding for Long-Context Inputs	Jaeseong Lee et.al.	2510.07535	null
2025-10-08	Lectures on entanglement, von Neumann algebras, and emergence of spacetime	Hong Liu et.al.	2510.07017	null
2025-10-08	Simulations of Globular Cluster Evolution with Multiple Stellar Populations	Mirek Giersz et.al.	2510.06942	null
2025-10-07	A Meat-Summer Night’s Dream: A Tangible Design Fiction Exploration of Eating Biohybrid Flying Robots	Ziming Wang et.al.	2510.06507	null
2025-10-07	Back to the Future Museum – Speculative Design for Virtual Citizen-Curated Museums	Richard Rhodes et.al.	2510.06472	null
2025-10-06	Draft, Verify, and Improve: Toward Training-Aware Speculative Decoding	Shrenik Bhansali et.al.	2510.05421	null
2025-10-06	Zigzags and free adjunctions	Lorenzo Riva et.al.	2510.05371	null
2025-10-06	Gromov-Witten theory, degenerations, and the tautological ring	Davesh Maulik et.al.	2510.04779	null
2025-10-05	Speculative Actions: A Lossless Framework for Faster Agentic Systems	Naimeng Ye et.al.	2510.04371	null
2025-10-05	Self Speculative Decoding for Diffusion Large Language Models	Yifeng Gao et.al.	2510.04147	null
2025-10-04	Self-Speculative Masked Diffusions	Andrew Campbell et.al.	2510.03929	null
2025-10-04	Security Analysis of Ponzi Schemes in Ethereum Smart Contracts	Chunyi Zhang et.al.	2510.03819	null
2025-10-03	PrivacyMotiv: Speculative Persona Journeys for Empathic and Motivating Privacy Reviews in UX Design	Zeya Chen et.al.	2510.03559	null
2025-10-03	Action Deviation-Aware Inference for Low-Latency Wireless Robots	Jeyoung Park et.al.	2510.02851	null
2025-10-03	A Concept of Possibility for Real-World Events	Daniel G. Schwartz et.al.	2510.02655	null
2025-10-02	Dispersion in Analogue Gravity	Eren Erberk Erkul et.al.	2510.02542	null
2025-10-02	Impact of AGN and nuclear star formation on the ISM turbulence of galaxies: Insights from JWST/MIRI spectroscopy	Rogemar A. Riffel et.al.	2510.02517	null
2025-09-28	DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding	Guanghao Li et.al.	2510.02358	null
2025-10-02	The Disparate Impacts of Speculative Decoding	Jameson Sandler et.al.	2510.02128	null
2025-10-03	Virtual fibring of manifolds and groups	Dawid Kielak et.al.	2510.01805	null
2025-10-01	Theory is Shapes	Matthew Varona et.al.	2510.01382	null
2025-10-01	HiSpec: Hierarchical Speculative Decoding for LLMs	Avinash Kumar et.al.	2510.01336	null
2025-10-01	Combining complex Langevin dynamics with score-based and energy-based diffusion models	Gert Aarts et.al.	2510.01328	null
2025-09-30	Chiral effects and Joule heating in hot and dense matter	Srimoyee Sen et.al.	2510.00114	null
2025-09-29	A(I)nimism: Re-enchanting the World Through AI-Mediated Object Interaction	Diana Mykhaylychenko et.al.	2509.25558	null
2025-09-29	The Stellar Content of NGC 3603 Revisited: Is the IMF Top Heavy?	Philip Massey et.al.	2509.25099	null
2025-09-29	Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding	Sungkyun Kim et.al.	2509.24328	null
2025-09-29	SpecExit: Accelerating Large Reasoning Model via Speculative Exit	Rubing Yang et.al.	2509.24248	null
2025-09-28	HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models	Zhinan Xie et.al.	2509.23928	null
2025-09-27	SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts	Bingshuai Liu et.al.	2509.23232	null
2025-09-29	SAHM: State-Aware Heterogeneous Multicore for Single-Thread Performance	Shayne Wadle et.al.	2509.22405	null
2025-09-26	In Their Own Words: Reasoning Traces Tailored for Small Models Make Them Better Reasoners	Jaehoon Kim et.al.	2509.22230	null
2025-09-26	Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding	Shijing Hu et.al.	2509.22134	null
2025-09-26	FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning	Yizhou Zhang et.al.	2509.21792	null
2025-09-26	Self-Speculative Biased Decoding for Faster Live Translation	Linxiao Zeng et.al.	2509.21740	null
2025-09-25	SpecMER: Fast Protein Generation with K-mer Guided Speculative Decoding	Thomas Walton et.al.	2509.21689	null
2025-09-25	SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips	Xinyu Lian et.al.	2509.21271	null
2025-09-24	The interstellar heritage of comets	Karen Willacy et.al.	2509.20530	null
2025-09-30	Speculative Safety-Aware Decoding	Xuekang Wang et.al.	2508.17739	null
2025-08-07	Hierarchical Verification of Speculative Beams for Accelerating LLM Inference	Jaydip Sen et.al.	2508.03726	null
2025-07-22	Collaborative Inference and Learning between Edge SLMs and Cloud LLMs: A Survey of Algorithms, Execution, and Open Challenges	Senyao Li et.al.	2507.16731	null
2025-07-22	Enhancing Compiler Optimization Efficiency through Grammatical Decompositions of Control-Flow Graphs	Xuran Cai et.al.	2507.16660	null
2025-07-22	Ly$α$ Emission from [OIII] Emitters Near Reionization: The role of environment in galaxy Ly$α$ detection	Seyedazim Hashemi et.al.	2507.16231	null
2025-07-20	Designing Robots with, not for: A Co-Design Framework for Empowering Interactions in Forensic Psychiatry	Qiaoqiao Ren et.al.	2507.14931	null
2025-07-18	On the asymptotic equidistribution of word values in symmetric groups	Vadim Alekseev et.al.	2507.13928	null
2025-07-22	Gravity and the Higgs boson mass	Carlo Branchina et.al.	2507.13832	null
2025-07-16	Modeling Feasible Locomotion of Nanobots for Cancer Detection and Treatment	Noble Harasha et.al.	2507.12400	null
2025-07-16	Efficient Control Flow Attestation by Speculating on Control Flow Path Representations	Liam Tyler et.al.	2507.12345	null
2025-07-17	DSSD: Efficient Edge-Device LLM Deployment and Collaborative Inference via Distributed Split Speculative Decoding	Jiahong Ning et.al.	2507.12000	null
2025-07-16	Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential	Mohammad Samragh et.al.	2507.11851	null
2025-07-16	Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI	Samyam Rajbhandari et.al.	2507.11830	null
2025-07-14	Exploring ultra-high energy neutrino experiments through the lens of the transport equation	Stefano Palmisano et.al.	2507.10665	null
2025-07-14	Large Interconnected Thermodynamic Systems Nearly Minimize Entropy Production	Kyle J. Ray et.al.	2507.10476	null
2025-07-14	Supernova-induced binary-interaction-powered supernovae: a model for SN2022jli	Ryosuke Hirai et.al.	2507.09974	null
2025-07-12	TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding	Shukai Gong et.al.	2507.09252	null
2025-07-21	Bringing the Norma Dark Cloud to Light in X-rays	Stephen L. Skinner et.al.	2507.09047	null
2025-07-11	On Evaluating Performance of LLM Inference Serving Systems	Amey Agrawal et.al.	2507.09019	null
2025-07-10	Greening Schoolyards and the Spatial Distribution of Property Values in Denver, Colorado	Mahshid Gorjian et.al.	2507.08894	null
2025-07-11	BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity	Chenyang Song et.al.	2507.08771	null
2025-07-11	Time Variation in the TeV Cosmic Ray Anisotropy with IceCube and Energy Dependence of the Solar Dipole	Perri Zilberman et.al.	2507.08242	null
2025-07-08	Optically Overluminous Tidal Disruption Events: Outflow Properties and Implications for Extremely Relativistic Disruptions	Yuhan Yao et.al.	2507.06453	null
2025-07-08	Experiments to test the hypothesis for solar and dark matter axions	Babette Döbrich et.al.	2507.06414	null
2025-07-08	Supernovae from stellar mergers and accretors of binary mass transfer: Implications for Type IIP, 1987A-like and interacting supernovae	F. R. N. Schneider et.al.	2507.06391	null
2025-07-08	Bouncing Grains Keep Protoplanetary Disks Bright	Yansong Qian et.al.	2507.06298	null
2025-07-08	Tropical Donagi theorem	Felix Röhrle et.al.	2507.05987	null
2025-07-04	Impact of flavor condensate dark matter on accretion disk luminosity in spherical spacetimes	Antonio Capolupo et.al.	2507.03758	null
2025-06-18	Evolution, Future of AI, and Singularity	Zeki Doruk Erden et.al.	2507.02876	null
2025-07-03	NVIDIA GPU Confidential Computing Demystified	Zhongshu Gu et.al.	2507.02770	null
2025-07-03	OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding	Ramchalam Kinattinkara Ramakrishnan et.al.	2507.02659	null
2025-07-03	High-Order Deep Meta-Learning with Category-Theoretic Interpretation	David H. Mguni et.al.	2507.02634	null
2025-07-14	FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference	Xing Liu et.al.	2507.02620	null
2025-07-02	H.E.S.S. programme searching for VHE gamma rays associated with FRBs	F. Aharonian et.al.	2507.02143	null
2025-07-07	Handling out-of-order input arrival in CEP engines on the edge combining optimistic, pessimistic and lazy evaluation	Styliani Kyrama et.al.	2507.01461	null
2025-07-02	LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation	Tianyu Liu et.al.	2507.01449	null
2025-07-01	Quantize-Sample-and-Verify: LLM Acceleration via Adaptive Edge-Cloud Speculative Decoding	Guangyi Zhang et.al.	2507.00605	null
2025-06-30	User Concerns Regarding Social Robots for Mood Regulation: A Case Study on the “Sunday Blues”	Zhuochao Peng et.al.	2507.00271	null
2025-07-08	Fully Parallelized BP Decoding for Quantum LDPC Codes Can Outperform BP-OSD	Ming Wang et.al.	2507.00254	null
2025-06-30	Metal-poor single Wolf-Rayet stars: the interplay of optically thick winds and rotation	Lumen Boco et.al.	2507.00137	null
2025-06-30	Segmented Operations using Matrix Multiplications	Aleksandros Sobczyk et.al.	2506.23906	null
2025-06-29	From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows	Mohamed Amine Ferrag et.al.	2506.23260	null
2025-06-28	Polar alignment of a circumbinary disc around a brown dwarf binary	Jeremy L. Smallwood et.al.	2506.22747	null
2025-07-03	VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs	Raghavv Goel et.al.	2506.22694	null
2025-06-27	QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization	Danush Khanna et.al.	2506.22396	null
2025-07-10	Cool Gas in the Circumgalactic Medium of Massive Post Starburst Galaxies	Zoe Harvey et.al.	2506.22287	null
2025-06-26	Small Encoders Can Rival Large Decoders in Detecting Groundedness	Istabrak Abbes et.al.	2506.21288	null
2025-06-26	You never have enough J/$ψ$ events: the case for a J/$ψ$ factory	Stephen Lars Olsen et.al.	2506.20975	null
2025-06-17	Utility-Driven Speculative Decoding for Mixture-of-Experts	Anish Saxena et.al.	2506.20675	null
2025-07-09	Charged rotating quantum black holes	Dyuman Bhattacharya et.al.	2506.19941	null
2025-06-23	Entangled Quantum Negative Energy Teleportation as a Probe of Semiclassical Gravity	Daniel S. Zachary et.al.	2506.19878	null
2025-06-24	Scaling Speculative Decoding with Lookahead Reasoning	Yichao Fu et.al.	2506.19830	null
2025-06-23	LLMs on a Budget? Say HOLA	Zohaib Hasan Siddiqui et.al.	2506.18952	null
2025-07-10	The Full Nonlinear Vortex Tube-Vorton Method: the post-stall condition	Jesus Carlos Pimentel-Garcia et.al.	2506.18719	null
2025-06-17	Semantic uncertainty in advanced decoding methods for LLM generation	Darius Foodeei et.al.	2506.17296	null
2025-07-08	Capturing Misalignment	Pierfrancesco Guarino et.al.	2506.17176	null
2025-06-20	ReasonGRM: Enhancing Generative Reward Models through Large Reasoning Models	Bin Chen et.al.	2506.16712	null
2025-07-02	Rethinking LLM Training through Information Geometry and Quantum Metrics	Riccardo Di Sipio et.al.	2506.15830	null
2025-06-15	$\texttt{SPECS}$: Faster Test-Time Scaling through Speculative Drafts	Mert Cemri et.al.	2506.15733	null
2025-06-18	CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies	Donghyun Gouk et.al.	2506.15601	null
2025-06-18	PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction	Shufan Li et.al.	2506.15556	null
2025-06-17	Optimistic MEV in Ethereum Layer 2s: Why Blockspace Is Always in Demand	Ozan Solmaz et.al.	2506.14768	null
2025-06-17	S$^4$C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models	Tao He et.al.	2506.14158	null
2025-06-16	Stimulus Motion Perception Studies Imply Specific Neural Computations in Human Visual Stabilization	David W Arathorn et.al.	2506.13506	null
2025-06-21	Exploring the Secondary Risks of Large Language Models	Jiawei Chen et.al.	2506.12382	null
2025-06-14	Quantum Machine Learning	Muhammad Usman et.al.	2506.12292	null
2025-06-13	Fluid-induced snap-through instability of spherical shells	Pier Giuseppe Ledda et.al.	2506.12247	null
2025-06-13	Eliciting Reasoning in Language Models with Cognitive Tools	Brown Ebouky et.al.	2506.12115	null
2025-06-12	SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding	Ziyi Zhang et.al.	2506.11309	null
2025-06-11	Speculative Design in Spiraling Time: Methods and Indigenous HCI	James Eschrich et.al.	2506.10229	null
2025-06-11	V455 Car: an oscillating eclipsing Algol-type binary in triple star system	Zhao-Long Deng et.al.	2506.10124	null
2025-06-11	Patterns of Patterns III	Joseph Corneli et.al.	2506.09696	null
2025-07-13	SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving	Xiangchen Li et.al.	2506.09397	null
2025-06-11	A collection of results relating the geometry of plane domains and the exit time of planar Brownian motion, II	Greg Markowsky et.al.	2506.09364	null
2025-07-19	Draft-based Approximate Inference for LLMs	Kevin Galim et.al.	2506.08373	link
2025-06-10	Solving Convex-Concave Problems with $\tilde{\mathcal{O}}(ε^{-4/7})$ Second-Order Oracle Complexity	Lesi Chen et.al.	2506.08362	null
2025-06-09	MiniCPM4: Ultra-Efficient LLMs on End Devices	MiniCPM Team et.al.	2506.07900	link
2025-06-09	FREESS: An Educational Simulator of a RISC-V-Inspired Superscalar Processor Based on Tomasulo’s Algorithm	Roberto Giorgi et.al.	2506.07665	link
2025-06-09	LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments	Jin Huang et.al.	2506.07416	null
2025-06-08	Exploiting Inaccurate Branch History in Side-Channel Attacks	Yuhui Zhu et.al.	2506.07263	null
2025-06-07	Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit	Charles Goddard et.al.	2506.06607	null
2025-06-06	Fake Friends and Sponsored Ads: The Risks of Advertising in Conversational Search	Jacob Erickson et.al.	2506.06447	null
2025-07-08	On the Fundamental Impossibility of Hallucination Control in Large Language Models	Michał P. Karpowicz et.al.	2506.06382	null
2025-06-06	Disk Evolution Study Through Imaging of Nearby Young Stars (DESTINYS): Evidence of planet-disk interaction in the 2MASSJ16120668-3010270 system	C. Ginski et.al.	2506.05892	null
2025-06-10	Gumbel-max List Sampling for Distribution Coupling with Multiple Samples	Joseph Rowan et.al.	2506.05632	null
2025-06-05	Accelerated Test-Time Scaling with Model-Free Speculative Sampling	Woomin Song et.al.	2506.04708	null
2025-06-04	Guided Speculative Inference for Efficient Test-Time Alignment of LLMs	Jonathan Geuter et.al.	2506.04118	link
2025-06-04	The Causal-Noncausal Tail Processes: An Introduction	Christian Gouriéroux et.al.	2506.04046	null
2025-06-04	AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism	Zhepei Wei et.al.	2506.03700	link
2025-06-04	POSS: Position Specialist Generates Better Draft for Speculative Decoding	Langlin Huang et.al.	2506.03566	link
2025-06-02	Out-of-Vocabulary Sampling Boosts Speculative Decoding	Nadav Timor et.al.	2506.03206	null
2025-06-03	Feedstack: Layering Structured Representations over Unstructured Feedback to Scaffold Human AI Conversation	Hannah Vy Nguyen et.al.	2506.03052	null
2025-06-03	Reuse or Generate? Accelerating Code Editing via Edit-Oriented Speculative Decoding	Peiding Wang et.al.	2506.02780	null
2025-06-28	Multi Layered Autonomy and AI Ecologies in Robotic Art Installations	Baoyang Chen et.al.	2506.02606	null
2025-06-03	Consultant Decoding: Yet Another Synergistic Mechanism	Chuanghao Ding et.al.	2506.02391	null
2025-06-02	Radiation GRMHD Models of Accretion onto Stellar-Mass Black Holes: I. Survey of Eddington Ratios	Lizhong Zhang et.al.	2506.02289	null
2025-05-16	SpecMemo: Speculative Decoding is in Your Pocket	Selin Yildirim et.al.	2506.01986	null
2025-05-16	Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism	Yuhao Shen et.al.	2506.01979	null
2025-06-02	Synchronic Web Digital Identity: Speculations on the Art of the Possible	Thien-Nam Dinh et.al.	2506.01856	null
2025-07-04	Playing with Transformer at 30+ FPS via Next-Frame Diffusion	Xinle Cheng et.al.	2506.01380	null
2025-06-02	Shape Shifting Light Dark Matter Solitons	Dor Ben-Amotz et.al.	2506.01282	null
2025-06-01	The $M_{\rm BH}-M_\star$ Relation of the hyperluminous Dust-obscured Quasars up to $z \sim 4$	Yibin Luo et.al.	2506.01218	null
2025-06-01	Mamba Drafters for Speculative Decoding	Daewon Choi et.al.	2506.01206	null
2025-06-01	The Inverse Scaling Effect of Pre-Trained Language Model Surprisal Is Not Due to Data Leakage	Byung-Doh Oh et.al.	2506.01172	null
2025-05-31	Accelerating Diffusion LLMs via Adaptive Parallel Decoding	Daniel Israel et.al.	2506.00413	null
2025-05-31	Speculative Reward Model Boosts Decision Making Ability of LLMs Cost-Effectively	Jiawei Gu et.al.	2506.00396	link
2025-05-30	Cross-Attention Speculative Decoding	Wei Zhong et.al.	2505.24544	null
2025-05-30	CLaSp: In-Context Layer Skip for Self-Speculative Decoding	Longze Chen et.al.	2505.24196	null
2025-06-10	Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism	Jinhui Wei et.al.	2505.23219	null
2025-05-28	Pre-Training Curriculum for Multi-Token Prediction in Language Models	Ansar Aynetdinov et.al.	2505.22757	link
2025-05-28	Mass-feeding of jet-launching white dwarfs in grazing and common envelope evolution	Noam Soker et.al.	2505.22621	null
2025-05-29	Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design	Yudi Zhang et.al.	2505.22179	link
2025-05-28	RAD: Redundancy-Aware Distillation for Hybrid Models via Self-Speculative Decoding	Yuichiro Hoshino et.al.	2505.22135	null
2025-05-28	Robust and Symmetric Magnetic Field Dependency of Superconducting Diode Effect in Asymmetric Dirac Semimetal SQUIDs	H. C. Travaglini et.al.	2505.21861	null
2025-05-27	Computocene: Notes from an Age of Observation	Simone Severini et.al.	2505.21744	null
2025-05-27	Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits	Yeshwanth Venkatesha et.al.	2505.21594	null
2025-05-27	Hardware-Efficient Attention for Fast Decoding	Ted Zadouri et.al.	2505.21487	null
2025-05-27	Pair binding and Hund’s rule breaking in high-symmetry fullerenes	R. Rausch et.al.	2505.21455	null
2025-05-28	Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity	Yehui Tang et.al.	2505.21411	null
2025-05-27	Repeated Auctions with Speculators: Arbitrage Incentives and Forks in DAOs	Nicolas Eschenbaum et.al.	2505.21296	null
2025-05-27	SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences	Jungyoub Cha et.al.	2505.20776	link
2025-05-27	Replication of Reference-Dependent Preferences and the Risk-Return Trade-Off in the Chinese Market	Penggan Xu et.al.	2505.20608	null
2025-05-26	Academic Research Output Derivatives: Structuring Futures and Options on Research Output Index	Amarendra Sharma et.al.	2505.20492	null
2025-05-26	Bounded cohomology, quotient extensions, and hierarchical hyperbolicity	Francesco Fournier-Facio et.al.	2505.20462	null
2025-05-26	HAMburger: Accelerating LLM Inference via Token Smashing	Jingyu Liu et.al.	2505.20438	null
2025-05-23	Reinforcement Speculative Decoding for Fast Ranking	Yingpeng Du et.al.	2505.20316	null
2025-06-13	MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE	Zongle Huang et.al.	2505.19645	null
2025-05-28	Faster and Better LLMs via Latency-Aware Test-Time Scaling	Zili Wang et.al.	2505.19634	null
2025-07-23	Turing Test 2.0: The General Intelligence Threshold	Georgios Mappouras et.al.	2505.19550	null
2025-05-29	DREAM: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding	Yunhai Hu et.al.	2505.19201	link
2025-05-25	Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs	Xuan Zhang et.al.	2505.19155	null
2025-05-24	Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding	Yixuan Wang et.al.	2505.18629	null
2025-05-23	VeriThinker: Learning to Verify Makes Reasoning Model Efficient	Zigeng Chen et.al.	2505.17941	link
2025-05-20	Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency	Ruixiao Li et.al.	2505.17074	null
2025-05-16	SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs	Jinwoo Park et.al.	2505.17052	null
2025-05-22	KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization	Mingbo Song et.al.	2505.16162	null
2025-05-21	Strong Hilbert space fragmentation and fractons from subsystem and higher-form symmetries	Charles Stahl et.al.	2505.15889	null
2025-05-21	Quasinormal Modes of Schwarzschild Black Holes in the Dehnen-(1, 4, 5/2) Type Dark Matter Halos	Qi-Qi Liang et.al.	2505.15540	null
2025-06-03	Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding	Zijian Lin et.al.	2505.15380	null
2025-05-21	SSR: Speculative Parallel Scaling Reasoning in Test-time	Yuanlin Chu et.al.	2505.15340	null
2025-05-21	BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms	Yunlong Hou et.al.	2505.15141	null
2025-05-20	STree: Speculative Tree Decoding for Hybrid State-Space Models	Yangchao Wu et.al.	2505.14969	null
2025-05-20	On the Day They Experience: Awakening Self-Sovereign Experiential AI Agents	Botao Amber Hu et.al.	2505.14893	null
2025-05-20	Unremarkable to Remarkable AI Agent: Exploring Boundaries of Agent Intervention for Adults With and Without Cognitive Impairment	Mai Lee Chang et.al.	2505.14872	null
2025-05-20	X-ray properties of compact elliptical galaxies	Orsolya E. Kovacs et.al.	2505.14768	null
2025-05-20	Speculative Decoding Reimagined for Multimodal Large Language Models	Luxi Lin et.al.	2505.14260	link
2025-05-19	Language and Thought: The View from LLMs	Daniel Rothschild et.al.	2505.13561	null
2025-05-19	HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding	Siran Liu et.al.	2505.13254	null
2025-09-15	Alignment-Augmented Speculative Decoding with Alignment Sampling and Conditional Verification	Jikai Wang et.al.	2505.13204	null
2025-05-19	FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference	Guangda Liu et.al.	2505.13109	null
2025-05-25	FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks	Zihua Wang et.al.	2505.12728	link
2025-05-18	Traversal Verification for Speculative Tree Decoding	Yepeng Weng et.al.	2505.12398	null
2025-05-16	FAIR Ecosystems for Science at Scale	Sean R. Wilkinson et.al.	2505.11742	null
2025-05-16	Prime Number Error Terms	Nathan Ng et.al.	2505.11295	null
2025-05-16	Beyond surfaces: quantifying internal radiative heat transport in dense materials	Janak Tiwari et.al.	2505.10853	null
2025-05-16	Qualia Optimization	Philip S. Thomas et.al.	2505.10779	null
2025-07-10	Anchoring AI Capabilities in Market Valuations: The Capability Realization Rate Model and Valuation Misalignment Risk	Xinmin Fang et.al.	2505.10590	null
2025-05-18	MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models	Mugilan Ganesan et.al.	2505.10526	null
2025-05-21	SpecOffload: Unlocking Latent GPU Capacity for LLM Inference on Resource-Constrained Devices	Xiangwen Zhuge et.al.	2505.10259	link
2025-05-14	Chandra Rules Out Super-Eddington Accretion For Little Red Dots	Andrea Sacchi et.al.	2505.09669	null
2025-06-28	Extended Structural Dynamics – Emergent Irreversibility from Reversible Dynamics	Patrick BarAvi et.al.	2505.09650	null
2025-05-14	Observational study of the formation of homologous confined circular-ribbon flares	Shuhong Yang et.al.	2505.09093	null
2025-05-13	Long timescale numerical simulations of large, super-critical accretion discs	P. Chris Fragile et.al.	2505.08859	null
2025-05-13	Kudzu: Fast and Simple High-Throughput BFT	Victor Shoup et.al.	2505.08771	null
2025-05-13	Automatic Task Detection and Heterogeneous LLM Speculative Decoding	Danying Ge et.al.	2505.08600	null
2025-05-12	GUP Effective Metric Without GUP: Implications for the Sign of GUP Parameter and Quantum Bounce	Yen Chin Ong et.al.	2505.07972	null
2025-05-12	Localized Gravity, de Sitter, and the Horizon Criterion	Bjoern Friedrich et.al.	2505.07934	null
2025-06-22	TrumorGPT: Graph-Based Retrieval-Augmented Large Language Model for Fact-Checking	Ching Nam Hang et.al.	2505.07891	null
2025-05-08	Scaling Laws for Speculative Decoding	Siyuan Yan et.al.	2505.07858	null
2025-05-12	SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models	Hang Wu et.al.	2505.07680	null
2025-05-10	N-body simulations of the Self-Confinement of Viscous Self-Gravitating Narrow Eccentric Planetary Ringlets	Joseph M. Hahn et.al.	2505.06639	null
2025-05-09	FastDup: a scalable duplicate marking tool using speculation-and-test mechanism	Zhonghai Zhang et.al.	2505.06127	link
2025-05-08	A Physics Model for Origin of Life	Paul Howard Frampton et.al.	2505.05634	null
2025-05-08	Memory Under Siege: A Comprehensive Survey of Side-Channel Attacks on Memory	MD Mahady Hassan et.al.	2505.04896	null
2025-05-08	Topological phase transition to a hidden charge density wave liquid	Joshua S. H. Lee et.al.	2505.04867	null
2025-05-07	SOAEsV2-7B/72B: Full-Pipeline Optimization for State-Owned Enterprise LLMs via Continual Pre-Training, Domain-Progressive SFT and Distillation-Enhanced Speculative Decoding	Jingyang Deng et.al.	2505.04723	null
2025-05-06	Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Autospeculation	Hengyuan Hu et.al.	2505.03983	null
2025-05-06	QiMeng-CPU-v2: Automated Superscalar Processor Design by Learning Data Dependencies	Shuyao Cheng et.al.	2505.03195	null
2025-05-04	The quest for explosive bubbles in the Indonesian Rupiah/US exchange rate: Does the uncertainty trinity matter?	Abdul Khaliq et.al.	2505.02869	null
2025-05-24	Accelerating Large Language Model Reasoning via Speculative Search	Zhihai Wang et.al.	2505.02865	null
2025-05-21	Dirac Singleton as a Relativistic Field Beyond Standard Model	M. A. Vasiliev et.al.	2505.01915	null
2025-05-03	Speculative Evolution Through 3D Cellular Automata	Amir Hossein Khazaei et.al.	2505.01692	null
2025-05-02	PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding	Bradley McDanel et.al.	2505.01572	null
2025-05-12	Emotions in Artificial Intelligence	Hermann Borotschnig et.al.	2505.01462	null
2025-04-29	X-ray Spectroscopy via Temporal Decomposition	William Setterberg et.al.	2504.21169	null
2025-07-02	Ground to Dust: Collisional Cascades and the Fate of Kardashev II Megaswarms	Brian C. Lacki et.al.	2504.21151	null
2025-06-10	EvoPort: An Evolutionary Framework for Portfolio Optimization via Randomized Alpha Discovery and Ensemble-Based Allocation	Nguyen Van Thanh et.al.	2504.21095	null
2025-04-29	Reviving Any-Subset Autoregressive Models with Principled Parallel Sampling and Speculative Decoding	Gabe Guo et.al.	2504.20456	link
2025-04-28	AutoJudge: Judge Decoding Without Manual Annotation	Roman Garipov et.al.	2504.20039	null
2025-04-27	Detecting speculative data flow vulnerabilities using weakest precondition reasoning	Graeme Smith et.al.	2504.19128	null
2025-05-25	Efficient Reasoning for LLMs through Speculative Chain-of-Thought	Jikai Wang et.al.	2504.19095	link
2025-04-26	Global Simulations of Gravitational Instability in Protostellar Disks with Full Radiation Transport II. Locality of Gravitoturbulence, Clumpy Spirals, and Implications for Observable Substructure	Wenrui Xu et.al.	2504.18751	null
2025-06-15	PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation	Zihao An et.al.	2504.18583	null
2025-04-25	Generalizing the relativistic precession model of quasi-periodic oscillations through anharmonic corrections	Roberto Giambò et.al.	2504.18403	null
2025-04-23	A Vision for AI-Driven Adaptation of Dynamic AR Content to Users and Environments	Julian Rasch et.al.	2504.16562	null
2025-04-23	Hardness of Median and Center in the Ulam Metric	Nick Fischer et.al.	2504.16437	null
2025-04-22	On commuting integer matrices	Jonathan Chapman et.al.	2504.15839	null
2025-04-22	Delayed Keen Model with Inflation	Ali Tolga Dincer et.al.	2504.15819	null
2025-04-23	Speculative Sampling via Exponential Races	Szymon Kobus et.al.	2504.15475	null
2025-05-16	Rendezvous in CAVITY: Kinematics and gas properties of an isolated dwarf-dwarf merging pair in a cosmic void region	Bahar Bidaran et.al.	2504.15359	null
2025-04-21	The phase diagram of CeRh$_2$As$_2$ for out-of-plane magnetic field	P. Khanenko et.al.	2504.15112	null
2025-04-21	Safety Co-Option and Compromised National Security: The Self-Fulfilling Prophecy of Weakened AI Risk Thresholds	Heidy Khlaaf et.al.	2504.15088	null
2025-04-21	Note on Type $III_1$ Algebras in $c = 1$ String Theory and Bulk Causal Diamonds	T. Banks et.al.	2504.15076	null
2025-04-21	Exploring Collaborative GenAI Agents in Synchronous Group Settings: Eliciting Team Perceptions and Design Considerations for the Future of Work	Janet G. Johnson et.al.	2504.14779	null
2025-05-27	BLACKOUT: Data-Oblivious Computation with Blinded Capabilities	Hossam ElAtali et.al.	2504.14654	null
2025-04-25	UFO2: The Desktop AgentOS	Chaoyun Zhang et.al.	2504.14603	link
2025-04-20	An interstellar mission to test astrophysical black holes	Cosimo Bambi et.al.	2504.14576	null
2025-04-19	Charge Densities in Crystals and Triply-Periodic Minimal Surfaces	Mengdi Yin et.al.	2504.14148	null
2025-04-18	Going Whole Hog: A Philosophical Defense of AI Cognition	Herman Cappelen et.al.	2504.13988	null
2025-04-16	From job titles to jawlines: Using context voids to study generative AI systems	Shahan Ali Memon et.al.	2504.13947	null
2025-03-21	Bio-crafting Architecture: Experiences of growing mycelium in minimal surface molds	Anca-Simona Horvath et.al.	2504.13855	null
2025-05-28	The Sky as a Killing Horizon	Níckolas de Aguiar Alves et.al.	2504.12514	null
2025-04-12	Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time	Wang Yang et.al.	2504.12329	link
2025-04-18	Large Language Model-Based Knowledge Graph System Construction for Sustainable Development Goals: An AI-Based Speculative Design Perspective	Yi-De Lin et.al.	2504.12309	null
2025-04-16	Purposefully Induced Psychosis (PIP): Embracing Hallucination as Imagination in Large Language Models	Kris Pilcher et.al.	2504.12012	null
2025-04-16	Who Said Only Military Officers Can Deal with Uncertainty? On the Importance of Uncertainty in EdTech Data Visualisations	Felicitas Macgilchrist et.al.	2504.11974	null
2025-04-15	Five dimensional rotating and Quintessence black hole and their shadows	Milko Estrada et.al.	2504.11408	null
2025-04-16	Efficient Distributed Retrieval-Augmented Generation for Enhancing Language Model Performance	Shangyu Liu et.al.	2504.11197	null
2025-04-14	Shield Bash: Abusing Defensive Coherence State Retrieval to Break Timing Obfuscation	Kartik Ramkrishnan et.al.	2504.10318	null
2025-04-14	Gravitational metamaterials from optical properties of spacetime media	Orlando Luongo et.al.	2504.09987	null
2025-04-12	Authoritarian Recursions: How Fiction, History, and AI Reinforce Control in Education, Warfare, and Discourse	Hasan Oguz et.al.	2504.09030	null
2025-04-11	SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting	Jiaming Xu et.al.	2504.08850	null
2025-05-31	SD$^2$: Self-Distilled Sparse Drafters	Mike Lasby et.al.	2504.08838	null
2025-04-05	SLOs-Serve: Optimized Serving of Multi-SLO LLMs	Siyuan Chen et.al.	2504.08784	null
2025-04-11	Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices	Shengyuan Ye et.al.	2504.08242	null
2025-05-16	SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning	Rui Pan et.al.	2504.07891	link
2025-04-10	Synthetic Fluency: Hallucinations, Confabulations, and the Creation of Irish Words in LLM-Generated Translations	Sheila Castilho et.al.	2504.07680	null
2025-04-10	Proceedings of the Purposeful XR Workshop for CHI 2025	Elizabeth Childs et.al.	2504.07475	null
2025-04-09	Joint Survey Processing. III. Compact Oddballs in the COSMOS Field – Little Red Dots and Transients	Yu-Heng Lin et.al.	2504.07196	null
2025-04-09	ShadowBinding: Realizing Effective Microarchitectures for In-Core Secure Speculation Schemes	Amund Bergland Kvalsvik et.al.	2504.07018	null
2025-04-08	SPIRe: Boosting LLM Inference Throughput with Speculative Decoding	Sanjit Neelam et.al.	2504.06419	null
2025-04-08	Decoding the Ishango Bone: Unveiling Prehistoric Mathematical Art	Jenny Baur et.al.	2504.06412	null
2025-04-08	Interplay between trimer structure and magnetic ground state in Ba5Ru3O12 probed by Neutron and muSR techniques	E. Kushwaha et.al.	2504.06113	null
2025-04-08	Strong Evidence That Abiogenesis Is a Rapid Process on Earth Analogs	David Kipping et.al.	2504.05993	null
2025-04-08	DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding	Hossein Entezari Zarch et.al.	2504.05598	null
2025-06-03	Speculative Automated Refactoring of Imperative Deep Learning Programs to Graph Execution	Raffi Khatchadourian et.al.	2504.05424	null
2025-04-06	pc-COP: An Efficient and Configurable 2048-p-Bit Fully-Connected Probabilistic Computing Accelerator for Combinatorial Optimization	Kiran Magar et.al.	2504.04543	null
2025-06-02	Representations of $p$-adic groups and orbits with smooth closure in a variety of Langlands parameters	Kristaps Balodis et.al.	2504.04163	null
2025-04-05	PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models	Haofei Yin et.al.	2504.04104	null
2025-03-23	Agentic Business Process Management: The Past 30 Years And Practitioners’ Future Perspectives	Hoang Vu et.al.	2504.03693	null
2025-04-04	Ethics Readiness of Technology: The case for aligning ethical approaches with technological maturity	Eline de Jong et.al.	2504.03336	null
2025-04-03	A Review of Prototyping in XR: Linking Extended Reality to Digital Fabrication	Bixun Chen et.al.	2504.02998	null
2025-05-02	GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation	Zhiyuan Yan et.al.	2504.02782	link
2025-04-03	Black Holes, Moduli Stabilisation and the Swampland	Matilda Delgado et.al.	2504.02645	null
2025-04-08	Variational Online Mirror Descent for Robust Learning in Schrödinger Bridge	Dong-Sig Han et.al.	2504.02618	null
2025-06-16	Graviton Scattering on Gravitational Atoms: Relic Graviton Shot Noise	Benjamin Avila-Lopez et.al.	2504.01286	null
2025-04-01	Reminiscences about Steven Weinberg (This Time it’s Personal)	C. P. Burgess et.al.	2504.01118	null
2025-04-01	Mesoscale Eddy – Internal Wave Coupling. III. The End of the Enstrophy Cascade and Maintenance of Gyre Scale Potential Vorticity Gradients	Kurt L. Polzin et.al.	2504.00486	null
2025-04-01	The Impact of Triangular-Toothed Gears on the Functionality of the Antikythera Mechanism	Esteban Guillermo Szigety y Gustavo Francisco Arenas et.al.	2504.00327	null
2025-06-04	Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding	Aayush Gautam et.al.	2504.00030	null
2025-03-31	What the F*ck Is Artificial General Intelligence?	Michael Timothy Bennett et.al.	2503.23923	null
2025-03-31	A search for the three isomers of cyano-1,3-butadiene in TMC-1: Implications for bottom-up routes involving 1,3-butadiene	M. Agundez et.al.	2503.23841	null
2025-03-30	Credit, Land Speculation, and Low-Interest-Rate Policy	Tomohiro Hirano et.al.	2503.23552	null
2025-03-30	The Longest Duration SGRE Event in Solar Cycle 25	Nat Gopalswamy et.al.	2503.23544	null
2025-03-30	Speculative End-Turn Detector for Efficient Speech Chatbot Assistant	Hyunjong Ok et.al.	2503.23439	null
2025-03-29	Beyond speculation: Measuring the growing presence of LLM-generated texts in multilingual disinformation	Dominik Macko et.al.	2503.23242	null
2025-03-28	Formation and Evolution of Compact Binaries Containing Intermediate Mass Black Holes in Dense Star Clusters	Seungjae Lee et.al.	2503.22109	null
2025-03-27	How to Constrain the Stochastic Gravitational Wave Background with Multi-Frequency Detections	Eleanor Gleave et.al.	2503.21508	null
2025-03-26	Speculations on higher Fukaya categories	James Pascaleff et.al.	2503.20906	null
2025-03-24	The Centers and Margins of Modeling Humans in Well-being Technologies: A Decentering Approach	Jichen Zhu et.al.	2503.19132	null
2025-05-14	Spectropolarimetry of A Nuclear Transient AT2023clx: Revealing The Geometrical Alignment between The Transient Outflow and The Nuclear Dusty Region	Kohki Uno et.al.	2503.19024	null
2025-03-23	A Novel Hat-Shaped Device-Cloud Collaborative Inference Framework for Large Language Models	Zuan Xie et.al.	2503.18989	null
2025-03-23	A Multi-Model Adaptation of Speculative Decoding for Classification	Somnath Roy et.al.	2503.18076	null
2025-03-20	SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs	Shibo Jie et.al.	2503.16163	null
2025-03-20	“This could save us months of work” – Use Cases of AI and Automation Support in Investigative Journalism	Besjon Cifliku et.al.	2503.16011	null
2025-03-20	SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models	Fahao Chen et.al.	2503.15921	null
2025-03-19	Prada: Black-Box LLM Adaptation with Private Data on Resource-Constrained Devices	Ziyao Wang et.al.	2503.14932	null
2025-06-12	The Origin of the Very-High-Energy Diffuse $γ$-Ray Emission: The Case for Galactic Source Cocoons	Antonio Ambrosone et.al.	2503.14651	null
2025-05-04	Superconductivity in magnetars: Exploring type-I and type-II states in toroidal magnetic fields	Mayusree Das et.al.	2503.14594	null
2025-03-26	Association of 220 PeV Neutrino KM3-230213A with Gamma-Ray Bursts	Ruiqi Wang et.al.	2503.14471	null
2025-03-18	Neutron portal to ultra-high-energy neutrinos	Gustavo F. S. Alves et.al.	2503.14419	null
2025-03-18	Speculative Decoding for Verilog: Speed and Quality, All in One	Changran Xu et.al.	2503.14153	null
2025-03-18	Growing a Twig to Accelerate Large Vision-Language Models	Zhenwei Shao et.al.	2503.14075	null
2025-03-17	ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts	Evangelos Georganas et.al.	2503.13565	null
2025-03-17	Enhanced anomalous Hall effect in the topological Kagome metal Cs(V$_{1-x}$Mn$_x$)$_3$Sb$_5$	Xinmin Wang et.al.	2503.13351	null
2025-03-28	WOW: Workflow-Aware Data Movement and Task Scheduling for Dynamic Scientific Workflows	Fabian Lehmann et.al.	2503.13072	link
2025-05-15	Collaborative Speculative Inference for Efficient LLM Inference Serving	Luyao Gao et.al.	2503.10325	null
2025-03-13	Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding	Jinze Li et.al.	2503.10135	null
2025-03-12	A practical guide to machine learning interatomic potentials – Status and future	Ryan Jacobs et.al.	2503.09814	null
2025-03-11	In Search of the Potentially Hazardous Asteroids in the Taurid Resonant Swarm	Jasmine Li et.al.	2503.08670	null
2025-03-11	Liquidity Competition Between Brokers and an Informed Trader	Ryan Donnelly et.al.	2503.08287	null
2025-03-25	Training Domain Draft Models for Speculative Decoding: Best Practices and Insights	Fenglu Hong et.al.	2503.07807	null
2025-03-10	Did smartphones break the world as we knew it?	Mikhail V. Tamm et.al.	2503.07773	null
2025-03-13	Design as Hope: Reimagining Futures for Seemingly Doomed Problems	JaeWon Kim et.al.	2503.07586	null
2025-03-09	A parallel parser for regular expressions	Angelo Borsotti et.al.	2503.06763	null
2025-03-07	Quantum-like cognition and decision making in the light of quantum measurement theory	Miho Fuyama et.al.	2503.05859	null
2025-02-25	Generative Artificial Intelligence: Evolving Technology, Growing Societal Impact, and Opportunities for Information Systems Research	Veda C. Storey et.al.	2503.05770	null
2025-03-10	Speculative Decoding for Multi-Sample Inference	Yiwei Li et.al.	2503.05330	null
2025-03-07	SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding	Kaiyu Huang et.al.	2503.05096	null
2025-02-11	Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations	Kunal Handa et.al.	2503.04761	null
2025-03-19	Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling	Yan Li et.al.	2503.04398	null
2025-03-06	A possible jet and corona configuration for Swift J1727.8–1613 during the hard state	Jing-Qiang Peng et.al.	2503.04044	null
2025-03-05	RASD: Retrieval-Augmented Speculative Decoding	Guofeng Quan et.al.	2503.03434	null
2025-03-26	SpecInF: Exploiting Idle GPU Resources in Distributed DL Training via Speculative Inference Filling	Cunchi Lv et.al.	2503.02550	null
2025-04-02	Linear Representations of Political Perspective Emerge in Large Language Models	Junsol Kim et.al.	2503.02080	link
2025-04-23	EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test	Yuhui Li et.al.	2503.01840	link
2025-03-03	Efficient Long-Term Structural Reliability Estimation with Non-Gaussian Stochastic Models: A Design of Experiments Approach	Sebastian Winter et.al.	2503.01566	null
2025-03-17	MeshPad: Interactive Sketch-Conditioned Artist-Designed Mesh Generation and Editing	Haoxuan Li et.al.	2503.01425	null
2025-03-24	Turbulence in virtual: II. Origin of skewness and dual fraction processes	Xunchuan Liu et.al.	2503.01160	null
2025-03-02	DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting	Kai Lv et.al.	2503.00784	link
2025-03-02	Speculative Ad-hoc Querying	Haoyu Li et.al.	2503.00714	link
2025-03-04	Tutorial Proposal: Speculative Decoding for Efficient LLM Inference	Heming Xia et.al.	2503.00491	null
2025-03-01	Peek into the ‘White-Box’: A Field Study on Bystander Engagement with Urban Robot Uncertainty	Xinyan Yu et.al.	2503.00337	null
2025-03-01	Doraemon’s Gadget Lab: Unpacking Human Needs and Interaction Design in Speculative Technology	Tram Thi Minh Tran et.al.	2503.00257	null
2025-02-28	Broadband pulsed quadrature measurements with calorimeters	Ezad Shojaee et.al.	2503.00188	null
2025-02-28	AMuLeT: Automated Design-Time Testing of Secure Speculation Countermeasures	Bo Fu et.al.	2503.00145	link
2025-02-28	Assessment of universal relations among second-order moments of relativistic stars via reformulated perturbation equations	Koutarou Kyutoku et.al.	2503.00098	null
2025-02-14	A Short History of Rocks: or, How to Invent Quantum Computing	David Wakeham et.al.	2503.00005	null
2025-05-13	Nano Drone-based Indoor Crime Scene Analysis	Martin Cooney et.al.	2502.21019	null
2025-03-04	Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff	Maximilian Holsman et.al.	2502.20704	link
2025-02-28	MonadBFT: Fast, Responsive, Fork-Resistant Streamlined Consensus	Mohammad Mussadiq Jalalzai et.al.	2502.20692	null
2025-03-24	Turbulence in virtual: Origin of the variance and skewness of density function	Xunchuan Liu et.al.	2502.20458	null
2025-02-27	Long-Context Inference with Retrieval-Augmented Speculative Decoding	Guanzheng Chen et.al.	2502.20330	link
2025-04-28	Frobenius subalgebra lattices in tensor categories	Mainak Ghosh et.al.	2502.19876	null
2025-03-04	Speculative Decoding and Beyond: An In-Depth Survey of Techniques	Yunhai Hu et.al.	2502.19732	null
2025-02-26	From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens	Tong Wu et.al.	2502.18890	link
2025-02-26	Reimagining Personal Data: Unlocking the Potential of AI-Generated Images in Personal Data Meaning-Making	Soobin Park et.al.	2502.18853	null
2025-02-26	Towards Optimal Multi-draft Speculative Decoding	Zhengmian Hu et.al.	2502.18779	null
2025-03-02	Variability of Central Stars of Planetary Nebulae with the Zwicky Transient Facility. II. Long-Timescale Variables including Wide Binary and Late Thermal Pulse Candidates	Soumyadeep Bhattacharjee et.al.	2502.18651	null
2025-02-27	Kinematics of metallicity populations in Omega Centauri using Gaia Focused Product Release and Hubble Space Telescope	Nagaraj Vernekar et.al.	2502.17755	null
2025-02-24	Knowledge Distillation with Training Wheels	Guanlin Liu et.al.	2502.17717	null
2025-02-24	THOR: A Non-Speculative Value Dependent Timing Side Channel Attack Exploiting Intel AMX	Farshad Dizani et.al.	2502.17658	null
2025-02-24	LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification	Penghui Yang et.al.	2502.17421	link
2025-02-24	Defects in the $β$-Ga$_2$O$_3$($\bar{2}01$)/HfO$_2$ MOS system and the effect of thermal treatments	Khushabu. S. Agrawal et.al.	2502.17112	null
2025-05-25	CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter	Yepeng Weng et.al.	2502.16880	null
2025-02-24	APINT: A Full-Stack Framework for Acceleration of Privacy-Preserving Inference of Transformers based on Garbled Circuits	Hyunjun Cho et.al.	2502.16877	null
2025-04-03	Towards Reinforcement Learning for Exploration of Speculative Execution Vulnerabilities	Evan Lai et.al.	2502.16756	null
2025-02-22	Fluctuating Lattice, Several Energy Scales	Holger Bech Nielsen et.al.	2502.16369	null
2025-02-21	DReSD: Dense Retrieval for Speculative Decoding	Milan Gritta et.al.	2502.15572	link
2025-02-27	PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System	Yintao He et.al.	2502.15470	null
2025-02-24	Ultra-high-energy $γ$-ray emission associated with the tail of a bow-shock pulsar wind nebula	Zhen Cao et.al.	2502.15447	null
2025-02-21	TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding	Zhaoxuan Wu et.al.	2502.15197	null
2025-02-21	A Critical Examination of the Nested Leaky Box Model for Galactic Cosmic Ray Transport	Benedikt Schroer et.al.	2502.15115	null
2025-03-11	FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling	Weilin Zhao et.al.	2502.14856	null
2025-05-07	Fusion rules and structure constants of E-series minimal models	Rongvoram Nivesvivat et.al.	2502.14295	null
2025-02-19	Which Attention Heads Matter for In-Context Learning?	Kayo Yin et.al.	2502.14010	link
2025-03-17	NVR: Vector Runahead on NPUs for Sparse Memory Access	Hui Wang et.al.	2502.13873	null
2025-02-19	Hierarchical accretion flow from the G351 infrared dark filament to its central cores	H. Beuther et.al.	2502.13866	null
2025-02-19	C2T: A Classifier-Based Tree Construction Method in Speculative Decoding	Feiye Huo et.al.	2502.13652	null
2025-02-19	Near-extremal dumb holes and some aspects of the Hawking effect	Akshat Pandey et.al.	2502.13557	null
2025-02-19	Radio observations of the ultra-long GRB 220627A reveal a hot cocoon supporting the blue supergiant progenitor scenario	James K. Leung et.al.	2502.13435	null
2025-02-18	Inconsistent metallicity spreads in first generation stars of globular clusters from high resolution spectroscopy and HST photometry	Eugenio Carretta et.al.	2502.13206	null
2025-02-17	SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs	Yige Xu et.al.	2502.12134	null
2025-02-16	AI Generations: From AI 1.0 to AI 4.0	Jiahao Wu et.al.	2502.11312	null
2025-02-16	Coherent Spin Pumping Originated from Sub-Terahertz Néel Vector Dynamics in Easy Plane α-Fe2O3/Pt	Gregory Fritjofson et.al.	2502.11281	null
2025-02-16	GRIFFIN: Effective Token Alignment for Faster Speculative Decoding	Shijing Hu et.al.	2502.11018	link
2025-02-05	QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache	Rishabh Tiwari et.al.	2502.10424	null
2025-02-13	Rosette Nebula Outburst Gaia 24djk from the Young Stellar Object V557 Mon	Adolfo S. Carvalho et.al.	2502.09523	null
2025-02-13	$^{18}$F-FDG brain PET hypometabolism in post-SARS-CoV-2 infection: substrate for persistent/delayed disorders?	Eric Guedj et.al.	2502.09077	null
2025-02-13	CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality	Razvan-Gabriel Dumitru et.al.	2502.08923	link
2025-03-19	Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding	Ziyao Wang et.al.	2502.08020	null
2025-04-13	Regular Black Holes in Lovelock gravity with a Degenerate AdS Ground State and their shadows	Milko Estrada et.al.	2502.07992	null
2025-03-06	Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs	Ruichen Zhang et.al.	2502.07942	null
2025-02-05	Entropy Adaptive Decoding: Dynamic Model Switching for Efficient Inference	Toby Simonds et.al.	2502.06833	null
2025-02-10	Persistent spin grids with spin-orbit coupled 2D electron gas	A. V. Poshakinskiy et.al.	2502.06745	null
2025-03-27	LANTERN++: Enhancing Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models	Sihwan Park et.al.	2502.06352	link
2025-02-10	Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE	Haiduo Huang et.al.	2502.06282	link
2025-02-08	Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding	Sukmin Cho et.al.	2502.05609	link
2025-01-31	Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies	Nadav Timor et.al.	2502.05202	null
2025-02-07	Learning Universal Multi-level Market Irrationality Factors to Improve Stock Return Forecasting	Chen Yang et.al.	2502.04737	null
2025-02-06	Speeding up Speculative Decoding via Approximate Verification	Meiyu Zhong et.al.	2502.04557	null
2025-02-06	Gig2Gether: Data-sharing to Empower, Unify and Demystify Gig Work	Jane Hsieh et.al.	2502.04482	null
2025-02-06	The Evolution of Hypervelocity Supernova Survivors and the Outcomes of Interacting Double White Dwarf Binaries	Ken J. Shen et.al.	2502.04451	null
2025-02-06	Properties of the emission region in pulsars with opposite subpulse drift directions in different profile components	H. M. Tedila et.al.	2502.03833	null
2025-02-05	COSMOS-Web: The emergence of the Hubble Sequence	M. Huertas-Company et.al.	2502.03532	null
2025-02-13	FSLH: Flexible Mechanized Speculative Load Hardening	Roberto Blanco et.al.	2502.03203	null
2025-02-05	How probable is the Lyman-$α$ damping wing in the spectrum of the redshift z = 5.9896 quasar ULAS J0148+0600?	Fiona Sawyer et.al.	2502.03085	null
2025-02-05	A comprehensive study of the gas-phase formation network of HC$_5$N: theory, experiments, observations and models	Lisa Giani et.al.	2502.03046	null
2025-04-17	The connection between high-redshift galaxies and Lyman $α$ transmission in the Sherwood-Relics simulations of patchy reionisation	Luke Conaboy et.al.	2502.02983	null
2025-02-05	Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation	Jingyu Liu et.al.	2502.02789	link
2025-02-04	EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization	Yize Wu et.al.	2502.02493	null
2025-02-04	M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference	Nikhil Bhendawade et.al.	2502.02040	null
2025-02-03	Cosmic Ray Feedback in Massive Halos: Implications for the Distribution of Baryons	Eliot Quataert et.al.	2502.01753	null
2025-02-01	Speculative Ensemble: Fast Large Language Model Ensemble via Speculation	Jiale Fu et.al.	2502.01662	link
2025-02-03	Time-dependent solutions of biadjoint scalar field theories	Kymani Armstrong-Williams et.al.	2502.01294	null
2025-02-02	Constructing AI ethics narratives based on real-world data: Human-AI collaboration in data-driven visual storytelling	Mengyi Wei et.al.	2502.00637	null
2025-02-01	Predicting the number density of heavy seed massive black holes due to an intense Lyman-Werner field	Hannah O’Brennan et.al.	2502.00574	null
2025-02-04	Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation	Yang Cao et.al.	2502.00500	null
2025-02-14	Reward-Guided Speculative Decoding for Efficient LLM Reasoning	Baohao Liao et.al.	2501.19324	null
2025-01-31	Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment	Gregor Bachmann et.al.	2501.19309	null
2025-02-19	Emancipatory Information Retrieval	Bhaskar Mitra et.al.	2501.19241	null
2025-01-31	Trading Inference-Time Compute for Adversarial Robustness	Wojciech Zaremba et.al.	2501.18841	null
2025-01-30	Human Re-ID Meets LVLMs: What can we expect?	Kailash Hambarde et.al.	2501.18698	null
2025-01-28	How Hamilton-Jacobi formalism helps to address the physical meaning of the wave function in Bohmian mechanics	Arnaud Amblard et.al.	2501.16989	null
2025-03-04	Distilling Large Language Models for Network Active Queue Management	Deol Satish et.al.	2501.16734	null
2025-01-24	The disrupting and growing open cluster spiral arm patterns of the Milky Way	Xiaochen Liu et.al.	2501.14215	null
2025-01-19	Hallucination Mitigation using Agentic AI Natural Language-Based Frameworks	Diego Gosmar et.al.	2501.13946	link
2025-01-23	Inflaton Self Resonance, Oscillons, and Gravitational Waves in Small Field Polynomial Inflation	Manuel Drees et.al.	2501.13811	null
2025-01-23	Considerations on the Origin of IRAS 19312+1950 Based on Long-Term Maser Observations	Huan-Xue Feng et.al.	2501.13769	null
2025-01-23	Compiler Support for Speculation in Decoupled Access/Execute Architectures	Robert Szafarczyk et.al.	2501.13553	null
2025-02-01	Concentration in Governance Control Across Decentralised Finance Protocols	Thomas Eisermann et.al.	2501.13377	link
2025-01-22	The outer structure of old star clusters in the Small Magellanic Cloud	Andrés E. Piatti et.al.	2501.13062	null
2025-01-22	Entanglement dynamics in collision models and entanglement quilts	Le Hu et.al.	2501.12629	null
2025-01-22	Link in $\mathbb{R}\mathbb{P}^3$ and the Topological Vertex	John Chae et.al.	2501.12566	null
2025-01-21	AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding	Zikun Li et.al.	2501.12162	null
2025-01-20	MIDIS: Quantifying the AGN component of X-ray-detected galaxies	Steven Gillman et.al.	2501.11491	null
2025-01-23	The JWST EXCELS survey: an extremely metal-poor galaxy at $z=8.271$ hosting an unusual population of massive stars	F. Cullen et.al.	2501.11099	null
2025-01-30	Vortices for lake equations (review with questions and speculations)	Jair Koiller et.al.	2501.10433	null
2025-01-17	From strong to weak correlations in breathing-mode kagome van der Waals materials: Nb$_3$(F,Cl,Br,I)$_8$ as a robust and versatile platform for many-body engineering	Joost Aretz et.al.	2501.10320	null
2025-01-16	25 years of XMM-Newton observations of the Sgr A complex: 3D distribution and internal structure of the clouds	G. Stel et.al.	2501.09737	null
2025-01-16	Weak electronic correlations in the cobalt oxychalcogenide superconductor Na2CoSe2O	Zhenchao Wu et.al.	2501.09675	null
2025-02-11	Anatomy of a Digital Bubble: Lessons Learned from the NFT and Metaverse Frenzy	Daisuke Kawai et.al.	2501.09601	null
2025-01-16	A universal break in energy functions of three hyperactive repeating fast radio bursts	Q. Wu et.al.	2501.09248	null
2025-01-15	The emission of interpulses by a 6.45-hour period coherent radio transient	Y. W. J. Lee et.al.	2501.09133	null
2025-01-13	Cassiopeia A’s Reverse Shock and its Effects on the Expanding SN Ejecta	Robert A. Fesen et.al.	2501.07708	null
2025-01-11	Is the Monetary Transmission Mechanism Broken? Time for People’s Quantitative Easing	Sebastian Dragoe et.al.	2501.06575	null
2025-01-27	QPEs as Lense-Thirring precession of super-Eddington flows	M. Middleton et.al.	2501.06185	link
2025-01-10	Analysing the coverage of the University of Bologna’s publication metadata in an existing source of open research information	Erica Andreose et.al.	2501.05821	null
2025-01-09	Accelerated Diffusion Models via Speculative Sampling	Valentin De Bortoli et.al.	2501.05370	null
2025-01-09	The CO-Fuelled Time Machine: Tracing Birth Conditions and Terrestrial Planet Formation Outcomes in HD 163296 through Pebble Drift-induced CO Enhancements	Joe Williams et.al.	2501.05316	null
2025-01-09	Observational Study of the Atmospheric Gravity Waves in the lower Solar Atmosphere	Ravi Chaurasiya et.al.	2501.05042	null
2025-01-07	Transparent Decompilation for Timing Side-Channel Analyses	Santiago Arranz Olmos et.al.	2501.04183	null
2025-01-07	Spin Environment of a Superconducting Qubit in High Magnetic Fields	S. Günzler et.al.	2501.03661	null
2025-01-07	Neural Cellular Automata and Deep Equilibrium Models	Zhibai Jia et.al.	2501.03573	null
2025-01-07	CI at Scale: Lean, Green, and Fast	Dhruva Juloori et.al.	2501.03440	null
2025-01-02	Vertex algebras, topological defects, and Moonshine	Roberto Volpato et.al.	2412.21141	null
2024-12-30	Strategic Learning and Trading in Broker-Mediated Markets	Alif Aqsha et.al.	2412.20847	null
2024-12-28	From Worms to Mice: Homeostasis Maybe All You Need	Jesus Marco de Lucas et.al.	2412.20090	null
2025-01-13	HADES: Hardware Accelerated Decoding for Efficient Speculation in Large Language Models	Ze Yang et.al.	2412.19925	null
2024-12-27	Cosmohedra	Nima Arkani-Hamed et.al.	2412.19881	null
2024-12-27	Paleoinspired Vision: From Exploring Colour Vision Evolution to Inspiring Camera Design	Junjie Zhang et.al.	2412.19439	null
2024-12-25	Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference	Libo Zhang et.al.	2412.18934	null
2024-12-25	AdaEAGLE: Optimizing Speculative Decoding via Explicit Modeling of Adaptive Draft Structures	Situo Zhang et.al.	2412.18910	null
2024-12-23	The Unique Helium Nova V445 Puppis Ejected $\gg$0.001 M$_{\odot}$ in the Year 2000 and Will Not Become a Type Ia Supernova	Bradley E. Schaefer et.al.	2412.17286	null
2024-12-20	Gravitational Observatories in AdS$_4$	Dionysios Anninos et.al.	2412.16305	null
2024-12-20	Two-Part Interplanetary Type II Solar Radio Bursts	Silja Pohjolainen et.al.	2412.15961	null
2025-01-10	Minimizing speculation overhead in a parallel recognizer for regular texts	Angelo Borsotti et.al.	2412.14975	null
2025-01-13	$\mathcal{N}=2$ superconformal gravitino in harmonic superspace	Evgeny Ivanov et.al.	2412.14822	null
2025-02-07	The JWST/NIRSpec view of the nuclear region in the prototypical merging galaxy NGC 6240	Matteo Ceci et.al.	2412.14685	null
2024-12-18	Fermion-Portal Dark Matter at a High-Energy Muon Collider	Pouya Asadi et.al.	2412.14235	null
2024-12-18	Current and secular accretion rates of EX Hydrae	K. Beuermann et.al.	2412.13850	null
2024-12-18	Fool’s gold: ligand-receptor interactions and the origins of life	Betony Adams et.al.	2412.13836	null
2024-12-18	Diffusion models and stochastic quantisation in lattice field theory	Gert Aarts et.al.	2412.13704	null
2024-12-17	Distributed Speculative Execution for Resilient Cloud Applications	Tianyu Li et.al.	2412.13314	null
2024-12-17	Where do X-ray low surface brightness clusters sit with respect to filaments?	S. Zarattini et.al.	2412.13258	null
2024-12-17	Agnosticism About Artificial Consciousness	Tom McClelland et.al.	2412.13145	null
2024-12-17	Insight into the Starburst Nature of Galaxy GN-z11 with JWST MIRI Spectroscopy	J. Álvarez-Márquez et.al.	2412.12826	null
2025-03-18	Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models	Seungeun Oh et.al.	2412.12687	null
2024-12-26	Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree	Xiangxiang Gao et.al.	2412.12639	null
2024-12-15	Heat kernel and local index theorem for open complex manifolds with $\mathbb{C}^{\ast}$-action	Jih-Hsin Cheng et.al.	2412.11037	null
2024-12-14	The JWST-NIRCam View of Sagittarius C. II. Evidence for Magnetically Dominated HII Regions in the CMZ	John Bally et.al.	2412.10983	null
2025-02-23	Interference in Fuzzy Dark Matter Filaments: Idealised Models and Statistics	Tim Zimmermann et.al.	2412.10829	null
2025-02-10	Constrained Decoding with Speculative Lookaheads	Nishanth Nakshatri et.al.	2412.10418	null
2025-01-15	Asymmetric Temperature Variations In Protoplanetary disks: I. Linear Theory, Corotating Spirals, and Ring Formation	Zhaohuan Zhu et.al.	2412.09571	null
2024-12-12	AI Predicts AGI: Leveraging AGI Forecasting and Peer Review to Explore LLMs’ Complex Reasoning Capabilities	Fabrizio Davide et.al.	2412.09385	null
2024-12-11	Can transformative AI shape a new age for our civilization?: Navigating between speculation and reality	Jesus L. Lobo et.al.	2412.08273	null
2024-12-10	Mapping the spatial extent of HI-rich absorbers using MgII absorption along gravitational arcs	Trystyn A. M. Berg et.al.	2412.07652	null
2024-12-26	CoinCLIP: A Multimodal Framework for Assessing Viability in Web3 Memecoins	Hou-Wan Long et.al.	2412.07591	null
2024-12-10	Modeling Speculative Trading Patterns in Token Markets: An Agent-Based Analysis with TokenLab	Mengjue Wang et.al.	2412.07512	null
2024-12-10	KPZ-like scaling on a high-dimensional hypersphere	Daniil Fedotov et.al.	2412.07432	null
2024-12-10	Exploring types I and IIA effective actions through T-duality	Mohammad R. Garousi et.al.	2412.07234	null
2024-12-10	Relativistic Mott transition in strongly correlated artificial graphene	Liguo Ma et.al.	2412.07150	null
2024-12-10	Gravitational focusing and horizon entropy for higher-spin fields	Zihan Yan et.al.	2412.07107	null
2024-12-09	Inelastic H + H$^+_3$ Collision rates and their impact in the determination of the excitation temperature of H$^+_3$	Daniel Felix-Gonzalez et.al.	2412.06697	null
2024-12-09	Systematic comparison of deep generative models applied to multivariate financial time series	Howard Caulfield et.al.	2412.06417	null
2024-12-09	Beyond pip install: Evaluating LLM Agents for the Automated Installation of Python Projects	Louis Milliken et.al.	2412.06294	link
2024-12-06	Revisiting the hallmark freezing and melting points in colloidal dispersions and the search for the elusive coexistence region	J. Galen Wang et.al.	2412.05422	null
2024-12-06	Penetrative rotating magnetoconvection subject to lateral variations in temperature gradients	Tirtharaj Barman et.al.	2412.05235	null
2024-12-06	Predictive Window Decoding for Fault-Tolerant Quantum Programs	Joshua Viszlai et.al.	2412.05115	null
2024-12-04	Successive magnetic transitions in the spin-5/2 easy-axis triangular-lattice antiferromagnet Na$_2$BaMn(PO$_4$)$_2$: A neutron diffraction study	Chuandi Zhang et.al.	2412.03149	null
2025-01-02	The Reality of AI and Biorisk	Aidan Peppin et.al.	2412.01946	null
2024-12-02	PLD+: Accelerating LLM inference by leveraging Language Model Artifacts	Shwetha Somasundaram et.al.	2412.01447	null
2024-12-02	Enhanced solid solution hardening by off-center substitutional solute atoms in α-Ti	Zi-Han Yu et.al.	2412.01298	null
2024-11-25	Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration	Zhuofan Wen et.al.	2412.00061	null
2024-11-12	The Copernican Argument for Alien Consciousness; The Mimicry Argument Against Robot Consciousness	Eric Schwitzgebel et.al.	2412.00008	null
2024-11-28	Night-Side Relativistic Electron Precipitation Bursts in the Outer Radiation Belt: Insights from ELFIN and THEMIS	Xi Lu et.al.	2411.19232	null
2024-11-27	Magnetic field tuned superconducting and normal phase magnetism in CeCo$_{0.5}$Rh$_{0.5}$In$_{5}$	A. Howell et.al.	2411.18540	null
2024-11-27	Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding	Ziyin Zhang et.al.	2411.18462	link
2024-11-27	6G Takes Shape	Jeffrey G. Andrews et.al.	2411.18435	null
2024-11-27	An evolution of matrix-valued orthogonal polynomials	Erik Koelink et.al.	2411.18362	null
2024-11-27	Comprehensive Kernel Safety in the Spectre Era: Mitigations and Performance Evaluation (Extended Version)	Davide Davoli et.al.	2411.18094	null
2024-12-25	Stellar evolution along the AGB as revealed by the shape of Miras’ visual light curves	D. T. Hoai et.al.	2411.18044	null
2024-11-26	Stable curves and chromatic polynomials	Bernhard Reinke et.al.	2411.17551	null
2024-12-08	A revamped understanding of Cosmic Rays and Gamma-Ray Bursts	A. De Rújula et.al.	2411.15850	null
2024-11-20	The Impossible Test: A 2024 Unsolvable Dataset and A Chance for an AGI Quiz	David Noever et.al.	2411.14486	null
2024-12-03	Mediating Modes of Thought: LLM’s for design scripting	Moritz Rietschel et.al.	2411.14485	null
2024-11-21	THz optical response of Ba(Fe$_{1-x}$Ni$_x$)$_2$As$_2$ films analyzed within the three-band Eliashberg s$\pm$-wave model	Yurii A. Aleshchenko et.al.	2411.14011	null
2024-11-27	Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding	Hyun Ryu et.al.	2411.13157	null
2024-11-20	Far-field Boundary Conditions for Airfoil Simulation at High Incidence in Steady, Incompressible, Two-dimensional Flow	Narges Golmirzaee et.al.	2411.13077	null
2024-11-19	Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing	Ruyi Ding et.al.	2411.12508	null
2025-09-30	Continuous Speculative Decoding for Autoregressive Image Generation	Zili Wang et.al.	2411.11925	null
2024-12-26	Teapot: Efficiently Uncovering Spectre Gadgets in COTS Binaries	Fangzheng Lin et.al.	2411.11624	null
2024-11-30	Diversity of disc viscosities can explain the period ratios of resonant and non-resonant systems of hot super-Earths and mini-Neptunes	Bertram Bitsch et.al.	2411.11452	null
2024-11-25	First memoir on the asymptotics of certain infinite products	Wadim Zudilin et.al.	2411.11100	null
2024-11-17	FastDraft: How to Train Your Draft	Ofir Zafrir et.al.	2411.11055	null
2024-12-16	SAM Decoding: Speculative Decoding via Suffix Automaton	Yuxuan Hu et.al.	2411.10666	link
2024-11-15	Moving Forward: A Review of Autonomous Driving Software and Hardware Systems	Xu Wang et.al.	2411.10291	null
2024-11-14	Cosmic inflation in an extended non-commutative foliated quantum gravity: the wave function of the universe	César A. Zen Vasconcellos et.al.	2411.09756	null
2024-11-15	Provocation: Who benefits from “inclusion” in Generative AI?	Samantha Dalal et.al.	2411.09102	null
2024-11-13	Thought Experiments in Design Fiction for Visualization	Swaroop Panda et.al.	2411.08621	null
2025-01-01	A Geometric Substructure for Quantum Dynamics	Anthony John Bracken et.al.	2411.08230	null
2025-01-11	The Grass of the Universe: Rethinking Technosphere, Planetary History, and Sustainability with Fermi Paradox	Lukáš Likavčan et.al.	2411.08057	null
2024-11-12	A rich structure of renormalization group flows for Higgs-like models in 4 dimensions	André LeClair et.al.	2411.07476	null
2024-11-12	Input-Based Ensemble-Learning Method for Dynamic Memory Configuration of Serverless Computing Functions	Siddharth Agarwal et.al.	2411.07444	null
2024-11-11	The Inherent Adversarial Robustness of Analog In-Memory Computing	Corey Lammie et.al.	2411.07023	null
2024-11-10	Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents	Yu Gu et.al.	2411.06559	link
2024-11-10	MOCCA-III: Effects of pristine gas accretion and cluster migration on globular cluster evolution, global parameters and multiple stellar populations	Mirek Giersz et.al.	2411.06421	null
2024-11-10	Generating Mixcode Popular Songs with Artificial Intelligence: Concepts, Plans, and Speculations	Abhishek Kaushik et.al.	2411.06420	null
2024-11-08	SSSD: Simply-Scalable Speculative Decoding	Michele Marzollo et.al.	2411.05894	null
2024-11-08	SpecHub: Provable Acceleration to Multi-Draft Speculative Decoding	Ryan Sun et.al.	2411.05289	link
2024-11-07	SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference	Gabriele Oliaro et.al.	2411.04975	null
2024-11-06	The N-Grammys: Accelerating Autoregressive Inference with Learning-Free Batched Speculation	Lawrence Stewart et.al.	2411.03786	null
2024-11-05	Remarkable Scale Relation, Approximate SU(5), Fluctuating Lattice	Holger Bech Nielsen et.al.	2411.03552	null
2024-11-05	Shared Memory-Aware Latency-Sensitive Message Aggregation for Fine-Grained Communication	Kavitha Chandrasekar et.al.	2411.03533	null
2024-11-07	A high resolution simulation of protoplanetary disk turbulence driven by the vertical shear instability	Karim Shariff et.al.	2411.03467	null
2024-11-04	PipeLLM: Fast and Confidential Large Language Model Services with Speculative Pipelined Encryption	Yifan Tan et.al.	2411.03357	null
2024-11-05	On the possible core shift break in relativistic jets	E. E. Nokhrina et.al.	2411.02925	null
2024-11-04	A proof of self-organized criticality in a sandpile	Christopher Hoffman et.al.	2411.02541	null
2025-02-07	Pseudo Transitions in the Finite-Size Blume-Capel Model	Lei Shi et.al.	2411.01743	null
2024-11-05	Privacy Risks of Speculative Decoding in Large Language Models	Jiankun Wei et.al.	2411.01076	null
2024-10-30	Accelerated AI Inference via Dynamic Execution Methods	Haim Barad et.al.	2411.00853	null
2024-11-05	A Theoretical Perspective for Speculative Decoding Algorithm	Ming Yin et.al.	2411.00841	null
2024-10-31	Interpretable Language Modeling via Induction-head Ngram Models	Eunji Kim et.al.	2411.00066	link
2024-10-31	ALISE: Accelerating Large Language Model Serving with Speculative Scheduling	Youpeng Zhao et.al.	2410.23537	null
2024-10-30	Flavor Patterns of Fundamental Particles from Quantum Entanglement?	Jesse Thaler et.al.	2410.23343	null
2024-10-29	Lost and Found in Speculation: Hybrid Speculative Vulnerability Detection	Mohamadreza Rostami et.al.	2410.22555	null
2025-02-10	Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding	Bohan Li et.al.	2410.21951	null
2024-10-29	Rapid cooling of the Cassiopeia A neutron star due to superfluid quantum criticality	Hao-Fu Zhu et.al.	2410.21945	null
2024-10-28	Model-agnostic basis functions for the 2-point correlation function of dark matter in linear theory	Aseem Paranjape et.al.	2410.21374	link
2024-10-11	The Social Impact of Generative LLM-Based AI	Yu Xie et.al.	2410.21281	null
2024-10-28	On the limits of informationally efficient stock markets: New insights from a chartist-fundamentalist model	Laura Gardini et.al.	2410.21198	null
2024-10-27	A Jet-Induced Shock in a Young, Powerful Radio Galaxy at z=3.00	Nick Seymour et.al.	2410.20609	null
2024-10-27	FIRP: Faster LLM inference via future intermediate representation prediction	Pengfei Wu et.al.	2410.20488	null
2024-10-27	Inevitable Trade-off between Watermark Strength and Speculative Sampling Efficiency for Language Models	Zhengmian Hu et.al.	2410.20418	null
2024-10-31	Fast Best-of-N Decoding via Speculative Rejection	Hanshi Sun et.al.	2410.20290	link
2024-10-24	Intention Is All You Need	Advait Sarkar et.al.	2410.18851	null
2024-10-24	AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability	Sudhanshu Agrawal et.al.	2410.18351	null
2024-10-23	Multi-Draft Speculative Sampling: Canonical Architectures and Theoretical Limits	Ashish Khisti et.al.	2410.18234	null
2025-02-10	Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition	Artem Basharin et.al.	2410.17765	null
2024-10-22	AMUSD: Asynchronous Multi-Device Speculative Decoding for LLM Acceleration	Bradley McDanel et.al.	2410.17375	link
2024-10-22	Remote Timing Attacks on Efficient Language Model Inference	Nicholas Carlini et.al.	2410.17175	null
2024-10-23	Quantum many-body scars as remnants of stable many-body periodic orbits	Keita Omiya et.al.	2410.16916	null
2024-10-22	Chiral polaritonics: cavity-mediated enantioselective excitation condensation	Rosario R. Riso et.al.	2410.16861	null
2024-10-22	An Extreme Radio Fluctuation of Pulsar B1929$+$10	Zhengli Wang et.al.	2410.16816	null
2024-10-21	Galaxy Size and Mass Build-up in the First 2 Gyrs of Cosmic History from Multi-Wavelength JWST NIRCam Imaging	Natalie Allen et.al.	2410.16354	null
2024-10-30	TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling	Jiahao Qiu et.al.	2410.16033	null
2024-10-21	Efficient and Universally Accessible Cross-Chain Options without Upfront Holder Collateral	Zifan Peng et.al.	2410.15724	null
2024-10-21	Investigating Unusual H$α$ Features towards the Scutum Supershell	R. Alsulami et.al.	2410.15712	null
2024-10-17	Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding	Tan Dat Nguyen et.al.	2410.13839	null
2024-10-17	Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions	Michael J. Q. Zhang et.al.	2410.13788	null
2024-10-17	Looking Inward: Language Models Can Learn About Themselves by Introspection	Felix J Binder et.al.	2410.13787	link
2024-10-17	PGC 44685: A Dwarf Star-forming Lenticular Galaxy with Wolf-Rayet Population	Shiying Lu et.al.	2410.13119	null
2024-10-16	Gravitational instantons and the quality problem of the QCD axion: Facts, speculations, and statements in between	Pier Giuseppe Catinari et.al.	2410.12741	null
2024-10-15	Evolution of Ferromagnetism and Electrical Resistivity in Sb-Doped Cr4PtGa17	Chaoguo Wang et.al.	2410.12078	null
2024-10-15	MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation	Chenxi Wang et.al.	2410.11779	link
2024-10-15	DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure	Yunfan Xiong et.al.	2410.11744	null
2024-10-15	Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling	Wenda Xu et.al.	2410.11325	null
2025-02-01	QSpec: Speculative Decoding with Complementary Quantization Schemes	Juntao Zhao et.al.	2410.11305	null
2024-11-20	Unveiling dust, molecular gas, and high star formation efficiency in extremely UV bright star-forming galaxies at $z\sim 2.1-3.6$	M. Dessauges-Zavadsky et.al.	2410.11121	null
2024-10-01	Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models	Keivan Alizadeh et.al.	2410.10846	null
2024-10-15	The Discovery of Polarized Water Vapor Megamaser Emission in a Molecular Accretion Disk	Jack F. Gallimore et.al.	2410.10569	null
2024-10-14	Temperature-Centric Investigation of Speculative Decoding with Knowledge Distillation	Siru Ouyang et.al.	2410.10141	null
2024-11-12	Self-Data Distillation for Recovering Quality in Pruned Large Language Models	Vithursan Thangarasa et.al.	2410.09982	null
2024-10-13	Super-Bandgap Electroluminescence from Cesium Lead Bromide	Justin Sculley et.al.	2410.09702	null
2024-10-21	On Two Nucleons Near Unitarity with Perturbative Pions	Yu Ping Teng et.al.	2410.09653	null
2024-10-11	Compact [OIII] emission-line regions (“Green Seeds”) in $\mathrm{Hα}$ emitters at Cosmic Noon from JWST Observations	Nuo Chen et.al.	2410.08520	null
2024-10-09	SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration	Heming Xia et.al.	2410.06916	link
2025-02-06	Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level	Xinyi Zeng et.al.	2410.06809	null
2024-10-08	ParallelSpec: Parallel Drafter for Efficient Speculative Decoding	Zilin Xiao et.al.	2410.05589	null
2024-10-09	Density estimation with LLMs: a geometric investigation of in-context learning trajectories	Toni J. B. Liu et.al.	2410.05218	null
2024-10-08	Efficient Inference for Large Language Model-based Generative Recommendation	Xinyu Lin et.al.	2410.05165	null
2024-10-04	Density functional theory based investigation of heavy fermion band candidates in triplet superconductor UTe2	Shouzheng Liu et.al.	2410.03840	null
2024-10-04	Mixture of Attentions For Speculative Decoding	Matthieu Zimmer et.al.	2410.03804	null
2024-10-03	AI-rays: Exploring Bias in the Gaze of AI Through a Multimodal Interactive Installation	Ziyao Gao et.al.	2410.03786	null
2024-09-24	Nonmetric geometric flows and quasicrystalline topological phases for dark energy and dark matter in $f(Q)$ cosmology	L. Bubuianu et.al.	2410.03700	null
2025-01-31	LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding	Doohyuk Jang et.al.	2410.03355	null
2024-10-04	Generative Edge Detection with Stable Diffusion	Caixia Zhou et.al.	2410.03080	null
2024-10-03	Inductive Generative Recommendation via Retrieval-based Speculation	Yijie Ding et.al.	2410.02939	link
2024-10-03	The Stellar Initial Mass Function of Early Dark Matter-free Gas Objects	William Lake et.al.	2410.02868	null
2024-10-03	Atoms near a conducting wedge: decay rates and entanglement around a corner	Romuald Kilianski et.al.	2410.02349	null
2024-10-02	Time Variation of the Solar Tachocline	Sarbani Basu et.al.	2410.01895	null
2024-12-25	Interpretable Contrastive Monte Carlo Tree Search Reasoning	Zitian Gao et.al.	2410.01707	link
2024-10-02	Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding	Yao Teng et.al.	2410.01699	link
2024-12-09	Forte: Finding Outliers with Representation Typicality Estimation	Debargha Ganguly et.al.	2410.01322	link
2024-10-02	Speculative Coreset Selection for Task-Specific Fine-tuning	Xiaoyu Zhang et.al.	2410.01296	null
2024-10-01	Draft on the Fly: Adaptive Self-Speculative Decoding using Cosine Similarity	Michael R. Metel et.al.	2410.01028	null
2024-10-01	A Scheduling-Aware Defense Against Prefetching-Based Side-Channel Attacks	Till Schlüter et.al.	2410.00452	null
2024-11-12	Galactic center G objects as dust-enshrouded stars near the supermassive black hole	Michal Zajaček et.al.	2410.00304	null
2024-09-30	Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface	Wenyue Hua et.al.	2410.00079	null
2024-09-30	Statistical view of orbital circularisation with 14 000 characterised TESS eclipsing binaries	L. W. IJspeert et.al.	2409.20540	null
2024-09-30	New HI observations Toward the NGC 5055 Galaxy Group with FAST	Xiao-Lan Liu et.al.	2409.20109	null
2024-09-27	Thermal Conductivity of Cubic Silicon Carbide Single Crystals Heavily Doped by Nitrogen	Zifeng Huang et.al.	2409.18843	null
2024-09-27	SpecCFA: Enhancing Control Flow Attestation/Auditing via Application-Aware Sub-Path Speculation	Adam Caulfield et.al.	2409.18403	null
2025-03-17	Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference	Zongyue Qin et.al.	2409.16560	null
2024-09-22	ALMASOP. The Localized and Chemically rich Features near the Bases of the Protostellar Jet in HOPS 87	Shih-Ying Hsu et.al.	2409.14445	null
2024-09-21	Triangulating on Possible Futures: Conducting User Studies on Several Futures Instead of Only One	Antti Salovaara et.al.	2409.14137	null
2024-09-29	String Invention, Viable 3-3-1 Model, Dark Matter Black Holes	Holger B. Nielsen et.al.	2409.13776	null
2024-09-20	Interstellar Glycolaldehyde, Methyl Formate, and Acetic Acid. II. Chemical Modeling of the Bimodal Abundance Pattern in NGC 6334I	Brielle M. Shope et.al.	2409.13673	null
2024-09-20	A Comparison between Financial and Gambling Markets	Haoyu Liu et.al.	2409.13528	null
2024-12-12	Consequences of Minimal Entanglement in Bosonic Field Theories	Spencer Chang et.al.	2409.13030	null
2024-09-17	UNCOVER: Significant Reddening in Cosmic Noon Quiescent Galaxies	Jared Siegel et.al.	2409.11457	null
2024-09-17	The ALMA-CRISTAL Survey: Spatially-resolved Star Formation Activity and Dust Content in 4 < z < 6 Star-forming Galaxies	Juno Li et.al.	2409.10961	null
2024-12-14	Improving Multi-candidate Speculative Decoding	Xiaofan Lu et.al.	2409.10644	link
2024-09-16	Aggregation-diffusion in heterogeneous environments	Jonathan R. Potts et.al.	2409.10147	link
2024-12-12	Pure Lovelock Gravity regular black holes	Milko Estrada et.al.	2409.09559	null
2024-09-14	Ground State Phase Diagram of $\text{SU}(3)$ $t$-$J$ Chain	Junhao Zhang et.al.	2409.09344	null
2024-12-02	Two-Time Relativistic Bohmian Model of Quantum Mechanics	Giuseppe Raguní et.al.	2409.09049	null
2024-09-13	Dynamic Simultaneous Multithreaded Architecture	Daniel Ortiz-Arroyo et.al.	2409.07903	null
2024-09-09	DNA sequence alignment: An assignment for OpenMP, MPI, and CUDA/OpenCL	Arturo Gonzalez-Escribano et.al.	2409.06075	null
2024-10-05	Predicting Foreign Exchange EUR/USD direction using machine learning	Kevin Cedric Guyard et.al.	2409.04471	null
2024-09-05	Evidence for Dust Depletion in a Misaligned Protoplanetary Disk with JWST	C. C. Espaillat et.al.	2409.03702	null
2024-09-04	Cavitating bubbles in condensing gas as a means of forming clumps, chondrites, and planetesimals	Eugene Chiang et.al.	2409.02978	null
2024-09-03	Light-Ray Wave Functions and Integrability	Alexandre Homrich et.al.	2409.02160	null
2024-09-03	Foreactor: Exploiting Storage I/O Parallelism with Explicit Speculation	Guanzhou Hu et.al.	2409.01580	null
2024-09-02	A Comprehensive Analysis of the Future of Atomically Precise Manufacturing	Vadym Shvydun et.al.	2409.00955	null
2024-08-30	Dynamic Depth Decoding: Faster Speculative Decoding for LLMs	Oscar Brown et.al.	2409.00142	null
2024-08-29	LightSLH: Provable and Low-Overhead Spectre v1 Mitigation through Targeted Instruction Hardening	Yiming Zhu et.al.	2408.16220	null
2024-08-28	An Empirical Study of API Misuses of Data-Centric Libraries	Akalanka Galappaththi et.al.	2408.15853	null
2024-08-28	Indirect nonlinear interaction between toroidal Alfvén eigenmode and ion temperature gradient mode mediated by zonal structures	Qian Fang et.al.	2408.15782	null
2025-02-27	Learning Harmonized Representations for Speculative Sampling	Lefan Zhang et.al.	2408.15766	null
2024-08-29	Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation	Lujun Gui et.al.	2408.15562	null
2024-11-18	The companion mass distribution of post common envelope hot subdwarf binaries: evidence for boosted and disrupted magnetic braking?	Lisa Blomberg et.al.	2408.15334	null
2024-08-27	The Way To Circumbinary Planets	Hans J Deeg et.al.	2408.15307	null
2024-12-26	The Mamba in the Llama: Distilling and Accelerating Hybrid Models	Junxiong Wang et.al.	2408.15237	link
2024-08-26	SO as shock tracer in protoplanetary disks: the AB Aurigae case	A. Dutrey et.al.	2408.14276	null
2024-08-25	The origins of noise in the Zeeman splitting of spin qubits in natural-silicon devices	Juan S. Rojas-Arias et.al.	2408.13707	null
2024-07-22	Simopt – Simulation pass for Speculative Optimisation of FPGA-CAD flow	Eashan Wadhwa et.al.	2408.12676	null
2024-12-19	Exposing Shadow Branches	Chrysanthos Pepi et.al.	2408.12592	null
2024-08-22	Enhancing Causal Discovery in Financial Networks with Piecewise Quantile Regression	Cameron Cornell et.al.	2408.12210	null
2024-08-21	Electrostatic Origins of the Dirichlet Principle	Steven Deckelman et.al.	2408.12002	null
2024-09-04	Parallel Speculative Decoding with Adaptive Draft Length	Tianyu Liu et.al.	2408.11850	link
2024-08-21	Chemical models of interstellar glycine and adenine precursor aminoacetonitrile (NH2CH2CN)	Xia Zhang et.al.	2408.11776	null
2024-08-20	High detection significance of the dark substructure in gravitational lens SDSSJ0946+1006 is revealed by image pixel supersampling	Quinn E. Minor et.al.	2408.11090	null
2024-08-23	MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding	Jian Chen et.al.	2408.11049	link
2024-08-20	Revisiting the measurements and interpretations of DLVO forces	Bo Feng et.al.	2408.10870	null
2024-08-19	Constraining the Generalized Tolman-Oppenheimer-Volkoff (GTOV) equation with Bayesian analysis	Franciele M. da Silva et.al.	2408.10425	null
2024-08-18	A new measure of risk using Fourier analysis	Michael Grabinski et.al.	2408.10279	null
2024-08-19	Excitonic-trion population in two-dimensional halide perovskites	Efstratios Manousakis et.al.	2408.10097	null
2024-08-16	Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling	Xianzhen Luo et.al.	2408.08696	null
2024-08-15	KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning	Kaiqi Zhang et.al.	2408.08146	null
2024-08-19	Coupling without Communication and Drafter-Invariant Speculative Decoding	Majid Daliri et.al.	2408.07978	link
2024-12-06	The Small Sizes and High Implied Densities of ‘Little Red Dots’ with Balmer Breaks Could Explain Their Broad Emission Lines Without an AGN	Josephine F. W. Baggen et.al.	2408.07745	null
2024-08-14	Only One Relation Possible? Modeling the Ambiguity in Event Temporal Relation Extraction	Yutong Hu et.al.	2408.07353	null
2024-07-23	Stablecoin Runs and Disclosure Policy in the Presence of Large Sales	Brian Zhu et.al.	2408.07227	null
2024-08-13	Speculations on Uncertainty and Humane Algorithms	Nicholas Gray et.al.	2408.06736	null
2024-08-15	Inefficiencies of Carbon Trading Markets	Nicola Borri et.al.	2408.06497	null
2024-08-12	Correct Wrong Path	Bhargav Reddy Godala et.al.	2408.05912	null
2024-08-11	A Decoding Acceleration Framework for Industrial Deployable LLM-based Recommender Systems	Yunjia Xi et.al.	2408.05676	link
2024-08-16	Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion	Jacob K Christopher et.al.	2408.05636	null
2024-08-09	Recurrent Stochastic Fluctuations with Financial Speculation	Tomohiro Hirano et.al.	2408.05047	null
2024-08-08	HotStuff-1: Linear Consensus with One-Phase Speculation	Dakai Kang et.al.	2408.04728	null
2024-08-08	CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding	Sophia Ho et.al.	2408.04678	null
2024-08-08	Black hole mass and optical radiation mechanism of the tidal disruption event AT 2023clx	Shiyan Zhong et.al.	2408.04448	null
2024-08-05	Rich dynamical behaviors from a digital reversal operation	Yannis Almirantis et.al.	2408.02527	null
2024-08-08	A speculative model for cyclic information preservation in Kerr-Newman spacetime using closed timelike curves	Aviral Damle et.al.	2408.02116	null
2024-08-06	Selection bias obfuscates the discovery of fast radio burst sources	Mohit Bhardwaj et.al.	2408.01876	null
2024-08-03	Dissolution zone model of the oxide structure in additively manufactured dispersion-strengthened alloys	Wenyuan Hou et.al.	2408.01845	null
2024-08-02	AT2023vto: An Exceptionally Luminous Helium Tidal Disruption Event from a Massive Star	Harsh Kumar et.al.	2408.01482	null
2024-08-01	Granting GPT-4 License and Opportunity: Enhancing Accuracy and Confidence Estimation for Few-Shot Event Detection	Steven Fincke et.al.	2408.00914	null
2024-08-01	Clover-2: Accurate Inference for Regressive Lightweight Speculative Decoding	Bin Xiao et.al.	2408.00264	null
2024-07-31	Designing Beyond Current Conceptualizations of Spaceflight Experiences	James Cole et.al.	2408.00085	null
2024-07-31	Revisiting the fundamental metallicity relation with observation and simulation	Chengyu Ma et.al.	2407.21716	null
2024-07-31	The Bulk Densities of Small Solar System Bodies as a Probe of Planetesimal Formation	Misako Tatsuuma et.al.	2407.21386	null
2024-08-19	Instantons and the Large N=4 Algebra	Edward Witten et.al.	2407.20964	null
2024-07-17	Steamroller Problems: An Evaluation of LLM Reasoning Capability with Automated Theorem Prover Strategies	Lachlan McGinness et.al.	2407.20244	null
2024-08-19	Reduced decay in Josephson coupling across ferromagnetic junctions with spin-orbit coupling layers	Ivan Kindiak et.al.	2407.19799	null
2024-07-26	Ionized and cold gas components in low surface brightness galaxy AGC 102004	Tian-Wen Cao et.al.	2407.18530	null
2024-07-25	Phase transitions in (2 + 1)D subsystem-symmetric monitored quantum circuits	Cole Kelson-Packer et.al.	2407.18340	null
2024-08-31	Uniqueness of an $E_8$ model of elementary particles	Robert A. Wilson et.al.	2407.18279	null
2024-07-24	Automorphisms of Calabi-Yau threefolds from algebraic dynamics and the second Chern class	Keiji Oguiso et.al.	2407.17297	null
2024-07-24	Mapping the individual, social, and biospheric impacts of Foundation Models	Andrés Domínguez Hernández et.al.	2407.17129	null
2024-07-04	Integrated Deflector Shield Technology for Spacecraft	Florian Neukart et.al.	2407.16701	null
2024-07-23	Graph-Structured Speculative Decoding	Zhuocheng Gong et.al.	2407.16207	null
2024-07-22	AI for Handball: predicting and explaining the 2024 Olympic Games tournament with Deep Learning and Large Language Models	Florian Felice et.al.	2407.15987	null
2024-07-22	An Ad-hoc graph node vector embedding algorithm for general knowledge graphs using Kinetica-Graph	B. Kaan Karamete et.al.	2407.15906	null
2024-07-23	Unveiling the Multifaceted GRB 200613A: Prompt Emission Dynamics, Afterglow Evolution, and the Host Galaxy’s Properties	Shao-Yu Fu et.al.	2407.15824	null
2024-11-21	SNIP: Speculative Execution and Non-Interference Preservation for Compiler Transformations	Sören van der Wall et.al.	2407.15080	null
2024-10-21	Is the difference between deep hedging and delta hedging a statistical arbitrage?	Pascal François et.al.	2407.14736	link
2024-07-19	Rational Bubbles: A Clarification	Tomohiro Hirano et.al.	2407.14017	null
2024-07-18	Surface roughening in nanoparticle catalysts	Cameron J. Owen et.al.	2407.13643	null
2024-07-18	SecScale: A Scalable and Secure Trusted Execution Environment for Servers	Ani Sunny et.al.	2407.13572	null
2024-07-17	RTL Verification for Secure Speculation Using Contract Shadow Logic	Qinhan Tan et.al.	2407.12232	null
2024-07-16	Breakup dynamics of a neutron-halo projectile on heavy target at deep sub-barrier energies	B. Mukeru et.al.	2407.12129	null
2024-11-16	PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation	Branden Butler et.al.	2407.11798	null
2024-10-02	Optimized Multi-Token Joint Decoding with Auxiliary Model for LLM Inference	Zongyue Qin et.al.	2407.09722	null
2024-07-17	Accelerating the inference of string generation-based chemical reaction models for industrial applications	Mikhail Andronov et.al.	2407.09685	null
2024-09-12	Krylov complexity and chaos in deformed SYK models	Shira Chapman et.al.	2407.09604	null
2024-07-21	6G: The Intelligent Network of Everything – A Comprehensive Vision, Survey, and Tutorial	Harri Pennanen et.al.	2407.09398	null
2024-07-11	Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting	Zilong Wang et.al.	2407.08223	null
2024-07-10	Purity benchmarking study of error coherence in a single Xmon qubit	Auda Zhu et.al.	2407.07960	null
2024-07-10	Carbon Pricing and Resale in Emission Trading Systems	Peyman Khezr et.al.	2407.07386	null
2024-08-21	Fuzzy Spheres in Stringy Matrix Models: Quantifying Chaos in a Mixed Phase Space	Paolo Amore et.al.	2407.07259	null
2024-07-09	Revolutionizing Battery Disassembly: The Design and Implementation of a Battery Disassembly Autonomous Mobile Manipulator Robot (BEAM-1)	Yanlong Peng et.al.	2407.06590	null
2024-07-05	Statistical investigations into the geometry and homology of random programs	Jon Sporring et.al.	2407.04854	null
2024-07-05	Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models	Bolaji Yusuf et.al.	2407.04641	null
2024-11-13	Black Holes with a charged quantum dust core	R. Casadio et.al.	2407.04146	null
2024-08-23	A distance conjecture beyond moduli?	Cédric Debusschere et.al.	2407.03715	null
2024-07-03	Braneworld Black Bounce to Transversable Wormhole Analytically Connected to an asymptotically $AdS_5$ Boundary	T. M. Crispim et.al.	2407.03528	null
2024-07-03	Origin of anomalous magnetotransport in kagome superconductors AV$_3$Sb$_5$ (A=K,Rb,Cs)	A. E. Koshelev et.al.	2407.03189	null
2024-09-24	Large-scale ordered magnetic fields generated in mergers of helium white dwarfs	Rüdiger Pakmor et.al.	2407.02566	null
2024-07-02	A thermodynamic model of inflation without inflaton field	Jesus Anaya-Galeana et.al.	2407.02429	null
2024-07-02	MICONIC: JWST/MIRI MRS observations of the nuclear and circumnuclear regions of Mrk231	A. Alonso-Herrero et.al.	2407.02180	null
2024-07-02	S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models	Parsa Kavehzadeh et.al.	2407.01955	null
2024-08-31	Description of molecular chirality and its analysis with high harmonic generation	Akihito Kato et.al.	2407.01947	null
2024-07-01	Universal properties of residual moments in heavy-fermion metals	Ewan Scott et.al.	2407.01218	null
2024-07-01	Staying vigilant in the Age of AI: From content generation to content authentication	Yufan Li et.al.	2407.00922	null
2025-04-14	Block Verification Accelerates Speculative Decoding	Ziteng Sun et.al.	2403.10444	null
2024-03-06	Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement	Wonseok Jeon et.al.	2402.14160	null
2025-07-08	Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding	Zhuoming Chen et.al.	2402.12374	null
2025-02-06	Decoding Speculative Decoding	Minghao Yan et.al.	2402.01528	null
2024-04-10	Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO	Haim Barad et.al.	2311.04951	null
2023-08-10	Accelerating LLM Inference with Staged Speculative Decoding	Benjamin Spector et.al.	2308.04623	null
2023-05-22	Fast Inference from Transformers via Speculative Decoding	Yaniv Leviathan et.al.	2211.17192	null
2023-10-31	Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation	Heming Xia et.al.	2203.16487	null

Multimodal System

Publish Date	Title	Authors	PDF	Code
2026-05-21	Action with Visual Primitives	Weilong Guo et.al.	2605.22183	null
2026-05-19	Rotation-Aligned Key Channel Pruning for Efficient Vision-Language Model Inference	Beomseok Kang et.al.	2605.19218	null
2026-05-14	KVCapsule: Efficient Sequential KV Cache Compression for Vision-Language Models with Asymmetric Redundancy	Yingbing Huang et.al.	2605.16439	null
2026-04-03	Bridging the Missing-Modality Gap: Improving Text-Only Calibration of Vision Language Models	Mingyeong Kim et.al.	2605.12517	null
2026-05-12	RealDiffusion: Physics-informed Attention for Multi-character Storybook Generation	Qi Zhao et.al.	2605.11927	null
2026-05-09	Geometry Guided Self-Consistency for Physical AI	Yinwei Dai et.al.	2605.08638	null
2026-05-04	AlbumFill: Album-Guided Reasoning and Retrieval for Personalized Image Completion	Yu-Ju Tsai et.al.	2605.02892	null
2026-05-04	WindowQuant: Mixed-Precision KV Cache Quantization based on Window-Level Similarity for VLMs Inference Optimization	Wei Tao et.al.	2605.02262	null
2026-05-04	CoVSpec: Efficient Device-Edge Co-Inference for Vision-Language Models via Speculative Decoding	Yuanyuan Jia et.al.	2605.02218	null
2026-04-29	Progressive Semantic Communication for Efficient Edge-Cloud Vision-Language Models	Cyril Shih-Huan Hsu et.al.	2604.26508	null
2026-04-29	Efficient, VRAM-Constrained xLM Inference on Clients	Aditya Ukarande et.al.	2604.26334	null
2026-04-29	MetaSR: Content-Adaptive Metadata Orchestration for Generative Super-Resolution	Jiaqi Guo et.al.	2604.26244	null
2026-04-14	See No Evil: Semantic Context-Aware Privacy Risk Detection for AR	Jialu Liu et.al.	2604.22805	null
2026-04-25	Training-inference input alignment outweighs framework choice in longitudinal retinal image prediction	Liyin Chen et.al.	2604.16955	null
2026-04-16	A Semantic Geometry for Uncovering Paradigm Dynamics via Scientific Publications	Jinchang Liu et.al.	2604.15150	null
2026-04-15	Thermodynamic Diffusion Inference with Minimal Digital Conditioning	Aditi De et.al.	2604.14332	null
2026-04-14	DreamStereo: Towards Real-Time Stereo Inpainting for HD Videos	Yuan Huang et.al.	2604.12270	null
2026-04-11	Mosaic: Cross-Modal Clustering for Efficient Video Understanding	Tuowei Wang et.al.	2604.10060	null
2026-04-09	CodecSight: Leveraging Video Codec Signals for Efficient Streaming VLM Inference	Yulin Zou et.al.	2604.06036	null
2026-04-06	ClickAIXR: On-Device Multimodal Vision-Language Interaction with Real-World Objects in Extended Reality	Dawar Khan et.al.	2604.04905	null
2026-04-08	GENSERVE: Efficient Co-Serving of Heterogeneous Diffusion Model Workloads	Fanjiang Ye et.al.	2604.04335	null
2026-03-31	GUIDE: Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play Annotation	Rui Xie et.al.	2603.26266	null
2026-03-26	DRiffusion: Draft-and-Refine Process Parallelizes Diffusion Models with Ease	Runsheng Bai et.al.	2603.25872	null
2026-04-01	DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving	Pengxuan Yang et.al.	2603.24587	null
2026-04-01	SCoOP: Semantic Consistent Opinion Pooling for Uncertainty Quantification in Multiple Vision-Language Model Systems	Chung-En Johnny Yu et.al.	2603.23853	null
2026-03-19	6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models	Rundong Su et.al.	2603.18742	null
2026-03-18	DriveVLM-RL: Neuroscience-Inspired Reinforcement Learning with Vision-Language Models for Safe and Deployable Autonomous Driving	Zilin Huang et.al.	2603.18315	null
2026-03-13	Draft-and-Target Sampling for Video Generation Policy	Qikang Zhang et.al.	2603.13438	null
2026-02-20	Think First, Diffuse Fast: Improving Diffusion Language Model Reasoning via Autoregressive Plan Conditioning	Earl J St Sauver et.al.	2603.13243	null
2026-03-11	COHORT: Hybrid RL for Collaborative Large DNN Inference on Multi-Robot Systems Under Real-Time Constraints	Mohammad Saeid Anwar et.al.	2603.10436	null
2026-03-09	SoundWeaver: Semantic Warm-Starting for Text-to-Audio Diffusion Serving	Ayush Barik et.al.	2603.07865	null
2026-03-08	MWM: Mobile World Models for Action-Conditioned Consistent Prediction	Han Yan et.al.	2603.07799	null
2026-02-27	SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching	Yasaman Haghighi et.al.	2602.24208	null
2026-02-26	LE-NeuS: Latency-Efficient Neuro-Symbolic Video Understanding via Adaptive Temporal Verification	Shawn Liang et.al.	2602.23553	null
2026-02-17	Sparrow: Text-Anchored Window Attention with Visual-Semantic Glimpsing for Speculative Decoding in Video LLMs	Libo Zhang et.al.	2602.15318	null
2026-02-13	AdaCorrection: Adaptive Offset Cache Correction for Accurate Diffusion Transformers	Dong Liu et.al.	2602.13357	null
2026-02-11	FastUSP: A Multi-Level Collaborative Acceleration Framework for Distributed Diffusion Model Inference	Guandong Li et.al.	2602.10940	null
2026-02-24	Mapping Gemma3 onto an Edge Dataflow Architecture	Shouyu Du et.al.	2602.06063	null
2026-02-04	Annotation Free Spacecraft Detection and Segmentation using Vision Language Models	Samet Hicsonmez et.al.	2602.04699	null
2026-02-05	PIO-FVLM: Rethinking Training-Free Visual Token Reduction for VLM Acceleration from an Inference-Objective Perspective	Haokui Zhang et.al.	2602.04657	null
2026-02-03	ScDiVa: Masked Discrete Diffusion for Joint Modeling of Single-Cell Identity and Expression	Mingxuan Wang et.al.	2602.03477	null
2026-02-03	SwiftVLM: Efficient Vision-Language Model Inference via Cross-Layer Token Bypass	Chen Qian et.al.	2602.03134	null
2026-01-31	APEX: A Decoupled Memory-based Explorer for Asynchronous Aerial Object Goal Navigation	Daoxuan Zhang et.al.	2602.00551	null
2026-01-20	Likelihood-Separable Diffusion Inference for Multi-Image MRI Super-Resolution	Samuel W. Remedios et.al.	2601.14030	null
2026-01-19	AirHunt: Bridging VLM Semantics and Continuous Planning for Efficient Aerial Object Navigation	Xuecheng Chen et.al.	2601.12742	null
2026-01-26	ViSIL: Unified Evaluation of Information Loss in Multimodal Video Captioning	Po-han Li et.al.	2601.09851	null
2025-12-30	Bridging the Perception-Cognition Gap: Re-engineering SAM2 with Hilbert-Mamba for Robust VLM-based Medical Diagnosis	Hao Wu et.al.	2512.24013	null
2025-12-29	Iterative Inference-time Scaling with Adaptive Frequency Steering for Image Super-Resolution	Hexin Zhang et.al.	2512.23532	null
2025-12-23	Input-Adaptive Visual Preprocessing for Efficient Fast Vision-Language Model Inference	Putu Indah Githa Cahyani et.al.	2512.20839	null
2025-12-21	AsyncDiff: Asynchronous Timestep Conditioning for Enhanced Text-to-Image Diffusion Inference	Longhuan Xu et.al.	2512.18675	null
2025-12-18	Collaborative Edge-to-Server Inference for Vision-Language Models	Soochang Song et.al.	2512.16349	null
2025-12-16	Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models	Chiyue Wei et.al.	2512.14661	null
2025-12-10	LISN: Language-Instructed Social Navigation with VLM-based Controller Modulating	Junting Chen et.al.	2512.09920	null
2025-12-05	Training-Time Action Conditioning for Efficient Real-Time Chunking	Kevin Black et.al.	2512.05964	null
2025-12-05	Quantitatively mapping the Eady model onto a two-layer quasi-geostrophic model	Julie Meunier et.al.	2512.05902	null
2025-12-05	Non-equilibrium formulation for inertial particles in turbulent swirling flows	Bernardo L. Español et.al.	2512.05855	null
2025-12-05	HQ-DM: Single Hadamard Transformation-Based Quantization-Aware Training for Low-Bit Diffusion Models	Shizhuo Mao et.al.	2512.05746	null
2025-12-05	ProPhy: Progressive Physical Alignment for Dynamic World Simulation	Zijun Wang et.al.	2512.05564	null
2025-12-05	Conscious Gaze: Adaptive Attention Mechanisms for Hallucination Mitigation in Vision-Language Models	Weijue Bu et.al.	2512.05546	null
2025-12-04	Uncertainty Quantification for Scientific Machine Learning using Sparse Variational Gaussian Process Kolmogorov-Arnold Networks (SVGP KAN)	Y. Sungtaek Ju et.al.	2512.05306	null
2025-12-04	CFO: Learning Continuous-Time PDE Dynamics via Flow-Matched Neural Operators	Xianglong Hou et.al.	2512.05297	null
2025-12-04	XR-DT: Extended Reality-Enhanced Digital Twin for Agentic Mobile Robots	Tianyi Wang et.al.	2512.05270	null
2025-12-04	NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation	Yu Zeng et.al.	2512.05106	null
2025-12-04	TV2TV: A Unified Framework for Interleaved Language and Video Generation	Xiaochuang Han et.al.	2512.05103	null
2025-12-04	Hybrid-Diffusion Models: Combining Open-loop Routines with Visuomotor Diffusion Policies	Jonne Van Haastregt et.al.	2512.04960	null
2025-12-04	FASTer: Toward Efficient Autoregressive Vision Language Action Modeling via neural Action Tokenization	Yicheng Liu et.al.	2512.04952	null
2025-12-04	YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance	Junjie Zheng et.al.	2512.04779	null
2025-12-04	MemLoRA: Distilling Expert Adapters for On-Device Memory Systems	Massimo Bini et.al.	2512.04763	null
2025-12-04	E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving	Yihong Tang et.al.	2512.04733	null
2025-12-04	Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild	Yigui Feng et.al.	2512.04728	null
2025-12-05	Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length	Yubo Huang et.al.	2512.04677	null
2025-12-04	Persson’s Theory of Purely Normal Elastic Rough Surface Contact: A Tutorial Based on Stochastic Process Theory	Yang Xu et.al.	2512.04648	null
2025-12-04	VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory	Yifei Yu et.al.	2512.04519	null
2025-12-04	GuidNoise: Single-Pair Guided Diffusion for Generalized Noise Synthesis	Changjin Kim et.al.	2512.04456	null
2025-12-04	NORi: An ML-Augmented Ocean Boundary Layer Parameterization	Xin Kai Lee et.al.	2512.04452	null
2025-12-04	FALCON: Actively Decoupled Visuomotor Policies for Loco-Manipulation with Foundation-Model-Based Coordination	Chengyang He et.al.	2512.04381	null
2025-12-03	Decoding Large Language Diffusion Models with Foreseeing Movement	Yichuan Mo et.al.	2512.04135	null
2025-12-03	DirectDrag: High-Fidelity, Mask-Free, Prompt-Free Drag-based Image Editing via Readout-Guided Feature Alignment	Sheng-Hao Liao et.al.	2512.03981	null
2025-12-03	Refining Machine Learning Potentials through Thermodynamic Theory of Phase Transitions	Paul Fuchs et.al.	2512.03974	null
2025-12-03	Training for Identity, Inference for Controllability: A Unified Approach to Tuning-Free Face Personalization	Lianyu Pang et.al.	2512.03964	null
2025-12-03	OmniDexVLG: Learning Dexterous Grasp Generation from Vision Language Model-Guided Grasp Semantics, Taxonomy and Functional Affordance	Lei Zhang et.al.	2512.03874	null
2025-12-03	Fully Unsupervised Self-debiasing of Text-to-Image Diffusion Models	Korada Sri Vardhana et.al.	2512.03749	null
2025-12-03	PosA-VLA: Enhancing Action Generation via Pose-Conditioned Anchor Attention	Ziwen Li et.al.	2512.03724	null
2025-12-03	GaussianBlender: Instant Stylization of 3D Gaussians with Disentangled Latent Spaces	Melis Ocal et.al.	2512.03683	null
2025-12-03	ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers	Feice Huang et.al.	2512.03673	null
2025-12-03	V-ITI: Mitigating Hallucinations in Multimodal Large Language Models via Visual Inference-Time Intervention	Nan Sun et.al.	2512.03542	null
2025-12-03	CSMapping: Scalable Crowdsourced Semantic Mapping and Topology Inference for Autonomous Driving	Zhijian Qiao et.al.	2512.03510	null
2025-12-03	KeyPointDiffuser: Unsupervised 3D Keypoint Learning via Latent Diffusion Models	Rhys Newbury et.al.	2512.03450	null
2025-12-03	MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification	Yujian Zhao et.al.	2512.03404	null
2025-12-03	Push-broom Mapping of Galaxies and Supernova Remnants with the SPRITE CubeSat	Elena Carlson et.al.	2512.03329	null
2025-12-02	Unlocking hidden biomolecular conformational landscapes in diffusion models at inference time	Daniel D. Richman et.al.	2512.03312	null
2025-12-02	Video2Act: A Dual-System Video Diffusion Policy with Robotic Spatio-Motional Modeling	Yueru Jia et.al.	2512.03044	null
2025-12-03	LoVoRA: Text-guided and Mask-free Video Object Removal and Addition with Learnable Object-aware Localization	Zhihan Xiao et.al.	2512.02933	null
2025-12-02	AutoNeural: Co-Designing Vision-Language Models for NPU Inference	Wei Chen et.al.	2512.02924	null
2025-12-02	Glance: Accelerating Diffusion Models with 1 Sample	Zhuobai Dong et.al.	2512.02899	null
2025-12-03	SwarmDiffusion: End-To-End Traversability-Guided Diffusion for Embodiment-Agnostic Navigation of Heterogeneous Robots	Iana Zhura et.al.	2512.02851	null
2025-12-02	Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach	Siyuan Yang et.al.	2512.02834	null
2025-12-02	Reasoning-Aware Multimodal Fusion for Hateful Video Detection	Shuonan Yang et.al.	2512.02743	null
2025-12-02	VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm	Zhenkai Wu et.al.	2512.02700	null
2025-12-02	PGP-DiffSR: Phase-Guided Progressive Pruning for Efficient Diffusion-based Image Super-Resolution	Zhongbao Yang et.al.	2512.02681	null
2025-12-02	Spatially-Grounded Document Retrieval via Patch-to-Region Relevance Propagation	Agathoklis Georgiou et.al.	2512.02660	null
2025-12-02	Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training	Hong-Jie You et.al.	2512.02652	null
2025-12-02	YingVideo-MV: Music-Driven Multi-Stage Video Generation	Jiahui Chen et.al.	2512.02492	null
2025-12-02	Boosting Medical Vision-Language Pretraining via Momentum Self-Distillation under Limited Computing Resources	Phuc Pham et.al.	2512.02438	null
2025-12-02	VACoT: Rethinking Visual Data Augmentation with VLMs	Zhengzhuo Xu et.al.	2512.02361	null
2025-12-02	Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective	Qiyao Xue et.al.	2512.02340	null
2025-12-01	ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation	Chenyang Gu et.al.	2512.02013	null
2025-12-01	Med-VCD: Mitigating Hallucination for Medical Large Vision Language Models through Visual Contrastive Decoding	Zahra Mahdavi et.al.	2512.01922	null
2025-12-01	Deconstructing Generative Diversity: An Information Bottleneck Analysis of Discrete Latent Generative Models	Yudi Wu et.al.	2512.01831	null
2025-12-01	CauSight: Learning to Supersense for Visual Causal Discovery	Yize Zhang et.al.	2512.01827	null
2025-12-01	Weight Space Representation Learning with Neural Fields	Zhuoqian Yang et.al.	2512.01759	null
2025-12-01	DiG-Flow: Discrepancy-Guided Flow Matching for Robust VLA Models	Wanpeng Zhang et.al.	2512.01715	null
2025-12-01	DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models	Patrick Kwon et.al.	2512.01686	null
2025-12-01	GRASP: Guided Residual Adapters with Sample-wise Partitioning	Felix Nützel et.al.	2512.01675	null
2025-12-01	SPARK: Sim-ready Part-level Articulated Reconstruction with VLM Knowledge	Yumeng He et.al.	2512.01629	null
2025-12-01	Reconstructing Multi-Scale Physical Fields from Extremely Sparse Measurements with an Autoencoder-Diffusion Cascade	Letian Yi et.al.	2512.01572	null
2025-12-01	Hawkes process with a diffusion-driven baseline: long-run behavior, inference, statistical tests	Maya Sadeler Perrin et.al.	2512.01447	null
2025-12-01	Existence of two thresholds in a bistable equation with nonlocal competition	Matthieu Alfaro et.al.	2512.01435	null
2025-12-01	MDiff4STR: Mask Diffusion Model for Scene Text Recognition	Yongkun Du et.al.	2512.01422	null
2025-12-01	FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution	Seungho Choi et.al.	2512.01390	null
2025-12-01	Consistency Flow Model Achieves One-step Denoising Error Correction Codes	Haoyu Lei et.al.	2512.01389	null
2025-12-01	Qualitatively distinct mechanisms of noise-induced escape in diffusively coupled bistable elements	Hidemasa Ishii et.al.	2512.01388	null
2025-12-01	Beyond Loss Guidance: Using PDE Residuals as Spectral Attention in Diffusion Neural Operators	Medha Sawhney et.al.	2512.01370	null
2025-12-01	TokenPure: Watermark Removal through Tokenized Appearance and Structural Guidance	Pei Yang et.al.	2512.01314	null
2025-12-01	Inversions of stochastic processes from ergodic measures of Nonlinear SDEs	Hongyu Liu et.al.	2512.01307	null
2025-11-30	PIANO: Physics-informed Dual Neural Operator for Precipitation Nowcasting	Seokhyun Chin et.al.	2512.01062	null
2025-11-29	EDIT: Early Diffusion Inference Termination for dLLMs Based on Dynamics of Training Gradients	He-Yen Hsieh et.al.	2512.00670	null
2025-11-28	Towards Continuous Intelligence Growth: Self-Training, Continual Learning, and Dual-Scale Memory in SuperIntelliAgent	Jianzhe Lin et.al.	2511.23436	null
2025-11-28	LFM2 Technical Report	Alexander Amini et.al.	2511.23404	null
2025-11-28	SafeHumanoid: VLM-RAG-driven Control of Upper Body Impedance for Humanoid Robot	Yara Mahmoud et.al.	2511.23300	null
2025-11-28	Unlocking Multilingual Reasoning Capability of LLMs and LVLMs through Representation Engineering	Qiming Li et.al.	2511.23231	null
2025-11-28	Obstruction reasoning for robotic grasping	Runyu Jiao et.al.	2511.23186	null
2025-11-28	InstanceV: Instance-Level Video Generation	Yuheng Chen et.al.	2511.23146	null
2025-11-28	db-SP: Accelerating Sparse Attention for Visual Generative Models with Dual-Balanced Sequence Parallelism	Siqi Chen et.al.	2511.23113	null
2025-11-28	MindPower: Enabling Theory-of-Mind Reasoning in VLM-based Embodied Agents	Ruoxuan Zhang et.al.	2511.23055	null
2025-11-28	Time Extrapolation with Graph Convolutional Autoencoder and Tensor Train Decomposition	Yuanhong Chen et.al.	2511.23037	null
2025-11-28	Masked Diffusion for Generative Recommendation	Kulin Shah et.al.	2511.23021	null
2025-11-28	BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation	Zeyu Zhang et.al.	2511.22973	null
2025-11-28	Seeing before Observable: Potential Risk Reasoning in Autonomous Driving via Vision Language Models	Jiaxin Liu et.al.	2511.22928	null
2025-11-27	CAPE: Context-Aware Diffusion Policy Via Proximal Mode Expansion for Collision Avoidance	Rui Heng Yang et.al.	2511.22773	null
2025-11-27	Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer	Z-Image Team et.al.	2511.22699	null
2025-11-27	Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield	Dongyang Liu et.al.	2511.22677	null
2025-11-27	VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models	Silin Cheng et.al.	2511.22664	null
2025-11-27	Geometrically-Constrained Agent for Spatial Reasoning	Zeren Chen et.al.	2511.22659	null
2025-11-27	Beyond Success: Refining Elegant Robot Manipulation from Mixed-Quality Data via Just-in-Time Intervention	Yanbo Mao et.al.	2511.22555	null
2025-11-27	Fast3Dcache: Training-free 3D Geometry Synthesis Acceleration	Mengyu Yang et.al.	2511.22533	null
2025-11-27	CoT4AD: A Vision-Language-Action Model with Explicit Chain-of-Thought Reasoning for Autonomous Driving	Zhaohui Wang et.al.	2511.22532	null
2025-11-26	Canvas-to-Image: Compositional Image Generation with Multimodal Controls	Yusuf Dalva et.al.	2511.21691	null
2025-11-26	Model-Based Policy Adaptation for Closed-Loop End-to-End Autonomous Driving	Haohong Lin et.al.	2511.21584	null
2025-11-26	Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy	Teng Hu et.al.	2511.21579	null
2025-11-26	IntAttention: A Fully Integer Attention Pipeline for Efficient Edge Inference	Wanli Zhong et.al.	2511.21513	null
2025-11-26	MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices	Shuai Zhang et.al.	2511.21475	null
2025-11-26	Odin: Oriented Dual-module Integration for Text-rich Network Representation Learning	Kaifeng Hong et.al.	2511.21416	null
2025-11-26	From Diffusion to One-Step Generation: A Comparative Study of Flow-Based Models with Application to Image Inpainting	Umang Agarwal et.al.	2511.21215	null
2025-11-26	Which Layer Causes Distribution Deviation? Entropy-Guided Adaptive Pruning for Diffusion and Flow Models	Changlin Li et.al.	2511.21122	null
2025-11-26	From Bits to Rounds: Parallel Decoding with Exploration for Diffusion Language Models	Hengyu Fu et.al.	2511.21103	null
2025-11-26	OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection	Chujie Wang et.al.	2511.21064	null
2025-11-26	GuardTrace-VL: Detecting Unsafe Multimodel Reasoning via Iterative Safety Supervision	Yuxiao Xiang et.al.	2511.20994	null
2025-11-25	Dynamic Test-Time Compute Scaling in Control Policy: Difficulty-Aware Stochastic Interpolant Policy	Inkook Chun et.al.	2511.20906	null
2025-11-25	Test-Time Alignment of Text-to-Image Diffusion Models via Null-Text Embedding Optimisation	Taehoon Kim et.al.	2511.20889	null
2025-11-25	Symbiotic Brain-Machine Drawing via Visual Brain-Computer Interfaces	Gao Wang et.al.	2511.20835	null
2025-11-25	Training-Free Diffusion Priors for Text-to-Image Generation via Optimization-based Visual Inversion	Samuele Dell’Erba et.al.	2511.20821	null
2025-11-25	Text-Guided Semantic Image Encoder	Raghuveer Thirukovalluru et.al.	2511.20770	null
2025-11-25	Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout	Hidir Yesiltepe et.al.	2511.20649	null
2025-11-25	LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight	Yunze Man et.al.	2511.20648	null
2025-11-25	Image2Gcode: Image-to-G-code Generation for Additive Manufacturing Using Diffusion-Transformer Model	Ziyue Wang et.al.	2511.20636	null
2025-11-25	MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models	Chieh-Yun Chen et.al.	2511.20629	null
2025-11-25	Latent Diffusion Inversion Requires Understanding the Latent Space	Mingxing Rao et.al.	2511.20592	null
2025-11-25	Anatomica: Localized Control over Geometric and Topological Properties for Anatomical Diffusion Models	Karim Kadry et.al.	2511.20587	null
2025-11-25	Beyond Generation: Multi-Hop Reasoning for Factual Accuracy in Vision-Language Models	Shamima Hossain et.al.	2511.20531	null
2025-11-25	Efficient and Fast Generative-Based Singing Voice Separation using a Latent Diffusion Model	Genís Plaja-Roglans et.al.	2511.20470	null
2025-11-25	Object-Centric Vision Token Pruning for Vision Language Models	Guangyuan Li et.al.	2511.20439	null
2025-11-25	Image-Free Timestep Distillation via Continuous-Time Consistency with Trajectory-Sampled Pairs	Bao Tang et.al.	2511.20410	null
2025-11-25	FREE: Uncertainty-Aware Autoregression for Parallel Diffusion Transformers	Xinwan Wen et.al.	2511.20390	null
2025-11-25	Modified Equations for Stochastic Optimization	Stefan Perko et.al.	2511.20322	null
2025-11-25	TReFT: Taming Rectified Flow Models For One-Step Image Translation	Shengqian Li et.al.	2511.20307	null
2025-11-25	HVAdam: A Full-Dimension Adaptive Optimizer	Yiheng Zhang et.al.	2511.20277	null
2025-11-25	Rectified Flow for Vision-Aided mmWave V2I Beam Prediction	Can Zheng et.al.	2511.20265	null
2025-11-25	In-Context Compositional Learning via Sparse Coding Transformer	Wei Chen et.al.	2511.20194	null
2025-11-25	Spatially Resolved Plasma Diagnostics of the Supernova Remnant DEM L71 using the Reflection Grating Spectrometer	Yuki Amano et.al.	2511.20112	null
2025-11-25	iRadioDiff: Physics-Informed Diffusion Model for Indoor Radio Map Construction and Localization	Xiucheng Wang et.al.	2511.20015	null
2025-11-25	CounterVQA: Evaluating and Improving Counterfactual Reasoning in Vision-Language Models for Video Understanding	Yuefei Chen et.al.	2511.19923	null
2025-11-25	Scale Where It Matters: Training-Free Localized Scaling for Diffusion Models	Qin Ren et.al.	2511.19917	null
2025-11-24	Mixture of Horizons in Action Chunking	Dong Jing et.al.	2511.19433	null
2025-11-24	Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens	Yiming Qin et.al.	2511.19418	null
2025-11-24	Predicting partially observable dynamical systems via diffusion models with a multiscale inference scheme	Rudy Morel et.al.	2511.19390	null
2025-11-24	Efficiency vs. Fidelity: A Comparative Analysis of Diffusion Probabilistic Models and Flow Matching on Low-Resource Hardware	Srishti Gupta et.al.	2511.19379	null
2025-11-24	DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation	Zehong Ma et.al.	2511.19365	null
2025-11-24	Rethinking Intermediate Representation for VLM-based Robot Manipulation	Weiliang Tang et.al.	2511.19315	null
2025-11-24	CDLM: Consistency Diffusion Language Models For Faster Sampling	Minseo Kim et.al.	2511.19269	null
2025-11-24	SimDiff: Simpler Yet Better Diffusion Model for Time Series Point Forecasting	Hang Ding et.al.	2511.19256	null
2025-11-24	Learning Plug-and-play Memory for Guiding Video Diffusion Models	Selena Song et.al.	2511.19229	null
2025-11-24	EEG-VLM: A Hierarchical Vision-Language Model with Multi-Level Feature Alignment and Visually Enhanced Language-Guided Reasoning for EEG Image-Based Sleep Stage Prediction	Xihe Qiu et.al.	2511.19155	null
2025-11-24	MonoSR: Open-Vocabulary Spatial Reasoning from Monocular Images	Qirui Wang et.al.	2511.19119	null
2025-11-24	A Self-Conditioned Representation Guided Diffusion Model for Realistic Text-to-LiDAR Scene Generation	Wentao Qu et.al.	2511.19004	null
2025-11-24	BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models	Juncheng Li et.al.	2511.18921	null
2025-11-24	EventSTU: Event-Guided Efficient Spatio-Temporal Understanding for Video Large Language Models	Wenhao Xu et.al.	2511.18920	null
2025-11-24	MatMart: Material Reconstruction of 3D Objects via Diffusion	Xiuchao Wu et.al.	2511.18900	null
2025-11-24	Parallel Vision Token Scheduling for Fast and Accurate Multimodal LMMs Inference	Wengyi Zhan et.al.	2511.18875	null
2025-11-24	UNeMo: Collaborative Visual-Language Reasoning and Navigation via a Multimodal World Model	Changxin Huang et.al.	2511.18845	null
2025-11-24	DiP: Taming Diffusion Models in Pixel Space	Zhennan Chen et.al.	2511.18822	null
2025-11-24	Mitigating Long-Tail Bias in HOI Detection via Adaptive Diversity Cache	Yuqiu Jiang et.al.	2511.18811	null
2025-11-24	MergeVLA: Cross-Skill Model Merging Toward a Generalist Vision-Language-Action Agent	Yuxia Fu et.al.	2511.18810	null
2025-11-21	SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding	Nikolay Nikolov et.al.	2511.17411	null
2025-11-21	SpatialGeo: Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion	Jiajie Guo et.al.	2511.17308	null
2025-11-21	A Little More Like This: Text-to-Image Retrieval with Vision-Language Models Using Relevance Feedback	Bulat Khaertdinov et.al.	2511.17255	null
2025-11-21	FlexiFlow: decomposable flow matching for generation of flexible molecular ensemble	Riccardo Tedoldi et.al.	2511.17249	null
2025-11-21	FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle	Mario Markov et.al.	2511.17171	null
2025-11-21	One-Step Diffusion Transformer for Controllable Real-World Image Super-Resolution	Yushun Fang et.al.	2511.17138	null
2025-11-21	Sparse Reasoning is Enough: Biological-Inspired Framework for Video Anomaly Detection with Large Pre-trained Models	He Huang et.al.	2511.17094	null
2025-11-21	Diversity Has Always Been There in Your Visual Autoregressive Models	Tong Wang et.al.	2511.17074	null
2025-11-21	DAPS++: Rethinking Diffusion Inverse Problems with Decoupled Posterior Annealing	Hao Chen et.al.	2511.17038	null
2025-11-21	Energy Scaling Laws for Diffusion Models: Quantifying Compute and Carbon Emissions in Image Generation	Aniketh Iyengar et.al.	2511.17031	null
2025-11-21	VLM-Augmented Degradation Modeling for Image Restoration Under Adverse Weather Conditions	Qianyi Shao et.al.	2511.16998	null
2025-11-21	MultiPriv: Benchmarking Individual-Level Privacy Reasoning in Vision-Language Models	Xiongtao Sun et.al.	2511.16940	null
2025-11-21	UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation	Chi Zhang et.al.	2511.16917	null
2025-11-21	Align & Invert: Solving Inverse Problems with Diffusion and Flow-based Models via Representational Alignment	Loukas Sfountouris et.al.	2511.16870	null
2025-11-20	Towards Unified Vision Language Models for Forest Ecological Analysis in Earth Observation	Xizhe Xue et.al.	2511.16853	null
2025-11-20	TRIM: Scalable 3D Gaussian Diffusion Inference with Temporal and Spatial Trimming	Zeyuan Yin et.al.	2511.16642	null
2025-11-21	VLA-Pruner: Temporal-Aware Dual-Level Visual Token Pruning for Efficient Vision-Language-Action Inference	Ziyan Liu et.al.	2511.16449	null
2025-11-20	Decoupling Complexity from Scale in Latent Diffusion Model	Tianxiong Zhong et.al.	2511.16117	null
2025-11-20	T2T-VICL: Unlocking the Boundaries of Cross-Task Visual In-Context Learning via Implicit Text-Driven VLMs	Shao-Jun Xia et.al.	2511.16107	null
2025-11-20	Learning Tractable Distributions Of Language Model Continuations	Gwen Yidou-Weng et.al.	2511.16054	null
2025-11-20	Understanding and improving axial detection in optical tweezers based on the interference of forward- and backward-scattered light	Isaac Pérez Castillo et.al.	2511.16036	null
2025-11-20	Physics-Guided Inductive Spatiotemporal Kriging for PM2.5 with Satellite Gradient Constraints	Shuo Wang et.al.	2511.16013	null
2025-11-19	Breaking the Bottleneck with DiffuApriel: High-Throughput Diffusion LMs with Mamba Backbone	Vaibhav Singh et.al.	2511.15927	null
2025-11-19	Think Visually, Reason Textually: Vision-Language Synergy in ARC	Beichen Zhang et.al.	2511.15703	null
2025-11-19	MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping	Yushi Huang et.al.	2511.15690	null
2025-11-19	Theoretical Closed-loop Stability Bounds for Dynamical System Coupled with Diffusion Policies	Gabriel Lauzier et.al.	2511.15520	null
2025-11-19	What Your Features Reveal: Data-Efficient Black-Box Feature Inversion Attack for Split DNNs	Zhihan Ren et.al.	2511.15316	null
2025-11-19	Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning	Yuxuan Gu et.al.	2511.15190	null
2025-11-19	A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models	Duo Li et.al.	2511.15098	null
2025-11-19	Jointly Conditioned Diffusion Model for Multi-View Pose-Guided Person Image Synthesis	Chengyu Xie et.al.	2511.15092	null
2025-11-19	Reasoning via Video: The First Evaluation of Video Models’ Reasoning Abilities through Maze-Solving Tasks	Cheng Yang et.al.	2511.15065	null
2025-11-19	Aligning Generative Music AI with Human Preferences: Methods and Challenges	Dorien Herremans et.al.	2511.15038	null
2025-11-18	Measuring AI Progress in Drug Discovery: A Reproducible Leaderboard for the Tox21 Challenge	Antonia Ebner et.al.	2511.14744	null
2025-11-18	Oscillation Quenching Induced By Time-Varying Coupling Functions	Dushko Stavrov et.al.	2511.14370	null
2025-11-18	Bridging the Gap Between Bayesian Deep Learning and Ensemble Weather Forecasts	Xinlei Xiong et.al.	2511.14218	null
2025-11-18	InstantViR: Real-Time Video Inverse Problem Solver with Distilled Diffusion Prior	Weimin Bai et.al.	2511.14208	null
2025-11-18	Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion	Zhuo Li et.al.	2511.14178	null
2025-11-18	Run, Ruminate, and Regulate: A Dual-process Thinking System for Vision-and-Language Navigation	Yu Zhong et.al.	2511.14131	null
2025-11-18	Text-Driven Reasoning Video Editing via Reinforcement Learning on Digital Twin Representations	Yiqing Shen et.al.	2511.14100	null
2025-11-18	GRPO Privacy Is at Risk: A Membership Inference Attack Against Reinforcement Learning With Verifiable Rewards	Yule Liu et.al.	2511.14045	null
2025-11-18	Flood-LDM: Generalizable Latent Diffusion Models for rapid and accurate zero-shot High-Resolution Flood Mapping	Sun Han Neo et.al.	2511.14033	null
2025-11-17	Single Tensor Cell Segmentation using Scalar Field Representations	Kevin I. Ruiz Vargas et.al.	2511.13947	null
2025-11-17	Mapping the Cosmic-Ray Ionization Rate in the Local Galaxy with H$_3^+$	Nick Indriolo et.al.	2511.13915	null
2025-11-17	Distribution Matching Distillation Meets Reinforcement Learning	Dengyang Jiang et.al.	2511.13649	null
2025-11-17	CacheFlow: Compressive Streaming Memory for Efficient Long-Form Video Understanding	Shrenik Patel et.al.	2511.13644	null
2025-11-17	Semantic Document Derendering: SVG Reconstruction via Vision-Language Modeling	Adam Hazimeh et.al.	2511.13478	null
2025-11-18	Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline	Rui Zuo et.al.	2511.13442	null
2025-11-17	Local asymptotic normality for discretely observed McKean-Vlasov diffusions	Akram Heidari et.al.	2511.13366	null
2025-11-17	TransFit-CSM: A Fast, Physically Consistent Framework for Interaction-Powered Transients	Yu-Hao Zhang et.al.	2511.13265	null
2025-11-17	GenTract: Generative Global Tractography	Alec Sargood et.al.	2511.13183	null
2025-11-17	Conditional Diffusion Model for Multi-Agent Dynamic Task Decomposition	Yanda Zhu et.al.	2511.13137	null
2025-11-17	MergeSlide: Continual Model Merging and Task-to-Class Prompt-Aligned Inference for Lifelong Learning on Whole Slide Images	Doanh C. Bui et.al.	2511.13099	null
2025-11-17	MeanFlow Transformers with Representation Autoencoders	Zheyuan Hu et.al.	2511.13019	null
2025-11-17	SAGE: Spuriousness-Aware Guided Prompt Exploration for Mitigating Multimodal Bias	Wenqian Ye et.al.	2511.13005	null
2025-11-17	Infinite-Story: A Training-Free Consistent Text-to-Image Generation	Jihun Park et.al.	2511.13002	null
2025-11-17	Recurrent Autoregressive Diffusion: Global Memory Meets Local Attention	Taiye Chen et.al.	2511.12940	null
2025-11-17	Yanyun-3: Enabling Cross-Platform Strategy Game Operation with Vision-Language Models	Guoyan Wang et.al.	2511.12937	null
2025-11-17	Method of Manufactured Learning for Solver-free Training of Neural Operators	Arth Sojitra et.al.	2511.12890	null
2025-11-17	BrainNormalizer: Anatomy-Informed Pseudo-Healthy Brain Reconstruction from Tumor MRI via Edge-Guided ControlNet	Min Gu Kwak et.al.	2511.12853	null
2025-11-16	Prompt-Driven Domain Adaptation for End-to-End Autonomous Driving via In-Context RL	Aleesha Khurram et.al.	2511.12755	null
2025-11-16	Backdoor Attacks on Open Vocabulary Object Detectors via Multi-Modal Prompt Tuning	Ankita Raj et.al.	2511.12735	null
2025-11-16	QPU Micro-Kernels for Stencil Computation	Stefano Markidis et.al.	2511.12617	null
2025-11-16	CoTBox-TTT: Grounding Medical VQA with Visual Chain-of-Thought Boxes During Test-time Training	Jiahe Qian et.al.	2511.12446	null
2025-11-16	RedVTP: Training-Free Acceleration of Diffusion Vision-Language Models Inference via Masked Token-Guided Visual Token Pruning	Jingqi Xu et.al.	2511.12428	null
2025-11-14	PAS: Prelim Attention Score for Detecting Object Hallucinations in Large Vision–Language Models	Nhat Hoang-Xuan et.al.	2511.11502	null
2025-11-14	Planetary nebulae as tracers of stellar population properties: a pilot study with MUSE	Ana Inés Ennis et.al.	2511.11479	null
2025-11-14	DiffPro: Joint Timestep and Layer-Wise Precision Optimization for Efficient Diffusion Inference	Farhana Amin et.al.	2511.11446	null
2025-11-14	BOFA: Bridge-Layer Orthogonal Low-Rank Fusion for CLIP-Based Class-Incremental Learning	Lan Li et.al.	2511.11421	null
2025-11-14	EcoAlign: An Economically Rational Framework for Efficient LVLM Alignment	Ruoxi Cheng et.al.	2511.11301	null
2025-11-14	GraphPilot: Grounded Scene Graph Conditioning for Language-Based Autonomous Driving	Fabian Schmidt et.al.	2511.11266	null
2025-11-14	CountSteer: Steering Attention for Object Counting in Diffusion Models	Hyemin Boo et.al.	2511.11253	null
2025-11-14	Viper-F1: Fast and Fine-Grained Multimodal Understanding with Cross-Modal State-Space Modulation	Quoc-Huy Trinh et.al.	2511.11177	null
2025-11-14	Explainable Deep Convolutional Multi-Type Anomaly Detection	Alex George et.al.	2511.11165	null
2025-11-14	Non-Gaussianity-induced enhanced target-finding dynamics of confined colloids	Guirec de Tournemire et.al.	2511.11117	null
2025-11-14	Sheaf Cohomology of Linear Predictive Coding Networks	Jeffrey Seely et.al.	2511.11092	null
2025-11-14	SP-Guard: Selective Prompt-adaptive Guidance for Safe Text-to-Image Generation	Sumin Yu et.al.	2511.11014	null
2025-11-14	VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models	Xinlei Yu et.al.	2511.11007	null
2025-11-14	CLUE: Controllable Latent space of Unprompted Embeddings for Diversity Management in Text-to-Image Synthesis	Keunwoo Park et.al.	2511.10993	null
2025-11-14	Binary Verification for Zero-Shot Vision	Jeffrey Liu et.al.	2511.10983	null
2025-11-13	FengHuang: Next-Generation Memory Orchestration for AI Inferencing	Jiamin Li et.al.	2511.10753	null
2025-11-13	Diffusion in the stochastic Klein-Gordon equation	Jonathan Oppenheim et.al.	2511.10738	null
2025-11-13	Reaching for the Edge II: Stellar Halos out to Large Radii as a Tracer of Dark Matter Halo Mass	Katya Leidig et.al.	2511.10723	null
2025-11-14	OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer	Haosong Peng et.al.	2511.10560	null
2025-11-13	A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space	Huijie Liu et.al.	2511.10555	null
2025-11-13	SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation	Wei Li et.al.	2511.10518	null
2025-11-13	Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models	Zhengtao Zou et.al.	2511.10292	null
2025-11-13	PROPA: Toward Process-level Optimization in Visual Reasoning via Reinforcement Learning	Yanbei Jiang et.al.	2511.10279	null
2025-11-13	LiNeXt: Revisiting LiDAR Completion with Efficient Non-Diffusion Architectures	Wenzhe He et.al.	2511.10209	null
2025-11-13	AI-Integrated Decision Support System for Real-Time Market Growth Forecasting and Multi-Source Content Diffusion Analytics	Ziqing Yin et.al.	2511.09962	null
2025-11-13	Remember Me: Bridging the Long-Range Gap in LVLMs with Three-Step Inference-Only Decay Resilience Strategies	Peng Gao et.al.	2511.09868	null
2025-11-12	From Street to Orbit: Training-Free Cross-View Retrieval via Location Semantics and LLM Guidance	Jeongho Min et.al.	2511.09820	null
2025-11-12	Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models	Konstantinos M. Dafnis et.al.	2511.09809	null
2025-11-12	HeatGen: A Guided Diffusion Framework for Multiphysics Heat Sink Design Optimization	Hadi Keramati et.al.	2511.09578	null
2025-11-12	Controllable protein design through Feynman-Kac steering	Erik Hartman et.al.	2511.09216	null
2025-11-12	FSampler: Training Free Acceleration of Diffusion Sampling via Epsilon Extrapolation	Michael A. Vladimir et.al.	2511.09180	null
2025-11-12	Emission-Line and Continuum Reverberation Mapping of the NLS1 Galaxy WPVS 48	M. A. Probst et.al.	2511.09153	null
2025-11-12	Diff-V2M: A Hierarchical Conditional Diffusion Model with Explicit Rhythmic Modeling for Video-to-Music Generation	Shulei Ji et.al.	2511.09090	null
2025-11-12	Diversifying Counterattacks: Orthogonal Exploration for Robust CLIP Inference	Chengze Jiang et.al.	2511.09064	null
2025-11-12	Expand Your SCOPE: Semantic Cognition over Potential-Based Exploration for Embodied Visual Navigation	Ningnan Wang et.al.	2511.08935	null
2025-11-12	From Structure to Detail: Hierarchical Distillation for Efficient Diffusion Model	Hanbo Cheng et.al.	2511.08930	null
2025-11-12	TiDAR: Think in Diffusion, Talk in Autoregression	Jingyu Liu et.al.	2511.08923	null
2025-11-12	Machines Serve Human: A Novel Variable Human-machine Collaborative Compression Framework	Zifu Zhang et.al.	2511.08915	null
2025-11-04	The Pervasive Blind Spot: Benchmarking VLM Inference Risks on Everyday Personal Videos	Shuning Zhang et.al.	2511.02367	null
2025-10-26	Encoder-Decoder Diffusion Language Models for Efficient Training and Inference	Marianne Arriola et.al.	2510.22852	null
2025-10-26	FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference	Divya Jyoti Bajpai et.al.	2510.22641	null
2025-10-28	Token-Level Inference-Time Alignment for Vision-Language Models	Kejia Chen et.al.	2510.21794	null
2025-10-20	SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference	Samir Khaki et.al.	2510.17777	null
2025-10-22	VERA-V: Variational Inference Framework for Jailbreaking Vision-Language Models	Qilin Liao et.al.	2510.17759	null
2025-10-16	Efficient Video Sampling: Pruning Temporally Redundant Tokens for Faster VLM Inference	Natan Bagrov et.al.	2510.14624	null
2025-10-13	Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation	Maggie Wang et.al.	2510.11689	null
2025-10-13	When Does Supervised Training Pay Off? The Hidden Economics of Object Detection in the Era of Vision-Language Models	Samer Al-Hamadani et.al.	2510.11302	null
2025-10-11	Efficient Navigation in Unknown Indoor Environments with Vision-Language Models	D. Schwartz et.al.	2510.04991	null
2025-10-03	TridentServe: A Stage-level Serving System for Diffusion Pipelines	Yifei Xia et.al.	2510.02838	null
2025-10-26	EVODiff: Entropy-aware Variance Optimized Diffusion Inference	Shigui Li et.al.	2509.26096	null
2025-09-28	Sequential Diffusion Language Models	Yangzhou Liu et.al.	2509.24007	null
2025-09-28	HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models	Zhinan Xie et.al.	2509.23928	null
2025-11-27	Manifold-Aware Diffusion-Augmented Contrastive Learning for Noise-Robust Biosignal Representation	Rami Zewail et.al.	2509.20048	null
2025-09-20	Eye Gaze Tells You Where to Compute: Gaze-Driven Efficient VLMs	Qinyu Chen et.al.	2509.16476	null
2025-09-21	SpecVLM: Fast Speculative Decoding in Vision-Language Models	Haiduo Huang et.al.	2509.11815	null
2025-09-15	STADI: Fine-Grained Step-Patch Diffusion Parallelism for Heterogeneous GPUs	Han Liang et.al.	2509.04719	null
2025-08-26	MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs	Sixun Dong et.al.	2508.18264	null
2025-08-20	GM-Skip: Metric-Guided Transformer Block Skipping for Efficient Vision-Language Models	Lianming Huang et.al.	2508.18227	null
2025-08-21	Pretrained Diffusion Models Are Inherently Skipped-Step Samplers	Wenju Xu et.al.	2508.15233	null
2025-08-11	AdaptInfer: Adaptive Token Pruning for Vision-Language Model Inference with Dynamical Text Guidance	Weichen Zhang et.al.	2508.06084	null
2025-08-07	Real-Time Iteration Scheme for Diffusion Policy	Yufei Duan et.al.	2508.05396	null
2025-07-23	Accelerating Parallel Diffusion Model Serving with Residual Compression	Jiajun Luo et.al.	2507.17511	null
2025-07-11	BlindSight: Harnessing Sparsity for Efficient VLMs	Tharun Adithya Srikrishnan et.al.	2507.09071	null
2025-09-30	Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling?	Mingyuan Wu et.al.	2506.17417	null
2025-06-20	Consistent Sampling and Simulation: Molecular Dynamics with Energy-Based Diffusion Models	Michael Plainer et.al.	2506.17139	null
2025-06-18	VLMInferSlow: Evaluating the Efficiency Robustness of Large Vision-Language Models as a Service	Xiasi Wang et.al.	2506.15755	null
2025-07-01	Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model	Anirud Aggarwal et.al.	2506.15682	null
2025-06-12	Adding simple structure at inference improves Vision-Language Compositionality	Imanol Miranda et.al.	2506.09691	null
2025-06-09	Event-Priori-Based Vision-Language Model for Efficient Visual Understanding	Haotong Qin et.al.	2506.07627	null
2025-09-03	RNE: plug-and-play diffusion inference-time control and energy-based training	Jiajun He et.al.	2506.05668	null
2025-10-10	Can Vision Language Models Infer Human Gaze Direction? A Controlled Study	Zory Zhang et.al.	2506.05412	null
2025-10-05	Inference-time Scaling of Diffusion Models through Classical Search	Xiangcheng Zhang et.al.	2505.23614	null
2025-05-27	InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling	Xiaoxiao Jiang et.al.	2505.20600	null
2025-05-25	SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation	Shenggan Cheng et.al.	2505.19151	null
2025-06-13	VISTA: Vision-Language Inference for Training-Free Stock Time-Series Analysis	Tina Khezresmaeilzadeh et.al.	2505.18570	null
2025-05-23	VERDI: VLM-Embedded Reasoning for Autonomous Driving	Bowen Feng et.al.	2505.15925	null
2025-05-20	Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism	Kunyun Wang et.al.	2505.14741	null
2025-04-14	Efficient Task-specific Conditional Diffusion Policies: Shortcut Model Acceleration and SO(3) Optimization	Haiyong Yu et.al.	2504.09927	null
2025-04-15	Metropolis-Hastings Captioning Game: Knowledge Fusion of Vision Language Models via Decentralized Bayesian Inference	Yuta Matsui et.al.	2504.09620	null
2025-03-17	VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers	Ruanjun Li et.al.	2503.09387	null
2025-02-20	Light communicative materials	Hongshuang Guo et.al.	2503.05744	null
2025-02-21	Evaluating Precise Geolocation Inference Capabilities of Vision Language Models	Neel Jay et.al.	2502.14412	null
2025-10-08	Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search	Yuta Oshima et.al.	2501.19252	null
2025-02-10	Membership Inference Attacks Against Vision-Language Models	Yuke Hu et.al.	2501.18624	null
2025-03-10	Probing the Quantum Nature of Gravity through Classical Diffusion	Oliviero Angeli et.al.	2501.13030	null
2025-01-16	PATCHEDSERVE: A Patch Management Framework for SLO-Optimized Hybrid Resolution Diffusion Serving	Desen Sun et.al.	2501.09253	null
2025-01-16	StructSR: Refuse Spurious Details in Real-World Image Super-Resolution	Yachao Li et.al.	2501.05777	link
2024-12-19	Unified Image Restoration and Enhancement: Degradation Calibrated Cycle Reconstruction Diffusion Model	Minglong Xue et.al.	2412.14630	link
2025-06-30	Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension	Xiyao Wang et.al.	2412.03704	link
2024-12-05	A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs	Wangbo Zhao et.al.	2412.03324	link
2024-12-02	[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster	Qizhe Zhang et.al.	2412.01818	link
2025-03-30	Staleness-Centric Optimizations for Parallel Diffusion MoE Inference	Jiajun Luo et.al.	2411.16786	null
2024-11-01	VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration	Dezhan Tu et.al.	2410.23317	null
2025-01-07	Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance	Dongmin Park et.al.	2410.22376	link
2024-10-30	Natural Language Inference Improves Compositionality in Vision-Language Models	Paola Cascante-Bonilla et.al.	2410.22315	null
2024-10-18	Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models	Jie Ren et.al.	2410.13088	null
2025-02-11	ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time	Yi Ding et.al.	2410.06625	null
2024-10-08	A scaling limit for additive functionals	Thibaud Taillefumier et.al.	2410.06383	null
2024-09-03	CT-SDM: A Sampling Diffusion Model for Sparse-View CT Reconstruction across All Sampling Rates	Liutao Yang et.al.	2409.01571	null
2024-07-27	Faster Image2Video Generation: A Closer Look at CLIP Image Embedding’s Impact on Spatio-Temporal Cross-Attentions	Ashkan Taghipour et.al.	2407.19205	null
2024-07-15	LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis	Zhenxiong Tan et.al.	2407.10468	link
2024-06-13	DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning	Xuemin Hu et.al.	2406.09089	null
2024-10-03	I4VGen: Image as Free Stepping Stone for Text-to-Video Generation	Xiefan Guo et.al.	2406.02230	null
2025-01-14	Amortizing intractable inference in diffusion models for vision, language, and control	Siddarth Venkatraman et.al.	2405.20971	null
2024-05-30	DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation	Zachary Novack et.al.	2405.20289	null
2024-05-26	Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference	Xunpeng Huang et.al.	2405.16387	null
2025-04-16	Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models	Katherine Xu et.al.	2405.14828	null
2024-04-25	Inferring solid-state diffusivity in lithium-ion battery active materials: improving upon the classical GITT method	A. Emir Gumrukcuoglu et.al.	2404.16658	null
2024-11-05	Private Attribute Inference from Images with Vision-Language Models	Batuhan Tömekçe et.al.	2404.10618	null
2024-05-02	Privacy-Preserving Diffusion Model Using Homomorphic Encryption	Yaojian Chen et.al.	2403.05794	link
2024-05-08	ToDo: Token Downsampling for Efficient Generation of High-Resolution Images	Ethan Smith et.al.	2402.13573	null
2024-06-03	DITTO: Diffusion Inference-Time T-Optimization for Music Generation	Zachary Novack et.al.	2401.12179	null
2023-12-10	Statistical Spatially Inhomogeneous Diffusion Inference	Yinuo Ren et.al.	2312.05793	null
2023-07-31	Cross-Modal Concept Learning and Inference for Vision-Language Models	Yi Zhang et.al.	2307.15460	null
2024-01-04	Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference	Zihao Yu et.al.	2305.17423	link
2023-10-25	ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval	Kexun Zhang et.al.	2302.02285	link
2021-08-11	Manifold-aware Synthesis of High-resolution Diffusion from Structural Imaging	Benoit Anctil-Robitaille et.al.	2108.04135	null
2021-12-22	Functional Data Analysis with Rough Sample Paths?	Neda Mohammadi et.al.	2105.12035	null
2014-06-03	$C^0$-estimates and smoothness of solutions to the parabolic equation defined by Kimura operators	Camelia A. Pop et.al.	1406.0742	null
2015-04-01	On nonnegative unbiased estimators	Pierre E. Jacob et.al.	1309.6473	null