
Updated on 2026.04.03


LLM inference

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2026-04-02 | Goose: Anisotropic Speculation Trees for Training-Free Speculative Decoding | Tao Jin et al. | 2604.02047 | null |
| 2026-04-02 | DWDP: Distributed Weight Data Parallelism for High-Performance LLM Inference on NVL72 | Wanqian Li et al. | 2604.01621 | null |
| 2026-04-01 | Fast and Accurate Probing of In-Training LLMs’ Downstream Performances | Zhichen Liu et al. | 2604.01025 | null |
| 2026-04-01 | Learning from Many and Adapting to the Unknown in Open-set Test Streams | Xiao Zhang et al. | 2604.00533 | null |
| 2026-04-01 | Scheduling LLM Inference with Uncertainty-Aware Output Length Predictions | Haoyu Zheng et al. | 2604.00499 | null |
| 2026-04-01 | TENT: A Declarative Slice Spraying Engine for Performant and Resilient Data Movement in Disaggregated LLM Serving | Feng Ren et al. | 2604.00368 | null |
| 2026-03-31 | ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving | Annette Taberner-Miller et al. | 2604.00136 | null |
| 2026-03-30 | Understand and Accelerate Memory Processing Pipeline for Disaggregated LLM Inference | Zifan He et al. | 2603.29002 | null |
| 2026-03-24 | StepCache: Step-Level Reuse with Lightweight Verification and Selective Patching for LLM Serving | Azam Nouri et al. | 2603.28795 | null |
| 2026-03-30 | A Techno-Economic Framework for Cost Modeling and Revenue Opportunities in Open and Programmable AI-RAN | Gabriele Gemmi et al. | 2603.28680 | null |
| 2026-03-30 | Tiered Super-Moore’s Law: Price Evolution, Production Frontiers, and Market Competition in Large Language Model Inference Services | Mingdeng Du et al. | 2603.28576 | null |
| 2026-03-31 | A Switch-Centric In-Network Architecture for Accelerating LLM Inference in Shared-Memory Network | Aojie Jiang et al. | 2603.28239 | null |
| 2026-03-31 | ITQ3_S: High-Fidelity 3-bit LLM Inference via Interleaved Ternary Quantization with Rotation-Domain Smoothing | Edward J. Yoon et al. | 2603.27914 | null |
| 2026-03-29 | KVSculpt: KV Cache Compression as Distillation | Bo Jiang et al. | 2603.27819 | null |
| 2026-03-28 | From Inference Routing to Agent Orchestration: Declarative Policy Compilation with Cross-Layer Verification | Huamin Chen et al. | 2603.27299 | null |
| 2026-03-28 | ScoutAttention: Efficient KV Cache Offloading via Layer-Ahead CPU Pre-computation for LLM Inference | Qiuyang Zhang et al. | 2603.27138 | null |
| 2026-03-27 | MemBoost: A Memory-Boosted Framework for Cost-Aware LLM Inference | Joris Köster et al. | 2603.26557 | null |
| 2026-03-27 | Rocks, Pebbles and Sand: Modality-aware Scheduling for Multimodal Large Language Model Inference | Konstantinos Papaioannou et al. | 2603.26498 | null |
| 2026-03-27 | AgentCollab: A Self-Evaluation-Driven Collaboration Paradigm for Efficient LLM Agents | Wenbo Gao et al. | 2603.26034 | null |
| 2026-03-26 | Supercharging Federated Intelligence Retrieval | Dimitris Stripelis et al. | 2603.25374 | null |
| 2026-03-26 | Interpretable Zero-shot Referring Expression Comprehension with Query-driven Scene Graphs | Yike Wu et al. | 2603.25004 | null |
| 2026-03-25 | LATS: Large Language Model Assisted Teacher-Student Framework for Multi-Agent Reinforcement Learning in Traffic Signal Control | Yifeng Zhang et al. | 2603.24361 | null |
| 2026-03-25 | Self-Distillation for Multi-Token Prediction | Guoliang Zhao et al. | 2603.23911 | null |
| 2026-03-24 | The Diminishing Returns of Early-Exit Decoding in Modern LLMs | Rui Wei et al. | 2603.23701 | null |
| 2026-03-24 | Energy Efficient Software Hardware CoDesign for Machine Learning: From TinyML to Large Language Models | Mohammad Saleh Vahdatpour et al. | 2603.23668 | null |
| 2026-03-24 | LLM Inference at the Edge: Mobile, NPU, and GPU Performance Efficiency Trade-offs Under Sustained Load | Pranay Tummalapalli et al. | 2603.23640 | null |
| 2026-03-24 | Sparser, Faster, Lighter Transformer Language Models | Edoardo Cetin et al. | 2603.23198 | null |
| 2026-03-24 | Characterizing CPU-Induced Slowdowns in Multi-GPU LLM Inference | Euijun Chung et al. | 2603.22774 | null |
| 2026-03-23 | Chimera: Latency- and Performance-Aware Multi-agent Serving for Heterogeneous LLMs | Kangqi Ni et al. | 2603.22206 | null |
| 2026-03-23 | GSEM: Graph-based Self-Evolving Memory for Experience Augmented Clinical Reasoning | Xiao Han et al. | 2603.22096 | null |
| 2026-03-23 | CurvZO: Adaptive Curvature-Guided Sparse Zeroth-Order Optimization for Efficient LLM Fine-Tuning | Shuo Wang et al. | 2603.21725 | null |
| 2026-03-25 | PRISM: Breaking the O(n) Memory Wall in Long-Context LLM Inference via O(1) Photonic Block Selection | Hyoseok Park et al. | 2603.21576 | null |
| 2026-03-22 | TIDE: Token-Informed Depth Execution for Per-Token Early Exit in LLM Inference | Jaber Jaber et al. | 2603.21365 | null |
| 2026-03-22 | The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project | Huamin Chen et al. | 2603.21354 | null |
| 2026-03-22 | Improving Coherence and Persistence in Agentic AI for System Optimization | Pantea Karimi et al. | 2603.21321 | null |
| 2026-03-22 | CALVO: Improve Serving Efficiency for LLM Inferences with Intense Network Demands | Weiye Wang et al. | 2603.21257 | null |
| 2026-03-22 | Can LLMs Fool Graph Learning? Exploring Universal Adversarial Attacks on Text-Attributed Graphs | Zihui Chen et al. | 2603.21155 | null |
| 2026-03-24 | WWW.Serve: Interconnecting Global LLM Services through Decentralization | Huanyu Wang et al. | 2603.20661 | null |
| 2026-03-20 | KV Cache Optimization Strategies for Scalable and Efficient LLM Inference | Yichun Xu et al. | 2603.20397 | null |
| 2026-03-20 | Utility-Guided Agent Orchestration for Efficient LLM Tool Use | Boyan Liu et al. | 2603.19896 | null |
| 2026-03-20 | Stepwise: Neuro-Symbolic Proof Search for Automated Systems Verification | Baoding He et al. | 2603.19715 | null |
| 2026-03-20 | HyEvo: Self-Evolving Hybrid Agentic Workflows for Efficient Reasoning | Beibei Xu et al. | 2603.19639 | null |
| 2026-03-19 | A Pipelined Collaborative Speculative Decoding Framework for Efficient Edge-Cloud LLM Inference | Yida Zhang et al. | 2603.19133 | null |
| 2026-03-19 | BeamAgent: LLM-Aided MIMO Beamforming with Decoupled Intent Parsing and Alternating Optimization for Joint Site Selection and Precoding | Xiucheng Wang et al. | 2603.18855 | null |
| 2026-03-19 | From Servers to Sites: Compositional Power Trace Generation of LLM Inference for Infrastructure Planning | Grant Wilkins et al. | 2603.18383 | null |
| 2026-03-18 | Conflict-Free Policy Languages for Probabilistic ML Predicates: A Framework and Case Study with the Semantic Router DSL | Xunzhuo Liu et al. | 2603.18174 | null |
| 2026-03-18 | Multi-Trait Subspace Steering to Reveal the Dark Side of Human-AI Interaction | Xin Wei Chia et al. | 2603.18085 | null |
| 2026-03-17 | NANOZK: Layerwise Zero-Knowledge Proofs for Verifiable Large Language Model Inference | Zhaohui Geoffrey Wang et al. | 2603.18046 | null |
| 2026-03-18 | RAMP: Reinforcement Adaptive Mixed Precision Quantization for Efficient On Device LLM Inference | Arpit Singh Gautam et al. | 2603.17891 | null |
| 2026-03-18 | Swarm: Co-Activation Aware KVCache Offloading Across Multiple SSDs | Tuowei Wang et al. | 2603.17803 | null |
| 2026-03-18 | Multi-stage Flow Scheduling for LLM Serving | Yijun Sun et al. | 2603.17456 | null |
| 2026-03-18 | ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression | Ruibo Fan et al. | 2603.17435 | null |
| 2026-03-18 | OmniVLN: Omnidirectional 3D Perception and Token-Efficient LLM Reasoning for Visual-Language Navigation across Air and Ground Platforms | Zhongyuang Liu et al. | 2603.17351 | null |
| 2026-03-18 | IEMAS: An Incentive-Efficiency Routing Framework for Open Agentic Web Ecosystems | Hongze Liu et al. | 2603.17302 | null |
| 2026-03-18 | The 1/W Law: An Analytical Study of Context-Length Routing Topology and GPU Generation Gains for LLM Inference Energy Efficiency | Huamin Chen et al. | 2603.17280 | null |
| 2026-03-17 | An End-to-End Framework for Functionality-Embedded Provenance Graph Construction and Threat Interpretation | Kushankur Ghosh et al. | 2603.17100 | null |
| 2026-03-17 | FleetOpt: Analytical Fleet Provisioning for LLM Inference with Compress-and-Route as Implementation Mechanism | Huamin Chen et al. | 2603.16514 | null |
| 2026-03-17 | Efficient LLM Serving for Agentic Workflows: A Data Systems Perspective | Noppanat Wadlom et al. | 2603.16104 | null |
| 2026-03-18 | Resource Consumption Threats in Large Language Models | Yuanhe Zhang et al. | 2603.16068 | null |
| 2026-03-17 | inference-fleet-sim: A Queueing-Theory-Grounded Fleet Capacity Planner for LLM Inference | Huamin Chen et al. | 2603.16054 | null |
| 2026-03-16 | BANGLASOCIALBENCH: A Benchmark for Evaluating Sociopragmatic and Cultural Alignment of LLMs in Bangladeshi Social Interaction | Tanvir Ahmed Sijan et al. | 2603.15949 | null |
| 2026-03-16 | SFCoT: Safer Chain-of-Thought via Active Safety Evaluation and Calibration | Yu Pan et al. | 2603.15397 | null |
| 2026-03-16 | SkipOPU: An FPGA-based Overlay Processor for Large Language Models with Dynamically Allocated Computation | Zicheng He et al. | 2603.14785 | null |
| 2026-03-16 | AgentTrace: Causal Graph Tracing for Root Cause Analysis in Deployed Multi-Agent Systems | Zhaohui Geoffrey Wang et al. | 2603.14688 | null |
| 2026-03-15 | Governing Dynamic Capabilities: Cryptographic Binding and Reproducibility Verification for AI Agent Tool Use | Ziling Zhou et al. | 2603.14332 | null |
| 2026-03-14 | SVD Contextual Sparsity Predictors for Fast LLM Inference | Georgii Serbin et al. | 2603.14110 | null |
| 2026-03-17 | APEX-Searcher: Augmenting LLMs’ Search Capabilities through Agentic Planning and Execution | Kun Chen et al. | 2603.13853 | null |
| 2026-03-14 | Retrieval-Feedback-Driven Distillation and Preference Alignment for Efficient LLM-based Query Expansion | Minghan Li et al. | 2603.13776 | null |
| 2026-03-13 | Orla: A Library for Serving LLM-Based Multi-Agent Systems | Rana Shahout et al. | 2603.13605 | null |
| 2026-03-13 | Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference | Huamin Chen et al. | 2603.13426 | null |
| 2026-03-17 | Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking | Zizhao Mo et al. | 2603.12831 | null |
| 2026-03-13 | Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation | Yichen Zhang et al. | 2603.12793 | null |
| 2026-03-13 | ToolTree: Efficient LLM Agent Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning | Shuo Yang et al. | 2603.12740 | null |
| 2026-03-13 | Cost-Efficient Multimodal LLM Inference via Cross-Tier GPU Heterogeneity | Donglin Yu et al. | 2603.12707 | null |
| 2026-03-13 | 98× Faster LLM Routing Without a Dedicated GPU: Flash Attention, Prompt Compression, and Near-Streaming for the vLLM Semantic Router | Xunzhuo Liu et al. | 2603.12646 | null |
| 2026-03-13 | When Drafts Evolve: Speculative Decoding Meets Online Learning | Yu-Yang Qian et al. | 2603.12617 | null |
| 2026-03-12 | TaxBreak: Unmasking the Hidden Costs of LLM Inference Through Overhead Decomposition | Prabhu Vellaisamy et al. | 2603.12465 | null |
| 2026-03-10 | Detecting Miscitation on the Scholarly Web through LLM-Augmented Text-Rich Graph Learning | Huidong Wu et al. | 2603.12290 | null |
| 2026-03-12 | IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL | Zhoujun Cheng et al. | 2603.12151 | null |
| 2026-03-12 | Where Matters More Than What: Decoding-aligned KV Cache Compression via Position-aware Pseudo Queries | Zhenxu Tian et al. | 2603.11564 | null |
| 2026-03-11 | Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI | Yonas Atinafu et al. | 2603.11340 | null |
| 2026-03-11 | Markovian Generation Chains in Large Language Models | Mingmeng Geng et al. | 2603.11228 | null |
| 2026-03-11 | Leech Lattice Vector Quantization for Efficient LLM Compression | Tycho F. A. van der Ouderaa et al. | 2603.11021 | null |
| 2026-03-11 | CacheSolidarity: Preventing Prefix Caching Side Channels in Multi-tenant LLM Serving Systems | Panagiotis Georgios Pennas et al. | 2603.10726 | null |
| 2026-03-11 | S-HPLB: Efficient LLM Attention Serving via Sparsity-Aware Head Parallelism Load Balance | Di Liu et al. | 2603.10353 | null |
| 2026-03-11 | MultiwayPAM: Multiway Partitioning Around Medoids for LLM-as-a-Judge Score Analysis | Chihiro Watanabe et al. | 2603.10287 | null |
| 2026-03-10 | ESAinsTOD: A Unified End-to-End Schema-Aware Instruction-Tuning Framework for Task-Oriented Dialog Modeling | Dechuan Teng et al. | 2603.09691 | null |
| 2026-03-10 | Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation | Luxi Lin et al. | 2603.09527 | null |
| 2026-03-10 | PIM-SHERPA: Software Method for On-device LLM Inference by Resolving PIM Memory Attribute and Layout Inconsistencies | Sunjung Lee et al. | 2603.09216 | null |
| 2026-03-10 | FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation | Yinpeng Wu et al. | 2603.09046 | null |
| 2026-03-09 | Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought Reasoning | Juming Xiong et al. | 2603.08999 | null |
| 2026-03-09 | ConFu: Contemplate the Future for Better Speculative Sampling | Zongyue Qin et al. | 2603.08899 | null |
| 2026-03-07 | Turn: A Language for Agentic Computation | Muyukani Kizito et al. | 2603.08755 | null |
| 2026-03-09 | SERQ: Saliency-Aware Low-Rank Error Reconstruction for LLM Quantization | Yeonsik Park et al. | 2603.08185 | null |
| 2026-03-09 | EAGLE-Pangu: Accelerator-Safe Tree Speculative Decoding on Ascend NPUs | Chang Han et al. | 2603.08088 | null |
| 2026-03-09 | Deterministic Differentiable Structured Pruning for Large Language Models | Weiyu Huang et al. | 2603.08065 | null |
| 2026-03-09 | DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention | Younjoo Lee et al. | 2603.08026 | null |
| 2026-03-09 | Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization | Jingwei Li et al. | 2603.08022 | null |
| 2026-03-09 | SageSched: Efficient LLM Scheduling Confronting Demand Uncertainty and Hybridity | Zhenghao Gan et al. | 2603.07917 | null |
| 2026-03-09 | Ares: Adaptive Reasoning Effort Selection for Efficient LLM Agents | Jingbo Yang et al. | 2603.07915 | null |
| 2026-03-08 | Temperature-Aware Scheduling of LLM Inference in Large-Scale Geo-Distributed Edge Data Centers with Distributed Optimization | Arash Khalatbarisoltani et al. | 2603.07810 | null |
| 2026-03-08 | ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs | Yuzhuang Xu et al. | 2603.07770 | null |
| 2026-03-06 | MoEless: Efficient MoE LLM Serving via Serverless Computing | Hanfei Yu et al. | 2603.06350 | null |
| 2026-03-06 | LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis | Tao Zhang et al. | 2603.05904 | null |
| 2026-03-06 | Confidence Before Answering: A Paradigm Shift for Efficient LLM Uncertainty Estimation | Changcheng Li et al. | 2603.05881 | null |
| 2026-03-05 | Parallelization Strategies for Dense LLM Deployment: Navigating Through Application-Specific Tradeoffs and Bottlenecks | Burak Topcu et al. | 2603.05692 | null |
| 2026-03-05 | POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation | Zeju Qiu et al. | 2603.05500 | null |
| 2026-03-05 | Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity | Di Zhang et al. | 2603.05168 | null |
| 2026-03-05 | Beyond the Context Window: A Cost-Performance Analysis of Fact-Based Memory vs. Long-Context LLMs for Persistent Agents | Natchanon Pollertlam et al. | 2603.04814 | null |
| 2026-03-05 | Hardware-Software Co-design for 3D-DRAM-based LLM Serving Accelerator | Cong Li et al. | 2603.04797 | null |
| 2026-03-05 | SLO-Aware Compute Resource Allocation for Prefill-Decode Disaggregated LLM Inference | Luchang Li et al. | 2603.04716 | null |
| 2026-03-04 | Agentics 2.0: Logical Transduction Algebra for Agentic Data Workflows | Alfio Massimiliano Gliozzo et al. | 2603.04241 | null |
| 2026-03-04 | A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality | Arther Tian et al. | 2603.04028 | null |
| 2026-03-03 | From Language to Action: Can LLM-Based Agents Be Used for Embodied Robot Cognition? | Shinas Shaji et al. | 2603.03148 | null |
| 2026-03-03 | SEALing the Gap: A Reference Framework for LLM Inference Carbon Estimation via Multi-Benchmark Driven Embodiment | Priyavanshi Pathania et al. | 2603.02949 | null |
| 2026-03-03 | Agentic Self-Evolutionary Replanning for Embodied Navigation | Guoliang Li et al. | 2603.02772 | null |
| 2026-03-03 | Ouroboros: Wafer-Scale SRAM CIM with Token-Grained Pipelining for Large Language Model Inference | Yiqi Liu et al. | 2603.02737 | null |
| 2026-03-03 | SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving | Sunghyeon Woo et al. | 2603.02599 | null |
| 2026-03-02 | Beyond Microservices: Testing Web-Scale RCA Methods on GPU-Driven LLM Workloads | Dominik Scheinert et al. | 2603.02057 | null |
| 2026-03-02 | Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning | Jiebin Zhang et al. | 2603.01639 | null |
| 2026-03-02 | Graph-Based Self-Healing Tool Routing for Cost-Efficient LLM Agents | Neeraj Bholani et al. | 2603.01548 | null |
| 2026-03-02 | Towards Privacy-Preserving LLM Inference via Collaborative Obfuscation (Technical Report) | Yu Lin et al. | 2603.01499 | null |
| 2026-03-02 | Agentic Multi-Source Grounding for Enhanced Query Intent Understanding: A DoorDash Case Study | Emmanuel Aboah Boateng et al. | 2603.01486 | null |
| 2026-03-02 | SFCo-Nav: Efficient Zero-Shot Visual Language Navigation via Collaboration of Slow LLM and Fast Attributed Graph Alignment | Chaoran Xiong et al. | 2603.01477 | null |
| 2026-03-02 | Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification | Guang Huang et al. | 2603.01399 | null |
| 2026-02-27 | Data Driven Optimization of GPU efficiency for Distributed LLM Adapter Serving | Ferran Agullo et al. | 2602.24044 | null |
| 2026-02-27 | LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding | Alexander Samarin et al. | 2602.23881 | null |
| 2026-02-27 | SLA-Aware Distributed LLM Inference Across Device-RAN-Cloud | Hariz Yet et al. | 2602.23722 | null |
| 2026-02-26 | Discourse-Aware Dual-Track Streaming Response for Low-Latency Spoken Dialogue Systems | Siyuan Liu et al. | 2602.23266 | null |
| 2026-02-26 | LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure | Jaehong Cho et al. | 2602.23036 | null |
| 2026-02-26 | Accelerating Local LLMs on Resource-Constrained Edge Devices via Distributed Prompt Caching | Hiroki Matsutani et al. | 2602.22812 | null |
| 2026-02-26 | Accelerating LLM Pre-Training through Flat-Direction Dynamics Enhancement | Shuchen Zhu et al. | 2602.22681 | null |
| 2026-02-26 | Compress the Easy, Explore the Hard: Difficulty-Aware Entropy Regularization for Efficient LLM Reasoning | Qin-Wen Luo et al. | 2602.22642 | null |
| 2026-03-02 | FLYING SERVING: On-the-Fly Parallelism Switching for Large Language Model Serving | Shouwei Gao et al. | 2602.22593 | null |
| 2026-02-25 | AutoQRA: Joint Optimization of Mixed-Precision Quantization and Low-rank Adapters for Efficient LLM Fine-Tuning | Changhai Zhou et al. | 2602.22268 | null |
| 2026-02-25 | Sustainable LLM Inference using Context-Aware Model Switching | Yuvarani et al. | 2602.22261 | null |
| 2026-02-25 | Small Wins Big: Comparing Large Language Models and Domain Fine-Tuned Models for Sarcasm Detection in Code-Mixed Hinglish Text | Bitan Majumder et al. | 2602.21933 | null |
| 2026-02-25 | Multi-Layer Scheduling for MoE-Based LLM Reasoning | Yifan Sun et al. | 2602.21626 | null |
| 2026-02-26 | DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference | Yongtong Wu et al. | 2602.21548 | null |
| 2026-02-25 | Pancake: Hierarchical Memory System for Multi-Agent LLM Serving | Zhengding Hu et al. | 2602.21477 | null |
| 2026-02-24 | SymTorch: A Framework for Symbolic Distillation of Deep Neural Networks | Elizabeth S. Z. Tan et al. | 2602.21307 | null |
| 2026-02-24 | ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments | Haley Li et al. | 2602.21140 | null |
| 2026-02-24 | CHESS: Context-aware Hierarchical Efficient Semantic Selection for Long-Context LLM Inference | Chao Fei et al. | 2602.20732 | null |
| 2026-02-24 | OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services | Longxiang Wang et al. | 2602.20595 | null |
| 2026-02-24 | FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill | Rakshith Jayanth et al. | 2602.20515 | null |
| 2026-02-23 | KnapSpec: Self-Speculative Decoding via Adaptive Layer Selection as a Knapsack Problem | Seongjin Cha et al. | 2602.20217 | null |
| 2026-02-21 | MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Elastic LLMs | Dongwei Wang et al. | 2602.20191 | null |
| 2026-02-23 | ISO-Bench: Can Coding Agents Optimize Real-World Inference Workloads? | Ayush Nangia et al. | 2602.19594 | null |
| 2026-02-22 | A Power Market Model with Hypersaclers and Modular Datacenters | Yihsu Chen et al. | 2602.19310 | null |
| 2026-02-22 | Scaling Inference-Time Computation via Opponent Simulation: Enabling Online Strategic Adaptation in Repeated Negotiation | Xiangyu Liu et al. | 2602.19309 | null |
| 2026-02-21 | WANSpec: Leveraging Global Compute Capacity for LLM Inference | Noah Martin et al. | 2602.18931 | null |
| 2026-02-25 | BiScale: Energy-Efficient Disaggregated LLM Serving via Phase-Aware Placement and DVFS | Omar Basit et al. | 2602.18755 | null |
| 2026-02-21 | HillInfer: Efficient Long-Context LLM Inference on the Edge with Hierarchical KV Eviction using SmartSSD | He Sun et al. | 2602.18750 | null |
| 2026-02-24 | RPU – A Reasoning Processing Unit | Matthew Adiletta et al. | 2602.18568 | null |
| 2026-02-20 | Dual-Tree LLM-Enhanced Negative Sampling for Implicit Collaborative Filtering | Jiayi Wu et al. | 2602.18249 | null |
| 2026-02-24 | MASPO: Unifying Gradient Utilization, Probability Mass, and Signal Reliability for Robust and Sample-Efficient LLM Reasoning | Xiaoliang Fu et al. | 2602.17550 | null |
| 2026-02-19 | Privacy-Preserving Mechanisms Enable Cheap Verifiable Inference of LLMs | Arka Pal et al. | 2602.17223 | null |
| 2026-02-18 | Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark | Charalampos Mastrokostas et al. | 2602.16811 | null |
| 2026-02-18 | Privacy-Aware Split Inference with Speculative Decoding for Large Language Models over Wide-Area Networks | Michael Cunningham et al. | 2602.16760 | null |
| 2026-02-18 | FlowPrefill: Decoupling Preemption from Prefill Scheduling Granularity to Mitigate Head-of-Line Blocking in LLM Serving | Chia-chi Hsieh et al. | 2602.16603 | null |
| 2026-02-18 | LLM-Driven Intent-Based Privacy-Aware Orchestration Across the Cloud-Edge Continuum | Zijie Su et al. | 2602.16100 | null |
| 2026-02-17 | CLAA: Cross-Layer Attention Aggregation for Accelerating LLM Prefill | Bradley McDanel et al. | 2602.16054 | null |
| 2026-02-17 | MoE-Spec: Expert Budgeting for Efficient Speculative Decoding | Bradley McDanel et al. | 2602.16052 | null |
| 2026-02-17 | Learning to Retrieve Navigable Candidates for Efficient Vision-and-Language Navigation | Shutian Gu et al. | 2602.15724 | null |
| 2026-02-17 | LLM-as-Judge on a Budget | Aadirupa Saha et al. | 2602.15481 | null |
| 2026-02-16 | Text Style Transfer with Parameter-efficient LLM Finetuning and Round-trip Translation | Ruoxi Liu et al. | 2602.15013 | null |
| 2026-02-16 | Efficient Multi-round LLM Inference over Disaggregated Serving | Wenhao He et al. | 2602.14516 | null |
| 2026-02-16 | WiSparse: Boosting LLM Inference Efficiency with Weight-Aware Mixed Activation Sparsity | Lei Chen et al. | 2602.14452 | null |
| 2026-02-15 | HiVid: LLM-Guided Video Saliency For Content-Aware VOD And Live Streaming | Jiahui Chen et al. | 2602.14214 | null |
| 2026-02-14 | ThunderAgent: A Simple, Fast and Program-Aware Agentic Inference System | Hao Kang et al. | 2602.13692 | null |
| 2026-02-13 | Characterize LSM-tree Compaction Performance via On-Device LLM Inference | Jiabiao Ding et al. | 2602.12669 | null |
| 2026-02-13 | Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats | Pengxiang Zhao et al. | 2602.12635 | null |
| 2026-02-13 | TensorCommitments: A Lightweight Verifiable Inference for Language Models | Oguzhan Baser et al. | 2602.12630 | null |
| 2026-02-12 | OServe: Accelerating LLM Serving via Spatial-Temporal Workload Orchestration | Youhe Jiang et al. | 2602.12151 | null |
| 2026-02-12 | PrefillShare: A Shared Prefill Module for KV Reuse in Multi-LLM Disaggregated Serving | Sunghyeon Woo et al. | 2602.12029 | null |
| 2026-02-12 | Predicting LLM Output Length via Entropy-Guided Representations | Huanyi Xie et al. | 2602.11812 | null |
| 2026-02-12 | Deep Kernel Fusion for Transformers | Zixi Zhang et al. | 2602.11808 | null |
| 2026-02-12 | GORGO: Maximizing KV-Cache Reuse While Minimizing Network Latency in Cross-Region LLM Load Balancing | Alessio Ricci Toniolo et al. | 2602.11688 | null |
| 2026-02-12 | LoRA-based Parameter-Efficient LLMs for Continuous Learning in Edge-based Malware Detection | Christian Rondanini et al. | 2602.11655 | null |
| 2026-02-12 | PASCAL: A Phase-Aware Scheduling Algorithm for Serving Reasoning-based Large Language Models | Eunyeong Cho et al. | 2602.11530 | null |
| 2026-02-12 | PAM: Processing Across Memory Hierarchy for Efficient KV-centric LLM Serving System | Lian Liu et al. | 2602.11521 | null |
| 2026-02-12 | Differentially Private and Communication Efficient Large Language Model Split Inference via Stochastic Quantization and Soft Prompt | Yujie Gu et al. | 2602.11513 | null |
| 2026-02-12 | Cachemir: Fully Homomorphic Encrypted Inference of Generative Large Language Model with KV Cache | Ye Yu et al. | 2602.11470 | null |
| 2026-02-12 | FormalJudge: A Neuro-Symbolic Paradigm for Agentic Oversight | Jiayi Zhou et al. | 2602.11136 | null |
| 2026-02-11 | Vulnerabilities in Partial TEE-Shielded LLM Inference with Precomputed Noise | Abhishek Saini et al. | 2602.11088 | null |
| 2026-02-11 | BOute: Cost-Efficient LLM Serving with Heterogeneous LLMs and GPUs via Multi-Objective Bayesian Optimization | Youhe Jiang et al. | 2602.10729 | null |
| 2026-02-12 | S-GRec: Personalized Semantic-Aware Generative Recommendation with Asymmetric Advantage | Jie Jiang et al. | 2602.10606 | null |
| 2026-02-12 | QTALE: Quantization-Robust Token-Adaptive Layer Execution for LLMs | Kanghyun Noh et al. | 2602.10431 | null |
| 2026-02-10 | Beyond SMILES: Evaluating Agentic Systems for Drug Discovery | Edward Wijaya et al. | 2602.10163 | null |
| 2026-02-12 | Internalizing Multi-Agent Reasoning for Accurate and Efficient LLM-based Recommendation | Yang Wu et al. | 2602.09829 | null |
| 2026-02-12 | Efficient Remote Prefix Fetching with GPU-native Media ASICs | Liang Mi et al. | 2602.09725 | null |
| 2026-02-10 | MATA: Multi-Agent Framework for Reliable and Flexible Table Question Answering | Sieun Hyeon et al. | 2602.09642 | null |
| 2026-02-10 | Rollout-Training Co-Design for Efficient LLM-Based Multi-Agent Reinforcement Learning | Zhida Jiang et al. | 2602.09578 | null |
| 2026-02-10 | LLM-CoOpt: A Co-Design and Optimization Framework for Efficient LLM Inference on Heterogeneous Platforms | Jie Kong et al. | 2602.09323 | null |
| 2026-02-09 | PABU: Progress-Aware Belief Update for Efficient LLM Agents | Haitao Jiang et al. | 2602.09138 | null |
| 2026-02-09 | Benchmarking the Energy Savings with Speculative Decoding Strategies | Rohit Dutta et al. | 2602.09113 | null |
| 2026-02-09 | FlattenGPT: Depth Compression for Transformer with Layer Flattening | Ruihan Xu et al. | 2602.08858 | null |
| 2026-02-09 | Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems | Lang Feng et al. | 2602.08847 | null |
| 2026-02-09 | QUOKA: Query-Oriented KV Selection For Efficient LLM Prefill | Dalton Jones et al. | 2602.08722 | null |
| 2026-02-09 | Near-Oracle KV Selection via Pre-hoc Sparsity for Long-Context Inference | Yifei Gao et al. | 2602.08329 | null |
| 2026-02-10 | Compiler-Assisted Speculative Sampling for Accelerated LLM Inference on Heterogeneous Edge Devices | Alejandro Ruiz y Mesa et al. | 2602.08060 | null |
| 2026-02-08 | Accuracy-Delay Trade-Off in LLM Offloading via Token-Level Uncertainty | Yumin Kim et al. | 2602.07958 | null |
| 2026-02-08 | MedCoG: Maximizing LLM Inference Density in Medical Reasoning via Meta-Cognitive Regulation | Yu Zhao et al. | 2602.07905 | null |
| 2026-02-08 | Rethinking Latency Denial-of-Service: Attacking the LLM Serving Framework, Not the Model | Tianyi Wang et al. | 2602.07878 | null |
| 2026-02-10 | ParisKV: Fast and Drift-Robust KV-Cache Retrieval for Long-Context LLMs | Yanlin Qi et al. | 2602.07721 | null |
| 2026-02-07 | A Two-Layer Framework for Joint Online Configuration Selection and Admission Control | Owen Shen et al. | 2602.07663 | null |
| 2026-02-07 | Scout Before You Attend: Sketch-and-Walk Sparse Attention for Efficient LLM Inference | Hoang Anh Duy Le et al. | 2602.07397 | null |
| 2026-02-07 | Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization | Chong Wang et al. | 2602.07306 | null |
| 2026-02-06 | SpecAttn: Co-Designing Sparse Attention with Self-Speculative Decoding | Yikang Yue et al. | 2602.07223 | null |
| 2026-02-06 | When RL Meets Adaptive Speculative Training: A Unified Training-Serving System | Junxiong Wang et al. | 2602.06932 | null |
| 2026-02-06 | DualMap: Enabling Both Cache Affinity and Load Balancing for Distributed LLM Serving | Ying Yuan et al. | 2602.06502 | null |
| 2026-02-06 | Do LLMs Act Like Rational Agents? Measuring Belief Coherence in Probabilistic Decision Making | Khurram Yamin et al. | 2602.06286 | null |
| 2026-02-06 | RoPE-LIME: RoPE-Space Locality + Sparse-K Sampling for Efficient LLM Attribution | Isaac Picov et al. | 2602.06275 | null |
| 2026-02-03 | PackInfer: Compute- and I/O-Efficient Attention for Batched LLM Inference | Rui Ning et al. | 2602.06072 | null |
| 2026-02-05 | Towards Green AI: Decoding the Energy of LLM Inference in Software Development | Lola Solovyeva et al. | 2602.05712 | null |
| 2026-02-05 | Determining Energy Efficiency Sweet Spots in Production LLM Inference | Hiari Pizzini Cavagna et al. | 2602.05695 | null |
| 2026-02-05 | Optimal Bayesian Stopping for Efficient Inference of Consistent LLM Answers | Jingkai Huang et al. | 2602.05395 | null |
| 2026-02-05 | RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs | Youngcheon You et al. | 2602.05367 | null |
| 2026-02-05 | TIDE: Temporal Incremental Draft Engine for Self-Improving LLM Inference | Jiyoung Park et al. | 2602.05145 | null |
| 2026-02-04 | GPU-to-Grid: Voltage Regulation via GPU Utilization Control | Zhirui Liang et al. | 2602.05116 | null |
| 2026-02-04 | LinGO: A Linguistic Graph Optimization Framework with LLMs for Interpreting Intents of Online Uncivil Discourse | Yuan Zhang et al. | 2602.04693 | null |
| 2026-02-04 | Harmonia: Algorithm-Hardware Co-Design for Memory- and Compute-Efficient BFP-based LLM Inference | Xinyu Wang et al. | 2602.04595 | null |
| 2026-02-04 | LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding | Gang Lin et al. | 2602.04541 | null |
| 2026-02-04 | Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning | Yansong Ning et al. | 2602.04284 | null |
| 2026-02-04 | BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models | Junyu Chen et al. | 2602.04163 | null |
| 2026-02-03 | MeKi: Memory-based Expert Knowledge Injection for Efficient LLM Scaling | Ning Ding et al. | 2602.03359 | null |
| 2026-02-03 | DynSplit-KV: Dynamic Semantic Splitting for KVCache Compression in Efficient Long-Context LLM Inference | Jiancai Ye et al. | 2602.03184 | null |
| 2026-02-03 | NLI:Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference | Jiangyong Yu et al. | 2602.02988 | null |
| 2026-02-03 | Large-Scale LLM Inference with Heterogeneous Workloads: Prefill-Decode Contention and Asymptotically Optimal Control | Ruihan Lin et al. | 2602.02987 | null |
| 2026-02-03 | 3D-Learning: Diffusion-Augmented Distributionally Robust Decision-Focused Learning | Jiaqi Wen et al. | 2602.02943 | null |
| 2026-02-02 | A Single Revision Step Improves Token-Efficient LLM Reasoning | Yingchuan Zhang et al. | 2602.02828 | null |
| 2026-02-02 | Trust by Design: Skill Profiles for Transparent, Cost-Aware LLM Routing | Mika Okamoto et al. | 2602.02386 | null |
| 2026-02-02 | Focus-dLLM: Accelerating Long-Context Diffusion LLM Inference via Confidence-Guided Context Focusing | Lingkun Long et al. | 2602.02159 | null |
| 2026-02-02 | Omni-Judge: Can Omni-LLMs Serve as Human-Aligned Judges for Text-Conditioned Audio-Video Generation? | Susan Liang et al. | 2602.01623 | null |
| 2026-02-01 | Predictive Scheduling for Efficient Inference-Time Reasoning in Large Language Models | Katrina Brown et al. | 2602.01237 | null |
| 2026-02-01 | Lotus: Efficient LLM Training by Randomized Low-Rank Gradient Projection with Adaptive Subspace Switching | Tianhao Miao et al. | 2602.01233 | null |
| 2026-02-01 | A State-Transition Framework for Efficient LLM Reasoning | Liang Zhang et al. | 2602.01198 | null |
| 2026-02-01 | ReLayout: Versatile and Structure-Preserving Design Layout Editing via Relation-Aware Design Reconstruction | Jiawei Lin et al. | 2602.01046 | null |
| 2026-02-01 | ESSAM: A Novel Competitive Evolution Strategies Approach to Reinforcement Learning for Memory Efficient LLMs Fine-Tuning | Zhishen Sun et al. | 2602.01003 | null |
| 2026-01-31 | Sparsity-Aware Unlearning for Large Language Models | Yuze Wang et al. | 2602.00577 | null |
| 2026-01-30 | Fast Forward: Accelerating LLM Prefill with Predictive FFN Sparsity | Aayush Gautam et al. | 2602.00397 | null |
| 2026-01-30 | Harvest: Opportunistic Peer-to-Peer GPU Caching for LLM Inference | Nikhil Gopal et al. | 2602.00328 | null |
| 2026-01-30 | EigenAI: Deterministic Inference, Verifiable Results | David Ribeiro Alves et al. | 2602.00182 | null |
| 2026-01-30 | Safer Policy Compliance with Dynamic Epistemic Fallback | Joseph Marvin Imperial et al. | 2601.23094 | null |
| 2026-01-30 | InstructDiff: Domain-Adaptive Data Selection via Differential Entropy for Efficient LLM Fine-Tuning | Junyou Su et al. | 2601.23006 | null |
| 2026-01-30 | Competitive Non-Clairvoyant KV-Cache Scheduling for LLM Inference | Yiding Feng et al. | 2601.22996 | null |
| 2026-01-30 | Matterhorn: Efficient Analog Sparse Spiking Transformer Architecture with Masked Time-To-First-Spike Encoding | Zhanglu Yan et al. | 2601.22876 | null |
| 2026-01-30 | OSNIP: Breaking the Privacy-Utility-Efficiency Trilemma in LLM Inference via Obfuscated Semantic Null Space | Zhiyuan Cao et al. | 2601.22752 | null |
| 2026-01-30 | CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control | Qiaoling Chen et al. | 2601.22705 | null |
| 2026-01-30 | Small is Beautiful: A Practical and Efficient Log Parsing Framework | Minxing Wang et al. | 2601.22590 | null |
| 2026-01-30 | SCaLRec: Semantic Calibration for LLM-enabled Cloud-Device Sequential Recommendation | Ruiqi Zheng et al. | 2601.22543 | null |
| 2026-01-30 | Towards Resiliency in Large Language Model Serving with KevlarFlow | Shangshu Qian et al. | 2601.22438 | null |
| 2026-01-29 | Understanding Efficiency: Quantization, Batching, and Serving Strategies in LLM Energy Use | Julien Delavande et al. | 2601.22362 | null |
| 2026-01-29 | Small Talk, Big Impact: The Energy Cost of Thanking AI | Julien Delavande et al. | 2601.22357 | null |
| 2026-01-29 | Causal Autoregressive Diffusion Language Model | Junhao Ruan et al. | 2601.22031 | null |
| 2026-01-29 | A Unified XAI-LLM Approach for EndotrachealSuctioning Activity Recognition | Hoang Khang Phan et al. | 2601.21802 | null |
| 2026-01-29 | EWSJF: An Adaptive Scheduler with Hybrid Partitioning for Mixed-Workload LLM Inference | Bronislav Sidik et al. | 2601.21758 | null |
2026-01-29 ScaleSim: Serving Large-Scale Multi-Agent Simulation with Invocation Distance-Based Memory Management Zaifeng Pan et.al. 2601.21473 null
2026-01-29 Theoretically Optimal Attention/FFN Ratios in Disaggregated LLM Serving Chendong Song et.al. 2601.21351 null
2026-01-29 Adaptive and Robust Cost-Aware Proof of Quality for Decentralized LLM Inference Networks Arther Tian et.al. 2601.21189 null
2026-01-28 ChunkWise LoRA: Adaptive Sequence Partitioning for Memory-Efficient Low-Rank Adaptation and Accelerated LLM Inference Ketan Thakkar et.al. 2601.21109 null
2026-01-29 ProfInfer: An eBPF-based Fine-Grained LLM Inference Profiler Bohua Zou et.al. 2601.20755 null
2026-01-29 DRAINCODE: Stealthy Energy Consumption Attacks on Retrieval-Augmented Code Generation via Context Poisoning Yanlin Wang et.al. 2601.20615 null
2026-01-28 TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs Minjae Lee et.al. 2601.20357 null
2026-01-28 Beyond Speedup – Utilizing KV Cache for Sampling and Reasoning Zeyu Xing et.al. 2601.20326 null
2026-01-28 SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips Jiahuan Yu et.al. 2601.20309 null
2026-01-28 LogSieve: Task-Aware CI Log Reduction for Sustainable LLM-Based Analysis Marcus Emmanuel Barnes et.al. 2601.20148 null
2026-01-27 Identifying and Transferring Reasoning-Critical Neurons: Improving LLM Inference Reliability via Activation Steering Fangan Dong et.al. 2601.19847 null
2026-01-27 Algorithmic Prompt-Augmentation for Efficient LLM-Based Heuristic Design for A* Search Thomas Bömer et.al. 2601.19622 null
2026-01-29 PROTEUS: SLA-Aware Routing via Lagrangian RL for Multi-LLM Serving Systems Amit Singh Bhatti et.al. 2601.19402 null
2026-01-27 DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference Fuliang Liu et.al. 2601.19278 null
2026-01-29 Native LLM and MLLM Inference at Scale on Apple Silicon Wayner Barrios et.al. 2601.19139 null
2026-01-26 Randomization Boosts KV Caching, Learning Balances Query Load: A Joint Perspective Fangzhou Wu et.al. 2601.18999 null
2026-01-26 Flatter Tokens are More Valuable for Speculative Draft Model Training Jiaming Fan et.al. 2601.18902 null
2026-01-26 Scaling up Privacy-Preserving ML: A CKKS Implementation of Llama-2-7B Jaiyoung Park et.al. 2601.18511 null
2026-01-26 CovertComBench: The First Domain-Specific Testbed for LLMs in Wireless Covert Communication Zhaozhi Liu et.al. 2601.18315 null
2026-01-26 FABLE: Forest-Based Adaptive Bi-Path LLM-Enhanced Retrieval for Multi-Document Reasoning Lin Sun et.al. 2601.18116 null
2026-01-25 A Universal Load Balancing Principle and Its Application to Large Language Model Serving Zixi Chen et.al. 2601.17855 null
2026-01-25 LLM-42: Enabling Determinism in LLM Inference with Verified Speculation Raja Gond et.al. 2601.17768 null
2026-01-25 Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction Jang-Hyun Kim et.al. 2601.17668 null
2026-01-24 GreenServ: Energy-Efficient Context-Aware Dynamic Routing for Multi-Model LLM Inference Thomas Ziller et.al. 2601.17551 null
2026-01-24 Latent-Space Contrastive Reinforcement Learning for Stable and Efficient LLM Reasoning Lianlei Shan et.al. 2601.17275 null
2026-01-22 FlexLLM: Composable HLS Library for Flexible Hybrid LLM Accelerator Design Jiahao Zhang et.al. 2601.15710 null
2026-01-21 Securing LLM-as-a-Service for Small Businesses: An Industry Case Study of a Distributed Chatbot Deployment Platform Jiazhu Xie et.al. 2601.15528 null
2026-01-21 MARS: Unleashing the Power of Speculative Decoding via Margin-Aware Verification Jingwei Song et.al. 2601.15498 null
2026-01-21 DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs Mingxuan Song et.al. 2601.14711 null
2026-01-21 QMC: Efficient SLM Edge Inference via Outlier-Aware Quantization and Emergent Memories Co-Design Nilesh Prasad Pandey et.al. 2601.14549 null
2026-01-20 Confident Rankings with Fewer Items: Adaptive LLM Evaluation with Continuous Scores Esma Balkır et.al. 2601.13885 null
2026-01-20 ELSA: Efficient LLM-Centric Split Aggregation for Privacy-Aware Hierarchical Federated Learning over Resource-Constrained Edge Networks Xiaohong Yang et.al. 2601.13824 null
2026-01-20 HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference Zhiyuan Shi et.al. 2601.13684 null
2026-01-20 PRIMAL: Processing-In-Memory Based Low-Rank Adaptation for LLM Inference Accelerator Yue Jiet Chong et.al. 2601.13628 null
2026-01-19 Explicit Cognitive Allocation: A Principle for Governed and Auditable Inference in Large Language Models Héctor Manuel Manzanilla-Granados et.al. 2601.13443 null
2026-01-19 Probe and Skip: Self-Predictive Token Skipping for Efficient Long-Context LLM Inference Zimeng Wu et.al. 2601.13155 null
2026-01-19 FastAV: Efficient Token Pruning for Audio-Visual Large Language Model Inference Chaeyoung Jung et.al. 2601.13143 null
2026-01-19 Sutradhara: An Intelligent Orchestrator-Engine Co-design for Tool-based Agentic Inference Anish Biswas et.al. 2601.12967 null
2026-01-19 From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation Jiahao Wang et.al. 2601.12904 null
2026-01-23 An Evolutionary Framework for Automatic Optimization Benchmark Generation via Large Language Models Yuhiro Ono et.al. 2601.12723 null
2026-01-18 Power Aware Dynamic Reallocation For Inference Yiwei Jiang et.al. 2601.12241 null
2026-01-16 RAPID-Serve: Resource-efficient and Accelerated P/D Intra-GPU Disaggregation Amna Masood et.al. 2601.11822 null
2026-01-16 PASTA: A Scalable Framework for Multi-Policy AI Compliance Evaluation Yu Yang et.al. 2601.11702 null
2026-01-16 HALO: Semantic-Aware Distributed LLM Inference in Lossy Edge Network Peirong Zheng et.al. 2601.11676 null
2026-01-15 WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching Xiangchen Li et.al. 2601.11652 null
2026-01-16 FORESTLLM: Large Language Models Make Random Forest Great on Few-shot Tabular Learning Zhihan Yang et.al. 2601.11311 null
2026-01-16 SwiftKV: An Edge-Oriented Attention Algorithm and Multi-Head Accelerator for Fast, Efficient LLM Decoding Junming Zhang et.al. 2601.10953 null
2026-01-15 Mugi: Value Level Parallelism For Efficient LLMs Daniel Price et.al. 2601.10823 null
2026-01-14 Private LLM Inference on Consumer Blackwell GPUs: A Practical Guide for Cost-Effective Local Deployment in SMEs Jonathan Knoop et.al. 2601.09527 null
2026-01-19 RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering Wencheng Ye et.al. 2601.09269 null
2026-01-14 LatencyPrism: Online Non-intrusive Latency Sculpting for SLO-Guaranteed LLM Inference Du Yin et.al. 2601.09258 null
2026-01-14 Evaluating local large language models for structured extraction from endometriosis-specific transvaginal ultrasound reports Haiyi Li et.al. 2601.09053 null
2026-01-13 HIPPO: Accelerating Video Large Language Models Inference via Holistic-aware Parallel Speculative Decoding Qitan Lv et.al. 2601.08273 null
2026-01-13 Coordinated Cooling and Compute Management for AI Datacenters Nardos Belay Abera et.al. 2601.08113 null
2026-01-13 Q-realign: Piggybacking Realignment on Quantization for Safe and Efficient LLM Deployment Qitao Tan et.al. 2601.08089 null
2026-01-12 Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference Rei Taniguchi et.al. 2601.07667 null
2026-01-12 ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs Haoqian Meng et.al. 2601.07475 null
2026-01-12 TALON: Confidence-Aware Speculative Decoding with Adaptive Token Trees Tianyu Liu et.al. 2601.07353 null
2026-01-12 Stochastic CHAOS: Why Deterministic Inference Kills, and Distributional Variability Is the Heartbeat of Artifical Cognition Tanmay Joshi et.al. 2601.07239 null
2026-01-11 MicLog: Towards Accurate and Efficient LLM-based Log Parsing via Progressive Meta In-Context Learning Jianbo Yu et.al. 2601.07005 null
2026-01-09 AIConfigurator: Lightning-Fast Configuration Optimization for Multi-Framework LLM Serving Tianhao Xu et.al. 2601.06288 null
2026-01-07 AutoVulnPHP: LLM-Powered Two-Stage PHP Vulnerability Detection and Automated Localization Zhiqiang Wang et.al. 2601.06177 null
2026-01-08 Publishing FAIR and Machine-actionable Reviews in Materials Science: The Case for Symbolic Knowledge in Neuro-symbolic Artificial Intelligence Jennifer D’Souza et.al. 2601.05051 null
2026-01-14 Challenges and Research Directions for Large Language Model Inference Hardware Xiaoyu Ma et.al. 2601.05047 null
2026-01-08 CuMA: Aligning LLMs with Sparse Cultural Values via Demographic-Aware Mixture of Adapters Ao Sun et.al. 2601.04885 null
2026-01-08 Revisiting Judge Decoding from First Principles via Training-Free Distributional Divergence Shengyin Sun et.al. 2601.04766 null
2026-01-08 GPU-Accelerated INT8 Quantization for KV Cache Compression in Large Language Models Maanas Taneja et.al. 2601.04719 null
2026-01-08 Prior-Informed Zeroth-Order Optimization with Adaptive Direction Alignment for Memory-Efficient LLM Fine-Tuning Feihu Jin et.al. 2601.04710 null
2026-01-07 XGrammar 2: Dynamic and Efficient Structured Generation Engine for Agentic LLMs Linzhang Li et.al. 2601.04426 null
2026-01-06 Ratio-Variance Regularized Policy Optimization for Efficient LLM Fine-tuning Yu Luo et.al. 2601.03320 null
2026-01-01 $α^3$-Bench: A Unified Benchmark of Safety, Robustness, and Efficiency for LLM-Based UAV Agents over 6G Networks Mohamed Amine Ferrag et.al. 2601.03281 null
2026-01-06 Joint Encoding of KV-Cache Blocks for Scalable LLM Serving Joseph Kampeas et.al. 2601.03067 null
2026-01-05 LoRA-Drop: Temporal LoRA Decoding for Efficient LLM Inference Hossein Rajabzadeh et.al. 2601.02569 null
2026-01-04 Structured Decomposition for LLM Reasoning: Cross-Domain Validation and Semantic Web Integration Albert Sadowski et.al. 2601.01609 null
2026-01-06 Making MoE-based LLM Inference Resilient with Tarragon Songyu Zhang et.al. 2601.01310 null
2026-01-08 From Policy to Logic for Efficient and Interpretable Coverage Assessment Rhitabrat Pokharel et.al. 2601.01266 null
2025-12-31 Universal Conditional Logic: A Formal Language for Prompt Engineering Anthony Mikinka et.al. 2601.00880 null
2026-01-02 HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts Zihan Fang et.al. 2601.00583 null
2026-01-01 Revati: Transparent GPU-Free Time-Warp Emulation for LLM Serving Amey Agrawal et.al. 2601.00397 null
2026-01-01 FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems Shanli Xing et.al. 2601.00227 null
2025-12-31 FPGA Co-Design for Efficient N:M Sparse and Quantized Model Inference Fen-Yu Hsieh et.al. 2512.24713 null
2026-01-04 Hardware Acceleration for Neural Networks: A Comprehensive Survey Bin Xu et.al. 2512.23914 null
2025-12-29 Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding Yue Guan et.al. 2512.23858 null
2025-12-25 Break Out the Silverware – Semantic Understanding of Stored Household Items Michaela Levi-Richter et.al. 2512.23739 null
2025-12-28 Viability and Performance of a Private LLM Server for SMBs: A Benchmark Analysis of Qwen3-30B on Consumer-Grade Hardware Alex Khalil et.al. 2512.23029 null
2025-12-28 Argus: Token Aware Distributed LLM Inference Optimization Panlong Wu et.al. 2512.22925 null
2025-12-27 Modality Inflation: Energy Characterization and Optimization Opportunities for MLLM Inference Mona Moghadampanah et.al. 2512.22695 null
2025-12-27 Nightjar: Dynamic Adaptive Speculative Decoding for Large Language Models Serving Rui Li et.al. 2512.22420 null
2025-12-22 Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs Xinhao Cheng et.al. 2512.22219 null
2025-12-20 MatKV: Trading Compute for Flash Storage in LLM Inference Kun-Woo Shin et.al. 2512.22195 null
2025-12-26 Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling Hannah Atmer et.al. 2512.22066 null
2025-12-26 Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models Tingyang Sun et.al. 2512.21884 null
2025-12-26 LIME:Accelerating Collaborative Lossless LLM Inference on Memory-Constrained Edge Devices Mingyu Sun et.al. 2512.21835 null
2025-12-25 nncase: An End-to-End Compiler for Efficient LLM Deployment on Heterogeneous Storage Architectures Hui Guo et.al. 2512.21571 null
2025-12-25 Leash: Adaptive Length Penalty and Reward Shaping for Efficient Large Reasoning Model Yanhao Li et.al. 2512.21540 null
2025-12-23 KnowVal: A Knowledge-Augmented and Value-Guided Autonomous Driving System Zhongyu Xia et.al. 2512.20299 null
2025-12-23 Predictive-LoRA: A Proactive and Fragmentation-Aware Serverless Inference System for LLMs Yinan Ni et.al. 2512.20210 null
2025-12-23 Concept Generalization in Humans and Large Language Models: Insights from the Number Game Arghavan Bazigaran et.al. 2512.20162 null
2025-12-22 Demystifying LLM-as-a-Judge: Analytically Tractable Model for Inference-Time Scaling Indranil Halder et.al. 2512.19905 null
2025-12-22 L4: Low-Latency and Load-Balanced LLM Serving via Length-Aware Scheduling Yitao Yuan et.al. 2512.19179 null
2025-12-22 FASTRIC: Prompt Specification Language for Verifiable LLM Interactions Wen-Long Jin et.al. 2512.18940 null
2025-12-20 LLM-based Few-Shot Early Rumor Detection with Imitation Agent Fengzhu Zeng et.al. 2512.18352 null
2025-12-20 TraCT: Disaggregated LLM Serving with CXL Shared Memory KV Cache at Rack-Scale Dongha Yoon et.al. 2512.18194 null
2025-12-20 Making Strong Error-Correcting Codes Work Effectively for HBM in AI Inference Rui Xie et.al. 2512.18152 null
2025-12-19 Specification and Detection of LLM Code Smells Brahim Mahmoudi et.al. 2512.18020 null
2025-12-19 CodeGEMM: A Codebook-Centric Approach to Efficient GEMM in Quantized LLMs Gunho Park et.al. 2512.17970 null
2025-12-19 Enabling Disaggregated Multi-Stage MLLM Inference via GPU-Internal Scheduling and Resource Sharing Lingxiao Zhao et.al. 2512.17574 null
2025-12-22 Learning What to Write: Write-Gated KV for Efficient Long-Context Inference Yen-Chieh Huang et.al. 2512.17452 null
2025-12-18 Taming the Memory Footprint Crisis: System Design for Production Diffusion LLM Serving Jiakun Fan et.al. 2512.17077 null
2025-12-18 MEPIC: Memory Efficient Position Independent Caching for LLM Serving Qian Wang et.al. 2512.16822 null
2025-12-18 Kascade: A Practical Sparse Attention Method for Long-Context LLM Inference Dhruv Deshmukh et.al. 2512.16391 null
2025-12-18 Design and Evaluation of Cost-Aware PoQ for Decentralized LLM Inference Arther Tian et.al. 2512.16317 null
2025-12-18 Fast Collaborative Inference via Distributed Speculative Decoding Ce Zheng et.al. 2512.16273 null
2025-12-18 Staggered Batch Scheduling: Co-optimizing Time-to-First-Token and Throughput for High-Efficiency LLM Inference Jian Tian et.al. 2512.16134 null
2025-12-18 WeMusic-Agent: Efficient Conversational Music Recommendation via Knowledge Internalization and Agentic Boundary Learning Wendong Bi et.al. 2512.16108 null
2025-12-19 LLM4Perf: Large Language Models Are Effective Samplers for Multi-Objective Performance Modeling Xin Wang et.al. 2512.16070 null
2025-12-18 MultiPath Transfer Engine: Breaking GPU and Host-Memory Bandwidth Bottlenecks in LLM Services Lingfeng Tang et.al. 2512.16056 null
2025-12-16 EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving Shaoting Feng et.al. 2512.14946 null
2025-12-16 Adaptive Cache Pollution Control for Large Language Model Inference Workloads Using Temporal CNN-Based Prediction and Priority-Aware Replacement Songze Liu et.al. 2512.14151 null
2025-12-16 RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees Junjie Ma et.al. 2512.14069 null
2025-12-16 MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning Haoyu Fu et.al. 2512.13636 null
2025-12-15 PROSERVE: Unified Multi-Priority Request Scheduling for LLM Serving Weizhe Huang et.al. 2512.12928 null
2025-12-14 Counting Clues: A Lightweight Probabilistic Baseline Can Match an LLM Furong Jia et.al. 2512.12868 null
2025-12-14 Fault-Tolerant Sandboxing for AI Coding Agents: A Transactional Approach to Safe Autonomous Execution Boyang Yan et.al. 2512.12806 null
2025-12-14 Fine-Grained Energy Prediction For Parallellized LLM Inference With PIE-P Anurag Dutt et.al. 2512.12801 null
2025-12-19 V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval Donghyuk Kim et.al. 2512.12284 null
2025-12-13 WATOS: Efficient LLM Training Strategies and Architecture Co-exploration for Wafer-scale Chip Huizheng Wang et.al. 2512.12279 null
2025-12-12 Learning to Extract Context for Context-Aware LLM Inference Minseon Kim et.al. 2512.11986 null
2025-12-11 CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving Dong Liu et.al. 2512.11920 null
2025-12-12 PD-Swap: Prefill-Decode Logic Swapping for End-to-End LLM Inference on Edge FPGAs via Dynamic Partial Reconfiguration Yifan Zhang et.al. 2512.11550 null
2025-12-12 xGR: Efficient Generative Recommendation Serving at Scale Qingxiao Sun et.al. 2512.11529 null
2025-12-12 AdaSD: Adaptive Speculative Decoding for Efficient Language Model Inference Kuan-Wei Lu et.al. 2512.11280 null
2025-12-12 Adaptive Soft Rolling KV Freeze with Entropy-Guided Recovery: Sublinear Memory Growth for Efficient LLM Inference Adilet Metinov et.al. 2512.11221 null
2025-12-11 ESS: An Offload-Centric Latent-Cache Management Architecture for DeepSeek-V3.2-Exp Xinhang Chen et.al. 2512.10576 null
2025-12-11 LLM-Auction: Generative Auction towards LLM-Native Advertising Chujie Zhao et.al. 2512.10551 null
2025-12-12 BAMBO: Construct Ability and Efficiency LLM Pareto Set via Bayesian Adaptive Multi-objective Block-wise Optimization Kesheng Chen et.al. 2512.09972 null
2025-12-10 GoodSpeed: Optimizing Fair Goodput with Adaptive Speculative Decoding in Distributed Edge Inference Phuong Tran et.al. 2512.09963 null
2025-12-07 ELANA: A Simple Energy and Latency Analyzer for LLMs Hung-Yueh Chiang et.al. 2512.09946 null
2025-12-11 Exqutor: Extended Query Optimizer for Vector-augmented Analytical Queries Hyunjoon Kim et.al. 2512.09695 null
2025-12-10 WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving Chiheng Lou et.al. 2512.09472 null
2025-12-10 ODMA: On-Demand Memory Allocation Framework for LLM Serving on LPDDR-Class Accelerators Guoqiang Zou et.al. 2512.09427 null
2025-12-10 RACAM: Enhancing DRAM with Reuse-Aware Computation and Automated Mapping for ML Inference Siyuan Ma et.al. 2512.09304 null
2025-12-09 LaMoSys3.5D: Enabling 3.5D-IC-Based Large Language Model Inference Serving Systems via Hardware/Software Co-Design Qipan Wang et.al. 2512.08731 null
2025-12-09 Magneton: Optimizing Energy Efficiency of ML Systems via Differential Energy Debugging Yi Pan et.al. 2512.08365 null
2025-12-08 LUNE: Efficient LLM Unlearning via LoRA Fine-Tuning with Negative Examples Yezi Liu et.al. 2512.07375 null
2025-12-08 Recover-to-Forget: Gradient Reconstruction from LoRA for Efficient LLM Unlearning Yezi Liu et.al. 2512.07374 null
2025-12-08 NeSTR: A Neuro-Symbolic Abductive Framework for Temporal Reasoning in Large Language Models Feng Liang et.al. 2512.07218 null
2025-12-08 FOAM: Blocked State Folding for Memory-Efficient LLM Training Ziqing Wen et.al. 2512.07112 null
2025-12-08 Leveraging KV Similarity for Online Structured Pruning in LLMs Jungmin Lee et.al. 2512.07090 null
2025-12-11 LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding Yu Yu et.al. 2512.06982 null
2025-12-07 PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance Jifar Wakuma Ayana et.al. 2512.06747 null
2025-12-07 KV-CAR: KV Cache Compression using Autoencoders and KV Reuse in Large Language Models Sourjya Roy et.al. 2512.06727 null
2025-12-06 Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices Xiangyu Li et.al. 2512.06443 null
2025-12-05 Compass: Mapping Space Exploration for Multi-Chiplet Accelerators Targeting LLM Inference Serving Workloads Boyu Li et.al. 2512.06093 null
2025-12-05 MaxShapley: Towards Incentive-compatible Generative Search with Fair Context Attribution Sara Patel et.al. 2512.05958 null
2025-12-05 KQ-SVD: Compressing the KV Cache with Provable Guarantees on Attention Fidelity Damien Lesens et.al. 2512.05916 null
2025-12-05 From Text to Returns: Using Large Language Models for Mutual Fund Portfolio Optimization and Risk-Adjusted Allocation Abrar Hossain Mufakir Qamar Ansari Haziq Jeelani Monia Digra Fayeq Jeelani Syed et.al. 2512.05907 null
2025-12-05 Optimizing Medical Question-Answering Systems: A Comparative Study of Fine-Tuned and Zero-Shot Large Language Models with RAG Framework Tasnimul Hassan et.al. 2512.05863 null
2025-12-05 Capturing Classic Authorial Style in Long-Form Story Generation with GRPO Fine-Tuning Jinlong Liu et.al. 2512.05747 null
2025-12-05 Distilling Expert Surgical Knowledge: How to train local surgical VLMs for anatomy explanation in Complete Mesocolic Excision Lennart Maack et.al. 2512.05740 null
2025-12-05 Efficient Text Classification with Conformal In-Context Learning Ippokratis Pantelidis et.al. 2512.05732 null
2025-12-05 LA-RL: Language Action-guided Reinforcement Learning with Safety Guarantees for Autonomous Highway Driving Yiming Shu et.al. 2512.05686 null
2025-12-05 A Greek Government Decisions Dataset for Public-Sector Analysis and Insight Giorgos Antoniou et.al. 2512.05647 null
2025-12-05 ProPhy: Progressive Physical Alignment for Dynamic World Simulation Zijun Wang et.al. 2512.05564 null
2025-12-05 Conscious Gaze: Adaptive Attention Mechanisms for Hallucination Mitigation in Vision-Language Models Weijue Bu et.al. 2512.05546 null
2025-12-05 RoBoN: Routed Online Best-of-n for Test-Time Scaling with Multiple LLMs Jonathan Geuter et.al. 2512.05542 null
2025-12-05 Automated Identification of Incidentalomas Requiring Follow-Up: A Multi-Anatomy Evaluation of LLM-Based and Supervised Approaches Namu Park et.al. 2512.05537 null
2025-12-05 Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement Nils Strassenburg et.al. 2512.05525 null
2025-12-05 Know-Show: Benchmarking Video-Language Models on Spatio-Temporal Grounded Reasoning Chinthani Sugandhika et.al. 2512.05513 null
2025-12-05 A Hybrid Approach for EMF Code Generation:Code Templates Meet Large Language Models Xiao He et.al. 2512.05498 null
2025-12-05 Knowing Your Uncertainty – On the application of LLM in social sciences Bolun Zhang et.al. 2512.05461 null
2025-12-05 BEAVER: An Efficient Deterministic LLM Verifier Tarun Suresh et.al. 2512.05439 null
2025-12-05 A Systematic Framework for Enterprise Knowledge Retrieval: Leveraging LLM-Generated Metadata to Enhance RAG Systems Pranav Pushkar Mishra et.al. 2512.05411 null
2025-12-05 SQ-format: A Unified Sparse-Quantized Hardware-friendly Data Format for LLMs Ruixuan Huang et.al. 2512.05409 null
2025-12-04 Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning Purbesh Mitra et.al. 2512.05105 null
2025-12-04 David vs. Goliath: Can Small Models Win Big with Agentic AI in Hardware Design? Shashwat Shankar et.al. 2512.05073 null
2025-12-04 Arbitrage: Efficient Reasoning via Advantage-Aware Speculation Monishwaran Maheswaran et.al. 2512.05033 null
2025-12-04 SEAL: Self-Evolving Agentic Learning for Conversational Question Answering over Knowledge Graphs Hao Wang et.al. 2512.04868 null
2025-12-04 Model-Based and Sample-Efficient AI-Assisted Math Discovery in Sphere Packing Rasul Tutunov et.al. 2512.04829 null
2025-12-04 MemLoRA: Distilling Expert Adapters for On-Device Memory Systems Massimo Bini et.al. 2512.04763 null
2025-12-04 EtCon: Edit-then-Consolidate for Reliable Knowledge Editing Ruilin Li et.al. 2512.04753 null
2025-12-04 RLHFSpec: Breaking the Efficiency Bottleneck in RLHF Training via Adaptive Drafting Siqi Wang et.al. 2512.04752 null
2025-12-04 Model Whisper: Steering Vectors Unlock Large Language Models’ Potential in Test-time Xinyue Kang et.al. 2512.04748 null
2025-12-04 SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs Wenhua Cheng et.al. 2512.04746 null
2025-12-04 OsmT: Bridging OpenStreetMap Queries and Natural Language with Open-source Tag-aware Language Models Zhuoyue Wan et.al. 2512.04738 null
2025-12-04 Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild Yigui Feng et.al. 2512.04728 null
2025-12-04 TRINITY: An Evolved LLM Coordinator Jinglue Xu et.al. 2512.04695 null
2025-12-04 Towards Ethical Multi-Agent Systems of Large Language Models: A Mechanistic Interpretability Perspective Jae Hee Lee et.al. 2512.04691 null
2025-12-04 PBFuzz: Agentic Directed Fuzzing for PoV Generation Haochen Zeng et.al. 2512.04611 null
2025-12-04 Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space Joey Hong et.al. 2512.04601 null
2025-12-04 A Light-Weight Large Language Model File Format for Highly-Secure Model Distribution Huifeng Zhu et.al. 2512.04580 null
2025-12-04 On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference Yue Yu et.al. 2512.04558 null
2025-12-04 AdmTree: Compressing Lengthy Context with Adaptive Semantic Trees Yangning Li et.al. 2512.04550 null
2025-12-04 EvoEdit: Lifelong Free-Text Knowledge Editing through Latent Perturbation Augmentation and Knowledge-driven Parameter Fusion Pengfei Cao et.al. 2512.04545 null
2025-12-04 LLM-SrcLog: Towards Proactive and Unified Log Template Extraction via Large Language Models Jiaqi Sun et.al. 2512.04474 null
2025-12-03 Large Language Models for Limited Noisy Data: A Gravitational Wave Identification Study Yixuan Li et.al. 2512.04031 null
2025-12-03 AugServe: Adaptive Request Scheduling for Augmented Large Language Model Inference Serving Ying Wang et.al. 2512.04013 null
2025-12-03 Training-Free Policy Violation Detection via Activation-Space Whitening in LLMs Oren Rachmil et.al. 2512.03994 null
2025-12-03 Sponsored Questions and How to Auction Them Kshipra Bhawalkar et.al. 2512.03975 null
2025-12-03 OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference Liujianfu Wang et.al. 2512.03927 null
2025-12-03 UniMo: Unifying 2D Video and 3D Human Motion with an Autoregressive Framework Youxin Pang et.al. 2512.03918 null
2025-12-03 Reconstructing KV Caches with Cross-layer Fusion For Enhanced Transformers Hongzhan Lin et.al. 2512.03870 null
2025-12-03 Training and Evaluation of Guideline-Based Medical Reasoning in LLMs Michael Staniek et.al. 2512.03838 null
2025-12-03 Log Probability Tracking of LLM APIs Timothée Chauvin et.al. 2512.03816 null
2025-12-03 Enhancing Instruction-Following Capabilities in Seq2Seq Models: DoLA Adaptations for T5 Huey Sun et.al. 2512.03803 null
2025-12-03 RoCo: Role-Based LLMs Collaboration for Automatic Heuristic Design Jiawei Xu et.al. 2512.03762 null
2025-12-03 AR-Med: Automated Relevance Enhancement in Medical Search via LLM-Driven Information Augmentation Chuyue Wang et.al. 2512.03737 null
2025-12-03 Tutorial on Large Language Model-Enhanced Reinforcement Learning for Wireless Networks Lingyi Cai et.al. 2512.03722 null
2025-12-03 Knowing oneself with and through AI: From self-tracking to chatbots Lucy Osler et.al. 2512.03682 null
2025-12-03 ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers Feice Huang et.al. 2512.03673 null
2025-12-03 Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning Ge-Peng Ji et.al. 2512.03667 null
2025-12-03 FFTrainer: Fast Failover in Large-Language Model Training with Almost-Free State Management Bohan Zhao et.al. 2512.03644 null
2025-12-03 KVNAND: Efficient On-Device Large Language Model Inference Using DRAM-Free In-Flash Computing Lishuo Deng et.al. 2512.03608 null
2025-12-03 EnCompass: Enhancing Agent Programming with Search Over Program Execution Paths Zhening Li et.al. 2512.03571 null
2025-12-03 State Space Models for Bioacoustics: A comparative Evaluation with Transformers Chengyu Tang et.al. 2512.03563 null
2025-12-03 TokenScale: Timely and Accurate Autoscaling for Disaggregated LLM Serving with Token Velocity Ruiqi Lai et.al. 2512.03416 null
2025-12-03 Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs Ngoc Bui et.al. 2512.03324 null
2025-12-02 LORE: A Large Generative Model for Search Relevance Chenji Lu et.al. 2512.03025 null
2025-12-02 TokenPowerBench: Benchmarking the Power Consumption of LLM Inference Chenxu Niu et.al. 2512.03024 null
2025-12-02 Distribution-Calibrated Inference time compute for Thinking LLM-as-a-Judge Hamid Dadkhahi et.al. 2512.03019 null
2025-12-02 From Moderation to Mediation: Can LLMs Serve as Mediators in Online Flame Wars? Dawei Li et.al. 2512.03005 null
2025-12-02 FAIRY2I: Universal Extremely-Low Bit QAT framework via Widely-Linear Representation and Phase-Aware Quantization Feiyu Wang et.al. 2512.02901 null
2025-12-02 MindGPT-4ov: An Enhanced MLLM via a Multi-Stage Post-Training Paradigm Wei Chen et.al. 2512.02895 null
2025-12-02 OptPO: Optimal Rollout Allocation for Test-time Policy Optimization Youkang Wang et.al. 2512.02882 null
2025-12-02 Network Self-Configuration based on Fine-Tuned Small Language Models Oscar G. Lira et.al. 2512.02861 null
2025-12-02 GraphMatch: Fusing Language and Graph Representations in a Dynamic Two-Sided Work Marketplace Mikołaj Sacha et.al. 2512.02849 null
2025-12-02 Cross-Lingual Prompt Steerability: Towards Accurate and Robust LLM Behavior across Languages Lechen Zhang et.al. 2512.02841 null
2025-12-02 Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach Siyuan Yang et.al. 2512.02834 null
2025-12-02 A Comparative Study on How Data Normalization Affects Zero-Shot Generalization in Time Series Foundation Models Ihab Ahmed et.al. 2512.02833 null
2025-12-02 Phase-Adaptive LLM Framework with Multi-Stage Validation for Construction Robot Task Allocation: A Systematic Benchmark Against Traditional Optimization Algorithms Shyam prasad reddy Kaitha et.al. 2512.02810 null
2025-12-02 FiMMIA: scaling semantic perturbation-based membership inference across modalities Anton Emelyanov et.al. 2512.02786 null
2025-12-02 PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models Robert Belanec et.al. 2512.02764 null
2025-12-02 RoboWheel: A Data Engine from Real-World Human Demonstrations for Cross-Embodiment Robotic Learning Yuhong Zhang et.al. 2512.02729 null
2025-12-02 AuditCopilot: Leveraging LLMs for Fraud Detection in Double-Entry Bookkeeping Md Abdul Kadir et.al. 2512.02726 null
2025-12-02 Emergent Bayesian Behaviour and Optimal Cue Combination in LLMs Julian Ma et.al. 2512.02719 null
2025-12-02 CREST: Universal Safety Guardrails Through Cluster-Guided Cross-Lingual Transfer Lavish Bansal et.al. 2512.02711 null
2025-12-02 VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm Zhenkai Wu et.al. 2512.02700 null
2025-12-01 Trinity: Disaggregating Vector Search from Prefill-Decode Disaggregation in LLM Serving Yi Liu et.al. 2512.02281 null
2025-12-01 Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling Jack Cook et.al. 2512.02010 link
2025-12-01 The Art of Scaling Test-Time Compute for Large Language Models Aradhye Agarwal et.al. 2512.02008 null
2025-12-01 Low-Rank Prehab: Preparing Neural Networks for SVD Compression Haoran Qin et.al. 2512.01980 link
2025-12-01 KV Pareto: Systems-Level Optimization of KV Cache and Model Compression for Long Context Inference Sai Gokhale et.al. 2512.01953 null
2025-12-01 Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models Zhongyu Yang et.al. 2512.01949 null
2025-12-01 Agentic Policy Optimization via Instruction-Policy Co-Evolution Han Zhou et.al. 2512.01945 link
2025-12-01 An Empirical Study of Agent Developer Practices in AI Agent Frameworks Yanlin Wang et.al. 2512.01939 null
2025-12-01 Med-VCD: Mitigating Hallucination for Medical Large Vision Language Models through Visual Contrastive Decoding Zahra Mahdavi et.al. 2512.01922 null
2025-12-01 Latent Debate: A Surrogate Framework for Interpreting LLM Thinking Lihu Chen et.al. 2512.01909 null
2025-12-01 CauSight: Learning to Supersense for Visual Causal Discovery Yize Zhang et.al. 2512.01827 null
2025-12-01 Generating REST API Tests With Descriptive Names Philip Garrett et.al. 2512.01690 null
2025-12-01 DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models Patrick Kwon et.al. 2512.01686 null
2025-12-01 A Systematic Characterization of LLM Inference on GPUs Haonan Wang et.al. 2512.01644 null
2025-12-01 Agent-Kernel: A MicroKernel Multi-Agent System Framework for Adaptive Social Simulation Powered by LLMs Yuren Mao et.al. 2512.01610 null
2025-12-01 LLM2Fx-Tools: Tool Calling For Music Post-Production Seungheon Doh et.al. 2512.01559 null
2025-12-01 LPCD: Unified Framework from Layer-Wise to Submodule Quantization Yuma Ichikawa et.al. 2512.01546 null
2025-12-01 MCAT: Scaling Many-to-Many Speech-to-Text Translation with MLLMs to 70 Languages Yexing Du et.al. 2512.01512 null
2025-12-01 Multi-Path Collaborative Reasoning via Reinforcement Learning Jindi Lv et.al. 2512.01485 null
2025-12-01 ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation Rohin Manvi et.al. 2512.01457 null
2025-12-01 ViRectify: A Challenging Benchmark for Video Reasoning Correction with Multimodal Large Language Models Xusen Hei et.al. 2512.01424 null
2025-11-30 SIMPLE: Disaggregating Sampling from GPU Inference into a Decision Plane for Faster Distributed LLM Serving Bohan Zhao et.al. 2512.00719 null
2025-11-29 Efficient Kernel Mapping and Comprehensive System Evaluation of LLM Acceleration on a CGLA Takuto Ando et.al. 2512.00335 null
2025-11-28 Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction Bao Shu et.al. 2511.23476 null
2025-11-28 ThetaEvolve: Test-time Learning on Open Problems Yiping Wang et.al. 2511.23473 link
2025-11-28 Towards Continuous Intelligence Growth: Self-Training, Continual Learning, and Dual-Scale Memory in SuperIntelliAgent Jianzhe Lin et.al. 2511.23436 null
2025-11-28 Hierarchical AI-Meteorologist: LLM-Agent System for Multi-Scale and Explainable Weather Forecast Reporting Daniil Sukhorukov et.al. 2511.23387 null
2025-11-28 Do LLM-judges Align with Human Relevance in Cranfield-style Recommender Evaluation? Gustavo Penha et.al. 2511.23312 null
2025-11-28 MCP vs RAG vs NLWeb vs HTML: A Comparison of the Effectiveness and Efficiency of Different Agent Interfaces to the Web (Technical Report) Aaron Steiner et.al. 2511.23281 null
2025-11-28 Behavior-Equivalent Token: Single-Token Replacement for Long Prompts in LLMs Jiancheng Dong et.al. 2511.23271 null
2025-11-28 Unlocking Multilingual Reasoning Capability of LLMs and LVLMs through Representation Engineering Qiming Li et.al. 2511.23231 null
2025-11-28 Instruction Tuning of Large Language Models for Tabular Data Generation-in One Day Milad Abdollahzadeh et.al. 2511.23220 null
2025-11-28 Obstruction reasoning for robotic grasping Runyu Jiao et.al. 2511.23186 null
2025-11-28 HPSU: A Benchmark for Human-Level Perception in Real-World Spoken Speech Understanding Chen Li et.al. 2511.23178 null
2025-11-28 Multi-chain Graph Refinement and Selection for Reliable Reasoning in Large Language Models Yujiao Yang et.al. 2511.23136 null
2025-11-28 Evolutionary Discovery of Heuristic Policies for Traffic Signal Control Ruibing Wang et.al. 2511.23122 null
2025-11-28 Dripper: Token-Efficient Main HTML Extraction with a Lightweight LM Mengjie Liu et.al. 2511.23119 null
2025-11-28 Conveying Imagistic Thinking in TCM Translation: A Prompt Engineering and LLM-Based Evaluation Framework Jiatong Han et.al. 2511.23059 null
2025-11-28 Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match Jinze Li et.al. 2511.22972 null
2025-11-28 Experts are all you need: A Composable Framework for Large Language Model Inference Shrihari Sridharan et.al. 2511.22955 null
2025-11-28 Visual Puns from Idioms: An Iterative LLM-T2IM-MLLM Framework Kelaiti Xiao et.al. 2511.22943 null
2025-11-28 RAG-Empowered LLM-Driven Dynamic Radio Resource Management in Open 6G RAN Onur Salan et.al. 2511.22933 null
2025-11-28 AgentShield: Make MAS more secure and efficient Kaixiang Wang et.al. 2511.22924 null
2025-11-28 Serving Heterogeneous LoRA Adapters in Distributed LLM Inference Systems Shashwat Jaiswal et.al. 2511.22880 null
2025-11-27 PRISM: Privacy-Aware Routing for Adaptive Cloud-Edge LLM Inference via Semantic Sketch Collaboration Junfei Zhan et.al. 2511.22788 null
2025-11-26 Matrix: Peer-to-Peer Multi-Agent Synthetic Data Generation Framework Dong Wang et.al. 2511.21686 null
2025-11-26 DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving Fengze Yu et.al. 2511.21669 null
2025-11-26 TAGFN: A Text-Attributed Graph Dataset for Fake News Detection in the Age of LLMs Kay Liu et.al. 2511.21624 null
2025-11-26 Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining Dongyang Fan et.al. 2511.21613 null
2025-11-26 Auxiliary Metrics Help Decoding Skill Neurons in the Wild Yixiu Zhao et.al. 2511.21610 null
2025-11-26 SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition Peiran Xu et.al. 2511.21471 null
2025-11-26 MADRA: Multi-Agent Debate for Risk-Aware Embodied Planning Junjian Wang et.al. 2511.21460 null
2025-11-26 A Systematic Study of Model Merging Techniques in Large Language Models Oğuz Kağan Hitit et.al. 2511.21437 null
2025-11-26 Automated Dynamic AI Inference Scaling on HPC-Infrastructure: Integrating Kubernetes, Slurm and vLLM Tim Trappen et.al. 2511.21413 null
2025-11-26 Prune4Web: DOM Tree Pruning Programming for Web Agent Jiayuan Zhang et.al. 2511.21398 null
2025-11-26 PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark Robert Belanec et.al. 2511.21285 null
2025-11-26 Multi-Reward GRPO for Stable and Prosodic Single-Codebook TTS LLMs at Scale Yicheng Zhong et.al. 2511.21270 null
2025-11-26 Can Finetuing LLMs on Small Human Samples Increase Heterogeneity, Alignment, and Belief-Action Coherence? Steven Wang et.al. 2511.21218 null
2025-11-26 Progress by Pieces: Test-Time Scaling for Autoregressive Image Generation Joonhyung Park et.al. 2511.21185 null
2025-11-26 How to Correctly Report LLM-as-a-Judge Evaluations Chungpa Lee et.al. 2511.21140 null
2025-11-26 Beyond Patch Aggregation: 3-Pass Pyramid Indexing for Vision-Enhanced Document Retrieval Anup Roy et.al. 2511.21121 null
2025-11-26 BRIDGE: Building Representations In Domain Guided Program Verification Robert Joseph George et.al. 2511.21104 null
2025-11-26 MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts Ivan Novikov et.al. 2511.21089 null
2025-11-26 5G Network Automation Using Local Large Language Models and Retrieval-Augmented Generation Ahmadreza Majlesara et.al. 2511.21084 null
2025-11-26 Aligning LLMs with Biomedical Knowledge using Balanced Fine-Tuning Zhenchao Tang et.al. 2511.21075 null
2025-11-25 LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight Yunze Man et.al. 2511.20648 null
2025-11-25 Latent Collaboration in Multi-Agent Systems Jiaru Zou et.al. 2511.20639 link
2025-11-25 ROOT: Robust Orthogonalized Optimizer for Neural Network Training Wei He et.al. 2511.20626 null
2025-11-25 Copyright Detection in Large Language Models: An Ethical Approach to Generative AI Development David Szczecina et.al. 2511.20623 null
2025-11-25 DiFR: Inference Verification Despite Nondeterminism Adam Karvonen et.al. 2511.20621 null
2025-11-25 Translating Large-Scale C Repositories to Idiomatic Rust Saman Dehghan et.al. 2511.20617 null
2025-11-25 Beyond Generation: Multi-Hop Reasoning for Factual Accuracy in Vision-Language Models Shamima Hossain et.al. 2511.20531 null
2025-11-25 Assessing LLMs’ Performance: Insights from the Chinese Pharmacist Exam Xinran Wang et.al. 2511.20526 null
2025-11-25 HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation Xiang Wang et.al. 2511.20520 null
2025-11-25 Soft Adaptive Policy Optimization Chang Gao et.al. 2511.20347 null
2025-11-25 The Curious Case of Analogies: Investigating Analogical Reasoning in Large Language Models Taewhoo Lee et.al. 2511.20344 null
2025-11-25 Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios Luohe Shi et.al. 2511.20340 null
2025-11-25 Improving Language Agents through BREW Shashank Kirtania et.al. 2511.20297 null
2025-11-25 APT-CGLP: Advanced Persistent Threat Hunting via Contrastive Graph-Language Pre-Training Xuebo Qiu et.al. 2511.20290 null
2025-11-25 SMoG: Schema Matching on Graph Mingyu Jeon et.al. 2511.20285 null
2025-11-25 Bootstrapping Physics-Grounded Video Generation through VLM-Guided Iterative Self-Refinement Yang Liu et.al. 2511.20280 null
2025-11-25 HVAdam: A Full-Dimension Adaptive Optimizer Yiheng Zhang et.al. 2511.20277 null
2025-11-25 LLM-Driven Transient Stability Assessment: From Automated Simulation to Neural Architecture Design Lianzhe Hu et.al. 2511.20276 null
2025-11-25 Rectified Flow for Vision-Aided mmWave V2I Beam Prediction Can Zheng et.al. 2511.20265 null
2025-11-25 REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance Chuyi Kong et.al. 2511.20233 null
2025-11-24 Be My Eyes: Extending Large Language Models to New Modalities Through Multi-Agent Collaboration James Y. Huang et.al. 2511.19417 null
2025-11-24 Syn-GRPO: Self-Evolving Data Synthesis for MLLM Perception Reasoning Qihan Huang et.al. 2511.19343 link
2025-11-24 Learning to Reason: Training LLMs with GPT-OSS or DeepSeek R1 Reasoning Traces Shaltiel Shmidman et.al. 2511.19333 null
2025-11-24 MAESTRO: Multi-Agent Environment Shaping through Task and Reward Optimization Boyuan Wu et.al. 2511.19253 null
2025-11-24 Learning Plug-and-play Memory for Guiding Video Diffusion Models Selena Song et.al. 2511.19229 link
2025-11-24 Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization Xurui Li et.al. 2511.19218 null
2025-11-24 From Pixels to Posts: Retrieval-Augmented Fashion Captioning and Hashtag Generation Moazzam Umer Gondal et.al. 2511.19149 null
2025-11-24 LLMs-Powered Real-Time Fault Injection: An Approach Toward Intelligent Fault Test Cases Generation Mohammad Abboush et.al. 2511.19132 null
2025-11-24 Facilitating the Integration of LLMs Into Online Experiments With Simple Chat R. Bermudez Schettino et.al. 2511.19123 null
2025-11-24 MonoSR: Open-Vocabulary Spatial Reasoning from Monocular Images Qirui Wang et.al. 2511.19119 null
2025-11-24 Large Language Model-Assisted Planning of Electric Vehicle Charging Infrastructure with Real-World Case Study Xinda Zheng et.al. 2511.19055 null
2025-11-24 FastForward Pruning: Efficient LLM Pruning via Single-Step Reinforcement Learning Xin Yuan et.al. 2511.18977 null
2025-11-24 SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression Santhosh G S et.al. 2511.18936 null
2025-11-24 Skeletons Matter: Dynamic Data Augmentation for Text-to-Query Yuchen Ji et.al. 2511.18934 null
2025-11-24 Defending Large Language Models Against Jailbreak Exploits with Responsible AI Considerations Ryan Wong et.al. 2511.18933 null
2025-11-24 FineXtrol: Controllable Motion Generation via Fine-Grained Text Keming Shen et.al. 2511.18927 null
2025-11-24 BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models Juncheng Li et.al. 2511.18921 null
2025-11-24 EventSTU: Event-Guided Efficient Spatio-Temporal Understanding for Video Large Language Models Wenhao Xu et.al. 2511.18920 null
2025-11-24 Parallel Vision Token Scheduling for Fast and Accurate Multimodal LMMs Inference Wengyi Zhan et.al. 2511.18875 null
2025-11-24 KernelBand: Boosting LLM-based Kernel Optimization with a Hierarchical and Hardware-aware Multi-armed Bandit Dezhi Ran et.al. 2511.18868 null
2025-11-21 Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models Mark Endo et.al. 2511.17487 link
2025-11-21 SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding Nikolay Nikolov et.al. 2511.17411 null
2025-11-21 That’s not natural: The Impact of Off-Policy Training Data on Probe Performance Nathalie Kirch et.al. 2511.17408 null
2025-11-21 Beyond Multiple Choice: A Hybrid Framework for Unifying Robust Evaluation and Verifiable Reasoning Training Yesheng Liu et.al. 2511.17405 null
2025-11-21 SpatialGeo: Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion Jiajie Guo et.al. 2511.17308 null
2025-11-21 SlsReuse: LLM-Powered Serverless Function Reuse Jinfeng Wen et.al. 2511.17262 null
2025-11-21 A Little More Like This: Text-to-Image Retrieval with Vision-Language Models Using Relevance Feedback Bulat Khaertdinov et.al. 2511.17255 null
2025-11-21 E³-Pruner: Towards Efficient, Economical, and Effective Layer Pruning for Large Language Models Tao Yuan et.al. 2511.17205 null
2025-11-21 AutoLink: Autonomous Schema Exploration and Expansion for Scalable Schema Linking in Text-to-SQL at Scale Ziyang Wang et.al. 2511.17190 null
2025-11-21 Efficient Robot Design with Multi-Objective Black-Box Optimization and Large Language Models Kento Kawaharazuka et.al. 2511.17178 null
2025-11-21 FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle Mario Markov et.al. 2511.17171 null
2025-11-21 Hallucinate Less by Thinking More: Aspect-Based Causal Abstention for Large Language Models Vy Nguyen et.al. 2511.17170 null
2025-11-21 Learning to Compress: Unlocking the Potential of Large Language Models for Text Representation Yeqin Zhang et.al. 2511.17129 null
2025-11-21 ChainV: Atomic Visual Hints Make Multimodal Reasoning Shorter and Better Yuan Zhang et.al. 2511.17106 null
2025-11-21 Sparse Reasoning is Enough: Biological-Inspired Framework for Video Anomaly Detection with Large Pre-trained Models He Huang et.al. 2511.17094 null
2025-11-21 MUCH: A Multilingual Claim Hallucination Benchmark Jérémie Dentan et.al. 2511.17081 null
2025-11-21 Principled Design of Interpretable Automated Scoring for Large-Scale Educational Assessments Yunsung Kim et.al. 2511.17069 null
2025-11-21 Parametric Retrieval-Augmented Generation using Latent Routing of LoRA Adapters Zhan Su et.al. 2511.17044 null
2025-11-21 CLLMRec: LLM-powered Cognitive-Aware Concept Recommendation via Semantic Alignment and Prerequisite Knowledge Distillation Xiangrui Xiong et.al. 2511.17041 null
2025-11-21 FIRM: Federated In-client Regularized Multi-objective Alignment for Large Language Models Fatemeh et.al. 2511.16992 null
2025-11-20 Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter Qinghao Hu et.al. 2511.16665 null
2025-11-20 Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs Ali Taghibakhshi et.al. 2511.16664 null
2025-11-20 Comparison of Text-Based and Image-Based Retrieval in Multimodal Retrieval Augmented Generation Large Language Model Systems Elias Lumer et.al. 2511.16654 null
2025-11-20 You Only Forward Once: An Efficient Compositional Judging Paradigm Tianlong Zhang et.al. 2511.16600 null
2025-11-20 TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding Boshen Xu et.al. 2511.16595 null
2025-11-20 Integrating Symbolic Natural Language Understanding and Language Models for Word Sense Disambiguation Kexin Zhao et.al. 2511.16577 null
2025-11-20 Utilizing Large Language Models for Zero-Shot Medical Ontology Extension from Clinical Notes Guanchen Wu et.al. 2511.16548 null
2025-11-20 The Oracle and The Prism: A Decoupled and Efficient Framework for Generative Recommendation Explanation Jiaheng Zhang et.al. 2511.16543 null
2025-11-20 Beyond Tokens in Language Models: Interpreting Activations through Text Genre Chunks Éloïse Benito-Rodriguez et.al. 2511.16540 null
2025-11-20 LLM4EO: Large Language Model for Evolutionary Optimization in Flexible Job Shop Scheduling Rongjie Liao et.al. 2511.16485 null
2025-11-20 Optimizing Federated Learning in the Era of LLMs: Message Quantization and Streaming Ziyue Xu et.al. 2511.16450 null
2025-11-20 An Efficient LLM-based Evolutional Recommendation with Locate-Forget-Update Paradigm Hao Liu et.al. 2511.16414 null
2025-11-20 CorrectHDL: Agentic HDL Design with LLMs Leveraging High-Level Synthesis as Reference Kangwei Xu et.al. 2511.16395 null
2025-11-20 Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement Jiashu Yao et.al. 2511.16331 null
2025-11-20 ARK: Answer-Centric Retriever Tuning via KG-augmented Curriculum Learning Jiawei Zhou et.al. 2511.16326 null
2025-11-20 SDA: Steering-Driven Distribution Alignment for Open LLMs without Fine-Tuning Wei Xia et.al. 2511.16324 null
2025-11-20 “To Survive, I Must Defect”: Jailbreaking LLMs via the Game-Theory Scenarios Zhen Sun et.al. 2511.16278 null
2025-11-20 Pass@k Metric for RLVR: A Diagnostic Tool of Exploration, But Not an Objective Yang Yu et.al. 2511.16231 null
2025-11-20 Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security Wei Zhao et.al. 2511.16229 null
2025-11-20 Beyond Code Similarity: Benchmarking the Plausibility, Efficiency, and Complexity of LLM-Generated Smart Contracts Francesco Salzano et.al. 2511.16224 null
2025-11-19 MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping Yushi Huang et.al. 2511.15690 null
2025-11-19 DuoZone: A User-Centric, LLM-Guided Mixed-Initiative XR Window Management System Jing Qian et.al. 2511.15676 null
2025-11-19 Quantum-Guided Test Case Minimization for LLM-Based Code Generation Huixiang Zhang et.al. 2511.15665 null
2025-11-19 HSKBenchmark: Modeling and Benchmarking Chinese Second Language Acquisition in Large Language Models through Curriculum Tuning Qihao Yang et.al. 2511.15574 null
2025-11-19 A Tensor Compiler for Processing-In-Memory Architectures Peiming Yang et.al. 2511.15503 null
2025-11-19 Insights from the ICLR Peer Review and Rebuttal Process Amir Hossein Kargaran et.al. 2511.15462 null
2025-11-19 Know Your Intent: An Autonomous Multi-Perspective LLM Agent Framework for DeFi User Transaction Intent Mining Qian’ang Mao et.al. 2511.15456 null
2025-11-19 CroPS: Improving Dense Retrieval with Cross-Perspective Positive Samples in Short-Video Search Ao Xie et.al. 2511.15443 null
2025-11-19 Small Language Models for Phishing Website Detection: Cost, Performance, and Privacy Trade-Offs Georg Goldenits et.al. 2511.15434 null
2025-11-19 DEPO: Dual-Efficiency Preference Optimization for LLM Agents Sirui Chen et.al. 2511.15392 null
2025-11-19 Unveiling Inference Scaling for Difference-Aware User Modeling in LLM Personalization Suyu Chen et.al. 2511.15389 null
2025-11-19 A Compliance-Preserving Retrieval System for Aircraft MRO Task Search Byungho Jo et.al. 2511.15383 null
2025-11-19 HEAD-QA v2: Expanding a Healthcare Benchmark for Reasoning Alexis Correa-Guillén et.al. 2511.15355 null
2025-11-19 Reflexive Evidence-Based Multimodal Learning for Clean Energy Transitions: Causal Insights on Cooking Fuel Access, Urbanization, and Carbon Emissions Shan Shan et.al. 2511.15342 null
2025-11-19 What Your Features Reveal: Data-Efficient Black-Box Feature Inversion Attack for Split DNNs Zhihan Ren et.al. 2511.15316 null
2025-11-19 EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control Kai Yang et.al. 2511.15248 null
2025-11-19 OEMA: Ontology-Enhanced Multi-Agent Collaboration Framework for Zero-Shot Clinical Named Entity Recognition Xinli Tao et.al. 2511.15211 null
2025-11-19 As If We’ve Met Before: LLMs Exhibit Certainty in Recognizing Seen Files Haodong Li et.al. 2511.15192 null
2025-11-19 A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models Duo Li et.al. 2511.15098 null
2025-11-19 Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference Kexin Chu et.al. 2511.15015 null
2025-11-18 Natural Language Interfaces for Databases: What Do Users Think? Panos Ipeirotis et.al. 2511.14718 null
2025-11-18 Strategic Innovation Management in the Age of Large Language Models Market Intelligence, Adaptive R&D, and Ethical Governance Raha Aghaei et.al. 2511.14709 null
2025-11-18 Near-Lossless Model Compression Enables Longer Context Inference in DNA Large Language Models Rui Zhu et.al. 2511.14694 link
2025-11-18 Attention via Synaptic Plasticity is All You Need: A Biologically Inspired Spiking Neuromorphic Transformer Kallol Mondal et.al. 2511.14691 null
2025-11-18 SkillGen: Learning Domain Skills for In-Context Sequential Decision Making Ruomeng Ding et.al. 2511.14670 null
2025-11-18 Bias in, Bias out: Annotation Bias in Multilingual Large Language Models Xia Cui et.al. 2511.14662 null
2025-11-18 AutoTool: Efficient Tool Selection for Large Language Model Agents Jingyi Jia et.al. 2511.14650 null
2025-11-18 Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning Ruoyu Qin et.al. 2511.14617 null
2025-11-18 A Controllable Perceptual Feature Generative Model for Melody Harmonization via Conditional Variational Autoencoder Dengyun Huang et.al. 2511.14600 null
2025-11-18 OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models Keda Tao et.al. 2511.14582 null
2025-11-18 Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language Minyoung Hwang et.al. 2511.14565 null
2025-11-18 LLM-Assisted Thematic Analysis: Opportunities, Limitations, and Recommendations Tatiane Ornelas et.al. 2511.14528 null
2025-11-18 CLO: Efficient LLM Inference System with CPU-Light KVCache Offloading via Algorithm-System Co-Design Jiawei Yi et.al. 2511.14510 null
2025-11-18 Hyperion: Hierarchical Scheduling for Parallel LLM Acceleration in Multi-tier Networks Mulei Ma et.al. 2511.14450 null
2025-11-18 Watchdogs and Oracles: Runtime Verification Meets Large Language Models for Autonomous Systems Angelo Ferrando et.al. 2511.14435 null
2025-11-18 When Words Change the Model: Sensitivity of LLMs for Constraint Programming Modelling Alessio Pellegrino et.al. 2511.14334 null
2025-11-18 PathMind: A Retrieve-Prioritize-Reason Framework for Knowledge Graph Reasoning with Large Language Models Yu Liu et.al. 2511.14256 null
2025-11-18 Towards Authentic Movie Dubbing with Retrieve-Augmented Director-Actor Interaction Learning Rui Liu et.al. 2511.14249 null
2025-11-18 N-GLARE: An Non-Generative Latent Representation-Efficient LLM Safety Evaluator Zheyu Lin et.al. 2511.14195 null
2025-11-18 AdaTok: Adaptive Token Compression with Object-Aware Representations for Efficient Multimodal LLMs Xinliang Zhang et.al. 2511.14169 null
2025-11-17 TZ-LLM: Protecting On-Device Large Language Models with Arm TrustZone Xunjie Wang et.al. 2511.13717 null
2025-11-17 Generalist Foundation Models Are Not Clinical Enough for Hospital Operations Lavender Y. Jiang et.al. 2511.13703 null
2025-11-17 T-SAR: A Full-Stack Co-design for CPU-Only Ternary LLM Inference via In-Place SIMD ALU Reorganization Hyunwoo Oh et.al. 2511.13676 null
2025-11-17 Part-X-MLLM: Part-aware 3D Multimodal Large Language Model Chunshi Wang et.al. 2511.13647 link
2025-11-17 Data Value in the Age of Scaling: Understanding LLM Scaling Dynamics Under Real-Synthetic Data Mixtures Haohui Wang et.al. 2511.13640 null
2025-11-17 CreBench: Human-Aligned Creativity Evaluation from Idea to Process to Product Kaiwen Xue et.al. 2511.13626 null
2025-11-17 P1: Mastering Physics Olympiads with Reinforcement Learning Jiacheng Chen et.al. 2511.13612 null
2025-11-17 Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents Piaohong Wang et.al. 2511.13593 null
2025-11-17 Automated Construction of Medical Indicator Knowledge Graphs Using Retrieval Augmented Large Language Models Zhengda Wang et.al. 2511.13526 null
2025-11-17 FreeAskWorld: An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI Yuhang Peng et.al. 2511.13524 null
2025-11-17 Tight and Practical Privacy Auditing for Differentially Private In-Context Learning Yuyang Xia et.al. 2511.13502 null
2025-11-17 Multi-Agent Multimodal Large Language Model Framework for Automated Interpretation of Fuel Efficiency Analytics in Public Transportation Zhipeng Ma et.al. 2511.13476 null
2025-11-17 Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline Rui Zuo et.al. 2511.13442 null
2025-11-17 Mem-PAL: Towards Memory-based Personalized Dialogue Assistants for Long-term User-Agent Interaction Zhaopei Huang et.al. 2511.13410 null
2025-11-17 A Novel Hierarchical Integration Method for Efficient Model Merging in Medical LLMs Prakrit Timilsina et.al. 2511.13373 null
2025-11-17 Cognitive Maps in Language Models: A Mechanistic Analysis of Spatial Planning Caroline Baumgartner et.al. 2511.13371 null
2025-11-17 FLOWER: Flow-Oriented Entity-Relationship Tool Dmitry Moskalev et.al. 2511.13357 null
2025-11-17 An LLM-based Quantitative Framework for Evaluating High-Stealthy Backdoor Risks in OSS Supply Chains Zihe Yan et.al. 2511.13341 null
2025-11-17 ZeroDexGrasp: Zero-Shot Task-Oriented Dexterous Grasp Synthesis with Prompt-Based Multi-Stage Semantic Reasoning Juntao Jian et.al. 2511.13327 null
2025-11-17 Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment Jea Kwon et.al. 2511.13290 null
2025-11-14 Optimizing Mixture of Block Attention Guangxuan Xiao et.al. 2511.11571 null
2025-11-14 Experience-Guided Adaptation of Inference-Time Reasoning Strategies Adam Stein et.al. 2511.11519 null
2025-11-14 W2S-AlignTree: Weak-to-Strong Inference-Time Alignment for Large Language Models via Monte Carlo Tree Search Zhenyu Ding et.al. 2511.11518 link
2025-11-14 PAS : Prelim Attention Score for Detecting Object Hallucinations in Large Vision–Language Models Nhat Hoang-Xuan et.al. 2511.11502 null
2025-11-14 Benchmarking Visual LLMs Resilience to Unanswerable Questions on Visually Rich Documents Davide Napolitano et.al. 2511.11468 null
2025-11-14 CURENet: Combining Unified Representations for Efficient Chronic Disease Prediction Cong-Tinh Dao et.al. 2511.11423 null
2025-11-14 SCRUTINEER: Detecting Logic-Level Usage Violations of Reusable Components in Smart Contracts Xingshuang Lin et.al. 2511.11411 null
2025-11-14 MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism Shulin Liu et.al. 2511.11373 null
2025-11-14 SEAL: Subspace-Anchored Watermarks for LLM Ownership Yanbo Dai et.al. 2511.11356 null
2025-11-14 UFO³: Weaving the Digital Agent Galaxy Chaoyun Zhang et.al. 2511.11332 null
2025-11-14 LAET: A Layer-wise Adaptive Ensemble Tuning Framework for Pretrained Language Models Jawad Ibn Ahad et.al. 2511.11315 null
2025-11-14 iMAD: Intelligent Multi-Agent Debate for Efficient and Accurate LLM Inference Wei Fan et.al. 2511.11306 null
2025-11-14 EcoAlign: An Economically Rational Framework for Efficient LVLM Alignment Ruoxi Cheng et.al. 2511.11301 null
2025-11-14 GraphPilot: Grounded Scene Graph Conditioning for Language-Based Autonomous Driving Fabian Schmidt et.al. 2511.11266 null
2025-11-14 KGQuest: Template-Driven QA Generation from Knowledge Graphs with LLM-Based Refinement Sania Nayab et.al. 2511.11258 null
2025-11-14 T-MAN: Enabling End-to-End Low-Bit LLM Inference on NPUs via Unified Table Lookup Jianyu Wei et.al. 2511.11248 null
2025-11-14 STaR: Towards Cognitive Table Reasoning via Slow-Thinking Large Language Models Huajian Zhang et.al. 2511.11233 null
2025-11-14 Questioning the Stability of Visual Question Answering Amir Rosenfeld et.al. 2511.11206 null
2025-11-14 Viper-F1: Fast and Fine-Grained Multimodal Understanding with Cross-Modal State-Space Modulation Quoc-Huy Trinh et.al. 2511.11177 null
2025-11-14 Explainable Deep Convolutional Multi-Type Anomaly Detection Alex George et.al. 2511.11165 null
2025-11-13 ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference Yesheng Liang et.al. 2511.10645 null
2025-11-13 Textual understanding boost in the WikiRace Raman Ebrahimi et.al. 2511.10585 null
2025-11-13 URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding Yongxin Shi et.al. 2511.10552 link
2025-11-13 Don’t Waste It: Guiding Generative Recommenders with Structured Human Priors via Multi-head Decoding Yunkai Zhang et.al. 2511.10492 link
2025-11-13 Scalable Synthesis of distributed LLM workloads through Symbolic Tensor Graphs Changhai Man et.al. 2511.10480 null
2025-11-13 AgentEvolver: Towards Efficient Self-Evolving Agent System Yunpeng Zhai et.al. 2511.10395 link
2025-11-13 SITA: A Framework for Structure-to-Instance Theorem Autoformalization Chenyi Li et.al. 2511.10356 null
2025-11-13 EDGC: Entropy-driven Dynamic Gradient Compression for Efficient LLM Training Qingao Yi et.al. 2511.10333 null
2025-11-13 Rethinking Visual Information Processing in Multimodal LLMs Dongwan Kim et.al. 2511.10301 null
2025-11-13 Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models Zhengtao Zou et.al. 2511.10292 null
2025-11-13 FactGuard: Event-Centric and Commonsense-Guided Fake News Detection Jing He et.al. 2511.10281 null
2025-11-13 Quality Assurance of LLM-generated Code: Addressing Non-Functional Quality Characteristics Xin Sun et.al. 2511.10271 null
2025-11-13 LangGPS: Language Separability Guided Data Pre-Selection for Joint Multilingual Instruction Tuning Yangfan Ye et.al. 2511.10229 null
2025-11-13 Persona-Aware Alignment Framework for Personalized Dialogue Generation Guanrong Li et.al. 2511.10215 null
2025-11-13 Advanced Black-Box Tuning of Large Language Models with Limited API Calls Zhikang Xie et.al. 2511.10210 null
2025-11-13 EffiReason-Bench: A Unified Benchmark for Evaluating and Advancing Efficient Reasoning in Large Language Models Junquan Huang et.al. 2511.10201 null
2025-11-13 Efficient Thought Space Exploration through Strategic Intervention Ziheng Li et.al. 2511.10038 null
2025-11-13 AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models Xinyi Wang et.al. 2511.10017 null
2025-11-13 AssertMiner: Module-Level Spec Generation and Assertion Mining using Static Analysis Guided LLMs Hongqin Lyu et.al. 2511.10007 null
2025-11-13 PustakAI: Curriculum-Aligned and Interactive Textbooks Using Large Language Models Shivam Sharma et.al. 2511.10002 null
2025-11-10 Hard vs. Noise: Resolving Hard-Noisy Sample Confusion in Recommender Systems via Large Language Models Tianrui Song et.al. 2511.07295 link
2025-11-10 LLMServingSim2.0: A Unified Simulator for Heterogeneous Hardware and Serving Techniques in LLM Infrastructure Jaehong Cho et.al. 2511.07229 null
2025-11-10 Importance-Aware Data Selection for Efficient LLM Instruction Tuning Tingyu Jiang et.al. 2511.07074 null
2025-11-10 GoCkpt: Gradient-Assisted Multi-Step overlapped Checkpointing for Efficient LLM Training Keyao Zhang et.al. 2511.07035 null
2025-11-10 P3-LLM: An Integrated NPU-PIM Accelerator for LLM Inference Using Hybrid Numerical Formats Yuzong Chen et.al. 2511.06838 null
2025-11-09 Efficient LLM Safety Evaluation through Multi-Agent Debate Dachuan Lin et.al. 2511.06396 null
2025-11-09 ELEGANCE: Efficient LLM Guidance for Audio-Visual Target Speech Extraction Wenxuan Wu et.al. 2511.06288 null
2025-11-09 Optimizing Long-context LLM Serving via Fine-grained Sequence Parallelism Cong Li et.al. 2511.06247 null
2025-11-09 Confidence-Guided Stepwise Model Routing for Cost-Efficient Reasoning Sangmook Lee et.al. 2511.06190 null
2025-11-09 LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs Zifan He et.al. 2511.06174 null
2025-11-08 Lethe: Layer- and Time-Adaptive KV Cache Pruning for Reasoning-Intensive LLM Serving Hui Zeng et.al. 2511.06029 null
2025-11-08 MoSKA: Mixture of Shared KV Attention for Efficient Long-Sequence LLM Inference Myunghyun Rhee et.al. 2511.06010 null
2025-11-08 MCP-RiskCue: Can LLM infer risk information from MCP server System Logs? Jiayi Fu et.al. 2511.05867 null
2025-11-05 From Prompts to Power: Measuring the Energy Footprint of LLM Inference Francisco Caravaca et.al. 2511.05597 null
2025-11-06 DuetServe: Harmonizing Prefill and Decode for LLM Serving via Adaptive GPU Multiplexing Lei Gao et.al. 2511.04791 null
2025-11-06 Enabling Dynamic Sparsity in Quantized LLM Inference Rongxiang Wang et.al. 2511.04477 null
2025-11-06 E-CARE: An Efficient LLM-based Commonsense-Augmented Framework for E-Commerce Ge Zhang et.al. 2511.04087 null
2025-11-06 PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration Yue Jiet Chong et.al. 2511.04036 null
2025-11-06 LLM-Driven Adaptive Source-Sink Identification and False Positive Mitigation for Static Analysis Shiyin Lin et.al. 2511.04023 null
2025-11-05 RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse Yinsicheng Jiang et.al. 2511.03475 null
2025-11-07 UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM Hai Huang et.al. 2511.03293 null
2025-11-04 Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes Mohammadsajad Alipour et.al. 2511.02681 null
2025-11-04 Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks Xiumei Deng et.al. 2511.02647 null
2025-11-04 Verifying LLM Inference to Prevent Model Weight Exfiltration Roy Rinberg et.al. 2511.02620 null
2025-11-04 KV Cache Transform Coding for Compact Storage in LLM Inference Konrad Staniszewski et.al. 2511.01815 null
2025-11-04 Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding Jungyeon Koh et.al. 2511.01695 null
2025-11-03 Scaling Graph Chain-of-Thought Reasoning: A Multi-Agent Framework with Efficient LLM Serving Chengying Huan et.al. 2511.01633 null
2025-11-03 When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding Min Fang et.al. 2511.01282 null
2025-11-04 CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing Yifan Zhou et.al. 2511.01197 null
2025-11-02 FlexiCache: Leveraging Temporal Stability of Attention Heads for Efficient KV Cache Management Nazmul Takbir et.al. 2511.00868 null
2025-11-05 FREESH: Fair, Resource- and Energy-Efficient Scheduling for LLM Serving on Heterogeneous GPUs Xuan He et.al. 2511.00807 null
2025-11-04 SpecDiff-2: Scaling Diffusion Drafter Alignment For Faster Speculative Decoding Jameson Sandler et.al. 2511.00606 null
2025-11-01 FlashEVA: Accelerating LLM inference via Efficient Attention Juan Gabriel Kostelec et.al. 2511.00576 null
2025-11-01 Proactive DDoS Detection and Mitigation in Decentralized Software-Defined Networking via Port-Level Monitoring and Zero-Training Large Language Models Mohammed N. Swileh et.al. 2511.00460 null
2025-10-31 Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU Limits Dowon Kim et.al. 2511.00321 null
2025-11-05 PDE-SHARP: PDE Solver Hybrids through Analysis and Refinement Passes Shaghayegh Fazliani et.al. 2511.00183 null
2025-10-31 AMD MI300X GPU Performance Analysis Chandrish Ambati et.al. 2510.27583 null
2025-10-31 Glia: A Human-Inspired AI for Automated Systems Design and Optimization Pouya Hamadanian et.al. 2510.27176 null
2025-10-29 Category-Aware Semantic Caching for Heterogeneous LLM Workloads Chen Wang et.al. 2510.26835 null
2025-10-30 Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model Biao Zhang et.al. 2510.26622 null
2025-10-30 1+1>2: A Synergistic Sparse and Low-Rank Compression Method for Large Language Models Zeliang Zong et.al. 2510.26446 null
2025-10-30 Beyond Benchmarks: The Economics of AI Inference Boqin Zhuang et.al. 2510.26136 null
2025-10-31 AttnCache: Accelerating Self-Attention Inference for LLM Prefill via Attention Cache Dinghong Song et.al. 2510.25979 link
2025-10-31 NeuronMM: High-Performance Matrix Multiplication for LLM Inference on AWS Trainium Dinghong Song et.al. 2510.25977 null
2025-10-29 A Survey on Efficient Large Language Model Training: From Data-centric Perspectives Junyu Luo et.al. 2510.25817 null
2025-10-29 Serve Programs, Not Prompts In Gim et.al. 2510.25412 null
2025-10-29 GPTOpt: Towards Efficient LLM-Based Black-Box Optimization Jamison Meindl et.al. 2510.25404 null
2025-10-29 OpenReward: Learning to Reward Long-form Agentic Tasks via Reinforcement Learning Ziyou Hu et.al. 2510.24636 null
2025-10-28 Pie: A Programmable Serving System for Emerging LLM Applications In Gim et.al. 2510.24051 null
2025-10-28 Resource-Efficient LLM Application for Structured Transformation of Unstructured Financial Contracts Maruf Ahmed Mridul et.al. 2510.23990 null
2025-10-26 Batch Speculative Decoding Done Right Ranran Haoran Zhang et.al. 2510.22876 null
2025-10-26 TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination Omar Naim et.al. 2510.22767 null
2025-10-26 Do Stop Me Now: Detecting Boilerplate Responses with a Single Iteration Yuval Kainan et.al. 2510.22679 null
2025-10-26 SABlock: Semantic-Aware KV Cache Eviction with Adaptive Compression Block Size Jinhan Chen et.al. 2510.22556 null
2025-10-23 Compress to Impress: Efficient LLM Adaptation Using a Single Gradient Step on 100 Samples Shiva Sreeram et.al. 2510.20800 null
2025-10-23 RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging Bowen Wang et.al. 2510.20479 null
2025-10-22 Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs Hongyi Liu et.al. 2510.20064 null
2025-10-22 AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders Yuezhou Hu et.al. 2510.19779 null
2025-10-22 Are Large Language Models Sensitive to the Motives Behind Communication? Addison J. Wu et.al. 2510.19687 null
2025-10-22 DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference Xiang Liu et.al. 2510.19669 null
2025-10-22 Energy-Efficient and Dequantization-Free Q-LLMs: A Spiking Neural Network Approach to Salient Value Mitigation Chenyu Wang et.al. 2510.19498 null
2025-10-21 EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval Zebin Yang et.al. 2510.18546 null
2025-10-21 SLICE: SLO-Driven Scheduling for LLM Inference on Edge Computing Devices Pan Zhou et.al. 2510.18544 null
2025-10-21 Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs Song Bian et.al. 2510.18245 null
2025-10-20 Planned Diffusion Daniel Israel et.al. 2510.18087 null
2025-10-20 Language Models as Semantic Augmenters for Sequential Recommenders Mahsa Valizadeh et.al. 2510.18046 null
2025-10-19 Justitia: Fair and Efficient Scheduling for LLM Applications Mingyan Yang et.al. 2510.17015 null
2025-10-18 FourierCompress: Layer-Aware Spectral Activation Compression for Efficient and Accurate Collaborative LLM Inference Jian Ma et.al. 2510.16418 null
2025-10-16 AMS-QUANT: Adaptive Mantissa Sharing for Floating-point Quantization Mengtao Lv et.al. 2510.16045 null
2025-10-16 Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing Tianhua Xia et.al. 2510.16040 null
2025-10-17 TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs Sibo Xiao et.al. 2510.15545 null
2025-10-16 Tail-Optimized Caching for LLM Inference Wenxin Zhang et.al. 2510.15152 null
2025-10-16 Identity-Link IRT for Label-Free LLM Evaluation: Preserving Additivity in TVD-MI Scores Zachary Robertson et.al. 2510.14966 null
2025-10-16 xLLM Technical Report Tongxuan Liu et.al. 2510.14686 null
2025-10-16 MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving Jungi Lee et.al. 2510.14557 null
2025-10-16 FairBatching: Fairness-Aware Batch Formation for LLM Inference Hongtao Lyu et.al. 2510.14392 null
2025-10-16 Qwen3Guard Technical Report Haiquan Zhao et.al. 2510.14276 null
2025-10-15 Efficiently Executing High-throughput Lightweight LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management Thanh Son Phung et.al. 2510.14024 null
2025-10-15 Adaptive Rescheduling in Prefill-Decode Disaggregated LLM Inference Zhibin Wang et.al. 2510.13668 null
2025-10-15 F-BFQ: Flexible Block Floating-Point Quantization Accelerator for LLMs Jude Haris et.al. 2510.13401 null
2025-10-15 Taming the Fragility of KV Cache Eviction in LLM Inference Yuan Feng et.al. 2510.13334 null
2025-10-15 BanaServe: Unified KV Cache and Dynamic Module Migration for Balancing Disaggregated LLM Serving in AI Infrastructure Yiyuan He et.al. 2510.13223 null
2025-10-15 Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference Nikhil Bhendawade et.al. 2510.13161 null
2025-10-21 Retrieval-in-the-Chain: Bootstrapping Large Language Models for Generative Retrieval Yingchen Zhang et.al. 2510.13095 null
2025-10-14 On the Role of Preference Variance in Preference Optimization Jiacheng Guo et.al. 2510.13022 null
2025-10-14 KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems Hancheng Ye et.al. 2510.12872 null
2025-10-14 Beyond Postconditions: Can Large Language Models infer Formal Contracts for Automatic Software Verification? Cedric Richter et.al. 2510.12702 null
2025-10-14 Traveling Salesman-Based Token Ordering Improves Stability in Homomorphically Encrypted Language Models Donghwan Rho et.al. 2510.12343 null
2025-10-13 FlexPipe: Adapting Dynamic LLM Serving Through Inflight Pipeline Refactoring in Fragmented Serverless Clusters Yanying Lin et.al. 2510.11938 null
2025-10-13 Efficient LLM Inference over Heterogeneous Edge Networks with Speculative Decoding Bingjie Zhu et.al. 2510.11331 null
2025-10-13 An Explorative Study on Distributed Computing Techniques in Training and Inference of Large Language Models Sheikh Azizul Hakim et.al. 2510.11211 null
2025-10-13 Efficient In-Memory Acceleration of Sparse Block Diagonal LLMs João Paulo Cardoso de Lima et.al. 2510.11192 null
2025-10-12 Hierarchical Optimization via LLM-Guided Objective Evolution for Mobility-on-Demand Systems Yi Zhang et.al. 2510.10644 null
2025-10-11 MetaBreak: Jailbreaking Online LLM Services via Special Token Manipulation Wentian Zhu et.al. 2510.10271 null
2025-10-11 CacheClip: Accelerating RAG with Effective KV Cache Reuse Bin Yang et.al. 2510.10129 null
2025-10-11 Efficient Onboard Vision-Language Inference in UAV-Enabled Low-Altitude Economy Networks via LLM-Enhanced Optimization Yang Li et.al. 2510.10028 null
2025-10-10 Evaluating LLM-Based Process Explanations under Progressive Behavioral-Input Reduction P. van Oerle et.al. 2510.09732 null
2025-10-10 Layout-Aware Parsing Meets Efficient LLMs: A Unified, Scalable Framework for Resume Information Extraction and Evaluation Fanwei Zhu et.al. 2510.09722 null
2025-10-10 FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference Yu-Chen Lu et.al. 2510.09332 null
2025-10-10 Semantic-Condition Tuning: Fusing Graph Context with Large Language Models for Knowledge Graph Completion Ruitong Liu et.al. 2510.08966 null
2025-10-13 Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors Xin Liu et.al. 2510.08907 null
2025-10-10 Mozart: A Chiplet Ecosystem-Accelerator Codesign Framework for Composable Bespoke Application Specific Integrated Circuits Haoran Jin et.al. 2510.08873 null
2025-10-09 When to Reason: Semantic Router for vLLM Chen Wang et.al. 2510.08731 null
2025-10-09 SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference Hengrui Zhang et.al. 2510.08544 null
2025-10-09 From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill Gunjun Lee et.al. 2510.08055 null
2025-10-09 Augur: Modeling Covariate Causal Associations in Time Series via Large Language Models Zhiqing Cui et.al. 2510.07858 null
2025-10-09 OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference Yuzhe Gu et.al. 2510.07651 null
2025-10-08 AsyncSpade: Efficient Test-Time Scaling with Asynchronous Sparse Decoding Shuqing Luo et.al. 2510.07486 null
2025-10-08 Accelerating Diffusion LLM Inference via Local Determinism Propagation Fanheng Kong et.al. 2510.07081 null
2025-10-08 Accelerating Sparse Ternary GEMM for Quantized LLM inference on Apple Silicon Baraq Lipshitz et.al. 2510.06957 null
2025-10-08 PTEB: Towards Robust Text Embedding Evaluation via Stochastic Paraphrasing at Evaluation Time with LLMs Manuel Frank et.al. 2510.06730 null
2025-10-07 VecInfer: Efficient LLM Inference with Low-Bit KV Cache via Outlier-Suppressed Vector Quantization Dingyu Yao et.al. 2510.06175 null
2025-10-07 lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models Haoxin Wang et.al. 2510.06126 null
2025-10-07 From Principles to Practice: A Systematic Study of LLM Serving on Multi-core NPUs Tianhao Zhu et.al. 2510.05632 null
2025-10-07 Activation-Informed Pareto-Guided Low-Rank Compression for Efficient LLM/VLM Ryan Solgi et.al. 2510.05544 null
2025-10-07 H1B-KV: Hybrid One-Bit Caches for Memory-Efficient Large Language Model Inference Harshil Vejendla et.al. 2510.05529 null
2025-10-07 Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting Zhongkai Yu et.al. 2510.05497 null
2025-10-06 KVLinC : KV Cache Quantization with Hadamard Rotation and Linear Correction Utkarsh Saxena et.al. 2510.05373 null
2025-10-06 A novel hallucination classification framework Maksym Zavhorodnii et.al. 2510.05189 null
2025-10-06 RevMine: An LLM-Assisted Tool for Code Review Mining and Analysis Across Git Platforms Samah Kansab et.al. 2510.04796 null
2025-10-06 SpikingMamba: Towards Energy-Efficient Large Language Models via Knowledge Distillation from Mamba Yulong Huang et.al. 2510.04595 null
2025-10-05 Speculative Actions: A Lossless Framework for Faster Agentic Systems Naimeng Ye et.al. 2510.04371 null
2025-10-05 Toward a unified framework for data-efficient evaluation of large language models Lele Liao et.al. 2510.04051 null
2025-10-02 KVComm: Enabling Efficient LLM Communication through Selective KV Sharing Xiangyu Shi et.al. 2510.03346 null
2025-10-03 Best-of-Majority: Minimax-Optimal Strategy for Pass@$k$ Inference Scaling Qiwei Di et.al. 2510.03199 null
2025-10-03 Dissecting Transformers: A CLEAR Perspective towards Green AI Hemang Jain et.al. 2510.02810 null
2025-10-03 TokenFlow: Responsive LLM Text Streaming Serving under Request Burst via Preemptive Scheduling Junyi Chen et.al. 2510.02758 null
2025-10-03 HALO: Memory-Centric Heterogeneous Accelerator with 2.5D Integration for Low-Batch LLM Inference Shubham Negi et.al. 2510.02675 null
2025-10-02 Litespark Technical Report: High-Throughput, Energy-Efficient LLM Training Framework Nii Osae Osae Dade et.al. 2510.02483 null
2025-10-01 PolyLink: A Blockchain Based Decentralized Edge AI Platform for LLM Inference Hongbo Liu et.al. 2510.02395 null
2025-10-03 Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey Qiyuan Liu et.al. 2510.01925 null
2025-10-02 SCRIBES: Web-Scale Script-Based Semi-Structured Data Extraction with Reinforcement Learning Shicheng Liu et.al. 2510.01832 null
2025-10-01 HiSpec: Hierarchical Speculative Decoding for LLMs Avinash Kumar et.al. 2510.01336 null
2025-10-01 Generalized Parallel Scaling with Interdependent Generations Harry Dong et.al. 2510.01143 null
2025-10-01 Prompt Curriculum Learning for Efficient LLM Post-Training Zhaolin Gao et.al. 2510.01135 null
2025-10-01 Family Matters: Language Transfer and Merging for Adapting Small LLMs to Faroese Jenny Kunz et.al. 2510.00810 null
2025-10-01 Expected Attention: KV Cache Compression by Estimating Attention from Future Queries Distribution Alessio Devoto et.al. 2510.00636 null
2025-10-01 Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space? Nandan Kumar Jha et.al. 2510.00537 null
2025-10-01 Learning a Zeroth-Order Optimizer for Fine-Tuning LLMs Kairun Zhang et.al. 2510.00419 null
2025-10-02 Large Language Models Inference Engines based on Spiking Neural Networks Adarsha Balaji et.al. 2510.00133 null
2025-10-01 AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size Guanxi Lu et.al. 2509.26432 null
2025-09-30 Toward an Unbiased Collective Memory for Efficient LLM-Based Agentic 6G Cross-Domain Management Hatim Chergui et.al. 2509.26200 null
2025-09-30 Parallax: Efficient LLM Inference Service over Decentralized Environment Chris Tong et.al. 2509.26182 null
2025-09-30 Accelerating LLM Inference with Precomputed Query Storage Jay H. Park et.al. 2509.25919 null
2025-09-30 SAIL: SRAM-Accelerated LLM Inference System with Lookup-Table-based GEMV Jingyao Zhang et.al. 2509.25853 null
2025-09-29 Scaling with Collapse: Efficient and Predictable Training of LLM Families Shane Bergsma et.al. 2509.25087 null
2025-09-29 Intra-request branch orchestration for efficient LLM reasoning Weifan Jiang et.al. 2509.24957 null
2025-09-29 SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching Xinye Zhao et.al. 2509.24832 null
2025-09-29 SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving Qihui Zhou et.al. 2509.24626 null
2025-09-29 Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding Sungkyun Kim et.al. 2509.24328 null
2025-07-22 Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework Hongyi Tang et.al. 2507.16414 null
2025-07-21 Efficient Routing of Inference Requests across LLM Instances in Cloud-Edge Computing Shibo Yu et.al. 2507.15553 null
2025-07-18 Efficient LLM Inference: Bandwidth, Compute, Synchronization, and Capacity are all you need Michael Davies et.al. 2507.14397 null
2025-07-18 Characterizing Communication Patterns in Distributed Large Language Model Inference Lang Xu et.al. 2507.14392 null
2025-07-18 Can LLMs Infer Personality from Real World Conversations? Jianfeng Zhu et.al. 2507.14355 null
2025-07-14 PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training Pengfei Du et.al. 2507.14202 null
2025-07-23 Photonic Fabric Platform for AI Accelerators Jing Ding et.al. 2507.14000 null
2025-07-23 DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training Zhixin Wang et.al. 2507.13833 null
2025-07-18 Team of One: Cracking Complex Video QA with Model Synergy Jun Xie et.al. 2507.13820 null
2025-07-18 LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues Haoyang Li et.al. 2507.13681 null
2025-07-17 Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation Genki Kusano et.al. 2507.13525 null
2025-07-16 Toward Efficient SpMV in Sparse LLMs via Block Extraction and Compressed Storage Junqing Lin et.al. 2507.12205 null
2025-07-15 MIRAGE: KV Cache Optimization through Parameter Remapping for Multi-tenant LLM Serving Ruihao Li et.al. 2507.11507 null
2025-07-15 Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations Miray Özcan et.al. 2507.11417 null
2025-07-15 KV-Latent: Dimensional-level KV Cache Reduction with Frequency-aware Rotary Positional Embedding Luohe Shi et.al. 2507.11273 null
2025-07-16 GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning Ziru Liu et.al. 2507.10628 null
2025-07-14 Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving Wonung Kim et.al. 2507.10178 null
2025-07-14 Past-Future Scheduler for LLM Serving under SLA Guarantees Ruihao Gong et.al. 2507.10150 null
2025-07-14 ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism Zedong Liu et.al. 2507.10069 null
2025-07-14 Green-LLM: Optimal Workload Allocation for Environmentally-Aware Distributed Inference Jiaming Cheng et.al. 2507.09942 null
2025-07-13 Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset Lily Hong Zhang et.al. 2507.09650 null
2025-07-12 SLIM: A Heterogeneous Accelerator for Edge Inference of Sparse Large Language Model via Adaptive Thresholding Weihong Xu et.al. 2507.09201 null
2025-07-11 On Evaluating Performance of LLM Inference Serving Systems Amey Agrawal et.al. 2507.09019 null
2025-07-11 Hybrid Systolic Array Accelerator with Optimized Dataflow for Edge Large Language Model Inference Chun-Ting Chen et.al. 2507.09010 null
2025-07-11 Orchestration for Domain-specific Edge-Cloud Language Models Prasoon Patidar et.al. 2507.09003 null
2025-07-11 InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching Yilun Wang et.al. 2507.08523 null
2025-07-11 Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training Aleksei Ilin et.al. 2507.08284 null
2025-07-10 Reasoning and Behavioral Equilibria in LLM-Nash Games: From Mindsets to Actions Quanyan Zhu et.al. 2507.08208 null
2025-07-10 Krul: Efficient State Restoration for Multi-turn Conversations with Dynamic Cross-layer KV Sharing Junyi Wen et.al. 2507.08045 null
2025-07-11 Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models Varin Sikka et.al. 2507.07505 null
2025-07-16 Proactive Intra-GPU Disaggregation of Prefill and Decode in LLM Serving Xiaoxiang Shi et.al. 2507.06608 null
2025-07-11 QUEST: Query Optimization in Unstructured Document Analysis Zhaoze Sun et.al. 2507.06515 null
2025-07-08 Voltage Regulation in Distribution Systems with Data Center Loads Yize Chen et.al. 2507.06416 null
2025-07-08 Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models Léa Dubois et.al. 2507.05822 null
2025-07-07 Cascade: Token-Sharded Private LLM Inference Rahul Thomas et.al. 2507.05228 null
2025-07-07 MoLink: Distributed and Efficient Serving Framework for Large Models Lewei Jin et.al. 2507.05043 null
2025-07-16 Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? Yun Qu et.al. 2507.04632 null
2025-07-09 Tail-aware Adversarial Attacks: A Distributional Approach to Efficient LLM Jailbreaking Tim Beyer et.al. 2507.04446 null
2025-07-23 Fairness Evaluation of Large Language Models in Academic Library Reference Services Haining Wang et.al. 2507.04224 null
2025-07-05 Enhancing Adaptive Behavioral Interventions with LLM Inference from Participant-Described States Karine Karine et.al. 2507.03871 null
2025-07-05 OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM inference Seungjun Shin et.al. 2507.03865 null
2025-07-08 MemOS: A Memory OS for AI System Zhiyu Li et.al. 2507.03724 null
2025-07-04 Hummingbird: A Smaller and Faster Large Language Model Accelerator on Embedded FPGA Jindong Li et.al. 2507.03308 null
2025-07-03 HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference Weishu Deng et.al. 2507.03153 null
2025-06-20 Large Language Model-Driven Surrogate-Assisted Evolutionary Algorithm for Expensive Optimization Lindong Xie et.al. 2507.02892 null
2025-07-03 On the Convergence of Large Language Model Optimizer for Black-Box Network Management Hoon Lee et.al. 2507.02689 null
2025-07-03 Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure Rui Xie et.al. 2507.02654 null
2025-07-14 FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference Xing Liu et.al. 2507.02620 null
2025-07-02 Dissecting the Impact of Mobile DVFS Governors on LLM Inference Performance and Energy Efficiency Zongpu Zhang et.al. 2507.02135 null
2025-07-02 AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training Zhenyu Han et.al. 2507.01663 null
2025-07-02 Evaluating the Effectiveness of Direct Preference Optimization for Personalizing German Automatic Text Simplifications for Persons with Intellectual Disabilities Yingqiang Gao et.al. 2507.01479 null
2025-07-02 LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation Tianyu Liu et.al. 2507.01449 null
2025-07-02 EdgeLoRA: An Efficient Multi-Tenant LLM Serving System on Edge Devices Zheyu Shen et.al. 2507.01438 null
2025-07-08 SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech Zhuangfei Cheng et.al. 2507.01348 null
2025-07-02 La RoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation Kai Liu et.al. 2507.01299 null
2025-07-01 PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning Xingke Yang et.al. 2507.01216 null
2025-06-28 A Data Science Approach to Calcutta High Court Judgments: An Efficient LLM and RAG-powered Framework for Summarization and Similar Cases Retrieval Puspendu Banerjee et.al. 2507.01058 null
2025-07-01 VEDA: Efficient LLM Generation Through Voting-based KV Cache Eviction and Dataflow-flexible Accelerator Zhican Wang et.al. 2507.00797 null
2025-07-01 Cognitive Load-Aware Inference: A Neuro-Symbolic Framework for Optimizing the Token Economy of Large Language Models Yilun Zhang et.al. 2507.00653 null
2025-07-01 LLM-Mesh: Enabling Elastic Sharing for Serverless LLM Inference Chuhao Xu et.al. 2507.00507 null
2025-07-01 Serving LLMs in HPC Clusters: A Comparative Study of Qualcomm Cloud AI 100 Ultra and High-Performance GPUs Mohammad Firas Sada et.al. 2507.00418 null
2025-06-30 Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission Faranaksadat Solat et.al. 2507.00082 null
2025-06-30 Scaling Human Judgment in Community Notes with LLMs Haiwen Li et.al. 2506.24118 null
2025-06-30 A Survey of LLM-based Automated Program Repair: Taxonomies, Design Paradigms, and Applications Boyang Yang et.al. 2506.23749 null
2025-06-28 Spectra 1.1: Scaling Laws and Efficient Inference for Ternary Language Models Tejas Vaidhya et.al. 2506.23025 null
2025-06-28 Smaller = Weaker? Benchmarking Robustness of Quantized LLMs in Code Generation Sen Fang et.al. 2506.22776 null
2025-07-01 Not All Water Consumption Is Equal: A Water Stress Weighted Metric for Sustainable Computing Yanran Wu et.al. 2506.22773 null
2025-06-27 QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization Danush Khanna et.al. 2506.22396 null
2025-06-27 Towards Operational Data Analytics Chatbots – Virtual Knowledge Graph is All You Need Junaid Ahmed Khan et.al. 2506.22267 null
2025-06-27 SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference Yongchao He et.al. 2506.22033 null
2025-06-27 A Survey of LLM Inference Systems James Pan et.al. 2506.21901 null
2025-06-26 Efficient and Reuseable Cloud Configuration Search Using Discovery Spaces Michael Johnston et.al. 2506.21467 null
2025-06-26 BLOCKS: Blockchain-supported Cross-Silo Knowledge Sharing for Efficient LLM Services Zhaojiacheng Zhou et.al. 2506.21033 null
2025-06-17 Utility-Driven Speculative Decoding for Mixture-of-Experts Anish Saxena et.al. 2506.20675 null
2025-06-25 DipSVD: Dual-importance Protected SVD for Efficient LLM Compression Xuan Ding et.al. 2506.20353 null
2025-07-02 Breaking the Boundaries of Long-Context LLM Inference: Adaptive KV Management on a Single Commodity GPU He Sun et.al. 2506.20187 null
2025-06-24 MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection Zhengxiang Huang et.al. 2506.19884 null
2025-06-24 Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models Jungwoo Park et.al. 2506.19697 null
2025-06-25 Adaptive Request Scheduling for CodeLLM Serving with SLA Guarantees Shi Chang et.al. 2506.19677 null
2025-06-23 Black-Box Test Code Fault Localization Driven by Large Language Models and Execution Estimation Ahmadreza Saboor Yaraghi et.al. 2506.19045 null
2025-06-23 WiLLM: An Open Wireless LLM Communication System Boyi Liu et.al. 2506.19030 null
2025-06-23 LLMs on a Budget? Say HOLA Zohaib Hasan Siddiqui et.al. 2506.18952 null
2025-06-23 CommVQ: Commutative Vector Quantization for KV Cache Compression Junyan Li et.al. 2506.18879 null
2025-06-26 PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries Steven Kolawole et.al. 2506.18728 null
2025-06-22 Mechanistic Interpretability in the Presence of Architectural Obfuscation Marcos Florencio et.al. 2506.18053 null
2025-06-22 LLMs for Customized Marketing Content Generation and Evaluation at Scale Haoran Liu et.al. 2506.17863 null
2025-07-18 LLM-driven Medical Report Generation via Communication-efficient Heterogeneous Federated Learning Haoxuan Che et.al. 2506.17562 null
2025-06-08 Training-free LLM Verification via Recycling Few-shot Examples Dongseok Lee et.al. 2506.17251 null
2025-06-20 Towards AI Search Paradigm Yuchen Li et.al. 2506.17188 null
2025-06-23 From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents Mohammad Amaan Sayeed et.al. 2506.15911 null
2025-05-30 Learn from the Past: Fast Sparse Indexing for Large Language Model Decoding Feiyu Yao et.al. 2506.15704 null
2025-06-18 eLLM: Elastic Memory Management Framework for Efficient LLM Serving Jiale Xu et.al. 2506.15155 null
2025-06-17 CrEst: Credibility Estimation for Contexts in LLMs via Weak Supervision Dyah Adila et.al. 2506.14912 null
2025-06-17 Cost-Efficient Serving of LLM Agents via Test-Time Plan Caching Qizheng Zhang et.al. 2506.14852 null
2025-06-05 MobiEdit: Resource-efficient Knowledge Editing for Personalized On-device LLMs Zhenyan Lu et.al. 2506.13772 null
2025-06-17 Prefix-Tuning+: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention Haonan Wang et.al. 2506.13674 null
2025-06-16 Vector Ontologies as an LLM world view extraction method Kaspar Rothenfusser et.al. 2506.13252 link
2025-06-16 Empirical Evaluation of Large Language Models in Automated Program Repair Jiajun Sun et.al. 2506.13186 null
2025-06-19 Serving Large Language Models on Huawei CloudMatrix384 Pengfei Zuo et.al. 2506.12708 null
2025-06-13 Semantic Scheduling for LLM Inference Wenyue Hua et.al. 2506.12204 link
2025-05-21 FlexQuant: A Flexible and Efficient Dynamic Precision Switching Framework for LLM Quantization Fangxin Liu et.al. 2506.12024 null
2025-06-13 Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache Xiaoran Liu et.al. 2506.11886 null
2025-06-13 GraphRAG-Causal: A novel graph-augmented framework for causal reasoning and annotation in news Abdul Haque et.al. 2506.11600 null
2025-06-13 Collaborative LLM Inference via Planning for Efficient Reasoning Byeongchan Lee et.al. 2506.11578 null
2025-06-13 Efficient Long-Context LLM Inference via KV Cache Clustering Jie Hu et.al. 2506.11418 null
2025-06-12 From Replication to Redesign: Exploring Pairwise Comparisons for LLM-Based Peer Review Yaohui Zhang et.al. 2506.11343 null
2025-06-12 SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding Ziyi Zhang et.al. 2506.11309 null
2025-06-06 DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration Hanzhi Zhang et.al. 2506.11104 link
2025-06-12 Slimming Down LLMs Without Losing Their Minds Qingda et.al. 2506.10885 null
2025-06-12 AdaptiveLLM: A Framework for Selecting Optimal Cost-Efficient LLM for Code-Generation Based on CoT Length Junhang Cheng et.al. 2506.10525 link
2025-06-12 TD-Pipe: Temporally-Disaggregated Pipeline Parallelism Architecture for High-Throughput LLM Inference Hongbin Zhang et.al. 2506.10470 null
2025-06-11 A First Look at Bugs in LLM Inference Engines Mugeng Liu et.al. 2506.09713 link
2025-06-12 Understanding the Performance and Power of LLM Inferencing on Edge Accelerators Mayank Arya et.al. 2506.09554 null
2025-06-11 Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning Jiayi Yuan et.al. 2506.09501 null
2025-06-10 Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-k Chihiro Taguchi et.al. 2506.08479 null
2025-07-19 Draft-based Approximate Inference for LLMs Kevin Galim et.al. 2506.08373 link
2025-06-09 MiniCPM4: Ultra-Efficient LLMs on End Devices MiniCPM Team et.al. 2506.07900 link
2025-06-09 How Benchmark Prediction from Fewer Data Misses the Mark Guanhua Zhang et.al. 2506.07673 link
2025-06-09 TreeReview: A Dynamic Tree of Questions Framework for Deep and Efficient LLM-based Scientific Peer Review Yuan Chang et.al. 2506.07642 null
2025-06-09 MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts Wei Tao et.al. 2506.07533 null
2025-06-07 Containerized In-Storage Processing and Computing-Enabled SSD Disaggregation Miryeong Kwon et.al. 2506.06769 null
2025-06-06 Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques Adarsh Prasad Behera et.al. 2506.06579 null
2025-06-06 Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage Ziqi Yuan et.al. 2506.06472 null
2025-07-08 On the Fundamental Impossibility of Hallucination Control in Large Language Models Michał P. Karpowicz et.al. 2506.06382 null
2025-05-21 Reward Is Enough: LLMs Are In-Context Reinforcement Learners Kefan Song et.al. 2506.06303 null
2025-06-06 AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search Yu Li et.al. 2506.06017 null
2025-06-06 FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model Md Jueal Mia et.al. 2506.05640 link
2025-06-11 Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models Yanzhao Zhang et.al. 2506.05176 null
2025-06-05 Are LLMs Reliable Translators of Logical Reasoning Across Lexically Diversified Contexts? Qingchuan Li et.al. 2506.04575 link
2025-06-04 Cascadia: A Cascade Serving System for Large Language Models Youhe Jiang et.al. 2506.04203 null
2025-06-04 SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling Anhao Zhao et.al. 2506.04179 null
2025-06-04 GORACS: Group-level Optimal Transport-guided Coreset Selection for LLM-based Recommender Systems Tiehua Mei et.al. 2506.04015 null
2025-06-04 Pre³: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation Junyi Chen et.al. 2506.03887 null
2025-06-04 Client-Side Zero-Shot LLM Inference for Comprehensive In-Browser URL Analysis Avihay Cohen et.al. 2506.03656 null
2025-06-04 POSS: Position Specialist Generates Better Draft for Speculative Decoding Langlin Huang et.al. 2506.03566 link
2025-07-10 Parallel CPU-GPU Execution for LLM Inference on Constrained GPUs Jiakun Fan et.al. 2506.03296 null
2025-06-03 QKV Projections Require a Fraction of Their Memory Malik Khalaf et.al. 2506.02939 null
2025-06-03 Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs Shangmin Guo et.al. 2506.02918 null
2025-06-14 TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression Zhong-Zhi Li et.al. 2506.02678 link
2025-07-23 KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider Jiahao Wang et.al. 2506.02634 link
2025-06-03 HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference Ping Gong et.al. 2506.02572 link
2025-06-03 Response-Level Rewards Are All You Need for Online Reinforcement Learning in LLMs: A Mathematical Perspective Shenghua He et.al. 2506.02553 null
2025-05-29 NestedFP: High-Performance, Memory-Efficient Dual-Precision Floating Point Support for LLMs Haeun Lee et.al. 2506.02024 null
2025-05-24 Efficient and Workload-Aware LLM Serving via Runtime Layer Swapping and KV Cache Resizing Zhaoyuan Su et.al. 2506.02006 null
2025-05-16 Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism Yuhao Shen et.al. 2506.01979 null
2025-06-02 Memory Access Characterization of Large Language Models in CPU Environment and its Potential Impacts Spencer Banasik et.al. 2506.01827 null
2025-05-13 AI Accelerators for Large Language Model Inference: Architecture Analysis and Scaling Strategies Amit Sharma et.al. 2506.00008 null
2025-05-30 AFLoRA: Adaptive Federated Fine-Tuning of Large Language Models with Resource-Aware Low-Rank Adaption Yajie Zhou et.al. 2505.24773 null
2025-05-30 SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training Yehonathan Refael et.al. 2505.24749 null
2025-05-30 Are Optimal Algorithms Still Optimal? Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching Juan Wisznia et.al. 2505.24643 null
2025-05-30 LLM Inference Enhanced by External Knowledge: A Survey Yu-Hsuan Lin et.al. 2505.24377 link
2025-05-30 SkyLB: A Locality-Aware Cross-Region Load Balancer for LLM Inference Tian Xia et.al. 2505.24095 null
2025-05-29 Large Language Model Meets Constraint Propagation Alexandre Bonlarron et.al. 2505.24012 null
2025-05-29 EmbAdvisor: Adaptive Cache Management for Sustainable LLM Serving Yuyang Tian et.al. 2505.23970 null
2025-05-29 Sustainable Carbon-Aware and Water-Efficient LLM Scheduling in Geo-Distributed Cloud Datacenters Hayden Moore et.al. 2505.23554 null
2025-06-10 Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism Jinhui Wei et.al. 2505.23219 null
2025-05-29 SCORPIO: Serving the Right Requests at the Right Time for Heterogeneous SLOs in LLM Inference Yinghao Tang et.al. 2505.23022 null
2025-05-28 Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference Donghyeon Joo et.al. 2505.22913 link
2025-05-28 AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models Feng Luo et.al. 2505.22662 null
2025-05-28 Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai LLM-based ASR Mingchen Shao et.al. 2505.22063 null
2025-05-28 ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning Zhendong Mi et.al. 2505.21987 null
2025-05-28 Towards Efficient Key-Value Cache Management for Prefix Prefilling in LLM Inference Yue Zhu et.al. 2505.21919 null
2025-05-29 EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse Tianyu Guo et.al. 2505.21889 link
2025-05-28 HoliTom: Holistic Token Merging for Fast Video Large Language Models Kele Shao et.al. 2505.21334 link
2025-06-04 LLMs Think, But Not In Your Flow: Reasoning-Level Personalization for Black-Box Large Language Models Jieyong Kim et.al. 2505.21082 null
2025-05-27 Efficient Large Language Model Inference with Neural Block Linearization Mete Erdogan et.al. 2505.21077 null
2025-07-18 FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration Daehyeon Baek et.al. 2505.20839 null
2025-05-26 HAMburger: Accelerating LLM Inference via Token Smashing Jingyu Liu et.al. 2505.20438 null
2025-05-23 Less Context, Same Performance: A RAG Framework for Resource-Efficient LLM-Based Clinical NLP Satya Narayana Cheetirala et.al. 2505.20320 null
2025-05-26 APE: A Data-Centric Benchmark for Efficient LLM Adaptation in Text Summarization Javier Marín et.al. 2505.19912 link
2025-06-13 MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE Zongle Huang et.al. 2505.19645 null
2025-05-26 VLMLight: Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning Maonan Wang et.al. 2505.19486 null
2025-05-26 BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs Guilong Lu et.al. 2505.19457 link
2025-05-26 WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference Sihan Chen et.al. 2505.19427 link
2025-05-25 DECA: A Near-Core LLM Decompression Accelerator Supporting Out-of-Order Invocation Gerasimos Gerogiannis et.al. 2505.19349 null
2025-05-25 Can Large Language Models Infer Causal Relationships from Real-World Text? Ryan Saklad et.al. 2505.18931 null
2025-06-18 ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models Hao Chen et.al. 2505.18799 null
2025-06-03 A Survey of LLM × DATA Xuanhe Zhou et.al. 2505.18458 null
2025-05-23 LatentLLM: Attention-Aware Joint Tensor Compression Toshiaki Koike-Akino et.al. 2505.18413 null
2025-05-23 An Attack to Break Permutation-Based Private Third-Party Inference Schemes for LLMs Rahul Thomas et.al. 2505.18332 null
2025-07-01 Two-Stage Regularization-Based Structured Pruning for LLMs Mingkuan Feng et.al. 2505.18232 null
2025-05-23 NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache Donghyun Son et.al. 2505.18231 null
2025-05-23 Navigating Pitfalls: Evaluating LLMs in Machine Learning Programming Education Smitha Kumar et.al. 2505.18220 null
2025-05-23 Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning Michael Hassid et.al. 2505.17813 null
2025-05-23 DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies Ning Yang et.al. 2505.17420 null
2025-05-26 RAP: Runtime-Adaptive Pruning for LLM Inference Huanrong Liu et.al. 2505.17138 null
2025-05-20 Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency Ruixiao Li et.al. 2505.17074 null
2025-05-16 SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs Jinwoo Park et.al. 2505.17052 null
2025-05-22 CASTILLO: Characterizing Response Length Distributions of Large Language Models Daniel F. Perez-Ramirez et.al. 2505.16881 link
2025-05-24 Recursive Offloading for LLM Serving in Multi-tier Networks Zhiyuan Wu et.al. 2505.16502 link
2025-05-22 Reading Between the Prompts: How Stereotypes Shape LLM’s Implicit Personalization Vera Neplenbroek et.al. 2505.16467 link
2025-05-22 LightRouter: Towards Efficient LLM Collaboration with Minimal Overhead Yifan Zhang et.al. 2505.16221 null
2025-05-31 QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design Benjamin Schneider et.al. 2505.16175 link
2025-05-22 KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization Mingbo Song et.al. 2505.16162 null
2025-05-21 Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning Jinghui Lu et.al. 2505.15154 null
2025-05-21 BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms Yunlong Hou et.al. 2505.15141 null
2025-06-04 Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity Susav Shrestha et.al. 2505.14884 link
2025-05-20 ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions Bufang Yang et.al. 2505.14668 null
2025-05-20 ServerlessLoRA: Minimizing Latency and Cost in Serverless Inference for LoRA-Based LLMs Yifan Sui et.al. 2505.14468 null
2025-05-20 Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning Jiwon Song et.al. 2505.13866 link
2025-05-19 Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training Shane Bergsma et.al. 2505.13738 null
2025-05-16 An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents Ayesha Amjad et.al. 2505.13504 null
2025-04-02 Large Language Model powered Symbolic Execution Yihe Li et.al. 2505.13452 null
2025-05-19 Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately Yuhang Wang et.al. 2505.13326 null
2025-05-19 HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding Siran Liu et.al. 2505.13254 null
2025-05-19 FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference Guangda Liu et.al. 2505.13109 null
2025-05-19 EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code Yuhao Qing et.al. 2505.13004 link
2025-05-25 FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks Zihua Wang et.al. 2505.12728 link
2025-05-19 HydraInfer: Hybrid Disaggregated Scheduling for Multimodal Large Language Model Serving Xianzhe Dong et.al. 2505.12658 null
2025-05-17 Enhancing Complex Instruction Following for Large Language Models with Mixture-of-Contexts Fine-tuning Yuheng Lu et.al. 2505.11922 null
2025-05-17 Arrow: Adaptive Scheduling Mechanisms for Disaggregated LLM Inference Architecture Yu Wu et.al. 2505.11916 null
2025-05-25 Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning Yansong Ning et.al. 2505.11827 null
2025-07-10 TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference Raja Gond et.al. 2505.11329 link
2025-05-23 SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning Zheng Li et.al. 2505.11274 null
2025-05-16 Vaiage: A Multi-Agent Solution to Personalized Travel Planning Binwen Liu et.al. 2505.10922 null
2025-05-21 SpecOffload: Unlocking Latent GPU Capacity for LLM Inference on Resource-Constrained Devices Xiangwen Zhuge et.al. 2505.10259 link
2025-06-05 ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production Yuxing Xiang et.al. 2505.09999 link
2025-05-15 How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference Nidhal Jegham et.al. 2505.09598 null
2025-05-14 Statistical Modeling and Uncertainty Estimation of LLM Inference Systems Kaustabha Ray et.al. 2505.09319 null
2025-05-15 ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor Seungbeom Choi et.al. 2505.09142 link
2025-05-13 ITERA-LLM: Boosting Sub-8-Bit Large Language Model Inference via Iterative Tensor Decomposition Keran Zheng et.al. 2505.08981 null
2025-06-30 LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries Zekun Wu et.al. 2505.08842 null
2025-05-13 Automatic Task Detection and Heterogeneous LLM Speculative Decoding Danying Ge et.al. 2505.08600 null
2025-05-08 Scaling Laws for Speculative Decoding Siyuan Yan et.al. 2505.07858 null
2025-05-12 SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models Hang Wu et.al. 2505.07680 null
2025-05-12 LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning Xiaotian Lin et.al. 2505.07437 link
2025-05-12 Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity Guang Yan et.al. 2505.07239 null
2025-05-12 PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications Kuntai Du et.al. 2505.07203 null
2025-06-15 I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference Zibo Gao et.al. 2505.06738 null
2025-05-09 Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference Haolin Zhang et.al. 2505.06461 null
2025-04-30 Towards Efficient LLM Storage Reduction via Tensor Deduplication and Delta Compression Zirui Wang et.al. 2505.06252 null
2025-05-09 Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM Zehao Fan et.al. 2505.05772 null
2025-05-08 PRIMG : Efficient LLM-driven Test Generation Using Mutant Prioritization Mohamed Salah Bouafif et.al. 2505.05584 link
2025-05-08 HEXGEN-TEXT2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow You Peng et.al. 2505.05286 link
2025-05-12 Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving Shan Yu et.al. 2505.04021 null
2025-05-31 LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection Xinyue Zeng et.al. 2505.03793 link
2025-05-15 GPU Performance Portability needs Autotuning Burkhard Ringlein et.al. 2505.03780 link
2025-04-21 Splitwiser: Efficient LM inference with constrained resources Asad Aali et.al. 2505.03763 link
2025-04-07 AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design Yanbiao Liang et.al. 2505.03745 null
2025-05-06 Faster MoE LLM Inference for Extremely Large Models Haoqi Yang et.al. 2505.03531 null
2025-05-16 34 Examples of LLM Applications in Materials Science and Chemistry: Towards Automation, Assistants, Agents, and Accelerated Scientific Discovery Yoel Zimmermann et.al. 2505.03049 null
2025-06-30 RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference Yaoqi Chen et.al. 2505.02922 null
2025-05-06 EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices Arnab Sanyal et.al. 2505.02380 null
2025-05-03 Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients Yezhen Wang et.al. 2505.01744 null
2025-05-03 High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers Brian Wong et.al. 2505.01693 null
2025-05-08 A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency Sihyeong Park et.al. 2505.01658 link
2025-05-02 PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding Bradley McDanel et.al. 2505.01572 null
2025-05-01 Spill The Beans: Exploiting CPU Cache Side-Channels to Leak Tokens from Large Language Models Andrew Adiletta et.al. 2505.00817 null
2025-04-29 Efficient LLMs with AMP: Attention Heads and MLP Pruning Leandro Giusti Mugnaini et.al. 2504.21174 null
2025-04-29 Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts Hanhua Hong et.al. 2504.21117 null
2025-04-30 Ascendra: Dynamic Request Prioritization for Efficient LLM Serving Azam Ikram et.al. 2504.20828 null
2025-04-30 GenTorrent: Scaling Large Language Model Serving with An Overley Network Fei Fang et.al. 2504.20101 null
2025-04-24 Tempo: Application-aware LLM Serving with Mixed SLO Requirements Wei Zhang et.al. 2504.20068 null
2025-04-28 AutoJudge: Judge Decoding Without Manual Annotation Roman Garipov et.al. 2504.20039 null
2025-04-28 semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage Ke Hong et.al. 2504.19867 null
2025-04-28 Taming the Titans: A Survey of Efficient LLM Inference Serving Ranran Zhen et.al. 2504.19720 link
2025-04-28 Bullet: Boosting GPU Utilization for LLM Serving via Dynamic Spatial-Temporal Orchestration Zejia Lin et.al. 2504.19516 null
2025-04-28 R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference Zhenyu Zhang et.al. 2504.19449 null
2025-04-28 Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory Prateek Chhikara et.al. 2504.19413 null
2025-05-07 A Simple Ensemble Strategy for LLM Inference: Towards More Stable Text Classification Junichiro Niimi et.al. 2504.18884 link
2025-06-15 PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation Zihao An et.al. 2504.18583 null
2025-04-25 EcoServe: Enabling Cost-effective LLM Serving with Proactive Intra- and Inter-Instance Orchestration Jiangsu Du et.al. 2504.18154 null
2025-04-25 PropRAG: Guiding Retrieval with Beam Search over Proposition Paths Jingjin Wang et.al. 2504.18070 null
2025-04-25 Streaming, Fast and Slow: Cognitive Load-Aware Streaming for Efficient LLM Serving Chang Xiao et.al. 2504.17999 null
2025-04-24 Energy Considerations of Large Language Model Inference and Efficiency Optimizations Jared Fernandez et.al. 2504.17674 null
2025-04-24 L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference Qingyuan Liu et.al. 2504.17584 null
2025-04-24 A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task Jiaqi Deng et.al. 2504.17547 null
2025-04-24 On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration Maoyang Xiang et.al. 2504.17376 null
2025-04-26 QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining Fengze Liu et.al. 2504.16511 null
2025-04-18 HPU: High-Bandwidth Processing Unit for Scalable, Cost-effective LLM Inference via GPU Co-processing Myunghyun Rhee et.al. 2504.16112 null
2025-05-29 Optimizing Token Consumption in LLMs: A Nano Surge Approach for Code Reasoning Efficiency Junwei Hu et.al. 2504.15989 null
2025-04-22 SeaLLM: Service-Aware and Latency-Optimized Resource Sharing for Large Language Model Inference Yihao Zhao et.al. 2504.15720 null
2025-04-23 A LoRA-Based Approach to Fine-Tuning LLMs for Educational Guidance in Resource-Constrained Settings Md Millat Hosen et.al. 2504.15610 link
2025-04-21 Speculative Sampling via Exponential Races Szymon Kobus et.al. 2504.15475 null
2025-05-20 KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments Junyoung Park et.al. 2504.15364 null
2025-04-18 High-Throughput LLM inference on Heterogeneous Clusters Yi Xiong et.al. 2504.15303 null
2025-04-17 D²MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving Haodong Wang et.al. 2504.15299 null
2025-06-12 SLO-Aware Scheduling for Large Language Model Inferences Jinqi Huang et.al. 2504.14966 null
2025-04-21 Hardware-based Heterogeneous Memory Management for Large Language Model Inference Soojin Hwang et.al. 2504.14893 null
2025-05-28 gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling Tianyu Guo et.al. 2504.14775 link
2025-04-20 Knowledge Distillation and Dataset Distillation of Large Language Models: Emerging Trends, Challenges, and Future Directions Luyang Fang et.al. 2504.14772 null
2025-04-22 Optimizing SLO-oriented LLM Serving with PD-Multiplexing Weihao Cui et.al. 2504.14489 null
2025-04-19 Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator Akshat Ramachandran et.al. 2504.14365 null
2025-04-19 FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference Coleman Hooper et.al. 2504.14152 null
2025-05-12 From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs Jiliang Ni et.al. 2504.13471 null
2025-05-23 The Quantum LLM: Modeling Semantic Spaces with Quantum Principles Timo Aukusti Laine et.al. 2504.13202 null
2025-04-25 Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving Yaoyao Ding et.al. 2504.12984 null
2025-04-17 Data-efficient LLM Fine-tuning for Code Generation Weijie Lv et.al. 2504.12687 link
2025-04-16 Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading Kihyun Kim et.al. 2504.11816 link
2025-04-16 Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs Hyungwoo Lee et.al. 2504.11765 null
2025-04-16 Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures Prabhu Vellaisamy et.al. 2504.11750 null
2025-04-16 Progent: Programmable Privilege Control for LLM Agents Tianneng Shi et.al. 2504.11703 link
2025-04-15 Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints Ruicheng Ao et.al. 2504.11320 link
2025-04-14 HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving Avinash Kumar et.al. 2504.10724 null
2025-04-14 Load Balancing with Network Latencies via Distributed Gradient Descent Santiago R. Balseiro et.al. 2504.10693 null
2025-04-15 AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference Yangshen Deng et.al. 2504.10326 null
2025-04-14 KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference Yuxuan Tian et.al. 2504.09936 null
2025-04-20 Understanding and Optimizing Multi-Stage AI Inference Pipelines Abhimanyu Rajeshkumar Bambhaniya et.al. 2504.09775 null
2025-04-13 Integrating Large Language Models for Automated Structural Analysis Haoran Liang et.al. 2504.09754 null
2025-04-13 Efficient LLM Serving on Hybrid Real-time and Best-effort Requests Wan Borui et.al. 2504.09590 null
2025-04-13 LoopLynx: A Scalable Dataflow Architecture for Efficient LLM Inference Jianing Zheng et.al. 2504.09561 link
2025-04-12 MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints Yichao Yuan et.al. 2504.09345 null
2025-05-22 DynaServe: Unified and Elastic Execution for Dynamic Disaggregated LLM Serving Chaoyi Ruan et.al. 2504.09285 null
2025-04-11 An Adaptive Vector Index Partitioning Scheme for Low-Latency RAG Pipeline Junkyum Kim et.al. 2504.08930 null
2025-04-11 SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting Jiaming Xu et.al. 2504.08850 null
2025-05-31 SD²: Self-Distilled Sparse Drafters Mike Lasby et.al. 2504.08838 null
2025-04-07 PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters Zonghang Li et.al. 2504.08791 link
2025-04-11 Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash Fucheng Jia et.al. 2504.08378 null
2025-04-11 Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices Shengyuan Ye et.al. 2504.08242 null
2025-04-10 Token Level Routing Inference System for Edge Devices Jianshu She et.al. 2504.07878 null
2025-04-10 A System for Comprehensive Assessment of RAG Frameworks Mattia Rengo et.al. 2504.07803 link
2025-04-11 Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving Shihong Gao et.al. 2504.07494 null
2025-04-10 UniCAIM: A Unified CAM/CIM Architecture with Static-Dynamic KV Cache Pruning for Efficient Long-Context LLM Inference Weikai Xu et.al. 2504.07479 null
2025-04-24 Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents Yueying Li et.al. 2504.07347 null
2025-04-08 S’MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning Hanqing Zeng et.al. 2504.06426 null
2025-04-08 SPIRe: Boosting LLM Inference Throughput with Speculative Decoding Sanjit Neelam et.al. 2504.06419 null
2025-04-08 Mosaic: Composite Projection Pruning for Resource-efficient LLMs Bailey J. Eccles et.al. 2504.06323 null
2025-04-08 Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching Yanhao Dong et.al. 2504.06319 null
2025-05-23 Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Gleb Rodionov et.al. 2504.06261 null
2025-05-27 User Feedback Alignment for LLM-powered Exploration in Large-scale Recommendation Systems Jianling Wang et.al. 2504.05522 null
2025-04-07 REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding Sakib Reza et.al. 2504.05491 null
2025-04-07 Evaluating Knowledge Graph Based Retrieval Augmented Generation Methods under Knowledge Incompleteness Dongzhuoran Zhou et.al. 2504.05163 null
2025-05-20 Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning Sugyeong Eo et.al. 2504.05047 null
2025-04-05 PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models Haofei Yin et.al. 2504.04104 null
2025-04-03 FlowKV: A Disaggregated Inference Framework with Low-Latency KV Cache Transfer and Load-Aware Scheduling Weiqing Li et.al. 2504.03775 null
2025-03-30 VFlow: Discovering Optimal Agentic Workflows for Verilog Generation Yangbo Wei et.al. 2504.03723 null
2025-04-08 MILLION: Mastering Long-Context LLM Inference Via Outlier-Immunized KV Product Quantization Zongwu Wang et.al. 2504.03661 link
2025-03-01 Echo: Efficient Co-Scheduling of Hybrid Online-Offline Tasks for Large Language Model Serving Zhibin Wang et.al. 2504.03651 null
2025-02-22 AIBrix: Towards Scalable, Cost-Effective Large Language Model Inference Infrastructure The AIBrix Team et.al. 2504.03648 null
2025-04-04 Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency Erik Johannes Husom et.al. 2504.03360 null
2025-04-04 Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation Weitao Li et.al. 2504.03165 link
2025-04-03 Narrative Studio: Visual narrative exploration using LLMs and Monte Carlo Tree Search Parsa Ghaffari et.al. 2504.02426 link
2025-04-01 SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching Yuxuan Zhu et.al. 2504.00970 null
2025-06-04 Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding Aayush Gautam et.al. 2504.00030 null
2025-03-31 TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers’ Guidance Jingxian Xu et.al. 2503.24198 null
2025-04-06 ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance Tong Xie et.al. 2503.24053 link
2025-03-31 Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving Wei Gao et.al. 2503.24000 link
2025-03-31 Model Hemorrhage and the Robustness Limits of Large Language Models Ziyang Ma et.al. 2503.23924 null
2025-03-31 MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration Tatsuya Kubo et.al. 2503.23817 null
2025-03-30 Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference Wei Tao et.al. 2503.23294 null
2025-03-30 PromptDistill: Query-based Selective Token Retention in Intermediate Layers for Efficient Large Language Model Inference Weisheng Jin et.al. 2503.23274 link
2025-03-28 Niyama : Breaking the Silos of LLM Inference Serving Kanishk Goel et.al. 2503.22562 null
2025-03-26 Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation Yunkai Liang et.al. 2503.20552 link
2025-03-25 LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation Han Chen et.al. 2503.19950 link
2025-03-24 LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment Varsha Embar et.al. 2503.19090 null
2025-03-23 SplitFrozen: Split Learning with Device-side Model Frozen for Fine-Tuning LLM on Heterogeneous Resource-Constrained Devices Jian Ma et.al. 2503.18986 null
2025-03-24 xKV: Cross-Layer SVD for KV-Cache Compression Chi-Chih Chang et.al. 2503.18893 link
2025-04-21 Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design Rui Xie et.al. 2503.18869 null
2025-05-14 Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization Minsu Kim et.al. 2503.18599 null
2025-03-24 DeepFund: Will LLM be Professional at Fund Investment? A Live Arena Perspective Changlun Li et.al. 2503.18313 null
2025-03-24 Jenga: Effective Memory Management for Serving LLM with Heterogeneity Chen Zhang et.al. 2503.18292 null
2025-03-27 WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference Youhui Zuo et.al. 2503.17922 link
2025-03-22 PipeBoost: Resilient Pipelined Architecture for Fast Serverless LLM Scaling Chongpeng Liu et.al. 2503.17707 null
2025-03-21 V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms Javier J. Poveda Rodrigo et.al. 2503.17422 null
2025-03-21 Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation Jingzhi Fang et.al. 2503.16893 null
2025-05-16 KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse Huan Yang et.al. 2503.16525 null
2025-03-20 SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models Fahao Chen et.al. 2503.15921 null
2025-03-19 Automated Non-Functional Requirements Generation in Software Engineering with Large Language Models: A Comparative Study Jomar Thomas Almonte et.al. 2503.15248 null
2025-04-15 ELTEX: A Framework for Domain-Driven Synthetic Data Generation Arina Razmyslovich et.al. 2503.15055 link
2025-03-19 FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding Chongjun Tu et.al. 2503.14935 null
2025-03-19 Communication-Efficient Distributed On-Device LLM Inference Over Wireless Networks Kai Zhang et.al. 2503.14882 null
2025-03-21 RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving Wenqi Jiang et.al. 2503.14649 null
2025-03-18 PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play Wei Fang et.al. 2503.14432 null
2025-03-24 Mitigating KV Cache Competition to Enhance User Experience in LLM Inference Haiying Shen et.al. 2503.13773 null
2025-03-17 AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications Haiying Shen et.al. 2503.13737 null
2025-03-17 ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts Evangelos Georganas et.al. 2503.13565 null
2025-03-14 Examples as the Prompt: A Scalable Approach for Efficient LLM Adaptation in E-Commerce Jingying Zeng et.al. 2503.13518 null
2025-03-17 xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference Maximilian Beck et.al. 2503.13427 link
2025-04-14 VeriLeaky: Navigating IP Protection vs Utility in Fine-Tuning for LLM-Driven Verilog Coding Zeng Wang et.al. 2503.13116 null
2025-03-15 TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation Mayank Kumar et.al. 2503.12217 null
2025-04-22 Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques Neusha Javidnia et.al. 2503.11816 null
2025-05-19 D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning Jia Zhang et.al. 2503.11441 null
2025-03-14 MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens Jeong Hun Yeo et.al. 2503.11315 link
2025-04-08 Green Prompting Marta Adamska et.al. 2503.10666 null
2025-05-16 Collaborative Speculative Inference for Efficient LLM Inference Serving Luyao Gao et.al. 2503.10325 null
2025-03-17 Exploiting Edited Large Language Models as General Scientific Optimizers Qitan Lv et.al. 2503.09620 null
2025-03-13 BIMBA: Selective-Scan Compression for Long-Range Video Question Answering Md Mohaiminul Islam et.al. 2503.09590 link
2025-05-23 Prompt Inference Attack on Distributed Large Language Model Inference Frameworks Xinjian Luo et.al. 2503.09291 null
2025-05-02 Prompt Inversion Attack against Collaborative Inference of Large Language Models Wenjie Qu et.al. 2503.09022 null
2025-03-19 Enhancing High-Quality Code Generation in Large Language Models with Comparative Prefix-Tuning Yuan Jiang et.al. 2503.09020 link
2025-03-11 Position-Aware Depth Decay Decoding ($D^3$): Boosting Large Language Model Inference Efficiency Siqi Fan et.al. 2503.08524 null
2025-03-11 FastCache: Optimizing Multimodal LLM Serving through Lightweight KV-Cache Compression Framework Jianian Zhu et.al. 2503.08461 null
2025-03-19 TokenSim: Enabling Hardware and Software Exploration for Large Language Model Inference Systems Feiyang Wu et.al. 2503.08415 link
2025-03-11 Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference Pol G. Recasens et.al. 2503.08311 null
2025-03-09 Seesaw: High-throughput LLM Inference via Model Re-sharding Qidong Su et.al. 2503.06433 null
2025-02-24 Encoding Inequity: Examining Demographic Bias in LLM-Driven Robot Caregiving Raj Korpan et.al. 2503.05765 null
2025-03-07 Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching Bowen Pang et.al. 2503.05248 link
2025-05-21 Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching Simon A. Aytes et.al. 2503.05179 link
2025-03-07 SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding Kaiyu Huang et.al. 2503.05096 null
2025-03-07 Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size Alireza Behtash et.al. 2503.04704 null
2025-03-15 Mark Your LLM: Detecting the Misuse of Open-Source Large Language Models via Watermarking Yijie Xu et.al. 2503.04636 null
2025-03-06 AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services Xiaoqi Wang et.al. 2503.04418 null
2025-03-06 Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search Kou Misaki et.al. 2503.04412 null
2025-03-06 ADOR: A Design Exploration Framework for LLM Serving with Enhanced Latency and Throughput Junsoo Kim et.al. 2503.04253 null
2025-03-06 Beyond Memorization: Evaluating the True Type Inference Capabilities of LLMs for Java Code Snippets Yiwen Dong et.al. 2503.04076 null
2025-03-04 FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference Hongchao Du et.al. 2503.03777 null
2025-03-05 MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems Rui Ye et.al. 2503.03686 null
2025-03-05 Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems Yaoru Li et.al. 2503.03505 link
2025-03-05 Enhancing Memory Efficiency in Large Language Model Training Through Chronos-aware Pipeline Parallelism Xinyuan Lin et.al. 2503.03182 null
2025-03-04 PersonaX: A Recommendation Agent Oriented User Modeling Framework for Long Behavior Sequence Yunxiao Shi et.al. 2503.02398 link
2025-03-04 VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference Zihan Liu et.al. 2503.02236 null
2025-02-26 Online Pseudo-average Shifting Attention (PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis Long Cheng et.al. 2503.01873 null
2025-04-30 SAGE: A Framework of Precise Retrieval for RAG Jintao Zhang et.al. 2503.01713 null
2025-03-03 Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens Xinsheng Wang et.al. 2503.01710 link
2025-03-03 DILEMMA: Joint LLM Quantization and Distributed LLM Inference Over Edge Computing Systems Minoo Hosseinzadeh et.al. 2503.01704 null
2025-03-15 Towards An Efficient LLM Training Paradigm for CTR Prediction Allen Lin et.al. 2503.01001 null
2025-03-02 Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers Yiran Zhao et.al. 2503.00865 null
2025-03-01 Tutorial Proposal: Speculative Decoding for Efficient LLM Inference Heming Xia et.al. 2503.00491 null
2025-03-04 Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving Qihui Zhou et.al. 2503.00392 null
2025-02-28 FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference Xunhao Lai et.al. 2502.20766 link
2025-05-04 SPD: Sync-Point Drop for efficient tensor parallelism of Large Language Models Han-Byul Kim et.al. 2502.20727 null
2025-04-02 Smart Routing: Cost-Effective Multi-LLM Serving for Multi-Core AIOS Kai Mei et.al. 2502.20576 link
2025-02-27 M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging Jinghao Feng et.al. 2502.20301 null
2025-02-26 Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs Yiheng Yang et.al. 2502.19078 null
2025-02-26 Evidence-Driven Marker Extraction for Social Media Suicide Risk Detection Carter Adams et.al. 2502.18823 null
2025-02-24 LLM Inference Acceleration via Efficient Operation Fusion Mahsa Salmani et.al. 2502.17728 null
2025-02-24 CodeSwift: Accelerating LLM Inference for Efficient Code Generation Qianhui Zhao et.al. 2502.17139 null
2025-02-24 Make LLM Inference Affordable to Everyone: Augmenting GPU Memory with NDP-DIMM Lian Liu et.al. 2502.16963 null
2025-02-24 DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance Xuanfan Ni et.al. 2502.16886 null
2025-03-01 CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter Yepeng Weng et.al. 2502.16880 null
2025-02-23 DISC: Dynamic Decomposition Improves LLM Inference Scaling Jonathan Light et.al. 2502.16706 null
2025-02-23 Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines Xinwei Long et.al. 2502.16641 null
2025-05-01 TerEffic: Highly Efficient Ternary LLM Inference on FPGA Chenyang Yin et.al. 2502.16473 null
2025-02-27 Dynamic Parallel Tree Search for Efficient LLM Reasoning Yifu Ding et.al. 2502.16235 null
2025-02-21 KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse Jingbo Yang et.al. 2502.16002 link
2025-02-14 Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization Bowen Pang et.al. 2502.15763 null
2025-02-21 Towards Swift Serverless LLM Cold Starts with ParaServe Chiheng Lou et.al. 2502.15524 null
2025-02-24 HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings Rasmus Aavang et.al. 2502.15411 link
2025-02-24 Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference Yaohua Tang et.al. 2502.15294 null
2025-02-21 A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation Shilong Hou et.al. 2502.15233 link
2025-02-19 EvoP: Robust LLM Inference via Evolutionary Pruning Shangyu Wu et.al. 2502.14910 null
2025-04-21 LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention Shang Yang et.al. 2502.14866 link
2025-02-20 Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale Shashwat Jaiswal et.al. 2502.14617 null
2025-02-20 SR-LLM: Rethinking the Structured Representation in Large Language Model Jiahuan Zhang et.al. 2502.14352 null
2025-02-20 Efficient AI in Practice: Training and Deployment of Efficient LLMs for Industry Applications Kayhan Behdin et.al. 2502.14305 null
2025-02-19 RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression Payman Behnam et.al. 2502.14051 null
2025-02-19 Autellix: An Efficient Serving Engine for LLM Agents as General Programs Michael Luo et.al. 2502.13965 null
2025-02-19 Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference Qingfa Xiao et.al. 2502.13542 null
2025-02-19 What are Models Thinking about? Understanding Large Language Model Hallucinations “Psychology” through Model Inner State Analysis Peiran Wang et.al. 2502.13490 null
2025-02-24 BaKlaVa – Budgeted Allocation of KV cache for Long-context Inference Ahmed Burak Gulhan et.al. 2502.13176 null
2025-02-18 SEFL: Harnessing Large Language Model Agents to Improve Educational Feedback Systems Mike Zhang et.al. 2502.12927 link
2025-03-27 R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs Sumin Jo et.al. 2502.12767 link
2025-02-18 HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading Cheng Luo et.al. 2502.12574 link
2025-02-18 Distributed On-Device LLM Inference With Over-the-Air Computation Kai Zhang et.al. 2502.12559 null
2025-02-18 SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs Ahmed F. AbouElhamayed et.al. 2502.12444 link
2025-02-17 Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs Kan Zhu et.al. 2502.12216 null
2025-02-17 Designing Role Vectors to Improve LLM Inference Behaviour Daniele Potertì et.al. 2502.12055 null
2025-02-17 DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services Ting Sun et.al. 2502.11417 null
2025-02-17 Evaluating the Performance of the DeepSeek Model in Confidential Computing Environment Ben Dong et.al. 2502.11347 null
2025-02-16 Unveiling Environmental Impacts of Large Language Model Serving: A Functional Unit View Yanran Wu et.al. 2502.11256 null
2025-02-16 Diversified Sampling Improves Scaling LLM inference Tianchun Wang et.al. 2502.11027 null
2025-02-16 Leveraging Uncertainty Estimation for Efficient LLM Routing Tuo Zhang et.al. 2502.11021 null
2025-04-07 Local-Cloud Inference Offloading for LLMs in Multi-Modal, Multi-Task, Multi-Dialogue Settings Liangqi Yuan et.al. 2502.11007 link
2025-02-15 Pushing up to the Limit of Memory Bandwidth and Capacity Utilization for Efficient LLM Decoding on Embedded FPGA Jindong Li et.al. 2502.10659 null
2025-02-05 QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache Rishabh Tiwari et.al. 2502.10424 null
2025-02-14 λScale: Enabling Fast Scaling for Serverless Large Language Model Inference Minchen Yu et.al. 2502.09922 null
2025-02-14 INF^2: High-Throughput Generative Inference of Large Language Models using Near-Storage Processing Hongsun Jang et.al. 2502.09921 null
2025-02-13 On multi-token prediction for efficient LLM inference Somesh Mehra et.al. 2502.09419 null
2025-02-13 ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments Youhe Jiang et.al. 2502.09334 null
2025-03-21 RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models Quan Wei et.al. 2502.09003 null
2025-02-13 InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Heejun Lee et.al. 2502.08910 null
2025-02-13 DiffoRA: Enabling Parameter-Efficient LLM Fine-Tuning via Differential Low-Rank Matrix Adaptation Tangyu Jiang et.al. 2502.08905 null
2025-02-12 Universal Model Routing for Efficient LLM Inference Wittawat Jitkrittum et.al. 2502.08773 null
2025-02-12 MoLoRec: A Generalizable and Efficient Framework for LLM-Based Recommendation Min Hou et.al. 2502.08271 null
2025-02-12 Memory Offloading for Large Language Model Inference with Latency SLO Guarantees Chenxiang Ma et.al. 2502.08182 null
2025-02-12 Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences Shanshan Han et.al. 2502.08142 null
2025-03-19 Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding Ziyao Wang et.al. 2502.08020 null
2025-02-11 HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment Youhe Jiang et.al. 2502.07903 null
2025-02-11 SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters Yiping Wang et.al. 2502.07832 null
2025-03-21 PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference Yufeng Gu et.al. 2502.07578 link
2025-03-05 Online Scheduling for LLM Inference with KV Cache Constraints Patrick Jaillet et.al. 2502.07115 null
2025-02-10 Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE Haiduo Huang et.al. 2502.06282 link
2025-03-15 Towards Sustainable NLP: Insights from Benchmarking Inference Energy in Large Language Models Soham Poddar et.al. 2502.05610 null
2025-02-08 Mechanistic Interpretability of Emotion Inference in Large Language Models Ala N. Tak et.al. 2502.05489 null
2025-02-07 BCQ: Block Clustered Quantization for 4-bit (W4A4) LLM Inference Reena Elangovan et.al. 2502.05376 null
2025-01-31 Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies Nadav Timor et.al. 2502.05202 null
2025-03-15 EcoServe: Designing Carbon-Aware AI Inference Systems Yueying Li et.al. 2502.05043 null
2025-02-07 LLM Query Scheduling with Prefix Reuse and Latency Constraints Gregory Dexter et.al. 2502.04677 null
2025-02-18 WaferLLM: A Wafer-Scale LLM Inference System Congjie He et.al. 2502.04563 null
2025-02-25 KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference Xing Li et.al. 2502.04420 link
2025-02-06 CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference Zehua Pei et.al. 2502.04416 link
2025-02-11 Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing Kunfeng Lai et.al. 2502.04411 null
2025-02-26 AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference Qingyue Yang et.al. 2502.04077 link
2025-02-06 CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing Yu Yuan et.al. 2502.03997 null
2025-02-06 Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective Yuan Feng et.al. 2502.03805 link
2025-04-04 Adaptive Semantic Prompt Caching with VectorQ Luis Gaspar Schroeder et.al. 2502.03771 null
2025-02-05 Bilevel ZOFO: Bridging Parameter-Efficient and Zeroth-Order Techniques for Efficient LLM Fine-Tuning and Meta-Training Reza Shirkavand et.al. 2502.03604 null
2025-02-05 HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference Zeyu Zhang et.al. 2502.03589 null
2025-02-05 Accessible and Portable LLM Inference by Compiling Computational Graphs into SQL Wenbo Sun et.al. 2502.02818 null
2025-02-05 Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation Jingyu Liu et.al. 2502.02789 link
2025-02-04 LLM Bandit: Cost-Efficient LLM Generation via Preference-Conditioned Dynamic Routing Yang Li et.al. 2502.02743 null
2025-02-04 EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization Yize Wu et.al. 2502.02493 null
2025-01-30 Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency Sazzad Hossain et.al. 2502.01651 null
2025-02-06 An Investigation of FP8 Across Accelerators for LLM Inference Jiwoo Kim et.al. 2502.01070 null
2025-02-02 Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference Patrick Yubeaton et.al. 2502.00922 null
2025-02-02 MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies Ehsaneddin Asgari et.al. 2502.00894 null
2025-02-02 SecPE: Secure Prompt Ensembling for Private and Robust Large Language Models Jiawen Zhang et.al. 2502.00847 null
2025-02-02 Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs Youhe Jiang et.al. 2502.00722 null
2025-02-13 Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning Zhi Zhou et.al. 2502.00511 null
2025-02-01 UniAttn: Reducing Inference Costs via Softmax Unification for Post-Training LLMs Yizhe Xiong et.al. 2502.00439 null
2025-02-01 ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference Xiang Liu et.al. 2502.00299 null
2025-01-16 Optimization Strategies for Enhancing Resource Efficiency in Transformers & Large Language Models Tom Wallace et.al. 2502.00046 null
2025-02-07 Pushing the Limits of BFP on Narrow Precision LLM Inference Hui Wang et.al. 2502.00026 null
2025-02-14 Reward-Guided Speculative Decoding for Efficient LLM Reasoning Baohao Liao et.al. 2501.19324 null
2025-01-31 Pheromone-based Learning of Optimal Reasoning Paths Anirudh Chari et.al. 2501.19278 null
2025-01-31 Structural Embedding Projection for Contextual Large Language Model Inference Vincent Enoasmo et.al. 2501.18826 null
2025-01-29 On the Partitioning of GPU Power among Multi-Instances Tirth Vamja et.al. 2501.17752 null
2025-02-02 RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations Zunhai Su et.al. 2501.16383 null
2025-01-27 Raiders of the Lost Dependency: Fixing Dependency Conflicts in Python using LLMs Antony Bartlett et.al. 2501.16191 null
2025-01-27 TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference Jack Min Ong et.al. 2501.16007 null
2025-01-27 Aging-aware CPU Core Management for Embodied Carbon Amortization in Cloud LLM Inference Tharindu B. Hewage et.al. 2501.15829 link
2025-01-25 Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads Xingyang He et.al. 2501.15113 null
2025-01-25 PatchRec: Multi-Grained Patching for Efficient LLM-based Sequential Recommendation Jiayi Liao et.al. 2501.15087 null
2025-02-09 HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location Ting Sun et.al. 2501.14808 null
2025-01-11 HeteroLLM: Accelerating Large Language Model Inference on Mobile SoCs platform with Heterogeneous AI Accelerators Le Chen et.al. 2501.14794 null
2025-01-04 DeServe: Towards Affordable Offline LLM Inference via Decentralization Linyu Wu et.al. 2501.14784 null
2024-12-13 KVDirect: Distributed Disaggregated LLM Inference Shiyang Chen et.al. 2501.14743 null
2025-01-24 Accelerated Preference Elicitation with LLM-Based Proxies David Huang et.al. 2501.14625 null
2025-01-27 DeepFlow: Serverless Large Language Model Serving at Scale Junhao Hu et.al. 2501.14417 null
2025-01-24 Locality-aware Fair Scheduling in LLM Serving Shiyi Cao et.al. 2501.14312 null
2025-01-27 Serving Long-Context LLMs at the Mobile Edge: Test-Time Reinforcement Learning-based Model Caching and Inference Offloading Minrui Xu et.al. 2501.14205 null
2025-01-08 iServe: An Intent-based Serving System for LLMs Dimitrios Liakopoulos et.al. 2501.13111 null
2025-01-24 EchoLM: Accelerating LLM Serving with Real-time Knowledge Distillation Yifan Yu et.al. 2501.12689 null
2025-03-16 Human-like conceptual representations emerge from language prediction Ningyu Xu et.al. 2501.12547 null
2025-01-21 AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding Zikun Li et.al. 2501.12162 null
2025-02-11 Glinthawk: A Two-Tiered Architecture for Offline LLM Inference Pouya Hamadanian et.al. 2501.11779 link
2025-01-20 Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas Nishant Balepur et.al. 2501.11549 link
2025-03-21 GREEN-CODE: Learning to Optimize Energy Efficiency in LLM-based Code Generation Shashikant Ilager et.al. 2501.11006 link
2025-03-06 A Survey on LLM Test-Time Compute via Search: Tasks, LLM Profiling, Search Algorithms, and Relevant Frameworks Xinzhe Li et.al. 2501.10069 link
2025-01-16 PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks Huiyou Zhan et.al. 2501.09367 null
2025-01-16 Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition Takaaki Hori et.al. 2501.09258 null
2025-01-16 Split Fine-Tuning for Large Language Models in Wireless Networks Songge Zhang et.al. 2501.09237 null
2025-01-15 Guiding Retrieval using LLM-based Listwise Rankers Mandeep Rathee et.al. 2501.09186 link
2025-01-14 Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings Paul Joe Maliakel et.al. 2501.08219 null
2025-01-14 PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving Ahmet Caner Yüzügüler et.al. 2501.08192 null
2025-01-14 Hierarchical Autoscaling for Large Language Model Serving with Chiron Archit Patke et.al. 2501.08090 null
2025-01-12 MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference Wenxuan Zeng et.al. 2501.06807 null
2025-01-12 Mell: Memory-Efficient Large Language Model Serving via Multi-GPU KV Cache Management Liu Qianli et.al. 2501.06709 null
2025-02-07 Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping Muru Zhang et.al. 2501.06589 link
2025-01-15 Multimodal-to-Text Prompt Engineering in Large Language Models Using Feature Embeddings for GNSS Interference Characterization Harshith Manjunath et.al. 2501.05079 null
2025-02-08 Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text Ali Al-Lawati et.al. 2501.03166 link
2025-01-05 TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms Jovan Stojkovic et.al. 2501.02600 null
2025-01-04 AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference Zhuomin He et.al. 2501.02336 link
2024-12-31 Towards Sustainable Large Language Model Serving Sophia Nguyen et.al. 2501.01990 null
2025-01-03 Efficient LLM Inference with Activation Checkpointing and Hybrid Caching Sanghyeon Lee et.al. 2501.01792 null
2025-01-03 (WhyPHI) Fine-Tuning PHI-3 for Multiple-Choice Question Answering: Methodology, Results, and Challenges Mohamed Hisham Abdellatif et.al. 2501.01588 null
2025-01-21 BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference Wonsuk Jang et.al. 2501.01144 link
2025-04-23 FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving Zihao Ye et.al. 2501.01005 null
2025-02-25 Rethinking Layer Removal: A Hybrid Pruning Framework Combining Layer Removal and Singular Value Selection for Efficient LLM Compression Kainan Liu et.al. 2501.00339 null
2024-12-23 Highly Optimized Kernels and Fine-Grained Codebooks for LLM Inference on Arm CPUs Dibakar Gope et.al. 2501.00032 link
2024-12-29 TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication Zongwu Wang et.al. 2412.20501 link
2024-12-29 GreenLLM: Disaggregating Large Language Model Serving on Heterogeneous GPUs for Lower Carbon Emissions Tianyao Shi et.al. 2412.20322 null
2025-01-15 LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System Hyucksung Kwon et.al. 2412.20166 null
2024-12-19 GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors Chengming Zhang et.al. 2412.19829 null
2025-01-05 Gradient Weight-normalized Low-rank Projection for Efficient LLM Training Jia-Hong Huang et.al. 2412.19616 link
2025-01-02 A Survey on Large Language Model Acceleration based on KV Cache Management Haoyang Li et.al. 2412.19442 link
2025-02-13 An Engorgio Prompt Makes Large Language Model Babble on Jianshuo Dong et.al. 2412.19394 link
2024-12-25 Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference Libo Zhang et.al. 2412.18934 null
2024-12-24 TimelyLLM: Segmented LLM Serving System for Time-sensitive Robotic Applications Neiwen Ling et.al. 2412.18695 null
2024-12-26 KunServe: Elastic and Efficient Large Language Model Serving with Parameter-centric Memory Management Rongxin Cheng et.al. 2412.18169 null
2025-02-22 Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media Zhen Sun et.al. 2412.18148 null
2024-12-24 Tackling the Dynamicity in a Production LLM Serving System with SOTA Optimizations via Hybrid Prefill/Decode/Verify Scheduling on Efficient Meta-kernels Mingcong Song et.al. 2412.18106 null
2024-12-23 Trustworthy and Efficient LLMs Meet Databases Kyoungmin Kim et.al. 2412.18022 null
2025-02-20 GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference Chao Zeng et.al. 2412.17560 null
2025-02-18 VilBias: A Study of Bias Detection through Linguistic and Visual Cues , presenting Annotation Strategies, Evaluation, and Key Challenges Shaina Raza et.al. 2412.17052 link
2024-12-21 SYMPHONY: Improving Memory Management for LLM Inference Workloads Saurabh Agarwal et.al. 2412.16434 null
2024-12-20 WebLLM: A High-Performance In-Browser LLM Inference Engine Charlie F. Ruan et.al. 2412.15803 link
2024-12-19 Fietje: An open, efficient LLM for Dutch Bram Vanroy et.al. 2412.15450 link
2024-12-19 PA-RAG: RAG Alignment via Multi-Perspective Preference Optimization Jiayi Wu et.al. 2412.14510 link
2024-12-19 Are Longer Prompts Always Better? Prompt Selection in Large Language Models for Recommendation Systems Genki Kusano et.al. 2412.14454 null
2024-12-18 A Survey on LLM Inference-Time Self-Improvement Xiangjue Dong et.al. 2412.14352 link
2024-12-18 Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models Seungeun Oh et.al. 2412.12687 null
2024-12-17 A System for Microserving of LLMs Hongyi Jin et.al. 2412.12488 null
2024-12-17 LITA: An Efficient LLM-assisted Iterative Topic Augmentation Framework Chia-Hsuan Chang et.al. 2412.12459 null
2024-12-16 CSR: Achieving 1 Bit Key-Value Cache via Sparse Representation Hongxuan Zhang et.al. 2412.11741 null
2025-01-20 FinLoRA: Finetuning Quantized Financial Large Language Models Using Low-Rank Adaptation Dannong Wang et.al. 2412.11378 null
2025-01-09 Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning Yun Qu et.al. 2412.11120 link
2024-12-15 NITRO: LLM Inference on Intel Laptop NPUs Anthony Fei et.al. 2412.11053 link
2025-03-11 SCBench: A KV Cache-Centric Analysis of Long-Context Methods Yucheng Li et.al. 2412.10319 null
2024-12-17 TurboAttention: Efficient Attention Approximation For High Throughputs LLMs Hao Kang et.al. 2412.08585 null
2024-12-11 Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths Naryeong Kim et.al. 2412.08281 null
2024-12-12 TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch Xingchen Song et.al. 2412.08237 null
2024-12-09 Asynchronous LLM Function Calling In Gim et.al. 2412.07017 null
2024-12-08 Taming Sensitive Weights: Noise Perturbation Fine-tuning for Robust LLM Quantization Dongwei Wang et.al. 2412.06858 null
2024-12-09 JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM Takuro Fujii et.al. 2412.06738 link
2024-12-09 SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs James Vo et.al. 2412.06198 null
2024-12-08 XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference Weizhuo Li et.al. 2412.05896 null
2025-02-17 APOLLO: SGD-like Memory, AdamW-level Performance Hanqing Zhu et.al. 2412.05270 link
2024-12-06 Can Large Language Models Serve as Effective Classifiers for Hierarchical Multi-Label Classification of Scientific Documents at Industrial Scale? Seyed Amin Tabatabaei et.al. 2412.05137 null
2024-12-11 Flash Communication: Reducing Tensor Parallelization Bottleneck for Fast Large Language Model Inference Qingyuan Li et.al. 2412.04964 null
2025-01-26 GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments Yanyu Chen et.al. 2412.04788 null
2024-12-09 Towards Performance-Aware Allocation for Accelerated Machine Learning on GPU-SSD Systems Ayush Gundawar et.al. 2412.04569 link
2024-12-03 Multi-Bin Batching for Increasing LLM Inference Throughput Ozgur Guldogan et.al. 2412.04504 null
2025-01-17 BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching Zhen Zheng et.al. 2412.03594 null
2024-12-04 Unifying KV Cache Compression for Large Language Models with LeanKV Yanqi Zhang et.al. 2412.03131 null
2024-12-03 Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity Da Ma et.al. 2412.02252 null
2024-12-02 Data-Centric and Heterogeneity-Adaptive Sequence Parallelism for Efficient LLM Training Yujie Wang et.al. 2412.01523 null
2024-12-02 PLD+: Accelerating LLM inference by leveraging Language Model Artifacts Shwetha Somasundaram et.al. 2412.01447 null
2024-12-02 Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking Marco Federici et.al. 2412.01380 null
2024-12-02 Can Large Language Models Serve as Evaluators for Code Summarization? Yang Wu et.al. 2412.01333 link
2024-12-05 RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy Geonho Lee et.al. 2412.01129 null
2024-12-02 TruncFormer: Private LLM Inference Using Only Truncations Patrick Yubeaton et.al. 2412.01042 null
2024-11-25 Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration Zhuofan Wen et.al. 2412.00061 null
2024-11-29 A dynamic parallel method for performance optimization on hybrid CPUs Luo Yu et.al. 2411.19542 null
2024-12-04 Marconi: Prefix Caching for the Era of Hybrid LLMs Rui Pan et.al. 2411.19379 null
2024-12-08 Puzzle: Distillation-Based NAS for Inference-Optimized LLMs Akhiad Bercovich et.al. 2411.19146 null
2024-11-27 FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving Ao Shen et.al. 2411.18424 null
2024-11-29 InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks Xinyao Zheng et.al. 2411.18191 null
2024-11-28 MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache Akshat Sharma et.al. 2411.18077 null
2024-11-24 Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments Nikoleta Iliakopoulou et.al. 2411.17741 null
2024-11-18 Generative AI on the Edge: Architecture and Performance Evaluation Zeinab Nezami et.al. 2411.17712 null
2024-11-26 Toward High-Performance LLM Serving: A Simulation-Based Approach for Identifying Optimal Parallelism Yi-Chien Lin et.al. 2411.17651 null
2024-11-26 PIM-AI: A Novel Architecture for High-Efficiency LLM Inference Cristobal Ortega et.al. 2411.17309 null
2024-11-26 Star Attention: Efficient LLM Inference over Long Sequences Shantanu Acharya et.al. 2411.17116 link
2024-11-26 Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation Chaoyi Jiang et.al. 2411.17089 null
2024-11-25 MixPE: Quantization and Hardware Co-design for Efficient LLM Inference Yu Zhang et.al. 2411.16158 null
2024-11-24 eFedLLM: Efficient LLM Inference Based on Federated Learning Shengwen Ding et.al. 2411.16003 null
2024-11-24 Ensuring Fair LLM Serving Amid Diverse Applications Redwan Ibne Seraj Khan et.al. 2411.15997 null
2024-11-24 Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format Chao Fang et.al. 2411.15982 null
2024-11-24 Task Scheduling for Efficient Inference of Large Language Models on Single Moderate GPU Systems Wenxiang Lin et.al. 2411.15715 null
2024-11-26 Enabling Efficient Serverless Inference Serving for LLM (Large Language Model) in the Cloud Himel Ghosh et.al. 2411.15664 null
2025-01-14 AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution Fengyuan Liu et.al. 2411.15102 link
2024-11-27 XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models Yixin Dong et.al. 2411.15100 null
2024-11-02 Transforming Engineering Education Using Generative AI and Digital Twin Technologies Yu-Zheng Lin et.al. 2411.14433 null
2024-11-21 InstCache: A Predictive Cache for LLM Serving Longwei Zou et.al. 2411.13820 null
2024-11-21 Disentangling Memory and Reasoning Ability in Large Language Models Mingyu Jin et.al. 2411.13504 link
2024-11-27 Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding Hyun Ryu et.al. 2411.13157 null
2024-11-21 LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts Zhuohan Gu et.al. 2411.13009 null
2024-11-15 An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2 Pepijn de Reus et.al. 2411.12758 link
2025-01-24 SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference Jiho Shin et.al. 2411.12692 null
2024-11-18 BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration Yuzong Chen et.al. 2411.11745 link
2024-11-18 MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs Shiyi Cao et.al. 2411.11217 null
2024-11-17 FastDraft: How to Train Your Draft Ofir Zafrir et.al. 2411.11055 null
2024-12-16 SAM Decoding: Speculative Decoding via Suffix Automaton Yuxuan Hu et.al. 2411.10666 link
2024-11-15 Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity Zichen Song et.al. 2411.10069 null
2024-11-15 AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference Janghwan Lee et.al. 2411.09909 null
2024-11-23 Squeezed Attention: Accelerating Long Context Length LLM Inference Coleman Hooper et.al. 2411.09688 link
2024-11-15 Communication Compression for Tensor Parallel LLM Inference Jan Hansen-Palmus et.al. 2411.09510 null
2024-11-14 Pie: Pooling CPU Memory for LLM Inference Yi Xu et.al. 2411.09317 null
2025-01-23 Reducing Reasoning Costs: The Path of Optimization for Chain of Thought via Sparse Attention Mechanism Libo Wang et.al. 2411.09111 link
2024-11-12 Towards Low-bit Communication for Tensor Parallel LLM Inference Harry Dong et.al. 2411.07942 null
2024-12-12 ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization Weibo Zhao et.al. 2411.07762 null
2025-01-08 BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks Shubham Gandhi et.al. 2411.07464 null
2024-11-19 The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving Kyoungmin Kim et.al. 2411.07447 null
2024-11-10 EcoServe: Maximizing Multi-Resource Utilization with SLO Guarantees in LLM Serving Haiying Shen et.al. 2411.06364 null
2024-11-08 SSSD: Simply-Scalable Speculative Decoding Michele Marzollo et.al. 2411.05894 null
2024-11-08 AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality Ilias Bournias et.al. 2411.05555 null
2024-11-07 Hardware and Software Platform Inference Cheng Zhang et.al. 2411.05197 null
2024-10-22 Scattered Forest Search: Smarter Code Space Exploration with LLMs Jonathan Light et.al. 2411.05010 null
2024-11-07 SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference Gabriele Oliaro et.al. 2411.04975 null
2024-11-05 CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration Hongpeng Jin et.al. 2411.02829 null
2024-12-19 DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving Yuhan Liu et.al. 2411.02820 null
2024-11-10 Context Parallelism for Scalable Million-Token Inference Amy Yang et.al. 2411.01783 null
2024-11-04 RAGViz: Diagnose and Visualize Retrieval-Augmented Generation Tevin Wang et.al. 2411.01751 link
2024-11-03 Autoformulation of Mathematical Optimization Models Using LLMs Nicolás Astorga et.al. 2411.01679 null
2024-11-06 HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference Peng Tang et.al. 2411.01433 null
2024-11-02 RA-WEBs: Remote Attestation for WEB services Kosei Akama et.al. 2411.01340 null
2024-11-02 NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference Xuanlin Jiang et.al. 2411.01142 null
2024-10-30 A Theoretical Perspective for Speculative Decoding Algorithm Ming Yin et.al. 2411.00841 null
2024-11-01 Phase Diagram of Vision Large Language Models Inference: A Perspective from Interaction across Image and Instruction Houjing Wei et.al. 2411.00646 null
2024-11-01 LLM-Based Misconfiguration Detection for AWS Serverless Computing Jinfeng Wen et.al. 2411.00642 null
2024-12-08 ReverseNER: A Self-Generated Example-Driven Framework for Zero-Shot Named Entity Recognition with Large Language Models Anbang Wang et.al. 2411.00533 null
2024-11-01 Attention Tracker: Detecting Prompt Injection Attacks in LLMs Kuo-Han Hung et.al. 2411.00348 null
2024-10-31 LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators Krishna Teja Chitty-Venkata et.al. 2411.00136 link
2024-10-31 Interpretable Language Modeling via Induction-head Ngram Models Eunji Kim et.al. 2411.00066 link
2024-10-31 ALISE: Accelerating Large Language Model Serving with Speculative Scheduling Youpeng Zhao et.al. 2410.23537 null
2024-10-30 BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference Junqi Zhao et.al. 2410.23079 link
2024-10-29 Scaling LLM Inference with Optimized Sample Compute Allocation Kexun Zhang et.al. 2410.22480 link
2024-10-29 SVIP: Towards Verifiable Inference of Open-source Large Language Models Yifan Sun et.al. 2410.22307 null
2025-02-08 ProMoE: Fast MoE-based LLM Serving using Proactive Caching Xiaoniu Song et.al. 2410.22134 null
2025-01-21 MultiTok: Variable-Length Tokenization for Efficient LLMs Adapted from LZW Compression Noel Elias et.al. 2410.21548 link
2025-04-29 ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference Hanshi Sun et.al. 2410.21465 null
2024-10-27 FIRP: Faster LLM inference via future intermediate representation prediction Pengfei Wu et.al. 2410.20488 null
2024-10-29 Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management Tuowei Wang et.al. 2410.19274 null
2024-10-24 Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design Ruisi Cai et.al. 2410.19123 link
2024-10-30 Dynamic Vocabulary Pruning in Early-Exit LLMs Jort Vincenti et.al. 2410.18952 link
2024-10-25 A Survey on Speech Large Language Models Jing Peng et.al. 2410.18908 null
2024-10-24 A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs Ankit Singh Rawat et.al. 2410.18779 null
2024-10-24 BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching Peizhuang Cong et.al. 2410.18701 null
2024-10-23 CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation Qinsi Wang et.al. 2410.18311 null
2024-10-25 Fast Inference for Augmented Large Language Models Rana Shahout et.al. 2410.18248 null
2024-10-23 POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference Aditya K Kamath et.al. 2410.18038 null
2024-12-29 AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning Yehonathan Refael et.al. 2410.17881 null
2024-10-22 FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs Haoran Lin et.al. 2410.16663 null
2024-10-22 Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency Prafulla Kumar Choubey et.al. 2410.16597 null
2024-12-18 MagicPIG: LSH Sampling for Efficient LLM Generation Zhuoming Chen et.al. 2410.16179 link
2024-10-21 Natural GaLore: Accelerating GaLore for memory-efficient LLM Training and Fine-tuning Arijit Das et.al. 2410.16029 link
2024-10-21 RAC: Efficient LLM Factuality Correction with Retrieval Augmentation Changmao Li et.al. 2410.15667 link
2024-10-21 Bayesian Concept Bottleneck Models with LLM Priors Jean Feng et.al. 2410.15555 link
2024-10-20 CompAct: Compressed Activations for Memory-Efficient LLM Training Yara Shamshoum et.al. 2410.15352 null
2024-10-20 EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models Junhao Hu et.al. 2410.15332 null
2024-10-19 IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System Minseok Seo et.al. 2410.15008 null
2024-10-23 Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching Jie Peng et.al. 2410.14740 null
2024-10-18 A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference You Wu et.al. 2410.14442 link
2024-10-18 Revisiting SLO and Goodput Metrics in LLM Serving Zhibin Wang et.al. 2410.14257 null
2024-10-18 Leveraging Large Language Models for Enhancing Public Transit Services Jiahao Wang et.al. 2410.14147 null
2024-10-17 RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs Jiatan Huang et.al. 2410.13987 null
2024-11-07 Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs Tianyu Guo et.al. 2410.13835 link
2024-10-17 Progressive Mixed-Precision Decoding for Efficient LLM Inference Hao Mark Chen et.al. 2410.13461 null
2024-10-17 Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning Minseok Choi et.al. 2410.13274 null
2024-10-17 Data Defenses Against Large Language Models William Agnew et.al. 2410.13138 link
2024-10-19 In-context KV-Cache Eviction for LLMs via Attention-Gate Zihao Zeng et.al. 2410.12876 null
2024-10-10 RecurFormer: Not All Transformer Heads Need Self-Attention Ruiqing Yan et.al. 2410.12850 null
2024-10-16 COMET: Towards Partical W4A4KV4 LLMs Serving Lian Liu et.al. 2410.12168 null
2024-10-16 Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning Huiwen Wu et.al. 2410.12130 null
2024-10-15 Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix Yingyu Liang et.al. 2410.11261 null
2024-10-06 Continuous Approximations for Improving Quantization Aware Training of LLMs He Li et.al. 2410.10849 null
2024-10-14 DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads Guangxuan Xiao et.al. 2410.10819 link
2024-10-16 SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization Akrit Mudvari et.al. 2410.10759 null
2024-10-12 Power-Softmax: Towards Secure LLM Inference over Encrypted Data Itamar Zimerman et.al. 2410.09457 null
2024-10-11 Large Language Models for Energy-Efficient Code: Emerging Results and Future Directions Huiyun Peng et.al. 2410.09241 null
2024-10-11 SubZero: Random Subspace Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning Ziming Yu et.al. 2410.08989 link
2024-12-03 HyperDPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework Yinuo Ren et.al. 2410.08316 null
2024-10-14 Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining Tianyi Bai et.al. 2410.08102 link
2024-10-09 SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration Heming Xia et.al. 2410.06916 link
2024-10-08 Active Evaluation Acquisition for Efficient LLM Benchmarking Yang Li et.al. 2410.05952 null
2024-10-08 Exploring the Meaningfulness of Nearest Neighbor Search in High-Dimensional Space Zhonghan Chen et.al. 2410.05752 null
2024-10-08 ParallelSpec: Parallel Drafter for Efficient Speculative Decoding Zilin Xiao et.al. 2410.05589 null
2024-10-07 Fast State Restoration in LLM Serving with HCache Shiwei Gao et.al. 2410.05004 null
2024-10-06 RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference Yige Xu et.al. 2410.04519 link
2025-01-23 Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective Jinhao Li et.al. 2410.04466 null
2024-12-05 SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation Aurick Qiao et.al. 2410.03960 null
2024-10-04 LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity Selim Furkan Tekin et.al. 2410.03953 link
2024-10-04 EXAQ: Exponent Aware Quantization For LLMs Acceleration Moran Shkolnik et.al. 2410.03185 link
2024-10-04 UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference Jing Xiong et.al. 2410.03090 null
2024-10-03 LLMCO2: Advancing Accurate Carbon Footprint Prediction for LLM Inferences Zhenxiao Fu et.al. 2410.02950 null
2024-10-03 Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration Yun Qu et.al. 2410.02511 link
2024-10-03 LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services Małgorzata Łazuka et.al. 2410.02425 link
2024-10-04 Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation Xiaoqun Liu et.al. 2410.02220 null
2024-10-05 Aligning with Logic: Measuring, Evaluating and Improving Logical Consistency in Large Language Models Yinhong Liu et.al. 2410.02205 null
2024-10-02 Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads Yuxiang Huang et.al. 2410.01805 link
2024-10-02 ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving Yifan Qiao et.al. 2410.01228 null
2024-10-01 TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices Zonghang Li et.al. 2410.00531 link
2024-10-09 LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management Yi Xiong et.al. 2410.00428 null
2024-11-06 The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems Linke Song et.al. 2409.20002 null
2024-09-28 SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models Yi Wu et.al. 2409.19471 null
2024-11-28 Confidential Prompting: Protecting User Prompts from Cloud LLM Providers In Gim et.al. 2409.19134 link
2024-09-26 Control Industrial Automation System with Large Language Models Yuchen Xia et.al. 2409.18009 link
2024-10-18 Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores Shaobo Ma et.al. 2409.17870 null
2024-09-25 Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction Zhenmei Shi et.al. 2409.17422 link
2025-06-23 Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations Amey Agrawal et.al. 2409.17264 null
2024-09-25 Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Fan Zhou et.al. 2409.17115 link
2024-09-25 Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference Zongyue Qin et.al. 2409.16560 null
2024-10-21 AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization Yifan Tan et.al. 2409.16546 link
2024-11-07 Enabling Efficient On-Device Fine-Tuning of LLMs Using Only Inference Engines Lei Gao et.al. 2409.15520 link
2024-10-29 Eagle: Efficient Training-Free Router for Multi-LLM Inference Zesen Zhao et.al. 2409.15518 null
2024-10-03 Archon: An Architecture Search Framework for Inference-Time Techniques Jon Saad-Falcon et.al. 2409.15254 link
2024-09-23 CSPS: A Communication-Efficient Sequence-Parallelism based Serving System for Transformer based Models with Long Prompts Zeyu Zhang et.al. 2409.15104 null
2024-09-25 UELLM: A Unified and Efficient Approach for LLM Inference Serving Yiyuan He et.al. 2409.14961 null
2024-11-01 RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph Lindsey Linxi Wei et.al. 2409.14556 null
2024-09-21 Practically implementing an LLM-supported collaborative vulnerability remediation process: a team-based approach Xiaoqing Wang et.al. 2409.14058 null
2024-10-21 Do Large Language Models Need a Content Delivery Network? Yihua Cheng et.al. 2409.13761 link
2024-09-19 PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs) Mahmoud Nazzal et.al. 2409.12699 link
2024-09-12 LlamaF: An Efficient Llama2 Architecture Accelerator on Embedded FPGAs Han Xu et.al. 2409.11424 null
2024-09-04 ISO: Overlap of Computation and Communication within Seqenence For LLM Inference Bin Xiao et.al. 2409.11155 null
2024-12-31 RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Di Liu et.al. 2409.10516 link
2024-09-12 Enabling Cost-Effective UI Automation Testing with Retrieval-Based LLMs: A Case Study in WeChat Sidong Feng et.al. 2409.07829 null
2024-09-13 LLM-Enhanced Software Patch Localization Jinhong Yu et.al. 2409.06816 null
2024-09-24 OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models Jahyun Koo et.al. 2409.05902 null
2024-09-08 InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference Xiurui Pan et.al. 2409.04992 null
2024-09-07 Achieving Peak Performance for Large Language Models: A Systematic Review Zhyar Rzgar K Rostam et.al. 2409.04833 null
2024-09-06 Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance Guanyu Lin et.al. 2409.04593 null
2024-09-06 A First Look At Efficient And Secure On-Device LLM Inference Against KV Leakage Huan Yang et.al. 2409.04040 null
2024-11-05 Confidential Computing on NVIDIA Hopper GPUs: A Performance Benchmark Study Jianwei Zhu et.al. 2409.03992 null
2024-09-05 Sirius: Contextual Sparsity with Correction for Efficient LLMs Yang Zhou et.al. 2409.03856 link
2024-08-31 HSF: Defending against Jailbreak Attacks with Hidden State Filtering Cheng Qian et.al. 2409.03788 null
2024-12-11 Efficient Large Foundation Model Inference: A Perspective From Model and System Co-Design Dong Liu et.al. 2409.01990 null
2024-09-03 Efficient LLM Context Distillation Rajesh Upadhayayaya et.al. 2409.01930 null
2024-09-03 Laser: Parameter-Efficient LLM Bi-Tuning for Sequential Recommendation with Collaborative Information Xinyu Zhang et.al. 2409.01605 null
2024-09-02 CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification Junhui He et.al. 2409.01366 null
2024-12-18 Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference Barys Liskavets et.al. 2409.01227 null
2024-09-01 Research on LLM Acceleration Using the High-Performance RISC-V Processor “Xiangshan” (Nanhu Version) Based on the Open-Source Matrix Instruction Set Extension (Vector Dot Product) Xu-Hao Chen et.al. 2409.00661 null
2024-11-10 Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling Guangya Wan et.al. 2408.17017 null
2024-08-28 Decentralized LLM Inference over Edge Networks with Energy Harvesting Aria Khoshsirat et.al. 2408.15907 null
2024-08-28 Efficient LLM Scheduling by Learning to Rank Yichao Fu et.al. 2408.15792 link
2024-08-28 Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation Lujun Gui et.al. 2408.15562 null
2024-08-23 Memory-Efficient LLM Training with Online Subspace Descent Kaizhao Liang et.al. 2408.12857 link
2024-08-22 NanoFlow: Towards Optimal Large Language Model Serving Throughput Kan Zhu et.al. 2408.12757 link
2024-10-23 TensorOpera Router: A Multi-Model Router for Efficient LLM Inference Dimitris Stripelis et.al. 2408.12320 null
2024-09-04 Parallel Speculative Decoding with Adaptive Draft Length Tianyu Liu et.al. 2408.11850 link
2024-08-21 MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models Elias Frantar et.al. 2408.11743 link
2024-08-23 Xinyu: An Efficient LLM-based System for Commentary Generation Yiquan Wu et.al. 2408.11609 null
2024-08-21 Diagnosing and Remedying Knowledge Deficiencies in LLMs via Label-free Curricular Meaningful Learning Kai Xiong et.al. 2408.11431 null
2024-08-21 Image Score: Learning and Evaluating Human Preferences for Mercari Search Chingis Oinar et.al. 2408.11349 null
2024-08-20 Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models Artem Vazhentsev et.al. 2408.10692 null
2024-08-20 How Well Do Large Language Models Serve as End-to-End Secure Code Producers? Jianian Gong et.al. 2408.10495 null
2024-09-29 GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making Arsham Gholamzadeh Khoee et.al. 2408.09785 null
2024-08-19 PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars Sumanth Prabhu et.al. 2408.08869 null
2024-08-23 ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models Chao Zeng et.al. 2408.08554 link
2024-08-14 LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference Seungjae Moon et.al. 2408.07326 null
2024-08-12 LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration Zhiwen Mo et.al. 2408.06003 null
2024-08-16 Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion Jacob K Christopher et.al. 2408.05636 null
2024-08-10 LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale Jaehong Cho et.al. 2408.05499 link
2024-08-05 SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving Andreas Kosmas Kakolyris et.al. 2408.05235 null
2024-09-14 Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness Xiaojing Fan et.al. 2408.04585 null
2024-08-08 Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning Ke Cheng et.al. 2408.04323 null
2024-08-07 Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference Zeyu Zhang et.al. 2408.04107 null
2024-08-07 MPC-Minimized Secure LLM Inference Deevashwer Rathee et.al. 2408.03561 null
2024-08-06 Can LLMs Serve As Time Series Anomaly Detectors? Manqing Dong et.al. 2408.03475 null
2024-08-05 Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning Hao Zhou et.al. 2408.02549 null
2024-08-02 The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines Matias Martinez et.al. 2408.01050 null
2024-08-01 DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency Jovan Stojkovic et.al. 2408.00741 null
2024-08-01 Designing Efficient LLM Accelerators for Edge Devices Jude Haris et.al. 2408.00462 null
2024-08-01 Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control Hao Zhou et.al. 2408.00214 null
2024-09-10 ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency Yuhang Yao et.al. 2408.00008 null
2024-08-01 Responsive ML inference in multi-tenanted environments using AQUA Abhishek Vijaya Kumar et.al. 2407.21255 null
2024-11-04 Palu: Compressing KV-Cache with Low-Rank Projection Chi-Chih Chang et.al. 2407.21118 link
2024-07-30 Accelerating Large Language Model Inference with Self-Supervised Early Exits Florian Valade et.al. 2407.21082 null
2024-10-03 ThinK: Thinner Key Cache by Query-Driven Pruning Yuhui Xu et.al. 2407.21018 null
2024-07-25 An Efficient Inference Framework for Early-exit Large Language Models Ruijie Miao et.al. 2407.20272 null
2024-07-29 Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost Sania Nayab et.al. 2407.19825 null
2024-07-29 Teaching LLMs at Charles University: Assignments and Activities Jindřich Helcl et.al. 2407.19798 null
2024-07-09 Mobile Edge Intelligence for Large Language Models: A Contemporary Survey Guanqiao Qu et.al. 2407.18921 null
2024-07-04 The Price of Prompting: Profiling Energy Use in Large Language Models Inference Erik Johannes Husom et.al. 2407.16893 link
2024-07-23 PhenoFlow: A Human-LLM Driven Visual Analytics System for Exploring Large and Complex Stroke Datasets Jaeyoung Kim et.al. 2407.16329 null
2024-07-22 RazorAttention: Efficient KV Cache Compression Through Retrieval Heads Hanlin Tang et.al. 2407.15891 null
2024-07-22 vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving Jiale Xu et.al. 2407.15309 link
2024-07-20 All Against Some: Efficient Integration of Large Language Models for Message Passing in Graph Neural Networks Ajay Jaiswal et.al. 2407.14996 null
2024-07-19 LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference Qichen Fu et.al. 2407.14057 null
2024-07-13 Beyond KV Caching: Shared Attention for Efficient LLMs Bingli Liao et.al. 2407.12866 link
2025-04-01 PQCache: Product Quantization-based KVCache for Long Context LLM Inference Hailin Zhang et.al. 2407.12820 null
2024-07-17 Struct-X: Enhancing Large Language Models Reasoning with Structured Data Xiaoyu Tan et.al. 2407.12522 null
2024-07-17 LLM Inference Serving: Survey of Recent Advances and Opportunities Baolin Li et.al. 2407.12391 null
2024-10-11 Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale Ayush Kaushal et.al. 2407.12327 link
2024-11-16 PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation Branden Butler et.al. 2407.11798 null
2024-08-16 Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference Yuan Feng et.al. 2407.11550 link
2024-07-15 Static Detection of Filesystem Vulnerabilities in Android Systems Yu-Tsung Lee et.al. 2407.11279 null
2024-10-03 Fast Matrix Multiplications for Lookup Table-Quantized LLMs Han Guo et.al. 2407.10960 link
2024-10-02 Optimized Multi-Token Joint Decoding with Auxiliary Model for LLM Inference Zongyue Qin et.al. 2407.09722 null
2024-08-30 Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems Amey Agrawal et.al. 2407.07000 link
2024-07-08 Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU Daliang Xu et.al. 2407.05858 link
2024-07-07 A Queueing Theoretic Perspective on Low-Latency LLM Inference with Variable Token Length Yuqing Yang et.al. 2407.05347 null
2024-07-06 Code Less, Align More: Efficient LLM Fine-tuning for Code Generation with Data Pruning Yun-Da Tsai et.al. 2407.05040 null
2024-11-16 Software-Hardware Co-Design For Embodied AI Robots Yiyang Huang et.al. 2407.04292 link
2024-07-04 Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems Grant Wilkins et.al. 2407.04014 null
2024-10-30 MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention Huiqiang Jiang et.al. 2407.02490 link
2024-06-29 When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration Philipp Allgeuer et.al. 2407.00518 link
2024-06-29 Teola: Towards End-to-End Optimization of LLM-based Applications Xin Tan et.al. 2407.00326 null
2024-06-25 T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge Jianyu Wei et.al. 2407.00088 link
2024-07-09 Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving Ruoyu Qin et.al. 2407.00079 link
2024-06-28 InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management Wonbeom Lee et.al. 2406.19707 null
2024-08-28 AI-native Memory: A Pathway from LLMs Towards AGI Jingbo Shang et.al. 2406.18312 null
2024-06-25 FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model Feijie Wu et.al. 2406.17706 link
2024-06-26 MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool Cunchen Hu et.al. 2406.17565 null
2024-11-11 Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters Euiin Yi et.al. 2406.16758 link
2025-05-16 Demystifying AI Platform Design for Distributed Inference of Next-Generation LLM models Abhimanyu Bambhaniya et.al. 2406.01698 null
2025-05-02 QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving Yujun Lin et.al. 2405.04532 link
2024-11-26 Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction Haoran Qiu et.al. 2404.08509 null
2024-05-31 InferCept: Efficient Intercept Support for Augmented Large Language Model Inference Reyna Abhyankar et.al. 2402.01869 null
2023-12-08 Efficient LLM Inference on CPUs Haihao Shen et.al. 2311.00502 null
2024-04-02 SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification Xupeng Miao et.al. 2305.09781 null

LLM Scheduling

Publish Date Title Authors PDF Code
2026-03-13 SageSched: Efficient LLM Scheduling Confronting Demand Uncertainty and Hybridity Zhenghao Gan et.al. 2603.07917 null
2025-12-04 Counting Without Running: Evaluating LLMs’ Reasoning About Code Complexity Gregory Bolet et.al. 2512.04355 null
2025-11-28 LegalWebAgent: Empowering Access to Justice via LLM-Based Web Agents Jinzhe Tan et.al. 2512.04105 null
2025-12-03 AugServe: Adaptive Request Scheduling for Augmented Large Language Model Inference Serving Ying Wang et.al. 2512.04013 null
2025-12-02 PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing Junyi Hou et.al. 2512.02589 null
2025-12-01 Trinity: Disaggregating Vector Search from Prefill-Decode Disaggregation in LLM Serving Yi Liu et.al. 2512.02281 null
2025-12-01 RoMe: Row Granularity Access Memory System for Large Language Models Hwayong Nam et.al. 2512.01541 null
2025-12-01 Tangram: Accelerating Serverless LLM Loading through GPU Memory Reuse and Affinity Wenbin Zhu et.al. 2512.01357 null
2025-12-01 Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding Yilong Zhao et.al. 2512.01278 null
2025-11-30 Neural Variable Name Repair: Learning to Rename Identifiers for Readability Muhammad Yousuf et.al. 2512.01141 null
2025-11-28 OBLR-PO: A Theoretical Framework for Stable Reinforcement Learning Zixun Huang et.al. 2511.23310 null
2025-11-28 Beyond Curve Fitting: Neuro-Symbolic Agents for Context-Aware Epidemic Forecasting Joongwon Chae et.al. 2511.23276 null
2025-11-27 OmniInfer: System-Wide Acceleration Techniques for Optimizing LLM Serving Throughput and Latency Jun Wang et.al. 2511.22481 null
2025-11-27 FADiff: Fusion-Aware Differentiable Optimization for DNN Scheduling on Tensor Accelerators Shuao Jia et.al. 2511.22348 null
2025-11-27 Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation Zehao Deng et.al. 2511.22235 null
2025-11-27 Optimizing NetGPT via Routing-Based Synergy and Reinforcement Learning Yuxuan Chen et.al. 2511.22217 null
2025-11-26 OOCO: Latency-disaggregated Architecture for Online-Offline Co-locate LLM Serving Siyu Wu et.al. 2511.21862 null
2025-12-01 DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving Fengze Yu et.al. 2511.21669 null
2025-11-28 DOPO: A Dynamic PD-Disaggregation Architecture for Maximizing Goodput in LLM Inference Serving Junhan Liao et.al. 2511.20982 null
2025-11-26 Aragog: Just-in-Time Model Routing for Scalable Serving of Agentic Workflows Yinwei Dai et.al. 2511.20975 null
2025-11-25 Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios Luohe Shi et.al. 2511.20340 null
2025-11-25 Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design Zixiao Huang et.al. 2511.20048 null
2025-11-25 HiCoGen: Hierarchical Compositional Text-to-Image Generation in Diffusion Models via Reinforcement Learning Hongji Yang et.al. 2511.19965 null
2025-11-24 Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution Dingkang Liang et.al. 2511.19430 null
2025-11-24 How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining Kairong Luo et.al. 2511.18903 null
2025-11-24 Parallel Vision Token Scheduling for Fast and Accurate Multimodal LMMs Inference Wengyi Zhan et.al. 2511.18875 null
2025-11-23 Optimal Meal Schedule for a Local Nonprofit Using LLM-Aided Data Extraction Sergio Marin et.al. 2511.18483 null
2025-11-28 Progressive Localisation in Localist LLMs Joachim Diederich et.al. 2511.18375 null
2025-11-23 Hybrid Agentic AI and Multi-Agent Systems in Smart Manufacturing Mojtaba A. Farahani et.al. 2511.18258 null
2025-11-22 Towards a General Framework for HTN Modeling with LLMs Israel Puerta-Merino et.al. 2511.18165 null
2025-11-20 LLM4EO: Large Language Model for Evolutionary Optimization in Flexible Job Shop Scheduling Rongjie Liao et.al. 2511.16485 null
2025-11-20 Operon: Incremental Construction of Ragged Data via Named Dimensions Sungbin Moon et.al. 2511.16080 null
2025-11-19 MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping Yushi Huang et.al. 2511.15690 null
2025-11-18 Near-Lossless Model Compression Enables Longer Context Inference in DNA Large Language Models Rui Zhu et.al. 2511.14694 null
2025-11-23 Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning Ruoyu Qin et.al. 2511.14617 null
2025-11-18 Hyperion: Hierarchical Scheduling for Parallel LLM Acceleration in Multi-tier Networks Mulei Ma et.al. 2511.14450 null
2025-11-17 The Good, The Bad, and The Hybrid: A Reward Structure Showdown in Reasoning Models Training Subramanyam Sahoo et.al. 2511.13016 null
2025-11-17 ENGRAM: Effective, Lightweight Memory Orchestration for Conversational Agents Daivik Patel et.al. 2511.12960 null
2025-11-17 CoS: Towards Optimal Event Scheduling via Chain-of-Scheduling Yiming Zhao et.al. 2511.12913 null
2025-11-19 Harli: SLO-Aware Co-location of LLM Inference and PEFT-based Finetuning on Model-as-a-Service Platforms Ao Xu et.al. 2511.11729 null
2025-11-05 AnchorTP: Resilient LLM Inference with State-Preserving Elastic Tensor Parallelism Wendong Xu et.al. 2511.11617 null
2025-11-13 EEGAgent: A Unified Framework for Automated EEG Analysis Using Large Language Models Sha Zhao et.al. 2511.09947 null
2025-11-12 AdaCuRL: Adaptive Curriculum Reinforcement Learning with Invalid Sample Mitigation and Historical Revisiting Renda Li et.al. 2511.09478 null
2025-11-12 POTSA: A Cross-Lingual Speech Alignment Framework for Low Resource Speech-to-Text Translation Xuanchen Li et.al. 2511.09232 null
2025-11-12 FLAD: Federated Learning for LLM-based Autonomous Driving in Vehicle-Edge-Cloud Networks Tianao Xiang et.al. 2511.09025 null
2025-11-07 Motif 2 12.7B technical report Junghwan Lim et.al. 2511.07464 null
2025-11-10 LLMServingSim2.0: A Unified Simulator for Heterogeneous Hardware and Serving Techniques in LLM Infrastructure Jaehong Cho et.al. 2511.07229 null
2025-11-10 Can LLM Annotations Replace User Clicks for Learning to Rank? Lulu Yu et.al. 2511.06635 null
2025-11-09 AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving Ruifei Zhang et.al. 2511.06253 null
2025-11-08 CoEdge-RAG: Optimizing Hierarchical Scheduling for Retrieval-Augmented LLMs in Collaborative Edge Computing Guihang Hong et.al. 2511.05915 null
2025-11-09 Optimal Inference Schedules for Masked Diffusion Models Sitan Chen et.al. 2511.04647 null
2025-11-06 PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration Yue Jiet Chong et.al. 2511.04036 null
2025-11-05 ALAS: Transactional and Dynamic Multi-Agent LLM Planning Longling Geng et.al. 2511.03094 null
2025-11-04 LiveSecBench: A Dynamic and Culturally-Relevant AI Safety Benchmark for LLMs in Chinese Context Yudong Li et.al. 2511.02366 null
2025-11-04 An LLM-powered MILP modelling engine for workforce scheduling guided by expert knowledge Qingyang Li et.al. 2511.02364 null
2025-11-04 Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live Hanchen Li et.al. 2511.02230 null
2025-11-04 Optimal-Agent-Selection: State-Aware Routing Framework for Efficient Multi-Agent Collaboration Jingbo Wang et.al. 2511.02200 null
2025-11-03 TPS-Bench: Evaluating AI Agents’ Tool Planning & Scheduling Abilities in Compounding Tasks Hanwen Xu et.al. 2511.01527 null
2025-11-03 Modular Task Decomposition and Dynamic Collaboration in Multi-Agent Systems Driven by Large Language Models Shuaidong Pan et.al. 2511.01149 null
2025-11-05 FREESH: Fair, Resource- and Energy-Efficient Scheduling for LLM Serving on Heterogeneous GPUs Xuan He et.al. 2511.00807 null
2025-11-02 AReaL-Hex: Accommodating Asynchronous RL Training over Heterogeneous GPUs Ran Yan et.al. 2511.00796 null
2025-10-19 Justitia: Fair and Efficient Scheduling for LLM Applications Mingyan Yang et.al. 2510.17015 null
2025-10-08 OptPipe: Memory- and Scheduling-Optimized Pipeline Parallelism for LLM Training Hongpei Li et.al. 2510.05186 null
2025-08-14 Block: Balancing Load in LLM Serving with Context, Knowledge and Predictive Scheduling Wei Da et.al. 2508.03611 null
2025-08-05 Optimal Scheduling Algorithms for LLM Inference: Theory and Practice Agrim Bari et.al. 2508.01002 null
2025-09-16 InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching Yilun Wang et.al. 2507.08523 null
2025-07-09 Gradientsys: A Multi-Agent LLM Scheduler with ReAct Orchestration Xinyuan Song et.al. 2507.06520 null
2025-06-17 Semantic Scheduling for LLM Inference Wenyue Hua et.al. 2506.12204 null
2025-05-29 Sustainable Carbon-Aware and Water-Efficient LLM Scheduling in Geo-Distributed Cloud Datacenters Hayden Moore et.al. 2505.23554 null
2025-05-26 Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency Ruixiao Li et.al. 2505.17074 null
2025-05-14 ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor Seungbeom Choi et.al. 2505.09142 null
2025-04-25 Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents Yueying Li et.al. 2504.07347 null
2025-04-08 LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications Botao Zhu et.al. 2504.03444 null
2025-07-25 How do language models learn facts? Dynamics, curricula and hallucinations Nicolas Zucchet et.al. 2503.21676 null
2025-05-21 Online Scheduling for LLM Inference with KV Cache Constraints Patrick Jaillet et.al. 2502.07115 null
2025-11-06 LLM Query Scheduling with Prefix Reuse and Latency Constraints Gregory Dexter et.al. 2502.04677 null
2024-11-01 ALISE: Accelerating Large Language Model Serving with Speculative Scheduling Youpeng Zhao et.al. 2410.23537 null
2025-06-08 PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference Zeyu Zhang et.al. 2409.15104 null
2024-08-28 Efficient LLM Scheduling by Learning to Rank Yichao Fu et.al. 2408.15792 null
2024-11-15 Large Language Models for Power Scheduling: A User-Centric Approach Thomas Mongaillard et.al. 2407.00476 null
2024-06-07 Llumnix: Dynamic Scheduling for Large Language Model Serving Biao Sun et.al. 2406.03243 null
2024-05-24 PerLLM: Personalized Inference Scheduling with Edge-Cloud Collaboration for Diverse LLM Services Zheming Yang et.al. 2405.14636 null
2024-05-14 Automated Conversion of Static to Dynamic Scheduler via Natural Language Paul Mingzheng Tang et.al. 2405.06697 null
2024-08-06 On the Prospects of Incorporating Large Language Models (LLMs) in Automated Planning and Scheduling (APS) Vishal Pallagani et.al. 2401.02500 null
2023-05-30 Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline Zangwei Zheng et.al. 2305.13144 null

MoE

Publish Date Title Authors PDF Code
2026-04-02 The Expert Strikes Back: Interpreting Mixture-of-Experts Language Models at Expert Level Jeremy Herbst et.al. 2604.02178 null
2026-04-02 FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Large Attention-Based Model Inference on Tile-Based Accelerators Chi Zhang et.al. 2604.02110 null
2026-04-02 SURE: Synergistic Uncertainty-aware Reasoning for Multimodal Emotion Recognition in Conversations Yiqiang Cai et.al. 2604.01916 null
2026-04-02 FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models Juyong Jiang et.al. 2604.01762 null
2026-04-02 M3D-BFS: a Multi-stage Dynamic Fusion Strategy for Sample-Adaptive Multi-Modal Brain Network Analysis Rui Dong et.al. 2604.01667 null
2026-04-02 Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models Shuibai Zhang et.al. 2604.01622 null
2026-04-02 DWDP: Distributed Weight Data Parallelism for High-Performance LLM Inference on NVL72 Wanqian Li et.al. 2604.01621 null
2026-04-01 Learning When to See and When to Feel: Adaptive Vision-Torque Fusion for Contact-Aware Manipulation Jiuzhou Lei et.al. 2604.01414 null
2026-04-01 Sparse Spectral LoRA: Routed Experts for Medical VLMs Omid Nejati Manzari et.al. 2604.01310 null
2026-04-01 Brainstacks: Cross-Domain Cognitive Capabilities via Frozen MoE-LoRA Stacks for Continual LLM Learning Mohammad R. Abu Ayyash et.al. 2604.01152 null
2026-04-02 Asymptotically Optimal Sequential Testing with Heterogeneous LLMs Guokai Li et.al. 2604.01086 null
2026-04-01 PHASOR: Anatomy- and Phase-Consistent Volumetric Diffusion for CT Virtual Contrast Enhancement Zilong Li et.al. 2604.01053 null
2026-04-01 KUET at StanceNakba Shared Task: StanceMoE: Mixture-of-Experts Architecture for Stance Detection Abdullah Al Shafi et.al. 2604.00878 null
2026-04-01 Cost-Penalized Fitness in FMA-Orchestrated Mixture of Experts: Experimental Evidence for Molecular Memory in Domain Adaptation Martin Jaraiz et.al. 2604.00812 null
2026-04-01 Routing-Free Mixture-of-Experts Yilun Liu et.al. 2604.00801 null
2026-04-01 Scalable Pretraining of Large Mixture of Experts Language Models on Aurora Super Computer Dharma Teja Vooturi et.al. 2604.00785 null
2026-04-01 Toward Optimal Sampling Rate Selection and Unbiased Classification for Precise Animal Activity Recognition Axiu Mao et.al. 2604.00517 null
2026-04-01 Self-Routing: Parameter-Free Expert Routing from Hidden States Jama Hussein Mohamud et.al. 2604.00421 null
2026-03-31 From Skew to Symmetry: Node-Interconnect Multi-Path Balancing with Execution-time Planning for Modern GPU Clusters Jinghan Yao et.al. 2604.00317 null
2026-03-31 Directly visualizing the energy level structure of quantum dot molecules Heun Mo Yoo et.al. 2604.00232 null
2026-03-31 Towards Verifiable and Self-Correcting AI Physicists for Quantum Many-Body Simulations Ken Deng et.al. 2604.00149 null
2026-03-31 PASM: Population Adaptive Symbolic Mixture-of-Experts Model for Cross-location Hurricane Evacuation Decision Prediction Xiao Qian et.al. 2604.00074 null
2026-03-31 Short proofs in combinatorics and number theory Boris Alexeev et.al. 2603.29961 null
2026-03-31 First energy scan measurement of $e^{+}e^{-}\to K^{+}K^{-}$ around the $ψ(2S)$ resonance BESIII Collaboration et.al. 2603.29854 null
2026-03-31 Counterfactual Analysis of Brain Network Dynamics Moo K. Chung et.al. 2603.29843 null
2026-03-31 Training-Free Dynamic Upcycling of Expert Language Models Eros Fanì et.al. 2603.29765 null
2026-03-31 TrafficMoE: Heterogeneity-aware Mixture of Experts for Encrypted Traffic Classification Qing He et.al. 2603.29520 null
2026-03-31 Aligning Multimodal Sequential Recommendations via Robust Direct Preference Optimization with Sparse MoE Hejin Huang et.al. 2603.29259 null
2026-03-31 Route-Induced Density and Stability (RIDE): Controlled Intervention and Mechanism Analysis of Routing-Style Meta Prompts on LLM Internal States Dianxing Zhang et.al. 2603.29206 null
2026-03-31 BiMoE: Brain-Inspired Experts for EEG-Dominant Affective State Recognition Hongyu Zhu et.al. 2603.29205 null
2026-03-30 Rethinking Language Model Scaling under Transferable Hypersphere Optimization Liliang Ren et.al. 2603.28743 null
2026-03-30 StreamingVLA: Streaming Vision-Language-Action Model with Action Flow Matching and Adaptive Early Observation Yiran Shi et.al. 2603.28565 null
2026-03-30 Observation of $Λ^+_c\to nπ^+η$ and search for $Λ^+_c\to na_0(980)^+$ BESIII Collaboration et.al. 2603.28232 null
2026-03-30 Graph Vector Field: A Unified Framework for Multimodal Health Risk Assessment from Heterogeneous Wearable and Environmental Data Streams Silvano Coletti et.al. 2603.28115 null
2026-03-30 ExFusion: Efficient Transformer Training via Multi-Experts Fusion Jiacheng Ruan et.al. 2603.27965 null
2026-03-31 MathGen: Revealing the Illusion of Mathematical Competence through Text-to-Image Generation Ruiyao Liu et.al. 2603.27959 null
2026-03-29 KAT-Coder-V2 Technical Report Fengxiang Li et.al. 2603.27703 null
2026-03-29 LVRPO: Language-Visual Alignment with GRPO for Multimodal Understanding and Generation Shentong Mo et.al. 2603.27693 null
2026-03-29 PRBench: End-to-end Paper Reproduction in Physics Research Shi Qiu et.al. 2603.27646 null
2026-03-29 Expert Streaming: Accelerating Low-Batch MoE Inference via Multi-chiplet Architecture and Dynamic Expert Trajectory Scheduling Songchen Ma et.al. 2603.27624 null
2026-03-29 Fully Spiking Neural Networks with Target Awareness for Energy-Efficient UAV Tracking Pengzhi Zhong et.al. 2603.27493 null
2026-03-29 On Token’s Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models Chongyang Zhao et.al. 2603.27481 null
2026-03-28 Unveiling Code Clones in the Eclipse IIoT Software Ecosystem Zengyang Li et.al. 2603.27308 null
2026-03-28 Persistent Memory Through Triple-Loop Consolidation in a Non-Gradient Dissipative Cognitive Architecture Jianwei Lou et.al. 2603.27188 null
2026-03-28 Routing Sensitivity Without Controllability: A Diagnostic Study of Fairness in MoE Language Models Junhyeok Lee et.al. 2603.27141 null
2026-03-27 TAPS: Task Aware Proposal Distributions for Speculative Sampling Mohamad Zbib et.al. 2603.27027 null
2026-03-27 Learning to Commit: Generating Organic Pull Requests via Online Repository Memory Mo Li et.al. 2603.26664 null
2026-03-27 Sustainability Is Not Linear: Quantifying Performance, Energy, and Privacy Trade-offs in On-Device Intelligence Eziyo Ehsani et.al. 2603.26603 null
2026-03-26 Can Small Models Reason About Legal Documents? A Comparative Study Snehit Vaddi et.al. 2603.25944 null
2026-03-26 Narrowband searches for continuous gravitational waves from known pulsars in the first two parts of the fourth LIGO–Virgo–KAGRA observing run The LIGO Scientific Collaboration et.al. 2603.25938 null
2026-03-26 AD-CARE: A Guideline-grounded, Modality-agnostic LLM Agent for Real-world Alzheimer’s Disease Diagnosis with Multi-cohort Assessment, Fairness Analysis, and Reader Study Wenlong Hou et.al. 2603.25322 null
2026-03-26 SliderQuant: Accurate Post-Training Quantization for LLMs Shigeng Wang et.al. 2603.25284 null
2026-03-26 A Wireless World Model for AI-Native 6G Networks Ziqi Chen et.al. 2603.25216 null
2026-03-26 MCLMR: A Model-Agnostic Causal Learning Framework for Multi-Behavior Recommendation Ranxu Zhang et.al. 2603.25126 null
2026-03-26 MP-MoE: Matrix Profile-Guided Mixture of Experts for Precipitation Forecasting Huyen Ngoc Tran et.al. 2603.25046 null
2026-03-26 MoE-GRPO: Optimizing Mixture-of-Experts via Reinforcement Learning in Vision-Language Models Dohwan Ko et.al. 2603.24984 null
2026-03-26 CROSS: A Mixture-of-Experts Reinforcement Learning Framework for Generalizable Large-Scale Traffic Signal Control Xibei Chen et.al. 2603.24930 null
2026-03-25 OptiSAR-Net++: A Large-Scale Benchmark and Transformer-Free Framework for Cross-Domain Remote Sensing Visual Grounding Xiaoyu Tang et.al. 2603.24876 null
2026-03-25 Enes Causal Discovery Alexis Kafantaris et.al. 2603.24436 null
2026-03-25 Cross Section Measurements of $\bar{n}p \rightarrow K^{+}K^{-}π^{+}(π^{0})$ via Antineutrons Produced by $J/ψ\to p π^{-} \bar{n}$ Decays BESIII Collaboration et.al. 2603.24272 null
2026-03-25 B-MoE: A Body-Part-Aware Mixture-of-Experts “All Parts Matter” Approach to Micro-Action Recognition Nishit Poddar et.al. 2603.24245 null
2026-03-25 Sequence-aware Large Language Models for Explainable Recommendation Gangyi Zhang et.al. 2603.24136 null
2026-03-25 PCHC: Enabling Preference Conditioned Humanoid Control via Multi-Objective Reinforcement Learning Huanyu Li et.al. 2603.24047 null
2026-03-25 LGEST: Dynamic Spatial-Spectral Expert Routing for Hyperspectral Image Classification Jiawen Wen et.al. 2603.24045 null
2026-03-25 MoE-Sieve: Routing-Guided LoRA for Efficient MoE Fine-Tuning Andrea Manzoni et.al. 2603.24044 null
2026-03-25 SiftMoE: Similarity-Aware Energy-Efficient Expert Selection for Wireless Distributed MoE Inference Qian Chen et.al. 2603.23888 null
2026-03-24 Lightweight Fairness for LLM-Based Recommendations via Kernelized Projection and Gated Adapters Nan Cui et.al. 2603.23780 null
2026-03-24 The Diminishing Returns of Early-Exit Decoding in Modern LLMs Rui Wei et.al. 2603.23701 null
2026-03-24 VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs Haoran Yuan et.al. 2603.23481 link
2026-03-24 Similarity-Aware Mixture-of-Experts for Data-Efficient Continual Learning Connor Mclaughlin et.al. 2603.23436 null
2026-03-24 Amplitude Analysis of the Isospin-Violating Decay $J/ψ\rightarrowγηπ^{0}$ BESIII Collaboration et.al. 2603.23081 null
2026-03-24 IntentWeave: A Progressive Entry Ladder for Multi-Surface Browser Agents in Cloud Portals Wanying Mo et.al. 2603.22917 null
2026-03-24 Search for the radiative decays $D^0\to γ\bar K_1(1270)^0$ and $D^+\to γK_1(1270)^+$ BESIII Collaboration et.al. 2603.22804 null
2026-03-24 KALAVAI: Predicting When Independent Specialist Fusion Works – A Quantitative Model for Post-Hoc Cooperative LLM Training Ramchand Kumaresan et.al. 2603.22755 null
2026-03-24 Why Database Manuals Are Not Enough: Efficient and Reliable Configuration Tuning for DBMSs via Code-Driven LLM Agents Xinyi Zhang et.al. 2603.22708 null
2026-03-23 Bridging the Know-Act Gap via Task-Level Autoregressive Reasoning Jihyun Janice Ahn et.al. 2603.22619 null
2026-03-23 FullCircle: Effortless 3D Reconstruction from Casual 360$^\circ$ Captures Yalda Foroutan et.al. 2603.22572 null
2026-03-23 3D-Layout-R1: Structured Reasoning for Language-Instructed Spatial Editing Haoyu Zhen et.al. 2603.22279 null
2026-03-23 A bending in the size-mass relation of star-forming galaxies across $0.5 < z < 6.0$ at a critical stellar mass of $10^{10}M_\odot$ revealed by JWST Longyue Chen et.al. 2603.22239 null
2026-03-23 Mixture of Mini Experts: Overcoming the Linear Layer Bottleneck in Multiple Instance Learning Daniel Shao et.al. 2603.22198 null
2026-03-23 ADaFuSE: Adaptive Diffusion-generated Image and Text Fusion for Interactive Text-to-Image Retrieval Zhuocheng Zhang et.al. 2603.21886 null
2026-03-23 Holistic Scaling Laws for Optimal Mixture-of-Experts Architecture Optimization Weilin Wan et.al. 2603.21862 null
2026-03-23 DiT-Flow: Speech Enhancement Robust to Multiple Distortions based on Flow Matching in Latent Space and Diffusion Transformers Tianyu Cao et.al. 2603.21608 null
2026-03-22 Aggregation Alignment for Federated Learning with Mixture-of-Experts under Data Heterogeneity Zihan Fang et.al. 2603.21276 null
2026-03-22 QMoP: Query Guided Mixture-of-Projector for Efficient Visual Token Compression Zhongyang Li et.al. 2603.21232 null
2026-03-22 MI-DPG: Decomposable Parameter Generation Network Based on Mutual Information for Multi-Scenario Recommendation Wenzhuo Cheng et.al. 2603.21209 null
2026-03-22 Diffusion-based Probabilistic Air Quality Forecasting with Mechanistic Insight Ao Ding et.al. 2603.21131 null
2026-03-22 Mixture of Chapters: Scaling Learnt Memory in Transformers Tasmay Pankaj Tibrewal et.al. 2603.21096 null
2026-03-22 CoVFT: Context-aware Visual Fine-tuning for Multimodal Large Language Models Nan Zhou et.al. 2603.21077 null
2026-03-22 LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning Jianing Wang et.al. 2603.21065 null
2026-03-21 Satellite-to-Street: Synthesizing Post-Disaster Views from Satellite Imagery via Generative Vision Models Yifan Yang et.al. 2603.20697 null
2026-03-21 CFNN: Continued Fraction Neural Network Chao Wang et.al. 2603.20634 null
2026-03-21 A 4R-supported circular product-service system for luxury branded events Ke Ma et.al. 2603.20613 null
2026-03-20 AE-LLM: Adaptive Efficiency Optimization for Large Language Models Kaito Tanaka et.al. 2603.20492 null
2026-03-20 Thinking in Different Spaces: Domain-Specific Latent Geometry Survives Cross-Architecture Translation Marcus Armstrong et.al. 2603.20406 null
2026-03-20 Gesture2Speech: How Far Can Hand Movements Shape Expressive Speech? Lokesh Kumar et.al. 2603.19831 null
2026-03-20 Making Video Models Adhere to User Intent with Minor Adjustments Daniel Ajisafe et.al. 2603.19672 null
2026-03-20 Structured Prompting for Arabic Essay Proficiency: A Trait-Centric Evaluation Approach Salim Al Mandhari et.al. 2603.19668 null
2026-03-20 CS-MUNet: A Channel-Spatial Dual-Stream Mamba Network for Multi-Organ Segmentation Yuyang Zheng et.al. 2603.19659 null
2026-03-20 UniBioTransfer: A Unified Framework for Multiple Biometrics Transfer Caiyi Sun et.al. 2603.19637 null
2026-03-19 Scalable Prompt Routing via Fine-Grained Latent Task Discovery Yunyi Zhang et.al. 2603.19415 null
2026-03-22 Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation Zhuolin Yang et.al. 2603.19220 null
2026-03-19 DyMoE: Dynamic Expert Orchestration with Mixed-Precision Quantization for Efficient MoE Inference on Edge Yuegui Huang et.al. 2603.19172 null
2026-03-19 ATG-MoE: Autoregressive trajectory generation with mixture-of-experts for assembly skill learning Weihang Huang et.al. 2603.19029 null
2026-03-19 GWTC-4.0: Tests of General Relativity. III. Tests of the Remnants The LIGO Scientific Collaboration et.al. 2603.19021 null
2026-03-19 GWTC-4.0: Tests of General Relativity. II. Parameterized Tests The LIGO Scientific Collaboration et.al. 2603.19020 null
2026-03-19 GWTC-4.0: Tests of General Relativity. I. Overview and General Tests The LIGO Scientific Collaboration et.al. 2603.19019 null
2026-03-19 DriftGuard: Mitigating Asynchronous Data Drift in Federated Learning Yizhou Han et.al. 2603.18872 null
2026-03-19 Empathetic Motion Generation for Humanoid Educational Robots via Reasoning-Guided Vision–Language–Motion Diffusion Architecture Fuze Sun et.al. 2603.18771 null
2026-03-19 Observation of $D_s^+ \to a_0(980)^+f_0(500)$ in the Amplitude Analysis of $D_s^+ \to π^+ π^0 π^0 η$ BESIII Collaboration et.al. 2603.18521 null
2026-03-19 AIMER: Calibration-Free Task-Agnostic MoE Pruning Zongfang Liu et.al. 2603.18492 null
2026-03-19 AlignMamba-2: Enhancing Multimodal Fusion and Sentiment Analysis with Modality-Aware Mamba Yan Li et.al. 2603.18462 null
2026-03-19 Spatially Indirect Exciton Condensation in Two-Dimensional Strongly Correlated Semimetals Yao Zeng et.al. 2603.18445 null
2026-03-18 Path-Constrained Mixture-of-Experts Zijin Gu et.al. 2603.18297 null
2026-03-18 CORE: Robust Out-of-Distribution Detection via Confidence and Orthogonal Residual Scoring Jin Mo Yang et.al. 2603.18290 null
2026-03-18 Resonance-enhanced integrated acousto-optic beam steering Yue Yu et.al. 2603.18191 null
2026-03-18 Understanding Task Aggregation for Generalizable Ultrasound Foundation Models Fangyijie Wang et.al. 2603.18123 null
2026-03-18 DebugLM: Learning Traceable Training Data Provenance for LLMs Wenjie Jacky Mo et.al. 2603.17884 null
2026-03-18 The 1/W Law: An Analytical Study of Context-Length Routing Topology and GPU Generation Gains for LLM Inference Energy Efficiency Huamin Chen et.al. 2603.17280 null
2026-03-17 Knowledge Localization in Mixture-of-Experts LLMs Using Cross-Lingual Inconsistency Lucas Bandarkar et.al. 2603.17102 null
2026-03-17 Edge-Efficient Two-Stream Multimodal Architecture for Non-Intrusive Bathroom Fall Detection Haitian Wang et.al. 2603.17069 null
2026-03-17 SpecMoE: Spectral Mixture-of-Experts Foundation Model for Cross-Species EEG Decoding D. Darankoum et.al. 2603.16739 null
2026-03-17 HMAR: Hierarchical Modality-Aware Expert and Dynamic Routing Medical Image Retrieval Architecture Aojie Yuan et.al. 2603.16679 null
2026-03-19 Mixture of Style Experts for Diverse Image Stylization Shihao Zhu et.al. 2603.16649 null
2026-03-17 Tarab: A Multi-Dialect Corpus of Arabic Lyrics and Poetry Mo El-Haj et.al. 2603.16601 null
2026-03-17 Visual Distraction Undermines Moral Reasoning in Vision-Language Models Xinyi Yang et.al. 2603.16445 null
2026-03-18 EngGPT2: Sovereign, Efficient and Open Intelligence G. Ciarfaglia et.al. 2603.16430 null
2026-03-17 PlotTwist: A Creative Plot Generation Framework with Small Language Models Abhinav Thorat et.al. 2603.16410 null
2026-03-17 DynamicGate MLP Conditional Computation via Learned Structural Dropout and Input Dependent Gating for Functional Plasticity Yong Il Choi et.al. 2603.16367 null
2026-03-17 Behavioral Steering in a 35B MoE Language Model via SAE-Decoded Probe Vectors: One Agency Axis, Not Five Traits Jia Qing Yap et.al. 2603.16335 null
2026-03-17 AW-MoE: All-Weather Mixture of Experts for Robust Multi-Modal 3D Object Detection Hongwei Lin et.al. 2603.16261 null
2026-03-17 Accelerating Approximate Analytical Join Queries over Unstructured Data with Statistical Guarantees Yuxuan Zhu et.al. 2603.16153 null
2026-03-16 Confidently Wrong: Why Ignoring Binaries Biases IMF Inference at Large Sample Sizes Anna L. Rosen et.al. 2603.15779 null
2026-03-16 Mastering the Minority: An Uncertainty-guided Multi-Expert Framework for Challenging-tailed Sequence Learning Ye Wang et.al. 2603.15708 null
2026-03-16 Bridging Local and Global Knowledge: Cascaded Mixture-of-Experts Learning for Near-Shortest Path Routing Yung-Fu Chen et.al. 2603.15541 null
2026-03-16 Talk, Evaluate, Diagnose: User-aware Agent Evaluation with Automated Error Analysis Penny Chong et.al. 2603.15483 null
2026-03-16 A Closer Look into LLMs for Table Understanding Jia Wang et.al. 2603.15402 null
2026-03-16 MoE-ACT: Scaling Multi-Task Bimanual Manipulation with Sparse Language-Conditioned Mixture-of-Experts Transformers Kangjun Guo et.al. 2603.15265 null
2026-03-17 Tracking the Discriminative Axis: Dual Prototypes for Test-Time OOD Detection Under Covariate Shift Wooseok Lee et.al. 2603.15213 null
2026-03-16 ForceVLA2: Unleashing Hybrid Force-Position Control with Force Awareness for Contact-Rich Manipulation Yang Li et.al. 2603.15169 null
2026-03-16 M2IR: Proactive All-in-One Image Restoration via Mamba-style Modulation and Mixture-of-Experts Shiwei Wang et.al. 2603.14816 null
2026-03-16 Genetic Algorithms in Regression Mo Li et.al. 2603.14801 null
2026-03-16 Universe Routing: Why Self-Evolving Agents Need Epistemic Control Zhaohui Geoffrey Wang et.al. 2603.14799 null
2026-03-15 TopoCL: Topological Contrastive Learning for Medical Imaging Guangyu Meng et.al. 2603.14647 null
2026-03-15 A measurement of gas rotation in galaxy groups via the kinetic Sunyaev-Zeldovich effect Tianyi Yang et.al. 2603.14494 null
2026-03-15 Towards One-for-All Anomaly Detection for Tabular Data Shiyuan Li et.al. 2603.14407 null
2026-03-15 WestWorld: A Knowledge-Encoded Scalable Trajectory World Model for Diverse Robotic Systems Yuchen Wang et.al. 2603.14392 null
2026-03-15 M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling Mayank Mishra et.al. 2603.14360 null
2026-03-15 A Physically-Grounded Attack and Adaptive Defense Framework for Real-World Low-Light Image Enhancement Tongshun Zhang et.al. 2603.14304 null
2026-03-15 All-sky Searches for Continuous Gravitational Waves from Isolated Neutron Stars in the Data from the First Part of the Fourth LIGO-Virgo-KAGRA Observing Run The LIGO Scientific Collaboration et.al. 2603.14168 null
2026-03-14 PA-Net: Precipitation-Adaptive Mixture-of-Experts for Long-Tail Rainfall Nowcasting Xinyu Xiao et.al. 2603.13818 null
2026-03-14 Implicit Maximum Likelihood Estimation for Real-time Generative Model Predictive Control Grayson Lee et.al. 2603.13733 null
2026-03-14 Sparse-Dense Mixture of Experts Adapter for Multi-Modal Tracking Yabin Zhu et.al. 2603.13719 null
2026-03-13 NCCL EP: Towards a Unified Expert Parallel Communication API for NCCL Amos Goldman et.al. 2603.13606 null
2026-03-13 MoEKD: Mixture-of-Experts Knowledge Distillation for Robust and High-Performing Compressed Code Models Md. Abdul Awal et.al. 2603.13213 null
2026-03-13 Reference-Free Image Quality Assessment for Virtual Try-On via Human Feedback Yuki Hirakawa et.al. 2603.13057 null
2026-03-13 Team RAS in 10th ABAW Competition: Multimodal Valence and Arousal Estimation Approach Elena Ryumina et.al. 2603.13056 null
2026-03-13 Multimodal Protein Language Models for Enzyme Kinetic Parameters: From Substrate Recognition to Conformational Adaptation Fei Wang et.al. 2603.12845 null
2026-03-13 Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking Zizhao Mo et.al. 2603.12831 null
2026-03-13 LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing Jiawei Hao et.al. 2603.12645 null
2026-03-13 CarPLAN: Context-Adaptive and Robust Planning with Dynamic Scene Awareness for Autonomous Driving Junyong Yun et.al. 2603.12607 null
2026-03-13 Spectral Dataset of Stripped-Envelope Supernovae from the Tsinghua Supernova Group Danfeng Xiang et.al. 2603.12604 null
2026-03-13 Expert Pyramid Tuning: Efficient Parameter Fine-Tuning for Expertise-Driven Task Allocation Jia-Chen Zhang et.al. 2603.12577 null
2026-03-13 Spatio-Semantic Expert Routing Architecture with Mixture-of-Experts for Referring Image Segmentation Alaa Dalaq et.al. 2603.12538 null
2026-03-12 TaxBreak: Unmasking the Hidden Costs of LLM Inference Through Overhead Decomposition Prabhu Vellaisamy et.al. 2603.12465 null
2026-03-12 NeuroLoRA: Context-Aware Neuromodulation for Parameter-Efficient Multi-Task Adaptation Yuxin Yang et.al. 2603.12378 null
2026-03-12 A Two-Stage Dual-Modality Model for Facial Emotional Expression Recognition Jiajun Sun et.al. 2603.12221 null
2026-03-12 CrossEarth-SAR: A SAR-Centric and Billion-Scale Geospatial Foundation Model for Domain Generalizable Semantic Segmentation Ziqi Ye et.al. 2603.12008 null
2026-03-12 AdaFuse: Accelerating Dynamic Adapter Inference via Token-Level Pre-Gating and Fused Kernel Optimization Qiyang Li et.al. 2603.11873 null
2026-03-12 Expert Threshold Routing for Autoregressive Language Modeling with Dynamic Computation Allocation and Load Balancing Hanchi Sun et.al. 2603.11535 null
2026-03-11 Task-Conditioned Routing Signatures in Sparse Mixture-of-Experts Transformers Mynampati Sri Ranganadha Avinash et.al. 2603.11114 null
2026-03-11 Sample-and-Search: An Effective Algorithm for Learning-Augmented k-Median Clustering in High dimensions Kangke Cheng et.al. 2603.10721 null
2026-03-11 UniStitch: Unifying Semantic and Geometric Features for Image Stitching Yuan Mei et.al. 2603.10568 null
2026-03-11 Optimal Expert-Attention Allocation in Mixture-of-Experts: A Scalable Law for Dynamic Model Design Junzhuo Li et.al. 2603.10379 null
2026-03-12 The Orthogonal Vulnerabilities of Generative AI Watermarks: A Comparative Empirical Benchmark of Spatial and Latent Provenance Jesse Yu et.al. 2603.10323 null
2026-03-10 Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions Mingyang Song et.al. 2603.09938 null
2026-03-10 Quantifying the Necessity of Chain of Thought through Opaque Serial Depth Jonah Brown-Cohen et.al. 2603.09786 null
2026-03-10 MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants Zuhao Zhang et.al. 2603.09652 null
2026-03-10 MORE-R1: Guiding LVLM for Multimodal Object-Entity Relation Extraction via Stepwise Reasoning with Reinforcement Learning Xiang Yuan et.al. 2603.09478 null
2026-03-12 Multi-tasking through quantum annealing Jargalsaikhan Artag et.al. 2603.09468 null
2026-03-10 Variational Routing: A Scalable Bayesian Framework for Calibrated Mixture-of-Experts Transformers Albus Yizhuo Li et.al. 2603.09453 null
2026-03-10 Exploring Modality-Aware Fusion and Decoupled Temporal Propagation for Multi-Modal Object Tracking Shilei Wang et.al. 2603.09287 null
2026-03-10 Acoustic and Semantic Modeling of Emotion in Spoken Language Soumya Dutta et.al. 2603.09212 null
2026-03-10 GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models Md Selim Sarowar et.al. 2603.09079 null
2026-03-09 The $qs$ Inequality: Quantifying the Double Penalty of Mixture-of-Experts at Inference Vignesh Adhinarayanan et.al. 2603.08960 null
2026-03-09 ConFu: Contemplate the Future for Better Speculative Sampling Zongyue Qin et.al. 2603.08899 null
2026-03-09 Microwave response of electrically driven spins in a three-qubit quantum processor Tanner M. Janda et.al. 2603.08577 null
2026-03-09 LAR-MoE: Latent-Aligned Routing for Mixture of Experts in Robotic Imitation Learning Ariel Rodriguez et.al. 2603.08476 null
2026-03-09 Amplitude Analysis of Singly Cabibbo-Suppressed Decay $Λ^{+}_{c}\to p K^{+} K^{-}$ BESIII Collaboration et.al. 2603.08469 null
2026-03-09 IronEngine: Towards General AI Assistant Xi Mo et.al. 2603.08425 null
2026-03-09 Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows Shentong Mo et.al. 2603.08126 null
2026-03-09 An improved measurement of $η^\prime\rightarrow e^{+}e^{-}ω$ BESIII Collaboration et.al. 2603.08120 null
2026-03-09 SAMoE-VLA: A Scene Adaptive Mixture-of-Experts Vision-Language-Action Model for Autonomous Driving Zihan You et.al. 2603.08113 null
2026-03-09 Deterministic Differentiable Structured Pruning for Large Language Models Weiyu Huang et.al. 2603.08065 null
2026-03-09 Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization Jingwei Li et.al. 2603.08022 null
2026-03-09 Scaling Machine Learning Interatomic Potentials with Mixtures of Experts Yuzhi Liu et.al. 2603.07977 null
2026-03-09 Structural Design and Performance Analysis of Laser Transmitting Telescope for Space Gravitational Wave Detection Long Yongtao et.al. 2603.07967 null
2026-03-09 SGG-R$^{\rm 3}$: From Next-Token Prediction to End-to-End Unbiased Scene Graph Generation Jiaye Feng et.al. 2603.07961 null
2026-03-09 SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans Hansi Zeng et.al. 2603.07853 null
2026-03-08 Scalable Training of Mixture-of-Experts Models with Megatron Core Zijie Yan et.al. 2603.07685 null
2026-03-08 AtomicVLA: Unlocking the Potential of Atomic Skill Learning in Robots Likui Zhang et.al. 2603.07648 null
2026-03-08 Mixed Effects Mixture of Experts: Modeling Double Heterogeneous Trajectories Xinkai Yue et.al. 2603.07479 null
2026-03-08 UnSCAR: Universal, Scalable, Controllable, and Adaptable Image Restoration Debabrata Mandal et.al. 2603.07406 null
2026-03-07 Scheduling Parallel Optical Circuit Switches for AI Training Kevin Liang et.al. 2603.07373 null
2026-03-07 Mozart: Modularized and Efficient MoE Training on 3.5D Wafer-Scale Chiplet Architectures Shuqing Luo et.al. 2603.07006 null
2026-03-06 Swimba: Switch Mamba Model Scales State Space Models Zhixu Du et.al. 2603.06938 null
2026-03-06 PaQ-DETR: Learning Pattern and Quality-Aware Dynamic Queries for Object Detection Zhengjian Kang et.al. 2603.06917 null
2026-03-06 RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering Gaia A. Bertolino et.al. 2603.06542 null
2026-03-06 A Mixture-of-Experts Framework for Practical Hybrid-Quantum Models in Credit Card Fraud Detection Rodrigo Chaves et.al. 2603.06473 null
2026-03-06 MoEMambaMIL: Structure-Aware Selective State Space Modeling for Whole-Slide Image Analysis Dongqing Xie et.al. 2603.06378 null
2026-03-06 MoEless: Efficient MoE LLM Serving via Serverless Computing Hanfei Yu et.al. 2603.06350 null
2026-03-06 WMoE-CLIP: Wavelet-Enhanced Mixture-of-Experts Prompt Learning for Zero-Shot Anomaly Detection Peng Chen et.al. 2603.06313 null
2026-03-06 GazeMoE: Perception of Gaze Target with Mixture-of-Experts Zhuangzhuang Dai et.al. 2603.06256 null
2026-03-06 EvoESAP: Non-Uniform Expert Pruning for Sparse MoE Zongfang Liu et.al. 2603.06003 null
2026-03-06 MoE Lens – An Expert Is All You Need Marmik Chaudhari et.al. 2603.05806 null
2026-03-06 Sparse Crosscoders for diffing MoEs and Dense models Marmik Chaudhari et.al. 2603.05805 null
2026-03-05 Change Point Detection for Cell Populations Measured via Flow Cytometry Yik Lun Kei et.al. 2603.05700 null
2026-03-05 FreeTxt-Vi: A Benchmarked Vietnamese-English Toolkit for Segmentation, Sentiment, and Summarisation Hung Nguyen Huy et.al. 2603.05690 null
2026-03-05 Multi-channel joint analysis of the exotic charmonium-like state $T_{c\bar{c}}(4020)$ BESIII Collaboration et.al. 2603.05564 null
2026-03-05 VietJobs: A Vietnamese Job Advertisement Dataset Hieu Pham Dinh et.al. 2603.05262 null
2026-03-05 NeuronMoE: Neuron-Guided Mixture-of-Experts for Efficient Multilingual LLM Extension Rongzhi Li et.al. 2603.05046 null
2026-03-05 Mixture of Universal Experts: Scaling Virtual Width via Depth-Width Transformation Yilong Chen et.al. 2603.04971 null
2026-03-05 Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling Yong Liu et.al. 2603.04791 null
2026-03-05 TSEmbed: Unlocking Task Scaling in Universal Multimodal Embeddings Yebo Wu et.al. 2603.04772 null
2026-03-04 ECG-MoE: Mixture-of-Expert Electrocardiogram Foundation Model Yuhao Xu et.al. 2603.04589 null
2026-03-04 Augmenting representations with scientific papers Nicolò Oreste Pinciroli Vago et.al. 2603.04516 null
2026-03-04 RANGER: Sparsely-Gated Mixture-of-Experts with Adaptive Retrieval Re-ranking for Pathology Report Generation Yixin Chen et.al. 2603.04348 null
2026-03-04 CAMMSR: Category-Guided Attentive Mixture of Experts for Multimodal Sequential Recommendation Jinfeng Xu et.al. 2603.04320 null
2026-03-04 Precise measurement of the form factors in $D^0\rightarrow K^*(892)^-\ell^+ν_{\ell}$ and observation of $D^0\rightarrow K_2^*(1430)^-\ell^+ν_{\ell}$ BESIII Collaboration et.al. 2603.04136 null
2026-03-04 UniRain: Unified Image Deraining with RAG-based Dataset Distillation and Multi-objective Reweighted Optimization Qianfeng Yang et.al. 2603.03967 null
2026-03-04 Glass Segmentation with Fusion of Learned and General Visual Features Risto Ojala et.al. 2603.03718 null
2026-03-04 Plasmonic polaron in self-intercalated 1T-TiS2 Byoung Ki Choi et.al. 2603.03663 null
2026-03-03 Modeling Cross-vision Synergy for Unified Large Vision Model Shengqiong Wu et.al. 2603.03564 null
2026-03-03 Beyond Language Modeling: An Exploration of Multimodal Pretraining Shengbang Tong et.al. 2603.03276 null
2026-03-03 Search for a massless particle beyond the Standard Model in the $Ξ^0\toΛ+ \text{invisible}$ decay BESIII Collaboration et.al. 2603.03199 null
2026-03-04 MoECLIP: Patch-Specialized Experts for Zero-shot Anomaly Detection Jun Yeong Park et.al. 2603.03101 null
2026-03-03 CMoE: Contrastive Mixture of Experts for Motion Control and Terrain Adaptation of Humanoid Robots Shihao Ma et.al. 2603.03067 null
2026-03-03 EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education Baoliang Chen et.al. 2603.03066 null
2026-03-03 Practical FP4 Training for Large-Scale MoE Models on Hopper GPUs Wuyue Zhang et.al. 2603.02731 null
2026-03-03 TenExp: Mixture-of-Experts-Based Tensor Decomposition Structure Search Framework Ting-Wei Zhou et.al. 2603.02720 null
2026-03-03 MiM-DiT: MoE in MoE with Diffusion Transformers for All-in-One Image Restoration Lingshun Kong et.al. 2603.02710 null
2026-03-03 Addressing Missing and Noisy Modalities in One Solution: Unified Modality-Quality Framework for Low-quality Multimodal Data Sijie Mai et.al. 2603.02695 null
2026-03-03 Robust Heterogeneous Analog-Digital Computing for Mixture-of-Experts Models with Theoretical Generalization Guarantees Mohammed Nowaz Rabbani Chowdhury et.al. 2603.02633 null
2026-03-02 Search for the charmonium weak decay $ψ(2S)\to D_s^-π^+ + c.c.$ and $ψ(2S)\to D_s^-ρ^+ + c.c.$ BESIII Collaboration et.al. 2603.01777 null
2026-03-02 DynaMoE: Dynamic Token-Level Expert Activation with Layer-Wise Adaptive Capacity for Mixture-of-Experts Neural Networks Gökdeniz Gülmez et.al. 2603.01697 null
2026-03-02 PathMoE: Interpretable Multimodal Interaction Experts for Pediatric Brain Tumor Classification Jian Yu et.al. 2603.01547 null
2026-03-02 Multimodal Mixture-of-Experts with Retrieval Augmentation for Protein Active Site Identification Jiayang Wu et.al. 2603.01511 null
2026-03-02 DOCFORGE-BENCH: A Comprehensive Benchmark for Document Forgery Detection and Analysis Zengqi Zhao et.al. 2603.01433 null
2026-03-03 UETrack: A Unified and Efficient Framework for Single Object Tracking Ben Kang et.al. 2603.01412 null
2026-03-02 Fed-GAME: Personalized Federated Learning with Graph Attention Mixture-of-Experts For Time-Series Forecasting Yi Li et.al. 2603.01363 null
2026-03-01 Truth as a Trajectory: What Internal Representations Reveal About Large Language Model Reasoning Hamed Damirchi et.al. 2603.01326 null
2026-03-01 Fast Confidence-Aware Human Prediction via Hardware-accelerated Bayesian Inference for Safe Robot Navigation Michael Lu et.al. 2603.01122 null
2026-03-01 TriMoE: Augmenting GPU with AMX-Enabled CPU and DIMM-NDP for High-Throughput MoE Inference via Offloading Yudong Pan et.al. 2603.01058 null
2026-03-01 Dr.Occ: Depth- and Region-Guided 3D Occupancy from Surround-View Cameras for Autonomous Driving Xubo Zhu et.al. 2603.01007 null
2026-02-28 MME: Mixture of Mesh Experts with Random Walk Transformer Gating Amir Belder et.al. 2603.00828 null
2026-02-28 First Amplitude Analysis of $D^0\rightarrow K^-π^0e^+ν_e$ and Observation of $D^0\rightarrow K^*_2(1430)^-e^+ν_e$ BESIII Collaboration et.al. 2603.00743 null
2026-02-28 K^2-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control Zhe Wu et.al. 2603.00676 null
2026-02-28 Precise Measurement and Control of Radon Progeny on Detector Surfaces C. B. Z. Luo et.al. 2603.00647 null
2026-02-28 CoMoL: Efficient Mixture of LoRA Experts via Dynamic Core Space Merging Jie Cao et.al. 2603.00573 null
2026-02-27 CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning Yuxuan Liu et.al. 2602.24142 null
2026-02-27 Precision Studies and Searches for CP Asymmetries in the Inclusive Decay $Λ_{c}^{+}\to ΛX$ BESIII Collaboration et.al. 2602.24089 null
2026-02-27 Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization Chenwei Jia et.al. 2602.24059 null
2026-02-27 Measurement of Born Cross Sections for $e^+e^-\toΣ^-\barΣ^+$ at $\sqrt{s}=3.51-4.95$ GeV and Observation of $ψ(3770)\toΣ^-\barΣ^+$ BESIII Collaboration et.al. 2602.23835 null
2026-02-27 ProductResearch: Training E-Commerce Deep Research Agents via Multi-Agent Synthetic Trajectory Distillation Jiangyuan Wang et.al. 2602.23716 null
2026-02-26 Brain-OF: An Omnifunctional Foundation Model for fMRI, EEG and MEG Hanning Guo et.al. 2602.23410 null
2026-02-26 A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations Soumya Dutta et.al. 2602.23300 null
2026-02-26 Learning Physical Operators using Neural Operators Vignesh Gopakumar et.al. 2602.23113 null
2026-02-26 Residual Koopman Spectral Profiling for Predicting and Preventing Transformer Training Instability Bum Jun Kim et.al. 2602.22988 null
2026-02-26 pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation Shentong Mo et.al. 2602.22938 null
2026-02-26 MEDNA-DFM: A Dual-View FiLM-MoE Model for Explainable DNA Methylation Prediction Yi He et.al. 2602.22850 null
2026-02-26 DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation Hao Zheng et.al. 2602.22839 null
2026-02-26 Productivity and Collaboration in Hybrid Agile Teams: An Interview Study Elisabeth Mo et.al. 2602.22835 null
2026-02-26 Measurements of branching fractions of $Λ_{c}^{+}\toΣ^{0}K_{S}^{0}π^{+}$ and $Λ_{c}^{+}\toΣ^{0}K_{S}^{0}K^{+}$ BESIII Collaboration et.al. 2602.22754 null
2026-02-26 IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation Yanpei Guo et.al. 2602.22700 null
2026-02-26 Switch-Hurdle: A MoE Encoder with AR Hurdle Decoder for Intermittent Demand Forecasting Fabian Muşat et.al. 2602.22685 null
2026-02-26 Accelerating LLM Pre-Training through Flat-Direction Dynamics Enhancement Shuchen Zhu et.al. 2602.22681 null
2026-02-26 Predictive variational inference for flexible regression models Lucas Kock et.al. 2602.22582 null
2026-02-26 Towards Dynamic Dense Retrieval with Routing Strategy Zhan Su et.al. 2602.22547 null
2026-02-25 NESTOR: A Nested MOE-based Neural Operator for Large-Scale PDE Pre-Training Dengdi Sun et.al. 2602.22059 null
2026-02-25 Excitation: Momentum For Experts Sagi Shaier et.al. 2602.21798 null
2026-02-25 Learning from Yesterday’s Error: An Efficient Online Learning Method for Traffic Demand Prediction Xiannan Huang et.al. 2602.21757 null
2026-02-25 TiMi: Empower Time Series Transformers with Multimodal Mixture of Experts Jiafeng Lin et.al. 2602.21693 null
2026-02-25 Multi-Layer Scheduling for MoE-Based LLM Reasoning Yifan Sun et.al. 2602.21626 null
2026-02-24 A Path to an All-Sky Survey with Roman Jiwon Jesse Han et.al. 2602.21280 null
2026-02-24 On infinite sets with no $3$ on a line Moe Putterman et.al. 2602.21275 null
2026-02-24 ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments Haley Li et.al. 2602.21140 null
2026-02-24 MUSE: Harnessing Precise and Diverse Semantics for Few-Shot Whole Slide Image Classification Jiahao Xu et.al. 2602.20873 null
2026-02-25 GeCo-SRT: Geometry-aware Continual Adaptation for Robotic Cross-Task Sim-to-Real Transfer Wenbo Yu et.al. 2602.20871 null
2026-02-24 Multi-time Loewner energy: rate function for large deviation Mo Chen et.al. 2602.20642 null
2026-02-24 Precise Measurement of Matter-Antimatter Asymmetry with Entangled Hyperon Antihyperon Pairs BESIII Collaboration et.al. 2602.20524 null
2026-02-24 Search for Light-Mass Fractionally Charged Particles in Space with DAMPE Experiment F. Alemanno et.al. 2602.20519 null
2026-02-24 Wireless Federated Multi-Task LLM Fine-Tuning via Sparse-and-Orthogonal LoRA Nuocheng Yang et.al. 2602.20492 null
2026-02-23 Learning Discriminative and Generalizable Anomaly Detector for Dynamic Graph with Limited Supervision Yuxing Tian et.al. 2602.20019 null
2026-02-23 Counterfactual Understanding via Retrieval-aware Multimodal Modeling for Time-to-Event Survival Prediction Ha-Anh Hoang Nguyen et.al. 2602.19987 null
2026-02-23 ReAttn: Improving Attention-based Re-ranking via Attention Re-weighting Yuxing Tian et.al. 2602.19969 null
2026-02-23 A Replicate-and-Quantize Strategy for Plug-and-Play Load Balancing of Sparse Mixture-of-Experts LLMs Zijie Liu et.al. 2602.19938 null
2026-02-23 Towards Dexterous Embodied Manipulation via Deep Multi-Sensory Fusion and Sparse Expert Scaling Yirui Sun et.al. 2602.19764 null
2026-02-23 Multimodal Dataset Distillation Made Simple by Prototype-Guided Data Synthesis Junhyeok Choi et.al. 2602.19756 null
2026-02-23 RAID: Retrieval-Augmented Anomaly Detection Mingxiu Cai et.al. 2602.19611 null
2026-02-23 EMS-FL: Federated Tuning of Mixture-of-Experts in Satellite-Terrestrial Networks via Expert-Driven Model Splitting Angzi Xu et.al. 2602.19485 null
2026-02-22 RegionRoute: Regional Style Transfer with Diffusion Model Bowen Chen et.al. 2602.19254 null
2026-02-22 Robust Exploration in Directed Controller Synthesis via Reinforcement Learning with Soft Mixture-of-Experts Toshihide Ubukata et.al. 2602.19244 null
2026-02-22 SegMoTE: Token-Level Mixture of Experts for Medical Image Segmentation Yujie Lu et.al. 2602.19213 null
2026-02-22 JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation Kai Liu et.al. 2602.19163 null
2026-02-22 K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model Shiyi Cao et.al. 2602.19128 null
2026-02-22 Routing-Aware Explanations for Mixture of Experts Graph Models in Malware Detection Hossein Shokouhinejad et.al. 2602.19025 null
2026-02-21 NeuroWise: A Multi-Agent LLM “Glass-Box” System for Practicing Double-Empathy Communication with Autistic Partners Albert Tang et.al. 2602.18962 null
2026-02-21 Give Users the Wheel: Towards Promptable Recommendation Paradigm Fuyuan Lyu et.al. 2602.18929 null
2026-02-21 Diverse properties of electron Forbush decreases revealed by the Dark Matter Particle Explorer F. Alemanno et.al. 2602.18743 null
2026-02-21 Comprehensive measurement of $η^\prime$ photoproduction off the proton at $E_γ< 2.4$ $\mathrm{GeV}$ N. Muramatsu et.al. 2602.18675 null
2026-02-20 Going Down Memory Lane: Scaling Tokens for Video Stream Understanding with Dynamic KV-Cache Memory Vatsal Agarwal et.al. 2602.18434 null
2026-02-20 RamanSeg: Interpretability-driven Deep Learning on Raman Spectra for Cancer Diagnosis Chris Tomy et.al. 2602.18119 null
2026-02-20 DeepSVU: Towards In-depth Security-oriented Video Understanding via Unified Physical-world Regularized MoE Yujie Jin et.al. 2602.18019 null
2026-02-19 Grassmannian Mixture-of-Experts: Concentration-Controlled Routing on Subspace Manifolds Ibne Farabi Shihab et.al. 2602.17798 null
2026-02-19 Phase-Aware Mixture of Experts for Agentic Reinforcement Learning Shengtian Yang et.al. 2602.17038 null
2026-02-19 Arcee Trinity Large Technical Report Varun Singh et.al. 2602.17004 null
2026-02-19 Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation Yan Wang et.al. 2602.16990 null
2026-02-18 Claim Automation using Large Language Model Zhengda Mo et.al. 2602.16836 null
2026-02-18 Efficient Tail-Aware Generative Optimization via Flow Model Fine-Tuning Zifan Wang et.al. 2602.16796 null
2026-02-18 Geometric Neural Operators via Lie Group-Constrained Latent Dynamics Jiaquan Zhang et.al. 2602.16209 null
2026-02-18 OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis Tianwei Lin et.al. 2602.16110 null
2026-02-18 Federated Graph AGI for Cross-Border Insider Threat Intelligence in Government Financial Schemes Srikumar Nayak et.al. 2602.16109 null
2026-02-17 MoE-Spec: Expert Budgeting for Efficient Speculative Decoding Bradley McDanel et.al. 2602.16052 null
2026-02-17 ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns Ziyu Zhao et.al. 2602.15521 null
2026-02-17 GMAIL: Generative Modality Alignment for generated Image Learning Shentong Mo et.al. 2602.15368 null
2026-02-16 Mixture-of-Experts under Finite-Rate Gating: Communication–Generalization Trade-offs Ali Khalesi et.al. 2602.15091 null
2026-02-13 RynnBrain: Open Embodied Foundation Models Ronghao Dang et.al. 2602.14979 null
2026-02-16 Topological and arithmetic characteristics about products of projective lines with complex tori Jia-Li Mo et.al. 2602.14745 null
2026-02-16 DriveFine: Refining-Augmented Masked Diffusion VLA for Precise and Robust Driving Chenxu Dang et.al. 2602.14577 null
2026-02-15 DeepFusion: Accelerating MoE Training via Federated Knowledge Distillation from Heterogeneous Edge Devices Songyuan Li et.al. 2602.14301 null
2026-02-15 MILD: Multi-Intent Learning and Disambiguation for Proactive Failure Prediction in Intent-based Networking Md. Kamrul Hossain et.al. 2602.14283 null
2026-02-15 Multi-Agent Debate: A Unified Agentic Framework for Tabular Anomaly Detection Pinqiao Wang et.al. 2602.14251 null
2026-02-15 Fast Catch-Up, Late Switching: Optimal Batch Size Scheduling via Functional Scaling Laws Jinbo Wang et.al. 2602.14208 null
2026-02-15 Synergistic Intra- and Cross-Layer Regularization Losses for MoE Expert Specialization Rizhen Hu et.al. 2602.14159 null
2026-02-15 REAL: Resolving Knowledge Conflicts in Knowledge-Intensive Visual Question Answering via Reasoning-Pivot Alignment Kai Ye et.al. 2602.14065 null
2026-02-15 LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts Yang Liu et.al. 2602.14060 null
2026-02-15 Geometry-Preserving Aggregation for Mixture-of-Experts Embedding Models Sajjad Kachuee et.al. 2602.14039 null
2026-02-15 Eureka-Audio: Triggering Audio Intelligence in Compact Language Models Dan Zhang et.al. 2602.13954 null
2026-02-14 Assessing Cybersecurity Risks and Traffic Impact in Connected Autonomous Vehicles Saurav Silwal et.al. 2602.13898 null
2026-02-14 Mixture-of-experts Wishart model for covariance matrices with an application to Cancer drug screening The Tien Mai et.al. 2602.13888 null
2026-02-13 Dyad: a binary-star dynamics and statistics library for Python Amery Gration et.al. 2602.13388 null
2026-02-13 Improved measurements of the coherence factors and strong-phase differences in $D\to K^-π^+π^+π^-$ and $D\to K^-π^+π^0$ with quantum-correlated $D\bar{D}$ decays BESIII Collaboration et.al. 2602.13002 null
2026-02-13 Aspect-Based Sentiment Analysis for Future Tourism Experiences: A BERT-MoE Framework for Persian User Reviews Hamidreza Kazemi Taskooh et.al. 2602.12778 null
2026-02-13 Mixture of Predefined Experts: Maximizing Data Usage on Vertical Federated Learning Jon Irureta et.al. 2602.12708 null
2026-02-13 Multi-Head Attention as a Source of Catastrophic Forgetting in MoE Transformers Anrui Chen et.al. 2602.12587 null
2026-02-13 SD-MoE: Spectral Decomposition for Effective Expert Specialization Ruijun Huang et.al. 2602.12556 null
2026-02-13 Decoder-only Conformer with Modality-aware Sparse Mixtures of Experts for ASR Jaeyoung Lee et.al. 2602.12546 null
2026-02-12 Query-focused and Memory-aware Reranker for Long Context Processing Yuqing Li et.al. 2602.12192 null
2026-02-12 Measurement of the singly Cabibbo-suppressed decay $Λ_c^+\to pη^\prime$ with Deep Learning BESIII Collaboration et.al. 2602.11974 null
2026-02-12 Extending Puzzle for Mixture-of-Experts Reasoning Models with Application to GPT-OSS Acceleration Akhiad Bercovich et.al. 2602.11937 null
2026-02-12 Deep Kernel Fusion for Transformers Zixi Zhang et.al. 2602.11808 null
2026-02-12 LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training Xinyi Liu et.al. 2602.11686 null
2026-02-12 Evolutionary Router Feature Generation for Zero-Shot Graph Anomaly Detection with Mixture-of-Experts Haiyang Jiang et.al. 2602.11622 null
2026-02-12 Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm Jinrui Zhang et.al. 2602.11543 null
2026-02-12 Adaptive Milestone Reward for GUI Agents Congmin Zheng et.al. 2602.11524 null
2026-02-12 Observation of a New Excited $Σ$ State in $ψ(3686)\to\bar{p}K^+Σ^0+c.c.$ BESIII Collaboration et.al. 2602.11501 null
2026-02-11 Charting Empirical Laws for LLM Fine-Tuning in Scientific Multi-Discipline Learning Lintao Wang et.al. 2602.11215 null
2026-02-11 MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs Yupu Gu et.al. 2602.10965 null
2026-02-11 CMAD: Cooperative Multi-Agent Diffusion via Stochastic Optimal Control Riccardo Barbano et.al. 2602.10933 null
2026-02-11 VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training Guobin Shen et.al. 2602.10693 null
2026-02-11 Multimodal Priors-Augmented Text-Driven 3D Human-Object Interaction Generation Yin Wang et.al. 2602.10659 null
2026-02-11 A Vision-Language Foundation Model for Zero-shot Clinical Collaboration and Automated Concept Discovery in Dermatology Siyuan Yan et.al. 2602.10624 null
2026-02-11 Supercharging Packet-level Network Simulation of Large Model Training via Memoization and Fast-Forwarding Fei Long et.al. 2602.10615 null
2026-02-11 Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Ailin Huang et.al. 2602.10604 null
2026-02-11 Neural Additive Experts: Context-Gated Experts for Controllable Model Additivity Guangzhi Xiong et.al. 2602.10585 null
2026-02-12 3DXTalker: Unifying Identity, Lip Sync, Emotion, and Spatial Dynamics in Expressive 3D Talking Avatars Zhongju Wang et.al. 2602.10516 null
2026-02-10 Area-Efficient In-Memory Computing for Mixture-of-Experts via Multiplexing and Caching Hanyuan Gao et.al. 2602.10254 null
2026-02-10 TDE 2025abcr: A Tidal Disruption Event in the Outskirts of a Massive Galaxy Robert Stein et.al. 2602.10180 null
2026-02-10 MalMoE: Mixture-of-Experts Enhanced Encrypted Malicious Traffic Detection Under Graph Drift Yunpeng Tan et.al. 2602.10157 null
2026-02-10 Diverse Skill Discovery for Quadruped Robots via Unsupervised Learning Ruopeng Cui et.al. 2602.09767 null
2026-02-10 Revealing the Challenges of Attention-FFN Disaggregation for Modern MoE Models and Hardware Systems Guowei Liu et.al. 2602.09721 null
2026-02-10 First observation of the $η_{c}\toΞ^{0} \barΞ^{0}$ decay BESIII Collaboration et.al. 2602.09652 null
2026-02-10 DR.Experts: Differential Refinement of Distortion-Aware Experts for Blind Image Quality Assessment Bohan Fu et.al. 2602.09531 null
2026-02-10 SMES: Towards Scalable Multi-Task Recommendation via Expert Sparsity Yukun Zhang et.al. 2602.09386 null
2026-02-10 Effective MoE-based LLM Compression by Exploiting Heterogeneous Inter-Group Experts Routing Frequency and Information Density Zhendong Mi et.al. 2602.09316 null
2026-02-09 Generalizing GNNs with Tokenized Mixture of Experts Xiaoguang Guo et.al. 2602.09258 null
2026-02-09 UI-Venus-1.5 Technical Report Veuns-Team et.al. 2602.09082 null
2026-02-09 DirMoE: Dirichlet-routed Mixture of Experts Amirhossein Vahidi et.al. 2602.09001 null
2026-02-09 OmniReview: A Large-scale Benchmark and LLM-enhanced Framework for Realistic Reviewer Recommendation Yehua Huang et.al. 2602.08896 null
2026-02-09 FlexMoRE: A Flexible Mixture of Rank-heterogeneous Experts for Efficient Federatedly-trained Large Language Models Annemette Brok Pirchert et.al. 2602.08818 null
2026-02-10 MOVA: Towards Scalable and Synchronized Video-Audio Generation SII-OpenMOSS Team et.al. 2602.08794 null
2026-02-10 Redundancy-Free View Alignment for Multimodal Human Activity Recognition with Arbitrarily Missing Views Duc-Anh Nguyen et.al. 2602.08755 null
2026-02-09 Large Language Lobotomy: Jailbreaking Mixture-of-Experts via Expert Silencing Jona te Lintelo et.al. 2602.08741 null
2026-02-09 6G-Bench: An Open Benchmark for Semantic Communication and Network-Level Reasoning with Foundation Models in AI-Native 6G Networks Mohamed Amine Ferrag et.al. 2602.08675 null
2026-02-10 Fundamental Reasoning Paradigms Induce Out-of-Domain Generalization in Language Models Mingzi Cao et.al. 2602.08658 null
2026-02-09 Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs Yukun Jiang et.al. 2602.08621 null
2026-02-09 Giant Magnetocaloric Effect in a High-Spin Shastry-Sutherland Dipolar Magnet Jianjian Gong et.al. 2602.08497 null
2026-02-09 TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration Linye Wei et.al. 2602.08404 null
2026-02-09 Tighnari v2: Mitigating Label Noise and Distribution Shift in Multimodal Plant Distribution Prediction via Mixture of Experts and Weakly Supervised Learning Haixu Liu et.al. 2602.08282 null
2026-02-09 Large Language Models in Peer-Run Community Behavioral Health Services: Understanding Peer Specialists and Service Users’ Perspectives on Opportunities, Risks, and Mitigation Strategies Cindy Peng et.al. 2602.08187 null
2026-02-08 Multimodal normative modeling in Alzheimers Disease with introspective variational autoencoders Sayantan Kumar et.al. 2602.08077 null
2026-02-08 Efficient and Adaptable Detection of Malicious LLM Prompts via Bootstrap Aggregation Shayan Ali Hassan et.al. 2602.08062 null
2026-02-08 Enhanced Mixture 3D CGAN for Completion and Generation of 3D Objects Yahia Hamdi et.al. 2602.08046 null
2026-02-08 The Rise of Sparse Mixture-of-Experts: A Survey from Algorithmic Foundations to Decentralized Architectures and Vertical Domain Applications Dong Pan et.al. 2602.08019 null
2026-02-08 Fast Model Selection and Stable Optimization for Softmax-Gated Multinomial-Logistic Mixture of Experts Models TrungKhang Tran et.al. 2602.07997 null
2026-02-08 Thinking in Structures: Evaluating Spatial Intelligence through Reasoning on Constrained Manifolds Chen Yang et.al. 2602.07864 null
2026-02-07 SERE: Similarity-based Expert Re-routing for Efficient Batch Decoding in MoE Models Juntong Wu et.al. 2602.07616 null
2026-02-06 DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos Shenyuan Gao et.al. 2602.06949 null
2026-02-06 Parameters as Experts: Adapting Vision Models with Dynamic Parameter Routing Meng Lou et.al. 2602.06862 null
2026-02-06 POP: Online Structural Pruning Enables Efficient Inference of Large Foundation Models Yi Chen et.al. 2602.06822 null
2026-02-06 SaDiT: Efficient Protein Backbone Design via Latent Structural Tokenization and Diffusion Transformers Shentong Mo et.al. 2602.06706 null
2026-02-06 Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making Baichuan-M3 Team et.al. 2602.06570 null
2026-02-06 TokenMixer-Large: Scaling Up Large Ranking Models in Industrial Recommenders Yuchen Jiang et.al. 2602.06563 null
2026-02-06 HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction Shengxuan Qiu et.al. 2602.06527 null
2026-02-05 GRP-Obliteration: Unaligning LLMs With a Single Unlabeled Prompt Mark Russinovich et.al. 2602.06258 null
2026-02-05 To 2:4 Sparsity and Beyond: Neuron-level Activation Function to Accelerate LLM Pre-Training Meghana Madhyastha et.al. 2602.06183 null
2026-02-05 MoSE: Mixture of Slimmable Experts for Efficient and Adaptive Language Models Nurbek Tastan et.al. 2602.06154 null
2026-02-05 OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale Jingze Shi et.al. 2602.05711 null
2026-02-05 Hidden simplicity in AdS spinning Mellin amplitudes via scaffolding Song He et.al. 2602.05568 null
2026-02-05 M$^2$-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining Rui Lv et.al. 2602.05429 null
2026-02-05 Mergers Drive Structural Complexity but Not Starbursts in Lyman-$α$ Emitters at $3 < z < 4$: A JWST Spatially Resolved View Qi Song et.al. 2602.05411 null
2026-02-05 Decision-Focused Sequential Experimental Design: A Directional Uncertainty-Guided Approach Beichen Wan et.al. 2602.05340 null
2026-02-05 Surgery: Mitigating Harmful Fine-Tuning for Large Language Models via Attention Sink Guozhi Liu et.al. 2602.05228 null
2026-02-04 Rule-Based Spatial Mixture-of-Experts U-Net for Explainable Edge Detection Bharadwaj Dogga et.al. 2602.05100 null
2026-02-04 Multi-Head LatentMoE and Head Parallel: Communication-Efficient and Deterministic MoE Parallelism Chenwei Cui et.al. 2602.04870 null
2026-02-04 PDF-HR: Pose Distance Fields for Humanoid Robots Yi Gu et.al. 2602.04851 null
2026-02-04 ERNIE 5.0 Technical Report Haifeng Wang et.al. 2602.04705 null
2026-02-04 Let Experts Feel Uncertainty: A Multi-Expert Label Distribution Approach to Probabilistic Time Series Forecasting Zhen Zhou et.al. 2602.04678 null
2026-02-04 RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models Jiacheng Liang et.al. 2602.04448 null
2026-02-04 Mixture of Masters: Sparse Chess Language Models with Player Routing Giacomo Frisoni et.al. 2602.04447 null
2026-02-04 Study of $\barΛ$-$p$ Annihilation into Light Mesons BESIII Collaboration et.al. 2602.04276 null
2026-02-04 Universal Quantized Berry-Dipole Flat Bands Qingyang Mo et.al. 2602.04194 null
2026-02-04 OMG-Agent: Toward Robust Missing Modality Generation with Decoupled Coarse-to-Fine Agentic Workflows Ruiting Dai et.al. 2602.04144 null
2026-02-04 Expert Selections In MoE Models Reveal (Almost) As Much As Text Amir Nuriyev et.al. 2602.04105 null
2026-02-03 SpecMD: A Comprehensive Study On Speculative Expert Prefetching Duc Hoang et.al. 2602.03921 null
2026-02-03 UniGeM: Unifying Data Mixing and Selection via Geometric Exploration and Mining Changhao Wang et.al. 2602.03772 null
2026-02-03 HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing Yizhao Gao et.al. 2602.03560 null
2026-02-03 DALI: A Workload-Aware Offloading Framework for Efficient MoE Inference on Local PCs Zeyu Zhu et.al. 2602.03495 null
2026-02-03 Scaling Continual Learning with Bi-Level Routing Mixture-of-Experts Meng Lou et.al. 2602.03473 null
2026-02-03 VIRAL: Visual In-Context Reasoning via Analogy in Diffusion Transformers Zhiwen Li et.al. 2602.03210 null
2026-02-03 Sparsity is Combinatorial Depth: Quantifying MoE Expressivity via Tropical Geometry Ye Su et.al. 2602.03204 null
2026-02-03 Aligning Forest and Trees in Images and Long Captions for Visually Grounded Understanding Byeongju Woo et.al. 2602.02977 null
2026-02-02 Decision-Focused Optimal Transport Suhan Liu et.al. 2602.02800 null
2026-02-02 Loss mechanisms of microwave frequency acoustic waves in thin film lithium niobate Qixuan Lin et.al. 2602.02797 null
2026-02-02 SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning Qifan Yu et.al. 2602.02472 null
2026-02-02 Certain Head, Uncertain Tail: Expert-Sample for Test-Time Scaling in Fine-Grained MoE Yuanteng Chen et.al. 2602.02443 null
2026-02-02 DFKI-Speech System for WildSpoof Challenge: A robust framework for SASV In-the-Wild Arnab Das et.al. 2602.02286 null
2026-02-02 MoLF: Mixture-of-Latent-Flow for Pan-Cancer Spatial Gene Expression Prediction from Histology Susu Hu et.al. 2602.02282 null
2026-02-02 Kimi K2.5: Visual Agentic Intelligence Kimi Team et.al. 2602.02276 null
2026-02-02 vLLM-Omni: Fully Disaggregated Serving for Any-to-Any Multimodal Models Peiqi Yin et.al. 2602.02204 null
2026-02-02 No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs Liyan Xu et.al. 2602.02103 null
2026-02-02 Edge-Aligned Initialization of Kernels for Steered Mixture-of-Experts Martin Determann et.al. 2602.02031 null
2026-02-02 SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning Zhen-Hao Xie et.al. 2602.01990 null
2026-02-02 Mixture-of-Experts with Intermediate CTC Supervision for Accented Speech Recognition Wonjun Lee et.al. 2602.01967 null
2026-02-02 SOPRAG: Multi-view Graph Experts Retrieval for Industrial Standard Operating Procedures Liangtao Lin et.al. 2602.01858 null
2026-02-02 From Knowing to Doing Precisely: A General Self-Correction and Termination Framework for VLA models Wentao Zhang et.al. 2602.01811 null
2026-02-02 Mutual-Guided Expert Collaboration for Cross-Subject EEG Classification Zhi Zhang et.al. 2602.01728 null
2026-02-02 AdNanny: One Reasoning LLM for All Offline Ads Recommendation Tasks Nan Hu et.al. 2602.01563 null
2026-02-01 A Statistical Theory of Gated Attention through the Lens of Hierarchical Mixture of Experts Viet Nguyen et.al. 2602.01468 null
2026-02-01 Rethinking Multinomial Logistic Mixture of Experts with Sigmoid Gating Function Tuan Minh Pham et.al. 2602.01466 null
2026-02-01 Exposing and Defending the Achilles’ Heel of Video Mixture-of-Experts Songping Wang et.al. 2602.01369 null
2026-02-01 Observation of $\barΛp\to K^{+}π^{+}π^{-}π^{0}$ and $\barΛp\to K^{+}π^{+}π^{-}2π^{0}$ BESIII Collaboration et.al. 2602.01282 null
2026-02-01 MiTA Attention: Efficient Fast-Weight Scaling via a Mixture of Top-$k$ Activations Qishuai Wen et.al. 2602.01219 null
2026-02-01 Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse Zizhuo Fu et.al. 2602.01203 null
2026-01-30 Omni-fMRI: A Universal Atlas-Free fMRI Foundation Model Mo Wang et.al. 2601.23090 null
2026-01-30 UrbanMoE: A Sparse Multi-Modal Mixture-of-Experts Framework for Multi-Task Urban Region Profiling Pingping Liu et.al. 2601.22746 null
2026-01-30 A Cross-Domain Graph Learning Protocol for Single-Step Molecular Geometry Refinement Chengchun Liu et.al. 2601.22723 null
2026-01-30 A Step Back: Prefix Importance Ratio Stabilizes Policy Optimization Shiye Lei et.al. 2601.22718 null
2026-01-30 A Unified Study of LoRA Variants: Taxonomy, Review, Codebase, and Empirical Evaluation Haonan He et.al. 2601.22708 null
2026-01-30 Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments Jinwoo Jang et.al. 2601.22647 null
2026-01-30 SpanNorm: Reconciling Training Stability and Performance in Deep Transformers Chao Wang et.al. 2601.22580 null
2026-01-30 SHED Light on Segmentation for Dense Prediction Seung Hyun Lee et.al. 2601.22529 null
2026-01-30 Continual Policy Distillation from Distributed Reinforcement Learning Teachers Yuxuan Li et.al. 2601.22475 null
2026-01-29 ECO: Quantized Training without Full-Precision Master Weights Mahdi Nikdan et.al. 2601.22101 null
2026-01-29 Heterogeneous Computing: The Key to Powering the Future of AI Agent Inference Yiren Zhao et.al. 2601.22001 null
2026-01-29 MoE-ACT: Improving Surgical Imitation Learning Policies through Supervised Mixture-of-Experts Lorenzo Mazza et.al. 2601.21971 null
2026-01-29 MoHETS: Long-term Time Series Forecasting with Mixture-of-Heterogeneous-Experts Evandro S. Ortigossa et.al. 2601.21866 null
2026-01-29 OneMall: One Model, More Scenarios – End-to-End Generative Recommender Family at Kuaishou E-Commerce Kun Zhang et.al. 2601.21770 null
2026-01-29 Seg-MoE: Multi-Resolution Segment-wise Mixture-of-Experts for Time Series Forecasting Transformers Evandro S. Ortigossa et.al. 2601.21641 null
2026-01-29 Depth-Recurrent Attention Mixtures: Giving Latent Reasoning the Attention it Deserves Jonas Knupp et.al. 2601.21582 null
2026-01-29 Multi-Modal Time Series Prediction via Mixture of Modulated Experts Lige Zhang et.al. 2601.21547 null
2026-01-29 ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory Yang Zhao et.al. 2601.21545 null
2026-01-30 L$^3$: Large Lookup Layers Albert Tseng et.al. 2601.21461 null
2026-01-29 ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation Zihao Huang et.al. 2601.21420 null
2026-01-29 L2R: Low-Rank and Lipschitz-Controlled Routing for Mixture-of-Experts Minghao Yang et.al. 2601.21349 null
2026-01-29 Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies Ce Hao et.al. 2601.21251 null
2026-01-29 Scaling Embeddings Outperforms Scaling Experts in Language Models Hong Liu et.al. 2601.21204 null
2026-01-29 ZipMoE: Efficient On-Device MoE Serving via Lossless Compression and Cache-Affinity Scheduling Yuchen Yang et.al. 2601.21198 null
2026-01-29 Precise measurements of $D^0 \to K^-\ell^+ν_\ell$ and $D^+ \to \bar K^0\ell^+ν_\ell$ decays BESIII Collaboration et.al. 2601.21196 null
2026-01-29 Search for $ψ_0(4360)\rightarrow ηψ(2S)$ through the process $e^+e^- \rightarrow ηηψ(2S)$ BESIII Collaboration et.al. 2601.21190 null
2026-01-29 First Experimental Constraint on the Scalar Current in the $D^{0(+)}\to \bar K\ell^+ν_{\ell}$ Transition BESIII Collaboration et.al. 2601.21185 null
2026-01-29 BrainStack: Neuro-MoE with Functionally Guided Expert Routing for EEG-Based Language Decoding Ziyi Zhao et.al. 2601.21148 null
2026-01-29 TRACE: Trajectory Recovery for Continuous Mechanism Evolution in Causal Representation Learning Shicheng Fan et.al. 2601.21135 null
2026-01-28 ProfInfer: An eBPF-based Fine-Grained LLM Inference Profiler Bohua Zou et.al. 2601.20755 null
2026-01-28 ShieldedCode: Learning Robust Representations for Virtual Machine Protected Code Mingqiao Mo et.al. 2601.20679 null
2026-01-28 Unsupervised Ensemble Learning Through Deep Energy-based Models Ariel Maymon et.al. 2601.20556 null
2026-01-28 OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution Le Zhang et.al. 2601.20380 null
2026-01-28 OSDEnhancer: Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion Shuoyan Wei et.al. 2601.20308 null
2026-01-28 MiLorE-SSL: Scaling Multilingual Capabilities in Self-Supervised Models without Forgetting Jing Xu et.al. 2601.20300 null
2026-01-28 HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-BENCH Yueyang Wang et.al. 2601.20255 null
2026-01-28 Hyperparameter Transfer with Mixture-of-Expert Layers Tianze Jiang et.al. 2601.20205 null
2026-01-28 Meta-Cognitive Reinforcement Learning with Self-Doubt and Recovery Zhipeng Zhang et.al. 2601.20193 null
2026-01-27 Revisiting Incremental Stochastic Majorization-Minimization Algorithms with Applications to Mixture of Experts TrungKhang Tran et.al. 2601.19811 null
2026-01-27 Component-Level Lesioning of Language Models Reveals Clinically Aligned Aphasia Phenotypes Yifan Wang et.al. 2601.19723 null
2026-01-27 LoPRo: Enhancing Low-Rank Quantization via Permuted Block-Wise Rotation Hongyaoxing Gu et.al. 2601.19675 null
2026-01-27 GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining Shentong Mo et.al. 2601.19606 null
2026-01-27 Search for the isospin-violating decays $\boldsymbol{χ_{cJ}\toΛ\barΣ^{0}+c.c.}$ and $\boldsymbol{η_{c}\toΛ\barΣ^{0}+c.c.}$ BESIII Collaboration et.al. 2601.19493 null
2026-01-27 Dynamic Multi-Expert Projectors with Stabilized Routing for Multilingual Speech Recognition Isha Pandey et.al. 2601.19451 null
2026-01-26 Superlinear Multi-Step Attention Yufeng Huang et.al. 2601.18401 null
2026-01-26 FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning Zhaopeng Qiu et.al. 2601.18150 null
2026-01-26 Beyond Static Datasets: Robust Offline Policy Optimization via Vetted Synthetic Transitions Pedram Agand et.al. 2601.18107 null
2026-01-26 OneVoice: One Model, Triple Scenarios-Towards Unified Zero-shot Voice Conversion Zhichao Wang et.al. 2601.18094 null
2026-01-26 LatentMoE: Toward Optimal Accuracy per FLOP and Parameter in Mixture of Experts Venmugil Elango et.al. 2601.18089 null
2026-01-25 Domain-Expert-Guided Hybrid Mixture-of-Experts for Medical AI: Integrating Data-Driven Learning with Clinical Priors Jinchen Gu et.al. 2601.17977 null
2026-01-25 EntWorld: A Holistic Environment and Benchmark for Verifiable Enterprise GUI Agents Ying Mo et.al. 2601.17722 null
2026-01-25 $\infty$-MoE: Generalizing Mixture of Experts to Infinite Experts Shota Takashiro et.al. 2601.17680 null
2026-01-25 Health-ORSC-Bench: A Benchmark for Measuring Over-Refusal and Safety Completion in Health Context Zhihao Zhang et.al. 2601.17642 null
2026-01-24 PILOT: A Perceptive Integrated Low-level Controller for Loco-manipulation over Unstructured Scenes Xinru Cui et.al. 2601.17440 null
2026-01-24 Topological Protection by Local Support Symmetry and Destructive Interference Jun-Won Rhim et.al. 2601.17272 null
2026-01-23 Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts Xuan-Phi Nguyen et.al. 2601.17111 null
2026-01-23 First evidence for $D_s^+ \to f_1(1420) e^+ν_e$ and search for $D_s^+ \to f_1(1285) e^+ν_e$ BESIII Collaboration et.al. 2601.16938 null
2026-01-23 Coarse-Grained Geometric Quantum Dynamics in the Tensor Network Representation Mo Sha et.al. 2601.16913 null
2026-01-23 GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints Andy Zhu et.al. 2601.16905 null
2026-01-23 Mixture-of-Models: Unifying Heterogeneous Agents via N-Way Self-Evaluating Deliberation Tims Pecerskis et.al. 2601.16863 null
2026-01-23 SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents Yuhang Wang et.al. 2601.16746 null
2026-01-23 LongCat-Flash-Thinking-2601 Technical Report Meituan LongCat Team et.al. 2601.16725 null
2026-01-23 Search for the radiative decay $D^+_s \to γK^*(892)^+$ BESIII Collaboration et.al. 2601.16476 null
2026-01-22 proto-Lightspeed: a high-speed, ultra-low read noise imager on the Magellan Clay Telescope Christopher Layden et.al. 2601.16268 null
2026-01-22 Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning Moo Jin Kim et.al. 2601.16163 null
2026-01-22 Universal Refusal Circuits Across LLMs: Cross-Model Transfer via Trajectory Replay and Concept-Basis Reconstruction Tony Cristofano et.al. 2601.16034 null
2026-01-22 Search for the reaction channel $e^+ e^- \to ηη\,J/ψ$ and the isospin partner of the $Z_c(3900)$ at center-of-mass energies $\sqrt{s} = 4.226-4.950$ GeV BESIII Collaboration et.al. 2601.15882 null
2026-01-22 LL-GaussianImage: Efficient Image Representation for Zero-shot Low-Light Enhancement with 2D Gaussian Splatting Yuhan Chen et.al. 2601.15772 null
2026-01-22 Redshift-Binned Constraints on the Hubble Constant under $Λ$CDM, CPL, and Padé Cosmography Zhi-Yuan Mo et.al. 2601.15765 null
2026-01-21 On the diagonal of low bidegree hypersurfaces Morten Lüders et.al. 2601.15409 null
2026-01-21 Improving MoE Compute Efficiency by Composing Weight and Data Sparsity Maciej Kilian et.al. 2601.15370 null
2026-01-21 Pb4U-GNet: Resolution-Adaptive Garment Simulation via Propagation-before-Update Graph Network Aoran Liu et.al. 2601.15110 null
2026-01-21 Mixture-of-Experts Models in Vision: Routing, Optimization, and Generalization Adam Rokah et.al. 2601.15021 null
2026-01-21 SynPerf: A Hybrid Analytical-ML Framework for GPU Performance Prediction Kaixuan Zhang et.al. 2601.14910 null
2026-01-21 Language-Coupled Reinforcement Learning for Multilingual Retrieval-Augmented Generation Rui Qi et.al. 2601.14896 null
2026-01-21 UBATrack: Spatio-Temporal State Space Model for General Multi-Modal Tracking Qihua Liang et.al. 2601.14799 null
2026-01-21 UniRoute: Unified Routing Mixture-of-Experts for Modality-Adaptive Remote Sensing Change Detection Qingling Shu et.al. 2601.14797 null
2026-01-21 Robustness of Mixtures of Experts to Feature Noise Dong Sun et.al. 2601.14792 null
2026-01-21 Online Linear Programming with Replenishment Yuze Chen et.al. 2601.14629 null
2026-01-20 $π$MPC: A Parallel-in-horizon and Construction-free NMPC Solver Liang Wu et.al. 2601.14414 null
2026-01-20 Layer-adaptive Expert Pruning for Pre-Training of Mixture-of-Experts Large Language Models YuanLab.ai et.al. 2601.14327 null
2026-01-20 LLMOrbit: A Circular Taxonomy of Large Language Models -From Scaling Walls to Agentic AI Systems Badri N. Patro et.al. 2601.14053 null
2026-01-20 Understanding Multilingualism in Mixture-of-Experts LLMs: Routing Mechanism, Expert Specialization, and Layerwise Steering Yuxin Chen et.al. 2601.14050 null
2026-01-20 DExTeR: Weakly Semi-Supervised Object Detection with Class and Instance Experts for Medical Imaging Adrien Meyer et.al. 2601.13954 null
2026-01-20 The R2Pub Telescopes for Surveying: An Overview and Performance Evaluation of the System Xuan Song et.al. 2601.13587 null
2026-01-20 ButterflyMoE: Sub-Linear Ternary Experts via Structured Butterfly Orbits Aryan Karmore et.al. 2601.13563 null
2026-01-20 MN-TSG:Continuous Time Series Generation with Irregular Observations Xu Zhang et.al. 2601.13534 null
2026-01-19 CLIP-Guided Adaptable Self-Supervised Learning for Human-Centric Visual Tasks Mingshuang Luo et.al. 2601.13133 null
2026-01-19 Agentic Conversational Search with Contextualized Reasoning via Reinforcement Learning Fengran Mo et.al. 2601.13115 null
2026-01-19 Polychronous Wave Computing: Timing-Native Address Selection in Spiking Networks Natalila G. Berloff et.al. 2601.13079 null
2026-01-19 Synthesizing Strong-Coupling Kohn-Luttinger Superconductivity in 2D Van der Waals materials Shi-Cong Mo et.al. 2601.13074 null
2026-01-19 PASs-MoE: Mitigating Misaligned Co-drift among Router and Experts via Pathway Activation Subspaces for Continual Learning Zhiyan Hou et.al. 2601.13020 null
2026-01-19 HT-GNN: Hyper-Temporal Graph Neural Network for Customer Lifetime Value Prediction in Baidu Ads Xiaohui Zhao et.al. 2601.13013 null
2026-01-19 OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models Shiyuan Li et.al. 2601.12996 null
2026-01-19 PhyG-MoE: A Physics-Guided Mixture-of-Experts Framework for Energy-Efficient GNSS Interference Recognition Zhihan Zeng et.al. 2601.12798 null
2026-01-19 Topology-Aware Multiscale Mixture of Experts for Efficient Molecular Property Prediction Long D. Nguyen et.al. 2601.12637 null
2026-01-18 A Mixture of Experts Vision Transformer for High-Fidelity Surface Code Decoding Hoang Viet Nguyen et.al. 2601.12483 null
2026-01-18 Learning Diverse Skills for Behavior Models with Mixture of Experts Wangtian Shen et.al. 2601.12397 null
2026-01-18 NADIR: Differential Attention Flow for Non-Autoregressive Transliteration in Indic Languages Lakshya Tomar et.al. 2601.12389 null
2026-01-18 GazeFormer-MoE: Context-Aware Gaze Estimation via CLIP and MoE Transformer Xinyuan Zhao et.al. 2601.12316 null
2026-01-18 Facet-Aware Multi-Head Mixture-of-Experts Model with Text-Enhanced Pre-training for Sequential Recommendation Mingrui Liu et.al. 2601.12301 null
2026-01-16 Reasoning in Trees: Improving Retrieval-Augmented Generation for Multi-Hop Question Answering Yuling Shi et.al. 2601.11255 null
2026-01-16 First Measurement of the Absolute Branching Fraction of $η_c \to γγ$ BESIII Collaboration et.al. 2601.11236 null
2026-01-16 Self-Augmented Mixture-of-Experts for QoS Prediction Kecheng Cai et.al. 2601.11036 null
2026-01-16 RobuMTL: Enhancing Multi-Task Learning Robustness Against Weather Conditions Tasneem Shaffee et.al. 2601.10921 null
2026-01-15 Search for sub-GeV dark particles in $η\toπ^0+\rm{invisible}$ decay BESIII Collaboration et.al. 2601.10597 null
2026-01-15 Deterministic and scalable generation of large Fock states Mo Xiong et.al. 2601.10559 null
2026-01-15 Algebraic Farkas Lemma and Strong Duality for Perturbed Conic Linear Programming P. D. Khanh et.al. 2601.10390 null
2026-01-15 MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts Yuxuan Lou et.al. 2601.10272 null
2026-01-15 A Highly Magnetic Ultra Massive White Dwarf with a 23-minute Rotation Period Jincheng Guo et.al. 2601.10188 null
2026-01-15 What Gets Activated: Uncovering Domain and Driver Experts in MoE Language Models Guimin Hu et.al. 2601.10159 null
2026-01-15 MMPG: MoE-based Adaptive Multi-Perspective Graph Fusion for Protein Representation Learning Yusong Wang et.al. 2601.10157 null
2026-01-15 Extremum Seeking Nonovershooting Control of Strict-Feedback Systems Under Unknown Control Direction Kaixin Lu et.al. 2601.09998 null
2026-01-14 Progressive Mixture-of-Experts with autoencoder routing for continual RANS turbulence modelling Haoyu Ji et.al. 2601.09305 null
2026-01-14 A Raman-Gas Spectral Compressor for High-Energy Femtosecond Laser Pulses Zegui Wang et.al. 2601.09234 null
2026-01-15 A.X K1 Technical Report Sung Jun Cheon et.al. 2601.09200 null
2026-01-14 WiFo-E: A Scalable Wireless Foundation Model for End-to-End FDD Precoding in Communication Networks Weibo Wen et.al. 2601.09186 null
2026-01-14 Horseshoe Mixtures-of-Experts (HS-MoE) Nick Polson et.al. 2601.09043 null
2026-01-13 OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG Fengran Mo et.al. 2601.09028 null
2026-01-12 TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts Yu Xu et.al. 2601.08881 null
2026-01-13 MixServe: An Automatic Distributed Serving System for MoE Models with Hybrid Parallelism Based on Fused Communication Algorithm Bowen Zhou et.al. 2601.08800 null
2026-01-13 LWM-Spectro: A Foundation Model for Wireless Baseband Signal Spectrograms Namhyun Kim et.al. 2601.08780 null
2026-01-13 M$^2$FMoE: Multi-Resolution Multi-View Frequency Mixture-of-Experts for Extreme-Adaptive Time Series Forecasting Yaohui Huang et.al. 2601.08631 null
2026-01-13 Robust CAPTCHA Using Audio Illusions in the Era of Large Language Models: from Evaluation to Advances Ziqi Ding et.al. 2601.08516 null
2026-01-13 Taxon: Hierarchical Tax Code Prediction with Semantically Aligned LLM Expert Guidance Jihang Li et.al. 2601.08418 null
2026-01-13 Controlled LLM Training on Spectral Sphere Tian Xie et.al. 2601.08393 null
2026-01-13 Deconstructing Pre-training: Knowledge Attribution Analysis in MoE and Dense Models Bo Wang et.al. 2601.08383 null
2026-01-13 Towards Principled Design of Mixture-of-Experts Language Models under Memory and Inference Constraints Seng Pei Liew et.al. 2601.08215 null
2026-01-12 Towards Specialized Generalists: A Multi-Task MoE-LoRA Framework for Domain-Specific LLM Adaptation Yuxin Yang et.al. 2601.07935 null
2026-01-12 An eclipsing 8.56 minute orbital period mass-transferring binary Emma T. Chickles et.al. 2601.07925 null
2026-01-12 Emotional Support Evaluation Framework via Controllable and Diverse Seeker Simulator Chaewon Heo et.al. 2601.07698 null
2026-01-12 Amplitude analysis and branching fraction measurement of $J/ψ\to Λ\barΣ^0η+\mathrm{c.c}$ BESIII Collaboration et.al. 2601.07617 null
2026-01-12 Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models Xin Cheng et.al. 2601.07372 null
2026-01-11 PALM: Progress-Aware Policy Learning via Affordance Reasoning for Long-Horizon Robotic Manipulation Yuanzhe Liu et.al. 2601.07060 null
2026-01-11 Solar Open Technical Report Sungrae Park et.al. 2601.07022 null
2026-01-11 Deep Learning Based Channel Extrapolation for Dual-Band Massive MIMO Systems Qikai Xiao et.al. 2601.06858 null
2026-01-11 MoE-DisCo:Low Economy Cost Training Mixture-of-Experts Models Xin Ye et.al. 2601.06857 null
2026-01-11 MoEScore: Mixture-of-Experts-Based Text-Audio Relevance Score Prediction for Text-to-Audio System Evaluation Bochao Sun et.al. 2601.06829 null
2026-01-11 SecMoE: Communication-Efficient Secure MoE Inference via Select-Then-Compute Bowen Shen et.al. 2601.06790 null
2026-01-11 AutoTour: Automatic Photo Tour Guide with Smartphones and LLMs Huatao Xu et.al. 2601.06781 null
2026-01-11 MTMCS-Bench: Evaluating Contextual Safety of Multimodal Large Language Models in Multi-Turn Dialogues Zheyuan Liu et.al. 2601.06757 null
2026-01-10 R-Estimation with Right-Censored Data Glen A. Satten et.al. 2601.06685 null
2026-01-10 Efficient and Reliable Estimation of Named Entity Linking Quality: A Case Study on GutBrainIE Marco Martinelli et.al. 2601.06624 null
2026-01-10 Hellinger Multimodal Variational Autoencoders Huyen Khanh Vo et.al. 2601.06572 null
2026-01-10 Physics-guided foundation model for universal speckle removal in ultrathin multimode fiber imaging Xianrui Zeng et.al. 2601.06448 null
2026-01-10 The Promise of Time-Series Foundation Models for Agricultural Forecasting: Evidence from Marketing Year Average Prices Le Wang et.al. 2601.06371 null
2026-01-09 Monkey Jump : MoE-Style PEFT for Efficient Multi-Task Learning Nusrat Jahan Prottasha et.al. 2601.06356 null
2026-01-09 AIConfigurator: Lightning-Fast Configuration Optimization for Multi-Framework LLM Serving Tianhao Xu et.al. 2601.06288 null
2026-01-09 Orchestrating Tokens and Sequences: Dynamic Hybrid Policy Optimization for RLVR Zijun Min et.al. 2601.05607 null
2026-01-09 Buffered AUC maximization for scoring systems via mixed-integer optimization Moe Shiina et.al. 2601.05544 null
2026-01-09 Scalable Heterogeneous Graph Learning via Heterogeneous-aware Orthogonal Prototype Experts Wei Zhou et.al. 2601.05537 null
2026-01-08 MoEBlaze: Breaking the Memory Wall for Efficient MoE Training on Modern GPUs Jiyuan Zhang et.al. 2601.05296 null
2026-01-08 MoE3D: A Mixture-of-Experts Module for 3D Reconstruction Zichen Wang et.al. 2601.05208 null
2026-01-08 FaST: Efficient and Effective Long-Horizon Forecasting for Large-Scale Spatial-Temporal Graphs via Mixture-of-Experts Yiji Zhao et.al. 2601.05174 link
2026-01-08 How to Set the Learning Rate for Large-Scale Pre-training? Yunhua Zhou et.al. 2601.05049 null
2026-01-08 CuMA: Aligning LLMs with Sparse Cultural Values via Demographic-Aware Mixture of Adapters Ao Sun et.al. 2601.04885 null
2026-01-08 DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation Guanzhi Deng et.al. 2601.04823 null
2026-01-08 Users Mispredict Their Own Preferences for AI Writing Assistance Vivian Lai et.al. 2601.04461 null
2026-01-08 Re-Rankers as Relevance Judges Chuan Meng et.al. 2601.04455 null
2026-01-07 Transitive Expert Error and Routing Problems in Complex AI Systems Forest Mars et.al. 2601.04416 null
2026-01-06 Scaling Trends for Multi-Hop Contextual Reasoning in Mid-Scale Language Models Brady Steele et.al. 2601.04254 null
2026-01-07 When Helpers Become Hazards: A Benchmark for Analyzing Multimodal LLM-Powered Safety in Daily Life Xinyue Lou et.al. 2601.04043 null
2026-01-07 A Scheduling Framework for Efficient MoE Inference on Edge GPU-NDP Systems Qi Wu et.al. 2601.03992 null
2026-01-07 Spectral Manifold Regularization for Stable and Modular Routing in Deep MoE Architectures Ibrahim Delibasoglu et.al. 2601.03889 null
2026-01-07 PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation Wenlong Huang et.al. 2601.03782 null
2026-01-07 Variational Inference, Entropy, and Orthogonality: A Unified Theory of Mixture-of-Experts Ye Su et.al. 2601.03577 null
2026-01-07 CALM: Culturally Self-Aware Language Models Lingzhi Shen et.al. 2601.03483 null
2026-01-06 The Illusion of Specialization: Unveiling the Domain-Invariant “Standing Committee” in Mixture-of-Experts Models Yan Wang et.al. 2601.03425 null
2026-01-06 AT2024wpp: An Extremely Luminous Fast Ultraviolet Transient Powered by Accretion onto a Black Hole Daniel A. Perley et.al. 2601.03337 null
2026-01-06 ReCCur: A Recursive Corner-Case Curation Framework for Robust Vision-Language Understanding in Open and Edge Scenarios Yihan Wei et.al. 2601.03011 null
2026-01-08 MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free Yishu Lei et.al. 2601.02967 null
2026-01-06 MixTTE: Multi-Level Mixture-of-Experts for Scalable and Adaptive Travel Time Estimation Wenzhao Jiang et.al. 2601.02943 null
2026-01-06 MiMo-V2-Flash Technical Report Bangjun Xiao et.al. 2601.02780 null
2026-01-05 Routing by Analogy: kNN-Augmented Expert Assignment for Mixture-of-Experts Boxuan Lyu et.al. 2601.02144 null
2026-01-05 Cross section measurement of $e^{+}e^{-}\rightarrow π^{0}π^{0}ψ(3686)$ from $\sqrt{s}=$ 4.008 GeV to 4.951 GeV BESIII Collaboration et.al. 2601.02136 null
2026-01-07 FormuLLA: A Large Language Model Approach to Generating Novel 3D Printable Formulations Adeshola Okubena et.al. 2601.02071 null
2026-01-05 GCR: Geometry-Consistent Routing for Task-Agnostic Continual Anomaly Detection Joongwon Chae et.al. 2601.01856 null
2026-01-05 First Observation of $D^{0(+)}\to \bar Kωe^+ν_e$ and Determination of the Branching Fraction of $\bar K_1(1270)\to \bar K ω$ BESIII Collaboration et.al. 2601.01817 null
2026-01-05 Causality-Aware Temporal Projection for Video Understanding in Video-LLMs Zhengjian Kang et.al. 2601.01804 null
2026-01-05 Measurements of the branching fractions of $χ_{cJ}\to 2K^+ 2K^- ω$ and $φK^+ K^- ω$ decays BESIII Collaboration et.al. 2601.01758 null
2026-01-05 K-EXAONE Technical Report Eunbi Choi et.al. 2601.01739 null
2026-01-05 Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications YuanLab.ai et.al. 2601.01718 null
2026-01-05 Varying-Coefficient Mixture of Experts Model Qicheng Zhao et.al. 2601.01699 null
2026-01-06 Measurements of the absolute branching fractions of the $Λ_{c}^{+}$ hadronic decays BESIII Collaboration et.al. 2601.01503 null
2026-01-04 Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts Ruofeng Yang et.al. 2601.01475 null
2026-01-06 Making MoE-based LLM Inference Resilient with Tarragon Songyu Zhang et.al. 2601.01310 null
2026-01-03 MambaFormer: Token-Level Guided Routing Mixture-of-Experts for Accurate and Efficient Clinical Assistance Hamad Khan et.al. 2601.01260 null
2026-01-02 Reliability Under Randomness: An Empirical Analysis of Sparse and Dense Language Models Across Decoding Temperatures Kabir Grover et.al. 2601.00942 null
2026-01-02 HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts Zihan Fang et.al. 2601.00583 null
2026-01-02 A Language-Agnostic Hierarchical LoRA-MoE Architecture for CTC-based Multilingual ASR Yuang Zheng et.al. 2601.00557 null
2026-01-01 Geometric Regularization in Mixture-of-Experts: The Disconnect Between Weights and Activations Hyunjun Kim et.al. 2601.00457 null
2026-01-01 Traffic-MoE: A Sparse Foundation Model for Network Traffic Analysis Jiajun Zhou et.al. 2601.00357 null
2026-01-01 Identification and Estimation under Multiple Versions of Treatment: Mixture-of-Experts Approach Kohei Yoshikawa et.al. 2601.00287 null
2025-12-31 Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem Weixun Wang et.al. 2512.24873 null
2025-12-31 Compute-Accuracy Pareto Frontiers for Open-Source Reasoning Large Language Models Ákos Prucs et.al. 2512.24776 null
2025-12-30 Joint Selection for Large-Scale Pre-Training Data via Policy Gradient-based Mask Learning Ziqing Fan et.al. 2512.24265 null
2025-12-30 Training Report of TeleChat3-MoE Xinzhang Liu et.al. 2512.24157 null
2025-12-30 Skyrmion and Meron Crystals in Intermetallic Gd$_3$Ru$_4$Al$_{12}$: Microscopic Model Insights into Chiral Phases Jiajun Mo et.al. 2512.24071 null
2025-12-30 RepetitionCurse: Measuring and Understanding Router Imbalance in Mixture-of-Experts LLMs under DoS Stress Ruixuan Huang et.al. 2512.23995 null
2025-12-30 Towards a bottom-up formulation of spin kinetic theory Zonglin Mo et.al. 2512.23960 null
2026-01-02 Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling Chulun Zhou et.al. 2512.23959 null
2025-12-30 Learnable Query Aggregation with KV Routing for Cross-view Geo-localisation Hualin Ye et.al. 2512.23938 null
2025-12-29 Observations of the Fermi bubbles and the Galactic center excess with the DArk Matter Particle Explorer F. Alemanno et.al. 2512.23458 null
2025-12-29 Dynamic Subspace Composition: Efficient Adaptation via Contractive Basis Expansion Vladimer Khasia et.al. 2512.23448 null
2025-12-29 Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss Ang Lv et.al. 2512.23447 null
2025-12-29 Bitcoin-IPC: Scaling Bitcoin with a Network of Proof-of-Stake Subnets Marko Vukolić et.al. 2512.23439 null
2025-12-29 Study of $\bar{K}^*(892)^0 η$ and $K_S^0 a_0(980)^0$ in the $D^{0} \to K_{S}^{0}π^0η$ decay BESIII Collaboration et.al. 2512.23389 null
2025-12-30 YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection Xu Lin et.al. 2512.23273 null
2025-12-28 Trust Region Masking for Long-Horizon LLM Reinforcement Learning Yingru Li et.al. 2512.23075 null
2025-12-28 FLEX-MoE: Federated Mixture-of-Experts with Load-balanced Expert Assignment Boyang Zhang et.al. 2512.23070 null
2025-12-28 Viability and Performance of a Private LLM Server for SMBs: A Benchmark Analysis of Qwen3-30B on Consumer-Grade Hardware Alex Khalil et.al. 2512.23029 null
2025-12-28 Reach-Avoid Differential game with Reachability Analysis for UAVs: A decomposition approach Minh Bui et.al. 2512.22793 null
2025-12-28 Text-Routed Sparse Mixture-of-Experts Model with Explanation and Temporal Alignment for Multi-Modal Sentiment Analysis Dongning Rao et.al. 2512.22741 null
2025-12-27 RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure Wei Gao et.al. 2512.22560 null
2025-12-27 Scalpel-SAM: A Semi-Supervised Paradigm for Adapting SAM to Infrared Small Object Detection Zihan Liu et.al. 2512.22483 null
2025-12-27 Bright 4B: Scaling Hyperspherical Learning for Segmentation in 3D Brightfield Microscopy Amil Khan et.al. 2512.22423 null
2025-12-26 FUSCO: High-Performance Distributed Data Shuffling via Transformation-Communication Fusion Zhuoran Zhu et.al. 2512.22036 null
2025-12-26 SWE-RM: Execution-free Feedback For Software Engineering Agents KaShun Shum et.al. 2512.21919 null
2025-12-26 Accelerate Speculative Decoding with Sparse Computation in Verification Jikai Wang et.al. 2512.21911 null
2025-12-26 MMCTOP: A Multimodal Textualization and Mixture-of-Experts Framework for Clinical Trial Outcome Prediction Carolina Aparício et.al. 2512.21897 null
2025-12-26 CrownGen: Patient-customized Crown Generation via Point Diffusion Model Juyoung Bae et.al. 2512.21890 null
2025-12-26 SLIM-Brain: A Data- and Training-Efficient Foundation Model for fMRI Data Analysis Mo Wang et.al. 2512.21881 null
2025-12-25 Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction Zheng Yin et.al. 2512.21707 null
2025-12-25 Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelism Xinglin Pan et.al. 2512.21487 null
2025-12-24 DeepCQ: General-Purpose Deep-Surrogate Framework for Lossy Compression Quality Prediction Khondoker Mirazul Mumenin et.al. 2512.21433 null
2025-12-24 SparScene: Efficient Traffic Scene Representation via Sparse Graph Learning for Large-Scale Trajectory Generation Xiaoyu Mo et.al. 2512.21133 null
2025-12-26 Identification with Orthogonal Basis Functions: Convergence Speed, Asymptotic Bias, and Rate-Optimal Pole Selection Jiayun Li et.al. 2512.21096 null
2025-12-25 GateBreaker: Gate-Guided Attacks on Mixture-of-Expert LLMs Lichao Wu et.al. 2512.21008 null
2025-12-24 SACodec: Asymmetric Quantization with Semantic Anchoring for Low-Bitrate High-Fidelity Neural Speech Codecs Zhongren Dong et.al. 2512.20944 null
2025-12-24 RevFFN: Memory-Efficient Full-Parameter Fine-Tuning of Mixture-of-Experts LLMs with Reversible Blocks Ningyuan Liu et.al. 2512.20920 null
2025-12-24 NVIDIA Nemotron 3: Efficient and Open Intelligence NVIDIA et.al. 2512.20856 null
2025-12-23 Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning NVIDIA et.al. 2512.20848 null
2025-12-23 Defending against adversarial attacks using mixture of experts Mohammad Meymani et.al. 2512.20821 null
2025-12-23 MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts Alexandros Christoforos et.al. 2512.20604 null
2025-12-23 Branch Learning in MRI: More Data, More Models, More Training Yuyang Li et.al. 2512.20330 null
2025-12-23 Mixture-of-Experts with Gradient Conflict-Driven Subspace Topology Pruning for Emergent Modularity Yuxing Gan et.al. 2512.20291 null
2025-12-23 Degradation-Aware Metric Prompting for Hyperspectral Image Restoration Binfeng Wang et.al. 2512.20251 null
2025-12-23 AMoE: Agglomerative Mixture-of-Experts Vision Foundation Model Sofian Chaybouti et.al. 2512.20157 null
2025-12-23 Fun-Audio-Chat Technical Report Qian Chen et.al. 2512.20156 null
2025-12-23 Bring My Cup! Personalizing Vision-Language-Action Models with Visual Attentive Prompting Sangoh Lee et.al. 2512.20014 null
2025-12-23 Observation and branching fraction measurements of $χ_{cJ}\to p \bar p K^0_S K^0_S$ BESIII Collaboration et.al. 2512.19993 null
2025-12-22 UCCL-EP: Portable Expert-Parallel Communication Ziming Mao et.al. 2512.19849 null
2025-12-21 How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts Sumin Park et.al. 2512.19765 null
2025-12-22 Towards Closed-Loop Embodied Empathy Evolution: Probing LLM-Centric Lifelong Empathic Motion Generation in Unseen Scenarios Jiawen Wang et.al. 2512.19551 null
2025-12-22 EGM: Efficiently Learning General Motion Tracking Policy for High Dynamic Humanoid Whole-Body Control Chao Yang et.al. 2512.19043 null
2025-12-21 Tempo as the Stable Cue: Hierarchical Mixture of Tempo and Beat Experts for Music to 3D Dance Generation Guangtao Lyu et.al. 2512.18804 null
2025-12-21 Rectification Reimagined: A Unified Mamba Model for Image Correction and Rectangling with Prompts Linwei Qiu et.al. 2512.18718 null
2025-12-21 Remoe: Towards Efficient and Low-Cost MoE Inference in Serverless Computing Wentao Liu et.al. 2512.18674 null
2025-12-21 Commercial Vehicle Braking Optimization: A Robust SIFT-Trajectory Approach Zhe Li et.al. 2512.18597 null
2025-12-20 Secret mixtures of experts inside your LLM Enric Boix-Adsera et.al. 2512.18452 null
2025-12-20 MoE Pathfinder: Trajectory-driven Expert Pruning Xican Yang et.al. 2512.18425 null
2025-12-20 MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation Kaixing Yang et.al. 2512.18181 null
2025-12-20 Cross section and parametrization of charmonium decay Xiao-Hu Mo et.al. 2512.18154 null
2025-12-19 MoE-TransMov: A Transformer-based Model for Next POI Prediction in Familiar & Unfamiliar Movements Ruichen Tan et.al. 2512.17985 null
2025-12-19 Interpreting the strong clustering of ultra-diffuse galaxies by halo spin bias Qinglin Ma et.al. 2512.17742 null
2025-12-19 Cross sections measurement of $e^+e^-\to Ξ(1530)^0\barΞ^0 + c.c.$ and search for $ψ(3770)\toΞ(1530)^0\barΞ^0 + c.c.$ BESIII Collaboration et.al. 2512.17275 null
2025-12-19 Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding Yuqing Li et.al. 2512.17220 null
2025-12-19 Capturing Arbitrary Waveform without Absorption with Synthesis of Complex Frequencies Zhaohua Tian et.al. 2512.17156 null
2025-12-18 Bandwidth-Efficient Adaptive Mixture-of-Experts via Low-Rank Compensation Zhenyu Liu et.al. 2512.17073 null
2025-12-18 Compression is Routing: Reconstruction Error as an Intrinsic Signal for Modular Language Models Zhongpan Tang et.al. 2512.16963 null
2025-12-18 LinkedOut: Linking World Knowledge Representation Out of Video LLM for Next-Generation Video Recommendation Haichao Zhang et.al. 2512.16891 null
2025-12-18 The WINTER Observatory: A One-Degree InGaAs Survey Camera to study the Transient Infrared Sky Danielle Frostig et.al. 2512.16753 null
2025-12-18 PoseMoE: Mixture-of-Experts Network for Monocular 3D Human Pose Estimation Mengyuan Liu et.al. 2512.16494 null
2025-12-18 Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems En-Ming Huang et.al. 2512.16473 null
2025-12-18 Pretrained Battery Transformer (PBT): A battery life prediction foundation model Ruifeng Tan et.al. 2512.16334 null
2025-12-19 Sigma-MoE-Tiny Technical Report Qingguo Hu et.al. 2512.16248 null
2025-12-18 Open Ad-hoc Categorization with Contextualized Feature Learning Zilin Wang et.al. 2512.16202 null
2025-12-18 INTELLECT-3: Technical Report Prime Intellect Team et.al. 2512.16144 null
2025-12-17 Wake instability past a sphere settling in a strongly stratified flow Chang-Fan Mo et.al. 2512.15626 null
2025-12-17 Measurements of the Absolute Branching Fraction of the Semileptonic Decay $\mathbf{Ξ^{-}\rightarrow Λe^- \barν_{e}}$ and the Axial Charge of the $\mathbf{Ξ}^{-}$ BESIII Collaboration et.al. 2512.15273 null
2025-12-19 VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments Yuze Wu et.al. 2512.15258 null
2025-12-17 Search for the decays $X(3872)\to K_{S}^{0}K^{\pm}π^{\mp}$ and $K^*(892)\bar{K}$ at BESIII BESIII Collaboration et.al. 2512.15091 null
2025-12-19 Let the Barbarians In: How AI Can Accelerate Systems Performance Research Audrey Cheng et.al. 2512.14806 null
2025-12-15 SocialNav-MoE: A Mixture-of-Experts Vision Language Model for Socially Compliant Navigation with Reinforcement Fine-Tuning Tomohito Kawabata et.al. 2512.14757 null
2025-12-16 Measurements of the branching fractions of $χ_{cJ}\to φφη, φφη^{\prime}$ and $φK^+K^-η$ BESIII Collaboration et.al. 2512.14369 null
2025-12-16 SketchAssist: A Practical Assistant for Semantic Edits and Precise Local Redrawing Han Zou et.al. 2512.14140 null
2025-12-16 SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations Wentao Guo et.al. 2512.14080 null
2025-12-16 Sparsity-Controllable Dynamic Top-p MoE for Large Foundation Model Pre-training Can Jin et.al. 2512.13996 null
2025-12-15 Connection between galaxy morphology and dark-matter halo structure II: predicting disk structure from dark-matter halo properties Jinning Liang et.al. 2512.13822 null
2025-12-13 RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing Yuhan Tang et.al. 2512.13727 null
2025-12-15 StutterFuse: Mitigating Modality Collapse in Stuttering Detection with Jaccard-Weighted Metric Learning and Gated Fusion Guransh Singh et.al. 2512.13632 null
2025-12-16 Janus: Disaggregating Attention and Experts for Scalable MoE Inference Zhexiang Zhang et.al. 2512.13525 null
2025-12-15 SIGMA: An AI-Empowered Training Stack on Early-Life Hardware Lei Qu et.al. 2512.13488 null
2025-12-15 Automated Information Flow Selection for Multi-scenario Multi-task Recommendation Chaohua Yang et.al. 2512.13396 null
2025-12-15 Sharpen the Spec, Cut the Code: A Case for Generative File System with SYSSPEC Qingyuan Liu et.al. 2512.13047 null
2025-12-15 Safe Control of Multi-Agent Systems with Minimal Communication Mo Yang et.al. 2512.13021 null
2025-12-15 SliceMoE: Bit-Sliced Expert Caching under Miss-Rate Constraints for Efficient MoE Inference Yuseon Choi et.al. 2512.12990 null
2025-12-14 Fault-Tolerant Sandboxing for AI Coding Agents: A Transactional Approach to Safe Autonomous Execution Boyang Yan et.al. 2512.12806 null
2025-12-14 Bayesian Optimization Parameter Tuning Framework for a Lyapunov Based Path Following Controller Zhewen Zheng et.al. 2512.12649 null
2025-12-13 Amplitude Analysis and Branching Fraction Measurement of $D^+ \to π^+π^0π^0$ BESIII Collaboration et.al. 2512.12397 null
2025-12-13 Fine-Grained Zero-Shot Learning with Attribute-Centric Representations Zhi Chen et.al. 2512.12219 null
2025-12-13 ALERT Open Dataset and Input-Size-Agnostic Vision Transformer for Driver Activity Recognition using IR-UWB Jeongjun Park et.al. 2512.12206 null
2025-12-13 MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models Ahmad Chamma et.al. 2512.12121 null
2025-12-12 Measurement of the cosmic ray nickel energy spectrum from 10 GeV/n to 2 TeV/n with the DAMPE F. Alemanno et.al. 2512.11425 null
2025-12-11 Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample Collaboration Sicheng Mo et.al. 2512.10954 null
2025-12-11 Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration Wenlong Jiao et.al. 2512.10581 null
2025-12-11 Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment Han Li et.al. 2512.10450 null
2025-12-12 Openpi Comet: Competition Solution For 2025 BEHAVIOR Challenge Junjie Bai et.al. 2512.10071 null
2025-12-10 Efficient Continual Learning in Neural Machine Translation: A Low-Rank Adaptation Approach Salvador Carrión et.al. 2512.09910 null
2025-12-10 DynaIP: Dynamic Image Prompt Adapter for Scalable Zero-shot Personalized Text-to-Image Generation Zhizhong Wang et.al. 2512.09814 null
2025-12-10 M3Net: A Multi-Metric Mixture of Experts Network Digital Twin with Graph Neural Networks Blessed Guda et.al. 2512.09797 null
2025-12-10 First measurement of the absolute branching fractions of $Σ^+$ nonleptonic decays and test of the $ΔI = 1/2$ rule BESIII Collaboration et.al. 2512.09628 null
2025-12-10 FoundIR-v2: Optimizing Pre-Training Data Mixtures for Image Restoration Foundation Model Xiang Chen et.al. 2512.09282 null
2025-12-10 Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens Yanpeng Yu et.al. 2512.09277 null
2025-12-10 Bug Priority Change Prediction: An Exploratory Study on Apache Software Guangzong Cai et.al. 2512.09216 null
2025-12-09 Ask, Answer, and Detect: Role-Playing LLMs for Personality Detection with Question-Conditioned Mixture-of-Experts Yifan Lyu et.al. 2512.08814 null
2025-12-09 What really matters for person re-identification? A Mixture-of-Experts Framework for Semantic Attribute Importance Athena Psalta et.al. 2512.08697 null
2025-12-09 Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems Mingwei Li et.al. 2512.08411 null
2025-12-09 FastBEV++: Fast by Algorithm, Deployable by Design Yuanpeng Chen et.al. 2512.08237 null
2025-12-08 Relational Visual Similarity Thao Nguyen et.al. 2512.07833 null
2025-12-08 Each Prompt Matters: Scaling Reinforcement Learning Without Wasting Rollouts on Hundred-Billion-Scale MoE Anxiang Zeng et.al. 2512.07710 null
2025-12-08 LongCat-Image Technical Report Meituan LongCat Team et.al. 2512.07584 null
2025-12-12 MultiMotion: Multi Subject Video Motion Transfer via Video Diffusion Transformer Penghui Liu et.al. 2512.07500 null
2025-12-08 Equivariant Diffusion for Crystal Structure Prediction Peijia Lin et.al. 2512.07289 null
2025-12-08 Measurement of the branching fraction of $η\to μ^+ μ^-$ and search for $η\to e^+ e^-$ BESIII Collaboration et.al. 2512.07144 null
2025-12-09 TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning Zebin Xing et.al. 2512.07135 null
2025-12-08 PlantBiMoE: A Bidirectional Foundation Model with SparseMoE for Plant Genomes Kepeng Lin et.al. 2512.07113 null
2025-12-07 Adaptive Normalization Mamba with Multi Scale Trend Decomposition and Patch MoE Encoding MinCheol Jeon et.al. 2512.06929 null
2025-12-07 Stable-MoE: Lyapunov-based Token Routing for Distributed Mixture-of-Experts Training over Edge Networks Long Shi et.al. 2512.06784 null
2025-12-07 Statistic-Augmented, Decoupled MoE Routing and Aggregating in Autonomous Driving Wei-Bin Kou et.al. 2512.06664 null
2025-12-06 Enhancing Medical Cross-Modal Hashing Retrieval using Dropout-Voting Mixture-of-Experts Fusion Jaewon Ahn et.al. 2512.06449 null
2025-12-04 The SAM2-to-SAM3 Gap in the Segment Anything Model Family: Why Prompt-Based Expertise Fails in Concept-Driven Image Segmentation Ranjan Sapkota et.al. 2512.06032 null
2025-12-05 HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies Zhiying Du et.al. 2512.05693 null
2025-12-05 ProPhy: Progressive Physical Alignment for Dynamic World Simulation Zijun Wang et.al. 2512.05564 null
2025-12-04 Evidence for the semileptonic decays $Λ_c^{+} \to Σ^{\pm} π^{\mp} e^+ ν_e$ BESIII Collaboration et.al. 2512.05178 null
2025-12-09 EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture Xin He et.al. 2512.04810 null
2025-12-04 Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild Yigui Feng et.al. 2512.04728 null
2025-12-04 Study of the reaction $Ξ^{0}n\rightarrowΛΛX$ using $Ξ^{0}$-nucleus scattering BESIII Collaboration et.al. 2512.04701 null
2025-12-04 Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space Joey Hong et.al. 2512.04601 null
2025-12-04 The Binary Fraction of Stars in the Dwarf Galaxy Ursa Minor via Dark Energy Spectroscopic Instrument Tian Qiu et.al. 2512.04477 null
2025-12-04 Context-Aware Mixture-of-Experts Inference on CXL-Enabled GPU-NDP Systems Zehao Fan et.al. 2512.04476 null
2025-12-03 Small Models Achieve Large Language Model Performance: Evaluating Reasoning-Enabled AI for Secure Child Welfare Research Zia Qi et.al. 2512.04261 null
2025-12-03 Decoding Large Language Diffusion Models with Foreseeing Movement Yichuan Mo et.al. 2512.04135 null
2025-12-03 Stable Signer: Hierarchical Sign Language Generative Model Sen Fang et.al. 2512.04048 null
2025-12-03 OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference Liujianfu Wang et.al. 2512.03927 null
2025-12-04 A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models X. Y. Han et.al. 2512.03915 null
2025-12-03 Parsimonious Clustering of Covariance Matrices Yixi Xu et.al. 2512.03912 null
2025-12-03 Measurement of the hyperon weak radiative decay $Ξ^0\toγΣ^0$ at BESIII BESIII Collaboration et.al. 2512.03877 null
2025-12-03 Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation Subin Kim et.al. 2512.03534 null
2025-12-03 CellScout: Visual Analytics for Mining Biomarkers in Cell State Discovery Rui Sheng et.al. 2512.03485 null
2025-12-03 Unconventional Magneto-Optical Effects in Altermagnets Yongpan Li et.al. 2512.03435 null
2025-12-03 SSLfmm: An R Package for Semi-Supervised Learning with a Mixed-Missingness Mechanism in Finite Mixture Models Geoffrey J. McLachlan et.al. 2512.03322 null
2025-12-02 Intrinsic Second-Order Topological Superconductors with Tunable Majorana Zero Modes Xiao-Jiao Wang et.al. 2512.02775 null
2025-12-02 Stepwise Schema-Guided Prompting Framework with Parameter Efficient Instruction Tuning for Multimedia Event Extraction Xiang Yuan et.al. 2512.02584 null
2025-12-02 SkyMoE: A Vision-Language Foundation Model for Enhancing Geospatial Interpretation with Mixture of Experts Jiaqi Liu et.al. 2512.02517 null
2025-12-02 A Fully First-Order Layer for Differentiable Optimization Zihao Zhao et.al. 2512.02494 null
2025-12-02 Quasi-steady electron-excitonic complexes coupling in a two-dimensional semiconductor Shangkun Mo et.al. 2512.02490 null
2025-12-02 Multi-Domain Enhanced Map-Free Trajectory Prediction with Selective Attention Wenyi Xiong et.al. 2512.02368 null
2025-12-02 Understanding and Harnessing Sparsity in Unified Multimodal Models Shwai He et.al. 2512.02351 null
2025-12-02 OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning Boyu Zhu et.al. 2512.02306 null
2025-12-01 Towards Unified Video Quality Assessment Chen Feng et.al. 2512.02224 null
2025-12-01 ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation Chenyang Gu et.al. 2512.02013 null
2025-12-01 Multimodal Mixture-of-Experts for ISAC in Low-Altitude Wireless Networks Kai Zhang et.al. 2512.01750 null
2025-12-01 GRASP: Guided Residual Adapters with Sample-wise Partitioning Felix Nützel et.al. 2512.01675 null
2025-12-01 Bridging the Scale Gap: Balanced Tiny and General Object Detection in Remote Sensing Imagery Zhicheng Zhao et.al. 2512.01665 null
2025-12-01 Cuffless Blood Pressure Estimation from Six Wearable Sensor Modalities in Multi-Motion-State Scenarios Yiqiao Chen et.al. 2512.01653 null
2025-12-01 Integrated YOLOP Perception and Lyapunov-based Control for Autonomous Mobile Robot Navigation on Track Mo Chen et.al. 2512.01608 null
2025-12-01 Personalized optimization of pediatric HD-tDCS for dose consistency and target engagement Zeming Liu et.al. 2512.01406 null
2025-12-02 Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Chujie Zheng et.al. 2512.01374 null
2025-12-01 TagSplat: Topology-Aware Gaussian Splatting for Dynamic Mesh Modeling and Tracking Hanzhi Guo et.al. 2512.01329 null
2025-12-01 Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe Yahui Liu et.al. 2512.01252 null
2025-11-30 Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios Jianxiang Zang et.al. 2512.00920 null
2025-11-30 Elastic Mixture of Rank-Wise Experts for Knowledge Reuse in Federated Fine-Tuning Yebo Wu et.al. 2512.00902 null
2025-11-30 Upcycled and Merged MoE Reward Model for Mitigating Reward Hacking Lingling Fu et.al. 2512.00724 null
2025-11-29 GCMCG: A Clustering-Aware Graph Attention and Expert Fusion Network for Multi-Paradigm, Multi-task, and Cross-Subject EEG Decoding Yiqiao Chen et.al. 2512.00574 null
2025-11-28 Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model Junshu Tang et.al. 2511.23429 null
2025-11-28 LFM2 Technical Report Alexander Amini et.al. 2511.23404 null
2025-11-28 Chart2Code-MoLA: Efficient Multi-Modal Code Generation via Adaptive Expert Routing Yifei Wang et.al. 2511.23321 null
2025-11-28 Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models Xiang Hu et.al. 2511.23319 null
2025-11-28 Multi-Modal Scene Graph with Kolmogorov-Arnold Experts for Audio-Visual Question Answering Zijian Fu et.al. 2511.23304 null
2025-11-28 Experts are all you need: A Composable Framework for Large Language Model Inference Shrihari Sridharan et.al. 2511.22955 null
2025-11-28 EnECG: Efficient Ensemble Learning for Electrocardiogram Multi-task Foundation Model Yuhao Xu et.al. 2511.22935 null
2025-11-27 Architecture Decoupling Is Not All You Need For Unified Multimodal Model Dian Zheng et.al. 2511.22663 null
2025-11-27 OmniInfer: System-Wide Acceleration Techniques for Optimizing LLM Serving Throughput and Latency Jun Wang et.al. 2511.22481 null
2025-11-27 Foundation Model for Intelligent Wireless Communications Boxun Liu et.al. 2511.22222 null
2025-11-27 MoE3D: Mixture of Experts meets Multi-Modal 3D Understanding Yu Li et.al. 2511.22103 null
2025-11-27 Convergence Dynamics of Over-Parameterized Score Matching for a Single Gaussian Yiran Zhang et.al. 2511.22069 null
2025-11-26 Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models Naifu Zhang et.al. 2511.21663 null
2025-11-26 Continual Error Correction on Low-Resource Devices Kirill Paramonov et.al. 2511.21652 null
2025-11-27 Qwen3-VL Technical Report Shuai Bai et.al. 2511.21631 null
2025-11-26 Enhanced Landmark Detection Model in Pelvic Fluoroscopy using 2D/3D Registration Loss Chou Mo et.al. 2511.21575 null
2025-11-26 Scaling limits of critical FK-decorated random planar maps with $q=4$ William Da Silva et.al. 2511.21480 null
2025-11-26 Study of the reactions $\bar{n} p \to 2π^{+}π^{-}$, $2π^{+}π^{-}π^{0}$, and $2π^{+}π^{-}2π^{0}$ using $J/ψ\to p π^{-}\bar{n}$ BESIII Collaboration et.al. 2511.21462 null
2025-11-26 MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training Lu Zhao et.al. 2511.21431 null
2025-11-26 Do Reasoning Vision-Language Models Inversely Scale in Test-Time Compute? A Distractor-centric Empirical Analysis Jiyun Bae et.al. 2511.21397 null
2025-11-26 Conditional Generative Modeling of Stochastic LTI Systems: A Behavioral Approach Jiayun Li et.al. 2511.21219 null
2025-11-26 MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts Ivan Novikov et.al. 2511.21089 null
2025-11-25 HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation Xiang Wang et.al. 2511.20520 null
2025-11-25 Soft Adaptive Policy Optimization Chang Gao et.al. 2511.20347 null
2025-11-25 ADNet: A Large-Scale and Extensible Multi-Domain Benchmark for Anomaly Detection Across 380 Real-World Categories Hai Ling et.al. 2511.20169 null
2025-11-25 Adaptive Knowledge Transfer for Cross-Disciplinary Cold-Start Knowledge Tracing Yulong Deng et.al. 2511.20009 null
2025-11-25 SONIC: Spectral Optimization of Noise for Inpainting with Consistency Seungyeon Baek et.al. 2511.19985 null
2025-11-25 Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models Wentao Hu et.al. 2511.19822 null
2025-11-22 Exploiting the Experts: Unauthorized Compression in MoE-LLMs Pinaki Prasad Guha Neogi et.al. 2511.19480 null
2025-11-22 Tracking and Segmenting Anything in Any Modality Tianlu Zhang et.al. 2511.19475 null
2025-11-24 Life-IQA: Boosting Blind Image Quality Assessment through GCN-enhanced Layer Interaction and MoE-based Feature Decoupling Long Tang et.al. 2511.19024 null
2025-11-24 OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs Yuting Gao et.al. 2511.19023 null
2025-11-24 Dynamic Mixture of Experts Against Severe Distribution Shifts Donghu Kim et.al. 2511.18987 null
2025-11-23 HiFi-MambaV2: Hierarchical Shared-Routed MoE for High-Fidelity MRI Reconstruction Pengcheng Fang et.al. 2511.18534 null
2025-11-23 AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert Yuting Gao et.al. 2511.18314 null
2025-11-22 PromptMoE: Generalizable Zero-Shot Anomaly Detection via Visually-Guided Prompt Mixtures Yuheng Shao et.al. 2511.18116 null
2025-11-22 CADTrack: Learning Contextual Aggregation with Deformable Alignment for Robust RGBT Tracking Hao Li et.al. 2511.17967 null
2025-11-22 Measuring the Impact of Lexical Training Data Coverage on Hallucination Detection in Large Language Models Shuo Zhang et.al. 2511.17946 null
2025-11-22 FastMMoE: Accelerating Multimodal Large Language Models through Dynamic Expert Activation and Routing-Aware Token Pruning Guoyang Xia et.al. 2511.17885 null
2025-11-22 Equivalence of Context and Parameter Updates in Modern Transformer Blocks Adrian Goldwaser et.al. 2511.17864 null
2025-11-21 Unified Class and Domain Incremental Learning with Mixture of Experts for Indoor Localization Akhil Singampalli et.al. 2511.17829 null
2025-11-21 Boosting Brain-inspired Path Integration Efficiency via Learning-based Replication of Continuous Attractor Neurodynamics Zhangyu Ge et.al. 2511.17687 null
2025-11-21 Sparse Mixture-of-Experts for Multi-Channel Imaging: Are All Channel Interactions Required? Sukwon Yun et.al. 2511.17400 null
2025-11-21 MCMoE: Completing Missing Modalities with Mixture of Experts for Incomplete Multimodal Action Quality Assessment Huangbiao Xu et.al. 2511.17397 link
2025-11-21 Training Foundation Models on a Full-Stack AMD Platform: Compute, Networking, and System Design Quentin Anthony et.al. 2511.17127 null
2025-11-21 Parametric Retrieval-Augmented Generation using Latent Routing of LoRA Adapters Zhan Su et.al. 2511.17044 null
2025-11-21 VLM-Augmented Degradation Modeling for Image Restoration Under Adverse Weather Conditions Qianyi Shao et.al. 2511.16998 null
2025-11-21 RadioKMoE: Knowledge-Guided Radiomap Estimation with Kolmogorov-Arnold Networks and Mixture-of-Experts Fupei Guo et.al. 2511.16986 null
2025-11-21 MicroMoE: Fine-Grained Load Balancing for Mixture-of-Experts with Token Scheduling Chenqi Zhao et.al. 2511.16947 null
2025-11-20 Search for the charmonium weak decay $J/ψ\to\bar{D}^0\bar{K}^{*0}+{\rm c.c.}$ BESIII Collaboration et.al. 2511.16083 null
2025-11-20 Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution Xiao He et.al. 2511.16024 null
2025-11-19 AquaSentinel: Next-Generation AI System Integrating Sensor Networks for Urban Underground Water Pipeline Anomaly Detection via Collaborative MoE-LLM Agent Architecture Qiming Guo et.al. 2511.15870 null
2025-11-19 MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping Yushi Huang et.al. 2511.15690 null
2025-11-19 Search for the lepton number violating process $Ξ^- \rightarrow Σ^+ e^- e^- +c.c.$ BESIII Collaboration et.al. 2511.15394 null
2025-11-19 VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation Tairan He et.al. 2511.15200 null
2025-11-19 GPU-Initiated Networking for NCCL Khaled Hamidouche et.al. 2511.15076 null
2025-11-19 WiCo-PG: Wireless Channel Foundation Model for Pathloss Map Generation via Synesthesia of Machines Mingran Sun et.al. 2511.15030 null
2025-11-19 WiCo-MG: Wireless Channel Foundation Model for Multipath Generation via Synesthesia of Machines Zengrui Han et.al. 2511.15026 null
2025-11-19 Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference Kexin Chu et.al. 2511.15015 null
2025-11-18 HMC: Learning Heterogeneous Meta-Control for Contact-Rich Loco-Manipulation Lai Wei et.al. 2511.14756 null
2025-11-18 Towards Stable and Structured Time Series Generation with Perturbation-Aware Flow Matching Jintao Zhang et.al. 2511.14488 null
2025-11-18 MoE-SpeQ: Speculative Quantized Decoding with Proactive Expert Prefetching and Offloading for Mixture-of-Experts Wenfeng Wang et.al. 2511.14102 null
2025-11-18 FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration Jingren Liu et.al. 2511.14099 null
2025-11-18 SMGeo: Cross-View Object Geo-Localization with Grid-Level Mixture-of-Experts Fan Zhang et.al. 2511.14093 null
2025-11-17 MoMoE: A Mixture of Expert Agent Model for Financial Sentiment Analysis Peng Shu et.al. 2511.13983 null
2025-11-17 InterMoE: Individual-Specific 3D Human Interaction Generation via Dynamic Temporal-Selective MoE Lipeng Wang et.al. 2511.13488 null
2025-11-18 YOLO Meets Mixture-of-Experts: Adaptive Expert Routing for Robust Object Detection Ori Meiraz et.al. 2511.13344 null
2025-11-17 Skeletons Speak Louder than Text: A Motion-Aware Pretraining Paradigm for Video-Based Person Re-Identification Rifen Lin et.al. 2511.13150 null
2025-11-17 Self-Adaptive Graph Mixture of Models Mohit Meena et.al. 2511.13062 null
2025-11-17 Tokenize Once, Recommend Anywhere: Unified Item Tokenization for Multi-domain LLM-based Recommendation Yu Hou et.al. 2511.12922 null
2025-11-17 Simple Lines, Big Ideas: Towards Interpretable Assessment of Human Creativity from Drawings Zihao Lin et.al. 2511.12880 null
2025-11-16 Connectivity-Guided Sparsification of 2-FWL GNNs: Preserving Full Expressivity with Improved Efficiency Rongqin Chen et.al. 2511.12838 null
2025-11-16 Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data Yunxin Li et.al. 2511.12609 null
2025-11-16 SEMC: Structure-Enhanced Mixture-of-Experts Contrastive Learning for Ultrasound Standard Plane Recognition Qing Cai et.al. 2511.12559 null
2025-11-16 MdaIF: Robust One-Stop Multi-Degradation-Aware Image Fusion with Language-Driven Semantics Jing Li et.al. 2511.12525 null
2025-11-16 MOON2.0: Dynamic Modality-balanced Multimodal Representation Learning for E-commerce Product Understanding Zhanheng Nie et.al. 2511.12449 null
2025-11-16 Self-Supervised Visual Prompting for Cross-Domain Road Damage Detection Xi Xiao et.al. 2511.12410 null
2025-11-15 SAC-MoE: Reinforcement Learning with Mixture-of-Experts for Control of Hybrid Dynamical Systems with Uncertainty Leroy D’Souza et.al. 2511.12361 null
2025-11-15 AMR-MoEGA: Antimicrobial Resistance Prediction using Mixture of Experts and Genetic Algorithms Anshul Bagaria et.al. 2511.12223 null
2025-11-15 ViTE: Virtual Graph Trajectory Expert Router for Pedestrian Trajectory Prediction Ruochen Li et.al. 2511.12214 null
2025-11-14 FarSkip-Collective: Unhobbling Blocking Communication in Mixture of Experts Models Yonatan Dukler et.al. 2511.11505 null
2025-11-14 Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification Qinghao Gao et.al. 2511.11460 null
2025-11-14 SPOT: Single-Shot Positioning via Trainable Near-Field Rainbow Beamforming Yeyue Cai et.al. 2511.11391 null
2025-11-14 Parameter-Efficient MoE LoRA for Few-Shot Multi-Style Editing Cong Cao et.al. 2511.11236 null
2025-11-14 DoReMi: A Domain-Representation Mixture Framework for Generalizable 3D Understanding Mingwei Xing et.al. 2511.11232 null
2025-11-14 ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization Anzhe Cheng et.al. 2511.10971 null
2025-11-14 Go-UT-Bench: A Fine-Tuning Dataset for LLM-Based Unit Test Generation in Go Yashshi Pipalani et.al. 2511.10868 null
2025-11-13 Generalizable Slum Detection from Satellite Imagery with Mixture-of-Experts Sumin Lee et.al. 2511.10300 null
2025-11-13 RobIA: Robust Instance-aware Continual Test-time Adaptation for Deep Stereo Jueun Ko et.al. 2511.10107 null
2025-11-13 BuddyMoE: Exploiting Expert Redundancy to Accelerate Memory-Constrained Mixture-of-Experts Inference Yun Wang et.al. 2511.10054 null
2025-11-14 HI-TransPA: Hearing Impairments Translation Personal Assistant Zhiming Ma et.al. 2511.09915 null
2025-11-13 ConSurv: Multimodal Continual Learning for Survival Analysis Dianzhi Yu et.al. 2511.09853 null
2025-11-11 Let the Experts Speak: Improving Survival Prediction & Calibration via Mixture-of-Experts Heads Todd Morrill et.al. 2511.09567 null
2025-11-12 SMF-VO: Direct Ego-Motion Estimation via Sparse Motion Fields Sangheon Yang et.al. 2511.09072 null
2025-11-12 UniMM-V2X: MoE-Enhanced Multi-Level Fusion for End-to-End Cooperative Autonomous Driving Ziyi Song et.al. 2511.09013 null
2025-11-12 Selective Sinkhorn Routing for Improved Sparse Mixture of Experts Duc Anh Nguyen et.al. 2511.08972 null
2025-11-12 Bayesian Mixture of Experts For Large Language Models Maryam Dialameh et.al. 2511.08968 null
2025-11-12 An Improved Dual-Attention Transformer-LSTM for Small-Sample Prediction of Modal Frequency and Actual Anchor Radius in Micro Hemispherical Resonator Design Yuyi Yao et.al. 2511.08900 null
2025-11-11 OmniAID: Decoupling Semantic and Artifacts for Universal AI-Generated Image Detection in the Wild Yuncheng Guo et.al. 2511.08423 null
2025-11-11 Text-based Aerial-Ground Person Retrieval Xinyu Zhou et.al. 2511.08369 null
2025-11-14 Towards Non-Stationary Time Series Forecasting with Temporal Stabilization and Frequency Differencing Junkai Lu et.al. 2511.08229 null
2025-11-13 National Institute on Aging PREPARE Challenge: Early Detection of Cognitive Impairment Using Speech – The SpeechCARE Solution Maryam Zolnoori et.al. 2511.08132 null
2025-11-13 Information Capacity: Evaluating the Efficiency of Large Language Models via Text Compression Cheng Yuan et.al. 2511.08066 null
2025-11-11 TouchWalker: Real-Time Avatar Locomotion from Touchscreen Finger Walking Geuntae Park et.al. 2511.07860 null
2025-11-10 One Router to Route Them All: Homogeneous Expert Routing for Heterogeneous Graph Transformers Georgiy Shakirov et.al. 2511.07603 null
2025-11-12 Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs Zhongyang Li et.al. 2511.07419 null
2025-11-11 Surgical Agent Orchestration Platform for Voice-directed Patient Data Interaction Hyeryun Park et.al. 2511.07392 null
2025-11-10 AgenticSciML: Collaborative Multi-Agent Systems for Emergent Discovery in Scientific Machine Learning Qile Jiang et.al. 2511.07262 null
2025-11-10 Two Heads are Better than One: Distilling Large Language Model Features Into Small Models with Feature Decomposition and Mixture Tianhao Fu et.al. 2511.07110 null
2025-11-10 CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition Hung-Yang Sung et.al. 2511.06860 null
2025-11-10 S-DAG: A Subject-Based Directed Acyclic Graph for Multi-Agent Heterogeneous Reasoning Jiangwen Dong et.al. 2511.06727 null
2025-11-10 Multi-Modal Continual Learning via Cross-Modality Adapters and Representation Alignment with Knowledge Preservation Evelyn Chee et.al. 2511.06723 null
2025-11-09 Route Experts by Sequence, not by Token Tiansheng Wen et.al. 2511.06494 null
2025-11-09 HyMoERec: Hybrid Mixture-of-Experts for Sequential Recommendation Kunrong Li et.al. 2511.06388 null
2025-11-09 DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation Speed Zhu et.al. 2511.06307 null
2025-11-09 A Mixture-of-Experts Framework with Log-Logistic Components for Survival Analysis on Histopathology Images Ardhendu Sekhar et.al. 2511.06266 null
2025-11-08 MoSKA: Mixture of Shared KV Attention for Efficient Long-Sequence LLM Inference Myunghyun Rhee et.al. 2511.06010 null
2025-11-08 DiA-gnostic VLVAE: Disentangled Alignment-Constrained Vision Language Variational AutoEncoder for Robust Radiology Reporting with Missing Modalities Nagur Shareef Shaik et.al. 2511.05968 null
2025-11-08 MoEGCL: Mixture of Ego-Graphs Contrastive Representation Learning for Multi-View Clustering Jian Zhu et.al. 2511.05876 null
2025-11-08 In-depth Analysis on Caching and Pre-fetching in Mixture of Experts Offloading Shuning Lin et.al. 2511.05814 null
2025-11-07 Beyond Redundancy: Diverse and Specialized Multi-Expert Sparse Autoencoder Zhen Xu et.al. 2511.05745 null
2025-11-07 BrainCSD: A Hierarchical Consistency-Driven MoE Foundation Model for Unified Connectome Synthesis and Multitask Brain Trait Prediction Xiongri Shen et.al. 2511.05630 null
2025-11-07 Quantum-Uncertainty-Governed Spin Dynamics in s-d Coupled Systems Jie Zheng et.al. 2511.05388 null
2025-11-07 OvA-LP: A Simple and Efficient Framework for Federated Learning on Non-IID Data Dongjin Park et.al. 2511.05028 null
2025-11-07 MoE-DP: An MoE-Enhanced Diffusion Policy for Robust Long-Horizon Robotic Manipulation with Skill Decomposition and Failure Recovery Baiye Cheng et.al. 2511.05007 null
2025-11-06 PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed inference Yushu Zhao et.al. 2511.04805 null
2025-11-06 GNN-MoE: Context-Aware Patch Routing using GNNs for Parameter-Efficient Domain Generalization Mahmoud Soliman et.al. 2511.04008 null
2025-11-05 GMoPE:A Prompt-Expert Mixture Framework for Graph Foundation Models Zhibin Wang et.al. 2511.03251 null
2025-11-04 From Solo to Symphony: Orchestrating Multi-Agent Collaboration with Single-Agent Demos Xun Wang et.al. 2511.02762 null
2025-11-04 Verifying LLM Inference to Prevent Model Weight Exfiltration Roy Rinberg et.al. 2511.02620 null
2025-11-04 RoME: Domain-Robust Mixture-of-Experts for MILP Solution Prediction across Domains Tianle Pu et.al. 2511.02331 null
2025-11-04 FP8-Flow-MoE: A Casting-Free FP8 Recipe without Double Quantization Error Fengjuan Wang et.al. 2511.02302 null
2025-11-04 Opportunistic Expert Activation: Batch-Aware Expert Routing for Faster Decode Without Retraining Costin-Andrei Oncescu et.al. 2511.02237 null
2025-11-03 Towards Efficient Federated Learning of Networked Mixture-of-Experts for Mobile Edge Computing Song Gao et.al. 2511.01743 null
2025-11-03 HMVLM: Human Motion-Vision-Lanuage Model via MoE LoRA Lei Hu et.al. 2511.01463 null
2025-11-04 CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing Yifan Zhou et.al. 2511.01197 null
2025-11-03 DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection Guoxin Ma et.al. 2511.01192 null
2025-11-01 OmniTrack++: Omnidirectional Multi-Object Tracking by Learning Large-FoV Trajectory Feedback Kai Luo et.al. 2511.00510 null
2025-10-31 LongCat-Flash-Omni Technical Report Meituan LongCat Team et.al. 2511.00279 null
2025-10-31 Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals Xiangyu Fan et.al. 2510.27684 null
2025-10-31 RDMA Point-to-Point Communication for LLM Systems Nandor Licker et.al. 2510.27656 null
2025-10-31 MoRE: 3D Visual Geometry Reconstruction Meets Mixture-of-Experts Jingnan Gao et.al. 2510.27234 null
2025-10-31 AFM-Net: Advanced Fusing Hierarchical CNN Visual Priors with Global Sequence Modeling for Remote Sensing Image Scene Classification Yuanhao Tang et.al. 2510.27155 null
2025-10-30 Adaptive Data Flywheel: Applying MAPE Control Loops to AI Agent Improvement Aaditya Shukla et.al. 2510.27051 null
2025-10-30 Mixture-of-Transformers Learn Faster: A Theoretical Study on Classification Problems Hongbo Li et.al. 2510.27004 null
2025-10-30 MoME: Mixture of Visual Language Medical Experts for Medical Imaging Segmentation Arghavan Rezvani et.al. 2510.26996 null
2025-10-30 ExpertFlow: Adaptive Expert Scheduling and Memory Coordination for Efficient MoE Inference Zixu Shen et.al. 2510.26730 null
2025-10-30 Low-Altitude UAV-Carried Movable Antenna for Joint Wireless Power Transfer and Covert Communications Chuang Zhang et.al. 2510.26628 null
2025-10-30 Asymptotic meshes from $r$-variational adaptation methods for static problems in one dimension Darith Hun et.al. 2510.26375 null
2025-10-30 MossNet: Mixture of State-Space Experts is a Multi-Head Attention Shikhar Tuli et.al. 2510.26182 null
2025-10-29 Dual Mixture-of-Experts Framework for Discrete-Time Survival Analysis Hyeonjun Lee et.al. 2510.26014 null
2025-10-31 Mixture-of-Experts Operator Transformer for Large-Scale PDE Pre-Training Hong Wang et.al. 2510.25803 null
2025-10-29 Revisiting scalable sequential recommendation with Multi-Embedding Approach and Mixture-of-Experts Qiushi Pan et.al. 2510.25285 null
2025-10-29 MoEntwine: Unleashing the Potential of Wafer-scale Chips for Large-scale Expert Parallel Inference Xinru Tang et.al. 2510.25258 null
2025-10-29 H3M-SSMoEs: Hypergraph-based Multimodal Learning with LLM Reasoning and Style-Structured Mixture of Experts Peilin Tan et.al. 2510.25091 null
2025-10-28 Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation Inclusion AI et.al. 2510.24821 null
2025-10-28 Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance Yujie Wei et.al. 2510.24711 null
2025-10-28 Language-Conditioned Representations and Mixture-of-Experts Policy for Robust Multi-Task Robotic Manipulation Xiucheng Zhang et.al. 2510.24055 null
2025-10-26 Sparsity and Superposition in Mixture of Experts Marmik Chaudhari et.al. 2510.23671 null
2025-10-27 EMTSF:Extraordinary Mixture of SOTA Models for Time Series Forecasting Musleh Alharthi et.al. 2510.23396 null
2025-10-27 Rethinking GSPO: The Perplexity-Entropy Equivalence Chi Liu et.al. 2510.23142 null
2025-10-27 Knocking-Heads Attention Zhanchao Zhou et.al. 2510.23052 null
2025-10-27 Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts Di Zhang et.al. 2510.23027 null
2025-10-27 MoEMeta: Mixture-of-Experts Meta Learning for Few-Shot Relational Learning Han Wu et.al. 2510.23013 null
2025-10-25 Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation Ling-Team et.al. 2510.22115 null
2025-10-23 Addressing Corner Cases in Autonomous Driving: A World Model-based Approach with Mixture of Experts and LLMs Haicheng Liao et.al. 2510.21867 null
2025-10-24 PINN Balls: Scaling Second-Order Methods for PINNs with Domain Decomposition and Adaptive Sampling Andrea Bonfanti et.al. 2510.21262 null
2025-10-24 Adaptive Graph Mixture of Residual Experts: Unsupervised Learning on Diverse Graphs with Heterogeneous Specialization Yunlong Chu et.al. 2510.21207 null
2025-10-24 Controllable-LPMoE: Adapting to Challenging Object Segmentation via Dynamic Local Priors from Mixture-of-Experts Yanguang Sun et.al. 2510.21114 null
2025-10-24 MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning Siyong Chen et.al. 2510.21093 null
2025-10-23 Bayesian Jammer Localization with a Hybrid CNN and Path-Loss Mixture of Experts Mariona Jaramillo-Civill et.al. 2510.20666 null
2025-10-23 xTime: Extreme Event Prediction with Hierarchical Knowledge Distillation and Expert Fusion Quan Li et.al. 2510.20651 null
2025-10-23 Metis-HOME: Hybrid Optimized Mixture-of-Experts for Multimodal Reasoning Xiaohan Lan et.al. 2510.20519 null
2025-10-23 A Parameter-Efficient Mixture-of-Experts Framework for Cross-Modal Geo-Localization LinFeng Li et.al. 2510.20291 null
2025-10-23 AsyncHZP: Hierarchical ZeRO Parallelism with Asynchronous Scheduling for Scalable LLM Training Huawei Bai et.al. 2510.20111 null
2025-10-22 HybridEP: Scaling Expert Parallelism to Cross-Datacenter Scenario via Hybrid Expert/Data Transmission Weihao Yang et.al. 2510.19470 null
2025-10-22 MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs Xinfeng Xia et.al. 2510.19366 null
2025-10-22 Modeling Turn-Taking with Semantically Informed Gestures Varsha Suresh et.al. 2510.19350 null
2025-10-23 RailS: Load Balancing for All-to-All Communication in Distributed Mixture-of-Experts Training Heng Xu et.al. 2510.19262 null
2025-10-22 A Design Science Blueprint for an Orchestrated AI Assistant in Doctoral Supervision Teo Susnjak et.al. 2510.19227 null
2025-10-23 MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting In-Hwan Jin et.al. 2510.19210 null
2025-10-25 Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model Ling Team et.al. 2510.18855 null
2025-10-21 Unifying and Enhancing Graph Transformers via a Hierarchical Mask Framework Yujie Xing et.al. 2510.18825 null
2025-10-21 Noise-Conditioned Mixture-of-Experts Framework for Robust Speaker Verification Bin Gu et.al. 2510.18533 null
2025-10-21 Training Diverse Graph Experts for Ensembles: A Systematic Empirical Study Gangda Deng et.al. 2510.18370 null
2025-10-21 DeepSeek-OCR: Contexts Optical Compression Haoran Wei et.al. 2510.18234 null
2025-10-22 L-MoE: End-to-End Training of a Lightweight Mixture of Low-Rank Adaptation Experts Shihao Ji et.al. 2510.17898 null
2025-10-20 Towards 3D Objectness Learning in an Open World Taichi Liu et.al. 2510.17686 null
2025-10-20 Intelligent Communication Mixture-of-Experts Boosted-Medical Image Segmentation Foundation Model Xinwei Zhang et.al. 2510.17684 null
2025-10-20 Learned Inertial Odometry for Cycling Based on Mixture of Experts Algorithm Hao Qiao et.al. 2510.17604 null
2025-10-23 Photon radiation induced by rescattering in strong-interacting medium with a magnetic field Yue Zhang et.al. 2510.17597 null
2025-10-20 ReXMoE: Reusing Experts with Minimal Overhead in Mixture-of-Experts Zheyue Tan et.al. 2510.17483 null
2025-10-19 Leave It to the Experts: Detecting Knowledge Distillation via MoE Expert Signatures Pingzhi Li et.al. 2510.16968 null
2025-10-19 End-to-end Listen, Look, Speak and Act Siyin Wang et.al. 2510.16756 null
2025-10-18 NeurIPT: Foundation Model for Neural Interfaces Zitao Fang et.al. 2510.16548 link
2025-10-18 Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of Experts Yongxiang Hua et.al. 2510.16448 null
2025-10-18 Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures Minh-Khoi Nguyen-Nhat et.al. 2510.16411 null
2025-10-17 Expert Merging in Sparse Mixture of Experts with Nash Bargaining Dung V. Nguyen et.al. 2510.16138 null
2025-10-17 Human or AI? Comparing Design Thinking Assessments by Teaching Assistants and Bots Sumbul Khan et.al. 2510.16069 null
2025-10-17 Mixture of Experts Approaches in Dense Retrieval Tasks Effrosyni Sokli et.al. 2510.15683 null
2025-10-17 FlexiReID: Adaptive Mixture of Expert for Multi-Modal Person Re-Identification Zhen Sun et.al. 2510.15595 null
2025-10-17 Backdoor or Manipulation? Graph Mixture of Experts Can Defend Against Various Graph Adversarial Attacks Yuyuan Feng et.al. 2510.15333 null
2025-10-17 MTmixAtt: Integrating Mixture-of-Experts with Multi-Mix Attention for Large-Scale Recommendation Xianyang Qi et.al. 2510.15286 null
2025-10-17 Adaptive Individual Uncertainty under Out-Of-Distribution Shift with Expert-Routed Conformal Prediction Amitesh Badkul et.al. 2510.15233 null
2025-10-16 Rewiring Experts on the Fly:Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models Guinan Su et.al. 2510.14853 null
2025-10-16 MergeMoE: Efficient Compression of MoE Models via Expert Output Merging Ruijie Miao et.al. 2510.14436 null
2025-10-16 Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning Weijie Shen et.al. 2510.14300 null
2025-10-16 MACE: Mixture-of-Experts Accelerated Coordinate Encoding for Large-Scale Scene Localization and Rendering Mingkai Liu et.al. 2510.14251 null
2025-10-16 Demonstrating Exoplanet Transit Photometry from Space with a 15-mm Aperture Optical Navigation Camera on Hayabusa2 Koki Yumoto et.al. 2510.14229 null
2025-10-15 REAP the Experts: Why Pruning Prevails for One-Shot MoE compression Mike Lasby et.al. 2510.13999 null
2025-10-15 Steer-MoE: Efficient Audio-Language Alignment with a Mixture-of-Experts Steering Module Ruitao Feng et.al. 2510.13558 null
2025-10-15 ExpressNet-MoE: A Hybrid Deep Neural Network for Emotion Recognition Deeptimaan Banerjee et.al. 2510.13493 null
2025-10-15 Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers Xin Zhao et.al. 2510.13462 null
2025-10-15 Toward Efficient Inference Attacks: Shadow Model Sharing via Mixture-of-Experts Li Bai et.al. 2510.13451 null
2025-10-15 UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE Zhenyu Liu et.al. 2510.13344 null
2025-10-15 GatePro: Parameter-Free Expert Selection Optimization for Mixture-of-Experts Models Chen Zheng et.al. 2510.13079 null
2025-10-17 Scope: Selective Cross-modal Orchestration of Visual Perception Experts Tianyu Zhang et.al. 2510.12974 null
2025-10-14 Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency without Model Sweeps Do Tien Hai et.al. 2510.12744 null
2025-10-14 Proof of Cloud: Data Center Execution Assurance for Confidential VMs Filip Rezabek et.al. 2510.12469 null
2025-10-14 MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts Yushu Zhao et.al. 2510.12357 null
2025-10-14 DE3S: Dual-Enhanced Soft-Sparse-Shape Learning for Medical Early Time-Series Classification Tao Xie et.al. 2510.12214 null
2025-10-13 Enhancing the Quality of 3D Lunar Maps Using JAXA’s Kaguya Imagery Yumi Iwashita et.al. 2510.11817 null
2025-10-13 Beyond ‘Templates’: Category-Agnostic Object Pose, Size, and Shape Estimation from a Single View Jinyu Zhang et.al. 2510.11687 null
2025-10-13 Robust Ego-Exo Correspondence with Long-Term Memory Yijun Hu et.al. 2510.11417 null
2025-10-13 Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers Wenhan Ma et.al. 2510.11370 null
2025-10-13 What to expect from microscopic nuclear modelling for k$_{\rm eff}$ calculations? D. Rochman et.al. 2510.11256 null
2025-10-13 DND: Boosting Large Language Models with Dynamic Nested Depth Tieyuan Chen et.al. 2510.11001 null
2025-10-13 MC#: Mixture Compressor for Mixture-of-Experts Large Models Wei Huang et.al. 2510.10962 null
2025-10-12 Crisis-Aware Regime-Conditioned Diffusion with CVaR Allocation Ali Atiah Alzahrani et.al. 2510.10807 null
2025-10-12 Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection Shizhen Zhao et.al. 2510.10584 null
2025-10-12 Hierarchical LoRA MoE for Efficient CTR Model Scaling Zhichen Zeng et.al. 2510.10432 null
2025-10-11 SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference Liangkun Chen et.al. 2510.10302 null
2025-10-10 MTMD: A Multi-Task Multi-Domain Framework for Unified Ad Lightweight Ranking at Pinterest Xiao Yang et.al. 2510.09857 null
2025-10-10 ARROW: An Adaptive Rollout and Routing Method for Global Weather Forecasting Jindong Tian et.al. 2510.09734 null
2025-10-10 Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation Youwei Zheng et.al. 2510.09094 null
2025-10-09 LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution Xiaohui Li et.al. 2510.08771 null
2025-10-13 dInfer: An Efficient Inference Framework for Diffusion Language Models Yuxin Ma et.al. 2510.08666 null
2025-10-08 Dynamic Mixture-of-Experts for Visual Autoregressive Model Jort Vincenti et.al. 2510.08629 null
2025-10-09 FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts Heming Zou et.al. 2510.08396 null
2025-10-09 Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization Jason Bohne et.al. 2510.08256 null
2025-10-09 From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill Gunjun Lee et.al. 2510.08055 null
2025-10-09 Recycling Pretrained Checkpoints: Orthogonal Growth of Mixture-of-Experts for Efficient Large Language Model Pre-Training Ruizhe Wang et.al. 2510.08008 null
2025-10-09 Multilingual Knowledge Graph Completion via Efficient Multilingual Knowledge Sharing Cunli Mao et.al. 2510.07736 null
2025-10-09 Mutual Learning for Hashing: Unlocking Strong Hash Functions from Weak Supervision Xiaoxu Ma et.al. 2510.07703 null
2025-10-09 LiveThinking: Enabling Real-Time Efficient Reasoning for AI-Powered Livestreaming via Reinforcement Learning Yuhan Sun et.al. 2510.07685 null
2025-10-08 MoGU: Mixture-of-Gaussians with Uncertainty-based Gating for Time Series Forecasting Yoli Shavit et.al. 2510.07459 null
2025-10-08 Less is More: Strategic Expert Selection Outperforms Ensemble Complexity in Traffic Forecasting Walid Guettala et.al. 2510.07426 null
2025-10-08 Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts Fangshuo Liao et.al. 2510.07205 null
2025-10-08 A Bridge from Audio to Video: Phoneme-Viseme Alignment Allows Every Face to Speak Multiple Languages Zibo Su et.al. 2510.06612 null
2025-10-09 SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation Shuang Cheng et.al. 2510.06303 null
2025-10-06 Reproducibility Study of “XRec: Large Language Models for Explainable Recommendation” Ranjan Mishra et.al. 2510.06275 null
2025-10-10 Barbarians at the Gate: How AI is Upending Systems Research Audrey Cheng et.al. 2510.06189 null
2025-10-07 CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credits Kangyu Wang et.al. 2510.06133 null
2025-10-07 Rasterized Steered Mixture of Experts for Efficient 2D Image Regression Yi-Hsin Li et.al. 2510.05814 null
2025-10-07 Mixture of Neuron Experts Runxi Cheng et.al. 2510.05781 null
2025-10-07 MSF-SER: Enriching Acoustic Modeling with Multi-Granularity Semantics for Speech Emotion Recognition Haoxun Li et.al. 2510.05749 null
2025-10-07 Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting Zhongkai Yu et.al. 2510.05497 null
2025-10-06 Stratum: System-Hardware Co-Design with Tiered Monolithic 3D-Stackable DRAM for Efficient MoE Serving Yue Pan et.al. 2510.05245 null
2025-10-06 REN: Anatomically-Informed Mixture-of-Experts for Interstitial Lung Disease Diagnosis Alec K. Peltekian et.al. 2510.04923 null
2025-10-06 LMM-Incentive: Large Multimodal Model-based Incentive Design for User-Generated Content in Web 3.0 Jinbo Wen et.al. 2510.04765 null
2025-10-06 Multilingual Routing in Mixture-of-Experts Lucas Bandarkar et.al. 2510.04694 null
2025-10-06 Improving Multimodal Brain Encoding Model with Dynamic Subject-awareness Routing Xuanhua Yin et.al. 2510.04670 null
2025-10-06 Compressed Convolutional Attention: Efficient Attention in a Compressed Latent Space Tomas Figliolia et.al. 2510.04476 null
2025-10-05 HoRA: Cross-Head Low-Rank Adaptation with Joint Hypernetworks Nghiem T. Diep et.al. 2510.04295 null
2025-10-05 SliceMoE: Routing Embedding Slices Instead of Tokens for Fine-Grained and Balanced Transformer Scaling Harshil Vejendla et.al. 2510.04286 null
2025-10-05 MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition Umberto Cappellazzo et.al. 2510.04136 null
2025-10-03 Mixture of Many Zero-Compute Experts: A High-Rate Quantization Theory Perspective Yehuda Dar et.al. 2510.03151 null
2025-10-02 ElasticMoE: An Efficient Auto Scaling Method for Mixture-of-Experts Models Gursimran Singh et.al. 2510.02613 null
2025-10-02 UpSafe$^\circ$C: Upcycling for Controllable Safety in Large Language Models Yuhao Sun et.al. 2510.02194 null
2025-10-02 LadderMoE: Ladder-Side Mixture of Experts Adapters for Bronze Inscription Recognition Rixin Zhou et.al. 2510.01651 null
2025-10-01 Dirichlet-Prior Shaping: Guiding Expert Specialization in Upcycled MoEs Leyla Mirvakhabova et.al. 2510.01185 null
2025-10-01 Learning Compact Representations of LLM Abilities via Item Response Theory Jianhao Chen et.al. 2510.00844 null
2025-10-01 Graph Integrated Multimodal Concept Bottleneck Model Jiakai Lin et.al. 2510.00701 null
2025-10-01 FAME: Adaptive Functional Attention with Expert Routing for Function-on-Function Regression Yifei Gao et.al. 2510.00621 null
2025-10-01 Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning Minghao Yang et.al. 2510.00570 null
2025-09-30 FlowMoE: A Scalable Pipeline Scheduling Framework for Distributed Mixture-of-Experts Training Yunqi Gao et.al. 2510.00207 null
2025-09-30 Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization Yaoxiang Wang et.al. 2509.26520 null
2025-09-30 Nephrobase Cell+: Multimodal Single-Cell Foundation Model for Decoding Kidney Biology Chenyu Li et.al. 2509.26223 null
2025-09-30 Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline Haiyang Li et.al. 2509.25991 null
2025-09-30 UniMMAD: Unified Multi-Modal and Multi-Class Anomaly Detection via MoE-Driven Feature Decompression Yuan Zhao et.al. 2509.25934 null
2025-09-30 Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel Chuanyang Zheng et.al. 2509.25913 null
2025-10-01 A Multimodal LLM Approach for Visual Question Answering on Multiparametric 3D Brain MRI Arvind Murari Vepa et.al. 2509.25889 null
2025-09-30 Collaborative Compression for Large-Scale MoE Deployment on Edge Yixiao Chen et.al. 2509.25689 null
2025-09-30 LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts Yuan Zhuang et.al. 2509.25684 null
2025-09-30 Guiding Mixture-of-Experts with Temporal Multimodal Interactions Xing Han et.al. 2509.25678 null
2025-09-29 K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model Bangwei Guo et.al. 2509.25594 null
2025-09-29 GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference Yu Han et.al. 2509.25041 null
2025-09-29 LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection Bao-Ngoc Dao et.al. 2509.24547 null
2025-11-03 Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding Zhibin Wang et.al. 2508.21706 null
2025-07-22 Mixture-of-Expert Variational Autoencoders for Cross-Modality Embedding of Type Ia Supernova Data Yunyi Shen et.al. 2507.16817 null
2025-07-22 Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training Zixiao Huang et.al. 2507.16274 null
2025-07-21 Applying multimodal learning to Classify transient Detections Early (AppleCiDEr) I: Data set, methods, and infrastructure Alexandra Junell et.al. 2507.16088 null
2025-07-21 Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation Alessandro B. Melchiorre et.al. 2507.15826 null
2025-07-21 RankMixer: Scaling Up Ranking Models in Industrial Recommenders Jie Zhu et.al. 2507.15551 null
2025-07-21 The New LLM Bottleneck: A Systems Perspective on Latent Attention and Mixture-of-Experts Sungmin Yun et.al. 2507.15465 null
2025-07-21 Universal crystal material property prediction via multi-view geometric fusion in graph transformers Liang Zhang et.al. 2507.15303 null
2025-07-20 CoMoCAVs: Cohesive Decision-Guided Motion Planning for Connected and Autonomous Vehicles with Multi-Policy Reinforcement Learning Pan Hu et.al. 2507.14903 null
2025-07-23 GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving Chi Wan et.al. 2507.14456 null
2025-07-18 SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing Yingying Zhang et.al. 2507.13812 null
2025-07-17 Apple Intelligence Foundation Language Models: Tech Report 2025 Hanzhi Zhou et.al. 2507.13575 null
2025-07-17 R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning Xiaohan Guo et.al. 2507.13107 null
2025-07-16 Astro-MoE: Mixture of Experts for Multiband Astronomical Time Series Martina Cádiz-Leyton et.al. 2507.12611 null
2025-07-16 Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models Gen Luo et.al. 2507.12566 null
2025-07-16 Mixture of Raytraced Experts Andrea Perin et.al. 2507.12419 null
2025-07-16 CorrMoE: Mixture of Experts with De-stylization Learning for Cross-Scene and Cross-Domain Correspondence Pruning Peiwen Xia et.al. 2507.11834 null
2025-07-09 The AI Shadow War: SaaS vs. Edge Computing Architectures Rhea Pritham Marpu et.al. 2507.11545 null
2025-07-15 Mixture of Experts in Large Language Models Danyang Zhang et.al. 2507.11181 null
2025-07-15 Atmos-Bench: 3D Atmospheric Structures for Climate Insight Tianchi Xu et.al. 2507.11085 null
2025-07-14 DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models Luolin Xiong et.al. 2507.09955 null
2025-07-14 ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization Huilai Li et.al. 2507.09945 null
2025-07-14 Multi-residual Mixture of Experts Learning for Cooperative Control in Multi-vehicle Systems Vindula Jayawardana et.al. 2507.09836 null
2025-07-18 Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts Aakash Tripathi et.al. 2507.09754 null
2025-07-13 Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive You Huang et.al. 2507.09612 null
2025-07-12 PPJudge: Towards Human-Aligned Assessment of Artistic Painting Process Shiqi Jiang et.al. 2507.09242 null
2025-07-11 SSH-Passkeys: Leveraging Web Authentication for Passwordless SSH Moe Kayali et.al. 2507.09022 null
2025-07-11 BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity Chenyang Song et.al. 2507.08771 null
2025-07-11 CircFormerMoE: An End-to-End Deep Learning Framework for Circular RNA Splice Site Detection and Pairing in Plant Genomes Tianyou Jiang et.al. 2507.08542 null
2025-07-11 White-Basilisk: A Hybrid Model for Code Vulnerability Detection Ioannis Lamprou et.al. 2507.08540 null
2025-07-21 KAT-V1: Kwai-AutoThink Technical Report Zizheng Zhan et.al. 2507.08297 null
2025-07-11 Data-Driven Dimensional Synthesis of Diverse Planar Four-bar Function Generation Mechanisms via Direct Parameterization Woon Ryong Kim et.al. 2507.08269 null
2025-07-10 MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving Lu Xu et.al. 2507.07818 null
2025-07-10 When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance Peizhang Shao et.al. 2507.07748 null
2025-07-09 Leveraging Manifold Embeddings for Enhanced Graph Transformer Representations and Learning Ankit Jyothish et.al. 2507.07335 null
2025-07-08 Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate A. Bochkov et.al. 2507.07129 null
2025-07-07 Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding Nidhi Bhatia et.al. 2507.07120 null
2025-06-03 Multi-level Mixture of Experts for Multimodal Entity Linking Zhiwei Hu et.al. 2507.07108 null
2025-07-09 4KAgent: Agentic Any Image to 4K Super-Resolution Yushen Zuo et.al. 2507.07105 null
2025-07-11 FlexOlmo: Open Language Models for Flexible Data Use Weijia Shi et.al. 2507.07024 null
2025-07-09 Deep Disentangled Representation Network for Treatment Effect Estimation Hui Meng et.al. 2507.06650 null
2025-07-09 SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference Qian Chen et.al. 2507.06567 null
2025-07-09 MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models Yiwen Liu et.al. 2507.06502 null
2025-07-08 Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation Szymon Płotka et.al. 2507.06363 null
2025-07-08 Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis Xintong Hu et.al. 2507.06116 null
2025-07-09 A Survey on Prompt Tuning Zongqian Li et.al. 2507.06085 null
2025-07-08 Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation Detectors Bing Wang et.al. 2507.05939 null
2025-07-08 What You Have is What You Track: Adaptive and Robust Multimodal Tracking Yuedong Tan et.al. 2507.05899 null
2025-07-21 Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition Zijin Gu et.al. 2507.05724 null
2025-07-08 Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach Xiaobing Chen et.al. 2507.05685 null
2025-07-08 City-Level Foreign Direct Investment Prediction with Tabular Learning on Judicial Data Tianxing Wu et.al. 2507.05651 null
2025-07-07 QMoE: A Quantum Mixture of Experts Framework for Scalable Quantum Neural Networks Hoang-Quan Nguyen et.al. 2507.05190 null
2025-07-07 NTSFormer: A Self-Teaching Graph Transformer for Multimodal Cold-Start Node Classification Jun Hu et.al. 2507.04870 null
2025-07-07 UrbanMind: Towards Urban General Intelligence via Tool-Enhanced Retrieval-Augmented Generation and Multilevel Optimization Kai Yang et.al. 2507.04706 null
2025-07-07 DRAE: Dynamic Retrieval-Augmented Expert Networks for Lifelong Learning and Task Adaptation in Robotics Yayu Long et.al. 2507.04661 null
2025-07-08 UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification Xixi Wan et.al. 2507.04638 null
2025-07-07 Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts Yun Wang et.al. 2507.04631 null
2025-07-06 Nile-Chat: Egyptian Language Models for Arabic and Latin Scripts Guokan Shang et.al. 2507.04569 null
2025-07-22 Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge Linshen Liu et.al. 2507.04123 null
2025-07-05 From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM Xinyi Wu et.al. 2507.03868 null
2025-07-04 Decoupled Relative Learning Rate Schedules Jan Ludziejewski et.al. 2507.03526 null
2025-07-03 Neural Inhibition Improves Dynamic Routing and Mixture of Experts Will Y. Zou et.al. 2507.03221 null
2025-07-02 Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model! Do-hyeon Yoon et.al. 2507.03014 null
2025-07-03 System-performance and cost modeling of Large Language Model training and inference Wenzhe Guo et.al. 2507.02456 null
2025-07-03 NLP4Neuro: Sequence-to-sequence learning for neural population decoding Jacob J. Morra et.al. 2507.02264 null
2025-07-02 MoIRA: Modular Instruction Routing Architecture for Multi-Task Robotics Dmytro Kuzmenko et.al. 2507.01843 null
2025-07-02 GradMetaNet: An Equivariant Architecture for Learning on Gradients Yoav Gelberg et.al. 2507.01649 null
2025-07-02 Mixtures of Neural Network Experts with Application to Phytoplankton Flow Cytometry Data Ethan Pawl et.al. 2507.01375 null
2025-07-02 Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model Chaoxiang Cai et.al. 2507.01351 null
2025-07-02 Dynamical Multimodal Fusion with Mixture-of-Experts for Localizations Bohao Wang et.al. 2507.01337 null
2025-07-02 ExPaMoE: An Expandable Parallel Mixture of Experts for Continual Test-Time Adaptation JianChao Zhao et.al. 2507.00502 null
2025-07-01 MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE Geng Zhang et.al. 2507.00390 null
2025-06-30 Engineering NV Centers via Hydrogen-Driven Defect Chemistry in CVD Diamonds for Quantum Applications: NVHx Dissociations into NV, Origin of 468nm Center, and Cause of Brown Coloration Mubashir Mansoor et.al. 2507.00300 null
2025-06-17 LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing Wenbing Li et.al. 2507.00029 null
2025-06-30 MotionGPT3: Human Motion as a Second Modality Bingfan Zhu et.al. 2506.24086 null
2025-06-30 MReg: A Novel Regression Model with MoE-based Video Feature Mining for Mitral Regurgitation Diagnosis Zhe Liu et.al. 2506.23648 null
2025-06-30 Towards Building Private LLMs: Exploring Multi-Node Expert Parallelism on Apple Silicon for Mixture-of-Experts Large Language Model Mu-Chi Chen et.al. 2506.23635 null
2025-07-01 Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging Lujun Li et.al. 2506.23266 null
2025-06-29 External Data-Enhanced Meta-Representation for Adaptive Probabilistic Load Forecasting Haoran Li et.al. 2506.23201 null
2025-06-29 Hierarchical Corpus-View-Category Refinement for Carotid Plaque Risk Grading in Ultrasound Zhiyuan Zhu et.al. 2506.23108 null
2025-07-01 Hecto: Modular Sparse Experts for Adaptive and Interpretable Reasoning Sanskar Pandey et.al. 2506.22919 null
2025-06-27 QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization Danush Khanna et.al. 2506.22396 null
2025-06-27 Towards Distributed Neural Architectures Aditya Cowsik et.al. 2506.22389 null
2025-06-27 MPipeMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism Zheng Zhang et.al. 2506.22175 null
2025-07-09 DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE Hang Shao et.al. 2506.21864 null
2025-06-21 AdaptGOT: A Pre-trained Model for Adaptive Contextual POI Representation Learning Xiaobin Ren et.al. 2506.21612 null
2025-06-26 Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts Jiajie Yang et.al. 2506.21328 null
2025-06-26 Learning to Skip the Middle Layers of Transformers Tim Lawson et.al. 2506.21103 null
2025-06-26 Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning Haodong Lu et.al. 2506.21035 null
2025-06-26 EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning Xiao Zhang et.al. 2506.20986 null
2025-06-30 The Singapore Consensus on Global AI Safety Research Priorities Yoshua Bengio et.al. 2506.20702 null
2025-06-17 Utility-Driven Speculative Decoding for Mixture-of-Experts Anish Saxena et.al. 2506.20675 null
2025-06-25 Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration Jiaxing Huang et.al. 2506.20282 null
2025-06-24 Integrating Pair Programming as a Work Practice Nina Haugland Andersen et.al. 2506.19511 null
2025-07-05 The Hα line as a probe of chromospheric magnetic fields Harsh Mathur et.al. 2506.19510 null
2025-06-23 Multimodal Anomaly Detection with a Mixture-of-Experts Christoph Willibald et.al. 2506.19077 null
2025-06-23 Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models Zihan Wang et.al. 2506.18945 null
2025-06-23 Shift Happens: Mixture of Experts based Continual Adaptation in Federated Learning Rahul Atul Bhope et.al. 2506.18789 null
2025-06-23 An Audio-centric Multi-task Learning Framework for Streaming Ads Targeting on Spotify Shivam Verma et.al. 2506.18735 null
2025-06-23 Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks Xiaodong Wu et.al. 2506.18543 null
2025-06-23 SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation Zichong Li et.al. 2506.18349 null
2025-06-23 Sharpening the Spear: Adaptive Expert-Guided Adversarial Attack Against DRL-based Autonomous Driving Policies Junchao Fan et.al. 2506.18304 null
2025-06-22 Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection Zheng Zhan et.al. 2506.18145 null
2025-06-21 Incorporating Rather Than Eliminating: Achieving Fairness for Skin Disease Diagnosis Through Group-Specific Expert Gelei Xu et.al. 2506.17787 null
2025-06-21 Physics-informed mixture of experts network for interpretable battery degradation trajectory computation amid second-life complexities Xinghao Huang et.al. 2506.17755 null
2025-06-21 PDC-Net: Pattern Divide-and-Conquer Network for Pelvic Radiation Injury Segmentation Xinyu Xiong et.al. 2506.17712 null
2025-06-20 SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification Zhenglin Lai et.al. 2506.17368 null
2025-07-14 FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE Khiem Le et.al. 2506.16600 null
2025-06-19 Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models Daniel Fidel Harvey et.al. 2506.16419 null
2025-06-19 DCFNet: Doppler Correction Filter Network for Integrated Sensing and Communication in Multi-User MIMO-OFDM Systems Hyeonho Noh et.al. 2506.16191 null
2025-06-17 Scaling Intelligence: Designing Data Centers for Next-Gen Language Models Jesmin Jahan Tithi et.al. 2506.15006 null
2025-06-17 NeuroMoE: A Transformer-Based Mixture-of-Experts Framework for Multi-Modal Neurological Disorder Classification Wajih Hassan Raza et.al. 2506.14970 null
2025-06-17 Narrowing the Gap between TEEs Threat Model and Deployment Strategies Filip Rezabek et.al. 2506.14964 null
2025-05-31 Assembly of Experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors Henrik Klagges et.al. 2506.14794 null
2025-06-19 Integrating Dynamical Systems Learning with Foundational Models: A Meta-Evolutionary AI Framework for Clinical Trials Joseph Geraci et.al. 2506.14782 null
2025-06-17 GMT: General Motion Tracking for Humanoid Whole-Body Control Zixuan Chen et.al. 2506.14770 null
2025-06-17 Exploring Speaker Diarization with Mixture of Experts Gaobin Yang et.al. 2506.14750 null
2025-06-18 Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs Ling Team et.al. 2506.14731 null
2025-09-23 GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors Hengyuan Zhang et.al. 2506.14646 null
2025-06-17 Single-Example Learning in a Mixture of GPDMs with Latent Geometries Jesse St. Amand et.al. 2506.14563 null
2025-06-30 MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation Shen Yuan et.al. 2506.14436 link
2025-06-17 MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models Hongyu Wang et.al. 2506.14435 null
2025-06-17 Less is More: Undertraining Experts Improves Model Upcycling Stefan Horoi et.al. 2506.14126 null
2025-06-16 Load Balancing Mixture of Experts with Similarity Preserving Routers Nabil Omi et.al. 2506.14038 null
2025-06-16 GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics Qianzhong Chen et.al. 2506.14009 null
2025-06-16 MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention MiniMax et.al. 2506.13585 link
2025-06-16 Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization Guanghui Song et.al. 2506.13541 null
2025-07-04 EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization Zhongqian Fu et.al. 2506.13329 link
2025-06-16 Breaking Thought Patterns: A Multi-Dimensional Reasoning Framework for LLMs Xintong Tang et.al. 2506.13192 null
2025-06-19 Serving Large Language Models on Huawei CloudMatrix384 Pengfei Zuo et.al. 2506.12708 null
2025-06-14 Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts Shengzhuang Chen et.al. 2506.12597 null
2025-06-14 Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control Rongpeng Li et.al. 2506.12453 null
2025-06-17 HarMoEny: Efficient Multi-GPU Inference of MoE Models Zachary Doucet et.al. 2506.12417 null
2025-06-14 Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model Chong Li et.al. 2506.12388 null
2025-06-13 Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources? Houyi Li et.al. 2506.12119 null
2025-06-13 Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution Zhangkai Ni et.al. 2506.11823 link
2025-05-21 MoTE: Mixture of Task-specific Experts for Pre-Trained ModelBased Class-incremental Learning Linjie Li et.al. 2506.11038 null
2025-04-23 Test code generation at Ericsson using Program Analysis Augmented Fine Tuned LLMs Sai Krishna et.al. 2506.11006 null
2025-06-12 Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts Zaijing Li et.al. 2506.10357 null
2025-06-12 Technical Report with Proofs for A Full Picture in Conformance Checking: Efficiently Summarizing All Optimal Alignments Philipp Bär et.al. 2506.10345 null
2025-06-13 A Survey of Generative Categories and Techniques in Multimodal Large Language Models Longzhen Han et.al. 2506.10016 null
2025-06-11 GigaChat Family: Efficient Russian Language Modeling Through Mixture of Experts Architecture GigaChat team et.al. 2506.09440 null
2025-06-11 DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts Yuchen Feng et.al. 2506.09351 null
2025-06-11 Ming-Omni: A Unified Multimodal Model for Perception and Generation Inclusion AI et.al. 2506.09344 link
2025-06-10 CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks Yixuan Li et.al. 2506.08931 null
2025-06-10 CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA Jiale Dong et.al. 2506.08496 link
2025-06-11 MedMoE: Modality-Specialized Mixture of Experts for Medical Vision-Language Understanding Shivang Chopra et.al. 2506.08356 null
2025-06-09 Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting Timothée Hornek Amir Sartipi et.al. 2506.08113 null
2025-06-11 STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation Yiming Wang et.al. 2506.08054 link
2025-06-09 A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling Jacob Helwig et.al. 2506.07969 link
2025-06-09 New Insights into the T Tauri Binary Separation Distribution Caleb Eastlund et.al. 2506.07938 null
2025-06-09 M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration Yongzhen Wang et.al. 2506.07814 null
2025-07-23 MIRA: Medical Time Series Foundation Model for Real-World Health Data Hao Li et.al. 2506.07584 null
2025-06-11 MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization Ken Yaggel et.al. 2506.07563 link
2025-06-09 MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts Wei Tao et.al. 2506.07533 null
2025-06-09 Graph-of-Causal Evolution: Challenging Chain-of-Model for Reasoning Libo Wang et.al. 2506.07501 null
2025-06-09 MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing Haiyue Ma et.al. 2506.07366 null
2025-06-08 UNO: Unified Self-Supervised Monocular Odometry for Platform-Agnostic Deployment Wentao Zhao et.al. 2506.07013 null
2025-06-07 High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations Ziwei Li et.al. 2506.06858 null
2025-06-07 Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning Yuan Yuan et.al. 2506.06694 null
2025-06-25 SMAR: Soft Modality-Aware Routing Strategy for MoE-based Multimodal Large Language Models Preserving Language Capabilities Guoyang Xia et.al. 2506.06406 null
2025-05-27 MoE-Gyro: Self-Supervised Over-Range Reconstruction and Denoising for MEMS Gyroscopes Feiyang Pan et.al. 2506.06318 null
2025-06-06 Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization Jonathan Yang et.al. 2506.06196 null
2025-06-06 MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models Jie Cao et.al. 2506.05928 null
2025-06-06 dots.llm1 Technical Report Bi Huo et.al. 2506.05767 null
2025-06-05 Mixture-of-Experts Meets In-Context Reinforcement Learning Wenhao Wu et.al. 2506.05426 null
2025-06-20 Kinetics: Rethinking Test-Time Scaling Laws Ranajoy Sadhukhan et.al. 2506.05333 link
2025-06-05 Lifelong Evolution: Collaborative Learning between Large and Small Language Models for Continuous Emergent Fake News Detection Ziyi Zhou et.al. 2506.04739 null
2025-06-09 FlashDMoE: Fast Distributed MoE in a Single Kernel Osayamen Jonathan Aimuyo et.al. 2506.04667 link
2025-06-04 Out-of-Distribution Graph Models Merging Yidi Wang et.al. 2506.03674 null
2025-06-04 Resolving Task Objective Conflicts in Unified Multimodal Understanding and Generation via Task-Aware Mixture-of-Experts Jiaxing Zhang et.al. 2506.03591 null
2025-06-04 PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs Ze Yu Zhang et.al. 2506.02965 null
2025-06-03 Scaling Fine-Grained MoE Beyond 50B Parameters: Empirical Evaluation and Practical Insights Jakub Krajewski et.al. 2506.02890 null
2025-06-03 Brain-Like Processing Pathways Form in Models With Heterogeneous Experts Jack Cook et.al. 2506.02813 null
2025-06-04 MemoryOut: Learning Principal Features via Multimodal Sparse Filtering Network for Semi-supervised Video Anomaly Detection Juntong Li et.al. 2506.02535 null
2025-06-03 MidPO: Dual Preference Optimization for Safety and Helpfulness in Large Language Models via a Mixture of Experts Framework Yupeng Qi et.al. 2506.02460 null
2025-05-31 Enhancing Multimodal Continual Instruction Tuning with BranchLoRA Duzhen Zhang et.al. 2506.02041 null
2025-06-02 SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model Zhao Yang et.al. 2506.01833 link
2025-06-02 Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning Ryotaro Kawata et.al. 2506.01656 null
2025-06-02 DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models Jiancheng Ye et.al. 2506.01257 null
2025-06-01 Unlocking Personalized Knowledge in Federated Large Language Model: The Power of Mixture of Experts Fan Liu et.al. 2506.00965 null
2025-05-31 FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts Xinyi Wang et.al. 2506.00495 null
2025-05-30 Mixture-of-Experts for Personalized and Semantic-Aware Next Location Prediction Shuai Liu et.al. 2505.24597 null
2025-06-11 Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis Junzhuo Li et.al. 2505.24593 null
2025-05-30 Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer Yilun Kong et.al. 2505.24378 link
2025-05-30 GradPower: Powering Gradients for Faster Language Model Pre-Training Mingze Wang et.al. 2505.24275 null
2025-05-30 On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks Mingze Wang et.al. 2505.24205 null
2025-06-02 Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts Xuweiyi Chen et.al. 2505.23926 null
2025-06-09 Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert Zhaokun Wang et.al. 2505.23868 null
2025-05-29 Revisiting Uncertainty Estimation and Calibration of Large Language Models Linwei Tao et.al. 2505.23854 null
2025-05-28 EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models Linglin Jing et.al. 2505.23830 null
2025-06-03 LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions Hadi Askari et.al. 2505.23811 null
2025-05-29 From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents Tobias Lindenbauer et.al. 2505.23422 link
2025-05-29 Context-Aware Semantic Communication for the Wireless Networks Guangyuan Liu et.al. 2505.23249 null
2025-05-29 Two Is Better Than One: Rotations Scale LoRAs Hongcan Guo et.al. 2505.23184 null
2025-05-28 HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer Qi Cai et.al. 2505.22705 link
2025-05-28 Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts Xue Zhang et.al. 2505.22582 null
2025-05-28 A Human-Centric Approach to Explainable AI for Personalized Education Vinitra Swamy et.al. 2505.22541 link
2025-05-28 Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion Kewen Chen et.al. 2505.22360 null
2025-05-28 Advancing Expert Specialization for Better MoE Hongcan Guo et.al. 2505.22323 null
2025-05-28 ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation Jiawen Yu et.al. 2505.22159 null
2025-05-28 On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition Shujie HU et.al. 2505.22072 null
2025-05-28 AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation Yan Rong et.al. 2505.22053 null
2025-05-29 ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge Zhongyi Zhou et.al. 2505.21906 null
2025-05-27 MedBridge: Bridging Foundation Vision-Language Models to Medical Image Diagnosis Yitong Li et.al. 2505.21698 null
2025-05-23 EvidenceMoE: A Physics-Guided Mixture-of-Experts with Evidential Critics for Advancing Fluorescence Light Detection and Ranging in Scattering Media Ismail Erbas et.al. 2505.21532 null
2025-05-29 Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity Yehui Tang et.al. 2505.21411 null
2025-05-27 Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM’s Instruction-Following Capabilities Junyan Zhang et.al. 2505.21191 null
2025-05-27 Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts Yue Zhang et.al. 2505.21079 null
2025-05-27 Multi-objective Large Language Model Alignment with Hierarchical Experts Zhuo Li et.al. 2505.20925 null
2025-05-27 FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models Hao Kang et.al. 2505.20225 null
2025-06-01 NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID Shihao Li et.al. 2505.20001 null
2025-05-26 Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments Junming Liu et.al. 2505.19699 null
2025-06-13 MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE Zongle Huang et.al. 2505.19645 null
2025-05-26 Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate Liangwei Nathan Zheng et.al. 2505.19525 link
2025-05-26 WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference Sihan Chen et.al. 2505.19427 link
2025-05-25 RankLLM: A Python Package for Reranking with LLMs Sahel Sharifymoghaddam et.al. 2505.19284 null
2025-05-25 I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts Jiayi Xin et.al. 2505.19190 link
2025-05-24 TrajMoE: Spatially-Aware Mixture of Experts for Unified Human Mobility Modeling Chonghua Han et.al. 2505.18670 null
2025-05-24 ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation Jian Liang et.al. 2505.18640 link
2025-07-02 Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter Weizhi Zhong et.al. 2505.18612 null
2025-05-24 Guiding the Experts: Semantic Priors for Efficient and Focused MoE Routing Chengxi Min et.al. 2505.18586 link
2025-05-24 Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment Tuning Aofei Chang et.al. 2505.18503 null
2025-05-24 On Minimax Estimation of Parameters in Softmax-Contaminated Mixture of Experts Fanqi Yan et.al. 2505.18455 null
2025-05-24 μ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts Toshiaki Koike-Akino et.al. 2505.18451 null
2025-05-23 Betelgeuse’s Buddy: X-Ray Constraints on the Nature of α Ori B Anna J. G. O’Grady et.al. 2505.18376 null
2025-05-23 Betelgeuse, Betelgeuse, Betelgeuse, Betel-buddy? Constraints on the dynamical companion to α Orionis from HST Jared A. Goldberg et.al. 2505.18375 null
2025-05-13 Constrained Edge AI Deployment: Fine-Tuning vs Distillation for LLM Compression Jacob Sander et.al. 2505.18166 null
2025-05-23 Enhancing CTR Prediction with De-correlated Expert Networks Jiancheng Wang et.al. 2505.17925 null
2025-05-23 PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval Zehua Pei et.al. 2505.17639 null
2025-05-23 CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning Jinyuan Feng et.al. 2505.17553 null
2025-05-31 MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation Kaixing Yang et.al. 2505.17543 null
2025-07-04 JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model Qihao Duan et.al. 2505.17257 null
2025-05-31 TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling Weizhe Lin et.al. 2505.17155 null
2025-05-22 DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving Zhenjie Yang et.al. 2505.16278 null
2025-05-22 DualComp: End-to-End Learning of a Unified Dual-Modality Lossless Compressor Yan Zhao et.al. 2505.16256 null
2025-05-21 Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models Jingcong Liang et.al. 2505.16056 link
2025-05-26 MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding Yuxiang Wei et.al. 2505.15946 null
2025-05-21 Who “Controls” Where Work Shall be Done? State-of-Practice in Post-Pandemic Remote Work Regulation Darja Smite et.al. 2505.15743 null
2025-05-21 CoLA: Collaborative Low-Rank Adaptation Yiyun Zhou et.al. 2505.15471 link
2025-07-04 Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought Tencent Hunyuan Team et.al. 2505.15431 null
2025-05-21 Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks Uranik Berisha et.al. 2505.15414 null
2025-05-21 Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites Xintong Wang et.al. 2505.15297 null
2025-05-21 Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines Xiaohou Shi et.al. 2505.15151 null
2025-05-20 Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies Haoyi Qiu et.al. 2505.14972 link
2025-05-30 TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis Yu Zhang et.al. 2505.14910 link
2025-05-20 Balanced and Elastic End-to-end Training of Dynamic LLMs Mohamed Wahib et.al. 2505.14864 null
2025-05-20 Solving MNIST with a globally trained Mixture of Quantum Experts Paolo Alessandro Xavier Tognini et.al. 2505.14789 null
2025-05-27 Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training Mengru Wang et.al. 2505.14681 null
2025-05-21 Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach Umberto Cappellazzo et.al. 2505.14336 null
2025-05-20 FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation Shaolin Zhu et.al. 2505.14256 null
2025-05-20 THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation Yunlong Liang et.al. 2505.14173 null
2025-05-20 Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition Shuo Zhang et.al. 2505.14143 null
2025-05-20 Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging Ryo Bertolissi et.al. 2505.14136 null
2025-05-20 Generalizable Multispectral Land Cover Classification via Frequency-Aware Mixture of Low-Rank Token Experts Xi Chen et.al. 2505.14088 null
2025-05-20 StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning Huaijie Wang et.al. 2505.13997 null
2025-05-20 Towards Rehearsal-Free Continual Relation Extraction: Capturing Within-Task Variance with Adaptive Prompting Bao-Ngoc Dao et.al. 2505.13944 link
2025-05-27 U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding Ziqian Wang et.al. 2505.13880 link
2025-05-20 EfficientLLM: Efficiency in Large Language Models Zhengqing Yuan et.al. 2505.13840 null
2025-05-19 CompeteSMoE – Statistically Guaranteed Mixture of Experts Training via Competition Nam V. Nguyen et.al. 2505.13380 link
2025-05-19 Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference Shuqing Luo et.al. 2505.13345 link
2025-05-19 Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models Lucas Berry et.al. 2505.13273 null
2025-05-19 True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics Christoph Jürgen Hemmer et.al. 2505.13192 null
2025-05-23 Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures Tuan Thai et.al. 2505.13052 null
2025-05-19 TransferTraj: A Vehicle Trajectory Learning Model for Region and Task Transferability Tonglong Wei et.al. 2505.12672 null
2025-05-30 Scene-Adaptive Motion Planning with Explicit Mixture of Experts and Interaction-Oriented Optimization Hongbiao Zhu et.al. 2505.12311 null
2025-05-22 Model Merging in Pre-training of Large Language Models Yunshui Li et.al. 2505.12082 null
2025-05-22 Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition Runduo Han et.al. 2505.12007 link
2025-05-17 MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging Zihuan Qiu et.al. 2505.11883 null
2025-05-17 Improving Coverage in Combined Prediction Sets with Weighted p-values Gina Wong et.al. 2505.11785 null
2025-05-16 HessFormer: Hessians at Foundation Scale Diego Granziol et.al. 2505.11564 null
2025-05-10 PRIME: Physics-Related Intelligent Mixture of Experts for Transistor Characteristics Prediction Zhenxing Dou et.al. 2505.11523 null
2025-05-19 MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production Chao Jin et.al. 2505.11432 null
2025-05-21 MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems Yinsicheng Jiang et.al. 2505.11415 null
2025-05-16 A Fast Kernel-based Conditional Independence test with Application to Causal Discovery Oliver Schacht et.al. 2505.11085 null
2025-05-16 On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating Huy Nguyen et.al. 2505.10860 null
2025-05-14 PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning Zongqian Li et.al. 2505.09519 link
2025-05-14 Qwen3 Technical Report An Yang et.al. 2505.09388 link
2025-05-14 Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures Chenggang Zhao et.al. 2505.09343 null
2025-05-29 Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony Shaoyu Wang et.al. 2505.08944 null
2025-05-13 PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts Yang Su et.al. 2505.08719 null
2025-05-25 AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale Yunjie Ji et.al. 2505.08311 null
2025-05-12 UMoE: Unifying Attention and FFN with Shared Experts Yuanhang Yang et.al. 2505.07260 null
2025-05-11 Seed1.5-VL Technical Report Dong Guo et.al. 2505.07062 null
2025-05-21 FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers Tianyu Chen et.al. 2505.06858 null
2025-05-11 The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts Enric Boix-Adsera et.al. 2505.06839 null
2025-05-10 Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free Zihan Qiu et.al. 2505.06708 link
2025-05-30 Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding Dawei Huang et.al. 2505.06685 link
2025-05-10 QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration HamidReza Imani et.al. 2505.06481 null
2025-05-06 A Sensitivity-Driven Expert Allocation Method in LoRA-MoE for Efficient Fine-Tuning Junzhou Xu et.al. 2505.06272 null
2025-05-12 FloE: On-the-Fly MoE Inference on Memory-constrained GPU Yuxin Zhou et.al. 2505.05950 null
2025-05-09 MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design Haojie Duanmu et.al. 2505.05799 link
2025-05-10 SDR-RDMA: Software-Defined Reliability Architecture for Planetary Scale RDMA Communication Mikhail Khalilov et.al. 2505.05366 null
2025-05-08 Divide-and-Conquer: Cold-Start Bundle Recommendation via Mixture of Diffusion Experts Ming Li et.al. 2505.05035 null
2025-05-07 Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs Yehui Tang et.al. 2505.04519 null
2025-05-07 SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios Ning Cheng et.al. 2505.04201 null
2025-05-07 LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress? Teddy Foley et.al. 2505.04075 link
2025-05-07 Shadow Wireless Intelligence: Large Language Model-Driven Reasoning in Covert Communications Yuanai Xie et.al. 2505.04068 null
2025-05-24 Perception-Informed Neural Networks: Beyond Physics-Informed Neural Networks Mehran Mazandarani et.al. 2505.03806 null
2025-05-02 MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance Xing Hu et.al. 2505.03804 null
2025-05-06 Towards Smart Point-and-Shoot Photography Jiawan Li et.al. 2505.03638 null
2025-05-06 Faster MoE LLM Inference for Extremely Large Models Haoqi Yang et.al. 2505.03531 null
2025-05-06 STAR-Rec: Making Peace with Length Variance and Pattern Diversity in Sequential Recommendation Maolin Wang et.al. 2505.03484 null
2025-05-06 3D Gaussian Splatting Data Compression with Mixture of Priors Lei Liu et.al. 2505.03310 null
2025-05-05 Finger Pose Estimation for Under-screen Fingerprint Sensor Xiongjun Guan et.al. 2505.02481 link
2025-05-05 Multimodal Deep Learning-Empowered Beam Prediction in Future THz ISAC Systems Kai Zhang et.al. 2505.02381 null
2025-05-08 Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques Sanjay Surendranath Girija et.al. 2505.02309 null
2025-05-04 Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields Zhenxing Mi et.al. 2505.02005 link
2025-05-03 Backdoor Attacks Against Patch-based Mixture of Experts Cedric Chan et.al. 2505.01811 link
2025-05-01 MoxE: Mixture of xLSTM Experts with Entropy-Aware Routing for Efficient Language Modeling Abdoul Majid O. Thiombiano et.al. 2505.01459 null
2025-05-02 Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders Rogelio A Mancisidor et.al. 2505.01134 null
2025-05-02 CoCoAFusE: Beyond Mixtures of Experts via Model Fusion Aurelio Raffa Ugolini et.al. 2505.01105 null
2025-05-01 Improving Routing in Sparse Mixture of Experts with Graph of Tokens Tam Nguyen et.al. 2505.00792 null
2025-05-01 CICADA: Cross-Domain Interpretable Coding for Anomaly Detection and Adaptation in Multivariate Time Series Tian Lan et.al. 2505.00415 null
2025-05-01 Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing Piotr Piękos et.al. 2505.00315 link
2025-04-30 Online Federation For Mixtures of Proprietary Agents with Black-Box Encoders Xuwei Yang et.al. 2505.00216 null
2025-05-08 Identifying Critical Dependencies in Large-Scale Continuous Software Engineering Anastasiia Tkalich et.al. 2504.21437 null
2025-04-29 TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts Pradip Kunwar et.al. 2504.21190 null
2025-04-29 Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization Shuai Gong et.al. 2504.21063 null
2025-04-26 PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight Ben Goertzel et.al. 2504.21029 null
2025-04-29 In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer Zechuan Zhang et.al. 2504.20690 null
2025-05-30 ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting Yu Zhang et.al. 2504.20630 null
2025-04-29 MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification Yichu Xu et.al. 2504.20509 null
2025-04-29 FT-MoE: Sustainable-learning Mixture of Experts Model for Fault-Tolerant Computing with Multiple Tasks Wenjing Xiao et.al. 2504.20446 null
2025-04-29 MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation Amaan Izhar et.al. 2504.20343 link
2025-04-28 Accelerating Mixture-of-Experts Training with Adaptive Expert Replication Athinagoras Skiadopoulos et.al. 2504.19925 null
2025-04-28 DUETS: Setting expectations for asteroseismic binaries and binary products with synthetic populations A. Mazzi et.al. 2504.19866 null
2025-04-28 Decentralization of Generative AI via Mixture of Experts for Wireless Networks: A Comprehensive Survey Yunting Xu et.al. 2504.19660 null
2025-05-04 ARTEMIS: Autoregressive End-to-End Trajectory Planning with Mixture of Experts for Autonomous Driving Renju Feng et.al. 2504.19580 link
2025-05-30 Versatile Framework for Song Generation with Prompt-based Control Yu Zhang et.al. 2504.19062 null
2025-04-29 BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts Qingyue Wang et.al. 2504.18598 null
2025-04-25 NoEsis: Differentially Private Knowledge Transfer in Modular LLM Adaptation Rob Romijnders et.al. 2504.18147 null
2025-05-15 TGDT: A Temporal Graph-based Digital Twin for Urban Traffic Corridors Nooshin Yousefzadeh et.al. 2504.18008 null
2025-06-11 Unveiling the Hidden: Movie Genre and User Bias in Spoiler Detection Haokai Zhang et.al. 2504.17834 link
2025-04-22 Compass-V2 Technical Report Sophia Maria et.al. 2504.15527 null
2025-04-21 Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images Jonathan Brokman et.al. 2504.15470 link
2025-04-17 D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving Haodong Wang et.al. 2504.15299 null
2025-04-23 MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core Dennis Liu et.al. 2504.14960 null
2025-04-20 Evaluating Temporal Plasticity in Foundation Time Series Models for Incremental Fine-tuning Jia Liu et.al. 2504.14677 null
2025-04-29 Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning ByteDance Seed et.al. 2504.13914 null
2025-04-18 Multi-Type Context-Aware Conversational Recommender Systems via Mixture-of-Experts Jie Zou et.al. 2504.13655 null
2025-04-18 HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering Alexander Rusnak et.al. 2504.13590 null
2025-04-18 Dense Backpropagation Improves Training for Sparse Mixture-of-Experts Ashwinee Panda et.al. 2504.12463 link
2025-04-16 Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models Yuanbo Tang et.al. 2504.12359 null
2025-04-16 Trend Filtered Mixture of Experts for Automated Gating of High-Frequency Flow Cytometry Data Sangwon Hyun et.al. 2504.12287 null
2025-04-16 The Discovery of Two Quadruple Star Systems with the Second and Third Shortest Outer Periods Brian P. Powell et.al. 2504.12239 null
2025-04-16 MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models Hang Yuan et.al. 2504.12234 null
2025-04-13 Transmission of low energy electrons through a polyethylene terephthalate 800-nm diameter nanocapillary Li Pengfei et.al. 2504.11479 null
2025-04-15 Simulation-based inference for stochastic nonlinear mixed-effects models with applications in systems biology Henrik Häggström et.al. 2504.11279 link
2025-05-22 Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability Jiani Liu et.al. 2504.10804 null
2025-04-14 Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning LeiLei Ma et.al. 2504.09990 null
2025-04-14 DeepCompile: A Compiler-Driven Approach to Optimizing Distributed Deep Learning Training Masahiro Tanaka et.al. 2504.09983 null
2025-04-14 Multi-objective Bayesian Optimization With Mixed-categorical Design Variables for Expensive-to-evaluate Aeronautical Applications Nathalie Bartoli et.al. 2504.09930 null
2025-04-14 Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming Zhiqiang He et.al. 2504.09906 null
2025-04-13 Mixture-of-Shape-Experts (MoSE): End-to-End Shape Dictionary Framework to Prompt SAM for Generalizable Medical Segmentation Jia Wei et.al. 2504.09601 null
2025-04-12 MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints Yichao Yuan et.al. 2504.09345 null
2025-04-12 Mixture of Group Experts for Learning Invariant Representations Lei Kang et.al. 2504.09265 null
2025-04-12 Exploring Modality Disruption in Multimodal Fake News Detection Moyang Liu et.al. 2504.09154 null
2025-05-08 RouterKT: Mixture-of-Experts for Knowledge Tracing Han Liao et.al. 2504.08989 null
2025-03-23 ExpertRAG: Efficient RAG with Mixture of Experts – Optimizing Context Retrieval for Adaptive LLM Responses Esmail Gumaan et.al. 2504.08744 null
2025-04-11 Regularized infill criteria for multi-objective Bayesian optimization with application to aircraft design Robin Grapin et.al. 2504.08671 null
2025-04-11 Millions of States: Designing a Scalable MoE Architecture with RWKV-7 Meta-learner Liu Xiao et.al. 2504.08247 null
2025-04-10 C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing Zhongyang Li et.al. 2504.07964 link
2025-04-11 Scaling Laws for Native Multimodal Models Mustafa Shukor et.al. 2504.07951 null
2025-04-10 Cluster-Driven Expert Pruning for Mixture-of-Experts Large Language Models Hongcheng Guo et.al. 2504.07807 link
2025-04-10 Adaptive Detection of Fast Moving Celestial Objects Using a Mixture of Experts and Physical-Inspired Neural Network Peng Jia et.al. 2504.07777 null
2025-04-15 Kimi-VL Technical Report Kimi Team et.al. 2504.07491 link
2025-04-09 MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution Zhe Wang et.al. 2504.07308 link
2025-04-11 Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models Ling Team et.al. 2504.07158 null
2025-05-28 Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations Zican Dong et.al. 2504.06792 null
2025-04-24 FedMerge: Federated Personalization via Model Merging Shutong Chen et.al. 2504.06768 null
2025-04-08 S’MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning Hanqing Zeng et.al. 2504.06426 null
2025-04-08 HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference Shuzhang Zhong et.al. 2504.05897 link
2025-04-08 Adaptive Substructure-Aware Expert Model for Molecular Property Prediction Tianyi Jiang et.al. 2504.05844 null
2025-04-10 Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations Ajay Jaiswal et.al. 2504.05586 null
2025-04-07 SUEDE: Shared Unified Experts for Physical-Digital Face Attack Detection Enhancement Zuying Xie et.al. 2504.04818 null
2025-04-06 On the Spatial Structure of Mixture-of-Experts in Transformers Daniel Bershatsky et.al. 2504.04444 null
2025-04-05 Collaboration and Controversy Among Experts: Rumor Early Detection by Tuning a Comment Generator Bing Wang et.al. 2504.04076 link
2025-04-04 HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs Yongji Wu et.al. 2504.03871 null
2025-04-01 Detecting Financial Fraud with Hybrid Deep Learning: A Mix-of-Experts Approach to Sequential and Anomalous Patterns Diego Vallarino et.al. 2504.03750 null
2025-04-01 A Unified Virtual Mixture-of-Experts Framework: Enhanced Inference and Hallucination Mitigation in Single-Model System Mingyan Liu et.al. 2504.03739 null
2025-03-26 A multi-scale lithium-ion battery capacity prediction using mixture of experts and patch-based MLP Yuzhu Lei et.al. 2504.03706 link
2025-04-04 RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation Hanbo Bi et.al. 2504.03166 null
2025-06-01 TeleMoM: Consensus-Driven Telecom Intelligence via Mixture of Models Xinquan Wang et.al. 2504.02712 null
2025-04-07 MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators Beichen Huang et.al. 2504.02658 link
2025-04-24 Cognitive Memory in Large Language Models Lianlei Shan et.al. 2504.02441 null
2025-04-23 MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism Ruidong Zhu et.al. 2504.02263 null
2025-04-20 Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design Mohan Zhang et.al. 2504.01337 null
2025-04-01 Mixture-of-Experts for Distributed Edge Computing with Channel-Aware Gating Function Qiuchen Song et.al. 2504.00819 null
2025-04-01 DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism Dengchun Li et.al. 2504.00661 link
2025-04-01 CFP: Low-overhead Profiling-based Intra-operator Parallelism Generation by Preserving Communication-Free Structures Weifang Hu et.al. 2504.00598 null
2025-04-01 Continual Cross-Modal Generalization Yan Xia et.al. 2504.00561 null
2025-04-01 Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection Shunxin Chen et.al. 2504.00458 null
2025-03-31 Unimodal-driven Distillation in Multimodal Emotion Recognition with Dynamic Fusion Jiagen Li et.al. 2503.23721 null
2025-05-16 Mixture of Routers Jia-Chen Zhang et.al. 2503.23362 null
2025-05-25 MoLAE: Mixture of Latent Experts for Parameter-Efficient Language Models Zehua Liu et.al. 2503.23100 null
2025-03-29 S2MoE: Robust Sparse Mixture of Experts via Stochastic Learning Giang Do et.al. 2503.23007 null
2025-03-29 Sparse Mixture of Experts as Unified Competitive Learning Giang Do et.al. 2503.22996 null
2025-03-26 Reasoning Beyond Limits: Advances and Open Problems for LLMs Mohamed Amine Ferrag et.al. 2503.22732 null
2025-04-01 Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities Raman Dutt et.al. 2503.22517 null
2025-04-29 RocketPPA: Ultra-Fast LLM-Based PPA Estimator at Code-Level Abstraction Armin Abdollahi et.al. 2503.21971 null
2025-05-08 Binarity at LOw Metallicity (BLOeM): Enhanced multiplicity of early B-type dwarfs and giants at $Z=0.2\,{\rm Z}_\odot$ J. I. Villaseñor et.al. 2503.21936 null
2025-03-27 iMedImage Technical Report Ran Wei et.al. 2503.21836 null
2025-03-27 LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models Hengyuan Zhao et.al. 2503.21227 null
2025-05-17 MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness Zihao Zheng et.al. 2503.21135 null
2025-03-26 Optimal Scaling Laws for Efficiency Gains in a Theoretical Transformer-Augmented Sectional MoE Framework Soham Sane et.al. 2503.20750 null
2025-03-26 UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines Chen Tang et.al. 2503.20748 null
2025-03-26 Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning Sashuai Zhou et.al. 2503.20633 null
2025-04-14 MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation Rongyu Zhang et.al. 2503.20384 null
2025-03-26 Modality-Independent Brain Lesion Segmentation with Privacy-aware Continual Learning Yousef Sadegheih et.al. 2503.20326 link
2025-03-31 Resilient Sensor Fusion under Adverse Sensor Failures via Multi-Modal Expert Fusion Konyul Park et.al. 2503.19776 null
2025-04-30 BiPrompt-SAM: Enhancing Image Segmentation via Explicit Selection between Point and Text Prompts Suzhe Xu et.al. 2503.19769 null
2025-03-25 M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation Ziyuan Liu et.al. 2503.19406 null
2025-04-21 Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design Rui Xie et.al. 2503.18869 null
2025-04-30 Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding Tianyu Chen et.al. 2503.18578 null
2025-03-24 SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking Wenrui Cai et.al. 2503.18338 null
2025-04-01 Challenging Dataset and Multi-modal Gated Mixture of Experts Model for Remote Sensing Copy-Move Forgery Understanding Ze Zhang et.al. 2503.18104 link
2025-03-22 Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM Codefuse et.al. 2503.17793 null
2025-03-25 Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts Yike Yuan et.al. 2503.16057 null
2025-03-21 UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations Debabrata Mandal et.al. 2503.15868 null
2025-03-20 Mixture of Lookup Experts Shibo Jie et.al. 2503.15798 link
2025-03-21 Leveraging MoE-based Large Language Model for Zero-Shot Multi-Task Semantic Communication Sin-Yu Huang et.al. 2503.15722 null
2025-04-29 SemEval-2025 Task 1: AdMIRe – Advancing Multimodal Idiomaticity Representation Thomas Pickard et.al. 2503.15358 null
2025-03-21 Body-Hand Modality Expertized Networks with Cross-attention for Fine-grained Skeleton Action Recognition Seungyeon Cho et.al. 2503.14960 null
2025-03-18 Core-Periphery Principle Guided State Space Model for Functional Connectome Classification Minheng Chen et.al. 2503.14655 null
2025-03-18 DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers Minglei Shi et.al. 2503.14487 null
2025-03-18 MAST-Pro: Dynamic Mixture-of-Experts for Adaptive Segmentation of Pan-Tumors with Knowledge-Driven Prompts Runqi Meng et.al. 2503.14355 null
2025-03-18 Frac-Connections: Fractional Extension of Hyper-Connections Defa Zhu et.al. 2503.14125 null
2025-03-18 SNAKE: A Sustainable and Multi-functional Traffic Analysis System utilizing Specialized Large-Scale Models with a Mixture of Experts Architecture Tian Qin et.al. 2503.13808 null
2025-03-13 Ensemble Learning for Large Language Models in Text and Code Generation: A Survey Mari Ashiga et.al. 2503.13505 null
2025-03-17 Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge Shengling Qin et.al. 2503.13421 null
2025-05-10 Channel Estimation for Pinching-Antenna Systems (PASS) Jian Xiao et.al. 2503.13268 null
2025-03-17 Federated Mixture-of-Expert for Non-Overlapped Cross-Domain Sequential Recommendation Yu Liu et.al. 2503.13254 null
2025-05-21 Fast filtering of non-Gaussian models using Amortized Optimal Transport Maps Mohammad Al-Jarrah et.al. 2503.12633 link
2025-03-16 MoECollab: Democratizing LLM Development Through Collaborative Mixture of Experts Harshit et.al. 2503.12592 null
2025-03-16 MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification Jianwei Zhao et.al. 2503.12401 null
2025-05-10 Adaptive Mixture of Low-Rank Experts for Robust Audio Spoofing Detection Qixian Chen et.al. 2503.12010 null
2025-03-14 FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-the-World LoRA Jieming Bian et.al. 2503.11880 null
2025-03-10 MELON: Multimodal Mixture-of-Experts with Spectral-Temporal Fusion for Long-Term Mobility Estimation in Critical Care Jiaqing Zhang et.al. 2503.11695 null
2025-03-14 A Review of DeepSeek Models’ Key Innovative Techniques Chengen Wang et.al. 2503.11486 null
2025-03-14 MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling Rachel S. Y. Teo et.al. 2503.11144 link
2025-03-13 Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores Chenpeng Wu et.al. 2503.10725 link
2025-05-19 dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis Luyuan Xie et.al. 2503.10412 null
2025-04-10 Continual Text-to-Video Retrieval with Frame Fusion and Task-Aware Routing Zecheng Zhao et.al. 2503.10111 link
2025-03-12 MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching Tairan Xu et.al. 2503.09716 null
2025-03-12 Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework Bakary Badjie et.al. 2503.09504 null
2025-03-12 Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment Nazanin Moradinasab et.al. 2503.09498 link
2025-04-01 Astrea: A MOE-based Visual Understanding Model with Progressive Alignment Xiaoda Yang et.al. 2503.09445 null
2025-03-12 Automatic Operator-level Parallelism Planning for Distributed Deep Learning – A Mixed-Integer Programming Approach Ruifeng She et.al. 2503.09357 null
2025-03-12 Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference Mohammad Siavashi et.al. 2503.09304 null
2025-03-13 FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models Fufangchen Zhao et.al. 2503.09158 null
2025-05-22 MoE-Loco: Mixture of Experts for Multitask Locomotion Runhan Huang et.al. 2503.08564 null
2025-03-11 BoundarEase: Fostering Constructive Community Engagement to Inform More Equitable Student Assignment Policies Cassandra Overney et.al. 2503.08543 link
2025-03-11 Accelerating MoE Model Inference with Expert Sharding Oana Balmau et.al. 2503.08467 null
2025-03-26 Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models Junzhe Li et.al. 2503.08120 null
2025-03-11 MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models Han Zhao et.al. 2503.08007 null
2025-03-10 Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM Yongqiang Yao et.al. 2503.07680 null
2025-04-01 TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster Kanghui Ning et.al. 2503.07649 null
2025-03-05 BrainNet-MoE: Brain-Inspired Mixture-of-Experts Learning for Neurological Disease Identification Jing Zhang et.al. 2503.07640 null
2025-03-05 Mixture of Experts Made Intrinsically Interpretable Xingyi Yang et.al. 2503.07639 null
2025-03-26 GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts Minwen Liao et.al. 2503.07417 null
2025-04-18 A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications Siyuan Mu et.al. 2503.07137 link
2025-03-10 VMTS: Vision-Assisted Teacher-Student Reinforcement Learning for Multi-Terrain Locomotion in Bipedal Robots Fu Chen et.al. 2503.07049 link
2025-03-10 ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration Mengting Ai et.al. 2503.06881 link
2025-03-10 eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference Suraiya Tairin et.al. 2503.06823 null
2025-03-09 MoFE: Mixture of Frozen Experts Architecture Jean Seo et.al. 2503.06491 null
2025-03-25 Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models Nguyen Do et.al. 2503.06413 link
2025-03-08 MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering Vinay Kumar Verma et.al. 2503.06296 null
2025-03-08 A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts Wenzhuo Du et.al. 2503.06064 null
2025-03-08 MANDARIN: Mixture-of-Experts Framework for Dynamic Delirium and Coma Prediction in ICU Patients: Development and Validation of an Acute Brain Dysfunction Prediction Model Miguel Contreras et.al. 2503.06059 null
2025-03-08 GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices Xudong Lu et.al. 2503.06019 null
2025-03-03 How Do Consumers Really Choose: Exposing Hidden Preferences with the Mixture of Experts Model Diego Vallarino et.al. 2503.05800 null
2025-03-11 Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning Justin Chih-Yao Chen et.al. 2503.05641 null
2025-03-07 FMT: A Multimodal Pneumonia Detection Model Based on Stacking MOE Framework Jingyu Xu et.al. 2503.05626 null
2025-04-15 Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts Weigao Sun et.al. 2503.05447 link
2025-03-10 Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs Ling Team et.al. 2503.05139 null
2025-03-07 Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts Shwai He et.al. 2503.05066 null
2025-03-06 Continual Pre-training of MoEs: How robust is your router? Benjamin Thérien et.al. 2503.05029 null
2025-02-25 Comparative Analysis Based on DeepSeek, ChatGPT, and Google Gemini: Features, Techniques, Performance, Future Prospects Anichur Rahman et.al. 2503.04783 null
2025-03-19 Predictable Scale: Part I – Optimal Hyperparameter Scaling Law in Large Language Model Pretraining Houyi Li et.al. 2503.04715 null
2025-03-07 Question-Aware Gaussian Experts for Audio-Visual Question Answering Hongyeob Kim et.al. 2503.04459 link
2025-03-19 Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling Yan Li et.al. 2503.04398 null
2025-03-06 A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery Yiheng Zhu et.al. 2503.04362 null
2025-03-06 Quantum metric induced magneto-optical effects in $\mathcal{PT}$-symmetric antiferromagnets Yongpan Li et.al. 2503.04312 null
2025-03-06 DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval Yating Liu et.al. 2503.04144 null
2025-03-05 VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection Enkhtogtokh Togootogtokh et.al. 2503.03797 link
2025-03-09 Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs Haoran Fan et.al. 2503.03594 link
2025-03-06 Convergence Rates for Softmax Gating Mixture of Experts Huy Nguyen et.al. 2503.03213 null
2025-03-04 MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation Weihang Wang et.al. 2503.02799 link
2025-03-04 FinArena: A Human-Agent Collaboration Framework for Financial Market Analysis and Forecasting Congluo Xu et.al. 2503.02692 null
2025-03-06 Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer Yujiao Yang et.al. 2503.02495 link
2025-03-04 Tabby: Tabular Data Synthesis with Language Models Sonia Cromp et.al. 2503.02152 null
2025-03-03 ECG-EmotionNet: Nested Mixture of Expert (NMoE) Adaptation of ECG-Foundation Model for Driver Emotion Recognition Nastaran Mansourian et.al. 2503.01750 null
2025-03-03 Effective High-order Graph Representation Learning for Credit Card Fraud Detection Yao Zou et.al. 2503.01556 null
2025-03-03 DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models Yongqi Huang et.al. 2503.01359 null
2025-03-03 PROPER: A Progressive Learning Framework for Personalized Large Language Models with Group-Level Adaptation Linhai Zhang et.al. 2503.01303 null
2025-03-03 Unify and Anchor: A Context-Aware Transformer for Cross-Domain Time Series Forecasting Xiaobin Hong et.al. 2503.01157 null
2025-03-02 Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion Daiki Nishiyama et.al. 2503.00925 null
2025-03-01 Efficiently Editing Mixture-of-Experts Models with Compressed Experts Yifei He et.al. 2503.00634 null
2025-03-01 CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering Tianyu Huai et.al. 2503.00413 null
2025-02-28 CoSMoEs: Compact Sparse Mixture of Experts Patrick Huber et.al. 2503.00245 null
2025-02-26 Omni-SILA: Towards Omni-scene Driven Visual Sentiment Identifying, Locating and Attributing in Videos Jiamin Luo et.al. 2503.00049 null
2025-03-01 R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts Zhongyang Li et.al. 2502.20395 link
2025-02-27 Mixture of Experts for Recognizing Depression from Interview and Reading Tasks Loukas Ilias et.al. 2502.20213 null
2025-02-27 Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems Zeyi Ren et.al. 2502.20183 null
2025-02-27 UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook Yidi Jiang et.al. 2502.20067 null
2025-02-27 AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs Xuyang Wei et.al. 2502.20035 link
2025-03-04 Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts Shulai Zhang et.al. 2502.19811 link
2025-02-27 Extension of SUSY SU(5) GUTs with Nelson-Barr models Junji Hisano et.al. 2502.19686 null
2025-03-15 Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization Taishi Nakamura et.al. 2502.19261 null
2025-02-26 OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment Jiaxin Deng et.al. 2502.18965 null
2025-02-26 Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM Junxiao Ma et.al. 2502.18863 null
2025-02-25 Generative AI-enabled Wireless Communications for Robust Low-Altitude Economy Networking Changyuan Zhao et.al. 2502.18118 null
2025-02-09 MoEMba: A Mamba-based Mixture of Experts for High-Density EMG-based Hand Gesture Recognition Mehran Shabanpour et.al. 2502.17457 null
2025-03-17 The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE Andrei Chernov et.al. 2502.17391 null
2025-02-24 Delta Decompression for MoE-based LLMs Compression Hao Gu et.al. 2502.17298 link
2025-02-24 Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks Andrei Chernov et.al. 2502.17187 null
2025-02-24 Muon is Scalable for LLM Training Jingyuan Liu et.al. 2502.16982 link
2025-03-07 BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference Zewen Jin et.al. 2502.16927 null
2025-02-24 ENACT-Heart – ENsemble-based Assessment Using CNN and Transformer on Heart Sounds Jiho Han et.al. 2502.16914 null
2025-02-26 Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment Chenghao Fan et.al. 2502.16894 null
2025-02-22 An Autonomous Network Orchestration Framework Integrating Large Language Models with Continual Reinforcement Learning Masoud Shokrnezhad et.al. 2502.16198 null
2025-02-20 A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models Mengyang Sun et.al. 2502.15828 link
2025-03-20 Binary-Integer-Programming Based Algorithm for Expert Load Balancing in Mixture-of-Experts Models Yuan Sun et.al. 2502.15451 link
2025-03-02 Tight Clusters Make Specialized Experts Stefan K. Nielsen et.al. 2502.15315 link
2025-02-21 Multimodal Graph-Based Variational Mixture of Experts Network for Zero-Shot Multimodal Information Extraction Baohang Zhou et.al. 2502.15290 link
2025-02-20 Ray-Tracing for Conditionally Activated Neural Networks Claudio Gallicchio et.al. 2502.14788 null
2025-02-21 ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model Zhongyi Zhou et.al. 2502.14420 null
2025-02-19 MoM: Linear Sequence Modeling with Mixture-of-Memories Jusen Du et.al. 2502.13685 link
2025-02-19 Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts Xin Li et.al. 2502.13577 null
2025-02-18 MoBA: Mixture of Block Attention for Long-Context LLMs Enzhe Lu et.al. 2502.13189 link
2025-02-18 Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models Gyeongman Kim et.al. 2502.12947 null
2025-03-13 DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs Minxuan Lv et.al. 2502.12455 null
2025-02-17 From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs Kumari Nishu et.al. 2502.12325 null
2025-02-17 Binarity at LOw Metallicity (BLOeM): Multiplicity of early B-type supergiants in the Small Magellanic Cloud N. Britavskiy et.al. 2502.12239 null
2025-02-17 Accurate Expert Predictions in MoE Inference via Cross-Layer Gate Zhiyuan Fang et.al. 2502.12224 null
2025-02-17 How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines Ayan Sengupta et.al. 2502.12051 null
2025-02-17 Connector-S: A Survey of Connectors in Multi-modal Large Language Models Xun Zhu et.al. 2502.11453 null
2025-02-16 Mixture of Tunable Experts – Behavior Modification of DeepSeek-R1 at Inference Time Robert Dahlke et.al. 2502.11096 null
2025-02-16 ClimateLLM: Efficient Weather Forecasting via Frequency-Aware Large Language Models Shixuan Li et.al. 2502.11059 null
2025-02-15 Semantic Specialization in MoE Appears with Scale: A Study of DeepSeek R1 Expert Specialization Matthew Lyle Olson et.al. 2502.10928 null
2025-02-11 MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition Sungnyun Kim et.al. 2502.10447 null
2025-04-03 Heterogeneous Mixture of Experts for Remote Sensing Image Super-Resolution Bowen Chen et.al. 2502.09654 null
2025-02-14 Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting Nicholas Dronen et.al. 2502.09500 link
2025-02-12 The MoE-Empowered Edge LLMs Deployment: Architecture, Challenges, and Opportunities Ning Li et.al. 2502.08381 null
2025-02-12 Mixture of Decoupled Message Passing Experts with Entropy Constraint for General Node Classification Xuanze Chen et.al. 2502.08083 null
2025-03-09 Training Sparse Mixture Of Experts Text Embedding Models Zach Nussbaum et.al. 2502.07972 link
2025-02-11 Memory Analysis on the Training Course of DeepSeek Models Ping Zhang et.al. 2502.07846 null
2025-02-11 LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid Weigao Sun et.al. 2502.07563 link
2025-02-11 MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks Lotfi Abdelkrim Mecharbat et.al. 2502.07422 null
2025-02-11 Online Aggregation of Trajectory Predictors Alex Tong et.al. 2502.07178 null
2025-02-09 Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline Zhiyuan Fang et.al. 2502.06888 null
2025-02-12 Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach Xu Zhang et.al. 2502.06832 null
2025-02-10 MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing Seokjin Go et.al. 2502.06643 null
2025-02-10 Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE Haiduo Huang et.al. 2502.06282 link
2025-02-10 Fair-MoE: Fairness-Oriented Mixture of Experts in Vision-Language Models Peiran Wang et.al. 2502.06094 null
2025-02-08 Mol-MoE: Training Preference-Guided Routers for Molecule Generation Diego Calanzone et.al. 2502.05633 null
2025-02-17 UbiMoE: A Ubiquitous Mixture-of-Experts Vision Transformer Accelerator With Hybrid Computation Pattern on FPGA Jiale Dong et.al. 2502.05602 link
2025-02-07 fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving Hanfei Yu et.al. 2502.05370 null
2025-02-07 Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts Roussel Desmond Nzoyem et.al. 2502.05335 null
2025-02-19 Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient Jan Ludziejewski et.al. 2502.05172 null
2025-02-06 Mixture of neural operator experts for learning boundary conditions and model selection Dwyer Deighan et.al. 2502.04562 null
2025-02-06 CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference Zehua Pei et.al. 2502.04416 link
2025-02-06 Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning Peizhuang Cong et.al. 2502.03884 null
2025-03-20 A Retrospective Systematic Study on Hierarchical Sparse Query Transformer-assisted Ultrasound Screening for Early Hepatocellular Carcinoma Chaoyin She et.al. 2502.03772 link
2025-02-05 (GG) MoE vs. MLP on Tabular Data Andrei Chernov et.al. 2502.03608 null
2025-02-05 RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts Tuan Truong et.al. 2502.03044 null
2025-03-22 On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation Nghiem T. Diep et.al. 2502.03029 null
2025-02-05 Scaling Laws for Upcycling Mixture-of-Experts Language Models Seng Pei Liew et.al. 2502.03009 null
2025-02-04 ReGNet: Reciprocal Space-Aware Long-Range Modeling and Multi-Property Prediction for Crystals Jianan Nie et.al. 2502.02748 null
2025-02-04 Binarity at LOw Metallicity (BLOeM): The multiplicity properties and evolution of BAF-type supergiants L. R. Patrick et.al. 2502.02644 null
2025-02-04 Hecate: Unlocking Efficient Sparse Model Training via Fully Sharded Sparse Data Parallelism Yuhao Qing et.al. 2502.02581 null
2025-02-07 Brief analysis of DeepSeek R1 and its implications for Generative AI Sarah Mercer et.al. 2502.02523 null
2025-02-04 M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference Nikhil Bhendawade et.al. 2502.02040 null
2025-02-07 MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation Haibo Tong et.al. 2502.01719 null
2025-02-27 Omni-Mol: Exploring Universal Convergent Space for Omni-Molecular Tasks Chengxin Hu et.al. 2502.01074 null
2025-02-17 MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs Yuhang Zhou et.al. 2502.00997 null
2025-02-03 CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling Xinze Wang et.al. 2502.00965 null
2025-02-02 UniGraph2: Learning a Unified Embedding Space to Bind Multimodal Graphs Yufei He et.al. 2502.00806 null
2025-02-02 Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective Yujin Oh et.al. 2502.00619 null
2025-02-05 Weak-to-Strong Diffusion with Reflection Lichen Bai et.al. 2502.00473 null
2025-02-01 PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning Yu Feng et.al. 2502.00354 link
2025-02-01 Sigmoid Self-Attention is Better than Softmax Self-Attention: A Mixture-of-Experts Perspective Fanqi Yan et.al. 2502.00281 null
2025-01-31 Pheromone-based Learning of Optimal Reasoning Paths Anirudh Chari et.al. 2501.19278 null
2025-03-03 Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning Minh Le et.al. 2501.18936 null
2025-01-30 MolGraph-xLSTM: A graph-based dual-level xLSTM framework with multi-head mixture-of-experts for enhanced molecular representation and interpretability Yan Sun et.al. 2501.18439 null
2025-02-10 Free Agent in Agent-Based Mixture-of-Experts Generative AI Framework Jung-Hua Liu et.al. 2501.17903 null
2025-01-29 Heuristic-Informed Mixture of Experts for Link Prediction in Multilayer Networks Lucio La Cava et.al. 2501.17557 null
2025-01-28 3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow Yueen Ma et.al. 2501.16698 null
2025-01-27 Searching for GEMS: Discovery and Characterization of Two Brown Dwarfs Around M Dwarfs Alexander Larsen et.al. 2501.16554 null
2025-02-12 One-for-All Does Not Work! Enhancing Vulnerability Detection by Mixture-of-Experts (MoE) Xu Yang et.al. 2501.16454 null
2025-01-29 Mixture of Experts (MoE): A Big Data Perspective Wensheng Gan et.al. 2501.16352 null
2025-01-27 Static Batching of Irregular Workloads on GPUs: Framework and Application to Efficient MoE Model Inference Yinghan Li et.al. 2501.16103 null
2025-01-25 ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning Shangqian Gao et.al. 2501.15316 null
2025-03-16 FreqMoE: Enhancing Time Series Forecasting through Frequency Decomposition Mixture of Experts Ziqi Liu et.al. 2501.15125 link
2025-01-25 Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning Ziyu Zhao et.al. 2501.15103 null
2025-01-24 Mean-field limit from general mixtures of experts to quantum neural networks Anderson Melchor Hernandez et.al. 2501.14660 null
2025-01-30 Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation Shengzhe Zhang et.al. 2501.14269 link
2025-03-12 Sparse Mixture-of-Experts for Non-Uniform Noise Reduction in MRI Images Zeyun Deng et.al. 2501.14198 null
2025-01-23 CSAOT: Cooperative Multi-Agent System for Active Object Tracking Hy Nguyen et.al. 2501.13994 null
2025-01-22 Autonomy-of-Experts Models Ang Lv et.al. 2501.13074 null
2025-02-07 LLM4WM: Adapting LLM for Wireless Multi-Tasking Xuanyu Liu et.al. 2501.12983 null
2025-01-22 UniUIR: Considering Underwater Image Restoration as An All-in-One Learner Xu Zhang et.al. 2501.12981 null
2025-01-22 BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR Guodong Ma et.al. 2501.12602 null
2025-02-26 Modality Interactive Mixture-of-Experts for Fake News Detection Yifan Liu et.al. 2501.12431 link
2025-01-21 SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection Xiaocheng Zhang et.al. 2501.12430 null
2025-01-25 Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models Samira Abnar et.al. 2501.12370 null
2025-01-21 MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks Qishen Zhou et.al. 2501.12281 link
2025-02-04 Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Zihan Qiu et.al. 2501.11873 null
2025-01-18 FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models Xinglin Pan et.al. 2501.10714 null
2024-12-16 DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference Yujie Zhang et.al. 2501.10375 null
2025-01-17 OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning Jinyuan Feng et.al. 2501.10062 null
2025-01-17 LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading Kuan-Ming Liu et.al. 2501.09636 null
2025-01-16 MoE$^2$: Optimizing Collaborative Inference for Edge Large Language Models Lyudong Jin et.al. 2501.09410 null
2025-01-14 MiniMax-01: Scaling Foundation Models with Lightning Attention MiniMax et.al. 2501.08313 null
2025-01-14 Guiding polaritonic energy and momentum through two-dimensional Bravais lattices Zhonglin Li et.al. 2501.08123 null
2025-02-11 GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism Chen Tang et.al. 2501.07890 null
2025-01-18 PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration Xiaoshui Huang et.al. 2501.07762 null
2025-01-13 A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis Binyu Zhang et.al. 2501.07016 link
2025-01-12 Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning Hanwen Zhong et.al. 2501.06884 link
2025-01-12 A Comprehensive Evaluation of Large Language Models on Mental Illnesses in Arabic Context Noureldin Zahran et.al. 2501.06859 null
2025-03-18 TAMER: A Test-Time Adaptive MoE-Driven Framework for EHR Representation Learning Yinghao Zhu et.al. 2501.05661 link
2025-01-09 Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing Mengfan Liu et.al. 2501.05313 null
2025-01-07 LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes Xiang Xu et.al. 2501.04004 link
2025-01-07 mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training Xudong Liao et.al. 2501.03905 null
2025-01-08 Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection Donatella Genovese et.al. 2501.03432 null
2025-01-06 Solving the Porous Medium Equation with the eXtreme Mesh deformation approach (X-Mesh) Alexandre Chemin et.al. 2501.03083 null
2025-01-05 Soft and Compliant Contact-Rich Hair Manipulation and Care Uksang Yoo et.al. 2501.02630 null
2025-01-12 Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning Zhongyi Zhou et.al. 2501.02198 null
2025-03-18 MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders Jiajun Cao et.al. 2501.01709 null
2025-01-01 REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization Huyen Nguyen et.al. 2501.00779 null
2025-01-06 Superposition in Transformers: A Novel Way of Building Mixture of Experts Ayoub Ben Chaliah et.al. 2501.00530 link
2024-12-31 CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection Xiaolei Wang et.al. 2501.00346 null
2024-12-30 SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection Yuxuan Li et.al. 2412.20665 link
2024-12-29 Multimodal Variational Autoencoder: a Barycentric View Peijie Qiu et.al. 2412.20487 null
2025-03-05 A Comprehensive Framework for Reliable Legal AI: Combining Specialized Expert Systems and Adaptive Refinement Sidra Nasir et.al. 2412.20468 null
2024-12-29 Mind the Data Gap: Bridging LLMs to Enterprise Data Integration Moe Kayali et.al. 2412.20331 null
2025-03-09 UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity Jingbo Lin et.al. 2412.20157 link
2024-12-28 Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection Yaning Zhang et.al. 2412.20156 null
2025-02-18 DeepSeek-V3 Technical Report DeepSeek-AI et.al. 2412.19437 link
2024-12-26 AskChart: Universal Chart Understanding through Textual Enhancement Xudong Yang et.al. 2412.19146 link
2024-12-30 Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection Xiaoyu Huang et.al. 2412.19108 null
2024-12-26 DAPoinTr: Domain Adaptive Point Transformer for Point Cloud Completion Yinghui Li et.al. 2412.19062 link
2025-03-10 Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making David Shoresh et.al. 2412.18593 link
2024-12-24 BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing Yingjie Ma et.al. 2412.18065 link
2024-12-23 UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition Li Fu et.al. 2412.17507 null
2025-02-01 BrainMAP: Learning Multiple Activation Pathways in Brain Networks Song Wang et.al. 2412.17404 link
2024-12-23 Efficient fine-tuning methodology of text embedding models for information retrieval: contrastive learning penalty (clp) Jeongsu Yu et.al. 2412.17364 link
2024-12-22 The Fermat curves and arrangements of lines and conics Nils Peder Astrup Toft et.al. 2412.16993 null
2024-12-22 Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models Elie Antoine et.al. 2412.16971 null
2024-12-18 GraphLoRA: Empowering LLMs Fine-Tuning via Graph Collaboration of MoE Ting Bai et.al. 2412.16216 null
2024-12-20 Theory of Mixture-of-Experts for Mobile Edge Computing Hongbo Li et.al. 2412.15690 null
2024-12-19 MoEtion: Efficient and Reliable Checkpointing for Mixture-of-Experts Models at Scale Swapnil Gandhi et.al. 2412.15411 null
2025-01-03 Qwen2.5 Technical Report Qwen et.al. 2412.15115 link
2025-02-27 ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing Ziteng Wang et.al. 2412.14711 link
2025-01-22 A Survey on Inference Optimization Techniques for Mixture of Experts Models Jiacheng Liu et.al. 2412.14219 link
2024-12-18 SEKE: Specialised Experts for Keyword Extraction Matej Martinc et.al. 2412.14087 link
2024-12-18 MedCoT: Medical Chain of Thought via Hierarchical Expert Jiaxiang Liu et.al. 2412.13736 link
2024-12-17 SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks Mátyás Vincze et.al. 2412.13053 null
2024-12-17 Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning Moritz Reuss et.al. 2412.12953 null
2025-01-09 CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition He Wang et.al. 2412.12760 null
2024-12-16 Investigating Mixture of Experts in Dense Retrieval Effrosyni Sokli et.al. 2412.11864 null
2024-12-20 Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture Jingze Shi et.al. 2412.11834 link
2024-12-16 Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation Svetlana Pavlitska et.al. 2412.11608 link
2024-12-16 Enhancing Healthcare Recommendation Systems with a Multimodal LLMs-based MOE Architecture Jingyu Xu et.al. 2412.11557 null
2024-12-14 DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification Yuhao Wang et.al. 2412.10650 link
2024-12-13 DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding Zhiyu Wu et.al. 2412.10302 link
2024-12-13 Llama 3 Meets MoE: Efficient Upcycling Aditya Vavre et.al. 2412.09952 link
2024-12-20 Memory Layers at Scale Vincent-Pierre Berges et.al. 2412.09764 link
2025-01-10 Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine Xiaoshuang Huang et.al. 2412.09278 link
2024-12-12 MoSLD: An Extremely Parameter-Efficient Mixture-of-Shared LoRAs for Multi-Task Learning Lulu Zhao et.al. 2412.08946 null
2024-11-26 Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection Tzu-Ting Yang et.al. 2412.08651 null
2025-01-18 Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective Minh Le et.al. 2412.08285 null
2025-02-12 Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification Xuanze Chen et.al. 2412.08193 link
2024-12-10 MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning Yufei Ma et.al. 2412.07405 null
2024-12-10 Post-Training Statistical Calibration for Higher Activation Sparsity Vui Seng Chua et.al. 2412.07174 link
2025-03-02 MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems Yao Fu et.al. 2412.07067 null
2024-12-07 Partition of Unity Physics-Informed Neural Networks (POU-PINNs): An Unsupervised Framework for Physics-Informed Domain Decomposition and Mixtures of Experts Arturo Rodriguez et.al. 2412.06842 null
2024-12-09 Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset Xiao Wang et.al. 2412.06647 link
2024-12-09 UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts Zhen Wan et.al. 2412.06340 null
2024-12-08 Hallucination-aware Optimization for Large Language Model-empowered Communications Yinqiu Liu et.al. 2412.06007 link
2024-12-10 An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism Qing Zhang et.al. 2412.05821 null
2024-12-10 RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts Xu Liu et.al. 2412.05679 link
2024-12-07 SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts Gengze Zhou et.al. 2412.05552 link
2024-12-07 Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers Boxun Xu et.al. 2412.05540 null
2024-12-23 Steps are all you need: Rethinking STEM Education with Prompt Engineering Krishnasai Addala et.al. 2412.05023 null
2024-12-05 Customize Segment Anything Model for Multi-Modal Semantic Segmentation with Mixture of LoRA Experts Chenyang Zhu et.al. 2412.04220 null
2025-03-02 Monet: Mixture of Monosemantic Experts for Transformers Jungwoo Park et.al. 2412.04139 link
2024-12-05 Meta-Reinforcement Learning With Mixture of Experts for Generalizable Multi Access in Heterogeneous Wireless Networks Zhaoyang Liu et.al. 2412.03850 null
2024-12-04 Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond Loukas Ilias et.al. 2412.03483 null
2024-12-03 CA-MoE: Channel-Adapted MoE for Incremental Weather Forecasting Hao Chen et.al. 2412.02503 null
2025-02-14 MQFL-FHE: Multimodal Quantum Federated Learning Framework with Fully Homomorphic Encryption Siddhant Dutta et.al. 2412.01858 null
2025-01-22 Yi-Lightning Technical Report Alan Wake et.al. 2412.01253 null
2024-11-30 Mixture of Experts for Node Classification Yu Shi et.al. 2412.00418 null
2025-01-22 HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting Shaohan Yu et.al. 2412.00316 null
2024-11-27 Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference Andrii Skliar et.al. 2412.00099 null
2025-02-16 Condense, Don’t Just Prune: Enhancing Efficiency and Performance in MoE Layer Pruning Mingyu Cao et.al. 2412.00069 link
2024-11-29 LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References Shuguo Jiang et.al. 2411.19758 null
2024-11-28 On the effectiveness of discrete representations in sparse mixture of experts Giang Do et.al. 2411.19402 null
2024-11-28 Bayesian Cluster Weighted Gaussian Models Panagiotis Papastamoulis et.al. 2411.18957 link
2024-11-27 UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMS Haomin Zhuang et.al. 2411.18797 null
2024-11-27 Complexity Experts are Task-Discriminative Learners for Any Image Restoration Eduard Zamfir et.al. 2411.18466 null
2024-11-27 Mixture of Experts in Image Classification: What’s the Sweet Spot? Mathurin Videau et.al. 2411.18322 null
2024-11-26 $H^3$ Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs Selim Furkan Tekin et.al. 2411.17792 link
2024-11-26 The Tempered Finite Element Method Antoine Quiriny et.al. 2411.17564 null
2024-11-25 Staleness-Centric Optimizations for Efficient Diffusion MoE Inference Jiajun Luo et.al. 2411.16786 null
2024-12-02 MH-MoE: Multi-Head Mixture-of-Experts Shaohan Huang et.al. 2411.16205 null
2024-11-25 LDACP: Long-Delayed Ad Conversions Prediction Model for Bidding Strategy Peng Cui et.al. 2411.16095 null
2024-11-24 Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution Haiquan Wang et.al. 2411.15871 null
2024-11-24 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training Xiaoye Qu et.al. 2411.15708 link
2024-11-23 Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts Qizhou Chen et.al. 2411.15432 null
2024-11-23 Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation Fahao Chen et.al. 2411.15419 null
2024-11-21 Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning Jiange Yang et.al. 2411.14519 null
2024-11-20 MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification Yuxuan Chen et.al. 2411.13004 null
2024-11-23 KAAE: Numerical Reasoning for Knowledge Graphs via Knowledge-aware Attributes Learning Ming Yin et.al. 2411.12950 null
2025-02-06 Ultra-Sparse Memory Network Zihao Huang et.al. 2411.12364 null
2025-01-28 CNMBERT: A Model for Converting Hanyu Pinyin Abbreviations to Chinese Characters Zishuo Feng et.al. 2411.11770 link
2024-11-18 MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs Shiyi Cao et.al. 2411.11217 null
2024-11-16 Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts Jinqiang Long et.al. 2411.10669 link
2024-11-15 Weakly-Supervised Multimodal Learning on MIMIC-CXR Andrea Agostini et.al. 2411.10356 link
2024-11-21 Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models Wei Wang et.al. 2411.10003 null
2024-11-13 Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection Vima Gupta et.al. 2411.08982 null
2024-11-13 Sparse Upcycling: Inference Inefficient Finetuning Sasha Doubov et.al. 2411.08968 null
2024-11-13 LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing Xiaonan Nie et.al. 2411.08446 null
2024-11-12 Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach Renzi Wang et.al. 2411.08232 null
2024-11-12 PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model Yilun Liu et.al. 2411.08212 null
2024-11-08 Biodynamic Analysis of Alpine Skiing with a Skier-Ski-Snow Interaction Model Nan Gao et.al. 2411.08056 null
2024-11-12 Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge Emmanuel Azuh Mensah et.al. 2411.07834 null
2024-11-11 Adaptive Conditional Expert Selection Network for Multi-domain Recommendation Kuiyao Dong et.al. 2411.06826 null
2024-11-11 WDMoE: Wireless Distributed Mixture of Experts for Large Language Models Nan Xue et.al. 2411.06681 null
2024-11-09 Learning Mixtures of Experts with EM Quentin Fruytier et.al. 2411.06056 null
2024-11-08 NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts Yen-Ting Lin et.al. 2411.05945 null
2024-11-05 DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts Zelin Yao et.al. 2411.03025 link
2024-11-05 Advancing Robust Underwater Acoustic Target Recognition through Multi-task Learning and Multi-Gate Mixture-of-Experts Yuan Xie et.al. 2411.02787 null
2024-11-27 SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models Jianyi Zhang et.al. 2411.02433 link
2024-11-06 Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent Xingwu Sun et.al. 2411.02265 null
2024-12-27 FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation Ziwei Zhan et.al. 2411.02115 null
2024-11-06 Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis Mohammad Zbeeb et.al. 2411.01929 link
2025-02-10 RS-MoE: A Vision-Language Model with Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering Hui Lin et.al. 2411.01595 null
2025-02-10 Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation Mingrui Liu et.al. 2411.01457 null
2024-11-06 HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference Peng Tang et.al. 2411.01433 null
2024-12-12 HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy Shuqing Luo et.al. 2411.01288 link
2024-11-02 PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment Dongxu Liu et.al. 2411.01245 null
2024-11-01 MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition Cheng Yang et.al. 2411.01016 null
2024-11-01 LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models Nam V. Nguyen et.al. 2411.00918 link
2024-10-16 TradExpert: Revolutionizing Trading with Mixture of Expert LLMs Qianggang Ding et.al. 2411.00782 null
2024-11-01 MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffc-Aware Parallel Optimization Jingming Guo et.al. 2411.00662 link
2024-11-01 A Fast, Analytic Empirical Model of the Gaia Data Release 3 Astrometric Orbit Catalog Selection Function Casey Y. Lam et.al. 2411.00654 link
2024-10-31 Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts Xiang Deng et.al. 2410.23836 null
2024-10-30 Efficient and Interpretable Grammatical Error Correction with Mixture of Experts Muhammad Reza Qorib et.al. 2410.23507 link
2024-10-30 Stealing User Prompts from Mixture of Experts Itay Yona et.al. 2410.22884 null
2024-10-30 MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning Xujia Wang et.al. 2410.22782 null
2025-02-08 ProMoE: Fast MoE-based LLM Serving using Proactive Caching Xiaoniu Song et.al. 2410.22134 null
2024-10-29 Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging Li Shen et.al. 2410.21804 null
2024-10-29 Neural Experts: Mixture of Experts for Implicit Neural Representations Yizhak Ben-Shabat et.al. 2410.21643 null
2024-11-07 FinTeamExperts: Role Specialized MOEs For Financial Analysis Yue Yu et.al. 2410.21338 null
2024-10-28 Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving Jiyao Wang et.al. 2410.21086 null
2024-10-27 Towards a Blockchain and Opportunistic Edge Driven Metaverse of Everything Paula Fraga-Lamas et.al. 2410.20594 null
2024-10-27 Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation Maohao Shen et.al. 2410.20336 null
2024-10-27 GUMBEL-NERF: Representing Unseen Objects as Part-Compositional Neural Radiance Fields Yusuke Sekikawa et.al. 2410.20306 null
2024-11-12 LLMs Can Evolve Continually on Modality for X-Modal Reasoning Jiazuo Yu et.al. 2410.20178 link
2024-10-25 DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unspervised Dimensionality Reduction Zelin Zang et.al. 2410.19504 link
2025-01-27 Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis Weikai Li et.al. 2410.19225 link
2024-10-24 Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design Ruisi Cai et.al. 2410.19123 link
2024-10-24 Mixture of Parrots: Experts improve memorization more than reasoning Samy Jelassi et.al. 2410.19034 null
2024-10-24 MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases Zhisheng Lin et.al. 2410.18406 null
2024-10-23 Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches Kexin Feng et.al. 2410.18298 null
2024-10-23 MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning Jingfan Zhang et.al. 2410.18035 null
2024-10-24 ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference Xin He et.al. 2410.17954 null
2024-10-23 Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition Artem Basharin et.al. 2410.17765 null
2024-10-22 Optimizing Mixture-of-Experts Inference Time Combining Model Deployment and Communication Scheduling Jialong Li et.al. 2410.17043 null
2024-10-21 LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset Ruikun Zhang et.al. 2410.16095 link
2024-10-22 CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts Zhenpeng Su et.al. 2410.16077 link
2024-10-29 Generalizing Motion Planners with Mixture of Experts for Autonomous Driving Qiao Sun et.al. 2410.15774 link
2024-11-23 ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts Xumeng Han et.al. 2410.15732 null
2024-10-20 Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs Xin Zhou et.al. 2410.15438 null
2024-11-16 LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration Yuang Ai et.al. 2410.15385 link
2024-10-19 MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning Suning Huang et.al. 2410.14972 null
2024-10-29 Collaboratively adding new knowledge to an LLM Rhui Dih Lee et.al. 2410.14753 link
2024-10-18 MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts Rachel S. Y. Teo et.al. 2410.14574 link
2024-10-18 Towards a Simple and Extensible Standard for Object-Centric Event Data (OCED) – Core Model, Design Space, and Lessons Learned Dirk Fahland et.al. 2410.14495 link
2024-10-18 ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction Haoyu He et.al. 2410.14099 link
2024-10-17 Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in Compositional Tasks Jinze Zhao et.al. 2410.13964 null
2024-10-18 MoR: Mixture of Ranks for Low-Rank Adaptation Tuning Chuanyu Tang et.al. 2410.13408 null
2024-10-16 Satellite-Terrestrial Quantum Networks and the Global Quantum Internet Andrea Conti et.al. 2410.13096 null
2024-10-16 On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs Herun Wan et.al. 2410.12600 null
2024-10-16 Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion Minkyoung Cho et.al. 2410.12592 null
2024-10-16 Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts Fanqi Yan et.al. 2410.12258 null
2025-01-03 EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference Yulei Qian et.al. 2410.12247 null
2024-10-15 MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router Yanyue Xie et.al. 2410.12013 null
2024-10-15 MoH: Multi-Head Attention as Mixture-of-Head Attention Peng Jin et.al. 2410.11842 link
2024-10-15 GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation Fei Tang et.al. 2410.11841 link
2024-10-15 Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models James Vo et.al. 2410.11654 null
2024-10-16 Quadratic Gating Functions in Mixture of Experts: A Statistical Insight Pedram Akbarian et.al. 2410.11222 null
2024-10-19 AT-MoE: Adaptive Task-planning Mixture of Experts via LoRA Approach Xurui Li et.al. 2410.10896 null
2024-10-01 Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models Keivan Alizadeh et.al. 2410.10846 null
2024-10-16 Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free Ziyue Li et.al. 2410.10814 link
2024-10-14 Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts Guorui Zheng et.al. 2410.10626 link
2024-10-14 Learning to Ground VLMs without Forgetting Aritra Bhowmik et.al. 2410.10491 null
2024-10-14 Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts Xu Liu et.al. 2410.10469 null
2024-10-15 Ada-K Routing: Boosting the Efficiency of MoE-based LLMs Tongtian Yue et.al. 2410.10456 null
2024-10-14 Tighter Risk Bounds for Mixtures of Experts Wissam Akretche et.al. 2410.10397 null
2024-10-24 Scalable Multi-Domain Adaptation of Language Models using Modular Experts Peter Schafhalter et.al. 2410.10181 null
2024-10-16 Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models Jun Luo et.al. 2410.10114 null
2024-10-14 AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality Peijun Qing et.al. 2410.10054 link
2024-10-13 ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL Zhanqiu Guo et.al. 2410.09781 null
2024-10-13 MoIN: Mixture of Introvert Experts to Upcycle an LLM Ajinkya Tejankar et.al. 2410.09687 null
2024-10-12 GETS: Ensemble Temperature Scaling for Calibration in Graph Neural Networks Dingyi Zhuang et.al. 2410.09570 null
2024-10-11 Semi-Supervised Learning of Noisy Mixture of Experts Models Oh-Ran Kwon et.al. 2410.09039 null
2024-10-11 Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering I-Chun Chen et.al. 2410.08589 null
2024-10-31 Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts Sukwon Yun et.al. 2410.08245 link
2024-11-20 Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training Gen Luo et.al. 2410.08202 null
2024-10-10 Efficient Dictionary Learning with Switch Sparse Autoencoders Anish Mudide et.al. 2410.08201 link
2024-10-18 More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing Sagi Shaier et.al. 2410.08003 null
2024-10-10 SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture Jiayi Han et.al. 2410.07739 null
2024-10-10 Upcycling Large Language Models into Mixture of Experts Ethan He et.al. 2410.07524 null
2024-10-09 User Feedback in Continuous Software Engineering: Revealing the State-of-Practice Anastasiia Tkalich et.al. 2410.07459 null
2024-10-11 MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts Peng Jin et.al. 2410.07348 null
2024-10-04 A Dynamic Approach to Stock Price Prediction: Comparing RNN and Mixture of Experts Models Across Different Volatility Profiles Diego Vallarino et.al. 2410.07234 null
2024-10-09 Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders David Noever et.al. 2410.06462 null
2024-10-09 Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs Ruijia Niu et.al. 2410.06431 null
2024-10-08 Probing the Robustness of Theory of Mind in Large Language Models Christian Nickel et.al. 2410.06271 null
2024-10-08 MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More Wei Huang et.al. 2410.06270 link
2024-12-17 Aria: An Open Multimodal Native Mixture-of-Experts Model Dongxu Li et.al. 2410.05993 link
2024-10-08 Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models Siqi Wang et.al. 2410.05661 null
2024-12-05 Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild Xinyu Zhao et.al. 2410.05357 link
2024-10-07 Multimodal Fusion Strategies for Mapping Biophysical Landscape Features Lucia Gordon et.al. 2410.04833 link
2024-10-06 Realizing Video Summarization from the Path of Language-based Semantic Understanding Kuan-Chen Mu et.al. 2410.04511 null
2024-10-09 Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding Wei Wu et.al. 2410.03553 null
2024-10-04 Exploring the Benefit of Activation Sparsity in Pre-training Zhengyan Zhang et.al. 2410.03440 link
2024-10-03 MLP-KAN: Unifying Deep Representation and Function Learning Yunhong He et.al. 2410.03027 link
2024-10-03 On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions Huy Nguyen et.al. 2410.02935 null
2024-10-03 Neutral residues: revisiting adapters for model extension Franck Signe Talla et.al. 2410.02744 null
2024-10-03 Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping Ziye Huang et.al. 2410.02475 null
2024-10-03 MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction Zhaojian Yu et.al. 2410.02241 null
2024-10-03 Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts Minh Le et.al. 2410.02200 null
2024-10-04 Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices Andres Potapczynski et.al. 2410.02117 link
2024-10-04 EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing Haotian Sun et.al. 2410.02098 null
2024-10-02 Don’t flatten, tokenize! Unlocking the key to SoftMoE’s efficacy in deep RL Ghada Sokar et.al. 2410.01930 null
2024-09-15 Integrating AI’s Carbon Footprint into Risk Management Frameworks: Strategies and Tools for Sustainable Compliance in Banking Sector Nataliya Tkachenko et.al. 2410.01818 null
2024-10-02 Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models Shayekh Bin Islam et.al. 2410.01782 link
2024-10-02 TIC 290061484: A Triply Eclipsing Triple System with the Shortest Known Outer Period of 24.5 Days Veselin B. Kostov et.al. 2410.01711 null
2024-10-02 Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging Tingfeng Hui et.al. 2410.01610 null
2024-10-02 The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs Hong Li et.al. 2410.01417 null
2024-10-01 MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards Sheng Wang et.al. 2410.00938 null
2024-10-01 UniAdapt: A Universal Adapter for Knowledge Calibration Tai D. Nguyen et.al. 2410.00454 null
2024-10-01 Robust Traffic Forecasting against Spatial Shift over Years Hongjun Wang et.al. 2410.00373 link
2024-09-29 IDEA: An Inverse Domain Expert Adaptation Based Active DNN IP Protection Method Chaohui Xu et.al. 2410.00059 null
2024-09-30 MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Haotian Zhang et.al. 2409.20566 null
2024-09-30 HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models Bingshen Mu et.al. 2409.19878 null
2024-10-02 CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling Jihai Zhang et.al. 2409.19291 link
2024-11-12 SciDFM: A Large Language Model with Mixture-of-Experts for Science Liangtai Sun et.al. 2409.18412 null
2024-11-01 Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE Xun Zhu et.al. 2409.17508 link
2024-09-26 A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction Guangyu Wang et.al. 2409.17440 link
2024-09-24 Leveraging Mixture of Experts for Improved Speech Deepfake Detection Viola Negroni et.al. 2409.16077 null
2024-10-02 Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts Xiaoming Shi et.al. 2409.16040 link
2024-10-31 Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM Fengrun Zhang et.al. 2409.15905 null
2024-09-24 Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks Jiayi He et.al. 2409.15695 null
2024-12-13 A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts Hugo Inzirillo et.al. 2409.15161 link
2024-09-23 Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond Hong Chen et.al. 2409.14993 null
2024-09-21 Routing in Sparsely-gated Language Models responds to Context Stefan Arnold et.al. 2409.14107 null
2024-10-01 On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists Dongyang Fan et.al. 2409.13931 link
2024-09-20 Multi-omics data integration for early diagnosis of hepatocellular carcinoma (HCC) using machine learning Annette Spooner et.al. 2409.13791 null
2024-09-19 On the rationality problem for hypersurfaces Jan Lange et.al. 2409.12834 null
2024-09-19 Retrieval-Augmented Test Generation: How Far Are We? Jiho Shin et.al. 2409.12682 null
2024-09-19 Robust Audiovisual Speech Recognition Models with Mixture-of-Experts Yihan Wu et.al. 2409.12370 null
2024-09-18 Mixture of Diverse Size Experts Manxi Sun et.al. 2409.12210 null
2024-09-18 GRIN: GRadient-INformed MoE Liyuan Liu et.al. 2409.12136 null
2024-09-18 Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0 Zhiyong Wang et.al. 2409.11909 null
2024-09-17 LPT++: Efficient Training on Mixture of Long-tailed Experts Bowen Dong et.al. 2409.11323 null
2024-12-09 LOLA – An Open-Source Massively Multilingual Large Language Model Nikit Srivastava et.al. 2409.11272 link
2024-09-16 Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression Yi-Hsin Li et.al. 2409.10101 null
2024-11-20 MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving Enming Zhang et.al. 2409.07267 link
2024-09-10 DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models Maryam Akhavan Aghdam et.al. 2409.06669 null
2024-09-10 STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning Jaeseong Lee et.al. 2409.06211 null
2024-10-31 VE: Modeling Multivariate Time Series Correlation with Variate Embedding Shangjiong Wang et.al. 2409.06169 link
2024-09-09 Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models Hongyang Lei et.al. 2409.05929 null
2024-09-09 Optical Spiking Neurons Enable High-Speed and Energy-Efficient Optical Neural Networks Bo Xu et.al. 2409.05726 null
2024-09-09 Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection Tianwu Lei et.al. 2409.05611 null
2024-09-06 Hot Stars in the GALEX Ultraviolet Sky Surveys (GUVcat_AISxSDSS_HS) and the Binary Fraction of Hot Evolved Stars Luciana Bianchi et.al. 2409.04626 null
2024-09-05 Interpretable mixture of experts for time series prediction under recurrent and non-recurrent conditions Zemian Ke et.al. 2409.03282 null
2024-09-05 ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding Zhengzhuo Xu et.al. 2409.03277 null
2024-09-05 xLAM: A Family of Large Action Models to Empower AI Agent Systems Jianguo Zhang et.al. 2409.03215 link
2024-09-04 Configurable Foundation Models: Building LLMs from a Modular Perspective Chaojun Xiao et.al. 2409.02877 null
2024-09-04 Pluralistic Salient Object Detection Xuelu Feng et.al. 2409.02368 null
2024-09-03 OLMoE: Open Mixture-of-Experts Language Models Niklas Muennighoff et.al. 2409.02060 link
2024-09-05 Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model Hukai Huang et.al. 2409.02050 null
2024-09-03 BEAVER: An Enterprise Benchmark for Text-to-SQL Peter Baile Chen et.al. 2409.02038 null
2024-09-03 Laser: Parameter-Efficient LLM Bi-Tuning for Sequential Recommendation with Collaborative Information Xinyu Zhang et.al. 2409.01605 null
2024-09-02 Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning Soumajyoti Sarkar et.al. 2409.01483 null
2024-09-02 Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching Sungmin Yun et.al. 2409.01141 null
2024-09-04 Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack Guanzhong Chen et.al. 2409.00960 link
2024-09-02 Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts Youngseog Chung et.al. 2409.00879 null
2024-09-11 Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts Rhui Dih Lee et.al. 2408.17280 null
2024-08-29 Gradient-free variational learning with conditional mixture networks Conor Heins et.al. 2408.16429 link
2024-09-07 Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models Yuncheng Yang et.al. 2408.15915 link
2024-08-28 Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts Nikolas Gritsch et.al. 2408.15901 null
2024-10-23 LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Fangxun Shu et.al. 2408.15881 link
2024-08-28 Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts Lean Wang et.al. 2408.15664 null
2024-08-27 Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis Sakhinana Sagar Srinivas et.al. 2408.15305 null
2024-08-28 A Survey of Large Language Models for European Languages Wazir Ali et.al. 2408.15040 null
2024-08-27 MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce Hao Jiang et.al. 2408.14968 null
2024-08-24 Advancing Enterprise Spatio-Temporal Forecasting Applications: Data Mining Meets Instruction Tuning of Language Models For Multi-modal Time Series Analysis in Low-Resource Settings Sagar Srinivas Sakhinana et.al. 2408.13622 null
2024-09-11 Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Yikang Shen et.al. 2408.13359 null
2024-10-30 The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities Venkatesh Balavadhani Parthasarathy et.al. 2408.13296 null
2024-08-23 Guiding IoT-Based Healthcare Alert Systems with Large Language Models Yulan Gao et.al. 2408.13071 null
2024-08-23 O-Mamba: O-shape State-Space Model for Underwater Image Enhancement Chenyu Dong et.al. 2408.12816 link
2024-08-23 DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation Xiaowei Mao et.al. 2408.12809 null
2024-08-23 Multi-Treatment Multi-Task Uplift Modeling for Enhancing User Growth Yuxiang Wei et.al. 2408.12803 null
2024-08-23 La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection Hang Zou et.al. 2408.12793 null
2024-10-02 SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging Mohammadreza Pourreza et.al. 2408.12733 null
2024-08-22 Jamba-1.5: Hybrid Transformer-Mamba Models at Scale Jamba Team et.al. 2408.12570 null
2024-09-09 Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators Dingkang Yang et.al. 2408.12325 null
2024-08-15 FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models Zhongyu Zhao et.al. 2408.11855 link
2024-08-21 MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing Hao Zhou et.al. 2408.11396 link
2024-08-21 KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting? Xiao Han et.al. 2408.11306 link
2024-08-21 FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts Hanzi Mei et.al. 2408.11304 null
2024-08-27 Unboxing Occupational Bias: Grounded Debiasing of LLMs with U.S. Labor Data Atmika Gorti et.al. 2408.11247 null
2024-08-25 Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting Jianxiang Zhou et.al. 2408.10822 link
2024-08-20 AnyGraph: Graph Foundation Model in the Wild Lianghao Xia et.al. 2408.10700 link
2024-08-20 HMoE: Heterogeneous Mixture of Experts for Language Modeling An Wang et.al. 2408.10681 null
2024-08-19 AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference Shuzhang Zhong et.al. 2408.10284 link
2024-10-29 FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models Xiaochen Wang et.al. 2408.10276 link
2024-08-26 SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models Anke Tang et.al. 2408.10174 link
2024-11-01 Customizing Language Models with Instance-wise LoRA for Sequential Recommendation Xiaoyu Kong et.al. 2408.10159 link
2024-08-19 A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method Hang Zou et.al. 2408.09752 null
2024-08-16 Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection Haohao Zhu et.al. 2408.08551 null
2024-08-17 BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts Qizhen Zhang et.al. 2408.08274 null
2024-05-21 Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts Yunxin Li et.al. 2405.11273 null
2024-05-31 Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models Xudong Lu et.al. 2402.14800 null
2024-10-29 GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts Shirley Wu et.al. 2312.04693 null
2023-09-12 Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning Ted Zadouri et.al. 2309.05444 null
2023-04-25 Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism Xin Chen et.al. 2304.11414 null
2018-06-22 Mixtures of Experts Models Isobel Claire Gormley et.al. 1806.08200 null

Speculative Decoding

Publish Date Title Authors PDF Code
2026-04-02 Goose: Anisotropic Speculation Trees for Training-Free Speculative Decoding Tao Jin et.al. 2604.02047 null
2026-04-02 Reinforcement Learning for Speculative Trading under Exploratory Framework Yun Zhao et.al. 2604.02035 null
2026-04-02 Phonon Thermal Hall Effect in quartz and its absence in silica Yu Ling et.al. 2604.01908 null
2026-03-31 Frege in the Flesh: Biolinguistics and the Neural Enforcement of Syntactic Structures Elliot Murphy et.al. 2604.00291 null
2026-03-31 Spatially modulated morphotropic phase boundaries in a compressively strained multiferroic thin film Ting-Ran Liu et.al. 2604.00288 null
2026-03-31 Blockspace Under Pressure: An Analysis of Spam MEV on High-Throughput Blockchains Wenhao Wang et.al. 2604.00234 null
2026-03-31 Cloudy With a Chance of Meatballs Wolf Cukier et.al. 2603.29883 null
2026-03-31 Detecting speculative leaks with compositional semantics Xaver Fabian et.al. 2603.29800 null
2026-03-31 Milky Way evolution on a human timescale Eugene et.al. 2603.29503 null
2026-03-31 Mexican Burrowing Toads as gravitational wave detectors Frederic V. Hessman et.al. 2603.29334 null
2026-03-30 The Binary-Binary Hierarchical System XY Leo: A Laboratory for Stellar Activity and Concealed Companions D. Koçak et.al. 2603.28934 null
2026-04-02 A Black Hole Star at Cosmic Noon: Extreme Balmer break, photospheric continuum, and broad absorption by thick winds in a Little Red Dot at z=1.7 Alberto Torralba et.al. 2603.28335 null
2026-03-30 Drift-AR: Single-Step Visual Autoregressive Generation via Anti-Symmetric Drifting Zhen Zou et.al. 2603.28049 null
2026-03-28 SJD-VP: Speculative Jacobi Decoding with Verification Prediction for Autoregressive Image Generation Bingqi Shan et.al. 2603.27115 null
2026-03-27 TAPS: Task Aware Proposal Distributions for Speculative Sampling Mohamad Zbib et.al. 2603.27027 null
2026-03-26 S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation Ligong Han et.al. 2603.25702 null
2026-03-26 Bulge Fossil Fragments as a new population of factories of gravitational wave sources in the Galaxy F. R. Ferraro et.al. 2603.25127 null
2026-03-26 Decoding Market Emotions in Cryptocurrency Tweets via Predictive Statement Classification with Machine Learning and Transformers Moein Shahiki Tash et.al. 2603.24933 null
2026-03-25 Quantum walk with a local spin interaction Manami Yamagishi et.al. 2603.24444 null
2026-03-25 AI Fortune-Teller: Juxtaposing Shaman and AI to Reveal Human Agency in the Age of AI Soonho Kwon et.al. 2603.23811 null
2026-03-24 Mars in the Australian Press, 1875-1899. 1. Interpretation, Authority and Planetary Science Richard de Grijs et.al. 2603.23563 null
2026-03-24 SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning Haoyu Huang et.al. 2603.23483 null
2026-03-24 RelayS2S: A Dual-Path Speculative Generation for Real-Time Dialogue Long Mai et.al. 2603.23346 null
2026-03-24 Mars excitement in Australian newspapers, 1877-1899: Humour and the public negotiation of astronomical knowledge Richard de Grijs et.al. 2603.22906 null
2026-03-23 From Brittle to Robust: Improving LLM Annotations for SE Optimization Lohith Senthilkumar et.al. 2603.22474 null
2026-03-24 Dynamic analysis enhances issue resolution Mingwei Liu et.al. 2603.22048 null
2026-03-22 On the origin of the strong internal magnetic fields of central compact objects Kazım Yavuz Ekşi et.al. 2603.21103 null
2026-03-21 SWE-Next: Scalable Real-World Software Engineering Tasks for Agents Jiarong Liang et.al. 2603.20691 null
2026-03-21 AEGIS: From Clues to Verdicts – Graph-Guided Deep Vulnerability Reasoning via Dialectics and Meta-Auditing Sen Fang et.al. 2603.20637 null
2026-03-20 Does This Gradient Spark Joy? Ian Osband et.al. 2603.20526 null
2026-03-23 ParallelVLM: Lossless Video-LLM Acceleration with Visual Alignment Aware Parallel Speculative Decoding Quan Kong et.al. 2603.19610 null
2026-03-19 Beyond the Desk: Barriers and Future Opportunities for AI to Assist Scientists in Embodied Physical Tasks Irene Hou et.al. 2603.19504 null
2026-03-19 Speculative Policy Orchestration: A Latency-Resilient Framework for Cloud-Robotic Manipulation Chanh Nguyen et.al. 2603.19418 null
2026-03-19 The Uncertain Policy Price of Scaling Direct Air Capture Leonardo Chiani et.al. 2603.19143 null
2026-03-19 A Pipelined Collaborative Speculative Decoding Framework for Efficient Edge-Cloud LLM Inference Yida Zhang et.al. 2603.19133 null
2026-03-19 In the Margins: An Empirical Study of Ethereum Inscriptions Xihan Xiong et.al. 2603.19086 null
2026-03-19 Act While Thinking: Accelerating LLM Agents via Pattern-Aware Speculative Tool Execution Yifan Sui et.al. 2603.18897 null
2026-03-19 SJD-PAC: Accelerating Speculative Jacobi Decoding via Proactive Drafting and Adaptive Continuation Jialiang Kang et.al. 2603.18599 null
2026-03-19 Dream the Dream: Futuring Communication between LGBTQ+ and Cisgender Groups in Metaverse Anqi Wang et.al. 2603.18578 null
2026-03-19 SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding Shenggui Li et.al. 2603.18567 null
2026-03-18 Efficient Training-Free Multi-Token Prediction via Embedding-Space Probing Raghavv Goel et.al. 2603.17942 null
2026-03-18 HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness Zihao Zheng et.al. 2603.17573 null
2026-03-18 “Not Just Me and My To-Do List”: Understanding Challenges of Task Management for Adults with ADHD and the Need for AI-Augmented Social Scaffolds Jingruo Chen et.al. 2603.17258 null
2026-03-17 Search For a Counterpart to the Subsolar Mass Gravitational Wave Candidate S251112cm Nicholas Vieira et.al. 2603.17009 null
2026-03-17 Characterizing Delusional Spirals through Human-LLM Chat Logs Jared Moore et.al. 2603.16567 null
2026-03-17 SpecSteer: Synergizing Local Context and Global Reasoning for Efficient Personalized Generation Hang Lv et.al. 2603.16219 null
2026-03-17 Efficient LLM Serving for Agentic Workflows: A Data Systems Perspective Noppanat Wadlom et.al. 2603.16104 null
2026-03-16 Evasive Intelligence: Lessons from Malware Analysis for Evaluating AI Agents Simone Aonzo et.al. 2603.15457 null
2026-03-16 The ALMA-QUARKS Survey: Evidence of an Explosive Molecular Outflow in IRAS 15520–5234 Ariful Hoque et.al. 2603.15040 null
2026-03-16 MMSpec: Benchmarking Speculative Decoding for Vision-Language Models Hui Shen et.al. 2603.14989 null
2026-03-16 Hyper-learning and Unlearning: A Narrative Speculation on Urbanism in Media Ecologies Anqi Wang et.al. 2603.14810 null
2026-03-14 Early Rug Pull Warning for BSC Meme Tokens via Multi-Granularity Wash-Trading Pattern Profiling Dingding Cao et.al. 2603.13830 null
2026-03-14 Measuring Primitive Accumulation: An Information-Theoretic Approach to Capitalist Enclosure in PIK2, Indonesia Sandy Hardian Susanto Herho et.al. 2603.13715 null
2026-03-13 Towards Fluent Interaction with Cyber-Physical Architecture Jesse T. Gonzalez et.al. 2603.13633 null
2026-03-13 When Drafts Evolve: Speculative Decoding Meets Online Learning Yu-Yang Qian et.al. 2603.12617 null
2026-03-12 Design Exploration of Lightweight Interactions for Awareness-Supporting Technologies in Hybrid Work Lu Liu et.al. 2603.11977 null
2026-03-12 Edge-Cloud Collaborative Speech Emotion Captioning via Token-Level Speculative Decoding in Audio-Language Models Xiangyuan Xue et.al. 2603.11397 null
2026-03-11 One-loop mass corrections and decay widths of Type II heavy string states Massimo Bianchi et.al. 2603.11343 null
2026-03-11 Self-Speculative Decoding for LLM-based ASR with CTC Encoder Drafts George Saon et.al. 2603.11243 null
2026-03-11 Chasing RATs: Tracing Reading for and as Creative Activity Sophia Liu et.al. 2603.11031 null
2026-03-11 XMM-Newton Observation and Optical Monitoring of the Candidate Redback Millisecond Pulsar 1FGL J0523.5-2529 J. P. Halpern et.al. 2603.11028 null
2026-03-11 Kinematics of Wolf-Rayet Stars in the LMC: Clues to Subtype Origins Caden Burkhardt et.al. 2603.10826 null
2026-03-11 Supersonic flow of a Chaplygin gas past a conical wing with $Λ$-shaped cross sections Minghong Han et.al. 2603.10401 null
2026-03-10 Intrinsic Numerical Robustness and Fault Tolerance in a Neuromorphic Algorithm for Scientific Computing Bradley H. Theilman et.al. 2603.10246 null
2026-03-10 Phase diagram of 4D SU(3) Yang-Mills theory at $θ=π$ via imaginary theta simulations Akira Matsumoto et.al. 2603.09604 null
2026-03-10 Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation Luxi Lin et.al. 2603.09527 null
2026-03-09 ConFu: Contemplate the Future for Better Speculative Sampling Zongyue Qin et.al. 2603.08899 null
2026-03-09 StreamReady: Learning What to Answer and When in Long Streaming Videos Shehreen Azad et.al. 2603.08620 null
2026-03-09 Scalable On-the-fly Transcoding for Adaptive Streaming of Dynamic Point Clouds Michael Rudolph et.al. 2603.08417 null
2026-03-09 Colloidal Probe Atomic Force Microscopy Reveals Anomalous Underscreening: A Matter of Experimental Conditions Thomas Tilger et.al. 2603.08326 null
2026-03-09 EAGLE-Pangu: Accelerator-Safe Tree Speculative Decoding on Ascend NPUs Chang Han et.al. 2603.08088 null
2026-03-08 DualSpec: Accelerating Deep Research Agents via Dual-Process Action Speculation Shuzhang Zhong et.al. 2603.07416 null
2026-03-07 From debt crises to financial crashes (and back): a stock-flow consistent model for stock price bubbles Matheus R. Grasselli et.al. 2603.07213 null
2026-03-02 SJD-PV: Speculative Jacobi Decoding with Phrase Verification for Autoregressive Image Generation Zhehao Yu et.al. 2603.06666 null
2026-03-06 What are AI researchers worried about? Cian O’Donovan et.al. 2603.06223 null
2026-03-06 EvoESAP: Non-Uniform Expert Pruning for Sparse MoE Zongfang Liu et.al. 2603.06003 null
2026-03-09 Balancing Latency and Accuracy of Code Completion via Local-Cloud Model Cascading Hanzhen Lu et.al. 2603.05974 null
2026-03-05 Balancing Coverage and Draft Latency in Vocabulary Trimming for Faster Speculative Decoding Ofir Ben Shoham et.al. 2603.05210 null
2026-03-04 Quantum foundations for quantum technologies in the International Year of Quantum (2025) Angelo Bassi et.al. 2603.04630 null
2026-03-04 Raman scattering spectroscopic observation of a ferroelastic crossover in bond-frustrated PrCd$_3$P$_3$ Jackson Davis et.al. 2603.04539 null
2026-03-04 Weibel Instability-Driven Seed Magnetic Fields during Reionization Jorie McDermott et.al. 2603.03608 null
2026-03-03 Accelerating OpenPangu Inference on NPU via Speculative Decoding Yuntao Dai et.al. 2603.03383 null
2026-03-03 Speculative Speculative Decoding Tanishq Kumar et.al. 2603.03251 null
2026-03-03 Cross-Family Speculative Prefill: Training-Free Long-Context Compression with Small Draft Models Shubhangi Upasani et.al. 2603.02631 null
2026-03-02 Latitude-Dependent Time Variations of the Solar Tachocline Sarbani Basu et.al. 2603.02321 null
2026-03-02 Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning Jiebin Zhang et.al. 2603.01639 null
2026-03-02 KERV: Kinematic-Rectified Speculative Decoding for Embodied VLA Models Zihao Zheng et.al. 2603.01581 null
2026-03-02 Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification Guang Huang et.al. 2603.01399 null
2026-03-01 Proscenium: Exploring Design Spaces of Layered Information Experience on a Large Dual-Layer Transparent Display Chen Chen et.al. 2603.01238 null
2026-02-27 Stellar engines and Dyson bubbles can be stable Colin R McInnes et.al. 2603.00203 null
2026-02-27 Betting under Common Beliefs: The Effect of Probability Weighting Patrick Beissner et.al. 2602.24194 null
2026-02-27 Task-Centric Acceleration of Small-Language Models Dor Tsur et.al. 2602.24174 null
2026-02-27 LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding Alexander Samarin et.al. 2602.23881 null
2026-02-27 The Auton Agentic AI Framework Sheng Cao et.al. 2602.23720 null
2026-02-27 Active Learning for Planet Habitability Classification under Extreme Class Imbalance R. I. El-Kholy et.al. 2602.23666 null
2026-02-25 The shape of transverse momentum spectra in hybrid hydrodynamic models Thiago S. Domingues et.al. 2602.22490 null
2026-02-25 BMN-like Matrix Models Eunwoo Lee et.al. 2602.22163 null
2026-02-25 Speculating for Epiplexity: How to Learn the Most from Speculative Design? Botao Amber Hu et.al. 2602.22132 null
2026-02-25 Tidal disruptions of rubble piles: The case of Phobos Harrison Agrusa et.al. 2602.21912 null
2026-02-24 Asymptotically (un)safe scattering amplitudes from scratch: a deep dive into the IR jungle Benjamin Knorr et.al. 2602.21285 null
2026-02-23 KnapSpec: Self-Speculative Decoding via Adaptive Layer Selection as a Knapsack Problem Seongjin Cha et.al. 2602.20217 null
2026-02-23 SemanticNVS: Improving Semantic Scene Understanding in Generative Novel View Synthesis Xinya Chen et.al. 2602.20079 null
2026-02-23 Anisotropic magnons in a layered honeycomb ferromagnet Travis J. Williams et.al. 2602.19935 null
2026-02-23 Two-parameter families of MPO integrals of motion in Heisenberg spin chains Vsevolod I. Yashin et.al. 2602.19741 null
2026-02-23 Leap+Verify: Regime-Adaptive Speculative Weight Prediction for Accelerating Neural Network Training Jeremy McEntire et.al. 2602.19580 null
2026-02-21 WANSpec: Leveraging Global Compute Capacity for LLM Inference Noah Martin et.al. 2602.18931 null
2026-02-19 Insidious Imaginaries: A Critical Overview of AI Speculations Dejan Grba et.al. 2602.17383 null
2026-02-19 Dynamic Delayed Tree Expansion For Improved Multi-Path Speculative Decoding Rahul Thomas et.al. 2602.16994 null
2026-02-19 A testable framework for AI alignment: Simulation Theology as an engineered worldview for silicon-based agents Josef A. Habdank et.al. 2602.16987 null
2026-02-18 Greedy Multi-Path Block Verification for Faster Decoding in Speculative Sampling Rahul Thomas et.al. 2602.16961 null
2026-02-18 Privacy-Aware Split Inference with Speculative Decoding for Large Language Models over Wide-Area Networks Michael Cunningham et.al. 2602.16760 null
2026-02-17 MoE-Spec: Expert Budgeting for Efficient Speculative Decoding Bradley McDanel et.al. 2602.16052 null
2026-02-17 A Theoretical Approach to Stablecoin Design via Price Windows Katherine Molinet et.al. 2602.15981 null
2026-02-17 Robot-Assisted Social Dining as a White Glove Service Atharva S Kashyap et.al. 2602.15767 null
2026-02-17 Hot subdwarf stars from the Hamburg Quasar Survey Ulrich Heber et.al. 2602.15692 null
2026-02-17 Sparrow: Text-Anchored Window Attention with Visual-Semantic Glimpsing for Speculative Decoding in Video LLMs Libo Zhang et.al. 2602.15318 null
2026-02-16 Distributed Semi-Speculative Parallel Anisotropic Mesh Adaptation Kevin Garner et.al. 2602.15204 null
2026-02-16 Kami of the Commons: Towards Designing Agentic AI to Steward the Commons Botao Amber Hu et.al. 2602.14940 null
2026-02-16 Predicting the success of new crypto-tokens: the Pump.fun case Giulio Marino et.al. 2602.14860 null
2026-02-16 Atomix: Timely, Transactional Tool Use for Reliable Agentic Workflows Bardia Mohammadi et.al. 2602.14849 null
2026-02-14 Speculative Decoding with a Speculative Vocabulary Miles Williams et.al. 2602.13836 null
2026-02-14 The Shadow Boss: Identifying Atomized Manipulations in Agentic Employment of XR Users using Scenario Constructions Lik-Hang Lee et.al. 2602.13622 null
2026-02-13 ORAP: Optimized Row Access Prefetching for Rowhammer-mitigated Memory Maccoy Merrell et.al. 2602.13434 null
2026-02-13 Training-Free Acceleration for Document Parsing Vision-Language Model with Hierarchical Speculative Decoding Wenhui Liao et.al. 2602.12957 null
2026-02-12 Holographic Equidistribution Nico Cooper et.al. 2602.12265 null
2026-02-12 Embodied AI Agents for Team Collaboration in Co-located Blue-Collar Work Kaisa Vaananen et.al. 2602.12136 null
2026-02-12 Wisdom of the LLM Crowd: A Large Scale Benchmark of Multi-Label U.S. Election-Related Harmful Social Media Content Qile Wang et.al. 2602.11962 null
2026-02-11 What do people want to fact-check? Bijean Ghafouri et.al. 2602.10935 null
2026-02-10 Simulation of the Space-Charge-Limited Current Density for Time-Variant Pulsed Injection H. Huang et.al. 2602.09399 null
2026-02-10 Understanding Risk and Dependency in AI Chatbot Use from User Discourse Jianfeng Zhu et.al. 2602.09339 null
2026-02-09 PICASSO: Scaling CHERI Use-After-Free Protection to Millions of Allocations using Colored Capabilities Merve Gülmez et.al. 2602.09131 null
2026-02-09 Benchmarking the Energy Savings with Speculative Decoding Strategies Rohit Dutta et.al. 2602.09113 null
2026-02-09 Symplectic excision and distance rigidity Yoel Groman et.al. 2602.08969 null
2026-02-09 Three Lessons from Citizen-Centric Participatory AI Design Eike Schneiders et.al. 2602.08554 null
2026-02-09 On- and off-chain demand and supply drivers of Bitcoin price Pavel Ciaian et.al. 2602.08429 null
2026-02-09 TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration Linye Wei et.al. 2602.08404 null
2026-02-10 Compiler-Assisted Speculative Sampling for Accelerated LLM Inference on Heterogeneous Edge Devices Alejandro Ruiz y Mesa et.al. 2602.08060 null
2026-02-08 Dark Matter as Screened Ordinary Matter Colin D. Froggatt et.al. 2602.07902 null
2026-02-07 Motivic invariants of moduli stacks of Higgs bundles and bundles with connections: results and speculations Roman Fedorov et.al. 2602.07713 null
2026-02-07 Series-Parallel-Loop Decompositions of Control-flow Graphs Xuran Cai et.al. 2602.07627 null
2026-02-07 Astrophysical positronium and Dicke superradiance Abdaljalel E. Alizzi et.al. 2602.07489 null
2026-02-07 Imagining the Alien: Human Projections and Cognitive Limitations S. G. Djorgovski et.al. 2602.07284 null
2026-02-06 XShare: Collaborative in-Batch Expert Sharing for Faster MoE Inference Daniil Vankov et.al. 2602.07265 null
2026-02-06 SpecAttn: Co-Designing Sparse Attention with Self-Speculative Decoding Yikang Yue et.al. 2602.07223 null
2026-02-06 When RL Meets Adaptive Speculative Training: A Unified Training-Serving System Junxiong Wang et.al. 2602.06932 null
2026-02-06 Continued fraction method for high overtone quasinormal modes in effective potentials with discontinuity Guan-Ru Li et.al. 2602.06536 null
2026-02-06 RelayGen: Intra-Generation Model Switching for Efficient Reasoning Jiwon Song et.al. 2602.06454 null
2026-02-06 Quenching Speculation in Quantum Markets via Entangled Neural Traders Kieran Hymas et.al. 2602.06367 null
2026-02-05 DFlash: Block Diffusion for Flash Speculative Decoding Jian Chen et.al. 2602.06036 null
2026-02-05 V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval Dongyang Chen et.al. 2602.06034 null
2026-02-05 Multi-Token Prediction via Self-Distillation John Kirchenbauer et.al. 2602.06019 null
2026-02-05 Measurement-Induced Dynamics of Particles and Quasiparticles in a Bose-Einstein-condensate array Huy Nguyen et.al. 2602.05924 null
2026-02-05 Prompting Destiny: Negotiating Socialization and Growth in an LLM-Mediated Speculative Gameworld Mandi Yang et.al. 2602.05864 null
2026-02-05 The near-continuum mechanism for extended Boltzmann theory: the non-equilibrium relaxation Sha Liu et.al. 2602.05775 null
2026-02-05 Variational Speculative Decoding: Rethinking Draft Training from Token Likelihood to Sequence Acceptance Xiandong Zou et.al. 2602.05774 null
2026-02-05 SDFP: Speculative Decoding with FIT-Pruned Models for Training-Free and Plug-and-Play LLM Acceleration Hanyu Wei et.al. 2602.05499 null
2026-02-05 TIDE: Temporal Incremental Draft Engine for Self-Improving LLM Inference Jiyoung Park et.al. 2602.05145 null
2026-02-04 SPPAM: Signature Pattern Prediction and Access-Map Prefetcher Maccoy Merrell et.al. 2602.04100 null
2026-02-03 pop-cosmos: Redshifts and physical properties of KiDS-1000 galaxies Anik Halder et.al. 2602.03930 null
2026-02-03 SpecMD: A Comprehensive Study On Speculative Expert Prefetching Duc Hoang et.al. 2602.03921 null
2026-02-04 Beyond Tokens: Semantic-Aware Speculative Decoding for Efficient Inference by Probing Internal States Ximing Dong et.al. 2602.03708 null
2026-02-03 Efficient Algorithms for Partial Constraint Satisfaction Problems over Control-flow Graphs Xuran Cai et.al. 2602.03588 null
2026-02-02 The emergent Big Bang scenario Justin C. Feng et.al. 2602.02646 null
2026-02-02 An Empirical Study on Noisy Data and LLM Pretraining Loss Divergence Qizhen Zhang et.al. 2602.02400 null
2026-02-02 PRISM: Parametrically Refactoring Inference for Speculative Sampling Draft Models Xuliang Wang et.al. 2602.01762 null
2026-02-02 A Practical Tensor-Network Compression Pipeline for Production-Scale Large Language Models Sergii Kozyrev et.al. 2602.01613 null
2026-02-02 Are Security Cues Static? Rethinking Warning and Trust Indicators for Life Transitions Sarah Tabassum et.al. 2602.01544 null
2026-02-01 P-EAGLE: Parallel-Drafting EAGLE with Scalable Training Mude Hui et.al. 2602.01469 null
2026-02-01 Improve the Trade-off Between Watermark Strength and Speculative Sampling Efficiency for Language Models Weiqing He et.al. 2602.01428 null
2026-02-01 FlowCast: Trajectory Forecasting for Scalable Zero-Cost Speculative Flow Matching Divya Jyoti Bajpai et.al. 2602.01329 null
2026-02-01 PACER: Blockwise Pre-verification for Speculative Decoding with Adaptive Length Situo Zhang et.al. 2602.01274 null
2026-01-31 Eternagram: Inspiring Climate Action Through LLM-based Conversational Exploration of a Post-Devastation Climate Future Suifang Zhou et.al. 2602.00571 null
2026-01-31 SAGE: Accelerating Vision-Language Models via Entropy-Guided Adaptive Speculative Decoding Yujia Tong et.al. 2602.00523 null
2026-01-30 TriSpec: Ternary Speculative Decoding via Lightweight Proxy Verification Haoyun Jiang et.al. 2601.23180 null
2026-01-30 SpecIBT: Formally Verified Protection Against Speculative Control-Flow Hijacking Jonathan Baumann et.al. 2601.22978 null
2026-01-30 Beyond Medical Chatbots: Meddollina and the Rise of Continuous Clinical Intelligence Vaibhav Ram S. V. N. S et.al. 2601.22645 null
2026-01-29 Plant-Inspired Robot Design Metaphors for Ambient HRI Victor Nikhil Antony et.al. 2601.22387 null
2026-01-29 Subsolar mass black holes from stellar collapse induced by primordial black holes Thomas W. Baumgarte et.al. 2601.22220 null
2026-01-29 StarSD: One-for-Many Speculative Decoding Junhao He et.al. 2601.21622 null
2026-01-29 SPOILER-GUARD: Gating Latency Effects of Memory Accesses through Randomized Dependency Prediction Gayathri Subramanian et.al. 2601.21211 null
2026-01-29 Scaling Embeddings Outperforms Scaling Experts in Language Models Hong Liu et.al. 2601.21204 null
2026-01-28 Unplugging a Seemingly Sentient Machine Is the Rational Choice – A Metaphysical Perspective Erik J Bekkers et.al. 2601.21016 null
2026-01-28 Manipulation in Prediction Markets: An Agent-based Modeling Experiment Bridget Smart et.al. 2601.20452 null
2026-01-28 TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs Minjae Lee et.al. 2601.20357 null
2026-01-26 LTS-VoiceAgent: A Listen-Think-Speak Framework for Efficient Streaming Voice Interaction via Semantic Triggering and Incremental Reasoning Wenhao Zou et.al. 2601.19952 null
2026-01-27 The Competence Crisis: A Design Fiction on AI-Assisted Research in Software Engineering Mairieli Wessel et.al. 2601.19628 null
2026-01-27 DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference Fuliang Liu et.al. 2601.19278 link
2026-01-26 Flatter Tokens are More Valuable for Speculative Draft Model Training Jiaming Fan et.al. 2601.18902 null
2026-01-26 Towards a Proof of the Improved Quantum Null Energy Condition Ido Ben-Dayan et.al. 2601.18860 null
2026-01-26 Disk-jet-wind coupling from stellar mass to supermassive black holes Chris Done et.al. 2601.18607 null
2026-01-30 LLM-42: Enabling Determinism in LLM Inference with Verified Speculation Raja Gond et.al. 2601.17768 null
2026-01-24 Improving User Privacy in Personalized Generation: Client-Side Retrieval-Augmented Modification of Server-Side Generated Speculations Alireza Salemi et.al. 2601.17569 null
2026-01-24 Towards a Declarative Agentic Layer for Intelligent Agents in MCP-Based Server Ecosystems Maria Jesus Rodriguez-Sanchez et.al. 2601.17435 null
2026-01-24 Auditing Disability Representation in Vision-Language Models Srikant Panda et.al. 2601.17348 null
2026-01-27 From Clicks to Consensus: Collective Consent Assemblies for Data Governance Lin Kyi et.al. 2601.16752 null
2026-01-23 Integrated Photonic Quantum Computing: From Silicon to Lithium Niobate Hui Zhang et.al. 2601.16484 null
2026-01-21 MARS: Unleashing the Power of Speculative Decoding via Margin-Aware Verification Jingwei Song et.al. 2601.15498 null
2026-01-23 Emergent, not Immanent: A Baradian Reading of Explainable AI Fabio Morreale et.al. 2601.15029 null
2026-01-13 On the Limits of Learned Importance Scoring for KV Cache Compression Brady Steele et.al. 2601.14279 null
2026-01-21 The Non-Predictability of Mispredicted Branches using Timing Information Ioannis Constantinou et.al. 2601.13804 null
2026-01-19 Quasinormal modes and their excitation beyond general relativity. II: isospectrality loss in gravitational waveforms Hector O. Silva et.al. 2601.13411 null
2026-01-19 The Words That Can’t Be Shared: Exploring the Design of Unsent Messages Michael Yin et.al. 2601.13343 null
2026-01-19 Time variations of the mean magnetic flux in active regions of different magneto-morphological classes Anastasiya Zhukova et.al. 2601.13168 null
2026-01-18 SplittingSecrets: A Compiler-Based Defense for Preventing Data Memory-Dependent Prefetcher Side-Channels Reshabh K Sharma et.al. 2601.12270 null
2026-01-18 Speculative Sampling with Reinforcement Learning Chenan Wang et.al. 2601.12212 null
2026-01-17 A Dynamo Confinement Scenario for the Solar Tachocline and its Implications for Spin-down in the Radiative Spreading Regime Loren I. Matilsky et.al. 2601.11943 null
2026-01-16 On Abnormal Execution Timing of Conditional Jump Instructions Annika Wilde et.al. 2601.11696 null
2026-01-15 WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching Xiangchen Li et.al. 2601.11652 null
2026-01-16 Spectral evolution of hot hybrid white dwarfs: II. Photometry Semih Filiz et.al. 2601.11191 null
2026-01-16 Coexisting electronic smectic liquid crystal and superconductivity in a Si square-net semimetal Christopher J. Butler et.al. 2601.10939 null
2026-01-14 Annealed Relaxation of Speculative Decoding for Faster Autoregressive Image Generation Xingyao Li et.al. 2601.09212 null
2026-01-14 SRT: Accelerating Reinforcement Learning via Speculative Rollout with Tree-Structured Cache Chi-Chih Chang et.al. 2601.09083 null
2026-01-13 HIPPO: Accelerating Video Large Language Models Inference via Holistic-aware Parallel Speculative Decoding Qitan Lv et.al. 2601.08273 null
2026-01-12 Spacetime Quasicrystals Latham Boyle et.al. 2601.07769 null
2026-01-12 Crypto Pricing with Hidden Factors Matthew Brigida et.al. 2601.07664 null
2026-01-12 TALON: Confidence-Aware Speculative Decoding with Adaptive Token Trees Tianyu Liu et.al. 2601.07353 null
2026-01-11 The AI Cognitive Trojan Horse: How Large Language Models May Bypass Human Epistemic Vigilance Andrew D. Maynard et.al. 2601.07085 null
2026-01-14 A binary merger product as the direct progenitor of a Type II-P supernova Zexi Niu et.al. 2601.06577 null
2026-01-14 VIGIL: Defending LLM Agents Against Tool Stream Injection via Verify-Before-Commit Junda Lin et.al. 2601.05755 null
2026-01-09 Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding Yuxuan Zhou et.al. 2601.05724 null
2026-01-09 Double: Breaking the Acceleration Limit via Double Retrieval Speculative Parallelism Yuhao Shen et.al. 2601.05524 null
2026-01-08 Multi-Scale Local Speculative Decoding for Image Generation Elia Peruzzo et.al. 2601.05149 null
2026-01-08 Revisiting Judge Decoding from First Principles via Training-Free Distributional Divergence Shengyin Sun et.al. 2601.04766 null
2026-01-08 The UnScripted Trip: Fostering Policy Discussion on Future Human-Vehicle Collaboration in Autonomous Driving Through Design-Oriented Methods Xinyan Yu et.al. 2601.04601 null
2026-01-06 Revisiting Speculative Leaderless Protocols for Low-Latency BFT Replication Daniel Qian et.al. 2601.03390 null
2026-01-06 On the Hilbert-Chow crepant resolution conjecture Denis Nesterov et.al. 2601.03036 null
2026-01-08 MiMo-V2-Flash Technical Report Xiaomi LLM-Core Team et.al. 2601.02780 null
2026-01-06 Experience and Adaptation in AI-mediated Hiring Systems: A Combined Analysis of Online Discourse and Interface Design Md Nazmus Sakib et.al. 2601.02775 null
2026-01-06 From Slaves to Synths? Superintelligence and the Evolution of Legal Personality Simon Chesterman et.al. 2601.02773 null
2026-01-06 Hypothesize-Then-Verify: Speculative Root Cause Analysis for Microservices with Pathwise Parallelism Lingzhe Zhang et.al. 2601.02736 null
2026-01-05 A modern perspective on Tutte’s homotopy theorem Matthew Baker et.al. 2601.02582 null
2026-01-06 The Betelgeuse Enigma: The Betelbuddy Hypothesis Priya Hasan et.al. 2601.02012 null
2026-01-07 FastV-RAG: Towards Fast and Fine-Grained Video QA with Retrieval-Augmented Generation Gen Li et.al. 2601.01513 null
2026-01-02 FlexSpec: Frozen Drafts Meet Evolving Targets in Edge-Cloud Collaborative LLM Speculative Decoding Yuchen Li et.al. 2601.00644 null
2026-01-01 MR-DAW: Towards Collaborative Digital Audio Workstations in Mixed Reality Torin Hopkins et.al. 2601.00326 null
2025-12-31 The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition Xiaoze Liu et.al. 2601.00065 null
2025-12-29 From Clay to Code: Typological and Material Reasoning in AI Interpretations of Iranian Pigeon Towers Abolhassan Pishahang et.al. 2601.00029 null
2025-12-31 Intriguing Magnetocaloric Effect in 6H-perovskite Ba3RRu2O9 (R=Ho, Gd, Tb, Nd) with Strong 4d-4f Correlations Mohit Kumar et.al. 2512.24758 null
2025-12-29 Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding Yue Guan et.al. 2512.23858 null
2025-12-29 Entropy-Aware Speculative Decoding Toward Improved LLM Reasoning Tiancheng Su et.al. 2512.23765 null
2025-12-27 Landauer cost in a continuous vacuum/no-vacuum measurement Lorenzo Pirovano et.al. 2512.23751 null
2025-12-29 Soft Robotic Technological Probe for Speculative Fashion Futures Amy Ingold et.al. 2512.23570 null
2025-12-29 Fuzzilicon: A Post-Silicon Microcode-Guided x86 CPU Fuzzer Johannes Lenzen et.al. 2512.23438 null
2025-12-28 An Architecture-Led Hybrid Report on Body Language Detection Project Thomson Tong et.al. 2512.23028 null
2026-01-05 AFA-LoRA: Enabling Non-Linear Adaptations in LoRA with Activation Function Annealing Jiacheng Li et.al. 2512.22455 null
2025-12-27 Nightjar: Dynamic Adaptive Speculative Decoding for Large Language Models Serving Rui Li et.al. 2512.22420 null
2025-12-26 Eliminate Branches by Melding IR Instructions Yuze Li et.al. 2512.22390 null
2025-12-26 Accelerate Speculative Decoding with Sparse Computation in Verification Jikai Wang et.al. 2512.21911 null
2025-12-26 Fast Inference of Visual Autoregressive Model with Adjacency-Adaptive Dynamical Draft Trees Haodong Lei et.al. 2512.21857 null
2025-12-24 dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning Shirui Chen et.al. 2512.21446 null
2025-12-24 Parallel Token Prediction for Language Models Felix Draxler et.al. 2512.21323 null
2025-12-24 Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning Shengguang Wu et.al. 2512.20934 null
2025-12-23 Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs Rui Pan et.al. 2512.20573 null
2025-12-23 DecoKAN: Interpretable Decomposition for Forecasting Cryptocurrency Market Dynamics Yuan Gao et.al. 2512.20028 null
2025-12-22 Multimodal LLMs for Historical Dataset Construction from Archival Image Scans: German Patents (1877-1918) Niclas Griesshaber et.al. 2512.19675 null
2025-12-20 Towards Efficient Agents: A Co-Design of Inference Architecture and System Weizhe Lin et.al. 2512.18337 null
2025-12-19 Digital Bricolage: Design Speculations for Embodied Approaches to Digitized Print-based Cultural Collections Malak Sadek et.al. 2512.17590 null
2025-12-19 Accelerating Multi-modal LLM Gaming Performance via Input Prediction and Mishit Correction Ziyang Lin et.al. 2512.17250 null
2025-12-18 Machines, AI and the past//future of things Karola Köpferl et.al. 2512.16285 null
2025-12-18 Fast Collaborative Inference via Distributed Speculative Decoding Ce Zheng et.al. 2512.16273 null
2025-12-17 Optimizing Agentic Language Model Inference via Speculative Tool Calls Daniel Nichols et.al. 2512.15834 null
2025-12-14 Variable Record Table: A Unified Hardware-Assisted Framework for Runtime Security Suraj Kumar Sah et.al. 2512.15777 null
2025-12-13 TS-DP: Reinforcement Speculative Decoding For Temporal Adaptive Diffusion Policy Acceleration Ye Li et.al. 2512.15773 null
2025-12-17 Probing the dynamics of stringy flux tubes with large $R$-charge Davide Bonomi et.al. 2512.15698 null
2025-12-17 The longest known tails of ram-pressure stripped star-forming galaxies are caused by an ICM shock in Abell 1367 H. W. Edler et.al. 2512.15660 null
2025-12-17 DEER: Draft with Diffusion, Verify with Autoregressive Models Zicong Cheng et.al. 2512.15176 null
2025-12-16 Steering Alternative Realities through Local Quantum Memory Operations Xiongfeng Ma et.al. 2512.14377 null
2025-12-16 PADE: A Predictor-Free Sparse Attention Accelerator via Unified Execution and Stage Fusion Huizheng Wang et.al. 2512.14322 null
2025-12-16 The Impact Market to Save Conference Peer Review: Decoupling Dissemination and Credentialing Karthikeyan Sankaralingam et.al. 2512.14104 null
2025-12-16 RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees Junjie Ma et.al. 2512.14069 null
2025-12-17 Efficient Adaptive Rejection Sampling for Accelerating Speculative Decoding in Large Language Models Chendong Sun et.al. 2512.13194 null
2025-12-14 Spectral Theory of Almost Periodic Banach–Malcev Algebras and Applications to Moufang Dynamics Marwa Ennaceur et.al. 2512.12687 null
2025-12-16 Mage: Cracking Elliptic Curve Cryptography with Cross-Axis Transformers Lily Erickson et.al. 2512.12483 null
2025-12-13 Moduli stacks of quiver connections and non-Abelian Hodge theory Mahmud Azam et.al. 2512.12188 null
2025-12-13 Binarity at LOw Metallicity (BLOeM): Projected rotational velocities D. J. Lennon et.al. 2512.12102 null
2025-12-12 Universal Dynamics of Financial Bubbles in Isolated Markets: Evidence from the Iranian Stock Market Ali Hosseinzadeh et.al. 2512.12054 null
2025-12-11 CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving Dong Liu et.al. 2512.11920 null
2025-12-12 Speculative Decoding Speed-of-Light: Optimal Lower Bounds via Branching Random Walks Sergey Pankratov et.al. 2512.11718 null
2025-12-12 AdaSD: Adaptive Speculative Decoding for Efficient Language Model Inference Kuan-Wei Lu et.al. 2512.11280 null
2025-12-12 FutureWeaver: Planning Test-Time Compute for Multi-Agent Systems with Modularized Collaboration Dongwon Jung et.al. 2512.11213 null
2025-12-11 Site Preference and Possible Coexistence of Antiferromagnetic Order and Magnetic Frustration in (Co1-xMgx)10Ge3O16 (0 <= x <= 30%) Gina Angelo et.al. 2512.11132 null
2025-12-11 Mixing by offshore wind infrastructure: Resolving the density stratified wakes past vertical cylinders Charlie J. Lloyd et.al. 2512.10751 null
2025-12-11 T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground Dmitrii Stoianov et.al. 2512.10430 null
2025-12-11 Motifs in self-organising cells Ying Chen Lim et.al. 2512.10307 null
2025-12-10 Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning Logan Robbins et.al. 2512.10054 null
2025-12-14 GoodSpeed: Optimizing Fair Goodput with Adaptive Speculative Decoding in Distributed Edge Inference Phuong Tran et.al. 2512.09963 null
2025-12-10 A Speculative GLRT-Backed Approach for Adversarial Resilience on Deep Learning-Based Array Processing Nian-Cin Wang et.al. 2512.09893 null
2025-12-10 Baseline: Operation-Based Evolution and Versioning of Data Jonathan Edwards et.al. 2512.09762 null
2025-12-10 WASP-12, shrouded in mystery or just cold gas? Simon Daley-Yates et.al. 2512.09593 null
2025-12-09 Fast-ARDiff: An Entropy-informed Acceleration Framework for Continuous Space Autoregressive Generation Zhen Zou et.al. 2512.08537 null
2025-12-08 Fair Benchmarking of Optimisation Applications Frank Phillipson et.al. 2512.07915 null
2025-11-30 The Endogenous Constraint: Hysteresis, Stagflation, and the Structural Inhibition of Monetary Velocity in the Bitcoin Network (2016-2025) Hamoon Soleimani et.al. 2512.07886 null
2025-12-08 Chemical complexity in star formation induced by stellar feedback: cores shock-formed by the supernova remnant W44 G. Cosentino et.al. 2512.07562 null
2025-12-08 SJD++: Improved Speculative Jacobi Decoding for Training-free Acceleration of Discrete Auto-regressive Text-to-Image Generation Yao Teng et.al. 2512.07503 null
2025-12-06 BitStopper: An Efficient Transformer Attention Accelerator via Stage-fusion and Early Termination Huizheng Wang et.al. 2512.06457 null
2025-12-05 Protocol Futuring: Speculating Second-Order Dynamics of Protocols in Sociotechnical Infrastructural Futures Botao ‘Amber’ Hu et.al. 2512.06108 null
2025-12-05 Fast SceneScript: Accurate and Efficient Structured Language Model via Multi-Token Prediction Ruihong Yin et.al. 2512.05597 null
2025-12-09 Arbitrage: Efficient Reasoning via Advantage-Aware Speculation Monishwaran Maheswaran et.al. 2512.05033 null
2025-12-04 Long-term X-ray variability of the multiple-planet host L 98-59: Hints of an activity cycle I. Pillitteri et.al. 2512.04817 null
2025-12-04 RLHFSpec: Breaking the Efficiency Bottleneck in RLHF Training via Adaptive Drafting Siqi Wang et.al. 2512.04752 null
2025-12-03 Counting AdS Vacua Zihni Kaan Baykara et.al. 2512.04151 null
2025-12-01 Humanity in the Age of AI: Reassessing 2025’s Existential-Risk Narratives Mohamed El Louadi et.al. 2512.04119 null
2025-12-02 From Administrative Chaos to Analytical Cohorts: A Three-Stage Normalisation Pipeline for Longitudinal University Administrative Records H. R. Paz et.al. 2512.02936 null
2025-12-02 A Human-centric Framework for Debating the Ethics of AI Consciousness Under Uncertainty Zhou Ziheng et.al. 2512.02544 null
2025-12-02 SpecPV: Improving Self-Speculative Decoding for Long-Context Generation via Partial Verification Zhendong Tan et.al. 2512.02337 null
2025-12-05 Much Ado About Noising: Dispelling the Myths of Generative Robotic Control Chaoyi Pan et.al. 2512.01809 null
2025-12-01 Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding Yilong Zhao et.al. 2512.01278 null
2025-11-30 Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding Pengfei Hu et.al. 2512.00805 null
2025-11-30 SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs Jiaming Xu et.al. 2512.00722 null
2025-11-30 SIMPLE: Disaggregating Sampling from GPU Inference into a Decision Plane for Faster Distributed LLM Serving Bohan Zhao et.al. 2512.00719 null
2025-11-29 Speculating on the Role of Media Architecture in Post-disaster Rebuilding and Recovery: Insights from Architects and Interaction Designers Berk Goksenin Tan et.al. 2512.00537 null
2025-11-29 Measuring Memecoin Fragility Yuexin Xiang et.al. 2512.00377 null
2025-12-04 Retail Investor Horizon and Earnings Announcements Domonkos F. Vamossy et.al. 2512.00280 null
2025-12-05 Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match Jinze Li et.al. 2511.22972 null
2025-12-03 AI Deception: Risks, Dynamics, and Controls Boyuan Chen et.al. 2511.22619 null
2025-11-27 LLM-Cave: A benchmark and light environment for large language models reasoning and decision-making system Huanyu Li et.al. 2511.22598 null
2025-11-26 Dark Speculation: Combining Qualitative and Quantitative Understanding in Frontier AI Risk Analysis Daniel Carpenter et.al. 2511.21838 null
2025-11-26 Nuclear Detonations as Probes of Hidden Superluminal Sectors Karl Svozil et.al. 2511.21793 null
2025-11-25 The dynamic of a tax on land value : concepts, models and impact scenario Hugo Spring-Ragain et.al. 2511.21766 null
2025-11-24 Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models Linye Wei et.al. 2511.21759 null
2025-12-01 DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving Fengze Yu et.al. 2511.21669 null
2025-11-26 Weak gravity at micron scales from dark bubble cosmology and its cosmological consequences Ulf Danielsson et.al. 2511.21362 null
2025-11-25 FREE: Uncertainty-Aware Autoregression for Parallel Diffusion Transformers Xinwan Wen et.al. 2511.20390 null
2025-11-25 Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios Luohe Shi et.al. 2511.20340 null
2025-11-25 Adaptive LLM Agents: Toward Personalized Empathetic Care Priyanka Singh et.al. 2511.20080 null
2025-11-25 Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design Zixiao Huang et.al. 2511.20048 null
2025-11-24 Agint: Agentic Graph Compilation for Software Engineering Agents Abhi Chivukula et.al. 2511.19635 null
2025-11-24 AI Consciousness and Existential Risk Rufin VanRullen et.al. 2511.19115 null
2025-11-24 NEZHA: A Zero-sacrifice and Hyperspeed Decoding Architecture for Generative Recommendations Yejing Wang et.al. 2511.18793 null
2025-11-22 Accelerating Time Series Foundation Models with Speculative Decoding Pranav Subbaraman et.al. 2511.18191 null
2025-11-22 Revisiting $γ$-Ray Orbital Modulation in the Redback Millisecond Pulsar PSR J2039-5617 Mengqing Zhang et.al. 2511.17900 null
2025-11-21 Broadband X-ray observations of the periodic optical source ZTF J185139.81+171430.3 and its identification as a massive intermediate polar Ren Deng et.al. 2511.17800 null
2025-11-21 Pre-cache: A Microarchitectural Solution to prevent Meltdown and Spectre Subhash Sethumurugan et.al. 2511.17726 null
2025-11-21 Which active galaxies might be neutrino emitters? Shuying Zhou et.al. 2511.16869 null
2025-11-20 Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter Qinghao Hu et.al. 2511.16665 null
2025-11-20 An observationally based wind model contemporaneous with the radio detections in $τ$ Boötis Dag Evensberget et.al. 2511.16370 null
2025-11-21 Fast LLM Post-training via Decoupled and Best-of-N Speculation Rongxin Cheng et.al. 2511.16193 null
2025-11-20 Can Online GenAI Discussion Serve as Bellwether for Labor Market Shifts? Shurui Cao et.al. 2511.16028 null
2025-11-19 Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Minimization Rahul Krishna Thomas et.al. 2511.15898 null
2025-11-19 Fossil group origins XIV: The radial orbits of A267 S. Zarattini et.al. 2511.15786 null
2025-11-19 FlashMesh: Faster and Better Autoregressive Mesh Synthesis via Structured Speculation Tingrui Shen et.al. 2511.15618 null
2025-11-24 Structural phase transitions in the van der Waals ferromagnets Fe$_x$Pd$_y$Te$_2$ Rafaela F. S. Penacchio et.al. 2511.15584 null
2025-11-19 Cost-Aware Prediction (CAP): An LLM-Enhanced Machine Learning Pipeline and Decision Support System for Heart Failure Mortality Prediction Yinan Yu et.al. 2511.15357 null
2025-11-19 Gaussian Blending: Rethinking Alpha Blending in 3D Gaussian Splatting Junseo Koo et.al. 2511.15102 null
2025-11-18 Harmful Traits of AI Companions W. Bradley Knox et.al. 2511.14972 null
2025-11-18 Photometric Constraints on Intermediate-mass Black Holes in the Galactic Centre Tamojeet Roychowdhury et.al. 2511.14856 null
2025-11-23 Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning Ruoyu Qin et.al. 2511.14617 null
2025-11-18 Positive AGN feedback in the outskirts of nearby barred spiral galaxies? Bannanje Ananthamoorthy et.al. 2511.14257 null
2025-11-18 Enhanced UV emission knot in the giant radio galaxy NGC 315: Hint of patchy star formation? Bannanje Ananthamoorthy et.al. 2511.14252 null
2025-11-18 MoE-SpeQ: Speculative Quantized Decoding with Proactive Expert Prefetching and Offloading for Mixture-of-Experts Wenfeng Wang et.al. 2511.14102 null
2025-11-17 Beat the long tail: Distribution-Aware Speculative Decoding for RL Training Zelei Shao et.al. 2511.13841 null
2025-11-17 VVS: Accelerating Speculative Decoding for Visual Autoregressive Generation via Partial Verification Skipping Haotian Dong et.al. 2511.13587 null
2025-11-17 Tfin Crypto: From Speculation to Optimization in Risk Managed Crypto Portfolio Allocation Thanh Nguyen et.al. 2511.13239 null
2025-11-15 Striking the Right Balance between Compute and Copy: Improving LLM Inferencing Under Speculative Decoding Arun Ramachandran et.al. 2511.12031 null
2025-11-15 Educators on the Frontline: Philosophical and Realistic Perspectives on Integrating ChatGPT into the Learning Space Surajit Das et.al. 2511.11960 null
2025-11-13 Speculative Decoding in Decentralized LLM Inference: Turning Communication Latency into Computation Throughput Jingwei Song et.al. 2511.11733 null
2025-11-09 Exploring Parallelism in FPGA-Based Accelerators for Machine Learning Applications Sed Centeno et.al. 2511.11640 null
2025-11-14 Fast and Expressive Multi-Token Prediction with Probabilistic Circuits Andreas Grivas et.al. 2511.11346 null
2025-11-14 Optimising Density Computations in Probabilistic Programs via Automatic Loop Vectorisation Sangho Lim et.al. 2511.11070 null
2025-11-13 Widening of Binaries via Non-conservative Mass Transfer as a Formation Channel for Gaia Black Hole System Aleksandra Olejak et.al. 2511.10728 null
2025-11-12 Evaluating from Benign to Dynamic Adversarial: A Squid Game for Large Language Models Zijian Chen et.al. 2511.10691 null
2025-11-08 A Mathematical Framework for AI Singularity: Conditions, Bounds, and Control of Recursive Improvement Akbar Anbar Jafari et.al. 2511.10668 null
2025-11-13 Steering Pretrained Drafters during Speculative Decoding Frédéric Berdoz et.al. 2511.09844 null
2025-11-12 Emergent Dark Matter Christian Canete et.al. 2511.09034 null
2025-11-12 TiDAR: Think in Diffusion, Talk in Autoregression Jingyu Liu et.al. 2511.08923 null
2025-11-14 Kinematic scaling relations of disc galaxies from ionised gas at $z\sim 1$ and their connection with dark matter halos Pavel E. Mancera Piña et.al. 2511.08685 null
2025-11-11 Parallel Sampling via Autospeculation Nima Anari et.al. 2511.07869 null
2025-11-11 Critical Confabulation: Can LLMs Hallucinate for Social Good? Peiqi Sui et.al. 2511.07722 null
2025-11-10 Look into your Heart – Prototypes for a Speculative Design Exploration of Personal Heart Rate Visualization Swaroop Panda et.al. 2511.07600 null
2025-11-08 In-depth Analysis on Caching and Pre-fetching in Mixture of Experts Offloading Shuning Lin et.al. 2511.05814 null
2025-11-06 The TeV emission of 3C273: inverse Compton radiation from shear-accelerated high-energy electrons in the large-scale jet? F. Tavecchio et.al. 2511.04433 null
2025-11-03 TapOut: A Bandit-Based Approach to Dynamic Speculative Decoding Aditya Sridhar et.al. 2511.02017 null
2025-11-04 Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding Jungyeon Koh et.al. 2511.01695 null
2025-11-03 When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding Min Fang et.al. 2511.01282 null
2025-11-04 SpecDiff-2: Scaling Diffusion Drafter Alignment For Faster Speculative Decoding Jameson Sandler et.al. 2511.00606 null
2025-11-01 Reject Only Critical Tokens: Pivot-Aware Speculative Decoding Amir Ziashahabi et.al. 2511.00351 null
2025-11-01 Sherlock: Reliable and Efficient Agentic Workflow Execution Yeonju Ro et.al. 2511.00330 null
2025-10-31 SpecAttn: Speculating Sparse Attention Harsh Shah et.al. 2510.27641 null
2025-10-30 Kad: A Framework for Proxy-based Test-time Alignment with Knapsack Approximation Deferral Ayoub Hammal et.al. 2510.27017 null
2025-10-30 CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs Zhiyuan Ning et.al. 2510.26843 null
2025-10-30 Inference-Cost-Aware Dynamic Tree Construction for Efficient Inference in Large Language Models Yinrong Hong et.al. 2510.26577 null
2025-10-30 Polybasic Speculative Decoding Through a Theoretical Perspective Ruilin Wang et.al. 2510.26527 null
2025-10-30 In space there will be no need to scream – Limits to the presence of giant planets in the $ζ^2$ Ret system A. Suárez Mascareño et.al. 2510.26483 null
2025-10-30 ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems Qiaoling Chen et.al. 2510.26475 null
2025-10-29 Foundations of Fiat-Denominated Loans Collateralized by Cryptocurrencies Pavel Hubáček et.al. 2510.25878 null
2025-10-29 Hawk: Leveraging Spatial Context for Faster Autoregressive Text-to-Image Generation Zhi-Kai Chen et.al. 2510.25739 null
2025-10-29 Accurate Leakage Speculation for Quantum Error Correction Chaithanya Naik Mude et.al. 2510.25661 null
2025-10-29 Detuning Choice for solving MIS and MWIS Sem Saada Khelkhal et.al. 2510.25473 null
2025-10-31 MMEdge: Accelerating On-device Multimodal Inference via Pipelined Sensing and Encoding Runxi Huang et.al. 2510.25327 null
2025-10-31 ‘Studies for’: A Human-AI Co-Creative Sound Artwork Using a Real-time Multi-channel Sound Generation Model Chihiro Nagashima et.al. 2510.25228 null
2025-10-29 Prospects for a fourth generation of leptons in a 13 TeV p-p collider Ramkrishna Joshi et.al. 2510.25190 null
2025-10-28 On the Field Excursion Bound Tom Rudelius et.al. 2510.24715 null
2025-10-28 MC-SJD: Maximal Coupling Speculative Jacobi Decoding for Autoregressive Visual Generation Acceleration Junhyuk So et.al. 2510.24211 null
2025-10-28 SpecKD: Speculative Decoding for Effective Knowledge Distillation of LLMs Haiduo Huang et.al. 2510.24021 null
2025-10-27 Financial markets as a Le Bonian crowd during boom-and-bust episodes: A complementary theoretical framework in behavioural finance Claire Barraud et.al. 2510.23175 null
2025-10-27 Understanding In-Context Learning Beyond Transformers: An Investigation of State Space and Hybrid Architectures Shenran Wang et.al. 2510.23006 null
2025-10-27 Exploring Structures of Inferential Mechanisms through Simplistic Digital Circuits Giovanni Sileno et.al. 2510.22883 null
2025-10-26 Batch Speculative Decoding Done Right Ranran Haoran Zhang et.al. 2510.22876 null
2025-10-26 FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference Divya Jyoti Bajpai et.al. 2510.22641 null
2025-10-24 Unravelling the oxygen influence in cubic bixbyite In$_2$O$_3$ on Raman active phonon modes by isotope studies Johannes Feldl et.al. 2510.22018 null
2025-10-24 Butterfly: glo-cal effects of data, energy and industry, New Media and Performance Exhibition Catalogue Rebekah Rousi et.al. 2510.21893 null
2025-10-23 Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation Yuhan Liu et.al. 2510.20812 null
2025-10-22 Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs Hongyi Liu et.al. 2510.20064 null
2025-10-22 Speculative Sampling for Parametric Temporal Point Processes Marin Biloš et.al. 2510.20031 null
2025-10-22 New Recursions for the Canonical Scalar-Scaffolded Yang-Mills Amplitude Jeffrey V. Backus et.al. 2510.19901 null
2025-10-22 AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders Yuezhou Hu et.al. 2510.19779 null
2025-10-23 Fast Inference via Hierarchical Speculative Decoding Clara Mohri et.al. 2510.19705 null
2025-10-22 CoSense-LLM: Semantics at the Edge with Cost- and Uncertainty-Aware Cloud-Edge Cooperation Hasan Akgul et.al. 2510.19670 null
2025-10-22 Fermionic fields of higher spin in de Sitter space Dionysios Anninos et.al. 2510.19652 null
2025-10-21 Reasoning Language Model Inference Serving Unveiled: An Empirical Study Qi Li et.al. 2510.18672 null
2025-10-21 From Quarter to All: Accelerating Speculative LLM Decoding via Floating-Point Exponent Remapping and Parameter Sharing Yushu Zhao et.al. 2510.18525 null
2025-10-20 Annotating the Chain-of-Thought: A Behavior-Labeled Dataset for AI Safety Antonio-Gabriel Chacón Menke et.al. 2510.18154 null
2025-10-20 A Hall viscosity for skyrmion via magnon interaction Bom Soo Kim et.al. 2510.18092 null
2025-10-20 SpecAgent: A Speculative Retrieval and Forecasting Agent for Code Completion George Ma et.al. 2510.17925 null
2025-10-18 Does GenAI Rewrite How We Write? An Empirical Study on Two-Million Preprints Minfeng Qi et.al. 2510.17882 null
2025-10-18 $ρ$Hammer: Reviving RowHammer Attacks on New Architectures via Prefetching Weijie Chen et.al. 2510.16544 null
2025-10-18 What Limits Agentic Systems Efficiency? Song Bian et.al. 2510.16276 null
2025-10-17 Interpretable RNA-Seq Clustering with an LLM-Based Agentic Evidence-Grounded Framework Elias Hossain et.al. 2510.16082 null
2025-10-29 TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs Sibo Xiao et.al. 2510.15545 null
2025-10-23 Accelerating Mobile Language Model via Speculative Decoding and NPU-Coordinated Execution Zhiyang Chen et.al. 2510.15312 null
2025-10-16 Speculative Model Risk in Healthcare AI: Using Storytelling to Surface Unintended Harms Xingmeng Zhao et.al. 2510.14718 null
2025-10-16 xLLM Technical Report Tongxuan Liu et.al. 2510.14686 null
2025-10-15 Cortex: Workflow-Aware Resource Pooling and Scheduling for Agentic Serving Nikos Pagonas et.al. 2510.14126 null
2025-10-15 Tests of restricted Quantum Focusing and a universal CFT bound Victor Franken et.al. 2510.13961 null
2025-10-17 What Layers When: Learning to Skip Compute in LLMs with Residual Gates Filipe Laitenberger et.al. 2510.13876 null
2025-10-15 Are Randomized Quantum Linear Systems Solvers Practical? Siddharth Hariprakash et.al. 2510.13766 null
2025-10-15 Speculating a Tactile Grammar: Toward Task-Aligned Chart Design for Non-Visual Perception Areen Khalaila et.al. 2510.13731 null
2025-10-15 Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference Nikhil Bhendawade et.al. 2510.13161 null
2025-10-14 3-Model Speculative Decoding Sanghyun Byun et.al. 2510.12966 null
2025-10-14 Language Models Model Language Łukasz Borchmann et.al. 2510.12766 null
2025-10-14 Notes on false vacuum decay in quantum Ising models Ian G. Moss et.al. 2510.12592 null
2025-10-14 A Direct Memory Access Controller (DMAC) for Irregular Data Transfers on RISC-V Linux Systems Thomas Benz et.al. 2510.12277 null
2025-10-14 How Far I’ll Go: Imagining Futures of Conversational AI with People with Visual Impairments Through Design Fiction Jeanne Choi et.al. 2510.12268 null
2025-10-13 Direct Multi-Token Decoding Xuan Luo et.al. 2510.11958 null
2025-10-13 New Tests of Low-Scale Quantum Gravity with Cosmic-Ray Collisions Manuel Ettengruber et.al. 2510.11879 null
2025-10-13 General real-valued theories with the Schröder-Bernstein property are stable Alexander Berenstein et.al. 2510.11858 null
2025-10-13 The Magic Barrier before Thermalization Lukas Ebner et.al. 2510.11681 null
2025-10-13 (Dis)Proving Spectre Security with Speculation-Passing Style Santiago Arranz-Olmos et.al. 2510.11573 null
2025-10-14 AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model Zhiwei Jin et.al. 2510.11496 null
2025-10-13 Efficient LLM Inference over Heterogeneous Edge Networks with Speculative Decoding Bingjie Zhu et.al. 2510.11331 null
2025-10-11 SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference Liangkun Chen et.al. 2510.10302 null
2025-10-11 Exploration of Embodied Space Experience through Umbilical Interaction: A Grounded Theory Approach Shuai Guo et.al. 2510.10258 null
2025-10-11 LAMOST J064137.77+045743.8: A New Binary of an A7-type Pulsating Subgiant and an M-type Red Dwarf Yanhui Chen et.al. 2510.10164 null
2025-10-11 Conformal Sparsification for Bandwidth-Efficient Edge-Cloud Speculative Decoding Payel Bhattacharjee et.al. 2510.09942 null
2025-10-10 Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy Xiaoxiao Ma et.al. 2510.09012 null
2025-10-10 Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation Yao Teng et.al. 2510.08994 null
2025-10-10 Mozart: A Chiplet Ecosystem-Accelerator Codesign Framework for Composable Bespoke Application Specific Integrated Circuits Haoran Jin et.al. 2510.08873 null
2025-10-09 Atomically resolved electron reflectivity at a metal/semiconductor interface Ding-Ming Huang et.al. 2510.07970 null
2025-10-08 OWL: Overcoming Window Length-Dependence in Speculative Decoding for Long-Context Inputs Jaeseong Lee et.al. 2510.07535 null
2025-10-08 Lectures on entanglement, von Neumann algebras, and emergence of spacetime Hong Liu et.al. 2510.07017 null
2025-10-08 Simulations of Globular Cluster Evolution with Multiple Stellar Populations Mirek Giersz et.al. 2510.06942 null
2025-10-07 A Meat-Summer Night’s Dream: A Tangible Design Fiction Exploration of Eating Biohybrid Flying Robots Ziming Wang et.al. 2510.06507 null
2025-10-07 Back to the Future Museum – Speculative Design for Virtual Citizen-Curated Museums Richard Rhodes et.al. 2510.06472 null
2025-10-06 Draft, Verify, and Improve: Toward Training-Aware Speculative Decoding Shrenik Bhansali et.al. 2510.05421 null
2025-10-06 Zigzags and free adjunctions Lorenzo Riva et.al. 2510.05371 null
2025-10-06 Gromov-Witten theory, degenerations, and the tautological ring Davesh Maulik et.al. 2510.04779 null
2025-10-05 Speculative Actions: A Lossless Framework for Faster Agentic Systems Naimeng Ye et.al. 2510.04371 null
2025-10-05 Self Speculative Decoding for Diffusion Large Language Models Yifeng Gao et.al. 2510.04147 null
2025-10-04 Self-Speculative Masked Diffusions Andrew Campbell et.al. 2510.03929 null
2025-10-04 Security Analysis of Ponzi Schemes in Ethereum Smart Contracts Chunyi Zhang et.al. 2510.03819 null
2025-10-03 PrivacyMotiv: Speculative Persona Journeys for Empathic and Motivating Privacy Reviews in UX Design Zeya Chen et.al. 2510.03559 null
2025-10-03 Action Deviation-Aware Inference for Low-Latency Wireless Robots Jeyoung Park et.al. 2510.02851 null
2025-10-03 A Concept of Possibility for Real-World Events Daniel G. Schwartz et.al. 2510.02655 null
2025-10-02 Dispersion in Analogue Gravity Eren Erberk Erkul et.al. 2510.02542 null
2025-10-02 Impact of AGN and nuclear star formation on the ISM turbulence of galaxies: Insights from JWST/MIRI spectroscopy Rogemar A. Riffel et.al. 2510.02517 null
2025-09-28 DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding Guanghao Li et.al. 2510.02358 null
2025-10-02 The Disparate Impacts of Speculative Decoding Jameson Sandler et.al. 2510.02128 null
2025-10-03 Virtual fibring of manifolds and groups Dawid Kielak et.al. 2510.01805 null
2025-10-01 Theory is Shapes Matthew Varona et.al. 2510.01382 null
2025-10-01 HiSpec: Hierarchical Speculative Decoding for LLMs Avinash Kumar et.al. 2510.01336 null
2025-10-01 Combining complex Langevin dynamics with score-based and energy-based diffusion models Gert Aarts et.al. 2510.01328 null
2025-09-30 Chiral effects and Joule heating in hot and dense matter Srimoyee Sen et.al. 2510.00114 null
2025-09-29 A(I)nimism: Re-enchanting the World Through AI-Mediated Object Interaction Diana Mykhaylychenko et.al. 2509.25558 null
2025-09-29 The Stellar Content of NGC 3603 Revisited: Is the IMF Top Heavy? Philip Massey et.al. 2509.25099 null
2025-09-29 Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding Sungkyun Kim et.al. 2509.24328 null
2025-09-29 SpecExit: Accelerating Large Reasoning Model via Speculative Exit Rubing Yang et.al. 2509.24248 null
2025-09-28 HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models Zhinan Xie et.al. 2509.23928 null
2025-09-27 SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts Bingshuai Liu et.al. 2509.23232 null
2025-09-29 SAHM: State-Aware Heterogeneous Multicore for Single-Thread Performance Shayne Wadle et.al. 2509.22405 null
2025-09-26 In Their Own Words: Reasoning Traces Tailored for Small Models Make Them Better Reasoners Jaehoon Kim et.al. 2509.22230 null
2025-09-26 Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding Shijing Hu et.al. 2509.22134 null
2025-09-26 FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning Yizhou Zhang et.al. 2509.21792 null
2025-09-26 Self-Speculative Biased Decoding for Faster Live Translation Linxiao Zeng et.al. 2509.21740 null
2025-09-25 SpecMER: Fast Protein Generation with K-mer Guided Speculative Decoding Thomas Walton et.al. 2509.21689 null
2025-09-25 SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips Xinyu Lian et.al. 2509.21271 null
2025-09-24 The interstellar heritage of comets Karen Willacy et.al. 2509.20530 null
2025-09-30 Speculative Safety-Aware Decoding Xuekang Wang et.al. 2508.17739 null
2025-08-07 Hierarchical Verification of Speculative Beams for Accelerating LLM Inference Jaydip Sen et.al. 2508.03726 null
2025-07-22 Collaborative Inference and Learning between Edge SLMs and Cloud LLMs: A Survey of Algorithms, Execution, and Open Challenges Senyao Li et.al. 2507.16731 null
2025-07-22 Enhancing Compiler Optimization Efficiency through Grammatical Decompositions of Control-Flow Graphs Xuran Cai et.al. 2507.16660 null
2025-07-22 Ly$α$ Emission from [OIII] Emitters Near Reionization: The role of environment in galaxy Ly$α$ detection Seyedazim Hashemi et.al. 2507.16231 null
2025-07-20 Designing Robots with, not for: A Co-Design Framework for Empowering Interactions in Forensic Psychiatry Qiaoqiao Ren et.al. 2507.14931 null
2025-07-18 On the asymptotic equidistribution of word values in symmetric groups Vadim Alekseev et.al. 2507.13928 null
2025-07-22 Gravity and the Higgs boson mass Carlo Branchina et.al. 2507.13832 null
2025-07-16 Modeling Feasible Locomotion of Nanobots for Cancer Detection and Treatment Noble Harasha et.al. 2507.12400 null
2025-07-16 Efficient Control Flow Attestation by Speculating on Control Flow Path Representations Liam Tyler et.al. 2507.12345 null
2025-07-17 DSSD: Efficient Edge-Device LLM Deployment and Collaborative Inference via Distributed Split Speculative Decoding Jiahong Ning et.al. 2507.12000 null
2025-07-16 Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential Mohammad Samragh et.al. 2507.11851 null
2025-07-16 Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI Samyam Rajbhandari et.al. 2507.11830 null
2025-07-14 Exploring ultra-high energy neutrino experiments through the lens of the transport equation Stefano Palmisano et.al. 2507.10665 null
2025-07-14 Large Interconnected Thermodynamic Systems Nearly Minimize Entropy Production Kyle J. Ray et.al. 2507.10476 null
2025-07-14 Supernova-induced binary-interaction-powered supernovae: a model for SN2022jli Ryosuke Hirai et.al. 2507.09974 null
2025-07-12 TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding Shukai Gong et.al. 2507.09252 null
2025-07-21 Bringing the Norma Dark Cloud to Light in X-rays Stephen L. Skinner et.al. 2507.09047 null
2025-07-11 On Evaluating Performance of LLM Inference Serving Systems Amey Agrawal et.al. 2507.09019 null
2025-07-10 Greening Schoolyards and the Spatial Distribution of Property Values in Denver, Colorado Mahshid Gorjian et.al. 2507.08894 null
2025-07-11 BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity Chenyang Song et.al. 2507.08771 null
2025-07-11 Time Variation in the TeV Cosmic Ray Anisotropy with IceCube and Energy Dependence of the Solar Dipole Perri Zilberman et.al. 2507.08242 null
2025-07-08 Optically Overluminous Tidal Disruption Events: Outflow Properties and Implications for Extremely Relativistic Disruptions Yuhan Yao et.al. 2507.06453 null
2025-07-08 Experiments to test the hypothesis for solar and dark matter axions Babette Döbrich et.al. 2507.06414 null
2025-07-08 Supernovae from stellar mergers and accretors of binary mass transfer: Implications for Type IIP, 1987A-like and interacting supernovae F. R. N. Schneider et.al. 2507.06391 null
2025-07-08 Bouncing Grains Keep Protoplanetary Disks Bright Yansong Qian et.al. 2507.06298 null
2025-07-08 Tropical Donagi theorem Felix Röhrle et.al. 2507.05987 null
2025-07-04 Impact of flavor condensate dark matter on accretion disk luminosity in spherical spacetimes Antonio Capolupo et.al. 2507.03758 null
2025-06-18 Evolution, Future of AI, and Singularity Zeki Doruk Erden et.al. 2507.02876 null
2025-07-03 NVIDIA GPU Confidential Computing Demystified Zhongshu Gu et.al. 2507.02770 null
2025-07-03 OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding Ramchalam Kinattinkara Ramakrishnan et.al. 2507.02659 null
2025-07-03 High-Order Deep Meta-Learning with Category-Theoretic Interpretation David H. Mguni et.al. 2507.02634 null
2025-07-14 FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference Xing Liu et.al. 2507.02620 null
2025-07-02 H.E.S.S. programme searching for VHE gamma rays associated with FRBs F. Aharonian et.al. 2507.02143 null
2025-07-07 Handling out-of-order input arrival in CEP engines on the edge combining optimistic, pessimistic and lazy evaluation Styliani Kyrama et.al. 2507.01461 null
2025-07-02 LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation Tianyu Liu et.al. 2507.01449 null
2025-07-01 Quantize-Sample-and-Verify: LLM Acceleration via Adaptive Edge-Cloud Speculative Decoding Guangyi Zhang et.al. 2507.00605 null
2025-06-30 User Concerns Regarding Social Robots for Mood Regulation: A Case Study on the “Sunday Blues” Zhuochao Peng et.al. 2507.00271 null
2025-07-08 Fully Parallelized BP Decoding for Quantum LDPC Codes Can Outperform BP-OSD Ming Wang et.al. 2507.00254 null
2025-06-30 Metal-poor single Wolf-Rayet stars: the interplay of optically thick winds and rotation Lumen Boco et.al. 2507.00137 null
2025-06-30 Segmented Operations using Matrix Multiplications Aleksandros Sobczyk et.al. 2506.23906 null
2025-06-29 From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows Mohamed Amine Ferrag et.al. 2506.23260 null
2025-06-28 Polar alignment of a circumbinary disc around a brown dwarf binary Jeremy L. Smallwood et.al. 2506.22747 null
2025-07-03 VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs Raghavv Goel et.al. 2506.22694 null
2025-06-27 QuickSilver – Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization Danush Khanna et.al. 2506.22396 null
2025-07-10 Cool Gas in the Circumgalactic Medium of Massive Post Starburst Galaxies Zoe Harvey et.al. 2506.22287 null
2025-06-26 Small Encoders Can Rival Large Decoders in Detecting Groundedness Istabrak Abbes et.al. 2506.21288 null
2025-06-26 You never have enough J/$ψ$ events: the case for a J/$ψ$ factory Stephen Lars Olsen et.al. 2506.20975 null
2025-06-17 Utility-Driven Speculative Decoding for Mixture-of-Experts Anish Saxena et.al. 2506.20675 null
2025-07-09 Charged rotating quantum black holes Dyuman Bhattacharya et.al. 2506.19941 null
2025-06-23 Entangled Quantum Negative Energy Teleportation as a Probe of Semiclassical Gravity Daniel S. Zachary et.al. 2506.19878 null
2025-06-24 Scaling Speculative Decoding with Lookahead Reasoning Yichao Fu et.al. 2506.19830 null
2025-06-23 LLMs on a Budget? Say HOLA Zohaib Hasan Siddiqui et.al. 2506.18952 null
2025-07-10 The Full Nonlinear Vortex Tube-Vorton Method: the post-stall condition Jesus Carlos Pimentel-Garcia et.al. 2506.18719 null
2025-06-17 Semantic uncertainty in advanced decoding methods for LLM generation Darius Foodeei et.al. 2506.17296 null
2025-07-08 Capturing Misalignment Pierfrancesco Guarino et.al. 2506.17176 null
2025-06-20 ReasonGRM: Enhancing Generative Reward Models through Large Reasoning Models Bin Chen et.al. 2506.16712 null
2025-07-02 Rethinking LLM Training through Information Geometry and Quantum Metrics Riccardo Di Sipio et.al. 2506.15830 null
2025-06-15 $\texttt{SPECS}$: Faster Test-Time Scaling through Speculative Drafts Mert Cemri et.al. 2506.15733 null
2025-06-18 CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies Donghyun Gouk et.al. 2506.15601 null
2025-06-18 PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction Shufan Li et.al. 2506.15556 null
2025-06-17 Optimistic MEV in Ethereum Layer 2s: Why Blockspace Is Always in Demand Ozan Solmaz et.al. 2506.14768 null
2025-06-17 S$^4$C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models Tao He et.al. 2506.14158 null
2025-06-16 Stimulus Motion Perception Studies Imply Specific Neural Computations in Human Visual Stabilization David W Arathorn et.al. 2506.13506 null
2025-06-21 Exploring the Secondary Risks of Large Language Models Jiawei Chen et.al. 2506.12382 null
2025-06-14 Quantum Machine Learning Muhammad Usman et.al. 2506.12292 null
2025-06-13 Fluid-induced snap-through instability of spherical shells Pier Giuseppe Ledda et.al. 2506.12247 null
2025-06-13 Eliciting Reasoning in Language Models with Cognitive Tools Brown Ebouky et.al. 2506.12115 null
2025-06-12 SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding Ziyi Zhang et.al. 2506.11309 null
2025-06-11 Speculative Design in Spiraling Time: Methods and Indigenous HCI James Eschrich et.al. 2506.10229 null
2025-06-11 V455 Car: an oscillating eclipsing Algol-type binary in triple star system Zhao-Long Deng et.al. 2506.10124 null
2025-06-11 Patterns of Patterns III Joseph Corneli et.al. 2506.09696 null
2025-07-13 SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving Xiangchen Li et.al. 2506.09397 null
2025-06-11 A collection of results relating the geometry of plane domains and the exit time of planar Brownian motion, II Greg Markowsky et.al. 2506.09364 null
2025-07-19 Draft-based Approximate Inference for LLMs Kevin Galim et.al. 2506.08373 link
2025-06-10 Solving Convex-Concave Problems with $\tilde{\mathcal{O}}(ε^{-4/7})$ Second-Order Oracle Complexity Lesi Chen et.al. 2506.08362 null
2025-06-09 MiniCPM4: Ultra-Efficient LLMs on End Devices MiniCPM Team et.al. 2506.07900 link
2025-06-09 FREESS: An Educational Simulator of a RISC-V-Inspired Superscalar Processor Based on Tomasulo’s Algorithm Roberto Giorgi et.al. 2506.07665 link
2025-06-09 LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments Jin Huang et.al. 2506.07416 null
2025-06-08 Exploiting Inaccurate Branch History in Side-Channel Attacks Yuhui Zhu et.al. 2506.07263 null
2025-06-07 Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit Charles Goddard et.al. 2506.06607 null
2025-06-06 Fake Friends and Sponsored Ads: The Risks of Advertising in Conversational Search Jacob Erickson et.al. 2506.06447 null
2025-07-08 On the Fundamental Impossibility of Hallucination Control in Large Language Models Michał P. Karpowicz et.al. 2506.06382 null
2025-06-06 Disk Evolution Study Through Imaging of Nearby Young Stars (DESTINYS): Evidence of planet-disk interaction in the 2MASSJ16120668-3010270 system C. Ginski et.al. 2506.05892 null
2025-06-10 Gumbel-max List Sampling for Distribution Coupling with Multiple Samples Joseph Rowan et.al. 2506.05632 null
2025-06-05 Accelerated Test-Time Scaling with Model-Free Speculative Sampling Woomin Song et.al. 2506.04708 null
2025-06-04 Guided Speculative Inference for Efficient Test-Time Alignment of LLMs Jonathan Geuter et.al. 2506.04118 link
2025-06-04 The Causal-Noncausal Tail Processes: An Introduction Christian Gouriéroux et.al. 2506.04046 null
2025-06-04 AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism Zhepei Wei et.al. 2506.03700 link
2025-06-04 POSS: Position Specialist Generates Better Draft for Speculative Decoding Langlin Huang et.al. 2506.03566 link
2025-06-02 Out-of-Vocabulary Sampling Boosts Speculative Decoding Nadav Timor et.al. 2506.03206 null
2025-06-03 Feedstack: Layering Structured Representations over Unstructured Feedback to Scaffold Human AI Conversation Hannah Vy Nguyen et.al. 2506.03052 null
2025-06-03 Reuse or Generate? Accelerating Code Editing via Edit-Oriented Speculative Decoding Peiding Wang et.al. 2506.02780 null
2025-06-28 Multi Layered Autonomy and AI Ecologies in Robotic Art Installations Baoyang Chen et.al. 2506.02606 null
2025-06-03 Consultant Decoding: Yet Another Synergistic Mechanism Chuanghao Ding et.al. 2506.02391 null
2025-06-02 Radiation GRMHD Models of Accretion onto Stellar-Mass Black Holes: I. Survey of Eddington Ratios Lizhong Zhang et.al. 2506.02289 null
2025-05-16 SpecMemo: Speculative Decoding is in Your Pocket Selin Yildirim et.al. 2506.01986 null
2025-05-16 Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism Yuhao Shen et.al. 2506.01979 null
2025-06-02 Synchronic Web Digital Identity: Speculations on the Art of the Possible Thien-Nam Dinh et.al. 2506.01856 null
2025-07-04 Playing with Transformer at 30+ FPS via Next-Frame Diffusion Xinle Cheng et.al. 2506.01380 null
2025-06-02 Shape Shifting Light Dark Matter Solitons Dor Ben-Amotz et.al. 2506.01282 null
2025-06-01 The $M_{\rm BH}-M_\star$ Relation of the hyperluminous Dust-obscured Quasars up to $z \sim 4$ Yibin Luo et.al. 2506.01218 null
2025-06-01 Mamba Drafters for Speculative Decoding Daewon Choi et.al. 2506.01206 null
2025-06-01 The Inverse Scaling Effect of Pre-Trained Language Model Surprisal Is Not Due to Data Leakage Byung-Doh Oh et.al. 2506.01172 null
2025-05-31 Accelerating Diffusion LLMs via Adaptive Parallel Decoding Daniel Israel et.al. 2506.00413 null
2025-05-31 Speculative Reward Model Boosts Decision Making Ability of LLMs Cost-Effectively Jiawei Gu et.al. 2506.00396 link
2025-05-30 Cross-Attention Speculative Decoding Wei Zhong et.al. 2505.24544 null
2025-05-30 CLaSp: In-Context Layer Skip for Self-Speculative Decoding Longze Chen et.al. 2505.24196 null
2025-06-10 Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism Jinhui Wei et.al. 2505.23219 null
2025-05-28 Pre-Training Curriculum for Multi-Token Prediction in Language Models Ansar Aynetdinov et.al. 2505.22757 link
2025-05-28 Mass-feeding of jet-launching white dwarfs in grazing and common envelope evolution Noam Soker et.al. 2505.22621 null
2025-05-29 Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design Yudi Zhang et.al. 2505.22179 link
2025-05-28 RAD: Redundancy-Aware Distillation for Hybrid Models via Self-Speculative Decoding Yuichiro Hoshino et.al. 2505.22135 null
2025-05-28 Robust and Symmetric Magnetic Field Dependency of Superconducting Diode Effect in Asymmetric Dirac Semimetal SQUIDs H. C. Travaglini et.al. 2505.21861 null
2025-05-27 Computocene: Notes from an Age of Observation Simone Severini et.al. 2505.21744 null
2025-05-27 Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits Yeshwanth Venkatesha et.al. 2505.21594 null
2025-05-27 Hardware-Efficient Attention for Fast Decoding Ted Zadouri et.al. 2505.21487 null
2025-05-27 Pair binding and Hund’s rule breaking in high-symmetry fullerenes R. Rausch et.al. 2505.21455 null
2025-05-28 Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity Yehui Tang et.al. 2505.21411 null
2025-05-27 Repeated Auctions with Speculators: Arbitrage Incentives and Forks in DAOs Nicolas Eschenbaum et.al. 2505.21296 null
2025-05-27 SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences Jungyoub Cha et.al. 2505.20776 link
2025-05-27 Replication of Reference-Dependent Preferences and the Risk-Return Trade-Off in the Chinese Market Penggan Xu et.al. 2505.20608 null
2025-05-26 Academic Research Output Derivatives: Structuring Futures and Options on Research Output Index Amarendra Sharma et.al. 2505.20492 null
2025-05-26 Bounded cohomology, quotient extensions, and hierarchical hyperbolicity Francesco Fournier-Facio et.al. 2505.20462 null
2025-05-26 HAMburger: Accelerating LLM Inference via Token Smashing Jingyu Liu et.al. 2505.20438 null
2025-05-23 Reinforcement Speculative Decoding for Fast Ranking Yingpeng Du et.al. 2505.20316 null
2025-06-13 MoESD: Unveil Speculative Decoding’s Potential for Accelerating Sparse MoE Zongle Huang et.al. 2505.19645 null
2025-05-28 Faster and Better LLMs via Latency-Aware Test-Time Scaling Zili Wang et.al. 2505.19634 null
2025-07-23 Turing Test 2.0: The General Intelligence Threshold Georgios Mappouras et.al. 2505.19550 null
2025-05-29 DREAM: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding Yunhai Hu et.al. 2505.19201 link
2025-05-25 Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs Xuan Zhang et.al. 2505.19155 null
2025-05-24 Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding Yixuan Wang et.al. 2505.18629 null
2025-05-23 VeriThinker: Learning to Verify Makes Reasoning Model Efficient Zigeng Chen et.al. 2505.17941 link
2025-05-20 Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency Ruixiao Li et.al. 2505.17074 null
2025-05-16 SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs Jinwoo Park et.al. 2505.17052 null
2025-05-22 KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization Mingbo Song et.al. 2505.16162 null
2025-05-21 Strong Hilbert space fragmentation and fractons from subsystem and higher-form symmetries Charles Stahl et.al. 2505.15889 null
2025-05-21 Quasinormal Modes of Schwarzschild Black Holes in the Dehnen-(1, 4, 5/2) Type Dark Matter Halos Qi-Qi Liang et.al. 2505.15540 null
2025-06-03 Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding Zijian Lin et.al. 2505.15380 null
2025-05-21 SSR: Speculative Parallel Scaling Reasoning in Test-time Yuanlin Chu et.al. 2505.15340 null
2025-05-21 BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms Yunlong Hou et.al. 2505.15141 null
2025-05-20 STree: Speculative Tree Decoding for Hybrid State-Space Models Yangchao Wu et.al. 2505.14969 null
2025-05-20 On the Day They Experience: Awakening Self-Sovereign Experiential AI Agents Botao Amber Hu et.al. 2505.14893 null
2025-05-20 Unremarkable to Remarkable AI Agent: Exploring Boundaries of Agent Intervention for Adults With and Without Cognitive Impairment Mai Lee Chang et.al. 2505.14872 null
2025-05-20 X-ray properties of compact elliptical galaxies Orsolya E. Kovacs et.al. 2505.14768 null
2025-05-20 Speculative Decoding Reimagined for Multimodal Large Language Models Luxi Lin et.al. 2505.14260 link
2025-05-19 Language and Thought: The View from LLMs Daniel Rothschild et.al. 2505.13561 null
2025-05-19 HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding Siran Liu et.al. 2505.13254 null
2025-09-15 Alignment-Augmented Speculative Decoding with Alignment Sampling and Conditional Verification Jikai Wang et.al. 2505.13204 null
2025-05-19 FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference Guangda Liu et.al. 2505.13109 null
2025-05-25 FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks Zihua Wang et.al. 2505.12728 link
2025-05-18 Traversal Verification for Speculative Tree Decoding Yepeng Weng et.al. 2505.12398 null
2025-05-16 FAIR Ecosystems for Science at Scale Sean R. Wilkinson et.al. 2505.11742 null
2025-05-16 Prime Number Error Terms Nathan Ng et.al. 2505.11295 null
2025-05-16 Beyond surfaces: quantifying internal radiative heat transport in dense materials Janak Tiwari et.al. 2505.10853 null
2025-05-16 Qualia Optimization Philip S. Thomas et.al. 2505.10779 null
2025-07-10 Anchoring AI Capabilities in Market Valuations: The Capability Realization Rate Model and Valuation Misalignment Risk Xinmin Fang et.al. 2505.10590 null
2025-05-18 MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models Mugilan Ganesan et.al. 2505.10526 null
2025-05-21 SpecOffload: Unlocking Latent GPU Capacity for LLM Inference on Resource-Constrained Devices Xiangwen Zhuge et.al. 2505.10259 link
2025-05-14 Chandra Rules Out Super-Eddington Accretion For Little Red Dots Andrea Sacchi et.al. 2505.09669 null
2025-06-28 Extended Structural Dynamics – Emergent Irreversibility from Reversible Dynamics Patrick BarAvi et.al. 2505.09650 null
2025-05-14 Observational study of the formation of homologous confined circular-ribbon flares Shuhong Yang et.al. 2505.09093 null
2025-05-13 Long timescale numerical simulations of large, super-critical accretion discs P. Chris Fragile et.al. 2505.08859 null
2025-05-13 Kudzu: Fast and Simple High-Throughput BFT Victor Shoup et.al. 2505.08771 null
2025-05-13 Automatic Task Detection and Heterogeneous LLM Speculative Decoding Danying Ge et.al. 2505.08600 null
2025-05-12 GUP Effective Metric Without GUP: Implications for the Sign of GUP Parameter and Quantum Bounce Yen Chin Ong et.al. 2505.07972 null
2025-05-12 Localized Gravity, de Sitter, and the Horizon Criterion Bjoern Friedrich et.al. 2505.07934 null
2025-06-22 TrumorGPT: Graph-Based Retrieval-Augmented Large Language Model for Fact-Checking Ching Nam Hang et.al. 2505.07891 null
2025-05-08 Scaling Laws for Speculative Decoding Siyuan Yan et.al. 2505.07858 null
2025-05-12 SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models Hang Wu et.al. 2505.07680 null
2025-05-10 N-body simulations of the Self-Confinement of Viscous Self-Gravitating Narrow Eccentric Planetary Ringlets Joseph M. Hahn et.al. 2505.06639 null
2025-05-09 FastDup: a scalable duplicate marking tool using speculation-and-test mechanism Zhonghai Zhang et.al. 2505.06127 link
2025-05-08 A Physics Model for Origin of Life Paul Howard Frampton et.al. 2505.05634 null
2025-05-08 Memory Under Siege: A Comprehensive Survey of Side-Channel Attacks on Memory MD Mahady Hassan et.al. 2505.04896 null
2025-05-08 Topological phase transition to a hidden charge density wave liquid Joshua S. H. Lee et.al. 2505.04867 null
2025-05-07 SOAEsV2-7B/72B: Full-Pipeline Optimization for State-Owned Enterprise LLMs via Continual Pre-Training, Domain-Progressive SFT and Distillation-Enhanced Speculative Decoding Jingyang Deng et.al. 2505.04723 null
2025-05-06 Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Autospeculation Hengyuan Hu et.al. 2505.03983 null
2025-05-06 QiMeng-CPU-v2: Automated Superscalar Processor Design by Learning Data Dependencies Shuyao Cheng et.al. 2505.03195 null
2025-05-04 The quest for explosive bubbles in the Indonesian Rupiah/US exchange rate: Does the uncertainty trinity matter? Abdul Khaliq et.al. 2505.02869 null
2025-05-24 Accelerating Large Language Model Reasoning via Speculative Search Zhihai Wang et.al. 2505.02865 null
2025-05-21 Dirac Singleton as a Relativistic Field Beyond Standard Model M. A. Vasiliev et.al. 2505.01915 null
2025-05-03 Speculative Evolution Through 3D Cellular Automata Amir Hossein Khazaei et.al. 2505.01692 null
2025-05-02 PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding Bradley McDanel et.al. 2505.01572 null
2025-05-12 Emotions in Artificial Intelligence Hermann Borotschnig et.al. 2505.01462 null
2025-04-29 X-ray Spectroscopy via Temporal Decomposition William Setterberg et.al. 2504.21169 null
2025-07-02 Ground to Dust: Collisional Cascades and the Fate of Kardashev II Megaswarms Brian C. Lacki et.al. 2504.21151 null
2025-06-10 EvoPort: An Evolutionary Framework for Portfolio Optimization via Randomized Alpha Discovery and Ensemble-Based Allocation Nguyen Van Thanh et.al. 2504.21095 null
2025-04-29 Reviving Any-Subset Autoregressive Models with Principled Parallel Sampling and Speculative Decoding Gabe Guo et.al. 2504.20456 link
2025-04-28 AutoJudge: Judge Decoding Without Manual Annotation Roman Garipov et.al. 2504.20039 null
2025-04-27 Detecting speculative data flow vulnerabilities using weakest precondition reasoning Graeme Smith et.al. 2504.19128 null
2025-05-25 Efficient Reasoning for LLMs through Speculative Chain-of-Thought Jikai Wang et.al. 2504.19095 link
2025-04-26 Global Simulations of Gravitational Instability in Protostellar Disks with Full Radiation Transport II. Locality of Gravitoturbulence, Clumpy Spirals, and Implications for Observable Substructure Wenrui Xu et.al. 2504.18751 null
2025-06-15 PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation Zihao An et.al. 2504.18583 null
2025-04-25 Generalizing the relativistic precession model of quasi-periodic oscillations through anharmonic corrections Roberto Giambò et.al. 2504.18403 null
2025-04-23 A Vision for AI-Driven Adaptation of Dynamic AR Content to Users and Environments Julian Rasch et.al. 2504.16562 null
2025-04-23 Hardness of Median and Center in the Ulam Metric Nick Fischer et.al. 2504.16437 null
2025-04-22 On commuting integer matrices Jonathan Chapman et.al. 2504.15839 null
2025-04-22 Delayed Keen Model with Inflation Ali Tolga Dincer et.al. 2504.15819 null
2025-04-23 Speculative Sampling via Exponential Races Szymon Kobus et.al. 2504.15475 null
2025-05-16 Rendezvous in CAVITY: Kinematics and gas properties of an isolated dwarf-dwarf merging pair in a cosmic void region Bahar Bidaran et.al. 2504.15359 null
2025-04-21 The phase diagram of CeRh$_2$As$_2$ for out-of-plane magnetic field P. Khanenko et.al. 2504.15112 null
2025-04-21 Safety Co-Option and Compromised National Security: The Self-Fulfilling Prophecy of Weakened AI Risk Thresholds Heidy Khlaaf et.al. 2504.15088 null
2025-04-21 Note on Type $III_1$ Algebras in $c = 1$ String Theory and Bulk Causal Diamonds T. Banks et.al. 2504.15076 null
2025-04-21 Exploring Collaborative GenAI Agents in Synchronous Group Settings: Eliciting Team Perceptions and Design Considerations for the Future of Work Janet G. Johnson et.al. 2504.14779 null
2025-05-27 BLACKOUT: Data-Oblivious Computation with Blinded Capabilities Hossam ElAtali et.al. 2504.14654 null
2025-04-25 UFO2: The Desktop AgentOS Chaoyun Zhang et.al. 2504.14603 link
2025-04-20 An interstellar mission to test astrophysical black holes Cosimo Bambi et.al. 2504.14576 null
2025-04-19 Charge Densities in Crystals and Triply-Periodic Minimal Surfaces Mengdi Yin et.al. 2504.14148 null
2025-04-18 Going Whole Hog: A Philosophical Defense of AI Cognition Herman Cappelen et.al. 2504.13988 null
2025-04-16 From job titles to jawlines: Using context voids to study generative AI systems Shahan Ali Memon et.al. 2504.13947 null
2025-03-21 Bio-crafting Architecture: Experiences of growing mycelium in minimal surface molds Anca-Simona Horvath et.al. 2504.13855 null
2025-05-28 The Sky as a Killing Horizon Níckolas de Aguiar Alves et.al. 2504.12514 null
2025-04-12 Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time Wang Yang et.al. 2504.12329 link
2025-04-18 Large Language Model-Based Knowledge Graph System Construction for Sustainable Development Goals: An AI-Based Speculative Design Perspective Yi-De Lin et.al. 2504.12309 null
2025-04-16 Purposefully Induced Psychosis (PIP): Embracing Hallucination as Imagination in Large Language Models Kris Pilcher et.al. 2504.12012 null
2025-04-16 Who Said Only Military Officers Can Deal with Uncertainty? On the Importance of Uncertainty in EdTech Data Visualisations Felicitas Macgilchrist et.al. 2504.11974 null
2025-04-15 Five dimensional rotating and Quintessence black hole and their shadows Milko Estrada et.al. 2504.11408 null
2025-04-16 Efficient Distributed Retrieval-Augmented Generation for Enhancing Language Model Performance Shangyu Liu et.al. 2504.11197 null
2025-04-14 Shield Bash: Abusing Defensive Coherence State Retrieval to Break Timing Obfuscation Kartik Ramkrishnan et.al. 2504.10318 null
2025-04-14 Gravitational metamaterials from optical properties of spacetime media Orlando Luongo et.al. 2504.09987 null
2025-04-12 Authoritarian Recursions: How Fiction, History, and AI Reinforce Control in Education, Warfare, and Discourse Hasan Oguz et.al. 2504.09030 null
2025-04-11 SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting Jiaming Xu et.al. 2504.08850 null
2025-05-31 SD$^2$: Self-Distilled Sparse Drafters Mike Lasby et.al. 2504.08838 null
2025-04-05 SLOs-Serve: Optimized Serving of Multi-SLO LLMs Siyuan Chen et.al. 2504.08784 null
2025-04-11 Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices Shengyuan Ye et.al. 2504.08242 null
2025-05-16 SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning Rui Pan et.al. 2504.07891 link
2025-04-10 Synthetic Fluency: Hallucinations, Confabulations, and the Creation of Irish Words in LLM-Generated Translations Sheila Castilho et.al. 2504.07680 null
2025-04-10 Proceedings of the Purposeful XR Workshop for CHI 2025 Elizabeth Childs et.al. 2504.07475 null
2025-04-09 Joint Survey Processing. III. Compact Oddballs in the COSMOS Field – Little Red Dots and Transients Yu-Heng Lin et.al. 2504.07196 null
2025-04-09 ShadowBinding: Realizing Effective Microarchitectures for In-Core Secure Speculation Schemes Amund Bergland Kvalsvik et.al. 2504.07018 null
2025-04-08 SPIRe: Boosting LLM Inference Throughput with Speculative Decoding Sanjit Neelam et.al. 2504.06419 null
2025-04-08 Decoding the Ishango Bone: Unveiling Prehistoric Mathematical Art Jenny Baur et.al. 2504.06412 null
2025-04-08 Interplay between trimer structure and magnetic ground state in Ba5Ru3O12 probed by Neutron and muSR techniques E. Kushwaha et.al. 2504.06113 null
2025-04-08 Strong Evidence That Abiogenesis Is a Rapid Process on Earth Analogs David Kipping et.al. 2504.05993 null
2025-04-08 DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding Hossein Entezari Zarch et.al. 2504.05598 null
2025-06-03 Speculative Automated Refactoring of Imperative Deep Learning Programs to Graph Execution Raffi Khatchadourian et.al. 2504.05424 null
2025-04-06 pc-COP: An Efficient and Configurable 2048-p-Bit Fully-Connected Probabilistic Computing Accelerator for Combinatorial Optimization Kiran Magar et.al. 2504.04543 null
2025-06-02 Representations of $p$-adic groups and orbits with smooth closure in a variety of Langlands parameters Kristaps Balodis et.al. 2504.04163 null
2025-04-05 PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models Haofei Yin et.al. 2504.04104 null
2025-03-23 Agentic Business Process Management: The Past 30 Years And Practitioners’ Future Perspectives Hoang Vu et.al. 2504.03693 null
2025-04-04 Ethics Readiness of Technology: The case for aligning ethical approaches with technological maturity Eline de Jong et.al. 2504.03336 null
2025-04-03 A Review of Prototyping in XR: Linking Extended Reality to Digital Fabrication Bixun Chen et.al. 2504.02998 null
2025-05-02 GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation Zhiyuan Yan et.al. 2504.02782 link
2025-04-03 Black Holes, Moduli Stabilisation and the Swampland Matilda Delgado et.al. 2504.02645 null
2025-04-08 Variational Online Mirror Descent for Robust Learning in Schrödinger Bridge Dong-Sig Han et.al. 2504.02618 null
2025-06-16 Graviton Scattering on Gravitational Atoms: Relic Graviton Shot Noise Benjamin Avila-Lopez et.al. 2504.01286 null
2025-04-01 Reminiscences about Steven Weinberg (This Time it’s Personal) C. P. Burgess et.al. 2504.01118 null
2025-04-01 Mesoscale Eddy – Internal Wave Coupling. III. The End of the Enstrophy Cascade and Maintenance of Gyre Scale Potential Vorticity Gradients Kurt L. Polzin et.al. 2504.00486 null
2025-04-01 The Impact of Triangular-Toothed Gears on the Functionality of the Antikythera Mechanism Esteban Guillermo Szigety y Gustavo Francisco Arenas et.al. 2504.00327 null
2025-06-04 Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding Aayush Gautam et.al. 2504.00030 null
2025-03-31 What the F*ck Is Artificial General Intelligence? Michael Timothy Bennett et.al. 2503.23923 null
2025-03-31 A search for the three isomers of cyano-1,3-butadiene in TMC-1: Implications for bottom-up routes involving 1,3-butadiene M. Agundez et.al. 2503.23841 null
2025-03-30 Credit, Land Speculation, and Low-Interest-Rate Policy Tomohiro Hirano et.al. 2503.23552 null
2025-03-30 The Longest Duration SGRE Event in Solar Cycle 25 Nat Gopalswamy et.al. 2503.23544 null
2025-03-30 Speculative End-Turn Detector for Efficient Speech Chatbot Assistant Hyunjong Ok et.al. 2503.23439 null
2025-03-29 Beyond speculation: Measuring the growing presence of LLM-generated texts in multilingual disinformation Dominik Macko et.al. 2503.23242 null
2025-03-28 Formation and Evolution of Compact Binaries Containing Intermediate Mass Black Holes in Dense Star Clusters Seungjae Lee et.al. 2503.22109 null
2025-03-27 How to Constrain the Stochastic Gravitational Wave Background with Multi-Frequency Detections Eleanor Gleave et.al. 2503.21508 null
2025-03-26 Speculations on higher Fukaya categories James Pascaleff et.al. 2503.20906 null
2025-03-24 The Centers and Margins of Modeling Humans in Well-being Technologies: A Decentering Approach Jichen Zhu et.al. 2503.19132 null
2025-05-14 Spectropolarimetry of A Nuclear Transient AT2023clx: Revealing The Geometrical Alignment between The Transient Outflow and The Nuclear Dusty Region Kohki Uno et.al. 2503.19024 null
2025-03-23 A Novel Hat-Shaped Device-Cloud Collaborative Inference Framework for Large Language Models Zuan Xie et.al. 2503.18989 null
2025-03-23 A Multi-Model Adaptation of Speculative Decoding for Classification Somnath Roy et.al. 2503.18076 null
2025-03-20 SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs Shibo Jie et.al. 2503.16163 null
2025-03-20 “This could save us months of work” – Use Cases of AI and Automation Support in Investigative Journalism Besjon Cifliku et.al. 2503.16011 null
2025-03-20 SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models Fahao Chen et.al. 2503.15921 null
2025-03-19 Prada: Black-Box LLM Adaptation with Private Data on Resource-Constrained Devices Ziyao Wang et.al. 2503.14932 null
2025-06-12 The Origin of the Very-High-Energy Diffuse $γ$-Ray Emission: The Case for Galactic Source Cocoons Antonio Ambrosone et.al. 2503.14651 null
2025-05-04 Superconductivity in magnetars: Exploring type-I and type-II states in toroidal magnetic fields Mayusree Das et.al. 2503.14594 null
2025-03-26 Association of 220 PeV Neutrino KM3-230213A with Gamma-Ray Bursts Ruiqi Wang et.al. 2503.14471 null
2025-03-18 Neutron portal to ultra-high-energy neutrinos Gustavo F. S. Alves et.al. 2503.14419 null
2025-03-18 Speculative Decoding for Verilog: Speed and Quality, All in One Changran Xu et.al. 2503.14153 null
2025-03-18 Growing a Twig to Accelerate Large Vision-Language Models Zhenwei Shao et.al. 2503.14075 null
2025-03-17 ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts Evangelos Georganas et.al. 2503.13565 null
2025-03-17 Enhanced anomalous Hall effect in the topological Kagome metal Cs(V$_{1-x}$Mn$_x$)$_3$Sb$_5$ Xinmin Wang et.al. 2503.13351 null
2025-03-28 WOW: Workflow-Aware Data Movement and Task Scheduling for Dynamic Scientific Workflows Fabian Lehmann et.al. 2503.13072 link
2025-05-15 Collaborative Speculative Inference for Efficient LLM Inference Serving Luyao Gao et.al. 2503.10325 null
2025-03-13 Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding Jinze Li et.al. 2503.10135 null
2025-03-12 A practical guide to machine learning interatomic potentials – Status and future Ryan Jacobs et.al. 2503.09814 null
2025-03-11 In Search of the Potentially Hazardous Asteroids in the Taurid Resonant Swarm Jasmine Li et.al. 2503.08670 null
2025-03-11 Liquidity Competition Between Brokers and an Informed Trader Ryan Donnelly et.al. 2503.08287 null
2025-03-25 Training Domain Draft Models for Speculative Decoding: Best Practices and Insights Fenglu Hong et.al. 2503.07807 null
2025-03-10 Did smartphones break the world as we knew it? Mikhail V. Tamm et.al. 2503.07773 null
2025-03-13 Design as Hope: Reimagining Futures for Seemingly Doomed Problems JaeWon Kim et.al. 2503.07586 null
2025-03-09 A parallel parser for regular expressions Angelo Borsotti et.al. 2503.06763 null
2025-03-07 Quantum-like cognition and decision making in the light of quantum measurement theory Miho Fuyama et.al. 2503.05859 null
2025-02-25 Generative Artificial Intelligence: Evolving Technology, Growing Societal Impact, and Opportunities for Information Systems Research Veda C. Storey et.al. 2503.05770 null
2025-03-10 Speculative Decoding for Multi-Sample Inference Yiwei Li et.al. 2503.05330 null
2025-03-07 SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding Kaiyu Huang et.al. 2503.05096 null
2025-02-11 Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations Kunal Handa et.al. 2503.04761 null
2025-03-19 Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling Yan Li et.al. 2503.04398 null
2025-03-06 A possible jet and corona configuration for Swift J1727.8–1613 during the hard state Jing-Qiang Peng et.al. 2503.04044 null
2025-03-05 RASD: Retrieval-Augmented Speculative Decoding Guofeng Quan et.al. 2503.03434 null
2025-03-26 SpecInF: Exploiting Idle GPU Resources in Distributed DL Training via Speculative Inference Filling Cunchi Lv et.al. 2503.02550 null
2025-04-02 Linear Representations of Political Perspective Emerge in Large Language Models Junsol Kim et.al. 2503.02080 link
2025-04-23 EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test Yuhui Li et.al. 2503.01840 link
2025-03-03 Efficient Long-Term Structural Reliability Estimation with Non-Gaussian Stochastic Models: A Design of Experiments Approach Sebastian Winter et.al. 2503.01566 null
2025-03-17 MeshPad: Interactive Sketch-Conditioned Artist-Designed Mesh Generation and Editing Haoxuan Li et.al. 2503.01425 null
2025-03-24 Turbulence in virtual: II. Origin of skewness and dual fraction processes Xunchuan Liu et.al. 2503.01160 null
2025-03-02 DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting Kai Lv et.al. 2503.00784 link
2025-03-02 Speculative Ad-hoc Querying Haoyu Li et.al. 2503.00714 link
2025-03-04 Tutorial Proposal: Speculative Decoding for Efficient LLM Inference Heming Xia et.al. 2503.00491 null
2025-03-01 Peek into the ‘White-Box’: A Field Study on Bystander Engagement with Urban Robot Uncertainty Xinyan Yu et.al. 2503.00337 null
2025-03-01 Doraemon’s Gadget Lab: Unpacking Human Needs and Interaction Design in Speculative Technology Tram Thi Minh Tran et.al. 2503.00257 null
2025-02-28 Broadband pulsed quadrature measurements with calorimeters Ezad Shojaee et.al. 2503.00188 null
2025-02-28 AMuLeT: Automated Design-Time Testing of Secure Speculation Countermeasures Bo Fu et.al. 2503.00145 link
2025-02-28 Assessment of universal relations among second-order moments of relativistic stars via reformulated perturbation equations Koutarou Kyutoku et.al. 2503.00098 null
2025-02-14 A Short History of Rocks: or, How to Invent Quantum Computing David Wakeham et.al. 2503.00005 null
2025-05-13 Nano Drone-based Indoor Crime Scene Analysis Martin Cooney et.al. 2502.21019 null
2025-03-04 Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff Maximilian Holsman et.al. 2502.20704 link
2025-02-28 MonadBFT: Fast, Responsive, Fork-Resistant Streamlined Consensus Mohammad Mussadiq Jalalzai et.al. 2502.20692 null
2025-03-24 Turbulence in virtual: Origin of the variance and skewness of density function Xunchuan Liu et.al. 2502.20458 null
2025-02-27 Long-Context Inference with Retrieval-Augmented Speculative Decoding Guanzheng Chen et.al. 2502.20330 link
2025-04-28 Frobenius subalgebra lattices in tensor categories Mainak Ghosh et.al. 2502.19876 null
2025-03-04 Speculative Decoding and Beyond: An In-Depth Survey of Techniques Yunhai Hu et.al. 2502.19732 null
2025-02-26 From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens Tong Wu et.al. 2502.18890 link
2025-02-26 Reimagining Personal Data: Unlocking the Potential of AI-Generated Images in Personal Data Meaning-Making Soobin Park et.al. 2502.18853 null
2025-02-26 Towards Optimal Multi-draft Speculative Decoding Zhengmian Hu et.al. 2502.18779 null
2025-03-02 Variability of Central Stars of Planetary Nebulae with the Zwicky Transient Facility. II. Long-Timescale Variables including Wide Binary and Late Thermal Pulse Candidates Soumyadeep Bhattacharjee et.al. 2502.18651 null
2025-02-27 Kinematics of metallicity populations in Omega Centauri using Gaia Focused Product Release and Hubble Space Telescope Nagaraj Vernekar et.al. 2502.17755 null
2025-02-24 Knowledge Distillation with Training Wheels Guanlin Liu et.al. 2502.17717 null
2025-02-24 THOR: A Non-Speculative Value Dependent Timing Side Channel Attack Exploiting Intel AMX Farshad Dizani et.al. 2502.17658 null
2025-02-24 LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification Penghui Yang et.al. 2502.17421 link
2025-02-24 Defects in the $β$-Ga$_2$O$_3$($\bar{2}01$)/HfO$_2$ MOS system and the effect of thermal treatments Khushabu. S. Agrawal et.al. 2502.17112 null
2025-05-25 CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter Yepeng Weng et.al. 2502.16880 null
2025-02-24 APINT: A Full-Stack Framework for Acceleration of Privacy-Preserving Inference of Transformers based on Garbled Circuits Hyunjun Cho et.al. 2502.16877 null
2025-04-03 Towards Reinforcement Learning for Exploration of Speculative Execution Vulnerabilities Evan Lai et.al. 2502.16756 null
2025-02-22 Fluctuating Lattice, Several Energy Scales Holger Bech Nielsen et.al. 2502.16369 null
2025-02-21 DReSD: Dense Retrieval for Speculative Decoding Milan Gritta et.al. 2502.15572 link
2025-02-27 PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System Yintao He et.al. 2502.15470 null
2025-02-24 Ultra-high-energy $γ$-ray emission associated with the tail of a bow-shock pulsar wind nebula Zhen Cao et.al. 2502.15447 null
2025-02-21 TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding Zhaoxuan Wu et.al. 2502.15197 null
2025-02-21 A Critical Examination of the Nested Leaky Box Model for Galactic Cosmic Ray Transport Benedikt Schroer et.al. 2502.15115 null
2025-03-11 FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling Weilin Zhao et.al. 2502.14856 null
2025-05-07 Fusion rules and structure constants of E-series minimal models Rongvoram Nivesvivat et.al. 2502.14295 null
2025-02-19 Which Attention Heads Matter for In-Context Learning? Kayo Yin et.al. 2502.14010 link
2025-03-17 NVR: Vector Runahead on NPUs for Sparse Memory Access Hui Wang et.al. 2502.13873 null
2025-02-19 Hierarchical accretion flow from the G351 infrared dark filament to its central cores H. Beuther et.al. 2502.13866 null
2025-02-19 C2T: A Classifier-Based Tree Construction Method in Speculative Decoding Feiye Huo et.al. 2502.13652 null
2025-02-19 Near-extremal dumb holes and some aspects of the Hawking effect Akshat Pandey et.al. 2502.13557 null
2025-02-19 Radio observations of the ultra-long GRB 220627A reveal a hot cocoon supporting the blue supergiant progenitor scenario James K. Leung et.al. 2502.13435 null
2025-02-18 Inconsistent metallicity spreads in first generation stars of globular clusters from high resolution spectroscopy and HST photometry Eugenio Carretta et.al. 2502.13206 null
2025-02-17 SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs Yige Xu et.al. 2502.12134 null
2025-02-16 AI Generations: From AI 1.0 to AI 4.0 Jiahao Wu et.al. 2502.11312 null
2025-02-16 Coherent Spin Pumping Originated from Sub-Terahertz Néel Vector Dynamics in Easy Plane α-Fe2O3/Pt Gregory Fritjofson et.al. 2502.11281 null
2025-02-16 GRIFFIN: Effective Token Alignment for Faster Speculative Decoding Shijing Hu et.al. 2502.11018 link
2025-02-05 QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache Rishabh Tiwari et.al. 2502.10424 null
2025-02-13 Rosette Nebula Outburst Gaia 24djk from the Young Stellar Object V557 Mon Adolfo S. Carvalho et.al. 2502.09523 null
2025-02-13 $^{18}$F-FDG brain PET hypometabolism in post-SARS-CoV-2 infection: substrate for persistent/delayed disorders? Eric Guedj et.al. 2502.09077 null
2025-02-13 CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality Razvan-Gabriel Dumitru et.al. 2502.08923 link
2025-03-19 Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding Ziyao Wang et.al. 2502.08020 null
2025-04-13 Regular Black Holes in Lovelock gravity with a Degenerate AdS Ground State and their shadows Milko Estrada et.al. 2502.07992 null
2025-03-06 Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs Ruichen Zhang et.al. 2502.07942 null
2025-02-05 Entropy Adaptive Decoding: Dynamic Model Switching for Efficient Inference Toby Simonds et.al. 2502.06833 null
2025-02-10 Persistent spin grids with spin-orbit coupled 2D electron gas A. V. Poshakinskiy et.al. 2502.06745 null
2025-03-27 LANTERN++: Enhancing Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models Sihwan Park et.al. 2502.06352 link
2025-02-10 Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE Haiduo Huang et.al. 2502.06282 link
2025-02-08 Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding Sukmin Cho et.al. 2502.05609 link
2025-01-31 Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies Nadav Timor et.al. 2502.05202 null
2025-02-07 Learning Universal Multi-level Market Irrationality Factors to Improve Stock Return Forecasting Chen Yang et.al. 2502.04737 null
2025-02-06 Speeding up Speculative Decoding via Approximate Verification Meiyu Zhong et.al. 2502.04557 null
2025-02-06 Gig2Gether: Data-sharing to Empower, Unify and Demystify Gig Work Jane Hsieh et.al. 2502.04482 null
2025-02-06 The Evolution of Hypervelocity Supernova Survivors and the Outcomes of Interacting Double White Dwarf Binaries Ken J. Shen et.al. 2502.04451 null
2025-02-06 Properties of the emission region in pulsars with opposite subpulse drift directions in different profile components H. M. Tedila et.al. 2502.03833 null
2025-02-05 COSMOS-Web: The emergence of the Hubble Sequence M. Huertas-Company et.al. 2502.03532 null
2025-02-13 FSLH: Flexible Mechanized Speculative Load Hardening Roberto Blanco et.al. 2502.03203 null
2025-02-05 How probable is the Lyman-$α$ damping wing in the spectrum of the redshift z = 5.9896 quasar ULAS J0148+0600? Fiona Sawyer et.al. 2502.03085 null
2025-02-05 A comprehensive study of the gas-phase formation network of HC$_5$N: theory, experiments, observations and models Lisa Giani et.al. 2502.03046 null
2025-04-17 The connection between high-redshift galaxies and Lyman $α$ transmission in the Sherwood-Relics simulations of patchy reionisation Luke Conaboy et.al. 2502.02983 null
2025-02-05 Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation Jingyu Liu et.al. 2502.02789 link
2025-02-04 EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization Yize Wu et.al. 2502.02493 null
2025-02-04 M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference Nikhil Bhendawade et.al. 2502.02040 null
2025-02-03 Cosmic Ray Feedback in Massive Halos: Implications for the Distribution of Baryons Eliot Quataert et.al. 2502.01753 null
2025-02-01 Speculative Ensemble: Fast Large Language Model Ensemble via Speculation Jiale Fu et.al. 2502.01662 link
2025-02-03 Time-dependent solutions of biadjoint scalar field theories Kymani Armstrong-Williams et.al. 2502.01294 null
2025-02-02 Constructing AI ethics narratives based on real-world data: Human-AI collaboration in data-driven visual storytelling Mengyi Wei et.al. 2502.00637 null
2025-02-01 Predicting the number density of heavy seed massive black holes due to an intense Lyman-Werner field Hannah O’Brennan et.al. 2502.00574 null
2025-02-04 Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation Yang Cao et.al. 2502.00500 null
2025-02-14 Reward-Guided Speculative Decoding for Efficient LLM Reasoning Baohao Liao et.al. 2501.19324 null
2025-01-31 Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment Gregor Bachmann et.al. 2501.19309 null
2025-02-19 Emancipatory Information Retrieval Bhaskar Mitra et.al. 2501.19241 null
2025-01-31 Trading Inference-Time Compute for Adversarial Robustness Wojciech Zaremba et.al. 2501.18841 null
2025-01-30 Human Re-ID Meets LVLMs: What can we expect? Kailash Hambarde et.al. 2501.18698 null
2025-01-28 How Hamilton-Jacobi formalism helps to address the physical meaning of the wave function in Bohmian mechanics Arnaud Amblard et.al. 2501.16989 null
2025-03-04 Distilling Large Language Models for Network Active Queue Management Deol Satish et.al. 2501.16734 null
2025-01-24 The disrupting and growing open cluster spiral arm patterns of the Milky Way Xiaochen Liu et.al. 2501.14215 null
2025-01-19 Hallucination Mitigation using Agentic AI Natural Language-Based Frameworks Diego Gosmar et.al. 2501.13946 link
2025-01-23 Inflaton Self Resonance, Oscillons, and Gravitational Waves in Small Field Polynomial Inflation Manuel Drees et.al. 2501.13811 null
2025-01-23 Considerations on the Origin of IRAS 19312+1950 Based on Long-Term Maser Observations Huan-Xue Feng et.al. 2501.13769 null
2025-01-23 Compiler Support for Speculation in Decoupled Access/Execute Architectures Robert Szafarczyk et.al. 2501.13553 null
2025-02-01 Concentration in Governance Control Across Decentralised Finance Protocols Thomas Eisermann et.al. 2501.13377 link
2025-01-22 The outer structure of old star clusters in the Small Magellanic Cloud Andrés E. Piatti et.al. 2501.13062 null
2025-01-22 Entanglement dynamics in collision models and entanglement quilts Le Hu et.al. 2501.12629 null
2025-01-22 Link in $\mathbb{R}\mathbb{P}^3$ and the Topological Vertex John Chae et.al. 2501.12566 null
2025-01-21 AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding Zikun Li et.al. 2501.12162 null
2025-01-20 MIDIS: Quantifying the AGN component of X-ray-detected galaxies Steven Gillman et.al. 2501.11491 null
2025-01-23 The JWST EXCELS survey: an extremely metal-poor galaxy at $z=8.271$ hosting an unusual population of massive stars F. Cullen et.al. 2501.11099 null
2025-01-30 Vortices for lake equations (review with questions and speculations) Jair Koiller et.al. 2501.10433 null
2025-01-17 From strong to weak correlations in breathing-mode kagome van der Waals materials: Nb$_3$(F,Cl,Br,I)$_8$ as a robust and versatile platform for many-body engineering Joost Aretz et.al. 2501.10320 null
2025-01-16 25 years of XMM-Newton observations of the Sgr A complex: 3D distribution and internal structure of the clouds G. Stel et.al. 2501.09737 null
2025-01-16 Weak electronic correlations in the cobalt oxychalcogenide superconductor Na2CoSe2O Zhenchao Wu et.al. 2501.09675 null
2025-02-11 Anatomy of a Digital Bubble: Lessons Learned from the NFT and Metaverse Frenzy Daisuke Kawai et.al. 2501.09601 null
2025-01-16 A universal break in energy functions of three hyperactive repeating fast radio bursts Q. Wu et.al. 2501.09248 null
2025-01-15 The emission of interpulses by a 6.45-hour period coherent radio transient Y. W. J. Lee et.al. 2501.09133 null
2025-01-13 Cassiopeia A’s Reverse Shock and its Effects on the Expanding SN Ejecta Robert A. Fesen et.al. 2501.07708 null
2025-01-11 Is the Monetary Transmission Mechanism Broken? Time for People’s Quantitative Easing Sebastian Dragoe et.al. 2501.06575 null
2025-01-27 QPEs as Lense-Thirring precession of super-Eddington flows M. Middleton et.al. 2501.06185 link
2025-01-10 Analysing the coverage of the University of Bologna’s publication metadata in an existing source of open research information Erica Andreose et.al. 2501.05821 null
2025-01-09 Accelerated Diffusion Models via Speculative Sampling Valentin De Bortoli et.al. 2501.05370 null
2025-01-09 The CO-Fuelled Time Machine: Tracing Birth Conditions and Terrestrial Planet Formation Outcomes in HD 163296 through Pebble Drift-induced CO Enhancements Joe Williams et.al. 2501.05316 null
2025-01-09 Observational Study of the Atmospheric Gravity Waves in the lower Solar Atmosphere Ravi Chaurasiya et.al. 2501.05042 null
2025-01-07 Transparent Decompilation for Timing Side-Channel Analyses Santiago Arranz Olmos et.al. 2501.04183 null
2025-01-07 Spin Environment of a Superconducting Qubit in High Magnetic Fields S. Günzler et.al. 2501.03661 null
2025-01-07 Neural Cellular Automata and Deep Equilibrium Models Zhibai Jia et.al. 2501.03573 null
2025-01-07 CI at Scale: Lean, Green, and Fast Dhruva Juloori et.al. 2501.03440 null
2025-01-02 Vertex algebras, topological defects, and Moonshine Roberto Volpato et.al. 2412.21141 null
2024-12-30 Strategic Learning and Trading in Broker-Mediated Markets Alif Aqsha et.al. 2412.20847 null
2024-12-28 From Worms to Mice: Homeostasis Maybe All You Need Jesus Marco de Lucas et.al. 2412.20090 null
2025-01-13 HADES: Hardware Accelerated Decoding for Efficient Speculation in Large Language Models Ze Yang et.al. 2412.19925 null
2024-12-27 Cosmohedra Nima Arkani-Hamed et.al. 2412.19881 null
2024-12-27 Paleoinspired Vision: From Exploring Colour Vision Evolution to Inspiring Camera Design Junjie Zhang et.al. 2412.19439 null
2024-12-25 Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference Libo Zhang et.al. 2412.18934 null
2024-12-25 AdaEAGLE: Optimizing Speculative Decoding via Explicit Modeling of Adaptive Draft Structures Situo Zhang et.al. 2412.18910 null
2024-12-23 The Unique Helium Nova V445 Puppis Ejected $\gg$0.001 M$_{\odot}$ in the Year 2000 and Will Not Become a Type Ia Supernova Bradley E. Schaefer et.al. 2412.17286 null
2024-12-20 Gravitational Observatories in AdS$_4$ Dionysios Anninos et.al. 2412.16305 null
2024-12-20 Two-Part Interplanetary Type II Solar Radio Bursts Silja Pohjolainen et.al. 2412.15961 null
2025-01-10 Minimizing speculation overhead in a parallel recognizer for regular texts Angelo Borsotti et.al. 2412.14975 null
2025-01-13 $\mathcal{N}=2$ superconformal gravitino in harmonic superspace Evgeny Ivanov et.al. 2412.14822 null
2025-02-07 The JWST/NIRSpec view of the nuclear region in the prototypical merging galaxy NGC 6240 Matteo Ceci et.al. 2412.14685 null
2024-12-18 Fermion-Portal Dark Matter at a High-Energy Muon Collider Pouya Asadi et.al. 2412.14235 null
2024-12-18 Current and secular accretion rates of EX Hydrae K. Beuermann et.al. 2412.13850 null
2024-12-18 Fool’s gold: ligand-receptor interactions and the origins of life Betony Adams et.al. 2412.13836 null
2024-12-18 Diffusion models and stochastic quantisation in lattice field theory Gert Aarts et.al. 2412.13704 null
2024-12-17 Distributed Speculative Execution for Resilient Cloud Applications Tianyu Li et.al. 2412.13314 null
2024-12-17 Where do X-ray low surface brightness clusters sit with respect to filaments? S. Zarattini et.al. 2412.13258 null
2024-12-17 Agnosticism About Artificial Consciousness Tom McClelland et.al. 2412.13145 null
2024-12-17 Insight into the Starburst Nature of Galaxy GN-z11 with JWST MIRI Spectroscopy J. Álvarez-Márquez et.al. 2412.12826 null
2025-03-18 Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models Seungeun Oh et.al. 2412.12687 null
2024-12-26 Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree Xiangxiang Gao et.al. 2412.12639 null
2024-12-15 Heat kernel and local index theorem for open complex manifolds with $\mathbb{C}^{\ast}$-action Jih-Hsin Cheng et.al. 2412.11037 null
2024-12-14 The JWST-NIRCam View of Sagittarius C. II. Evidence for Magnetically Dominated HII Regions in the CMZ John Bally et.al. 2412.10983 null
2025-02-23 Interference in Fuzzy Dark Matter Filaments: Idealised Models and Statistics Tim Zimmermann et.al. 2412.10829 null
2025-02-10 Constrained Decoding with Speculative Lookaheads Nishanth Nakshatri et.al. 2412.10418 null
2025-01-15 Asymmetric Temperature Variations In Protoplanetary disks: I. Linear Theory, Corotating Spirals, and Ring Formation Zhaohuan Zhu et.al. 2412.09571 null
2024-12-12 AI Predicts AGI: Leveraging AGI Forecasting and Peer Review to Explore LLMs’ Complex Reasoning Capabilities Fabrizio Davide et.al. 2412.09385 null
2024-12-11 Can transformative AI shape a new age for our civilization?: Navigating between speculation and reality Jesus L. Lobo et.al. 2412.08273 null
2024-12-10 Mapping the spatial extent of HI-rich absorbers using MgII absorption along gravitational arcs Trystyn A. M. Berg et.al. 2412.07652 null
2024-12-26 CoinCLIP: A Multimodal Framework for Assessing Viability in Web3 Memecoins Hou-Wan Long et.al. 2412.07591 null
2024-12-10 Modeling Speculative Trading Patterns in Token Markets: An Agent-Based Analysis with TokenLab Mengjue Wang et.al. 2412.07512 null
2024-12-10 KPZ-like scaling on a high-dimensional hypersphere Daniil Fedotov et.al. 2412.07432 null
2024-12-10 Exploring types I and IIA effective actions through T-duality Mohammad R. Garousi et.al. 2412.07234 null
2024-12-10 Relativistic Mott transition in strongly correlated artificial graphene Liguo Ma et.al. 2412.07150 null
2024-12-10 Gravitational focusing and horizon entropy for higher-spin fields Zihan Yan et.al. 2412.07107 null
2024-12-09 Inelastic H + H$^+_3$ Collision rates and their impact in the determination of the excitation temperature of H$^+_3$ Daniel Felix-Gonzalez et.al. 2412.06697 null
2024-12-09 Systematic comparison of deep generative models applied to multivariate financial time series Howard Caulfield et.al. 2412.06417 null
2024-12-09 Beyond pip install: Evaluating LLM Agents for the Automated Installation of Python Projects Louis Milliken et.al. 2412.06294 link
2024-12-06 Revisiting the hallmark freezing and melting points in colloidal dispersions and the search for the elusive coexistence region J. Galen Wang et.al. 2412.05422 null
2024-12-06 Penetrative rotating magnetoconvection subject to lateral variations in temperature gradients Tirtharaj Barman et.al. 2412.05235 null
2024-12-06 Predictive Window Decoding for Fault-Tolerant Quantum Programs Joshua Viszlai et.al. 2412.05115 null
2024-12-04 Successive magnetic transitions in the spin-5/2 easy-axis triangular-lattice antiferromagnet Na$_2$BaMn(PO$_4$)$_2$: A neutron diffraction study Chuandi Zhang et.al. 2412.03149 null
2025-01-02 The Reality of AI and Biorisk Aidan Peppin et.al. 2412.01946 null
2024-12-02 PLD+: Accelerating LLM inference by leveraging Language Model Artifacts Shwetha Somasundaram et.al. 2412.01447 null
2024-12-02 Enhanced solid solution hardening by off-center substitutional solute atoms in α-Ti Zi-Han Yu et.al. 2412.01298 null
2024-11-25 Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration Zhuofan Wen et.al. 2412.00061 null
2024-11-12 The Copernican Argument for Alien Consciousness; The Mimicry Argument Against Robot Consciousness Eric Schwitzgebel et.al. 2412.00008 null
2024-11-28 Night-Side Relativistic Electron Precipitation Bursts in the Outer Radiation Belt: Insights from ELFIN and THEMIS Xi Lu et.al. 2411.19232 null
2024-11-27 Magnetic field tuned superconducting and normal phase magnetism in CeCo$_{0.5}$Rh$_{0.5}$In$_{5}$ A. Howell et.al. 2411.18540 null
2024-11-27 Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding Ziyin Zhang et.al. 2411.18462 link
2024-11-27 6G Takes Shape Jeffrey G. Andrews et.al. 2411.18435 null
2024-11-27 An evolution of matrix-valued orthogonal polynomials Erik Koelink et.al. 2411.18362 null
2024-11-27 Comprehensive Kernel Safety in the Spectre Era: Mitigations and Performance Evaluation (Extended Version) Davide Davoli et.al. 2411.18094 null
2024-12-25 Stellar evolution along the AGB as revealed by the shape of Miras’ visual light curves D. T. Hoai et.al. 2411.18044 null
2024-11-26 Stable curves and chromatic polynomials Bernhard Reinke et.al. 2411.17551 null
2024-12-08 A revamped understanding of Cosmic Rays and Gamma-Ray Bursts A. De Rújula et.al. 2411.15850 null
2024-11-20 The Impossible Test: A 2024 Unsolvable Dataset and A Chance for an AGI Quiz David Noever et.al. 2411.14486 null
2024-12-03 Mediating Modes of Thought: LLM’s for design scripting Moritz Rietschel et.al. 2411.14485 null
2024-11-21 THz optical response of Ba(Fe$_{1-x}$Ni$_x$)$_2$As$_2$ films analyzed within the three-band Eliashberg s$\pm$-wave model Yurii A. Aleshchenko et.al. 2411.14011 null
2024-11-27 Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding Hyun Ryu et.al. 2411.13157 null
2024-11-20 Far-field Boundary Conditions for Airfoil Simulation at High Incidence in Steady, Incompressible, Two-dimensional Flow Narges Golmirzaee et.al. 2411.13077 null
2024-11-19 Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing Ruyi Ding et.al. 2411.12508 null
2025-09-30 Continuous Speculative Decoding for Autoregressive Image Generation Zili Wang et.al. 2411.11925 null
2024-12-26 Teapot: Efficiently Uncovering Spectre Gadgets in COTS Binaries Fangzheng Lin et.al. 2411.11624 null
2024-11-30 Diversity of disc viscosities can explain the period ratios of resonant and non-resonant systems of hot super-Earths and mini-Neptunes Bertram Bitsch et.al. 2411.11452 null
2024-11-25 First memoir on the asymptotics of certain infinite products Wadim Zudilin et.al. 2411.11100 null
2024-11-17 FastDraft: How to Train Your Draft Ofir Zafrir et.al. 2411.11055 null
2024-12-16 SAM Decoding: Speculative Decoding via Suffix Automaton Yuxuan Hu et.al. 2411.10666 link
2024-11-15 Moving Forward: A Review of Autonomous Driving Software and Hardware Systems Xu Wang et.al. 2411.10291 null
2024-11-14 Cosmic inflation in an extended non-commutative foliated quantum gravity: the wave function of the universe César A. Zen Vasconcellos et.al. 2411.09756 null
2024-11-15 Provocation: Who benefits from “inclusion” in Generative AI? Samantha Dalal et.al. 2411.09102 null
2024-11-13 Thought Experiments in Design Fiction for Visualization Swaroop Panda et.al. 2411.08621 null
2025-01-01 A Geometric Substructure for Quantum Dynamics Anthony John Bracken et.al. 2411.08230 null
2025-01-11 The Grass of the Universe: Rethinking Technosphere, Planetary History, and Sustainability with Fermi Paradox Lukáš Likavčan et.al. 2411.08057 null
2024-11-12 A rich structure of renormalization group flows for Higgs-like models in 4 dimensions André LeClair et.al. 2411.07476 null
2024-11-12 Input-Based Ensemble-Learning Method for Dynamic Memory Configuration of Serverless Computing Functions Siddharth Agarwal et.al. 2411.07444 null
2024-11-11 The Inherent Adversarial Robustness of Analog In-Memory Computing Corey Lammie et.al. 2411.07023 null
2024-11-10 Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents Yu Gu et.al. 2411.06559 link
2024-11-10 MOCCA-III: Effects of pristine gas accretion and cluster migration on globular cluster evolution, global parameters and multiple stellar populations Mirek Giersz et.al. 2411.06421 null
2024-11-10 Generating Mixcode Popular Songs with Artificial Intelligence: Concepts, Plans, and Speculations Abhishek Kaushik et.al. 2411.06420 null
2024-11-08 SSSD: Simply-Scalable Speculative Decoding Michele Marzollo et.al. 2411.05894 null
2024-11-08 SpecHub: Provable Acceleration to Multi-Draft Speculative Decoding Ryan Sun et.al. 2411.05289 link
2024-11-07 SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference Gabriele Oliaro et.al. 2411.04975 null
2024-11-06 The N-Grammys: Accelerating Autoregressive Inference with Learning-Free Batched Speculation Lawrence Stewart et.al. 2411.03786 null
2024-11-05 Remarkable Scale Relation, Approximate SU(5), Fluctuating Lattice Holger Bech Nielsen et.al. 2411.03552 null
2024-11-05 Shared Memory-Aware Latency-Sensitive Message Aggregation for Fine-Grained Communication Kavitha Chandrasekar et.al. 2411.03533 null
2024-11-07 A high resolution simulation of protoplanetary disk turbulence driven by the vertical shear instability Karim Shariff et.al. 2411.03467 null
2024-11-04 PipeLLM: Fast and Confidential Large Language Model Services with Speculative Pipelined Encryption Yifan Tan et.al. 2411.03357 null
2024-11-05 On the possible core shift break in relativistic jets E. E. Nokhrina et.al. 2411.02925 null
2024-11-04 A proof of self-organized criticality in a sandpile Christopher Hoffman et.al. 2411.02541 null
2025-02-07 Pseudo Transitions in the Finite-Size Blume-Capel Model Lei Shi et.al. 2411.01743 null
2024-11-05 Privacy Risks of Speculative Decoding in Large Language Models Jiankun Wei et.al. 2411.01076 null
2024-10-30 Accelerated AI Inference via Dynamic Execution Methods Haim Barad et.al. 2411.00853 null
2024-11-05 A Theoretical Perspective for Speculative Decoding Algorithm Ming Yin et.al. 2411.00841 null
2024-10-31 Interpretable Language Modeling via Induction-head Ngram Models Eunji Kim et.al. 2411.00066 link
2024-10-31 ALISE: Accelerating Large Language Model Serving with Speculative Scheduling Youpeng Zhao et.al. 2410.23537 null
2024-10-30 Flavor Patterns of Fundamental Particles from Quantum Entanglement? Jesse Thaler et.al. 2410.23343 null
2024-10-29 Lost and Found in Speculation: Hybrid Speculative Vulnerability Detection Mohamadreza Rostami et.al. 2410.22555 null
2025-02-10 Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding Bohan Li et.al. 2410.21951 null
2024-10-29 Rapid cooling of the Cassiopeia A neutron star due to superfluid quantum criticality Hao-Fu Zhu et.al. 2410.21945 null
2024-10-28 Model-agnostic basis functions for the 2-point correlation function of dark matter in linear theory Aseem Paranjape et.al. 2410.21374 link
2024-10-11 The Social Impact of Generative LLM-Based AI Yu Xie et.al. 2410.21281 null
2024-10-28 On the limits of informationally efficient stock markets: New insights from a chartist-fundamentalist model Laura Gardini et.al. 2410.21198 null
2024-10-27 A Jet-Induced Shock in a Young, Powerful Radio Galaxy at z=3.00 Nick Seymour et.al. 2410.20609 null
2024-10-27 FIRP: Faster LLM inference via future intermediate representation prediction Pengfei Wu et.al. 2410.20488 null
2024-10-27 Inevitable Trade-off between Watermark Strength and Speculative Sampling Efficiency for Language Models Zhengmian Hu et.al. 2410.20418 null
2024-10-31 Fast Best-of-N Decoding via Speculative Rejection Hanshi Sun et.al. 2410.20290 link
2024-10-24 Intention Is All You Need Advait Sarkar et.al. 2410.18851 null
2024-10-24 AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability Sudhanshu Agrawal et.al. 2410.18351 null
2024-10-23 Multi-Draft Speculative Sampling: Canonical Architectures and Theoretical Limits Ashish Khisti et.al. 2410.18234 null
2025-02-10 Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition Artem Basharin et.al. 2410.17765 null
2024-10-22 AMUSD: Asynchronous Multi-Device Speculative Decoding for LLM Acceleration Bradley McDanel et.al. 2410.17375 link
2024-10-22 Remote Timing Attacks on Efficient Language Model Inference Nicholas Carlini et.al. 2410.17175 null
2024-10-23 Quantum many-body scars as remnants of stable many-body periodic orbits Keita Omiya et.al. 2410.16916 null
2024-10-22 Chiral polaritonics: cavity-mediated enantioselective excitation condensation Rosario R. Riso et.al. 2410.16861 null
2024-10-22 An Extreme Radio Fluctuation of Pulsar B1929$+$10 Zhengli Wang et.al. 2410.16816 null
2024-10-21 Galaxy Size and Mass Build-up in the First 2 Gyrs of Cosmic History from Multi-Wavelength JWST NIRCam Imaging Natalie Allen et.al. 2410.16354 null
2024-10-30 TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling Jiahao Qiu et.al. 2410.16033 null
2024-10-21 Efficient and Universally Accessible Cross-Chain Options without Upfront Holder Collateral Zifan Peng et.al. 2410.15724 null
2024-10-21 Investigating Unusual H$α$ Features towards the Scutum Supershell R. Alsulami et.al. 2410.15712 null
2024-10-17 Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding Tan Dat Nguyen et.al. 2410.13839 null
2024-10-17 Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions Michael J. Q. Zhang et.al. 2410.13788 null
2024-10-17 Looking Inward: Language Models Can Learn About Themselves by Introspection Felix J Binder et.al. 2410.13787 link
2024-10-17 PGC 44685: A Dwarf Star-forming Lenticular Galaxy with Wolf-Rayet Population Shiying Lu et.al. 2410.13119 null
2024-10-16 Gravitational instantons and the quality problem of the QCD axion: Facts, speculations, and statements in between Pier Giuseppe Catinari et.al. 2410.12741 null
2024-10-15 Evolution of Ferromagnetism and Electrical Resistivity in Sb-Doped Cr4PtGa17 Chaoguo Wang et.al. 2410.12078 null
2024-10-15 MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation Chenxi Wang et.al. 2410.11779 link
2024-10-15 DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure Yunfan Xiong et.al. 2410.11744 null
2024-10-15 Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling Wenda Xu et.al. 2410.11325 null
2025-02-01 QSpec: Speculative Decoding with Complementary Quantization Schemes Juntao Zhao et.al. 2410.11305 null
2024-11-20 Unveiling dust, molecular gas, and high star formation efficiency in extremely UV bright star-forming galaxies at $z\sim 2.1-3.6$ M. Dessauges-Zavadsky et.al. 2410.11121 null
2024-10-01 Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models Keivan Alizadeh et.al. 2410.10846 null
2024-10-15 The Discovery of Polarized Water Vapor Megamaser Emission in a Molecular Accretion Disk Jack F. Gallimore et.al. 2410.10569 null
2024-10-14 Temperature-Centric Investigation of Speculative Decoding with Knowledge Distillation Siru Ouyang et.al. 2410.10141 null
2024-11-12 Self-Data Distillation for Recovering Quality in Pruned Large Language Models Vithursan Thangarasa et.al. 2410.09982 null
2024-10-13 Super-Bandgap Electroluminescence from Cesium Lead Bromide Justin Sculley et.al. 2410.09702 null
2024-10-21 On Two Nucleons Near Unitarity with Perturbative Pions Yu Ping Teng et.al. 2410.09653 null
2024-10-11 Compact [OIII] emission-line regions (“Green Seeds”) in $\mathrm{Hα}$ emitters at Cosmic Noon from JWST Observations Nuo Chen et.al. 2410.08520 null
2024-10-09 SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration Heming Xia et.al. 2410.06916 link
2025-02-06 Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level Xinyi Zeng et.al. 2410.06809 null
2024-10-08 ParallelSpec: Parallel Drafter for Efficient Speculative Decoding Zilin Xiao et.al. 2410.05589 null
2024-10-09 Density estimation with LLMs: a geometric investigation of in-context learning trajectories Toni J. B. Liu et.al. 2410.05218 null
2024-10-08 Efficient Inference for Large Language Model-based Generative Recommendation Xinyu Lin et.al. 2410.05165 null
2024-10-04 Density functional theory based investigation of heavy fermion band candidates in triplet superconductor UTe2 Shouzheng Liu et.al. 2410.03840 null
2024-10-04 Mixture of Attentions For Speculative Decoding Matthieu Zimmer et.al. 2410.03804 null
2024-10-03 AI-rays: Exploring Bias in the Gaze of AI Through a Multimodal Interactive Installation Ziyao Gao et.al. 2410.03786 null
2024-09-24 Nonmetric geometric flows and quasicrystalline topological phases for dark energy and dark matter in $f(Q)$ cosmology L. Bubuianu et.al. 2410.03700 null
2025-01-31 LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding Doohyuk Jang et.al. 2410.03355 null
2024-10-04 Generative Edge Detection with Stable Diffusion Caixia Zhou et.al. 2410.03080 null
2024-10-03 Inductive Generative Recommendation via Retrieval-based Speculation Yijie Ding et.al. 2410.02939 link
2024-10-03 The Stellar Initial Mass Function of Early Dark Matter-free Gas Objects William Lake et.al. 2410.02868 null
2024-10-03 Atoms near a conducting wedge: decay rates and entanglement around a corner Romuald Kilianski et.al. 2410.02349 null
2024-10-02 Time Variation of the Solar Tachocline Sarbani Basu et.al. 2410.01895 null
2024-12-25 Interpretable Contrastive Monte Carlo Tree Search Reasoning Zitian Gao et.al. 2410.01707 link
2024-10-02 Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding Yao Teng et.al. 2410.01699 link
2024-12-09 Forte : Finding Outliers with Representation Typicality Estimation Debargha Ganguly et.al. 2410.01322 link
2024-10-02 Speculative Coreset Selection for Task-Specific Fine-tuning Xiaoyu Zhang et.al. 2410.01296 null
2024-10-01 Draft on the Fly: Adaptive Self-Speculative Decoding using Cosine Similarity Michael R. Metel et.al. 2410.01028 null
2024-10-01 A Scheduling-Aware Defense Against Prefetching-Based Side-Channel Attacks Till Schlüter et.al. 2410.00452 null
2024-11-12 Galactic center G objects as dust-enshrouded stars near the supermassive black hole Michal Zajaček et.al. 2410.00304 null
2024-09-30 Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface Wenyue Hua et.al. 2410.00079 null
2024-09-30 Statistical view of orbital circularisation with 14 000 characterised TESS eclipsing binaries L. W. IJspeert et.al. 2409.20540 null
2024-09-30 New HI observations Toward the NGC 5055 Galaxy Group with FAST Xiao-Lan Liu et.al. 2409.20109 null
2024-09-27 Thermal Conductivity of Cubic Silicon Carbide Single Crystals Heavily Doped by Nitrogen Zifeng Huang et.al. 2409.18843 null
2024-09-27 SpecCFA: Enhancing Control Flow Attestation/Auditing via Application-Aware Sub-Path Speculation Adam Caulfield et.al. 2409.18403 null
2025-03-17 Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference Zongyue Qin et.al. 2409.16560 null
2024-09-22 ALMASOP. The Localized and Chemically rich Features near the Bases of the Protostellar Jet in HOPS 87 Shih-Ying Hsu et.al. 2409.14445 null
2024-09-21 Triangulating on Possible Futures: Conducting User Studies on Several Futures Instead of Only One Antti Salovaara et.al. 2409.14137 null
2024-09-29 String Invention, Viable 3-3-1 Model, Dark Matter Black Holes Holger B. Nielsen et.al. 2409.13776 null
2024-09-20 Interstellar Glycolaldehyde, Methyl Formate, and Acetic Acid. II. Chemical Modeling of the Bimodal Abundance Pattern in NGC 6334I Brielle M. Shope et.al. 2409.13673 null
2024-09-20 A Comparison between Financial and Gambling Markets Haoyu Liu et.al. 2409.13528 null
2024-12-12 Consequences of Minimal Entanglement in Bosonic Field Theories Spencer Chang et.al. 2409.13030 null
2024-09-17 UNCOVER: Significant Reddening in Cosmic Noon Quiescent Galaxies Jared Siegel et.al. 2409.11457 null
2024-09-17 The ALMA-CRISTAL Survey: Spatially-resolved Star Formation Activity and Dust Content in 4 < z < 6 Star-forming Galaxies Juno Li et.al. 2409.10961 null
2024-12-14 Improving Multi-candidate Speculative Decoding Xiaofan Lu et.al. 2409.10644 link
2024-09-16 Aggregation-diffusion in heterogeneous environments Jonathan R. Potts et.al. 2409.10147 link
2024-12-12 Pure Lovelock Gravity regular black holes Milko Estrada et.al. 2409.09559 null
2024-09-14 Ground State Phase Diagram of $\text{SU}(3)$ $t$-$J$ Chain Junhao Zhang et.al. 2409.09344 null
2024-12-02 Two-Time Relativistic Bohmian Model of Quantum Mechanics Giuseppe Raguní et.al. 2409.09049 null
2024-09-13 Dynamic Simultaneous Multithreaded Architecture Daniel Ortiz-Arroyo et.al. 2409.07903 null
2024-09-09 DNA sequence alignment: An assignment for OpenMP, MPI, and CUDA/OpenCL Arturo Gonzalez-Escribano et.al. 2409.06075 null
2024-10-05 Predicting Foreign Exchange EUR/USD direction using machine learning Kevin Cedric Guyard et.al. 2409.04471 null
2024-09-05 Evidence for Dust Depletion in a Misaligned Protoplanetary Disk with JWST C. C. Espaillat et.al. 2409.03702 null
2024-09-04 Cavitating bubbles in condensing gas as a means of forming clumps, chondrites, and planetesimals Eugene Chiang et.al. 2409.02978 null
2024-09-03 Light-Ray Wave Functions and Integrability Alexandre Homrich et.al. 2409.02160 null
2024-09-03 Foreactor: Exploiting Storage I/O Parallelism with Explicit Speculation Guanzhou Hu et.al. 2409.01580 null
2024-09-02 A Comprehensive Analysis of the Future of Atomically Precise Manufacturing Vadym Shvydun et.al. 2409.00955 null
2024-08-30 Dynamic Depth Decoding: Faster Speculative Decoding for LLMs Oscar Brown et.al. 2409.00142 null
2024-08-29 LightSLH: Provable and Low-Overhead Spectre v1 Mitigation through Targeted Instruction Hardening Yiming Zhu et.al. 2408.16220 null
2024-08-28 An Empirical Study of API Misuses of Data-Centric Libraries Akalanka Galappaththi et.al. 2408.15853 null
2024-08-28 Indirect nonlinear interaction between toroidal Alfvén eigenmode and ion temperature gradient mode mediated by zonal structures Qian Fang et.al. 2408.15782 null
2025-02-27 Learning Harmonized Representations for Speculative Sampling Lefan Zhang et.al. 2408.15766 null
2024-08-29 Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation Lujun Gui et.al. 2408.15562 null
2024-11-18 The companion mass distribution of post common envelope hot subdwarf binaries: evidence for boosted and disrupted magnetic braking? Lisa Blomberg et.al. 2408.15334 null
2024-08-27 The Way To Circumbinary Planets Hans J Deeg et.al. 2408.15307 null
2024-12-26 The Mamba in the Llama: Distilling and Accelerating Hybrid Models Junxiong Wang et.al. 2408.15237 link
2024-08-26 SO as shock tracer in protoplanetary disks: the AB Aurigae case A. Dutrey et.al. 2408.14276 null
2024-08-25 The origins of noise in the Zeeman splitting of spin qubits in natural-silicon devices Juan S. Rojas-Arias et.al. 2408.13707 null
2024-07-22 Simopt – Simulation pass for Speculative Optimisation of FPGA-CAD flow Eashan Wadhwa et.al. 2408.12676 null
2024-12-19 Exposing Shadow Branches Chrysanthos Pepi et.al. 2408.12592 null
2024-08-22 Enhancing Causal Discovery in Financial Networks with Piecewise Quantile Regression Cameron Cornell et.al. 2408.12210 null
2024-08-21 Electrostatic Origins of the Dirichlet Principle Steven Deckelman et.al. 2408.12002 null
2024-09-04 Parallel Speculative Decoding with Adaptive Draft Length Tianyu Liu et.al. 2408.11850 link
2024-08-21 Chemical models of interstellar glycine and adenine precursor aminoacetonitrile (NH2CH2CN) Xia Zhang et.al. 2408.11776 null
2024-08-20 High detection significance of the dark substructure in gravitational lens SDSSJ0946+1006 is revealed by image pixel supersampling Quinn E. Minor et.al. 2408.11090 null
2024-08-23 MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Jian Chen et.al. 2408.11049 link
2024-08-20 Revisiting the measurements and interpretations of DLVO forces Bo Feng et.al. 2408.10870 null
2024-08-19 Constraining the Generalized Tolman-Oppenheimer-Volkoff (GTOV) equation with Bayesian analysis Franciele M. da Silva et.al. 2408.10425 null
2024-08-18 A new measure of risk using Fourier analysis Michael Grabinski et.al. 2408.10279 null
2024-08-19 Excitonic-trion population in two-dimensional halide perovskites Efstratios Manousakis et.al. 2408.10097 null
2024-08-16 Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling Xianzhen Luo et.al. 2408.08696 null
2024-08-15 KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning Kaiqi Zhang et.al. 2408.08146 null
2024-08-19 Coupling without Communication and Drafter-Invariant Speculative Decoding Majid Daliri et.al. 2408.07978 link
2024-12-06 The Small Sizes and High Implied Densities of ‘Little Red Dots’ with Balmer Breaks Could Explain Their Broad Emission Lines Without an AGN Josephine F. W. Baggen et.al. 2408.07745 null
2024-08-14 Only One Relation Possible? Modeling the Ambiguity in Event Temporal Relation Extraction Yutong Hu et.al. 2408.07353 null
2024-07-23 Stablecoin Runs and Disclosure Policy in the Presence of Large Sales Brian Zhu et.al. 2408.07227 null
2024-08-13 Speculations on Uncertainty and Humane Algorithms Nicholas Gray et.al. 2408.06736 null
2024-08-15 Inefficiencies of Carbon Trading Markets Nicola Borri et.al. 2408.06497 null
2024-08-12 Correct Wrong Path Bhargav Reddy Godala et.al. 2408.05912 null
2024-08-11 A Decoding Acceleration Framework for Industrial Deployable LLM-based Recommender Systems Yunjia Xi et.al. 2408.05676 link
2024-08-16 Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion Jacob K Christopher et.al. 2408.05636 null
2024-08-09 Recurrent Stochastic Fluctuations with Financial Speculation Tomohiro Hirano et.al. 2408.05047 null
2024-08-08 HotStuff-1: Linear Consensus with One-Phase Speculation Dakai Kang et.al. 2408.04728 null
2024-08-08 CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding Sophia Ho et.al. 2408.04678 null
2024-08-08 Black hole mass and optical radiation mechanism of the tidal disruption event AT 2023clx Shiyan Zhong et.al. 2408.04448 null
2024-08-05 Rich dynamical behaviors from a digital reversal operation Yannis Almirantis et.al. 2408.02527 null
2024-08-08 A speculative model for cyclic information preservation in Kerr-Newman spacetime using closed timelike curves Aviral Damle et.al. 2408.02116 null
2024-08-06 Selection bias obfuscates the discovery of fast radio burst sources Mohit Bhardwaj et.al. 2408.01876 null
2024-08-03 Dissolution zone model of the oxide structure in additively manufactured dispersion-strengthened alloys Wenyuan Hou et.al. 2408.01845 null
2024-08-02 AT2023vto: An Exceptionally Luminous Helium Tidal Disruption Event from a Massive Star Harsh Kumar et.al. 2408.01482 null
2024-08-01 Granting GPT-4 License and Opportunity: Enhancing Accuracy and Confidence Estimation for Few-Shot Event Detection Steven Fincke et.al. 2408.00914 null
2024-08-01 Clover-2: Accurate Inference for Regressive Lightweight Speculative Decoding Bin Xiao et.al. 2408.00264 null
2024-07-31 Designing Beyond Current Conceptualizations of Spaceflight Experiences James Cole et.al. 2408.00085 null
2024-07-31 Revisiting the fundamental metallicity relation with observation and simulation Chengyu Ma et.al. 2407.21716 null
2024-07-31 The Bulk Densities of Small Solar System Bodies as a Probe of Planetesimal Formation Misako Tatsuuma et.al. 2407.21386 null
2024-08-19 Instantons and the Large N=4 Algebra Edward Witten et.al. 2407.20964 null
2024-07-17 Steamroller Problems: An Evaluation of LLM Reasoning Capability with Automated Theorem Prover Strategies Lachlan McGinness et.al. 2407.20244 null
2024-08-19 Reduced decay in Josephson coupling across ferromagnetic junctions with spin-orbit coupling layers Ivan Kindiak et.al. 2407.19799 null
2024-07-26 Ionized and cold gas components in low surface brightness galaxy AGC 102004 Tian-Wen Cao et.al. 2407.18530 null
2024-07-25 Phase transitions in (2 + 1)D subsystem-symmetric monitored quantum circuits Cole Kelson-Packer et.al. 2407.18340 null
2024-08-31 Uniqueness of an $E_8$ model of elementary particles Robert A. Wilson et.al. 2407.18279 null
2024-07-24 Automorphisms of Calabi-Yau threefolds from algebraic dynamics and the second Chern class Keiji Oguiso et.al. 2407.17297 null
2024-07-24 Mapping the individual, social, and biospheric impacts of Foundation Models Andrés Domínguez Hernández et.al. 2407.17129 null
2024-07-04 Integrated Deflector Shield Technology for Spacecraft Florian Neukart et.al. 2407.16701 null
2024-07-23 Graph-Structured Speculative Decoding Zhuocheng Gong et.al. 2407.16207 null
2024-07-22 AI for Handball: predicting and explaining the 2024 Olympic Games tournament with Deep Learning and Large Language Models Florian Felice et.al. 2407.15987 null
2024-07-22 An Ad-hoc graph node vector embedding algorithm for general knowledge graphs using Kinetica-Graph B. Kaan Karamete et.al. 2407.15906 null
2024-07-23 Unveiling the Multifaceted GRB 200613A: Prompt Emission Dynamics, Afterglow Evolution, and the Host Galaxy’s Properties Shao-Yu Fu et.al. 2407.15824 null
2024-11-21 SNIP: Speculative Execution and Non-Interference Preservation for Compiler Transformations Sören van der Wall et.al. 2407.15080 null
2024-10-21 Is the difference between deep hedging and delta hedging a statistical arbitrage? Pascal François et.al. 2407.14736 link
2024-07-19 Rational Bubbles: A Clarification Tomohiro Hirano et.al. 2407.14017 null
2024-07-18 Surface roughening in nanoparticle catalysts Cameron J. Owen et.al. 2407.13643 null
2024-07-18 SecScale: A Scalable and Secure Trusted Execution Environment for Servers Ani Sunny et.al. 2407.13572 null
2024-07-17 RTL Verification for Secure Speculation Using Contract Shadow Logic Qinhan Tan et.al. 2407.12232 null
2024-07-16 Breakup dynamics of a neutron-halo projectile on heavy target at deep sub-barrier energies B. Mukeru et.al. 2407.12129 null
2024-11-16 PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation Branden Butler et.al. 2407.11798 null
2024-10-02 Optimized Multi-Token Joint Decoding with Auxiliary Model for LLM Inference Zongyue Qin et.al. 2407.09722 null
2024-07-17 Accelerating the inference of string generation-based chemical reaction models for industrial applications Mikhail Andronov et.al. 2407.09685 null
2024-09-12 Krylov complexity and chaos in deformed SYK models Shira Chapman et.al. 2407.09604 null
2024-07-21 6G: The Intelligent Network of Everything – A Comprehensive Vision, Survey, and Tutorial Harri Pennanen et.al. 2407.09398 null
2024-07-11 Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting Zilong Wang et.al. 2407.08223 null
2024-07-10 Purity benchmarking study of error coherence in a single Xmon qubit Auda Zhu et.al. 2407.07960 null
2024-07-10 Carbon Pricing and Resale in Emission Trading Systems Peyman Khezr et.al. 2407.07386 null
2024-08-21 Fuzzy Spheres in Stringy Matrix Models: Quantifying Chaos in a Mixed Phase Space Paolo Amore et.al. 2407.07259 null
2024-07-09 Revolutionizing Battery Disassembly: The Design and Implementation of a Battery Disassembly Autonomous Mobile Manipulator Robot(BEAM-1) Yanlong Peng et.al. 2407.06590 null
2024-07-05 Statistical investigations into the geometry and homology of random programs Jon Sporring et.al. 2407.04854 null
2024-07-05 Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models Bolaji Yusuf et.al. 2407.04641 null
2024-11-13 Black Holes with a charged quantum dust core R. Casadio et.al. 2407.04146 null
2024-08-23 A distance conjecture beyond moduli? Cédric Debusschere et.al. 2407.03715 null
2024-07-03 Braneworld Black Bounce to Transversable Wormhole Analytically Connected to an asymptotically $AdS_5$ Boundary T. M. Crispim et.al. 2407.03528 null
2024-07-03 Origin of anomalous magnetotransport in kagome superconductors AV$_3$Sb$_5$ (A=K,Rb,Cs) A. E. Koshelev et.al. 2407.03189 null
2024-09-24 Large-scale ordered magnetic fields generated in mergers of helium white dwarfs Rüdiger Pakmor et.al. 2407.02566 null
2024-07-02 A thermodynamic model of inflation without inflaton field Jesus Anaya-Galeana et.al. 2407.02429 null
2024-07-02 MICONIC: JWST/MIRI MRS observations of the nuclear and circumnuclear regions of Mrk231 A. Alonso-Herrero et.al. 2407.02180 null
2024-07-02 S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models Parsa Kavehzadeh et.al. 2407.01955 null
2024-08-31 Description of molecular chirality and its analysis with high harmonic generation Akihito Kato et.al. 2407.01947 null
2024-07-01 Universal properties of residual moments in heavy-fermion metals Ewan Scott et.al. 2407.01218 null
2024-07-01 Staying vigilant in the Age of AI: From content generation to content authentication Yufan Li et.al. 2407.00922 null
2025-04-14 Block Verification Accelerates Speculative Decoding Ziteng Sun et.al. 2403.10444 null
2024-03-06 Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement Wonseok Jeon et.al. 2402.14160 null
2025-07-08 Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding Zhuoming Chen et.al. 2402.12374 null
2025-02-06 Decoding Speculative Decoding Minghao Yan et.al. 2402.01528 null
2024-04-10 Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO Haim Barad et.al. 2311.04951 null
2023-08-10 Accelerating LLM Inference with Staged Speculative Decoding Benjamin Spector et.al. 2308.04623 null
2023-05-22 Fast Inference from Transformers via Speculative Decoding Yaniv Leviathan et.al. 2211.17192 null
2023-10-31 Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation Heming Xia et.al. 2203.16487 null

Multimodal System

Publish Date Title Authors PDF Code
2026-03-31 GUIDE: Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play Annotation Rui Xie et.al. 2603.26266 null
2026-03-26 DRiffusion: Draft-and-Refine Process Parallelizes Diffusion Models with Ease Runsheng Bai et.al. 2603.25872 null
2026-04-01 DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving Pengxuan Yang et.al. 2603.24587 null
2026-04-01 SCoOP: Semantic Consistent Opinion Pooling for Uncertainty Quantification in Multiple Vision-Language Model Systems Chung-En Johnny Yu et.al. 2603.23853 null
2026-03-19 6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models Rundong Su et.al. 2603.18742 null
2026-03-18 DriveVLM-RL: Neuroscience-Inspired Reinforcement Learning with Vision-Language Models for Safe and Deployable Autonomous Driving Zilin Huang et.al. 2603.18315 null
2026-03-13 Draft-and-Target Sampling for Video Generation Policy Qikang Zhang et.al. 2603.13438 null
2026-02-20 Think First, Diffuse Fast: Improving Diffusion Language Model Reasoning via Autoregressive Plan Conditioning Earl J St Sauver et.al. 2603.13243 null
2026-03-11 COHORT: Hybrid RL for Collaborative Large DNN Inference on Multi-Robot Systems Under Real-Time Constraints Mohammad Saeid Anwar et.al. 2603.10436 null
2026-03-09 SoundWeaver: Semantic Warm-Starting for Text-to-Audio Diffusion Serving Ayush Barik et.al. 2603.07865 null
2026-03-08 MWM: Mobile World Models for Action-Conditioned Consistent Prediction Han Yan et.al. 2603.07799 null
2026-02-27 SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching Yasaman Haghighi et.al. 2602.24208 null
2026-02-26 LE-NeuS: Latency-Efficient Neuro-Symbolic Video Understanding via Adaptive Temporal Verification Shawn Liang et.al. 2602.23553 null
2026-02-17 Sparrow: Text-Anchored Window Attention with Visual-Semantic Glimpsing for Speculative Decoding in Video LLMs Libo Zhang et.al. 2602.15318 null
2026-02-13 AdaCorrection: Adaptive Offset Cache Correction for Accurate Diffusion Transformers Dong Liu et.al. 2602.13357 null
2026-02-11 FastUSP: A Multi-Level Collaborative Acceleration Framework for Distributed Diffusion Model Inference Guandong Li et.al. 2602.10940 null
2026-02-24 Mapping Gemma3 onto an Edge Dataflow Architecture Shouyu Du et.al. 2602.06063 null
2026-02-04 Annotation Free Spacecraft Detection and Segmentation using Vision Language Models Samet Hicsonmez et.al. 2602.04699 null
2026-02-05 PIO-FVLM: Rethinking Training-Free Visual Token Reduction for VLM Acceleration from an Inference-Objective Perspective Haokui Zhang et.al. 2602.04657 null
2026-02-03 ScDiVa: Masked Discrete Diffusion for Joint Modeling of Single-Cell Identity and Expression Mingxuan Wang et.al. 2602.03477 null
2026-02-03 SwiftVLM: Efficient Vision-Language Model Inference via Cross-Layer Token Bypass Chen Qian et.al. 2602.03134 null
2026-01-31 APEX: A Decoupled Memory-based Explorer for Asynchronous Aerial Object Goal Navigation Daoxuan Zhang et.al. 2602.00551 null
2026-01-20 Likelihood-Separable Diffusion Inference for Multi-Image MRI Super-Resolution Samuel W. Remedios et.al. 2601.14030 null
2026-01-19 AirHunt: Bridging VLM Semantics and Continuous Planning for Efficient Aerial Object Navigation Xuecheng Chen et.al. 2601.12742 null
2026-01-26 ViSIL: Unified Evaluation of Information Loss in Multimodal Video Captioning Po-han Li et.al. 2601.09851 null
2025-12-30 Bridging the Perception-Cognition Gap: Re-engineering SAM2 with Hilbert-Mamba for Robust VLM-based Medical Diagnosis Hao Wu et.al. 2512.24013 null
2025-12-29 Iterative Inference-time Scaling with Adaptive Frequency Steering for Image Super-Resolution Hexin Zhang et.al. 2512.23532 null
2025-12-23 Input-Adaptive Visual Preprocessing for Efficient Fast Vision-Language Model Inference Putu Indah Githa Cahyani et.al. 2512.20839 null
2025-12-21 AsyncDiff: Asynchronous Timestep Conditioning for Enhanced Text-to-Image Diffusion Inference Longhuan Xu et.al. 2512.18675 null
2025-12-18 Collaborative Edge-to-Server Inference for Vision-Language Models Soochang Song et.al. 2512.16349 null
2025-12-16 Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models Chiyue Wei et.al. 2512.14661 null
2025-12-10 LISN: Language-Instructed Social Navigation with VLM-based Controller Modulating Junting Chen et.al. 2512.09920 null
2025-12-05 Training-Time Action Conditioning for Efficient Real-Time Chunking Kevin Black et.al. 2512.05964 null
2025-12-05 Quantitatively mapping the Eady model onto a two-layer quasi-geostrophic model Julie Meunier et.al. 2512.05902 null
2025-12-05 Non-equilibrium formulation for inertial particles in turbulent swirling flows Bernardo L. Español et.al. 2512.05855 null
2025-12-05 HQ-DM: Single Hadamard Transformation-Based Quantization-Aware Training for Low-Bit Diffusion Models Shizhuo Mao et.al. 2512.05746 null
2025-12-05 ProPhy: Progressive Physical Alignment for Dynamic World Simulation Zijun Wang et.al. 2512.05564 null
2025-12-05 Conscious Gaze: Adaptive Attention Mechanisms for Hallucination Mitigation in Vision-Language Models Weijue Bu et.al. 2512.05546 null
2025-12-04 Uncertainty Quantification for Scientific Machine Learning using Sparse Variational Gaussian Process Kolmogorov-Arnold Networks (SVGP KAN) Y. Sungtaek Ju et.al. 2512.05306 null
2025-12-04 CFO: Learning Continuous-Time PDE Dynamics via Flow-Matched Neural Operators Xianglong Hou et.al. 2512.05297 null
2025-12-04 XR-DT: Extended Reality-Enhanced Digital Twin for Agentic Mobile Robots Tianyi Wang et.al. 2512.05270 null
2025-12-04 NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation Yu Zeng et.al. 2512.05106 null
2025-12-04 TV2TV: A Unified Framework for Interleaved Language and Video Generation Xiaochuang Han et.al. 2512.05103 null
2025-12-04 Hybrid-Diffusion Models: Combining Open-loop Routines with Visuomotor Diffusion Policies Jonne Van Haastregt et.al. 2512.04960 null
2025-12-04 FASTer: Toward Efficient Autoregressive Vision Language Action Modeling via neural Action Tokenization Yicheng Liu et.al. 2512.04952 null
2025-12-04 YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance Junjie Zheng et.al. 2512.04779 null
2025-12-04 MemLoRA: Distilling Expert Adapters for On-Device Memory Systems Massimo Bini et.al. 2512.04763 null
2025-12-04 E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving Yihong Tang et.al. 2512.04733 null
2025-12-04 Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild Yigui Feng et.al. 2512.04728 null
2025-12-05 Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length Yubo Huang et.al. 2512.04677 null
2025-12-04 Persson’s Theory of Purely Normal Elastic Rough Surface Contact: A Tutorial Based on Stochastic Process Theory Yang Xu et.al. 2512.04648 null
2025-12-04 VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory Yifei Yu et.al. 2512.04519 null
2025-12-04 GuidNoise: Single-Pair Guided Diffusion for Generalized Noise Synthesis Changjin Kim et.al. 2512.04456 null
2025-12-04 NORi: An ML-Augmented Ocean Boundary Layer Parameterization Xin Kai Lee et.al. 2512.04452 null
2025-12-04 FALCON: Actively Decoupled Visuomotor Policies for Loco-Manipulation with Foundation-Model-Based Coordination Chengyang He et.al. 2512.04381 null
2025-12-03 Decoding Large Language Diffusion Models with Foreseeing Movement Yichuan Mo et.al. 2512.04135 null
2025-12-03 DirectDrag: High-Fidelity, Mask-Free, Prompt-Free Drag-based Image Editing via Readout-Guided Feature Alignment Sheng-Hao Liao et.al. 2512.03981 null
2025-12-03 Refining Machine Learning Potentials through Thermodynamic Theory of Phase Transitions Paul Fuchs et.al. 2512.03974 null
2025-12-03 Training for Identity, Inference for Controllability: A Unified Approach to Tuning-Free Face Personalization Lianyu Pang et.al. 2512.03964 null
2025-12-03 OmniDexVLG: Learning Dexterous Grasp Generation from Vision Language Model-Guided Grasp Semantics, Taxonomy and Functional Affordance Lei Zhang et.al. 2512.03874 null
2025-12-03 Fully Unsupervised Self-debiasing of Text-to-Image Diffusion Models Korada Sri Vardhana et.al. 2512.03749 null
2025-12-03 PosA-VLA: Enhancing Action Generation via Pose-Conditioned Anchor Attention Ziwen Li et.al. 2512.03724 null
2025-12-03 GaussianBlender: Instant Stylization of 3D Gaussians with Disentangled Latent Spaces Melis Ocal et.al. 2512.03683 null
2025-12-03 ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers Feice Huang et.al. 2512.03673 null
2025-12-03 V-ITI: Mitigating Hallucinations in Multimodal Large Language Models via Visual Inference-Time Intervention Nan Sun et.al. 2512.03542 null
2025-12-03 CSMapping: Scalable Crowdsourced Semantic Mapping and Topology Inference for Autonomous Driving Zhijian Qiao et.al. 2512.03510 null
2025-12-03 KeyPointDiffuser: Unsupervised 3D Keypoint Learning via Latent Diffusion Models Rhys Newbury et.al. 2512.03450 null
2025-12-03 MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification Yujian Zhao et.al. 2512.03404 null
2025-12-03 Push-broom Mapping of Galaxies and Supernova Remnants with the SPRITE CubeSat Elena Carlson et.al. 2512.03329 null
2025-12-02 Unlocking hidden biomolecular conformational landscapes in diffusion models at inference time Daniel D. Richman et.al. 2512.03312 null
2025-12-02 Video2Act: A Dual-System Video Diffusion Policy with Robotic Spatio-Motional Modeling Yueru Jia et.al. 2512.03044 null
2025-12-03 LoVoRA: Text-guided and Mask-free Video Object Removal and Addition with Learnable Object-aware Localization Zhihan Xiao et.al. 2512.02933 null
2025-12-02 AutoNeural: Co-Designing Vision-Language Models for NPU Inference Wei Chen et.al. 2512.02924 null
2025-12-02 Glance: Accelerating Diffusion Models with 1 Sample Zhuobai Dong et.al. 2512.02899 null
2025-12-03 SwarmDiffusion: End-To-End Traversability-Guided Diffusion for Embodiment-Agnostic Navigation of Heterogeneous Robots Iana Zhura et.al. 2512.02851 null
2025-12-02 Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach Siyuan Yang et.al. 2512.02834 null
2025-12-02 Reasoning-Aware Multimodal Fusion for Hateful Video Detection Shuonan Yang et.al. 2512.02743 null
2025-12-02 VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm Zhenkai Wu et.al. 2512.02700 null
2025-12-02 PGP-DiffSR: Phase-Guided Progressive Pruning for Efficient Diffusion-based Image Super-Resolution Zhongbao Yang et.al. 2512.02681 null
2025-12-02 Spatially-Grounded Document Retrieval via Patch-to-Region Relevance Propagation Agathoklis Georgiou et.al. 2512.02660 null
2025-12-02 Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training Hong-Jie You et.al. 2512.02652 null
2025-12-02 YingVideo-MV: Music-Driven Multi-Stage Video Generation Jiahui Chen et.al. 2512.02492 null
2025-12-02 Boosting Medical Vision-Language Pretraining via Momentum Self-Distillation under Limited Computing Resources Phuc Pham et.al. 2512.02438 null
2025-12-02 VACoT: Rethinking Visual Data Augmentation with VLMs Zhengzhuo Xu et.al. 2512.02361 null
2025-12-02 Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective Qiyao Xue et.al. 2512.02340 null
2025-12-01 ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation Chenyang Gu et.al. 2512.02013 null
2025-12-01 Med-VCD: Mitigating Hallucination for Medical Large Vision Language Models through Visual Contrastive Decoding Zahra Mahdavi et.al. 2512.01922 null
2025-12-01 Deconstructing Generative Diversity: An Information Bottleneck Analysis of Discrete Latent Generative Models Yudi Wu et.al. 2512.01831 null
2025-12-01 CauSight: Learning to Supersense for Visual Causal Discovery Yize Zhang et.al. 2512.01827 null
2025-12-01 Weight Space Representation Learning with Neural Fields Zhuoqian Yang et.al. 2512.01759 null
2025-12-01 DiG-Flow: Discrepancy-Guided Flow Matching for Robust VLA Models Wanpeng Zhang et.al. 2512.01715 null
2025-12-01 DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models Patrick Kwon et.al. 2512.01686 null
2025-12-01 GRASP: Guided Residual Adapters with Sample-wise Partitioning Felix Nützel et.al. 2512.01675 null
2025-12-01 SPARK: Sim-ready Part-level Articulated Reconstruction with VLM Knowledge Yumeng He et.al. 2512.01629 null
2025-12-01 Reconstructing Multi-Scale Physical Fields from Extremely Sparse Measurements with an Autoencoder-Diffusion Cascade Letian Yi et.al. 2512.01572 null
2025-12-01 Hawkes process with a diffusion-driven baseline: long-run behavior, inference, statistical tests Maya Sadeler Perrin et.al. 2512.01447 null
2025-12-01 Existence of two thresholds in a bistable equation with nonlocal competition Matthieu Alfaro et.al. 2512.01435 null
2025-12-01 MDiff4STR: Mask Diffusion Model for Scene Text Recognition Yongkun Du et.al. 2512.01422 null
2025-12-01 FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution Seungho Choi et.al. 2512.01390 null
2025-12-01 Consistency Flow Model Achieves One-step Denoising Error Correction Codes Haoyu Lei et.al. 2512.01389 null
2025-12-01 Qualitatively distinct mechanisms of noise-induced escape in diffusively coupled bistable elements Hidemasa Ishii et.al. 2512.01388 null
2025-12-01 Beyond Loss Guidance: Using PDE Residuals as Spectral Attention in Diffusion Neural Operators Medha Sawhney et.al. 2512.01370 null
2025-12-01 TokenPure: Watermark Removal through Tokenized Appearance and Structural Guidance Pei Yang et.al. 2512.01314 null
2025-12-01 Inversions of stochastic processes from ergodic measures of Nonlinear SDEs Hongyu Liu et.al. 2512.01307 null
2025-11-30 PIANO: Physics-informed Dual Neural Operator for Precipitation Nowcasting Seokhyun Chin et.al. 2512.01062 null
2025-11-29 EDIT: Early Diffusion Inference Termination for dLLMs Based on Dynamics of Training Gradients He-Yen Hsieh et.al. 2512.00670 null
2025-11-28 Towards Continuous Intelligence Growth: Self-Training, Continual Learning, and Dual-Scale Memory in SuperIntelliAgent Jianzhe Lin et.al. 2511.23436 null
2025-11-28 LFM2 Technical Report Alexander Amini et.al. 2511.23404 null
2025-11-28 SafeHumanoid: VLM-RAG-driven Control of Upper Body Impedance for Humanoid Robot Yara Mahmoud et.al. 2511.23300 null
2025-11-28 Unlocking Multilingual Reasoning Capability of LLMs and LVLMs through Representation Engineering Qiming Li et.al. 2511.23231 null
2025-11-28 Obstruction reasoning for robotic grasping Runyu Jiao et.al. 2511.23186 null
2025-11-28 InstanceV: Instance-Level Video Generation Yuheng Chen et.al. 2511.23146 null
2025-11-28 db-SP: Accelerating Sparse Attention for Visual Generative Models with Dual-Balanced Sequence Parallelism Siqi Chen et.al. 2511.23113 null
2025-11-28 MindPower: Enabling Theory-of-Mind Reasoning in VLM-based Embodied Agents Ruoxuan Zhang et.al. 2511.23055 null
2025-11-28 Time Extrapolation with Graph Convolutional Autoencoder and Tensor Train Decomposition Yuanhong Chen et.al. 2511.23037 null
2025-11-28 Masked Diffusion for Generative Recommendation Kulin Shah et.al. 2511.23021 null
2025-11-28 BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation Zeyu Zhang et.al. 2511.22973 null
2025-11-28 Seeing before Observable: Potential Risk Reasoning in Autonomous Driving via Vision Language Models Jiaxin Liu et.al. 2511.22928 null
2025-11-27 CAPE: Context-Aware Diffusion Policy Via Proximal Mode Expansion for Collision Avoidance Rui Heng Yang et.al. 2511.22773 null
2025-11-27 Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Z-Image Team et.al. 2511.22699 null
2025-11-27 Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield Dongyang Liu et.al. 2511.22677 null
2025-11-27 VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models Silin Cheng et.al. 2511.22664 null
2025-11-27 Geometrically-Constrained Agent for Spatial Reasoning Zeren Chen et.al. 2511.22659 null
2025-11-27 Beyond Success: Refining Elegant Robot Manipulation from Mixed-Quality Data via Just-in-Time Intervention Yanbo Mao et.al. 2511.22555 null
2025-11-27 Fast3Dcache: Training-free 3D Geometry Synthesis Acceleration Mengyu Yang et.al. 2511.22533 null
2025-11-27 CoT4AD: A Vision-Language-Action Model with Explicit Chain-of-Thought Reasoning for Autonomous Driving Zhaohui Wang et.al. 2511.22532 null
2025-11-26 Canvas-to-Image: Compositional Image Generation with Multimodal Controls Yusuf Dalva et.al. 2511.21691 null
2025-11-26 Model-Based Policy Adaptation for Closed-Loop End-to-End Autonomous Driving Haohong Lin et.al. 2511.21584 null
2025-11-26 Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy Teng Hu et.al. 2511.21579 null
2025-11-26 IntAttention: A Fully Integer Attention Pipeline for Efficient Edge Inference Wanli Zhong et.al. 2511.21513 null
2025-11-26 MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices Shuai Zhang et.al. 2511.21475 null
2025-11-26 Odin: Oriented Dual-module Integration for Text-rich Network Representation Learning Kaifeng Hong et.al. 2511.21416 null
2025-11-26 From Diffusion to One-Step Generation: A Comparative Study of Flow-Based Models with Application to Image Inpainting Umang Agarwal et.al. 2511.21215 null
2025-11-26 Which Layer Causes Distribution Deviation? Entropy-Guided Adaptive Pruning for Diffusion and Flow Models Changlin Li et.al. 2511.21122 null
2025-11-26 From Bits to Rounds: Parallel Decoding with Exploration for Diffusion Language Models Hengyu Fu et.al. 2511.21103 null
2025-11-26 OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection Chujie Wang et.al. 2511.21064 null
2025-11-26 GuardTrace-VL: Detecting Unsafe Multimodel Reasoning via Iterative Safety Supervision Yuxiao Xiang et.al. 2511.20994 null
2025-11-25 Dynamic Test-Time Compute Scaling in Control Policy: Difficulty-Aware Stochastic Interpolant Policy Inkook Chun et.al. 2511.20906 null
2025-11-25 Test-Time Alignment of Text-to-Image Diffusion Models via Null-Text Embedding Optimisation Taehoon Kim et.al. 2511.20889 null
2025-11-25 Symbiotic Brain-Machine Drawing via Visual Brain-Computer Interfaces Gao Wang et.al. 2511.20835 null
2025-11-25 Training-Free Diffusion Priors for Text-to-Image Generation via Optimization-based Visual Inversion Samuele Dell’Erba et.al. 2511.20821 null
2025-11-25 Text-Guided Semantic Image Encoder Raghuveer Thirukovalluru et.al. 2511.20770 null
2025-11-25 Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout Hidir Yesiltepe et.al. 2511.20649 null
2025-11-25 LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight Yunze Man et.al. 2511.20648 null
2025-11-25 Image2Gcode: Image-to-G-code Generation for Additive Manufacturing Using Diffusion-Transformer Model Ziyue Wang et.al. 2511.20636 null
2025-11-25 MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models Chieh-Yun Chen et.al. 2511.20629 null
2025-11-25 Latent Diffusion Inversion Requires Understanding the Latent Space Mingxing Rao et.al. 2511.20592 null
2025-11-25 Anatomica: Localized Control over Geometric and Topological Properties for Anatomical Diffusion Models Karim Kadry et.al. 2511.20587 null
2025-11-25 Beyond Generation: Multi-Hop Reasoning for Factual Accuracy in Vision-Language Models Shamima Hossain et.al. 2511.20531 null
2025-11-25 Efficient and Fast Generative-Based Singing Voice Separation using a Latent Diffusion Model Genís Plaja-Roglans et.al. 2511.20470 null
2025-11-25 Object-Centric Vision Token Pruning for Vision Language Models Guangyuan Li et.al. 2511.20439 null
2025-11-25 Image-Free Timestep Distillation via Continuous-Time Consistency with Trajectory-Sampled Pairs Bao Tang et.al. 2511.20410 null
2025-11-25 FREE: Uncertainty-Aware Autoregression for Parallel Diffusion Transformers Xinwan Wen et.al. 2511.20390 null
2025-11-25 Modified Equations for Stochastic Optimization Stefan Perko et.al. 2511.20322 null
2025-11-25 TReFT: Taming Rectified Flow Models For One-Step Image Translation Shengqian Li et.al. 2511.20307 null
2025-11-25 HVAdam: A Full-Dimension Adaptive Optimizer Yiheng Zhang et.al. 2511.20277 null
2025-11-25 Rectified Flow for Vision-Aided mmWave V2I Beam Prediction Can Zheng et.al. 2511.20265 null
2025-11-25 In-Context Compositional Learning via Sparse Coding Transformer Wei Chen et.al. 2511.20194 null
2025-11-25 Spatially Resolved Plasma Diagnostics of the Supernova Remnant DEM L71 using the Reflection Grating Spectrometer Yuki Amano et.al. 2511.20112 null
2025-11-25 iRadioDiff: Physics-Informed Diffusion Model for Indoor Radio Map Construction and Localization Xiucheng Wang et.al. 2511.20015 null
2025-11-25 CounterVQA: Evaluating and Improving Counterfactual Reasoning in Vision-Language Models for Video Understanding Yuefei Chen et.al. 2511.19923 null
2025-11-25 Scale Where It Matters: Training-Free Localized Scaling for Diffusion Models Qin Ren et.al. 2511.19917 null
2025-11-24 Mixture of Horizons in Action Chunking Dong Jing et.al. 2511.19433 null
2025-11-24 Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens Yiming Qin et.al. 2511.19418 null
2025-11-24 Predicting partially observable dynamical systems via diffusion models with a multiscale inference scheme Rudy Morel et.al. 2511.19390 null
2025-11-24 Efficiency vs. Fidelity: A Comparative Analysis of Diffusion Probabilistic Models and Flow Matching on Low-Resource Hardware Srishti Gupta et.al. 2511.19379 null
2025-11-24 DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation Zehong Ma et.al. 2511.19365 null
2025-11-24 Rethinking Intermediate Representation for VLM-based Robot Manipulation Weiliang Tang et.al. 2511.19315 null
2025-11-24 CDLM: Consistency Diffusion Language Models For Faster Sampling Minseo Kim et.al. 2511.19269 null
2025-11-24 SimDiff: Simpler Yet Better Diffusion Model for Time Series Point Forecasting Hang Ding et.al. 2511.19256 null
2025-11-24 Learning Plug-and-play Memory for Guiding Video Diffusion Models Selena Song et.al. 2511.19229 null
2025-11-24 EEG-VLM: A Hierarchical Vision-Language Model with Multi-Level Feature Alignment and Visually Enhanced Language-Guided Reasoning for EEG Image-Based Sleep Stage Prediction Xihe Qiu et.al. 2511.19155 null
2025-11-24 MonoSR: Open-Vocabulary Spatial Reasoning from Monocular Images Qirui Wang et.al. 2511.19119 null
2025-11-24 A Self-Conditioned Representation Guided Diffusion Model for Realistic Text-to-LiDAR Scene Generation Wentao Qu et.al. 2511.19004 null
2025-11-24 BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models Juncheng Li et.al. 2511.18921 null
2025-11-24 EventSTU: Event-Guided Efficient Spatio-Temporal Understanding for Video Large Language Models Wenhao Xu et.al. 2511.18920 null
2025-11-24 MatMart: Material Reconstruction of 3D Objects via Diffusion Xiuchao Wu et.al. 2511.18900 null
2025-11-24 Parallel Vision Token Scheduling for Fast and Accurate Multimodal LMMs Inference Wengyi Zhan et.al. 2511.18875 null
2025-11-24 UNeMo: Collaborative Visual-Language Reasoning and Navigation via a Multimodal World Model Changxin Huang et.al. 2511.18845 null
2025-11-24 DiP: Taming Diffusion Models in Pixel Space Zhennan Chen et.al. 2511.18822 null
2025-11-24 Mitigating Long-Tail Bias in HOI Detection via Adaptive Diversity Cache Yuqiu Jiang et.al. 2511.18811 null
2025-11-24 MergeVLA: Cross-Skill Model Merging Toward a Generalist Vision-Language-Action Agent Yuxia Fu et.al. 2511.18810 null
2025-11-21 SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding Nikolay Nikolov et.al. 2511.17411 null
2025-11-21 SpatialGeo:Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion Jiajie Guo et.al. 2511.17308 null
2025-11-21 A Little More Like This: Text-to-Image Retrieval with Vision-Language Models Using Relevance Feedback Bulat Khaertdinov et.al. 2511.17255 null
2025-11-21 FlexiFlow: decomposable flow matching for generation of flexible molecular ensemble Riccardo Tedoldi et.al. 2511.17249 null
2025-11-21 FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle Mario Markov et.al. 2511.17171 null
2025-11-21 One-Step Diffusion Transformer for Controllable Real-World Image Super-Resolution Yushun Fang et.al. 2511.17138 null
2025-11-21 Sparse Reasoning is Enough: Biological-Inspired Framework for Video Anomaly Detection with Large Pre-trained Models He Huang et.al. 2511.17094 null
2025-11-21 Diversity Has Always Been There in Your Visual Autoregressive Models Tong Wang et.al. 2511.17074 null
2025-11-21 DAPS++: Rethinking Diffusion Inverse Problems with Decoupled Posterior Annealing Hao Chen et.al. 2511.17038 null
2025-11-21 Energy Scaling Laws for Diffusion Models: Quantifying Compute and Carbon Emissions in Image Generation Aniketh Iyengar et.al. 2511.17031 null
2025-11-21 VLM-Augmented Degradation Modeling for Image Restoration Under Adverse Weather Conditions Qianyi Shao et.al. 2511.16998 null
2025-11-21 MultiPriv: Benchmarking Individual-Level Privacy Reasoning in Vision-Language Models Xiongtao Sun et.al. 2511.16940 null
2025-11-21 UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation Chi Zhang et.al. 2511.16917 null
2025-11-21 Align & Invert: Solving Inverse Problems with Diffusion and Flow-based Models via Representational Alignment Loukas Sfountouris et.al. 2511.16870 null
2025-11-20 Towards Unified Vision Language Models for Forest Ecological Analysis in Earth Observation Xizhe Xue et.al. 2511.16853 null
2025-11-20 TRIM: Scalable 3D Gaussian Diffusion Inference with Temporal and Spatial Trimming Zeyuan Yin et.al. 2511.16642 null
2025-11-21 VLA-Pruner: Temporal-Aware Dual-Level Visual Token Pruning for Efficient Vision-Language-Action Inference Ziyan Liu et.al. 2511.16449 null
2025-11-20 Decoupling Complexity from Scale in Latent Diffusion Model Tianxiong Zhong et.al. 2511.16117 null
2025-11-20 T2T-VICL: Unlocking the Boundaries of Cross-Task Visual In-Context Learning via Implicit Text-Driven VLMs Shao-Jun Xia et.al. 2511.16107 null
2025-11-20 Learning Tractable Distributions Of Language Model Continuations Gwen Yidou-Weng et.al. 2511.16054 null
2025-11-20 Understanding and improving axial detection in optical tweezers based on the interference of forward- and backward- scattered light Isaac Pérez Castillo et.al. 2511.16036 null
2025-11-20 Physics-Guided Inductive Spatiotemporal Kriging for PM2.5 with Satellite Gradient Constraints Shuo Wang et.al. 2511.16013 null
2025-11-19 Breaking the Bottleneck with DiffuApriel: High-Throughput Diffusion LMs with Mamba Backbone Vaibhav Singh et.al. 2511.15927 null
2025-11-19 Think Visually, Reason Textually: Vision-Language Synergy in ARC Beichen Zhang et.al. 2511.15703 null
2025-11-19 MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping Yushi Huang et.al. 2511.15690 null
2025-11-19 Theoretical Closed-loop Stability Bounds for Dynamical System Coupled with Diffusion Policies Gabriel Lauzier et.al. 2511.15520 null
2025-11-19 What Your Features Reveal: Data-Efficient Black-Box Feature Inversion Attack for Split DNNs Zhihan Ren et.al. 2511.15316 null
2025-11-19 Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning Yuxuan Gu et.al. 2511.15190 null
2025-11-19 A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models Duo Li et.al. 2511.15098 null
2025-11-19 Jointly Conditioned Diffusion Model for Multi-View Pose-Guided Person Image Synthesis Chengyu Xie et.al. 2511.15092 null
2025-11-19 Reasoning via Video: The First Evaluation of Video Models’ Reasoning Abilities through Maze-Solving Tasks Cheng Yang et.al. 2511.15065 null
2025-11-19 Aligning Generative Music AI with Human Preferences: Methods and Challenges Dorien Herremans et.al. 2511.15038 null
2025-11-18 Measuring AI Progress in Drug Discovery: A Reproducible Leaderboard for the Tox21 Challenge Antonia Ebner et.al. 2511.14744 null
2025-11-18 Oscillation Quenching Induced By Time-Varying Coupling Functions Dushko Stavrov et.al. 2511.14370 null
2025-11-18 Bridging the Gap Between Bayesian Deep Learning and Ensemble Weather Forecasts Xinlei Xiong et.al. 2511.14218 null
2025-11-18 InstantViR: Real-Time Video Inverse Problem Solver with Distilled Diffusion Prior Weimin Bai et.al. 2511.14208 null
2025-11-18 Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion Zhuo Li et.al. 2511.14178 null
2025-11-18 Run, Ruminate, and Regulate: A Dual-process Thinking System for Vision-and-Language Navigation Yu Zhong et.al. 2511.14131 null
2025-11-18 Text-Driven Reasoning Video Editing via Reinforcement Learning on Digital Twin Representations Yiqing Shen et.al. 2511.14100 null
2025-11-18 GRPO Privacy Is at Risk: A Membership Inference Attack Against Reinforcement Learning With Verifiable Rewards Yule Liu et.al. 2511.14045 null
2025-11-18 Flood-LDM: Generalizable Latent Diffusion Models for rapid and accurate zero-shot High-Resolution Flood Mapping Sun Han Neo et.al. 2511.14033 null
2025-11-17 Single Tensor Cell Segmentation using Scalar Field Representations Kevin I. Ruiz Vargas et.al. 2511.13947 null
2025-11-17 Mapping the Cosmic-Ray Ionization Rate in the Local Galaxy with H$_3^+$ Nick Indriolo et.al. 2511.13915 null
2025-11-17 Distribution Matching Distillation Meets Reinforcement Learning Dengyang Jiang et.al. 2511.13649 null
2025-11-17 CacheFlow: Compressive Streaming Memory for Efficient Long-Form Video Understanding Shrenik Patel et.al. 2511.13644 null
2025-11-17 Semantic Document Derendering: SVG Reconstruction via Vision-Language Modeling Adam Hazimeh et.al. 2511.13478 null
2025-11-18 Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline Rui Zuo et.al. 2511.13442 null
2025-11-17 Local asymptotic normality for discretely observed McKean-Vlasov diffusions Akram Heidari et.al. 2511.13366 null
2025-11-17 TransFit-CSM: A Fast, Physically Consistent Framework for Interaction-Powered Transients Yu-Hao Zhang et.al. 2511.13265 null
2025-11-17 GenTract: Generative Global Tractography Alec Sargood et.al. 2511.13183 null
2025-11-17 Conditional Diffusion Model for Multi-Agent Dynamic Task Decomposition Yanda Zhu et.al. 2511.13137 null
2025-11-17 MergeSlide: Continual Model Merging and Task-to-Class Prompt-Aligned Inference for Lifelong Learning on Whole Slide Images Doanh C. Bui et.al. 2511.13099 null
2025-11-17 MeanFlow Transformers with Representation Autoencoders Zheyuan Hu et.al. 2511.13019 null
2025-11-17 SAGE: Spuriousness-Aware Guided Prompt Exploration for Mitigating Multimodal Bias Wenqian Ye et.al. 2511.13005 null
2025-11-17 Infinite-Story: A Training-Free Consistent Text-to-Image Generation Jihun Park et.al. 2511.13002 null
2025-11-17 Recurrent Autoregressive Diffusion: Global Memory Meets Local Attention Taiye Chen et.al. 2511.12940 null
2025-11-17 Yanyun-3: Enabling Cross-Platform Strategy Game Operation with Vision-Language Models Guoyan Wang et.al. 2511.12937 null
2025-11-17 Method of Manufactured Learning for Solver-free Training of Neural Operators Arth Sojitra et.al. 2511.12890 null
2025-11-17 BrainNormalizer: Anatomy-Informed Pseudo-Healthy Brain Reconstruction from Tumor MRI via Edge-Guided ControlNet Min Gu Kwak et.al. 2511.12853 null
2025-11-16 Prompt-Driven Domain Adaptation for End-to-End Autonomous Driving via In-Context RL Aleesha Khurram et.al. 2511.12755 null
2025-11-16 Backdoor Attacks on Open Vocabulary Object Detectors via Multi-Modal Prompt Tuning Ankita Raj et.al. 2511.12735 null
2025-11-16 QPU Micro-Kernels for Stencil Computation Stefano Markidis et.al. 2511.12617 null
2025-11-16 CoTBox-TTT: Grounding Medical VQA with Visual Chain-of-Thought Boxes During Test-time Training Jiahe Qian et.al. 2511.12446 null
2025-11-16 RedVTP: Training-Free Acceleration of Diffusion Vision-Language Models Inference via Masked Token-Guided Visual Token Pruning Jingqi Xu et.al. 2511.12428 null
2025-11-14 PAS : Prelim Attention Score for Detecting Object Hallucinations in Large Vision–Language Models Nhat Hoang-Xuan et.al. 2511.11502 null
2025-11-14 Planetary nebulae as tracers of stellar population properties: a pilot study with MUSE Ana Inés Ennis et.al. 2511.11479 null
2025-11-14 DiffPro: Joint Timestep and Layer-Wise Precision Optimization for Efficient Diffusion Inference Farhana Amin et.al. 2511.11446 null
2025-11-14 BOFA: Bridge-Layer Orthogonal Low-Rank Fusion for CLIP-Based Class-Incremental Learning Lan Li et.al. 2511.11421 null
2025-11-14 EcoAlign: An Economically Rational Framework for Efficient LVLM Alignment Ruoxi Cheng et.al. 2511.11301 null
2025-11-14 GraphPilot: Grounded Scene Graph Conditioning for Language-Based Autonomous Driving Fabian Schmidt et.al. 2511.11266 null
2025-11-14 CountSteer: Steering Attention for Object Counting in Diffusion Models Hyemin Boo et.al. 2511.11253 null
2025-11-14 Viper-F1: Fast and Fine-Grained Multimodal Understanding with Cross-Modal State-Space Modulation Quoc-Huy Trinh et.al. 2511.11177 null
2025-11-14 Explainable Deep Convolutional Multi-Type Anomaly Detection Alex George et.al. 2511.11165 null
2025-11-14 Non-Gaussianity-induced enhanced target-finding dynamics of confined colloids Guirec de Tournemire et.al. 2511.11117 null
2025-11-14 Sheaf Cohomology of Linear Predictive Coding Networks Jeffrey Seely et.al. 2511.11092 null
2025-11-14 SP-Guard: Selective Prompt-adaptive Guidance for Safe Text-to-Image Generation Sumin Yu et.al. 2511.11014 null
2025-11-14 VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models Xinlei Yu et.al. 2511.11007 null
2025-11-14 CLUE: Controllable Latent space of Unprompted Embeddings for Diversity Management in Text-to-Image Synthesis Keunwoo Park et.al. 2511.10993 null
2025-11-14 Binary Verification for Zero-Shot Vision Jeffrey Liu et.al. 2511.10983 null
2025-11-13 FengHuang: Next-Generation Memory Orchestration for AI Inferencing Jiamin Li et.al. 2511.10753 null
2025-11-13 Diffusion in the stochastic Klein-Gordon equation Jonathan Oppenheim et.al. 2511.10738 null
2025-11-13 Reaching for the Edge II: Stellar Halos out to Large Radii as a Tracer of Dark Matter Halo Mass Katya Leidig et.al. 2511.10723 null
2025-11-14 OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer Haosong Peng et.al. 2511.10560 null
2025-11-13 A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space Huijie Liu et.al. 2511.10555 null
2025-11-13 SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation Wei Li et.al. 2511.10518 null
2025-11-13 Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models Zhengtao Zou et.al. 2511.10292 null
2025-11-13 PROPA: Toward Process-level Optimization in Visual Reasoning via Reinforcement Learning Yanbei Jiang et.al. 2511.10279 null
2025-11-13 LiNeXt: Revisiting LiDAR Completion with Efficient Non-Diffusion Architectures Wenzhe He et.al. 2511.10209 null
2025-11-13 AI-Integrated Decision Support System for Real-Time Market Growth Forecasting and Multi-Source Content Diffusion Analytics Ziqing Yin et.al. 2511.09962 null
2025-11-13 Remember Me: Bridging the Long-Range Gap in LVLMs with Three-Step Inference-Only Decay Resilience Strategies Peng Gao et.al. 2511.09868 null
2025-11-12 From Street to Orbit: Training-Free Cross-View Retrieval via Location Semantics and LLM Guidance Jeongho Min et.al. 2511.09820 null
2025-11-12 Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models Konstantinos M. Dafnis et.al. 2511.09809 null
2025-11-12 HeatGen: A Guided Diffusion Framework for Multiphysics Heat Sink Design Optimization Hadi Keramati et.al. 2511.09578 null
2025-11-12 Controllable protein design through Feynman-Kac steering Erik Hartman et.al. 2511.09216 null
2025-11-12 FSampler: Training Free Acceleration of Diffusion Sampling via Epsilon Extrapolation Michael A. Vladimir et.al. 2511.09180 null
2025-11-12 Emission-Line and Continuum Reverberation Mapping of the NLS1 Galaxy WPVS 48 M. A. Probst et.al. 2511.09153 null
2025-11-12 Diff-V2M: A Hierarchical Conditional Diffusion Model with Explicit Rhythmic Modeling for Video-to-Music Generation Shulei Ji et.al. 2511.09090 null
2025-11-12 Diversifying Counterattacks: Orthogonal Exploration for Robust CLIP Inference Chengze Jiang et.al. 2511.09064 null
2025-11-12 Expand Your SCOPE: Semantic Cognition over Potential-Based Exploration for Embodied Visual Navigation Ningnan Wang et.al. 2511.08935 null
2025-11-12 From Structure to Detail: Hierarchical Distillation for Efficient Diffusion Model Hanbo Cheng et.al. 2511.08930 null
2025-11-12 TiDAR: Think in Diffusion, Talk in Autoregression Jingyu Liu et.al. 2511.08923 null
2025-11-12 Machines Serve Human: A Novel Variable Human-machine Collaborative Compression Framework Zifu Zhang et.al. 2511.08915 null
2025-11-04 The Pervasive Blind Spot: Benchmarking VLM Inference Risks on Everyday Personal Videos Shuning Zhang et.al. 2511.02367 null
2025-10-26 Encoder-Decoder Diffusion Language Models for Efficient Training and Inference Marianne Arriola et.al. 2510.22852 null
2025-10-26 FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference Divya Jyoti Bajpai et.al. 2510.22641 null
2025-10-28 Token-Level Inference-Time Alignment for Vision-Language Models Kejia Chen et.al. 2510.21794 null
2025-10-20 SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference Samir Khaki et.al. 2510.17777 null
2025-10-22 VERA-V: Variational Inference Framework for Jailbreaking Vision-Language Models Qilin Liao et.al. 2510.17759 null
2025-10-16 Efficient Video Sampling: Pruning Temporally Redundant Tokens for Faster VLM Inference Natan Bagrov et.al. 2510.14624 null
2025-10-13 Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation Maggie Wang et.al. 2510.11689 null
2025-10-13 When Does Supervised Training Pay Off? The Hidden Economics of Object Detection in the Era of Vision-Language Models Samer Al-Hamadani et.al. 2510.11302 null
2025-10-11 Efficient Navigation in Unknown Indoor Environments with Vision-Language Models D. Schwartz et.al. 2510.04991 null
2025-10-03 TridentServe: A Stage-level Serving System for Diffusion Pipelines Yifei Xia et.al. 2510.02838 null
2025-10-26 EVODiff: Entropy-aware Variance Optimized Diffusion Inference Shigui Li et.al. 2509.26096 null
2025-09-28 Sequential Diffusion Language Models Yangzhou Liu et.al. 2509.24007 null
2025-09-28 HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models Zhinan Xie et.al. 2509.23928 null
2025-11-27 Manifold-Aware Diffusion-Augmented Contrastive Learning for Noise-Robust Biosignal Representation Rami Zewail et.al. 2509.20048 null
2025-09-20 Eye Gaze Tells You Where to Compute: Gaze-Driven Efficient VLMs Qinyu Chen et.al. 2509.16476 null
2025-09-21 SpecVLM: Fast Speculative Decoding in Vision-Language Models Haiduo Huang et.al. 2509.11815 null
2025-09-15 STADI: Fine-Grained Step-Patch Diffusion Parallelism for Heterogeneous GPUs Han Liang et.al. 2509.04719 null
2025-08-26 MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs Sixun Dong et.al. 2508.18264 null
2025-08-20 GM-Skip: Metric-Guided Transformer Block Skipping for Efficient Vision-Language Models Lianming Huang et.al. 2508.18227 null
2025-08-21 Pretrained Diffusion Models Are Inherently Skipped-Step Samplers Wenju Xu et.al. 2508.15233 null
2025-08-11 AdaptInfer: Adaptive Token Pruning for Vision-Language Model Inference with Dynamical Text Guidance Weichen Zhang et.al. 2508.06084 null
2025-08-07 Real-Time Iteration Scheme for Diffusion Policy Yufei Duan et.al. 2508.05396 null
2025-07-23 Accelerating Parallel Diffusion Model Serving with Residual Compression Jiajun Luo et.al. 2507.17511 null
2025-07-11 BlindSight: Harnessing Sparsity for Efficient VLMs Tharun Adithya Srikrishnan et.al. 2507.09071 null
2025-09-30 Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling? Mingyuan Wu et.al. 2506.17417 null
2025-06-20 Consistent Sampling and Simulation: Molecular Dynamics with Energy-Based Diffusion Models Michael Plainer et.al. 2506.17139 null
2025-06-18 VLMInferSlow: Evaluating the Efficiency Robustness of Large Vision-Language Models as a Service Xiasi Wang et.al. 2506.15755 null
2025-07-01 Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model Anirud Aggarwal et.al. 2506.15682 null
2025-06-12 Adding simple structure at inference improves Vision-Language Compositionality Imanol Miranda et.al. 2506.09691 null
2025-06-09 Event-Priori-Based Vision-Language Model for Efficient Visual Understanding Haotong Qin et.al. 2506.07627 null
2025-09-03 RNE: plug-and-play diffusion inference-time control and energy-based training Jiajun He et.al. 2506.05668 null
2025-10-10 Can Vision Language Models Infer Human Gaze Direction? A Controlled Study Zory Zhang et.al. 2506.05412 null
2025-10-05 Inference-time Scaling of Diffusion Models through Classical Search Xiangcheng Zhang et.al. 2505.23614 null
2025-05-27 InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling Xiaoxiao Jiang et.al. 2505.20600 null
2025-05-25 SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation Shenggan Cheng et.al. 2505.19151 null
2025-06-13 VISTA: Vision-Language Inference for Training-Free Stock Time-Series Analysis Tina Khezresmaeilzadeh et.al. 2505.18570 null
2025-05-23 VERDI: VLM-Embedded Reasoning for Autonomous Driving Bowen Feng et.al. 2505.15925 null
2025-05-20 Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism Kunyun Wang et.al. 2505.14741 null
2025-04-14 Efficient Task-specific Conditional Diffusion Policies: Shortcut Model Acceleration and SO(3) Optimization Haiyong Yu et.al. 2504.09927 null
2025-04-15 Metropolis-Hastings Captioning Game: Knowledge Fusion of Vision Language Models via Decentralized Bayesian Inference Yuta Matsui et.al. 2504.09620 null
2025-03-17 VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers Ruanjun Li et.al. 2503.09387 null
2025-02-20 Light communicative materials Hongshuang Guo et.al. 2503.05744 null
2025-02-21 Evaluating Precise Geolocation Inference Capabilities of Vision Language Models Neel Jay et.al. 2502.14412 null
2025-10-08 Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search Yuta Oshima et.al. 2501.19252 null
2025-02-10 Membership Inference Attacks Against Vision-Language Models Yuke Hu et.al. 2501.18624 null
2025-03-10 Probing the Quantum Nature of Gravity through Classical Diffusion Oliviero Angeli et.al. 2501.13030 null
2025-01-16 PATCHEDSERVE: A Patch Management Framework for SLO-Optimized Hybrid Resolution Diffusion Serving Desen Sun et.al. 2501.09253 null
2025-01-16 StructSR: Refuse Spurious Details in Real-World Image Super-Resolution Yachao Li et.al. 2501.05777 link
2024-12-19 Unified Image Restoration and Enhancement: Degradation Calibrated Cycle Reconstruction Diffusion Model Minglong Xue et.al. 2412.14630 link
2025-06-30 Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension Xiyao Wang et.al. 2412.03704 link
2024-12-05 A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs Wangbo Zhao et.al. 2412.03324 link
2024-12-02 [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster Qizhe Zhang et.al. 2412.01818 link
2025-03-30 Staleness-Centric Optimizations for Parallel Diffusion MoE Inference Jiajun Luo et.al. 2411.16786 null
2024-11-01 VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration Dezhan Tu et.al. 2410.23317 null
2025-01-07 Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance Dongmin Park et.al. 2410.22376 link
2024-10-30 Natural Language Inference Improves Compositionality in Vision-Language Models Paola Cascante-Bonilla et.al. 2410.22315 null
2024-10-18 Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models Jie Ren et.al. 2410.13088 null
2025-02-11 ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time Yi Ding et.al. 2410.06625 null
2024-10-08 A scaling limit for additive functionals Thibaud Taillefumier et.al. 2410.06383 null
2024-09-03 CT-SDM: A Sampling Diffusion Model for Sparse-View CT Reconstruction across All Sampling Rates Liutao Yang et.al. 2409.01571 null
2024-07-27 Faster Image2Video Generation: A Closer Look at CLIP Image Embedding’s Impact on Spatio-Temporal Cross-Attentions Ashkan Taghipour et.al. 2407.19205 null
2024-07-15 LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis Zhenxiong Tan et.al. 2407.10468 link
2024-06-13 DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning Xuemin Hu et.al. 2406.09089 null
2024-10-03 I4VGen: Image as Free Stepping Stone for Text-to-Video Generation Xiefan Guo et.al. 2406.02230 null
2025-01-14 Amortizing intractable inference in diffusion models for vision, language, and control Siddarth Venkatraman et.al. 2405.20971 null
2024-05-30 DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation Zachary Novack et.al. 2405.20289 null
2024-05-26 Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference Xunpeng Huang et.al. 2405.16387 null
2025-04-16 Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models Katherine Xu et.al. 2405.14828 null
2024-04-25 Inferring solid-state diffusivity in lithium-ion battery active materials: improving upon the classical GITT method A. Emir Gumrukcuoglu et.al. 2404.16658 null
2024-11-05 Private Attribute Inference from Images with Vision-Language Models Batuhan Tömekçe et.al. 2404.10618 null
2024-05-02 Privacy-Preserving Diffusion Model Using Homomorphic Encryption Yaojian Chen et.al. 2403.05794 link
2024-05-08 ToDo: Token Downsampling for Efficient Generation of High-Resolution Images Ethan Smith et.al. 2402.13573 null
2024-06-03 DITTO: Diffusion Inference-Time T-Optimization for Music Generation Zachary Novack et.al. 2401.12179 null
2023-12-10 Statistical Spatially Inhomogeneous Diffusion Inference Yinuo Ren et.al. 2312.05793 null
2023-07-31 Cross-Modal Concept Learning and Inference for Vision-Language Models Yi Zhang et.al. 2307.15460 null
2024-01-04 Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference Zihao Yu et.al. 2305.17423 link
2023-10-25 ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval Kexun Zhang et.al. 2302.02285 link
2021-08-11 Manifold-aware Synthesis of High-resolution Diffusion from Structural Imaging Benoit Anctil-Robitaille et.al. 2108.04135 null
2021-12-22 Functional Data Analysis with Rough Sample Paths? Neda Mohammadi et.al. 2105.12035 null
2014-06-03 $C^0$-estimates and smoothness of solutions to the parabolic equation defined by Kimura operators Camelia A. Pop et.al. 1406.0742 null
2015-04-01 On nonnegative unbiased estimators Pierre E. Jacob et.al. 1309.6473 null